2012-02-15 125 views

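MATLAB - classify output

My program uses K-means clustering with a number of clusters supplied by the user. Here k = 4, but I would then like to run the clustered data through MATLAB's naive Bayes classifier.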

Is there a way to split the clusters up and feed them into different naive Bayes classifiers in MATLAB?

Naive Bayes:

class = classify(test,training, target_class, 'diaglinear'); 

K-means:

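%% generate sample data 
K = 4; 
numObservarations = 5000; 
dimensions = 42; 
% placeholder (assumption): random values stand in for the real 5000-by-42 data matrix 
data = rand(numObservarations, dimensions); 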
%% cluster 
opts = statset('MaxIter', 500, 'Display', 'iter'); 
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ... 
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3); 
%% plot data+clusters 
figure, hold on 
scatter3(data(:,1),data(:,2),data(:,3), 5, clustIDX, 'filled') 
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 100, (1:K)', 'filled') 
hold off, xlabel('x'), ylabel('y'), zlabel('z') 
%% plot clusters quality 
figure 
[silh,h] = silhouette(data, clustIDX); 
avrgScore = mean(silh); 
%% Assign data to clusters 
% calculate distance (squared) of all instances to each cluster centroid 
D = zeros(numObservarations, K);  % init distances 
for k=1:K 
    % squared Euclidean distance: d = sum((x-y).^2) 
    D(:,k) = sum(((data - repmat(clusters(k,:),numObservarations,1)).^2), 2); 
end 
% find for all instances the cluster closest to it 
[minDists, clusterIndices] = min(D, [], 2); 
% compare it with what you expect it to be 
sum(clusterIndices == clustIDX) 

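Something like outputting the k clusters in a format such as k1, k2, k3 and then having the naive Bayes classifier pick those up, so that instead of test it would be k1, k2, and so on: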

class = classify(k1,training, target_class, 'diaglinear'); 

But I just don't know how to get the output of the k clusters into a format that MATLAB can use. (I am really new to this program.)

EDIT

training = [1;0;-1;-2;4;0]; % this is the sample (training) data. 
target_class = ['posi';'zero';'negi';'negi';'posi';'zero']; % This must have the same number of rows as the training data; each row is the class of the training element on the same row. 
% target_class holds the target classes for the training data; here 'posi', 'zero' and 'negi' are the classes for the given training data 

% Training and testing the classifier 
test = 10*randn(10,1) % this is for testing; I am generating random numbers. 
class = classify(test,training, target_class, 'diaglinear') % This command classifies the test data depending on the given training data using a naive Bayes classifier 

% diaglinear is for the naive Bayes classifier; there is also diagquadratic 
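
To illustrate the difference between the two naive Bayes options: 'diaglinear' pools one diagonal covariance estimate across all groups, while 'diagquadratic' estimates a diagonal covariance per group. The following is only a sketch on made-up data. The group names 'narrow' and 'wide', the sample sizes and the test points are assumptions chosen for illustration, not part of the original problem.

```matlab
% Sketch on synthetic data (assumption): two 1-D groups with different spread.
rng(1);                                        % reproducible random numbers
training = [randn(20,1); 5 + 3*randn(20,1)];   % 20 tight points, 20 spread-out points
target_class = [repmat({'narrow'}, 20, 1); repmat({'wide'}, 20, 1)];

% Hypothetical test points spanning both groups.
test = linspace(-5, 12, 8)';

% Same data, two naive Bayes variants of classify.
classLinear    = classify(test, training, target_class, 'diaglinear');
classQuadratic = classify(test, training, target_class, 'diagquadratic');

% Show the two label columns side by side; rows where they differ are the
% points affected by the pooled versus per-group variance estimate.
disp([classLinear classQuadratic]);
```

Note that 'diagquadratic' needs a non-zero variance in every group, so a group whose training values are all identical will cause an error.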

Answers


Try this:

% create 100 random points (this is the training data) 
X = rand(100,3); 

% cluster into 5 clusters 
K = 5; 
[IDX, C] = kmeans(X, K); 

% now let us say you have new data and you want 
% to classify it based on the training: 
SAMPLE = rand(10,3); 
CLASS = classify(SAMPLE,X,IDX); 
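
When no fourth argument is given, classify uses its default 'linear' discriminant type. A minimal sketch of the naive Bayes variant the question asks about, assuming the same kind of random stand-in data as above (the seed and the counts variable are illustrative additions):

```matlab
% Sketch (assumption): random data stands in for real observations.
rng(2);                           % reproducible random numbers
X = rand(100, 3);                 % training data
K = 5;
[IDX, C] = kmeans(X, K);          % IDX: cluster index of each row of X

SAMPLE = rand(10, 3);             % new data with the same number of columns as X

% The cluster indices act as the grouping variable; 'diaglinear' selects
% the naive Bayes classifier instead of the default 'linear' one.
CLASS = classify(SAMPLE, X, IDX, 'diaglinear');

% Count how many new points were assigned to each cluster.
counts = accumarray(CLASS, 1, [K 1]);
disp(counts');
```

Because IDX is numeric, CLASS comes back as a numeric vector of cluster indices, one per row of SAMPLE.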

If you just want to filter one of the clusters out of the data, you can do something like this:

K1 = X(IDX==1, :)   % the colon keeps all columns; X(IDX==1) alone returns only first-column values 
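
The same filtering can be repeated for every cluster, with each cluster then passed to the classifier as its sample data. This is only a sketch: the labelled training set below, its 'low'/'high' class names and the variable names clusterData and predicted are hypothetical stand-ins for whatever labelled data is actually available.

```matlab
% Sketch (assumption): random data stands in for the real observations and labels.
rng(0);                                  % reproducible random numbers
X = rand(100, 3);                        % stand-in data to cluster
K = 4;
IDX = kmeans(X, K);                      % cluster index for every row of X

% Split the rows of X by cluster: clusterData{k} holds the points of cluster k.
clusterData = cell(K, 1);
for k = 1:K
    clusterData{k} = X(IDX == k, :);
end

% Hypothetical labelled training set with the same number of columns as X.
training = [rand(30, 3); rand(30, 3) + 1];
target_class = [repmat({'low'}, 30, 1); repmat({'high'}, 30, 1)];

% Feed each cluster to the naive Bayes classifier as the sample (test) data.
predicted = cell(K, 1);
for k = 1:K
    predicted{k} = classify(clusterData{k}, training, target_class, 'diaglinear');
end
```

A cell array is used because the clusters generally contain different numbers of rows, so they cannot be stacked into one regular 3-D array.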

Hope that helps.


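Zenpoy, thanks a bunch! But wouldn't you use K1 where you are using SAMPLE as the test data? Or am I mixing up test, training and target_class? I thought target_class should be the label for each training row, training would be the specific data used to learn how to recognise the classes, and test would be the sample data you want to find out can be classified (i.e. one of the clusters, in my specific problem). – 2012-02-19 16:15:27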


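I'm not sure, but I think you are confusing things. According to the documentation ('help classify'): CLASS = classify(SAMPLE,TRAINING,GROUP) classifies each row of the data in SAMPLE into one of the groups in TRAINING. SAMPLE and TRAINING must be matrices with the same number of columns. GROUP is a grouping variable for TRAINING. Its unique values define groups, and each element defines which group the corresponding row of TRAINING belongs to. GROUP can be a categorical variable, a numeric vector, a string array, or a cell array of strings. – zenpoy 2012-02-19 18:03:07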


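Ah wait, there are multiple options. Yes, you can group them, but you can also classify them individually. See the code in my edit above. Note that I train on the training data and use the target classes to label it. Then I "test" the classifier with random numbers. The output is each number's classification as positive or negative. In my case, I would simply use one of my clusters as the test data. – 2012-02-20 10:28:33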