2012-07-12 138 views

Matrix dimensions must agree when applying this method

%% When an outlier is considered to be more than three standard deviations away from the mean, use the following syntax to determine the number of outliers in each column of the count matrix: 

mu = mean(data) 
sigma = std(data) 
[n,p] = size(data); 
% Create a matrix of mean values by replicating the mu vector for n rows 
MeanMat = repmat(mu,n,1); 
% Create a matrix of standard deviation values by replicating the sigma vector for n rows 
SigmaMat = repmat(sigma,n,1); 
% Create a matrix of zeros and ones, where ones indicate the location of outliers 
outliers = abs(data - MeanMat) > 3*SigmaMat; 
% Calculate the number of outliers in each column 
nout = sum(outliers) 
% To remove an entire row of data containing the outlier 
data(any(outliers,2),:) = []; %% this line 

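The last line removes a number of observations (rows) from my data set. Later in my program this causes a problem, because I have manually set the number of observations (rows) to 1000: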

%% generate sample data 
K = 6; 
numObservarations = 1000; 
dimensions = 3; 

If I change numObservarations to data I get a scalar output error, but if I leave it unchanged, the mismatch in the number of rows gives me this error:

??? Error using ==> minus 
Matrix dimensions must agree. 

Error in ==> datamining at 106 
    D(:,k) = sum(((data - 
    repmat(clusters(k,:),numObservarations,1)).^2), 2); 

Is there a way to set numObservarations so that it automatically detects the number of rows in data and outputs it as a single number?

Answers


I must be misunderstanding something. As far as I can tell, this should be enough:

numObservations = size(data, 1);
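
To show where that line would go, here is a minimal sketch of the whole flow. The sample data, the planted outlier and the random cluster centres are assumptions made up for illustration; only the outlier removal and the distance line come from the question. Note that the question spells the variable numObservarations, so that spelling is used here so the later code picks up the new value.

```matlab
% Hypothetical sample data: 1000 rows, 3 columns (not the asker's real data)
data = randn(1000, 3);
data(1, 1) = 50;                 % planted outlier so at least one row is removed

% Outlier removal as in the question
mu = mean(data);
sigma = std(data);
n = size(data, 1);
outliers = abs(data - repmat(mu, n, 1)) > 3 * repmat(sigma, n, 1);
data(any(outliers, 2), :) = [];

% Recompute the row count AFTER the removal instead of hard-coding 1000.
% size(data, 1) returns a scalar; size(data) alone returns [rows cols].
numObservarations = size(data, 1);

% Hypothetical cluster centres, only to exercise the line that failed
K = 6;
clusters = randn(K, size(data, 2));
D = zeros(numObservarations, K);
for k = 1:K
    D(:, k) = sum((data - repmat(clusters(k, :), numObservarations, 1)).^2, 2);
end
```

The assignment has to come after the row deletion. If it runs before, numObservarations still holds the old row count and the subtraction fails with the same dimension error.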