Vectorizing cell array lookup and summation in Matlab

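Can someone please tell me how to convert this code from an iterative to a vectorized implementation to speed it up in Matlab? Currently, with i=1:20, each value of i takes about 8 seconds on my machine.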

classEachWordCount = zeros(nwords_train, nClasses);
for i=1:nClasses % (20 classes)
    for j=1:nwords_train % (53975 words)
        classEachWordCount(j,i) = sum(groupedXtrain{i}(groupedXtrain{i}(:,2)==j,3));
    end
end

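In case context helps: groupedXtrain is a cell array of 20 matrices representing the different classes. Each class matrix has 3 columns (document#, word#, wordcount) and a varying number of rows (tens of thousands). I'm trying to find the total count of each word within each class, so classEachWordCount should be a 53975x20 matrix in which each row represents a different word and each column a different label. There must be a built-in function to help with something like this, right?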

For example, groupedXtrain{1} might start like this:

doc#,word#,wordcount 
    1 1 3 
    1 2 1 
    1 4 3 
    1 5 1 
    1 8 2 
    2 2 1 
    2 5 4 
    2 6 2
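
To make the goal concrete, here is a minimal sketch that applies the loop from the question to just the sample rows above. It assumes a hypothetical vocabulary of only 8 words; the names X, nwords and totals are made up for illustration and are not part of the original code.

```matlab
% Sample rows from above: [doc#, word#, wordcount]
X = [1 1 3;
     1 2 1;
     1 4 3;
     1 5 1;
     1 8 2;
     2 2 1;
     2 5 4;
     2 6 2];

nwords = 8;                 % hypothetical small vocabulary size
totals = zeros(nwords, 1);  % one row per word, zero if the word never occurs

for j = 1:nwords
    % Sum the wordcount column over all rows whose word# equals j
    totals(j) = sum(X(X(:,2) == j, 3));
end

disp(totals.');             % prints the totals for words 1 through 8
```

Words 2 and 5 each appear in both documents, so their counts are added together, while words 3 and 7 never appear and should stay at zero.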

2017-03-01 Austin

Looks like a job for 'accumarray' (https://www.mathworks.com/help/matlab/ref/accumarray.html) – beaker

Thanks, that looks promising, I'll look into it – Austin

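As mentioned in the comments, you can use accumarray to sum the values in the third column for each unique value in the second column, once per class: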

results = zeros(nwords_train, numel(groupedXtrain)); 

for k = 1:numel(groupedXtrain) 
    results(:,k) = accumarray(groupedXtrain{k}(:,2), groupedXtrain{k}(:,3), ... 
           [nwords_train 1], @sum); 
end
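
If the remaining loop over classes is also unwanted, one possible variation (a sketch, not part of the answer above) is to stack all class matrices and call accumarray once with two-column subscripts [word#, class#]. The toy data and the names rowsPerClass, allRows and classIdx are hypothetical, and repelem requires R2015a or later.

```matlab
% Hypothetical toy data: 2 classes, vocabulary of 8 words
groupedXtrain = { [1 1 3; 1 2 1; 1 4 3; 1 5 1; 1 8 2; 2 2 1; 2 5 4; 2 6 2], ...
                  [1 2 5; 1 3 1; 2 3 2] };
nwords_train = 8;
nClasses = numel(groupedXtrain);

% Number of rows contributed by each class matrix
rowsPerClass = cellfun(@(x) size(x, 1), groupedXtrain);

% Stack every class matrix into one tall [doc#, word#, wordcount] matrix
allRows = vertcat(groupedXtrain{:});

% Class index for each stacked row, e.g. [1;1;...;2;2;2]
classIdx = repelem((1:nClasses).', rowsPerClass(:));

% Row subscript = word#, column subscript = class#, value = wordcount
results = accumarray([allRows(:,2), classIdx], allRows(:,3), ...
                     [nwords_train, nClasses]);
```

With two-column subscripts the size argument is [nwords_train, nClasses], so the output has one row per word and one column per class, and every word/class pair that never occurs is left at zero. Whether this is faster than the 20-iteration loop depends on the data; the loop above already removes the expensive inner loop over words.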

2017-03-02 00:37:37 Suever

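The problem I'm running into now is that each 'results{i}' has a different length instead of all being of length 'nwords_train', and there are no zeros for the words that weren't found, so I no longer know which row total corresponds to which word. Is there a quick fix for this? – Austin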

@Jake If you want them all to be the same size, you can specify the size as the third input to 'accumarray'. Updated – Suever

I'm getting the error: Error using accumarray. When SUBS is a column vector, the third input SZ must be of the form [N 1]. – Austin
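
The thread does not say what caused this error. The message itself indicates that, with column-vector subscripts, the size has to be passed as a two-element vector [N 1] where N is a scalar. The sketch below illustrates that form on the sample rows from the question; wordIdx, counts and nwords are hypothetical names, and the vocabulary size of 10 is made up.

```matlab
wordIdx = [1; 2; 4; 5; 8; 2; 5; 6];   % word# column (a column vector of subscripts)
counts  = [3; 1; 3; 1; 2; 1; 4; 2];   % matching wordcount column
nwords  = 10;                         % hypothetical vocabulary size (must be a scalar)

% Without a size argument, the output stops at the largest word# present
short = accumarray(wordIdx, counts);

% With [nwords 1], the output has one row per word and zeros for unseen words
padded = accumarray(wordIdx, counts, [nwords 1]);

fprintf('%d rows without size, %d rows with size\n', numel(short), numel(padded));
```

Because padded always has nwords rows, row j corresponds to word j in every class, which addresses the mismatched lengths mentioned earlier. If the error persists, checking that the value used for N is a scalar (for example with isscalar) may help narrow it down.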
