我需要处理大量的混合类型的表格数据 - 字符串和双打。一个标准问题,我会想。 Matlab中处理这个问题的最好的数据结构是什么?混合类型的Matlab数据结构 - 什么是时间+空间高效?
Cellarray绝对不是答案。这是非常低效的内存。 (测试如下所示)。数据集(来自统计工具箱)在时间和空间上都非常低效。这给我留下了结构阵列或数组结构。我对下面的时间和内存的所有四个不同选项做了测试,在我看来,数组结构对于我测试的东西来说是最好的选择。
我对Matlab比较新,坦白说这有点令人失望。无论如何 - 寻找我是否缺少东西的建议,或者我的测试是否准确/合理。除了访问/转换/内存使用情况之外,我还会错过其他一些考虑事项,这些使用情况很可能会在我使用这些东西进行编码时出现。 (使用R2010b)
* *测试#1:访问速度访问数据项。
cellarray:0.002s
dataset:36.665s %<<< This is horrible
structarray:0.001s
struct of array:0.000s
* *测试#2:转换速度和内存使用情况,我从这个测试下降数据集。
Cellarray(doubles)->matrix:d->m: 0.865s
Cellarray(mixed)->structarray:c->sc: 0.268s
Cellarray(doubles)->structarray:d->sd: 0.430s
Cellarray(mixed)->struct of arrays:c->sac: 0.361s
Cellarray(doubles)->struct of arrays:d->sad: 0.887s
Name Size Bytes Class Attributes
c 100000x10 68000000 cell
d 100000x10 68000000 cell
m 100000x10 8000000 double
sac 1x1 38001240 struct
sad 1x1 8001240 struct
sc 100000x1 68000640 struct
sd 100000x1 68000640 struct
================== CODE:TEST#1
%% cellarray
c = cell(100000,10);
c(:,[1,3,5,7,9]) = num2cell(zeros(100000,5));
c(:,[2,4,6,8,10]) = repmat({'asdf'}, 100000, 5);
cols = strcat('Var', strtrim(cellstr(num2str((1:10)'))))';
te = tic;
for iii=1:1000
x = c(1234,5);
end
te = toc(te);
fprintf('cellarray:%0.3fs\n', te);
%% dataset
ds = dataset({ c, cols{:} });
te = tic;
for iii=1:1000
x = ds(1234,5);
end
te = toc(te);
fprintf('dataset:%0.3fs\n', te);
%% structarray
s = cell2struct(c, cols, 2);
te = tic;
for iii=1:1000
x = s(1234).Var5;
end
te = toc(te);
fprintf('structarray:%0.3fs\n', te);
%% struct of arrays
for iii=1:numel(cols)
if iii/2==floor(iii/2) % even => string
sac.(cols{iii}) = c(:,iii);
else
sac.(cols{iii}) = cell2mat(c(:,iii));
end
end
te = tic;
for iii=1:1000
x = sac.Var5(1234);
end
te = toc(te);
fprintf('struct of array:%0.3fs\n', te);
============= ===== CODE:TEST#2
%% cellarray
% c - cellarray containing mixed type
c = cell(100000,10);
c(:,[1,3,5,7,9]) = num2cell(zeros(100000,5));
c(:,[2,4,6,8,10]) = repmat({'asdf'}, 100000, 5);
cols = strcat('Var', strtrim(cellstr(num2str((1:10)'))))';
% c - cellarray containing doubles only
d = num2cell(zeros(100000, 10));
%% matrix
% doubles only
te = tic;
m = cell2mat(d);
te = toc(te);
fprintf('Cellarray(doubles)->matrix:d->m: %0.3fs\n', te);
%% structarray
% mixed
te = tic;
sc = cell2struct(c, cols, 2);
te = toc(te);
fprintf('Cellarray(mixed)->structarray:c->sc: %0.3fs\n', te);
% doubles
te = tic;
sd = cell2struct(d, cols, 2);
te = toc(te);
fprintf('Cellarray(doubles)->structarray:d->sd: %0.3fs\n', te);
%% struct of arrays
% mixed
te = tic;
for iii=1:numel(cols)
if iii/2==floor(iii/2) % even => string
sac.(cols{iii}) = c(:,iii);
else
sac.(cols{iii}) = cell2mat(c(:,iii));
end
end
te = toc(te);
fprintf('Cellarray(mixed)->struct of arrays:c->sac: %0.3fs\n', te);
% doubles
te = tic;
for iii=1:numel(cols)
sad.(cols{iii}) = cell2mat(d(:,iii));
end
te = toc(te);
fprintf('Cellarray(doubles)->struct of arrays:d->sad: %0.3fs\n', te);
%%
clear iii cols te;
whos
虽然'数据集'确实很慢,但您的计时非常缓慢。我在访问时获取“数据集:0.7s”,而其他人的顺序与您的相同。我在32位的WinXP – Amro 2013-04-26 19:57:37