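Merging the content of two tables (looking for Matlab or pseudocode)

This question is not only for MATLAB users. If you know an answer to the problem in pseudocode, feel free to leave it as well!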

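I have two tables, Ta and Tb, that have different numbers of rows and different numbers of columns. The content is all cell text, but in the future it could also contain cell numbers.

I want to merge the contents of these tables according to the following rules: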

Take the value of Ta(i,j) if Tb(i*,j*) is empty, and vice versa.
If both are available, take the value of Ta(i,j) (and optionally check whether they are the same).
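The two rules can be sketched for one pair of already matched rows. This is only an illustration: the sample values are made up, and it assumes that "empty" means an empty char array such as ''.

```matlab
% Cell-level merge rule for one pair of matched rows (sample values are assumptions)
a = {'a2', '',   'c2'};    % values from a row of Ta
b = {'',   'b2', 'c2*'};   % values from the matched row of Tb

merged = a;                        % default: take the value from Ta
useB = cellfun(@isempty, a);       % where Ta is empty ...
merged(useB) = b(useB);            % ... take the value from Tb instead

% Optional check: flag cells where both values exist but differ
bothFilled = ~cellfun(@isempty, a) & ~cellfun(@isempty, b);
conflict = bothFilled & ~cellfun(@isequal, a, b);
```

Here `merged` keeps 'c2' from Ta in the third position, and `conflict` marks that position because Tb holds a different value there.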

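The tricky part, however, is that we have no unique row keys; we only have unique column keys. Note that I distinguish between i* and i above. The reason is that a row in Ta can have a different index than the corresponding row in Tb, and the same holds for the columns j* and j. The implications are:

We first need to identify which row of Ta corresponds to which row of Tb, and vice versa. We can do this by trying to cross-match any of the columns the tables share. However, we might not find a match (in which case we do not merge the row with another row).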
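One way to start the cross-matching is to look up the shared column names and count, for every pair of rows, how many shared columns agree. The sketch below uses the example tables from this question; the variable names `shared` and `nMatch` are assumptions for illustration.

```matlab
Ta = cell2table({'a1','b1','c1'; 'a2','b2','c2'}, ...
    'VariableNames', {'A','B','C'});
Tb = cell2table({'b2*','c2','d2'; 'b3','c3','d3'; 'b4','c4','d4'}, ...
    'VariableNames', {'B','C','D'});

% Columns present in both tables (the unique column keys)
shared = intersect(Ta.Properties.VariableNames, Tb.Properties.VariableNames);

% nMatch(i,k): number of shared columns in which row i of Ta equals row k of Tb
nMatch = zeros(height(Ta), height(Tb));
for i = 1:height(Ta)
    for k = 1:height(Tb)
        nMatch(i,k) = sum(strcmp(Ta{i, shared}, Tb{k, shared}));
    end
end
```

Only row 2 of Ta and row 1 of Tb agree in a shared column (C), so that pair is the only candidate for merging.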

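Question

How can we merge the contents of these two tables in the most efficient way?

Here are some resources that explain the problem in more detail: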

1. Matlab example to play with:

Ta = cell2table({...
    'a1', 'b1', 'c1'; ...
    'a2', 'b2', 'c2'}, ...
    'VariableNames', {'A','B', 'C'})

Tb = cell2table({...
    'b2*', 'c2', 'd2'; ...
    'b3', 'c3', 'd3'; ...
    'b4', 'c4', 'd4'}, ...
    'VariableNames', {'B','C', 'D'})

The resulting table Tc should look something like this:

Tc = cell2table({...
    'a1' 'b1' 'c1' ''; ...
    'a2' 'b2' 'c2' 'd2'; ...
    '' 'b3' 'c3' 'd3'; ...
    '' 'b4' 'c4' 'd4'}, ...
    'VariableNames', {'A', 'B','C', 'D'})

2. Possible first step

I tried the following:

Tc = outerjoin(Ta, Tb, 'MergeKeys', true)

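This works to some extent, but the problem is that it does not stack rows that appear to be similar. For example, the command above yields:

     A       B        C       D
    ____    _____    ____    ____
    ''      'b2*'    'c2'    'd2'
    ''      'b3'     'c3'    'd3'
    ''      'b4'     'c4'    'd4'
    'a1'    'b1'     'c1'    ''
    'a2'    'b2'     'c2'    ''

Here the rows

    ''      'b2*'    'c2'    'd2'
    'a2'    'b2'     'c2'    ''

should have been merged into one:

    'a2'    'b2'     'c2'    'd2'

So do we need one more step to stack these two together?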

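3. Example of a hurdle

Suppose, for example, that we have something like this:

Ta =
     A       B       C
    ____    ____    ____
    'a1'    'b1'    'c1'
    'a2'    'b2'    'c2'

Tb =
     A       B       C
    ____    ____    ____
    'a1'    'b2'    'c3'

Then the question arises whether the row in Tb should be merged with row 1 or row 2 of Ta, whether all rows should be merged, or whether it should simply be kept as a separate row. Ideas on how to handle these kinds of situations are also welcome.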
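The ambiguity can at least be detected automatically. The following sketch counts identical fields for the example above and checks whether more than one row of Ta reaches the top score. How a tie is then resolved (for instance, keeping the Tb row separate) remains a policy decision that this sketch does not make.

```matlab
rowsTa = {'a1','b1','c1'; 'a2','b2','c2'};   % rows of Ta (columns A, B, C)
rowTb  = {'a1','b2','c3'};                   % the single row of Tb

% Number of identical fields between each row of Ta and the row of Tb
score = zeros(size(rowsTa,1), 1);
for i = 1:size(rowsTa,1)
    score(i) = sum(strcmp(rowsTa(i,:), rowTb));
end

best = find(score == max(score));   % every row of Ta that reaches the top score
isAmbiguous = numel(best) > 1;      % more than one candidate means a tie
```

Both rows of Ta agree with the Tb row in exactly one field, so `isAmbiguous` is true here.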

Source

2017-10-17 JohnAndrews

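This is very similar to [your previous question](https://stackoverflow.com/questions/46682751), right? – Wolfie

Not really, because here I am really asking how to join two tables together using Matlab tables. It is different from the previous question, where I distinguished between rows and columns and where I dealt with numeric data. If you can show me the connection to the previous question, that would be great. – JohnAndrews

Also note that in this question there are no unique rows. Only the number of rows differs. – JohnAndrews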

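Here is a conceptual answer that may get you on your way:

Define a "scoring function" that tells you, for each row of Tb, how well it matches a given row of Ta.
Fill Tc with Ta.
For each row in Ta, determine the best match in Tb. If the quality of the match is above your criterion, treat the best match as a successful match.
If a successful match is found, "consume" it (use the information from Tb to enrich the corresponding row in Tc where necessary).
Continue until you reach the end of Ta. Everything that has not yet been consumed from Tb can now be "appended" to Tc.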
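The steps above can be sketched on plain cell arrays. This is a minimal illustration under stated assumptions, not the answerer's implementation: both tables are assumed to be already padded to the same columns A to D with '' for missing values, the score simply counts identical non-empty fields, and `minScore` is a hypothetical acceptance criterion.

```matlab
% Both tables padded to the same columns A..D; '' marks a missing value
A = {'a1','b1','c1','';   'a2','b2','c2',''};
B = {'','b2*','c2','d2';  '','b3','c3','d3';  '','b4','c4','d4'};

% Score: number of columns where the first row is non-empty and both agree
score = @(r, s) sum(~cellfun(@isempty, r) & strcmp(r, s));
minScore = 1;                          % assumed acceptance criterion

C = A;                                 % fill Tc with Ta
consumed = false(size(B,1), 1);
for i = 1:size(A,1)
    s = zeros(size(B,1), 1);
    for k = 1:size(B,1)
        s(k) = score(A(i,:), B(k,:));
    end
    s(consumed) = -Inf;                % a consumed row cannot match again
    [bestScore, k] = max(s);
    if bestScore >= minScore
        consumed(k) = true;            % consume the match
        fill = cellfun(@isempty, C(i,:));
        C(i, fill) = B(k, fill);       % enrich only the empty cells
    end
end
C = [C; B(~consumed, :)];              % append what was never consumed
```

For the example data this reproduces the desired Tc: the row 'a2' 'b2' 'c2' picks up 'd2', and the two unmatched Tb rows are appended at the end.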

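Room for improvement:

In the selection of matches

Consider experimenting with consuming Ta rather than Tb, or with a more sophisticated heuristic to determine the order of consumption (such as calculating all "distances" and optimizing the matching based on a cost function).

Note that these improvements are only necessary if you run into a lot of mismatches with the basic solution.

In the definition of match quality

I would suggest starting very simple here. For example, if you have 4 fields, simply count how many fields match, or check whether all non-empty fields match.

If you want to take this further, consider evaluating the distance between values (e.g. MSE) or a text distance (e.g. Levenshtein distance).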
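As an illustration of a text distance, the sketch below computes the Levenshtein distance between the two near-identical values from the question with the standard dynamic-programming recurrence. Turning the distance into a similarity between 0 and 1 by dividing by the longer length is an assumption made here, not something the answer prescribes.

```matlab
s = 'b2*';  t = 'b2';                  % the two near-identical values from the example

% D(i+1,j+1) holds the edit distance between s(1:i) and t(1:j)
D = zeros(numel(s)+1, numel(t)+1);
D(:,1) = (0:numel(s))';
D(1,:) = 0:numel(t);
for i = 1:numel(s)
    for j = 1:numel(t)
        cost = double(s(i) ~= t(j));   % 0 if the characters agree, 1 otherwise
        D(i+1,j+1) = min([D(i,j+1)+1, D(i+1,j)+1, D(i,j)+cost]);
    end
end
dist = D(end,end);                               % number of single-character edits
similarity = 1 - dist / max(numel(s), numel(t)); % assumed normalization to [0,1]
```

A similarity like this could replace the exact string comparison in a scoring function, so that 'b2*' and 'b2' count as a partial match.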

Source

2017-10-19 14:01:35

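I really like this. The scoring function in particular is a good idea, and it can also be used to improve speed. – JohnAndrews

Here is a function that tries to do the job. You supply two tables, a threshold for deciding whether to merge two rows, and a logical that states whether you want to take the value from the first table when a merge conflict occurs. I have not prepared it for edge cases, but see what it can do for you: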

TkeepAll=mergeTables(Tb,Ta,1,true) 
TmergeSome=mergeTables(Tb,Ta,0.25,true) 
TmergeAll=mergeTables(Tb,Ta,-1,true)

Here is the function:

function Tmerged=mergeTables(Ta,Tb,threshold,preferA) 
%% parameters 
% Ta and Tb are the two tables to merge 
% threshold=0.25; minimal ratio of identical values in rows for merge. 
% example: you have one row in table A with 3 values, but you only have two 
% values for the same columns in data B. if one of the values is identical 
% and one isn't, you have ratio of 1/2 aka 0.5, which passes a threshold of 
% 0.25 
% preferA=true; which to take when there is merge conflict 
%% see how well rows fit to each other 
% T1 is the table with fewer rows 
if size(Ta,1)<=size(Tb,1) 
    T1=Ta; 
    T2=Tb; 
    prefer1=preferA; 
else 
    T1=Tb; 
    T2=Ta; 
    prefer1=~preferA; 
end 
[commonVar1,commonVar2]=ismember(T1.Properties.VariableNames,... 
    T2.Properties.VariableNames); 
commonVar1=find(commonVar1); 
commonVar2(commonVar2==0)=[]; 
% fit is an N-by-M matrix (N rows of T1, M rows of T2). Each entry is the 
% fraction of common columns in which the row of table 1 (shorter) and the 
% row of table 2 (longer) hold identical text 
fit=zeros(size(T1,1),size(T2,1)); 
for ii=1:size(T1,1) %rows of T1 
    for jj=1:size(T2,1) %rows of T2 
     % strcmp compares column by column; ismember would also count a 
     % value that only appears in a different column 
     fit(ii,jj)=sum(strcmp(T1{ii,commonVar1},T2{jj,commonVar2}))/length(commonVar1); 
    end 
end 
%% pair rows according to fit 
% match has two columns, first one has T1 row number and second one has the 
% matching T2 row number 
unpaired1=true(size(T1,1),1); 
unpaired2=true(size(T2,1),1); 
count=0; 
match=[]; 
maxv=max(fit,[],2); 
[~,order]=sort(maxv,'descend'); 
order=order'; 
for ii=order %1:size(T1,1) 
    [maxv,maxi]=max(fit,[],2); 
    if maxv(ii)>threshold 
     count=count+1; 
     match(count,1)=ii; 
     match(count,2)=maxi(ii); 
     unpaired1(ii)=false; 
     unpaired2(match(count,2))=false; 
     fit(:,match(count,2))=nan; %exclude paired row from next pairing 
    end 
end 

%% prepare new variables 
% row order in the result: unpaired T1 rows, then merged pairs, then 
% unpaired T2 rows. Every column is indexed in that same order so that the 
% values of one row stay together 
n1=sum(unpaired1); 
n2=sum(unpaired2); 
if isempty(match) 
    rows1=zeros(0,1); % no pairs were found 
    rows2=zeros(0,1); 
else 
    rows1=match(:,1); 
    rows2=match(:,2); 
end 
names1=T1.Properties.VariableNames; 
names2=T2.Properties.VariableNames; 
merged=struct(); % one field per output column, no eval needed 
% first variables common to the two tables 
for vari=1:length(commonVar1) 
    if prefer1 
     mergedData=T1{rows1,commonVar1(vari)}; 
    else 
     mergedData=T2{rows2,commonVar2(vari)}; 
    end 
    data1=T1{unpaired1,commonVar1(vari)}; 
    data2=T2{unpaired2,commonVar2(vari)}; 
    merged.(names1{commonVar1(vari)})=[data1;mergedData;data2]; 
end 
% variables only in 1: pad the rows that come from unpaired T2 rows 
uncommonVar1=setdiff(1:size(T1,2),commonVar1); 
for vari=uncommonVar1 
    merged.(names1{vari})=[T1{unpaired1,vari};T1{rows1,vari};repmat({''},n2,1)]; 
end 
% variables only in 2: pad the rows that come from unpaired T1 rows 
uncommonVar2=setdiff(1:size(T2,2),commonVar2); 
for vari=uncommonVar2 
    merged.(names2{vari})=[repmat({''},n1,1);T2{rows2,vari};T2{unpaired2,vari}]; 
end 
%% collect variables to a table, columns in alphabetical order 
Tmerged=struct2table(orderfields(merged));

Source

2017-10-23 17:53:45
