2016-03-04 39 views
1

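Record (save) the unmatched entries of dataset_a and dataset_b using Python pandas

As you can see below, the inner merge loses rows from both frames because some keys do not match. I want to record how many entries of left_frame and right_frame were unmatched, and I don't know how to do that.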

left_frame

   key left_value
0    0          a
1    1          b
2    2          c
3    3          d
4    4          e

right_frame

   key right_value
0    2           f
1    3           g
2    4           h
3    5           i
4    6           j

pd.merge(left_frame, right_frame, on='key', how='inner') 

**Desired output 1:**

   key left_value right_value
0    2          c           f
1    3          d           g
2    4          e           h

**Desired output 2:**

   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
5    5        NaN           i  right_only
6    6        NaN           j  right_only

So basically, I want two DataFrames: one for the matched rows and another for the unmatched ones.

Answers

2

If you change the merge type from 'inner' to 'outer' and pass indicator=True, you can see which frame each non-matching row came from:

In [193]: 
pd.merge(left, right, how='outer', indicator=True) 

Out[193]: 
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
2    2          c           f        both
3    3          d           g        both
4    4          e           h        both
5    5        NaN           i  right_only
6    6        NaN           j  right_only
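
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The two frames are rebuilt from the tables printed in the question, so their construction is an assumption, and on='key' is spelled out even though the snippet above relies on 'key' being the only shared column.

```python
import pandas as pd

# Frames rebuilt from the question's printed tables (assumed construction).
left = pd.DataFrame({"key": [0, 1, 2, 3, 4],
                     "left_value": ["a", "b", "c", "d", "e"]})
right = pd.DataFrame({"key": [2, 3, 4, 5, 6],
                      "right_value": ["f", "g", "h", "i", "j"]})

# how='outer' keeps every key from both frames; indicator=True adds a
# '_merge' column holding 'left_only', 'right_only' or 'both' per row.
merged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(merged)
```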

You can then group by this column and call count:

In [194]: 
pd.merge(left, right, how='outer', indicator=True).groupby('_merge').count() 

Out[194]: 
            key  left_value  right_value
_merge
left_only     2           2            0
right_only    2           0            2
both          3           3            3
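
If only the per-category totals are needed, counting the '_merge' column directly may be simpler than a groupby, since count() reports non-null cells per column rather than rows. The sketch below is one possible variant, not part of the original answer; it rebuilds the frames from the question's tables, and the name unmatched_total is made up for illustration.

```python
import pandas as pd

# Frames rebuilt from the question's printed tables (assumed construction).
left = pd.DataFrame({"key": [0, 1, 2, 3, 4],
                     "left_value": ["a", "b", "c", "d", "e"]})
right = pd.DataFrame({"key": [2, 3, 4, 5, 6],
                      "right_value": ["f", "g", "h", "i", "j"]})
merged = pd.merge(left, right, on="key", how="outer", indicator=True)

# One row count per merge category.
counts = merged["_merge"].value_counts()
print(counts)

# Hypothetical helper value: total number of unmatched rows on either side.
unmatched_total = int(counts["left_only"] + counts["right_only"])
print(unmatched_total)
```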

If you want to filter and save the results:

In [198]: 
merged = pd.merge(left, right, how='outer', indicator=True) 
merged 

Out[198]: 
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
2    2          c           f        both
3    3          d           g        both
4    4          e           h        both
5    5        NaN           i  right_only
6    6        NaN           j  right_only

In [199]:  
both = merged[merged['_merge'] == 'both'] 
both 

Out[199]: 
   key left_value right_value _merge
2    2          c           f   both
3    3          d           g   both
4    4          e           h   both

In [200]: 
other = merged[merged['_merge'] != 'both'] 
other 

Out[200]: 
   key left_value right_value      _merge
0    0          a         NaN   left_only
1    1          b         NaN   left_only
5    5        NaN           i  right_only
6    6        NaN           j  right_only
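
To get closer to the two desired outputs in the question, the matched frame can drop the '_merge' column and renumber its index, while the unmatched frame keeps both. This is a sketch under the same assumptions as before (frames rebuilt from the question's tables); the names matched and unmatched are illustrative only.

```python
import pandas as pd

# Frames rebuilt from the question's printed tables (assumed construction).
left = pd.DataFrame({"key": [0, 1, 2, 3, 4],
                     "left_value": ["a", "b", "c", "d", "e"]})
right = pd.DataFrame({"key": [2, 3, 4, 5, 6],
                      "right_value": ["f", "g", "h", "i", "j"]})
merged = pd.merge(left, right, on="key", how="outer", indicator=True)

is_both = merged["_merge"] == "both"

# Desired output 1: matched rows only, without the indicator column,
# renumbered from 0 like the result of an inner merge.
matched = merged[is_both].drop(columns="_merge").reset_index(drop=True)

# Desired output 2: unmatched rows, keeping '_merge' to show the source frame.
unmatched = merged[~is_both]

print(matched)
print(unmatched)
```

Either frame can then be written out, for example with DataFrame.to_csv, if the results need to be saved to disk.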
+0

Yes, but I want to store the unmatched rows after I use the drop method on them. – mtkilic

+0

Please post the desired output. – EdChum

+0

I just edited the question to add the desired output. Thanks. – mtkilic