2017-10-18 90 views

Pandas merge duplicates all rows

I want to merge two DataFrames to find any new entries. At the moment the two DataFrames are identical.

DataFrame A

  BusinessName                   Ubi        IdentifierValue
0 CHULA VISTA PAINTING/SERVICES  604000010  CHULAVP841MQ
1 MANU TECH LLC                  604000040  MANUTTL833BL
2 HAWTHORN LANDSCAPE MTRILS INC  604000042  HAWTHLM845MM
3 M M R CONSTRUCTION LLC         604000082  MMRCOCL848MM
4 HURTADO PAINTING               604000120  HURTAP*831JJ

DataFrame B

  BusinessName                   Ubi        IdentifierValue
0 CHULA VISTA PAINTING/SERVICES  604000010  CHULAVP841MQ
1 MANU TECH LLC                  604000040  MANUTTL833BL
2 HAWTHORN LANDSCAPE MTRILS INC  604000042  HAWTHLM845MM
3 M M R CONSTRUCTION LLC         604000082  MMRCOCL848MM
4 HURTADO PAINTING               604000120  HURTAP*831JJ

When I merge on Ubi, every row shows up twice: once as left_only and once as right_only.

merged = A[['Ubi']].merge(B[['Ubi']], how='outer', indicator=True)
merged


  Ubi          _merge
0 604000010.0  left_only
1 604000040.0  left_only
2 604000042.0  left_only
3 604000082.0  left_only
4 604000120.0  left_only
5 604000010.0  right_only
6 604000040.0  right_only
7 604000042.0  right_only
8 604000082.0  right_only
9 604000120.0  right_only

If I merge only on BusinessName, however, it works as expected.

merged = A[['BusinessName']].merge(B[['BusinessName']], how='outer', indicator=True)
merged

  BusinessName                   _merge
0 CHULA VISTA PAINTING/SERVICES  both
1 MANU TECH LLC                  both
2 HAWTHORN LANDSCAPE MTRILS INC  both
3 M M R CONSTRUCTION LLC         both
4 HURTADO PAINTING               both

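Merging on Ubi would be best, but I can't find the problem. The Ubi column is int64, while the other columns are object. When I merge on the Ubi column, I noticed that its type switches to float64.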
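For what it's worth, newer pandas versions refuse this kind of mismatch outright instead of quietly producing left_only/right_only rows. A minimal sketch, assuming A stores Ubi as int64 and B stores it as strings (the two sample values are hypothetical), shows the merge raising a ValueError:

```python
import pandas as pd

# Hypothetical data: same Ubi values, but int64 in A and strings (object) in B.
A = pd.DataFrame({"Ubi": [604000010, 604000040]})
B = pd.DataFrame({"Ubi": ["604000010", "604000040"]})

try:
    A.merge(B, how="outer", indicator=True)
    raised = False
except ValueError as exc:
    # Newer pandas detects a numeric key merged with a string key and refuses.
    raised = True
    print(exc)
```

On the 2017-era pandas from this thread the behaviour may differ, so treat this as an illustration of the dtype mismatch rather than an exact reproduction.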

Answer


The problem is that the Ubi columns have different dtypes; they need to be the same.

Check them with:

print (A['Ubi'].dtype) 
print (B['Ubi'].dtype) 
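When several key columns are involved, it can help to list every mismatch at once. A small sketch, where `mismatched_key_dtypes` is a hypothetical helper (not a pandas function) and the sample frames are made up:

```python
import pandas as pd

def mismatched_key_dtypes(left, right, keys):
    """Return {key: (left_dtype, right_dtype)} for each key whose dtype differs."""
    return {
        key: (left[key].dtype, right[key].dtype)
        for key in keys
        if left[key].dtype != right[key].dtype
    }

# Hypothetical frames: int64 key on one side, string (object) key on the other.
A = pd.DataFrame({"Ubi": [604000010, 604000040]})
B = pd.DataFrame({"Ubi": ["604000010", "604000040"]})

result = mismatched_key_dtypes(A, B, ["Ubi"])
print(result)
```

An empty dict means the keys agree and the merge should line rows up as `both`.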

So convert both to the same type, either:

A['Ubi'] = A['Ubi'].astype(str) 
B['Ubi'] = B['Ubi'].astype(str) 

or:

A['Ubi'] = A['Ubi'].astype(int) 
B['Ubi'] = B['Ubi'].astype(int) 
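Putting it together for the original goal (finding new entries), here is a sketch under assumptions: A holds Ubi as int64, B holds it as strings, and B has one hypothetical extra row, "NEW CO". It uses `pd.to_numeric` before the integer cast so that both string and float64 keys end up as int64:

```python
import pandas as pd

# Hypothetical sample data; "NEW CO" is the row we expect to be reported as new.
A = pd.DataFrame({
    "BusinessName": ["MANU TECH LLC", "HURTADO PAINTING"],
    "Ubi": [604000040, 604000120],
})
B = pd.DataFrame({
    "BusinessName": ["MANU TECH LLC", "HURTADO PAINTING", "NEW CO"],
    "Ubi": ["604000040", "604000120", "604999999"],
})

# Give the key one dtype on both sides. to_numeric parses the strings, and
# astype("int64") also turns an all-integral float64 column back into ints
# (it raises if the column contains NaN).
A["Ubi"] = pd.to_numeric(A["Ubi"]).astype("int64")
B["Ubi"] = pd.to_numeric(B["Ubi"]).astype("int64")

# Left-merge B against A's keys; rows of B without a match are "left_only".
merged = B.merge(A[["Ubi"]], on="Ubi", how="left", indicator=True)
new_entries = merged.loc[merged["_merge"] == "left_only", ["BusinessName", "Ubi"]]
print(new_entries)
```

One caution about the `astype(str)` variant: if one side has already become float64, as the asker observed, it yields strings like '604000010.0', which will not match '604000010' from the int64 side, so converting to integers is likely the safer choice here.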

Yes, check the dtypes first. – Rockbar


That was the problem, thanks! –