2016-07-07

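Flag similarity between DataFrames in a new column

I want to compare two pandas DataFrames of different lengths and identify the matching index numbers. When the values match, I want to flag them in a new column.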

df1: 
Index Column 1 
41660 Apple 
41935 Banana 
42100 Strawberry 
42599 Pineapple 

df2: 
Index Column 1 
42599 Pineapple 

Output: 
Index Column 1 'Matching Index?' 
41660 Apple 
41935 Banana 
42100 Strawberry 
42599 Pineapple True 

Possible duplicate of [Comparing 2 columns of two Python Pandas dataframes and getting the common rows](http://stackoverflow.com/questions/30291032/comparing-2-columns-of-two-python-pandas-dataframes-and-getting-the-common - ) – Andy

Answers


If those values really are the index, you can use intersection on the index:

In [61]: 
df1.loc[df1.index.intersection(df2.index), 'flag'] = True 
df1 

Out[61]: 
     Column 1 flag 
Index     
41660  Apple NaN 
41935  Banana NaN 
42100 Strawberry NaN 
42599 Pineapple True 
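A self-contained variant of the index-based approach, as a minimal sketch: it assumes "Index" is the actual row index of both frames (rebuilt here from the question's sample data) and uses Index.isin instead of intersection, so non-matching rows get False rather than NaN. The column name 'Matching Index?' is taken from the question's desired output.

```python
import pandas as pd

# Rebuild the sample frames with "Index" as the real row index (assumption)
df1 = pd.DataFrame(
    {"Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"]},
    index=pd.Index([41660, 41935, 42100, 42599], name="Index"),
)
df2 = pd.DataFrame(
    {"Column 1": ["Pineapple"]},
    index=pd.Index([42599], name="Index"),
)

# Index.isin returns one boolean per row of df1,
# so every row gets True or False instead of True/NaN
df1["Matching Index?"] = df1.index.isin(df2.index)
print(df1)
```

Because isin works row by row against df1, the two frames can have any lengths.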

Otherwise, use isin:

In [63]: 
df1.loc[df1['Index'].isin(df2['Index']), 'flag'] = True 
df1 

Out[63]: 
    Index Column 1 flag 
0 41660  Apple NaN 
1 41935  Banana NaN 
2 42100 Strawberry NaN 
3 42599 Pineapple True 
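The isin call above only compares the Index column. If a row should count as a match only when both Index and Column 1 agree, one option (a sketch, not part of the original answer) is to compare whole rows as tuples; key_cols and df2_keys are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[41660, "Apple"], [41935, "Banana"], [42100, "Strawberry"], [42599, "Pineapple"]],
    columns=["Index", "Column 1"],
)
df2 = pd.DataFrame([[42599, "Pineapple"]], columns=["Index", "Column 1"])

key_cols = ["Index", "Column 1"]

# Each row of df2 becomes one (Index, Column 1) tuple in a set
df2_keys = set(df2[key_cols].itertuples(index=False, name=None))

# A df1 row is flagged only if its whole tuple appears in df2
df1["flag"] = [
    row in df2_keys
    for row in df1[key_cols].itertuples(index=False, name=None)
]
print(df1)
```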

Thanks, this solved my problem. – zbug


+1 to @EdChum's answer. If you can live with a value other than True in your matching column, try:

>>> df1.merge(df2,how='outer',indicator='Flag') 
    Index  Column 1  Flag 
0 41660  Apple left_only 
1 41935  Banana left_only 
2 42100 Strawberry left_only 
3 42599 Pineapple  both 
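If you prefer a plain True/False column and want to keep exactly df1's rows, a possible variation is a left merge followed by a comparison against 'both'. This is a sketch assuming df2 has no duplicate rows (otherwise call df2.drop_duplicates() first, or the left merge repeats df1 rows). With how='outer', rows that exist only in df2 would also appear, labelled right_only.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[41660, "Apple"], [41935, "Banana"], [42100, "Strawberry"], [42599, "Pineapple"]],
    columns=["Index", "Column 1"],
)
df2 = pd.DataFrame([[42599, "Pineapple"]], columns=["Index", "Column 1"])

# how="left" keeps df1's rows in order; merging on the shared columns
# ("Index" and "Column 1"), indicator adds a categorical "Flag" column
merged = df1.merge(df2, how="left", indicator="Flag")

# Turn the indicator ("left_only" / "both") into a boolean flag
merged["Flag"] = merged["Flag"] == "both"
print(merged)
```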

Using the isin() method:

import pandas as pd

# First frame: all four fruits
df1 = pd.DataFrame(
    data=[
        [41660, 'Apple'],
        [41935, 'Banana'],
        [42100, 'Strawberry'],
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

# Second frame: only the row to look for
df2 = pd.DataFrame(
    data=[
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

# True where df1's Index value also appears in df2's Index column
df1['Matching'] = df1['Index'].isin(df2['Index'])
print(df1)

Output:

Index Column 1 Matching 
0 41660  Apple False 
1 41935  Banana False 
2 42100 Strawberry False 
3 42599 Pineapple  True 

'isin' is already mentioned in my answer. – EdChum