Flag similarities between DataFrames in a new column

I want to compare two pandas DataFrames of different lengths and identify the matching index numbers. Where the values match, I want to flag those rows in a new column.

df1: 
Index Column 1 
41660 Apple 
41935 Banana 
42100 Strawberry 
42599 Pineapple 

df2: 
Index Column 1 
42599 Pineapple 

Output: 
Index Column 1 'Matching Index?' 
41660 Apple 
41935 Banana 
42100 Strawberry 
42599 Pineapple True


2016-07-07 zbug

Possible duplicate of [Comparing 2 columns of two Python Pandas dataframes and getting the common rows](http://stackoverflow.com/questions/30291032/comparing-2-columns-of-two-python-pandas-dataframes-and-getting-the-common-) – Andy

If these really are the index, you can use intersection on the index:

In [61]: 
df1.loc[df1.index.intersection(df2.index), 'flag'] = True 
df1 

Out[61]: 
     Column 1 flag 
Index     
41660  Apple NaN 
41935  Banana NaN 
42100 Strawberry NaN 
42599 Pineapple True

Otherwise, use isin:

In [63]: 
df1.loc[df1['Index'].isin(df2['Index']), 'flag'] = True 
df1 

Out[63]: 
    Index Column 1 flag 
0 41660  Apple NaN 
1 41935  Banana NaN 
2 42100 Strawberry NaN 
3 42599 Pineapple True
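If False is preferred over NaN for the unmatched rows while keeping the `.loc` style, one option is to initialize the column first. This is a sketch built on the question's sample data, with "Index" as an ordinary column:

```python
import pandas as pd

df1 = pd.DataFrame({"Index": [41660, 41935, 42100, 42599],
                    "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"]})
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# Start every row as False so unmatched rows do not end up as NaN
df1["flag"] = False
# Then set True only where the "Index" value also appears in df2
df1.loc[df1["Index"].isin(df2["Index"]), "flag"] = True
print(df1)
```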


2016-07-07 15:28:26 EdChum

Thanks, this solved my problem. – zbug

+1 to @EdChum's answer. If you can live with values other than True in your match column, try:

>>> df1.merge(df2,how='outer',indicator='Flag') 
    Index  Column 1  Flag 
0 41660  Apple left_only 
1 41935  Banana left_only 
2 42100 Strawberry left_only 
3 42599 Pineapple  both
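To get back a boolean flag from the merge indicator while keeping exactly df1's rows, a `how="left"` merge can be combined with a comparison against `"both"`. This is a sketch on the question's sample data; it matches on all shared columns ("Index" and "Column 1") and assumes df2 has no duplicate rows, since duplicates would repeat the matching df1 rows:

```python
import pandas as pd

df1 = pd.DataFrame({"Index": [41660, 41935, 42100, 42599],
                    "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"]})
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# how="left" keeps df1's rows in order; indicator="Flag" adds a categorical
# column that is "both" when the row was also found in df2
merged = df1.merge(df2, how="left", indicator="Flag")
# Turn the indicator into a plain boolean column
merged["Flag"] = merged["Flag"].eq("both")
print(merged)
```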


2016-07-07 15:34:17 bernie

Use the isin() method:

import pandas as pd 

df1 = pd.DataFrame(
    data=[
        [41660, 'Apple'],
        [41935, 'Banana'],
        [42100, 'Strawberry'],
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

df2 = pd.DataFrame(
    data=[
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

df1['Matching'] = df1['Index'].isin(df2['Index']) 
print(df1)

Output:

Index Column 1 Matching 
0 41660  Apple False 
1 41935  Banana False 
2 42100 Strawberry False 
3 42599 Pineapple  True


2016-07-07 15:39:57 Blind0ne

'isin' was already mentioned in my answer – EdChum
