2016-11-09

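Check if multiple rows exist in another dataframe

I have two DataFrames. I want to check whether specific rows (all of them) exist in the other DataFrame. Example rows from df_subset: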

id category value date 
1  A   10  01-01-15 
3  C   10  03-01-15 

The other DataFrame, df_full:

id category value date 
1  A   10  01-01-15 
2  B   10  02-01-15 
3  C   10  03-01-15 
4  D   16  04-01-15 

Is there a way to check whether the rows of one DataFrame exist in another? Something like this (which obviously doesn't work):

df_subset in df_full

> True 

Answers


I think you can use merge with an inner join (the default) and then compare the result with df_subset using DataFrame.equals:

print(pd.merge(df_subset, df_full).equals(df_subset))
True 

Concise and elegant. Thanks. – eljusticiero67
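A self-contained sketch of the merge + equals check, rebuilding the question's sample frames (the dates are kept as plain strings, which is an assumption about the original data). The reset_index(drop=True) call is an addition that is not in the answer: DataFrame.equals also compares the index and dtypes, and the merge result gets a fresh 0..n-1 index. Also note that duplicate matching rows in df_full would make the merged frame longer than df_subset, so equals would return False in that case.

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption)
cols = ["id", "category", "value", "date"]
df_subset = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [3, "C", 10, "03-01-15"]], columns=cols
)
df_full = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [2, "B", 10, "02-01-15"],
     [3, "C", 10, "03-01-15"], [4, "D", 16, "04-01-15"]],
    columns=cols,
)

# Inner merge (the default) on all shared columns keeps only rows found in both frames
common = pd.merge(df_subset, df_full)

# equals() compares values, dtypes and index, so give df_subset a 0..n-1 index first
is_subset = common.equals(df_subset.reset_index(drop=True))
print(is_subset)  # True
```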


You can use the merge(..., indicator=True) method:

In [14]: pd.merge(df_subset, df_full, indicator=True, how='outer')
Out[14]:
   id category  value      date      _merge
0   1        A     10  01-01-15        both
1   3        C     10  03-01-15        both
2   2        B     10  02-01-15  right_only
3   4        D     16  04-01-15  right_only
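To reduce the indicator column to a single yes/no answer, one option (a variation on this answer, not part of it) is a left merge: each df_subset row is then labelled 'both' if it was found in df_full and 'left_only' otherwise. The sample data repeats the question's frames with dates as strings (an assumption), and the extra row with id 5 is hypothetical, added only to show the negative case.

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption)
cols = ["id", "category", "value", "date"]
df_subset = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [3, "C", 10, "03-01-15"]], columns=cols
)
df_full = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [2, "B", 10, "02-01-15"],
     [3, "C", 10, "03-01-15"], [4, "D", 16, "04-01-15"]],
    columns=cols,
)

# Left merge keeps every df_subset row; _merge says whether it was found in df_full
flags = pd.merge(df_subset, df_full, how="left", indicator=True)["_merge"]
all_present = bool((flags == "both").all())
print(all_present)  # True

# Hypothetical extra row that does not exist in df_full
extra = pd.concat(
    [df_subset, pd.DataFrame([[5, "E", 99, "05-01-15"]], columns=cols)],
    ignore_index=True,
)
flags2 = pd.merge(extra, df_full, how="left", indicator=True)["_merge"]
print(list(flags2))  # ['both', 'both', 'left_only']
```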

Using numpy:

(df_subset.values[:, None] == df_full.values).all(2).any(1).all() 

True 
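A runnable version of the numpy one-liner with the question's sample data (dates as strings, an assumption). It relies on both frames having the same columns in the same order, and it builds a len(df_subset) × len(df_full) × n_columns boolean array, so memory grows with the product of the two row counts; that makes it best suited to small frames.

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption)
cols = ["id", "category", "value", "date"]
df_subset = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [3, "C", 10, "03-01-15"]], columns=cols
)
df_full = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [2, "B", 10, "02-01-15"],
     [3, "C", 10, "03-01-15"], [4, "D", 16, "04-01-15"]],
    columns=cols,
)

# (2, 1, 4) broadcast against (4, 4): every subset row vs every full row, cell by cell
matches = df_subset.values[:, None] == df_full.values

# all columns equal -> some full row matches -> every subset row matched
result = bool(matches.all(axis=2).any(axis=1).all())
print(result)  # True
```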

Timing (chart not available)

Explanation

# use [:, None] to insert a new axis so that the comparison
# can take advantage of broadcasting
a1 = df_subset.values[:, None] == df_full.values 

    # ━> third dimension ━> 
    # ━━━━> axis=2 ━━━> 
# 1st dim 
---->[[[ True True True True] # │ 
     [False False True False] # │ second dimension 
     [False False True False] # │ axis=1 
     [False False False False]] # ↓ 

# axis=0 
---->[[False False True False] # │ 
     [False False True False] # │ second dimension 
     [ True True True True] # │ axis=1 
     [False False False False]]] # ↓ 

# first row of subset compared with each row of full
[[[ True  True  True  True]   <-- this one is True for all columns
  [False False  True False]
  [False False  True False]
  [False False False False]]

# second row of subset compared with each row of full
 [[False False  True False]
  [False False  True False]
  [ True  True  True  True]   <-- this one is True for all columns
  [False False False False]]]
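The shapes behind the diagrams above can be checked directly. This sketch uses the same assumed sample data (dates as strings) and prints the shapes involved in the broadcast plus the two all-True slices.

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption)
cols = ["id", "category", "value", "date"]
df_subset = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [3, "C", 10, "03-01-15"]], columns=cols
)
df_full = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [2, "B", 10, "02-01-15"],
     [3, "C", 10, "03-01-15"], [4, "D", 16, "04-01-15"]],
    columns=cols,
)

print(df_subset.values[:, None].shape)  # (2, 1, 4)
print(df_full.values.shape)             # (4, 4)

a1 = df_subset.values[:, None] == df_full.values
print(a1.shape)                         # (2, 4, 4)

# a1[i, j] holds the column-by-column comparison of subset row i with full row j
print(a1[0, 0].tolist())  # [True, True, True, True]
print(a1[1, 2].tolist())  # [True, True, True, True]
```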

a2 = a1.all(2) 

# ┌─ first row of subset all equal 
[[ True False False False] 
[False False True False]] 
#    └─ second row of subset all equal 

a3 = a2.any(1) 

# ┌─ first row of subset matched at least one row of full 
[ True True] 
#  └─ second row of subset matched at least one row of full 

a3.all() 

True 

All rows of df_subset are in df_full.
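The two reductions can be run end to end on the same assumed sample data (dates as strings); a2 and a3 below should reproduce the arrays shown in the explanation.

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption)
cols = ["id", "category", "value", "date"]
df_subset = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [3, "C", 10, "03-01-15"]], columns=cols
)
df_full = pd.DataFrame(
    [[1, "A", 10, "01-01-15"], [2, "B", 10, "02-01-15"],
     [3, "C", 10, "03-01-15"], [4, "D", 16, "04-01-15"]],
    columns=cols,
)

a1 = df_subset.values[:, None] == df_full.values
a2 = a1.all(axis=2)  # subset row i equals full row j in every column
a3 = a2.any(axis=1)  # subset row i matches at least one full row

print(a2.tolist())     # [[True, False, False, False], [False, False, True, False]]
print(a3.tolist())     # [True, True]
print(bool(a3.all()))  # True
```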


Wow! That's really clever. – MaxU


I extended the array into a third dimension, and all(2) reduces along that third axis. I'll update the post to make it clearer. – piRSquared


Yeah, it's really complex. NumPy is complicated for me; I prefer pandas. – jezrael