2016-05-16 48 views
2

Remove intersection

I have these two DataFrames:

df_test
   dimension1_id  dimension2_id  dimension3_id  dimension4_id  dimension5_id
0             -1             -1             -1             -1             -1
1     1177314888      238198786     5770904146      133207291          Exact
2     1177314888      238198786     5770904266    18395155770          Exact
3     1177314888      238198786     5770904266    19338210057          Exact
4     1177314888      238198786     5770904266    30907903234          Exact

df_merge
   dimension1_id  dimension2_id  dimension3_id  dimension4_id  dimension5_id
0             -1             -1             -1             -1             -1
1     1177314888      238198786     5770904146      133207291          Exact

I want to remove from df_test every row that is also in df_merge, based on the combination of dimension1_id, dimension2_id, dimension3_id, dimension4_id and dimension5_id.

This is my code:

df_test = df_test[
    (df_test['dimension5_id'].isin(df_merge.dimension5_id) == False) &
    (df_test['dimension4_id'].isin(df_merge.dimension4_id) == False) &
    (df_test['dimension3_id'].isin(df_merge.dimension3_id) == False) &
    (df_test['dimension2_id'].isin(df_merge.dimension2_id) == False) &
    (df_test['dimension1_id'].isin(df_merge.dimension1_id) == False)
]

But this code returns an empty DataFrame. How can I remove the first and second rows (index 0 and 1) from df_test?
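A minimal sketch of why the filter above comes back empty, assuming the two frames are rebuilt from the printed output (the construction below, and storing dimension5_id as strings, are assumptions, not the original data-loading code). Each isin call tests one column on its own, so a row is kept only if none of its five values occurs anywhere in the corresponding df_merge column:

```python
import pandas as pd

cols = [f"dimension{i}_id" for i in range(1, 6)]

# Reconstructed from the printed frames; dimension5_id is assumed to hold strings.
df_test = pd.DataFrame(
    [
        [-1, -1, -1, -1, "-1"],
        [1177314888, 238198786, 5770904146, 133207291, "Exact"],
        [1177314888, 238198786, 5770904266, 18395155770, "Exact"],
        [1177314888, 238198786, 5770904266, 19338210057, "Exact"],
        [1177314888, 238198786, 5770904266, 30907903234, "Exact"],
    ],
    columns=cols,
)
df_merge = df_test.iloc[:2].copy()

# One independent membership test per column, as in the filter above.
per_column = pd.DataFrame({c: df_test[c].isin(df_merge[c]) for c in cols})
print(per_column)

# "not isin" on every column is the same as "no column matched at all".
kept = df_test[~per_column.any(axis=1)]
print(len(kept))
```

Rows 2 to 4 share dimension1_id, dimension2_id and dimension5_id with row 1, so at least one column test matches for every row and nothing survives the filter.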

Answer

4

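You can use boolean indexing: compare the frames directly and use the result to mask the rows you want. In this case, check which values of df_test are also in df_merge: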

df_test.isin(df_merge) 

The resulting boolean frame acts as a mask:

   dimension1_id  dimension2_id  dimension3_id  dimension4_id  dimension5_id
0           True           True           True           True           True
1           True           True           True           True           True
2          False          False          False          False          False
3          False          False          False          False          False
4          False          False          False          False          False

True values mark the matching rows, so we can simply negate the mask with ~ to keep only the rows of df_test that are not in df_merge:

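# isin gives one boolean per cell; .all(axis=1) reduces that to one boolean per row,
# so matching rows are dropped instead of being filled with NaN
df_test[~df_test.isin(df_merge).all(axis=1)]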
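One caveat: when it is passed a DataFrame, isin matches on index and column labels as well as values, so the mask only works here because the matching rows sit at the same index positions in both frames. A possible alternative that compares rows by value only is a left merge with indicator=True. The following is a sketch under the same assumptions as the printed data (frames rebuilt by hand, dimension5_id stored as strings); `relabelled` is a hypothetical copy of df_merge with different index labels, added only to show the difference:

```python
import pandas as pd

cols = [f"dimension{i}_id" for i in range(1, 6)]

# Reconstructed from the printed frames; dimension5_id is assumed to hold strings.
df_test = pd.DataFrame(
    [
        [-1, -1, -1, -1, "-1"],
        [1177314888, 238198786, 5770904146, 133207291, "Exact"],
        [1177314888, 238198786, 5770904266, 18395155770, "Exact"],
        [1177314888, 238198786, 5770904266, 19338210057, "Exact"],
        [1177314888, 238198786, 5770904266, 30907903234, "Exact"],
    ],
    columns=cols,
)

# Hypothetical variant of df_merge: same rows, but different index labels.
relabelled = df_test.iloc[:2].copy()
relabelled.index = [10, 11]

# Label-aligned isin finds no match once the index labels differ.
print(df_test.isin(relabelled).any().any())

# Anti-join: left merge on all five columns, then keep rows found only on the left.
joined = df_test.merge(
    relabelled.drop_duplicates(), on=cols, how="left", indicator=True
)
remaining = joined.loc[joined["_merge"] == "left_only", cols]
print(remaining)
```

The drop_duplicates call keeps the merge from multiplying rows of df_test if df_merge ever contains the same combination twice.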