Python pandas DataFrame: find rows containing a specific value and return a boolean

I want to compare two DataFrames, df1 and df2. df1 is refreshed every hour. df2 is the DataFrame that already exists. I want to append only the rows that are newly added in the refresh.

For example, here is df1:

[image: df1]

Five of its rows contain information that already exists.

And here is df2:

[image: df2]

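We can see that eric has been added, but df2 does not show him yet.

I could overwrite df2 with df1, but I should not, because people write remarks into df2 after the data is updated, and those would be lost.

So I decided to loop over the rows, look each id up in df2, and delete the matching rows from df1. After the deletion only eric's row should remain, which would let me simply append eric to df2.

So this is what I came up with: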

for index, row in df1.iterrows(): 
    id = row['id'] 
    if df2.loc[df1['id'].isin(id)] = True: 
     df1[df1.id != id)

and it returns a syntax error....

Am I on the right track? Is this the best way to solve the problem? How should I change the code to achieve my goal?

Source

2017-10-09 Taewoo.Lim

Are you looking for `pd.concat([df2, df1[~df1.Id.isin(df2.Id)]], axis=0)`? – Wen

To fix your code...

ids_in_df2 = []
for index, row in df1.iterrows():
    id_ = row['Id']
    # keep the Id if it already exists in df2
    if df2['Id'].isin([id_]).any():
        ids_in_df2.append(id_)
ids_in_df2
Out[334]: [0, 1, 2, 3, 4]  # Ids of the rows you need to remove from df1

df1.loc[~df1['Id'].isin(ids_in_df2)]  # `~` negates the .isin mask, dropping those rows
Out[339]:
   Id Name
5   5    F
6   6    G

Using pd.concat

pd.concat([df2,df1[~df1.Id.isin(df2.Id)]],axis=0) 
Out[337]: 
   Id Name
0   0    A
1   1    B
2   2    C
3   3    D
4   4    E
5   5    F
6   6    G
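
For the data in the question, the same idea might look like the sketch below. The column names (`id`, `name`, `remark`) and the values are assumptions modelled on the question, and `is_new`, `new_rows` and `updated` are illustrative names only. The boolean mask from `isin` marks which rows of the refresh are missing from the existing table, so the remarks already stored in df2 are never touched.

```python
import pandas as pd

# Assumed data: df1 is the hourly refresh, df2 the existing table with remarks.
df1 = pd.DataFrame({
    "id": [234, 212, 153, 567, 432, 543],
    "name": ["james", "steve", "jack", "ted", "eric", "bob"],
})
df2 = pd.DataFrame({
    "id": [234, 212, 153, 567, 543],
    "name": ["james", "steve", "jack", "ted", "bob"],
    "remark": ["", "", "smart", "", ""],
})

# True for rows of df1 whose id does not appear in df2 yet.
is_new = ~df1["id"].isin(df2["id"])
new_rows = df1[is_new]

# Append only the new rows; existing rows and their remarks stay as they are.
updated = pd.concat([df2, new_rows], ignore_index=True)

# The appended rows have no remark yet, so replace the resulting NaN with ''.
updated["remark"] = updated["remark"].fillna("")
print(updated)
```

No Python-level loop is needed, because `isin` compares the whole column at once.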

Input data

import pandas as pd

fake = {'Id': [0, 1, 2, 3, 4, 5, 6],
        'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G']}
df1 = pd.DataFrame(fake)

fake = {'Id': [0, 1, 2, 3, 4],
        'Name': ['A', 'B', 'C', 'D', 'E']}
df2 = pd.DataFrame(fake)

Source

2017-10-09 04:37:04 Wen

Pandas has several functions for merging and joining DataFrames. One you could use here is merge: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

>>> merged = df1.merge(df2, how='left')
>>> merged
    id   name remark
0  234  james
1  212  steve
2  153   jack  smart
3  567    ted
4  432   eric    NaN
5  543    bob

If you don't want the inserted values to be NaN, you can always use fillna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html.
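
A minimal sketch of the merge-then-fillna idea follows. It assumes that df1 carries only `id` and `name` while df2 also has `remark`; the data is modelled on the output above. The `indicator=True` argument is an optional extra, not part of the answer: it adds a `_merge` column that shows which rows exist only in df1.

```python
import pandas as pd

# Assumed data: the refresh (df1) has no remark column, the existing table (df2) does.
df1 = pd.DataFrame({
    "id": [234, 212, 153, 567, 432, 543],
    "name": ["james", "steve", "jack", "ted", "eric", "bob"],
})
df2 = pd.DataFrame({
    "id": [234, 212, 153, 567, 543],
    "name": ["james", "steve", "jack", "ted", "bob"],
    "remark": ["", "", "smart", "", ""],
})

# A left merge keeps every row of df1 and pulls in remark from df2 where it matches.
merged = df1.merge(df2, on=["id", "name"], how="left", indicator=True)

# Rows flagged 'left_only' exist in df1 but not in df2, i.e. they are new.
new_names = merged.loc[merged["_merge"] == "left_only", "name"].tolist()

# Drop the helper column and replace the NaN remark of new rows with ''.
result = merged.drop(columns="_merge").fillna({"remark": ""})
print(new_names)
print(result)
```

Passing a dict to `fillna` limits the replacement to the `remark` column, so any genuinely missing values in other columns are left alone.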

Source

2017-10-09 04:18:59 AetherUnbound

Let's suppose 'steve' has a remark in df1 that we want to keep, and 'jack' has a remark in df2 that we want to keep. We can set each DataFrame's index to ['id', 'name'] and use pd.Series.combine_first.

Setup

df1 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 90, 13], 
    name='james steve jack ted eric bob'.split(), 
    remark='', 
)) 
df1.at[1, 'remark'] = 'meh' 

df2 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 13], 
    name='james steve jack ted bob'.split(), 
    remark='', 
)) 
df2.at[2, 'remark'] = 'smart'

Solution

s1 = df1.set_index(['id', 'name']).remark 
s2 = df2.set_index(['id', 'name']).remark 

s1.mask(s1.eq('')).combine_first(s2.mask(s2.eq(''))).fillna('').reset_index() 

   id   name remark
0  12  james
1  13    bob
2  34  steve    meh
3  56   jack  smart
4  78    ted
5  90   eric
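
The one-liner chains three steps, and it may help to see them separately. The sketch below uses a small made-up Series indexed by name only (an assumption for brevity; the answer uses an ['id', 'name'] MultiIndex), and `m1`, `m2`, `combined` and `result` are illustrative names.

```python
import pandas as pd

# Assumed toy remarks: s1 comes from the refresh, s2 from the existing table.
s1 = pd.Series(["", "meh", ""], index=["james", "steve", "jack"])
s2 = pd.Series(["", "", "smart"], index=["james", "steve", "jack"])

# Step 1: turn empty strings into NaN so they count as "missing".
m1 = s1.mask(s1.eq(""))
m2 = s2.mask(s2.eq(""))

# Step 2: take values from m1, falling back to m2 wherever m1 is NaN.
combined = m1.combine_first(m2)

# Step 3: turn the remaining NaN back into empty strings.
result = combined.fillna("")
print(result)
```

Because `combine_first` prefers the calling Series, a remark present in both frames would be taken from s1; swap the operands if the existing table should win.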

However, supposing it is exactly as the OP presented!

Setup

df1 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 90, 13], 
    name='james steve jack ted eric bob'.split(), 
    remark='', 
)) 

df2 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 13], 
    name='james steve jack ted bob'.split(), 
    remark='', 
)) 
df2.at[2, 'remark'] = 'smart'

Solution

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
pd.concat([df2, df1]).drop_duplicates(['id', 'name']).reset_index(drop=True)

   id   name remark
0  12  james
1  34  steve
2  56   jack  smart
3  78    ted
4  13    bob
5  90   eric
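
The order of the two frames matters here, because `drop_duplicates` keeps the first occurrence by default. A small sketch with made-up two-row data (an assumption, as are the names `keep_existing` and `keep_refresh`), using `pd.concat` to stack the frames:

```python
import pandas as pd

# Assumed toy data: jack exists in both frames, eric only in the refresh.
df1 = pd.DataFrame({"id": [56, 90], "name": ["jack", "eric"], "remark": ["", ""]})
df2 = pd.DataFrame({"id": [56], "name": ["jack"], "remark": ["smart"]})

# df2 first: for duplicated (id, name) pairs the existing row and its remark survive.
keep_existing = (
    pd.concat([df2, df1])
    .drop_duplicates(["id", "name"])
    .reset_index(drop=True)
)

# df1 first: the refreshed row wins instead and jack's remark is lost.
keep_refresh = (
    pd.concat([df1, df2])
    .drop_duplicates(["id", "name"])
    .reset_index(drop=True)
)

print(keep_existing)
print(keep_refresh)
```

Putting the existing table first is what preserves the hand-written remarks in the OP's scenario.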

Source

2017-10-09 04:49:41 piRSquared
