2017-09-15 44 views

pandas DataFrame merge lookup

I am trying to do a "simple" lookup of missing values, using multiple columns from another DataFrame:

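import pandas as pd

# Col4 mixes blank strings (the values to fill in) with existing values
somedict = {'col1': ['a1', 'b2', 'c3', 'd4', 'd5', 'd6'],
            'Col2': ['a', 'b', 'c', 'b', 'e', 'a'],
            'Col3': [33, 56, 74, 55, 99, 86],
            'Col4': ['', '', 3, '', 5, '']}
dfa = pd.DataFrame(somedict)

# Lookup table: Col2 key -> replacement value for Col4
otherdic = {'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']}
dfb = pd.DataFrame(otherdic)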

This gives me dfb and dfa:

Col2 Col4 
0 a NEW 
1 b ALSONEW 

Col2 Col3 Col4 col1 
0 a  33   a1 
1 b  56   b2 
2 c  74  3 c3 
3 b  55   d4 
4 e  99  5 d5 
5 a  86   d6 

What I am looking for is:

Col2 Col3 Col4 col1 
0 a  33  NEW a1 
1 b  56 ALSONEW b2 
2 c  74  3 c3 
3 b  55 ALSONEW d4 
4 e  99  5 d5 
5 a  86  NEW d6 

I have tried:

pd.merge(dfa, dfb, on='Col2', how='left') 

which produces:

Col2 Col3 Col4_x col1 Col4_y 
0 a  33    a1 NEW 
1 b  56    b2 ALSONEW 
2 c  74   3  c3 NaN 
3 b  55    d4 ALSONEW 
4 e  99   5  d5 NaN 
5 a  86    d6 NEW 

Am I wrong to assume that merge should "know" that the Col4 column names match?
Any help is appreciated. Thanks.

Answers


One way is to replace the blank ('') values in Col4 by using dfb to map Col2 to Col4:

In [499]: dfa.loc[dfa['Col4']=='', 'Col4'] = dfa['Col2'].map(dfb.set_index('Col2')['Col4']) 

In [500]: dfa 
Out[500]: 
    Col2 Col3  Col4 col1 
0 a 33  NEW a1 
1 b 56 ALSONEW b2 
2 c 74  3 c3 
3 b 55 ALSONEW d4 
4 e 99  5 d5 
5 a 86  NEW d6 
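
For readers who want to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The helper name `fill_blanks` and its parameters are hypothetical, introduced only for illustration; it works on a copy so that `dfa` itself is left untouched, and it assumes the keys in `dfb['Col2']` are unique.

```python
import pandas as pd

dfa = pd.DataFrame({
    'col1': ['a1', 'b2', 'c3', 'd4', 'd5', 'd6'],
    'Col2': ['a', 'b', 'c', 'b', 'e', 'a'],
    'Col3': [33, 56, 74, 55, 99, 86],
    'Col4': ['', '', 3, '', 5, ''],
})
dfb = pd.DataFrame({'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']})


def fill_blanks(target, lookup, key='Col2', value='Col4'):
    """Hypothetical helper: return a copy of `target` whose blank `value`
    cells are filled from `lookup`, matched on `key`."""
    out = target.copy()
    # Series indexed by the key column; assumes the keys are unique
    mapping = lookup.set_index(key)[value]
    # Boolean mask selecting only the rows that still need a value
    blank = out[value] == ''
    # Look up replacements for the blank rows only; assignment aligns on index
    out.loc[blank, value] = out.loc[blank, key].map(mapping)
    return out


filled = fill_blanks(dfa, dfb)
print(filled)
```

A blank row whose key is missing from `dfb` would end up as NaN rather than staying blank, so the sample data (where every blank row has key 'a' or 'b') is a friendly case.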

Details:

In [485]: mapping = dfb.set_index('Col2')['Col4'] 

In [486]: mapping 
Out[486]: 
Col2 
a  NEW 
b ALSONEW 
Name: Col4, dtype: object 

In [487]: dfa['Col2'].map(mapping) 
Out[487]: 
0  NEW 
1 ALSONEW 
2  NaN 
3 ALSONEW 
4  NaN 
5  NEW 
Name: Col2, dtype: object 

In [488]: dfa.loc[dfa['Col4'] == '', 'Col4'] = dfa['Col2'].map(mapping) 

In [489]: dfa 
Out[489]: 
    Col2 Col3  Col4 col1 
0 a 33  NEW a1 
1 b 56 ALSONEW b2 
2 c 74  3 c3 
3 b 55 ALSONEW d4 
4 e 99  5 d5 
5 a 86  NEW d6 
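
On the original question: with on='Col2', merge matches rows on Col2 only. Any other column name that appears in both frames is kept from each side and disambiguated with suffixes (`_x` and `_y` by default), which is why Col4_x and Col4_y show up. If you would rather stay with merge, the following is one possible sketch, not the approach from the answer above. The suffix `_lookup` is an arbitrary name chosen for illustration.

```python
import pandas as pd

dfa = pd.DataFrame({
    'col1': ['a1', 'b2', 'c3', 'd4', 'd5', 'd6'],
    'Col2': ['a', 'b', 'c', 'b', 'e', 'a'],
    'Col3': [33, 56, 74, 55, 99, 86],
    'Col4': ['', '', 3, '', 5, ''],
})
dfb = pd.DataFrame({'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']})

# Keep the left-hand name 'Col4' as is and tag the right-hand one
merged = dfa.merge(dfb, on='Col2', how='left', suffixes=('', '_lookup'))

# Where Col4 is blank, take the looked-up value; elsewhere keep Col4
blank = merged['Col4'] == ''
merged['Col4'] = merged['Col4'].mask(blank, merged['Col4_lookup'])

# The helper column is no longer needed
result = merged.drop(columns='Col4_lookup')
print(result)
```

The map-based version avoids the temporary column, so it is probably the tidier choice when the lookup has a single key column.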

Thanks for the education, John! I feel like a toddler talking with Fitzpatrick. – johnaco

new = dfa.Col4.mask(
    dfa.Col4.eq(''), 
    dfa.Col2.map(dict(dfb.values)) 
) 
dfa.assign(Col4=new) 

    Col2 Col3  Col4 col1 
0 a 33  NEW a1 
1 b 56 ALSONEW b2 
2 c 74  3 c3 
3 b 55 ALSONEW d4 
4 e 99  5 d5 
5 a 86  NEW d6 
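
A note on the snippet above: dict(dfb.values) works because dfb has exactly two columns, in key, value order. Below is a self-contained sketch of the same mask approach that names the columns explicitly. Building the dictionary with zip is a substitution made here for illustration, not part of the original answer.

```python
import pandas as pd

dfa = pd.DataFrame({
    'col1': ['a1', 'b2', 'c3', 'd4', 'd5', 'd6'],
    'Col2': ['a', 'b', 'c', 'b', 'e', 'a'],
    'Col3': [33, 56, 74, 55, 99, 86],
    'Col4': ['', '', 3, '', 5, ''],
})
dfb = pd.DataFrame({'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']})

# Explicit key -> value dictionary, independent of dfb's column order
lookup = dict(zip(dfb['Col2'], dfb['Col4']))

# mask replaces values where the condition is True (blank cells here)
new = dfa['Col4'].mask(dfa['Col4'].eq(''), dfa['Col2'].map(lookup))

# assign returns a new DataFrame; dfa itself is not modified
result = dfa.assign(Col4=new)
print(result)
```

Because assign does not modify dfa in place, the result has to be bound to a name (or reassigned to dfa) to be kept.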

Also very nice. Thanks, piRSquared! – johnaco