2017-06-22 70 views

Match one table and map values to the other in pandas (Python)

I have two pandas DataFrames. DF1:

LT  route_1 c2 
PM/2  120 44 
PM/52 110 49 
PM/522 103 51 
PM/522 103 51 
PM/24 105 48 
PM/536 109 67 
PM/536 109 67 
PM/5356 112 144 

DF2:

LT  W_ID 
PM/2  120.0 
PM/52 110.0 
PM/522 103.0 
PM/522 103.0 
PM/24 105.0 
PM/536 109.0 
PM/536 109.0 
PM/5356 112.0 

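I need to map W_ID from DF2 onto route_1 in DF1. To be clear, the values should be replaced, but the LT in one table has to match the LT in the other. Desired output: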

LT  route_1 c2 
PM/2  120.0 44 
PM/52 110.0 49 
PM/522 103.0 51 
PM/522 103.0 51 
PM/24 105.0 48 
PM/536 109.0 67 
PM/536 109.0 67 
PM/5356 112.0 144 

Answers


I think map should work:

df1['route_1'] = df1['LT'].map(df2.set_index('LT')['W_ID']) 

Unfortunately it does not:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
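
The error comes from the lookup Series having a non-unique index. A minimal sketch of one possible workaround follows, assuming every repeated LT carries the same W_ID (true for the sample data in the question, not necessarily for real data). The frames are rebuilt from the tables above, and `lookup` is a name introduced only for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
keys = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({
    'LT': keys,
    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
    'c2': [44, 49, 51, 51, 48, 67, 67, 144],
})
df2 = pd.DataFrame({
    'LT': keys,
    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0],
})

# Assumption: duplicated LT values always share one W_ID, so keeping the
# first row per LT loses nothing and makes the lookup index unique.
lookup = df2.drop_duplicates('LT').set_index('LT')['W_ID']

# With a unique index, map no longer raises InvalidIndexError.
df1['route_1'] = df1['LT'].map(lookup)
print(df1)
```

If repeated LT values can carry different W_ID values, this shortcut silently keeps only the first one, so it is not a general replacement for pairing rows by occurrence.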

Edit:

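The problem is the duplicates in the LT column. A solution is to add a helper column with cumcount, so that each (LT, g) pair is unique, and then do a left join with merge: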

df1['g'] = df1.groupby('LT').cumcount() 
df2['g'] = df2.groupby('LT').cumcount() 
df = pd.merge(df1, df2, on=['LT','g'], how='left') 
print (df) 
     LT route_1 c2 g W_ID 
0  PM/2  120 44 0 120.0 
1 PM/52  110 49 0 110.0 
2 PM/522  103 51 0 103.0 
3 PM/522  103 51 1 103.0 
4 PM/24  105 48 0 105.0 
5 PM/536  109 67 0 109.0 
6 PM/536  109 67 1 109.0 
7 PM/5356  112 144 0 112.0 

df1['route_1'] = df['W_ID'] 
df1.drop('g', axis=1, inplace=True) 
print (df1) 
     LT route_1 c2 
0  PM/2 120.0 44 
1 PM/52 110.0 49 
2 PM/522 103.0 51 
3 PM/522 103.0 51 
4 PM/24 105.0 48 
5 PM/536 109.0 67 
6 PM/536 109.0 67 
7 PM/5356 112.0 144 

A similar solution:

df1['g'] = df1.groupby('LT').cumcount() 
df2['g'] = df2.groupby('LT').cumcount() 
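# parentheses allow the method chain to span several lines
# reindex(columns=...) replaces reindex_axis, which was removed in pandas 1.0
df = (pd.merge(df1, df2, on=['LT','g'], how='left')
        .drop(['g', 'route_1'], axis=1)
        .rename(columns={'W_ID':'route_1'})
        .reindex(columns=['LT', 'route_1', 'c2']))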
print (df) 
     LT route_1 c2 
0  PM/2 120.0 44 
1 PM/52 110.0 49 
2 PM/522 103.0 51 
3 PM/522 103.0 51 
4 PM/24 105.0 48 
5 PM/536 109.0 67 
6 PM/536 109.0 67 
7 PM/5356 112.0 144 
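
The cumcount and merge steps above can also be wrapped into a small helper that leaves both inputs untouched. This is only a sketch: `map_by_occurrence` and its parameter names are hypothetical, and it assumes neither frame already has a column called `g` and that the source column name does not also exist in the left frame.

```python
import pandas as pd

def map_by_occurrence(left, right, key, source, target):
    """Return a copy of `left` where `target` holds `right[source]`,
    pairing the n-th occurrence of each key in `left` with the n-th
    occurrence of the same key in `right` (hypothetical helper)."""
    # cumcount numbers repeated keys 0, 1, 2, ... within each key group,
    # so (key, g) is unique even when key alone is not.
    left_g = left.assign(g=left.groupby(key).cumcount())
    right_g = right[[key, source]].assign(g=right.groupby(key).cumcount())
    # A left join keeps every row of `left`, in its original order.
    merged = left_g.merge(right_g, on=[key, 'g'], how='left')
    out = left.copy()
    # merge returns a fresh 0..n-1 index, so assign by position.
    out[target] = merged[source].to_numpy()
    return out

keys = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({
    'LT': keys,
    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
    'c2': [44, 49, 51, 51, 48, 67, 67, 144],
})
df2 = pd.DataFrame({
    'LT': keys,
    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0],
})

result = map_by_occurrence(df1, df2, key='LT', source='W_ID', target='route_1')
print(result)
```

Working on copies avoids the temporary `g` column lingering in `df1` or `df2` if something fails halfway through.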

I think this is a good approach, but I get this error: Reindexing only valid with uniquely valued Index objects – jovicbg