2017-09-04 155 views
1

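Pandas: merging two DataFrames

I am new to the pandas module and have a small question about its merge method. Suppose I have two separate tables, as shown below: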

Original_DataFrame

machine weekNum Percent 
M1  2  75 
M1  5  80 
M1  8  95 
M1  10  90 

New_DataFrame

machine weekNum Percent 
M1  1  100 
M1  2  100 
M1  3  100 
M1  4  100 
M1  5  100 
M1  6  100 
M1  7  100 
M1  8  100 
M1  9  100 
M1  10  100 

I used the pandas merge method as follows:

pd.merge(orig_df, new_df, on='weekNum', how='left') 

and I get the following:

machine weekNum Percent_x Percent_y 
0 M1   2  75   100 
1 M1   5  80   100 
2 M1   8  95   100 
3 M1   10  90   100 

However, I want the skipped weekNum values to be filled in, with 100 in those rows, so that I get the desired result below.

machine weekNum Percent 
M1  1  100 
M1  2  75 
M1  3  100 
M1  4  100 
M1  5  80 
M1  6  100 
M1  7  100 
M1  8  95 
M1  9  100 
M1  10  90 

Can anyone please guide me on how to proceed?

Answers

1

I think you need combine_first, with the common columns moved into the index first via set_index:

df11 = df1.set_index(['machine','weekNum']) 
df22 = df2.set_index(['machine','weekNum']) 

df = df11.combine_first(df22).astype(int).reset_index() 
print (df) 
    machine weekNum Percent 
0  M1  1  100 
1  M1  2  75 
2  M1  3  100 
3  M1  4  100 
4  M1  5  80 
5  M1  6  100 
6  M1  7  100 
7  M1  8  95 
8  M1  9  100 
9  M1  10  90 
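
For anyone who wants to run this answer end to end, here is a minimal self-contained sketch. The frames df1 and df2 are reconstructed from the tables in the question (an assumption; the asker's real data is not shown). The values from df1 take priority, and df2 only fills the (machine, weekNum) pairs that df1 lacks.

```python
import pandas as pd

# Assumed reconstruction of the two tables from the question
df1 = pd.DataFrame({"machine": ["M1"] * 4,
                    "weekNum": [2, 5, 8, 10],
                    "Percent": [75, 80, 95, 90]})
df2 = pd.DataFrame({"machine": ["M1"] * 10,
                    "weekNum": list(range(1, 11)),
                    "Percent": [100] * 10})

# Both key columns go into the index, so only Percent remains as data
df11 = df1.set_index(["machine", "weekNum"])
df22 = df2.set_index(["machine", "weekNum"])

# df11 wins where it has a value; df22 fills the missing weeks
df = df11.combine_first(df22).astype(int).reset_index()
print(df)
```

The astype(int) call is only safe here because machine sits in the index and Percent is the only remaining column.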


df.plot.bar('weekNum', 'Percent') 

graph

Edit:

For the labels:

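import matplotlib.pyplot as plt

# Pass figsize to the plot call; a separate plt.figure() would stay empty
ax = df.plot.bar('weekNum', 'Percent', figsize=(12, 8))
rects = ax.patches

# Write each bar's value just above the bar
for rect, label in zip(rects, df['Percent']):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 1, str(label), ha='center', va='bottom')

# Leave headroom above the tallest bar for the labels
ax.set_ylim(top=120)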

graph2

+0

This gives me the following error after running the last line of code: ValueError: invalid literal for int() with base 10: 'M1' – SalN85

+0

Sorry, I had a typo in the first version of the code. You need 'df11' and 'df22' - 'df = df11.combine_first(df22).astype(int).reset_index()' – jezrael

+0

Still the same error. ValueError: invalid literal for int() with base 10: 'M1' :( – SalN85
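
One possible cause of this error (an assumption, since the commenter's exact code is not shown) is that machine was left as a regular column, for example by indexing on weekNum only. In that case astype(int) is applied to the string column as well. The sketch below reproduces that hypothetical situation and casts only the Percent column instead; the frames are reconstructed from the question's tables.

```python
import pandas as pd

# Assumed reconstruction of the two tables from the question
df1 = pd.DataFrame({"machine": ["M1"] * 4,
                    "weekNum": [2, 5, 8, 10],
                    "Percent": [75, 80, 95, 90]})
df2 = pd.DataFrame({"machine": ["M1"] * 10,
                    "weekNum": list(range(1, 11)),
                    "Percent": [100] * 10})

# Hypothetical reproduction: only weekNum is indexed, machine stays a column
combined = df1.set_index("weekNum").combine_first(df2.set_index("weekNum"))

try:
    combined.astype(int)  # also tries to convert the 'M1' strings
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Cast only the numeric column, then restore weekNum as a column
combined["Percent"] = combined["Percent"].astype(int)
fixed = combined.reset_index()
print(fixed)
```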

0

Not as elegant as the other solution, but it works anyway:

# join 
merged = pd.merge(data1, data2, on=['machine','weekNum'], how='outer') 
# combine percent columns 
merged['Percent'] = merged['Percent_x'].fillna(merged['Percent_y']) 
# remove extra columns 
result = merged[['machine','weekNum', 'Percent']] 

Result:

machine weekNum Percent 
M1 2 75 
M1 5 80 
M1 8 95 
M1 10 90 
M1 1 100 
M1 3 100 
M1 4 100 
M1 6 100 
M1 7 100 
M1 9 100 
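
The rows above are not ordered by weekNum. If the ordering from the question is wanted, a sort can be appended, as in this minimal sketch. The frames data1 and data2 are reconstructed from the question's tables (an assumption), and the integer cast is added only because the outer merge turns Percent_x into floats.

```python
import pandas as pd

# Assumed reconstruction of the two tables from the question
data1 = pd.DataFrame({"machine": ["M1"] * 4,
                      "weekNum": [2, 5, 8, 10],
                      "Percent": [75, 80, 95, 90]})
data2 = pd.DataFrame({"machine": ["M1"] * 10,
                      "weekNum": list(range(1, 11)),
                      "Percent": [100] * 10})

# Outer join keeps every (machine, weekNum) pair from both frames
merged = pd.merge(data1, data2, on=["machine", "weekNum"], how="outer")

# Prefer the original value (Percent_x); fall back to the default (Percent_y)
merged["Percent"] = merged["Percent_x"].fillna(merged["Percent_y"]).astype(int)

# Keep the three wanted columns and order the rows by week
result = (merged[["machine", "weekNum", "Percent"]]
          .sort_values(["machine", "weekNum"])
          .reset_index(drop=True))
print(result)
```

Because Percent_x comes from data1, the original values for weeks 2, 5, 8 and 10 are the ones that survive.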
+0

That's true, but I want the original data to overwrite the records for weekNum 2, 5, 8 and 10. – SalN85

+0

It works! Thanks, derline – SalN85

0

You can try this. Depending on your overall goal, it may not be "programmatic" enough.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"machine":["M1"]*4, "WeekNum": [2,5,8,10], "Percent":[75,80,95,90]})
df2 = pd.DataFrame({"machine":["M1"]*10,"WeekNum":np.arange(1,11,1),"Percent":[100]*10})
# A left merge keeps df2's row order, so the new column lines up with df2 by position
newcol = df2.merge(df1, on = "WeekNum", how = "left")["Percent_y"].fillna(100)
df2["Percent"] = newcol