2016-09-28 226 views
2

Expanding rows in a pandas DataFrame

I have the following data:

product Sales_band Hour_id sales 
prod_1 HIGH   1 200 
prod_1 HIGH   3 100 
prod_1 HIGH   4 300 
prod_1 VERY HIGH  2 100 
prod_1 VERY HIGH  5 253 
prod_1 VERY HIGH  6 234 

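I want to add rows based on the Hour_id value. Hour_id can take values from 1 to 10, so the data above should be expanded wherever an hour id is missing. Dummy output (sales = 0 for the missing hour ids):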

product Sales_band Hour_id sales 
prod_1 HIGH   1 200 
prod_1 HIGH   2 0 
prod_1 HIGH   3 100 
prod_1 HIGH   4 300 
prod_1 HIGH   5 0 
prod_1 HIGH   6 0 
prod_1 HIGH   7 0 
prod_1 HIGH   8 0 
prod_1 HIGH   9 0 
prod_1 HIGH   10 0 
prod_1 VERY HIGH  1 0 
prod_1 VERY HIGH  2 100 
prod_1 VERY HIGH  3 0 
prod_1 VERY HIGH  4 0 
prod_1 VERY HIGH  5 253 
prod_1 VERY HIGH  6 234 
prod_1 VERY HIGH  7 0 
prod_1 VERY HIGH  8 0 
prod_1 VERY HIGH  9 0 
prod_1 VERY HIGH  10 0 

How can I do this with a pandas DataFrame in Python?


Should you end up with 10 rows for each product and sales band? –


Yes, that would be the ideal final output. – Mukul

Answer

2

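Use groupby with reindex:

# Select the columns with a list ([[...]]); recent pandas versions reject the tuple form ['Hour_id','sales'].
# Each group is indexed by Hour_id and reindexed to hours 1..10, with missing hours filled with 0.
print(df.groupby(['product', 'Sales_band'])[['Hour_id', 'sales']]
        .apply(lambda x: x.set_index('Hour_id').reindex(range(1, 11), fill_value=0))
        .reset_index())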

    product Sales_band Hour_id sales 
0 prod_1  HIGH  1 200 
1 prod_1  HIGH  2  0 
2 prod_1  HIGH  3 100 
3 prod_1  HIGH  4 300 
4 prod_1  HIGH  5  0 
5 prod_1  HIGH  6  0 
6 prod_1  HIGH  7  0 
7 prod_1  HIGH  8  0 
8 prod_1  HIGH  9  0 
9 prod_1  HIGH  10  0 
10 prod_1 VERY HIGH  1  0 
11 prod_1 VERY HIGH  2 100 
12 prod_1 VERY HIGH  3  0 
13 prod_1 VERY HIGH  4  0 
14 prod_1 VERY HIGH  5 253 
15 prod_1 VERY HIGH  6 234 
16 prod_1 VERY HIGH  7  0 
17 prod_1 VERY HIGH  8  0 
18 prod_1 VERY HIGH  9  0 
19 prod_1 VERY HIGH  10  0 
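
For readers who want something to run end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same set_index/reindex idea without groupby. It is a variation, not the code from the answer above: it reindexes against the full product x Sales_band x Hour_id grid built with pd.MultiIndex.from_product. The names idx and expanded are illustrative. It assumes that each (product, Sales_band, Hour_id) combination occurs at most once, and that every product should get every sales band; if that second assumption does not hold for your data, the groupby version is the safer choice.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'product': ['prod_1'] * 6,
    'Sales_band': ['HIGH'] * 3 + ['VERY HIGH'] * 3,
    'Hour_id': [1, 3, 4, 2, 5, 6],
    'sales': [200, 100, 300, 100, 253, 234],
})

# Full grid of product x Sales_band x hours 1..10 (assumption: every
# product should appear with every sales band present in the data)
idx = pd.MultiIndex.from_product(
    [df['product'].unique(), df['Sales_band'].unique(), range(1, 11)],
    names=['product', 'Sales_band', 'Hour_id'],
)

# Index the frame by the three key columns, reindex to the full grid,
# and fill the rows that did not exist with sales = 0
expanded = (df.set_index(['product', 'Sales_band', 'Hour_id'])
              .reindex(idx, fill_value=0)
              .reset_index())

print(expanded)
```

On the sample data this should give 10 rows per sales band, 20 in total, with sales set to 0 for the hours that were missing.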

Thanks a lot. It works. I will read more about set_index and reindex. – Mukul


Thank you for accepting! Have a nice day! – jezrael