There are two possible solutions: use pivot or unstack:
df1 = df.pivot(index='id', columns='step', values='step_description').add_prefix('step')
print(df1)
step  step1     step2   step3
id
1     Start  Continue  Finish

df1 = df.set_index(['id', 'step'])['step_description'].unstack().add_prefix('step')
print(df1)
step  step1     step2   step3
id
1     Start  Continue  Finish
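
For anyone who wants to run the two approaches above without the original data, here is a minimal sketch. The sample DataFrame is an assumption, rebuilt from the printed output; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output shown above
df = pd.DataFrame({
    'id': [1, 1, 1],
    'step': [1, 2, 3],
    'step_description': ['Start', 'Continue', 'Finish'],
})

# Approach 1: pivot reshapes directly from three named columns
via_pivot = df.pivot(index='id', columns='step',
                     values='step_description').add_prefix('step')

# Approach 2: build a MultiIndex, then move the 'step' level into the columns
via_unstack = (df.set_index(['id', 'step'])['step_description']
                 .unstack()
                 .add_prefix('step'))

print(via_pivot)
print(via_unstack)
```

Both variants should give the same wide table; unstack is the more flexible of the two when the index has more than one field.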
But if there are duplicates, you need pivot_table, or aggregation with groupby and apply, using join:
print(df)
   id  step step_description
0   1     1            Start   <- same id=1, step=1
1   1     1           Start1   <- same id=1, step=1
2   1     2         Continue
3   1     3           Finish

df2 = df.pivot_table(index='id',
                     columns='step',
                     values='step_description',
                     aggfunc=', '.join).add_prefix('step')
print(df2)
step          step1     step2   step3
id
1     Start, Start1  Continue  Finish

# Parentheses let the method chain continue across lines
df2 = (df.groupby(['id', 'step'])['step_description']
         .apply(','.join)
         .unstack()
         .add_prefix('step'))
print(df2)
step         step1     step2   step3
id
1     Start,Start1  Continue  Finish
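
To see why plain pivot is not enough here, the following sketch reproduces the duplicate case. The sample data is an assumption based on the printed DataFrame above, and it uses groupby with agg rather than apply, which should behave the same way for this join.

```python
import pandas as pd

# Hypothetical data with a duplicated (id, step) pair: id=1, step=1 twice
df = pd.DataFrame({
    'id': [1, 1, 1, 1],
    'step': [1, 1, 2, 3],
    'step_description': ['Start', 'Start1', 'Continue', 'Finish'],
})

# pivot cannot decide which value goes in the (1, 1) cell, so it raises
try:
    df.pivot(index='id', columns='step', values='step_description')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# Aggregating first collapses duplicates into one joined string per cell
joined = (df.groupby(['id', 'step'])['step_description']
            .agg(', '.join)
            .unstack()
            .add_prefix('step'))
print(joined)
```

Any aggregation that returns one value per group would work in place of the join, for example 'first' to keep only the first description.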
EDIT:
You need two DataFrames, then concat them:
cols = ['id', 'step', 'step_description', 'date']
df1 = df[cols].set_index(['id', 'step']).unstack().rename(columns={'step_description': 'des'})
df1.columns = ['step{}_{}'.format(x[1], x[0]) for x in df1.columns]
print(df1)
   step1_des step2_des step3_des step1_date step2_date step3_date
id
1      Start  Continue    Finish   8/6/2017   8/7/2017   8/7/2017

df2 = df.set_index(['id', 'stepA'])['stepA_description'].unstack().add_prefix('stepA')
print(df2)
stepA     stepA1  stepA2 stepA3
id
1      Beginning  Middle    End

df = pd.concat([df1, df2], axis=1).reset_index()
print(df)
   id step1_des step2_des step3_des step1_date step2_date step3_date  \
0   1     Start  Continue    Finish   8/6/2017   8/7/2017   8/7/2017

      stepA1  stepA2 stepA3
0  Beginning  Middle    End
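
The EDIT steps can also be run end to end with the sketch below. The input DataFrame is an assumption pieced together from the outputs above, and the names wide_steps, wide_a and result are illustrative only.

```python
import pandas as pd

# Hypothetical input, reconstructed from the printed results above
df = pd.DataFrame({
    'id': [1, 1, 1],
    'step': [1, 2, 3],
    'step_description': ['Start', 'Continue', 'Finish'],
    'date': ['8/6/2017', '8/7/2017', '8/7/2017'],
    'stepA': [1, 2, 3],
    'stepA_description': ['Beginning', 'Middle', 'End'],
})

# First frame: two value columns unstacked at once give MultiIndex columns
wide_steps = (df[['id', 'step', 'step_description', 'date']]
                .set_index(['id', 'step'])
                .unstack()
                .rename(columns={'step_description': 'des'}))
# Flatten (name, step) pairs into names such as 'step1_des'
wide_steps.columns = [f'step{step}_{name}' for name, step in wide_steps.columns]

# Second frame: the stepA key is reshaped on its own
wide_a = (df.set_index(['id', 'stepA'])['stepA_description']
            .unstack()
            .add_prefix('stepA'))

# Both frames share the 'id' index, so concat aligns them row by row
result = pd.concat([wide_steps, wide_a], axis=1).reset_index()
print(result)
```

Keeping each key in its own frame means a further key (a hypothetical stepB, say) would only need one more unstack and one more entry in the concat list.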
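Thanks! The first line works perfectly; the set_index version raises a KeyError on 'id'. Also, it can't do a multi-field index. I know that's not what I asked for! –
Thanks again. The pivot table method also allows a multi-field index. How difficult do you think it would be to add another two columns into the mix? For example, what if I added "step_2" and "step_description_2" and needed another three columns in the pivot table? Or would it be simpler to handle each primary key separately and join the dfs together? –
Give me some time. – jezrael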