2017-05-03 36 views

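Pandas: pivoting with duplicate data points

I want to pivot a DataFrame that has repeated values in one column, so that the values associated with each repeat are exposed in new columns, as in the example below. From the pandas documentation I just can't work out how to get from this...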

name   car    model
rob    mazda  626
rob    bmw    328
james  audi   a4
james  VW     golf
tom    audi   a6
tom    ford   focus

...to this:

name   car_1  model_1  car_2  model_2
rob    mazda  626      bmw    328
james  audi   a4       VW     golf
tom    audi   a6       ford   focus
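
For anyone who wants to try this locally, the input frame can be rebuilt roughly as follows. This is only a sketch: the question does not show dtypes, so storing `model` as strings is an assumption.

```python
import pandas as pd

# Input data copied from the question. Keeping `model` as strings is an
# assumption, since the original dtypes are not shown.
df = pd.DataFrame({
    "name": ["rob", "rob", "james", "james", "tom", "tom"],
    "car": ["mazda", "bmw", "audi", "VW", "audi", "ford"],
    "model": ["626", "328", "a4", "golf", "a6", "focus"],
})
print(df)
```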

Answers

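# Rebuild each group with a fresh 0..n-1 index, then pivot that index into columns
x = (df.groupby('name')[['car', 'model']]
       .apply(lambda g: pd.DataFrame(g.values.tolist(),
                                     columns=['car', 'model']))
       .unstack())
# Flatten the (column, position) MultiIndex into names such as car_0
x.columns = ['{0[0]}_{0[1]}'.format(tup) for tup in x.columns]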

Result:

In [152]: x 
Out[152]: 
       car_0 car_1 model_0 model_1
name
james   audi    VW      a4    golf
rob    mazda   bmw     626     328
tom     audi  ford      a6   focus

How to sort the columns:

In [157]: x.loc[:, x.columns.str[::-1].sort_values().str[::-1]] 
Out[157]: 
      model_0  car_0 model_1 car_1
name
james      a4   audi    golf    VW
rob       626  mazda     328   bmw
tom        a6   audi   focus  ford
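
Note that the reversed-string trick puts `model` before `car` and compares the numeric suffix as text, so it would misorder the columns once a suffix reaches two digits (`car_10` would land before `car_2`). A possible alternative is to sort on the suffix as an integer. In the sketch below, `suffix_key` is a hypothetical helper name, and `x` is rebuilt by hand from the output shown above.

```python
import pandas as pd

# Flattened frame, with values copied from the output shown above.
x = pd.DataFrame(
    {"car_0": ["audi", "mazda", "audi"], "car_1": ["VW", "bmw", "ford"],
     "model_0": ["a4", "626", "a6"], "model_1": ["golf", "328", "focus"]},
    index=pd.Index(["james", "rob", "tom"], name="name"),
)

def suffix_key(col):
    # Split "car_10" into ("car", "10"); order by the integer suffix first,
    # then by the prefix so that car precedes model within each pair.
    prefix, num = col.rsplit("_", 1)
    return int(num), prefix

ordered = x[sorted(x.columns, key=suffix_key)]
print(ordered.columns.tolist())
```

This yields `car_0, model_0, car_1, model_1`, which is closer to the layout asked for in the question.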

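We can use groupby with cumcount to number the rows within each name, and then set that as part of the index:

# 1-based position of each row within its name group
i = df.groupby('name').cumcount() + 1
df.set_index(['name', i]).unstack()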

         car       model
           1     2     1      2
name
james   audi    VW    a4   golf
rob    mazda   bmw   626    328
tom     audi  ford    a6  focus

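Alternatively, we can collapse the pd.MultiIndex into flat column names:

i = df.groupby('name').cumcount() + 1
# Sort columns by the position level so car_1, model_1 precede car_2, model_2
d1 = df.set_index(['name', i]).unstack().sort_index(axis=1, level=1)
d1.columns = d1.columns.to_series().map('{0[0]}_{0[1]}'.format)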
d1 


       car_1 model_1 car_2 model_2
name
james   audi      a4    VW    golf
rob    mazda     626   bmw     328
tom     audi      a6  ford   focus
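
One case the example data does not cover is names with different numbers of rows. With the cumcount approach, the shorter groups simply get NaN in the extra columns. Here is a minimal sketch of that behaviour; `widen` is a hypothetical helper name, and the third row for rob is made-up data added only for illustration.

```python
import pandas as pd

def widen(df, key, start=1):
    """Spread repeated rows per `key` into numbered columns (hypothetical helper)."""
    # Position of each row within its group: start, start + 1, ...
    pos = df.groupby(key).cumcount() + start
    # Pivot the position into the columns, then order columns by position
    wide = df.set_index([key, pos]).unstack().sort_index(axis=1, level=1)
    # Collapse the (column, position) MultiIndex into names such as car_1
    wide.columns = [f"{col}_{n}" for col, n in wide.columns]
    return wide.reset_index()

# The question's data plus one invented extra row for rob (illustration only).
df = pd.DataFrame({
    "name": ["rob", "rob", "rob", "james", "james", "tom", "tom"],
    "car": ["mazda", "bmw", "kia", "audi", "VW", "audi", "ford"],
    "model": ["626", "328", "rio", "a4", "golf", "a6", "focus"],
})
out = widen(df, "name")
print(out)
```

Only rob has values in `car_3` and `model_3`; james and tom show NaN there.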