2017-02-24 199 views
5

How to pivot a pandas DataFrame

Say we have a DataFrame that looks like this:

  day_of_week   ice_cream  count  proportion
0      Friday     vanilla    638    0.094473
1      Friday   chocolate   2048    0.663506
2      Friday  strawberry   4088    0.251021
3      Monday     vanilla    448    0.079736
4      Monday   chocolate   2332    0.691437
5      Monday  strawberry    441    0.228828
6    Saturday     vanilla     24    0.073350
7    Saturday   chocolate    244    0.712930
...

I want a new DataFrame collapsed onto day_of_week as the index, so that it looks like this:

day_of_week vanilla chocolate strawberry 
0 Friday  0.094473 0.663506 0.251021 
1 Monday  0.079736 0.691437 0.228828 
2 Saturday ...  ...   ... 

What is the cleanest way to achieve this?

+0

Look up the pivot function in pandas – lordingtar
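As a rough illustration of what the comment points at, here is a minimal sketch using DataFrame.pivot on the rows shown in the question. The truncated part of the table is left out, so Saturday has no strawberry row, and the variable name wide is only illustrative.

```python
import pandas as pd

# Rows copied from the question; the elided tail of the table is not reconstructed.
df = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Friday",
                    "Monday", "Monday", "Monday",
                    "Saturday", "Saturday"],
    "ice_cream": ["vanilla", "chocolate", "strawberry",
                  "vanilla", "chocolate", "strawberry",
                  "vanilla", "chocolate"],
    "count": [638, 2048, 4088, 448, 2332, 441, 24, 244],
    "proportion": [0.094473, 0.663506, 0.251021,
                   0.079736, 0.691437, 0.228828,
                   0.073350, 0.712930],
})

# pivot only reshapes: each (day_of_week, ice_cream) pair must be unique.
wide = df.pivot(index="day_of_week", columns="ice_cream", values="proportion")
print(wide)
```

Because pivot does not aggregate, it is only suitable when every (day_of_week, ice_cream) pair occurs at most once, which holds for the rows shown here.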

Answers

4

df.pivot_table is the right solution:

In[31]: df.pivot_table(values='proportion', index='day_of_week', columns='ice_cream').reset_index() 
Out[31]: 
    ice_cream day_of_week chocolate strawberry vanilla 
0    Friday 0.663506 0.251021 0.094473 
1    Monday 0.691437 0.228828 0.079736 
2   Saturday 0.712930   NaN 0.073350 

If you leave out reset_index(), it returns a DataFrame indexed by day_of_week, which may be more useful to you.

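Note that a pivot table necessarily performs a dimensionality reduction when the values column is not a function of the tuple (index, columns). If several rows share the same (index, columns) pair but have different values, pivot_table reduces them to a single value with an aggregation function, which is mean by default.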
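The aggregation described above can be seen in a small sketch. The data here is hypothetical (it is not from the question): the pair ("Friday", "choco") is duplicated so that there is something to aggregate.

```python
import pandas as pd

# Hypothetical data: ("Friday", "choco") appears twice with different values.
df = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Monday"],
    "ice_cream": ["choco", "choco", "vanilla"],
    "proportion": [0.9, 0.8, 0.2],
})

# Default behaviour: the duplicated cell is reduced with the mean.
by_mean = df.pivot_table(values="proportion", index="day_of_week",
                         columns="ice_cream")

# Another reducer can be chosen through aggfunc, here the maximum.
by_max = df.pivot_table(values="proportion", index="day_of_week",
                        columns="ice_cream", aggfunc="max")

# DataFrame.pivot does not aggregate, so the same duplicate makes it raise.
try:
    df.pivot(index="day_of_week", columns="ice_cream", values="proportion")
    pivot_raised = False
except ValueError:
    pivot_raised = True

print(by_mean)
print(by_max)
print(pivot_raised)
```

If the mean is not a meaningful summary for your column, passing an explicit aggfunc makes the choice visible in the code.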

+1

'.reset_index()' to get the OP's desired output? – AChampion

2

You are looking for pivot_table:

df = pd.pivot_table(df, index='day_of_week', columns='ice_cream', values = 'proportion') 

You get:

ice_cream chocolate strawberry vanilla 
day_of_week   
Friday  0.663506 0.251021 0.094473 
Monday  0.691437 0.228828 0.079736 
Saturday 0.712930 NaN   0.073350

1

Use a pivot table:

import pandas as pd
import numpy as np

# Small sample frame; ('Friday', 'choco') appears twice, so that cell gets averaged
df = pd.DataFrame({'day_of_week': ['Friday', 'Sunday', 'Monday', 'Sunday', 'Friday', 'Friday'],
                   'count': [200, 300, 100, 50, 110, 90],
                   'ice_cream': ['choco', 'vanilla', 'vanilla', 'choco', 'choco', 'straw'],
                   'proportion': [.9, .1, .2, .3, .8, .4]})

print(df)

# fill_value=np.nan leaves missing cells as NaN; use fill_value=0 if you prefer zeros
tab = pd.pivot_table(df, index='day_of_week', columns='ice_cream', values=['proportion'], fill_value=np.nan)
print(tab)

Output:

count day_of_week ice_cream proportion 
0 200  Friday  choco   0.9 
1 300  Sunday vanilla   0.1 
2 100  Monday vanilla   0.2 
3  50  Sunday  choco   0.3 
4 110  Friday  choco   0.8 
5  90  Friday  straw   0.4 
      proportion    
ice_cream  choco straw vanilla 
day_of_week       
Friday   0.85 0.4  NaN 
Monday    NaN NaN  0.2 
Sunday   0.30 NaN  0.1 
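The code comment in this answer mentions replacing NaN with zero. A minimal sketch of that variant, reusing the same sample data, might look like the following. Passing values as a plain string instead of a list is my own choice here; it gives flat column labels rather than the two-level header shown above.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "day_of_week": ["Friday", "Sunday", "Monday", "Sunday", "Friday", "Friday"],
    "count": [200, 300, 100, 50, 110, 90],
    "ice_cream": ["choco", "vanilla", "vanilla", "choco", "choco", "straw"],
    "proportion": [0.9, 0.1, 0.2, 0.3, 0.8, 0.4],
})

# fill_value=0 fills the cells for which no (day_of_week, ice_cream) row exists.
tab = pd.pivot_table(df, index="day_of_week", columns="ice_cream",
                     values="proportion", fill_value=0)
print(tab)
```

Whether zero is the right filler depends on the data: for proportions it reads as "none sold", which is not the same as "not recorded".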
+0

Wow, you really took the time to create a DataFrame. Do you know that 'pd.read_clipboard()' exists? –
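pd.read_clipboard() needs a system clipboard, so as a stand-in the sketch below parses the same pasted text from a string with pd.read_csv and a whitespace separator, which is the parser read_clipboard passes the text to. The rows are copied from the question and the truncated tail is left out.

```python
import io
import pandas as pd

# Text as it would be copied from the question (index column included).
text = """\
day_of_week ice_cream count proportion
0 Friday vanilla 638 0.094473
1 Friday chocolate 2048 0.663506
2 Friday strawberry 4088 0.251021
3 Monday vanilla 448 0.079736
4 Monday chocolate 2332 0.691437
5 Monday strawberry 441 0.228828
6 Saturday vanilla 24 0.073350
7 Saturday chocolate 244 0.712930
"""

# The header has one name fewer than the data rows have fields,
# so the first field of each row is taken as the index.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```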

1

Use set_index and unstack:

(df.set_index(['day_of_week', 'ice_cream'])['proportion'].unstack()
   .reset_index().rename_axis(None, axis=1))

    day_of_week chocolate strawberry vanilla 
0  Friday 0.663506 0.251021 0.094473 
1  Monday 0.691437 0.228828 0.079736 
2 Saturday 0.712930   NaN 0.073350 

Timing vs pivot_table

[image: timing comparison]
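Since the timing image is not available, here is a hedged sketch of how such a comparison could be rerun with the standard timeit module. The function names and the repetition count are my own choices, and no particular result is implied: the numbers depend on the machine, the pandas version and the size of the data.

```python
import timeit
import pandas as pd

# Rows copied from the question; the elided tail of the table is not reconstructed.
df = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Friday",
                    "Monday", "Monday", "Monday",
                    "Saturday", "Saturday"],
    "ice_cream": ["vanilla", "chocolate", "strawberry",
                  "vanilla", "chocolate", "strawberry",
                  "vanilla", "chocolate"],
    "proportion": [0.094473, 0.663506, 0.251021,
                   0.079736, 0.691437, 0.228828,
                   0.073350, 0.712930],
})

def with_pivot_table():
    # Groups and aggregates (mean by default) before reshaping.
    return df.pivot_table(values="proportion", index="day_of_week",
                          columns="ice_cream")

def with_unstack():
    # Pure reshape; requires unique (day_of_week, ice_cream) pairs.
    return df.set_index(["day_of_week", "ice_cream"])["proportion"].unstack()

# Total seconds for 50 calls of each approach.
t_pivot = timeit.timeit(with_pivot_table, number=50)
t_unstack = timeit.timeit(with_unstack, number=50)
print(f"pivot_table: {t_pivot:.4f}s  set_index/unstack: {t_unstack:.4f}s")
```

On a frame this small the measurement mostly reflects fixed overhead, so a larger DataFrame would be needed for a meaningful comparison.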