2017-04-09 92 views
1

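I have a DataFrame that is a list of observations, grouped by the 'name' column. I'm having trouble converting it to a MultiIndex format. How can I convert a Pandas DataFrame into a MultiIndexed form for a clustermap?

I have something like: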

    name | ratio | DayOfWeek | HourOfDay
    foo  | 0.7   | Mon       | 0
    foo  | 0.2   | Mon       | 1
    foo  | 0.11  | Mon       | 2
    foo  | 0.45  | Mon       | 3
    ..
    foo  | 0.2   | Mon       | 23
    foo  | 0.1   | Tue       | 0
    foo  | 0.6   | Tue       | 1
    foo  | 0.2   | Tue       | 2
    ..
    foo  | 0.1   | Sun       | 23
    bar  | 0.2   | Mon       | 0
    bar  | 0.11  | Mon       | 1
    ..

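and so on.

What I want is something I can use with seaborn clustermaps to show the correlation between the 'ratio' of each 'name' per day (as a whole) and specific hours within the day.

For example, I need something like this (I'm not sure it's correct, but this is what I've tried):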

                        | foo | bar | ...
    DayOfWeek HourOfDay |
    Mon       0         | 0.7 | 0.2 | ...
              1         | ...
              2         | ...
    ...
    Tue       0         | 0.1 | ...
              1         | ...
    ...       2

Once I have that, I'd like to be able to xs() it into slices that seaborn's heatmap/clustermap can use.

Answer

1

You can use set_index with unstack:

df = df.set_index(['DayOfWeek','HourOfDay','name'])['ratio'].unstack() 
print (df) 
name     bar foo 
DayOfWeek HourOfDay    
Mon  0   0.20 0.70 
      1   0.11 0.20 
      2   NaN 0.11 
      3   NaN 0.45 
      23   NaN 0.20 
Sun  23   NaN 0.10 
Tue  0   NaN 0.10 
      1   NaN 0.60 
      2   NaN 0.20 

But if there are duplicates, you need pivot_table with an aggregate function such as mean or sum:

print (df) 
    name ratio DayOfWeek HourOfDay 
0 foo 0.70  Mon   0 <- duplicate for same name, DayOfWeek and HourOfDay - 0.7 
1 foo 0.90  Mon   0 <- duplicate for same name, DayOfWeek and HourOfDay - 0.9 
2 foo 0.20  Mon   1 
3 foo 0.11  Mon   2 
4 foo 0.45  Mon   3 
5 foo 0.20  Mon   23 
6 foo 0.10  Tue   0 
7 foo 0.60  Tue   1 
8 foo 0.20  Tue   2 
9 foo 0.10  Sun   23 
10 bar 0.20  Mon   0 
11 bar 0.11  Mon   1 


df = df.pivot_table(index=['DayOfWeek','HourOfDay'], 
        columns='name', 
        values='ratio', 
        aggfunc='mean') 
print (df) 

name     bar foo 
DayOfWeek HourOfDay    
Mon  0   0.20 0.80 < (0.7 + 0.9)/2 = 0.8 
      1   0.11 0.20 
      2   NaN 0.11 
      3   NaN 0.45 
      23   NaN 0.20 
Sun  23   NaN 0.10 
Tue  0   NaN 0.10 
      1   NaN 0.60 
      2   NaN 0.20 
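
To make the difference concrete, here is a minimal self-contained sketch of why duplicates matter. The sample rows are made up for illustration (they are not the asker's data), and `unstack_failed` and `wide` are just names chosen for this example. With a repeated (name, DayOfWeek, HourOfDay) combination, set_index + unstack should raise a ValueError because the index is no longer unique, while pivot_table collapses the duplicates with the aggregate function:

```python
import pandas as pd

# Hypothetical sample data: the first two rows share the same
# (name, DayOfWeek, HourOfDay) key, so they are duplicates.
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "bar"],
    "ratio": [0.7, 0.9, 0.2, 0.2],
    "DayOfWeek": ["Mon", "Mon", "Mon", "Mon"],
    "HourOfDay": [0, 0, 1, 0],
})

# unstack cannot reshape an index that contains duplicate entries.
try:
    df.set_index(["DayOfWeek", "HourOfDay", "name"])["ratio"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates the duplicates instead (here with the mean).
wide = df.pivot_table(index=["DayOfWeek", "HourOfDay"],
                      columns="name",
                      values="ratio",
                      aggfunc="mean")
print(unstack_failed)
print(wide)
```

Combinations that have no observation (here bar on Mon at hour 1) come out as NaN, as in the output above.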

An alternative with groupby:

df = df.groupby(['DayOfWeek','HourOfDay','name'])['ratio'].mean().unstack() 
print (df) 
name     bar foo 
DayOfWeek HourOfDay    
Mon  0   0.20 0.80 < (0.7 + 0.9)/2 = 0.8 
      1   0.11 0.20 
      2   NaN 0.11 
      3   NaN 0.45 
      23   NaN 0.20 
Sun  23   NaN 0.10 
Tue  0   NaN 0.10 
      1   NaN 0.60 
      2   NaN 0.20 
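
For the xs() part of the question, a small sketch of how the reshaped frame could be sliced. The data is again invented for illustration, and `wide`, `monday` and `hour_zero` are hypothetical names; the idea is only that `DataFrame.xs` with `level=` picks one value from a level of the row MultiIndex and drops that level from the result:

```python
import pandas as pd

# Hypothetical sample data: two names, two days, two hours, no gaps.
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "ratio": [0.7, 0.2, 0.1, 0.6, 0.2, 0.11, 0.3, 0.4],
    "DayOfWeek": ["Mon", "Mon", "Tue", "Tue", "Mon", "Mon", "Tue", "Tue"],
    "HourOfDay": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Same reshaping as above: rows are (DayOfWeek, HourOfDay), columns are names.
wide = df.groupby(["DayOfWeek", "HourOfDay", "name"])["ratio"].mean().unstack()

# All hours of a single day: rows indexed by HourOfDay, one column per name.
monday = wide.xs("Mon", level="DayOfWeek")

# A single hour across all days: rows indexed by DayOfWeek.
hour_zero = wide.xs(0, level="HourOfDay")

print(monday)
print(hour_zero)
```

Each slice is an ordinary 2-D DataFrame, so it should be usable as the data argument of seaborn's heatmap or clustermap. One assumption worth checking on real data: clustermap relies on hierarchical clustering, which as far as I know does not accept missing values, so the NaN cells may need fillna or dropna first.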