Reshape an entire dataframe to long format with MultiIndex column headers

I want to reshape a pandas dataframe to long format. The challenge is that the columns already have MultiIndex column headers. The dataframe looks like this:

category price1           price2
year       2011 2012 2013   2011 2012 2013
1            33   22   48    135  144  149
2            22   26   37    136  127  129
3            39   30   47    123  148  148
4            45   42   21    140  126  121
5            20   37   35    141  142  147
6            29   20   34    122  121  132
7            20   35   45    128  123  130
8            39   34   49    125  120  131
9            24   20   36    122  146  130
10           24   37   43    142  133  138
11           23   22   40    124  135  131
12           27   22   40    147  149  132

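Below is a code snippet that generates the same dataframe. You will also see that I build this dataframe by concatenating two other dataframes.

Here is the snippet: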

import pandas as pd 
import numpy as np 

# Make dataframe df1 with 12 observations over 3 years 
# with multiindexed column headers 
np.random.seed(123) 
df1 = pd.DataFrame(np.random.randint(20, 50, size = (12,3)), columns=[2011,2012,2013]) 
df1.index = np.arange(1,len(df1)+1) 
colNames1 = df1.columns 
header1 = pd.MultiIndex.from_product([['price1'], colNames1], names=['category','year']) 
df1.columns = header1 

# Make dataframe df2 with 12 observations over 3 years 
# with multiindexed column headers 
df2 = pd.DataFrame(np.random.randint(120, 150, size = (12,3)), columns=[2011,2012,2013]) 
df2.index = np.arange(1,len(df2)+1) 
colNames1 = df2.columns 
header1 = pd.MultiIndex.from_product([['price2'], colNames1], names=['category','year']) 
df2.columns = header1 

df3 = pd.concat([df1, df2], axis = 1)

Here is the desired output:

 price1 price2 
1 2011 33 135 
2 2011 22 136 
3 2011 39 123 
4 2011 45 140 
5 2011 20 141 
6 2011 29 122 
7 2011 20 128 
8 2011 39 125 
9 2011 24 122 
10 2011 24 142 
11 2011 23 124 
12 2011 27 147 
1 2012 22 144 
2 2012 26 127 
3 2012 30 148 
4 2012 42 126 
5 2012 37 142 
6 2012 20 121 
7 2012 35 123 
8 2012 34 120 
9 2012 20 146 
10 2012 37 133 
11 2012 22 135 
12 2012 22 149 
1 2013 48 149 
2 2013 37 129 
3 2013 47 148 
4 2013 21 121 
5 2013 35 147 
6 2013 34 132 
7 2013 45 130 
8 2013 49 131 
9 2013 36 130 
10 2013 43 138 
11 2013 40 131 
12 2013 40 132

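I have tried different solutions based on suggestions involving reshape and pandas.wide_to_long, but I struggle with the multi-indexed column names. So why not just get rid of them? Mostly because this is what my real-world problem looks like, and also because I refuse to believe it can't be done.

Thank you for any suggestions!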

Source

2017-10-06 vestland

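Use stack, which by default moves the last column level (year) into the row index, followed by sort_index. Then add rename_axis and reset_index to turn the index levels into columns: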

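# Assign to a new name so the original wide df3 stays available
df_long = (df3.stack()                        # move the last column level ('year') into the index
     .sort_index(level=[1,0])                 # sort by year first, then by the original row label
     .rename_axis(['months','year'])          # name the two index levels
     .reset_index()                           # turn both index levels into ordinary columns
     .rename_axis(None, axis=1))              # drop the leftover 'category' columns name
print (df_long.head(15))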
    months year price1 price2 
0  1 2011  33  135 
1  2 2011  22  136 
2  3 2011  39  123 
3  4 2011  45  140 
4  5 2011  20  141 
5  6 2011  29  122 
6  7 2011  20  128 
7  8 2011  39  125 
8  9 2011  24  122 
9  10 2011  24  142 
10  11 2011  23  124 
11  12 2011  27  147 
12  1 2012  22  144 
13  2 2012  26  127 
14  3 2012  30  148
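The same chain can be checked on a small hand-written frame. The sketch below is only an illustration: the names `wide` and `long_df` and the three-row sample are assumptions, not part of the original data. It uses the keyword form of `rename_axis`, because recent pandas versions no longer accept the axis as a second positional argument.

```python
import pandas as pd

# Small stand-in for df3 (hypothetical sample): 3 rows x (2 categories x 2 years)
columns = pd.MultiIndex.from_product(
    [["price1", "price2"], [2011, 2012]], names=["category", "year"]
)
wide = pd.DataFrame(
    [[33, 22, 135, 144],
     [22, 26, 136, 127],
     [39, 30, 123, 148]],
    index=[1, 2, 3],
    columns=columns,
)

long_df = (
    wide.stack("year")                # move the 'year' column level into the row index
        .sort_index(level=[1, 0])     # order by year first, then by the original row label
        .rename_axis(index=["months", "year"], columns=None)  # name index levels, clear 'category'
        .reset_index()                # index levels become ordinary columns
)
print(long_df)
```

Naming the level explicitly (`stack("year")`) instead of relying on the default last level makes the intent visible if the column levels are ever reordered.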

If you need a MultiIndex instead:

df3 = df3.stack().sort_index(level=[1,0]) 
print (df3.head(15)) 
category price1 price2 
    year     
1 2011  33  135 
2 2011  22  136 
3 2011  39  123 
4 2011  45  140 
5 2011  20  141 
6 2011  29  122 
7 2011  20  128 
8 2011  39  125 
9 2011  24  122 
10 2011  24  142 
11 2011  23  124 
12 2011  27  147 
1 2012  22  144 
2 2012  26  127 
3 2012  30  148
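With the MultiIndex kept, rows can be selected by year or by a (row label, year) pair. This is a minimal sketch on a hypothetical two-row sample; the names `wide`, `stacked`, `prices_2012` and `value` are illustrative only.

```python
import pandas as pd

# Hypothetical two-row sample with the same column layout as df3
columns = pd.MultiIndex.from_product(
    [["price1", "price2"], [2011, 2012]], names=["category", "year"]
)
wide = pd.DataFrame(
    [[33, 22, 135, 144],
     [22, 26, 136, 127]],
    index=[1, 2],
    columns=columns,
)

# Row index becomes (original row label, year); columns are price1 and price2
stacked = wide.stack("year").sort_index(level=[1, 0])

# All rows for one year; xs drops the 'year' level from the result
prices_2012 = stacked.xs(2012, level="year")

# A single cell: row label 2, year 2011, column 'price2'
value = stacked.loc[(2, 2011), "price2"]

print(prices_2012)
print(value)
```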

Source

2017-10-06 10:59:34 jezrael

Sir, same here; I came a bit earlier with the same idea. And isn't this exactly the same thing as the previous answer? haha – Dark

No, because here I reshape by the last level and then sort ;) – jezrael

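Thanks! In this example, the index values 1 to 12 represent the months of the year. Any suggestions on how to keep that information in the stacked dataframe? – vestland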
