Pandas: stack a DataFrame and concatenate the index and column names

I'm having trouble with this specific format in Pandas/Python. My DataFrame looks like this:

[image: Current dataframe]

The desired output is this:

Id Predicted 
1_1 0 
1_2 0 
1_3 0 
1_4 0 
1_5 0 
1_6 0 
1_7 0 
1_8 0 
1_9 0 
2_1 0 
2_2 0 
2_3 0 
2_4 0 
2_5 0 
2_6 0 
2_8 0 
2_9 0

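Here Id consists of the index label joined with the column name, and Predicted is the value in the DataFrame at that particular coordinate.

1_1 means index 1, column 1; 1_2 means index 1, column 2; and so on.

I want to write the output to CSV, but I don't know how to iterate over the DataFrame to get it into that shape.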

Source

2016-01-22 jackal

First, you can reshape the DataFrame with stack:

In [29]: df = pd.DataFrame(np.random.randn(3,3)) 

In [30]: df 
Out[30]: 
          0         1         2
0 -1.138655 -1.633784  0.328994
1 -0.952137  1.        1.327618
2 -1.318940  1.191259  0.133112

In [31]: df2 = df.stack() 

In [32]: df2 
Out[32]: 
0 0 -1.138655 
    1 -1.633784 
    2 0.328994 
1 0 -0.952137 
    1 1.
    2 1.327618 
2 0 -1.318940 
    1 1.191259 
    2 0.133112 
dtype: float64

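This gives you a Series with a MultiIndex (two index levels, taken from the original index and the column names). You can then reformat that MultiIndex as follows: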

In [33]: df2.index = [str(i) + '_'+ str(j) for i, j in df2.index] 

In [34]: df2 
Out[34]: 
0_0 -1.138655 
0_1 -1.633784 
0_2 0.328994 
1_0 -0.952137 
1_1 1.
1_2 1.327618 
2_0 -1.318940 
2_1 1.191259 
2_2 0.133112 
dtype: float64

Note that I included a _ here because the column names of my example DataFrame do not already contain one.
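To get from the relabeled Series to the two-column CSV the question asks for, something like the following may work. It is a minimal sketch, not part of the original answer: the all-zero frame with index 1..2 and columns 1..9 is a hypothetical stand-in for the DataFrame in the question, and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame:
# index labels 1..2, column labels 1..9, every prediction 0.
df = pd.DataFrame(0, index=[1, 2], columns=range(1, 10))

# Step 1: stack the columns into a second index level (Series with a MultiIndex).
stacked = df.stack()

# Step 2: join each (row label, column label) pair into a single "row_col" string.
stacked.index = [f"{i}_{j}" for i, j in stacked.index]

# Step 3: turn the Series into a two-column DataFrame named as in the question.
out = stacked.rename_axis("Id").reset_index(name="Predicted")

# Step 4: serialize without the positional index.
# Passing a file path as the first argument writes to disk instead of returning text.
csv_text = out.to_csv(index=False)
print(csv_text)
```

The separator here is `_`, but any string works as long as it does not already occur in the row or column labels.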

Source

2016-01-22 14:57:00 joris

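Super, this is exactly what I needed :) Thank you joris – jackal

Cool @joris, thanks for the great answer. Could you explain how this works? Checking 'df2.index' before the second step, I see it has two attributes, one is '.levels' and the other is '.labels'. I now know that the levels are 'df.index.values'. However, the labels attribute shows the positions of 'df.columns.values'. I don't understand why the list comprehension works. I mean, how does 'j' map to 'df.columns.values[j]'? Thanks –

'df.index.values' returns tuples (so this is different from 'df.index.levels'), so iterating over those values gives back the index elements of that row as a tuple. – joris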
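The point in that last comment can be checked directly. The sketch below uses a small frame with made-up string labels (an assumption, chosen so that labels and integer positions cannot be confused) and shows that iterating the stacked index yields label tuples, not positions. Note that in pandas 0.24 and later the attribute the comment calls '.labels' is named '.codes'.

```python
import pandas as pd

# Made-up labels so that labels and integer positions are easy to tell apart.
df = pd.DataFrame([[10, 20], [30, 40]], index=["r1", "r2"], columns=["a", "b"])
s = df.stack()

# Iterating a MultiIndex yields one (row label, column label) tuple per element,
# so `i, j` in the list comprehension are already labels.
pairs = list(s.index)
print(pairs)

# The unique labels of the second level, and the integer positions into them.
# These are the internal storage; iteration resolves them to labels for you.
level_labels = s.index.levels[1].tolist()
level_codes = [int(c) for c in s.index.codes[1]]
print(level_labels, level_codes)

# The same comprehension as in the answer, now producing label-based ids.
ids = [f"{i}_{j}" for i, j in s.index]
print(ids)
```

So there is no need to look up 'df.columns.values[j]' by hand: the position-to-label mapping happens inside the index before the tuple reaches the comprehension.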
