Pandas DataFrame: stack the values of multiple columns into a single column

Assume the following DataFrame:

   key.0  key.1  key.2  topic
1    abc    def    ghi      8
2    xab    xcd    xef      9

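How can I combine the values of all the key.* columns into a single 'key' column, with each value paired with the topic of the row it came from? This is the result I want: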

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns depends on some external variable N.
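For anyone who wants to reproduce the example, the frame above can be rebuilt roughly as follows. This is only a sketch: the name N and the hard-coded sample rows are assumptions made for illustration, with the values copied from the table in the question.

```python
import pandas as pd

N = 3  # assumed stand-in for the external variable N

# Sample values copied from the table in the question
rows = [["abc", "def", "ghi", 8], ["xab", "xcd", "xef", 9]]

# Column names key.0 ... key.(N-1), followed by topic
columns = [f"key.{i}" for i in range(N)] + ["topic"]

df = pd.DataFrame(rows, columns=columns, index=[1, 2])
print(df)
```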


2015-12-19 borice

You can melt your DataFrame:

>>> keys = [c for c in df if c.startswith('key.')] 
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key') 

    topic variable key 
0  8 key.0 abc 
1  9 key.0 xab 
2  8 key.1 def 
3  9 key.1 xcd 
4  8 key.2 ghi 
5  9 key.2 xef

This also gives you the source column of each key.
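Note that melt returns the rows grouped by source column (key.0 first, then key.1, and so on), whereas the question lists them grouped by topic. If that order matters, one possible approach is a stable sort on topic. The sketch below assumes the sample data from the question; mergesort is chosen only because it is a stable sort.

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"],
     "key.2": ["ghi", "xef"], "topic": [8, 9]}
)

keys = [c for c in df.columns if c.startswith("key.")]
melted = pd.melt(df, id_vars="topic", value_vars=keys, value_name="key")

# A stable sort keeps key.0, key.1, key.2 in order within each topic.
result = (
    melted.sort_values("topic", kind="mergesort")
    .drop(columns="variable")
    .reset_index(drop=True)
)
print(result)
```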

As of v0.20, melt is a first-class method of the pd.DataFrame class:

>>> df.melt('topic', value_name='key').drop(columns='variable')

    topic key 
0  8 abc 
1  9 xab 
2  8 def 
3  9 xcd 
4  8 ghi 
5  9 xef
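One caveat with the short form: when value_vars is omitted, melt unpivots every column that is not listed in id_vars. If the frame has columns other than topic and key.*, it is probably safer to pass the key columns explicitly. A minimal sketch, where "score" is a hypothetical extra column that is not part of the original question:

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"],
     "topic": [8, 9],
     "score": [0.5, 0.7]}  # hypothetical extra column, for illustration only
)

# Restrict the melt to the key.* columns; "score" is carried along as an id column.
keys = [c for c in df.columns if c.startswith("key.")]
out = df.melt(id_vars=["topic", "score"], value_vars=keys, value_name="key")
out = out.drop(columns="variable")
print(out)
```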


2015-12-19 22:55:48 Alexander

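Simple and very fast. Thanks. – borice

After trying various approaches, I find the following more or less intuitive, provided you understand the magic of stack: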

# keep topic as index, stack other columns 'against' it 
stacked = df.set_index('topic').stack() 
# set the name of the new series created 
df = stacked.reset_index(name='key') 
# drop the 'source' level (key.*) 
df.drop('level_1', axis=1, inplace=True)

The resulting DataFrame is as required:

topic key 
0  8 abc 
1  8 def 
2  8 ghi 
3  9 xab 
4  9 xcd 
5  9 xef

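You may want to print the intermediate results to understand the whole process. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').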
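The intermediate results mentioned above can be inspected with something like the following sketch, which assumes the sample data from the question. The variable names indexed, stacked and flat are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"],
     "key.2": ["ghi", "xef"], "topic": [8, 9]}
)

indexed = df.set_index("topic")          # topic becomes the row index
print(indexed)

stacked = indexed.stack()                # Series indexed by (topic, column name)
print(stacked)

flat = stacked.reset_index(name="key")   # columns: topic, level_1, key
print(flat)

result = flat.drop(columns="level_1")    # drop the source-column level
print(result)
```

The column is called level_1 because the column index of the original frame has no name, so reset_index falls back to a positional label.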


2015-12-19 23:09:21 miraculixx

I can't seem to find any documentation on the 'name' parameter of 'reset_index'. Can you explain how it works? – imp9

It is [Series.reset_index()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reset_index.html?highlight=reset_index) – miraculixx
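To illustrate the point: stack() returns a Series, and Series.reset_index accepts a name argument that labels the column holding the Series values. A small sketch with a made-up two-element Series:

```python
import pandas as pd

# Toy Series with a named index, for illustration only
s = pd.Series(["abc", "def"], index=pd.Index([8, 8], name="topic"))

# Without name, the value column of an unnamed Series is labelled 0.
default_cols = s.reset_index().columns.tolist()
print(default_cols)

# With name, the value column gets that label instead.
named_cols = s.reset_index(name="key").columns.tolist()
print(named_cols)
```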

OK, since one of the current questions was marked as a duplicate of this question, I will answer here.

Using wide_to_long

pd.wide_to_long(df, ['key'], i='topic', j='age', sep='.').reset_index().drop(columns='age')
Out[123]: 
    topic key 
0  8 abc 
1  9 xab 
2  8 def 
3  9 xcd 
4  8 ghi 
5  9 xef
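A self-contained version of the wide_to_long approach might look like the sketch below. It assumes the sample data from the question; because the column names use a dot between the stub and the number, sep="." is passed explicitly (the default separator is an empty string). The name n for the suffix column is arbitrary, and the explicit sort is there only so that the row order does not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"],
     "key.2": ["ghi", "xef"], "topic": [8, 9]}
)

# i must uniquely identify each row; j names the column holding the numeric suffix.
long_df = pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")

result = (
    long_df.reset_index()
    .sort_values(["topic", "n"])   # make the row order explicit
    .drop(columns="n")
    .reset_index(drop=True)
)
print(result)
```

Note that wide_to_long requires the i column(s) to identify each row uniquely, so this only works as written if topic has no repeated values in the wide frame.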


2017-09-15 13:07:54 Wen
