Python pandas: melting certain variables in a wide DataFrame

A data melting question: how do I selectively melt certain variables from an overly wide DataFrame?

For example, I want to turn:

import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])
print(df1)
# sample id1 id2 x y1 y2 
#0  1 a b 0.1 -1 10 
#1  2 a b 0.2 -3 12 
#2  3 c d 0.3 -5 14

into:

# sample id position x y 
#0  1 a   1 0.1 -1 
#1  1 b   2 0.1 10 
#2  2 a   1 0.2 -3 
#3  2 b   2 0.2 12 
#4  3 c   1 0.3 -5 
#5  3 d   2 0.3 14

Note that x is duplicated on every row of a sample, while id and y are aligned with position.

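A straightforward pd.melt() puts variables of mixed meaning and mixed data types into a single column, which is hard to pivot selectively back into wide form.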

print(pd.melt(df1, id_vars='sample'))
# sample variable value 
#0  1  id1  a 
#1  2  id1  a 
#2  3  id1  c 
#3  1  id2  b 
#4  2  id2  b 
#5  3  id2  d 
#6  1  x 0.1 
#7  2  x 0.2 
#8  3  x 0.3 
#9  1  y1 -1 
#10  2  y1 -3 
#11  3  y1 -5 
#12  1  y2 10 
#13  2  y2 12 
#14  3  y2 14
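
To make the mixed-type problem concrete, here is a minimal sketch that inspects the melted `value` column. It assumes the same `df1` as above; the name `melted` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])

melted = pd.melt(df1, id_vars='sample')

# Strings (id1, id2), floats (x) and integers (y1, y2) all end up in one
# column, so pandas has to fall back to the generic object dtype.
print(melted['value'].dtype)

# List the distinct Python type names that ended up in that single column.
print(sorted({type(v).__name__ for v in melted['value']}))
```

Because the ids and the measurements share one column, pivoting back would require separating them again by `variable` first.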

Any suggestions? Thanks!

2017-03-16 Matthew Davis

You can try this:

# set columns that don't change as index 
df1.set_index(['sample', 'x'], inplace=True) 

# create multi-index columns based on the names pattern 
df1.columns = pd.MultiIndex.from_arrays(df1.columns.str.extract(r"(\D+)(\d+)", expand=True).T.values) 

# transform the multi-index data frames to long format with stack 
df1.stack(level=1).rename_axis(('sample', 'x', 'position')).reset_index()
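
The snippet above changes df1 in place and leaves position as a string. A variant that keeps df1 untouched and converts position to integers might look like the following sketch. The names `wide`, `parts` and `long_df` are illustrative, and naming the column level `position` up front is a choice made here, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])

# Work on a copy: set_index without inplace=True returns a new frame.
wide = df1.set_index(['sample', 'x'])

# Split each remaining column name into a stub ('id', 'y') and a number.
parts = wide.columns.str.extract(r"(\D+)(\d+)", expand=True)

# Two-level columns: the stub on top, the integer position underneath.
wide.columns = pd.MultiIndex.from_arrays(
    [parts[0], parts[1].astype(int)], names=[None, 'position'])

# Move the 'position' level from the columns into the rows.
long_df = wide.stack(level='position').reset_index()
print(long_df)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its new implementation; the resulting columns are sample, x, position, id and y.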

2017-03-16 19:18:07 Psidom
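
As a possible alternative that is not part of the answer above, pandas also ships `pd.wide_to_long`, which handles stub-plus-number column names directly. The sketch below assumes that `sample` and `x` together identify each row uniquely; `long_df` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])

# stubnames are the column prefixes to melt; the numeric suffix becomes j.
long_df = (
    pd.wide_to_long(df1, stubnames=['id', 'y'],
                    i=['sample', 'x'], j='position')
    .sort_index()      # order rows by sample, x, position
    .reset_index()     # turn the index levels back into columns
)
print(long_df)
```

With the default suffix pattern, `wide_to_long` converts the numeric suffixes to integers, so `position` comes out as 1 and 2 rather than '1' and '2'.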

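First of all, awesome answer. Since df.columns.str.extract() is new to me, one question: what if the column names are more complex, e.g. ['id1,f22', 'id2,f22', 'var50_a1', 'var50_a2']? Would you just use some regex to extract the right variable name/position?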

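I don't think a regex can easily handle column names with mixed patterns; there has to be one clear pattern for splitting them into a MultiIndex. For example, 'a1, a2, b1, b2, c1, c2' or 'var1_a1, var1_a2, var2_a1, var2_a2' should both be fine, but for the latter the regex should be '([^_]+)_([^_]+)'. So making sure your column names don't go crazy would help. – Psidom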
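
A small sketch of what that second pattern does, using the column names mentioned in the comment; `cols` and `parts` are illustrative names.

```python
import pandas as pd

cols = pd.Index(['var1_a1', 'var1_a2', 'var2_a1', 'var2_a2'])

# Capture everything before the underscore and everything after it.
parts = cols.str.extract(r"([^_]+)_([^_]+)", expand=True)
print(parts)

# The two captured groups can then serve as the levels of a MultiIndex.
multi = pd.MultiIndex.from_arrays([parts[0], parts[1]])
print(multi)
```

Names that do not match the pattern at all would come back as missing values, which is one reason a single consistent naming scheme matters.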

Cool, it's easy enough to rename the columns before extracting.
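
One way that renaming step might look, as a sketch: the messy column names and the `rename_map` dictionary below are invented for illustration and do not come from the thread.

```python
import pandas as pd

# Hypothetical frame whose column names follow no single pattern.
messy = pd.DataFrame(
    [[1, 0.1, 'a', 'b', -1, 10]],
    columns=['sample', 'x', 'first id', 'second id', 'y_1', 'y_2'])

# Map every irregular name onto the stub-plus-number scheme first.
rename_map = {'first id': 'id1', 'second id': 'id2',
              'y_1': 'y1', 'y_2': 'y2'}
tidy = messy.rename(columns=rename_map)

# Now the simple pattern from the answer applies to every melted column.
parts = (tidy.set_index(['sample', 'x'])
             .columns.str.extract(r"(\D+)(\d+)", expand=True))
print(parts)
```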
