Flatten multiple columns into one column in a DataFrame

I have a DataFrame like this:

id  other_id_1  other_id_2  other_id_3
1   100         101         102
2   200         201         202
3   300         301         302

I want this:

id other_id 
1  100 
1  101 
1  102 
2  200 
2  201 
2  202 
3  300 
3  301 
3  302

I can easily get the output I want like this:

to_keep = {}
for idx in df.index:
    identifier = df.loc[idx, 'id']
    to_keep[identifier] = []
    for col in ['other_id_1', 'other_id_2', 'other_id_3']:
        row_val = df.loc[idx, col]
        to_keep[identifier].append(row_val)

This gives me:

{1: [100, 101, 102], 2: [200, 201, 202], 3: [300, 301, 302]}

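I can easily write that to a file. However, I'm struggling to do this in native pandas. I would have guessed that this apparent transposition would be simpler, but I'm struggling...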


2017-09-26 blacksite

Well, if you haven't already, set id as the index:

>>> df 
    id other_id_1 other_id_2 other_id_3 
0 1   100   101   102 
1 2   200   201   202 
2 3   300   301   302 
>>> df.set_index('id', inplace=True) 
>>> df 
    other_id_1 other_id_2 other_id_3 
id 
1   100   101   102 
2   200   201   202 
3   300   301   302

Then you can simply use pd.concat:

>>> df = pd.concat([df[col] for col in df]) 
>>> df 
id 
1 100 
2 200 
3 300 
1 101 
2 201 
3 301 
1 102 
2 202 
3 302 
dtype: int64

If you need the values sorted:

>>> df.sort_values() 
id 
1 100 
1 101 
1 102 
2 200 
2 201 
2 202 
3 300 
3 301 
3 302 
dtype: int64 
>>>
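
For reference, here is a self-contained sketch of the pd.concat approach above. Note that sort_values() sorts by the values themselves, which only groups the rows by id here because the sample values happen to rise with id. As an assumption about what is usually wanted, this sketch sorts on the id index instead, with a stable sort so the column order within each id is kept. The names wide, flat and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

wide = df.set_index("id")

# Lay the columns end to end; every piece keeps its id index.
flat = pd.concat([wide[col] for col in wide]).rename("other_id")

# Stable sort on the index groups rows by id without reordering
# the values that belong to the same id.
flat = flat.sort_index(kind="mergesort")

# Turn the id index back into an ordinary column.
result = flat.reset_index()
print(result)
```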


2017-09-26 20:37:34

If id is not the index, set it first:

df = df.set_index('id') 

df 

    other_id_1 other_id_2 other_id_3 
id          
1   100   101   102 
2   200   201   202 
3   300   301   302

Now call the pd.DataFrame constructor. You need np.repeat to repeat each index label once per column.

df_new = pd.DataFrame({'other_id' : df.values.reshape(-1,)}, 
         index=np.repeat(df.index, len(df.columns))) 
df_new 

    other_id 
id   
1  100 
1  101 
1  102 
2  200 
2  201 
2  202 
3  300 
3  301 
3  302
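
To see why this lines up, here is a small runnable sketch of the same idea. Reshaping the values to one dimension walks the frame row by row, so each id has to be repeated once per column, in the same order. The variable names (wide, values, index) are illustrative, and to_numpy() is used here in place of .values as a stylistic assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

wide = df.set_index("id")

# Row-major flattening: all values of id 1 first, then id 2, then id 3.
values = wide.to_numpy().reshape(-1)

# Repeat each id once per column so labels and values stay aligned.
index = np.repeat(wide.index, wide.shape[1])

df_new = pd.DataFrame({"other_id": values}, index=index)
print(df_new)
```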


2017-09-26 20:38:33

Using pd.wide_to_long:

pd.wide_to_long(df,'other_id_',i='id',j='drop').reset_index().drop('drop',axis=1).sort_values('id') 
    Out[36]: 
     id other_id_ 
    0 1  100 
    3 1  101 
    6 1  102 
    1 2  200 
    4 2  201 
    7 2  202 
    2 3  300 
    5 3  301 
    8 3  302
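
A possible variation, sketched under the assumption that you would rather end up with a column named other_id than other_id_: pass the stub without the trailing underscore and give the separator through sep. The suffix column name n is a placeholder chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

# Columns named other_id_<number> are gathered under the stub "other_id";
# the numeric suffix lands in the index level "n".
long = pd.wide_to_long(df, stubnames="other_id", i="id", j="n", sep="_")

# Sort by (id, n), then drop the suffix once the order is fixed.
result = long.sort_index().reset_index().drop(columns="n")
print(result)
```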

Or unstack:

df.set_index('id').unstack().reset_index().drop('level_0', axis=1).rename(columns={0: 'other_id'})

Out[43]: 
    id other_id 
0 1  100 
1 2  200 
2 3  300 
3 1  101 
4 2  201 
5 3  301 
6 1  102 
7 2  202 
8 3  302


2017-09-26 20:42:14 Wen

I see you're using your favorite. ;-) –

@cᴏʟᴅsᴘᴇᴇᴅ Yes.. I just want more people to notice this function... – Wen

One more (or rather, two) :)

pd.melt(df, id_vars='id', value_vars=['other_id_1', 'other_id_2', 'other_id_3'], value_name='other_id')\
.drop('variable', axis=1).sort_values(by='id')

Option 2:

df.set_index('id').stack().reset_index(level=1, drop=True).reset_index()\
.rename(columns={0: 'other_id'})

Either way gives you:

id other_id 
0 1 100 
1 1 101 
2 1 102 
3 2 200 
4 2 201 
5 2 202 
6 3 300 
7 3 301 
8 3 302
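
Here is a self-contained sketch of both options, for anyone who wants to run them side by side. It is a sketch under a few assumptions: the melt version sorts on both id and the helper column variable before dropping it, so the order within each id does not depend on the sort algorithm (a string sort on the column names is fine for three columns but would misplace other_id_10). The melt version also resets the index so that it matches the 0..8 index shown above. The names melted and stacked are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

# Option 1: melt, order by id and source column, then drop the helper column.
melted = (
    df.melt(id_vars="id", var_name="variable", value_name="other_id")
      .sort_values(["id", "variable"])
      .drop(columns="variable")
      .reset_index(drop=True)
)

# Option 2: stack walks the frame row by row, so no sorting is needed.
stacked = (
    df.set_index("id")
      .stack()
      .reset_index(level=1, drop=True)  # discard the old column names
      .rename("other_id")
      .reset_index()
)

print(melted)
print(stacked)
```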


2017-09-26 20:49:48 Vaishali
