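One column of my pandas DataFrame contains lists. I want to expand that column into the vertical shape shown below. How can I do this? How do I convert the lists in a column into a vertical shape?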

Before (code):

import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [
        [10, 20],
        [1.2, 3.0, 2.75],
        ['tommy', 'tom']
    ]
})

Before (table):

|col1  |col2   |col3|list            |
|------|-------|----|----------------|
|fruit |apple  |   1|[10, 20]        |
|veicle|bycicle|   4|[1.2, 3.0, 2.75]|
|animal|cat    |   2|['tommy', 'tom']|

After:

|col1  |col2   |col3|list   |
|------|-------|----|-------|
|fruit |apple  |   1|10     |
|fruit |apple  |   1|20     |
|veicle|bycicle|   4|1.2    |
|veicle|bycicle|   4|3.0    |
|veicle|bycicle|   4|2.75   |
|animal|cat    |   2|'tommy'|
|animal|cat    |   2|'tom'  |

Note 1: The lists differ in length and in element type.

Note 2: I cannot change the code that generates the DataFrame.

Thank you for reading.

2017-08-27 AkiraIsaka

Possible duplicate of [Explode lists with different lengths in pandas](https://stackoverflow.com/questions/45885143/explode-lists-with-different-lengths-in-pandas) – Wen

You could simply have googled it before asking: https://stackoverflow.com/questions/45885143/explode-lists-with-different-lengths-in-pandas/45886206#45886206 – Wen

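Thank you for the useful link, and please forgive me for posting a duplicate question. I searched Google carefully, but I couldn't find that post. – AkiraIsaka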

I learned this cool trick from PIR the other day, using np.repeat and np.concatenate:

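import numpy as np
# positional index of each row, repeated once per element of its list
idx = np.arange(len(df)).repeat(df.list.str.len(), 0)
# those rows without the last ('list') column, plus the flattened list values
out = df.iloc[idx, :-1].assign(list=np.concatenate(df.list.values))
print(out)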

     col1     col2  col3   list
0   fruit    apple     1     10
0   fruit    apple     1     20
1  veicle  bycicle     4    1.2
1  veicle  bycicle     4    3.0
1  veicle  bycicle     4   2.75
2  animal      cat     2  tommy
2  animal      cat     2    tom
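Because the lists mix ints, floats and strings, np.concatenate has to find one common dtype for all of them, which may turn every value into a string. A minimal sketch of the same repeat-based idea, assuming you want to keep each element's original Python type: it flattens the lists with itertools.chain into an object array instead of calling np.concatenate. The names lengths and flat are illustrative, not from the answer.

```python
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# number of elements in each row's list (illustrative helper name)
lengths = df['list'].map(len).to_numpy()
# positional row numbers, each repeated once per list element: [0 0 1 1 1 2 2]
idx = np.arange(len(df)).repeat(lengths)
# flatten without dtype coercion: an object array keeps 10 as int and 'tom' as str
flat = np.array(list(chain.from_iterable(df['list'])), dtype=object)
# repeated rows minus the list column, with the flattened values attached
out = df.iloc[idx].drop(columns='list').assign(list=flat)
print(out)
```

The result keeps the original row labels (0, 0, 1, 1, 1, 2, 2); append .reset_index(drop=True) if you prefer 0..n-1.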

Performance

Small

# Bharath 
%timeit df.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack()\
       .reset_index().drop('level_3',axis=1)
100 loops, best of 3: 7.75 ms per loop 

# Mine 
%%timeit 
idx = np.arange(len(df)).repeat(df.list.str.len(), 0)  
out = df.iloc[idx, :-1].assign(list=np.concatenate(df.list.values))  
1000 loops, best of 3: 1.41 ms per loop

Large

df_test = pd.concat([df] * 10000) 

# Bharath 
%timeit df_test.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack()\ 
       .reset_index().drop('level_3',axis=1) 
1 loop, best of 3: 7.09 s per loop 

# Mine 
%%timeit 
idx = np.arange(len(df_test)).repeat(df_test.list.str.len(), 0)  
out = df_test.iloc[idx, :-1].assign(list=np.concatenate(df_test.list.values)) 
10 loops, best of 3: 123 ms per loop

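As an aside, Bharath's answer is short but slow. Here is an improvement that uses the DataFrame constructor instead of df.apply, giving roughly a 200x speedup on the large data: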

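# MultiIndex built from the three scalar columns
idx = df.set_index(['col1', 'col2', 'col3']).index
# one column per list position, shorter lists padded with NaN, then stacked into rows
out = pd.DataFrame(df.list.values.tolist(), index=idx).stack()\
       .reset_index().drop('level_3', axis=1).rename(columns={0 : 'list'})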

print(out) 

     col1     col2  col3   list
0   fruit    apple     1     10
1   fruit    apple     1     20
2  veicle  bycicle     4    1.2
3  veicle  bycicle     4      3
4  veicle  bycicle     4   2.75
5  animal      cat     2  tommy
6  animal      cat     2    tom
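To see why the constructor approach works: pd.DataFrame(list_of_lists) builds one column per list position and pads the shorter lists with missing values, and stack() turns that wide frame into one row per cell. A minimal sketch under one assumption: because newer pandas releases change how stack() treats missing values, it calls dropna() explicitly instead of relying on stack() to discard the padding. The names wide and stacked are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

idx = df.set_index(['col1', 'col2', 'col3']).index
# one column per list position; shorter lists are padded with missing values
wide = pd.DataFrame(df['list'].tolist(), index=idx)
print(wide)
# one row per cell, with the padding removed explicitly
stacked = wide.stack().dropna()
# the unnamed list-position level becomes 'level_3'; the values column is named 0
out = (stacked.reset_index()
              .drop(columns='level_3')
              .rename(columns={0: 'list'}))
print(out)
```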

Small

100 loops, best of 3: 4.7 ms per loop

Large

10 loops, best of 3: 28.9 ms per loop

2017-08-27 14:35:15

NumPy is very fast. It's hard to beat a NumPy answer. – Dark

@Bharathshetty Yes, but I didn't expect pandas to be this slow. –

I used apply, so yes, it's a bit slow. I think apply always costs some performance. – Dark

You can set_index on the first three columns, then apply pd.Series to the list column and stack the result.

df.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack().reset_index().drop('level_3',axis=1)

Output:

 
     col1     col2  col3      0
0   fruit    apple     1     10
1   fruit    apple     1     20
2  veicle  bycicle     4    1.2
3  veicle  bycicle     4      3
4  veicle  bycicle     4   2.75
5  animal      cat     2  tommy
6  animal      cat     2    tom

2017-08-27 14:41:38 Dark

Added some timings: https://stackoverflow.com/a/45906100/4909087 –

Here is roughly how to accomplish this. It isn't an exact solution, but it should give you an idea of how to approach your task:

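import pandas as pd

original_df = df  # your starting DataFrame
rows = []
# go through each row of the original df
for i in range(original_df.shape[0]):
    row_series = original_df.iloc[i]
    row_list = row_series['list']
    # one output row per item in this row's list
    for item in row_list:
        rows.append({'col1': row_series['col1'],
                     'col2': row_series['col2'],
                     'col3': row_series['col3'],
                     'list': item})
# DataFrame.append returned a new frame instead of modifying in place
# (and was removed in pandas 2.0), so build the result once at the end
new_df = pd.DataFrame(rows)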
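Looking up each row with iloc inside a Python loop is slow on large frames. A minimal sketch of the same row-by-row idea written as one list comprehension over DataFrame.itertuples, assuming the column names from the question (col1, col2, col3, list); it builds the result with a single pd.DataFrame call at the end. It is still a pure-Python loop, so it will likely trail the vectorised answers above on large data.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# one dict per (row, list element) pair; the scalar columns are repeated
rows = [
    {'col1': r.col1, 'col2': r.col2, 'col3': r.col3, 'list': item}
    for r in df.itertuples(index=False)
    for item in r.list
]
new_df = pd.DataFrame(rows)
print(new_df)
```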

2017-08-27 14:58:58 Heapify
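The answers above predate pandas 0.25, which added DataFrame.explode for this kind of reshaping. A minimal sketch, assuming pandas 0.25 or later: explode('list') emits one row per list element, repeats the other columns, and keeps each element's original type and the original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# one row per list element; col1..col3 repeated, original index labels kept
out = df.explode('list')
print(out)
# renumber rows 0..n-1 if the repeated labels are not wanted
print(out.reset_index(drop=True))
```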
