Pandas Series containing arrays

I have a pandas DataFrame column that looks something like this:

Out[67]: 
0  ["cheese", "milk... 
1  ["yogurt", "cheese... 
2  ["cheese", "cream"... 
3  ["milk", "cheese"...

Ultimately I want this as one flat list, but while trying to flatten it I noticed that pandas treats ["cheese", "milk", "cream"] as a str rather than a list.

How would I go about flattening this so that I end up with:

["cheese", "milk", "yogurt", "cheese", "cheese"...]
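A quick way to confirm the diagnosis is to check the element type: if each cell is a `str`, a nested comprehension iterates over its characters instead of its list items. This is a minimal sketch; the Series `s` is an assumed stand-in for the column shown above.

```python
import pandas as pd

# Assumed sample: each cell is a *string* that merely looks like a list literal
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Each element is a str, not a list
print(type(s.iloc[0]))  # <class 'str'>

# "Flattening" a str iterates over its characters, brackets and quotes included
chars = [item for sublist in s for item in sublist]
print(chars[:6])  # ['[', "'", 'c', 'h', 'e', 'e']
```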

[EDIT] So the answer given below seems to come down to:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

s = s.str.strip("[]")
df = s.str.split(',', expand=True)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print(l.tolist())

This is great: question answered, answer accepted. Still, it strikes me as a rather inelegant solution.


2016-03-01 toast

Possible duplicate of [Python pandas flatten a dataframe to a list](http://stackoverflow.com/questions/25440008/python-pandas-flatten-a-dataframe-to-a-list) – soon

No, it is not a duplicate, because the column's type is string, not list. – jezrael

You can use numpy.flatten and then flatten the nested lists, see:

print(df)
        a
0 [cheese, milk]
1 [yogurt, cheese]
2 [cheese, cream]

print(df.a.values)
[[['cheese', 'milk']]
[['yogurt', 'cheese']]
[['cheese', 'cream']]]

l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]

print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
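When the cells really are Python lists, the nested comprehension above can also be written with `itertools.chain.from_iterable`, which concatenates the per-row lists. A minimal sketch, assuming a column `a` built from real lists (the DataFrame below is illustrative, not the asker's data):

```python
from itertools import chain

import pandas as pd

# Assumed sample: a column whose cells are real Python lists
df = pd.DataFrame({"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]})

# Nested comprehension: outer loop over rows, inner loop over each row's list
flat_comp = [item for sublist in df["a"] for item in sublist]

# Same result: chain the per-row lists end to end, then materialize a list
flat_chain = list(chain.from_iterable(df["a"]))

print(flat_chain)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```

Both forms only work on lists; on the string column from the question they would yield characters again.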

EDIT:

You can try:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# remove the surrounding [ ]
s = s.str.strip('[]')
print(s)
0  'cheese', 'milk'
1 'yogurt', 'cheese'
2  'cheese', 'cream'
dtype: object

df = s.str.split(',', expand=True)
# remove the quotes and strip surrounding whitespace
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
     0  1
0 cheese milk
1 yogurt cheese
2 cheese cream

l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
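One caveat with `split(',', expand=True)`: if rows hold different numbers of items, the shorter rows are padded with `None`, and the `x.replace` lambda would then fail on them. A possible alternative, swapping in a regex instead of strip/split, is to pull out every single-quoted token with `str.findall` and flatten with `Series.explode` (available since pandas 0.25). This sketch assumes the items themselves never contain a `'`; the third row is deliberately longer to show uneven lengths.

```python
import pandas as pd

# Assumed sample: string "lists" of uneven length
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream', 'milk']"])

# Capture the text between each pair of single quotes -> one list per row
tokens = s.str.findall(r"'([^']*)'")

# explode() emits one row per list element, preserving row order
flat = tokens.explode().tolist()
print(flat)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream', 'milk']
```

Note that a row holding an empty list explodes to a single NaN, so you may want to `dropna()` before `tolist()`.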


2016-03-01 11:59:58 jezrael

I think there is a typo in `df.values.a.flatten()`; it should be `df.a.values.flatten()`. – shanmuga

Yes, you are right. I have corrected it. Thanks. – jezrael

For me this just prints each individual letter: `s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])`, `l = s.values.flatten()`, `print([item for sublist in l for item in sublist])` – toast

To convert the column values from str to list you can use df.columnName.tolist(), and to flatten them you can use df.columnName.values.flatten().


2016-03-01 11:59:49

You can convert the Series into a DataFrame and then call stack:

s.apply(pd.Series).stack().tolist()
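To show what the two steps do, here is a minimal sketch on an assumed list-valued Series `s2` (real lists, not strings): `apply(pd.Series)` spreads each list across columns `0, 1, ...`, and `stack()` folds those columns back into one long Series, row by row.

```python
import pandas as pd

# Assumed sample: cells are real lists, not strings
s2 = pd.Series([["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]])

# Each list becomes one row of a DataFrame with columns 0 and 1
wide = s2.apply(pd.Series)

# stack() turns the columns into an inner index level, giving row-major order
flat = wide.stack().tolist()
print(flat)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```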


2016-03-01 12:27:28 Colin

This returns a list of strings such as "['cheese', 'milk']": `s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])`, `s.apply(pd.Series).stack().tolist()` – toast

From the original description, I assumed the `Series` held lists of strings: `s2 = pd.Series([['cheese', 'milk'], ['yogurt', 'cheese'], ['cheese', 'cream']])`, in which case `s2.apply(pd.Series).stack().tolist()` should work. If each element of the `Series` is a string representing a list of strings, you can add an eval: `s.apply(lambda x: pd.Series(eval(x))).stack().tolist()` – Colin
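A variant of the eval approach, swapping `eval` for the standard library's `ast.literal_eval`, parses each string into a real list while only accepting Python literals, so it cannot execute arbitrary code from the data. This is a sketch assuming every cell is a valid Python list literal; a malformed cell would raise `ValueError` or `SyntaxError`.

```python
import ast

import pandas as pd

# Assumed sample: string cells that are valid Python list literals
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Parse each string into a real list (literals only, no code execution)
lists = s.apply(ast.literal_eval)

# Now the list-based flattening from the comment above applies
flat = lists.apply(pd.Series).stack().tolist()
print(flat)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```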
