2017-06-19 113 views
-1

Pandas: how to count the individual words in each row of a DataFrame

Suppose I have a pandas DataFrame that looks something like this:

sentences 
['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'] 
['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences'] 

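I am trying to count how many times each word occurs in every row, and to store the counts in a way that I can easily use later when needed. So far I have failed, and I would appreciate some help.

Thanks!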

+0

Have you tried using df.column_name[.value_counts()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)? – Tbaki

Answers

0

You can use Counter with the DataFrame constructor, but you get NaNs for words that are missing from a row:

from collections import Counter 
import pandas as pd 

print (type(df.loc[0, 'sentences'])) 
<class 'list'> 

df1 = pd.DataFrame([Counter(x) for x in df['sentences']]) 
print (df1) 
     a  and  another   as  is  like  looks  one  other  sentence  sentences  \
0  1.0    1      NaN  1.0   1   NaN    NaN  1.0    NaN         1        NaN
1  NaN    1      1.0  NaN   1   1.0    1.0  NaN    1.0         2        1.0

   this  well
0     2   1.0
1     2   NaN

If you need to replace the NaNs with 0, add DataFrame.fillna and cast back to int:

df1 = pd.DataFrame([Counter(x) for x in df['sentences']]).fillna(0).astype(int) 
print (df1) 
   a  and  another  as  is  like  looks  one  other  sentence  sentences  \
0  1    1        0   1   1     0      0    1      0         1          0
1  0    1        1   0   1     1      1    0      1         2          1

   this  well
0     2     1
1     2     0
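
If the original word order matters, one possible workaround (a sketch, not part of the answer above) is to collect the words in order of first appearance and then reindex the columns explicitly. The names `ordered_words` and `df_ordered` are illustrative assumptions, and the sample frame is rebuilt from the data in the question. Because the order is forced with `reindex`, the result does not rely on how the DataFrame constructor arranges columns, which may differ between pandas versions.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'sentences': [
        ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
        ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence',
         'looks', 'like', 'other', 'sentences'],
    ]
})

# Distinct words in order of first appearance across all rows.
# dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+).
ordered_words = list(
    dict.fromkeys(word for words in df['sentences'] for word in words)
)

# Build the count table, then set the column order explicitly.
df_ordered = (
    pd.DataFrame([Counter(words) for words in df['sentences']])
    .reindex(columns=ordered_words)
    .fillna(0)
    .astype(int)
)
print(df_ordered)
```

Each row of `df_ordered` can then be looked up by word, for example `df_ordered.loc[0, 'this']`.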
+0

Thanks for the quick response! Is it possible to do this without the columns being rearranged alphabetically? – emreorta

+0

Unfortunately not, because the 'DataFrame' constructor sorts them :( – jezrael

+0

Well, it seems we can't have everything :D Thanks again! – emreorta