Pandas: How to count individual words in each row of a DataFrame


Suppose I have a pandas DataFrame that looks something like this:

sentences 
['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'] 
['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences']

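I am trying to count how many times each word occurs in each row, and to store the counts in a way that lets me use them easily whenever I need them. So far I have not managed it, and I would appreciate some help.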

Thanks!


2017-06-19 emreorta

Have you tried df.column_name.value_counts() (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)? – Tbaki
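Because each cell of the sentences column holds a whole list, calling value_counts() on the column itself would not count individual words. A minimal sketch of the value_counts idea, assuming pandas 0.25 or later (where Series.explode was added): explode the lists into one word per row, then count within each original row.

```python
import pandas as pd

# Sample data from the question: each cell holds a list of words
df = pd.DataFrame({'sentences': [
    ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
    ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence',
     'looks', 'like', 'other', 'sentences'],
]})

# explode() puts one word per row but keeps the original row index,
# so grouping on that index and counting gives per-row word counts;
# unstack() turns the words into columns, filling absent words with 0
counts = (df['sentences']
          .explode()
          .groupby(level=0)
          .value_counts()
          .unstack(fill_value=0))
print(counts)
```

This produces the same wide table as the Counter approach, with zeros already in place of missing words.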

You can use Counter with the DataFrame constructor, but words missing from a row come out as NaN:

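import pandas as pd
from collections import Counter

# sample data from the question: each cell in 'sentences' holds a list of words
df = pd.DataFrame({'sentences': [
    ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
    ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences'],
]})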

print(type(df.loc[0, 'sentences']))
<class 'list'> 

df1 = pd.DataFrame([Counter(x) for x in df['sentences']]) 
print(df1)
    a and another as is like looks one other sentence sentences \ 
0 1.0 1  NaN 1.0 1 NaN NaN 1.0 NaN   1  NaN 
1 NaN 1  1.0 NaN 1 1.0 1.0 NaN 1.0   2  1.0 

    this well 
0  2 1.0 
1  2 NaN

If you need to replace the NaNs with 0, add DataFrame.fillna (and astype(int) to get integer columns back):

df1 = pd.DataFrame([Counter(x) for x in df['sentences']]).fillna(0).astype(int) 
print(df1)
    a and another as is like looks one other sentence sentences \ 
0 1 1  0 1 1  0  0 1  0   1   0 
1 0 1  1 0 1  1  1 0  1   2   1 

    this well 
0  2  1 
1  2  0


2017-06-19 08:36:18 jezrael

Thanks for the quick response! Can this be done without the columns being rearranged alphabetically? – emreorta

Unfortunately not, because the DataFrame constructor sorts them :( – jezrael

Oh well, it seems we can't have everything :D Thanks again! – emreorta
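If the alphabetical column order discussed above is a problem, one possible workaround is to build the Counter table as before and then reindex its columns in order of each word's first appearance. This is a sketch assuming Python 3.7+ (where dicts preserve insertion order); depending on the pandas version, the constructor may or may not sort the columns, and reindexing makes the order explicit either way.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'sentences': [
    ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
    ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence',
     'looks', 'like', 'other', 'sentences'],
]})

# one Counter (word -> count) per row
counts = [Counter(words) for words in df['sentences']]

# dict.fromkeys keeps only the first occurrence of each word, in order
order = list(dict.fromkeys(w for words in df['sentences'] for w in words))

# reindex fixes the column order, whatever order the constructor chose;
# fillna/astype turn the missing-word NaNs into integer zeros
df1 = pd.DataFrame(counts).reindex(columns=order).fillna(0).astype(int)
print(df1)
```

Here the columns start with the words of the first row (this, is, a, sentence, ...) followed by words that first appear in later rows.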
