2017-02-12 100 views
1

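Wordcloud from a CSV file in Python

I have a CSV file (loaded as a dataframe) with 2 columns. Column 1 contains a sentence, for example "i love banana",

and column 2 contains a class. I have 5 classes.

I need one wordcloud per class. Is it possible to do this? I tried this code, but it does not work. The dataset: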

text                     classe
i love banana            positive
i hate banana            negative
maybe i love maybe no    neutral
bit yes bit no           not_sure
wooooooooooow            like_it
+1

@MaxU yes, I edited the description –

+0

Yes, I noticed. I am currently learning how 'wordcloud' works ;) – MaxU

Answers

2

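import matplotlib.pyplot as plt
from wordcloud import WordCloud

# `df` and `stopwords` are assumed to be defined earlier
cloud = WordCloud(background_color="white", max_words=20, stopwords=stopwords)
# Count how often each phrase occurs and pass the (phrase, count) pairs to the cloud
tuples = tuple([tuple(x) for x in df.Phrase.value_counts().reset_index().values])
a = cloud.generate_from_frequencies(tuples)

plt.imshow(a)
plt.axis("off")
plt.title("a")
plt.show()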
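
The snippet above feeds whole-phrase counts to generate_from_frequencies as tuples. As a hedged alternative, here is a minimal sketch that counts individual words and passes them as a dict, which is what the wordcloud documentation describes for that method. The helper names word_frequencies and cloud_from_frequencies are hypothetical, the tokenizing regex is an assumption, and no stopwords are filtered in this sketch.

```python
import re
from collections import Counter


def word_frequencies(sentences):
    """Count lowercase word occurrences across an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: a "word" is a run of letters or apostrophes.
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return dict(counts)


def cloud_from_frequencies(frequencies):
    """Build a word cloud from a {word: count} dict (hypothetical helper)."""
    # Imported lazily so the counting helper works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white", max_words=20)
    return cloud.generate_from_frequencies(frequencies)


freqs = word_frequencies(["i love banana", "love apple", "love, love, love"])
print(freqs)
```

The result of cloud_from_frequencies(freqs) can be passed to plt.imshow in the same way as in the snippet above.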

Here is an example for a single class, positive.

Assume we have the following DF:

In [79]: df
Out[79]:
                    text    classe
0          i love banana  positive
1             love apple  positive
2       love, love, love  positive
3          i hate banana  negative
4               it sucks  negative
5  maybe i love maybe no   neutral
6         bit yes bit no  not_sure
7          wooooooooooow   like_it

Solution:

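In [80]: %paste
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Requires the NLTK stopword corpus: nltk.download('stopwords')
from nltk.corpus import stopwords

cloud = WordCloud(background_color="white", max_words=20, stopwords=stopwords.words('english'))

# Join all sentences of the 'positive' class into one text and build the cloud from it
positive_cloud = cloud.generate(df.loc[df.classe == 'positive', 'text'].str.cat(sep='\n'))
plt.figure()
plt.imshow(positive_cloud)
plt.axis("off")
plt.show()
## -- End pasted text --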

Result:

[Image: word cloud generated for the 'positive' class]

A few notes:

The text generated for a single class:

In [81]: df.loc[df.classe == 'positive', 'text'].str.cat(sep='\n') 
Out[81]: 'i love banana\nlove apple\nlove, love, love'
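
The same idea can be extended to all classes, which is what the question asks for. The following is only a sketch under stated assumptions: the helper names text_per_class and plot_class_clouds are hypothetical, the column names text and classe are taken from the example DF, and wordcloud's built-in default stopword list is used instead of the NLTK one.

```python
import pandas as pd


def text_per_class(df, text_col="text", class_col="classe"):
    """Join the sentences of each class into one newline-separated string."""
    return df.groupby(class_col)[text_col].apply("\n".join).to_dict()


def plot_class_clouds(df, text_col="text", class_col="classe"):
    """Draw one word cloud figure per class (hypothetical helper)."""
    # Imported lazily so text_per_class works without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    for label, text in text_per_class(df, text_col, class_col).items():
        # generate() can raise ValueError if no words remain after stopword removal.
        cloud = WordCloud(background_color="white", max_words=20).generate(text)
        plt.figure()
        plt.imshow(cloud)
        plt.axis("off")
        plt.title(label)
    plt.show()


# Sample data copied from the example DF above.
df = pd.DataFrame({
    "text": ["i love banana", "love apple", "love, love, love", "i hate banana",
             "it sucks", "maybe i love maybe no", "bit yes bit no", "wooooooooooow"],
    "classe": ["positive", "positive", "positive", "negative",
               "negative", "neutral", "not_sure", "like_it"],
})
texts = text_per_class(df)
```

Calling plot_class_clouds(df) should then open one figure per class, each titled with its class label.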