NLTK sentiment VADER: sorting

I just ran VADER sentiment analysis on my dataset:

from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize
sid = SentimentIntensityAnalyzer()
for sentence in filtered_lines2:
    print(sentence)
    ss = sid.polarity_scores(sentence)
    for k in sorted(ss):
        print('{0}: {1}, '.format(k, ss[k]),)
        print()

Here is a sample of my results:

Are these guests on Samsung and Google event mostly Chinese Wow Theyre boring

Google Samsung

('compound: 0.3612, ',)
()
('neg: 0.12, ',)
()
('neu: 0.681, ',)
()
('pos: 0.199, ',)
()

Adobe lose 135bn to piracy Report
('compound: -0.4019, ',)
()
('neg: 0.31, ',)
()
('neu: 0.69, ',)
()
('pos: 0.0, ',)
()

Samsung Galaxy Nexus announced
('compound: 0.0, ',)
()
('neg: 0.0, ',)
()
('neu: 1.0, ',)
()
('pos: 0.0, ',)
()

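I want to know how many times "compound" is equal to, greater than, or less than zero.

I know this is probably simple, but I am very new to Python and to coding in general. I have tried many different ways to build what I need, but I could not find a solution.

(Please edit my question if the "sample of my results" is formatted incorrectly; I don't know the proper way to write it.)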


2016-09-29 Luca Perinati

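It looks like you are writing Python 3 code but running it with Python 2 (this is unrelated to your question, but it may get you into trouble eventually). – lenz

Thanks for the advice! –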
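
To illustrate lenz's point: under Python 2, `print(x,)` treats the parenthesized argument as a one-element tuple, which would explain why the sample output shows `('compound: 0.3612, ',)` and `()`. One way to sidestep the difference is to build the whole line as a string and print it with a single argument, which behaves the same in both versions. This is only a sketch: the helper name `format_scores` is made up for illustration, and the scores are copied from the first sample result in the question.

```python
def format_scores(scores):
    """Return the scores as one 'key: value' line, with keys in sorted order."""
    return ', '.join('{0}: {1}'.format(k, scores[k]) for k in sorted(scores))


# Scores copied from the question's sample output (no NLTK needed here).
sample = {'neg': 0.12, 'neu': 0.681, 'pos': 0.199, 'compound': 0.3612}

# A single string argument prints identically under Python 2 and Python 3.
print(format_scores(sample))
```

With a real analyzer, the dictionary returned by `sid.polarity_scores(sentence)` could be passed in place of `sample`.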

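This is by far not the most Pythonic way to do it, but I think it will be the easiest to understand if you don't have much Python experience. Essentially, you create a dictionary with values of 0 and increment the matching value in each case.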

from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize
sid = SentimentIntensityAnalyzer()
# Counters for compound scores greater than, less than, and equal to zero
res = {"greater": 0, "less": 0, "equal": 0}
for sentence in filtered_lines2:
    ss = sid.polarity_scores(sentence)
    if ss["compound"] == 0.0:
        res["equal"] += 1
    elif ss["compound"] > 0.0:
        res["greater"] += 1
    else:
        res["less"] += 1
print(res)


2016-09-29 12:00:44

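I think this is quite Pythonic. After all, Python is all about being simple and easy to understand! A simple problem does not need a complicated solution. – lenz

@lenz I completely agree. But this being Python, the for loop could be done in 3 lines of code (at least at first glance). –

Thanks, I think this is the simplest approach, and it works perfectly! –

You can use a simple counter for each class: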

positive, negative, neutral = 0, 0, 0

Then, inside the sentence loop, test the compound value and increment the corresponding counter:

...
    if ss['compound'] > 0:
        positive += 1
    elif ss['compound'] == 0:
        neutral += 1
    elif ...

and so on.
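
The fragment above leaves the last branch open. A minimal complete sketch might look like the following, assuming the remaining case is simply "less than zero". The function name `count_by_sign` is hypothetical, and the compound values are copied from the question's sample output so the logic can be checked without running VADER.

```python
def count_by_sign(compounds):
    """Count how many compound scores are above, below, and exactly zero."""
    positive, negative, neutral = 0, 0, 0
    for compound in compounds:
        if compound > 0:
            positive += 1
        elif compound == 0:
            neutral += 1
        else:
            # Assumption: anything that is neither > 0 nor == 0 is negative.
            negative += 1
    return positive, negative, neutral


# Compound values taken from the three sample results in the question.
sample_compounds = [0.3612, -0.4019, 0.0]
positive, negative, neutral = count_by_sign(sample_compounds)
print(positive, negative, neutral)
```

In the real loop, each value would come from `sid.polarity_scores(sentence)['compound']` rather than from a hard-coded list.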


2016-09-29 11:58:38 lenz

I would probably write a function that returns the type of inequality that a document's score represents:

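def inequality_type(val):
    # Classify a compound score relative to zero
    if val == 0.0:
        return "equal"
    elif val > 0.0:
        return "greater"
    return "less"

Then use it on the compound score of every sentence to increment the count for the corresponding inequality type.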

from collections import defaultdict

def count_sentiments(sentences):
    # `sid` is the SentimentIntensityAnalyzer instance created in the question
    # Create a dictionary with values defaulted to 0
    counts = defaultdict(int)

    # Create a polarity score for each sentence
    for score in map(sid.polarity_scores, sentences):
        # Increment the dictionary entry for that inequality type
        counts[inequality_type(score["compound"])] += 1

    return counts

You can then call it on your filtered lines.

However, this can be shortened by just using collections.Counter:

from collections import Counter

def count_sentiments(sentences):
    # Count the inequality type for each score in the sentences' polarity scores
    return Counter(inequality_type(score["compound"])
                   for score in map(sid.polarity_scores, sentences))
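
To try the Counter version without downloading the VADER lexicon, the analyzer can be swapped for a stand-in. The sketch below is illustrative only: `StubAnalyzer` is a hypothetical class that mimics just the `polarity_scores` method, and `count_sentiments` takes the analyzer as an extra parameter, which the answer's version does not. The sentences and compound scores are copied from the question's sample output.

```python
from collections import Counter


def inequality_type(val):
    """Classify a compound score relative to zero."""
    if val == 0.0:
        return "equal"
    elif val > 0.0:
        return "greater"
    return "less"


class StubAnalyzer(object):
    """Hypothetical stand-in for SentimentIntensityAnalyzer (illustration only)."""

    def __init__(self, compounds):
        self._compounds = compounds

    def polarity_scores(self, sentence):
        # Return only the key that the counting code reads.
        return {"compound": self._compounds[sentence]}


def count_sentiments(sentences, analyzer):
    """Count the inequality type of each sentence's compound score."""
    return Counter(inequality_type(analyzer.polarity_scores(s)["compound"])
                   for s in sentences)


# Sentences and compound scores copied from the question's sample output.
known_scores = {
    "Adobe lose 135bn to piracy Report": -0.4019,
    "Samsung Galaxy Nexus announced": 0.0,
}
counts = count_sentiments(list(known_scores), StubAnalyzer(known_scores))
print(counts["less"], counts["equal"], counts["greater"])
```

A `Counter` returns 0 for keys it has never seen, so `counts["greater"]` is safe to read even though no sentence in this small sample scored above zero.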


2016-09-29 12:13:55 erip

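'collections.Counter' makes the second step trivial. – alexis

@alexis Yes, very good point! Will add that. – erip

@erip Thank you very much. It works great! But I think Alex's solution is easier to understand and use for someone like me who is taking their first steps in coding. –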
