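Sentiment analysis in R with phrases in the dictionaries

I have performed sentiment analysis on a set of tweets, and I now want to know how to add phrases to the positive and negative dictionaries.

I have read in the files of phrases I want to test, but when I run the sentiment analysis it does not give me a result.

Reading through the sentiment algorithm, I can see that it matches single words against the dictionaries, but is there a way to scan for phrases as well as words?

Here is the code: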

score.sentiment = function(sentences, pos.words, neg.words, .progress='none') 
{ 
    require(plyr) 
    require(stringr) 
    # we got a vector of sentences. plyr will handle a list 
    # or a vector as an "l" for us 
    # we want a simple array ("a") of scores back, so we use 
    # "l" + "a" + "ply" = "laply": 
    scores = laply(sentences, function(sentence, pos.words, neg.words) { 
    # clean up sentences with R's regex-driven global substitute, gsub(): 
    sentence = gsub('[[:punct:]]', '', sentence) 
    sentence = gsub('[[:cntrl:]]', '', sentence) 
    sentence = gsub('\\d+', '', sentence)  
    # and convert to lower case:  
    sentence = tolower(sentence)  
    # split into words. str_split is in the stringr package  
    word.list = str_split(sentence, '\\s+')  
    # sometimes a list() is one level of hierarchy too much  
    words = unlist(word.list)  
    # compare our words to the dictionaries of positive & negative terms 
    # (use the function arguments, not the global variables pos and neg)
    pos.matches = match(words, pos.words) 
    neg.matches = match(words, neg.words) 
    # match() returns the position of the matched term or NA  
    # we just want a TRUE/FALSE:  
    pos.matches = !is.na(pos.matches) 
    neg.matches = !is.na(neg.matches) 
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum(): 
    score = sum(pos.matches) - sum(neg.matches)  
    return(score)  
    }, pos.words, neg.words, .progress=.progress) 
    scores.df = data.frame(score=scores, text=sentences) 
    return(scores.df) 
} 
analysis=score.sentiment(Tweets, pos, neg) 
table(analysis$score)

This is the result I get:

0 
20

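whereas I am after the standard table that this function normally provides, for example:

-2 -1 0 1 2 
1 2 3 4 5

Does anyone have any ideas on how to run this on phrases? Note: the Tweets file is a file of sentences.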

Source

2015-09-04 L. Natalka

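Not sure, but I think you might mean lapply instead of laply? – dd3

@dd3 It is laply from the plyr package, not lapply from base. – WhiteViking

I'm a beginner in R. What does ".progress" do here? It seems you are not using it in your function? – alwaysaskingquestions

The function score.sentiment seems to work. If I try a very simple setup,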

Tweets = c("this is good", "how bad it is") 
neg = c("bad") 
pos = c("good") 
analysis=score.sentiment(Tweets, pos, neg) 
table(analysis$score)

I get the expected result,

> table(analysis$score) 

-1 1 
1 1

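How are you feeding the 20 tweets to the method? From the result you posted, that 0 20, I would say the problem is that none of your 20 tweets contains a positive or negative word, although of course that is something you would have noticed. If you post more details on your list of tweets and on your positive and negative words, it will be easier to help you.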
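One way to check this is to look at the inputs directly before scoring. The following is only a minimal sketch in base R: the three tweets and the dictionary entries are made-up stand-ins (assumptions, not the asker's files), and the names clean, words, pos_hits, neg_hits and phrase_terms are hypothetical.

```r
# Hypothetical inputs standing in for the real data (assumptions).
Tweets <- c("This is GOOD!", "how bad it is", "nothing to see here")
pos <- c("good")
neg <- c("bad", "wrong choice")

# 1. The inputs should be character vectors; also check how many tweets were read.
stopifnot(is.character(Tweets), is.character(pos), is.character(neg))
length(Tweets)

# 2. Apply the same clean-up as score.sentiment and split into single words.
clean <- tolower(gsub("[[:punct:]]|[[:cntrl:]]|\\d+", "", Tweets))
words <- unique(unlist(strsplit(clean, "\\s+")))

# 3. Dictionary terms that actually occur as single words in the tweets.
pos_hits <- intersect(words, pos)
neg_hits <- intersect(words, neg)

# 4. Entries containing a space can never equal a single word,
#    so word-level matching silently ignores them.
phrase_terms <- grep(" ", c(pos, neg), value = TRUE)
```

If pos_hits and neg_hits both come back empty while phrase_terms is not, the all-zero table is most likely caused by the dictionaries containing only multi-word entries.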

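Anyhow, your function seems to work just fine.

Hope it helps.

EDIT after clarifications via comments:

Actually, to solve your problem you need to tokenize your sentences into n-grams, where n corresponds to the maximum number of words used by the entries in your lists of positive and negative n-grams. You can see how to do this e.g. in this SO question. For completeness, and since I have tested it myself, here is an example of what you could do. I simplify it to bigrams (n = 2) and use the following inputs: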

Tweets = c("rewarding hard work with raising taxes and VAT. #LabourManifesto", 
       "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto") 
pos = c("rewarding hard work") 
neg = c("wrong choice")

A bigram tokenizer can be created like this,

library(tm) 
library(RWeka) 
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min=2,max=2))

and tested,

> BigramTokenizer("rewarding hard work with raising taxes and VAT. #LabourManifesto") 
[1] "rewarding hard"  "hard work"   "work with"   
[4] "with raising"   "raising taxes"  "taxes and"   
[7] "and VAT"    "VAT #LabourManifesto"

Then in your method you simply replace this line,

word.list = str_split(sentence, '\\s+')

by

word.list = BigramTokenizer(sentence)

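Though of course it would be better if you renamed word.list to ngram.list or something similar.

The result is as expected: "wrong choice" matches as a bigram, so the second tweet scores -1, while the three-word entry "rewarding hard work" cannot match any bigram, so the first tweet scores 0.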

> table(analysis$score) 

-1 0 
1 1

Just decide on your n-gram size and add it to Weka_control, and you should be fine.
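If the dictionaries mix single words with phrases of different lengths, or installing RWeka (which needs Java) is not an option, the same idea can be sketched in base R by generating every n-gram from 1 up to the longest dictionary entry. This is only an illustration, not the method tested above: make_ngrams and score_phrases are hypothetical helper names, the dictionary entries are the ones mentioned in the comments, and they are assumed to be stored in lower case without punctuation.

```r
# Build all n-grams of sizes 1..max_n from a character vector of words.
make_ngrams <- function(words, max_n) {
  out <- character(0)
  for (n in seq_len(max_n)) {
    if (length(words) < n) break
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(starts,
                    function(i) paste(words[i:(i + n - 1)], collapse = " "),
                    character(1))
    out <- c(out, grams)
  }
  out
}

# Score one sentence: +1 per positive n-gram found, -1 per negative one.
score_phrases <- function(sentence, pos.terms, neg.terms) {
  # same clean-up steps as score.sentiment
  sentence <- gsub("[[:punct:]]", "", sentence)
  sentence <- gsub("[[:cntrl:]]", "", sentence)
  sentence <- gsub("\\d+", "", sentence)
  sentence <- tolower(trimws(sentence))
  words <- unlist(strsplit(sentence, "\\s+"))
  # the longest dictionary entry (in words) decides the largest n needed
  max_n <- max(lengths(strsplit(c(pos.terms, neg.terms), "\\s+")))
  grams <- make_ngrams(words, max_n)
  sum(grams %in% pos.terms) - sum(grams %in% neg.terms)
}

Tweets <- c("rewarding hard work with raising taxes and VAT. #LabourManifesto",
            "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto")
pos <- c("rewarding hard work")
neg <- c("raising taxes", "more cuts")
scores <- vapply(Tweets, score_phrases, numeric(1),
                 pos.terms = pos, neg.terms = neg, USE.NAMES = FALSE)
```

With these inputs the first tweet contains one positive and one negative phrase and the second contains one negative phrase. Note that overlapping entries (for instance a word that is also part of a listed phrase) would each be counted once, which may or may not be what you want.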

Hope it helps.

Source

2015-09-06 17:37:54 lrnzcig

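@Irnczig I managed to get score.sentiment working with my positive and negative dictionaries, but if I wanted to add phrases to the dictionaries, for example "is good" and "how bad" rather than just "bad" and "good", would you know how to make that work?

For example, take the following tweets: "rewarding hard work with raising taxes and VAT. #LabourManifesto" and "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto". In the dictionaries I want "rewarding hard work" as positive, and "raising taxes" and "more cuts" as negative. When I run the sentiment function, it splits these phrases up.

OK, understood. Let me have a look. – lrnzcig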
