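Error in a column of an R data frame
I am trying to write a function in R, but I have a problem with the positive.ponderate.polarity column of the subDF data frame: its values are not correct. I think the problem comes from these lines: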
EDIT2:
  if (any(unlist(strsplit(as.character(context), " ")) %in% booster_words)) {
    subDF$positive.ponderate.polarity <- subDF$positive.polarity * 3
  } else {
    subDF$positive.ponderate.polarity <- subDF$positive.polarity / 3
  }
  # calculate the total polarity of the sentence and store in the vector
  polarity[i] <- sum(subDF$positive.ponderate.polarity) - sum(subDF$negative.polarity)
}
Can you help me?
Thanks.
### function to calculate the polarity of sentences
calcPolarity <- function(sentiment_DF, sentences) {
  booster_words <- c("more", "enough", "a lot", "as")
  # split each sentence into words using a regular expression
  # (returns a list with the words of each sentence)
  sentencesSplitInWords <- regmatches(sentences, gregexpr("[[:word:]]+", sentences, perl = TRUE))
  # pre-allocate the polarity result vector with size = number of sentences
  polarity <- rep.int(0, length(sentencesSplitInWords))
  for (i in seq_along(polarity)) {
    # get the words of the i-th sentence
    wordsOfASentence <- sentencesSplitInWords[[i]]
    # get the rows of sentiment_DF corresponding to the words in the sentence using match
    # N.B. if a word occurs twice, there will be two equal rows
    # (but I think that is correct, since this way its polarity is counted twice)
    subDF <- sentiment_DF[match(wordsOfASentence, sentiment_DF$word, nomatch = 0), ]
    # extract a context of 3 words before and 3 words after the word in the data frame
    context <- stringr::str_extract(sentences, "([^\\s]+\\s){3}subDF$word(\\s[^\\s]+){3}")
    # check whether any word of the context is in the booster_words list
    if (any(unlist(strsplit(as.character(context), " ")) %in% booster_words)) {
      subDF$positive.ponderate.polarity <- 1.12
    } else {
      subDF$positive.ponderate.polarity <- 14
    }
    # calculate the total polarity of the sentence and store it in the vector
    polarity[i] <- sum(subDF$positive.ponderate.polarity) - sum(subDF$negative.polarity)
  }
  return(polarity)
}
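One likely cause of the wrong values is the pattern passed to str_extract: inside a quoted string, subDF$word is just literal text (and $ is an end-of-line anchor), so the pattern never contains the actual word and most likely never matches. Below is a minimal sketch of building the pattern from the word's value instead, in base R. The helper name extract_context and the {0,n} bounds are assumptions for illustration, not part of the original function, and the word is assumed to contain no regex metacharacters.

```r
# Hypothetical helper: returns `word` together with up to `n` words on each
# side of it in `sentence`, or NA when the word does not occur.
extract_context <- function(sentence, word, n = 3) {
  # paste0() inserts the value of `word` into the pattern; writing
  # "subDF$word" inside the quotes would only insert those literal characters.
  pattern <- paste0("(\\S+\\s+){0,", n, "}\\b", word, "\\b\\S*(\\s+\\S+){0,", n, "}")
  m <- regexpr(pattern, sentence, perl = TRUE)
  if (m == -1) NA_character_ else regmatches(sentence, m)
}

booster_words <- c("more", "enough", "a lot", "as")
context <- extract_context("The course was more interesting than the other one",
                           "interesting")
# Split the context into single words and look for a booster word among them.
has_booster <- any(strsplit(context, "\\s+")[[1]] %in% booster_words)
```

Because the context is split into single words, a two-word entry such as "a lot" can never match in this check; it would need a separate test on the unsplit context.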
Usage:
sentiment_DF <- data.frame(word = c('interesting', 'boring', 'pretty'),
                           positive.polarity = c(1, 0, 1),
                           negative.polarity = c(0, 1, 0))
sentences <- c("The course was interesting, but the professor was so boring.",
               "stackoverflow is an interesting place with interesting people!")
result <- calcPolarity(sentiment_DF, sentences)
EDIT:
I expect a resulting data frame like this:
word         positive.polarity  negative.polarity  positive.ponderate.polarity
interesting  1                  0                  1.12
boring       0                  1                  14
because I expect (1.12 + 14) - 1 = 15.12 - 1 = 14.12
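The expected total only comes out if each row gets its own weight. A single if/else assigns one scalar, which R recycles to every row of subDF. The sketch below contrasts the two on toy rows taken from the usage example; the has_booster vector is a hypothetical per-word flag assumed for illustration, not something the original function computes.

```r
# Toy stand-in for subDF: the rows matched in the first example sentence.
subDF <- data.frame(word = c("interesting", "boring"),
                    positive.polarity = c(1, 0),
                    negative.polarity = c(0, 1))

# Scalar assignment, as in the else branch: 14 is recycled to both rows,
# so the total is 28 - 1 = 27.
subDF$positive.ponderate.polarity <- 14
scalar_total <- sum(subDF$positive.ponderate.polarity) - sum(subDF$negative.polarity)

# Per-row assignment with ifelse(): one weight per word.
# has_booster is an assumed flag (TRUE when a booster word is near that word).
has_booster <- c(TRUE, FALSE)
subDF$positive.ponderate.polarity <- ifelse(has_booster, 1.12, 14)
per_row_total <- sum(subDF$positive.ponderate.polarity) - sum(subDF$negative.polarity)
```

With these assumed flags the per-row version gives 1.12 for interesting and 14 for boring, which matches the table above.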
OK, thank you Jeremy, I will test it tomorrow :) – Poisson
I wish you success with it. :-) – Jeremy
Thanks a lot, Jeremy, and sorry for keeping you busy with my problem. Cheers – Poisson