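How do I find specific sentences in a text in R?

I have a dataset in which many people describe the services they offer. I want to retrieve some very specific sentences from each comment; the sentences I am looking for are stored in a .txt file. So far I have not managed to do this properly.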

score.sentiment <- function(sentences, pos.words, .progress='none') 
{ 
    require(plyr) 
    require(stringr) 
    scores <- laply(sentences, function(sentence, pos.words){ 
        sentence <- gsub('[[:punct:]]', "", sentence)
        sentence <- gsub('[[:cntrl:]]', "", sentence)
        sentence <- gsub('\\d+', "", sentence)
        sentence <- tolower(sentence)
        word.list <- str_split(sentence, '\\s+')
        words <- unlist(word.list)
        pos.matches <- match(words, pos.words)
        score <- pos.matches
        return(score)
    }, pos.words, .progress=.progress) 
    scores.df <- data.frame(text=sentences) 
    return(scores.df) 
} 
results <- score.sentiment(sentences = serv$service_description, pos.words)

The text file is called pos.words, and it contains sentences in Spanish like these:

tengo 25 años 
tengo 47 años 
tengo 34 años

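The other file contains a variable called service, which holds each person's comment describing their skills, their education, and so on. What I want to do is get their age from the text they wrote.

An example from the service file: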

"Me llamo Adrián y tengo 24 años. He estudiado Data Science y me gusta trabajar en el sector tecnológico"

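So from this example I would like to get the age. My idea so far is to create a pos.words.txt file containing all the possible Spanish sentences that state an age, and to match it against the comments file.

The main problem so far is that I cannot write a correct function to do this. I don't know how to make R recognize a whole sentence from pos.words.txt, because right now it treats each word as a separate string. On top of that, the piece of code I posted here doesn't work (thug life...)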
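To illustrate the whole-sentence matching described above, here is a minimal sketch in base R. It assumes the phrases are read with `readLines()`, which returns one string per line, so each phrase stays whole instead of being split into words. The vectors `pos_phrases` and `comments` and the helper `find_phrase` are hypothetical stand-ins, not part of the original code.

```r
# Hypothetical stand-ins for the two inputs described in the question.
# In practice pos_phrases could come from readLines("pos.words.txt"),
# which returns one string per line, so each phrase stays intact.
pos_phrases <- c("tengo 25 años", "tengo 47 años", "tengo 24 años")
comments <- c(
  "Me llamo Adrián y tengo 24 años. He estudiado Data Science.",
  "Soy fontanero con mucha experiencia."
)

# For one comment, return the first listed phrase it contains, or NA.
# fixed = TRUE treats each phrase as a literal string, not a regex.
find_phrase <- function(comment, phrases) {
  hit <- vapply(phrases,
                function(p) grepl(p, tolower(comment), fixed = TRUE),
                logical(1))
  if (any(hit)) phrases[which(hit)[1]] else NA_character_
}

matched <- vapply(comments, find_phrase, character(1),
                  phrases = pos_phrases, USE.NAMES = FALSE)
print(matched)
```

The drawback of this approach is that the phrase list has to enumerate every possible age, which is why a pattern that captures the digits tends to be more practical.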

I would really appreciate some help with this problem!

Thanks a lot for your help!

Adrián


2016-04-26 adrian1121

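It would be helpful if you could provide a reproducible example of what your input txt file and the txt file you are searching look like once they are imported into R. – AOGSTA

Read this guide to help with your reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example - It would also help if your code were formatted consistently. –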

This splits the input into sentences and grabs the last instance of "tengo <number> años" in each one:

inp <- "blah blah blah tengo 25 años more blah.
    Even more blha then tengo 47 años.
    Me llamo Adrián y tengo 34 años."
rl <- readLines(textConnection(inp)) # one element per line; might need to split at periods
# Use a capture group to get the digits flanked by "tengo" and "años";
# the greedy leading .* means the last such instance on each line wins
gsub("^.*tengo[ ](\\d+)[ ]años.*$", "\\1", rl)
[1] "25" "47" "34"
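One caveat with the `gsub()` approach is that it returns the input unchanged when the pattern does not match, so comments without an age come back as full text. The following is a minimal sketch of an alternative using base R's `regexec()` and `regmatches()`, which yields `NA` for such comments. The `comments` vector stands in for `serv$service_description`, and `extract_age` is a hypothetical helper name. Note that `regexec()` finds the first match in each string rather than the last.

```r
# Hypothetical comments standing in for serv$service_description.
comments <- c(
  "Me llamo Adrián y tengo 24 años. He estudiado Data Science.",
  "Tengo 31 años y soy electricista.",
  "Soy fontanero con mucha experiencia."
)

# Capture the digits between "tengo" and "años". ignore.case = TRUE
# lets a sentence-initial "Tengo" match as well.
extract_age <- function(text) {
  m <- regmatches(text,
                  regexec("tengo\\s+(\\d+)\\s+años", text, ignore.case = TRUE))
  # Each element is c(full match, captured digits), or character(0) if no match.
  vapply(m,
         function(x) if (length(x) == 2) as.integer(x[2]) else NA_integer_,
         integer(1))
}

ages <- extract_age(comments)
print(ages)
```

With these sample comments the third element is `NA`, which makes rows without a stated age easy to filter out afterwards.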


2016-04-27 04:38:05
