Search for all words ending in "esque" in an R corpus

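I am using R's tm package to get word frequencies with a dictionary approach. I would like to find all words that end in "esque", whether they are spelled "abcd-esque", "abcdesque" or "abcd esque" (all of these spellings occur in my corpus). How do I write a regular expression for this? This is what I have so far. Any help/hints would be much appreciated.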

text <- Corpus(DirSource("txt/")) 
text <- tm_map(text,tolower) 
text <- tm_map(text,stripWhitespace) 
dtm.text <- DocumentTermMatrix(text) 
list<-inspect(
    DocumentTermMatrix(text,list(dictionary = c("rose", "green", "esque"))) 
)

Source

2014-12-19 torentino

'grep("esque$", x)'? – thelatemail 2014-12-19 03:41:27

inspect(dtm.text[, grepl("esque$", dtm.text$dimnames$Terms)])

As a side note, tolower will not work with the current version of tm. You should use content_transformer instead:

tm_map(text, content_transformer(tolower))
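
One caveat when filtering terms after the matrix is built: the tokenizer splits on whitespace, so "abcd esque" would end up as the two separate terms "abcd" and "esque". Below is a minimal sketch of one possible workaround, assuming it is acceptable to fold all three spellings into a single joined form before building the matrix. The sample documents and the join_esque helper are made up for illustration; VCorpus, VectorSource, content_transformer, removePunctuation, stripWhitespace and Terms are tm functions.

```r
library(tm)

# Hypothetical sample documents standing in for the files under "txt/"
docs <- c("A Kafka-esque plot with a kafkaesque twist.",
          "A noir esque film.")

corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))

# Hypothetical helper: rewrite "abcd-esque" and "abcd esque" as "abcdesque"
join_esque <- content_transformer(function(x) {
  gsub("\\b([[:alpha:]]+)[- ]esque\\b", "\\1esque", x, perl = TRUE)
})
corpus <- tm_map(corpus, join_esque)

corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

dtm <- DocumentTermMatrix(corpus)

# Keep only the terms that end in "esque"
esque_terms <- grep("esque$", Terms(dtm), value = TRUE)
inspect(dtm[, esque_terms])
```

The join has to run before removePunctuation and before the matrix is built, because the space-separated spelling cannot be recovered once the text has been tokenized.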

Source

2014-12-19 03:43:05 zero323

Thank you all. This solved the problem. – torentino 2014-12-19 03:58:04

words = c("rose", "green", "esque", "abcd-esque", "abcdesque", "abcd esque") 
grep("esque$", words)
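
Note that grep() returns the positions of the matching elements (add value = TRUE to get the strings themselves), and that this only works once the text has already been split into candidate words. If the three spellings have to be found in raw text instead, something along these lines might work in base R. The sample sentence and variable names are invented for illustration, and the pattern treats any word directly followed by a space and "esque" as one match, which is an assumption about the corpus.

```r
# Hypothetical raw text containing all three spellings
text <- "a kafkaesque mood, a Dali-esque scene, and a noir esque film"

# One or more letters, an optional hyphen or space, then "esque" at a word boundary
pattern <- "\\b[[:alpha:]]+[- ]?esque\\b"

# Extract every match from the string
hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]

# Fold the hyphenated and spaced spellings into the joined form, then lowercase
normalized <- tolower(gsub("[- ]esque$", "esque", hits))

# Frequency of each normalized word
counts <- table(normalized)
print(counts)
```

Because the separator is optional, ordinary words such as "picturesque" also match, and a phrase like "the esque" would be counted as one word, so the results may need a manual check.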

Source

2014-12-19 03:52:29 user51855
