Removing @ mentions with the 'tm' package

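I have a corpus of tweets, some of which contain @mentions that I want to remove. I am using the tm_map function from the tm package, but I am not getting the result I want. Here is an example: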

tweetscorrected[[1]]$content 
>@abc thank you for the treat 
tweetmentionsremoved<- tm_map(tweetscorrected, removeWords, "@\\w+") 
tweetmentionsremoved[[1]]$content 
>@abc thank you for the treat 
tweetmentionsremoved<- tm_map(tweetscorrected, removeWords, "y\\w+") 
>@abc thank for the treat 
tweetmentionsremoved<- tm_map(tweetscorrected, removeWords, "a\\w+") 
>@ thank you for the treat

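So what I see is that patterns starting with a letter are removed correctly, but the pattern starting with "@" changes nothing. What I want is to remove @abc, @xyz, basically any "word" that starts with @.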

Any help is highly appreciated.

Source

2016-03-03 Anurag H

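It doesn't look like ['tm'](http://www.inside-r.org/packages/cran/tm/docs/tm_map) supports regular expressions as the third argument. What is the final output you need? Why not use 'gsub' to remove all '\\B@\\w+'? Please add some reproducible code to the question body.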

If I use 'gsub' on the corpus, it works, but it messes up the structure, giving something like ''list(content = \"thank you for the treat https://./nkzy606vcC#clv #analytics https://./fsbwd03m8x\", meta = list(author = character(0), datetimestamp = list(sec = 51.526330947876, min = 8, hour = 9, mday = 3, mon = 2, year = 116, wday = 4, yday = 62, isdst = 0), description = character(0), heading = character(0), id = \"12\", language = \"en\", origin = character(0)))''. I used 'twitteR' to read the data and load it into a corpus.

Taking a cue from Wiktor Stribiżew, one way to solve this would be:

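# Start from a copy so the result has a slot for every document
removementions <- stripwhitespacetweets
for (j in seq_along(stripwhitespacetweets))
    {
    # Replace every @mention in document j with a single space
    removementions[[j]] <- gsub("@\\w+", " ", stripwhitespacetweets[[j]])
    }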

Source: https://rstudio-pubs-static.s3.amazonaws.com/31867_8236987cf0a8444e962ccd2aec46d9c3.html
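The comments above mention that calling 'gsub' directly on corpus documents can mangle their structure. A possible alternative, sketched here under assumptions, is to wrap the substitution in tm's content_transformer so that tm_map only rewrites each document's text and leaves the metadata alone. The helper name remove_mentions and the two sample tweets are made up for illustration; only the first tweet comes from the question. As far as I understand it, removeWords wraps its patterns in word boundaries, which would explain why "@\\w+" never matches while "a\\w+" does, but treat that as an assumption rather than a confirmed diagnosis.

```r
# Hypothetical helper: replace every @mention with a space,
# then collapse repeated whitespace and trim the ends.
remove_mentions <- function(x) {
  x <- gsub("@\\w+", " ", x)
  trimws(gsub("\\s+", " ", x))
}

# Illustrative input; the second tweet is invented for this sketch.
tweets <- c("@abc thank you for the treat", "great talk by @xyz today")
print(remove_mentions(tweets))

# Apply the same helper to a tm corpus, if tm is installed.
# content_transformer() makes the function operate on the document
# content while keeping each document's metadata intact.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(tweets))
  cleaned <- tm::tm_map(corpus, tm::content_transformer(remove_mentions))
  print(cleaned[[1]]$content)
}
```

Because the helper is a plain function on character vectors, it can be checked on ordinary strings before it is applied to the whole corpus.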

Source

2016-03-03 14:22:03
