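I want to preprocess a corpus in R, and I need to remove the words that start with $. The code below removes the $ itself but not the words that begin with it, and I am puzzled. How can I remove words starting with $ from a corpus in R?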
inspect(data.corpus1[1:2])
# <<SimpleCorpus>>
# Metadata: corpus specific: 1, document level (indexed): 0
# Content: documents: 2
#
# [1] $rprx loading mid .60's, think potential. 12m vol fri already 11m today
# [2] members report success see track record $itek $rprx $nete $cnet $zn $cwbr $inpx
removePunctWords <- function(x) {
  gsub(pattern = "\\$", "", x)
}
data.corpus1 <-
  tm_map(data.corpus1,
         content_transformer(removePunctWords))
inspect(data.corpus1[1:2])
# <<SimpleCorpus>>
# Metadata: corpus specific: 1, document level (indexed): 0
# Content: documents: 2
#
# [1] rprx loading mid .60's, think potential. 12m vol fri already 11m today
# [2] members report success see track record itek rprx nete cnet zn cwbr inpx
I'm not the best at regular expressions, but maybe add a "."? For example: 'gsub(pattern = "\\$.*", "", x)'? – shea
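@shea That would kill everything after the first $. You only need to eliminate the $ and the word characters that immediately follow it. – G5W
@G5W Thanks for explaining. I didn't know the "*" would be that greedy. – shea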
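Following G5W's comment, here is a minimal sketch of one way this could be done, using base R only so it runs without the tm package. The helper name removeDollarWords is hypothetical (not from the question), and the sample strings are copied from the inspect() output above. The pattern "\\$\\w+" matches a literal $ plus the word characters directly after it; the greedy "\\$.*" from the first comment is included only for comparison.

```r
# Sample documents copied from the question's inspect() output.
docs <- c(
  "$rprx loading mid .60's, think potential. 12m vol fri already 11m today",
  "members report success see track record $itek $rprx $nete $cnet $zn $cwbr $inpx"
)

# Hypothetical helper: remove a literal "$" together with the word
# characters that immediately follow it, and nothing else.
removeDollarWords <- function(x) {
  gsub(pattern = "\\$\\w+", replacement = "", x)
}

cleaned <- removeDollarWords(docs)

# Optional tidy-up: collapse the whitespace left behind and trim the ends.
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)

# For comparison, the greedy pattern deletes everything from the first "$"
# to the end of the string, which is why it wipes out the first document.
greedy <- gsub(pattern = "\\$.*", replacement = "", docs)
print(greedy)

# With tm loaded, the helper should plug into the question's own call
# (not executed here, since it needs the corpus object):
# data.corpus1 <- tm_map(data.corpus1, content_transformer(removeDollarWords))
```

The whitespace clean-up is a design choice rather than a requirement: tm's stripWhitespace transformation could be applied to the corpus instead.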