0
我是tm
程序包的新手,尝试应用TermDocumentMatrix
函数时遇到了障碍。在tm包中创建TermDocumentMatrix时出错
我用下面的代码,直到函数调用失败:
myCorpus <- Corpus(VectorSource(posts$message))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
myCorpus <- tm_map(myCorpus, removeURL)
myStopwords <- c(stopwords("english"))
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpusCopy <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
经检验它好像文档列表是它应该是什么:
> for(i in 1:5) {
+ cat(paste("[[", i, "]] ", sep =""))
+ writeLines(myCorpus[[i]])
+ }
[[1]] syntel recruitment drive week freshers newregistrationlink passout graduates
qualification graduatebebtechmcamemtech
syntel registration link
limited referrals available
comment emailids reference future job upd
[[2]] dont miss opportunity get placed one best mnc companies world ebay freshers week january
qualification graduate can apply
ebay registration link
comment emailids fast beacuse referrals left
[[3]] recent passouts eligible apply wipro go updated link lastday reference drive jan apply link fresher referral
apply link
go link apply asap
[[4]] robertbosch recruitment drive week freshers newregistrationlink passout graduates
qualification graduatebebtechmcamemtech
robertbosch registration link
limited referrals available
comment emailids reference future job upd
[[5]] mega job openings year
mphasis recruitment freshers january
qualification btech bsc bca graduates mca mba mtech post graduates
mphasis registration link
comment emailids comment box reference future job updates emailbox
然而,在创建之后一个完整的语料库副本,问题就出现了。
myCorpus <- tm_map(myCorpus, stemCompletion,
dictionary = myCorpusCopy, lazy = TRUE)
> tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
解决方法的任何想法?