n-gram error in R: invalid 'times' argument

I am trying to follow this example, but I get an error:

> library("RWeka") 
> library("tm") 
Loading required package: NLP 
> data("crude") 
> BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2)) 
> tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer)) 
Error in rep(seq_along(x), sapply(tflist, length)) : 
    invalid 'times' argument 
In addition: Warning message: 
In mclapply(unname(content(x)), termFreq, control) : 
    scheduled core 1 encountered error in user code, all values of the job will be affected

Any ideas?

2016-07-27 geotheory

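Just use a better, more modern package. I can suggest a few options:

Use text2vec instead of tm. See the vignettes for examples. (I am the author.)
quanteda is worth checking out.
If for some reason you prefer tm, try replacing the RWeka n-gram tokenizer with the tokenizers package.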
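
As a rough illustration of the last option, here is a minimal sketch that swaps the RWeka tokenizer for `tokenize_ngrams()` from the tokenizers package. The sample sentence and the variable names are made up for the example, and it assumes that tm hands each document to the custom tokenizer, as it does for the RWeka version in the question. Note that `tokenize_ngrams()` lowercases and strips punctuation by default, so the resulting terms may differ slightly from what RWeka would produce.

```r
library(tm)
library(tokenizers)

# Stand-alone check: bigrams only (n = 2, n_min = 2) from one string.
# tokenize_ngrams() returns a list with one element per input string.
bigrams <- tokenize_ngrams("the quick brown fox", n = 2L, n_min = 2L)[[1]]

# Drop-in replacement for the RWeka-based BigramTokenizer.
# as.character() extracts the text of the tm document; unlist() flattens
# the per-string list into the plain character vector tm expects.
BigramTokenizer <- function(x) {
  unlist(tokenize_ngrams(as.character(x), n = 2L, n_min = 2L),
         use.names = FALSE)
}

data("crude")
tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer))
```

Because this tokenizer does not go through Java, it should not depend on how tm parallelises the per-document work, which is where the warning in the question points (`mclapply`).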

2016-08-02 10:00:14

This is exactly what I was after, and the C++ speed is stunning! – geotheory
