2017-05-08

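Replacing values in a dfm sparse matrix

I am using the quanteda package to generate a sparse matrix of term frequency counts. I want to transform it so that each entry is just 1 or 0, indicating whether or not the term occurs in the document, but I don't know how to do this with a sparse matrix.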

install.packages("quanteda")  # the package name must be quoted
library(quanteda)

Example matrix:

trainingset <- as.dfm(matrix(c(1, 2, 0, 0, 0, 0, 
        0, 2, 0, 0, 1, 0, 
        0, 1, 0, 1, 0, 0, 
        0, 1, 1, 0, 0, 1, 
        0, 3, 1, 0, 0, 1), 
        ncol=6, nrow=5, byrow=TRUE, 
        dimnames = list(docs = paste("d", 1:5, sep = ""), 
            features = c("Beijing", "Chinese", "Japan", "Macao", 
               "Shanghai", "Tokyo")))) 

Answer


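If you look at str(trainingset), you can see the slots of the matrix. As with other sparse matrices, the x slot holds the stored (nonzero) values, so you can convert them to binary with: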

trainingset@x <- as.numeric(trainingset@x > 0)

Document-feature matrix of: 5 documents, 6 features (60% sparse). 
5 x 6 sparse Matrix of class "dfmSparse" 
    features 
docs Beijing Chinese Japan Macao Shanghai Tokyo 
    d1  1  1  0  0  0  0 
    d2  0  1  0  0  1  0 
    d3  0  1  0  1  0  0 
    d4  0  1  1  0  0  1 
    d5  0  1  1  0  0  1
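
To make the slot-replacement idea easier to try in isolation, here is a minimal sketch that applies the same step to a plain sparse matrix from the Matrix package instead of a dfm. The names `counts` and `binary` are illustrative and not part of the original post, and it assumes the matrix is stored in a class with an `x` slot (such as dgCMatrix), which is what `Matrix(..., sparse = TRUE)` produces for this non-symmetric input.

```r
library(Matrix)

# Same counts as the example dfm, stored as a general sparse matrix.
counts <- Matrix(c(1, 2, 0, 0, 0, 0,
                   0, 2, 0, 0, 1, 0,
                   0, 1, 0, 1, 0, 0,
                   0, 1, 1, 0, 0, 1,
                   0, 3, 1, 0, 0, 1),
                 nrow = 5, ncol = 6, byrow = TRUE, sparse = TRUE,
                 dimnames = list(paste0("d", 1:5),
                                 c("Beijing", "Chinese", "Japan", "Macao",
                                   "Shanghai", "Tokyo")))

# Work on a copy so the original counts are kept.
binary <- counts

# The x slot holds only the stored values; overwrite each with 1 if positive.
binary@x <- as.numeric(binary@x > 0)

print(binary)
```

Only the stored entries are touched, so the matrix stays sparse and its dimensions and dimnames are unchanged. If you would rather not modify slots directly, recent quanteda versions should also offer `dfm_weight(trainingset, scheme = "boolean")` for the same purpose; check the documentation of your installed version, since the argument name has changed across releases.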