2016-11-29

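Converting lists to a sparse matrix

I want to convert a matrix containing lists (with elements of variable length) into a sparse matrix. Here is a toy example: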

mOrig = matrix(
    c(rep(c('a_b', 'X'), 3), 
    rep(c('a_b_e', 'X'), 2), 
    rep(c('a_b_f', 'X'), 1), 
    rep(c('c_d', 'Y'), 3), 
    rep(c('c_d_e', 'Y'), 2), 
    rep(c('c_d_f', 'Y'), 1)), 
    ncol=2, byrow=TRUE 
) 
colnames(mOrig) = c('in', 'out') 
mOrig 

      in      out
 [1,] "a_b"   "X"
 [2,] "a_b"   "X"
 [3,] "a_b"   "X"
 [4,] "a_b_e" "X"
 [5,] "a_b_e" "X"
 [6,] "a_b_f" "X"
 [7,] "c_d"   "Y"
 [8,] "c_d"   "Y"
 [9,] "c_d"   "Y"
[10,] "c_d_e" "Y"
[11,] "c_d_e" "Y"
[12,] "c_d_f" "Y"

The output matrix should look like this:

      a b c d e f X Y
 [1,] 1 1 0 0 0 0 1 0
 [2,] 1 1 0 0 0 0 1 0
 [3,] 1 1 0 0 0 0 1 0
 [4,] 1 1 0 0 1 0 1 0
 [5,] 1 1 0 0 1 0 1 0
 [6,] 1 1 0 0 0 1 1 0
 [7,] 0 0 1 1 0 0 0 1
 [8,] 0 0 1 1 0 0 0 1
 [9,] 0 0 1 1 0 0 0 1
[10,] 0 0 1 1 1 0 0 1
[11,] 0 0 1 1 1 0 0 1
[12,] 0 0 1 1 0 1 0 1

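I got close to a solution, but it currently looks completely inefficient: unique(unlist(strsplit())), for loops, and so on. Does anyone know an efficient solution, for example one that makes use of sparseMatrix (or sparse.model.matrix) from the Matrix package?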

Thanks a lot!


Try 'library(qdapTools); cbind(mtabulate(strsplit(mOrig[, 1], "_")), X = rep(c(1, 0), c(6, 6)), Y = rep(c(0, 1), c(6, 6)))' – akrun

Answers


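One of the fastest ways to write to a sparse matrix seems to be the form myMatrix[matrix] <- value, where the index is a two-column matrix of (row, column) positions. That form is used below, together with lapply and strsplit.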
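In isolation, that indexing form can be sketched as follows. This is only a toy illustration: the 3 x 4 matrix `m` and the index matrix `idx` are made up for the example and are not part of the original question.

```r
library(Matrix)

# Hypothetical 3 x 4 all-zero sparse matrix, used only for illustration
m <- Matrix(0, nrow = 3, ncol = 4, sparse = TRUE)

# Two-column index matrix: each row is one (row, column) position
idx <- cbind(c(1, 2, 3), c(4, 1, 2))

# Assign 1 to all listed positions in a single call
m[idx] <- 1
```

Each row of `idx` names one cell, so a single assignment fills all of them without a loop over rows.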

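library(Matrix)

# Empty 12 x 8 sparse matrix with one column per token (a-f, X, Y)
mx <- Matrix(0, 12, 8, dimnames = list(NULL, c(letters[1:6], LETTERS[24:25])))

# Split each 'in' string into its tokens
mOrig_split <- strsplit(mOrig[, 'in'], '_')

# Long format: one (row index, token) pair per line, including the 'out' label
long_fm <- do.call(rbind, lapply(seq_along(mOrig_split), function(x) {
    cbind(x, c(mOrig_split[[x]], mOrig[x, 2]))}))

# Set every (row, column) position to 1 using a two-column index matrix
mx[cbind(as.numeric(long_fm[, 1]), match(long_fm[, 2], colnames(mx)))] <- 1

mx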

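It may be slightly faster to do the matching up front, which avoids converting the row indices from numeric to character and back: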

mx <- Matrix(0, 12, 8, dimnames = list(NULL, c(letters[1:6], LETTERS[24:25])))

# Convert tokens to column indices right away, so everything stays numeric
mOrig_split <- lapply(strsplit(mOrig[, 'in'], '_'), match, colnames(mx))
mOrig_out <- match(mOrig[, 2], colnames(mx))

# Long format: one (row index, column index) pair per line
long_fm <- do.call(rbind, lapply(seq_along(mOrig_split), function(x) {
    cbind(x, c(mOrig_split[[x]], mOrig_out[x]))}))

mx[long_fm] <- 1
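
Since the question asks about sparseMatrix specifically, here is a minimal sketch of building the same matrix directly from row and column indices, without creating an empty matrix first. It is an alternative to the code above, not a benchmarked improvement. The names `in_tokens`, `vocab`, `row_tokens` and `sm` are introduced here for illustration, and the column order (sorted 'in' tokens followed by sorted 'out' labels) is an assumption chosen to match the desired output.

```r
library(Matrix)

# Toy input from the question
mOrig <- matrix(
  c(rep(c("a_b", "X"), 3), rep(c("a_b_e", "X"), 2), rep(c("a_b_f", "X"), 1),
    rep(c("c_d", "Y"), 3), rep(c("c_d_e", "Y"), 2), rep(c("c_d_f", "Y"), 1)),
  ncol = 2, byrow = TRUE, dimnames = list(NULL, c("in", "out"))
)

# Split the 'in' strings into tokens
in_tokens <- strsplit(mOrig[, "in"], "_", fixed = TRUE)

# Column vocabulary: 'in' tokens first, then 'out' labels (assumed ordering)
vocab <- c(sort(unique(unlist(in_tokens))), sort(unique(mOrig[, "out"])))

# Tokens per row, with the 'out' label appended
row_tokens <- Map(c, in_tokens, mOrig[, "out"])

# Triplet form: row index i and column index j of every nonzero entry
i <- rep(seq_along(row_tokens), lengths(row_tokens))
j <- match(unlist(row_tokens), vocab)

sm <- sparseMatrix(i = i, j = j, x = 1,
                   dims = c(nrow(mOrig), length(vocab)),
                   dimnames = list(NULL, vocab))
```

Note that sparseMatrix adds up the values of duplicated (i, j) pairs by default, so a token repeated within one row would produce a count greater than 1. That does not happen in this toy data.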