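Find the best combination of columns in a matrix

Suppose I have a large matrix (matrix_1) with 2000 columns. Each cell has a value of 0 or 1. I want to find the best combination of 10 columns. The best combination is the one with the largest number of rows that contain at least one non-zero value. In other words, it maximizes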

sum(apply(matrix_2, 1, function(x) any(x == 1)))

I cannot go through all possible combinations because that is too computationally expensive (there are 2.758988e+26 of them). Any suggestions?
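For reference, the objective and the size of the search space can be written down directly. This is only a sketch: `coverage` is a hypothetical helper name that does not appear in the original post, and it assumes the matrix contains nothing but 0s and 1s.

```r
# Hypothetical helper: number of rows with at least one 1
# among the chosen columns (assumes a 0/1 matrix).
coverage <- function(mat, cols) {
  sum(rowSums(mat[, cols, drop = FALSE]) > 0)
}

# Number of ways to choose 10 columns out of 2000.
n_combinations <- choose(2000, 10)
n_combinations
```

`rowSums(...) > 0` counts the same rows as `apply(..., 1, function(x) any(x == 1))` for a 0/1 matrix, and `drop = FALSE` keeps the subset a matrix even when a single column is selected.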

As an example, take this matrix: it has 4 rows, and I am only picking 2 columns at a time.

mat <- matrix(c(1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0), nrow = 4, byrow = FALSE)
mat
# combination of columns 2 and 3 is best: 3 rows with at least a single 1 value
sum(apply(mat[, c(2, 3)], 1, function(x) any(x == 1)))
# combination of columns 1 and 2 is worse: 2 rows with at least a single 1 value
sum(apply(mat[, c(1, 2)], 1, function(x) any(x == 1)))
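For a matrix this small, the claim can be checked by exhaustive search. The sketch below enumerates every pair of columns with `combn`; the variable names `pairs`, `covered`, and `best_pair` are illustrative only. This approach is exactly what does not scale to 10 columns out of 2000.

```r
mat <- matrix(c(1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
              nrow = 4, byrow = FALSE)

# All 6 pairs of columns, one pair per column of `pairs`.
pairs <- combn(ncol(mat), 2)

# Rows covered (at least one 1) by each pair.
covered <- apply(pairs, 2, function(cols) {
  sum(rowSums(mat[, cols, drop = FALSE]) > 0)
})

best_pair <- pairs[, which.max(covered)]
best_pair     # 2 3
max(covered)  # 3
```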

Source

2017-07-24 Pavel Shliaha

How many rows are in your matrix? – CPak

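100-200 rows. Depends on the application. –

Can't you order your columns by `colSums(col)` and choose the top 10? I ask because I'm not 100% sure what you want, and this helps me better understand what you are looking for. – CPak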
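A small sketch of why ranking columns by their sums alone may not be enough: columns with many 1s can cover the same rows. The 6 x 3 matrix and the `coverage` helper below are hypothetical, made up purely for illustration.

```r
# Hypothetical 6 x 3 matrix: columns 1 and 2 are identical.
m <- cbind(c(1, 1, 1, 0, 0, 0),
           c(1, 1, 1, 0, 0, 0),
           c(0, 0, 0, 1, 1, 0))

# Rows with at least one 1 among the chosen columns.
coverage <- function(mat, cols) sum(rowSums(mat[, cols, drop = FALSE]) > 0)

# Top 2 columns by column sum: columns 1 and 2.
top2 <- order(colSums(m), decreasing = TRUE)[1:2]
coverage(m, top2)     # 3 rows covered

# A pair with a smaller total sum covers more rows.
coverage(m, c(1, 3))  # 5 rows covered
```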

You could use a function like this...

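find10 <- function(mat, n = 10) {
  cols <- rep(FALSE, ncol(mat))  # columns already selected (excluded from later rounds)
  rows <- rep(TRUE, nrow(mat))   # rows not yet covered by a selected column
  for (i in seq_len(n)) {
    # count the 1s per column among uncovered rows; drop = FALSE keeps a
    # matrix even when only one row is left
    colsums <- colSums(mat[rows, , drop = FALSE])
    colsums[cols] <- -1  # exclude columns already selected
    maxcol <- which.max(colsums)
    cols[maxcol] <- TRUE
    rows <- rows & !as.logical(mat[, maxcol])  # drop rows this column covers
  }
  which(cols)  # indices of the selected columns, in ascending order
}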

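It looks for the column with the most non-zeros, removes the rows that column covers from the comparison, and then repeats. It returns the column numbers of the n best columns.

An example...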

m <- matrix(sample(0:1, 100, prob = c(0.8, 0.2), replace = TRUE), nrow = 10)

m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    0    0    0    0    0    1    1     0
 [2,]    1    0    0    0    0    0    0    0    1     1
 [3,]    0    0    0    0    1    0    0    0    0     0
 [4,]    0    0    0    1    0    1    0    1    0     1
 [5,]    0    0    0    0    1    0    0    1    0     0
 [6,]    0    0    0    1    0    1    1    0    0     0
 [7,]    0    0    1    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    1     0
 [9,]    0    0    0    0    0    0    0    1    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

find10(m, 5)
[1] 3 4 5 8 9

It also gives 2, 3 for the example you provided.
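One caveat worth checking: the function above is greedy, so it is a heuristic and is not guaranteed to find the best combination. For this kind of maximum-coverage problem, greedy selection is known to reach at least about 63% (1 - 1/e) of the optimal coverage, and in practice it is often much closer. The sketch below repeats the function so that it runs on its own, then compares it with exhaustive search on a hypothetical 6 x 3 matrix constructed so that the greedy choice is suboptimal. The `coverage` helper is also an illustrative name.

```r
# Greedy selection, as in find10 above (with drop = FALSE so a single
# remaining row is still treated as a matrix).
find10 <- function(mat, n = 10) {
  cols <- rep(FALSE, ncol(mat))
  rows <- rep(TRUE, nrow(mat))
  for (i in seq_len(n)) {
    colsums <- colSums(mat[rows, , drop = FALSE])
    colsums[cols] <- -1
    maxcol <- which.max(colsums)
    cols[maxcol] <- TRUE
    rows <- rows & !as.logical(mat[, maxcol])
  }
  which(cols)
}

# Rows with at least one 1 among the chosen columns.
coverage <- function(mat, cols) sum(rowSums(mat[, cols, drop = FALSE]) > 0)

# Hypothetical matrix: column 3 has the most 1s but overlaps
# both of the other columns.
m <- cbind(c(1, 1, 1, 0, 0, 0),
           c(0, 0, 0, 1, 1, 1),
           c(1, 1, 0, 1, 1, 0))

greedy <- find10(m, 2)
greedy                 # 1 3 (column 3 is picked first, then column 1)
coverage(m, greedy)    # 5

# Exhaustive search over all pairs, feasible only for tiny inputs.
pairs <- combn(ncol(m), 2)
best <- pairs[, which.max(apply(pairs, 2, function(p) coverage(m, p)))]
best                   # 1 2
coverage(m, best)      # 6
```

For 10 columns out of 2000, exhaustive search is out of reach, so a comparison like this is only useful as a sanity check on small sub-problems.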

Source

2017-07-24 15:51:16

Interesting solution. I'll have to think about it! –

Yes, you are right! Great answer! Thank you very much! –
