2017-04-12 59 views
0

我想在大稀疏矩阵(目前12000条12000)运营的最大值。 我想要做的就是它的块设置为零,但保持该块内的最大值。 我已经有密集矩阵正在运行的解决方案:SciPy的/ numpy的:只保留一个稀疏矩阵块

import numpy as np 
from scipy.sparse import random 

np.set_printoptions(precision=2) 
#x = random(10,10,density=0.5) 
x = np.random.random((10,10)) 
x = x.T * x 
print(x) 

def keep_only_max(a,b,c,d): 
    sub = x[a:b,c:d] 
    z = np.max(sub) 
    sub[sub < z] = 0 


sizes = np.asarray([0,1,5,4]) 
sizes_sum = np.cumsum(sizes) 

for i in range(1,len(sizes)): 
    current_i_min = sizes_sum[i-1] 
    current_i_max = sizes_sum[i] 
    for j in range(1,len(sizes)): 
    if i >= j: 
     continue 
    current_j_min = sizes_sum[j-1] 
    current_j_max = sizes_sum[j] 

    keep_only_max(current_i_min, current_i_max, current_j_min, current_j_max) 
    keep_only_max(current_j_min, current_j_max, current_i_min, current_i_max) 

print(x) 

然而,这不适用于稀疏矩阵(重试取消注释在上面的线)。 任何想法我怎么能有效地实现这一点没有调用todense()?

回答

0

感谢hpaulj我设法用scipy.sparse.bmat实现解决方案:

from scipy.sparse import coo_matrix 
from scipy.sparse import csr_matrix 
from scipy.sparse import rand 
from scipy.sparse import bmat 
import numpy as np 


np.set_printoptions(precision=2) 

# my matrices are symmetric, so generate random symmetric matrix 
x = rand(10,10,density=0.4) 
x = x.T * x 
x = x 


def keep_only_max(a,b,c,d): 
    sub = x[a:b,c:d] 
    z = np.unravel_index(sub.argmax(),sub.shape) 
    i1 = z[0] 
    j1 = z[1] 
    new = csr_matrix(([sub[i1,j1]],([i1],[j1])),shape=(b-a,d-c)) 
    return new 

def keep_all(a,b,c,d): 
    return x[a:b,c:d].copy() 

# we want to create a chessboard pattern where the first central block is 1x1, the second 5x5 and the last 4x4 
sizes = np.asarray([0,1,5,4]) 
sizes_sum = np.cumsum(sizes) 

# acquire 2D array to store our chessboard blocks 
r = range(len(sizes)-1) 
blocks = [[0 for x in r] for y in r] 


for i in range(1,len(sizes)): 
    current_i_min = sizes_sum[i-1] 
    current_i_max = sizes_sum[i] 

    for j in range(i,len(sizes)): 

     current_j_min = sizes_sum[j-1] 
     current_j_max = sizes_sum[j] 

     if i == j: 
      # keep the blocks at the diagonal completely 
      sub = keep_all(current_i_min, current_i_max, current_j_min, current_j_max) 
      blocks[i-1][j-1] = sub 
     else: 
      # the blocks not on the digonal only keep their maximum value 
      current_j_min = sizes_sum[j-1] 
      current_j_max = sizes_sum[j] 

      # we can leverage the matrix symmetry and only calculate one new matrix. 
      m1 = keep_only_max(current_i_min, current_i_max, current_j_min, current_j_max) 
      m2 = m1.T 

      blocks[i-1][j-1] = m1 
      blocks[j-1][i-1] = m2 


z = bmat(blocks) 
print(z.todense()) 
0
def keep_only_max(a,b,c,d): 
    sub = x[a:b,c:d] 
    z = np.max(sub) 
    sub[sub < z] = 0 

对于稀疏x,该sub切片工程csr格式。这不会是为等效密片一样快,但它会创建的x那部分的副本。

我不得不检查稀疏max功能。但我可以想象将sub转换为coo格式,使用np.argmax上的.data属性,以及相应的rowcol值,构造一个具有相同形状但只有一个非零值的新矩阵。

如果你的块以规则,不重叠的方式覆盖x,我建议用sparse.bmat构建一个新的矩阵。基本上收集所有组件的coo属性,将它们加入具有适当偏移量的一组阵列中,并制作新的矩阵。

如果块散在或重叠,您可能需要生成,并通过一个插入它们放回x之一。 csr格式应该适用于此,但会发出稀疏效率警告。 lil应该更快地改变值。我认为它会接受块。

我可以想象用稀疏矩阵做这件事,但它需要时间来设置测试用例和调试过程。

+0

块确实是不重叠的,因为我或多或少产生棋盘图案,其中每个细胞只保留它的最大价值。 – Zahlii