How to stack several sparse matrices (feature matrices)?

I have 3 sparse matrices:

In [39]: 

mat1 


Out[39]: 
(1, 878049) 
<1x878049 sparse matrix of type '<type 'numpy.int64'>' 
    with 878048 stored elements in Compressed Sparse Row format> 

In [37]: 

mat2 


Out[37]: 
(1, 878049) 
<1x878049 sparse matrix of type '<type 'numpy.int64'>' 
    with 744315 stored elements in Compressed Sparse Row format> 

In [35]: 

mat3 



Out[35]: 
(1, 878049) 
<1x878049 sparse matrix of type '<type 'numpy.int64'>' 
    with 788618 stored elements in Compressed Sparse Row format>

From the documentation, I read that it is possible to hstack, vstack, and concatenate this type of matrix. So I tried to hstack them:

import numpy as np 

matrix1 = np.hstack([[address_feature, dayweek_feature]]).T 
matrix2 = np.vstack([[matrix1, pddis_feature]]).T 


X = matrix2

However, the dimensions do not match:

In [41]: 

X_combined_features.shape 

Out[41]: 

(2, 1)

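Note that I am stacking the matrices this way because I want to use them with a scikit-learn classification algorithm. So how should I hstack several different sparse matrices?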

Source

2016-06-09 john doe

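Use the sparse version of vstack. As a general rule, you need to use the sparse functions and methods, not the numpy ones with similar names. Sparse matrices are not subclasses of numpy ndarray.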
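As a minimal sketch of that rule, the snippet below stacks two small matrices with the functions from scipy.sparse. The matrices a and b are made-up stand-ins, not the data from the question.

```python
import numpy as np
from scipy import sparse

# Made-up 1x4 sparse rows, used only for illustration.
a = sparse.csr_matrix([[0, 1, 2, 0]])
b = sparse.csr_matrix([[3, 0, 0, 4]])

# A sparse matrix is not an ndarray, which is why the numpy
# stacking functions are the wrong tool for it.
print(isinstance(a, np.ndarray))   # False

stacked = sparse.vstack([a, b])    # rows on top of each other
side = sparse.hstack([a, b])       # rows joined end to end

print(stacked.shape)               # (2, 4)
print(side.shape)                  # (1, 8)
```

Both results stay sparse, so nothing is expanded to a dense array along the way.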

However, your 3 matrices do not look sparse. Each is 1x878049, and one of them has 878048 nonzero elements, which means it has only one 0 element.

So you could just as well turn them into dense arrays (with .toarray() or .A) and use np.hstack or np.vstack.

np.hstack([address_feature.A, dayweek_feature.A])
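A small sketch of the dense route, using made-up 1x6 matrices in place of the real features. It uses .toarray() and passes a flat list to np.hstack. The nested-list call at the end is included only to show that it does not produce the same result.

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for the feature matrices in the question.
x = sparse.csr_matrix([[0, 1, 2, 0, 0, 1]])
y = sparse.csr_matrix([[0, 0, 0, 1, 0, 1]])
z = sparse.csr_matrix([[1, 0, 0, 0, 1, 0]])

# Convert each one to a dense (1, 6) ndarray first.
dense = [m.toarray() for m in (x, y, z)]

joined = np.hstack(dense)      # flat list of arrays
print(joined.shape)            # (1, 18)

rows = np.vstack(dense)        # one row per feature
print(rows.shape)              # (3, 6)

# Wrapping the list in a second pair of brackets hands hstack a
# single nested item, so the arrays are not joined end to end.
nested = np.hstack([dense])
print(nested.shape == joined.shape)   # False
```

Dense conversion is only reasonable when, as here, almost every entry is nonzero anyway. For genuinely sparse data it can use far more memory.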

And don't use double brackets. All the concatenate functions take a simple list or tuple of arrays, and that list can hold more than two arrays.

In [296]: A=sparse.csr_matrix([0,1,2,0,0,1]) 

In [297]: B=sparse.csr_matrix([0,0,0,1,0,1]) 

In [298]: C=sparse.csr_matrix([1,0,0,0,1,0]) 

In [299]: sparse.vstack([A,B,C]) 
Out[299]: 
<3x6 sparse matrix of type '<class 'numpy.int32'>' 
    with 7 stored elements in Compressed Sparse Row format> 

In [300]: sparse.vstack([A,B,C]).A 
Out[300]: 
array([[0, 1, 2, 0, 0, 1], 
     [0, 0, 0, 1, 0, 1], 
     [1, 0, 0, 0, 1, 0]], dtype=int32) 

In [301]: sparse.hstack([A,B,C]).A 
Out[301]: array([[0, 1, 2, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0]], dtype=int32) 

In [302]: np.vstack([A.A,B.A,C.A]) 
Out[302]: 
array([[0, 1, 2, 0, 0, 1], 
     [0, 0, 0, 1, 0, 1], 
     [1, 0, 0, 0, 1, 0]], dtype=int32)
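Since the goal in the question is a feature matrix for scikit-learn, here is a hedged sketch of one way to get there. It assumes that each 1 x N matrix holds one feature for N samples, which the question does not confirm, and the names f1, f2, f3 are hypothetical. Under that assumption the matrices are stacked as rows and then transposed, so that samples end up as rows and features as columns.

```python
from scipy import sparse

# Hypothetical stand-ins: three features, each a 1 x n_samples row.
f1 = sparse.csr_matrix([[1, 2, 3, 4]])
f2 = sparse.csr_matrix([[0, 5, 0, 6]])
f3 = sparse.csr_matrix([[7, 0, 0, 8]])

# Stack the rows, then transpose to (n_samples, n_features).
# tocsr() converts the result to CSR format.
X = sparse.vstack([f1, f2, f3]).T.tocsr()
print(X.shape)   # (4, 3)

# Equivalent alternative: transpose each row into a column first.
X_alt = sparse.hstack([f1.T, f2.T, f3.T]).tocsr()
print((X != X_alt).nnz == 0)   # True
```

Many scikit-learn estimators accept a CSR matrix like X directly, but whether a particular classifier does is worth checking in its documentation.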

Source

2016-06-09 05:33:24 hpaulj

Thanks for your help, great answer!
