2016-07-31 49 views
1

How can I efficiently vstack a large sequence of numpy array chunks?

I generate a sequence of numpy arrays as follows:

def chunker(seq, size): 
    return (seq[pos:pos + size] for pos in range(0, len(seq), size)) 

for i in chunker(X,10000): 
    e = function(i) 
    print('new matrix',e) 

new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
... 
new matrix (10000, 3208) 

I would like to vstack the n matrices above into a single matrix. So I tried the following:

X = np.vstack(e) 

However, when I print X, I again get:

new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
new matrix (10000, 3208) 
... 
new matrix (10000, 3208) 

instead of a single new vstacked matrix. Any idea how to vstack this sequence of numpy arrays?

Update

Following jedwards's answer, I edited my code as follows:

import numpy as np

def chunker(seq, size): 
    return (seq[pos:pos + size] for pos in range(0, len(seq), size)) 

for (r,i) in enumerate(chunker(X,10000)): 
    e = function(i) 
    print('new matrix',e) 
    X[r,:] = e 

print(X) 
+1

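The input to `vstack` should be a list of arrays that match in the last dimension. `e` doesn't look like that. You need to collect the individual `e` arrays into a list. – hpaulj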

+1

In your loop, what is the shape of `e`? Of `X`? Of `X[r,:]`? – hpaulj

+0

X.shape = (878049, 3208), e.shape = (10000, 3208), merged[r,:].shape = (3208,). Thanks for the help, @hpaulj! I am also getting: 'The kernel appears to have died. It will restart automatically.' –

Answers

1

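One way, though possibly not the most efficient, is to collect the arrays you want to stack in a list and then stack them once, outside the loop.

For example: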

import numpy as np 

def chunker(seq, size): 
    return (seq[pos:pos + size] for pos in range(0, len(seq), size)) 

# Some fake function (n.b. this is a silly way to reverse a list) 
def function(arr): 
    arr.reverse() 
    return arr 

# Generate fake X 
X = list(range(100)) 

chunks = [] 
for i in chunker(X,10): 
    e = function(i) 
    print('new matrix',e) 
    chunks.append(e) 

merged = np.vstack(chunks) 
print(merged) 

Output:

 
new matrix [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] 
new matrix [19, 18, 17, 16, 15, 14, 13, 12, 11, 10] 
new matrix [29, 28, 27, 26, 25, 24, 23, 22, 21, 20] 
new matrix [39, 38, 37, 36, 35, 34, 33, 32, 31, 30] 
new matrix [49, 48, 47, 46, 45, 44, 43, 42, 41, 40] 
new matrix [59, 58, 57, 56, 55, 54, 53, 52, 51, 50] 
new matrix [69, 68, 67, 66, 65, 64, 63, 62, 61, 60] 
new matrix [79, 78, 77, 76, 75, 74, 73, 72, 71, 70] 
new matrix [89, 88, 87, 86, 85, 84, 83, 82, 81, 80] 
new matrix [99, 98, 97, 96, 95, 94, 93, 92, 91, 90] 
[[ 9  8  7  6  5  4  3  2  1  0]
 [19 18 17 16 15 14 13 12 11 10]
 [29 28 27 26 25 24 23 22 21 20]
 [39 38 37 36 35 34 33 32 31 30]
 [49 48 47 46 45 44 43 42 41 40]
 [59 58 57 56 55 54 53 52 51 50]
 [69 68 67 66 65 64 63 62 61 60]
 [79 78 77 76 75 74 73 72 71 70]
 [89 88 87 86 85 84 83 82 81 80]
 [99 98 97 96 95 94 93 92 91 90]]
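The toy example above stacks plain lists. Here is a minimal sketch of the same collect-then-stack pattern with 2-D chunks, which is closer to the shapes in the question. The `transform` function is a hypothetical stand-in for the question's `function`, and the small `X` is made up for illustration. It also shows why calling `np.vstack` on a single chunk appears to do nothing: the rows of that one array are stacked back into the same array.

```python
import numpy as np

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

# Hypothetical stand-in for function(): any per-chunk transform
# that keeps the number of columns unchanged.
def transform(chunk):
    return chunk * 2

# Small made-up input; the real X in the question is far larger.
X = np.arange(35).reshape(7, 5)

# Collect every processed chunk, then stack once outside the loop.
chunks = [transform(c) for c in chunker(X, 3)]  # shapes (3, 5), (3, 5), (1, 5)
merged = np.vstack(chunks)

# Stacking a single 2-D chunk only re-stacks its own rows.
single = np.vstack(chunks[0])

print(merged.shape, single.shape)
```

Note that the last chunk may have fewer rows than the others; `np.vstack` only requires the column counts to match.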

Alternatively, without creating the intermediate list:

merged = np.zeros([0,10]) 
for i in chunker(X,10): 
    e = function(i) 
    print('new matrix',e) 
    merged = np.vstack([merged, e]) 

print(merged) 

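But the most efficient approach is to initialize the numpy array before the loop and then set the rows of that array inside the loop. You first need to work out the dimensions of the final merged array (here I just create it as a 10x10 matrix because I know the dimensions).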

merged = np.zeros([10,10]) 
for (r,i) in enumerate(chunker(X,10)): 
    e = function(i) 
    print('new matrix',e) 
    merged[r,:] = e 

print(merged) 
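The loop above writes one row per chunk, which fits the toy data where each `e` is a single row. When each chunk is itself a 2-D block of many rows, a block of rows has to be assigned instead. The following is only a sketch under assumptions: `transform` is a hypothetical stand-in for `function`, it is assumed to return one output row per input row, and the output is assumed to have the same shape and dtype as `X`.

```python
import numpy as np

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

# Hypothetical stand-in for function(); assumed to return one
# output row per input row with the same number of columns.
def transform(chunk):
    return chunk + 1.0

# Small made-up input; 14 rows so the last chunk is shorter than the rest.
X = np.arange(70, dtype=float).reshape(14, 5)
size = 4

# Preallocate the result once (assumes same shape and dtype as X).
merged = np.empty_like(X)

start = 0
for chunk in chunker(X, size):
    e = transform(chunk)
    stop = start + e.shape[0]   # handles a shorter final chunk
    merged[start:stop, :] = e   # assign a block of rows, not a single row
    start = stop

print(merged.shape)
```

Assigning a multi-row `e` to `merged[r, :]` raises a broadcasting `ValueError`, because the left-hand side is a single row. Also keep in mind that a dense float64 array of shape (878049, 3208) needs roughly 22.5 GB, so the preallocated result has to fit in memory alongside the input.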
+0

These are very large arrays; is there a more efficient way? –

+1

I added two additional options. The bottom one is by far the most efficient. – jedwards

+0

I got this exception: `ValueError: could not broadcast input array from shape (100) into shape (3208)`. Any idea how to proceed? ... Thanks for your help! –
