"Tiling" with numpy

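I am trying to reduce the size of a 2D array by taking the majority value of square chunks of the array and writing these to another array. The size of the square chunks is variable, let's say n values on a side. The data type of the array will be integer. I am currently using a loop in Python to assign each chunk to a temporary array and then pulling the unique values from the tmpArray. I then loop through these to find the one with the most occurrences. As you can imagine, this process quickly becomes too slow as the input array size increases.

I have seen examples that take the min, max, and mean of square chunks, but I don't know how to convert them to a majority: "Grouping 2D numpy array in average" and "resize with averaging or rebin a numpy 2d array".

I am looking for some means of speeding this up by using numpy to perform the process on the entire array at once. (I can handle switching to tiled sections of the array when the input gets too large to fit in memory.)

Thanks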

#snippet of my code 
import numpy as np

#pull a tmpArray representing one square chunk of my input array 
kernel = sourceDs.GetRasterBand(1).ReadAsArray(int(sourceRow), 
            int(sourceCol), 
            int(numSourcePerTarget), 
            int(numSourcePerTarget)) 
#get a list of the unique values 
uniques = np.unique(kernel) 
curMajority = -1  #highest occurrence count seen so far 
for val in uniques: 
    numOccurances = (kernel == val).sum() 
    if numOccurances > curMajority: 
        ans = val 
        curMajority = numOccurances 

#write out our answer (the majority value itself, not its count) 
outBand.WriteArray(np.array([[ans]]), row, col) 

#This is insanity!!!

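Following Bago's excellent suggestions, I think I am well on the way to a solution. Here is what I have so far. One change I made was to use an (x*y, n*n) array built from the original grid shape. The problem I am running into is that I can't figure out how to translate the where, counts, and uniq_a steps from one dimension to two.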

import numpy as np

#test data 
grid = np.array([[ 37, 1, 4, 4, 6, 6, 7, 7], 
       [ 1, 37, 4, 5, 6, 7, 7, 8], 
       [ 9, 9, 11, 11, 13, 13, 15, 15], 
       [9, 10, 11, 12, 13, 14, 15, 16], 
       [ 17, 17, 19, 19, 21, 11, 23, 23], 
       [ 17, 18, 19, 20, 11, 22, 23, 24], 
       [ 25, 25, 27, 27, 29, 29, 31, 32], 
       [25, 26, 27, 28, 29, 30, 31, 32]]) 
print(grid) 

n = 4 
X, Y = grid.shape 
x = X // n 
y = Y // n 
grid = grid.reshape((x, n, y, n)) 
grid = grid.transpose([0, 2, 1, 3]) 
grid = grid.reshape((x*y, n*n)) 
grid = np.sort(grid) 
diff = np.empty((grid.shape[0], grid.shape[1]+1), bool) 
diff[:, 0] = True 
diff[:, -1] = True 
diff[:, 1:-1] = grid[:, 1:] != grid[:, :-1] 
where = np.where(diff) 

#This is where it falls apart for me, as 
#where returns two arrays: 
# row indices [0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3] 
# col indices [ 0 2 5 6 9 10 13 14 16 0 3 7 8 11 12 15 16 0 3 4 7 8 11 12 15 
# 16 0 2 3 4 7 8 11 12 14 16] 
#I'm not sure how to get a 
#(the lines below fail: where is a tuple of two 1D arrays, not a 2D array) 
counts = where[:, 1:] - where[:, -1] 
argmax = counts[:].argmax() 
uniq_a = grid[diff[1:]] 
print(uniq_a[argmax])
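
One possible way to carry the sort-and-diff idea over to two dimensions is to flatten the run-start positions so that run lengths become a single 1D difference. The sketch below is only an illustration, not the method from the answers; `rowwise_majority` is a hypothetical helper name, and ties are resolved in favour of the smallest value.

```python
import numpy as np

def rowwise_majority(blocks):
    """Most frequent value in each row of a 2D integer array (hypothetical helper)."""
    s = np.sort(blocks, axis=1)
    rows, cols = s.shape
    # start[i, j] is True where a run of equal values begins in row i.
    start = np.ones((rows, cols), dtype=bool)
    start[:, 1:] = s[:, 1:] != s[:, :-1]
    # Flattened positions of run starts; column 0 always starts a run,
    # so no run spans two rows.
    flat_start = np.flatnonzero(start)
    run_len = np.diff(np.append(flat_start, rows * cols))
    run_row = flat_start // cols
    run_val = s.ravel()[flat_start]
    # Longest run length found in each row.
    best_len = np.zeros(rows, dtype=run_len.dtype)
    np.maximum.at(best_len, run_row, run_len)
    # Keep the first longest run per row (ties go to the smallest value).
    best_idx = np.flatnonzero(run_len == best_len[run_row])
    _, first = np.unique(run_row[best_idx], return_index=True)
    return run_val[best_idx[first]]

grid = np.array([[37,  1,  4,  4,  6,  6,  7,  7],
                 [ 1, 37,  4,  5,  6,  7,  7,  8],
                 [ 9,  9, 11, 11, 13, 13, 15, 15],
                 [ 9, 10, 11, 12, 13, 14, 15, 16],
                 [17, 17, 19, 19, 21, 11, 23, 23],
                 [17, 18, 19, 20, 11, 22, 23, 24],
                 [25, 25, 27, 27, 29, 29, 31, 32],
                 [25, 26, 27, 28, 29, 30, 31, 32]])
n = 4
x, y = grid.shape[0] // n, grid.shape[1] // n
blocks = grid.reshape((x, n, y, n)).transpose([0, 2, 1, 3]).reshape((x * y, n * n))
answer = rowwise_majority(blocks).reshape(x, y)
print(answer)
# [[ 4  7]
#  [17 23]]
```

Several blocks in this test grid have tied counts, so a different tie-breaking rule would give different (equally valid) answers.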

Source

2012-01-30 Colin Talbert

Here is a function that finds the majority much more quickly. It is based on the implementation of numpy.unique.

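import numpy as np

def get_majority(a): 
    a = a.ravel() 
    a = np.sort(a) 
    # diff[i] is True where a new run of equal values starts in the 
    # sorted array, plus a sentinel True at the very end 
    diff = np.empty(len(a)+1, 'bool') 
    diff[0] = True 
    diff[-1] = True 
    diff[1:-1] = a[1:] != a[:-1] 
    # the distance between consecutive run starts is each value's count 
    where = np.where(diff)[0] 
    counts = where[1:] - where[:-1] 
    argmax = counts.argmax() 
    # diff[1:] selects the last element of each run: one copy per unique value 
    uniq_a = a[diff[1:]] 
    return uniq_a[argmax]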

Let me know if this helps.
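
As a side note, newer NumPy releases expose the per-value counts directly through the `return_counts` argument of `np.unique`, so the same idea can be written more compactly. This is a minimal sketch, assuming a NumPy version that supports `return_counts`; `majority_via_unique` is a hypothetical name, not part of the answer above.

```python
import numpy as np

def majority_via_unique(a):
    """Most frequent value in an array; ties go to the smallest value."""
    values, counts = np.unique(a, return_counts=True)
    return values[counts.argmax()]

block = np.array([[37, 1, 4, 4],
                  [1, 37, 4, 5]])
print(majority_via_unique(block))  # 4 occurs three times
```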

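Update

You can do the following to get your array into shape (n*n, x, y). That should set you up to operate along the first axis and get this done in a vectorized way.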

X, Y = a.shape 
x = X // n 
y = Y // n 
a = a.reshape((x, n, y, n)) 
a = a.transpose([1, 3, 0, 2]) 
a = a.reshape((n*n, x, y))
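
To see what this layout gives you, here is a small sketch on a 4x4 toy array (the array and the name `stacked` are illustrative assumptions). After the reshape, `stacked[:, i, j]` holds the n*n values of block (i, j).

```python
import numpy as np

n = 2
a = np.arange(16).reshape(4, 4)
X, Y = a.shape
x = X // n
y = Y // n
stacked = a.reshape((x, n, y, n)).transpose([1, 3, 0, 2]).reshape((n * n, x, y))

# Values of the top-right 2x2 block of a, i.e. a[0:2, 2:4] flattened.
print(stacked[:, 0, 1])  # [2 3 6 7]
```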

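Just a few things to keep in mind. Although reshape and transpose return views whenever possible, I believe reshape-transpose-reshape will be forced to copy. It should also be possible to generalize the method above to operate along an axis, but that may take a little creativity.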
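
The view-versus-copy behaviour can be checked with `np.shares_memory`. The sketch below uses a 4x4 toy array (an assumption for illustration) and reports whether each intermediate result still refers to the original buffer.

```python
import numpy as np

n = 2
a = np.arange(16).reshape(4, 4)
step1 = a.reshape((2, n, 2, n))          # view: only the shape changes
step2 = step1.transpose([1, 3, 0, 2])    # view: only the strides change
step3 = step2.reshape((n * n, 2, 2))     # axes can no longer be merged in place

print(np.shares_memory(a, step1))  # True
print(np.shares_memory(a, step2))  # True
print(np.shares_memory(a, step3))  # False, the final reshape copied the data
```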

Source

2012-01-30 23:10:07

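This helps somewhat! I am still hoping to implement this algorithm over the whole dataset at once. Something like grid.reshape((5, grid.shape[0]//55, -1)).max(axis=3).max(1) will provide the max value. If I figure it out, I will post the solution. – 2012-01-31 00:10:07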
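
The shape tuple in the comment above appears to have been garbled in transit, so the following is only a sketch of the block-maximum idea it describes, assuming the intended layout is (x, n, y, n) and using a 4x4 toy array.

```python
import numpy as np

n = 2
grid = np.arange(16).reshape(4, 4)
x, y = grid.shape[0] // n, grid.shape[1] // n

# Reduce within each block: first across block columns, then block rows.
block_max = grid.reshape((x, n, y, n)).max(axis=3).max(axis=1)
print(block_max)
# [[ 5  7]
#  [13 15]]
```

This works for max, min, and mean because they can be applied one axis at a time; a majority cannot be split this way, which is why the thread turns to sorting or a mode function.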

That's awesome! It is humbling to see how much I have to learn. I have gone with your suggestion, but changed it to an (x*y, n*n) array. – 2012-01-31 18:53:11

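This might be a bit of a cop-out, but I ended up resorting to the mode function from scipy.stats to find the majority value. I am not sure how this compares to the other solutions in terms of processing time.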

import numpy as np 
from scipy import stats 
#test data 
grid = np.array([[ 37, 1, 4, 4, 6, 6, 7, 7], 
       [ 1, 37, 4, 5, 6, 7, 7, 8], 
       [ 9, 9, 11, 11, 13, 13, 15, 15], 
       [9, 10, 11, 12, 13, 14, 15, 16], 
       [ 17, 17, 19, 19, 21, 11, 23, 23], 
       [ 17, 18, 19, 20, 11, 22, 23, 24], 
       [ 25, 25, 27, 27, 29, 29, 31, 32], 
       [25, 26, 27, 28, 29, 30, 31, 32]]) 
print(grid) 

n = 2 
X, Y = grid.shape 
x = X // n 
y = Y // n 
grid = grid.reshape((x, n, y, n)) 
grid = grid.transpose([0, 2, 1, 3]) 
grid = grid.reshape((x*y, n*n)) 
answer = np.array(stats.mode(grid, 1)[0]).reshape(x, y)

Source

2012-01-31 20:30:57

I think stats.mode is a good choice. Sorry I didn't think of it sooner. Since mode returns an array, you can drop the np.array in the last line. – 2012-01-31 23:36:25
