Autocorrelation of a multidimensional array in numpy

I have a two-dimensional array, that is, an array of sequences which are themselves arrays. For each sequence I want to compute the autocorrelation, so for a (5, 4) array I would get 5 results, i.e. an array of shape (5, 7).

I know I can loop over the first dimension, but that is slow and is my last resort; a sketch of that loop is included below for reference. Is there another way?

Thanks!
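For reference, the slow loop over the first dimension that I want to avoid would look roughly like this (a sketch using np.correlate with mode='full', which is also where the (5, 7) output shape comes from):

import numpy as np

data = np.arange(5 * 4).reshape(5, 4).astype(float)

# one call to np.correlate per row; mode='full' returns 2*4 - 1 = 7 lags,
# so a (5, 4) input gives a (5, 7) result
acorr = np.array([np.correlate(row, row, mode='full') for row in data])
print(acorr.shape)  # (5, 7)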

Edit:

Based on the chosen answer plus mtrw's comment, I now have the following function:

import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT-based autocorrelation function, which is faster than numpy.correlate."""
    # x is supposed to be an array of sequences, of shape (totalelements, length);
    # `length` is a global variable holding the sequence length
    fftx = fft(x, n=(length * 2 - 1), axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret

Note that `length` is a global variable in my code, so be sure to define it. I also did not restrict the result to real numbers, because I need to take complex numbers into account.
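A quick usage sketch of the function above (my own example; `length` has to be set to the sequence length before the call):

import numpy as np

data = np.arange(5 * 4).reshape(5, 4).astype(complex)
length = data.shape[1]      # the global used inside xcorr

result = xcorr(data)
print(result.shape)         # (5, 7): 2*length - 1 lags, zero lag in the middle column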

Answers

8

Use FFT-based autocorrelation:

import numpy 
from numpy.fft import fft, ifft 

data = numpy.arange(5*4).reshape(5, 4) 
print(data) 
##[[ 0 1 2 3] 
## [ 4 5 6 7] 
## [ 8 9 10 11] 
## [12 13 14 15] 
## [16 17 18 19]] 
dataFT = fft(data, axis=1) 
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real 
print(dataAC) 
##[[ 14.  8.  6.  8.] 
## [ 126. 120. 118. 120.] 
## [ 366. 360. 358. 360.] 
## [ 734. 728. 726. 728.] 
## [ 1230. 1224. 1222. 1224.]] 

I'm a little confused by your statement about the answer having dimensions (5, 7), so maybe there is something important I'm not understanding.

Edit: At mtrw's suggestion, here is a padded version that does not wrap around:

import numpy 
from numpy.fft import fft, ifft 

data = numpy.arange(5*4).reshape(5, 4) 
padding = numpy.zeros((5, 3)) 
dataPadded = numpy.concatenate((data, padding), axis=1) 
print(dataPadded) 
##[[  0.   1.   2.   3.   0.   0.   0.] 
## [  4.   5.   6.   7.   0.   0.   0.] 
## [  8.   9.  10.  11.   0.   0.   0.] 
## [ 12.  13.  14.  15.   0.   0.   0.] 
## [ 16.  17.  18.  19.   0.   0.   0.]] 
dataFT = fft(dataPadded, axis=1) 
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real 
print(numpy.round(dataAC, 10)) 
##[[ 14.  8.  3.  0.  0.  3.  8.] 
## [ 126. 92. 59. 28. 28. 59. 92.] 
## [ 366. 272. 179. 88. 88. 179. 272.] 
## [ 734. 548. 363. 180. 180. 363. 548.] 
## [ 1230. 920. 611. 304. 304. 611. 920.]] 

There must be a more efficient way of doing this, especially since the autocorrelation is symmetric and I'm not taking advantage of that.
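One way to take some advantage of that, at least for real-valued input, might be the real FFT, keeping only the non-negative lags; a rough, unbenchmarked sketch:

import numpy as np

data = np.arange(5 * 4).reshape(5, 4).astype(float)
n = 2 * data.shape[1] - 1                  # pad so the correlation doesn't wrap around

spec = np.fft.rfft(data, n=n, axis=1)      # real-input FFT: only half the spectrum is stored
acorr = np.fft.irfft(spec * np.conjugate(spec), n=n, axis=1)

# the result is symmetric, so the first four columns (lags 0..3) carry
# all the information; they match the first four columns of the padded result above
print(np.round(acorr[:, :4], 10))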

+4

+1 for the FFT-based approach. As for the (5, 7)-shaped answer, you've computed the circular correlation (http://en.wikipedia.org/wiki/Discrete_Fourier_transform#Circular_convolution_theorem_and_cross-correlation_theorem). Just pad each row with 3 zeros so that the spectral multiplication doesn't wrap around, and you'll get what the original question asked for. – mtrw 2010-12-21 20:48:36
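Following up on that comment: if I'm not mistaken, the padding can also be done by passing n to fft, which zero-pads each row, so the explicit concatenate isn't needed; a small sketch:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5 * 4).reshape(5, 4)
n = 2 * data.shape[1] - 1                  # 7: enough room to avoid circular wrap-around

dataFT = fft(data, n=n, axis=1)            # fft zero-pads each row to length n
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))             # same (5, 7) result as the padded version above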

1

For very large arrays, having n = 2 ** p, where p is an integer, becomes important. This will save you huge amounts of time. For example:

import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    # round the FFT length *down* to a power of two (`length` is the global
    # from the question); see the wrap-around caveat below
    l = 2 ** int(np.log2(length * 2 - 1))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret

This may give you wrap-around errors. For large arrays, though, the autocorrelation should be insignificant near the edges.
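If the wrap-around matters, a variant (just a sketch, with the hypothetical name xcorr_pow2 and the same global `length`) rounds the FFT length up to the next power of two instead, keeping the speed benefit without truncating the transform:

import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr_pow2(x):
    """FFT autocorrelation padded up to the next power of two (no wrap-around)."""
    # round up, so n >= 2*length - 1 and the spectral product never wraps
    n = 2 ** int(np.ceil(np.log2(2 * length - 1)))
    fftx = fft(x, n=n, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    return fftshift(ret, axes=1)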

1

Maybe it's just a preference, but I wanted to follow the definition directly. I personally find it a bit easier that way. This is my implementation for an arbitrary nd array.

 
from itertools import product
from numpy import empty, roll, asarray

def autocorrelate(x):
    """
    Compute the multidimensional autocorrelation of an nd array.
    input: an nd array of floats
    output: an nd array of autocorrelations
    """

    # used for transposes
    t = roll(range(x.ndim), 1)

    # pairs of indexes: the first indexes the autocorrelation array,
    # the second is the shift along that axis
    ii = [list(enumerate(range(1, s - 1))) for s in x.shape]

    # initialize the resulting autocorrelation array
    acor = empty(shape=[len(s0) for s0 in ii])

    # iterate over all combinations of directional shifts
    for i in product(*ii):
        # extract the indexes for the autocorrelation array
        # and the original array respectively
        i1, i2 = asarray(i).T

        x1 = x.copy()
        x2 = x.copy()
        for i0 in i2:
            # clip the unshifted array at the end
            x1 = x1[:-i0]
            # and the shifted array at the beginning
            x2 = x2[i0:]

            # prepare to do the same for the next axis
            x1 = x1.transpose(t)
            x2 = x2.transpose(t)

        # normalize shifted and unshifted arrays
        x1 -= x1.mean()
        x1 /= x1.std()
        x2 -= x2.mean()
        x2 /= x2.std()

        # compute the autocorrelation directly from the definition
        acor[tuple(i1)] = (x1 * x2).mean()

    return acor
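A quick usage sketch (my own example call; the input should be a float array with at least 3 elements along each axis, since the shifts run from 1 to s - 2):

import numpy as np

sig = np.random.randn(10, 12)   # random 2-D test signal
acor = autocorrelate(sig)

print(acor.shape)               # (8, 10): shifts 1..8 along axis 0, 1..10 along axis 1
print(acor[0, 0])               # normalized correlation at a shift of (1, 1)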