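Why can't numpy.dot parallelize on arrays with more than 2 dimensions?
I noticed an interesting behavior of the numpy.dot() function. My company's Red Hat 6.7 server has two Xeon CPUs with 12 cores each. I ran the code snippets below and checked CPU utilization with htop.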
The following code uses all the cores on my server:
import numpy as np
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 5)
z = a.dot(b) #or use %timeit a.dot(b) if you use ipython
However, as soon as I add one more dimension to b, as below, only one core is used.
import numpy as np
a = np.random.rand(1000, 1000)
b = np.random.rand(1, 1000, 5) #or np.random.rand(n, 1000, 5) where n>=1
z = a.dot(b) #or use %timeit a.dot(b) if you use ipython
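To make clear what the second call actually computes, here is a small sketch using stand-in shapes (the sizes M, K, n, P are arbitrary choices for illustration, not from the question). For an N-D b, np.dot sums over the last axis of a and the second-to-last axis of b, so every slice of the result is an ordinary 2-D product a.dot(b[j]). In other words, the arithmetic is the same kind of matrix product as in the first snippet, just repeated n times.

```python
import numpy as np

# Small stand-ins for the shapes in the question:
# a is (M, K) and b is (n, K, P).
M, K, n, P = 4, 3, 2, 5
rng = np.random.RandomState(0)
a = rng.rand(M, K)
b = rng.rand(n, K, P)

z = a.dot(b)

# np.dot contracts the last axis of a with the second-to-last axis of b,
# so the result shape is a.shape[:-1] + b.shape[:-2] + b.shape[-1:],
# which here is (M, n, P).
print(z.shape)

# The same contraction written out explicitly with einsum:
# z[i, j, k] = sum over m of a[i, m] * b[j, m, k]
z_einsum = np.einsum('im,jmk->ijk', a, b)
print(np.allclose(z, z_einsum))

# Each slice z[:, j, :] is just the 2-D product a.dot(b[j]).
print(all(np.allclose(z[:, j, :], a.dot(b[j])) for j in range(n)))
```

This prints (4, 2, 5), then True twice.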
Here is my Python environment, as reported by import sys; sys.version:
'2.7.11 |Continuum Analytics, Inc.| (default, Dec 6 2015, 18:08:32) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]'
And here is the configuration reported by numpy.show_config():
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/opt/anaconda2/envs/portopt/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda2/envs/portopt/include']
blas_opt_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/opt/anaconda2/envs/portopt/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda2/envs/portopt/include']
openblas_lapack_info: NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/opt/anaconda2/envs/portopt/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda2/envs/portopt/include']
blas_mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/opt/anaconda2/envs/portopt/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda2/envs/portopt/include']
mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread']
    library_dirs = ['/opt/anaconda2/envs/portopt/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda2/envs/portopt/include']
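Has anyone seen this before? I'm inclined to think this is a bug rather than by design, since there is clearly more work to do in the higher-dimensional case. Also, is there a way to force numpy.dot to run in parallel? Thanks in advance!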
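One possible way to stay on the path that was observed to use all cores is to call dot once per 2-D slice of b, so that every call has the same 2-D by 2-D form as the first snippet. This is a minimal sketch, not a confirmed fix: it assumes each 2-D call parallelizes the way the first snippet did on this machine (worth confirming with htop), and it relies on np.stack, which requires NumPy 1.10 or later.

```python
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(100, 1000, 5)

# Call dot once per 2-D slice b[j] (shape (1000, 5)). Each call has the
# same 2-D x 2-D form as the first snippet in the question.
slices = [a.dot(b[j]) for j in range(b.shape[0])]

# Stack the (1000, 5) results along a new axis 1 so that the layout
# matches a.dot(b), which has shape (1000, 100, 5).
z_loop = np.stack(slices, axis=1)

print(z_loop.shape)
print(np.allclose(z_loop, a.dot(b)))
```

The Python-level loop adds some overhead per slice, so this is mainly worthwhile when each slice is large enough for the matrix product to dominate.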
Update: I found a workaround that speeds up the computation. See the snippet below.
import numpy as np
a = np.random.rand(1000, 1000)    # a varies in my program
b = np.random.rand(100, 1000, 5)  # b is a constant
z1 = a.dot(b)
# The trick: turn the 3-D array into a 2-D matrix of shape (1000, 500)...
c = b.swapaxes(0, 1).reshape(1000, 100 * 5)
# ...then reshape the result back to the desired shape.
z2 = a.dot(c).reshape(z1.shape)
# The results are identical, but on my server z2 is computed more than
# 10 times faster than z1.
print(np.allclose(z1, z2))
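The same reshape trick can be written for a b with any number of dimensions (at least 2). Below is a minimal sketch; dot_via_2d is a hypothetical helper name, not a NumPy function. It uses np.rollaxis rather than np.moveaxis because moveaxis only appeared in NumPy 1.11. For comparison it also calls np.tensordot, which, as far as I understand NumPy's source, performs this same transpose, reshape, 2-D dot sequence internally; treat that as an assumption to check against your NumPy version. The timings are only indicative and depend on the machine and BLAS build.

```python
import timeit
import numpy as np

def dot_via_2d(a, b):
    """Hypothetical helper: compute a.dot(b) for b.ndim >= 2
    using a single 2-D matrix product."""
    k = a.shape[-1]
    # Bring b's contracted axis (second-to-last) to the front: (K, ..., P)
    bt = np.rollaxis(b, b.ndim - 2, 0)
    # Flatten the remaining axes into columns: (K, everything else)
    b2 = bt.reshape(k, -1)
    # Flatten a's leading axes into rows: (rows, K)
    a2 = a.reshape(-1, k)
    # One 2-D product, then restore the layout np.dot would return:
    # a.shape[:-1] + b.shape[:-2] + b.shape[-1:]
    return a2.dot(b2).reshape(a.shape[:-1] + bt.shape[1:])

a = np.random.rand(1000, 1000)
b = np.random.rand(100, 1000, 5)

z1 = a.dot(b)
z2 = dot_via_2d(a, b)
z3 = np.tensordot(a, b, axes=([1], [1]))
print("z2 matches: %s, z3 matches: %s"
      % (np.allclose(z1, z2), np.allclose(z1, z3)))

# Rough timing of the three variants (best of 3 single runs each).
candidates = [
    ("a.dot(b)", lambda: a.dot(b)),
    ("dot_via_2d", lambda: dot_via_2d(a, b)),
    ("tensordot", lambda: np.tensordot(a, b, axes=([1], [1]))),
]
for name, func in candidates:
    best = min(timeit.repeat(func, number=1, repeat=3))
    print("%-12s %.4f s" % (name, best))
```

Because the helper does the reshaping up front, the expensive part is a single large 2-D product, which is the case the first snippet showed running on all cores.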
That said, I agree that in the long run we should look into NumPy's code, as @hpaulj suggested, and fix the problem (in case it is a bug) once and for all.
Please share your htop results!