Dot product of huge arrays in numpy

I have a huge array and I want to compute its dot product with a small array, but I keep getting 'array is too big'. Is there a workaround?

import numpy as np 

eMatrix = np.random.random_integers(low=0,high=100,size=(20000000,50)) 
pMatrix = np.random.random_integers(low=0,high=10,size=(50,50)) 

a = np.dot(eMatrix,pMatrix) 

Error: 
/Library/Python/2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.random_integers (numpy/random/mtrand/mtrand.c:9385)() 

/Library/Python/2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:7051)() 

ValueError: array is too big.

Source

2014-09-05 Lanc

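This is already happening at eMatrix =, no? You are asking for 10^9 integers, times the number of bytes per integer. So you should at least put them into a dtype int8 array rather than the default int64. – mdurant 2014-09-05 14:33:04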
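To put numbers on the comment above, here is a minimal sketch that computes the memory a dense array of the question's shape would need for a few dtypes. The helper name `dense_nbytes` is made up for illustration; nothing is allocated, it only multiplies the element count by the item size.

```python
import numpy as np

def dense_nbytes(shape, dtype):
    """Bytes needed for a dense array of this shape and dtype (nothing is allocated)."""
    n_items = 1
    for dim in shape:
        n_items *= dim
    return n_items * np.dtype(dtype).itemsize

shape = (20000000, 50)  # shape of eMatrix in the question
for dtype in (np.int8, np.int32, np.int64):
    print(np.dtype(dtype).name, dense_nbytes(shape, dtype) / 1e9, "GB")
```

Values between 0 and 100 fit in int8, but the result of the dot product needs a wider type, since a single entry can reach 50 * 100 * 10 = 50000.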

But I have a 64-bit machine with 16GB of RAM – Lanc 2014-09-05 14:51:56

So 8GB for the first, ePrime is at least the same again, and maybe some unseen intermediates as well. – mdurant 2014-09-05 14:53:59

I think the only "simple" answer is to get more RAM.

It took 15GB, but I was able to do this on my MacBook.

In [1]: import numpy 
In [2]: e = numpy.random.random_integers(low=0, high=100, size=(20000000, 50)) 
In [3]: p = numpy.random.random_integers(low=0, high=10, size=(50, 50)) 
In [4]: a = numpy.dot(e, p) 
In [5]: a[0] 
Out[5]: 
array([14753, 12720, 15324, 13588, 16667, 16055, 14144, 15239, 15166, 
     14293, 16786, 12358, 14880, 13846, 11950, 13836, 13393, 14679, 
     15292, 15472, 15734, 12095, 14264, 12242, 12684, 11596, 15987, 
     15275, 13572, 14534, 16472, 14818, 13374, 14115, 13171, 11927, 
     14226, 13312, 16070, 13524, 16591, 16533, 15466, 15440, 15595, 
     13164, 14278, 13692, 12415, 13314])

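One possible solution might be to use a sparse matrix and the sparse matrix dot operator.

For example, on my machine just building e as a dense matrix uses 8GB of RAM. Constructing a similar sparse matrix eprime: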

In [1]: from scipy.sparse import rand 
In [2]: eprime = rand(20000000, 50)

has a negligible cost in terms of memory.

Source

2014-09-05 14:51:27 stderr

I believe that as soon as you do a calculation like dot, you will have a dense matrix again. – mdurant 2014-09-05 14:52:46
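A small sketch of the point made in this comment, using a much smaller matrix than the one in the question. The shapes, density and seed here are arbitrary choices for illustration: multiplying a SciPy sparse matrix by a dense NumPy array gives back a dense result with the full output shape.

```python
import numpy as np
from scipy import sparse

# Small stand-ins for eprime and p; sizes and density are arbitrary.
eprime = sparse.random(1000, 50, density=0.01, format="csr", random_state=0)
p = np.random.RandomState(0).randint(0, 11, size=(50, 50))

a = eprime.dot(p)

print(type(a), a.shape)
print("result is sparse:", sparse.issparse(a))
```

So the sparse input saves memory only for the input; the product still occupies rows * 50 dense entries.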

Hey @stderr, as I mentioned above, I also tried this on a Mac with 16GB of RAM, but it fails. – Lanc 2014-09-05 14:54:45

Also, I don't want a sparse matrix; my matrices need to be dense. – Lanc 2014-09-05 14:55:28
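If the matrices have to stay dense, one way to keep peak memory down is to store the inputs in a small dtype and compute the product a block of rows at a time into a preallocated result. This is not something proposed in the thread, only a sketch under the assumption that the inputs fit in int8 and the results fit in int32; `blockwise_dot` and its parameters are hypothetical names.

```python
import numpy as np

def blockwise_dot(e, p, block_rows=1000000, out_dtype=np.int32):
    """Compute e.dot(p) one block of rows at a time into a preallocated result."""
    out = np.empty((e.shape[0], p.shape[1]), dtype=out_dtype)
    p_wide = p.astype(out_dtype)  # small matrix, cheap to upcast once
    for start in range(0, e.shape[0], block_rows):
        stop = min(start + block_rows, e.shape[0])
        # Only the current block is upcast, so the temporary stays small.
        out[start:stop] = e[start:stop].astype(out_dtype).dot(p_wide)
    return out

# Small demonstration; the question's shape would be (20000000, 50).
rng = np.random.RandomState(0)
e = rng.randint(0, 101, size=(2000, 50)).astype(np.int8)   # values 0..100
p = rng.randint(0, 11, size=(50, 50)).astype(np.int8)      # values 0..10
a = blockwise_dot(e, p, block_rows=300)
print(a.shape, a.dtype)
```

At the question's full size this layout would need about 1GB for the int8 input and 4GB for the int32 result, instead of 8GB each with int64.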

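I believe the answer is that you don't have enough RAM, and possibly also that you are running a 32-bit version of Python. Perhaps clarify which OS you are running. Many OSes will run both 32-bit and 64-bit programs.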

Source

2014-09-05 15:13:24 beiller

How do I check whether I am running a 32-bit version of Python? – Lanc 2014-09-05 15:34:14

As asked above, see here for how to determine whether you are running a 64-bit or 32-bit Python executable: http://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode-on-os-x – beiller 2014-09-05 17:44:52
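As a quick way to answer that question from inside the interpreter, here is a minimal sketch using only the standard library. It reports the pointer size of the running Python process, which is what matters here, not the bitness of the machine or the operating system.

```python
import struct
import sys

# Size of a C pointer in the running interpreter, in bits.
pointer_bits = struct.calcsize("P") * 8
# sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
is_64bit = sys.maxsize > 2**32

print("pointer size:", pointer_bits, "bits")
print("64-bit interpreter:", is_64bit)
```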

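That error is raised while determining the total size of the array, if it overflows the native int type (see here for the exact source code line).

For that to happen, regardless of your machine being 64 bit, you are almost certainly running a 32 bit version of Python (and NumPy). You can check if that is the case by doing: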

>>> import sys 
>>> sys.maxsize 
2147483647 # <--- 2**31 - 1, on a 64 bit version you would get 2**63 - 1

Then again, your array is "only" 20000000 * 50 = 1000000000 items, which is just under 2**30. If I try to reproduce your results on a 32 bit numpy, I get a MemoryError:

>>> np.random.random_integers(low=0,high=100,size=(20000000,50)) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "mtrand.pyx", line 1420, in mtrand.RandomState.random_integers (numpy\random\mtrand\mtrand.c:12943) 
    File "mtrand.pyx", line 938, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10338) 
MemoryError

That is, unless I increase the size beyond the magic 2**31 - 1 threshold:

>>> np.random.random_integers(low=0,high=100,size=(2**30, 2)) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "mtrand.pyx", line 1420, in mtrand.RandomState.random_integers (numpy\random\mtrand\mtrand.c:12943) 
    File "mtrand.pyx", line 938, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10338) 
ValueError: array is too big.
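The arithmetic behind the two cases can be checked with plain Python integers. This is only a sketch of the comparison described above, not NumPy's actual source; `item_count` and `INT32_MAX` are names chosen for illustration.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit int

def item_count(shape):
    """Total number of items in an array of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

for shape in [(20000000, 50), (2**30, 2)]:
    n = item_count(shape)
    print(shape, n, "exceeds 2**31 - 1:", n > INT32_MAX)
```

The question's shape stays below the threshold, which fits with it producing a MemoryError rather than the ValueError in the reproduction above, while (2**30, 2) has 2**31 items and goes over it.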

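Given the difference in line numbers between your traceback and mine, I suspect you are using an older version. What does this output on your system: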

>>> np.__version__ 
'1.10.0.dev-9c50f98'

Source

2014-09-05 16:19:39 Jaime

Thanks for the insight! I am using numpy version 1.8.2 – Lanc 2014-09-07 10:31:40
