Numpy equivalent of itertools.product

I know about itertools.product for iterating over a list of several dimensions of keywords. For instance, if I have this:

categories = [ 
    [ 'A', 'B', 'C', 'D'], 
    [ 'E', 'F', 'G', 'H'], 
    [ 'I', 'J', 'K', 'L'] 
]

When I pass it through itertools.product(), I get this:

>>> [ x for x in itertools.product(*categories) ] 
('A', 'E', 'I'), 
('A', 'E', 'J'), 
('A', 'E', 'K'), 
('A', 'E', 'L'), 
('A', 'F', 'I'), 
('A', 'F', 'J'), 
# and so on...

Is there an equivalent, straightforward way of doing the same thing with numpy arrays?

Asked 2015-02-23 by Jivan

This question has already been asked a couple of times:

Using numpy to build an array of all combinations of two arrays

itertools product speed up

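The first link has a working numpy solution that is claimed to be several times faster than itertools, although no benchmark is provided. The code was written by a user named pv. Please follow the link and upvote his answer if you find it useful: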

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    # Integer division: m is used as a repeat count and as a slice bound
    m = n // arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # Fill the first block recursively, then copy it once per remaining value
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):  # range replaces the Python 2 xrange
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
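For comparison, here is a shorter sketch that is not pv's code: it assumes that np.meshgrid with indexing='ij' followed by np.stack and a reshape is an acceptable way to get the same rows in the same order as itertools.product. The helper name cartesian_meshgrid is made up for this illustration, and it is applied to the categories list from the question.

```python
import itertools
import numpy as np

def cartesian_meshgrid(arrays):
    """Cartesian product of 1-D inputs as a 2-D array (hypothetical helper)."""
    arrays = [np.asarray(a) for a in arrays]
    # indexing='ij' makes the first input vary slowest, like itertools.product
    grids = np.meshgrid(*arrays, indexing='ij')
    # Stack the grids along a new last axis, then flatten every other axis
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

categories = [
    ['A', 'B', 'C', 'D'],
    ['E', 'F', 'G', 'H'],
    ['I', 'J', 'K', 'L'],
]

result = cartesian_meshgrid(categories)
print(result.shape)  # (64, 3)
print(result[:4])    # rows A-E-I, A-E-J, A-E-K, A-E-L
```

Unlike the recursive version, this builds every intermediate grid in memory at once, so it may not be the better choice for large inputs.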

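However, in the same post Alex Martelli, who is a great Python guru on SO, wrote that itertools is the fastest way to get this task done. So here is a quick benchmark that backs up Alex's words.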

import numpy as np 
import time 
import itertools 


def cartesian(arrays, out=None): 
    ... 


def test_numpy(arrays):
    # Build the product with numpy, then iterate over its rows
    for res in cartesian(arrays):
        pass


def test_itertools(arrays):
    # Iterate over the lazily generated tuples
    for res in itertools.product(*arrays):
        pass


def main():
    arrays = [np.fromiter(range(100), dtype=int), np.fromiter(range(100, 200), dtype=int)]
    # time.clock() was removed in Python 3.8; perf_counter() is used instead
    start = time.perf_counter()
    for _ in range(100):
        test_numpy(arrays)
    print(time.perf_counter() - start)
    start = time.perf_counter()
    for _ in range(100):
        test_itertools(arrays)
    print(time.perf_counter() - start)

if __name__ == '__main__':
    main()

Output:

0.421036 
0.06742

So you should definitely use itertools.

Answered 2015-02-23 22:42:46

Thanks for the extended answer and the advice that came with it – Jivan 2015-02-23 22:43:47

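The speed difference arises because you are iterating over the cartesian() result, and iterating over a numpy array is slower than iterating over a Python iterator. If you only want to construct the array, you need to compare 'cartesian(...)' with 'np.array(list(itertools.product(...)))'. For iteration, itertools is the right answer, but the question here is about construction. – 2015-02-24 09:42:31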

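@Jivan As pv. pointed out, his numpy function is faster at constructing a numpy array, because converting a Python iterator (as produced by 'itertools.product') into a numpy array carries significant overhead: the tuples have to be materialized before numpy can build the array from them. In my tests it was 5 times faster, but keep in mind that iterating over a numpy array is much slower (5 times slower according to my test above), so if iteration speed is your main concern, you should use iterators. – 2015-02-24 10:48:53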
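The construction-only comparison suggested in the comments can be sketched roughly as follows. This is an illustration, not the benchmark anyone in the thread ran: build_with_numpy is a hypothetical meshgrid-based stand-in for cartesian(), the function names and the repeat count of 20 are assumptions, and no timings are claimed here because they depend on the machine and the numpy version.

```python
import itertools
import timeit
import numpy as np

def build_with_itertools(arrays):
    # Materialize all tuples first, then convert them to a 2-D array
    return np.array(list(itertools.product(*arrays)))

def build_with_numpy(arrays):
    # Hypothetical stand-in for cartesian(): meshgrid-based construction
    grids = np.meshgrid(*arrays, indexing='ij')
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

arrays = [np.arange(100), np.arange(100, 200)]

# Both approaches should produce the same rows in the same order
same = np.array_equal(build_with_itertools(arrays), build_with_numpy(arrays))
print("identical results:", same)

# Time construction only; nothing iterates over the finished array
t_iter = timeit.timeit(lambda: build_with_itertools(arrays), number=20)
t_np = timeit.timeit(lambda: build_with_numpy(arrays), number=20)
print(f"itertools + np.array: {t_iter:.4f} s")
print(f"numpy construction:   {t_np:.4f} s")
```

To time pv's function instead, cartesian could be passed in place of build_with_numpy.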
