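Speeding up structured NumPy array
NumPy arrays are great for both performance and ease of use (easier slicing and indexing than lists).
I am trying to build a data container out of a NumPy structured array instead of a dict of NumPy arrays. The problem is that performance is much worse: about 2.5 times slower with homogeneous data and about 32 times slower with heterogeneous data (I am talking about NumPy data types).
Is there a way to speed up structured arrays? I tried changing the memory order from 'c' to 'f', but this did not have any effect.
Here is my profiling code: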
import time
import numpy as np
NP_SIZE = 100000
N_REP = 100
np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}
t0 = time.time()
for i in range(N_REP):
    np_homo['a'] += i
t1 = time.time()
for i in range(N_REP):
    np_hetro['a'] += i
t2 = time.time()
for i in range(N_REP):
    dict_homo['a'] += i
t3 = time.time()
for i in range(N_REP):
    dict_hetro['a'] += i
t4 = time.time()
print('Homogeneous Numpy struct array took {:.4f}s'.format(t1 - t0))
print('Heterogeneous Numpy struct array took {:.4f}s'.format(t2 - t1))
print('Homogeneous Dict of numpy arrays took {:.4f}s'.format(t3 - t2))
print('Heterogeneous Dict of numpy arrays took {:.4f}s'.format(t4 - t3))
Edit: I forgot to include my timing numbers:
Homogeneous Numpy struct array took 0.0101s
Heterogeneous Numpy struct array took 0.1367s
Homogeneous Dict of numpy arrays took 0.0042s
Heterogeneous Dict of numpy arrays took 0.0042s
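For reference, the slowdown factors quoted at the top can be read straight off these numbers. A small sketch that only redoes the arithmetic; the dictionary keys are labels chosen here, and the values are copied from the output above:

```python
# Timings in seconds, copied from the first profiling run above.
timings = {
    'struct_homo': 0.0101,
    'struct_hetero': 0.1367,
    'dict_homo': 0.0042,
    'dict_hetero': 0.0042,
}

# Slowdown of the structured array relative to the dict of arrays.
homo_ratio = timings['struct_homo'] / timings['dict_homo']
hetero_ratio = timings['struct_hetero'] / timings['dict_hetero']

print('homogeneous slowdown:   {:.1f}x'.format(homo_ratio))    # 2.4x
print('heterogeneous slowdown: {:.1f}x'.format(hetero_ratio))  # 32.5x
```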
EDIT2: I added some extra test cases using the timeit module:
import numpy as np
import timeit
NP_SIZE = 1000000
def time(data, txt, n_rep=1000):
    def intern():
        data['a'] += 1
    time = timeit.timeit(intern, number=n_rep)
    print('{} {:.4f}'.format(txt, time))
np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}
time(np_homo, 'Homogeneous Numpy struct array')
time(np_hetro, 'Heterogeneous Numpy struct array')
time(dict_homo, 'Homogeneous Dict of numpy arrays')
time(dict_hetro, 'Heterogeneous Dict of numpy arrays')
This results in:
Homogeneous Numpy struct array 0.7989
Heterogeneous Numpy struct array 13.5253
Homogeneous Dict of numpy arrays 0.3750
Heterogeneous Dict of numpy arrays 0.3744
The ratios between runs seem to be quite stable, with both timing methods and with different array sizes.
In case it matters: Python: 3.4, NumPy: 1.9.2
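One thing that may be worth checking is the memory layout of the field views. Indexing a structured array by field name returns a strided view into the record buffer, not a contiguous array, and the default (packed) heterogeneous dtype here has a 12-byte record, so the doubles in field 'a' do not all start on 8-byte boundaries. The sketch below only inspects the layout; whether this fully explains the timings above is an assumption, not something these measurements prove. The variable `plain` and the small size `N` are chosen here for illustration.

```python
import numpy as np

N = 1000  # small size; only the layout matters here, not the timing

np_homo = np.zeros(N, dtype=[('a', np.double), ('b', np.double)])
np_hetro = np.zeros(N, dtype=[('a', np.double), ('b', np.int32)])
plain = np.zeros(N)  # what each entry of the dict of arrays looks like

# Record sizes: 8 + 8 bytes and 8 + 4 bytes (no padding by default).
print('record sizes:', np_homo.dtype.itemsize, np_hetro.dtype.itemsize)

for name, arr in [('struct homo', np_homo['a']),
                  ('struct hetero', np_hetro['a']),
                  ('plain array', plain)]:
    # strides: distance in bytes between consecutive elements.
    # The value of flags.aligned can depend on platform and NumPy version.
    print(name,
          'strides:', arr.strides,
          'contiguous:', arr.flags.c_contiguous,
          'aligned:', arr.flags.aligned)
```

Passing `align=True` to `np.dtype` pads the record so that fields are aligned; whether that changes the timings would need to be measured.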
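Because this question is about a specific NumPy performance problem rather than a general critique, it has been migrated from Code Review to Stack Overflow. –
If you really want to use structured arrays, I would suggest trying [pandas](http://pandas.pydata.org/). –
See this issue: https://github.com/numpy/numpy/issues/6467 – MaxNoe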
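A possible workaround, consistent with the timings in the question, is to keep the structured array for storage and copy its fields into separate contiguous arrays for the heavy arithmetic. This is a minimal sketch, not a fix for the underlying behaviour; the helper names `struct_to_dict` and `dict_to_struct` are hypothetical, and the extra copies cost memory and time that would need to be weighed against the gain.

```python
import numpy as np


def struct_to_dict(arr):
    """Copy each field of a structured array into its own contiguous array."""
    return {name: arr[name].copy() for name in arr.dtype.names}


def dict_to_struct(columns, dtype):
    """Pack a dict of equal-length 1-D arrays back into a structured array."""
    dtype = np.dtype(dtype)
    n = len(next(iter(columns.values())))
    out = np.empty(n, dtype=dtype)
    for name in dtype.names:
        out[name] = columns[name]  # field-wise assignment into the records
    return out


dtype = [('a', np.double), ('b', np.int32)]
records = np.zeros(5, dtype=dtype)

columns = struct_to_dict(records)
columns['a'] += 1  # operates on a contiguous float64 array
records = dict_to_struct(columns, dtype)
print(records['a'])
```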