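Serializing iterator object to pass between processes in Python
I have a Python script that computes the eigenvalues of matrices from a list, and I want to insert these eigenvalues into another collection in the same order as the original matrices. I would like to do this by spawning multiple processes.
Here is my code: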
import time
import collections
import numpy as NP
from scipy import linalg as LA
from joblib import Parallel, delayed


def computeEigenV(unit_of_work):
    # each unit of work is [index, matrix]
    current_index = unit_of_work[0]
    current_matrix = unit_of_work[1]
    e_vals, e_vecs = LA.eig(current_matrix)
    # lowEV was not defined anywhere in this snippet; e_vals is assumed to be what was meant
    finished_unit = (current_index, e_vals[::-1])
    return finished_unit


def run(work_list):
    # n_jobs = -1 uses all available cores
    pool = Parallel(n_jobs = -1, verbose = 1, pre_dispatch = 'all')
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
    return results


if __name__ == '__main__':
    # create original array of matrices
    original_matrix_list = []
    work_list = []

    # basic set up so we can run this test
    for i in range(0, 100):
        # generate the matrix & unit of work
        matrix = NP.random.random_integers(0, 100, (500, 500))
        # insert into respective resources
        original_matrix_list.append(matrix)

    for i, matrix in enumerate(original_matrix_list):
        unit_of_work = [i, matrix]
        work_list.append(unit_of_work)

    work_result = run(work_list)
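So work_result should hold the eigenvalues of every matrix once all the processes have finished. The iterator I am using is unit_of_work, which is a list containing the index of the matrix (from original_matrix_list) and the matrix itself.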
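Because every finished unit carries the index of its matrix, the eigenvalues can be put back into the original order even if the results arrive out of order. The following is only a minimal sketch under that assumption: order_results and the sample data are hypothetical names, and plain lists stand in for the eigenvalue arrays. As far as I know, joblib's Parallel already returns results in input order by default, so this would mainly matter with a different dispatch scheme.

```python
def order_results(finished_units, total):
    """Place each (index, values) pair at its original position."""
    ordered = [None] * total
    for index, values in finished_units:
        ordered[index] = values
    return ordered


# hypothetical finished units, deliberately out of order
shuffled = [(2, [0.5]), (0, [3.0]), (1, [1.5])]
ordered = order_results(shuffled, 3)
```

Any slot that never received a result stays None, which makes missing units easy to spot.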
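The weird thing is that if I run this code with python matrix.py, everything works fine. But when I use auto (a program that computes solutions of differential equations?) to run my script, typing auto matrix.py gives me the following error: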
Traceback (most recent call last):
  File "matrix.py", line 50, in <module>
    work_result = run(work_list)
  File "matrix.py", line 27, in run
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 805, in __call__
    while self.dispatch_one_batch(iterator):
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 69, in __init__
    self.items = list(iterator_slice)
  File "matrix.py", line 27, in <genexpr>
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 162, in delayed
    pickle.dumps(function)
TypeError: expected string or Unicode object, NoneType found
Note: when I ran this with auto, I had to change if __name__ == '__main__': to if __name__ == '__builtin__':
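The traceback ends in pickle.dumps(function), and pickle serializes a plain function by reference, using its module name and its own name. A small diagnostic along these lines might show what pickle sees under each launcher. It is only a sketch: describe_function and compute are hypothetical names, and whether auto really leaves the module name unset is an assumption, not something confirmed here.

```python
import pickle


def compute(unit):
    # hypothetical stand-in for computeEigenV
    return unit


def describe_function(func):
    """Collect the names pickle relies on and whether pickling succeeds."""
    info = {"name": func.__name__, "module": func.__module__}
    try:
        pickle.dumps(func)
        info["picklable"] = True
    except Exception as exc:
        # record the failure instead of crashing, so both launchers can be compared
        info["picklable"] = False
        info["error"] = repr(exc)
    return info


report = describe_function(compute)
print(report)
```

Comparing the printed report from python matrix.py with the one from auto matrix.py would indicate whether the function's module name is the part that differs.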
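I looked up this error, and it seems that I am not correctly serializing the iterator unit_of_work to the different processes. I then tried using serialized_unit_of_work = pickle.dumps(unit_of_work), passing that along, and calling pickle.loads when I need to use the iterator, but I still get the same error.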
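For reference, the in-memory round trip described above looks roughly like the following. This is a minimal sketch, not the exact code from the script: a small nested list is assumed in place of the 500 x 500 NumPy matrix. pickle.dumps returns a byte string held in memory, so no file is created.

```python
import pickle

# hypothetical unit of work: [index, matrix], with a nested list standing in for the matrix
unit_of_work = [3, [[1, 2], [3, 4]]]

# serialize to an in-memory byte string (no file involved)
serialized_unit_of_work = pickle.dumps(unit_of_work, protocol=pickle.HIGHEST_PROTOCOL)

# restore the original object from the byte string
restored_unit_of_work = pickle.loads(serialized_unit_of_work)
```

Note that the failing call in the traceback is pickle.dumps(function), so serializing the data by hand does not change what joblib does with the function itself.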
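Could someone please point me in the right direction on how to fix this? I would rather not use pickle.dump(obj, file[, protocol]), because eventually I will be running this to compute the eigenvalues of thousands of matrices, and if possible I do not want to create a lot of files to store the serialized iterators.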
Thanks! :)