
When using multiprocessing.Pool().map() on a list of instances of a subclass of numpy.ndarray, the new attribute of my own class is dropped: multiprocessing.Pool.map() discards the attributes of the subclassed ndarray.

The following small example, based on the numpy docs subclassing example, reproduces the problem:

from multiprocessing import Pool 
import numpy as np 


class MyArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

def sum_worker(x):
    return sum(x), x.info

if __name__ == '__main__':
    arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
    with Pool() as p:
        p.map(sum_worker, arr_list)

The attribute info is dropped:

AttributeError: 'MyArray' object has no attribute 'info' 

Using the built-in map() works fine:

arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
list(map(sum_worker, arr_list))

The purpose of the __array_finalize__() method is that objects keep the attribute when slicing:

arr = MyArray([1,2,3], info='foo') 
subarr = arr[:2] 
print(subarr.info) 

But with Pool.map() this method somehow does not work...

Answer


Since multiprocessing uses pickle to serialize data to and from the separate processes, this is essentially a duplicate of this question.
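You can see the same loss without any multiprocessing involved; a minimal sketch using the unadapted MyArray class from the question above, where a plain pickle round trip already drops the attribute:

import pickle

arr = MyArray([1, 2, 3], info='foo')
restored = pickle.loads(pickle.dumps(arr))
# Without a custom __reduce__/__setstate__, the round trip loses `info`:
print(hasattr(restored, 'info'))  # False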

Adapting the accepted solution from that question, your example becomes:

from multiprocessing import Pool 
import numpy as np 

class MyArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Append the extra attribute to the state returned by ndarray.__reduce__
        # so that it is pickled together with the array data.
        pickled_state = super(MyArray, self).__reduce__()
        new_state = pickled_state[2] + (self.info,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        # Restore the extra attribute, then hand the rest back to ndarray.
        self.info = state[-1]
        super(MyArray, self).__setstate__(state[0:-1])

def sum_worker(x):
    return sum(x), x.info

if __name__ == '__main__':
    arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
    with Pool() as p:
        p.map(sum_worker, arr_list)
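As a quick sanity check (my own addition, not part of the original answer), a plain pickle round trip now preserves the attribute:

import pickle

arr = MyArray([1, 2, 3], info='foo')
restored = pickle.loads(pickle.dumps(arr))
print(restored.info)  # 'foo' -- the attribute survives serialization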

Note that the second answer there suggests that, since pathos uses dill instead of pickle, you might be able to use pathos.multiprocessing with the original, unadapted code. However, when I tested it, this did not work.
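For reference, a sketch of what that attempt would look like (assuming pathos is installed; ProcessingPool is its pool class), which in my test still lost the attribute:

from pathos.multiprocessing import ProcessingPool
import numpy as np

if __name__ == '__main__':
    # Same sum_worker and unadapted MyArray class as in the question above.
    arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
    pool = ProcessingPool()
    results = pool.map(sum_worker, arr_list)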