2014-10-08

I plan to use multiprocessing in my code to get better performance. Can I use multiprocessing.Pool inside a method of a class?

However, I get the following error:

Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object

Other approaches I have tried give this error instead:

TypeError: cannot serialize '_io.TextIOWrapper' object 
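Both messages point at the same underlying limitation: an open file object cannot be pickled, whether it was opened in binary mode (`_io.BufferedReader`) or text mode (`_io.TextIOWrapper`). A minimal sketch that reproduces this without multiprocessing, assuming a throwaway temporary file stands in for the real e-book files. The exact wording of the message differs between Python versions, so only the exception type is relied on:

```python
import os
import pickle
import tempfile

# Throwaway file so the example is self-contained (stands in for the e-book files).
fd, path = tempfile.mkstemp()
os.close(fd)

errors = {}  # maps file-object type name -> error message
for mode in ("rb", "r"):  # "rb" gives _io.BufferedReader, "r" gives _io.TextIOWrapper
    with open(path, mode) as f:
        try:
            pickle.dumps(f)
        except TypeError as exc:
            errors[type(f).__name__] = str(exc)

os.remove(path)
for name, message in errors.items():
    print(name, "->", message)
```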

My code looks like this:

from multiprocessing import Pool


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_char(self, char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread


if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)

I think the error is raised because the Pool is not used in the main function.

Is my guess correct? How can I change my code to fix the error?

What does 'type(charlist[0])' say? This is a little confusing, because your error message doesn't match the code you posted ('char2text' vs. 'format_char'). – 2014-10-08 05:05:41

@JohnZwinck My real code is very long; the code here is a simplified version. If it looks confusing, I'll edit it. 'type(charlist[0])' is a string. – PaleNeutron 2014-10-08 05:10:49

Answer

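The problem is that your Book instance has an instance variable (namelist) that can't be pickled. Because you are calling pool.map with an instance method, the whole instance has to be pickled so it can be passed to the child processes along with each task (the traceback shows this happening in _handle_tasks). Book.namelist is a list of open file objects (_io.BufferedReader in your traceback), and file objects can't be pickled. There are a couple of ways to fix this. Based on your sample code, it looks like you could simply make format_char a top-level function: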

from multiprocessing import Pool


def format_char(char):
    char = char + "a"
    return char


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # list of char
        with Pool() as pool:
            # A top-level function is pickled by name, so self is never sent.
            txtlist = pool.map(format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
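To see why the bound method is the culprit, here is a small sketch that pickles both versions directly, similar to what Pool does before sending each task to a worker. It assumes a temporary file stands in for the question's files. Pickling the top-level function stores only its name, while pickling book.format_char also pickles book, including the open file in namelist:

```python
import os
import pickle
import tempfile


def format_char(char):
    # Top-level function: pickle stores only a reference (module + name).
    return char + "a"


class Book(object):
    def __init__(self, arg):
        self.namelist = arg  # list of open file objects, as in the question

    def format_char(self, char):
        return char + "a"


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path) as f:
    book = Book([f])

    # Works: nothing but the function's qualified name is serialized.
    pickle.dumps(format_char)

    # Fails: a bound method is pickled as (instance, method name), so `book`
    # and the open file inside `book.namelist` must be pickled too.
    try:
        pickle.dumps(book.format_char)
        bound_method_ok = True
    except TypeError:
        bound_method_ok = False

os.remove(path)
print("bound method picklable:", bound_method_ok)
```

Run as a script, this prints "bound method picklable: False".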

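However, if you really do need format_char to be an instance method, you can make Book picklable with __getstate__/__setstate__, removing the namelist attribute from the instance before it is pickled: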

from multiprocessing import Pool


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']  # open files can't be pickled; drop them
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self, char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

This is fine as long as you don't need to access namelist in the child processes.
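A quick check of the __getstate__/__setstate__ version, again a sketch with a temporary file standing in for the real input and a hypothetical title attribute added to show that other state survives. Round-tripping the bound method through pickle yields a method bound to a copy of the instance that has no namelist, while the original instance is untouched:

```python
import os
import pickle
import tempfile


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable open files.
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    def format_char(self, char):
        return char + "a"


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path) as f:
    book = Book([f])
    book.title = "example"  # hypothetical extra attribute that should survive
    method = pickle.loads(pickle.dumps(book.format_char))

os.remove(path)
restored = method.__self__  # the unpickled copy of `book`
print(hasattr(restored, "namelist"), restored.title, method("x"))
# False example xa
```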

Thanks! It works well now. My guess was wrong. – PaleNeutron 2014-10-08 05:19:15

相关问题