2015-10-17 95 views

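Multiprocessing confusion - basics

Despite a lot of searching, I am still struggling to get a particular function to run across multiple processes. The requirements are:

  • Limit the number of processes
  • Pass multiple arguments to map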

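My latest attempt runs, but time.sleep seems to affect all processes: the execution time is the same, about 20 seconds, whether foo is run through the pool or called directly (it should be about 4 and 20 seconds respectively). What am I missing?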

from multiprocessing import Pool, Process, Lock 
import os 
import time 

def foo(arg): 
    print '{} - {}'.format(arg[0], os.getpid()) 
    time.sleep(1) 

if __name__ == '__main__': 
    script_start_time = time.time() 

    pool = Pool(processes=5) 
    for i in range(20): 
        arg = [i, i]
        pool.map(foo, [arg])

    pool.close() #necessary to prevent zombies 
    pool.join() #wait for all processes to finish 

    print 'Execution time {}s '.format(time.time() - script_start_time) 

Output:

0 - 5660 
1 - 5672 
2 - 5684 
3 - 5704 
4 - 5716 
5 - 5660 
6 - 5672 
7 - 5684 
8 - 5704 
9 - 5716 
10 - 5660 
11 - 5672 
12 - 5684 
13 - 5704 
14 - 5716 
15 - 5660 
16 - 5672 
17 - 5684 
18 - 5704 
19 - 5716 
Execution time 20.4240000248s 

Change 'map' to 'map_async'. – roippi

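Pool.map blocks until every item of the iterable passed to it has been evaluated. Call map once with all of the arguments and you will get concurrency. – Javier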
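The suggestion in the comment above can be sketched roughly as follows: build the full argument list first, then make a single blocking map call so the five workers can pull items concurrently. This is a Python 3 sketch, not the asker's exact script; build_args is a hypothetical helper added for illustration, and the sleep is shortened to 0.2 s so the demo finishes quickly.

```python
from multiprocessing import Pool
import os
import time

def foo(arg):
    # map passes exactly one item per call, so both values travel as one pair
    index, value = arg
    time.sleep(0.2)  # stand-in for real work (shortened from the question's 1 s)
    return index, os.getpid()

def build_args(n):
    # hypothetical helper: the same [i, i] pairs the question builds in its loop
    return [(i, i) for i in range(n)]

if __name__ == '__main__':
    start = time.time()
    with Pool(processes=5) as pool:
        # one call with all 20 items; the pool spreads them over 5 workers
        results = pool.map(foo, build_args(20))
    for index, pid in results:
        print('{} - {}'.format(index, pid))
    print('Execution time {:.2f}s'.format(time.time() - start))
```

With 20 items, 5 workers and 0.2 s per item, the elapsed time should be much closer to one fifth of the serial total than to the serial total itself, though the exact figure depends on process start-up cost.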

Answer

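As mentioned in the comments, pool.map blocks until it has finished executing, so you have to either submit the jobs with apply_async or map_async and use a callback to handle the data your function returns, or build all of your inputs ahead of time and call map once.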

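In this example, apply_async and map_async behave much alike; the difference is that apply_async submits only one job per call and supports passing multiple args and kwargs. For example: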

from multiprocessing import Pool 
import os 
import time 

def add(a, b): 
    c = a+b 
    print(f'{a}+{b} = {c} from process: {os.getpid()}') #python 3 f-strings are nifty :) 
    time.sleep(1) 
    return c 

if __name__ == '__main__': 
    script_start_time = time.time() 
    pool = Pool(processes=5) 
    results = [] 
    for a in range(5):
        for b in range(5, 10):
            # callbacks run in the parent as each job finishes, so results arrive in completion order
            pool.apply_async(add, (a, b), callback=results.append)
    pool.close() #necessary to prevent zombies 
    pool.join() #wait for all processes to finish 
    print('results', results) 
    print('Execution time {}s '.format(time.time() - script_start_time))

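Note how the arguments are passed when calling apply_async: as a single tuple, (a, b), not as separate positional arguments.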
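If the order of the results matters, one possible variant is to skip the callback, keep the AsyncResult object that each apply_async call returns, and call get() on them afterwards. The sketch below also passes b as a keyword argument to illustrate the kwargs support mentioned above. submit_all is a hypothetical helper, add is a simplified version of the function above, and the sleep is shortened.

```python
from multiprocessing import Pool
import time

def add(a, b=0):
    time.sleep(0.1)  # stand-in for real work
    return a + b

def submit_all(pool, pairs):
    # hypothetical helper: submit every job first, then collect the results
    # a goes in the positional-args tuple, b in the keyword-args dict
    handles = [pool.apply_async(add, (a,), {'b': b}) for a, b in pairs]
    # get() blocks per handle, so results come back in submission order
    return [handle.get(timeout=10) for handle in handles]

if __name__ == '__main__':
    pairs = [(a, b) for a in range(5) for b in range(5, 10)]
    with Pool(processes=5) as pool:
        results = submit_all(pool, pairs)
    print('results', results)
```

Because every job is submitted before the first get() call, the workers still run concurrently; only the collection step is sequential.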

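Alternatively, you can pass all the arguments in one go with plain map, but that requires your function to accept a single argument. This is where the starmap method (Python 3.3 and later) is useful. It takes an iterable of tuples and unpacks each tuple into the function's arguments, so pool.starmap(foo, [(a,b),(c,d),(e,f)]) unpacks each pair into foo, which takes two arguments: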

if __name__ == '__main__': 
    script_start_time = time.time() 
    pool = Pool(processes=5) 
    args = [(a,b) for a in "abc" for b in "ABC"] 
    print(pool.starmap(add, args)) #same add function from before (works with strings too) 
    pool.close() #necessary to prevent zombies 
    pool.join() #wait for all processes to finish 
    print('Execution time {}s '.format(time.time() - script_start_time))
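Where starmap is not available (Python 2, which the question uses, does not have it), plain map can still carry several arguments if a small module-level wrapper unpacks each tuple. This is only a sketch: add_packed is a hypothetical name, and add is a trimmed-down version of the function used above.

```python
from multiprocessing import Pool
import time

def add(a, b):
    time.sleep(0.1)  # stand-in for real work
    return a + b

def add_packed(pair):
    # hypothetical wrapper: map hands over one item, so unpack it here
    return add(*pair)

if __name__ == '__main__':
    args = [(a, b) for a in "abc" for b in "ABC"]
    pool = Pool(processes=5)
    print(pool.map(add_packed, args))  # a single blocking call with every input
    pool.close()
    pool.join()
```

The wrapper has to be defined at module level rather than as a lambda, because the pool pickles the function to send it to the worker processes.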