2012-03-16 75 views
10

Parallel Python subprocesses

I want to run several processes in parallel and be able to capture their stdout at any time. How should I do that? Do I need to run a thread for each subprocess.Popen() call, or something else?

+0

Possible duplicate of [How to run several executables using Python?](http://stackoverflow.com/questions/9724499/how-to-run-several-executable-using-python) – 2012-03-17 01:28:07

+0

Related: here is how to [run multiple shell commands (and capture their output) at the same time](http://stackoverflow.com/a/23616229/4279) – jfs 2014-07-26 14:16:55

Answers

13

You can do it in a single thread.

Suppose you have a script that prints lines at random times:

#!/usr/bin/env python 
#file: child.py 
import os 
import random 
import sys 
import time 

for i in range(10): 
    print("%2d %s %s" % (int(sys.argv[1]), os.getpid(), i)) 
    sys.stdout.flush() 
    time.sleep(random.random()) 

To collect the output as soon as it becomes available, you could use select on POSIX systems, as @zigg suggested:

#!/usr/bin/env python
from __future__ import print_function
from select import select
from subprocess import Popen, PIPE

# start several subprocesses
processes = [Popen(['./child.py', str(i)], stdout=PIPE,
                   bufsize=1, close_fds=True,
                   universal_newlines=True)
             for i in range(5)]

# read output
timeout = 0.1  # seconds
while processes:
    # remove finished processes from the list (O(N**2))
    for p in processes[:]:
        if p.poll() is not None:  # process ended
            print(p.stdout.read(), end='')  # read the rest
            p.stdout.close()
            processes.remove(p)

    # wait until there is something to read
    rlist = select([p.stdout for p in processes], [], [], timeout)[0]

    # read a line from each process that has output ready
    for f in rlist:
        print(f.readline(), end='')  # NOTE: it can block

A more portable solution (that works on Windows, Linux, and OSX) is to use a reader thread for each process; see Non-blocking read on a subprocess.PIPE in python.
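A minimal sketch of that reader-thread approach (my own illustration, not code from the linked answer; it uses an inline one-liner in place of child.py so it is self-contained):

```python
#!/usr/bin/env python
from __future__ import print_function
import sys
from subprocess import Popen, PIPE
from threading import Thread
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

# inline stand-in for child.py: prints three numbered lines
CHILD = ("import sys\n"
         "for i in range(3):\n"
         "    print('%s %d' % (sys.argv[1], i))\n"
         "    sys.stdout.flush()\n")

def reader(pipe, queue):
    # drain one child's stdout into the shared queue
    try:
        for line in iter(pipe.readline, b''):
            queue.put(line)
    finally:
        pipe.close()

processes = [Popen([sys.executable, '-c', CHILD, str(i)], stdout=PIPE)
             for i in range(3)]
q = Queue()
threads = [Thread(target=reader, args=(p.stdout, q)) for p in processes]
for t in threads:
    t.daemon = True  # don't let reader threads keep the interpreter alive
    t.start()
for t in threads:
    t.join()  # once every reader finishes, all output is in the queue

lines = []
while not q.empty():
    lines.append(q.get().decode())
print(''.join(lines), end='')
```

Because each pipe has a dedicated thread blocked in readline(), no select() is needed, and the same code runs on Windows.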

Here's an os.pipe()-based solution that works on both Unix and Windows:

#!/usr/bin/env python
from __future__ import print_function
import io
import os
import sys
from subprocess import Popen

ON_POSIX = 'posix' in sys.builtin_module_names

# create a pipe to get data
input_fd, output_fd = os.pipe()

# start several subprocesses
processes = [Popen([sys.executable, 'child.py', str(i)], stdout=output_fd,
                   close_fds=ON_POSIX)  # close input_fd in children
             for i in range(5)]
os.close(output_fd)  # close unused end of the pipe

# read output line by line as soon as it is available
with io.open(input_fd, 'r', buffering=1) as file:
    for line in file:
        print(line, end='')

for p in processes:
    p.wait()
+2

You seem to multiplex all the children's stdout onto a single fd (output_fd) in the last solution. Won't that garble the output if two children print at the same time (e.g. 'AAA\n' + 'BBB\n' -> 'ABBB\nAA\n')? – dan3 2013-11-15 07:09:43

+1

@dan3: It's a valid concern. write()s of less than PIPE_BUF bytes are atomic; otherwise data from multiple processes may be interleaved. POSIX requires PIPE_BUF to be at least 512 bytes; on Linux, PIPE_BUF is 4096 bytes. – jfs 2013-11-15 19:55:53
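Since Python 3.2 that limit is exposed on POSIX systems as select.PIPE_BUF, so a program can check it at runtime (a small sketch of mine, not part of the original comment):

```python
import select

# writes of at most PIPE_BUF bytes to a pipe are atomic (POSIX);
# larger writes may be interleaved with output from other writers
print(select.PIPE_BUF)
assert select.PIPE_BUF >= 512  # minimum value required by POSIX
```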

+0

Here's a similar question I posted recently: http://stackoverflow.com/questions/36624056/running-a-secondary-script-in-a-new-terminal It would be great if you could help; thanks in any case. – 2016-04-14 14:42:16

4

You don't need to run a thread for each process. You can peek at each process's stdout stream without blocking on it, and only read from it if it has data available to read.

You do have to be careful not to block on them accidentally, though, if you don't intend to.
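One way to avoid blocking on POSIX is to switch the pipe's file descriptor into non-blocking mode with fcntl, so reads return immediately instead of waiting (a sketch of mine, not code from this answer; fcntl is not available on Windows):

```python
import fcntl
import os
import sys
from subprocess import Popen, PIPE

p = Popen([sys.executable, '-c', "print('hello')"], stdout=PIPE)

# put the pipe's fd into non-blocking mode (POSIX only)
fd = p.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

p.wait()                # the child has exited, so its output sits in the pipe
data = p.stdout.read()  # returns immediately with whatever is buffered
print(data)
```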

+0

I did 'p = subprocess.Popen(...)' and then 'print p.communicate()[0]' several times. But 'communicate()' waits until the process terminates. – sashab 2012-03-16 20:26:43

+1

Yes, which is why you can't use 'communicate()' if you want to use a single thread. There are other ways to get stdout besides 'communicate()'. – Amber 2012-03-16 20:27:26

+2

You may want to look at the [select](http://docs.python.org/library/select.html) module to wait on multiple subprocesses at once. – zigg 2012-03-16 20:28:55

6

You could also collect stdout from multiple subprocesses concurrently using twisted:

#!/usr/bin/env python
import sys
from twisted.internet import protocol, reactor

class ProcessProtocol(protocol.ProcessProtocol):
    def outReceived(self, data):
        print data,  # received chunk of stdout from child

    def processEnded(self, status):
        global nprocesses
        nprocesses -= 1
        if nprocesses == 0:  # all processes ended
            reactor.stop()

# start subprocesses
nprocesses = 5
for i in xrange(nprocesses):
    reactor.spawnProcess(ProcessProtocol(), sys.executable,
                         args=[sys.executable, 'child.py', str(i)],
                         usePTY=True)  # can change how child buffers stdout
reactor.run()

Using Processes in Twisted