Python spawns an AWS CLI process for an S3 upload and it becomes very slow

My Python application creates a subprocess that runs an AWS CLI S3 upload:

command = 'aws s3 sync /tmp/tmp_dir s3://mybucket/tmp_dir' 
# spawn the process 
sp = subprocess.Popen(
    shlex.split(str(command)), 
    stdout=subprocess.PIPE, stderr=subprocess.PIPE) 
# wait for a while 
sp.wait() 
out, err = sp.communicate() 

if sp.returncode == 0: 
    logger.info("aws return code: %s", sp.returncode) 
    logger.info("aws cli stdout `{}`".format(out)) 
    return 

# handle error

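/tmp/tmp_dir is about 0.5 GB and contains roughly 100 files. The upload takes about 25 minutes, which is extremely slow.

If I run the AWS command directly (without Python), it takes less than 1 minute.

What is going wrong? Any help is appreciated.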

Source

2017-02-03 Andrii Skaliuk

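I noticed a warning in the documentation about the use of wait() (see below). However, rather than debugging it, why not rewrite this to use the Python SDK instead of shelling out to the aws cli? You would probably get better performance and cleaner code.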

https://boto3.readthedocs.io/en/latest/guide/s3.html
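As a rough idea of what the SDK route could look like, here is a minimal sketch. It assumes `client` is a boto3 S3 client (for example `boto3.client("s3")`); the helper names `iter_upload_pairs` and `upload_dir` are made up for illustration. Note that this is a plain recursive upload, not an equivalent of `aws s3 sync`: it uploads every file unconditionally and never skips unchanged files or deletes anything.

```python
import os


def iter_upload_pairs(local_dir, prefix):
    """Yield (local_path, s3_key) for every file under local_dir."""
    base = [p for p in [prefix.strip("/")] if p]
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir)
            # S3 keys use forward slashes regardless of the local OS.
            yield local_path, "/".join(base + rel.split(os.sep))


def upload_dir(client, local_dir, bucket, prefix):
    """Upload every file under local_dir and return the keys that were sent.

    `client` is assumed to expose upload_file(Filename, Bucket, Key),
    as a boto3 S3 client does.
    """
    uploaded = []
    for local_path, key in iter_upload_pairs(local_dir, prefix):
        client.upload_file(local_path, bucket, key)
        uploaded.append(key)
    return uploaded
```

With boto3 installed and credentials configured, a call would be along the lines of `upload_dir(boto3.client("s3"), "/tmp/tmp_dir", "mybucket", "tmp_dir")`.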

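Warning: This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.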

https://docs.python.org/2/library/subprocess.html
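To illustrate the warning, here is a minimal sketch of the Popen pattern with the separate wait() call dropped, so that communicate() alone drains both pipes and waits for the process to exit. The function name `run_command` is hypothetical, and whether this change alone accounts for the slowdown in the question is not something I can confirm.

```python
import subprocess


def run_command(args):
    """Run args, capture stdout/stderr, and return (returncode, out, err)."""
    sp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes until EOF and then waits for the process
    # to exit, so no separate wait() call is made before it. Calling wait()
    # first is what the documentation warns can block once a pipe fills up.
    out, err = sp.communicate()
    return sp.returncode, out, err
```

For the command in the question this would be called as `run_command(['aws', 's3', 'sync', '/tmp/tmp_dir', 's3://mybucket/tmp_dir'])`, followed by a check of the returned code.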

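EDIT3:

Here is a solution I just tested; it runs without blocking. There are convenience functions that call wait() or communicate() for you and are easier to use, such as check_output: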

#!/usr/bin/env python 
import subprocess 
from subprocess import CalledProcessError 

command = ['aws','s3','sync','/tmp/test-sync','s3://bucket-name/test-sync'] 
try: 
    result = subprocess.check_output(command) 
    print(result) 
except CalledProcessError as err: 
    # handle error, check err.returncode which is nonzero. 
    pass

Source

2017-02-03 20:42:25

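The Python SDK does not currently provide the same functionality. I am using 'sync'. The SDK might be better, but it would be far more time-consuming to implement. Can you provide a code example that avoids the pipe blocking? Thanks.

Hmm, yes, I see what you mean: sync (recursive copy of a directory) is not implemented. Here is a gist I found that may be useful: https://gist.github.com/SavvyGuard/6115006#file-botos3upload-py-L30

I also edited my answer to suggest a different way of using 'subprocess'.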
