2016-08-15 29 views

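Looking for a boto3 Python example of injecting an AWS Pig step into an already running EMR?

I'm looking for a good boto3 example that works against an already running AWS EMR cluster, into which I'd like to inject a Pig step. Previously I used boto 2.42: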

from boto.emr.connection import EmrConnection 
from boto.emr.step import InstallPigStep, PigStep 

AWS_ACCESS_KEY = '' # REQUIRED
AWS_SECRET_KEY = '' # REQUIRED
conn = EmrConnection(AWS_ACCESS_KEY, AWS_SECRET_KEY)

# loop next element on bucket_compare list 

pig_file = 's3://elasticmapreduce/samples/pig-apache/do-reports2.pig' 
INPUT = 's3://elasticmapreduce/samples/pig-apache/input/access_log_1' 
OUTPUT = '' # REQUIRED, S3 bucket for job output 

pig_args = ['-p', 'INPUT=%s' % INPUT, 
      '-p', 'OUTPUT=%s' % OUTPUT] 
pig_step = PigStep('Process Reports', pig_file, pig_args=pig_args) 
steps = [InstallPigStep(), pig_step] 

conn.run_jobflow(name='prs-dev-test', steps=steps, 
      hadoop_version='2.7.2-amzn-2', ami_version='latest', 
      num_instances=2, keep_alive=False) 

The main problem now is that boto3 has neither `from boto.emr.connection import EmrConnection` nor `from boto.emr.step import InstallPigStep, PigStep`, and I can't find an equivalent set of modules.

Answers


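After some checking, I found a very simple way to inject Pig script commands from inside Python using awscli and the subprocess module. You can import awscli and subprocess, then wrap the desired Pig step in a command and inject it into the already running EMR cluster with: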

import awscli 
import subprocess 


cmd='aws emr add-steps --cluster-id j-GU07FE0VTHNG --steps Type=PIG,Name="AggPigProgram",ActionOnFailure=CONTINUE,Args=[-f,s3://dev-end2end-test/pig_scripts/AggRuleBag.pig,-p,INPUT=s3://dev-end2end-test/input_location,-p,OUTPUT=s3://end2end-test/output_location]' 

push = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
out, _ = push.communicate()  # wait for the CLI to finish; returncode is None until then
print(push.returncode)

Of course, you have to find your JobFlowID first, using something like:

aws emr list-clusters --active 

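run through the same subprocess and push command as above. Naturally, you can add monitoring to your heart's content instead of just a print statement.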
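For the cluster lookup and the monitoring part, here is a rough boto3 sketch rather than a tested recipe. It assumes the EMR client's `list_clusters` paginator and `describe_step` call behave as documented; the helper names `active_clusters` and `wait_for_step`, the chosen list of "active" states, and the 30 second poll interval are my own placeholders, not anything from the code above.

```python
import time


def active_clusters(emr_client):
    """Return (cluster_id, name) pairs for clusters that are still up.

    The state list is an assumption about what counts as "active".
    """
    states = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]
    paginator = emr_client.get_paginator("list_clusters")
    found = []
    for page in paginator.paginate(ClusterStates=states):
        for cluster in page.get("Clusters", []):
            found.append((cluster["Id"], cluster["Name"]))
    return found


def wait_for_step(emr_client, cluster_id, step_id,
                  poll_seconds=30, sleep=time.sleep):
    """Poll describe_step until the step reaches a terminal state."""
    terminal = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}
    while True:
        response = emr_client.describe_step(ClusterId=cluster_id,
                                            StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in terminal:
            return state
        sleep(poll_seconds)
```

With credentials and a region configured, `boto3.client("emr")` would be the object passed in as `emr_client`. The `sleep` parameter is only there so the polling loop can be exercised without real waiting.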


Can't this be done with boto3? – hitrix
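It probably can. Below is a minimal sketch, not a verified answer, of adding a Pig step to a running cluster through boto3's `add_job_flow_steps`. It assumes an EMR release (4.x or later) where `command-runner.jar` is available and Pig is already installed on the cluster, and it assumes the `pig-script --run-pig-script --args` argument layout, which as far as I can tell is what the AWS CLI generates for `Type=PIG`. The helper names `build_pig_step` and `add_pig_step` are made up for illustration.

```python
def build_pig_step(name, script, params, action_on_failure="CONTINUE"):
    """Build the step definition that add_job_flow_steps expects.

    params is a list of (key, value) pairs passed to Pig as -p key=value.
    The command-runner.jar argument layout is an assumption (see above).
    """
    args = ["pig-script", "--run-pig-script", "--args", "-f", script]
    for key, value in params:
        args += ["-p", "%s=%s" % (key, value)]
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def add_pig_step(emr_client, cluster_id, step):
    """Submit one step to a running cluster and return its step id."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id,
                                             Steps=[step])
    return response["StepIds"][0]
```

Usage would look something like `add_pig_step(boto3.client("emr"), "j-GU07FE0VTHNG", build_pig_step("AggPigProgram", "s3://dev-end2end-test/pig_scripts/AggRuleBag.pig", [("INPUT", "s3://dev-end2end-test/input_location"), ("OUTPUT", "s3://end2end-test/output_location")]))`, reusing the cluster id and S3 paths from the answer above. Keeping the step definition separate from the API call makes it easy to inspect the arguments before anything is sent to AWS.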
