Yes, this is possible.
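- Start your EMR cluster.
- Start TaskRunner with the option --workerGroup=name-of-the-worker-group
- In your pipeline's activity, do not specify the runsOn parameter; pass your worker group in the workerGroup field instead of pointing at the master instance.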
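As a rough sketch of the second step, the TaskRunner command line could be assembled like this. The jar name `TaskRunner-1.0.jar` and the `--config`, `--region` and `--logUri` options are assumptions based on how the AWS documentation usually shows the invocation; the helper `build_task_runner_command` is hypothetical, and only `--workerGroup` comes from the steps above.

```python
import shlex


def build_task_runner_command(worker_group, jar_path="TaskRunner-1.0.jar",
                              config_path=None, region=None, log_uri=None):
    """Build the argument list for starting TaskRunner on the cluster.

    Only --workerGroup is taken from the answer; the jar name and the
    remaining options are assumptions and may differ in your setup.
    """
    if not worker_group:
        raise ValueError("worker_group must be a non-empty string")

    command = ["java", "-jar", jar_path, f"--workerGroup={worker_group}"]
    if config_path:
        # Assumed option: path to a credentials file.
        command += ["--config", config_path]
    if region:
        # Assumed option: region of the pipeline.
        command.append(f"--region={region}")
    if log_uri:
        # Assumed option: S3 location for TaskRunner logs.
        command.append(f"--logUri={log_uri}")
    return command


command = build_task_runner_command("name-of-the-worker-group")
# Print a shell-safe form that can be pasted on the EMR node.
print(shlex.join(command))
```

The value passed to `--workerGroup` has to match the `workerGroup` field of the activity exactly, otherwise TaskRunner will not pick up the work.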
Here is an example of an activity with this parameter, defined using CloudFormation:
...
{
  "Id": "S3ToRedshiftCopyActivity",
  "Name": "S3ToRedshiftCopyActivity",
  "Fields": [
    {
      "Key": "type",
      "StringValue": "RedshiftCopyActivity"
    },
    {
      "Key": "workerGroup",
      "StringValue": "name-of-the-worker-group"
    },
    {
      "Key": "insertMode",
      "StringValue": "#{myInsertMode}"
    },
    {
      "Key": "commandOptions",
      "StringValue": "FORMAT CSV"
    },
    {
      "Key": "dependsOn",
      "RefValue": "RedshiftTableCreateActivity"
    },
    {
      "Key": "input",
      "RefValue": "S3StagingDataNode"
    },
    {
      "Key": "output",
      "RefValue": "DestRedshiftTable"
    }
  ]
}
...
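If you generate the template programmatically, a small helper can keep the activity consistent with the steps above. This is only a sketch: `build_activity` and `to_fields` are hypothetical names, and the only rules it enforces are the ones stated in this answer (set workerGroup, leave runsOn out).

```python
import json


def to_fields(string_values, ref_values):
    """Turn plain dicts into Key/StringValue and Key/RefValue entries."""
    fields = [{"Key": k, "StringValue": v} for k, v in string_values.items()]
    fields += [{"Key": k, "RefValue": v} for k, v in ref_values.items()]
    return fields


def build_activity(activity_id, worker_group, string_values, ref_values):
    """Build a pipeline object that is routed to a worker group.

    Rejects runsOn, because the activity should be picked up by the
    TaskRunner started with the matching --workerGroup option.
    """
    if "runsOn" in string_values or "runsOn" in ref_values:
        raise ValueError("do not set runsOn when using workerGroup")
    strings = {**string_values, "workerGroup": worker_group}
    return {
        "Id": activity_id,
        "Name": activity_id,
        "Fields": to_fields(strings, ref_values),
    }


activity = build_activity(
    "S3ToRedshiftCopyActivity",
    "name-of-the-worker-group",
    {
        "type": "RedshiftCopyActivity",
        "insertMode": "#{myInsertMode}",
        "commandOptions": "FORMAT CSV",
    },
    {
        "dependsOn": "RedshiftTableCreateActivity",
        "input": "S3StagingDataNode",
        "output": "DestRedshiftTable",
    },
)
# Emit JSON that can be pasted into the template's list of pipeline objects.
print(json.dumps(activity, indent=2))
```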
You can find detailed documentation on how to do this here: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-task-runner-user-managed.html