2017-08-09

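Executing a template multiple times with BigQuery as the sink

For BigQuery batch pipelines, a template can only be executed once, because the BigQuery job ID is set when the template is created. I am using Apache Beam v2.0.0 and cannot execute the template more than once. Can we use Beam at HEAD to get around this limitation? If so, the first thing I would like to know is: what is Beam at HEAD? And what exact changes does my Apache Beam program need in order to support executing the template multiple times?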

Maven dependencies:

<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-jms</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-examples-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-examples-java8</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-common-fn-api</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-build-tools</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-core</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-join-library</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-protobuf</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-sorter</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-amqp</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-jdbc</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-kafka</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-kinesis</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-mongodb</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-mqtt</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-solr</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-core-construction-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-core-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-direct-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-common-runner-api</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 

Answers


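This is issue BEAM-2058. If you use the latest code from the Beam GitHub repository, it should be fixed. Apart from building the new version of Beam and updating your pom.xml to use it, you do not need to do anything.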

Alternatively, wait for the Beam 2.1.0 release, which is currently being prepared.


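Thanks for the reply. I have updated my pom.xml based on the pom.xml in the GitHub repository, but I still face the same issue. The GitHub repository has several code components spread across different folders. Could you tell me the exact path of the latest code so that I can use it in my Dataflow program? Also, which snippets from the GitHub code do I need to add to my Dataflow program to fix the template execution problem? Sorry, I am new to GitHub. Could you help me with this?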


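You should not modify your code based on what is in GitHub. Instead, clone the GitHub repository and install it with Maven. That makes the 2.2.0-SNAPSHOT version available to your code. After that, you only need to update your pom.xml to reference the 2.2.0-SNAPSHOT version of Beam that you built, and then use it.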


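The [Beam contribution guide](https://beam.apache.org/contribute/contribution-guide/#one-time-setup) also describes the steps required to build Beam from the GitHub repository, which may help with installing the latest version. Once that is done, you do not need to change anything in your project other than using the new version you built.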
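
The pom.xml update mentioned in the comments above can be sketched as follows. This is a minimal illustration, not the asker's actual file. It assumes the cloned Beam repository has already been installed into the local Maven repository (for example with `mvn clean install`). The property name `beam.version` is a hypothetical choice, and the three artifacts shown are an assumption about what a Dataflow pipeline writing to BigQuery typically needs, taken from the dependency list in the question.

```xml
<!-- Sketch only: "beam.version" is an illustrative property name, not something Beam requires. -->
<properties>
    <!-- Version produced by building the cloned Beam repository locally. -->
    <beam.version>2.2.0-SNAPSHOT</beam.version>
</properties>

<dependencies>
    <!-- Core SDK. -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>${beam.version}</version>
    </dependency>
    <!-- Google Cloud I/O connectors, including the BigQuery sink. -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>${beam.version}</version>
    </dependency>
    <!-- Dataflow runner, used to create and run templates. -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
        <version>${beam.version}</version>
    </dependency>
</dependencies>
```

Keeping the version in a single property means that switching to a released version later, such as the 2.1.0 release mentioned in the answer, is a one-line change.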