Apache Beam/Flink ExceptionInChainedStubException

2017-08-09

I'm using Apache Beam 2.0.0 and the same version of the FlinkRunner (Scala 2.10). I'm testing against the in-process Flink master (default configuration) via the FlinkRunner dependency, which apparently pulls in Flink 1.2.1 at runtime (looking at the mvn dependency tree).
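For reference, a minimal sketch of how the FlinkRunner and the in-process master are selected via FlinkPipelineOptions (placeholder class name and pipeline contents, not the actual code):

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkRunnerSketch {
  public static void main(String[] args) {
    // Select the FlinkRunner and the embedded (in-process) Flink master.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setFlinkMaster("[local]"); // in-process / embedded Flink master

    Pipeline pipeline = Pipeline.create(options);
    // ... apply transforms here ...
    pipeline.run();
  }
}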

When a "user exception" occurs, what is the best way to find out what actually went wrong? This is not a question about what I did wrong this time; it is about how, in general, to get more information out of Beam or Flink. Here is the stack trace:

Exception in thread "main" java.lang.RuntimeException: Pipeline execution failed 
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:122) 
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295) 
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281) 
at com.mapfit.flow.data.environment.MFEnvironment.run(MFEnvironment.java:70) 
at com.mapfit.flow.main.Scratch.main(Scratch.java:35) 
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:910) 
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:853) 
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:853) 
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) 
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) 
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
Caused by: org.apache.beam.sdk.util.UserCodeException: org.apache.flink.runtime.operators.chaining.ExceptionInChainedStubException 
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36) 
at org.apache.beam.sdk.transforms.MapElements$1$auxiliary$PCieS8xh.invokeProcessElement(Unknown Source) 
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:197) 
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:158) 
at org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:65) 
at org.apache.beam.runners.flink.translation.functions.FlinkDoFnFunction.mapPartition(FlinkDoFnFunction.java:118) 
at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) 
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490) 
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355) 
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:665) 
at java.lang.Thread.run(Thread.java:745) 
Caused by: org.apache.flink.runtime.operators.chaining.ExceptionInChainedStubException 
at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:82) 
at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35) 
at org.apache.beam.runners.flink.translation.functions.FlinkDoFnFunction$MultiDoFnOutputManager.output(FlinkDoFnFunction.java:165) 
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnContext.outputWindowedValue(SimpleDoFnRunner.java:355) 
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:629) 
at org.apache.beam.sdk.transforms.MapElements$1.processElement(MapElements.java:122) 

Notice the complete absence of any code I wrote, other than my call to pipeline.run().

I downloaded the source for each jar in the chain and stepped into ChainedFlatMapDriver, where the exception is thrown at line 82, eventually finding an EOFException raised from a call into Java object serialization (my values use the default coder). I thought I had found my problem, but it turns out the EOFException originates at line 79 of SimpleCollectingOutputView, where it is thrown a lot and routinely swallowed, since it seems to be something Flink does as a matter of course.
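For reference, one way to at least narrow down which record triggers this is to catch, log, and rethrow inside the user function; it does not unwrap the ExceptionInChainedStubException, but it records the element that was in flight. A rough sketch with a hypothetical DoFn (not the actual transform from this pipeline):

import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical DoFn standing in for the MapElements step seen in the stack trace.
class LoggingMapFn extends DoFn<String, String> {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingMapFn.class);

  @ProcessElement
  public void processElement(ProcessContext c) {
    String element = c.element();
    try {
      c.output(transform(element)); // the actual user logic
    } catch (RuntimeException e) {
      // Record the element in flight before the exception propagates up
      // through Flink's chained-driver / UserCodeException wrapping.
      LOG.error("Failed while processing element: {}", element, e);
      throw e;
    }
  }

  private String transform(String in) {
    return in.toUpperCase(); // placeholder for real logic
  }
}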

Any pointers from anyone who knows how to get Flink to expose the failure information?

Just found more info after walking through more Flink code in the debugger:

java.lang.InterruptedException 
at java.lang.Object.wait(Native Method) 
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:168) 
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:138) 
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:131) 
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:88) 
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65) 
at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35) 
at org.apache.beam.runners.flink.translation.functions.FlinkMultiOutputPruningFunction.flatMap(FlinkMultiOutputPruningFunction.java:46) 
at org.apache.beam.runners.flink.translation.functions.FlinkMultiOutputPruningFunction.flatMap(FlinkMultiOutputPruningFunction.java:30) 
at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:80) 
at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35) 
at org.apache.beam.runners.flink.translation.functions.FlinkDoFnFunction$MultiDoFnOutputManager.output(FlinkDoFnFunction.java:165) 
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnContext.outputWindowedValue(SimpleDoFnRunner.java:355) 
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:629) 
at org.apache.beam.sdk.transforms.MapElements$1.processElement(MapElements.java:122) 
at org.apache.beam.sdk.transforms.MapElements$1$auxiliary$vuuNRtio.invokeProcessElement(Unknown Source) 
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:197) 
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:158) 
at org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.processElement(DoFnRunnerWithMetricsUpdate.java:65) 
at org.apache.beam.runners.flink.translation.functions.FlinkDoFnFunction.mapPartition(FlinkDoFnFunction.java:118) 
at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103) 
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490) 
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355) 
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:665) 
at java.lang.Thread.run(Thread.java:745) 

1 Answer

Take a look at these two links: EOFException related to memory segments during run of Beam pipeline on Flink

https://issues.apache.org/jira/browse/BEAM-2831

I have seen a similar exception before while running Beam with the FlinkRunner on YARN. The coder suggested on that issue page helped.
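Without restating the exact coder recommendation from that issue here, the general mechanism is to give the value type an explicit coder instead of relying on the default, Java-serialization-based fallback, either by annotating the type with @DefaultCoder or by calling setCoder on the PCollection. A minimal sketch with a made-up value type:

import java.io.Serializable;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;

public class ExplicitCoderSketch {

  // Hypothetical value type standing in for the pipeline's real one.
  public static class MyValue implements Serializable {
    public String payload;
    public MyValue() {}                        // no-arg constructor for AvroCoder
    public MyValue(String payload) { this.payload = payload; }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<MyValue> values = p
        .apply(Create.of("a", "b", "c"))
        .apply(MapElements.via(new SimpleFunction<String, MyValue>() {
          @Override
          public MyValue apply(String s) { return new MyValue(s); }
        }))
        // Pin an explicit coder on the output instead of letting the registry
        // fall back to its default coder for MyValue. Annotating MyValue with
        // @DefaultCoder(AvroCoder.class) is an equivalent alternative.
        .setCoder(AvroCoder.of(MyValue.class));

    p.run();
  }
}

AvroCoder is only one option; the point is that the coder for the value type is chosen deliberately rather than inferred.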

Apart from that, I would suggest using loggers extensively until your pipeline runs smoothly. On YARN the logs can be retrieved with the yarn logs command. I don't know about your setup (in-process Flink master), but you should be able to write some logs there too, I assume...


I will mark this as the accepted answer: I opened that issue (the link you mention) half a month before this, and as usual nobody was able to help; I then solved the problem in essentially the same way described in the link, and then neglected this ticket :) – mephicide