2017-10-05

I have been trying to run a sample job with Spark 2.0.0 in YARN cluster mode. The job fails with exitCode: -1000 and no other clue, while the same job runs fine in local mode.

The spark-submit command:

spark-submit \ 
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \ 
--queue xyz \ 
--class com.xyz.TestJob \ 
--master yarn \ 
--deploy-mode cluster \ 
--conf "spark.local.dir=/xyz/warehouse/tmp" \ 
/xyzpath/java-test-1.0-SNAPSHOT.jar [email protected] 

The TestJob class:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TestJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        JavaSparkContext jsc = new JavaSparkContext(conf);
        System.out.println(
            "Total count: " +
                jsc.parallelize(Arrays.asList(1, 2, 3, 4)).count());
        jsc.stop();
    }
}

Error log:

17/10/04 22:26:52 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED) 
17/10/04 22:26:52 INFO Client: 
     client token: N/A 
     diagnostics: N/A 
     ApplicationMaster host: N/A 
     ApplicationMaster RPC port: -1 
     queue: root.xyz 
     start time: 1507181210893 
     final status: UNDEFINED 
     tracking URL: http://xyzserver:8088/proxy/application_1506717704791_130756/ 
     user: xyz 
17/10/04 22:26:53 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED) 
17/10/04 22:26:54 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED) 
17/10/04 22:26:55 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED) 
17/10/04 22:26:56 INFO Client: Application report for application_1506717704791_130756 (state: FAILED) 
17/10/04 22:26:56 INFO Client: 
     client token: N/A 
     diagnostics: Application application_1506717704791_130756 failed 5 times due to AM Container for appattempt_1506717704791_130756_000005 exited with exitCode: -1000 
For more detailed output, check application tracking page:http://xyzserver:8088/cluster/app/application_1506717704791_130756Then, click on links to logs of each attempt. 
Diagnostics: Failing this attempt. Failing the application. 
     ApplicationMaster host: N/A 
     ApplicationMaster RPC port: -1 
     queue: root.xyz 
     start time: 1507181210893 
     final status: FAILED 
     tracking URL: http://xyzserver:8088/cluster/app/application_1506717704791_130756 
     user: xyz 
17/10/04 22:26:56 INFO Client: Deleted staging directory /xyz/spark/.sparkStaging/application_1506717704791_130756 
Exception in thread "main" org.apache.spark.SparkException: Application application_1506717704791_130756 finished with failed status 
     at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167) 
     at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213) 

When I browse the page http://xyzserver:8088/cluster/app/application_1506717704791_130756, it does not exist.

Nothing was found in the YARN application logs either:

$yarn logs -applicationId application_1506717704791_130756 
/apps/yarn/logs/xyz/logs/application_1506717704791_130756 does not have any log files. 
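The empty result is consistent with exit code -1000: YARN reports it when a container fails during localization, i.e. before any container process is launched, so there are no per-container logs for aggregation to collect. The sketch below notes where the diagnostics typically end up instead; the daemon log paths are common defaults, not confirmed for this cluster.

```shell
# Sketch: with no containers started, application log aggregation is empty.
# The localization error is usually recorded by the YARN daemons, e.g.:
#   yarn application -status application_1506717704791_130756
#   grep application_1506717704791_130756 /var/log/hadoop-yarn/*nodemanager*.log
app_id="application_1506717704791_130756"
msg="no aggregated logs for ${app_id}: containers never ran"
echo "${msg}"
```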

What could be the root cause of this error, and how can I get a detailed error log?


Your application never started running. Most likely it is a misconfiguration of Spark on YARN. Have you gone through this: https://spark.apache.org/docs/latest/running-on-yarn.html? – philantrovert


The problem was with one configuration parameter; when I removed it, the job started working. Thanks for your comment, by the way. –

Answer


After spending almost a whole day on this, I found the root cause: when I removed spark.yarn.stagingDir, the job started working. I still do not know why Spark complains about it.

Previous spark-submit:

spark-submit \ 
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \ 
--queue xyz \ 
--class com.xyz.TestJob \ 
--master yarn \ 
--deploy-mode cluster \ 
--conf "spark.local.dir=/xyz/warehouse/tmp" \ 
/xyzpath/java-test-1.0-SNAPSHOT.jar [email protected] 

New spark-submit:

spark-submit \ 
--queue xyz \ 
--class com.xyz.TestJob \ 
--master yarn \ 
--deploy-mode cluster \ 
--conf "spark.local.dir=/xyz/warehouse/tmp" \ 
/xyzpath/java-test-1.0-SNAPSHOT.jar [email protected]
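A plausible explanation (an assumption, not confirmed by the logs above) is that /xyz/warehouse/spark either did not exist on the cluster's default filesystem or was not writable by the submitting user, so localization of the application jar failed with -1000. When spark.yarn.stagingDir is unset, Spark stages under the submitting user's home directory, which is usually writable already. The checks below are a sketch using the question's placeholder paths and user; the hdfs commands require a cluster and are shown as comments.

```shell
# Sketch: cluster-side checks one might run before setting spark.yarn.stagingDir
# (placeholder paths from the question, assuming HDFS is the default filesystem):
#   hdfs dfs -ls /xyz/warehouse/spark        # does the staging dir exist?
#   hdfs dfs -mkdir -p /xyz/warehouse/spark  # create it if missing
#   hdfs dfs -chown xyz /xyz/warehouse/spark # submitting user must be able to write
#
# With the setting removed, staging falls back to the user's home directory:
user="xyz"
fallback="/user/${user}/.sparkStaging"
echo "fallback staging dir: ${fallback}"
```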