
I'm running a Spark job on a cluster. It runs for a few minutes and then fails reporting a container exception. I've tried increasing the executor and driver memory, but it made no difference; I hit the same exception every time. Can anyone help? The SparkContext is shut down with java.lang.NumberFormatException:

ERROR scheduler.DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting down SparkContext java.lang.NumberFormatException: For input string: "spark.locality.wait"

17/04/17 15:07:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT] 
17/04/17 15:07:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1492433648235_0024_000001 
17/04/17 15:07:57 INFO spark.SecurityManager: Changing view acls to: xwcedt,ubiadmin 
17/04/17 15:07:57 INFO spark.SecurityManager: Changing modify acls to: xwcedt,ubiadmin 
17/04/17 15:07:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xwcedt, ubiadmin); users with modify permissions: Set(xwcedt, ubiadmin) 
17/04/17 15:07:57 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread 
17/04/17 15:07:57 INFO yarn.ApplicationMaster: Waiting for spark context initialization 
17/04/17 15:07:57 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
17/04/17 15:07:57 INFO spark.SparkContext: Running Spark version 1.3.0 
17/04/17 15:07:57 INFO spark.SparkContext: Spark configuration: 
spark.akka.failure-detector.threshold=300.0 
spark.akka.frameSize=10 
spark.akka.heartbeat.interval=1000 
spark.akka.heartbeat.pauses=600 
spark.akka.threads=4 
spark.akka.timeout=100 
spark.app.name=LoadIngestFeedback 
spark.broadcast.blockSize=4096 
spark.broadcast.compress=true 
spark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory 
spark.closure.serializer=org.apache.spark.serializer.JavaSerializer 
spark.cores.max=1 
spark.default.parallelism=1 
spark.driver.extraClassPath=guava11-18overrides-0.0.1.jar 
spark.eventLog.dir=hdfs:///tmp/logs/spark/logs 
spark.eventLog.enabled=true 
spark.executor.extraClassPath=guava11-18overrides-0.0.1.jar 
spark.executor.heartbeatInterval=10000 
spark.executor.instances=2 
spark.executor.logs.rolling.maxRetainedFiles=5 
spark.executor.logs.rolling.time.interval=daily 
spark.executor.memory=2g 
spark.executor.userClassPathFirst=true 
spark.files.fetchTimeout=false 
spark.files.overwrite=false 
spark.hadoop.validateOutputSpecs=true 
spark.history.fs.logDirectory=hdfs:///tmp/logs/hadoop/logs 
spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec 
spark.io.compression.lz4.block.size=32768 
spark.io.compression.snappy.block.size=32768 
spark.kryo.referenceTracking=true 
spark.kryo.registrationRequired=false 
spark.kryoserializer.buffer.max.mb=64 
spark.kryoserializer.buffer.mb=0.064 
spark.localExecution.enabled=false 
spark.locality.wait=3000 
spark.locality.wait.node=spark.locality.wait 
spark.locality.wait.process=spark.locality.wait 
spark.locality.wait.rack=spark.locality.wait 
spark.logConf=true 
spark.master=yarn-cluster 
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS=ffhddb10qxdu.qa.oclc.org 
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES=http://ffhddb10qxdu.qa.oclc.org:8001/proxy/application_1492433648235_0024 
spark.port.maxRetries=16 
spark.rdd.compress=false 
spark.reducer.maxMbInFlight=48 
spark.scheduler.maxRegisteredResourcesWaitingTime=30000 
spark.scheduler.minRegisteredResourcesRatio=0 
spark.scheduler.mode=FIFO 
spark.scheduler.revive.interval=1000 
spark.serializer.objectStreamReset=100 
spark.shuffle.compress=true 
spark.shuffle.consolidateFiles=true 
spark.shuffle.file.buffer.kb=32 
spark.shuffle.manager=HASH 
spark.shuffle.memoryFraction=0.2 
spark.shuffle.sort.bypassMergeThreshold=200 
spark.shuffle.spill=true 
spark.shuffle.spill.compress=true 
spark.speculation=false 
spark.speculation.interval=100 
spark.speculation.multiplier=1.5 
spark.speculation.quantile=0.75 
spark.storage.memoryFraction=0.6 
spark.storage.memoryMapThreshold=8192 
spark.storage.unrollFraction=0.2 
spark.streaming.blockInterval=200 
spark.streaming.unpersist=true 
spark.task.cpus=1 
spark.task.maxFailures=4 
spark.ui.filters=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 
spark.ui.port=0 
spark.yarn.app.container.log.dir=/prod/higgins/2015-10-07_1/yarn_userlogs/application_1492433648235_0024/container_1492433648235_0024_01_000001 
spark.yarn.app.id=application_1492433648235_0024 
spark.yarn.historyServer.address=ffhddb02qxdu.qa.oclc.org:8070 
spark.yarn.secondary.jars=commons-charconverters-1.1.jar,commons-charset-1.0.3.jar,commons-csv-1.4.jar,elasticsearch-2.2.0.jar,groovy-all-1.8.6.jar,guava11-18overrides-0.0.1.jar,hppc-0.7.1.jar,ingest-batchload-schema-1.0.39.jar,ingest-message-1.0.20.jar,jaxb2-basics-runtime-0.9.4.jar,joda-time-2.9.4.jar,json-simple-1.1.jar,jsr166e-1.1.0.jar,lucene-core-5.4.1.jar,marc4j-2.17.jar,normalizer-2.6.jar,t-digest-3.0.jar 

17/04/17 15:07:59 INFO spark.SparkContext: Created broadcast 0 from textFile at FeedbackProcessor.java:105 
17/04/17 15:07:59 INFO storage.MemoryStore: ensureFreeSpace(283817) called with curMem=306693, maxMem=1030823608 
17/04/17 15:07:59 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 277.2 KB, free 982.5 MB) 
17/04/17 15:07:59 INFO storage.MemoryStore: ensureFreeSpace(22924) called with curMem=590510, maxMem=1030823608 
17/04/17 15:07:59 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 22.4 KB, free 982.5 MB) 
17/04/17 15:07:59 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ffhddb10qxdu.qa.oclc.org:48927 (size: 22.4 KB, free: 983.0 MB) 
17/04/17 15:07:59 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0 
17/04/17 15:07:59 INFO spark.SparkContext: Created broadcast 1 from textFile at FeedbackProcessor.java:110 
17/04/17 15:07:59 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library 
17/04/17 15:07:59 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2182b3bcef598d4fa76d3966fca47e80ed7bceb7] 
17/04/17 15:07:59 INFO mapred.FileInputFormat: Total input paths to process : 2 
17/04/17 15:07:59 INFO mapred.FileInputFormat: Total input paths to process : 2 
17/04/17 15:07:59 INFO spark.SparkContext: Starting job: saveAsNewAPIHadoopDataset at FeedbackProcessor.java:235 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Registering RDD 5 (mapToPair at FeedbackProcessor.java:163) 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Registering RDD 2 (mapToPair at FeedbackProcessor.java:139) 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Got job 0 (saveAsNewAPIHadoopDataset at FeedbackProcessor.java:235) with 1 output partitions (allowLocal=false) 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Final stage: Stage 2(saveAsNewAPIHadoopDataset at FeedbackProcessor.java:235) 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0, Stage 1) 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0, Stage 1) 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[5] at mapToPair at FeedbackProcessor.java:163), which has no missing parents 
17/04/17 15:07:59 INFO storage.MemoryStore: ensureFreeSpace(3440) called with curMem=613434, maxMem=1030823608 
17/04/17 15:07:59 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.4 KB, free 982.5 MB) 
17/04/17 15:07:59 INFO storage.MemoryStore: ensureFreeSpace(2193) called with curMem=616874, maxMem=1030823608 
17/04/17 15:07:59 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 982.5 MB) 
17/04/17 15:07:59 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on ffhddb10qxdu.qa.oclc.org:48927 (size: 2.1 KB, free: 983.0 MB) 
17/04/17 15:07:59 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0 
17/04/17 15:07:59 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[5] at mapToPair at FeedbackProcessor.java:163) 
17/04/17 15:07:59 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 2 tasks 
17/04/17 15:07:59 ERROR scheduler.DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting down SparkContext 
java.lang.NumberFormatException: For input string: "spark.locality.wait" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Long.parseLong(Long.java:441) 
    at java.lang.Long.parseLong(Long.java:483) 
    at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230) 
    at scala.collection.immutable.StringOps.toLong(StringOps.scala:31) 
    at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$getLocalityWait(TaskSetManager.scala:853) 
    at org.apache.spark.scheduler.TaskSetManager.computeValidLocalityLevels(TaskSetManager.scala:872) 
    at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:162) 
    at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:187) 
    at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:161) 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:872) 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780) 
    at scala.collection.immutable.List.foreach(List.scala:318) 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780) 
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) 
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
17/04/17 15:07:59 INFO cluster.YarnClusterScheduler: Cancelling stage 0 
17/04/17 15:07:59 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopDataset at FeedbackProcessor.java:235, took 0.075610 s 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 
17/04/17 15:07:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 
17/04/17 15:08:04 INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them. 
17/04/17 15:08:04 INFO yarn.ExecutorRunnable: Starting Executor Container 
17/04/17 15:08:04 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 500 
17/04/17 15:08:04 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 500 
17/04/17 15:08:04 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext 
17/04/17 15:08:04 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext 
17/04/17 15:08:04 INFO yarn.ExecutorRunnable: Preparing Local resources 
17/04/17 15:08:04 INFO yarn.ExecutorRunnable: Preparing Local resources 
17/04/17 15:08:08 ERROR cluster.YarnClusterScheduler: Lost executor 1 on ffhddb10qxdu.qa.oclc.org: remote Akka client disassociated 
17/04/17 15:08:09 INFO yarn.YarnAllocator: Completed container container_1492433648235_0024_01_000002 (state: COMPLETE, exit status: 1) 
17/04/17 15:08:09 INFO yarn.YarnAllocator: Container marked as failed: container_1492433648235_0024_01_000002. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1492433648235_0024_01_000002 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
    at org.apache.hadoop.util.Shell.run(Shell.java:455) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 


Container exited with a non-zero exit code 1 

17/04/17 15:08:09 INFO yarn.YarnAllocator: Completed container container_1492433648235_0024_01_000003 (state: COMPLETE, exit status: 1) 
17/04/17 15:08:09 INFO yarn.YarnAllocator: Container marked as failed: container_1492433648235_0024_01_000003. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1492433648235_0024_01_000003 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
    at org.apache.hadoop.util.Shell.run(Shell.java:455) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 


Container exited with a non-zero exit code 1 

17/04/17 15:08:14 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 2432 MB memory including 384 MB overhead 
17/04/17 15:08:14 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:2432, vCores:1>) 
17/04/17 15:08:14 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:2432, vCores:1>) 

These are the relevant YARN scheduler limits from yarn-site.xml:

<property> 
<name>yarn.scheduler.maximum-allocation-mb</name> 
<value>5120</value> 
<source>yarn-site.xml</source> 
</property> 


<property> 
<name>yarn.scheduler.minimum-allocation-mb</name> 
<value>1024</value> 
<source>yarn-site.xml</source> 
</property> 
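
For context, the memory arithmetic checks out against these limits (the 384 MB overhead figure comes straight from the allocator log above):

    2048 MB (spark.executor.memory=2g) + 384 MB overhead = 2432 MB per container
    1024 MB (minimum-allocation-mb) <= 2432 MB <= 5120 MB (maximum-allocation-mb)

so the container requests fit comfortably within YARN's limits, and the repeated failures are unlikely to be a memory-allocation problem.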

Answers


I can't verify this, but the problem appears to be this part of your configuration:

spark.locality.wait=3000 
spark.locality.wait.node=spark.locality.wait 
spark.locality.wait.process=spark.locality.wait 
spark.locality.wait.rack=spark.locality.wait 

A properties file is not code: you can't use the name of one property (spark.locality.wait) as the value of another (such as spark.locality.wait.node) and expect the first property's value to be substituted for it.
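
The stack trace shows why: Spark 1.3 reads the raw string value of spark.locality.wait.node and parses it with Scala's toLong, which delegates to Long.parseLong. A minimal standalone sketch (the class name is hypothetical) reproduces the exact exception from your log:

    // Hypothetical reproduction of what TaskSetManager.getLocalityWait ends up doing:
    // it parses the raw property value as a number of milliseconds.
    public class LocalityWaitParseDemo {
        public static void main(String[] args) {
            // The literal value from the configuration above:
            String configuredValue = "spark.locality.wait";
            // Throws java.lang.NumberFormatException:
            //   For input string: "spark.locality.wait"
            long waitMillis = Long.parseLong(configuredValue);
            System.out.println(waitMillis);
        }
    }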

You can fix this by removing the last three lines quoted above. As the documentation notes, those three properties default to the value of spark.locality.wait, so simply omitting them from your configuration should give you the behavior you want.
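
For reference, a corrected version of that block keeps only the base property (a sketch; the per-level value below is hypothetical and only needed if you want a wait that differs from the default):

    # .process/.node/.rack fall back to this value automatically:
    spark.locality.wait=3000
    # Only set the per-level properties if you need different numeric values, e.g.:
    # spark.locality.wait.node=6000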


Thanks @Zohar, that worked for me :) – Neethu
