2016-08-17

I have a cluster of 1 master and 6 slaves running the prebuilt versions of hadoop 2.6.0 and spark 1.6.2. I was running hadoop MR and spark jobs without any problem with openjdk 7 installed on all the nodes. However, when I upgraded openjdk 7 to openjdk 8 on all the nodes, spark-submit and spark-shell throw the error below. Spark running on YARN does not work with Java 8.

16/08/17 14:06:22 ERROR client.TransportClient: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts 
org.apache.spark.SparkException: Exception thrown in awaitResult 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcV$sp(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) 
     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
     at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) 
     at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) 
     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
     ... 1 more 
Caused by: java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 ERROR spark.SparkContext: Error initializing SparkContext. 
java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Traceback (most recent call last): 
    File "/home/hd_spark/spark2/python/pyspark/shell.py", line 49, in <module> 
    spark = SparkSession.builder.getOrCreate() 
    File "/home/hd_spark/spark2/python/pyspark/sql/session.py", line 169, in getOrCreate 
    sc = SparkContext.getOrCreate(sparkConf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 294, in getOrCreate 
    SparkContext(conf=conf or SparkConf()) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 115, in __init__ 
    conf, jsc, profiler_cls) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 168, in _do_init 
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 233, in _initialize_context 
    return self._jvm.JavaSparkContext(jconf) 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__ 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 

I have exported JAVA_HOME in .bashrc and set openjdk 8 as the default java using:

sudo update-alternatives --config java 
sudo update-alternatives --config javac 

I have also tried Oracle Java 8, and the same error occurs. The container logs on the slave nodes contain the same error, shown below.

SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/filecache/17/__spark_libs__8247267244939901627.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
16/08/17 14:05:11 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: [email protected] 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for TERM 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for HUP 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for INT 
16/08/17 14:05:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 78 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 1 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/appcache/application_1471352972661_0005/blockmgr-d9f23a56-1420-4cd4-abfd-ae9e128c688c 
16/08/17 14:05:12 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 
16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417 
16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM 
16/08/17 14:05:13 INFO storage.DiskBlockManager: Shutdown hook called 
16/08/17 14:05:13 INFO util.ShutdownHookManager: Shutdown hook called 

I have tried the spark 1.6.2 prebuilt version and the spark 2.0 prebuilt version, and I also tried spark 2.0 built from source myself.

Hadoop jobs still work fine even after upgrading to java 8. Spark works fine when I switch back to java 7.

My scala version is 2.11 and the OS is Ubuntu 14.04.4 LTS.

It would be great if anyone could give me an idea on how to solve this problem.

Thanks!

p.s. I changed my IP addresses to xxx.xxx.xxx.xx in the logs.


Looks like the worker tried to connect to the driver but failed: '16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417 16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM'. What do the driver logs say? –


Where can I find the driver logs? I found the worker node logs in the hadoop/logs/userlog directory, but I can't find any driver-related logs on the master node. The spark/logs directory only contains history server logs, and hadoop/logs/userlog on the master node is empty. Thanks! – jmoa


http://spark.apache.org/docs/latest/running-on-yarn.html –

Answer


As of September 12, 2016, this is a blocker issue: https://issues.apache.org/jira/browse/YARN-4714

Java 8 uses noticeably more virtual memory (metaspace, thread stacks) than Java 7, which trips YARN's container memory checks and makes the NodeManager kill the executor containers (the SIGTERM seen in the container log above). You can work around it by disabling those checks in yarn-site.xml:

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name> 
    <value>false</value> 
</property> 

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property> 
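Disabling both checks works, but it removes a safety net for every application on the cluster. A softer alternative (my own suggestion, not part of the accepted answer) is to keep the checks enabled and give each YARN container more off-heap headroom instead, since the extra memory Java 8 needs is the root cause. In Spark 1.6/2.0 the relevant setting is `spark.yarn.executor.memoryOverhead` (in MB), e.g. in spark-defaults.conf:

```
# Illustrative values; tune to your executor sizes.
# Extra off-heap memory (MB) YARN reserves per executor container.
spark.yarn.executor.memoryOverhead   1024
# Same headroom for the driver container in yarn-cluster mode.
spark.yarn.driver.memoryOverhead     1024
```

Raising `yarn.nodemanager.vmem-pmem-ratio` in yarn-site.xml is another knob on the virtual-memory side of the same problem.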

Thanks for the reply! I've recently gone back to Java 7, but I'll try this and comment on whether it works. – jmoa


@jmoa Any luck? – simpleJack


This worked perfectly for me. –