2017-04-19

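Hortonworks NodeManager starts but fails: connection refused: 8042

I'm trying to troubleshoot a newly added datanode on our Hortonworks cluster. The node's YARN NodeManager fails shortly after it starts. The following error message is logged: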

Connection failed to http://(ipaddress):8042/ws/v1/node/info (Traceback (most recent call last): 
    File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 166, in execute 
    connection_timeout=curl_connection_timeout, kinit_timer_ms = kinit_timer_ms) 
    File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/curl_krb_request.py", line 198, in curl_krb_request 
    _, curl_stdout, curl_stderr = get_user_call_output(curl_command, user=user, env=kerberos_env) 
    File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output 
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1]) 
ExecutionFailed: Execution of 'curl --location-trusted -k --negotiate -u : -b /var/lib/ambari-agent/tmp/cookies/4268dd36-9f72-4be0-8d82-5f0a124a3a72 -c /var/lib/ambari-agent/tmp/cookies/4268dd36-9f72-4be0-8d82-5f0a124a3a72 http://gdcdrwhdb821.dir.ucb-group.com:8042/ws/v1/node/info --connect-timeout 5 --max-time 7 1>/tmp/tmp7pZrbM 2>/tmp/tmpgM4wdg' returned 7. % Total % Received % Xferd Average Speed Time Time  Time Current 
           Dload Upload Total Spent Left Speed 
    0  0 0  0 0  0  0  0 --:--:-- --:--:-- --:--:--  0curl: (7) Failed connect to (ipaddress):8042; Connection refused 
) 

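This doesn't really tell me why the connection is refused, though, only that whatever YARN process should be listening on port 8042 isn't running. The following returns nothing: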

netstat -tulpn | grep 8042 
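
The same check can be run from another machine, which helps separate "nothing is listening" from a firewall or routing problem. This is only a minimal sketch: the function name `port_is_open` is made up for illustration, and "localhost" is a placeholder for the NodeManager host.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the NodeManager host.
    print(port_is_open("localhost", 8042))
```

A result of False from the node itself matches the curl exit code 7 in the alert above: the NodeManager web endpoint is not accepting connections.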

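I've been looking for another NodeManager log that might have more information, but I couldn't find anything useful in /var/log/hadoop-yarn or in the directories set by yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. Is there somewhere else I can find YARN NodeManager error logs? Does anyone know what might be causing this?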
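
When it is unclear which file holds the failure, one option is to scan every log under the directory for FATAL or ERROR entries. The sketch below assumes the logs end in `.log` and use the usual log4j layout where the level is surrounded by spaces; the helper name `find_fatal_lines` is hypothetical.

```python
import glob
import os


def find_fatal_lines(log_dir, pattern="*.log", levels=("FATAL", "ERROR")):
    """Yield (path, line_number, line) for log lines at one of the given levels."""
    # "**" with recursive=True also matches files directly inside log_dir.
    paths = glob.glob(os.path.join(log_dir, "**", pattern), recursive=True)
    for path in sorted(paths):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(f" {level} " in line for level in levels):
                    yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Directory taken from the question; adjust it for your install.
    for path, number, line in find_fatal_lines("/var/log/hadoop-yarn"):
        print(f"{path}:{number}: {line}")
```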

Edit: After checking again, I found this useful bit in /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-(ipaddress).log:

2017-04-19 14:01:14,670 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager 
org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService 
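
One common reading of this exception is that the NodeManager is configured to load the Spark shuffle service as an auxiliary service (typically through yarn.nodemanager.aux-services in yarn-site.xml), but the jar that provides the class is not on the NodeManager classpath on the new node. That is an assumption, not something the log confirms. To test it, the sketch below looks for jars containing the class; the helper name `jars_containing` is hypothetical and `/usr/hdp` is only a guess at where the HDP jars live.

```python
import os
import zipfile

CLASS_NAME = "org.apache.spark.network.yarn.YarnShuffleService"


def jars_containing(class_name, search_dir):
    """Return the jar files under search_dir that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    # os.walk does not follow directory symlinks by default, which avoids loops.
    for root, _dirs, files in os.walk(search_dir):
        for name in sorted(files):
            if not name.endswith(".jar"):
                continue
            jar = os.path.join(root, name)
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        matches.append(jar)
            except (zipfile.BadZipFile, OSError):
                continue  # skip unreadable files and files that are not zips
    return matches


if __name__ == "__main__":
    # "/usr/hdp" is an assumed install root; change it to match your layout.
    for jar in jars_containing(CLASS_NAME, "/usr/hdp"):
        print(jar)
```

Comparing the output on the failing node with the output on a working node should show whether the jar is missing or just sits somewhere the NodeManager does not look.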

Answers

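Were you able to resolve this?

I ran into a similar problem today.

I stopped YARN on my HDP cluster, deleted the /var/log/hadoop-yarn/nodemanager/recovery-state directory, and started YARN again.

The NodeManager is now running without failing.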
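
For anyone repeating the step above, moving the directory aside instead of deleting it keeps a copy in case the state is needed later. This is a minimal sketch and assumes YARN is already stopped on the node; the helper name `move_aside` and the `.bak-` suffix are made up for illustration.

```python
import os
import shutil
import time


def move_aside(directory):
    """Rename directory to a timestamped sibling and return the new path.

    Returns None if the directory does not exist.
    """
    if not os.path.isdir(directory):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = f"{directory.rstrip(os.sep)}.bak-{stamp}"
    shutil.move(directory, backup)
    return backup
```

With the path from this answer, the call would be `move_aside("/var/log/hadoop-yarn/nodemanager/recovery-state")`, followed by starting YARN again.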