2014-01-07 45 views
4

Hadoop map fails due to ConnectException

I am trying to run the wordcount example on a Hadoop 2.2.0 cluster. Many map tasks fail with this exception:

2014-01-07 05:07:12,544 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave2-machine/127.0.1.1 to slave2-machine:49222 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1351) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1300) 
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231) 
    at com.sun.proxy.$Proxy6.getTask(Unknown Source) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:133) 
Caused by: java.net.ConnectException: Connection refused 
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708) 
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) 
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) 
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) 
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547) 
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642) 
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314) 
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1318) 
    ... 4 more 

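The offending port changes every time I run the job, but the map tasks still fail. I don't know which process is supposed to listen on that port. I also watched the output of netstat -ntlp while the job was running, and no process ever listened on that port.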

Update: the contents of /etc/hosts on the master node are:

127.0.0.1 localhost 
127.0.1.1 master-machine 

# The following lines are desirable for IPv6 capable hosts 
::1  ip6-localhost ip6-loopback 
fe00::0 ip6-localnet 
ff00::0 ip6-mcastprefix 
ff02::1 ip6-allnodes 
ff02::2 ip6-allrouters 
192.168.1.101 slave1 slave1-machine 
192.168.1.102 slave2 slave2-machine 
192.168.1.1 master 

and on slave1:

127.0.0.1 localhost 
127.0.1.1 slave1-machine 

# The following lines are desirable for IPv6 capable hosts 
::1  ip6-localhost ip6-loopback 
fe00::0 ip6-localnet 
ff00::0 ip6-mcastprefix 
ff02::1 ip6-allnodes 
ff02::2 ip6-allrouters 
192.168.1.1 master 
192.168.1.101 slave1 
192.168.1.102 slave2 slave2-machine 

For slave2 it is like slave1's, with slight changes that I think you can guess. Finally, the content of yarn/hadoop/etc/hadoop/slaves on master is:

slave1 
slave2 

Answers

7

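1. Check whether the Hadoop nodes can ssh to each other.
2. Check that the addresses and ports of the Hadoop daemons are the same in all configuration files.
3. Check /etc/hosts on all nodes.

This is a useful link for checking whether you have started the cluster correctly: cluster setup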
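As a rough illustration of the first check, here is a minimal Python sketch that tests whether a TCP connection to a node can be opened. The helper names `can_connect` and `report` are made up for this example, and port 22 (the usual sshd port) is an assumption. A successful connection only shows that the port is reachable, not that ssh login actually works.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False

def report(nodes, port=22):
    """Map each node name to True/False depending on whether the port is reachable.

    Port 22 is an assumption (default sshd port); adjust it for your setup.
    """
    return {node: can_connect(node, port) for node in nodes}
```

Running `report(["master", "slave1", "slave2"])` on every node (the names come from the question's hosts and slaves files) would show which pairs of machines cannot reach each other.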

I got it! Your /etc/hosts is incorrect. You should remove the 127.0.1.1 line. I mean the files should look like this:

127.0.0.1  localhost 
192.168.1.101 master 
192.168.1.103 slave1 
192.168.1.104 slave2 
192.168.1.105 slave3 

Copy and paste the same content on all slaves. In addition, the slaves should also be able to reach each other.
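The trace in the question reads "Call From slave2-machine/127.0.1.1", which suggests the node's own name resolves to a loopback address, and that is what this answer blames. As a minimal sketch (the function name `loopback_hostnames` is hypothetical, and skipping `localhost` and `ip6-` names is an assumption about which entries are harmless), the following Python lists such entries in a hosts file:

```python
import ipaddress

def loopback_hostnames(hosts_text):
    """Return {name: address} for non-localhost names mapped to a loopback address."""
    flagged = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        address, *names = line.split()
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip lines whose first field is not a plain IP address
        if not ip.is_loopback:
            continue
        for name in names:
            # Assumption: localhost and ip6-* aliases are expected on loopback.
            if name != "localhost" and not name.startswith("ip6-"):
                flagged[name] = address
    return flagged

# Non-comment entries of slave1's /etc/hosts as posted in the question.
SLAVE1_HOSTS = """\
127.0.0.1 localhost
127.0.1.1 slave1-machine
::1  ip6-localhost ip6-loopback
192.168.1.1 master
192.168.1.101 slave1
192.168.1.102 slave2 slave2-machine
"""

print(loopback_hostnames(SLAVE1_HOSTS))  # {'slave1-machine': '127.0.1.1'}
```

On a real node the same function could be applied to the text of /etc/hosts. An empty result means no cluster host name is bound to loopback.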

+0

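1- The master node can ssh to each slave. Do you mean the slaves must be able to ssh to each other?! 2- Yes, they are the same. I updated the question with the '/etc/hosts' contents. – Mehraban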

+0

I updated the answer. Tell me if there is still a problem. – masoumeh

+0

Well, I also found it [here](http://mail-archives.apache.org/mod_mbox/hadoop-user/201208.mbox/%3CCAKEkMX-ca44-FnHaXip=DBbYHqgg9JW+GPc38Cq4Gvt8keigYQ@mail.gmail.com%3E), but I ran into another problem. It had to do with the machine's name: 'master' should be the machine name. – Mehraban