2011-02-11 61 views

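Hadoop safe mode recovery - taking too long!

I have a Hadoop cluster with 18 data nodes. I restarted the name node over two hours ago, and the name node is still in safe mode.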

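I have been searching for why this might be taking so long, and I can't find a good answer. The post here: Hadoop safemode recovery - taking lot of time is relevant, but I'm not sure whether I want or need to restart the name node after changing this setting, as that post mentions: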

<property> 
<name>dfs.namenode.handler.count</name> 
<value>3</value> 
<final>true</final> 
</property> 

In any case, this is what I've been getting in 'hadoop-hadoop-namenode-hadoop-name-node.log':

2011-02-11 01:39:55,226 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call delete(/tmp/hadoop-hadoop/mapred/system, true) from 10.1.206.27:54864: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-hadoop/mapred/system. Name node is in safe mode. 
The reported blocks 319128 needs additional 7183 blocks to reach the threshold 0.9990 of total blocks 326638. Safe mode will be turned off automatically. 
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-hadoop/mapred/system. Name node is in safe mode. 
The reported blocks 319128 needs additional 7183 blocks to reach the threshold 0.9990 of total blocks 326638. Safe mode will be turned off automatically. 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1711) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1691) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:565) 
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:616) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:416) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960) 
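
The figures in that safe mode message can be checked with a little arithmetic. The sketch below is not taken from Hadoop's source; it assumes the name node truncates total blocks times threshold to an integer, which is consistent with the numbers in the log. The variable names are illustrative only, and the multiplication assumes a shell with 64-bit integer arithmetic.

```shell
#!/bin/sh
# Values copied from the log message above.
reported=319128
total=326638

# Threshold 0.9990 written as an integer ratio, since shell arithmetic has no floats.
threshold_num=9990
threshold_den=10000

# Assumption: the required block count is total * threshold, truncated to an integer.
required=$(( total * threshold_num / threshold_den ))

# Blocks still needed before safe mode is turned off automatically.
missing=$(( required - reported ))

# Blocks that no data node has reported at all so far.
unreported=$(( total - reported ))

echo "required=$required missing=$missing unreported=$unreported"
# prints: required=326311 missing=7183 unreported=7510
```

The result matches the "needs additional 7183 blocks" figure in the log. If that number stops shrinking between log entries, the data nodes have probably finished reporting and the remaining blocks are not going to arrive on their own.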

Any suggestions are appreciated. Thanks!


What is your replication factor? – 2011-02-11 08:58:29


The replication factor is 3. It is still in safe mode! – 2011-02-11 11:58:56

Answer


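I had this happen once, where some blocks were never reported. I had to force the namenode to leave safe mode (hadoop dfsadmin -safemode leave) and then run an fsck to delete the missing files.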
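
As a rough sketch, that sequence might look like the following. The function name recover_from_safemode is made up for illustration; the hadoop subcommands are the standard dfsadmin and fsck ones. It assumes the hadoop launcher is on the PATH and that it is run as the HDFS superuser.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop: leave safe mode, then inspect the filesystem.
recover_from_safemode() {
    # Show whether safe mode is currently on.
    hadoop dfsadmin -safemode get

    # Force the name node out of safe mode without waiting for the block threshold.
    hadoop dfsadmin -safemode leave

    # Check the whole namespace and list files with missing or corrupt blocks.
    # This step only reports; it does not change anything.
    hadoop fsck /
}
```

Calling recover_from_safemode runs the three commands in order. Once the fsck report has been reviewed, hadoop fsck / -delete removes the files whose blocks are missing. That step is destructive, so hadoop fsck / -move, which moves the affected files to /lost+found instead, may be a safer first choice.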