2017-09-25

HDFS does not format, but there are no errors

I am setting up a Hadoop cluster on 4 nodes (3 slaves), all separate EC2 instances inside a VPC. I roughly followed these steps (but installed Hadoop 2.8.1 instead): http://arturmkrtchyan.com/how-to-setup-multi-node-hadoop-2-yarn-cluster

I formatted the namenode, which gave the following response:

$ hdfs namenode -format 
17/09/26 07:05:34 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************ 
STARTUP_MSG: Starting NameNode 
STARTUP_MSG: user = hduser 
STARTUP_MSG: host = ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190 
STARTUP_MSG: args = [-format] 
STARTUP_MSG: version = 2.8.1 
STARTUP_MSG: classpath = /usr/... 

STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hduser' on 2017-09-22T14:53Z 
STARTUP_MSG: java = 1.8.0_144 
************************************************************/ 
17/09/26 07:07:33 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT] 
17/09/26 07:07:33 INFO namenode.NameNode: createNameNode [-format] 
Formatting using clusterid: CID-15524170-7dfa-481b-add9-4c2542a55ca5 
17/09/26 07:07:33 INFO namenode.FSEditLog: Edit logging is async:false 
17/09/26 07:07:33 INFO namenode.FSNamesystem: KeyProvider: null 
17/09/26 07:07:33 INFO namenode.FSNamesystem: fsLock is fair: true 
17/09/26 07:07:33 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false 
17/09/26 07:07:33 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000 
17/09/26 07:07:33 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=false 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Sep 26 07:07:33 
17/09/26 07:07:33 INFO util.GSet: Computing capacity for map BlocksMap 
17/09/26 07:07:33 INFO util.GSet: VM type  = 64-bit 
17/09/26 07:07:33 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB 
17/09/26 07:07:33 INFO util.GSet: capacity  = 2^21 = 2097152 entries 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: defaultReplication   = 3 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxReplication    = 512 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: minReplication    = 1 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxReplicationStreams  = 2 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: encryptDataTransfer  = false 
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxNumBlocksToLog   = 1000 
17/09/26 07:07:33 INFO namenode.FSNamesystem: fsOwner    = hduser (auth:SIMPLE) 
17/09/26 07:07:33 INFO namenode.FSNamesystem: supergroup   = supergroup 
17/09/26 07:07:33 INFO namenode.FSNamesystem: isPermissionEnabled = false 
17/09/26 07:07:33 INFO namenode.FSNamesystem: HA Enabled: false 
17/09/26 07:07:33 INFO namenode.FSNamesystem: Append Enabled: true 
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map INodeMap 
17/09/26 07:07:34 INFO util.GSet: VM type  = 64-bit 
17/09/26 07:07:34 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB 
17/09/26 07:07:34 INFO util.GSet: capacity  = 2^20 = 1048576 entries 
17/09/26 07:07:34 INFO namenode.FSDirectory: ACLs enabled? false 
17/09/26 07:07:34 INFO namenode.FSDirectory: XAttrs enabled? true 
17/09/26 07:07:34 INFO namenode.NameNode: Caching file names occurring more than 10 times 
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map cachedBlocks 
17/09/26 07:07:34 INFO util.GSet: VM type  = 64-bit 
17/09/26 07:07:34 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB 
17/09/26 07:07:34 INFO util.GSet: capacity  = 2^18 = 262144 entries 
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033 
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0 
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension  = 30000 
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10 
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10 
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25 
17/09/26 07:07:34 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 
17/09/26 07:07:34 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis 
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map NameNodeRetryCache 
17/09/26 07:07:34 INFO util.GSet: VM type  = 64-bit 
17/09/26 07:07:34 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB 
17/09/26 07:07:34 INFO util.GSet: capacity  = 2^15 = 32768 entries 
Re-format filesystem in Storage Directory /usr/local/hadoop/data/namenode ? (Y or N) 
$ Y 
17/09/26 07:09:21 INFO namenode.FSImage: Allocated new BlockPoolId: BP-793961451-10.0.0.190-1506409761821 
17/09/26 07:09:21 INFO common.Storage: Storage directory /usr/local/hadoop/data/namenode has been successfully formatted. 
17/09/26 07:09:21 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/data/namenode/current/fsimage.ckpt_0000000000000000000 using no compression 
17/09/26 07:09:21 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/data/namenode/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds. 
17/09/26 07:09:21 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0 
17/09/26 07:09:21 INFO util.ExitUtil: Exiting with status 0 
17/09/26 07:09:21 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************ 
SHUTDOWN_MSG: Shutting down NameNode at ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190 
************************************************************/ 

When I start DFS and YARN, they appear to start correctly:

$ start-dfs.sh 
Starting namenodes on [ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com] 
ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com: starting namenode, logging to ... 
10.0.0.185: starting datanode, logging to ... 
10.0.0.244: starting datanode, logging to ... 
10.0.0.83: starting datanode, logging to ... 
Starting secondary namenodes [ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com] 
ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com: starting secondarynamenode, logging to ... 


$ start-yarn.sh 
starting yarn daemons 
starting resourcemanager, logging to ... 
10.0.0.185: starting nodemanager, logging to ... 
10.0.0.83: starting nodemanager, logging to ... 
10.0.0.244: starting nodemanager, logging to ... 

$ jps 
14326 NameNode 
14998 Jps 
14552 SecondaryNameNode 
14729 ResourceManager 

And on the other nodes it looks like this:

15880 Jps 
15563 DataNode 
15693 NodeManager 

However, when I try to write data to HDFS, it tells me that no nodes are actually available. This seems to be a very generic error, and I cannot pinpoint the problem.

$ hdfs dfs -put pg1661.txt /samples/input 
WARN hdfs.DataStreamer: DataStreamer Exception 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /samples/input/pg1661.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation. 
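The "could only be replicated to 0 nodes" message means that no datanode ever registered with the namenode, even though the DataNode processes are running. In Hadoop 2.x, `hdfs dfsadmin -report` prints one `Name: <ip>:<port>` block per registered datanode, so counting those lines is a quick check. A sketch (the saved report file below is hypothetical; on the cluster you would pipe the live command output instead):

```shell
# Hypothetical saved copy of the report output, for illustration only;
# on the cluster, run:  hdfs dfsadmin -report | grep -c "^Name:"
cat > /tmp/report.txt <<'EOF'
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
EOF

# grep -c prints the number of registered datanodes; 0 means none.
grep -c "^Name:" /tmp/report.txt || echo "no datanodes registered"
```

If the count is 0 while `jps` still shows DataNode processes on the slaves, the datanodes are running but failing to reach or register with the namenode, and their own logs (not the namenode's) usually say why.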

Then, when I check the status, it does not appear to be working properly:

$ hdfs dfsadmin -report 
Configured Capacity: 0 (0 B) 
Present Capacity: 0 (0 B) 
DFS Remaining: 0 (0 B) 
DFS Used: 0 (0 B) 
DFS Used%: NaN% 
Under replicated blocks: 0 
Blocks with corrupt replicas: 0 
Missing blocks: 0 
Missing blocks (with replication factor 1): 0 
Pending deletion blocks: 0 

I checked the log files, and they do not indicate any (fatal) errors, except when trying to upload the file.
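One specific thing worth checking after a namenode (re)format is whether the clusterID stored on each datanode still matches the namenode's; a mismatch makes datanodes refuse to join, often without anything fatal-looking in the namenode log. A sketch with hypothetical VERSION files (on the real machines, compare /usr/local/hadoop/data/namenode/current/VERSION with the VERSION file under each datanode's dfs.datanode.data.dir, whose path is an assumption here):

```shell
# Hypothetical VERSION files for illustration; on the cluster, read the
# real files from the namenode and datanode data directories instead.
mkdir -p /tmp/nn/current /tmp/dn/current
echo "clusterID=CID-15524170-7dfa-481b-add9-4c2542a55ca5" > /tmp/nn/current/VERSION
echo "clusterID=CID-9f8e7d6c-stale-id-from-older-format" > /tmp/dn/current/VERSION

# Extract and compare the IDs; any difference blocks registration.
nn_id=$(grep -o "CID-.*" /tmp/nn/current/VERSION)
dn_id=$(grep -o "CID-.*" /tmp/dn/current/VERSION)
if [ "$nn_id" != "$dn_id" ]; then
  echo "clusterID mismatch: this datanode will not register"
fi
```

If the IDs differ, the usual fix on an empty cluster is to clear the datanode's data directory and restart it, so it re-registers under the new clusterID.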

Given that none of the above produces any errors at startup, and the error message itself is very generic, I am finding it hard to locate the fault.

Answer

Your "hdfs dfsadmin -report" output shows a capacity of 0. It looks like you may have forgotten to format the namenode. You need to run the command below before starting HDFS.

hdfs namenode -format 

After that, the "hdfs dfsadmin -report" output should look similar to this:

Configured Capacity: 32195477504 (29.98 GB) 
Present Capacity: 29190479872 (27.19 GB) 
DFS Remaining: 29190471680 (27.19 GB) 
DFS Used: 8192 (8 KB) 
DFS Used%: 0.00% 
Under replicated blocks: 0 
Blocks with corrupt replicas: 0 
Missing blocks: 0 
Missing blocks (with replication factor 1): 0 
Pending deletion blocks: 0 

I have a video tutorial for a single-node setup at the link below. Hope it helps you. It is for Hadoop version 2.8.1:

http://hadooptutorials.info/2017/09/14/hadoop-installation-on-signle-node-cluster/

Thx for your reply. I did run that command. The response ends with 'SHUTDOWN_MSG: Shutting down NameNode at ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190'. Does that indicate the format command failed? It does not give any error message, other than telling me it shut down. I will update the question. – Dendrobates

I have included the response I get when I (try to) format the namenode. – Dendrobates

I think the shutdown message is normal when formatting the namenode. My guess is that the namenode may not be able to SSH into the datanodes. Have you defined the datanodes as separate servers, or on the same server? Perhaps try a single-node setup first, i.e. namenode and datanode on the same server. Once that works, try adding the other datanodes; that will isolate some of the issues. Could you also share your masters and slaves files, along with core-site.xml and hdfs-site.xml? –
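For reference, the usual suspect in this situation is core-site.xml: fs.defaultFS must point at a hostname or IP that the datanodes can actually resolve and reach (never localhost on a multi-node cluster), and every node must use the same value. A minimal sketch, assuming the namenode's private IP from the question and the common default port 9000 (both placeholders, not confirmed by the thread):

```xml
<!-- core-site.xml, identical on ALL nodes (namenode and datanodes alike).
     10.0.0.190 and port 9000 are assumptions; substitute your own
     namenode address as resolvable from the datanodes. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.0.190:9000</value>
  </property>
</configuration>
```

If a datanode's copy points at localhost, its DataNode process starts cleanly but registers nowhere, which matches the symptoms above: a clean jps listing, yet 0 datanodes in the report.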
