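HDFS not formatted, but no errors
I am setting up a Hadoop cluster on 4 nodes (3 slaves), all separate EC2 instances inside a VPC. I roughly followed these steps (but installed Hadoop 2.8.1 instead): http://arturmkrtchyan.com/how-to-setup-multi-node-hadoop-2-yarn-cluster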
I formatted the NameNode, which gave the following response:
$ hdfs namenode -format
17/09/26 07:05:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hduser
STARTUP_MSG: host = ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.1
STARTUP_MSG: classpath = /usr/...
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hduser' on 2017-09-22T14:53Z
STARTUP_MSG: java = 1.8.0_144
************************************************************/
17/09/26 07:07:33 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/09/26 07:07:33 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-15524170-7dfa-481b-add9-4c2542a55ca5
17/09/26 07:07:33 INFO namenode.FSEditLog: Edit logging is async:false
17/09/26 07:07:33 INFO namenode.FSNamesystem: KeyProvider: null
17/09/26 07:07:33 INFO namenode.FSNamesystem: fsLock is fair: true
17/09/26 07:07:33 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
17/09/26 07:07:33 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/09/26 07:07:33 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=false
17/09/26 07:07:33 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/09/26 07:07:33 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Sep 26 07:07:33
17/09/26 07:07:33 INFO util.GSet: Computing capacity for map BlocksMap
17/09/26 07:07:33 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:33 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
17/09/26 07:07:33 INFO util.GSet: capacity = 2^21 = 2097152 entries
17/09/26 07:07:33 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/09/26 07:07:33 INFO blockmanagement.BlockManager: defaultReplication = 3
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxReplication = 512
17/09/26 07:07:33 INFO blockmanagement.BlockManager: minReplication = 1
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
17/09/26 07:07:33 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/09/26 07:07:33 INFO blockmanagement.BlockManager: encryptDataTransfer = false
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
17/09/26 07:07:33 INFO namenode.FSNamesystem: fsOwner = hduser (auth:SIMPLE)
17/09/26 07:07:33 INFO namenode.FSNamesystem: supergroup = supergroup
17/09/26 07:07:33 INFO namenode.FSNamesystem: isPermissionEnabled = false
17/09/26 07:07:33 INFO namenode.FSNamesystem: HA Enabled: false
17/09/26 07:07:33 INFO namenode.FSNamesystem: Append Enabled: true
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map INodeMap
17/09/26 07:07:34 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:34 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
17/09/26 07:07:34 INFO util.GSet: capacity = 2^20 = 1048576 entries
17/09/26 07:07:34 INFO namenode.FSDirectory: ACLs enabled? false
17/09/26 07:07:34 INFO namenode.FSDirectory: XAttrs enabled? true
17/09/26 07:07:34 INFO namenode.NameNode: Caching file names occurring more than 10 times
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map cachedBlocks
17/09/26 07:07:34 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:34 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
17/09/26 07:07:34 INFO util.GSet: capacity = 2^18 = 262144 entries
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
17/09/26 07:07:34 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/09/26 07:07:34 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/09/26 07:07:34 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:34 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
17/09/26 07:07:34 INFO util.GSet: capacity = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /usr/local/hadoop/data/namenode ? (Y or N)
$ Y
17/09/26 07:09:21 INFO namenode.FSImage: Allocated new BlockPoolId: BP-793961451-10.0.0.190-1506409761821
17/09/26 07:09:21 INFO common.Storage: Storage directory /usr/local/hadoop/data/namenode has been successfully formatted.
17/09/26 07:09:21 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/data/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
17/09/26 07:09:21 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/data/namenode/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
17/09/26 07:09:21 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/09/26 07:09:21 INFO util.ExitUtil: Exiting with status 0
17/09/26 07:09:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190
************************************************************/
When I start DFS and YARN, they appear to start correctly:
$ start-dfs.sh
Starting namenodes on [ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com]
ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com: starting namenode, logging to ...
10.0.0.185: starting datanode, logging to ...
10.0.0.244: starting datanode, logging to ...
10.0.0.83: starting datanode, logging to ...
Starting secondary namenodes [ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com]
ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com: starting secondarynamenode, logging to ...
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to ...
10.0.0.185: starting nodemanager, logging to ...
10.0.0.83: starting nodemanager, logging to ...
10.0.0.244: starting nodemanager, logging to ...
$ jps
14326 NameNode
14998 Jps
14552 SecondaryNameNode
14729 ResourceManager
And on the other nodes it looks like this:
15880 Jps
15563 DataNode
15693 NodeManager
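However, when I try to write data to HDFS, it tells me that no nodes are actually available. This seems to be a very generic error, and I cannot find where the problem lies.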
$ hdfs dfs -put pg1661.txt /samples/input
WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /samples/input/pg1661.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Then, when I check the status, it does not appear to be working properly:
$ hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
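I checked the log files, and they do not show any (fatal) errors, except when I try to upload the file.
Given that the steps above produce no errors at startup, and the error message itself is very generic, I am finding it hard to track down the problem.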
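Thanks for your reply. I did run that command. The response ends with 'SHUTDOWN_MSG: Shutting down NameNode at ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190'. Does this indicate that the format command failed? It does not give any error message, other than telling me that it shut down. I will update the question. – Dendrobates
I have included the response I got when I (tried to) format the NameNode. – Dendrobates
I think the shutdown message is normal when formatting the NameNode. I suspect the NameNode may be unable to SSH into the DataNodes. Have you defined the DataNodes as separate servers or on the same server? Perhaps you could try a single-node setup first, i.e. NameNode and DataNode on the same server. Once that works, try adding the other DataNodes. That will isolate some of the problems. Could you also share your masters and slaves files along with core-site.xml and hdfs-site.xml?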
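One more thing that might be worth ruling out, given the "Re-format filesystem in Storage Directory" prompt in the output above: the format generated a new cluster ID (CID-15524170-...), so DataNodes whose data directories were initialised before the re-format may still carry the old ID, and that could keep them from registering with the NameNode. This is only a guess, not a confirmed diagnosis. Below is a minimal sketch for comparing the `clusterID` lines in the storage `VERSION` files. The function names `cluster_id` and `compare_cluster_ids` are made up for illustration, and the DataNode path in the usage note is a placeholder, since the question does not show `dfs.datanode.data.dir`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for comparing cluster IDs.

# Print the clusterID recorded in a Hadoop storage VERSION file.
# VERSION is a properties-style file with a line like: clusterID=CID-...
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Compare the clusterID of two VERSION files.
# Returns 0 and prints "match" if they agree, otherwise returns 1.
compare_cluster_ids() {
    nn_id=$(cluster_id "$1")
    dn_id=$(cluster_id "$2")
    if [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]; then
        echo "match: $nn_id"
    else
        echo "MISMATCH: namenode=$nn_id datanode=$dn_id"
        return 1
    fi
}
```

On the master you could run `cluster_id /usr/local/hadoop/data/namenode/current/VERSION`, then run `cluster_id <your-datanode-dir>/current/VERSION` on each slave and compare the values, or copy a DataNode's VERSION file over and pass both files to `compare_cluster_ids`. If the IDs differ, the DataNode logs on the slaves should also mention the incompatibility.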