
I want to upload files from an external Windows server to HDFS on a different server. HDFS is part of a Cloudera Docker container running on that server.

I connect to HDFS from the Windows server as follows:

Configuration conf = new Configuration(); 
conf.set("fs.defaultFS", "hdfs://%HDFS_SERVER_IP%:8020"); // namenode RPC endpoint
fs = FileSystem.get(conf); 

When I call fs.copyFromLocalFile(localFilePath, hdfsFilePath);, it throws the exception below and creates the file in HDFS without any content:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/test/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. 
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1595) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3287) 
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677) 
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485) 
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1412) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) 
    at com.sun.proxy.$Proxy15.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) 
    at com.sun.proxy.$Proxy16.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1455) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1251) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448) 

It seems to be a problem with the datanode; the following is copied from its logs:

Retrying connect to server: 0.0.0.0/0.0.0.0:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

I formatted the datanode and restarted HDFS, but I still cannot upload the file in this situation. Other operations such as reading and writing files work, and with the same configuration I can also transfer files between the local system and HDFS when they are on the same server.

The server sits behind a proxy, and I configured the proxy environment for the Docker container that runs HDFS. How can I transfer files between different servers using the HDFS Java API?
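For reference, the complete client-side flow I am attempting looks roughly like the sketch below; the local and HDFS paths are illustrative placeholders, and %HDFS_SERVER_IP% stands for the remote server's address as above:

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 

public class HdfsUpload { 
    public static void main(String[] args) throws Exception { 
        Configuration conf = new Configuration(); 
        // Point the client at the remote namenode's RPC endpoint 
        conf.set("fs.defaultFS", "hdfs://%HDFS_SERVER_IP%:8020"); 
        try (FileSystem fs = FileSystem.get(conf)) { 
            // Copy a local Windows file into HDFS (illustrative paths) 
            fs.copyFromLocalFile(new Path("C:/data/test.txt"), 
                    new Path("/user/test/test.txt")); 
        } 
    } 
} 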

Update 1:

hdfs dfsadmin -report:

[root@quickstart conf]# hdfs dfsadmin -report 
Configured Capacity: 211243687936 (196.74 GB) 
Present Capacity: 78773199014 (73.36 GB) 
DFS Remaining: 77924307110 (72.57 GB) 
DFS Used: 848891904 (809.57 MB) 
DFS Used%: 1.08% 
Under replicated blocks: 0 
Blocks with corrupt replicas: 0 
Missing blocks: 0 
Missing blocks (with replication factor 1): 0 

------------------------------------------------- 
Live datanodes (1): 

Name: XXXX:50010 (quickstart.cloudera) 
Hostname: quickstart.cloudera 
Decommission Status : Normal 
Configured Capacity: 211243687936 (196.74 GB) 
DFS Used: 848891904 (809.57 MB) 
Non DFS Used: 132470488922 (123.37 GB) 
DFS Remaining: 77924307110 (72.57 GB) 
DFS Used%: 0.40% 
DFS Remaining%: 36.89% 
Configured Cache Capacity: 0 (0 B) 
Cache Used: 0 (0 B) 
Cache Remaining: 0 (0 B) 
Cache Used%: 100.00% 
Cache Remaining%: 0.00% 
Xceivers: 6 
Last contact: Wed Apr 05 07:15:00 UTC 2017 

yarn node -list -all:

17/04/05 07:14:02 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032 
Total Nodes:1 
     Node-Id    Node-State Node-Http-Address  Number-of-Running-Containers 
quickstart.cloudera:37449    RUNNING quickstart.cloudera:8042         0 

core-site.xml:

<?xml version="1.0"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 

<configuration> 
    <property> 
    <name>fs.defaultFS</name> 
    <value>hdfs://quickstart.cloudera:8020</value> 
    </property> 

    <!-- OOZIE proxy user setting --> 
    <property> 
    <name>hadoop.proxyuser.oozie.hosts</name> 
    <value>*</value> 
    </property> 
    <property> 
    <name>hadoop.proxyuser.oozie.groups</name> 
    <value>*</value> 
    </property> 

    <!-- HTTPFS proxy user setting --> 
    <property> 
    <name>hadoop.proxyuser.httpfs.hosts</name> 
    <value>*</value> 
    </property> 
    <property> 
    <name>hadoop.proxyuser.httpfs.groups</name> 
    <value>*</value> 
    </property> 

    <!-- Llama proxy user setting --> 
    <property> 
    <name>hadoop.proxyuser.llama.hosts</name> 
    <value>*</value> 
    </property> 
    <property> 
    <name>hadoop.proxyuser.llama.groups</name> 
    <value>*</value> 
    </property> 

    <!-- Hue proxy user setting --> 
    <property> 
    <name>hadoop.proxyuser.hue.hosts</name> 
    <value>*</value> 
    </property> 
    <property> 
    <name>hadoop.proxyuser.hue.groups</name> 
    <value>*</value> 
    </property> 

</configuration> 

hdfs-site.xml:

<?xml version="1.0"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 

<configuration> 
    <property> 
    <name>dfs.replication</name> 
    <value>1</value> 
    </property> 
    <!-- Immediately exit safemode as soon as one DataNode checks in. 
     On a multi-node cluster, these configurations must be removed. --> 
    <property> 
    <name>dfs.safemode.extension</name> 
    <value>0</value> 
    </property> 
    <property> 
    <name>dfs.safemode.min.datanodes</name> 
    <value>1</value> 
    </property> 
    <property> 
    <name>dfs.permissions.enabled</name> 
    <value>false</value> 
    </property> 
    <property> 
    <name>dfs.permissions</name> 
    <value>false</value> 
    </property> 
    <property> 
    <name>dfs.safemode.min.datanodes</name> 
    <value>1</value> 
    </property> 
    <property> 
    <name>dfs.webhdfs.enabled</name> 
    <value>true</value> 
    </property> 
    <property> 
    <name>hadoop.tmp.dir</name> 
    <value>/var/lib/hadoop-hdfs/cache/${user.name}</value> 
    </property> 
    <property> 
    <name>dfs.namenode.name.dir</name> 
    <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/name</value> 
    </property> 
    <property> 
    <name>dfs.namenode.checkpoint.dir</name> 
    <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/namesecondary</value> 
    </property> 
    <property> 
    <name>dfs.datanode.data.dir</name> 
    <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/data</value> 
    </property> 
    <property> 
    <name>dfs.namenode.rpc-bind-host</name> 
    <value>0.0.0.0</value> 
    </property> 

    <property> 
    <name>dfs.namenode.servicerpc-address</name> 
    <value>0.0.0.0:8022</value> 
    </property> 
    <property> 
    <name>dfs.https.address</name> 
    <value>0.0.0.0:50470</value> 
    </property> 
    <property> 
    <name>dfs.namenode.http-address</name> 
    <value>0.0.0.0:50070</value> 
    </property> 
    <property> 
    <name>dfs.datanode.address</name> 
    <value>0.0.0.0:50010</value> 
    </property> 
    <property> 
    <name>dfs.datanode.ipc.address</name> 
    <value>0.0.0.0:50020</value> 
    </property> 
    <property> 
    <name>dfs.datanode.http.address</name> 
    <value>0.0.0.0:50075</value> 
    </property> 
    <property> 
    <name>dfs.datanode.https.address</name> 
    <value>0.0.0.0:50475</value> 
    </property> 
    <property> 
    <name>dfs.namenode.secondary.http-address</name> 
    <value>0.0.0.0:50090</value> 
    </property> 
    <property> 
    <name>dfs.namenode.secondary.https-address</name> 
    <value>0.0.0.0:50495</value> 
    </property> 

    <!-- Impala configuration --> 
    <property> 
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name> 
    <value>true</value> 
    </property> 
    <property> 
    <name>dfs.client.file-block-storage-locations.timeout.millis</name> 
    <value>10000</value> 
    </property> 
    <property> 
    <name>dfs.client.read.shortcircuit</name> 
    <value>true</value> 
    </property> 
    <property> 
    <name>dfs.domain.socket.path</name> 
    <value>/var/run/hadoop-hdfs/dn._PORT</value> 
    </property> 
</configuration> 

Where do you run this code from? It must be on the Windows server. Also post the complete stack trace. – franklinsijo


How do you initialize the FileSystem? – Serhiy


Does the datanode have enough space? Add the output of 'hdfs dfsadmin -report', 'yarn node -list -all' and the 'core-site.xml', 'hdfs-site.xml' properties. – franklinsijo

Answers


I only changed conf.set("fs.defaultFS", "hdfs://%HDFS_SERVER_IP%:8020") to conf.set("fs.defaultFS", "webhdfs://%HDFS_SERVER_IP%:50070"), and then I was able to upload files to HDFS on the different server. I referred to this link.
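A minimal sketch of the change, assuming the default WebHDFS HTTP port 50070 (dfs.webhdfs.enabled is already true in the hdfs-site.xml above) and illustrative paths:

Configuration conf = new Configuration(); 
// Use the WebHDFS (HTTP) gateway instead of the native RPC scheme 
conf.set("fs.defaultFS", "webhdfs://%HDFS_SERVER_IP%:50070"); 
FileSystem fs = FileSystem.get(conf); 
fs.copyFromLocalFile(new Path("C:/data/test.txt"), new Path("/user/test/test.txt")); 

Since WebHDFS runs over plain HTTP, it can be easier to reach through a proxy than the namenode/datanode RPC and data-transfer ports.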


There is a conflict on the RPC port between the fs.defaultFS property in core-site.xml and dfs.namenode.servicerpc-address in hdfs-site.xml.

Modify it in hdfs-site.xml as below and restart the services.

<property> 
    <name>dfs.namenode.servicerpc-address</name> 
    <value>0.0.0.0:8020</value> 
</property> 
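After editing and restarting, one way to confirm the value the configuration now carries (hdfs getconf reads it from the local configuration files):

hdfs getconf -confKey dfs.namenode.servicerpc-address 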

I modified it and the namenode could not initialize. I changed the property name to dfs.namenode.rpc-address and still get the same exception. org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8020 still appears in the datanode's logs. – isspek


What are the errors you get? – franklinsijo


They are the RemoteException described in the question, and org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8020 in the datanode logs. – isspek
