Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException) for hadoop 3.1.3

Question
I am trying to run a MapReduce job, but I am getting an error on Hadoop 3.1.3:
hadoop jar WordCount.jar WordcountDemo.WordCount /mapwork/Mapwork /r_out
Error
2020-04-04 19:59:11,379 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-04-04 19:59:12,499 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-04-04 19:59:12,569 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007
2020-04-04 19:59:12,727 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007/job.jar could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2205)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2731)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:568)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:514)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1866)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-04-04 19:59:12,734 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007/job.jar could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2205)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2731)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:568)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:514)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1866)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
Update (from comments):
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>C:\hadoop\hdfstmp</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop\data\datanode</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
</property>
</configuration>
Output of jps:
16832 NodeManager
5556 ResourceManager
18280 NameNode
11708 Jps
Datanode error log:
2020-04-04 21:42:25,150 WARN common.Storage: Failed to add storage directory [DISK]file:/C:/hadoop/data/datanode
java.io.IOException: Incompatible clusterIDs in C:\hadoop\data\datanode: namenode clusterID = CID-199fd5c5-1f1d-4c44-9e39-80995486695e; datanode clusterID = CID-16d0af22-57e1-4531-a5c8-4bf3eefd351d
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:744)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:407)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:387)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:559)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1743)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1679)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:390)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
at java.lang.Thread.run(Thread.java:748)
2020-04-04 21:42:25,156 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: All specified directories have failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:560)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1743)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1679)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:390)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
at java.lang.Thread.run(Thread.java:748)
2020-04-04 21:42:25,158 WARN datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474) service to localhost/127.0.0.1:9000
2020-04-04 21:42:25,261 INFO datanode.DataNode: Removed Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474)
2020-04-04 21:42:27,274 WARN datanode.DataNode: Exiting Datanode
Answer 1
Score: 1
The MapReduce job fails because it is unable to write to HDFS: "There are 0 datanode(s) running and 0 node(s) are excluded in this operation." From the datanode logs, it is clear that the DataNode daemon is unable to register itself with the HDFS cluster because of incompatible clusterIDs.
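A quick way to confirm this from the command line (a diagnostic step, not part of the fix) is to ask the namenode for a cluster report:

hdfs dfsadmin -report

With the datanode down, the live datanode count in the report will be 0, which matches the "0 datanode(s) running" message in the job output.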
When a namenode is formatted (during installation and setup), a clusterID is generated, and this clusterID is stored in the VERSION file of each daemon when it initializes. The clusterID acts as an identifier for the datanodes, letting them rejoin the cluster whenever they are stopped and started. Incompatible clusterIDs among the nodes can occur when the namenode is formatted against an active cluster and the other daemons are not re-initialized.
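You can see the mismatch for yourself by looking at the VERSION files under the storage directories configured in hdfs-site.xml (the paths below combine the directories from the question with Hadoop's standard current\VERSION layout):

type C:\hadoop\data\namenode\current\VERSION
type C:\hadoop\data\datanode\current\VERSION

Each file contains a clusterID=CID-... line; here they would show the two different IDs reported in the datanode log.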
To get the cluster back in working order:
- Stop the cluster
- Delete the contents of the following directories: C:\hadoop\hdfstmp, C:\hadoop\data\namenode, C:\hadoop\data\datanode
- Format the namenode
- Start the cluster
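On a single-node Windows install like this one, the steps above might look roughly like the following sketch (it assumes the stock start/stop scripts from %HADOOP_HOME%\sbin are on the PATH; adapt it to however you normally manage the daemons):

rem stop the YARN and HDFS daemons
stop-yarn.cmd
stop-dfs.cmd
rem wipe the HDFS storage and temp directories from the configs above
rmdir /s /q C:\hadoop\hdfstmp C:\hadoop\data\namenode C:\hadoop\data\datanode
mkdir C:\hadoop\hdfstmp C:\hadoop\data\namenode C:\hadoop\data\datanode
rem reformat the namenode (this generates a new clusterID) and restart everything
hdfs namenode -format
start-dfs.cmd
start-yarn.cmd

Formatting generates a fresh clusterID, and because the datanode directory is now empty, the datanode picks up that new ID the first time it starts.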
You will then have to re-copy the data required for the MapReduce job to HDFS and run the job again.
Answer 2
Score: 0
I do not have the option to shut down and restart my cluster. However, running the following command solved the problem without causing any other issue that I could see.
hdfs dfsadmin -safemode leave
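If you want to confirm the effect (a small check of my own, not part of the original answer), dfsadmin can also report the current safe mode state:

hdfs dfsadmin -safemode get

It prints whether the namenode is in safe mode, so you can verify the state before and after running leave.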
See the following:
- https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#DFSAdmin_Command
- https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode