Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException) for hadoop 3.1.3

Question

I am trying to run a MapReduce job, but I am getting an error on Hadoop 3.1.3.

hadoop jar WordCount.jar WordcountDemo.WordCount  /mapwork/Mapwork /r_out

Error

2020-04-04 19:59:11,379 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-04-04 19:59:12,499 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-04-04 19:59:12,569 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007
2020-04-04 19:59:12,727 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007/job.jar could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2205)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2731)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:568)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
        at org.apache.hadoop.ipc.Client.call(Client.java:1491)
        at org.apache.hadoop.ipc.Client.call(Client.java:1388)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:514)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
        at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1866)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-04-04 19:59:12,734 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007/job.jar could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2205)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2731)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:568)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
        at org.apache.hadoop.ipc.Client.call(Client.java:1491)
        at org.apache.hadoop.ipc.Client.call(Client.java:1388)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:514)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
        at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1866)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)

Update (from comments):

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>C:\hadoop\hdfstmp</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\hadoop\data\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\hadoop\data\datanode</value>
  </property>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>0</value>
  </property>
</configuration>

Output of jps:

16832 NodeManager 
5556 ResourceManager 
18280 NameNode 
11708 Jps

datanode error log:

2020-04-04 21:42:25,150 WARN common.Storage: Failed to add storage directory [DISK]file:/C:/hadoop/data/datanode
java.io.IOException: Incompatible clusterIDs in C:\hadoop\data\datanode: namenode clusterID = CID-199fd5c5-1f1d-4c44-9e39-80995486695e; datanode clusterID = CID-16d0af22-57e1-4531-a5c8-4bf3eefd351d
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:744)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:407)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:559)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1743)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1679)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:390)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
        at java.lang.Thread.run(Thread.java:748)
2020-04-04 21:42:25,156 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: All specified directories have failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:560)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1743)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1679)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:390)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
        at java.lang.Thread.run(Thread.java:748)
2020-04-04 21:42:25,158 WARN datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474) service to localhost/127.0.0.1:9000
2020-04-04 21:42:25,261 INFO datanode.DataNode: Removed Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474)
2020-04-04 21:42:27,274 WARN datanode.DataNode: Exiting Datanode

Answer 1

Score: 1

The MapReduce job fails because it cannot write to HDFS: "There are 0 datanode(s) running and 0 node(s) are excluded in this operation."

From the datanode logs, it is clear that the Datanode daemon is unable to register itself with the HDFS cluster due to incompatible clusterIDs.

When a namenode is formatted (during installation and setup), a clusterID is generated, and this clusterID is stored in the VERSION file of each daemon when it initializes. This clusterID acts as the identifier for the datanodes, letting them rejoin the cluster whenever they are stopped and started.
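
For illustration, this is roughly what the datanode's VERSION file (C:\hadoop\data\datanode\current\VERSION in this setup) contains; the clusterID and datanodeUuid values below are taken from the log above, while the remaining fields are illustrative:

clusterID=CID-16d0af22-57e1-4531-a5c8-4bf3eefd351d
datanodeUuid=7578b7ba-c42a-476b-abc2-2088b15b3474
storageType=DATA_NODE
layoutVersion=-57

The namenode keeps its own VERSION file under C:\hadoop\data\namenode\current, and the datanode is accepted only when the clusterID entries match.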

Incompatible clusterIDs among the nodes can happen when the namenode is formatted on an active cluster and the other daemons are not re-initialized.

To get the cluster back into a working state (a command sketch follows the list):

  1. Stop the cluster
  2. Delete the contents of the following directories: C:\hadoop\hdfstmp, C:\hadoop\data\namenode, C:\hadoop\data\datanode
  3. Format the namenode
  4. Start the cluster
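
On Windows, with the Hadoop sbin scripts on the PATH, the sequence would look roughly like this (a sketch based on the directory layout above; adjust paths to your install):

stop-yarn.cmd
stop-dfs.cmd
rem Remove the stale storage directories; they are recreated on format/startup
rmdir /s /q C:\hadoop\hdfstmp C:\hadoop\data\namenode C:\hadoop\data\datanode
hdfs namenode -format
start-dfs.cmd
start-yarn.cmd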

You then have to re-copy the data required for the MapReduce job and run the job again, for example as shown below.
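
For example, assuming the input files live locally under C:\data\input (a hypothetical path; the jar command is the one from the question):

hdfs dfs -mkdir -p /mapwork
hdfs dfs -put C:\data\input /mapwork/Mapwork
hadoop jar WordCount.jar WordcountDemo.WordCount /mapwork/Mapwork /r_out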

Answer 2

Score: 0

I do not have the option to shut down and restart my cluster. However, running the following command solved the problem without causing any other issue that I could see.

hdfs dfsadmin -safemode leave

See the following:

  1. https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#DFSAdmin_Command
  2. https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode
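
Before (or after) forcing the namenode out of safe mode, the current state can be checked with the status subcommand:

hdfs dfsadmin -safemode get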
