Overview
After the cluster had been running for nearly 50 days without a single restart, an ordinary HQL query (just a simple SELECT) suddenly failed. The full error is reproduced below; the eventual fix was simply to restart the cluster manually.
While restarting, however, sh stop-all.sh could not shut the cluster down. It printed:
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [192.168.1.190]
192.168.1.190: no namenode to stop
192.168.1.191: no datanode to stop
192.168.1.193: no datanode to stop
192.168.1.194: no datanode to stop
192.168.1.192: no datanode to stop
Stopping secondary namenodes [192.168.1.190]
192.168.1.190: no secondarynamenode to stop
stopping yarn daemons
no resourcemanager to stop
192.168.1.191: no nodemanager to stop
192.168.1.193: no nodemanager to stop
192.168.1.192: no nodemanager to stop
192.168.1.194: no nodemanager to stop
no proxyserver to stop
Yet running jps on each slave (DataNode) node showed that every daemon process was still alive; they simply could not be stopped through the scripts.
The root cause was that the slave nodes had lost communication with the master node, so the only option was to log in to each machine one by one and manually kill the corresponding processes.
After restarting the cluster, the same HQL ran again and the error no longer appeared. What exactly broke the master-slave communication is unclear; for now it can only be attributed to instability in the Hadoop cluster itself.
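The per-node manual cleanup described above can be sketched as a small shell helper: filter jps output down to the Hadoop daemon PIDs and kill them. This is a sketch, not the author's exact commands; the daemon names are the standard Hadoop 2.x ones, so adjust them for your deployment before using it.

```shell
#!/usr/bin/env bash
# Extract PIDs of Hadoop daemons from jps-style "PID Name" lines.
# Daemon names below are the standard Hadoop 2.x ones (assumption).
hadoop_pids() {
  grep -E ' (NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager)$' \
    | awk '{print $1}'
}

# On a live node you would run, on each machine in turn:
#   jps | hadoop_pids | xargs -r kill
# Demo with sample jps output (hypothetical PIDs):
sample_jps="2101 DataNode
2345 NodeManager
2890 Jps"
echo "$sample_jps" | hadoop_pids
```

`xargs -r` avoids invoking kill when no daemon matches, which is what you want on a node that is already clean.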
=========== Error messages ===========
Logging initialized using configuration in file:/opt/hive/apache-hive-1.2.1-bin/conf/hive-log4j.properties
OK
Time taken: 1.052 seconds
Query ID = hadoop_20161102163133_5e027a14-3452-4278-9057-e0a244a61952
Total jobs = 1
16/11/02 16:31:37 WARN conf.HiveConf: HiveConf of name hive.files.umask.value does not exist
Execution log at: /tmp/hadoop/hadoop_20161102163133_5e027a14-3452-4278-9057-e0a244a61952.log
2016-11-02 16:31:37 Starting to launch local task to process map join; maximum memory = 508559360
2016-11-02 16:31:38 Dump the side-table for tag: 0 with group count: 103 into file: file:/tmp/hive/local/24a2fae4-017e-4555-a7c5-6bc9a13419e5/hive_2016-11-02_16-31-33_691_9110187415324308596-1/-local-10002/HashTable-Stage-4/MapJoin-mapfile00--.hashtable
2016-11-02 16:31:38 Uploaded 1 File to: file:/tmp/hive/local/24a2fae4-017e-4555-a7c5-6bc9a13419e5/hive_2016-11-02_16-31-33_691_9110187415324308596-1/-local-10002/HashTable-Stage-4/MapJoin-mapfile00--.hashtable (945102 bytes)
2016-11-02 16:31:38 End of local task; Time Taken: 1.215 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/hadoop/.staging/job_1476427217749_1066/libjars/janino-2.7.6.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and 4 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
Job Submission failed with exception 'org.apache.hadoop.ipc.RemoteException(File /tmp/hadoop-yarn/staging/hadoop/.staging/job_1476427217749_1066/libjars/janino-2.7.6.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and 4 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
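The key line in the error is "could only be replicated to 0 nodes ... 4 node(s) are excluded": the NameNode sees DataNodes registered but considers none of them usable for the write (dead heartbeats, full disks, or otherwise excluded), which is consistent with the lost master-slave communication above. A quick first check before restarting is hdfs dfsadmin -report; the sketch below just pulls the live-DataNode count out of such a report. The sample text is a hypothetical excerpt in the Hadoop 2.x report format.

```shell
#!/usr/bin/env bash
# Pull the live-DataNode count out of `hdfs dfsadmin -report` output.
# On a live cluster:  hdfs dfsadmin -report | live_datanodes
live_datanodes() {
  grep -oE 'Live datanodes \([0-9]+\)' | grep -oE '[0-9]+'
}

# Demo with a hypothetical report excerpt:
sample_report="Configured Capacity: 500000000000 (465.66 GB)
Live datanodes (4):
Dead datanodes (0):"
echo "$sample_report" | live_datanodes
```

If the live count is fine (as it was here: 4 DataNodes running) but writes still fail with all nodes excluded, the problem is usually connectivity between the client/NameNode and the DataNodes rather than the DataNode processes themselves.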