Overview
A simple problem ate up close to half a day before I finally found where it was coming from. A UDF is a user-defined function: much like built-in functions such as sum, it takes an input value, does some processing, and returns a value. Writing one is actually simple: import the Hive packages, extend the UDF class, and implement an evaluate method.
UDF example
A simple test program: it parses a JSON-formatted string into a list of objects, then adjusts each element by halving its weight and dropping entries whose weight falls to zero.
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import com.alibaba.fastjson.JSONArray;

// TagInfo is a plain bean with a labelWeight property (getter/setter).
public class UDFTest extends UDF {
    public Text evaluate(Text origStr) {
        if (origStr != null && !origStr.toString().trim().isEmpty()) {
            // Parse the JSON array string into a list of TagInfo objects.
            List<TagInfo> tagList = JSONArray.parseArray(origStr.toString(), TagInfo.class);
            // Use an Iterator so elements can be removed while traversing;
            // removing inside a for-each loop throws ConcurrentModificationException.
            Iterator<TagInfo> it = tagList.iterator();
            while (it.hasNext()) {
                TagInfo tagInfo = it.next();
                int result = tagInfo.getLabelWeight() / 2;
                if (result > 0) {
                    // Halve the weight.
                    tagInfo.setLabelWeight(result);
                } else {
                    // Drop tags whose halved weight falls to zero.
                    it.remove();
                }
            }
            return new Text(JSONArray.toJSONString(tagList));
        }
        return new Text("");
    }
}
Required jars
No screenshot here, there are simply too many: essentially all the jars under Hive's lib directory, plus hadoop-common-2.4.1.jar and the fastjson jar used by the program. Once the UDF jar is built it still has to be added to the Hive session and registered as a function, as sketched below.
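For reference, here is a minimal registration sketch. The jar paths, the package name com.example.UDFTest, and the column name json_col are placeholders made up for illustration; only the function name test and the table db.data_page match the query shown later.

-- Hive CLI session (sketch): paths and package name are placeholders.
ADD JAR /path/to/udf-test.jar;
ADD JAR /path/to/fastjson.jar;
-- Expose the class above as a temporary function called "test".
CREATE TEMPORARY FUNCTION test AS 'com.example.UDFTest';
-- Call it like any built-in function.
SELECT test(json_col) FROM db.data_page;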
The pitfall
Running a query that called the UDF failed right at job submission with the following error:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive-root/hive_2016-10-31_07-42-05_630_8137311191494691848-1/-mr-10003/0/emptyFile could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at $Proxy19.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
Job Submission failed with exception 'org.apache.hadoop.ipc.RemoteException(File /tmp/hive-root/hive_2016-10-31_07-42-05_630_8137311191494691848-1/-mr-10003/0/emptyFile could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Once the underlying HDFS replication problem was sorted out, the same kind of UDF query went through normally:
hive> select test(null) from db.data_page;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1477918402104_0012, Tracking URL = http://hadoopwy1:8088/proxy/application_1477918402104_0012/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1477918402104_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-10-31 07:43:35,324 Stage-1 map = 0%, reduce = 0%
2016-10-31 07:44:08,527 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.74 sec
MapReduce Total cumulative CPU time: 1 seconds 740 msec
Ended Job = job_1477918402104_0012
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.74 sec HDFS Read: 8300 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 740 msec
CouldOnlyBeReplicatedTo
The "could only be replicated to 0 nodes instead of minReplication" message is the classic CouldOnlyBeReplicatedTo error. Sometimes, though, it is caused by a version mismatch between the NameNode and the DataNodes; in that case the NameNode has to be reformatted. Note that when reformatting the NameNode you should also clear the files in the temp directories, and be aware that clearing them loses the existing HDFS data. A rough recovery sequence is sketched below.
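This is a generic sketch, not the exact steps taken in this incident: the directory paths depend on your hadoop.tmp.dir / dfs.datanode.data.dir settings and are placeholders here, and reformatting wipes the existing HDFS data.

# Check first whether the DataNodes are registered and have free space at all.
hdfs dfsadmin -report

# Only if NameNode and DataNodes are genuinely out of sync: stop HDFS,
# clear the name/data directories (placeholder paths!), reformat, restart.
$HADOOP_HOME/sbin/stop-dfs.sh
rm -rf /tmp/hadoop-*/dfs/name/* /tmp/hadoop-*/dfs/data/*   # on the NameNode and every DataNode
hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh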