Connecting Flume to Hive: the painful pitfalls behind "Failed connecting to EndPoint" and "Internal error processing open_txns"


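I am 活力音响, a blogger on 靠谱客. This post covers the pitfalls I hit when connecting Flume to Hive, in particular the errors "Failed connecting to EndPoint" and "Internal error processing open_txns". I am sharing it here in the hope that it serves as a useful reference.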


Flume nearly drove me to despair, but after a long road of pitfalls I finally got it working. Here is a record of what I learned.

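#1: To connect Flume to Hive, hive-site.xml must first contain the following. The transaction-related properties (hive.txn.manager and the ones after it) are the key part; they were highlighted in red in the original post.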

<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://node1:3306/hivedb</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>123</value>
        </property>

        <property>
                <name>hive.txn.manager</name>
                <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
        </property>
        <property>
                <name>hive.compactor.initiator.on</name>
                <value>true</value>
        </property>
        <property>
                <name>hive.compactor.worker.threads</name>
                <value>1</value>
        </property>
        <property>
                <name>hive.support.concurrency</name>
                <value>true</value>
        </property>
        <property>
                <name>hive.enforce.bucketing</name>
                <value>true</value>
        </property>

        <property>
                <name>hive.in.test</name>
                <value>false</value>
        </property>
</configuration>

#2: The Hive table must be bucketed: clustered by (col) into XX buckets; stored as ORC: stored as orc; and have transactions enabled: TBLPROPERTIES ('transactional'='true');
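As a minimal sketch of step #2, the Python helper below assembles a CREATE TABLE statement that meets all three requirements. The table name url and the columns isvalid, url, ip come from the Flume config later in this post. The STRING column types, the ctime partition column, bucketing on ip, and the bucket count of 4 are assumptions chosen for illustration; adjust them to your own table.

```python
def build_streaming_table_ddl(table, columns, partition_col, bucket_col, buckets):
    """Build HiveQL for a bucketed, ORC-backed, transactional table."""
    if bucket_col not in columns:
        raise ValueError("bucket column must be one of the table columns")
    # Assumption: every column is a STRING, purely for illustration.
    column_defs = ", ".join(f"{name} STRING" for name in columns)
    return (
        f"CREATE TABLE {table} ({column_defs})\n"
        f"PARTITIONED BY ({partition_col} STRING)\n"
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


# Hypothetical choices: partition column, bucket column, and bucket count.
ddl = build_streaming_table_ddl("url", ["isvalid", "url", "ip"], "ctime", "ip", 4)
print(ddl)
```

The printed statement can be pasted into the Hive CLI once the transaction settings from #1 are in place.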

#3: Make sure the metastore is running; its default port is 9083.

bin/hive --service metastore &
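A quick way to confirm step #3 from the Flume host is to test whether the metastore port accepts TCP connections. The sketch below uses only the Python standard library; the function name is_port_open is my own. A successful connection only shows that something is listening on the port, not that the metastore is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (node1 and 9083 match the config in this post):
# print(is_port_open("node1", 9083))
```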

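#4: If you get "Failed connecting to EndPoint" or "Internal error processing open_txns", first check that the metastore is running. If it is running and the error persists, start the Hive CLI with

hive --hiveconf hive.root.logger=DEBUG,console

and check the output for errors. This is exactly where I got stuck, and it cost me an entire day. Then run any SQL statement, for example show tables; if that fails with an error mentioning lock, you are probably in serious trouble. When I first set up Hive, my config file did not contain the highlighted transaction-related properties shown above; I appended them later. I found no working answer for this error on Google or Baidu. I tried everything I came across and nothing helped, which left me close to giving up.

The way in turned out to be getting Hive to start without errors and getting show tables to run cleanly. My fix was to drop the Hive metastore database and recreate it with the latin1 character set, which avoids the field-length-too-long errors. That was all it took to get out of this pit.

The lessons I took away: always create the Hive database by hand, use latin1 as its encoding, and avoid relying on if not exists where possible. Whenever you run hive --hiveconf hive.root.logger=DEBUG,console, watch the scrolling log for error entries. The command creates Hive's own tables if they do not exist yet, and any exception raised along the way shows up in the output.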
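Because the DEBUG log scrolls quickly, it can help to save the console output to a file and filter it afterwards. The sketch below is one way to do that in Python. The keyword list is an assumption based on the symptoms described in this post (errors, exceptions, and anything lock-related), not an exhaustive pattern.

```python
import re

# Case-sensitive on purpose: "Lock" catches class names such as a lock
# manager or lock exception, while lowercase "block" is not matched.
SUSPICIOUS = re.compile(r"ERROR|Exception|Lock|\block")


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs for lines that match SUSPICIOUS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, redirect the console output of the debug command into a file (the name hive-debug.log is hypothetical), then call find_suspicious_lines(open("hive-debug.log")) and read through the returned lines.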

The Flume agent configuration for the Hive sink:

# Name the components of this agent
fhive.sources = r1
fhive.sinks = k1
fhive.channels = c1

# Configure the source
#fhive.sources.r1.type = spooldir
#fhive.sources.r1.spoolDir = /home/log/flume
#fhive.sources.r1.fileHeader = true

fhive.sources.r1.type = http
fhive.sources.r1.bind = node1
fhive.sources.r1.port = 44444
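# The custom handler also copies the fields into the event headers, because the partition expression %{ctime} is resolved from the headers.
# (Keep comments on their own line: a trailing "#..." in a properties file becomes part of the value.)
fhive.sources.r1.handler = spark_api.flume.MHJsonHandler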

# Describe and configure the channel; an in-memory channel is used here
fhive.channels.c1.type = memory
fhive.channels.c1.capacity = 10000
fhive.channels.c1.transactionCapacity = 600

# Configure the sink
fhive.sinks.k1.type = hive
fhive.sinks.k1.channel = c1
fhive.sinks.k1.hive.metastore = thrift://node1:9083
fhive.sinks.k1.hive.database = default
fhive.sinks.k1.hive.table = url
# Partition value(s) to write to, resolved here from the ctime event header
fhive.sinks.k1.hive.partition = %{ctime}
fhive.sinks.k1.autoCreatePartitions = true
fhive.sinks.k1.useLocalTimeStamp = false
fhive.sinks.k1.round = true
fhive.sinks.k1.roundValue = 10
fhive.sinks.k1.roundUnit = minute
fhive.sinks.k1.serializer = DELIMITED
fhive.sinks.k1.serializer.delimiter = ","
fhive.sinks.k1.serializer.serdeSeparator = ','
fhive.sinks.k1.serializer.fieldnames =isvalid,url,ip
fhive.sinks.k1.callTimeout = 2000000
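To try the pipeline end to end, you can post a test event to the HTTP source. The sketch below assumes that the custom handler spark_api.flume.MHJsonHandler accepts the same layout as Flume's stock JSONHandler, namely a JSON array of events with headers and body. I have not shown the handler's code here, so treat that as an assumption. The sample values are made up. The body is a comma-separated line in the order given by serializer.fieldnames (isvalid,url,ip), so a URL that itself contains a comma would break the DELIMITED serializer.

```python
import json
import urllib.request


def build_payload(ctime, isvalid, url, ip):
    """Return a JSON array holding one event in the headers/body layout."""
    event = {
        # The sink resolves %{ctime} from this header.
        "headers": {"ctime": ctime},
        # Field order must match serializer.fieldnames: isvalid,url,ip
        "body": ",".join([isvalid, url, ip]),
    }
    return json.dumps([event])


def post_event(payload, endpoint="http://node1:44444"):
    """POST the payload to the HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical sample values, for illustration only.
payload = build_payload("20190101", "1", "http://example.com/index", "10.0.0.1")
print(payload)
```

Calling post_event(payload) sends the event to node1:44444 as configured above. If the sink then fails, the errors from #4 are the first thing to look for in the Flume log.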