flume — collecting logs into Hive (notes by 懵懂电话)

Overview

Option 1: collect the logs into HDFS.
Option 2: insert into an existing table, using Flume to deliver data directly to Hive; data written this way must be stored in Hive in ORC format.

source: network (syslog) logs
channel: memory + local disk; memory is used first, and when it fills up the local disk serves as an overflow buffer
sink: hive
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# syslog over TCP
a1.sources.s1.type = syslogtcp
a1.sources.s1.port = 5140
a1.sources.s1.host = wangfutai
a1.sources.s1.channels = c1
# spillable memory channel: memory first, spill to local disk when full
a1.channels.c1.type = SPILLABLEMEMORY
a1.channels.c1.memoryCapacity = 10000
a1.channels.c1.overflowCapacity = 1000000
a1.channels.c1.byteCapacity = 800000
a1.channels.c1.checkpointDir = /home/wangfutai/a/flume/checkPoint
a1.channels.c1.dataDirs = /home/wangfutai/a/flume/data
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://wangfutai:9083
a1.sinks.k1.hive.database = hive
a1.sinks.k1.hive.table = flume
#a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
#a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames = id,name,age
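To smoke-test the agent, you can push a record at the syslogtcp source by hand. This is a minimal sketch (not from the original post): it builds a syslog line whose body is comma-delimited in the same field order as serializer.fieldnames, and assumes the host/port from the config above.

```python
import socket

def make_syslog_line(record_id, name, age, priority=13):
    """Format one record as a syslog line: <PRI> header, then a
    comma-delimited body matching serializer.fieldnames (id,name,age)."""
    return f"<{priority}>{record_id},{name},{age}\n".encode()

def send_event(line, host="wangfutai", port=5140):
    """Send one raw syslog line to the Flume syslogtcp source."""
    with socket.create_connection((host, port)) as s:
        s.sendall(line)

# Example payload the source would receive:
line = make_syslog_line(1, "alice", 30)
```

With the agent running, `send_event(line)` should result in a row (1, 'alice', 30) appearing in the Hive table.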
Starting the agent may then fail because the HCatalog streaming classes are missing from the classpath:

19/01/16 22:24:59 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/streaming/RecordWriter
        at org.apache.flume.sink.hive.HiveSink.createSerializer(HiveSink.java:219)
        at org.apache.flume.sink.hive.HiveSink.configure(HiveSink.java:202)
1. Copy all the jars under /home/wangfutai/module/hive-1.1.0-cdh5.15.0/hcatalog/share/hcatalog into
/home/wangfutai/module/apache-flume-1.6.0-cdh5.15.0-bin/lib
2. In .bash_profile:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
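Before restarting the agent, it can help to verify that step 1 actually landed the jars. This is a hypothetical pre-flight check, not part of the original walkthrough; the FLUME_LIB path is the one used above and should be adjusted to your install.

```python
import glob
import os

def has_hcatalog_jars(flume_lib):
    """Return True if at least one *hcatalog* jar is present in flume_lib."""
    return bool(glob.glob(os.path.join(flume_lib, "*hcatalog*.jar")))

# Path assumed from step 1 of this walkthrough:
FLUME_LIB = "/home/wangfutai/module/apache-flume-1.6.0-cdh5.15.0-bin/lib"
```

If `has_hcatalog_jars(FLUME_LIB)` is False, the NoClassDefFoundError above will recur.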
3. In hive-site.xml, enable concurrency, bucketing enforcement, and the transactional lock manager:
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
4. The table must be bucketed and stored in ORC format:
create table hive.flume2 (
  id int,
  name string,
  age int
)
clustered by (id) into 2 buckets
stored as orc
tblproperties ("transactional" = "true");
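The DELIMITED serializer splits each event body on the configured delimiter and maps the pieces onto the table columns in the order given by serializer.fieldnames. As an illustration only (the real mapping happens inside the Flume Hive sink), the field mapping can be mirrored like this:

```python
def parse_event(body, fieldnames=("id", "name", "age")):
    """Split a comma-delimited event body and map it onto the flume2
    columns (id int, name string, age int), mirroring fieldnames order."""
    parts = body.strip().split(",")
    if len(parts) != len(fieldnames):
        raise ValueError("field count does not match fieldnames")
    row = dict(zip(fieldnames, parts))
    row["id"] = int(row["id"])   # id column is int
    row["age"] = int(row["age"]) # age column is int
    return row
```

A body like "1,alice,30" therefore becomes the row (id=1, name='alice', age=30); a body with the wrong number of fields cannot be mapped.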
5. Copy hive-site.xml and hive-env.sh into apache-flume-1.6.0-cdh5.15.0-bin/conf.