Overview
Option 1: collect the logs into HDFS.
Option 2: insert into an existing Hive table. Flume writes the data to Hive directly through its Hive sink, and the data in the target table must be stored in ORC format. This walkthrough covers option 2.
source
network logs (syslog over TCP)
channel
memory + local disk: events go to memory first, and once memory fills up, local disk takes over as the overflow buffer
sink
hive
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# syslog over TCP
a1.sources.s1.type = syslogtcp
a1.sources.s1.port= 5140
a1.sources.s1.host= wangfutai
a1.sources.s1.channels = c1
a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
a1.channels.c1.memoryCapacity = 10000
a1.channels.c1.overflowCapacity = 1000000
a1.channels.c1.byteCapacity = 800000
a1.channels.c1.checkpointDir =/home/wangfutai/a/flume/checkPoint
a1.channels.c1.dataDirs = /home/wangfutai/a/flume/data
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://wangfutai:9083
a1.sinks.k1.hive.database = hive
a1.sinks.k1.hive.table = flume
#a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
#a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,name,age
Starting the agent then fails with a NoClassDefFoundError: the Hive sink depends on Hive's HCatalog streaming classes, which are not on Flume's classpath:

19/01/16 22:24:59 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/streaming/RecordWriter
at org.apache.flume.sink.hive.HiveSink.createSerializer(HiveSink.java:219)
at org.apache.flume.sink.hive.HiveSink.configure(HiveSink.java:202)
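The missing RecordWriter class ships in Hive's HCatalog streaming jar. Before copying anything, you can confirm the jars are where the next step expects them (paths are the ones used throughout this walkthrough; adjust for your install):

# The streaming jar (hive-hcatalog-streaming-*.jar) provides the missing class.
ls /home/wangfutai/module/hive-1.1.0-cdh5.15.0/hcatalog/share/hcatalog/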
1. Copy all the jars under /home/wangfutai/module/hive-1.1.0-cdh5.15.0/hcatalog/share/hcatalog into
/home/wangfutai/module/apache-flume-1.6.0-cdh5.15.0-bin/lib, for example as sketched below.
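A minimal sketch of the copy, using the paths above:

# Drop every HCatalog jar into Flume's lib directory so the Hive sink can load them.
cp /home/wangfutai/module/hive-1.1.0-cdh5.15.0/hcatalog/share/hcatalog/*.jar \
   /home/wangfutai/module/apache-flume-1.6.0-cdh5.15.0-bin/lib/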
2. In ~/.bash_profile, add the Hive libraries to the Hadoop classpath:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
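Then reload the profile and check that the change took effect:

# Re-read ~/.bash_profile in the current shell and print the resulting classpath.
source ~/.bash_profile
echo $HADOOP_CLASSPATH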
3. Enable transactions in hive-site.xml. The Hive sink writes through Hive's streaming ingest API, which requires ACID support: concurrency, enforced bucketing, and the DbTxnManager transaction manager.
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
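A quick way to verify the settings are live, assuming the hive CLI reads this hive-site.xml:

# 'set <name>;' with no value prints the current setting.
hive -e 'set hive.txn.manager; set hive.support.concurrency; set hive.enforce.bucketing;'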
4. The target table must be bucketed and stored as ORC, with transactions enabled. Note that this DDL creates hive.flume2 while the sink config above points at hive.table = flume; make the two names match before starting the agent.
create table hive.flume2 (
  id int,
  name string,
  age int
)
clustered by (id) into 2 buckets
stored as orc
tblproperties ('transactional' = 'true');
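To sanity-check the definition:

# 'describe formatted' shows the ORC storage, bucketing, and transactional properties.
hive -e 'describe formatted hive.flume2;'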
5. Copy hive-site.xml and hive-env.sh into apache-flume-1.6.0-cdh5.15.0-bin/conf.
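With everything in place, a minimal end-to-end test looks like this. The config file name hive-sink.conf is an assumption; save the agent configuration from above under that name in Flume's conf directory first.

# Start the agent defined above (agent name a1), logging to the console.
bin/flume-ng agent --conf conf --conf-file conf/hive-sink.conf --name a1 -Dflume.root.logger=INFO,console

# From another terminal, push one delimited record at the syslogtcp source on port 5140.
# The source flags non-syslog payloads as invalid in a header but still forwards the body.
echo "1,tom,25" | nc wangfutai 5140

# Confirm the row landed (query whichever table the sink points at).
hive -e 'select * from hive.flume2;'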