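This post, shared by the 靠谱客 blogger 碧蓝爆米花, walks through the complete configuration and workflow for collecting rsyslog output with Flume. It was put together during recent development work and is offered here as a reference.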

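Overview

Flume is used to collect web server logs; the architecture is shown in the figure. Each web server runs a Flume agent that receives rsyslog output through a syslogtcp source and forwards it over Avro to two collector agents (10.21.3.73 and 10.21.3.75). The collector configuration shown further down writes the events to HDFS.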

 

Agent configuration on each web server:

    # Configuration for agent 'flume74Agent'
    flume74Agent.sources=source74
    flume74Agent.sinks=sink74-1 sink74-2
    flume74Agent.channels=channel74

    # Declare the sink group
    flume74Agent.sinkgroups=group74

    # Source: syslogtcp
    flume74Agent.sources.source74.type=syslogtcp
    flume74Agent.sources.source74.port=514
    flume74Agent.sources.source74.host=10.21.3.74
    flume74Agent.sources.source74.channels=channel74

    # Memory channel. capacity must be at least transactionCapacity.
    # The smaller the capacity, the fewer events are lost if the agent dies.
    # keep-alive is the timeout, in seconds, for adding or removing an event.
    flume74Agent.channels.channel74.type=memory
    flume74Agent.channels.channel74.capacity=2000
    flume74Agent.channels.channel74.transactionCapacity=1000
    flume74Agent.channels.channel74.keep-alive=30

    # File channel (disabled). For better throughput, keep checkpointDir
    # and dataDirs in separate directories.
    #flume74Agent.channels.channel74.type=file
    #flume74Agent.channels.channel74.checkpointDir=/usr/local/new-cluster/apache-flume-1.6.0-bin/checkpoint
    #flume74Agent.channels.channel74.dataDirs=/usr/local/new-cluster/apache-flume-1.6.0-bin/data
    #flume74Agent.channels.channel74.transactionCapacity=10000
    #flume74Agent.channels.channel74.checkpointInterval=60000
    #flume74Agent.channels.channel74.capacity=20000
    #flume74Agent.channels.channel74.keep-alive=30

    # First sink: sink74-1
    flume74Agent.sinks.sink74-1.type=avro
    flume74Agent.sinks.sink74-1.port=4141
    flume74Agent.sinks.sink74-1.hostname=10.21.3.73
    flume74Agent.sinks.sink74-1.channel=channel74

    # Second sink: sink74-2
    flume74Agent.sinks.sink74-2.type=avro
    flume74Agent.sinks.sink74-2.port=4141
    flume74Agent.sinks.sink74-2.hostname=10.21.3.75
    flume74Agent.sinks.sink74-2.channel=channel74

    # Sink group membership
    flume74Agent.sinkgroups.group74.sinks=sink74-1 sink74-2

    # Load-balance across the sink group. This spreads the load and avoids
    # losing data when one of the collectors goes down.
    flume74Agent.sinkgroups.group74.processor.type = load_balance
    flume74Agent.sinkgroups.group74.processor.backoff = true
    flume74Agent.sinkgroups.group74.processor.selector = random
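The constraints mentioned in the comments above (capacity no smaller than transactionCapacity, every sink group member declared as a sink, every sink bound to a declared channel) can be checked before starting the agent. The following is only a minimal sketch: `parse_properties` and `check_agent` are hypothetical helpers written for this post, not part of Flume, and the parser handles just the simple `key=value` subset used here.

```python
# Hypothetical pre-flight check for a Flume agent config (not a Flume API).

SAMPLE = """
flume74Agent.sources=source74
flume74Agent.sinks=sink74-1 sink74-2
flume74Agent.channels=channel74
flume74Agent.sinkgroups=group74
flume74Agent.channels.channel74.type=memory
flume74Agent.channels.channel74.capacity=2000
flume74Agent.channels.channel74.transactionCapacity=1000
flume74Agent.sinks.sink74-1.channel=channel74
flume74Agent.sinks.sink74-2.channel=channel74
flume74Agent.sinkgroups.group74.sinks=sink74-1 sink74-2
"""


def parse_properties(text):
    """Parse a small subset of the Java properties format into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Only whole-line comments are recognised, as in the real format.
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of problems found in the settings of one agent."""
    problems = []
    channels = props.get(f"{agent}.channels", "").split()
    sinks = props.get(f"{agent}.sinks", "").split()

    for ch in channels:
        cap = props.get(f"{agent}.channels.{ch}.capacity")
        tx = props.get(f"{agent}.channels.{ch}.transactionCapacity")
        if cap is not None and tx is not None and int(tx) > int(cap):
            problems.append(f"{ch}: transactionCapacity {tx} exceeds capacity {cap}")

    for sink in sinks:
        bound = props.get(f"{agent}.sinks.{sink}.channel")
        if bound not in channels:
            problems.append(f"{sink}: channel {bound!r} is not declared")

    for group in props.get(f"{agent}.sinkgroups", "").split():
        for member in props.get(f"{agent}.sinkgroups.{group}.sinks", "").split():
            if member not in sinks:
                problems.append(f"{group}: sink {member!r} is not declared")

    return problems


print(check_agent(parse_properties(SAMPLE), "flume74Agent"))  # prints []
```

A misspelled key such as `sinksgroups` would not be flagged by this sketch, because the group list would simply come back empty, so the key names still deserve a careful look.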

Configuration of the Flume collector agent:

    collection75Agent.sources=source75
    collection75Agent.sinks=sink75-1
    collection75Agent.channels=channel75

    # Source: avro, with a host interceptor (i1) and a timestamp interceptor (i2).
    # The timestamp header set by i2 is what the %Y-%m escapes in hdfs.path rely on.
    collection75Agent.sources.source75.type=avro
    collection75Agent.sources.source75.channels=channel75
    collection75Agent.sources.source75.bind=10.21.3.75
    collection75Agent.sources.source75.port=4141
    collection75Agent.sources.source75.interceptors = i1 i2
    collection75Agent.sources.source75.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
    collection75Agent.sources.source75.interceptors.i1.preserveExisting = false
    collection75Agent.sources.source75.interceptors.i1.hostHeader = hostname
    collection75Agent.sources.source75.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

    # Memory channel
    collection75Agent.channels.channel75.type=memory
    collection75Agent.channels.channel75.capacity=2000
    collection75Agent.channels.channel75.transactionCapacity=1000
    collection75Agent.channels.channel75.keep-alive=30

    # File channel (disabled)
    #collection75Agent.channels.channel75.type=file
    #collection75Agent.channels.channel75.checkpointDir=/usr/local/new-cluster/apache-flume-1.6.0-bin/checkpoint
    #collection75Agent.channels.channel75.dataDirs=/usr/local/new-cluster/apache-flume-1.6.0-bin/data
    #collection75Agent.channels.channel75.transactionCapacity=10000
    #collection75Agent.channels.channel75.checkpointInterval=60000
    #collection75Agent.channels.channel75.capacity=20000
    #collection75Agent.channels.channel75.keep-alive=30

    # HDFS sink. Comments sit on their own lines because the properties
    # format does not support trailing comments after a value.
    collection75Agent.sinks.sink75-1.type=hdfs
    collection75Agent.sinks.sink75-1.channel=channel75
    collection75Agent.sinks.sink75-1.hdfs.path=hdfs://mycluster1/flume/%Y-%m
    collection75Agent.sinks.sink75-1.hdfs.filePrefix=syslog75.%Y-%m-%d
    collection75Agent.sinks.sink75-1.hdfs.fileSuffix=.log
    collection75Agent.sinks.sink75-1.hdfs.round=true
    collection75Agent.sinks.sink75-1.hdfs.roundValue=10
    collection75Agent.sinks.sink75-1.hdfs.roundUnit=minute
    # Seconds before rolling to a new file; 0 disables time-based rolling
    collection75Agent.sinks.sink75-1.hdfs.rollInterval=0
    # File size that triggers a roll; 0 disables size-based rolling
    collection75Agent.sinks.sink75-1.hdfs.rollSize=0
    # Number of events written before flushing to HDFS
    collection75Agent.sinks.sink75-1.hdfs.batchSize=1000
    # Number of events that triggers a roll; 0 disables count-based rolling
    collection75Agent.sinks.sink75-1.hdfs.rollCount=0
    collection75Agent.sinks.sink75-1.hdfs.fileType = DataStream
    collection75Agent.sinks.sink75-1.hdfs.writeFormat=Text
    # Timeout for HDFS operations, in milliseconds
    collection75Agent.sinks.sink75-1.hdfs.callTimeout=600000
    collection75Agent.sinks.sink75-1.hdfs.threadsPoolSize=20
    collection75Agent.sinks.sink75-1.hdfs.rollTimerPoolSize=5
    # Seconds without writes after which the file is closed and renamed
    # to drop its .tmp suffix
    collection75Agent.sinks.sink75-1.hdfs.idleTimeout=600

    # Second HDFS sink (disabled)
    #collection75Agent.sinks.sink75-2.type=hdfs
    #collection75Agent.sinks.sink75-2.channel=channel75
    #collection75Agent.sinks.sink75-2.hdfs.path=hdfs://mycluster1/flume/%Y-%m
    #collection75Agent.sinks.sink75-2.hdfs.filePrefix=syslog2.%Y-%m-%d
    #collection75Agent.sinks.sink75-2.hdfs.fileSuffix=.log
    #collection75Agent.sinks.sink75-2.hdfs.round=true
    #collection75Agent.sinks.sink75-2.hdfs.roundValue=10
    #collection75Agent.sinks.sink75-2.hdfs.roundUnit=minute
    #collection75Agent.sinks.sink75-2.hdfs.rollInterval=0
    #collection75Agent.sinks.sink75-2.hdfs.rollSize=0
    #collection75Agent.sinks.sink75-2.hdfs.batchSize=1000
    #collection75Agent.sinks.sink75-2.hdfs.rollCount=0
    #collection75Agent.sinks.sink75-2.hdfs.fileType = DataStream
    #collection75Agent.sinks.sink75-2.hdfs.writeFormat=Text
    #collection75Agent.sinks.sink75-2.hdfs.callTimeout=600000
    #collection75Agent.sinks.sink75-2.hdfs.threadsPoolSize=20
    #collection75Agent.sinks.sink75-2.hdfs.rollTimerPoolSize=5
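To make the effect of hdfs.path, hdfs.filePrefix and the hdfs.round settings more concrete, here is a rough Python sketch of how an event timestamp could map to a directory and file prefix. It is an illustration only, not Flume's own code: `round_down` and `expand` are hypothetical helpers, the sample timestamp is made up, and it assumes the escapes used here (%Y, %m, %d, %H, %M) behave like their `strftime` counterparts.

```python
from datetime import datetime


def round_down(ts, value, unit):
    """Round a timestamp down to a multiple of `value` in the given unit."""
    if unit == "second":
        return ts.replace(second=ts.second - ts.second % value, microsecond=0)
    if unit == "minute":
        return ts.replace(minute=ts.minute - ts.minute % value,
                          second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(hour=ts.hour - ts.hour % value,
                          minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def expand(template, ts):
    """Expand the date escapes in a path or prefix template."""
    return ts.strftime(template)


# Made-up event time, used only for illustration.
event_time = datetime(2016, 3, 7, 14, 37, 52)

# hdfs.round=true, hdfs.roundValue=10, hdfs.roundUnit=minute
rounded = round_down(event_time, 10, "minute")

print(expand("hdfs://mycluster1/flume/%Y-%m", rounded))  # hdfs://mycluster1/flume/2016-03
print(expand("syslog75.%Y-%m-%d", rounded))              # syslog75.2016-03-07
print(expand("%H%M", rounded))                           # 1430
```

With the templates from the configuration above, only the year, month and day appear in the output, so the 10-minute rounding has no visible effect; it would start to matter if %H or %M were added to the path or prefix.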

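Start the Flume agent in the background. The name passed to -n must match the agent name used as the property prefix in the configuration file; the example below starts the collector named collection73Agent: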

nohup flume-ng agent -c conf/ -f conf/collection73Agent.conf -n collection73Agent  > start.log 2>&1 &
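Once a web-server agent such as flume74Agent is running, one way to smoke-test its syslogtcp source without touching rsyslog is to send a single hand-built syslog line over TCP. The sketch below is an assumption-laden illustration: `format_rfc3164` and `send_line` are hypothetical helpers, the hostname `web74` and tag `flume-test` are made up, and the message follows the classic `<PRI>timestamp host tag: message` layout terminated by a newline.

```python
import socket
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_rfc3164(facility, severity, when, hostname, tag, message):
    """Build one newline-terminated syslog line in the classic BSD layout."""
    pri = facility * 8 + severity  # e.g. local0 (16) and info (6) give 134
    # The day of month is space-padded to two characters in this layout.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {hostname} {tag}: {message}\n"


def send_line(host, port, line, timeout=5.0):
    """Open a TCP connection, send one line, and close the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8"))


# Fixed, made-up timestamp so the printed line is reproducible.
line = format_rfc3164(16, 6, datetime(2016, 3, 7, 14, 37, 52),
                      "web74", "flume-test", "hello from the test sender")
print(line, end="")  # <134>Mar  7 14:37:52 web74 flume-test: hello from the test sender

# Uncomment to send to the source configured above (host and port from the
# web-server agent configuration):
# send_line("10.21.3.74", 514, line)
```

If the line arrives, it should eventually show up in a file under the HDFS path configured on the collectors, subject to the batch and idle-timeout settings.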

rsyslog.conf configuration (see figure):

 

Addendum: flume-env.sh configuration

JAVA_OPTS="-Xms2048m -Xmx2048m -Xss256k -Xmn512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"
