Flume Data Collection to HDFS: A Troubleshooting Diary

Table of Contents

  • Preface
  • 1. Flume agent configuration
  • 2. Troubleshooting the successive errors
    • org/apache/hadoop/io/SequenceFile$CompressionType
    • org/apache/commons/configuration/Configuration
    • org/apache/hadoop/util/PlatformName
    • org/apache/htrace/core/Tracer$Builder
    • No FileSystem for scheme: hdfs
    • java.nio.charset.MalformedInputException
    • java.lang.OutOfMemoryError: GC overhead limit exceeded
  • 3. HDFS log file generation


Preface

This post should help anyone who wants to know what to watch out for when collecting data into HDFS with Flume. To simulate a realistic environment, I set up a temporary virtual machine, wrote the data into Tomcat on it, and then shipped the data from that VM to HDFS on a second VM.

Versions used in this environment:

  • apache-tomcat-8.5.63
  • flume-ng-1.6.0-cdh5.14.2
  • hadoop-2.6.0-cdh5.14.2

1. Flume agent configuration

Without further ado, here is the agent configuration, with a brief comment on what each line does
(if anything is still unclear, see the official user guide).

# Name the source, channel, and sink
a1.channels = c1
a1.sources = s1
a1.sinks = k1
# Use a Spooling Directory Source (a source designed for ingesting files)
a1.sources.s1.type = spooldir
a1.sources.s1.channels = c1
# Directory to ingest files from
a1.sources.s1.spoolDir = /opt/software/tomcat8563/webapps/mycurd/log
# Input character encoding (Flume defaults to UTF-8; my log files are GBK)
a1.sources.s1.inputCharset = GBK
# Use a File Channel
a1.channels.c1.type = file
# Checkpoint directory
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
# Data directory
a1.channels.c1.dataDirs = /opt/flume/data
# Use an HDFS Sink
a1.sinks.k1.type = hdfs
# HDFS destination path (with date escape sequences appended)
a1.sinks.k1.hdfs.path = hdfs://192.168.237.130:9000/upload/%Y%m%d
# File name prefix
a1.sinks.k1.hdfs.filePrefix = upload-
# Use the local timestamp (needed for the escape sequences above)
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events flushed to HDFS per batch
a1.sinks.k1.hdfs.batchSize = 100
# File stream type
a1.sinks.k1.hdfs.fileType = DataStream
# Seconds to wait before rolling to the next file
a1.sinks.k1.hdfs.rollInterval = 600
# Maximum size (in bytes) of the current file before rolling to the next one
a1.sinks.k1.hdfs.rollSize = 134217700
# Number of events before rolling the file (0 = never roll based on event count)
a1.sinks.k1.hdfs.rollCount = 0
# Minimum number of HDFS block replicas required
a1.sinks.k1.hdfs.minBlockReplicas = 1
# Bind the sink to the channel
a1.sinks.k1.channel = c1

TIPS: the channel's checkpointDir and dataDirs directories must be created on the VM ahead of time!
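For example, using the paths from the configuration above:

# Create the File Channel's checkpoint and data directories up front
mkdir -p /opt/flume/checkpoint /opt/flume/data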


2. Troubleshooting the successive errors

With the configuration in place, I eagerly launched the agent to try it out:

flume-ng agent --name a1 --conf /opt/software/flume160/conf/ -f /opt/flumeconf/file-hdfs.conf -Dflume.root.logger=DEBUG,console

Then came one bucket of cold water after another. Here is what dampened the enthusiasm:


org/apache/hadoop/io/SequenceFile$CompressionType

2021-03-10 23:58:20,087 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:146)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
    at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:235)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:411)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more

Solution: copy the jar below from the Hadoop installation into Flume's lib directory.

Jar: ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar
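For example — a sketch assuming HADOOP_HOME is set and Flume is installed under /opt/software/flume160, as in the launch command above; the later jar fixes all follow this same pattern:

# Copy the jar that provides SequenceFile$CompressionType into Flume's classpath
cp ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar /opt/software/flume160/lib/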


org/apache/commons/configuration/Configuration

2021-03-11 08:45:13,867 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
    at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:139)
    at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:259)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2979)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2971)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 18 more

Solution: copy the jar below from the Hadoop installation into Flume's lib directory.

Jar: ${HADOOP_HOME}/share/hadoop/common/lib/commons-configuration-1.6.jar


org/apache/hadoop/util/PlatformName

Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
    at org.apache.hadoop.security.UserGroupInformation.getOSLoginModuleName(UserGroupInformation.java:442)
    at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:487)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2979)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2971)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 16 more

Solution: copy the jar below from the Hadoop installation into Flume's lib directory.

Jar: ${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-2.6.0-cdh5.14.2.jar


org/apache/htrace/core/Tracer$Builder

2021-03-11 09:07:27,157 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder
    at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2803)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Solution: copy the jar below from the Hadoop installation into Flume's lib directory.

Jar: ${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar


No FileSystem for scheme: hdfs

2021-03-11 09:14:59,911 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:443)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2796)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Solution: copy the jar below from the Hadoop installation into Flume's lib directory.

Jar: ${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-2.6.0-cdh5.14.2.jar
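To avoid stepping on these NoClassDefFoundError mines one at a time, all five jars can be copied in a single command — a sketch under the same assumptions as before (HADOOP_HOME set, Flume under /opt/software/flume160):

# Copy every Hadoop dependency the HDFS sink needs in one go
cp ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar \
   ${HADOOP_HOME}/share/hadoop/common/lib/commons-configuration-1.6.jar \
   ${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-2.6.0-cdh5.14.2.jar \
   ${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar \
   ${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-2.6.0-cdh5.14.2.jar \
   /opt/software/flume160/lib/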


java.nio.charset.MalformedInputException

2021-03-10 22:07:14,385 (pool-5-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:280)] FATAL: Spool Directory source s1: { spoolDir: /opt/software/tomcat8563/webapps/mycurd/log }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:283)
    at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:132)
    at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:70)
    at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:89)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readDeserializerEvents(ReliableSpoolingFileEventReader.java:343)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:318)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:250)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Solution: add a character-set setting for the source in the agent configuration (Flume defaults to UTF-8; my Tomcat logs are GBK):

a1.sources.s1.inputCharset = GBK
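If you are unsure which encoding your logs actually use, it is worth checking before picking a value — a quick sketch, assuming a Linux shell and a hypothetical log file name:

# iconv exits non-zero if the file contains byte sequences that are not valid GBK
iconv -f GBK -t UTF-8 /opt/software/tomcat8563/webapps/mycurd/log/app.log > /dev/null \
  && echo "decodes cleanly as GBK"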

java.lang.OutOfMemoryError: GC overhead limit exceeded

Cause: the agent's JVM ran out of heap memory.

Solution: go to ${FLUME_HOME}/bin and edit the flume-ng launcher script:

# set default params
FLUME_CLASSPATH=""
FLUME_JAVA_LIBRARY_PATH=""
# Raise the JVM heap limit here
JAVA_OPTS="-Xmx1024m"
LD_LIBRARY_PATH=""
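Alternatively — a sketch assuming a standard Flume layout, where the launcher sources conf/flume-env.sh from the directory passed via --conf — the heap can be raised without touching the flume-ng script itself:

# In ${FLUME_HOME}/conf/flume-env.sh (copy flume-env.sh.template first if it does not exist)
export JAVA_OPTS="-Xms512m -Xmx1024m"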

3. HDFS log file generation

Congratulations! With all of those obstacles cleared, it finally works and the files are generated on HDFS!

[Screenshot: the generated log files under the upload directory on HDFS]
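To double-check from the shell rather than a web UI — a quick verification sketch, assuming the same NameNode address as in the sink configuration (the date directory shown is illustrative):

# List the date-partitioned upload directories
hdfs dfs -ls hdfs://192.168.237.130:9000/upload/
# Peek at the first lines of a generated file (actual file names will vary)
hdfs dfs -cat hdfs://192.168.237.130:9000/upload/20210311/upload-* | head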

PS: If anything here is wrong or could be written better, please leave your valuable comments or suggestions. And if this post helped you, a like would be greatly appreciated!


Original author: wsjslient

Author's homepage: https://blog.csdn.net/wsjslient

