Table of Contents
- Foreword
- 1. Flume agent configuration
- 2. Clearing the errors one by one
- org/apache/hadoop/io/SequenceFile$CompressionType
- org/apache/commons/configuration/Configuration
- org/apache/hadoop/util/PlatformName
- org/apache/htrace/core/Tracer$Builder
- No FileSystem for scheme: hdfs
- java.nio.charset.MalformedInputException
- java.lang.OutOfMemoryError: GC overhead limit exceeded
- 3. HDFS log file generation
Foreword
This post should be useful to anyone who wants to know what to look out for when collecting data into HDFS with Flume. To simulate a realistic setup, I spun up a temporary virtual machine, stored the data under Tomcat on it, and then shipped the data from that VM to HDFS on a second VM.
Versions used in this environment:
- apache-tomcat-8.5.63
- flume-ng-1.6.0-cdh5.14.2
- hadoop-2.6.0-cdh5.14.2
1. Flume agent configuration
Without further ado, here is the agent configuration; each line is explained with a short comment.
(If anything is still unclear, see the official Flume user guide.)
# Name the source, channel, and sink
a1.channels = c1
a1.sources = s1
a1.sinks = k1
# Use a Spooling Directory Source (a source dedicated to ingesting files)
a1.sources.s1.type = spooldir
a1.sources.s1.channels = c1
# Directory the files are picked up from
a1.sources.s1.spoolDir = /opt/software/tomcat8563/webapps/mycurd/log
# Input character encoding (Flume defaults to UTF-8; my logs are GBK)
a1.sources.s1.inputCharset = GBK
# Use a File Channel
a1.channels.c1.type = file
# Checkpoint directory
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
# Data directory
a1.channels.c1.dataDirs = /opt/flume/data
# Use an HDFS Sink
a1.sinks.k1.type = hdfs
# HDFS directory path (with time escape sequences appended)
a1.sinks.k1.hdfs.path = hdfs://192.168.237.130:9000/upload/%Y%m%d
# File name prefix
a1.sinks.k1.hdfs.filePrefix = upload-
# Use the local timestamp for the escape sequences
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events written to HDFS per flush
a1.sinks.k1.hdfs.batchSize = 100
# File stream type
a1.sinks.k1.hdfs.fileType = DataStream
# Seconds to wait before rolling to the next file
a1.sinks.k1.hdfs.rollInterval = 600
# Maximum size (in bytes) of the current file before rolling to the next one
a1.sinks.k1.hdfs.rollSize = 134217700
# Number of events before rolling the file (0 means never roll based on event count)
a1.sinks.k1.hdfs.rollCount = 0
# Minimum number of HDFS replicas
a1.sinks.k1.hdfs.minBlockReplicas = 1
# Bind the sink to the channel
a1.sinks.k1.channel = c1
TIP: the channel's checkpointDir and dataDirs directories must be created on the VM ahead of time (see the sketch below)!
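A minimal sketch of that step, assuming the paths from the configuration above and a user with permission to create them:
# Create the File Channel checkpoint and data directories ahead of time
mkdir -p /opt/flume/checkpoint /opt/flume/data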
2. Clearing the errors one by one
With the configuration done, I was, like everyone else, eager to fire up the agent and try it out:
flume-ng agent --name a1 --conf /opt/software/flume160/conf/ -f /opt/flumeconf/file-hdfs.conf -Dflume.root.logger=DEBUG,console
Then came one bucket of cold water after another. Let's see which ones spoiled the fun:
org/apache/hadoop/io/SequenceFile$CompressionType
2021-03-10 23:58:20,087 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:146)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:235)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:411)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 12 more
Solution: copy the following Hadoop jar into Flume's lib directory.
Jar: ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar
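A minimal sketch of the fix, assuming HADOOP_HOME and FLUME_HOME point at your install directories; the same pattern fixes the next few classpath errors, only the jar changes:
# Put the missing Hadoop class on Flume's classpath by copying the jar into its lib directory
cp ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar ${FLUME_HOME}/lib/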
org/apache/commons/configuration/Configuration
2021-03-11 08:45:13,867 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:139)
	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:259)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2979)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2971)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 18 more
Solution: copy the following Hadoop jar into Flume's lib directory.
Jar: ${HADOOP_HOME}/share/hadoop/common/lib/commons-configuration-1.6.jar
org/apache/hadoop/util/PlatformName
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
	at org.apache.hadoop.security.UserGroupInformation.getOSLoginModuleName(UserGroupInformation.java:442)
	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:487)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2979)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2971)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 16 more
Solution: copy the following Hadoop jar into Flume's lib directory.
Jar: ${HADOOP_HOME}/share/hadoop/common/lib/hadoop-auth-2.6.0-cdh5.14.2.jar
org/apache/htrace/core/Tracer$Builder
2021-03-11 09:07:27,157 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder
	at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2803)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Solution: copy the following Hadoop jar into Flume's lib directory.
Jar: ${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar
No FileSystem for scheme: hdfs
2021-03-11 09:14:59,911 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:443)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2796)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Solution: copy the following Hadoop jar into Flume's lib directory.
Jar: ${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-2.6.0-cdh5.14.2.jar
A consolidated copy command for all five jars is sketched below.
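To spare you the jar-by-jar whack-a-mole, here is a hedged recap that copies all five dependencies mentioned in this section in one go (again assuming HADOOP_HOME and FLUME_HOME are set; the jar names match the CDH 5.14.2 build used here):
# Copy all five Hadoop dependencies into Flume's lib directory up front
cd ${HADOOP_HOME}
cp share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar \
   share/hadoop/common/lib/commons-configuration-1.6.jar \
   share/hadoop/common/lib/hadoop-auth-2.6.0-cdh5.14.2.jar \
   share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar \
   share/hadoop/hdfs/hadoop-hdfs-2.6.0-cdh5.14.2.jar \
   ${FLUME_HOME}/lib/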
java.nio.charset.MalformedInputException
2021-03-10 22:07:14,385 (pool-5-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:280)] FATAL: Spool Directory source s1: { spoolDir: /opt/software/tomcat8563/webapps/mycurd/log }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1
	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
	at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:283)
	at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:132)
	at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:70)
	at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:89)
	at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readDeserializerEvents(ReliableSpoolingFileEventReader.java:343)
	at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:318)
	at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:250)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Solution: add the character-set setting to the source in the agent configuration (Flume defaults to UTF-8):
a1.sources.s1.inputCharset = GBK
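If you are not sure which encoding your logs actually use, you can check before picking a charset. This is only a sketch with standard Linux tools; app.log is a hypothetical file name under the spoolDir configured above:
# Heuristically detect the charset of a log file
file -i /opt/software/tomcat8563/webapps/mycurd/log/app.log
# Or convert a GBK file to UTF-8 up front if you would rather keep Flume's default charset
iconv -f GBK -t UTF-8 app.log -o app-utf8.log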
java.lang.OutOfMemoryError: GC overhead limit exceeded
Cause: the JVM ran out of memory.
Solution: go into ${FLUME_HOME}/bin and edit the flume-ng launcher script:
# set default params
FLUME_CLASSPATH=""
FLUME_JAVA_LIBRARY_PATH=""
JAVA_OPTS="-Xmx1024m"   # raise the JVM heap limit
LD_LIBRARY_PATH=""
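As an alternative (my own hedged note, not part of the original workflow), the flume-ng launcher also sources conf/flume-env.sh when that file exists, so the heap can usually be raised there without touching the launcher script itself:
# conf/flume-env.sh (picked up by bin/flume-ng when present in the --conf directory)
export JAVA_OPTS="-Xms512m -Xmx1024m"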
3. HDFS log file generation
Congratulations! With all those obstacles out of the way, it finally works and the files are generated on HDFS.
PS: If anything here is wrong or could be written better, please leave your valuable comments or suggestions below. And if this post helped you, a like would be much appreciated!
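To confirm the files really landed, here is a quick check from the Hadoop VM. This is only a sketch: the NameNode address and path come from the hdfs.path configured above, and 20210311 stands in for whatever %Y%m%d resolved to on your run:
# List the day's bucket directory created by the HDFS sink
hdfs dfs -ls hdfs://192.168.237.130:9000/upload/20210311
# Peek at one of the rolled files (the upload- prefix comes from hdfs.filePrefix)
hdfs dfs -cat hdfs://192.168.237.130:9000/upload/20210311/upload-* | head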
Original author: wsjslient
Author's homepage: https://blog.csdn.net/wsjslient