Reading zip files efficiently in Java

Overview

I am working on a project that processes a very large amount of data.

I have a lot (thousands) of zip files, each containing ONE simple txt file with about 80k lines.

What I am currently doing is the following:

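for (File zipFile : dir.listFiles()) {
    // try-with-resources closes the archive and the reader even if processing throws
    try (ZipFile zf = new ZipFile(zipFile)) {
        // each archive holds a single entry, so take the first one
        ZipEntry ze = zf.entries().nextElement();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(zf.getInputStream(ze)))) {
            ...
        }
    }
}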

This way I can read the file line by line, but it is definitely too slow.

Given the large number of files and lines that need to be read, I need to read them in a more efficient way.

I have looked for a different approach, but I haven't been able to find anything.

I think I should use the Java NIO APIs, which are intended for intensive I/O operations, but I don't know how to use them with zip files.

Any help would really be appreciated.

Thanks,

Marco

Solution

"I have a lot (thousands) of zip files. The zipped files are about 30MB each, while the txt inside the zip file is about 60/70 MB. Reading and processing the files with this code takes many hours, around 15, but it depends."

Let's do some back-of-the-envelope calculations.

Let's say you have 5000 files. If it takes 15 hours to process them, this equates to ~10 seconds per file. The files are about 30MB each, so the throughput is ~3MB/s.
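The same estimate can be written out as a few lines of Java, which makes it easy to plug in the real numbers. This is only a sketch of the arithmetic above: the class and method names are made up for illustration, and the figure of 5000 files is the guess used in this answer, since the question only says "thousands".

```java
/** Back-of-the-envelope throughput estimate; all inputs are rough guesses. */
final class ThroughputEstimate {

    /** Average wall-clock seconds spent on each file. */
    static double secondsPerFile(double totalHours, int fileCount) {
        return totalHours * 3600.0 / fileCount;
    }

    /** Average rate at which compressed data is consumed, in MB/s. */
    static double megabytesPerSecond(double fileSizeMb, double secondsPerFile) {
        return fileSizeMb / secondsPerFile;
    }

    public static void main(String[] args) {
        int fileCount = 5000;      // assumed: the question only says "thousands"
        double totalHours = 15.0;  // "around 15" hours, per the question
        double fileSizeMb = 30.0;  // compressed size of each zip file

        double perFile = secondsPerFile(totalHours, fileCount);
        double rate = megabytesPerSecond(fileSizeMb, perFile);
        System.out.printf("%.1f s per file, %.1f MB/s%n", perFile, rate);
    }
}
```

With these inputs the result is about 10.8 seconds per file and about 2.8 MB/s, in line with the rounded figures above.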

This is between one and two orders of magnitude slower than the rate at which ZipFile can decompress data.

Either there's a problem with the disks (are they local, or a network share?), or it is the actual processing that is taking most of the time.
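A quick way to test the disk hypothesis is to time decompression alone, with no line splitting and no processing. The sketch below is one possible way to do that, not the asker's code: the class name, the method name and the 64 KB buffer size are arbitrary choices. It reads the first entry of one zip file, throws the bytes away, and reports how fast that went.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Times raw decompression of the first entry of a zip file, with no processing. */
final class ZipReadBenchmark {

    /** Decompresses the first entry, discards the bytes, and returns how many were read. */
    static long drainFirstEntry(Path zipPath) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // buffer size is an arbitrary choice
        long total = 0;
        try (ZipFile zf = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zf.entries().nextElement();
            try (InputStream in = zf.getInputStream(entry)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path zipPath = Paths.get(args[0]); // path to one of the zip files
        long start = System.nanoTime();
        long bytes = drainFirstEntry(zipPath);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.2f s (%.1f MB/s uncompressed)%n",
                bytes, seconds, bytes / 1e6 / seconds);
    }
}
```

If this alone runs at only a few MB/s, the storage is the likely culprit. If it runs much faster than the overall job, the time is going into the processing. Running it twice on the same file can also hint at how much the operating system's file cache is helping.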

The best way to find out for sure is by using a profiler.
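Where a profiler is not at hand, a crude manual split of the time can already point in the right direction. The following sketch is an assumption-laden stand-in, not a substitute for profiling: `PhaseTimer` and `timePhases` are hypothetical names, `processLine` stands for whatever the elided processing does, and UTF-8 is assumed as the text encoding. It first reads every line into memory, then processes the lines, and times the two phases separately.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Crude timing split between reading a zip entry and processing its lines. */
final class PhaseTimer {

    /** Returns {readNanos, processNanos} for the first entry of one zip file. */
    static long[] timePhases(File zipFile, Consumer<String> processLine) throws IOException {
        List<String> lines = new ArrayList<>();

        // Phase 1: decompress, decode and split into lines; nothing else.
        long t0 = System.nanoTime();
        try (ZipFile zf = new ZipFile(zipFile)) {
            ZipEntry ze = zf.entries().nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(zf.getInputStream(ze), StandardCharsets.UTF_8))) { // encoding assumed
                String line;
                while ((line = in.readLine()) != null) {
                    lines.add(line);
                }
            }
        }

        // Phase 2: the per-line processing only (hypothetical callback).
        long t1 = System.nanoTime();
        for (String line : lines) {
            processLine.accept(line);
        }
        long t2 = System.nanoTime();

        return new long[] { t1 - t0, t2 - t1 };
    }
}
```

Note that this holds the whole decompressed file in memory, so for a 60/70 MB text file the JVM needs a correspondingly larger heap. It is meant for measuring a handful of files, not for the production loop.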
