Overview
I am working on a project that processes a very large amount of data.
I have a lot (thousands) of zip files, each containing one simple txt file with thousands of lines (about 80k lines).
What I am currently doing is the following:
    for (File zipFile : dir.listFiles()) {
        ZipFile zf = new ZipFile(zipFile);
        ZipEntry ze = (ZipEntry) zf.entries().nextElement();
        BufferedReader in = new BufferedReader(new InputStreamReader(zf.getInputStream(ze)));
        ...
This way I can read the file line by line, but it is definitely too slow.
Given the large number of files and lines that need to be read, I need to read them in a more efficient way.
I have looked for a different approach, but I haven't been able to find anything.
I think I should use the Java NIO APIs, which are intended for intensive I/O operations, but I don't know how to use them with zip files.
Any help would really be appreciated.
Thanks,
Marco
Solution
"I have a lot (thousands) of zip files. The zipped files are about 30 MB each, while the txt file inside each zip is about 60-70 MB. Reading and processing the files with this code takes many hours, around 15, but it varies."
Let's do some back-of-the-envelope calculations.
Let's say you have 5000 files. If it takes 15 hours to process them, this equates to ~10 seconds per file. The files are about 30MB each, so the throughput is ~3MB/s.
This is between one and two orders of magnitude slower than the rate at which ZipFile can decompress stuff.
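The arithmetic above can be reproduced with a few lines of Java. This is only a sketch of the estimate: the class and method names are made up for illustration, and the 5000-file count is the assumed figure from the calculation, not a measured one. With these inputs it works out to roughly 10.8 seconds per file and about 2.8 MB/s.

```java
// Back-of-the-envelope throughput estimate using the figures quoted above.
// ThroughputEstimate and its methods are hypothetical names for this sketch.
final class ThroughputEstimate {

    /** Average wall-clock seconds spent per file. */
    static double secondsPerFile(double totalHours, int fileCount) {
        return totalHours * 3600.0 / fileCount;
    }

    /** Average throughput in MB/s, measured against the compressed size. */
    static double megabytesPerSecond(double fileSizeMb, double secondsPerFile) {
        return fileSizeMb / secondsPerFile;
    }

    public static void main(String[] args) {
        int fileCount = 5000;      // assumed ("let's say"), not measured
        double totalHours = 15.0;  // total run time reported in the question
        double zipSizeMb = 30.0;   // compressed size per file reported in the question

        double perFile = secondsPerFile(totalHours, fileCount);
        double rate = megabytesPerSecond(zipSizeMb, perFile);
        System.out.printf("%.1f s per file, %.1f MB/s%n", perFile, rate);
    }
}
```

Plugging in your real file count and total run time gives a number you can compare against what the disk and the decompressor should be able to deliver.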
Either there's a problem with the disks (are they local, or a network share?), or it is the actual processing that is taking most of the time.
The best way to find out for sure is by using a profiler.
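Short of a full profiler, a rough timing check can already show which side dominates. The sketch below reads every line of the single entry in one zip file and does nothing else with it, so the time it reports covers disk access and decompression only. The class name ZipReadTimer, the 64 KB buffer size and the UTF-8 charset are assumptions made for this example; adjust them to match your data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: times reading + decompression only, with no processing.
final class ZipReadTimer {

    /**
     * Reads every line of the first entry and returns the line count.
     * Assumes the archive contains at least one entry, as in the question.
     */
    static long countLines(Path zipPath) throws IOException {
        try (ZipFile zf = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zf.entries().nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(zf.getInputStream(entry), StandardCharsets.UTF_8),
                    1 << 16)) { // 64 KB buffer, an arbitrary choice for this sketch
                long lines = 0;
                while (in.readLine() != null) {
                    lines++;
                }
                return lines;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipReadTimer <file.zip>");
            return;
        }
        Path zipPath = Paths.get(args[0]);
        long start = System.nanoTime();
        long lines = countLines(zipPath);
        double seconds = (System.nanoTime() - start) / 1e9;
        double compressedMb = Files.size(zipPath) / (1024.0 * 1024.0);
        System.out.printf("%d lines in %.2f s (%.1f MB/s compressed)%n",
                lines, seconds, compressedMb / seconds);
    }
}
```

If this runs much faster than the ~10 seconds per file estimated above, the time is going into the per-line processing rather than into ZipFile. If it is just as slow, look at the disks or the network share first.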