java生成excel，大数据缓存，解决内存溢出问题

100 阅读 0 评论 66 点赞

我是靠谱客的博主缥缈秀发，最近开发中收集的这篇文章主要介绍java生成excel，大数据缓存，解决内存溢出问题，觉得挺不错的，现在分享给大家，希望可以做个参考。

概述

使用poi生成excel，导入jar包

<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.10-FINAL</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.10-FINAL</version>
</dependency>

生成excel主要的对象是：Workbook 是一个接口，主要有三种创建方式

1,创建2003 版本的excle，后缀 .xls，最多只能存5.6w行，256列左右的数据
Workbook workbook = new XSSFWorkbook()；
2，创建2007 及以后的版本， 后缀 .xlsx 可以存104w行，16384列左右的数据,但这种方式数据稍微大一点就会OOM
Workbook workbook = new HSSFWorkbook()；
3, 专门用来生成大数据excel，但也会受到 excel 104w 上限的限制
Workbook workbook = new SXSSFWorkbook()；
4，生成 .csv 后缀的excel文件， .csv理论上就纯文本，存多少数据都没问题，但数据太大打开会非常慢
用上面三种方式都可以生成

先来看看普通数据的读取，标准 .xls .xlsx 可以直接生成Workbook对象，但CSV文件是根据特殊字符分割生成数组。

//读取 excel 根据路径生成输入流，获取workbook对象，获取sheet，获取行row，获取列cell
/**
* @param filePath
文件路径
* @param endWith
结束符
* @return
* @throws Exception
*/
public static List<List<List<String>>> readXls(String filePath, String endWith) throws Exception {
InputStream is = null;
try {
is = new FileInputStream(filePath);
//HSSFWorkbook表示整个Excel
HSSFWorkbook hssfWorkbook = new HSSFWorkbook(is);
//循环每一页，并处理当前的循环页
for (int numSheet = 0; numSheet < hssfWorkbook.getNumberOfSheets(); numSheet++) {
//HSSFSheet表示某一页
HSSFSheet hssfSheet = hssfWorkbook.getSheetAt(numSheet);
if (hssfSheet == null) {
continue;
}
//处理当前页，循环处理每一行的数据
List<List<String>> sheetResult = new ArrayList<>();
for (int rowNumIndex = 0; rowNumIndex <= hssfSheet.getLastRowNum(); rowNumIndex++) {
//HSSFRow表示每一行的数据
HSSFRow hssfRow = hssfSheet.getRow(rowNumIndex);
int minColIx = hssfRow.getFirstCellNum();
int maxColIx = hssfRow.getLastCellNum();
String firstColumn = hssfRow.getCell(0).getStringCellValue();
//结尾符
if (StringUtils.isBlank(firstColumn) ||
(StringUtils.isNotBlank(endWith) && firstColumn.contains(endWith))) {
break;
}
List<String> rowList = new ArrayList<>();
//遍历该行，并获取每一个cell的数据
for (int colIx = minColIx; colIx < maxColIx; colIx++) {
HSSFCell hssfCell = hssfRow.getCell(colIx);
if (hssfCell == null) {
continue;
}
rowList.add(getStringVal(hssfCell));
}
sheetResult.add(rowList);
}
result.add(sheetResult);
}
return result;
} catch (FileNotFoundException e) {
throw new BusinessException(e);
} finally {
//这里关闭流的顺序，最后用完，最先关闭，后进先出，
if (is != null) {
try {
is.close();
} catch (IOException e) {
throw new BusinessException(e);
}
}
}
}
//处理不同数据格式
private static String getStringVal(HSSFCell cell) {
switch (cell.getCellType()) {
case Cell.CELL_TYPE_BOOLEAN:
return cell.getBooleanCellValue() ? Boolean.TRUE.toString() : Boolean.FALSE.toString();
case Cell.CELL_TYPE_FORMULA:
return cell.getCellFormula();
case Cell.CELL_TYPE_NUMERIC:
cell.setCellType(Cell.CELL_TYPE_STRING);
return cell.getStringCellValue();
case Cell.CELL_TYPE_STRING:
return cell.getStringCellValue();
default:
return "";
}
}
//读取csv
/**
* @param filePath:文件全路径
* @return
* @throws Exception
*/
public static List<List<String>> readCsv(String filePath, String endLine) throws Exception {
DataInputStream in = null;
BufferedReader reader = null;
try {
List<List<String>> result = new ArrayList<>();
in = new DataInputStream(new FileInputStream(new File(filePath)));
//这里判断文件编码格式，一般都是GBK
if ("is UTF-8") {
reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
} else {
reader = new BufferedReader(new InputStreamReader(in, "GBK"));
}
for (int i = 1; i < startRow; i++) {
reader.readLine();
}
String line = null;
while ((line = reader.readLine()) != null) {
if (line.contains(endLine)) {
//模板最后几行不做解析
break;
}
String[] item = line.split(",");
result.add(Arrays.asList(item));
}
return result;
} catch (Exception e) {
throw new BusinessException(e);
} finally {
if (in != null) {
try {
in.close();
} catch (IOException e) {
throw new BusinessException(e);
}
}
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
throw new BusinessException(e);
}
}
}
}

写数据，需要先往response 里面设置头信息类型等，再获取contentType，否则会下载文件类型一直不生效

//生成workbook
public static void download2Excel(Workbook workbook, Stirng downloadName, List<List<String>> datas,HttpServletResponse response) throws Exception {
Sheet sheet = workbook.getSheet(sheetName);
if (sheet == null) {
sheet = workbook.createSheet(sheetName);
}
for (int i = 0; i < datas.size(); i++) {
Row row = sheet.createRow(i);
List<String> cells = datas.get(i);
for (int j = 0; j < cells.size(); j++) {
Cell cell = row.createCell(j);
String data = cells.get(j);
if (StringUtils.isBlank(data)) {
cell.setCellValue("");
continue;
}
cell.setCellValue(String.valueOf(data));
}
}
//生成文件
if (response == null) {
String filePath = "server_path" + downloadName;
downloadByPath(filePath, workbook);
} else {
downLoadByResponse(response, workbook, downloadName);
}
}
public void downLoadByResponse(HttpServletResponse response,
Workbook workbook, String downloadName){
ByteArrayOutputStream outputStream = null;
OutputStream out = null;
try {
//转码
downloadName = new String(downloadName.getBytes("UTF-8"), "ISO8859-1");
outputStream = new ByteArrayOutputStream();
workbook.write(outputStream);
//转换成字节流分批写入
ByteArrayInputStream tempIn = new ByteArrayInputStream(outputStream.toByteArray());
response.setContentType("application/octet-stream");
response.setHeader("Content-Disposition", "attachment;filename="" + downloadName + """);
//如果有样式生成的文件打开之后会弹出警告框，网上查资料这里可以解决，但我试过并没有，有大神知道还望不吝赐教
//这里还有一个坑，如果弄成通用文件下载，会判断文件后缀设置不同的contentType，如果先获取response.getOutputStream() 这个输出流，再response.setContentType("")设置，contentType会一直是null，导致下载不正确，
response.setHeader("Content-Length", String.valueOf(tempIn.available()));
out = response.getOutputStream();
byte[] buffer = new byte[1024];
int a;
//分批写入，有效解决内存溢出，但主要优化还不是这里 ，
while ((a = tempIn.read(buffer)) != -1) {
out.write(buffer, 0, a);
}
response.flushBuffer();
logger.info("数据写入完成");
out.flush();
} catch (Exception e) {
throw new BusinessException(e);
} finally {
if (out != null) {
try {
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if (outputStream != null) {
try {
outputStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
public void downloadByPath(Stirng filePath, Workbook workbook){
//生成文件
FileOutputStream out = null;
try {
out = new FileOutputStream(filePath);
workbook.write(out);
} catch (Exception e) {
logger.error("file write error, ", e);
} finally {
if (out != null) {
try {
//主要是这里写入数据
out.flush();
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if (workbook != null) {
workbook.dispose();
}
}
}

server层调用

//读取数据
public void readFile(){
String filePath = "/Users/xinfei/Documents/temp/datas.xls";
List<List<String>> lists = readXls(filePath, "总交易单数");
}
//写数据
public void writeData(){
String tempFilePath = '在服务器生成的临时文件.xlsx';
SXSSFWorkbook workbook = new SXSSFWorkbook(new XSSFWorkbook(new FileInputStream(tempFilePath)));
workbook.setCompressTempFiles(true);
List<List<String>> sheetData = new ArrayList<>();
for (int pageNo = 1; pageNo <= pageTotal; pageNo++) {
SimplePage page = new SimplePage(pageNo, count, pageSize);
List<UserDatas> payLogs = userMapper.selectList(page);
if (CollectionUtils.isEmpty(payLogs)) {
continue;
}
//将对象转换成list数据
translateData(payLogs, sheetData);
}
//将转化后的数据写入文件
download2Excel(workbook, downloadName, datas, null)
sheetData.clear();
}

总结：

1，读数据，获取文件路径 -> 获取workbook对象 -> 获取sheet -> 获取row -> 获取cell的values -> 生成list对象 -> 转换成模型对象

2，写数据，获取模型对象 -> 转换成 list对象 -> new workbook对象 -> 获取sheet -> 往sheet写数据 -> 转换成输出流 -> 生成文件

容易出现内存溢出，要么数据获取太多，要么出现死循环，上面写数据每一步都可以做拆分

2.1 获取模型对象，在内存只有512M的情况下， 60个字段，获取1w数据，全放内存中容易 gc overhead limit exceeded 这时候可以采取分页获取数据，

2.2 将分页数据转化成list，每页单独写入workbook，这样虽然解决了模型对象过大，但workbook会越来越大，到1.5w 数据量也会内存溢出

2.3 XSSFWorkbook HSSFWorkbook对大数据的写入兼容不是很好，但SXSSFWorkbook专门用来写大数据，

查看源码，构造XSSFWorkbook对象，调用write方法会生成临时文件，主要做的事情将大部分数据写入这个临时文件并且不能修改，少数数据留在内存中可以修改

public class SXSSFWorkbook implements Workbook {
//.....
public SXSSFWorkbook() {
this((XSSFWorkbook)null);
}
public SXSSFWorkbook(XSSFWorkbook workbook) {
this(workbook, 100);
}
public void write(OutputStream stream) throws IOException {
Iterator i$ = this._xFromSxHash.values().iterator();
while(i$.hasNext()) {
SXSSFSheet sheet = (SXSSFSheet)i$.next();
sheet.flushRows();
}
File tmplFile = File.createTempFile("poi-sxssf-template", ".xlsx");
try {
FileOutputStream os = new FileOutputStream(tmplFile);
try {
this._wb.write(os);
} finally {
os.close();
}
this.injectData(tmplFile, stream);
} finally {
tmplFile.delete();
}
}
}

2.4 基于XSSFWorkbook 这个特性， SXSSFWorkbook workbook = new SXSSFWorkbook(new XSSFWorkbook(new FileInputStream(tempFilePath))); 初始化传入服务器路径（方便删除，用完删除），每页获取的数据转化成list之后调用write方法，直到所有数据查完再调用 flush方法，刷新数据，

以上测试都是在512M内存，很奇怪不管怎么优化都会内存溢出，即使用了SXSSFWorkbook ，大部分性能都消耗在gc上，但内存上了1G会好多了，workbook存10w数据，再write，最后再flush 也没事。

转载于:https://my.oschina.net/u/4079875/blog/3089801