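Overview
This is a demo that parses a .docx Word document, replaces placeholders in it, and exports the result as a PDF.
It works in two steps: first parse the Word file, replace the placeholders, and produce a new Word file; then export that file as a PDF.
Spring Boot: 2.4.0
I. Parsing the Word document and replacing placeholders
1. Required dependencies:
<!-- Apache POI -->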
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>3.8</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.8</version>
</dependency>
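2. Put the Word template under the resources directory, for example static/aaa.docx (the code below loads static/hubei-protocol-template2021.docx, so adjust the name to match your file).
The template contains placeholders such as ${CONTRACT_BH} and ${PARTY_A_NAME}.
3. Write the code
service: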
package com.example.rabmq.ramqdemo.word;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;
import java.io.*;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
/**
* @author Honey
* @Date 2021/9/9
* @Description
*/
@Component
public class GeneratePdf {
public XWPFDocument ttt() {
Resource resource = new ClassPathResource("static/hubei-protocol-template2021.docx");
InputStream inputStream = null;
Map<String, String> map = new HashMap<>();
map.put("${CONTRACT_BH}", "HT11");
map.put("${PARTY_A_NAME}", "测试公司");
String absolutePath = null;
try {
absolutePath = resource.getFile().getAbsolutePath();
System.out.println(absolutePath);
XWPFDocument document = new XWPFDocument(POIXMLDocument.openPackage(absolutePath));
Iterator<XWPFParagraph> paragraphsIterator = document.getParagraphsIterator();
while (paragraphsIterator.hasNext()) {
XWPFParagraph next = paragraphsIterator.next();
List<XWPFRun> runs = next.getRuns();
for (int i = 0; i < runs.size(); i++) {
XWPFRun xwpfRun = runs.get(i);
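// Read the run's first text node; setText(text, 0) below writes back to the same index.
// (getTextPosition() is the vertical text offset, not a text index.)
String text = xwpfRun.getText(0);
if (text == null || text.trim().equals("")) {
continue;
}
// Replace placeholders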
for (Map.Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
if (text.contains(key)) {
text = text.replace(key, value);
xwpfRun.setText(text, 0);
break;
}
}
}
}
return document;
// System.out.println("aa");
// FileOutputStream fos = new FileOutputStream("C:/资源/aaa.docx");
// document.write(fos);
// ByteArrayOutputStream ostream = new ByteArrayOutputStream();
// FileOutputStream out = new FileOutputStream(absolutePath);
// document.write(out);
// out.flush();
// out.close();
// out.write(ostream.toByteArray());
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
}
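One caveat with the per-run replacement above: Word often splits a placeholder such as ${CONTRACT_BH} across several runs, in which case no single run contains the full key and nothing is replaced. A minimal sketch of one common workaround: join the run texts of a paragraph, replace on the joined string, put the result in the first run, and blank the others. Plain strings stand in for XWPFRun here, and RunMergeSketch and replaceAcrossRuns are hypothetical names for illustration, not part of POI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RunMergeSketch {
    /**
     * Joins the texts of all runs in one paragraph, replaces every placeholder,
     * and returns a list of the same size: full text first, empty strings after.
     */
    static List<String> replaceAcrossRuns(List<String> runTexts, Map<String, String> values) {
        StringBuilder joined = new StringBuilder();
        for (String t : runTexts) {
            if (t != null) {
                joined.append(t); // runs without text contribute nothing
            }
        }
        String text = joined.toString();
        for (Map.Entry<String, String> e : values.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < runTexts.size(); i++) {
            result.add(i == 0 ? text : ""); // first run carries everything
        }
        return result;
    }
}
```

With POI, each returned string would be written back with setText(value, 0) on the matching run. The trade-off is that the whole paragraph takes the formatting of its first run.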
controller:
package com.example.rabmq.ramqdemo.word;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import javax.servlet.http.HttpServletResponse;
import java.io.OutputStream;
/**
* @author Honey
* @Date 2021/9/9
* @Description
*/
@Controller
public class PdfController {
@Autowired
private GeneratePdf generatePdf;
@RequestMapping("testExport")
public void gen(HttpServletResponse response) throws Exception {
OutputStream outputStream = response.getOutputStream();
XWPFDocument ttt = generatePdf.ttt();
StringBuilder contentDispositionValue = new StringBuilder();
contentDispositionValue.append("attachment; filename=")
.append("testFile.docx")
.append(";")
.append("filename*=")
.append("utf-8''")
.append("testFile.docx");
response.setHeader("Content-disposition", contentDispositionValue.toString());
ttt.write(outputStream);
}
}
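The header above works because testFile.docx is plain ASCII. If the download name contains Chinese characters, the filename* part has to be percent-encoded UTF-8. A minimal sketch under that assumption, using only the JDK; ContentDispositionSketch and attachment are hypothetical names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ContentDispositionSketch {
    /** Builds an attachment header value with an ASCII fallback and a UTF-8 encoded name. */
    static String attachment(String fileName) {
        String encoded;
        try {
            // URLEncoder targets form encoding, so fix up the two characters
            // that differ from what the filename* parameter expects.
            encoded = URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        // Fallback for clients that ignore filename*: non-ASCII and quotes become '_'.
        String asciiFallback = fileName.replaceAll("[^\\x20-\\x7E]", "_").replace("\"", "_");
        return "attachment; filename=\"" + asciiFallback + "\"; filename*=UTF-8''" + encoded;
    }
}
```

In the controller this value would be passed to response.setHeader("Content-disposition", ...) in place of the StringBuilder result.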
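4. After the project starts, test the endpoint with Postman.
5. Open the saved file; the placeholders have been replaced with the target values.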
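Instead of checking the file by eye, the result can also be scanned for placeholders that were left unreplaced. The sketch below assumes the document text has already been extracted to a String (for example with POI's XWPFWordExtractor) and that placeholders follow the ${NAME} form used in this demo; UnresolvedPlaceholderSketch and findUnresolved are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnresolvedPlaceholderSketch {
    // Matches ${...} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Returns the names of all ${NAME} placeholders still present in the text. */
    static List<String> findUnresolved(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

An empty list means every placeholder in the extracted text was filled in.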
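PS: while testing the demo I ran into this error:
Can't obtain the input stream from /docProps/app.xml
POI threw it when, after the text replacement, the document was written out to a Word file.
The cause turned out to be that the FileOutputStream wrote to the very file the document had been opened from. Changing the code as follows fixed it.
// Original code
FileOutputStream fos = new FileOutputStream(absolutePath);
// Changed to
FileOutputStream fos = new FileOutputStream("C:/资源/aaa.docx");
// Most likely the source path cannot be reused because POI still has the template open at that path and reads parts such as /docProps/app.xml from it while writing.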
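The underlying hazard can be shown without POI: opening a FileOutputStream on an existing file truncates it immediately, before a single byte is written, so anything that still needs to read from that file finds it empty. A small sketch using only the JDK; TruncateOnOpenSketch is a hypothetical name, and the temp file merely stands in for the template.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TruncateOnOpenSketch {
    /** Writes 4 bytes to a temp file, reopens it for writing, and returns the size seen at that moment. */
    static long sizeRightAfterOpeningForWrite() {
        try {
            Path file = Files.createTempFile("template", ".docx");
            try {
                Files.write(file, new byte[] {1, 2, 3, 4});
                // Opening the stream alone is enough to empty the file.
                try (FileOutputStream ignored = new FileOutputStream(file.toFile())) {
                    return Files.size(file);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This is why writing to a different path, as in the fix above, avoids the error.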
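II. Exporting the processed Word document as a PDF
There are several ways to convert Word to PDF, each with trade-offs.
1. Apache POI + iText
This needs no extra components and runs as plain Java code, but images, tables, and similar elements in the Word file can be lost, so it only suits fairly simple documents.
2. The JACOB jar
Performance seems acceptable, but JACOB only supports Windows, so it is not an option if the program has to run on a Linux server.
3. OpenOffice
This requires setting up an OpenOffice service, which is tedious and has a high operational cost; it supports both Windows and Linux.
The demo here uses the first option, POI + iText.
1. Add the iText-related dependencies
<!-- Apache POI -->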
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.14</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>3.14</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>xdocreport</artifactId>
<version>1.0.6</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>3.14</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>ooxml-schemas</artifactId>
<version>1.3</version>
</dependency>
<dependency>
<groupId>com.itextpdf.tool</groupId>
<artifactId>xmlworker</artifactId>
<version>5.5.11</version>
</dependency>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>5.5.11</version>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itext-asian</artifactId>
<version>5.2.0</version>
</dependency>
2. Add the code
package com.example.rabmq.ramqdemo.word;
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontProvider;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.utils.StringUtils;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities;
import org.jsoup.select.Elements;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;
import java.io.*;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
/**
* @author Honey
* @Date 2021/9/9
* @Description
*/
@Component
public class GeneratePdf {
public XWPFDocument ttt() {
Resource resource = new ClassPathResource("static/hubei-protocol-template2021.docx");
InputStream inputStream = null;
Map<String, String> map = new HashMap<>();
map.put("${CONTRACT_BH}", "HT11");
map.put("${PARTY_A_NAME}", "测试公司");
String absolutePath = null;
try {
absolutePath = resource.getFile().getAbsolutePath();
System.out.println(absolutePath);
XWPFDocument document = new XWPFDocument(POIXMLDocument.openPackage(absolutePath));
Iterator<XWPFParagraph> paragraphsIterator = document.getParagraphsIterator();
while (paragraphsIterator.hasNext()) {
XWPFParagraph next = paragraphsIterator.next();
List<XWPFRun> runs = next.getRuns();
for (int i = 0; i < runs.size(); i++) {
XWPFRun xwpfRun = runs.get(i);
// Read the run's first text node; getTextPosition() is a vertical offset, not a text index.
String text = xwpfRun.getText(0);
if (text == null || text.trim().equals("")) {
continue;
}
// Replace placeholders
for (Map.Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
if (text.contains(key)) {
text = text.replace(key, value);
xwpfRun.setText(text, 0);
break;
}
}
}
}
return document;
// System.out.println("aa");
// FileOutputStream fos = new FileOutputStream("C:/资源/aaa.docx");
// document.write(fos);
// ByteArrayOutputStream ostream = new ByteArrayOutputStream();
// FileOutputStream out = new FileOutputStream(absolutePath);
// document.write(out);
// out.flush();
// out.close();
// out.write(ostream.toByteArray());
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
/**
* Converts a docx file to HTML.
*
* @param docxPath path of the docx file
* @param imageDir directory where images from the docx file are stored
* @return html
*/
public static String docx2Html(String docxPath, String imageDir) {
String content = null;
FileInputStream in = null;
ByteArrayOutputStream baos = null;
try {
// 1> Load the document into an XWPFDocument
in = new FileInputStream(new File(docxPath));
XWPFDocument document = new XWPFDocument(in);
// 2> Configure the XHTML options (the IURIResolver sets the directory used for image links)
XHTMLOptions options = XHTMLOptions.create();
// Directory where images from the Word file are extracted
options.setExtractor(new FileImageExtractor(new File(imageDir)));
options.URIResolver(new BasicURIResolver(imageDir));
options.setIgnoreStylesIfUnused(false);
options.setFragment(true);
// 3> Convert the XWPFDocument to XHTML
baos = new ByteArrayOutputStream();
XHTMLConverter.getInstance().convert(document, baos, options);
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
if (in != null) {
in.close();
}
if (baos != null) {
content = new String(baos.toByteArray(), "utf-8");
baos.close();
}
} catch (Exception e) {
e.printStackTrace();
}
}
return content;
}
/**
* Normalizes HTML with jsoup.
*
* @param html HTML content
* @return normalized HTML
*/
public static String formatHtml(String html) {
org.jsoup.nodes.Document doc = Jsoup.parse(html);
// Remove oversized widths
String style = doc.attr("style");
if (StringUtils.isNotEmpty(style) && style.contains("width")) {
doc.attr("style", "");
}
Elements divs = doc.select("div");
for (Element div : divs) {
String divStyle = div.attr("style");
if (StringUtils.isNotEmpty(divStyle) && divStyle.contains("width")) {
div.attr("style", "");
}
}
// Have jsoup emit XML syntax so that every tag is closed
doc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
return doc.html();
}
/**
* Converts HTML to PDF.
*
* @param html html
* @param outputPdfPath output PDF path
*/
public static void htmlToPdf(String html, String outputPdfPath) {
com.itextpdf.text.Document document = null;
ByteArrayInputStream bais = null;
try {
// The paper
document = new com.itextpdf.text.Document(PageSize.A4);
// The pen
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPdfPath));
document.open();
// HTML to PDF; encode as UTF-8 to match the charset passed to parseXHtml below
bais = new ByteArrayInputStream(html.getBytes(Charset.forName("UTF-8")));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bais,
Charset.forName("UTF-8"), new FontProvider() {
@Override
public boolean isRegistered(String s) {
return false;
}
@Override
public Font getFont(String s, String s1, boolean embedded, float size, int style, BaseColor baseColor) {
// Configure the font
Font font = null;
try {
// Option 1: use a local font (the font must exist on the machine)
// BaseFont bf = BaseFont.createFont("c:/Windows/Fonts/simsun.ttc,0", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// BaseFont bf = BaseFont.createFont("C:/Windows/Fonts/seguisym.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// Option 2: use the iTextAsian jar, so a single jar is all that is needed
BaseFont bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.EMBEDDED);
font = new Font(bf, size, style, baseColor);
font.setColor(baseColor);
} catch (Exception e) {
e.printStackTrace();
}
return font;
}
});
} catch (Exception e) {
e.printStackTrace();
} finally {
if (document != null) {
document.close();
}
if (bais != null) {
try {
bais.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
public static void main(String[] args) {
String basePath = "C:/个人资料/";
String docxPath = basePath + "testFile2.docx";
String pdfPath = basePath + "index.pdf";
String imageDir = "C:/个人资料/";
// Test doc to PDF
// String docHtml = doc2Html(docPath, imageDir);
// docHtml = formatHtml(docHtml);
// htmlToPdf(docHtml, pdfPath);
// Test docx to PDF
String docxHtml = docx2Html(docxPath, imageDir);
docxHtml = formatHtml(docxHtml);
// docxHtml = docxHtml.replace("___", "张三");
htmlToPdf(docxHtml, pdfPath);
}
}
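A detail worth checking in htmlToPdf: the parser is told the input is UTF-8, so the bytes handed to it should be UTF-8 as well. If they are produced with a different charset (for instance a non-UTF-8 platform default), Chinese text will not survive, regardless of the font. The sketch below illustrates the mismatch with the JDK only; CharsetMismatchSketch and survivesUtf8Decoding are hypothetical names, and ISO-8859-1 is used merely as a stand-in for "some other charset".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchSketch {
    /** Encodes text with the given charset and reports whether decoding as UTF-8 restores it. */
    static boolean survivesUtf8Decoding(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.equals(text);
    }
}
```

Passing an explicit charset to getBytes keeps the result independent of the machine the code runs on.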
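3. Run the main method to get the converted PDF file.
PS: the converted file lost many table borders, and images could not be handled either. Without the itext-asian font handling shown above, the output also shows garbled Chinese text.
So this approach only suits fairly simple Word-to-PDF conversion. For a product built around documents, you would probably need a dedicated, efficient office service that handles conversion, online preview, and so on.
When I have time I plan to look into OpenOffice and PageOffice and compare their performance and output quality.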