Parsing a Word document in Java, replacing placeholders, and exporting to PDF

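I'm 俭朴河马, a blogger on 靠谱客. This article, collected during recent development work, covers parsing a Word document in Java, replacing its placeholders, and exporting it to PDF. I found it useful and am sharing it here for reference.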

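Overview

This is a demo that parses a .docx Word document, replaces its placeholders, and exports the result as a PDF.

It works in two steps. The first step parses the Word file, replaces the placeholders, and produces a new Word file. The second step exports that file as a PDF.

Spring Boot: 2.4.0

I. Parse the Word file and replace placeholders

1. Required dependencies: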

<!--  apache  poi-->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.8</version>
        </dependency>

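2. Put the Word template file under resources, at static/aaa.docx (the service code loads static/hubei-protocol-template2021.docx, so use your template's actual file name).
The template contains placeholders such as ${CONTRACT_BH} and ${PARTY_A_NAME} in its text.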

3. Write the code

service:

package com.example.rabmq.ramqdemo.word;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

import java.io.*;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * @author Honey
 * @Date 2021/9/9
 * @Description
 */
@Component
public class GeneratePdf {

    public XWPFDocument ttt() {
        Resource resource = new ClassPathResource("static/hubei-protocol-template2021.docx");
        InputStream inputStream = null;
        Map<String, String> map = new HashMap<>();
        map.put("${CONTRACT_BH}", "HT11");
        map.put("${PARTY_A_NAME}", "测试公司");
        String absolutePath = null;
        try {
            absolutePath = resource.getFile().getAbsolutePath();
            System.out.println(absolutePath);
            XWPFDocument document = new XWPFDocument(POIXMLDocument.openPackage(absolutePath));

            Iterator<XWPFParagraph> paragraphsIterator = document.getParagraphsIterator();
            while (paragraphsIterator.hasNext()) {
                XWPFParagraph next = paragraphsIterator.next();
                List<XWPFRun> runs = next.getRuns();
                for (int i = 0; i < runs.size(); i++) {
                    XWPFRun xwpfRun = runs.get(i);
                    // Read the run's first text node; getTextPosition() is a vertical offset, not an index
                    String text = xwpfRun.getText(0);
                    System.out.println(text);
                    if (text == null || text.trim().equals("")) {
                        continue;
                    }
                    // Replace every placeholder found in this run
                    String replaced = text;
                    for (Map.Entry<String, String> entry : map.entrySet()) {
                        replaced = replaced.replace(entry.getKey(), entry.getValue());
                    }
                    if (!replaced.equals(text)) {
                        xwpfRun.setText(replaced, 0);
                    }
                    
                }
            }
            return document;
//            System.out.println("aa");
//            FileOutputStream fos = new FileOutputStream("C:/资源/aaa.docx");
//            document.write(fos);
//            ByteArrayOutputStream ostream = new ByteArrayOutputStream();
//            FileOutputStream out = new FileOutputStream(absolutePath);
//            document.write(out);
//            out.flush();
//            out.close();
//            out.write(ostream.toByteArray());

        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
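
The replacement step in the service can be tried out on plain strings, without POI. The sketch below is only an illustration: the class name PlaceholderReplacer, the method replaceAll, and the value "ACME" are made up for this example. It applies every key in the map to the text, so a run that holds two placeholders gets both replaced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlaceholderReplacer {

    /** Replaces every key of {@code values} found in {@code text} with its mapped value. */
    public static String replaceAll(String text, Map<String, String> values) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        String result = text;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace is a literal (non-regex) replacement, so "${...}" needs no escaping
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("${CONTRACT_BH}", "HT11");
        values.put("${PARTY_A_NAME}", "ACME"); // hypothetical sample value
        System.out.println(replaceAll("No. ${CONTRACT_BH}, party A: ${PARTY_A_NAME}", values));
    }
}
```

In the service, the same idea would be applied to the text of each XWPFRun before calling setText(text, 0).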

controller:

package com.example.rabmq.ramqdemo.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

import javax.servlet.http.HttpServletResponse;
import java.io.OutputStream;

/**
 * @author Honey
 * @Date 2021/9/9
 * @Description
 */
@Controller
public class PdfController {

    @Autowired
    private GeneratePdf generatePdf;

    @RequestMapping("testExport")
    public void gen(HttpServletResponse response) throws Exception {
        OutputStream outputStream = response.getOutputStream();
        XWPFDocument ttt = generatePdf.ttt();
        StringBuilder contentDispositionValue = new StringBuilder();
        contentDispositionValue.append("attachment; filename=")
                .append("testFile.docx")
                .append(";")
                .append("filename*=")
                .append("utf-8''")
                .append("testFile.docx");

        response.setHeader("Content-disposition", contentDispositionValue.toString());
        ttt.write(outputStream);
    }
}
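
The controller sends a plain ASCII file name, so the filename*= part needs no escaping there. If the download name ever contains Chinese characters or spaces, the value after utf-8'' has to be percent-encoded. Below is a minimal sketch of how that could be done with the JDK only. The class ContentDispositionSketch and its method attachment are hypothetical names for this example, not part of Spring or the servlet API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ContentDispositionSketch {

    /** Builds an attachment header value with an ASCII fallback and a percent-encoded UTF-8 name. */
    public static String attachment(String fileName) {
        String encoded;
        try {
            // URLEncoder targets form encoding, so turn its "+" (space) into "%20" and escape "*"
            encoded = URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        // Fallback for old clients: anything outside printable ASCII, plus quote and backslash, becomes '_'
        String fallback = fileName.replaceAll("[^\\x20-\\x7E]|[\"\\\\]", "_");
        return "attachment; filename=\"" + fallback + "\"; filename*=UTF-8''" + encoded;
    }
}
```

In the controller this would be used as response.setHeader("Content-Disposition", ContentDispositionSketch.attachment("testFile.docx")).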

4. Start the project and call the endpoint with Postman.

5. Open the saved file; the placeholders have been replaced with the target values.

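PS: While testing the demo I ran into this error:
Can't obtain the input stream from /docProps/app.xml
POI threw it when, after the text replacement, I wrote the document out to a Word file.
The cause turned out to be that the FileOutputStream pointed at the same file the document had been opened from. Opening a FileOutputStream on a path truncates that file, most likely while POI still needed to read the package's parts from it. After changing the code as follows, the error went away.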

// Original code
FileOutputStream fos = new FileOutputStream(absolutePath);
// Changed to
FileOutputStream fos = new FileOutputStream("C:/资源/aaa.docx");
// It seems the absolute path of the original file cannot be used as the output path.
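
If the output really has to end up at the original path, one general way to avoid truncating a file that is still being read is to write to a temporary file first and move it into place afterwards. The sketch below shows that pattern with the JDK only. SafeOverwrite, StreamWriter, writeViaTempFile, and demo are hypothetical names for this example, and I have not tried this with POI itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeOverwrite {

    /** Hypothetical callback: writes some content to the stream it is given. */
    public interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Writes to a temp file next to {@code target}, then moves it over {@code target}. */
    public static void writeViaTempFile(Path target, StreamWriter writer) {
        try {
            Path dir = target.toAbsolutePath().getParent();
            Path temp = Files.createTempFile(dir, "export-", ".tmp");
            try (OutputStream out = Files.newOutputStream(temp)) {
                writer.writeTo(out); // the target file is still untouched at this point
            }
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-check: overwrites a file whose old content is read inside the callback. */
    public static String demo() {
        try {
            Path target = Files.createTempFile("template-", ".txt");
            Files.write(target, "old".getBytes(StandardCharsets.UTF_8));
            writeViaTempFile(target, out -> {
                // Reading the original still works here because it has not been truncated
                String old = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
                out.write((old + "->new").getBytes(StandardCharsets.UTF_8));
            });
            String result = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            Files.delete(target);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this demo the callback would be where document.write(out) goes. Writing to a different path, as in the fix above, remains the simpler option.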

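II. Export the processed Word file as a PDF

There are several ways to convert Word to PDF, each with its own trade-offs.
1. Apache POI + iText
This needs no extra components and runs as plain Java code, but images, tables, and similar elements in the Word file get lost, so it only suits fairly simple documents.

2. The JACOB jar
Performance seems acceptable, but JACOB only supports Windows, so it cannot be used if the program has to run on a Linux server.

3. OpenOffice
This requires setting up an OpenOffice service, which is fiddly and has a fairly high operational cost. It works on both Windows and Linux.

This demo uses the first option, POI + iText.

1. Add the iText-related dependencies (note that the POI artifacts move from 3.8 to 3.14 here)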

<!--  apache  poi-->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.14</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.14</version>
        </dependency>
        <dependency>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>xdocreport</artifactId>
            <version>1.0.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml-schemas</artifactId>
            <version>3.14</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>ooxml-schemas</artifactId>
            <version>1.3</version>
        </dependency>
        <dependency>
            <groupId>com.itextpdf.tool</groupId>
            <artifactId>xmlworker</artifactId>
            <version>5.5.11</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.11.3</version>
        </dependency>
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.11</version>
        </dependency>
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itext-asian</artifactId>
            <version>5.2.0</version>
        </dependency>

2. Add the code

package com.example.rabmq.ramqdemo.word;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontProvider;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.utils.StringUtils;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities;
import org.jsoup.select.Elements;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

import java.io.*;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * @author Honey
 * @Date 2021/9/9
 * @Description
 */
@Component
public class GeneratePdf {

    public XWPFDocument ttt() {
        Resource resource = new ClassPathResource("static/hubei-protocol-template2021.docx");
        InputStream inputStream = null;
        Map<String, String> map = new HashMap<>();
        map.put("${CONTRACT_BH}", "HT11");
        map.put("${PARTY_A_NAME}", "测试公司");
        String absolutePath = null;
        try {
            absolutePath = resource.getFile().getAbsolutePath();
            System.out.println(absolutePath);
            XWPFDocument document = new XWPFDocument(POIXMLDocument.openPackage(absolutePath));

            Iterator<XWPFParagraph> paragraphsIterator = document.getParagraphsIterator();
            while (paragraphsIterator.hasNext()) {
                XWPFParagraph next = paragraphsIterator.next();
                List<XWPFRun> runs = next.getRuns();
                for (int i = 0; i < runs.size(); i++) {
                    XWPFRun xwpfRun = runs.get(i);
                    // Read the run's first text node; getTextPosition() is a vertical offset, not an index
                    String text = xwpfRun.getText(0);
                    if (text == null || text.trim().equals("")) {
                        continue;
                    }
                    // Replace every placeholder found in this run
                    String replaced = text;
                    for (Map.Entry<String, String> entry : map.entrySet()) {
                        replaced = replaced.replace(entry.getKey(), entry.getValue());
                    }
                    if (!replaced.equals(text)) {
                        xwpfRun.setText(replaced, 0);
                    }
                }
            }
            return document;
//            System.out.println("aa");
//            FileOutputStream fos = new FileOutputStream("C:/资源/aaa.docx");
//            document.write(fos);
//            ByteArrayOutputStream ostream = new ByteArrayOutputStream();
//            FileOutputStream out = new FileOutputStream(absolutePath);
//            document.write(out);
//            out.flush();
//            out.close();
//            out.write(ostream.toByteArray());

        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    /**
     * Converts a docx file to HTML.
     *
     * @param docxPath path of the docx file
     * @param imageDir directory where images from the docx file are stored
     * @return html
     */
    public static String docx2Html(String docxPath, String imageDir) {
        String content = null;

        FileInputStream in = null;
        ByteArrayOutputStream baos = null;
        try {
            // 1> Load the document into an XWPFDocument
            in = new FileInputStream(new File(docxPath));
            XWPFDocument document = new XWPFDocument(in);
            // 2> Configure the XHTML options (the IURIResolver sets where image links point)
            XHTMLOptions options = XHTMLOptions.create();
            // Directory that receives the images from the Word file
            options.setExtractor(new FileImageExtractor(new File(imageDir)));
            options.URIResolver(new BasicURIResolver(imageDir));
            options.setIgnoreStylesIfUnused(false);
            options.setFragment(true);
            // 3> Convert the XWPFDocument to XHTML
            baos = new ByteArrayOutputStream();
            XHTMLConverter.getInstance().convert(document, baos, options);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (in != null) {
                    in.close();
                }
                if (baos != null) {
                    content = new String(baos.toByteArray(), "utf-8");
                    baos.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return content;
    }

    /**
     * Normalizes HTML with jsoup.
     *
     * @param html HTML content
     * @return normalized HTML
     */
    public static String formatHtml(String html) {
        org.jsoup.nodes.Document doc = Jsoup.parse(html);
        // Remove overly large widths
        String style = doc.attr("style");
        if (StringUtils.isNotEmpty(style) && style.contains("width")) {
            doc.attr("style", "");
        }
        Elements divs = doc.select("div");
        for (Element div : divs) {
            String divStyle = div.attr("style");
            if (StringUtils.isNotEmpty(divStyle) && divStyle.contains("width")) {
                div.attr("style", "");
            }
        }
        // Have jsoup emit closed (XML-style) tags
        doc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
        doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
        return doc.html();
    }


    /**
     * Converts HTML to PDF.
     *
     * @param html          html
     * @param outputPdfPath output PDF path
     */
    public static void htmlToPdf(String html, String outputPdfPath) {
        com.itextpdf.text.Document document = null;
        ByteArrayInputStream bais = null;
        try {
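            // The "paper": an A4 document
            document = new com.itextpdf.text.Document(PageSize.A4);
            // The "pen": a writer bound to the output file
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPdfPath));
            document.open();

            // Convert the HTML to PDF; encode as UTF-8 to match the charset passed to parseXHtml
            bais = new ByteArrayInputStream(html.getBytes(Charset.forName("UTF-8")));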
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, bais,
                    Charset.forName("UTF-8"), new FontProvider() {
                        @Override
                        public boolean isRegistered(String s) {
                            return false;
                        }

                        @Override
                        public Font getFont(String s, String s1, boolean embedded, float size, int style, BaseColor baseColor) {
                            // Configure the font
                            Font font = null;
                            try {
                                // Option 1: use a local font (it must be installed on the machine)
//                              BaseFont bf = BaseFont.createFont("c:/Windows/Fonts/simsun.ttc,0", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
//                                BaseFont bf = BaseFont.createFont("C:/Windows/Fonts/seguisym.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
                                // Option 2: use the iTextAsian jar, so a single jar is all that is needed
                                BaseFont bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.EMBEDDED);
                                font = new Font(bf, size, style, baseColor);
                                font.setColor(baseColor);
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                            return font;
                        }
                    });
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (document != null) {
                document.close();
            }
            if (bais != null) {
                try {
                    bais.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) {
        String basePath = "C:/个人资料/";
        String docxPath = basePath + "testFile2.docx";
        String pdfPath = basePath + "index.pdf";
        String imageDir = "C:/个人资料/";

        // Test doc to PDF
//        String docHtml = doc2Html(docPath, imageDir);
//        docHtml = formatHtml(docHtml);
//        htmlToPdf(docHtml, pdfPath);

        // Test docx to PDF
        String docxHtml = docx2Html(docxPath, imageDir);
        docxHtml = formatHtml(docxHtml);
//        docxHtml = docxHtml.replace("___", "张三");
        htmlToPdf(docxHtml, pdfPath);
    }
}
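
One detail of formatHtml worth noting: it clears the whole style attribute whenever the string contains "width", which also drops unrelated declarations and matches names like border-width. A gentler variant could remove only the width declaration itself. The sketch below shows that idea on plain strings. StyleCleaner and stripWidth are hypothetical names, and splitting on ";" is an assumption that holds for simple inline styles only (it would break values that themselves contain a semicolon).

```java
import java.util.ArrayList;
import java.util.List;

public class StyleCleaner {

    /** Removes only the "width" declaration from an inline style string. */
    public static String stripWidth(String style) {
        if (style == null || style.trim().isEmpty()) {
            return "";
        }
        List<String> kept = new ArrayList<>();
        for (String declaration : style.split(";")) {
            String trimmed = declaration.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.indexOf(':');
            String property = colon < 0 ? trimmed : trimmed.substring(0, colon).trim();
            // Exact match on the property name, so "max-width" or "border-width" are left alone
            if (!property.equalsIgnoreCase("width")) {
                kept.add(trimmed);
            }
        }
        return kept.isEmpty() ? "" : String.join(";", kept) + ";";
    }
}
```

Inside formatHtml this could be used as div.attr("style", StyleCleaner.stripWidth(divStyle)) in place of div.attr("style", "").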

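3. Run the main method to get the converted PDF file.

PS: The converted file loses many of its table borders, and images are not handled either. Without the itext-asian font setup shown above (the custom FontProvider), the Chinese text in the output also comes out garbled.
So this approach only suits fairly simple Word-to-PDF conversions. For a product dedicated to document processing, you would probably need to set up an efficient, standalone office service that handles conversion, online preview, and so on.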
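
Garbled Chinese can also come from a charset mismatch rather than from fonts. htmlToPdf tells parseXHtml that its input is UTF-8, so the bytes handed to it need to be UTF-8 as well, whereas String.getBytes() without an argument uses the JVM's default charset. The small sketch below illustrates the effect with the JDK only. CharsetRoundTrip and its methods are hypothetical names for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    /** Encodes text as UTF-8 regardless of the JVM's default charset. */
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes with an explicitly named charset. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String sample = "\u6d4b\u8bd5"; // the two characters of "测试", written as escapes
        byte[] utf8 = toUtf8(sample);
        // Decoding with the same charset restores the text
        System.out.println(sample.equals(decode(utf8, StandardCharsets.UTF_8)));
        // Decoding with a different charset yields different (garbled) text
        System.out.println(sample.equals(decode(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```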