Python之利用PyPDF2库实现对PDF的删除和合并

312 阅读 0 评论 206 点赞

我是靠谱客的博主能干蜗牛，这篇文章主要介绍Python之利用PyPDF2库实现对PDF的删除和合并，现在分享给大家，希望可以做个参考。

文章目录

- 概述
- 安装
- 一、The PdfFileReader Class
- - 1、getNumPages()
  - 2、getPage(pageNumber)
- 二、The PdfFileWriter Class
- - 1、addPage(page)
  - 2、write(stream)
- 三、The PdfFileMerger Class
- - 方法1、append（）
  - 方法2、merge（）
  - 方法3、write（）
- 实例一：删除
- 实例二：合并

概述

PyPDF2是Python中用于对PDF操作的第三方库，提供了删除、合并、裁剪、转换等操作
最主要有四个类：
The PdfFileReader Class
The PdfFileMerger Class
The PageObject Class
The PdfFileWriter Class

安装

打开命令行键入

pip install PyPDF2

一、The PdfFileReader Class

 PyPDF2.PdfFileReader(stream, strict=True, warndest=None, overwriteWarnings=True)

Parameters:
stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.

strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

warndest – Destination for logging warnings (defaults to sys.stderr).

overwriteWarnings (bool) – Determines whether to override Python’s warnings.py module with a custom implementation (defaults to True).

1、getNumPages()

Calculates the number of pages in this PDF file.

Returns: number of pages
Return type: int
Raises PdfReadError:
if file is encrypted and restrictions prevent this action.

2、getPage(pageNumber)

Retrieves a page by number from this PDF file.

Parameters: pageNumber (int)
– The page number to retrieve (pages begin at zero)
Returns: a PageObject instance.
Return type: PageObject

二、The PdfFileWriter Class

class PyPDF2.PdfFileWriter
This class supports writing PDF files out, given pages produced by another class (typically PdfFileReader).

1、addPage(page)

Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

Parameters: page (PageObject) – The page to add to the document. Should be an instance of PageObject

2、write(stream)

Writes the collection of pages added to this object out as a PDF file.

Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

三、The PdfFileMerger Class

Initializes a PdfFileMerger object. PdfFileMerger merges multiple PDFs into a single PDF. It can concatenate, slice, insert, or any combination of the above.
初始化一个PdfFileMerger对象，PdfFileMerger 用来将多个PDF合并为一个PDF，它能够连接，切割，插入或者以上的任意组合

See the functions merge() (or append()) and write() for usage information.

Parameters: strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

方法1、append（）

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.
和merge（）方法相同，但假定的是你想要把全部页面连接到文件的最后而不是指定位置

Parameters:
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
一个文件对象（python中用open（）创建的对象）或者类似文件对象的能够支持标准读取和寻找方法的对象，也可以是一个代表指向PDF文件路径的字符串
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.
可以是一个页码序列或者一个(start, stop[, step])元组，用来合并指定范围的源文件页面到输出文件

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

在这里插入代码片

方法2、merge（）

merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)

Merges the pages from the given file into the output file at the specified page number.
从指定位置合并来自给定文件的页面到输出文件

Parameters:
position (int) – The page number to insert this file. File will be inserted after the given number.
插入文件的页码数，将插入到给定页数的后面
0口1 口2 口3
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

注意：position和pages均指的是下图绿色数字，pages的范围是绿色数字之间囊括的页面
在这里插入图片描述

方法3、write（）

write(fileobj)
Writes all data that has been merged to the given output file.
将所有被合并的数据写入到给定的输出文件中

Parameters: fileobj – Output file. Can be a filename or any kind of file-like object.
输出文件，可以是一个文件名或者所有类似文件对象的对象

实例一：删除

#PDF_delete.py
from PyPDF2 import PdfFileWriter, PdfFileReader

def PDF_delete(index):
    output = PdfFileWriter()  # 声明一个用于输出PDF的实例
    input1 = PdfFileReader(open("C:/Users/Yuanzheng/Desktop/Test1.pdf", "rb"))  # 读取本地PDF文件
    pages = input1.getNumPages()  # 读取文档的页数
    for i in range(pages):
        if i + 1 in index:
            continue  # 待删除的页面
        output.addPage(input1.getPage(i))  # 读取PDF的第i页，添加到输出Output实例中
    outputStream = open("C:/Users/Yuanzheng/Desktop/Test-Output1.pdf", "wb")
    output.write(outputStream)  # 把编辑后的文档保存到本地
PDF_delete([2])

实例二：合并

#PDF_merger.py
from PyPDF2 import PdfFileMerger

merger = PdfFileMerger()

input1 =open("C:/Users/Yuanzheng/Desktop/Test1.pdf","rb")
input2  = open("C:/Users/Yuanzheng/Desktop/Test2.pdf","rb")

merger.append(fileobj= input1)
merger.merge(position=0,fileobj=input2,pages=(1,3))

output = open("C:/Users/Yuanzheng/Desktop/PyPDF-Output2.pdf","wb")
merger.write(output)

Reference：https://pythonhosted.org/PyPDF2/PageObject.html

最后

以上就是能干蜗牛最近收集整理的关于Python之利用PyPDF2库实现对PDF的删除和合并的全部内容，更多相关Python之利用PyPDF2库实现对PDF内容请搜索靠谱客的其他文章。

本图文内容来源于网友提供，作为学习参考使用，或来自网络收集整理，版权属于原作者所有。

本文分类：Python应用
浏览次数：312 次浏览
发布日期：2023-09-01 08:30:07

SAP Commerce(原Hybris)的一些架构图，持续更新模块图layer chart类型类型继承ServiceLayer Directcore extensionOrder extensibilityService LayerModel runtimeModel interceptorApplication Context hierarchykey serviceAccelerator架构请求交互图文件目录Commerce order status flow

Python之利用PyPDF2库实现对PDF的删除和合并

文章目录

概述

安装

一、The PdfFileReader Class

1、getNumPages()

2、getPage(pageNumber)

二、The PdfFileWriter Class

1、addPage(page)

2、write(stream)

三、The PdfFileMerger Class

方法1、append（）

方法2、merge（）

方法3、write（）

实例一：删除

实例二：合并

最后

评论列表共有 0 条评论

发表评论取消回复

Python之利用PyPDF2库实现对PDF的删除和合并

文章目录

概述

安装

一、The PdfFileReader Class

1、getNumPages()

2、getPage(pageNumber)

二、The PdfFileWriter Class

1、addPage(page)

2、write(stream)

三、The PdfFileMerger Class

方法1、append（）

方法2、merge（）

方法3、write（）

实例一：删除

实例二：合并

最后

相关文章

评论列表共有 0 条评论

发表评论 取消回复

发表评论取消回复