Features Comparison

Feature Matrix

The following table illustrates how PyMuPDF compares with other typical solutions.

Feature PyMuPDF pikepdf PyPDF2 pdfrw pdfplumber / pdfminer
Supports Multiple Document Formats PyMuPDF: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT, Image, DOCX, XLSX, PPTX, HWPX (see note); pikepdf: PDF; PyPDF2: PDF; pdfrw: PDF; pdfplumber / pdfminer: PDF
Implementation Python and C Python and C++ Python Python Python
Render Document Pages All document types No rendering No rendering No rendering No rendering
Write Text to PDF Page See: Page.insert_htmlbox, Page.insert_textbox, or TextWriter
Supports CJK characters
Extract Text All document types PDF only PDF only
Extract Text as Markdown (.md) All document types
Extract Tables All document types PDF only
Extract Vector Graphics All document types Limited
Draw Vector Graphics (PDF)
Based on Existing, Mature Library MuPDF QPDF
Automatic Repair of Damaged PDFs
Encrypted PDFs Limited Limited
Linearized PDFs
Incremental Updates
Integrates with Jupyter and IPython Notebooks
Joining / Merging PDF with other Document Types All document types PDF only PDF only PDF only PDF only
OCR API for Seamless Integration with Tesseract All document types
Integrated Checkpoint / Restart Feature (PDF)
PDF Optional Content
PDF Embedded Files Limited Limited
PDF Redactions
PDF Annotations Full Limited
PDF Form Fields Create, read, update Limited, no creation
PDF Page Labels Read-only
Support Font Sub-Setting


Note

Office document types (DOCX, XLSX, PPTX) and Hangul documents (HWPX) can be loaded into PyMuPDF, which returns a Document object.

There are some caveats:

  • We convert the input to HTML to lay out the content.

  • Because of this, the original page separation is lost.

When saving the result, a faithful reproduction of the original layout cannot be expected.

Input files of these types are therefore mostly useful for text extraction.


Performance

To benchmark PyMuPDF across a range of tasks, we use a test suite with a fixed set of 8 PDFs, totaling 7,031 pages of text and images, to obtain performance timings.

Here are current results, grouped by task:


Copying

This refers to opening a document and then saving it to a new file. This test measures the speed of reading a PDF and re-writing as a new PDF. This process is also at the core of functions like merging / joining multiple documents. The numbers below therefore apply to PDF joining and merging.
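As a rough illustration of the operation being timed, the sketch below opens a PDF and saves it to a new file, then joins several PDFs into one. The function names `copy_pdf` and `merge_pdfs` and all file paths are hypothetical; this is not the benchmark script itself, only a minimal example assuming a PyMuPDF version that supports `import pymupdf`.

```python
def copy_pdf(src_path, dst_path):
    """Open a PDF and re-write it as a new file; returns the page count."""
    # Imported here so the module can be loaded before PyMuPDF is installed.
    import pymupdf

    with pymupdf.open(src_path) as doc:
        doc.save(dst_path)  # dst_path must differ from src_path
        return doc.page_count


def merge_pdfs(src_paths, dst_path):
    """Append every page of each input PDF to a new, empty PDF."""
    import pymupdf

    with pymupdf.open() as merged:  # no arguments: a new empty PDF
        for path in src_paths:
            with pymupdf.open(path) as part:
                merged.insert_pdf(part)  # appends all pages of `part`
        merged.save(dst_path)
        return merged.page_count
```

A call such as `merge_pdfs(["a.pdf", "b.pdf"], "joined.pdf")` would write the combined file and return its total page count.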

The results for all 7,031 pages are:

PyMuPDF: 3.05 seconds (fastest); PDFrw: 10.54 seconds; PikePDF: 33.57 seconds; PyPDF2: 494.04 seconds (slowest).

Text Extraction

This refers to extracting simple, plain text from every page of the document and storing it in a text file.
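The task described above might look like the following sketch. The function name `extract_plain_text`, the file paths, and the form-feed page separator are assumptions made for illustration; the actual benchmark code may differ.

```python
def extract_plain_text(src_path, txt_path):
    """Write the plain text of every page to a text file; returns the page count."""
    # Imported here so the module can be loaded before PyMuPDF is installed.
    import pymupdf

    with pymupdf.open(src_path) as doc, open(txt_path, "w", encoding="utf-8") as out:
        for page in doc:
            out.write(page.get_text())  # default output is plain text
            out.write("\f")  # form feed as page separator (a convention assumed here)
        return doc.page_count
```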

The results for all 7,031 pages are:

PyMuPDF: 8.01 seconds (fastest); XPDF: 27.42 seconds; PyPDF2: 101.64 seconds; PDFMiner: 227.27 seconds (slowest).

Rendering

This refers to making an image (like PNG) from every page of a document at a given DPI resolution. This feature is the basis for displaying a document in a GUI window.
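A minimal sketch of page rendering follows. The function name `render_pages`, the output file naming, and the default of 150 DPI are illustrative assumptions; the text above does not state which resolution the benchmark used.

```python
import os


def render_pages(src_path, out_dir, dpi=150):
    """Render every page to a PNG file in out_dir; returns the written paths."""
    # Imported here so the module can be loaded before PyMuPDF is installed.
    import pymupdf

    written = []
    with pymupdf.open(src_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the given DPI
            target = os.path.join(out_dir, f"page-{page.number + 1}.png")
            pix.save(target)  # image format is inferred from the extension
            written.append(target)
    return written
```

The same pixmap object could be handed to a GUI toolkit instead of being saved, which is why rendering underlies document display.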

The results for all 7,031 pages are:

PyMuPDF: 367.04 seconds (fastest); XPDF: 646 seconds; PDF2JPG: 851.52 seconds (slowest).

Note

For more detail regarding the methodology for these performance timings see: Performance Comparison Methodology.
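To put the timings in perspective, the reported seconds can be converted into pages per second and into a slowdown factor relative to the fastest tool in each task. The sketch below uses only the numbers reported above; the names `TIMINGS` and `summarize` are illustrative.

```python
TOTAL_PAGES = 7031  # pages in the 8-PDF test suite

# Seconds for all pages, as reported above.
TIMINGS = {
    "Copying": {"PyMuPDF": 3.05, "PDFrw": 10.54, "PikePDF": 33.57, "PyPDF2": 494.04},
    "Text Extraction": {"PyMuPDF": 8.01, "XPDF": 27.42, "PyPDF2": 101.64, "PDFMiner": 227.27},
    "Rendering": {"PyMuPDF": 367.04, "XPDF": 646.0, "PDF2JPG": 851.52},
}


def summarize(timings, total_pages=TOTAL_PAGES):
    """Return (task, tool, seconds, pages_per_second, slowdown) rows, fastest first."""
    rows = []
    for task, results in timings.items():
        fastest = min(results.values())
        for tool, seconds in sorted(results.items(), key=lambda item: item[1]):
            rows.append((task, tool, seconds, total_pages / seconds, seconds / fastest))
    return rows


for task, tool, seconds, pages_per_second, slowdown in summarize(TIMINGS):
    print(f"{task:<16} {tool:<9} {seconds:>8.2f} s "
          f"{pages_per_second:>8.1f} pages/s {slowdown:>6.1f}x")
```

The slowdown column is 1.0 for the fastest tool in each task, so larger values read directly as "times slower than the fastest".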

License and Copyright

PyMuPDF and MuPDF are now available under both open-source AGPL and commercial license agreements. Please read the full text of the AGPL license agreement, available in the distribution material (file COPYING) and here, to ensure that your use case complies with the guidelines of the license. If you determine you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.

Artifex is the exclusive commercial licensing agent for MuPDF.

Artifex, the Artifex logo, MuPDF, and the MuPDF logo are registered trademarks of Artifex Software Inc.


This documentation covers PyMuPDF v1.25.4 features as of 2025-03-14 00:00:01.

The major and minor versions of PyMuPDF and MuPDF will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF.

PyMuPDF is typically released more frequently than MuPDF, so its patch level is often higher than that of the embedded MuPDF.

For example, PyMuPDF-1.24.5 contains MuPDF-1.24.2.

Also see pymupdf_version and mupdf_version.
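The versioning rule above can be checked with a few lines of plain Python. In this sketch the helper names `major_minor` and `versions_consistent` are hypothetical, and the version strings are the example values from the text; in a live installation one would presumably pass `pymupdf.pymupdf_version` and `pymupdf.mupdf_version` instead.

```python
def major_minor(version):
    """Return the (major, minor) part of a dotted version string as integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_consistent(pymupdf_version, mupdf_version):
    """True if both versions share major and minor; the patch level may differ."""
    return major_minor(pymupdf_version) == major_minor(mupdf_version)


print(versions_consistent("1.24.5", "1.24.2"))  # True: only the patch level differs
```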


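This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.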