Opening Files

Supported File Types

PyMuPDF can open more than just PDF files.

The following file types are supported:

Document formats: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT
Image formats (input): JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
Image formats (output): JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS

How to Open a File

To open a file, do the following:

doc = pymupdf.open("a.pdf")

Note

The above creates a Document. The instruction doc = pymupdf.Document("a.pdf") does exactly the same. So open is just a convenient alias, and its full API is documented in the Document chapter.

Opening with a Wrong File Extension

If you have a document with a wrong file extension for its type, you can still open it correctly.

Assume that "some.file" is actually an XPS file. Open it like so:

doc = pymupdf.open("some.file", filetype="xps")

Note

PyMuPDF itself does not try to determine the file type from the file contents. You are responsible for supplying the file type information in some way: either implicitly, via the file extension, or explicitly, as shown with the filetype parameter. There are pure Python packages like filetype that help you do this. Also consult the Document chapter for a full description.
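If pulling in an extra package is not an option, a few well-known file signatures can be checked by hand. The sketch below is a hypothetical helper, not part of PyMuPDF: the names guess_filetype and guess_filetype_of are made up for illustration, the signature table is deliberately small, and the returned strings are assumed to be acceptable filetype values. ZIP-based formats such as XPS, EPUB and CBZ share the same leading bytes, so they are left out.

```python
# Hypothetical helper: guess a value for the filetype parameter from the first
# bytes of a file. The signature table is illustrative and far from complete.
_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def guess_filetype(data: bytes, default: str = "pdf") -> str:
    """Return a filetype string for the given leading bytes."""
    for magic, filetype in _SIGNATURES:
        if data.startswith(magic):
            return filetype
    return default  # unknown signature: fall back to PDF


def guess_filetype_of(path: str) -> str:
    """Read only the first 16 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return guess_filetype(f.read(16))
```

A call might then look like doc = pymupdf.open("some.file", filetype=guess_filetype_of("some.file")).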

If PyMuPDF encounters a file with an unknown or missing extension, it will try to open it as a PDF, so no additional precautions are needed in these cases. Similarly, for memory documents, you can just specify doc = pymupdf.open(stream=mem_area) to open the data as a PDF document.

If you attempt to open an unsupported file, PyMuPDF will raise a file data error (FileDataError).


Opening Remote Files

For remote files on a server (i.e. non-local files), you need to stream the file data to PyMuPDF.

For example, use the requests library as follows:

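import pymupdf
import requests

# Download the whole file into memory, then hand the bytes to PyMuPDF.
r = requests.get("https://mupdf.com/docs/mupdf_explored.pdf", timeout=30)
r.raise_for_status()  # fail on HTTP errors instead of parsing an error page
data = r.content
doc = pymupdf.Document(stream=data)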

Opening Files from Cloud Services

For further examples that deal with files held on typical cloud services, see the Cloud Interactions code snippets.


Opening Files as Text

PyMuPDF can open any plain text file as a document. To do this, pass "txt" as the filetype parameter of the pymupdf.open function.

doc = pymupdf.open("my_program.py", filetype="txt")

In this way you can open a variety of file types and use the typical non-PDF-specific features such as text searching, text extraction and page rendering. Once your txt content has been rendered, saving it as PDF or merging it with other PDF files is straightforward.

Examples

Opening a C# file

doc = pymupdf.open("MyClass.cs", filetype="txt")

Opening an XML file

doc = pymupdf.open("my_data.xml", filetype="txt")

Opening a JSON file

doc = pymupdf.open("more_of_my_data.json", filetype="txt")

And so on!

As you can imagine, many text-based file formats can be opened and interpreted by PyMuPDF very simply. This makes data analysis and extraction possible for a wide range of files that were previously out of reach.


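This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.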