How to extract images from a PDF

Before you start, make sure you have installed pdfminer.six. You also need a PDF that contains images. If you don't have one, you can download this research paper, which contains images of cats and dogs, and save it as example.pdf:

$ curl https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf --output example.pdf

Then run the pdf2txt.py command with the --output-dir option:

$ pdf2txt.py example.pdf --output-dir cats-and-dogs

This command extracts all the images from the PDF and saves them into the cats-and-dogs directory.