python - Extracting images from a PDF with Python

Reposted. Author: 行者123. Updated: 2023-12-01 00:53:50

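How can we extract the images (and only the images) from a PDF?

I have tried many online tools, and none of them works in general. For most PDFs they produce a screenshot of the whole page instead of the embedded images. PDF link -> sg.inflibnet.ac.in:8080/jspui/bitstream/10603/121661/9/09_chapter 4.pdf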

Best answer

Here is a solution using PyMuPDF:

#!python3.6
import fitz  # PyMuPDF


def get_pixmaps_in_pdf(pdf_filename):
    """Return one Pixmap per distinct image referenced by the PDF's pages."""
    doc = fitz.open(pdf_filename)
    xrefs = set()
    # Older PyMuPDF releases spell these doc.pageCount and doc.getPageImageList
    for page_index in range(doc.page_count):
        for image in doc.get_page_images(page_index):
            xrefs.add(image[0])  # Add XREFs to set so duplicates are ignored
    pixmaps = [fitz.Pixmap(doc, xref) for xref in xrefs]
    doc.close()
    return pixmaps


def write_pixmaps_to_pngs(pixmaps):
    for i, pixmap in enumerate(pixmaps):
        # Older PyMuPDF releases spell this pixmap.writePNG(...)
        pixmap.save(f'{i}.png')  # Might want to come up with a better name


pixmaps = get_pixmaps_in_pdf(r'C:\StackOverflow\09_chapter 4.pdf')
write_pixmaps_to_pngs(pixmaps)
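
The answer's own comment notes that the output names could be better. One possible refinement, offered only as a sketch and not as part of the original answer, is to name each file after the page where the image first appears and its xref, and to write the bytes returned by PyMuPDF's `Document.extract_image` with the extension it reports. The helper names `image_filename` and `extract_embedded_images`, the `out_dir` parameter, and the naming scheme are all hypothetical. The sketch assumes a recent PyMuPDF release and does not handle soft masks (transparency).

```python
from pathlib import Path


def image_filename(page_number, xref, ext):
    """Build a name such as 'page003_xref0017.png' (hypothetical naming scheme)."""
    return f"page{page_number:03d}_xref{xref:04d}.{ext}"


def extract_embedded_images(pdf_filename, out_dir="."):
    """Write each embedded image once and return the list of written paths."""
    import fitz  # PyMuPDF; imported here so image_filename works without it

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    seen = set()  # xrefs already saved, so images shared by pages are written once
    doc = fitz.open(pdf_filename)
    try:
        for page_index in range(doc.page_count):
            for image in doc.get_page_images(page_index):
                xref = image[0]
                if xref in seen:
                    continue
                seen.add(xref)
                info = doc.extract_image(xref)  # dict with "ext" and "image" keys
                # Pages are numbered from 1 in the file name
                target = out_path / image_filename(page_index + 1, xref, info["ext"])
                target.write_bytes(info["image"])
                written.append(target)
    finally:
        doc.close()
    return written
```

Calling `extract_embedded_images(r'C:\StackOverflow\09_chapter 4.pdf', 'images')` would then write one file per distinct image into an `images` folder. Because the raw image bytes are written directly, no Pixmap conversion to PNG is involved.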

Regarding "python - Extracting images from a PDF with Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56374258/
