
python-3.x - pdfminer python 3.5

Reposted. Author: 行者123. Updated: 2023-12-03 07:35:39

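I've followed a few tutorials, but I can't get this code block to run. I made the necessary switch from StringIO to BytesIO (I believe?).

I'm not sure why "banana" prints nothing; I think the error may be a red herring? Is this related to my following a Python 2.7 tutorial and trying to convert it to Python 3?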

Errors:
  File "/Users/foo/PycharmProjects/Try/Pdfminer.py", line 28, in <module>
    banana = convert("A1.pdf")
  File "/Users/foo/PycharmProjects/Try/Pdfminer.py", line 19, in convert
    infile = file(fname, 'rb')
NameError: name 'file' is not defined

Script:
from io import BytesIO

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

def convert(fname, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)

    output = BytesIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    infile = file(fname, 'rb')
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close
    return text

banana = convert("A1.pdf")
print(banana)

The same thing happens with this variant:
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import BytesIO

def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = BytesIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos = set()

    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
        interpreter.process_page(page)

    text = retstr.getvalue()

    fp.close()
    device.close()
    retstr.close()
    return text

Banana = convert_pdf_to_txt("A1.pdf")
print(Banana)

I've tried searching for this (most of the pdfminer code comes from this and this), but had no luck.

Any insight is appreciated.

Cheers

Best answer

Solution for Python 3.5: you need pdfminer.six. On Windows 10 I could install it easily with:

pip install pdfminer.six

You can check the installed version from a Python session:
import pdfminer
pdfminer.__version__
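
The version can also be read from the installed package metadata without importing pdfminer at all. This is only a small sketch, assuming Python 3.8 or newer (importlib.metadata does not exist on 3.5) and the distribution name pdfminer.six used in the pip command above. installed_version is a made-up helper name, and depending on how the package was installed the lookup may still come back empty.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is not found.

    Hypothetical helper for illustration; it only reads package metadata.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if pdfminer.six is installed, otherwise None.
print(installed_version("pdfminer.six"))
```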

I haven't tested it in depth, but with it I was able to run code that converts PDF to text and PDF to HTML.
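
A minimal sketch of what such a conversion might look like. It is not the original code from this answer, and it assumes a pdfminer.six release that provides pdfminer.high_level.extract_text_to_fp. The names convert_pdf and SUPPORTED_OUTPUT_TYPES are made up for illustration. The file is opened with open() rather than file(): the file builtin from Python 2 no longer exists in Python 3, which is what the NameError in the question points at.

```python
from io import BytesIO

# Hypothetical constant: the two output types this sketch handles.
SUPPORTED_OUTPUT_TYPES = ("text", "html")


def convert_pdf(path, output_type="text"):
    """Convert the PDF at `path` to a str of plain text or HTML (illustrative helper)."""
    if output_type not in SUPPORTED_OUTPUT_TYPES:
        raise ValueError("unsupported output_type: %r" % (output_type,))

    # Imported here so this snippet can be loaded even where pdfminer.six
    # is not installed; in a real script these would sit at the top.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    output = BytesIO()
    # Python 3: open() replaces the removed file() builtin, and the
    # with-block closes the file even if the conversion raises.
    with open(path, "rb") as infile:
        extract_text_to_fp(
            infile,
            output,
            output_type=output_type,
            codec="utf-8",
            laparams=LAParams(),
        )
    # The converters wrote UTF-8 bytes into the buffer; decode them to str.
    return output.getvalue().decode("utf-8")
```

Under these assumptions, convert_pdf("A1.pdf") would return the text of the question's file, and convert_pdf("A1.pdf", output_type="html") the HTML rendering.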

Regarding python-3.x - pdfminer python 3.5, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39854841/
