gpt4 book ai didi

python - PyPDF 合并和写入问题

转载 作者:行者123 更新时间:2023-12-04 11:47:45 28 4
gpt4 key购买 nike

使用它时出现意外错误。第一部分来自我在网上找到的脚本,我试图用它来提取 PDF 大纲中标识的特定部分。一切正常,除了在 output.write(outputfile1)它说:

PdfReadError: multiple definitions in dictionary.



还有人遇到这个吗?请原谅所有不必要的 print s 在最后。 :)
import pyPdf
import glob

class Darrell(pyPdf.PdfFileReader):

def getDestinationPageNumbers(self):
def _setup_outline_page_ids(outline, _result=None):
if _result is None:
_result = {}
for obj in outline:
if isinstance(obj, pyPdf.pdf.Destination):
_result[(id(obj), obj.title)] = obj.page.idnum
elif isinstance(obj, list):
_setup_outline_page_ids(obj, _result)
return _result

def _setup_page_id_to_num(pages=None, _result=None, _num_pages=None):
if _result is None:
_result = {}
if pages is None:
_num_pages = []
pages = self.trailer["/Root"].getObject()["/Pages"].getObject()
t = pages["/Type"]
if t == "/Pages":
for page in pages["/Kids"]:
_result[page.idnum] = len(_num_pages)
_setup_page_id_to_num(page.getObject(), _result, _num_pages)
elif t == "/Page":
_num_pages.append(1)
return _result

outline_page_ids = _setup_outline_page_ids(self.getOutlines())
page_id_to_page_numbers = _setup_page_id_to_num()

result = {}
for (_, title), page_idnum in outline_page_ids.iteritems():
result[title] = page_id_to_page_numbers.get(page_idnum, '???')
return result

for fileName in glob.glob("*.pdf"):
output = pyPdf.PdfFileWriter()
print fileName
pdf = Darrell(open(fileName, 'rb'))
template = '%-5s %s'
print template % ('page', 'title')
for p,t in sorted([(v,k) for k,v in pdf.getDestinationPageNumbers().iteritems()]):
print template % (p+1,t)

for p,t in sorted([(v,k) for k,v in pdf.getDestinationPageNumbers().iteritems()]):
if t == "CATEGORY 1":
startpg = p+1
print p+1,'is the first page of Category 1.'
if t == "CATEGORY 2":
endpg = p+1
print p+1,'is the last page of Category 1.'
print startpg, endpg
pagenums = range(startpg,endpg)
print pagenums
for i in pagenums:
output.addPage(pdf.getPage(i))
fileName2 = "%sCategory1_data.pdf" % (str(fileName[:-13]))
print "%s has %s pages." % (fileName2,output.getNumPages())
outputfile1 = file(r"%s" % (fileName2), 'wb')
output.write(outputfile1)
outputfile1.close()

最佳答案

我知道这对您来说可能为时已晚,但对于其他任何偶然在这里寻找答案的人来说:
我今天遇到了同样的问题,设置:

export_reader = PdfFileReader(filename, strict=False)
如果您只是合并,请使用:
merger = PdfFileMerger(strict=False)
这样,您只会收到警告,而不是异常。

关于python - PyPDF 合并和写入问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/7602639/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com