java - PDFBox PDFMergerUtility : how do I tell which sources failed?

Reposted | Author: 太空宇宙 | Updated: 2023-11-04 12:56:08

So, I am doing this:

PDFMergerUtility mergePdf = new PDFMergerUtility();

for (int i = 0; i < filePaths.size(); i++)
    mergePdf.addSource(filePaths.get(i));

mergePdf.setDestinationFileName(tempFile.getAbsolutePath());
mergePdf.mergeDocuments();

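This works very well until an exception is thrown on a PDF that cannot be parsed (a corrupt PDF, or something PDFBox cannot handle). This does not happen often.

I would like to be able to tell which sources it failed on, exclude them from a subsequent merge, and tell the user which documents failed.

Can this be done?

Update:

Here is my exception: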

java.io.IOException: Error: Expected a long type at offset 591535, instead got 'E^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^UZí^KÄ@©¢^X<8d>G §ÑE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^TQE^T<84>f<96><8a>'
at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1695)
at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1623)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:614)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1220)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:237)
at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:194)
at myapp.util.DocumentImage.combinePDFs(DocumentImage.java:289)
at myapp.webapp.download.DownloadLatestForCLO.generate(DownloadLatestForCLO.java:73)
at myapp.webapp.download.DownloadLatestForCLO.getFileSize(DownloadLatestForCLO.java:64)
at myapp.webapp.download.DownloadServlet.handleRequest(DownloadServlet.java:58)
at myapp.webapp.download.DownloadServlet.doGet(DownloadServlet.java:32)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:200)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

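Best answer

Fortunately PDFBox is open source, so I downloaded the latest source (2.0.0 RC3 at the time of writing) and looked at the file \pdfbox-2.0.0-RC3\pdfbox\src\main\java\org\apache\pdfbox\multipdf\PDFMergerUtility.java (around line 188).

There we can see that the exception is thrown up from a lower level, and that the merger neither catches it nor adds any detail about which file caused the error.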

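Until this is fixed, you will have to catch the error in your own code and iterate over the source files, loading and closing each one, until you find the file that cannot be processed, and then report it yourself.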
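The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SourceCheck`, `PdfProbe`, `Result`, and `partition` are hypothetical names, not part of PDFBox. The actual PDFBox load call appears only in a comment, because its form depends on the version in use (the stack trace in the question suggests 1.8.x).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SourceCheck {

    /** Hypothetical hook: opens and closes one PDF, throwing IOException if it cannot be parsed. */
    interface PdfProbe {
        void probe(String path) throws IOException;
    }

    /** Outcome of the pre-check: paths that loaded, and paths that failed with their error message. */
    static final class Result {
        final List<String> readable = new ArrayList<>();
        final Map<String, String> failed = new LinkedHashMap<>();
    }

    /** Probes every path individually so that one bad file does not hide the others. */
    static Result partition(List<String> paths, PdfProbe probe) {
        Result result = new Result();
        for (String path : paths) {
            try {
                probe.probe(path);
                result.readable.add(path);
            } catch (IOException e) {
                // Remember which source failed and why, then carry on with the rest.
                result.failed.put(path, String.valueOf(e.getMessage()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in probe for demonstration only: any path containing "corrupt" is treated as unparseable.
        // With PDFBox 1.8.x a real probe might be (assumption, check your version):
        //   path -> PDDocument.load(path).close();
        // and with 2.x:
        //   path -> PDDocument.load(new File(path)).close();
        PdfProbe fakeProbe = path -> {
            if (path.contains("corrupt")) {
                throw new IOException("simulated parse failure");
            }
        };

        Result result = partition(Arrays.asList("a.pdf", "corrupt.pdf", "b.pdf"), fakeProbe);
        System.out.println("Will merge: " + result.readable);
        System.out.println("Failed:     " + result.failed);
    }
}
```

Only the paths in `readable` would then be passed to `addSource`, and the `failed` map can be shown to the user. Note that every readable file is parsed twice this way (once for the check, once for the merge), which may matter for large inputs.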

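If you are interested in fixing the problem at its source (within PDFBox), here is the edit you would need to make and submit to the PDFBox project team. Once that fix has been merged into a build and you have upgraded to that version, you can safely remove your iteration code: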

try
{
    MemoryUsageSetting partitionedMemSetting = memUsageSetting != null ?
            memUsageSetting.getPartitionedCopy(sources.size()+1) :
            MemoryUsageSetting.setupMainMemoryOnly();
    Iterator<InputStream> sit = sources.iterator();
    destination = new PDDocument(partitionedMemSetting);

    while (sit.hasNext())
    {
        sourceFile = sit.next();
        source = PDDocument.load(sourceFile, partitionedMemSetting);
        tobeclosed.add(source);
        appendDocument(destination, source);
    }
    if (destinationStream == null)
    {
        destination.save(destinationFileName);
    }
    else
    {
        destination.save(destinationStream);
    }
}
catch (IOException e) {/* insert code here to wrap this as an inner exception and throw an exception that names the "sourceFile" */}
finally
{
....}
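The catch block in the excerpt is left as a placeholder. One way to fill it is the general "wrap and rethrow" pattern, sketched here with standard Java only. `NamedSourceFailure`, `SourceAction`, and `withSourceName` are hypothetical names for illustration; this is not the change that was actually made in PDFBox, if any.

```java
import java.io.IOException;

class NamedSourceFailure {

    /** Hypothetical stand-in for "load and append one source document". */
    interface SourceAction {
        void run() throws IOException;
    }

    /**
     * Runs the action and, on failure, rethrows an IOException whose message names
     * the source, keeping the original exception as the cause.
     */
    static void withSourceName(String sourceName, SourceAction action) throws IOException {
        try {
            action.run();
        } catch (IOException e) {
            throw new IOException(
                    "Failed to process source '" + sourceName + "': " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated failure; a real action would load and append the document.
            withSourceName("broken.pdf", () -> {
                throw new IOException("Expected a long type");
            });
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the original exception is kept as the cause, the low-level parser stack trace is still available to callers that want it.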

On java - PDFBox PDFMergerUtility : how do I tell which sources failed?, see the related Stack Overflow question: https://stackoverflow.com/questions/35373787/
