
java - PDFBox: Do PDDocument and PDPage reference each other?

Reposted. Author: 行者123. Updated: 2023-11-30 01:56:55

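Does a PDPage object contain a reference to the PDDocument it belongs to?
In other words, does a PDPage know about its PDDocument?
Somewhere in the application I have a list of PDDocuments.
These documents are merged into a new PDDocument: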

PDFMergerUtility pdfMerger = new PDFMergerUtility();

PDDocument mergedPDDocument = new PDDocument();
for (PDDocument pdfDocument : documentList) {
pdfMerger.appendDocument(mergedPDDocument, pdfDocument);
}

This PDDocument is then split into bundles of 10:

Splitter splitter = new Splitter();
splitter.setSplitAtPage(bundleSize);
List<PDDocument> bundleList = splitter.split(mergedPDDocument);

My question now is:
If I loop through the pages of these split PDDocuments in the list, is there a way to know which PDDocument a page originally belonged to?

Also, if you have a PDPage object, can you get information from it, such as the page number? Or can you get this some other way?

Best answer

  1. Does a PDPage object contain a reference to the PDDocument it belongs to? In other words, does a PDPage know about its PDDocument?

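Unfortunately, a PDPage does not contain a reference to its parent PDDocument. However, the Parent entry of its page dictionary leads to the page-tree node whose Kids array lists the page and its siblings (for a flat page tree, all pages of the document), and this can be used to navigate between pages without a reference to the PDDocument.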

  • If you have a PDPage object, can you get information such as the page number from it, or can you get this information some other way?
  • There is a workaround for getting the position of a PDPage in the document without having the PDDocument. Each PDPage is backed by a dictionary holding information about the page's size, resources, fonts, content, etc. One of its entries, Parent, points to the page-tree node (of type Pages) that contains the page. That node's Kids array holds the dictionaries of all its child pages, and each of them can be wrapped in a new PDPage with the constructor PDPage(COSDictionary). The kids are stored in page order, so the page number can be obtained from the position of the page in the array. This holds when the page tree is flat (all pages are direct kids of one Pages node); in a deeper tree the position in Kids is only relative to that node.

  • If I loop through the pages of the split PDDocuments in the list, is there a way to know which PDDocument a page originally belonged to?
  • Once you merge the document list into a single document, all references to the original documents are lost. You can confirm this by inspecting the Parent object inside the PDPage: go to Parent > Kids > COSObject[n] > Parent and check whether the Parent value is the same for all elements of the array. In this example it is COSName {Parent} : 1781256139; for every page.

    COSName {Parent} : COSObject {
    COSDictionary {
    COSName {Kids} : COSArray {
    COSObject {
    COSDictionary {
    COSName {TrimBox} : COSArray {0; 0; 612; 792;};
    COSName {MediaBox} : COSArray {0; 0; 612; 792;};
    COSName {CropBox} : COSArray {0; 0; 612; 792;};
    COSName {Resources} : COSDictionary {
    ...
    };
    COSName {Contents} : COSObject {
    ...
    };
    COSName {Parent} : 1781256139;
    COSName {StructParents} : COSInt {68};
    COSName {ArtBox} : COSArray {0; 0; 612; 792; };
    COSName {BleedBox} : COSArray {0; 0; 612; 792; };
    COSName {Type} : COSName {Page};
    }
    }

    ...

    COSName {Count} : COSInt {4};
    COSName {Type} : COSName {Pages};
    }
    };
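Since merging drops every link to the source documents, one possible workaround (a sketch, not something PDFBox provides) is to remember the page count of each source document before merging, for example by calling `pdfDocument.getNumberOfPages()` inside the merge loop. This assumes that `appendDocument` appends all pages of each source document in list order, and that `Splitter` runs with its defaults, so every bundle except possibly the last has exactly `bundleSize` pages. Under those assumptions a page's origin follows from simple arithmetic. The class `PageOrigin` and its method `origin` are hypothetical names used only for illustration.

```java
/**
 * Hypothetical helper (not part of PDFBox): maps a page of a split bundle back to the
 * source document it came from, using page counts recorded before merging.
 */
public class PageOrigin {

    /**
     * @param pageCounts   pageCounts[i] = number of pages of documentList.get(i), recorded before merging
     * @param bundleSize   the value passed to Splitter.setSplitAtPage(...)
     * @param bundleIndex  0-based index of the bundle in bundleList
     * @param pageInBundle 0-based page index inside that bundle
     * @return {source document index, 0-based page index inside that source document}
     */
    public static int[] origin(int[] pageCounts, int bundleSize, int bundleIndex, int pageInBundle) {
        // Position of the page in the merged document (assumes pages were appended in list order)
        int mergedIndex = bundleIndex * bundleSize + pageInBundle;
        int start = 0; // merged index of the first page of the current source document
        for (int doc = 0; doc < pageCounts.length; doc++) {
            if (mergedIndex < start + pageCounts[doc]) {
                return new int[] {doc, mergedIndex - start};
            }
            start += pageCounts[doc];
        }
        throw new IndexOutOfBoundsException("Merged page " + mergedIndex + " is past the last source page");
    }

    public static void main(String[] args) {
        int[] pageCounts = {3, 12, 7}; // made-up page counts for three source documents
        int[] o = origin(pageCounts, 10, 1, 4); // 5th page of the 2nd bundle = merged page 14
        System.out.println("document " + o[0] + ", page " + o[1]); // prints "document 1, page 11"
    }
}
```

The counts could just as well be kept in a `List<Integer>` filled during the merge loop; a plain array keeps the sketch free of PDFBox dependencies.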

    Source code

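    I wrote the following code to show how to use the information in the PDPage dictionary to navigate forward and backward through the pages, and how to get the page number from the position in the array.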

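    import java.io.File;
    import java.io.IOException;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.ListIterator;

    import org.apache.pdfbox.cos.COSArray;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSDictionary;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.cos.COSObject;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    public class PDPageUtils {
        public static void main(String[] args) throws IOException {
            // Rendering speed-up often used with PDFBox on Java 8; not required for reading page dictionaries
            System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");

            String filename = "src/main/resources/pdf/us-017.pdf";
            // try-with-resources closes the document even if an exception is thrown
            try (PDDocument document = PDDocument.load(new File(filename))) {
                System.out.println("listIterator(PDPage)");
                ListIterator<PDPage> pageIterator = listIterator(document.getPage(0));
                while (pageIterator.hasNext()) {
                    PDPage page = pageIterator.next();
                    // after next(), previousIndex() is the index of the page just returned
                    System.out.println("page #: " + pageIterator.previousIndex() + ", Structural Parent Key: " + page.getStructParents());
                }
            }
        }

        /**
         * Returns a <code>ListIterator</code> initialized with the list of pages found in the
         * <code>Kids</code> array of the page-tree node referenced by the <code>Parent</code>
         * entry of the specified <code>PDPage</code>. The current position of this
         * <code>ListIterator</code> is set to the position of the specified <code>PDPage</code>.
         * Only the immediate parent node is examined, so in a multi-level page tree the
         * iterator covers the page's siblings rather than the whole document.
         *
         * @param page the specified <code>PDPage</code>
         * @return an iterator positioned so that <code>next()</code> returns <code>page</code>
         * @throws IllegalArgumentException if the page is not found among its parent's kids
         * @see java.util.ListIterator
         * @see org.apache.pdfbox.pdmodel.PDPage
         */
        public static ListIterator<PDPage> listIterator(PDPage page) {
            List<PDPage> pages = new LinkedList<PDPage>();

            COSDictionary pageDictionary = page.getCOSObject();
            // /Parent is the page-tree node (Type /Pages) that holds this page
            COSDictionary parentDictionary = pageDictionary.getCOSDictionary(COSName.PARENT);
            // /Kids lists the children of that node: pages and, in deeper trees, further /Pages nodes
            COSArray kidsArray = parentDictionary.getCOSArray(COSName.KIDS);

            for (COSBase kid : kidsArray.toList()) {
                // Kids entries are usually indirect references (COSObject), but may be direct dictionaries
                COSBase kidBase = kid instanceof COSObject ? ((COSObject) kid).getObject() : kid;
                if (kidBase instanceof COSDictionary) {
                    COSDictionary kidPageDictionary = (COSDictionary) kidBase;
                    // keep only leaf pages, skipping intermediate /Pages nodes
                    if (COSName.PAGE.equals(kidPageDictionary.getCOSName(COSName.TYPE))) {
                        pages.add(new PDPage(kidPageDictionary));
                    }
                }
            }
            // PDPage.equals() compares the underlying page dictionaries, so the wrapped copy is found
            int index = pages.indexOf(page);
            if (index < 0) {
                throw new IllegalArgumentException("Page not found in the Kids array of its Parent");
            }
            return pages.listIterator(index);
        }
    }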
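The listIterator(PDPage) helper only reads the Kids of the page's immediate Parent. If the page tree has more than one level (a Pages node whose Kids contain further Pages nodes, which some PDF writers produce), the position inside that Kids array is not the page number within the whole document. The following is a minimal sketch, assuming the PDFBox 2.0 COS API and a well-formed tree in which every intermediate Pages node carries a correct /Count: walk up the Parent chain and, at each level, add 1 for every earlier Page sibling and /Count for every earlier Pages subtree. The class and method names (`PageTreeIndex`, `absolutePageIndex`) are hypothetical, not part of PDFBox.

```java
import java.io.IOException;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class PageTreeIndex {

    /**
     * Sketch: 0-based page index of {@code page}, computed only from /Parent, /Kids and /Count,
     * without a PDDocument. Returns -1 if the tree is inconsistent (page not found in its parent).
     */
    public static int absolutePageIndex(PDPage page) {
        COSDictionary node = page.getCOSObject();
        int index = 0;
        COSBase parentBase = node.getDictionaryObject(COSName.PARENT);
        while (parentBase instanceof COSDictionary) {
            COSDictionary parent = (COSDictionary) parentBase;
            COSBase kidsBase = parent.getDictionaryObject(COSName.KIDS);
            if (!(kidsBase instanceof COSArray)) {
                return -1;
            }
            COSArray kids = (COSArray) kidsBase;
            boolean found = false;
            for (int i = 0; i < kids.size(); i++) {
                COSBase kid = kids.getObject(i); // getObject(int) dereferences indirect objects
                if (kid == node) {
                    found = true;
                    break;
                }
                if (kid instanceof COSDictionary) {
                    COSDictionary sibling = (COSDictionary) kid;
                    // an earlier /Pages subtree holds /Count leaf pages; an earlier /Page counts as 1
                    index += COSName.PAGES.equals(sibling.getCOSName(COSName.TYPE))
                            ? sibling.getInt(COSName.COUNT, 0) : 1;
                }
            }
            if (!found) {
                return -1;
            }
            // continue one level up: the parent's own position among its siblings
            node = parent;
            parentBase = parent.getDictionaryObject(COSName.PARENT);
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // in-memory document with a flat page tree, just to exercise the method
        try (PDDocument document = new PDDocument()) {
            for (int i = 0; i < 4; i++) {
                document.addPage(new PDPage());
            }
            for (int i = 0; i < 4; i++) {
                System.out.println(absolutePageIndex(document.getPage(i)));
            }
        }
    }
}
```

For a flat tree this gives the same number as the position in Kids; the extra /Count bookkeeping only matters once intermediate Pages nodes appear.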

    Sample output

    In this example the PDF document has 4 pages, and the iterator is initialized with the first page. Note that the page number is previousIndex().

    System.out.println("listIterator(PDPage)");
    ListIterator<PDPage> pageIterator = listIterator(document.getPage(0));
    while (pageIterator.hasNext()) {
    PDPage page = pageIterator.next();
    System.out.println("page #: " + pageIterator.previousIndex() + ", Structural Parent Key: " + page.getStructParents());
    }
    listIterator(PDPage)
    page #: 0, Structural Parent Key: 68
    page #: 1, Structural Parent Key: 69
    page #: 2, Structural Parent Key: 70
    page #: 3, Structural Parent Key: 71

    You can also navigate backwards by starting from the last page. Note that now the page number is nextIndex().

    System.out.println("listIterator(PDPage)");
    ListIterator<PDPage> pageIterator = listIterator(document.getPage(3));
    pageIterator.next(); // step past the last page so that previous() returns it first
    while (pageIterator.hasPrevious()) {
        PDPage page = pageIterator.previous();
        // after previous(), nextIndex() is the index of the page just returned
        System.out.println("page #: " + pageIterator.nextIndex() + ", Structural Parent Key: " + page.getStructParents());
    }
    listIterator(PDPage)
    page #: 3, Structural Parent Key: 71
    page #: 2, Structural Parent Key: 70
    page #: 1, Structural Parent Key: 69
    page #: 0, Structural Parent Key: 68

    Regarding java - PDFBox: Do PDDocument and PDPage reference each other?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54120434/
