
c# - Reading PDF file attachment annotations with iTextSharp

Reposted. Author: 行者123. Updated: 2023-11-30 13:32:17

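I have the following problem. I have a PDF with an XML file attached to it as an annotation. Not as an embedded file, but as an annotation. I am now trying to read it with the code from the following link: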

iTextSharp - how to open/read/extract a file attachment?

It works for embedded files, but not for file attachments that are annotations.

I googled how to extract annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotations".

Could someone show a working example?

Thanks in advance for your help.

Best answer

For questions about iText and iTextSharp, one should first look at the keyword list on itextpdf.com. There you find File attachment, extract attachments, which refers to two Java samples from iText in Action — 2nd Edition:

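The old keyword list no longer exists; the itextpdf.com site now offers other ways to search for samples, but I won't describe them, lest the site change again and leave me with invalid links once more...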

The relevant iText samples based on iText in Action — Second Edition are:

  • part4.chapter16.KubrickDvds
  • part4.chapter16.KubrickDocumentary

You can find them here: Samples from iText5

(I have not found .NET & iText 7 ports of these samples yet, but judging by other sources, porting them should not be too difficult...)

KubrickDvds contains the following method, extractAttachments/ExtractAttachments, which extracts file attachment annotations:

Java, iText 5.x:

/**
 * Extracts attachments from an existing PDF.
 * @param src the path to the existing PDF
 */
public void extractAttachments(String src) throws IOException {
    PdfReader reader = new PdfReader(src);
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        PdfArray array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            PdfDictionary annot = array.getAsDict(j);
            if (PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                PdfDictionary fs = annot.getAsDict(PdfName.FS);
                // /FS may be a plain string, and /EF only exists for embedded files
                if (fs == null) continue;
                PdfDictionary refs = fs.getAsDict(PdfName.EF);
                if (refs == null) continue;
                for (PdfName name : refs.getKeys()) {
                    // the file name sits in the file spec under the same key (/F, /UF) as the stream in /EF;
                    // PATH is a format string constant defined elsewhere in the sample class
                    try (FileOutputStream fos = new FileOutputStream(
                            String.format(PATH, fs.getAsString(name).toString()))) {
                        fos.write(PdfReader.getStreamBytes((PRStream) refs.getAsStream(name)));
                    }
                }
            }
        }
    }
    reader.close();
}

Java, iText 7.x:

public void extractAttachments(String src) throws IOException {
    // one PdfDocument is enough (no second PdfReader); it is closed automatically
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(src))) {
        for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
            PdfArray array = pdfDoc.getPage(i).getPdfObject().getAsArray(PdfName.Annots);
            if (array == null) continue;
            for (int j = 0; j < array.size(); j++) {
                PdfDictionary annot = array.getAsDictionary(j);
                if (PdfName.FileAttachment.equals(annot.getAsName(PdfName.Subtype))) {
                    PdfDictionary fs = annot.getAsDictionary(PdfName.FS);
                    // /FS may be a plain string, and /EF only exists for embedded files
                    if (fs == null) continue;
                    PdfDictionary refs = fs.getAsDictionary(PdfName.EF);
                    if (refs == null) continue;
                    for (PdfName name : refs.keySet()) {
                        try (FileOutputStream fos = new FileOutputStream(
                                String.format(PATH, fs.getAsString(name).toString()))) {
                            fos.write(refs.getAsStream(name).getBytes());
                        }
                    }
                }
            }
        }
    }
}

C#, iText 5.x:

/**
 * Extracts attachments from an existing PDF.
 * @param src the bytes of the existing PDF
 * @param zip the ZipFile object to add the extracted attachments to
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
    PdfReader reader = new PdfReader(src);
    for (int i = 1; i <= reader.NumberOfPages; i++) {
        PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
        if (array == null) continue;
        for (int j = 0; j < array.Size; j++) {
            PdfDictionary annot = array.GetAsDict(j);
            if (PdfName.FILEATTACHMENT.Equals(annot.GetAsName(PdfName.SUBTYPE))) {
                PdfDictionary fs = annot.GetAsDict(PdfName.FS);
                // /FS may be a plain string, and /EF only exists for embedded files
                if (fs == null) continue;
                PdfDictionary refs = fs.GetAsDict(PdfName.EF);
                if (refs == null) continue;
                foreach (PdfName name in refs.Keys) {
                    zip.AddEntry(
                        fs.GetAsString(name).ToString(),
                        PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
                    );
                }
            }
        }
    }
    reader.Close();
}
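To see why these methods find the XML file from the question while the embedded-file code does not, the lookup chain can be modeled without iText: page /Annots, then annotations with /Subtype /FileAttachment, then the /FS file specification, then its /EF streams, with the file name read from the file specification under the same key (/F or /UF) that /EF uses for the stream. The following is a minimal, stdlib-only Java sketch under an assumed toy model (dictionaries as Map<String, Object> keyed by the name without its slash, names and strings as String, decoded stream data as byte[]); AnnotationAttachmentSketch, extractFromAnnots and sampleAttachment are hypothetical names, not iText API.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of PDF objects (an assumption for illustration, NOT the iText API):
// dictionary = Map keyed by the name without its slash, name/string = String,
// decoded stream data = byte[].
public class AnnotationAttachmentSketch {

    // Mirrors the lookup in extractAttachments: /Annots entries -> /FileAttachment
    // -> /FS -> /EF, pairing each /EF key with the file name stored under that key.
    @SuppressWarnings("unchecked")
    static Map<String, byte[]> extractFromAnnots(List<Map<String, Object>> annots) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        for (Map<String, Object> annot : annots) {
            if (!"FileAttachment".equals(annot.get("Subtype"))) continue;
            // /FS may also be a plain file name string; then nothing is embedded.
            if (!(annot.get("FS") instanceof Map)) continue;
            Map<String, Object> fs = (Map<String, Object>) annot.get("FS");
            // /EF only exists when the file is actually embedded.
            if (!(fs.get("EF") instanceof Map)) continue;
            Map<String, Object> ef = (Map<String, Object>) fs.get("EF");
            for (Map.Entry<String, Object> e : ef.entrySet()) {
                Object fileName = fs.get(e.getKey()); // same key: /F or /UF
                if (fileName instanceof String && e.getValue() instanceof byte[]) {
                    result.put((String) fileName, (byte[]) e.getValue());
                }
            }
        }
        return result;
    }

    // Builds a minimal /FileAttachment annotation dictionary for the demo.
    static Map<String, Object> sampleAttachment(String fileName, byte[] data) {
        Map<String, Object> ef = new LinkedHashMap<>();
        ef.put("F", data);
        ef.put("UF", data);
        Map<String, Object> fs = new LinkedHashMap<>();
        fs.put("Type", "Filespec");
        fs.put("F", fileName);
        fs.put("UF", fileName);
        fs.put("EF", ef);
        Map<String, Object> annot = new LinkedHashMap<>();
        annot.put("Subtype", "FileAttachment");
        annot.put("FS", fs);
        return annot;
    }

    public static void main(String[] args) {
        Map<String, Object> link = new LinkedHashMap<>();
        link.put("Subtype", "Link"); // ignored: not a file attachment
        byte[] xml = "<data/>".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> files =
            extractFromAnnots(List.of(link, sampleAttachment("data.xml", xml)));
        System.out.println(files.keySet()); // prints [data.xml]
    }
}
```

Since /EF may contain more than one key (for example both /F and /UF pointing at the same data), the iText samples may write the same file more than once; in the sketch the map simply keeps one entry per file name.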

KubrickDocumentary contains the following method, extractDocLevelAttachments/ExtractDocLevelAttachments, which extracts document-level attachments:

Java, iText 5.x:

/**
 * Extracts document level attachments
 * @param filename a file from which document level attachments will be extracted
 * @throws IOException
 */
public void extractDocLevelAttachments(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    try {
        PdfDictionary root = reader.getCatalog();
        PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
        // no /Names dictionary or no /EmbeddedFiles tree: no document-level attachments
        if (documentnames == null) return;
        PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
        if (embeddedfiles == null) return;
        // assumes a flat name tree: /Names = [name1 filespec1 name2 filespec2 ...]
        PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
        if (filespecs == null) return;
        for (int i = 0; i < filespecs.size(); ) {
            filespecs.getAsString(i++); // skip the name; its file spec follows
            PdfDictionary filespec = filespecs.getAsDict(i++);
            if (filespec == null) continue;
            PdfDictionary refs = filespec.getAsDict(PdfName.EF);
            if (refs == null) continue;
            for (PdfName key : refs.getKeys()) {
                PRStream stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
                try (FileOutputStream fos = new FileOutputStream(
                        String.format(PATH, filespec.getAsString(key).toString()))) {
                    fos.write(PdfReader.getStreamBytes(stream));
                }
            }
        }
    } finally {
        reader.close();
    }
}

Java, iText 7.x:

public void extractDocLevelAttachments(String src) throws IOException {
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(src))) {
        PdfDictionary root = pdfDoc.getCatalog().getPdfObject();
        PdfDictionary documentnames = root.getAsDictionary(PdfName.Names);
        // no /Names dictionary or no /EmbeddedFiles tree: no document-level attachments
        if (documentnames == null) return;
        PdfDictionary embeddedfiles = documentnames.getAsDictionary(PdfName.EmbeddedFiles);
        if (embeddedfiles == null) return;
        // assumes a flat name tree: /Names = [name1 filespec1 name2 filespec2 ...]
        PdfArray filespecs = embeddedfiles.getAsArray(PdfName.Names);
        if (filespecs == null) return;
        for (int i = 0; i < filespecs.size(); ) {
            filespecs.getAsString(i++); // skip the name; its file spec follows
            PdfDictionary filespec = filespecs.getAsDictionary(i++);
            if (filespec == null) continue;
            PdfDictionary refs = filespec.getAsDictionary(PdfName.EF);
            if (refs == null) continue;
            for (PdfName key : refs.keySet()) {
                try (FileOutputStream fos = new FileOutputStream(
                        String.format(PATH, filespec.getAsString(key).toString()))) {
                    fos.write(refs.getAsStream(key).getBytes());
                }
            }
        }
    }
}

C#, iText 5.x:

/**
 * Extracts document level attachments
 * @param pdf the PDF (as bytes) from which document level attachments will be extracted
 * @param zip the ZipFile object to add the extracted attachments to
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
    PdfReader reader = new PdfReader(pdf);
    try {
        PdfDictionary root = reader.Catalog;
        PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
        // no /Names dictionary or no /EmbeddedFiles tree: no document-level attachments
        if (documentnames == null) return;
        PdfDictionary embeddedfiles = documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
        if (embeddedfiles == null) return;
        // assumes a flat name tree: /Names = [name1 filespec1 name2 filespec2 ...]
        PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
        if (filespecs == null) return;
        for (int i = 0; i < filespecs.Size; ) {
            filespecs.GetAsString(i++); // skip the name; its file spec follows
            PdfDictionary filespec = filespecs.GetAsDict(i++);
            if (filespec == null) continue;
            PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
            if (refs == null) continue;
            foreach (PdfName key in refs.Keys) {
                PRStream stream = (PRStream) PdfReader.GetPdfObject(
                    refs.GetAsIndirectObject(key)
                );
                zip.AddEntry(
                    filespec.GetAsString(key).ToString(),
                    PdfReader.GetStreamBytes(stream)
                );
            }
        }
    } finally {
        reader.Close();
    }
}
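The document-level samples assume that /EmbeddedFiles carries a flat /Names array of alternating name and file-spec entries (hence the double i++ in the loop). In the PDF specification the root of a name tree has either /Names or /Kids, so a larger tree may be split into child nodes that these samples would never visit. A minimal stdlib-only Java sketch of a full traversal, under an assumed toy model (nodes as Map<String, Object>, arrays as List<Object>); NameTreeSketch, collect and flatten are hypothetical names, not iText classes or methods:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a PDF name tree (an assumption for illustration, NOT the iText API):
// every node is a Map; a leaf holds "Names" -> [key1, value1, key2, value2, ...],
// an intermediate node holds "Kids" -> [child, child, ...].
public class NameTreeSketch {

    // Walks the tree depth-first and collects key -> value pairs in order.
    @SuppressWarnings("unchecked")
    static void collect(Map<String, Object> node, Map<String, Object> out) {
        if (node == null) return;
        if (node.get("Names") instanceof List) {
            List<Object> flat = (List<Object>) node.get("Names");
            // Pairs: even index = name string, odd index = value (the file spec).
            for (int i = 0; i + 1 < flat.size(); i += 2) {
                out.put(String.valueOf(flat.get(i)), flat.get(i + 1));
            }
        }
        if (node.get("Kids") instanceof List) {
            for (Object kid : (List<Object>) node.get("Kids")) {
                if (kid instanceof Map) collect((Map<String, Object>) kid, out);
            }
        }
    }

    static Map<String, Object> flatten(Map<String, Object> root) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect(root, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> leafA = Map.of("Names", List.of("a.xml", "specA"));
        Map<String, Object> leafB = Map.of("Names", List.of("b.xml", "specB", "c.xml", "specC"));
        Map<String, Object> root = Map.of("Kids", List.of(leafA, leafB));
        System.out.println(flatten(root).keySet()); // prints [a.xml, b.xml, c.xml]
    }
}
```

With iText itself, the same recursion would presumably follow the /Kids array of each node (PdfName.KIDS in iText 5, PdfName.Kids in iText 7) before reading /Names; that mapping is untested here.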

(For some reason, the C# samples put the extracted files into a ZIP file, while the Java versions write them to the file system... oh well...)
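If you would rather have the Java versions behave like the C# ones, the JDK's java.util.zip package is sufficient. Below is a minimal stdlib-only sketch, assuming the attachments have already been collected as a map from file name to byte[] (for instance by gathering what the Java samples currently pass to FileOutputStream); AttachmentZipSketch, safeName, toZip and entryNames are hypothetical names. It also reduces each name to its last path component: the names come from the PDF itself, and when they are passed unchanged into String.format(PATH, ...), a name containing ../ could point outside the intended folder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of iText or the book samples: packs attachments
// that were already extracted (file name -> bytes) into an in-memory ZIP,
// similar to what the C# samples do with zip.AddEntry.
public class AttachmentZipSketch {

    // Names come from the PDF and are untrusted: keep only the last path
    // component so that e.g. "../../x.xml" cannot escape the target folder.
    static String safeName(String name) {
        String s = name.replace('\\', '/');
        s = s.substring(s.lastIndexOf('/') + 1);
        return (s.isEmpty() || s.equals(".") || s.equals("..")) ? "attachment" : s;
    }

    static byte[] toZip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Set<String> used = new HashSet<>();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                String base = safeName(e.getKey());
                String entryName = base;
                // ZipOutputStream rejects duplicate entry names, so add a numeric prefix.
                for (int n = 1; !used.add(entryName); n++) {
                    entryName = n + "_" + base;
                }
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    // Lists the entry names of a ZIP, e.g. to check what toZip produced.
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry ze = in.getNextEntry(); ze != null; ze = in.getNextEntry()) {
                names.add(ze.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("../../data.xml", "<a/>".getBytes(StandardCharsets.UTF_8));
        files.put("sub/data.xml", "<b/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(entryNames(toZip(files))); // prints [data.xml, 1_data.xml]
    }
}
```

Keeping only the base name is a deliberately simple policy, and the numeric prefix used to avoid duplicate entry names is an arbitrary choice of this sketch.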

Regarding c# - Reading PDF file attachment annotations with iTextSharp, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14947829/
