
pdf - What is the ID field in a PDF file?

Reposted. Author: 行者123. Updated: 2023-12-04 07:38:51

I'm working on improving the pdf scrubber in the ApprovalTests framework. Looking at a simple pdf generated with PdfSharp, I see the content below.

Does anyone know what the ID field at the bottom is?

%PDF-1.4
%ÓôÌá
1 0 obj
<<
/CreationDate(D:20131119194420-06'00')
/Creator(PDFsharp 1.32.3057-g \(www.pdfsharp.net\))
/Producer(PDFsharp 1.32.3057-g \(www.pdfsharp.net\))
>>
endobj
2 0 obj
<<
/Type/Catalog
/Pages 3 0 R
>>
endobj
3 0 obj
<<
/Type/Pages
/Count 1
/Kids[4 0 R]
>>
endobj
4 0 obj
<<
/Type/Page
/MediaBox[0 0 612 792]
/Parent 3 0 R
/Contents 5 0 R
/Resources
<<
/ProcSet [/PDF/Text/ImageB/ImageC/ImageI]
/ExtGState
<<
/GS0 6 0 R
>>
/Font
<<
/F0 8 0 R
>>
>>
/Group
<<
/CS/DeviceRGB
/S/Transparency
/I false
/K false
>>
>>
endobj
5 0 obj
<<
/Length 99
/Filter/FlateDecode
>>
stream
xœŠI
€@ïyE¼)¸ÄŒ^—«ðŽ
2"êÍ×)ènšº ER¢¿ÊŠq>t¡¼pA-t#áö@ÒªÄú¯À†ã¢R7#ç(ý~qîq:og½
endstream
endobj
6 0 obj
<<
/Type/ExtGState
/ca 1
>>
endobj
7 0 obj
<<
/Type/FontDescriptor
/Ascent 1005
/CapHeight 727
/Descent -210
/Flags 32
/FontBBox[-550 -303 1707 1072]
/ItalicAngle 0
/StemV 0
/XHeight 548
/FontName/Verdana,Bold
>>
endobj
8 0 obj
<<
/Type/Font
/Subtype/TrueType
/BaseFont/Verdana,Bold
/Encoding/WinAnsiEncoding
/FontDescriptor 7 0 R
/FirstChar 0
/LastChar 255
/Widths[1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 341 402 587 867 710 1271 862 332 543 543 710 867 361 479 361 689 710 710 710 710 710 710 710 710 710 710 402 402 867 867 867 616 963 776 761 723 830 683 650 811 837 545 555 770 637 947 846 850 732 850 782 710 681 812 763 1128 763 736 691 543 689 543 867 710 710 667 699 588 699 664 422 699 712 341 402 670 341 1058 712 686 699 699 497 593 455 712 649 979 668 650 596 710 543 710 867 1000 710 1000 332 710 587 1048 710 710 710 1777 710 543 1135 1000 691 1000 1000 332 332 587 587 710 710 1000 710 963 593 543 1067 1000 596 736 341 402 710 710 710 710 543 710 710 963 597 849 867 479 963 710 587 867 597 597 710 721 710 361 710 597 597 849 1181 1181 1181 616 776 776 776 776 776 776 1093 723 683 683 683 683 545 545 545 545 830 846 850 850 850 850 850 867 850 812 812 812 812 736 734 712 667 667 667 667 667 667 1018 588 664 664 664 664 341 341 341 341 679 712 686 686 686 686 686 867 686 712 712 712 712 650 699 650]
>>
endobj
xref
0 9
0000000000 65535 f
0000000015 00000 n
0000000180 00000 n
0000000228 00000 n
0000000283 00000 n
0000000538 00000 n
0000000707 00000 n
0000000750 00000 n
0000000935 00000 n
trailer
<<
/ID[<48189AA5E6D2394D8EF6E7842493B4A9><48189AA5E6D2394D8EF6E7842493B4A9>]
/Info 1 0 R
/Root 2 0 R
/Size 9
>>
startxref
2167
%%EOF

Best answer

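Some remarks to add to the picture from @Millie's answer:

When in doubt about some aspect of PDF, the first place to look should be the ISO 32000-1 specification.

It specifies the ID entry as: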

ID array (Required if an Encrypt entry is present; optional otherwise; PDF 1.1)

An array of two byte-strings constituting a file identifier (see 14.4, "File Identifiers") for the file. If there is an Encrypt entry this array and the two byte-strings shall be direct objects and shall be unencrypted.

NOTE 1 Because the ID entries are not encrypted it is possible to check the ID key to assure that the correct file is being accessed without decrypting the file. The restrictions that the string be a direct object and not be encrypted assure that this is possible.

NOTE 2 Although this entry is optional, its absence might prevent the file from functioning in some workflows that depend on files being uniquely identified.

NOTE 3 The values of the ID strings are used as input to the encryption algorithm. If these strings were indirect, or if the ID array were indirect, these strings would be encrypted when written. This would result in a circular condition for a reader: the ID strings must be decrypted in order to use them to decrypt strings, including the ID strings themselves. The preceding restriction prevents this circular condition.



(Table 15 – Entries in the file trailer dictionary)

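NOTE 2 above is essentially a recommendation to add this optional value, even though it is not phrased in the normative SHALL/SHOULD/MAY language used elsewhere in the specification.

The recommendation is made more explicit in the referenced section 14.4: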

The ID entry is optional but should be used.



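In such specifications, "should" indicates a recommendation, and a recommendation is defined as something one does unless there are very good reasons not to. This implies that a PDF writer must create the entry unless it can argue against doing so (and I have a hard time thinking of arguments against it). This should answer the question posed in a comment on Millie's answer: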

any idea why both PdfSharp and phantomjs create it?



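In particular, it is not merely an assumed good practice, as supposed in another comment there.

Concerning the contents of the ID array, the specification continues in section 14.4: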

The value of this entry shall be an array of two byte strings. The first byte string shall be a permanent identifier based on the contents of the file at the time it was originally created and shall not change when the file is incrementally updated. The second byte string shall be a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found.

To help ensure the uniqueness of file identifiers, they should be computed by means of a message digest algorithm ...

The calculation of the file identifier need not be reproducible; all that matters is that the identifier is likely to be unique.
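The matching rule in the quoted passage can be sketched in Python as follows. The names `Match` and `compare_identifiers` are hypothetical, and each identifier is assumed to be a `(first, second)` pair of byte strings. The quoted text does not say what to conclude when the first string differs, so treating that case as a different file is an assumption of this sketch.

```python
from enum import Enum


class Match(Enum):
    UNCHANGED = "very likely the correct, unchanged file"
    DIFFERENT_VERSION = "a different version of the correct file"
    DIFFERENT_FILE = "probably not the referenced file"


def compare_identifiers(expected, found) -> Match:
    """Classify a resolved file by comparing (first, second) ID pairs."""
    expected_first, expected_second = expected
    found_first, found_second = found
    if found_first != expected_first:
        # Assumption: the quoted text does not cover this case explicitly.
        return Match.DIFFERENT_FILE
    if found_second == expected_second:
        return Match.UNCHANGED
    # Permanent part matches, changing part does not.
    return Match.DIFFERENT_VERSION


original = (b"\x01" * 16, b"\x01" * 16)  # freshly written: both equal
updated = (b"\x01" * 16, b"\x02" * 16)   # after an incremental update
result = compare_identifiers(original, updated)
```

Here `result` is `Match.DIFFERENT_VERSION`, because only the permanent first string matches.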



Thus, the first article Millie quoted from is not entirely correct when stating

the file identifier (the /ID entry from the trailer dictionary). This is an arbitrary string of bytes



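The value of the ID entry is not one string but an array of two strings. And the string values are not arbitrary; they are unique values that the specification recommends deriving by hashing. In particular, they must not be reused for different documents, which would be acceptable if they were merely arbitrary.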
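As a rough illustration of "unique values derived by hashing", a writer might build the identifier along these lines. This is a sketch under stated assumptions, not how PDFsharp or any other library does it: the function names, the choice of MD5, and the inputs hashed (current time, random bytes, a path hint, the file size, and the Info dictionary values) are all illustrative choices. MD5 is used only because it yields 16 bytes, the same length as the identifier in the sample file; it is not used for security here.

```python
import hashlib
import os
import time


def make_file_identifier(path_hint: str, size: int, info: dict) -> bytes:
    """Return a 16-byte identifier that is very likely to be unique."""
    digest = hashlib.md5()
    # Time and random bytes make collisions between documents unlikely;
    # the result is deliberately not reproducible.
    digest.update(str(time.time_ns()).encode("ascii"))
    digest.update(os.urandom(16))
    digest.update(path_hint.encode("utf-8"))
    digest.update(str(size).encode("ascii"))
    for key in sorted(info):
        digest.update(key.encode("utf-8"))
        digest.update(str(info[key]).encode("utf-8"))
    return digest.digest()


def initial_id_entry(identifier: bytes) -> str:
    """Format the trailer entry for a newly written file.

    Both strings are identical when the file is first written.
    """
    hex_id = identifier.hex().upper()
    return f"/ID[<{hex_id}><{hex_id}>]"


new_id = make_file_identifier(
    "report.pdf", 2167, {"Creator": "example", "Producer": "example"}
)
entry = initial_id_entry(new_id)
```

Calling `make_file_identifier` twice with the same arguments gives different results, which illustrates why one document's identifier cannot be obtained by recomputing it for another.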

The other article quoted from is not entirely correct either when stating

a program that makes PDF files is only required to create the file identifier if the file is to be encrypted.



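Even without encryption, the program must have good reasons not to create a file identifier, since creating one is a recommendation of the specification. Lacking such reasons, the program is required to create it.

All this being said, any PDF consumer must always be prepared to encounter PDFs without file identifiers; after all, there may be reasons for not creating one.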
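For the scrubber use case from the question, a tolerant approach might look like the Python sketch below. It is not the ApprovalTests implementation; the function name `scrub_file_identifier` and the regular expression are assumptions made for illustration. The hex digits are overwritten with zeros of the same length so that the byte offsets in the xref table and the `startxref` value stay valid, and a file without an ID entry passes through unchanged.

```python
import re

# Assumption: hex-string IDs only, as in the PDFsharp sample above.
ID_ENTRY = re.compile(
    rb"(/ID\s*\[\s*<)([0-9A-Fa-f]*)(>\s*<)([0-9A-Fa-f]*)(>\s*\])"
)


def scrub_file_identifier(pdf_bytes: bytes) -> bytes:
    """Replace both ID strings with zeros, keeping the file length."""

    def zero_out(match):
        return (
            match.group(1)
            + b"0" * len(match.group(2))
            + match.group(3)
            + b"0" * len(match.group(4))
            + match.group(5)
        )

    # re.sub returns the input unchanged when there is no /ID entry.
    return ID_ENTRY.sub(zero_out, pdf_bytes)


with_id = (
    b"trailer\n<<\n/ID[<48189AA5E6D2394D8EF6E7842493B4A9>"
    b"<48189AA5E6D2394D8EF6E7842493B4A9>]\n/Root 2 0 R\n>>\n"
)
without_id = b"trailer\n<<\n/Root 2 0 R\n/Size 9\n>>\n"

scrubbed = scrub_file_identifier(with_id)
untouched = scrub_file_identifier(without_id)
```

Since the ID strings are used as input to the encryption algorithm, this kind of scrubbing is only safe for unencrypted files; an encrypted file would no longer decrypt after its identifier is overwritten.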

Regarding "pdf - What is the ID field in a PDF file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20085899/
