跳转至

14.4 文件标识符

14.4 File Identifiers

PDF 文件可能包含对其他 PDF 文件的引用(参见 7.11,“文件规范”)。然而,仅仅存储文件名,即使是以平台无关的格式存储,也无法保证能够找到该文件。即使文件仍然存在且文件名未更改,不同的服务器软件可能会以不同的方式识别它。在 DOS 平台上运行的服务器会将所有文件名转换为 8 个字符的主文件名和 3 个字符的扩展名。不同的服务器可能会采用不同的策略将更长的文件名转换为这种格式。

通过在文件中包含一个 文件标识符(PDF 1.1)并在常规平台文件指定的基础上使用它,可以使外部文件引用更加可靠。将文件引用中的标识符与文件中的标识符进行匹配,可以确认是否找到了正确的文件。

文件标识符通过 PDF 文件尾部字典(trailer dictionary)的可选 ID 项定义(参见 7.5.5,“文件尾部”)。虽然 ID 项是可选的,但应尽量使用。该项的值应为两个字节字符串组成的数组。第一个字节字符串是基于文件最初创建时的内容生成的永久标识符,在文件增量更新时不应更改。第二个字节字符串是基于文件最近更新时内容生成的可变标识符。当文件首次写入时,这两个标识符应设置为相同的值。如果在解析文件引用时两个标识符都匹配,则很可能找到的是正确且未更改的文件。如果只有第一个标识符匹配,则可能找到的是正确文件的不同版本。

为了确保文件标识符的唯一性,应使用消息摘要算法(如 MD5,详见 Internet RFC 1321,《MD5 消息摘要算法》;参见参考文献)计算标识符,计算时使用以下信息:

  • 当前时间
  • 文件位置的字符串表示,通常是路径名
  • 文件的字节大小
  • 文件文档信息字典中所有条目的值(参见 14.3.3,“文档信息字典”)

注意

文件标识符的计算无需具有可重现性;唯一重要的是标识符可能是唯一的。例如,两种实现上述算法的方法可能对当前时间使用不同的格式,从而为同时创建的相同文件生成不同的文件标识符,但这不会影响标识符的唯一性。

PDF files may contain references to other PDF files (see 7.11, “File Specifications”). Simply storing a file name, however, even in a platform-independent format, does not guarantee that the file can be found. Even if the file still exists and its name has not been changed, different server software applications may identify it in different ways. Servers running on DOS platforms convert all file names to 8 characters and a 3-character extension. Different servers may use different strategies for converting longer file names to this format.

External file references may be made more reliable by including a file identifier (PDF 1.1) in the file and using it in addition to the normal platform-based file designation. Matching the identifier in the file reference with the one in the file confirms whether the correct file was found.

File identifiers shall be defined by the optional ID entry in a PDF file’s trailer dictionary (see 7.5.5, “File Trailer”). The ID entry is optional but should be used. The value of this entry shall be an array of two byte strings. The first byte string shall be a permanent identifier based on the contents of the file at the time it was originally created and shall not change when the file is incrementally updated. The second byte string shall be a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found.

To help ensure the uniqueness of file identifiers, they should be computed by means of a message digest algorithm such as MD5 (described in Internet RFC 1321, The MD5 Message-Digest Algorithm; see the Bibliography), using the following information:

  • The current time
  • A string representation of the file’s location, usually a pathname
  • The size of the file in bytes
  • The values of all entries in the file’s document information dictionary (see 14.3.3, “Document Information Dictionary”)

NOTE

The calculation of the file identifier need not be reproducible; all that matters is that the identifier is likely to be unique. For example, two implementations of the preceding algorithm might use different formats for the current time, causing them to produce different file identifiers for the same file created at the same time, but the uniqueness of the identifier is not affected.