
7.11 文件规范

File Specifications

7.11.1 概述

General

PDF 文件可以通过使用 文件规范(PDF 1.1)来引用另一个文件的内容,这可以采取以下两种形式之一:

  • 简单 文件规范只需提供目标文件的名称,以标准格式呈现,独立于任何特定文件系统的命名约定。它的形式可以是字符串或字典。
  • 完整 文件规范应包括与一个或多个特定文件系统相关的信息。它只能以字典的形式表示。

文件规范应引用 PDF 文件外部的文件或嵌入在引用 PDF 文件中的文件,允许其内容与 PDF 文件一起存储或传输。在任何情况下,文件都应被视为 PDF 文件的外部文件。

A PDF file can refer to the contents of another file by using a file specification (PDF 1.1), which shall take either of two forms:

  • A simple file specification shall give just the name of the target file in a standard format, independent of the naming conventions of any particular file system. It shall take the form of either a string or a dictionary.
  • A full file specification shall include information related to one or more specific file systems. It shall only be represented as a dictionary.

A file specification shall refer to a file external to the PDF file or to a file embedded within the referring PDF file, allowing its contents to be stored or transmitted along with the PDF file. The file shall be considered to be external to the PDF file in either case.

7.11.2 文件规范字符串

File Specification Strings

7.11.2.1 概述

General

简单文件规范的标准表示形式在字符串形式中,将字符串分割成由 SOLIDUS 字符(2Fh)(/) 分隔的组件子字符串。SOLIDUS 是一个通用组件分隔符,在生成特定平台的文件名时应映射到适当的平台特定分隔符。任何组件都可以是空的。如果一个组件包含一个或多个文字 SOLIDI,每个都应该由 REVERSE SOLIDUS(5Ch)(\) 先行,而 REVERSE SOLIDUS 本身又应由另一个 REVERSE SOLIDUS 先行,以指示它是字符串的一部分,而不是转义字符。

EXAMPLE

(in\\/out)
表示文件名
in/out

在处理字符串时,REVERSE SOLIDI 将被移除;它们仅用于区分组件值和组件分隔符。组件子字符串应以字节形式存储,并且应未经解释或转换地传递给操作系统。

The standard format for representing a simple file specification in string form divides the string into component substrings separated by the SOLIDUS character (2Fh) (/). The SOLIDUS is a generic component separator that shall be mapped to the appropriate platform-specific separator when generating a platform-dependent file name. Any of the components may be empty. If a component contains one or more literal SOLIDI, each shall be preceded by a REVERSE SOLIDUS (5Ch) (\), which in turn shall be preceded by another REVERSE SOLIDUS to indicate that it is part of the string and not an escape character.

EXAMPLE

(in\\/out)
represents the file name
in/out

The REVERSE SOLIDI shall be removed in processing the string; they are needed only to distinguish the component values from the component separators. The component substrings shall be stored as bytes and shall be passed to the operating system without interpretation or conversion of any sort.
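The splitting rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `split_file_spec` is hypothetical, the input is assumed to be the string's byte value after PDF literal-string syntax has already been decoded (so `(in\\/out)` arrives as the bytes `in\/out`), and treating a REVERSE SOLIDUS as escaping whatever byte follows it is an assumption.

```python
def split_file_spec(spec: bytes) -> list[bytes]:
    """Split a file specification string into unescaped component byte strings.

    Assumes `spec` is the decoded string value (PDF literal-string escapes
    already processed). Components stay as bytes, uninterpreted.
    """
    components = []
    current = bytearray()
    i = 0
    while i < len(spec):
        byte = spec[i]
        if byte == 0x5C and i + 1 < len(spec):
            # REVERSE SOLIDUS: drop it and keep the next byte literally
            # (assumption: it may escape any byte, not only a SOLIDUS).
            current.append(spec[i + 1])
            i += 2
        elif byte == 0x2F:
            # Unescaped SOLIDUS: component separator.
            components.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current.append(byte)
            i += 1
    components.append(bytes(current))
    return components


print(split_file_spec(b"in\\/out"))          # [b'in/out']
print(split_file_spec(b"/HardDisk/a.pdf"))   # [b'', b'HardDisk', b'a.pdf']
```

An absolute specification shows up as a leading empty component, which keeps the "any component may be empty" rule visible to the caller.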

7.11.2.2 绝对和相对文件规范

Absolute and Relative File Specifications

简单文件规范如果以 SOLIDUS 开头,将是一个 绝对 文件规范。最后一个组件是文件名;前面的组件指定了它的上下文。在某些文件规范中,文件名可能是空的;例如,URL(统一资源定位器)规范可以指定目录而不是文件。不以 SOLIDUS 开头的文件规范是一个相对文件规范,它给出了相对于包含它的 PDF 文件的位置的文件位置。

在基于 URL 的文件系统中,应使用互联网 RFC 1808,《相对统一资源定位符》(见参考文献)的规则,从相对文件规范和 PDF 文件的规范计算出绝对 URL。在此过程之前,应使用 RFC 1738,《统一资源定位符》的转义机制,将相对文件规范转换为相对 URL,以表示任何根据 RFC 1738 不安全或在 7 位美国 ASCII 中不可表示的字节。此外,这种基于 URL 的相对文件规范应限于 RFC 1808 定义的路径。不允许使用方案、网络位置/登录、片段标识符、查询信息和参数部分。

在其他文件系统中,相对文件规范将通过从包含 PDF 文件的规范中移除文件名组件,并在其位置附加相对文件规范来转换为绝对文件规范。

EXAMPLE 1

相对文件规范

    ArtFiles/Figure1.pdf

出现在规范为

    /HardDisk/PDFDocuments/AnnualReport/Summary.pdf

的 PDF 文件中,产生绝对规范

    /HardDisk/PDFDocuments/AnnualReport/ArtFiles/Figure1.pdf

特殊组件 ..(两个 PERIODs)(2Eh) 可以在相对文件规范中使用,以在文件系统层次结构中上移一级。在得出绝对规范后,当紧接 .. 的组件不是另一个 .. 时,两者相互抵消;两者都从文件规范中消除,然后重复该过程。

EXAMPLE 2

本子句中示例 1 的相对文件规范使用 ..(两个 PERIODs)特殊组件

../../ArtFiles/Figure1.pdf

将产生绝对规范

/HardDisk/ArtFiles/Figure1.pdf

A simple file specification that begins with a SOLIDUS shall be an absolute file specification. The last component shall be the file name; the preceding components shall specify its context. In some file specifications, the file name may be empty; for example, URL (uniform resource locator) specifications can specify directories instead of files. A file specification that does not begin with a SOLIDUS shall be a relative file specification giving the location of the file relative to that of the PDF file containing it.

In the case of a URL-based file system, the rules of Internet RFC 1808, Relative Uniform Resource Locators (see the Bibliography), shall be used to compute an absolute URL from a relative file specification and the specification of the PDF file. Prior to this process, the relative file specification shall be converted to a relative URL by using the escape mechanism of RFC 1738, Uniform Resource Locators, to represent any bytes that would be either unsafe according to RFC 1738 or not representable in 7-bit U.S. ASCII. In addition, such URL-based relative file specifications shall be limited to paths as defined in RFC 1808. The scheme, network location/login, fragment identifier, query information, and parameter sections shall not be allowed.

In the case of other file systems, a relative file specification shall be converted to an absolute file specification by removing the file name component from the specification of the containing PDF file and appending the relative file specification in its place.

EXAMPLE 1

The relative file specification

    ArtFiles/Figure1.pdf

appearing in a PDF file whose specification is

    /HardDisk/PDFDocuments/AnnualReport/Summary.pdf

yields the absolute specification

    /HardDisk/PDFDocuments/AnnualReport/ArtFiles/Figure1.pdf

The special component .. (two PERIODs) (2Eh) can be used in a relative file specification to move up a level in the file system hierarchy. After an absolute specification has been derived, when the component immediately preceding .. is not another .., the two cancel each other; both are eliminated from the file specification and the process is repeated.

EXAMPLE 2

The relative file specification from EXAMPLE 1 in this sub-clause using the .. (two PERIODs) special component

../../ArtFiles/Figure1.pdf

would yield the absolute specification

/HardDisk/ArtFiles/Figure1.pdf
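The non-URL resolution rule and the `..` cancellation can be sketched as follows. This is a simplified illustration: `resolve_file_spec` is a hypothetical name, specifications are handled as plain text split on every SOLIDUS (escaped SOLIDI are ignored for brevity), and refusing to cancel a `..` against the leading empty component is an assumption.

```python
def resolve_file_spec(containing_pdf: str, relative: str) -> str:
    """Derive an absolute file specification from a relative one."""
    if relative.startswith("/"):
        # Already absolute: nothing to resolve.
        return relative
    # Remove the file name component of the containing PDF file and
    # append the relative specification in its place.
    parts = containing_pdf.split("/")[:-1] + relative.split("/")
    resolved = []
    for part in parts:
        if part == ".." and resolved and resolved[-1] not in ("", ".."):
            # ".." cancels the component immediately preceding it.
            resolved.pop()
        else:
            resolved.append(part)
    return "/".join(resolved)


base = "/HardDisk/PDFDocuments/AnnualReport/Summary.pdf"
print(resolve_file_spec(base, "ArtFiles/Figure1.pdf"))
# /HardDisk/PDFDocuments/AnnualReport/ArtFiles/Figure1.pdf
print(resolve_file_spec(base, "../../ArtFiles/Figure1.pdf"))
# /HardDisk/ArtFiles/Figure1.pdf
```

The two calls reproduce EXAMPLE 1 and EXAMPLE 2 of this sub-clause.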

7.11.2.3 转换为与平台相关的文件名

Conversion to Platform-Dependent File Names

文件规范转换为特定平台的文件名取决于每个平台特定的文件命名约定:

  • 对于 DOS,初始组件应是物理或逻辑驱动器标识符,或者是 Microsoft Windows 函数 WNetGetConnection 返回的网络资源名称,并应跟随一个 COLON (3Ah) (:)。网络资源名称应由前两个组件构成;第一个组件是服务器名称,第二个是共享名称(卷名称)。所有组件应通过 REVERSE SOLIDI (backslashes) (5Ch) 分隔。可以通过使第一个组件为空来指定一个没有驱动器的绝对 DOS 路径。(其他平台会忽略空组件。)
  • 对于 Mac OS,所有组件应通过 COLONs 分隔。
  • 对于 UNIX,所有组件应通过 SOLIDI (2Fh) (slashes) 分隔。如果存在初始 SOLIDUS,应保留。

用于指定文件名的字符串应以查看文档的平台的标准编码进行解释。表 43 显示了在最常见平台上的文件规范示例。

表 43 – 文件规范示例
系统     系统依赖路径                        书写形式
DOS      \pdfdocs\spec.pdf(无驱动器)       (//pdfdocs/spec.pdf)
         r:\pdfdocs\spec.pdf                 (/r/pdfdocs/spec.pdf)
         pclib/eng:\pdfdocs\spec.pdf         (/pclib/eng/pdfdocs/spec.pdf)
Mac OS   Mac HD:PDFDocs:spec.pdf             (/Mac HD/PDFDocs/spec.pdf)
UNIX     /user/fred/pdfdocs/spec.pdf         (/user/fred/pdfdocs/spec.pdf)
         pdfdocs/spec.pdf(相对)            (pdfdocs/spec.pdf)

NOTE 1

在创建将在多个平台上查看的文档时,应注意确保文件名兼容性。文件规范中应仅使用 U.S. ASCII 字符集的一个子集:大写字母字符(A–Z)、数字字符(0–9)、PERIOD (2Eh) 和 LOW LINE (下划线) (5Fh)。PERIOD 在 DOS 和 Windows 文件名中有特殊含义,在 Mac OS 路径名中作为第一个字符。在文件规范中,PERIOD 只应用于将基本文件名与文件扩展名分开。

NOTE 2

一些文件系统是不区分大小写的,目录中的名称是唯一的,因此即使小写字母被更改为大写或反之,名称仍应保持可区分。在 DOS 和 Windows 3.1 系统以及一些 CD-ROM 文件系统中,文件名限制为 8 个字符加上 3 个字符的扩展名。文件系统软件通常通过保留文件名的前 6 或 7 个字符和最后一个 PERIOD 后的前 3 个字符(如果有)将长名称转换为短名称。由于第六个或第七个字符之后的字符经常转换为与原始值无关的其他值,因此文件名应从第一个 6 个字符中可区分。

The conversion of a file specification to a platform-dependent file name depends on the specific file naming conventions of each platform:

  • For DOS, the initial component shall be either a physical or logical drive identifier or a network resource name as returned by the Microsoft Windows function WNetGetConnection, and shall be followed by a COLON (3Ah) ( : ). A network resource name shall be constructed from the first two components; the first component shall be the server name and the second shall be the share name (volume name). All components shall be separated by REVERSE SOLIDI (backslashes) (5Ch). It shall be possible to specify an absolute DOS path without a drive by making the first component empty. (Empty components are ignored by other platforms.)
  • For Mac OS, all components shall be separated by COLONs.
  • For UNIX, all components shall be separated by SOLIDI (2Fh) (slashes). An initial SOLIDUS, if present, shall be preserved.

Strings used to specify a file name shall be interpreted in the standard encoding for the platform on which the document is being viewed. Table 43 shows examples of file specifications on the most common platforms.

Table 43 – Examples of file specifications
System   System-dependent paths            Written form
DOS      \pdfdocs\spec.pdf (no drive)      (//pdfdocs/spec.pdf)
         r:\pdfdocs\spec.pdf               (/r/pdfdocs/spec.pdf)
         pclib/eng:\pdfdocs\spec.pdf       (/pclib/eng/pdfdocs/spec.pdf)
Mac OS   Mac HD:PDFDocs:spec.pdf           (/Mac HD/PDFDocs/spec.pdf)
UNIX     /user/fred/pdfdocs/spec.pdf       (/user/fred/pdfdocs/spec.pdf)
         pdfdocs/spec.pdf (relative)       (pdfdocs/spec.pdf)
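The conversion rules can be illustrated with a short Python sketch that reproduces most rows of Table 43. The function `to_platform_path` and its platform labels are hypothetical. Network resource names are deliberately not handled, because the written form alone does not say whether the first two components form a server and share pair, and Mac OS relative paths are simplified.

```python
def to_platform_path(spec: str, platform: str) -> str:
    """Convert a written-form file specification to a platform path (sketch)."""
    components = spec.split("/")
    absolute = spec.startswith("/")
    if platform == "unix":
        # Empty components are ignored; an initial SOLIDUS is preserved.
        names = [c for c in components if c]
        return ("/" if absolute else "") + "/".join(names)
    if platform == "mac":
        # All components are separated by COLONs.
        return ":".join(c for c in components if c)
    if platform == "dos":
        if not absolute:
            return "\\".join(components)
        rest = components[1:]
        if rest and rest[0] == "":
            # Empty first component: absolute path without a drive.
            return "\\" + "\\".join(rest[1:])
        # Otherwise the first component is taken as the drive identifier.
        return rest[0] + ":\\" + "\\".join(rest[1:])
    raise ValueError(f"unknown platform: {platform}")


print(to_platform_path("//pdfdocs/spec.pdf", "dos"))        # \pdfdocs\spec.pdf
print(to_platform_path("/r/pdfdocs/spec.pdf", "dos"))       # r:\pdfdocs\spec.pdf
print(to_platform_path("/Mac HD/PDFDocs/spec.pdf", "mac"))  # Mac HD:PDFDocs:spec.pdf
```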

NOTE 1

When creating documents that are to be viewed on multiple platforms, care should be taken to ensure file name compatibility. Only a subset of the U.S. ASCII character set should be used in file specifications: the uppercase alphabetic characters (A–Z), the numeric characters (0–9), the PERIOD (2Eh) and the LOW LINE (underscore) (5Fh). The PERIOD has special meaning in DOS and Windows file names, and as the first character in a Mac OS pathname. In file specifications, the PERIOD should be used only to separate a base file name from a file extension.

NOTE 2

Some file systems are case-insensitive, and names within a directory are unique so names should remain distinguishable if lowercase letters are changed to uppercase or vice versa. On DOS and Windows 3.1 systems and on some CD-ROM file systems, file names are limited to 8 characters plus a 3-character extension. File system software typically converts long names to short names by retaining the first 6 or 7 characters of the file name and the first 3 characters after the last PERIOD, if any. Since characters beyond the sixth or seventh are often converted to other values unrelated to the original value, file names should be distinguishable from the first 6 characters.

7.11.2.4 文件规范中的多字节字符串

Multiple-Byte Strings in File Specifications

在 PDF 1.2 或更高版本中,文件规范可以包含多字节字符代码,这些字符代码以十六进制形式表示在尖括号(< 和 >)之间(使用 LESS-THAN SIGN (3Ch) 和 GREATER-THAN SIGN (3Eh))。由于 SOLIDUS (2Fh)(斜杠字符),表示为 <2F>,被用作组件分隔符,而 REVERSE SOLIDUS (5Ch)(反斜杠字符),表示为 <5C>,被用作转义字符,任何多字节字符中的这些字节的出现都应由 REVERSE SOLIDUS (5Ch) 的 ASCII 代码先行。

EXAMPLE

包含 2 字节字符代码 <89 5C> 的文件名写作 <89 5C 5C>。当应用程序在文件名中遇到这一系列字节时,它将该序列替换为原始的 2 字节代码。

In PDF 1.2 or higher, a file specification may contain multiple-byte character codes, represented in hexadecimal form between angle brackets (< and >) (using LESS-THAN SIGN (3Ch) and GREATER-THAN SIGN (3Eh)). Since the SOLIDUS (2Fh) (slash character), denoted as <2F>, is used as a component delimiter and the REVERSE SOLIDUS (5Ch) (backslash character), denoted as <5C>, is used as an escape character, any occurrence of either of these bytes in a multiple-byte character shall be preceded by the ASCII code for the REVERSE SOLIDUS (5Ch).

EXAMPLE

A file name containing the 2-byte character code <89 5C> is written as <89 5C 5C>. When the application encounters this sequence of bytes in a file name, it replaces the sequence with the original 2-byte code.
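The escaping shown in the EXAMPLE, where the byte 5C inside a character code gains an extra 5C in front of it, can be sketched as a pair of helpers. The names `escape_multibyte` and `unescape_multibyte` are hypothetical, and the sketch assumes the escape byte is always 5C, as the EXAMPLE shows.

```python
def escape_multibyte(char_code: bytes) -> bytes:
    """Precede every 2Fh or 5Ch byte with an escaping 5Ch byte."""
    out = bytearray()
    for byte in char_code:
        if byte in (0x2F, 0x5C):
            out.append(0x5C)
        out.append(byte)
    return bytes(out)


def unescape_multibyte(escaped: bytes) -> bytes:
    """Reverse escape_multibyte: drop each escaping 5Ch byte."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == 0x5C and i + 1 < len(escaped):
            i += 1  # skip the escape byte, keep the byte it protects
        out.append(escaped[i])
        i += 1
    return bytes(out)


code = bytes.fromhex("895C")
print(escape_multibyte(code).hex(" ").upper())   # 89 5C 5C
```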

7.11.3 文件规范词典

File Specification Dictionaries

文件规范的字典形式比字符串形式提供了更大的灵活性,允许为不同的文件系统或平台指定不同的文件,或为标准文件系统(DOS/Windows、Mac OS 和 UNIX)之外的文件系统指定文件。表 44 显示了文件规范字典中的条目。无论平台如何,符合标准的阅读器应使用 F 和 UF(从 PDF 1.7 开始)条目来指定文件。UF 条目是可选的,但应包括它,因为它使得跨平台和跨语言兼容性成为可能。

Table 44 – 文件规范字典中的条目
键名 类型 值
Type name (必需,如果存在 EF 或 RF 条目;推荐始终存在) 描述此字典所描述的 PDF 对象的类型;对于文件规范字典,应为 Filespec。
FS name (可选) 应用于解释此文件规范的文件系统的名称。如果存在此条目,则字典中的所有其他条目应由指定的文件系统解释。PDF 只定义了一个标准文件系统名称 URL(参见 7.11.5,"URL Specifications");应用程序可以注册其他名称(参见附录 E)。此条目应独立于 F、UF、DOS、Mac 和 Unix 条目。
F string (如果 DOS、Mac 和 Unix 条目全部不存在则必需;PDF 1.7 中补充使用 UF 条目) 符合 7.11.2,"File Specification Strings" 中描述的形式的文件规范字符串,或者(如果文件系统为 URL)符合 7.11.5,"URL Specifications" 描述的统一资源定位符。
应同时使用 UF 条目。UF 条目提供跨平台和跨语言兼容性,而 F 条目提供向后兼容性。
DOS byte string (可选)符合 7.11.2,"File Specification Strings" 中描述的 DOS 文件名。
此条目已过时,不应由符合要求的编写器使用。
Mac byte string (可选)符合 7.11.2,"File Specification Strings" 中描述的 Mac OS 文件名。
此条目已过时,不应由符合要求的编写器使用。
Unix byte string (可选)符合 7.11.2,"File Specification Strings" 中描述的 UNIX 文件名。
此条目已过时,不应由符合要求的编写器使用。
ID array (可选)由两个字节字符串组成的数组,构成文件标识符(参见 14.4,"File Identifiers"),应包含在所引用的文件中。

NOTE

使用此条目可以提高应用程序找到预期文件的机会,并允许在文件自链接后发生更改时向用户发出警告。

V boolean (可选;PDF 1.2)指示由文件规范引用的文件是否是 volatile(随时间频繁更改)。如果值为 true,应用程序不应缓存文件的副本。例如,引用指向实时视频摄像头的 URL 的电影注释可以将此标志设置为 true,以通知符合要求的阅读器每次播放时都重新获取电影。默认值为 false。
EF dictionary (如果 RF 存在则必需;PDF 1.3;PDF 1.7 中补充包括 UF 键)包含 F、UF、DOS、Mac 和 Unix 键的子集的字典,与文件规范字典中相应的条目对应。每个此类键的值应为包含相应文件的嵌入文件流(参见 7.11.4,"Embedded File Streams")。如果存在此条目,则 Type 条目为必需,且文件规范字典应间接引用。
应使用 F 和 UF 条目,替代 DOS、Mac 或 Unix 条目。
RF dictionary (可选;PDF 1.3)结构与 EF 字典相同,EF 字典应存在。RF 字典中的每个键也应在 EF 字典中存在。每个值应为相关文件数组(参见 7.11.4.2,"Related Files Arrays"),用于标识与 EF 字典中相应文件相关的文件。
Desc text string (可选;PDF 1.6)与文件规范关联的描述性文本。应用于 EmbeddedFiles 名称树中的文件(参见 7.7.4,"Name Dictionary")。
CI dictionary (可选;应为间接引用;PDF 1.7)集合项字典,用于创建便携集合的用户界面(参见 7.11.6,"Collection Items")。

The dictionary form of file specification provides more flexibility than the string form, allowing different files to be specified for different file systems or platforms, or for file systems other than the standard ones (DOS/Windows, Mac OS, and UNIX). Table 44 shows the entries in a file specification dictionary. Regardless of the platform, conforming readers should use the F and UF (beginning with PDF 1.7) entries to specify files. The UF entry is optional, but should be included because it enables cross-platform and cross-language compatibility.

Table 44 – Entries in a file specification dictionary
Key Type Value
Type name (Required if an EF or RF entry is present; recommended always) The type of PDF object that this dictionary describes; shall be Filespec for a file specification dictionary.
FS name (Optional) The name of the file system that shall be used to interpret this file specification. If this entry is present, all other entries in the dictionary shall be interpreted by the designated file system. PDF shall define only one standard file system name, URL (see 7.11.5, "URL Specifications"); an application can register other names (see Annex E). This entry shall be independent of the F, UF, DOS, Mac, and Unix entries.
F string (Required if the DOS, Mac, and Unix entries are all absent; amended with the UF entry for PDF 1.7) A file specification string of the form described in 7.11.2, "File Specification Strings," or (if the file system is URL) a uniform resource locator, as described in 7.11.5, "URL Specifications."
The UF entry should be used in addition to the F entry. The UF entry provides cross-platform and cross-language compatibility and the F entry provides backwards compatibility.
DOS byte string (Optional) A file specification string (see 7.11.2, "File Specification Strings") representing a DOS file name.
This entry is obsolescent and should not be used by conforming writers.
Mac byte string (Optional) A file specification string (see 7.11.2, "File Specification Strings") representing a Mac OS file name. This entry is obsolescent and should not be used by conforming writers.
Unix byte string (Optional) A file specification string (see 7.11.2, "File Specification Strings") representing a UNIX file name.
This entry is obsolescent and should not be used by conforming writers.
ID array (Optional) An array of two byte strings constituting a file identifier (see 14.4, "File Identifiers") that should be included in the referenced file.

NOTE

The use of this entry improves an application’s chances of finding the intended file and allows it to warn the user if the file has changed since the link was made.

V boolean (Optional; PDF 1.2) A flag indicating whether the file referenced by the file specification is volatile (changes frequently with time). If the value is true, applications shall not cache a copy of the file. For example, a movie annotation referencing a URL to a live video camera could set this flag to true to notify the conforming reader that it should re-acquire the movie each time it is played. Default value: false.
EF dictionary (Required if RF is present; PDF 1.3; amended to include the UF key in PDF 1.7) A dictionary containing a subset of the keys F, UF, DOS, Mac, and Unix, corresponding to the entries by those names in the file specification dictionary. The value of each such key shall be an embedded file stream (see 7.11.4, "Embedded File Streams") containing the corresponding file. If this entry is present, the Type entry is required and the file specification dictionary shall be indirectly referenced.
The F and UF entries should be used in place of the DOS, Mac, or Unix entries.
RF dictionary (Optional; PDF 1.3) A dictionary with the same structure as the EF dictionary, which shall be present. Each key in the RF dictionary shall also be present in the EF dictionary. Each value shall be a related files array (see 7.11.4.2, "Related Files Arrays") identifying files that are related to the corresponding file in the EF dictionary. If this entry is present, the Type entry is required and the file specification dictionary shall be indirectly referenced.
Desc text string (Optional; PDF 1.6) Descriptive text associated with the file specification. It shall be used for files in the EmbeddedFiles name tree (see 7.7.4, "Name Dictionary").
CI dictionary (Optional; shall be indirect reference; PDF 1.7) A collection item dictionary, which shall be used to create the user interface for portable collections (see 7.11.6, "Collection Items").
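As an illustration of the recommendation to write both F and UF, the following Python sketch serializes a minimal file specification dictionary. All function names are hypothetical. Writing UF as a UTF-16BE hex string with a leading FEFF byte-order marker uses the standard PDF text-string convention; replacing non-ASCII characters in F with `?` is purely an assumption made for this sketch.

```python
def pdf_literal_string(data: bytes) -> str:
    """Serialize bytes as a PDF literal string, escaping \\, ( and )."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in "\\()":
            out.append("\\" + ch)
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)
        else:
            out.append(f"\\{byte:03o}")  # octal escape for other bytes
    return "(" + "".join(out) + ")"


def pdf_text_hex_string(text: str) -> str:
    """Serialize text as a hex string in UTF-16BE with a byte-order marker."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def file_spec_dictionary(name: str) -> str:
    """Build a minimal Filespec dictionary carrying both F and UF."""
    f_value = name.encode("ascii", "replace")  # assumption: lossy ASCII fallback
    return (
        "<< /Type /Filespec\n"
        f"   /F {pdf_literal_string(f_value)}\n"
        f"   /UF {pdf_text_hex_string(name)}\n"
        ">>"
    )


print(file_spec_dictionary("spec.pdf"))
```

Keeping F as a plain byte string preserves backwards compatibility, while UF carries the full Unicode name.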

7.11.4 嵌入式文件流

Embedded File Streams

7.11.4.1 概述

General

当 PDF 文件包含指向外部文件的文件规范,并且 PDF 文件被存档或传输时,应采取一些措施以确保外部引用仍然有效。一种方法是安排外部文件的副本随 PDF 文件一起提供。嵌入式文件流(Embedded file streams)(PDF 1.3)通过允许将被引用文件的内容直接嵌入到 PDF 文件的主体中来解决这个问题。这使得 PDF 文件成为一个可以作为单一实体存储或传输的自包含单元。(嵌入式文件仅出于方便而包含,任何符合标准的阅读器都无需直接处理它们。)

注意

如果文件包含指向外部存储的高分辨率图像的 OPI(开放预印刷接口)字典(见 14.11.7,"开放预印刷接口 (OPI)"),则可以使用嵌入式文件流将图像数据合并到 PDF 文件中。

嵌入式文件流应以以下方式之一包含在 PDF 文档中:

  • 文档中的任何文件规范字典都可能有一个 EF 条目,该条目指定了一个嵌入式文件流。流数据仍应与文件系统中的位置相关联。特别是,这种方法应用于文件附件注释(见 12.5.6.15,"文件附件注释"),这些注释将嵌入式文件与文档页面上的位置关联起来。
  • 嵌入式文件流可以通过 PDF 文档的名称字典中的 EmbeddedFiles 条目(PDF 1.4)与整个文档关联(见 7.7.4,"名称字典")。关联的名称树应将名称字符串映射到通过它们的 EF 条目引用嵌入式文件流的文件规范。

从 PDF 1.6 开始,应使用文件规范字典中的 Desc 条目(见表 44)提供嵌入式文件的文本描述,这可以在符合标准阅读器的用户界面中显示。以前,需要通过名称字典中提供的名称字符串来识别文档级别的嵌入式文件,这与 JavaScript 名称树将名称字符串与文档级别的 JavaScript 操作关联的方式类似(见 12.6.4.16,"JavaScript 操作")。

描述嵌入式文件的流字典应包含任何流的标准条目,如 Length 和 Filter(见表 5),以及表 45 所示的附加条目。

Table 45 – 嵌入文件流字典中的附加条目
键名 类型
Type name (可选) 描述此字典所描述的 PDF 对象的类型;如果存在,对于嵌入文件流应为 EmbeddedFile
Subtype name (可选) 嵌入文件的子类型。此条目的值应为一个一级名称,如附录 E 中定义的。未注册前缀的名称应符合互联网 RFC 2046 中定义的 MIME 媒体类型名称,多用途互联网邮件扩展(MIME)第二部分:媒体类型(参见参考文献),不允许名称中的字符应使用 7.3.5,"Name Objects" 中描述的二字符十六进制代码格式。
Params dictionary (可选)一个 嵌入文件参数字典,应包含额外的特定于文件的信息(参见 Table 46)。

Table 46 – 嵌入文件参数字典中的条目
键名 类型
Size integer (可选) 未压缩嵌入文件的大小,以字节为单位。
CreationDate date (可选) 嵌入文件创建的日期和时间。
ModDate date (可选) 嵌入文件上次修改的日期和时间。
Mac dictionary (可选) 一个子字典,包含特定于 Mac OS 文件的附加信息(参见 Table 47)。
CheckSum string (可选) 一个 16 字节的字符串,是未压缩嵌入文件字节的校验和。校验和应通过应用标准的 MD5 消息摘要算法(描述在互联网 RFC 1321 中,MD5 Message-Digest Algorithm;参见 参考文献)计算嵌入文件流的字节而得到。

对于 Mac OS 文件,嵌入文件参数字典中的 Mac 条目应包含进一步的子字典,其中包含特定于 Mac OS 的文件信息。Table 47 显示了此子字典的内容。

Table 47 – Mac OS 文件信息字典中的条目
键名 类型
Subtype integer (可选) 嵌入文件的文件类型。它应按照 Mac OS 的约定编码为整数:一个 4 字符 ASCII 文本字面量,应为一个 32 位整数,高位字节优先。

EXAMPLE

文件类型 "CARO" 表示为十六进制整数 4341524F,在十进制中表示为 1128354383。

Creator integer (可选) 嵌入文件的创建者签名应与 Subtype 相同的方式编码。
ResFork stream (可选) 嵌入文件资源叉的二进制内容。

If a PDF file contains file specifications that refer to an external file and the PDF file is archived or transmitted, some provision should be made to ensure that the external references will remain valid. One way to do this is to arrange for copies of the external files to accompany the PDF file. Embedded file streams (PDF 1.3) address this problem by allowing the contents of referenced files to be embedded directly within the body of the PDF file. This makes the PDF file a self-contained unit that can be stored or transmitted as a single entity. (The embedded files are included purely for convenience and need not be directly processed by any conforming reader.)

NOTE

If the file contains OPI (Open Prepress Interface) dictionaries that refer to externally stored high-resolution images (see 14.11.7, "Open Prepress Interface (OPI)"), the image data can be incorporated into the PDF file with embedded file streams.

An embedded file stream shall be included in a PDF document in one of the following ways:

  • Any file specification dictionary in the document may have an EF entry that specifies an embedded file stream. The stream data shall still be associated with a location in the file system. In particular, this method shall be used for file attachment annotations (see 12.5.6.15, "File Attachment Annotations"), which associate the embedded file with a location on a page in the document.
  • Embedded file streams may be associated with the document as a whole through the EmbeddedFiles entry (PDF 1.4) in the PDF document’s name dictionary (see 7.7.4, "Name Dictionary"). The associated name tree shall map name strings to file specifications that refer to embedded file streams through their EF entries.

Beginning with PDF 1.6, the Desc entry of the file specification dictionary (see Table 44) should be used to provide a textual description of the embedded file, which can be displayed in the user interface of a conforming reader. Previously, it was necessary to identify document-level embedded files by the name string provided in the name dictionary associated with an embedded file stream in much the same way that the JavaScript name tree associates name strings with document-level JavaScript actions (see 12.6.4.16, "JavaScript Actions").

The stream dictionary describing an embedded file shall contain the standard entries for any stream, such as Length and Filter (see Table 5), as well as the additional entries shown in Table 45.

Table 45 – Additional entries in an embedded file stream dictionary
Key Type Value
Type name (Optional) The type of PDF object that this dictionary describes; if present, shall be EmbeddedFile for an embedded file stream.
Subtype name (Optional) The subtype of the embedded file. The value of this entry shall be a first-class name, as defined in Annex E. Names without a registered prefix shall conform to the MIME media type names defined in Internet RFC 2046, Multipurpose Internet Mail Extensions (MIME), Part Two: Media Types (see the Bibliography), with the provision that characters not allowed in names shall use the 2-character hexadecimal code format described in 7.3.5, "Name Objects."
Params dictionary (Optional) An embedded file parameter dictionary that shall contain additional file-specific information (see Table 46).

Table 46 – Entries in an embedded file parameter dictionary
Key Type Value
Size integer (Optional) The size of the uncompressed embedded file, in bytes.
CreationDate date (Optional) The date and time when the embedded file was created.
ModDate date (Optional) The date and time when the embedded file was last modified.
Mac dictionary (Optional) A subdictionary containing additional information specific to Mac OS files (see Table 47).
CheckSum string (Optional) A 16-byte string that is the checksum of the bytes of the uncompressed embedded file. The checksum shall be calculated by applying the standard MD5 message-digest algorithm (described in Internet RFC 1321, The MD5 Message-Digest Algorithm; see the Bibliography) to the bytes of the embedded file stream.
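The Size and CheckSum entries can be computed from the uncompressed file bytes, as in the following sketch. The function name `embedded_file_params` is hypothetical, and the output covers only these two entries of the parameter dictionary, with the checksum written as a hex string.

```python
import hashlib


def embedded_file_params(data: bytes) -> str:
    """Build a Params dictionary fragment with Size and CheckSum.

    `data` is assumed to be the uncompressed contents of the embedded file.
    """
    digest = hashlib.md5(data).hexdigest().upper()  # 16-byte MD5 digest as hex
    return f"<< /Size {len(data)} /CheckSum <{digest}> >>"


print(embedded_file_params(b""))
# << /Size 0 /CheckSum <D41D8CD98F00B204E9800998ECF8427E> >>
```

A reader could recompute the digest after decoding the stream and compare it with CheckSum to detect a damaged embedded file.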

For Mac OS files, the Mac entry in the embedded file parameter dictionary should hold a further subdictionary containing Mac OS–specific file information. Table 47 shows the contents of this subdictionary.

Table 47 – Entries in a Mac OS file information dictionary
Key Type Value
Subtype integer (Optional) The embedded file’s file type. It shall be encoded as an integer according to Mac OS conventions: a 4-character ASCII text literal, that shall be a 32-bit integer, with the high-order byte first.

EXAMPLE

The file type “CARO” is represented as the hexadecimal integer 4341524F, which is expressed in decimal as 1128354383.

Creator integer (Optional) The embedded file’s creator signature shall be encoded in the same way as Subtype.
ResFork stream (Optional) The binary contents of the embedded file’s resource fork.
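The integer encoding used by Subtype and Creator can be checked with a few lines of Python. The helper names are hypothetical; the sketch assumes the code is exactly four ASCII characters.

```python
def mac_type_code(code: str) -> int:
    """Encode a 4-character ASCII code as a 32-bit integer, high-order byte first."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("Mac OS type and creator codes are 4 characters long")
    return int.from_bytes(raw, "big")


def mac_type_name(value: int) -> str:
    """Decode the integer back into its 4-character ASCII form."""
    return value.to_bytes(4, "big").decode("ascii")


print(mac_type_code("CARO"))        # 1128354383
print(hex(mac_type_code("CARO")))   # 0x4341524f
print(mac_type_name(1128354383))    # CARO
```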

7.11.4.2 相关文件数组

Related Files Arrays

在某些情况下,PDF 文件可以引用一组相关的文件,例如构成 DCS 1.0 颜色分离图像的五个文件集。文件规范明确命名其中一个文件;其余文件应通过该文件名的某种系统变化来识别(例如通过更改扩展名)。当这样的文件被嵌入到 PDF 文件中时,相关文件也应被嵌入。这可以通过将一个 related files 数组(PDF 1.3)作为文件规范字典中 RF 条目的值来实现。数组应有 2 × n 个元素,按以下形式成对:

[ string1 stream1
    string2 stream2
    …
    stringn streamn
]

每对的第一个元素应为给出一个相关文件名称的字符串;第二个元素应为包含文件内容的嵌入文件流。

示例

在以下示例中,对象 21、31 和 41 是包含 DOS 文件 SUNSET.EPS、Mac OS 文件 Sunset.eps 和 UNIX 文件 Sunset.eps 的嵌入文件流。文件规范字典的 RF 条目指定了一个数组,对象 30 标识了与 Mac OS 文件相关的一组嵌入文件,形成了 DCS 1.0 集。示例仅显示了集合中的前两个嵌入文件流;实际的 PDF 文件当然会包含所有这些文件流。

10 0 obj
    << /Type /Filespec          % 文件规范字典
        /DOS (SUNSET.EPS)
        /Mac (Sunset.eps)        % Mac OS 文件名称
        /Unix (Sunset.eps)

        /EF << /DOS 21 0 R
                /Mac 31 0 R        % 嵌入的 Mac OS 文件
                /Unix 41 0 R
            >>
        /RF << /Mac 30 0 R >>     % Mac OS 文件的相关文件数组
    >>
endobj

30 0 obj                            % Mac OS 文件的相关文件数组
    [ (Sunset.eps) 31 0 R           % 包括文件 Sunset.eps 本身
      (Sunset.C) 32 0 R
      (Sunset.M) 33 0 R
      (Sunset.Y) 34 0 R
      (Sunset.K) 35 0 R
    ]
endobj

31 0 obj                              % Mac OS 文件的嵌入文件流
    << /Type /EmbeddedFile            % Sunset.eps
       /Length …
       /Filter …
    >>
stream
… Sunset.eps 的数据 …
endstream
endobj

32 0 obj                             % 相关文件的嵌入文件流
    << /Type /EmbeddedFile           % Sunset.C
       /Length …
       /Filter …
    >>
stream
… Sunset.C 的数据 …
endstream
endobj

In some circumstances, a PDF file can refer to a group of related files, such as the set of five files that make up a DCS 1.0 colour-separated image. The file specification explicitly names only one of the files; the rest shall be identified by some systematic variation of that file name (such as by altering the extension). When such a file is to be embedded in a PDF file, the related files shall be embedded as well. This is accomplished by including a related files array (PDF 1.3) as the value of the RF entry in the file specification dictionary. The array shall have 2 × n elements, which shall be paired in the form

[ string1 stream1
  string2 stream2
  …
  stringn streamn
]

The first element of each pair shall be a string giving the name of one of the related files; the second element shall be an embedded file stream holding the file’s contents.

EXAMPLE

In the following example, objects 21, 31, and 41 are embedded file streams containing the DOS file SUNSET.EPS, the Mac OS file Sunset.eps, and the UNIX file Sunset.eps, respectively. The file specification dictionary’s RF entry specifies an array, object 30, identifying a set of embedded files related to the Mac OS file, forming a DCS 1.0 set. The example shows only the first two embedded file streams in the set; an actual PDF file would, of course, include all of them.

10 0 obj
    << /Type /Filespec          % File specification dictionary
       /DOS (SUNSET.EPS)
       /Mac (Sunset.eps)        % Name of Mac OS file
       /Unix (Sunset.eps)

       /EF << /DOS 21 0 R
              /Mac 31 0 R        % Embedded Mac OS file
              /Unix 41 0 R
          >>
       /RF << /Mac 30 0 R >>     % Related files array for Mac OS file
    >>
endobj

30 0 obj                            % Related files array for Mac OS file
    [ (Sunset.eps) 31 0 R           % Includes file Sunset.eps itself
      (Sunset.C) 32 0 R
      (Sunset.M) 33 0 R
      (Sunset.Y) 34 0 R
      (Sunset.K) 35 0 R
    ]
endobj

31 0 obj                              % Embedded file stream for Mac OS file
    << /Type /EmbeddedFile            % Sunset.eps
       /Length …
       /Filter …
    >>
stream
… Data for Sunset.eps …
endstream
endobj

32 0 obj                             % Embedded file stream for related file
    << /Type /EmbeddedFile           % Sunset.C
       /Length …
       /Filter …
    >>
stream
… Data for Sunset.C …
endstream
endobj
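Once a related files array has been parsed, its flat sequence of 2 × n elements can be regrouped into name and stream pairs. The sketch below models each embedded file stream as its indirect reference text (for instance "31 0 R"); that representation and the function name `pair_related_files` are assumptions for illustration, not part of any PDF library.

```python
def pair_related_files(array: list[str]) -> list[tuple[str, str]]:
    """Group a flat related files array into (file name, stream reference) pairs."""
    if len(array) % 2 != 0:
        raise ValueError("a related files array must have 2 x n elements")
    return [(array[i], array[i + 1]) for i in range(0, len(array), 2)]


# Contents of object 30 from the example above.
related = ["Sunset.eps", "31 0 R", "Sunset.C", "32 0 R", "Sunset.M", "33 0 R",
           "Sunset.Y", "34 0 R", "Sunset.K", "35 0 R"]
pairs = pair_related_files(related)
print(len(pairs))                 # 5
print(dict(pairs)["Sunset.C"])    # 32 0 R
```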

7.11.5 URL 规范

URL Specifications

当文件规范字典中的 FS 条目具有值 URL 时,该字典中 F 条目的值不是文件规范字符串,而是按照互联网 RFC 1738,《统一资源定位符》中定义的形式的统一资源定位符(URL)。

EXAMPLE

以下示例展示了一个 URL 规范。

<< /FS /URL
    /F (ftp://www.beatles.com/Movies/AbbeyRoad.mov)
>>

URL 应遵守 RFC 1738 中指定的字符编码要求。由于 7 位美国 ASCII 是 PDFDocEncoding 的严格子集,因此该值也应被视为使用该编码。

When the FS entry in a file specification dictionary has the value URL, the value of the F entry in that dictionary is not a file specification string, but a uniform resource locator (URL) of the form defined in Internet RFC 1738, Uniform Resource Locators (see the Bibliography).

EXAMPLE

The following example shows a URL specification.

<< /FS /URL
   /F (ftp://www.beatles.com/Movies/AbbeyRoad.mov)
>>

The URL shall adhere to the character-encoding requirements specified in RFC 1738. Because 7-bit U.S. ASCII is a strict subset of PDFDocEncoding, this value shall also be considered to be in that encoding.
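For URL-based file systems, 7.11.2.2 requires a relative file specification to be escaped into a relative URL and then resolved against the URL of the PDF file. The sketch below approximates this with Python's standard `urllib.parse` module. Note the assumptions: `quote` and `urljoin` follow Python's current URL handling rather than RFC 1738 and RFC 1808 to the letter, so the set of escaped characters may differ slightly, and `resolve_url_file_spec` is a hypothetical name.

```python
from urllib.parse import quote, urljoin


def resolve_url_file_spec(pdf_url: str, relative_spec: bytes) -> str:
    """Escape a relative file specification and resolve it against the PDF's URL."""
    # Percent-escape unsafe and non-ASCII bytes, keeping SOLIDUS as the separator.
    relative_url = quote(relative_spec, safe="/")
    absolute = urljoin(pdf_url, relative_url)
    absolute.encode("ascii")  # raises if the result is not 7-bit U.S. ASCII
    return absolute


base = "ftp://www.beatles.com/Movies/AbbeyRoad.mov"
print(resolve_url_file_spec(base, b"Extras/Abbey Road.mov"))
# ftp://www.beatles.com/Movies/Extras/Abbey%20Road.mov
print(resolve_url_file_spec(base, b"../Index.pdf"))
# ftp://www.beatles.com/Index.pdf
```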

7.11.6 收藏项

Collection Items

从 PDF 1.7 开始,集合项字典 应包含集合中特定文件的集合模式字典描述的数据(参见 12.3.5,“Collections”)。表 48 描述了集合项字典中的条目。

Table 48 – 集合项字典中的条目
类型
Type name (可选) 描述此字典的 PDF 对象的类型;如果存在,则应为 CollectionItem,表示集合项字典。
其他键 text string, date, number or dictionary (可选) 提供与集合字典中相关字段对应的数据。如果条目是字典,则应为集合子项字典(见 Table 49)。
每个条目的类型应与集合模式字典(见 Table 156)中相同键引用的集合字段字典(见 Table 157)中识别的数据类型匹配。

示例

如果相应的集合字段具有 Subtype 条目为 S,则条目为文本字符串。

单个集合项字典可以包含多个条目,每个条目表示一个键(见 12.3.5 的示例 1)。

集合子项字典 提供与集合字典中相关字段对应的数据,并提供一种将前缀字符串与该数据值关联的方法。排序算法将忽略前缀。表 49 描述了集合子项字典中的条目。

Table 49 – 集合子项字典中的条目
类型
Type name (可选) 描述此字典的 PDF 对象的类型;如果存在,则应为 CollectionSubitem,表示集合子项字典。
D text string, date, or number (可选) 与集合字段字典中相关条目对应的数据(见 Table 157)。数据类型应与相应的集合字段字典所识别的数据类型匹配。默认值:无。
P text string (可选) 一个前缀字符串,将与呈现给用户的文本字符串连接。当符合阅读器对集合中的项目进行排序时,此条目将被忽略。默认值:无。

Beginning with PDF 1.7, a collection item dictionary shall contain the data described by the collection schema dictionary for a particular file in a collection (see 12.3.5, "Collections"). Table 48 describes the entries in a collection item dictionary.

Table 48 – Entries in a collection item dictionary
Key Type Value
Type name (Optional) The type of PDF object that this dictionary describes; if present, shall be CollectionItem for a collection item dictionary.
Other keys text string, date, number or dictionary (Optional) Provides the data corresponding to the related fields in the collection dictionary. If the entry is a dictionary, then it shall be a collection subitem dictionary (see Table 49).
The type of each entry shall match the type of data identified by the collection field dictionary (see Table 157) referenced by the same key in the collection schema dictionary (see Table 156).

EXAMPLE

If the corresponding collection field has a Subtype entry of S, then the entry is a text string.

A single collection item dictionary may contain multiple entries, with one entry representing each key (see EXAMPLE 1 in 12.3.5, "Collections").

A collection subitem dictionary provides the data corresponding to the related fields in the collection dictionary, and it provides a means of associating a prefix string with that data value. The prefix shall be ignored by the sorting algorithm. Table 49 describes the entries in a collection subitem dictionary.

Table 49 – Entries in a collection subitem dictionary
Key Type Value
Type name (Optional) The type of PDF object that this dictionary describes; if present, shall be CollectionSubitem for a collection subitem dictionary.
D text string, date, or number (Optional) The data corresponding to the related entry in the collection field dictionary (see Table 157). The type of data shall match the data type identified by the corresponding collection field dictionary. Default: none.
P text string (Optional) A prefix string that shall be concatenated with the text string presented to the user. This entry is ignored when a conforming reader sorts the items in the collection. Default: none.
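The split between displayed text and sort order can be illustrated with a short sketch. It assumes subitem dictionaries are modeled as Python dicts with text-string data only; the function names and sample values are hypothetical.

```python
def display_text(subitem):
    """Text presented to the user: the prefix P followed by the data D."""
    return subitem.get("P", "") + str(subitem.get("D", ""))


def sort_key(subitem):
    """Sort on the data D only; the prefix P plays no part in ordering."""
    return subitem.get("D", "")


# Hypothetical subitems for one text field of a collection.
items = [
    {"D": "Zebra", "P": "The "},
    {"D": "Apple", "P": "An "},
    {"D": "Mango"},
]
ordered = sorted(items, key=sort_key)
shown = [display_text(s) for s in ordered]
```

Here the items are ordered by "Apple", "Mango", "Zebra" even though the strings shown to the user begin with "An", "Mango" and "The".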

7.11.7 Maintenance of File Specifications


The techniques described in this sub-clause can be used to maintain the integrity of the file specifications within a PDF file during the following types of operations:

  • Updating the relevant file specification when a referenced file is renamed
  • Determining the complete collection of files that are copied to a mirror site
  • When creating new links to external files, discovering existing file specifications that refer to the same files and sharing them
  • Finding the file specifications associated with embedded files to be packed or unpacked

NOTE 1

It is not possible, in general, to find all file specification strings in a PDF file because there is no way to determine whether a given string is a file specification string. It is possible, however, to find all file specification dictionaries, provided that they meet the following conditions:

  • They are indirect objects.
  • They contain a Type entry whose value is the name Filespec.

NOTE 2

A conforming reader can locate all of the file specification dictionaries by traversing the PDF file’s cross-reference table (see 7.5.4, "Cross-Reference Table") and finding all dictionaries with Type keys whose value is Filespec. For this reason, all file specifications should be expressed in dictionary form and meet the conditions stated above. Any file specification dictionary specifying embedded files (that is, one that contains an EF entry) should satisfy these conditions (see Table 44).
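The traversal described in NOTE 2 can be sketched as follows. The sketch assumes the cross-reference table has already been resolved into a mapping from object number to parsed object, with dictionaries as Python dicts and names as plain strings; that model and the name `find_filespec_dicts` are illustrative assumptions, not an API of any PDF library.

```python
def find_filespec_dicts(indirect_objects):
    """Return {object_number: dictionary} for every indirect object that is
    a dictionary whose Type entry is Filespec."""
    return {
        num: obj
        for num, obj in indirect_objects.items()
        if isinstance(obj, dict) and obj.get("Type") == "Filespec"
    }


# Hypothetical resolved objects, keyed by object number.
objects = {
    10: {"Type": "Filespec", "F": "report.pdf"},
    11: "report.pdf",            # bare string: cannot be recognized
    12: {"Type": "Catalog"},
    13: {"F": "notes.txt"},      # dictionary without a Type entry: missed
}
found = find_filespec_dicts(objects)
```

Only object 10 is found, which shows why file specifications that are strings or untyped dictionaries cannot be located this way.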

NOTE 3

It may not be possible to locate file specification dictionaries that are direct objects, since they are neither self-typed nor necessarily reachable by any standard path of object references.

NOTE 4

Files may be embedded in a PDF file either directly, using the EF entry in a file specification dictionary, or indirectly, using related files arrays specified in the RF entry. If a file is embedded indirectly, its name is given by the string that precedes the embedded file stream in the related files array. If it is embedded directly, its name is obtained from the value of the corresponding entry in the file specification dictionary.

EXAMPLE

The EXAMPLE in 7.11.4.2, "Related Files Arrays," for instance, shows the EF dictionary having a DOS entry identifying object number 21 as an embedded file stream. The name of the embedded DOS file, SUNSET.EPS, is given by the DOS entry in the file specification dictionary.
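The two naming rules in NOTE 4 can be sketched together. This assumes a file specification dictionary modeled as a Python dict, stream references written as strings such as "21 0 R", and a related files array modeled as a flat list alternating name and stream; the function name and the RF contents below are hypothetical.

```python
def embedded_file_names(filespec):
    """List (name, stream_ref) pairs for the files embedded through one
    file specification dictionary."""
    pairs = []
    # Direct embedding: each EF key selects the entry of the file
    # specification dictionary that holds the file name.
    for key, stream_ref in filespec.get("EF", {}).items():
        pairs.append((filespec.get(key), stream_ref))
    # Indirect embedding: each related files array alternates a name
    # string with the embedded file stream that follows it.
    for array in filespec.get("RF", {}).values():
        for i in range(0, len(array) - 1, 2):
            pairs.append((array[i], array[i + 1]))
    return pairs


spec = {
    "Type": "Filespec",
    "DOS": "SUNSET.EPS",
    "EF": {"DOS": "21 0 R"},
    # Hypothetical related files array, invented for this sketch.
    "RF": {"DOS": ["EXTRA.DAT", "22 0 R"]},
}
names = embedded_file_names(spec)
```

The directly embedded stream takes its name from the DOS entry of the dictionary, while the indirectly embedded one takes the string that precedes it in the array.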

NOTE 5

A given external file may be referenced from more than one file specification. Therefore, when embedding a file with a given name, it is recommended to check for other occurrences of the same name as the value associated with the corresponding key in other file specification dictionaries. This requires finding all embeddable file specifications and, for each matching key, checking for both of the following conditions:

  • The string value associated with the key matches the name of the file being embedded.
  • A file has not already been embedded for the file specification.

NOTE 6

If there is already a corresponding key in the EF dictionary, a file has already been embedded for that use of the file name.
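The check recommended in NOTE 5, together with the test in NOTE 6, might look like the sketch below. It assumes the file specification dictionaries have already been collected as Python dicts and that the file-name keys are F, UF, DOS, Mac and Unix (taken from Table 44); the function name and sample data are hypothetical.

```python
# File-name keys of a file specification dictionary (assumed from Table 44).
NAME_KEYS = ("F", "UF", "DOS", "Mac", "Unix")


def embedding_candidates(filespecs, file_name):
    """Return (filespec, key) pairs that name file_name but have no
    embedded file for that key yet."""
    hits = []
    for spec in filespecs:
        ef = spec.get("EF", {})
        for key in NAME_KEYS:
            # Condition 1: the string value matches the file being embedded.
            # Condition 2: EF has no corresponding key (see NOTE 6).
            if spec.get(key) == file_name and key not in ef:
                hits.append((spec, key))
    return hits


specs = [
    {"Type": "Filespec", "F": "readme.txt"},
    {"Type": "Filespec", "F": "readme.txt", "EF": {"F": "40 0 R"}},
    {"Type": "Filespec", "F": "other.txt"},
]
hits = embedding_candidates(specs, "readme.txt")
```

Only the first dictionary qualifies: the second already has an F key in its EF dictionary, and the third names a different file.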

NOTE 7

Files associated with a given file name need not be unique. The same file name, such as readme.txt, may be associated with different embedded files in distinct file specifications.