7.3 Objects
7.3.1 General
PDF includes eight basic types of objects: Boolean values, Integer and Real numbers, Strings, Names, Arrays, Dictionaries, Streams, and the null object.
Objects may be labelled so that they can be referred to by other objects. A labelled object is called an indirect object (see 7.3.10, "Indirect Objects").
Each object type, together with its method of creation and its proper referencing as an indirect object, is described in 7.3.2, "Boolean Objects" through 7.3.10, "Indirect Objects."
7.3.2 Boolean Objects
Boolean objects represent the logical values of true and false. They appear in PDF files using the keywords true and false.
7.3.3 Numeric Objects
PDF provides two types of numeric objects: integer and real. Integer objects represent mathematical integers. Real objects represent mathematical real numbers. The range and precision of numbers may be limited by the internal representations used in the computer on which the conforming reader is running; Annex C gives these limits for typical implementations.
An integer shall be written as one or more decimal digits optionally preceded by a sign. The value shall be interpreted as a signed decimal integer and shall be converted to an integer object.
Example 1
Integer objects
123 43445 +17 -98 0
A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point). The value shall be interpreted as a real number and shall be converted to a real object.
Example 2
Real objects
34.5 -3.62 +123.6 4. -.002 0.0
NOTE 1
A conforming writer shall not use the PostScript syntax for numbers with non-decimal radices (such as 16#FFFE) or in exponential format (such as 6.02E23).
NOTE 2
Throughout this standard, the term number refers to an object whose type may be either integer or real. Wherever a real number is expected, an integer may be used instead. For example, it is not necessary to write the number 1.0 in real format; the integer 1 is sufficient.
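The numeric syntax above can be sketched as a small token classifier. This is a minimal illustration, not a conforming reader: the function name `parse_number` is hypothetical, and mapping integer objects to Python `int` and real objects to Python `float` is an assumption about the internal representation.

```python
import re

# Optional sign followed by decimal digits only.
_INTEGER = re.compile(rb"[+-]?[0-9]+")
# Optional sign, digits, and a leading, trailing, or embedded period.
_REAL = re.compile(rb"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)")


def parse_number(token: bytes):
    """Return an int or a float for a PDF numeric token, else raise ValueError.

    Radix syntax (16#FFFE) and exponential format (6.02E23) are rejected
    because neither pattern accepts them.
    """
    if _INTEGER.fullmatch(token):
        return int(token.decode("ascii"))
    if _REAL.fullmatch(token):
        return float(token.decode("ascii"))
    raise ValueError(f"not a PDF number: {token!r}")
```

Because an integer may be used wherever a real is expected, a caller that needs a real can accept either return type without conversion.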
7.3.4 String Objects
7.3.4.1 General
A string object shall consist of a series of zero or more bytes. String objects are not integer objects, but are stored in a more compact format. The length of a string may be subject to implementation limits; see Annex C.
String objects shall be written in one of the following two ways:
- As a sequence of literal characters enclosed in parentheses ( ) (using LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS (29h)); see 7.3.4.2, "Literal Strings."
- As hexadecimal data enclosed in angle brackets < > (using LESS-THAN SIGN (3Ch) and GREATER-THAN SIGN (3Eh)); see 7.3.4.3, "Hexadecimal Strings."
NOTE
In many contexts, conventions exist for the interpretation of the contents of a string value. This sub-clause defines only the basic syntax for writing a string as a sequence of bytes; conventions or rules governing the contents of strings in particular contexts are described with the definition of those particular contexts.
7.9.2, "String Object Types," describes the encoding schemes used for the contents of string objects.
7.3.4.2 Literal Strings
A literal string shall be written as an arbitrary number of characters enclosed in parentheses. Any characters may appear in a string except unbalanced parentheses (LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS (29h)) and the backslash (REVERSE SOLIDUS (5Ch)), which shall be treated specially as described in this sub-clause. Balanced pairs of parentheses within a string require no special treatment.
Example 1
The following are valid literal strings:
(This is a string)
(Strings may contain newlines
and such.)
(Strings may contain balanced parentheses ( ) and
special characters (*!&}^% and so on).)
(The following is an empty string.)
()
(It has zero (0) length.)
Within a literal string, the REVERSE SOLIDUS is used as an escape character. The character immediately following the REVERSE SOLIDUS determines its precise interpretation as shown in Table 3. If the character following the REVERSE SOLIDUS is not one of those shown in Table 3, the REVERSE SOLIDUS shall be ignored.
Sequence | Meaning |
---|---|
\n | LINE FEED (0Ah) (LF) |
\r | CARRIAGE RETURN (0Dh) (CR) |
\t | HORIZONTAL TAB (09h) (HT) |
\b | BACKSPACE (08h) (BS) |
\f | FORM FEED (0Ch) (FF) |
\( | LEFT PARENTHESIS (28h) |
\) | RIGHT PARENTHESIS (29h) |
\\ | REVERSE SOLIDUS (5Ch) (Backslash) |
\ddd | Character code ddd (octal) |
A conforming writer may split a literal string across multiple lines. The REVERSE SOLIDUS (5Ch) (backslash character) at the end of a line shall be used to indicate that the string continues on the following line. A conforming reader shall disregard the REVERSE SOLIDUS and the end-of-line marker following it when reading the string; the resulting string value shall be identical to that which would be read if the string were not split.
Example 2
(These \
two strings \
are the same.)
(These two strings are the same.)
An end-of-line marker appearing within a literal string without a preceding REVERSE SOLIDUS shall be treated as a byte value of (0Ah), irrespective of whether the end-of-line marker was a CARRIAGE RETURN (0Dh), a LINE FEED (0Ah), or both.
Example 3
(This string has an end-of-line at the end of it.
)
(So does this one.\n)
The \ddd escape sequence provides a way to represent characters outside the printable ASCII character set.
Example 4
(This string contains \245two octal characters\307.)
The number ddd may consist of one, two, or three octal digits; high-order overflow shall be ignored. Three octal digits shall be used, with leading zeros as needed, if the next character of the string is also a digit.
Example 5
the literal
(\0053)
denotes a string containing two characters, \005 (Control-E) followed by the digit 3, whereas both
(\053)
and
(\53)
denote strings containing the single character \053, a plus sign (+).
Since any 8-bit value may appear in a string (with proper escaping for REVERSE SOLIDUS (backslash) and unbalanced PARENTHESES) this \ddd notation provides a way to specify characters outside the ASCII character set by using ASCII characters only. However, any 8-bit value may appear in a string, represented either as itself or with the \ddd notation described.
When a document is encrypted (see 7.6, "Encryption"), all of its strings are encrypted; the encrypted string values contain arbitrary 8-bit values. When writing encrypted strings using the literal string form, the conforming writer shall follow the rules described. That is, the REVERSE SOLIDUS character shall be used as an escape to specify unbalanced PARENTHESES or the REVERSE SOLIDUS character itself. The REVERSE SOLIDUS may, but is not required to, be used to specify other, arbitrary 8-bit values.
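The rules of this sub-clause (Table 3 escapes, balanced parentheses, line continuation, end-of-line normalisation, and octal codes) can be sketched as a single decoding loop. The function name `decode_literal_string` is hypothetical, and the sketch assumes the input starts at the opening parenthesis; it is an illustration rather than a complete tokenizer.

```python
_ESCAPES = {
    ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09, ord("b"): 0x08,
    ord("f"): 0x0C, ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C,
}
_OCTAL = b"01234567"


def decode_literal_string(src: bytes) -> bytes:
    """Decode the literal string that starts at src[0] == '(' into raw bytes."""
    if src[:1] != b"(":
        raise ValueError("a literal string starts with LEFT PARENTHESIS")
    out = bytearray()
    depth = 1  # nesting level of balanced parentheses
    i = 1
    while i < len(src):
        c = src[i]
        i += 1
        if c == 0x5C:  # REVERSE SOLIDUS introduces an escape
            if i >= len(src):
                break
            e = src[i]
            i += 1
            if e in _ESCAPES:
                out.append(_ESCAPES[e])
            elif e in _OCTAL:
                digits = bytes([e])
                while len(digits) < 3 and i < len(src) and src[i] in _OCTAL:
                    digits += src[i:i + 1]
                    i += 1
                out.append(int(digits, 8) & 0xFF)  # high-order overflow ignored
            elif e == 0x0D:  # line continuation: drop CR or CR LF
                if src[i:i + 1] == b"\n":
                    i += 1
            elif e == 0x0A:  # line continuation: drop LF
                pass
            else:
                out.append(e)  # not in Table 3: only the backslash is ignored
        elif c == 0x0D:  # unescaped CR or CR LF is read as a single 0Ah
            if src[i:i + 1] == b"\n":
                i += 1
            out.append(0x0A)
        elif c == 0x28:
            depth += 1
            out.append(c)
        elif c == 0x29:
            depth -= 1
            if depth == 0:
                return bytes(out)
            out.append(c)
        else:
            out.append(c)
    raise ValueError("unterminated literal string")
```

Tracking the nesting depth is what lets balanced parentheses appear unescaped while an unbalanced one still needs a REVERSE SOLIDUS.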
7.3.4.3 Hexadecimal Strings
Strings may also be written in hexadecimal form, which is useful for including arbitrary binary data in a PDF file. A hexadecimal string shall be written as a sequence of hexadecimal digits (0–9 and either A–F or a–f) encoded as ASCII characters and enclosed within angle brackets (using LESS-THAN SIGN (3Ch) and GREATER-THAN SIGN (3Eh)).
EXAMPLE 1
< 4E6F762073686D6F7A206B6120706F702E >
Each pair of hexadecimal digits defines one byte of the string. White-space characters (such as SPACE (20h), HORIZONTAL TAB (09h), CARRIAGE RETURN (0Dh), LINE FEED (0Ah), and FORM FEED (0Ch)) shall be ignored.
If the final digit of a hexadecimal string is missing—that is, if there is an odd number of digits—the final digit shall be assumed to be 0.
EXAMPLE 2
< 901FA3 >
is a 3-byte string consisting of the characters whose hexadecimal codes are 90, 1F, and A3, but
< 901FA >
is a 3-byte string containing the characters whose hexadecimal codes are 90, 1F, and A0.
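A short sketch of the hexadecimal form, covering ignored white space and the implied trailing 0. The helper name `decode_hex_string` is hypothetical, and the white-space set is limited to the five characters named above, which is an assumption since the text introduces them with "such as".

```python
# The white-space characters listed in this sub-clause: HT, LF, FF, CR, SPACE.
_WHITESPACE = b"\t\n\x0c\r "


def decode_hex_string(src: bytes) -> bytes:
    """Decode a hexadecimal string, including its angle brackets, into bytes."""
    if not (src.startswith(b"<") and src.endswith(b">")) or len(src) < 2:
        raise ValueError("a hexadecimal string is enclosed in < and >")
    digits = bytes(b for b in src[1:-1] if b not in _WHITESPACE)
    if len(digits) % 2:
        digits += b"0"  # a missing final digit is assumed to be 0
    # bytes.fromhex raises ValueError on any non-hexadecimal character.
    return bytes.fromhex(digits.decode("ascii"))
```

Applied to the two strings of EXAMPLE 2, the sketch yields three bytes in both cases, differing only in the last byte (A3 versus A0).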
7.3.5 Name Objects
Beginning with PDF 1.2 a name object is an atomic symbol uniquely defined by a sequence of any characters (8-bit values) except null (character code 0). Uniquely defined means that any two name objects made up of the same sequence of characters denote the same object. Atomic means that a name has no internal structure; although it is defined by a sequence of characters, those characters are not considered elements of the name.
When writing a name in a PDF file, a SOLIDUS (2Fh) (/) shall be used to introduce a name. The SOLIDUS is not part of the name but is a prefix indicating that what follows is a sequence of characters representing the name in the PDF file and shall follow these rules:
a) A NUMBER SIGN (23h) (#) in a name shall be written by using its 2-digit hexadecimal code (23), preceded by the NUMBER SIGN.
b) Any character in a name that is a regular character (other than NUMBER SIGN) shall be written as itself or by using its 2-digit hexadecimal code, preceded by the NUMBER SIGN.
c) Any character that is not a regular character shall be written using its 2-digit hexadecimal code, preceded by the NUMBER SIGN only.
NOTE
There is not a unique encoding of names into the PDF file because regular characters may be coded in either of two ways.
White space used as part of a name shall always be coded using the 2-digit hexadecimal notation and no white space may intervene between the SOLIDUS and the encoded name.
Regular characters that are outside the range EXCLAMATION MARK (21h) (!) to TILDE (7Eh) (~) should be written using the hexadecimal notation.
The token SOLIDUS (a slash followed by no regular characters) introduces a unique valid name defined by the empty sequence of characters.
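The writing rules above might be applied as in the following sketch. The function name `encode_name` is hypothetical, and the delimiter set is an assumption taken from PDF's general lexical conventions, since this sub-clause refers to "regular characters" without listing the others. The sketch always writes a regular character in the range 21h to 7Eh as itself, which is only one of the two permitted encodings.

```python
# Assumed delimiter characters; together with white space they are the
# characters that are not "regular" and so must use the #xx notation.
_DELIMITERS = b"()<>[]{}/%"


def encode_name(raw: bytes) -> bytes:
    """Write the name whose internal bytes are `raw` in PDF file syntax."""
    if b"\x00" in raw:
        raise ValueError("a name shall not contain the null character")
    out = bytearray(b"/")  # the SOLIDUS prefix is not part of the name
    for b in raw:
        if b == 0x23 or b in _DELIMITERS or not 0x21 <= b <= 0x7E:
            out += b"#%02X" % b  # NUMBER SIGN plus 2-digit hexadecimal code
        else:
            out.append(b)
    return bytes(out)
```

The empty byte sequence encodes to a lone SOLIDUS, matching the rule for the empty name.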
NOTE 2
The examples shown in Table 4 and containing # are not valid literal names in PDF 1.0 or 1.1.
Syntax for Literal name | Resulting Name |
---|---|
/Name1 | Name1 |
/ASomewhatLongerName | ASomewhatLongerName |
/A;Name_With-VariousCharacters? | A;Name_With-VariousCharacters? |
/1.2 | 1.2 |
/$$ | $$ |
/@pattern | @pattern |
/.notdef | .notdef |
/Lime#20Green | Lime Green |
/paired#28#29parentheses | paired()parentheses |
/The_Key_of_F#23_Minor | The_Key_of_F#_Minor |
/A#42 | AB |
In PDF, literal names shall always be introduced by the SOLIDUS character (/), unlike keywords such as true, false, and obj.
NOTE 3
This standard follows a typographic convention of writing names without the leading SOLIDUS when they appear in running text and tables. For example, Type and FullScreen denote names that would actually be written in a PDF file (and in code examples in this standard) as /Type and /FullScreen.
The length of a name shall be subject to an implementation limit; see Annex C. The limit applies to the number of characters in the name’s internal representation. For example, the name /A#20B has three characters (A, SPACE, B), not six.
As stated above, name objects shall be treated as atomic within a PDF file. Ordinarily, the bytes making up the name are never treated as text to be presented to a human user or to an application external to a conforming reader. However, occasionally the need arises to treat a name object as text, such as one that represents a font name (see the BaseFont entry in Table 111), a colorant name in a separation or DeviceN colour space, or a structure type (see 14.7.3, "Structure Types").
In such situations, the sequence of bytes (after expansion of NUMBER SIGN sequences, if any) should be interpreted according to UTF-8, a variable-length byte-encoded representation of Unicode in which the printable ASCII characters have the same representations as in ASCII. This enables a name object to represent text in virtually any natural language, subject to the implementation limit on the length of a name.
NOTE 4
PDF does not prescribe what UTF-8 sequence to choose for representing any given piece of externally specified text as a name object. In some cases, multiple UTF-8 sequences may represent the same logical text. Name objects defined by different sequences of bytes constitute distinct name objects in PDF, even though the UTF-8 sequences may have identical external interpretations.
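Reading a name back is the reverse operation: expand each NUMBER SIGN sequence, compare names by the resulting bytes, and decode as UTF-8 only when text is needed. The helper names `decode_name` and `name_as_text` below are hypothetical, and the sketch assumes the token has already been isolated by a tokenizer.

```python
import re

# NUMBER SIGN followed by exactly two hexadecimal digits.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(token: bytes) -> bytes:
    """Return the internal byte sequence of a name token such as b'/A#20B'."""
    if not token.startswith(b"/"):
        raise ValueError("a name is introduced by SOLIDUS")
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), token[1:])


def name_as_text(token: bytes) -> str:
    """Interpret a name as text, as suggested for font or colorant names."""
    return decode_name(token).decode("utf-8")
```

Because identity is defined on the expanded bytes, `/A#42` and `/AB` denote the same name, while two different UTF-8 sequences for visually identical text remain distinct names, as NOTE 4 points out.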
7.3.6 Array Objects
An array object is a one-dimensional collection of objects arranged sequentially. Unlike arrays in many other computer languages, PDF arrays may be heterogeneous; that is, an array’s elements may be any combination of numbers, strings, dictionaries, or any other objects, including other arrays. An array may have zero elements.
An array shall be written as a sequence of objects enclosed in SQUARE BRACKETS (using LEFT SQUARE BRACKET (5Bh) and RIGHT SQUARE BRACKET (5Dh)).
Example
[549 3.14 false (Ralph) /SomeName]
PDF directly supports only one-dimensional arrays. Arrays of higher dimension can be constructed by using arrays as elements of arrays, nested to any depth.
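As an illustration of heterogeneous and nested arrays, the sketch below serialises Python values into PDF syntax. Everything here is a modelling assumption: the `Name` marker class and `write_object` function are hypothetical, Python lists stand in for arrays, `str` values stand in for ASCII-only strings, and names are assumed to contain only regular characters that need no escaping.

```python
class Name(str):
    """Hypothetical marker type that distinguishes PDF names from strings."""


def write_object(value) -> str:
    """Serialise a Python value as a PDF direct object (sketch only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, Name):  # must precede str: Name is a str subclass
        return "/" + value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # Fixed-point notation avoids the forbidden exponential format.
        return format(value, "f").rstrip("0")
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(value, list):
        return "[" + " ".join(write_object(item) for item in value) + "]"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

A two-dimensional structure is simply a list of lists, for instance `write_object([[1, 0], [0, 1]])`.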
7.3.7 Dictionary Objects
A dictionary object is an associative table containing pairs of objects, known as the dictionary’s entries. The first element of each entry is the key and the second element is the value. The key shall be a name (unlike dictionary keys in PostScript, which may be objects of any type). The value may be any kind of object, including another dictionary. A dictionary entry whose value is null (see 7.3.9, "Null Object") shall be treated the same as if the entry does not exist. (This differs from PostScript, where null behaves like any other object as the value of a dictionary entry.) The number of entries in a dictionary shall be subject to an implementation limit; see Annex C. A dictionary may have zero entries.
The entries in a dictionary represent an associative table and as such shall be unordered even though an arbitrary order may be imposed upon them when written in a file. That ordering shall be ignored.
Multiple entries in the same dictionary shall not have the same key.
A dictionary shall be written as a sequence of key-value pairs enclosed in double angle brackets (<< … >>) (using LESS-THAN SIGNs (3Ch) and GREATER-THAN SIGNs (3Eh)).
EXAMPLE
<< /Type /Example
   /Subtype /DictionaryExample
   /Version 0.01
   /IntegerItem 12
   /StringItem (a string)
   /Subdictionary << /Item1 0.4
                     /Item2 true
                     /LastItem (not!)
                     /VeryLastItem (OK)
                  >>
>>
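The dictionary semantics stated above (unordered entries, no duplicate keys, null values equivalent to absent entries) can be sketched as follows. The function name `read_entries` is hypothetical, the null object is modelled as Python `None`, and rejecting duplicate keys with an exception is one possible reader policy rather than something this sub-clause mandates.

```python
def read_entries(pairs):
    """Build a dictionary from (key, value) pairs in file order.

    Keys are name strings without the leading SOLIDUS; None models null.
    """
    seen = set()
    result = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in dictionary: {key}")
        seen.add(key)
        if value is not None:  # a null value is treated as a missing entry
            result[key] = value
    return result
```

Python dictionaries compare equal regardless of insertion order, which matches the requirement that the written order be ignored.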
NOTE
Do not confuse the double angle brackets with single angle brackets (< and >) (using LESS-THAN SIGN (3Ch) and GREATER-THAN SIGN (3Eh)), which delimit a hexadecimal string (see 7.3.4.3, "Hexadecimal Strings").
Dictionary objects are the main building blocks of a PDF document. They are commonly used to collect and tie together the attributes of a complex object, such as a font or a page of the document, with each entry in the dictionary specifying the name and value of an attribute. By convention, the Type entry of such a dictionary, if present, identifies the type of object the dictionary describes. In some cases, a Subtype entry (sometimes abbreviated S) may be used to further identify a specialized subcategory of the general type. The value of the Type or Subtype entry shall always be a name. For example, in a font dictionary, the value of the Type entry shall always be Font, whereas that of the Subtype entry may be Type1, TrueType, or one of several other values.
The value of the Type entry can almost always be inferred from context. The value of an entry in a page's font resource dictionary, for example, shall be a font object; therefore, the Type entry in a font dictionary serves primarily as documentation and as information for error checking. The Type entry shall not be required unless so stated in its description; however, if the entry is present, it shall have the correct value. In addition, the value of the Type entry in any dictionary, even in private data, shall be either a name defined in this standard or a registered name; see Annex E for details.
7.3.8 Stream Objects
7.3.8.1 General
A stream object, like a string object, is a sequence of bytes. Furthermore, a stream may be of unlimited length, whereas a string shall be subject to an implementation limit. For this reason, objects with potentially large amounts of data, such as images and page descriptions, shall be represented as streams.
NOTE 1
This sub-clause describes only the syntax for writing a stream as a sequence of bytes. The context in which a stream is referenced determines what the sequence of bytes represents.
A stream shall consist of a dictionary followed by zero or more bytes bracketed between the keywords stream (followed by newline) and endstream:
EXAMPLE
dictionary
stream
… Zero or more bytes …
endstream
All streams shall be indirect objects (see 7.3.10, "Indirect Objects") and the stream dictionary shall be a direct object. The keyword stream that follows the stream dictionary shall be followed by an end-of-line marker consisting of either a CARRIAGE RETURN and a LINE FEED or just a LINE FEED, and not by a CARRIAGE RETURN alone. The sequence of bytes that makes up a stream lies between the end-of-line marker following the stream keyword and the endstream keyword; the stream dictionary specifies the exact number of bytes. There should be an end-of-line marker after the data and before endstream; this marker shall not be included in the stream length. There shall not be any extra bytes, other than white space, between endstream and endobj.
Alternatively, beginning with PDF 1.2, the bytes may be contained in an external file, in which case the stream dictionary specifies the file, and any bytes between stream and endstream shall be ignored by a conforming reader.
NOTE 2
Without the restriction against following the keyword stream by a CARRIAGE RETURN alone, it would be impossible to differentiate a stream that uses CARRIAGE RETURN as its end-of-line marker and has a LINE FEED as its first byte of data from one that uses a CARRIAGE RETURN–LINE FEED sequence to denote end-of-line.
Table 5 lists the entries common to all stream dictionaries; certain types of streams may have additional dictionary entries, as indicated where those streams are described. The optional entries regarding filters for the stream indicate whether and how the data in the stream shall be transformed (decoded) before it is used. Filters are described further in 7.4, "Filters."
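The framing rules for the stream and endstream keywords can be sketched as below. The function name `read_stream_data` is hypothetical; the sketch assumes the caller has already parsed the stream dictionary, knows the byte offset of the stream keyword, and has resolved Length to an integer.

```python
def read_stream_data(buf: bytes, keyword_pos: int, length: int) -> bytes:
    """Return exactly `length` data bytes of the stream at `keyword_pos`."""
    if buf[keyword_pos:keyword_pos + 6] != b"stream":
        raise ValueError("expected the keyword stream")
    pos = keyword_pos + 6
    # Only CR LF or LF may follow the keyword; a lone CR is not allowed.
    if buf[pos:pos + 2] == b"\r\n":
        pos += 2
    elif buf[pos:pos + 1] == b"\n":
        pos += 1
    else:
        raise ValueError("stream shall be followed by CR LF or LF")
    data = buf[pos:pos + length]
    if len(data) != length:
        raise ValueError("file ends before Length bytes were read")
    rest = buf[pos + length:]
    # One end-of-line marker may precede endstream; it is not in Length.
    for eol in (b"\r\n", b"\n", b"\r"):
        if rest.startswith(eol):
            rest = rest[len(eol):]
            break
    if not rest.startswith(b"endstream"):
        raise ValueError("Length does not lead to the keyword endstream")
    return data
```

Rejecting a lone CARRIAGE RETURN is what keeps a leading LINE FEED in the data unambiguous, which is the situation NOTE 2 describes.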
7.3.8.2 Stream Extent
Every stream dictionary shall have a Length entry that indicates how many bytes of the PDF file are used for the stream’s data. (If the stream has a filter, Length shall be the number of bytes of encoded data.) In addition, most filters are defined so that the data shall be self-limiting; that is, they use an encoding scheme in which an explicit end-of-data (EOD) marker delimits the extent of the data. Finally, streams are used to represent many objects from whose attributes a length can be inferred. All of these constraints shall be consistent.
EXAMPLE
An image with 10 rows and 20 columns, using a single colour component and 8 bits per component, requires exactly 200 bytes of image data. If the stream uses a filter, there shall be enough bytes of encoded data in the PDF file to produce those 200 bytes. An error occurs if Length is too small, if an explicit EOD marker occurs too soon, or if the decoded data does not contain 200 bytes.
It is also an error if the stream contains too much data, with the exception that there may be an extra end-of-line marker in the PDF file before the keyword endstream.
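The consistency check in the example above can be sketched for one concrete case. The assumptions are stated loudly: the filter is taken to be a zlib/deflate-based one (Python's `zlib` module stands in for it), each image row is assumed to be padded to a whole number of bytes, and both function names are hypothetical.

```python
import zlib


def expected_image_bytes(rows: int, columns: int, components: int, bits: int) -> int:
    """Bytes of sample data, assuming each row is padded to a whole byte."""
    bits_per_row = columns * components * bits
    return rows * ((bits_per_row + 7) // 8)


def decode_and_check(encoded: bytes, expected: int) -> bytes:
    """Inflate `encoded` and verify that it yields exactly `expected` bytes."""
    try:
        decoded = zlib.decompress(encoded)
    except zlib.error as exc:  # truncated or corrupt data, e.g. Length too small
        raise ValueError(f"encoded data is incomplete: {exc}") from exc
    if len(decoded) != expected:
        raise ValueError(f"decoded {len(decoded)} bytes, expected {expected}")
    return decoded
```

For the image in the example, `expected_image_bytes(10, 20, 1, 8)` evaluates to 200, and a stream whose encoded data was cut short by a too-small Length fails in the first branch.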
Key | Type | Value |
---|---|---|
Length | integer | (Required) The number of bytes from the beginning of the line following the keyword stream to the last byte just before the keyword endstream. (There may be an additional EOL marker, preceding endstream, that is not included in the count and is not logically part of the stream data.) See 7.3.8.2, "Stream Extent", for further discussion. |
Filter | name or array | (Optional) The name of a filter that shall be applied in processing the stream data found between the keywords stream and endstream, or an array of zero, one or several names. Multiple filters shall be specified in the order in which they are to be applied. |
DecodeParms | dictionary or array | (Optional) A parameter dictionary or an array of such dictionaries, used by the filters specified by Filter. If there is only one filter and that filter has parameters, DecodeParms shall be set to the filter’s parameter dictionary unless all the filter’s parameters have their default values, in which case the DecodeParms entry may be omitted. If there are multiple filters and any of the filters has parameters set to nondefault values, DecodeParms shall be an array with one entry for each filter: either the parameter dictionary for that filter, or the null object if that filter has no parameters (or if all of its parameters have their default values). If none of the filters have parameters, or if all their parameters have default values, the DecodeParms entry may be omitted. |
F | file specification | (Optional; PDF 1.2) The file containing the stream data. If this entry is present, the bytes between stream and endstream shall be ignored. However, the Length entry should still specify the number of those bytes (usually, there are no bytes and Length is 0). The filters that are applied to the file data shall be specified by FFilter and the filter parameters shall be specified by FDecodeParms. |
FFilter | name or array | (Optional; PDF 1.2) The name of a filter to be applied in processing the data found in the stream’s external file, or an array of zero, one or several such names. The same rules apply as for Filter. |
FDecodeParms | dictionary or array | (Optional; PDF 1.2) A parameter dictionary, or an array of such dictionaries, used by the filters specified by FFilter. The same rules apply as for DecodeParms. |
DL | integer | (Optional; PDF 1.5) A non-negative integer representing the number of bytes in the decoded (defiltered) stream. It can be used to determine, for example, whether enough disk space is available to write a stream to a file. This value shall be considered a hint only; for some stream filters, it may not be possible to determine this value precisely. |
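The Filter and DecodeParms rules in the table reduce to pairing each filter with a parameter dictionary. The sketch below assumes the stream dictionary has already been parsed into a Python `dict` keyed by name, with arrays as lists and the null object as `None`; the function name `filter_pipeline` is hypothetical, and the filter names used in the usage note come from 7.4, not from this sub-clause.

```python
def filter_pipeline(stream_dict: dict) -> list:
    """Return [(filter_name, params_dict), ...] in the order filters apply."""
    filters = stream_dict.get("Filter")
    if filters is None:
        filters = []
    elif not isinstance(filters, list):
        filters = [filters]  # a single name is shorthand for a one-item array

    parms = stream_dict.get("DecodeParms")
    if parms is None:
        parms = [None] * len(filters)  # omitted: every filter uses defaults
    elif not isinstance(parms, list):
        parms = [parms]  # a single dictionary for a single filter
    if len(parms) != len(filters):
        raise ValueError("DecodeParms needs one entry per filter")

    # A null entry means the filter has no parameters or uses its defaults.
    return [(name, p if p is not None else {}) for name, p in zip(filters, parms)]
```

For example, `{"Filter": ["ASCII85Decode", "LZWDecode"], "DecodeParms": [None, {"EarlyChange": 0}]}` pairs the first filter with an empty parameter set and the second with its dictionary. The same function could serve FFilter and FDecodeParms by reading those keys instead.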
7.3.9 Null Object
The null object has a type and value that are unequal to those of any other object. There shall be only one object of type null, denoted by the keyword null. An indirect object reference (see 7.3.10, "Indirect Objects") to a nonexistent object shall be treated the same as a null object. Specifying the null object as the value of a dictionary entry (7.3.7, "Dictionary Objects") shall be equivalent to omitting the entry entirely.
7.3.10 Indirect Objects
Any object in a PDF file may be labelled as an indirect object. This gives the object a unique object identifier by which other objects can refer to it (for example, as an element of an array or as the value of a dictionary entry). The object identifier shall consist of two parts:
- A positive integer object number. Indirect objects may be numbered sequentially within a PDF file, but this is not required; object numbers may be assigned in any arbitrary order.
- A non-negative integer generation number. In a newly created file, all indirect objects shall have generation numbers of 0. Nonzero generation numbers may be introduced when the file is later updated; see sub-clauses 7.5.4, "Cross-Reference Table" and 7.5.6, "Incremental Updates."
Together, the combination of an object number and a generation number shall uniquely identify an indirect object.
The definition of an indirect object in a PDF file shall consist of its object number and generation number (separated by white space), followed by the value of the object bracketed between the keywords obj and endobj.
Example 1
Indirect object definition
12 0 obj
(Brillig)
endobj
Defines an indirect string object with an object number of 12, a generation number of 0, and the value Brillig.
The object may be referred to from elsewhere in the file by an indirect reference. Such indirect references shall consist of the object number, the generation number, and the keyword R (with white space separating each part):
12 0 R
Beginning with PDF 1.5, indirect objects may reside in object streams (see 7.5.7, "Object Streams"). They are referred to in the same way; however, their definition shall not include the keywords obj and endobj, and their generation number shall be zero.
An indirect reference to an undefined object shall not be considered an error by a conforming reader; it shall be treated as a reference to the null object.
Example 2
If a file contains the indirect reference 17 0 R but does not contain the corresponding definition then the indirect reference is considered to refer to the null object.
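A minimal sketch of reference resolution, showing that the pair (object number, generation number) is the lookup key and that an undefined object resolves to null rather than raising an error. The class name `ObjectTable` and its methods are hypothetical, and the null object is modelled as Python `None`.

```python
import re

# Object number, generation number, and the keyword R, separated by white space.
_REFERENCE = re.compile(rb"([0-9]+)\s+([0-9]+)\s+R")


class ObjectTable:
    """Maps (object number, generation number) to object values."""

    def __init__(self):
        self._objects = {}

    def define(self, number: int, generation: int, value) -> None:
        if number <= 0 or generation < 0:
            raise ValueError("object number shall be positive, generation non-negative")
        self._objects[(number, generation)] = value

    def resolve(self, reference: bytes):
        """Return the referenced value, or None (null) if it is undefined."""
        match = _REFERENCE.fullmatch(reference.strip())
        if match is None:
            raise ValueError(f"not an indirect reference: {reference!r}")
        key = (int(match.group(1)), int(match.group(2)))
        return self._objects.get(key)  # missing definition is not an error
```

With object 12 0 defined as in Example 1, resolving `12 0 R` returns its value, while `17 0 R` and `12 1 R` both resolve to null.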
Except where documented to the contrary, any object value may be a direct or an indirect reference; the semantics are equivalent.
Example 3
The following shows the use of an indirect object to specify the length of a stream. The value of the stream’s Length entry is an integer object that follows the stream in the file. This allows applications that generate PDF in a single pass to defer specifying the stream’s length until after its contents have been generated.
7 0 obj
<< /Length 8 0 R >> % An indirect reference to object 8
stream
BT
/F1 12 Tf
72 712 Td
(A stream with an indirect length) Tj
ET
endstream
endobj
8 0 obj
77 % The length of the preceding stream
endobj
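The single-pass technique of Example 3 can be sketched as a writer that emits the stream first and the length object afterwards. The function name `write_stream_with_deferred_length` and its parameters are hypothetical; the sketch uses generation number 0 throughout and does not record the byte offsets that a cross-reference table would need.

```python
import io


def write_stream_with_deferred_length(out: io.BufferedIOBase, obj_num: int,
                                      len_num: int, chunks) -> int:
    """Write a stream object whose Length is an indirect reference.

    `chunks` is any iterable of bytes, consumed once; the total is only
    known after the last chunk, so it is written as object `len_num`.
    """
    out.write(b"%d 0 obj\n<< /Length %d 0 R >>\nstream\n" % (obj_num, len_num))
    length = 0
    for chunk in chunks:
        out.write(chunk)
        length += len(chunk)
    # The end-of-line marker before endstream is not counted in Length.
    out.write(b"\nendstream\nendobj\n")
    out.write(b"%d 0 obj\n%d\nendobj\n" % (len_num, length))
    return length
```

A call such as `write_stream_with_deferred_length(io.BytesIO(), 7, 8, chunks)` mirrors the numbering of Example 3; a reader then has to resolve the reference `8 0 R` before it can locate endstream.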