14.7 Logical Structure
14.7.1 General
PDF’s logical structure facilities (PDF 1.3) shall provide a mechanism for incorporating structural information about a document’s content into a PDF file. Such information may include the organization of the document into chapters and sections or the identification of special elements such as figures, tables, and footnotes. The logical structure facilities shall be extensible, allowing conforming writers to choose what structural information to include and how to represent it, while enabling conforming readers to navigate a file without knowing the producer’s structural conventions.
PDF logical structure shares basic features with standard document markup languages such as HTML, SGML, and XML. A document’s logical structure shall be expressed as a hierarchy of structure elements, each represented by a dictionary object. Like their counterparts in other markup languages, PDF structure elements may have content and attributes. In PDF, rendered document content takes over the role occupied by text in HTML, SGML, and XML.
A PDF document’s logical structure shall be stored separately from its visible content, with pointers from each to the other. This separation allows the ordering and nesting of logical elements to be entirely independent of the order and location of graphics objects on the document’s pages.
The MarkInfo entry in the document catalogue (see 7.7.2, “Document Catalog”) shall specify a mark information dictionary, whose entries are shown in Table 321. This dictionary provides additional information relevant to specialized uses of structured PDF documents.
| Key | Type | Value | 
|---|---|---|
| Marked | boolean | (Optional) A flag indicating whether the document conforms to Tagged PDF conventions (see 14.8, “Tagged PDF”). Default value: false. If Suspects is true, the document may not completely conform to Tagged PDF conventions. | 
| UserProperties | boolean | (Optional; PDF 1.6) A flag indicating the presence of structure elements that contain user properties attributes (see 14.7.5.4, “User Properties”). Default value: false. | 
| Suspects | boolean | (Optional; PDF 1.6) A flag indicating the presence of tag suspects (see 14.8.2.3, “Page Content Order”). Default value: false. | 
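The defaults in Table 321 can be sketched in a few lines of Python. This is only an illustration: the mark information dictionary is modelled as a plain `dict` that has already been read from the file, and the function name `mark_info_flags` is hypothetical rather than part of any PDF library.

```python
def mark_info_flags(mark_info=None):
    """Return the three Table 321 flags, applying the default value false.

    `mark_info` is the mark information dictionary modelled as a Python dict
    (an assumption of this sketch), or None when the document has none.
    """
    mark_info = mark_info or {}
    return {
        key: bool(mark_info.get(key, False))
        for key in ("Marked", "UserProperties", "Suspects")
    }


# A document that only sets Marked; the other two flags fall back to false.
flags = mark_info_flags({"Marked": True})
```

A reader that sees `Marked` true together with `Suspects` true should treat the Tagged PDF conformance claim with caution, as the table notes.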
14.7.2 Structure Hierarchy
The logical structure of a document shall be described by a hierarchy of objects called the structure hierarchy or structure tree. At the root of the hierarchy shall be a dictionary object called the structure tree root, located by means of the StructTreeRoot entry in the document catalogue (see 7.7.2, “Document Catalog”). Table 322 shows the entries in the structure tree root dictionary. The K entry shall specify the immediate children of the structure tree root, which shall be structure elements.
Structure elements shall be represented by a dictionary, whose entries are shown in Table 323. The K entry shall specify the children of the structure element, which may be zero or more items of the following kinds:
- Other structure elements
- References to content items, which are either marked-content sequences (see 14.6, “Marked Content”) or complete PDF objects such as XObjects and annotations. These content items represent the graphical content, if any, associated with a structure element. Content items are discussed in detail in 14.7.4, “Structure Content.”
| Key | Type | Value | 
|---|---|---|
| Type | name | (Required) The type of PDF object that this dictionary describes; shall be StructTreeRoot for a structure tree root. | 
| K | dictionary or array | (Optional) The immediate child or children of the structure tree root in the structure hierarchy. The value may be either a dictionary representing a single structure element or an array of such dictionaries. | 
| IDTree | name tree | (Required if any structure elements have element identifiers) A name tree that maps element identifiers (see Table 323) to the structure elements they denote. | 
| ParentTree | number tree | (Required if any structure element contains content items) A number tree (see 7.9.7, “Number Trees”) used in finding the structure elements to which content items belong. Each integer key in the number tree shall correspond to a single page of the document or to an individual object (such as an annotation or an XObject) that is a content item in its own right. The integer key shall be the value of the StructParent or StructParents entry in that object (see 14.7.4.4, “Finding Structure Elements from Content Items”). The form of the associated value shall depend on the nature of the object: For an object that is a content item in its own right, the value shall be an indirect reference to the object’s parent element (the structure element that contains it as a content item). For a page object or content stream containing marked-content sequences that are content items, the value shall be an array of references to the parent elements of those marked-content sequences. See 14.7.4.4, “Finding Structure Elements from Content Items” for further discussion. | 
| ParentTreeNextKey | integer | (Optional) An integer greater than any key in the parent tree, which shall be used as the key for the next entry added to the tree. | 
| RoleMap | dictionary | (Optional) A dictionary that shall map the names of structure types used in the document to their approximate equivalents in the set of standard structure types (see 14.8.4, “Standard Structure Types”). | 
| ClassMap | dictionary | (Optional) A dictionary that shall map name objects designating attribute classes to the corresponding attribute objects or arrays of attribute objects (see 14.7.5.2, “Attribute Classes”). | 
| Key | Type | Value | 
|---|---|---|
| Type | name | (Optional) The type of PDF object that this dictionary describes; if present, shall be StructElem for a structure element. | 
| S | name | (Required) The structure type, a name object identifying the nature of the structure element and its role within the document, such as a chapter, paragraph, or footnote (see 14.7.3, “Structure Types”). Names of structure types shall conform to the guidelines described in Annex E. | 
| P | dictionary | (Required; shall be an indirect reference) The structure element that is the immediate parent of this one in the structure hierarchy. | 
| ID | byte string | (Optional) The element identifier, a byte string designating this structure element. The string shall be unique among all elements in the document’s structure hierarchy. The IDTree entry in the structure tree root (see Table 322) defines the correspondence between element identifiers and the structure elements they denote. | 
| Pg | dictionary | (Optional; shall be an indirect reference) A page object representing a page on which some or all of the content items designated by the K entry shall be rendered. | 
| K | (various) | (Optional) The children of this structure element. The value of this entry may be one of the following objects or an array consisting of one or more of the following objects: • A structure element dictionary denoting another structure element • An integer marked-content identifier denoting a marked-content sequence • A marked-content reference dictionary denoting a marked-content sequence • An object reference dictionary denoting a PDF object Each of these objects other than the first (structure element dictionary) shall be considered to be a content item; see 14.7.4, “Structure Content” for further discussion of each of these forms of representation. If the value of K is a dictionary containing no Type entry, it shall be assumed to be a structure element dictionary. | 
| A | (various) | (Optional) A single attribute object or array of attribute objects associated with this structure element. Each attribute object shall be either a dictionary or a stream. If the value of this entry is an array, each attribute object in the array may be followed by an integer representing its revision number (see 14.7.5, “Structure Attributes,” and 14.7.5.3, “Attribute Revision Numbers”). | 
| C | name or array | (Optional) An attribute class name or array of class names associated with this structure element. If the value of this entry is an array, each class name in the array may be followed by an integer representing its revision number (see 14.7.5.2, “Attribute Classes,” and 14.7.5.3, “Attribute Revision Numbers”). If both the A and C entries are present and a given attribute is specified by both, the one specified by the A entry shall take precedence. | 
| R | integer | (Optional) The current revision number of this structure element (see 14.7.5.3, “Attribute Revision Numbers”). The value shall be a non-negative integer. Default value: 0. | 
| T | text string | (Optional) The title of the structure element, a text string representing it in human-readable form. The title should characterize the specific structure element, such as Chapter 1, rather than merely a generic element type, such as Chapter. | 
| Lang | text string | (Optional; PDF 1.4) A language identifier specifying the natural language for all text in the structure element except where overridden by language specifications for nested structure elements or marked content (see 14.9.2, “Natural Language Specification”). If this entry is absent, the language (if any) specified in the document catalogue applies. | 
| Alt | text string | (Optional) An alternate description of the structure element and its children in human-readable form, which is useful when extracting the document’s contents in support of accessibility to users with disabilities or for other purposes (see 14.9.3, “Alternate Descriptions”). | 
| E | text string | (Optional; PDF 1.5) The expanded form of an abbreviation. | 
| ActualText | text string | (Optional; PDF 1.4) Text that is an exact replacement for the structure element and its children. This replacement text (which should apply to as small a piece of content as possible) is useful when extracting the document’s contents in support of accessibility to users with disabilities or for other purposes (see 14.9.4, “Replacement Text”). | 
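The rules for the K entry can be illustrated with a small traversal. The sketch below is an assumption-laden model, not a parser: structure elements are plain Python dicts whose indirect references have already been resolved, and the helper names `children`, `classify_child`, and `content_items` are hypothetical.

```python
def children(k_value):
    """Normalize a K value (absent, a single object, or an array) to a list."""
    if k_value is None:
        return []
    return list(k_value) if isinstance(k_value, list) else [k_value]


def classify_child(child):
    """Classify one child of a K entry following Table 323."""
    if isinstance(child, int):
        return "mcid"                       # integer marked-content identifier
    if isinstance(child, dict):
        # A dictionary with no Type entry is assumed to be a structure element.
        kind = child.get("Type", "StructElem")
        if kind == "MCR":
            return "mcr"                    # marked-content reference dictionary
        if kind == "OBJR":
            return "objr"                   # object reference dictionary
        if kind == "StructElem":
            return "struct_elem"
    raise ValueError(f"unexpected K child: {child!r}")


def content_items(elem):
    """Depth-first walk yielding (owning structure element, content item) pairs."""
    for child in children(elem.get("K")):
        if classify_child(child) == "struct_elem":
            yield from content_items(child)
        else:
            yield elem, child


# A section holding one paragraph, which in turn has two content items.
para = {"S": "P", "K": [0, {"Type": "MCR", "MCID": 1}]}
sect = {"S": "Sect", "K": para}
pairs = list(content_items(sect))
```

Everything other than a structure element dictionary is reported as a content item, which matches the statement that content items are the leaves of the tree.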
14.7.3 Structure Types
Every structure element shall have a structure type, a name object that identifies the nature of the structure element and its role within the document (such as a chapter, paragraph, or footnote). To facilitate the interchange of content among conforming products, PDF defines a set of standard structure types; see 14.8.4, “Standard Structure Types.” Conforming products are not required to adopt them, however, and may use any names for their structure types.
Where names other than the standard ones are used, a role map may be provided in the structure tree root, mapping the structure types used in the document to their nearest equivalents in the standard set.
NOTE 1
A structure type named Section used in the document might be mapped to the standard type Sect. The equivalence need not be exact; the role map merely indicates an approximate analogy between types, allowing conforming products to share nonstandard structure elements in a reasonable way.
NOTE 2
The same structure type may occur as both a key and a value in the role map, and circular chains of association are explicitly permitted. Therefore, a single role map may define a bidirectional mapping. A conforming reader using the role map should follow the chain of associations until it either finds a structure type it recognizes or returns to one it has already encountered.
NOTE 3
In PDF versions earlier than 1.5, standard element types were never remapped. Beginning with PDF 1.5, an element name shall always be mapped to its corresponding name in the role map, if there is one, even if the original name is one of the standard types. This shall be done to allow the element, for example, to represent a tag with the same name as a standard role, even though its use differs from the standard role.
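One possible reading of NOTE 2 and NOTE 3 is sketched below in Python. The role map is modelled as a plain `dict` of names, `recognized` is whatever set of structure types the reader understands, and the function name `resolve_role` is hypothetical. The choice to return `None` when the chain loops without reaching a recognized type is an assumption of this sketch; the text does not prescribe a result for that case.

```python
def resolve_role(struct_type, role_map, recognized):
    """Follow role-map associations until a recognized type or a cycle is hit.

    An existing mapping is always applied first, even when the starting name
    is itself recognized (the PDF 1.5 behaviour described in NOTE 3).
    """
    seen = {struct_type}
    current = struct_type
    while current in role_map:
        current = role_map[current]
        if current in recognized:
            return current
        if current in seen:
            return None          # circular chain, nothing recognized along it
        seen.add(current)
    return current if current in recognized else None


# Illustrative subset of the standard structure types, not the full set.
STANDARD = {"Sect", "P", "Figure"}

mapped = resolve_role("Section", {"Section": "Sect"}, STANDARD)
remapped = resolve_role("P", {"P": "Figure"}, STANDARD)
looped = resolve_role("A", {"A": "B", "B": "A"}, STANDARD)
```

Tracking the names already visited is what makes the explicitly permitted circular chains safe to follow.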
14.7.4 Structure Content
14.7.4.1 General
Any structure element may have associated graphical content, consisting of one or more content items. Content items shall be graphical objects that exist in the document independently of the structure tree but are associated with structure elements as described in the following sub-clauses. Content items are of two kinds:
- Marked-content sequences within content streams (see 14.7.4.2, “Marked-Content Sequences as Content Items”)
- Complete PDF objects such as annotations and XObjects (see 14.7.4.3, “PDF Objects as Content Items”)
The K entry in a structure element dictionary (see Table 323) shall specify the children of the structure element, which may include any number of content items, as well as child structure elements that may in turn have content items of their own.
Content items shall be leaf nodes of the structure tree; that is, they may not have other content items nested within them for purposes of logical structure. The hierarchical relationship among structure elements shall be represented entirely by the K entries of the structure element dictionaries, not by nesting of the associated content items. Therefore, the following restrictions shall apply:
- A marked-content sequence delimiting a structure content item may not have another marked-content sequence for a content item nested within it, though non-structural marked content shall be allowed.
- A structure content item shall not invoke (with the Do operator) an XObject that is itself a structure content item.
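The first restriction lends itself to a simple check. The Python sketch below assumes a content stream that has already been tokenized into tuples such as `("BDC", tag, properties)`, `("BMC", tag)`, and `("EMC",)`; that representation and the function name `nested_content_items` are hypothetical. Property lists given by name through the resource dictionary are not resolved here and are treated as non-structural.

```python
def nested_content_items(ops):
    """Return the MCIDs of content items nested inside another content item."""
    stack = []       # one entry per open marked-content sequence: its MCID or None
    offenders = []
    for op in ops:
        name = op[0]
        if name in ("BMC", "BDC"):
            props = op[2] if name == "BDC" and len(op) > 2 else None
            mcid = props.get("MCID") if isinstance(props, dict) else None
            # A sequence carrying an MCID must not open inside another one.
            if mcid is not None and any(m is not None for m in stack):
                offenders.append(mcid)
            stack.append(mcid)
        elif name == "EMC" and stack:
            stack.pop()
    return offenders


# Non-structural marked content inside a content item is allowed.
allowed = [("BDC", "P", {"MCID": 0}), ("BMC", "Span"), ("Tj", "text"),
           ("EMC",), ("EMC",)]
# A second content item nested inside the first is not.
rejected = [("BDC", "P", {"MCID": 0}), ("BDC", "Span", {"MCID": 1}),
            ("EMC",), ("EMC",)]
```

Checking the second restriction (a content item invoking an XObject that is itself a content item) would additionally require resolving the names used with `Do`, which this sketch leaves out.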
14.7.4.2 Marked-Content Sequences as Content Items
A sequence of graphics operators in a content stream may be specified as a content item of a structure element in the following way:
- The operators shall be bracketed as a marked-content sequence between BDC and EMC operators (see 14.6, “Marked Content”). Although the tag associated with a marked-content sequence is not directly related to the document’s logical structure, it should be the same as the structure type of the associated structure element.
- The marked-content sequence shall have a property list (see 14.6.2, “Property Lists”) containing an MCID entry, which shall be an integer marked-content identifier that uniquely identifies the marked-content sequence within its content stream, as shown in the following example:
EXAMPLE 1
2 0 obj                    % Page object
    << /Type /Page
        /Contents 3 0 R    % Content stream
        …
    >>
endobj
3 0 obj
    << /Length … >>            % Page's content stream
stream
    …
    /P << /MCID 0 >>            % Start of marked-content sequence
        BDC
            …
            ( Here is some text ) Tj
            …
        EMC                    % End of marked-content sequence
    …
endstream
endobj
NOTE
This example and the following examples omit required StructParents entries in the objects used as content items (see 14.7.4.4, “Finding Structure Elements from Content Items”).
A structure element dictionary may include one or more marked-content sequences as content items by referring to them in its K entry (see Table 323). This reference may have two forms:
- A dictionary object called a marked-content reference. Table 324 shows the contents of this type of dictionary, which shall specify the marked-content identifier, as well as other information identifying the stream in which the sequence is contained. Example 2 illustrates the use of a marked-content reference to the marked-content sequence shown in Example 3.
- An integer that specifies the marked-content identifier. This may be done in the common case where the marked-content sequence is contained in the content stream of the page that is specified in the Pg entry of the structure element dictionary. Example 3 shows a structure element that has three children: a marked-content sequence specified by a marked-content identifier, as well as two other structure elements.
EXAMPLE 2
1 0 obj                        % Structure element
    << /Type /StructElem
        /S /P                  % Structure type
        /P …                   % Parent in structure hierarchy
        /K << /Type /MCR
            /Pg 2 0 R          % Page containing marked-content sequence
            /MCID 0            % Marked-content identifier
        >>
    >>
endobj
| Key | Type | Value | 
|---|---|---|
| Type | name | (Required) The type of PDF object that this dictionary describes; shall be MCR for a marked-content reference. | 
| Pg | dictionary | (Optional; shall be an indirect reference) The page object representing the page on which the graphics objects in the marked-content sequence shall be rendered. This entry overrides any Pg entry in the structure element containing the marked-content reference; it shall be required if the structure element has no such entry. | 
| Stm | stream | (Optional; shall be an indirect reference) The content stream containing the marked-content sequence. This entry should be present only if the marked-content sequence resides in a content stream other than the content stream for the page (see 8.10, “Form XObjects” and 12.5.5, “Appearance Streams”). If this entry is absent, the marked-content sequence shall be contained in the content stream of the page identified by Pg (either in the marked-content reference dictionary or in the parent structure element). | 
| StmOwn | (any) | (Optional; shall be an indirect reference) The PDF object owning the stream identified by Stm (for example, the annotation to which an appearance stream belongs). | 
| MCID | integer | (Required) The marked-content identifier of the marked-content sequence within its content stream. | 
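The way Pg and Stm combine in Table 324 can be sketched as a lookup. In this Python illustration the page, structure element, and reference are plain dicts with indirect references already resolved, stream objects are stood in for by strings, and the name `locate_marked_content` is hypothetical. The sketch also assumes the page's Contents is a single stream; a real page may use an array of streams.

```python
def locate_marked_content(elem, child):
    """Return (page, stream, mcid) for an integer MCID or an MCR dictionary."""
    if isinstance(child, int):
        # Bare integer: the sequence is in the page named by the element's Pg.
        page, stream, mcid = elem.get("Pg"), None, child
    else:
        page = child.get("Pg", elem.get("Pg"))   # the MCR's Pg overrides the element's
        stream = child.get("Stm")
        mcid = child["MCID"]
    if page is None:
        raise ValueError("no Pg entry on the reference or its structure element")
    if stream is None:
        stream = page["Contents"]                # Stm absent: the page's own stream
    return page, stream, mcid


page = {"Type": "Page", "Contents": "page-stream"}

# Integer form: page taken from the element, stream from the page.
by_integer = locate_marked_content({"S": "P", "Pg": page, "K": 0}, 0)

# MCR form pointing into a form XObject's content stream.
mcr = {"Type": "MCR", "Pg": page, "Stm": "form-stream", "MCID": 0}
by_reference = locate_marked_content({"S": "P", "K": mcr}, mcr)
```

Raising an error when neither dictionary supplies Pg mirrors the table's statement that the entry is required in the reference if the structure element has none.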
EXAMPLE 3
1 0 obj                    % Containing structure element
    << /Type /StructElem
        /S /MixedContainer    % Structure type
        /P …                  % Parent in structure hierarchy
        /Pg 2 0 R             % Page containing marked-content sequence
        /K [ 4 0 R            % Three children: a structure element
            0                 % a marked-content identifier
            5 0 R             % another structure element
        ]
    >>
endobj
2 0 obj                        % Page object
    << /Type /Page
        /Contents 3 0 R        % Content stream
        …
    >>
endobj
3 0 obj                        % Page's content stream
    << /Length … >>
stream
    …
    /P << /MCID 0 >>            % Start of marked-content sequence
        BDC
            ( Here is some text ) Tj
            …
        EMC                    % End of marked-content sequence
    …
endstream
endobj
Content streams other than page contents may also contain marked content sequences that are content items of structure elements. The content of form XObjects may be incorporated into structure elements in one of the following ways:
- A Do operator that paints a form XObject may be part of a marked-content sequence that shall be associated with a structure element (see Example 4). In this case, the entire form XObject shall be considered to be part of the structure element’s content, as if it were inserted into the marked-content sequence at the point of the Do operator. The form XObject shall not in turn contain any marked-content sequences associated with this or other structure elements.
- The content stream of a form XObject may contain one or more marked-content sequences that shall be associated with structure elements (see Example 5). The form XObject may have arbitrary substructure, containing any number of marked-content sequences associated with logical structure elements. However, any Do operator that paints the form XObject should not be part of a logical structure content item.
A form XObject that is painted with multiple invocations of the Do operator may be incorporated into the document’s logical structure only by the first method, with each invocation of Do individually associated with a structure element.
EXAMPLE 4
1 0 obj                    % Structure element
    << /Type /StructElem
        /S /P                % Structure type
        /P …                 % Parent in structure hierarchy
        /Pg 2 0 R            % Page containing marked-content sequence
        /K 0                 % Marked-content identifier
    >>
endobj
2 0 obj                         % Page object
    << /Type /Page
        /Resources << /XObject << /Fm4 4 0 R >>    % Resource dictionary
                   >>                              % containing form XObject
        /Contents 3 0 R                            % Content stream
        …
    >>
endobj
3 0 obj                        % Page's content stream
    << /Length … >>
stream
    …
    /P << /MCID 0 >>                % Start of marked-content sequence
        BDC
            /Fm4 Do                 % Paint form XObject
        EMC                         % End of marked-content sequence
    …
endstream
endobj
4 0 obj                            % Form XObject
    << /Type /XObject
        /Subtype /Form
        /Length …
    >>
stream
    …
    ( Here is some text ) Tj
    …
endstream
endobj
EXAMPLE 5
1 0 obj                        % Structure element
    << /Type /StructElem
        /S /P                    % Structure type
        /P …                     % Parent in structure hierarchy
        /K << /Type /MCR
            /Pg 2 0 R               % Page containing marked-content sequence
            /Stm 4 0 R              % Stream containing marked-content sequence
            /MCID 0                 % Marked-content identifier
        >>
    >>
endobj
2 0 obj                            % Page object
    << /Type /Page
        /Resources << /XObject << /Fm4 4 0 R >>  % Resource dictionary
                   >>                            % containing form XObject
        /Contents 3 0 R                          % Content stream
        …
    >>
endobj
3 0 obj                            % Page's content stream
    << /Length … >>
stream
    …
    /Fm4 Do                        % Paint form XObject
    …
endstream
endobj
4 0 obj                            % Form XObject
    << /Type /XObject
        /Subtype /Form
        /Length …
    >>
stream
    …
    /P << /MCID 0 >>                % Start of marked-content sequence
        BDC
        …
        ( Here is some text ) Tj
        …
    EMC                            % End of marked-content sequence
    …
endstream
endobj
14.7.4.3 PDF对象作为内容项
14.7.4.3 PDF Objects as Content Items
当结构元素的内容包含整个 PDF 对象(例如 XObject 或注释)时,该对象与某个页面相关联,但不会直接包含在该页面的内容流中。在这种情况下,该对象应通过 对象引用字典 在结构元素的 K 条目中标识(参见 表 325)。
注意 1
这种引用形式仅用于整个对象。如果被引用的内容仅构成对象内容流的一部分,则应按照前述小节的描述,将其作为标记内容序列处理。
| 键 | 类型 | 值 | 
|---|---|---|
| Type | name | (必需)该字典所描述的 PDF 对象的类型;对于对象引用,应为 OBJR。 | 
| Pg | dictionary | (可选;应为间接引用)该对象应被渲染的页面对象。此条目会覆盖包含对象引用的结构元素中的 Pg 条目;如果结构元素没有此类条目,则应使用此条目。 | 
| Obj | (any) | (必需;应为间接引用)被引用的对象。 | 
注意 2
如果被引用的对象呈现在多个页面上,则每次渲染都需要单独的对象引用。然而,如果它在同一页面上被多次渲染,则仅需单个对象引用即可标识所有实例。(如果需要区分同一 XObject 在同一页面上的多个呈现,则应使用包围特定 Do 操作符调用的标记内容序列,而不是对象引用。)
When a structure element’s content includes an entire PDF object, such as an XObject or an annotation, that is associated with a page but not directly included in the page’s content stream, the object shall be identified in the structure element’s K entry by an object reference dictionary (see Table 325).
NOTE 1
This form of reference is used only for entire objects. If the referenced content forms only part of the object’s content stream, it is instead handled as a marked-content sequence, as described in the preceding sub-clause.
| Key | Type | Value | 
|---|---|---|
| Type | name | (Required) The type of PDF object that this dictionary describes; shall be OBJR for an object reference. | 
| Pg | dictionary | (Optional; shall be an indirect reference) The page object of the page on which the object shall be rendered. This entry overrides any Pg entry in the structure element containing the object reference; it shall be used if the structure element has no such entry. | 
| Obj | (any) | (Required; shall be an indirect reference) The referenced object. | 
NOTE 2
If the referenced object is rendered on multiple pages, each rendering requires a separate object reference. However, if it is rendered multiple times on the same page, just a single object reference suffices to identify all of them. (If it is important to distinguish between multiple renditions of the same XObject on the same page, they should be accessed by means of marked-content sequences enclosing particular invocations of the Do operator rather than through object references.)
14.7.4.4 从内容项中查找结构元素
14.7.4.4 Finding Structure Elements from Content Items
由于流不能包含对象引用,因此标记内容序列作为内容项时,无法直接引用其父结构元素(即作为内容项所属的结构元素)。为了实现这一功能,需要使用另一种机制,即 结构父树。为了保持一致性,作为整个 PDF 对象的内容项(如 XObject)也应使用父树来引用其父结构元素。
父树是一种数字树(参见 7.9.7,“数字树”),可通过文档结构树根中的 ParentTree 条目访问(参见 表 322)。该树应包含以下条目的键值对:
- 作为至少一个结构元素的内容项的对象。
- 包含至少一个作为内容项的标记内容序列的内容流。
每个条目的键是对象的 StructParent 或 StructParents 条目的整数值(参见 表 326)。这些条目的值如下:
- 如果对象通过对象引用标识为内容项(参见 14.7.4.3,“作为内容项的 PDF 对象”),则其值应为指向父结构元素的间接引用。
- 如果内容流包含作为内容项的标记内容序列,则其值应为指向这些序列的父结构元素的间接引用数组。数组中的每个元素可通过标记内容标识符作为从零开始的索引来找到相应的父结构元素。
注意
由于标记内容标识符在结构父树中作为数组索引使用,因此其赋值应尽可能小,以节省数组空间。
结构树根中的 ParentTreeNextKey 条目应保存一个整数值,该值大于当前在结构父树中使用的任何键值。每当向父树添加新条目时,应使用 ParentTreeNextKey 的当前值作为新条目的键,然后递增该值,以准备添加下一个新条目。
为了定位相关的父树条目,每个在树中表示的对象或内容流应包含一个特殊的字典条目 StructParent 或 StructParents(参见 表 326)。根据内容项的类型,该条目可能出现在包含标记内容序列的页面对象中,或者出现在表单 XObject、图像 XObject 的流字典中,亦或是出现在注释字典或任何其他作为内容项包含在结构元素中的对象字典中。其值应为该对象在结构父树中的整数键。
| 键 | 类型 | 值 | 
|---|---|---|
| StructParent | integer | (对于所有作为结构内容项的对象是必需的;PDF 1.3)该对象在结构父树中的整数键。 | 
| StructParents | integer | (对于包含作为结构内容项的标记内容序列的所有内容流是必需的;PDF 1.3)该对象在结构父树中的整数键。 在同一个对象中最多只能出现 StructParent 或 StructParents 之一。对象可以是完整的内容项,也可以是包含作为内容项的标记内容序列的容器,但不能同时具有两者。 | 
对于通过对象引用标识的内容项,可以通过其对象字典中的 StructParent 条目值,在结构父树(位于结构树根的 ParentTree 条目中)中查找相应的父结构元素。父树中的对应值应为指向父结构元素的引用(参见示例 1)。
示例 1
1 0 obj                    % 父结构元素
    << /Type /StructElem
        …
        /K << /Type /OBJR          % 对象引用
                /Pg 2 0 R            % 包含表单 XObject 的页面
                /Obj 4 0 R           % 对表单 XObject 的引用
        >>
    >>
endobj
2 0 obj                            % 页面对象
    << /Type /Page
        /Resources << /XObject << /Fm4 4 0 R >>    % 资源字典
                    >>                              % 包含表单 XObject
        /Contents 3 0 R                            % 内容流
        …
    >>
endobj
3 0 obj                            % 页面内容流
    << /Length … >>
stream
    …
    /Fm4 Do                        % 绘制表单 XObject
    …
endstream
endobj
4 0 obj                        % 表单 XObject
    << /Type /XObject
        /Subtype /Form
        /Length …
        /StructParent 6        % 父树键值
    >>
stream
    …
endstream
endobj
100 0 obj                        % 结构父树(从结构树根访问)
    << /Nums [ 0 101 0 R
                1 102 0 R
                …
                6 1 0 R          % 表单 XObject 4 的条目,指向
                …                % 父结构元素
                ]
    >>
endobj
对于作为标记内容序列的内容项,检索方法类似但稍微复杂一些。由于标记内容序列本身不是一个独立的对象,其父树键值应从包含该序列的页面对象或其他内容流的 StructParents 条目中获取。从父树检索到的值不是直接指向父结构元素的引用,而是一个数组,其中包含该内容流内所有标记内容序列对应的父结构元素的引用。要找到特定序列的父结构元素,需要使用该序列的标记内容标识符作为索引,在此数组中进行查找(参见示例 2)。
示例 2
1 0 obj                    % 父结构元素
    << /Type /StructElem
        …
        /Pg 2 0 R                   % 包含标记内容序列的页面
        /K 0                        % 标记内容标识符
    >>
endobj
2 0 obj                    % 页面对象
    << /Type /Page
        /Contents 3 0 R            % 内容流
        /StructParents 6            % 父树键值
        …
    >>
endobj
3 0 obj                    % 页面内容流
    << /Length … >>
stream
    …
    /P << /MCID 0 >>            % 标记内容序列起始
        BDC
            ( Here is some text ) Tj
            …
        EMC                         % 标记内容序列结束
    …
endstream
endobj
100 0 obj                    % 结构父树(从结构树根访问)
    << /Nums [ 0 101 0 R
                1 102 0 R
                …
                6 [1 0 R]   % 页面对象 2 的条目,索引 0 处的数组元素
                …           % 指向父结构元素
            ]
    >>
endobj
Because a stream may not contain object references, there is no way for content items that are marked-content sequences to refer directly back to their parent structure elements (the ones to which they belong as content items). Instead, a different mechanism, the structural parent tree, shall be provided for this purpose. For consistency, content items that are entire PDF objects, such as XObjects, also shall use the parent tree to refer to their parent structure elements.
The parent tree is a number tree (see 7.9.7, “Number Trees”), accessed from the ParentTree entry in a document’s structure tree root (Table 322). The tree shall contain an entry for each object that is a content item of at least one structure element and for each content stream containing at least one marked-content sequence that is a content item. The key for each entry shall be an integer given as the value of the StructParent or StructParents entry in the object (see Table 326). The values of these entries shall be as follows:
- For an object identified as a content item by means of an object reference (see 14.7.4.3, “PDF Objects as Content Items”), the value shall be an indirect reference to the parent structure element.
- For a content stream containing marked-content sequences that are content items, the value shall be an array of indirect references to the sequences’ parent structure elements. The array element corresponding to each sequence shall be found by using the sequence’s marked-content identifier as a zero-based index into the array.
NOTE
Because marked-content identifiers serve as indices into an array in the structural parent tree, their assigned values should be as small as possible to conserve space in the array.
The ParentTreeNextKey entry in the structure tree root shall hold an integer value greater than any that is currently in use as a key in the structural parent tree. Whenever a new entry is added to the parent tree, the current value of ParentTreeNextKey shall be used as its key. The value shall then be incremented to prepare for the next new entry to be added.
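The key allocation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the structure tree root and the parent tree are modelled as plain in-memory dictionaries, and the function name `add_parent_tree_entry` is hypothetical and not part of any PDF library.

```python
def add_parent_tree_entry(struct_tree_root, parent_tree, value):
    """Store `value` in the parent tree under the next free key.

    Assumptions (not from the specification): `struct_tree_root` is a dict
    holding an integer "ParentTreeNextKey", and `parent_tree` is a dict that
    maps integer keys to parent tree values.
    """
    key = struct_tree_root["ParentTreeNextKey"]
    if key in parent_tree:
        # ParentTreeNextKey must be greater than every key already in use.
        raise ValueError(f"ParentTreeNextKey {key} is already in use")
    parent_tree[key] = value
    # Increment so the next new entry receives a fresh key.
    struct_tree_root["ParentTreeNextKey"] = key + 1
    return key
```

The returned key is the value a writer would then record as the StructParent or StructParents entry of the object being added.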
To locate the relevant parent tree entry, each object or content stream that is represented in the tree shall contain a special dictionary entry, StructParent or StructParents (see Table 326). Depending on the type of content item, this entry may appear in the page object of a page containing marked-content sequences, in the stream dictionary of a form or image XObject, in an annotation dictionary, or in any other type of object dictionary that is included as a content item in a structure element. Its value shall be the integer key under which the entry corresponding to the object shall be found in the structural parent tree.
| Key | Type | Value | 
|---|---|---|
| StructParent | integer | (Required for all objects that are structural content items; PDF 1.3) The integer key of this object’s entry in the structural parent tree. | 
| StructParents | integer | (Required for all content streams containing marked-content sequences that are structural content items; PDF 1.3) The integer key of this object’s entry in the structural parent tree. At most one of these two entries shall be present in a given object. An object may be either a content item in its entirety or a container for marked-content sequences that are content items, but not both. | 
For a content item identified by an object reference, the parent structure element may be found by using the value of the StructParent entry in the item’s object dictionary as a retrieval key in the structural parent tree (found in the ParentTree entry of the structure tree root). The corresponding value in the parent tree shall be a reference to the parent structure element (see Example 1).
EXAMPLE 1
1 0 obj                    % Parent structure element
    << /Type /StructElem
        …
        /K << /Type /OBJR          % Object reference
              /Pg 2 0 R            % Page containing form XObject
              /Obj 4 0 R           % Reference to form XObject
        >>
    >>
endobj
2 0 obj                            % Page object
    << /Type /Page
        /Resources << /XObject << /Fm4 4 0 R >>    % Resource dictionary
                   >>                              % containing form XObject
        /Contents 3 0 R                            % Content stream
        …
    >>
endobj
3 0 obj                            % Page's content stream
    << /Length … >>
stream
    …
    /Fm4 Do                        % Paint form XObject
    …
endstream
endobj
4 0 obj                        % Form XObject
    << /Type /XObject
        /Subtype /Form
        /Length …
        /StructParent 6        % Parent tree key
    >>
stream
    …
endstream
endobj
100 0 obj                        % Parent tree (accessed from structure tree root)
    << /Nums [ 0 101 0 R
                1 102 0 R
                …
                6 1 0 R          % Entry for form XObject 4; points back
                …                % to parent structure element
             ]
    >>
endobj
For a content item that is a marked-content sequence, the retrieval method is similar but slightly more complicated. Because a marked-content sequence is not an object in its own right, its parent tree key shall be found in the StructParents entry of the page object or other content stream in which the sequence resides. The value retrieved from the parent tree shall not be a reference to the parent structure element itself but to an array of such references—one for each marked-content sequence contained within that content stream. The parent structure element for the given sequence shall be found by using the sequence’s marked-content identifier as an index into this array (see Example 2).
EXAMPLE 2
1 0 obj                    % Parent structure element
    << /Type /StructElem
        …
        /Pg 2 0 R                   % Page containing marked-content sequence
        /K 0                        % Marked-content identifier
    >>
endobj
2 0 obj                    % Page object
    << /Type /Page
        /Contents 3 0 R            % Content stream
        /StructParents 6            % Parent tree key
        …
    >>
endobj
3 0 obj                    % Page's content stream
    << /Length … >>
stream
    …
    /P << /MCID 0 >>            % Start of marked-content sequence
        BDC
            ( Here is some text ) Tj
            …
        EMC                         % End of marked-content sequence
    …
endstream
endobj
100 0 obj                    % Parent tree (accessed from structure tree root)
    << /Nums [ 0 101 0 R
                1 102 0 R
                …
                6 [1 0 R]   % Entry for page object 2; array element at index 0
                …           % points back to parent structure element
            ]
    >>
endobj
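The two retrieval paths shown in Examples 1 and 2 can be summarised in a short Python sketch. It is illustrative only: the parent tree is assumed to be already flattened into a dictionary keyed by integer, indirect references are represented as strings such as "1 0 R", and the function name `find_parent_element` is hypothetical.

```python
def find_parent_element(parent_tree, container, mcid=None):
    """Return the reference to the parent structure element of a content item.

    Assumptions (not from the specification): `parent_tree` maps integer keys
    to values, and `container` is the dictionary of the object that carries
    the StructParent or StructParents entry.

    - With mcid=None the content item is an entire object, so its
      StructParent key leads directly to the parent reference.
    - Otherwise the content item is a marked-content sequence, so the
      StructParents key leads to an array indexed by the marked-content
      identifier.
    """
    if mcid is None:
        return parent_tree[container["StructParent"]]
    parents = parent_tree[container["StructParents"]]
    return parents[mcid]
```

With the values from Example 2, `find_parent_element({6: ["1 0 R"]}, {"StructParents": 6}, mcid=0)` follows key 6 to the array and then index 0 to the structure element.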
14.7.5 结构属性
14.7.5 Structure Attributes
14.7.5.1 概述
14.7.5.1 General
符合标准的产品在处理逻辑结构时,可以向任何结构元素附加额外的信息,称为属性。这些属性信息应存储在与结构元素关联的一个或多个属性对象中。属性对象应为字典或流,其中包含一个 O 条目(参见 表 327),用于标识拥有该属性信息的符合标准的产品。其他条目表示具体属性:键为属性名称,值为相应的属性值。为了便于符合标准的产品之间交换内容,PDF 定义了一组由特定标准所有者标识的标准结构属性(参见 14.8.5,"标准结构属性")。此外,(PDF 1.6)属性还可用于表示用户属性(参见 14.7.5.4,"用户属性")。
| 键 | 类型 | 值 | 
|---|---|---|
| O | 名称 | (必需)拥有属性数据的符合标准的产品名称。该名称应符合 附录 E 中的命名指南。 | 
任何符合标准的产品都可以向任何结构元素附加属性,即使该结构元素是由另一个符合标准的产品创建的。多个符合标准的产品可以向同一结构元素附加属性。结构元素字典中的 A 条目(参见 表 323)应包含一个单独的属性对象,或者一个属性对象数组,同时包含修订号以协调不同符合标准的产品创建的属性(参见 14.7.5.3,"属性修订号")。当符合标准的产品为某个结构元素创建或删除第二个属性对象时,应负责将 A 条目的值从单个对象转换为数组或从数组转换回单个对象,并维护修订号的完整性。A 数组中的属性对象没有固有的顺序定义,但新对象应追加到数组末尾,以便数组的第一个元素属于最初创建该结构元素的符合标准的产品。
A conforming product that processes logical structure may attach additional information, called attributes, to any structure element. The attribute information shall be held in one or more attribute objects associated with the structure element. An attribute object shall be a dictionary or stream that includes an O entry (see Table 327) identifying the conforming product that owns the attribute information. Other entries shall represent the attributes: the keys shall be attribute names, and values shall be the corresponding attribute values. To facilitate the interchange of content among conforming products, PDF defines a set of standard structure attributes identified by specific standard owners; see 14.8.5, “Standard Structure Attributes.” In addition, (PDF 1.6) attributes may be used to represent user properties (see 14.7.5.4, “User Properties”).
| Key | Type | Value | 
|---|---|---|
| O | name | (Required) The name of the conforming product owning the attribute data. The name shall conform to the guidelines described in Annex E. | 
Any conforming product may attach attributes to any structure element, even one created by another conforming product. Multiple conforming products may attach attributes to the same structure element. The A entry in the structure element dictionary (see Table 323) shall hold either a single attribute object or an array of such objects, together with revision numbers for coordinating attributes created by different conforming products (see 14.7.5.3, “Attribute Revision Numbers”). A conforming product creating or destroying the second attribute object for a structure element shall be responsible for converting the value of the A entry from a single object to an array or vice versa, as well as for maintaining the integrity of the revision numbers. No inherent order shall be defined for the attribute objects in an A array, but new objects should be added at the end of the array so that the first array element is the one belonging to the conforming product that originally created the structure element.
14.7.5.2 属性类别
14.7.5.2 Attribute Classes
如果多个结构元素共享相同的属性值集合,则可以将它们定义为属性类,并共享相同的属性对象。结构元素应通过名称引用该类。类名与属性对象之间的关联应由类映射(class map)定义,该映射存储在结构树根的 ClassMap 条目中(参见 表 322)。类映射中的每个键应为名称对象,表示类的名称;相应的值应为属性对象或属性对象数组。
注意
PDF 的属性类与 Java 和 C++ 等面向对象编程语言中的类(class)概念无关。属性类仅仅是一种更紧凑存储属性信息的机制;它不具备面向对象类的继承特性。
结构元素字典中的 C 条目(参见 表 323)应包含一个类名或一个类名数组(通常还附带修订号,参见 14.7.5.3,“属性修订号”)。对于 C 条目中列出的每个类,相应的属性对象应被视为附加到该结构元素,并与该元素的 A 条目中指定的属性对象一同生效。如果同时存在 A 和 C 条目,并且某个属性在两者中都被定义,则以 A 条目指定的属性为准。
If many structure elements share the same set of attribute values, they may be defined as an attribute class sharing the identical attribute object. Structure elements shall refer to the class by name. The association between class names and attribute objects shall be defined by a dictionary called the class map, which shall be kept in the ClassMap entry of the structure tree root (see Table 322). Each key in the class map shall be a name object denoting the name of a class. The corresponding value shall be an attribute object or an array of such objects.
NOTE
PDF attribute classes are unrelated to the concept of a class in object-oriented programming languages such as Java and C++. Attribute classes are strictly a mechanism for storing attribute information in a more compact form; they have no inheritance properties like those of true object-oriented classes.
The C entry in a structure element dictionary (see Table 323) shall contain a class name or an array of class names (typically accompanied by revision numbers as well; see 14.7.5.3, “Attribute Revision Numbers”). For each class named in the C entry, the corresponding attribute object or objects shall be considered to be attached to the given structure element, along with those identified in the element’s A entry. If both the A and C entries are present and a given attribute is specified by both, the one specified by the A entry shall take precedence.
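The precedence rule between the A and C entries can be illustrated with a small Python sketch. It is a simplification under stated assumptions: dictionaries stand in for PDF dictionaries, class names are plain strings, integer revision numbers in the arrays are skipped, and attributes are keyed by (owner, attribute name). The helper names are hypothetical.

```python
def _objects_only(value):
    """Normalise a single object or an array to a list, skipping integer
    revision numbers (an assumption made to keep the sketch short)."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item for item in items if not isinstance(item, int)]


def effective_attributes(elem, class_map):
    """Merge class attributes with the element's own attributes.

    Class attributes (C entry) are applied first; attributes from the A
    entry are applied afterwards so that they take precedence.
    """
    merged = {}
    for class_name in _objects_only(elem.get("C")):
        for attr_obj in _objects_only(class_map[class_name]):
            for name, value in attr_obj.items():
                if name != "O":
                    merged[(attr_obj["O"], name)] = value
    for attr_obj in _objects_only(elem.get("A")):
        for name, value in attr_obj.items():
            if name != "O":
                merged[(attr_obj["O"], name)] = value
    return merged
```

For an element whose class supplies a Layout-owned TextAlign of Start and whose A entry supplies a Layout-owned TextAlign of Justify, the merged result holds Justify, while the remaining class attributes carry over unchanged.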
14.7.5.3 属性修订号
14.7.5.3 Attribute Revision Numbers
当符合标准的产品修改结构元素或其内容时,此更改可能会影响其他符合标准的产品附加到该结构元素的属性信息的有效性。为了解决这个问题,PDF 采用了一套修订号(revision numbers)系统,使符合标准的产品能够检测这些更改,并相应地更新其属性信息。本节将对此进行描述。
结构元素的修订号
结构元素应具有一个修订号,该修订号存储在结构元素字典的 R 条目中(参见 表 323)。如果 R 条目不存在,则默认值为 0。最初,修订号应为 0。当符合标准的产品修改结构元素或其任何内容项时,可以通过递增修订号来指示此更改。
注意 1
修订号与间接对象的版本号(generation number)无关(参见 7.3.10,“间接对象”)。
注意 2
如果 R 条目不存在,并且修订号要从默认值 0 递增到 1,则必须在结构元素字典中创建 R 条目,以记录修订号 1。
属性对象的修订号
每个附加到结构元素的属性对象都应有一个关联的修订号。该修订号存储在属性对象与结构元素的关联数组中。如果该修订号未存储在数组中,则默认值为 0。
- 结构元素 A 数组中的每个属性对象应由单个或一对数组元素表示:
  - 第一个(或唯一的)元素应包含属性对象本身。
  - 第二个(如果存在)应包含与该结构元素关联的整数修订号。
- 结构元素 C 数组中的每个属性类应包含单个或一对元素:
  - 第一个(或唯一的)元素应包含类名。
  - 第二个(如果存在)应包含关联的修订号。
在 A 和 C 数组中,修订号是可选的。如果属性对象或类名后面没有整数数组元素,则其修订号默认为 0,并且仅用一个条目表示。
注意 3
修订号不会直接存储在属性对象内部,因为同一个属性对象可能被多个结构元素引用,而这些结构元素的修订号可能不同。由于属性对象引用与整数值不同,因此可以通过数组中是否有一对条目来区分该属性对象是否附带修订号。
注意 4
当创建或修改属性对象时,其修订号应设置为结构元素当前的 R 条目值。应用程序可以通过比较属性对象的修订号与结构元素的 R 值,来判断属性对象的内容是否仍然有效,或者它们是否已因最近的结构元素更改而过时。
修订号的更新规则
- 修改属性对象不会更改结构元素的修订号。结构元素的修订号仅在结构元素本身或其内容项被修改时才会更改。
- 在某些情况下,符合标准的产品可能会对结构元素进行大规模更改,这些更改可能会使所有先前的属性信息失效。在这种情况下,产品可以选择:
  - 递增结构元素的修订号,或者
  - 从 A 和 C 数组中删除所有未知的属性对象。
这两个操作是互斥的:产品应选择其中之一,但不能同时执行两者。
注意 5
任何创建属性对象的符合标准的产品都应做好准备,因为这些对象可能随时被另一个符合标准的产品删除。
When a conforming product modifies a structure element or its contents, the change may affect the validity of attribute information attached to that structure element by other conforming products. A system of revision numbers shall allow conforming products to detect such changes and update their own attribute information accordingly, as described in this sub-clause.
A structure element shall have a revision number, which shall be stored in the R entry of the structure element dictionary (see Table 323) and shall default to 0 if no R entry is present. Initially, the revision number shall be 0. When a conforming product modifies the structure element or any of its content items, it may signal the change by incrementing the revision number.
NOTE 1
The revision number is unrelated to the generation number associated with an indirect object (see 7.3.10, “Indirect Objects”).
NOTE 2
If there is no R entry and the revision number is to be incremented from the default value of 0 to 1, an R entry must be created in the structure element dictionary in order to record the value 1.
Each attribute object attached to a structure element shall have an associated revision number. The revision number shall be stored in the array that associates the attribute object with the structure element; if it is not stored there, it shall default to 0.
- Each attribute object in a structure element’s A array shall be represented by a single array element or a pair of array elements. The first (or only) element shall contain the attribute object itself; the second, when present, shall contain the integer revision number associated with it in this structure element.
- The structure element’s C array shall contain a single element or a pair of elements for each attribute class. The first (or only) element shall contain the class name; the second, when present, shall contain the associated revision number.
The revision numbers are optional in both the A and C arrays. An attribute object or class name that is not followed by an integer array element shall have a revision number of 0 and shall be represented by a single entry in the array.
NOTE 3
The revision number is not stored directly in the attribute object because a single attribute object may be associated with more than one structure element (whose revision numbers may differ). Since an attribute object reference is distinct from an integer, that distinction is used to determine whether the attribute object is represented in the array by a single or a pair of entries.
NOTE 4
When an attribute object is created or modified, its revision number is set to the current value of the structure element’s R entry. By comparing the attribute object’s revision number with that of the structure element, an application can determine whether the contents of the attribute object are still current or whether they have been outdated by more recent changes in the underlying structure element.
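The array layout and the comparison described in NOTE 3 and NOTE 4 can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: attribute objects are modelled as dictionaries, class names as strings, and an entry is treated as out of date when its revision number is lower than the element's R value. The function names are hypothetical.

```python
def pair_with_revisions(entries):
    """Turn an A or C value into a list of (item, revision) pairs.

    An item followed by an integer forms a pair; an item not followed by an
    integer has the default revision number 0.
    """
    if not isinstance(entries, list):
        entries = [entries]
    pairs = []
    i = 0
    while i < len(entries):
        item = entries[i]
        if i + 1 < len(entries) and isinstance(entries[i + 1], int):
            pairs.append((item, entries[i + 1]))
            i += 2
        else:
            pairs.append((item, 0))
            i += 1
    return pairs


def outdated_entries(elem, key="A"):
    """List the items under `key` whose revision number is lower than the
    structure element's current revision number (R, default 0)."""
    current = elem.get("R", 0)
    return [item
            for item, revision in pair_with_revisions(elem.get(key, []))
            if revision < current]
```

Passing `key="C"` applies the same check to class names instead of attribute objects.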
Changes in an attribute object shall not change the revision number of the associated structure element, which shall change only when the structure element itself or any of its content items is modified.
Occasionally, a conforming product may make extensive changes to a structure element that are likely to invalidate all previous attribute information associated with it. In this case, instead of incrementing the structure element’s revision number, the conforming product may choose to delete all unknown attribute objects from its A and C arrays. These two actions shall be mutually exclusive: the conforming product should either increment the structure element’s revision number or remove its attribute objects, but not both.
NOTE 5
Any conforming product creating attribute objects needs to be prepared for the possibility that they can be deleted at any time by another conforming product.
14.7.5.4 用户属性
14.7.5.4 User Properties
大多数结构属性(参见 14.8.5,“标准结构属性”)用于指定影响元素外观的信息,例如 BackgroundColor(背景颜色)或 BorderStyle(边框样式)。
某些符合标准的写入器(例如 CAD 应用程序)可能会使用具有标准化外观的对象,其中每个对象包含区分它们的非图形信息。例如,多个晶体管可能外观相同,但具有不同的属性,如类型和零件编号。
用户属性(User Properties)(PDF 1.6)可用于存储此类信息。任何与结构元素对应的图形对象都可以具有用户属性,这些属性由属性对象字典指定,该字典的 O 条目值应为 UserProperties(参见 表 328)。
用户属性对象字典的附加条目
| 键(Key) | 类型(Type) | 值(Value) | 
|---|---|---|
| O | name | (必需)属性所有者。其值应为 UserProperties。 | 
| P | array | (必需)一个字典数组,每个字典表示一个用户属性(参见 表 329)。 | 
P 条目应为数组,用于指定用户属性。数组中的每个元素都应为用户属性字典,表示一个单独的属性(参见 表 329)。数组元素的顺序应按重要性排序。
用户属性字典的条目
| 键(Key) | 类型(Type) | 值(Value) | 
|---|---|---|
| N | text string | (必需)用户属性的名称。 | 
| V | any | (必需)用户属性的值。 该条目的值可以是任何类型的 PDF 对象,但符合标准的写入器应仅使用文本字符串、数字和布尔值。符合标准的读取器应向用户显示文本、数字和布尔值,但不必显示其他类型的值;然而,读取器不应将其他值视为错误。 | 
| F | text string | (可选) V值的格式化表示,用于特殊格式显示;例如,数字-123.45可表示为($123.45)。如果此条目不存在,符合标准的读取器应使用默认格式。 | 
| H | boolean | (可选)如果为 true,则该属性应隐藏,即不应在任何显示对象属性的用户界面元素中呈现。默认值:false。 | 
如果 PDF 文档包含用户属性,则应在标记信息字典(Mark Information Dictionary)的 UserProperties 条目中提供 true 值(参见 表 321)。此条目允许符合标准的读取器快速确定是否需要搜索结构树,以查找包含用户属性的元素。
示例
以下示例展示了一个包含用户属性的结构元素,该属性包括 Part Name(零件名称)、Part Number(零件编号)、Supplier(供应商)和 Price(价格)。
100 0 obj
    << /Type /StructElem
        /S /Figure                                    % 结构类型
        /P 50 0 R                                     % 结构树中的父元素
        /A << /O /UserProperties                      % 属性对象
                /P [                     % 用户属性数组
                    << /N (Part Name) /V (Framostat) >>
                    << /N (Part Number) /V 11603 >>
                    << /N (Supplier) /V (Just Framostats) /H true >>    % 隐藏属性
                    << /N (Price) /V -37.99 /F ($37.99) >>              % 格式化值
                    ]
            >>
    >>
endobj
Most structure attributes (see 14.8.5, “Standard Structure Attributes”) specify information that is reflected in the element’s appearance; for example, BackgroundColor or BorderStyle.
Some conforming writers, such as CAD applications, may use objects that have a standardized appearance, each of which contains non-graphical information that distinguishes the objects from one another. For example, several transistors might have the same appearance but different attributes such as type and part number.
User properties (PDF 1.6) may be used to contain such information. Any graphical object that corresponds to a structure element may have associated user properties, specified by means of an attribute object dictionary that shall have a value of UserProperties for the O entry (see Table 328).
| Key | Type | Value | 
|---|---|---|
| O | name | (Required) The attribute owner. Shall be UserProperties. | 
| P | array | (Required) An array of dictionaries, each of which represents a user property (see Table 329). | 
The P entry shall be an array specifying the user properties. Each element in the array shall be a user property dictionary representing an individual property (see Table 329). The order of the array elements shall specify attributes in order of importance.
| Key | Type | Value | 
|---|---|---|
| N | text string | (Required) The name of the user property. | 
| V | any | (Required) The value of the user property. Although the value of this entry may be any type of PDF object, conforming writers should use only text string, number, and boolean values. Conforming readers should display text, number, and boolean values to users but need not display values of other types; however, they should not treat other values as errors. | 
| F | text string | (Optional) A formatted representation of the value of V, which shall be used for special formatting; for example “($123.45)” for the number -123.45. If this entry is absent, conforming readers should use a default format. | 
| H | boolean | (Optional) If true, the attribute shall be hidden; that is, it shall not be shown in any user interface element that presents the attributes of an object. Default value: false. | 
PDF documents that contain user properties shall provide a UserProperties entry with a value of true in the document’s mark information dictionary (see Table 321). This entry allows conforming readers to quickly determine whether it is necessary to search the structure tree for elements containing user properties.
EXAMPLE
The following example shows a structure element containing user properties called Part Name, Part Number, Supplier, and Price.
100 0 obj
    << /Type /StructElem
        /S /Figure                                    % Structure type
        /P 50 0 R                                     % Parent in structure tree
        /A << /O /UserProperties                      % Attribute object
              /P [                     % Array of user properties
                   << /N (Part Name) /V (Framostat) >>
                   << /N (Part Number) /V 11603 >>
                   << /N (Supplier) /V (Just Framostats) /H true >>    % Hidden attribute
                   << /N (Price) /V -37.99 /F ($37.99) >>              % Formatted value
                 ]
           >>
    >>
endobj
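How a reader might turn such an attribute object into a list of displayable properties can be sketched in Python. The sketch rests on assumptions that are not part of the specification: dictionaries stand in for PDF dictionaries, Python's `str()` stands in for the reader's default format, and the function name `visible_user_properties` is hypothetical.

```python
def visible_user_properties(attr_obj):
    """Return (name, display text) pairs for the user properties to show.

    Hidden properties (H true) are skipped, the F entry is preferred when it
    is present, and values that are not strings, numbers, or booleans are
    silently ignored rather than treated as errors.
    """
    if attr_obj.get("O") != "UserProperties":
        return []
    shown = []
    for prop in attr_obj.get("P", []):
        if prop.get("H", False):
            continue  # hidden: not shown in any property display
        value = prop["V"]
        if "F" in prop:
            text = prop["F"]  # formatted representation supplied by the writer
        elif isinstance(value, (str, bool, int, float)):
            text = str(value)  # assumed default format
        else:
            continue  # other value types need not be displayed
        shown.append((prop["N"], text))
    return shown
```

Because the P array is ordered by importance, the sketch preserves that order in its result.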
14.7.6 逻辑结构示例
14.7.6 Example of Logical Structure
下一个示例展示了一个 PDF 文件的部分内容,该文件具有简单的文档结构。结构树根(对象 300)包含结构类型为 Chap(对象 301)和 Para(对象 304) 的元素。Chap 元素(标题为“Chapter 1”)包含类型为 Head1(对象 302)和 Para(对象 303) 的元素。
这些元素通过结构树根中指定的角色映射(Role Map)映射到 标记 PDF(Tagged PDF) 规范中定义的标准结构类型(参见 14.8.4,“标准结构类型”)。对象 302 至 304 还附加了属性(参见 14.7.5,“结构属性” 和 14.8.5,“标准结构属性”)。
该示例还展示了父树(对象 400)的结构,该树将内容项映射回其父结构元素,以及 ID 树(对象 403)的结构,该树将元素标识符映射到它们所表示的结构元素。
EXAMPLE
1 0 obj                              % 文档目录
    << /Type /Catalog
        /Pages 100 0 R                % 页面树
        /StructTreeRoot 300 0 R       % 结构树根
    >>
endobj
100 0 obj                            % 页面树
    << /Type /Pages
        /Kids [ 101 1 R              % 第一个页面对象
                102 0 R              % 第二个页面对象
                ]
        /Count 2                     % 页面计数
    >>
endobj
101 1 obj                            % 第一个页面对象
    << /Type /Page
        /Parent 100 0 R                      % 父级为页面树
        /Resources << /Font << /F1 6 0 R     % 字体资源
                                /F12 7 0 R
                            >>
                        /ProcSet [ /PDF /Text ]              % 程序集
                    >>
        /MediaBox [ 0 0 612 792 ]            % 媒体框
        /Contents 201 0 R                    % 内容流
        /StructParents 0                     % 父树键
    >>
endobj
201 0 obj                             % 第一个页面的内容流
    << /Length … >>
stream
    1 1 1 rg
    0 0 612 792 re f
    BT                                   % 文本对象开始
    /Head1 << /MCID 0 >>                 % 标记内容序列 0 开始
        BDC
            0 0 0 rg
            /F1 1 Tf
            30 0 0 30 18 732 Tm
            ( 这是一个一级标题。你好,世界:) Tj
            1.1333 TL
            T*
            ( 再见,宇宙。) Tj
        EMC                                % 标记内容序列 0 结束
    /Para << /MCID 1 >>                    % 标记内容序列 1 开始
        BDC
            /F12 1 Tf
            14 0 0 14 18 660.8 Tm
            ( 这是第一段,跨越多页。它有四个简洁的句子。这是倒数第二个。) Tj
        EMC                                % 标记内容序列 1 结束
    ET
endstream
endobj
202 0 obj                                 % 第二页的内容流
    << /Length … >>
stream
    1 1 1 rg
    0 0 612 792 re f
    BT                                    % 文本对象开始
        /Para << /MCID 0 >>               % 标记内容序列 0 开始
            BDC
                0 0 0 rg
                /F12 1 Tf
                14 0 0 14 18 732 Tm
                ( 句子。这是第一段的最后一句。) Tj
            EMC                           % 标记内容序列 0 结束
    /Para << /MCID 1 >>                   % 标记内容序列 1 开始
        BDC
            /F12 1 Tf
            14 0 0 14 18 570.8 Tm
            ( 这是第二段。它有四个简洁的句子。这是倒数第二个。) Tj
        EMC                                % 标记内容序列 1 结束
    /Para << /MCID 2 >>                    % 标记内容序列 2 开始
        BDC
            1.1429 TL
            T*
            ( 句子。这是第二段的最后一句。) Tj
        EMC                                % 标记内容序列 2 结束
    ET                                     % 文本对象结束
endstream
endobj
300 0 obj                                    % 结构树根
    << /Type /StructTreeRoot
        /K [ 301 0 R                             % 两个子元素:一个章节
            304 0 R                             % 和一个段落
            ]
        /RoleMap  << /Chap /Sect                % 映射到标准结构类型
                    /Head1 /H
                    /Para /P
                    >>
        /ClassMap   << /Normal 305 0 R >>        % 包含一个属性类的类映射
        /ParentTree 400 0 R                      % 父元素树
        /ParentTreeNextKey 2                     % 下一个要使用的键
        /IDTree 403 0 R                          % 元素标识符树
    >>
endobj
301 0 obj                                    % 章节的结构元素
    << /Type /StructElem
        /S /Chap
        /ID ( Chap1 )                          % 元素标识符
        /T ( 第1章 )                       % 人类可读标题
        /P 300 0 R                             % 父元素是结构树根
        /K [ 302 0 R                           % 两个子元素:一个节头
            303 0 R                           % 和一个段落
            ]
    >>
endobj
302 0 obj                                    % 节头的结构元素
    << /Type /StructElem
        /S /Head1
        /ID ( Sec1.1 )                         % 元素标识符
        /T ( 第1.1节 )                     % 人类可读标题
        /P 301 0 R                             % 父元素是章节
        /Pg 101 1 R                            % 包含内容项的页面
        /A << /O /Layout                       % 布局拥有的属性
                /SpaceAfter 25
                /SpaceBefore 0
                /TextIndent 12.5
            >>
        /K 0                                   % 标记内容序列 0
    >>
endobj
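303 0 obj                                    % Structure element for a paragraph
    << /Type /StructElem
        /S /Para
        /ID ( Para1 )                         % Element identifier
        /P 301 0 R                            % Parent is the chapter
        /Pg 101 1 R                           % Page containing first content item
        /C /Normal                            % Class containing this element's attributes
        /K [ 1                                % Marked-content sequence 1
                << /Type /MCR                 % Marked-content reference to 2nd item
                    /Pg 102 0 R                % Page containing second item
                    /MCID 0                    % Marked-content sequence 0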
                >>                         
            ]                         
    >>
endobj
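304 0 obj                                    % Structure element for another paragraph
    << /Type /StructElem
        /S /Para
        /ID ( Para2 )                         % Element identifier
        /P 300 0 R                            % Parent is the structure tree root
        /Pg 102 0 R                           % Page containing content items
        /C /Normal                            % Class containing this element's attributes
        /A << /O /Layout
                /TextAlign /Justify             % Overrides attribute provided by class map
            >>
        /K [ 1 2 ]                            % Marked-content sequences 1 and 2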
    >>
endobj
305 0 obj                                 % Attribute class
    << /O /Layout                            % Owned by Layout
        /EndIndent 0
        /StartIndent 0
        /WritingMode /LrTb
        /TextAlign /Start
    >>
endobj
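400 0 obj                            % Parent tree
    << /Nums [
                0 401 0 R               % Parent elements for first page
                1 402 0 R               % Parent elements for second page
                ]
    >>
endobj
401 0 obj                          % Array of parent elements for first page
    [ 302 0 R                         % Parent of marked-content sequence 0
        303 0 R                         % Parent of marked-content sequence 1
    ]
endobj
402 0 obj                          % Array of parent elements for second page
    [ 303 0 R                        % Parent of marked-content sequence 0
        304 0 R                        % Parent of marked-content sequence 1
        304 0 R                        % Parent of marked-content sequence 2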
    ]
endobj
403 0 obj                            % ID tree root node
    << /Kids [ 404 0 R ] >>             % Reference to leaf node
endobj
404 0 obj                            % ID tree leaf node
    << /Limits [ ( Chap1 ) ( Sec1.1 ) ]       % Least and greatest keys in tree
        /Names  [ ( Chap1 ) 301 0 R             % Mapping from element identifiers
                    ( Para1 ) 303 0 R                         % to structure elements
                    ( Para2 ) 304 0 R
                    ( Sec1.1 ) 302 0 R
                ]
    >>
endobj
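The role map in the structure tree root (object 300) can be read as a lookup table from producer-defined structure types to standard ones. The Python sketch below is illustrative only and is not part of the specification: the function name `resolve_role`, the tiny `STANDARD_TYPES` set, and the choice to follow chains of role-map entries are all assumptions made for this example.

```python
# Role map copied from the structure tree root (object 300) in the example.
ROLE_MAP = {"Chap": "Sect", "Head1": "H", "Para": "P"}

# A small subset of the standard structure types, enough for this example only.
STANDARD_TYPES = {"Sect", "H", "P"}


def resolve_role(struct_type, role_map, standard_types):
    """Follow role-map entries until a standard structure type is reached.

    Returns None when the type has no mapping or the mapping is circular.
    Following chains of entries is an assumption of this sketch.
    """
    seen = set()
    current = struct_type
    while current not in standard_types:
        if current in seen or current not in role_map:
            return None
        seen.add(current)
        current = role_map[current]
    return current


if __name__ == "__main__":
    for name in ("Chap", "Head1", "Para"):
        print(name, "->", resolve_role(name, ROLE_MAP, STANDARD_TYPES))
```

A reader that only understands the standard types could use a lookup of this kind to treat the `Chap` element as a `Sect` without knowing the producer's naming conventions.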
The following example shows portions of a PDF file with a simple document structure. The structure tree root (object 300) contains elements with structure types Chap (object 301) and Para (object 304). The Chap element, titled Chapter 1, contains elements with types Head1 (object 302) and Para (object 303).
These elements are mapped to the standard structure types specified in Tagged PDF (see 14.8.4, “Standard Structure Types”) by means of the role map specified in the structure tree root. Objects 302 through 304 have attached attributes (see 14.7.5, “Structure Attributes,” and 14.8.5, “Standard Structure Attributes”).
The example also illustrates the structure of a parent tree (object 400) that maps content items back to their parent structure elements and an ID tree (object 403) that maps element identifiers to the structure elements they denote.
EXAMPLE
1 0 obj                              % Document catalog
    << /Type /Catalog
        /Pages 100 0 R                % Page tree
        /StructTreeRoot 300 0 R       % Structure tree root
    >>
endobj
100 0 obj                            % Page tree
    << /Type /Pages
        /Kids [ 101 1 R              % First page object
                102 0 R              % Second page object
                ]
        /Count 2                     % Page count
    >>
endobj
101 1 obj                            % First page object
    << /Type /Page
        /Parent 100 0 R                      % Parent is the page tree
        /Resources << /Font << /F1 6 0 R     % Font resources
                                /F12 7 0 R
                            >>
                        /ProcSet [ /PDF /Text ]              % Procedure sets
                    >>
        /MediaBox [ 0 0 612 792 ]            % Media box
        /Contents 201 0 R                    % Content stream
        /StructParents 0                     % Parent tree key
    >>
endobj
102 0 obj                            % Second page object
    << /Type /Page
        /Parent 100 0 R                      % Parent is the page tree
        /Resources << /Font << /F1 6 0 R     % Font resources
                                /F12 7 0 R
                            >>
                        /ProcSet [ /PDF /Text ]              % Procedure sets
                    >>
        /MediaBox [ 0 0 612 792 ]            % Media box
        /Contents 202 0 R                    % Content stream
        /StructParents 1                     % Parent tree key
    >>
endobj
201 0 obj                             % Content stream for first page
    << /Length … >>
stream
    1 1 1 rg
    0 0 612 792 re f
    BT                                   % Start of text object
    /Head1 << /MCID 0 >>                 % Start of marked-content sequence 0
        BDC
            0 0 0 rg
            /F1 1 Tf
            30 0 0 30 18 732 Tm
            (This is a first level heading. Hello world: ) Tj
            1.1333 TL
            T*
            (goodbye universe.) Tj
        EMC                                % End of marked-content sequence 0
    /Para << /MCID 1 >>                    % Start of marked-content sequence 1
        BDC
            /F12 1 Tf
            14 0 0 14 18 660.8 Tm
            (This is the first paragraph, which spans pages. It has four fairly short and \
concise sentences. This is the next to last ) Tj
        EMC                                % End of marked-content sequence 1
    ET
endstream
endobj
202 0 obj                                 % Content stream for second page
    << /Length … >>
stream
    1 1 1 rg
    0 0 612 792 re f
    BT                                    % Start of text object
    /Para << /MCID 0 >>                   % Start of marked-content sequence 0
        BDC
            0 0 0 rg
            /F12 1 Tf
            14 0 0 14 18 732 Tm
            (sentence. This is the very last sentence of the first paragraph.) Tj
        EMC                               % End of marked-content sequence 0
    /Para << /MCID 1 >>                   % Start of marked-content sequence 1
        BDC
            /F12 1 Tf
            14 0 0 14 18 570.8 Tm
            (This is the second paragraph. It has four fairly short and concise sentences. \
This is the next to last ) Tj
        EMC                                % End of marked-content sequence 1
    /Para << /MCID 2 >>                    % Start of marked-content sequence 2
        BDC
            1.1429 TL
            T*
            (sentence. This is the very last sentence of the second paragraph.) Tj
        EMC                                % End of marked-content sequence 2
    ET                                     % End of text object
endstream
endobj
300 0 obj                                    % Structure tree root
    << /Type /StructTreeRoot
        /K [ 301 0 R                             % Two children: a chapter
            304 0 R                             % and a paragraph
            ]
        /RoleMap  << /Chap /Sect                % Mapping to standard structure types
                    /Head1 /H
                    /Para /P
                    >>
        /ClassMap   << /Normal 305 0 R >>        % Class map containing one attribute class
        /ParentTree 400 0 R                      % Number tree for parent elements
        /ParentTreeNextKey 2                     % Next key to use in parent tree
        /IDTree 403 0 R                          % Name tree for element identifiers
    >>
endobj
301 0 obj                                    % Structure element for a chapter
    << /Type /StructElem
        /S /Chap
        /ID ( Chap1 )                          % Element identifier
        /T ( Chapter 1 )                       % Human-readable title
        /P 300 0 R                             % Parent is the structure tree root
        /K [ 302 0 R                           % Two children: a section head
            303 0 R                           % and a paragraph
            ]
    >>
endobj
302 0 obj                                    % Structure element for a section head
    << /Type /StructElem
        /S /Head1
        /ID ( Sec1.1 )                         % Element identifier
        /T ( Section 1.1 )                     % Human-readable title
        /P 301 0 R                             % Parent is the chapter
        /Pg 101 1 R                            % Page containing content items
        /A << /O /Layout                       % Attributes owned by Layout
                /SpaceAfter 25
                /SpaceBefore 0
                /TextIndent 12.5
            >>
        /K 0                                   % Marked-content sequence 0
    >>
endobj
303 0 obj                                    % Structure element for a paragraph
    << /Type /StructElem
        /S /Para
        /ID ( Para1 )                         % Element identifier
        /P 301 0 R                            % Parent is the chapter
        /Pg 101 1 R                           % Page containing first content item
        /C /Normal                            % Class containing this element’s attributes
        /K [ 1                                % Marked-content sequence 1
                << /Type /MCR                 % Marked-content reference to 2nd item
                    /Pg 102 0 R                % Page containing second item
                    /MCID 0                    % Marked-content sequence 0
                >>                         
            ]                         
    >>
endobj
304 0 obj                                    % Structure element for another paragraph
    << /Type /StructElem
        /S /Para
        /ID ( Para2 )                         % Element identifier
        /P 300 0 R                            % Parent is the structure tree root
        /Pg 102 0 R                           % Page containing content items
        /C /Normal                            % Class containing this element’s attributes
        /A << /O /Layout
                /TextAlign /Justify             % Overrides attribute provided by classmap
            >>
        /K [ 1 2 ]                            % Marked-content sequences 1 and 2
    >>
endobj
305 0 obj                                 % Attribute class
    << /O /Layout                            % Owned by Layout
        /EndIndent 0
        /StartIndent 0
        /WritingMode /LrTb
        /TextAlign /Start
    >>
endobj
400 0 obj                            % Parent tree
    << /Nums [
                0 401 0 R               % Parent elements for first page
                1 402 0 R               % Parent elements for second page
                ]
    >>
endobj
401 0 obj                          % Array of parent elements for first page
    [ 302 0 R                         % Parent of marked-content sequence 0
        303 0 R                         % Parent of marked-content sequence 1
    ]
endobj
402 0 obj                          % Array of parent elements for second page
    [ 303 0 R                        % Parent of marked-content sequence 0
        304 0 R                        % Parent of marked-content sequence 1
        304 0 R                        % Parent of marked-content sequence 2
    ]
endobj
403 0 obj                            % ID tree root node
    << /Kids [ 404 0 R ] >>             % Reference to leaf node
endobj
404 0 obj                            % ID tree leaf node
    << /Limits [ ( Chap1 ) ( Sec1.1 ) ]       % Least and greatest keys in tree
        /Names  [ ( Chap1 ) 301 0 R             % Mapping from element identifiers
                    ( Para1 ) 303 0 R                         % to structure elements
                    ( Para2 ) 304 0 R
                    ( Sec1.1 ) 302 0 R
                ]
    >>
endobj
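The two reverse lookups this example illustrates, from a marked-content sequence to its parent structure element and from an element identifier to its structure element, can be sketched in Python. This is a simplified model under stated assumptions, not a PDF parser: indirect references are reduced to bare object numbers, identifiers are written without the padding spaces shown in the listing, the key 1 for the second page is taken from the parent tree's own comments, and the names `parent_of`, `build_leaf`, and `find_by_id` are hypothetical.

```python
from bisect import bisect_left

# Parent tree (object 400): StructParents key -> parents indexed by MCID.
PARENT_TREE = {
    0: [302, 303],       # object 401, first page
    1: [303, 304, 304],  # object 402, second page
}


def parent_of(struct_parents, mcid):
    """Return the object number of the element that owns a marked-content sequence."""
    parents = PARENT_TREE.get(struct_parents)
    if parents is None or not 0 <= mcid < len(parents):
        return None
    return parents[mcid]


# /ID entries as declared by the structure elements (objects 301 to 304).
ELEMENT_IDS = {"Chap1": 301, "Sec1.1": 302, "Para1": 303, "Para2": 304}


def build_leaf(ids):
    """Build a name-tree leaf from a non-empty mapping: sorted keys plus Limits."""
    keys = sorted(ids)
    return {"Limits": [keys[0], keys[-1]], "Names": [(k, ids[k]) for k in keys]}


def find_by_id(leaf, element_id):
    """Look up an identifier in a leaf, using Limits to reject out-of-range keys."""
    low, high = leaf["Limits"]
    if not low <= element_id <= high:
        return None
    keys = [k for k, _ in leaf["Names"]]
    i = bisect_left(keys, element_id)
    if i < len(keys) and keys[i] == element_id:
        return leaf["Names"][i][1]
    return None


LEAF = build_leaf(ELEMENT_IDS)

if __name__ == "__main__":
    print(parent_of(1, 0))            # parent of MCID 0 on the second page
    print(LEAF["Limits"])             # least and greatest identifiers
    print(find_by_id(LEAF, "Para2"))  # structure element with this identifier
```

Note that MCID 0 on the second page resolves to object 303, the paragraph that starts on the first page; this is how a reader can tell that content split across pages belongs to one structure element.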