
14.8 标签化 PDF

14.8 Tagged PDF

14.8.1 概述

14.8.1 General

标记 PDF(PDF 1.4)是一种特定风格的 PDF 用法,它建立在 14.7“逻辑结构”中描述的逻辑结构框架之上。它定义了一组标准结构类型和属性,使页面内容(文本、图形和图像)可以被提取并用于其他用途。符合要求的标记 PDF 文档应遵循本小节描述的规则。符合要求的编写工具并不强制生成标记 PDF 文档;但是,如果选择生成,则必须遵守这些规则。

注意 1

其设计目的是用于执行以下类型的操作的工具:

  • 简单地提取文本和图形,以便粘贴到其他应用程序中
  • 自动调整文本和相关图形的重新流动,以适应与原始布局假设不同的页面大小
  • 处理文本以进行搜索、索引和拼写检查等操作
  • 转换为其他常见文件格式(如 HTML、XML 和 RTF),同时保留文档结构和基本样式信息
  • 使内容可供视障用户访问(参见 14.9“无障碍支持”)

标记 PDF 文档应符合以下规则:

  • 页面内容(14.8.2“标记 PDF 与页面内容”)。标记 PDF 定义了一组规则,以便在页面内容中表示文本,使字符、单词和文本顺序可以被可靠地确定。所有文本都应以可转换为 Unicode 的形式表示。单词断点应明确表示。实际内容应与版面和分页的人工制品区分开。内容的顺序应与其在页面上的外观顺序相关,并由符合要求的编写工具确定。
  • 基本布局模型(14.8.3“基本布局模型”)。一组描述页面上结构元素排列的规则。
  • 结构类型(14.8.4“标准结构类型”)。一组标准结构类型,用于定义结构元素的含义,例如段落、标题、文章和表格。
  • 结构属性(14.8.5“标准结构属性”)。标准结构属性用于保留符合要求的编写工具在页面上布局内容时所使用的样式信息。

标记 PDF 文档还应包含一个标记信息字典(参见表 321),其 Marked 条目值应为 true。

注意 2

标记 PDF 定义的类型和属性旨在提供一组标准的回退角色和最低保证属性,以使符合要求的阅读工具能够执行前述操作。符合要求的编写工具可以自由定义额外的结构类型,只要它们也提供到最接近的标准类型的角色映射,如 14.7.3“结构类型”中所述。同样,符合要求的编写工具可以使用任何可用的扩展机制定义额外的结构属性。

Tagged PDF (PDF 1.4) is a stylized use of PDF that builds on the logical structure framework described in 14.7, “Logical Structure.” It defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes. A tagged PDF document is one that conforms to the rules described in this sub-clause. A conforming writer is not required to produce tagged PDF documents; however, if it does, it shall conform to these rules.

NOTE 1

It is intended for use by tools that perform the following types of operations:

  • Simple extraction of text and graphics for pasting into other applications
  • Automatic reflow of text and associated graphics to fit a page of a different size than was assumed for the original layout
  • Processing text for such purposes as searching, indexing, and spell-checking
  • Conversion to other common file formats (such as HTML, XML, and RTF) with document structure and basic styling information preserved
  • Making content accessible to users with visual impairments (see 14.9, “Accessibility Support”)

A tagged PDF document shall conform to the following rules:

  • Page content (14.8.2, “Tagged PDF and Page Content”). Tagged PDF defines a set of rules for representing text in the page content so that characters, words, and text order can be determined reliably. All text shall be represented in a form that can be converted to Unicode. Word breaks shall be represented explicitly. Actual content shall be distinguished from artifacts of layout and pagination. Content shall be given in an order related to its appearance on the page, as determined by the conforming writer.
  • A basic layout model (14.8.3, “Basic Layout Model”). A set of rules for describing the arrangement of structure elements on the page.
  • Structure types (14.8.4, “Standard Structure Types”). A set of standard structure types define the meaning of structure elements, such as paragraphs, headings, articles, and tables.
  • Structure attributes (14.8.5, “Standard Structure Attributes”). Standard structure attributes preserve styling information used by the conforming writer in laying out content on the page.

A Tagged PDF document shall also contain a mark information dictionary (see Table 321) with a value of true for the Marked entry.
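The Marked requirement can be illustrated with a minimal Python sketch. It assumes the document catalog has already been parsed into a plain nested dict, and that the mark information dictionary sits under the catalog's MarkInfo key. The function name `is_marked_tagged` is hypothetical and belongs to no PDF library.

```python
def is_marked_tagged(catalog: dict) -> bool:
    """Return True if the catalog carries a mark information
    dictionary whose Marked entry is true."""
    mark_info = catalog.get("MarkInfo")
    if not isinstance(mark_info, dict):
        return False
    # An absent Marked entry is treated as false in this sketch.
    return mark_info.get("Marked", False) is True
```

This checks only the Marked entry. It does not verify that the page content, structure types, and attributes follow the other rules listed above.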

NOTE 2

The types and attributes defined for Tagged PDF are intended to provide a set of standard fallback roles and minimum guaranteed attributes to enable conforming readers to perform operations such as those mentioned previously. Conforming writers are free to define additional structure types as long as they also provide a role mapping to the nearest equivalent standard types, as described in 14.7.3, “Structure Types.” Likewise, conforming writers can define additional structure attributes using any of the available extension mechanisms.

14.8.2 标签化 PDF 与页面内容

14.8.2 Tagged PDF and Page Content

14.8.2.1 概述

14.8.2.1 General

与所有 PDF 文档一样,标记 PDF 文档由一系列独立的页面组成,每个页面应由一个或多个页面内容流描述(包括任何附属流,如表单 XObject 和注释外观)。标记 PDF 进一步定义了一些规则,以组织和标记内容流,使其能够提供额外的信息:

  • 区分作者的原始内容与版面布局过程中产生的人工制品(参见 14.8.2.2“真实内容与人工制品”)。
  • 指定内容顺序,以指导符合要求的阅读工具在重新流式化页面内容时的布局过程(参见 14.8.2.3“页面内容顺序”)。
  • 以可明确推导 Unicode 表示和字体特征信息的形式表示文本(参见 14.8.2.4“字符属性提取”)。
  • 明确表示单词断点(参见 14.8.2.5“识别单词断点”)。
  • 使用标记信息,使文本可供视障用户访问(参见 14.9“无障碍支持”)。

Like all PDF documents, a Tagged PDF document consists of a sequence of self-contained pages, each of which shall be described by one or more page content streams (including any subsidiary streams such as form XObjects and annotation appearances). Tagged PDF defines some further rules for organizing and marking content streams so that additional information can be derived from them:

  • Distinguishing between the author’s original content and artifacts of the layout process (see 14.8.2.2, “Real Content and Artifacts”).
  • Specifying a content order to guide the layout process if the conforming reader reflows the page content (see 14.8.2.3, “Page Content Order”).
  • Representing text in a form from which a Unicode representation and information about font characteristics can be unambiguously derived (see 14.8.2.4, “Extraction of Character Properties”).
  • Representing word breaks unambiguously (see 14.8.2.5, “Identifying Word Breaks”).
  • Marking text with information for making it accessible to users with visual impairments (see 14.9, “Accessibility Support”).

14.8.2.2 真实内容与人工制品

14.8.2.2 Real Content and Artifacts

14.8.2.2.1 概述

14.8.2.2.1 General

文档中的图形对象可分为两类:

  • 文档的真实内容 包括代表文档作者最初引入的材料的对象。
  • 人工制品(Artifacts) 是指不属于作者原始内容的图形对象,而是在符合要求的写入过程中,由于分页、布局或其他纯机械流程生成的对象。

注意

人工制品还可用于描述文档中作者使用图形背景以增强视觉体验的区域。在这种情况下,该背景对于理解内容并非必需。

文档的逻辑结构包含所有构成真实内容的图形对象,并描述这些对象之间的关系。它不包括单纯作为版面布局和生产过程副产品的图形对象。

文档的真实内容不仅包括页面内容流和附属的表单 XObject,还包括符合以下所有条件的关联注释(Annotation):

  • 注释具有包含正常(N)外观的外观流(参见 12.5.5“外观流”)。
  • 注释的 Hidden 标志未被设置(参见 12.5.3“注释标志”)。
  • 注释包含在文档的逻辑结构中(参见 14.7“逻辑结构”)。

The graphics objects in a document can be divided into two classes:

  • The real content of a document comprises objects representing material originally introduced by the document’s author.
  • Artifacts are graphics objects that are not part of the author’s original content but rather are generated by the conforming writer in the course of pagination, layout, or other strictly mechanical processes.

NOTE

Artifacts may also be used to describe areas of the document where the author uses a graphical background, with the goal of enhancing the visual experience. In such a case, the background is not required for understanding the content.

The document’s logical structure encompasses all graphics objects making up the real content and describes how those objects relate to one another. It does not include graphics objects that are mere artifacts of the layout and production process.

A document’s real content includes not only the page content stream and subsidiary form XObjects but also associated annotations that meet all of the following conditions:

  • The annotation has an appearance stream (see 12.5.5, “Appearance Streams”) containing a normal (N) appearance.
  • The annotation’s Hidden flag (see 12.5.3, “Annotation Flags”) is not set.
  • The annotation is included in the document’s logical structure (see 14.7, “Logical Structure”).

14.8.2.2.2 人工制品的规范

14.8.2.2.2 Specification of Artifacts

应将人工制品置于带有标签 Artifact 的标记内容序列中,以明确将其与真实内容区分开来:

/Artifact                /Artifact propertyList
    BMC                       BDC
    …           or            …
    EMC                       EMC

第一种形式应用于标识通用人工制品(artifact);第二种形式适用于具有关联属性列表的人工制品。表 330 展示了可以包含在此类属性列表中的属性。

注意 1

为了帮助文本重排(reflow),应尽可能使用带属性列表的人工制品。缺少指定边界框(bounding box)的人工制品在重排过程中可能会被丢弃。

表 330 – 人工制品的属性列表条目
键(Key) | 类型(Type) | 值(Value)
Type | name | (可选)该属性列表所描述的人工制品类型;如果存在,则应为 Pagination(分页)、Layout(布局)、Page(页面)或(PDF 1.7)Background(背景)之一。
BBox | rectangle | (可选;对于背景人工制品为必填)一个包含四个数字的数组,使用默认用户空间单位,分别给出人工制品边界框的左、下、右和上边缘的坐标(即完全包围其可见范围的矩形)。
Attached | array | (可选;仅适用于分页和全页面背景人工制品)一个名称对象数组,包含 Top(顶部)、Bottom(底部)、Left(左侧)和 Right(右侧)中的一个到四个名称,指定人工制品在逻辑上附加到的页面边缘(如果有)。页面边缘由页面的裁剪框(crop box)定义(参见 14.11.2,“页面边界”)。数组中的名称顺序无关紧要。同时包含 Left 和 Right 或同时包含 Top 和 Bottom 分别表示全宽或全高的人工制品。对于背景人工制品,此条目仅适用于全页面人工制品。非全页面的背景人工制品,其尺寸应由其父结构元素决定。
Subtype | name | (可选;PDF 1.7)人工制品的子类型。此条目应仅在 Type 条目值为 Pagination 时出现。标准值包括 Header(页眉)、Footer(页脚)和 Watermark(水印)。可为此条目指定其他值,但必须符合附录 E 中描述的命名约定。

Type 条目可以指定以下类型的人工制品:

  • 分页人工制品(Pagination artifacts):辅助页面元素,例如页眉和页码。
  • 布局人工制品(Layout artifacts):纯粹的排版或设计元素,例如脚注分隔线或背景屏幕。
  • 页面人工制品(Page artifacts):与文档本身无关的生产辅助标记,例如裁切标记和色标。
  • 背景人工制品(Background artifacts):贯穿整个页面长度、宽度或结构元素整个范围的图像、图案或色块。背景人工制品通常作为内容的背景,显示在内容下方或相邻位置。

背景人工制品可以进一步归类为用于增强用户体验的视觉内容,它位于实际内容之下,除了保持视觉一致性外,不是必需的。

注意 2

例如,正文文本下的彩色背景、图案、混合色或图像。对于黑底白字的文本,黑色背景对于阅读白色文本是必不可少的;但黑色背景本身仅用于增强视觉体验。
然而,“草稿”或其他标识性水印被归类为分页人工制品,因为它并不用于增强视觉体验,而是作为一个贯穿整个文档的标记。
进一步的例子是,图(Figure)不同于背景人工制品,因为如果从图中移除图形对象,将会影响对该图作为一个整体的理解。

符合要求的标记 PDF 阅读器可能对哪些页面内容是相关的有自己的判断。例如,文本转语音引擎在翻页时可能不应朗读页眉或页码。通常,符合要求的阅读器可以执行以下操作:

  • 忽略某些不重要的页面内容(例如特定类型的人工制品)
  • 将某些页面元素视为终端(terminals),即不再进一步分析(例如,为重排目的将插图视为一个整体)
  • 用替代文本替换某个元素(参见 14.9.3,“替代描述”)

注意 3

根据不同目标,符合要求的阅读器可能会做出不同的决定。标记 PDF 的目的并不是规定符合要求的阅读器应当如何操作,而是提供足够的声明性和描述性信息,使其能够做出适当的处理决策。

为了支持符合要求的阅读器向残障用户提供无障碍访问,标记 PDF 文档应使用自然语言规范(Lang)、替代描述(Alt)、替换文本(ActualText)和缩写扩展文本(E)功能(参见 14.9,“无障碍支持”)。

An artifact shall be explicitly distinguished from real content by enclosing it in a marked-content sequence with the tag Artifact:

/Artifact                /Artifact propertyList
    BMC                       BDC
    …           or            …
    EMC                       EMC

The first form shall be used to identify a generic artifact; the second shall be used for those that have an associated property list. Table 330 shows the properties that can be included in such a property list.

NOTE 1

To aid in text reflow, artifacts should be defined with property lists whenever possible. Artifacts lacking a specified bounding box are likely to be discarded during reflow.

Table 330 – Property list entries for artifacts
Key | Type | Value
Type | name | (Optional) The type of artifact that this property list describes; if present, shall be one of the names Pagination, Layout, Page, or (PDF 1.7) Background.
BBox | rectangle | (Optional; required for background artifacts) An array of four numbers in default user space units giving the coordinates of the left, bottom, right, and top edges, respectively, of the artifact’s bounding box (the rectangle that completely encloses its visible extent).
Attached | array | (Optional; pagination and full-page background artifacts only) An array of name objects containing one to four of the names Top, Bottom, Left, and Right, specifying the edges of the page, if any, to which the artifact is logically attached. Page edges shall be defined by the page’s crop box (see 14.11.2, “Page Boundaries”). The ordering of names within the array is immaterial. Including both Left and Right or both Top and Bottom indicates a full-width or full-height artifact, respectively. Use of this entry for background artifacts shall be limited to full-page artifacts. Background artifacts that are not full-page take their dimensions from their parent structural element.
Subtype | name | (Optional; PDF 1.7) The subtype of the artifact. This entry should appear only when the Type entry has a value of Pagination. Standard values are Header, Footer, and Watermark. Additional values may be specified for this entry, provided they comply with the naming conventions described in Annex E.
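As an illustration of the two forms and of the entries in Table 330, the following Python sketch wraps a fragment of content-stream text in an Artifact marked-content sequence. It is not a definitive implementation: the function name `artifact_sequence` and its parameters are hypothetical, the property list is written as an inline dictionary, and the Subtype recommendation ("should appear only") is enforced as an error purely for simplicity.

```python
ARTIFACT_TYPES = {"Pagination", "Layout", "Page", "Background"}
PAGE_EDGES = {"Top", "Bottom", "Left", "Right"}

def artifact_sequence(content, type_=None, bbox=None, attached=None, subtype=None):
    """Wrap content-stream operators in an Artifact marked-content sequence."""
    if type_ is not None and type_ not in ARTIFACT_TYPES:
        raise ValueError("unknown artifact type")
    if type_ == "Background" and bbox is None:
        raise ValueError("BBox is required for background artifacts")
    if attached is not None:
        if type_ not in ("Pagination", "Background"):
            raise ValueError("Attached applies to pagination/background artifacts")
        if not 1 <= len(attached) <= 4 or not set(attached) <= PAGE_EDGES:
            raise ValueError("Attached takes one to four page-edge names")
    if subtype is not None and type_ != "Pagination":
        raise ValueError("Subtype is meant for pagination artifacts")

    entries = []
    if type_ is not None:
        entries.append(f"/Type /{type_}")
    if bbox is not None:
        entries.append("/BBox [" + " ".join(f"{v:g}" for v in bbox) + "]")
    if attached is not None:
        entries.append("/Attached [" + " ".join(f"/{n}" for n in attached) + "]")
    if subtype is not None:
        entries.append(f"/Subtype /{subtype}")

    if not entries:
        # First form: generic artifact, no property list.
        return f"/Artifact BMC\n{content}\nEMC"
    # Second form: artifact with an associated property list.
    return f"/Artifact << {' '.join(entries)} >> BDC\n{content}\nEMC"
```

For example, `artifact_sequence("q Q", type_="Pagination", bbox=(0, 750, 612, 792), attached=["Top"], subtype="Header")` produces a BDC sequence describing a running head attached to the top edge of the page.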

The following types of artifacts can be specified by the Type entry:

  • Pagination artifacts. Ancillary page features such as running heads and folios (page numbers).
  • Layout artifacts. Purely cosmetic typographical or design elements such as footnote rules or background screens.
  • Page artifacts. Production aids extraneous to the document itself, such as cut marks and colour bars.
  • Background artifacts. Images, patterns or coloured blocks that either run the entire length and/or width of the page or the entire dimensions of a structural element. Background artifacts typically serve as a background for content shown either on top of or placed adjacent to that background.

A background artifact can further be classified as visual content that serves to enhance the user experience, that lies under the actual content, and that is not required except to retain visual fidelity.

NOTE 2

Examples of this include a coloured background, pattern, blend, or image that resides under main body text. In the case of white text on a black background, the black background is absolutely necessary to be able to read the white text; however, the background itself is merely there to enhance the visual experience. However, a draft or other identifying watermark is classified as a pagination artifact because it does not serve to enhance the experience; rather, it serves as a running artifact typically used on every page in the document. As a further example, a Figure differs from a background artifact in that removal of the graphics objects from a Figure would detract from the overall contextual understanding of the Figure as an entity.

Tagged conforming readers may have their own ideas about what page content to consider relevant. A text-to-speech engine, for instance, probably should not speak running heads or page numbers when the page is turned. In general, conforming readers can do any of the following:

  • Disregard elements of page content (for example, specific types of artifacts) that are not of interest
  • Treat some page elements as terminals that are not to be examined further (for example, to treat an illustration as a unit for reflow purposes)
  • Replace an element with alternate text (see 14.9.3, “Alternate Descriptions”)

NOTE 3

Depending on their goals, different conforming readers can make different decisions in this regard. The purpose of Tagged PDF is not to prescribe what the conforming reader should do, but to provide sufficient declarative and descriptive information to allow it to make appropriate choices about how to process the content.
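One such choice, disregarding artifacts during text extraction, can be sketched as follows. The sketch assumes the content stream has already been tokenized into a list of `(operator, operands)` tuples, with the marked-content tag as the first operand of BMC and BDC. That representation and the name `strip_artifacts` are assumptions for illustration only.

```python
def strip_artifacts(ops):
    """Drop every operator enclosed in an Artifact marked-content sequence."""
    kept = []
    stack = []  # one flag per open marked-content sequence: is it an Artifact?
    for operator, operands in ops:
        if operator in ("BMC", "BDC"):
            stack.append(bool(operands) and operands[0] == "Artifact")
            if not any(stack):
                kept.append((operator, operands))
        elif operator == "EMC":
            inside_artifact = any(stack)
            if stack:
                stack.pop()
            if not inside_artifact:
                kept.append((operator, operands))
        elif not any(stack):
            kept.append((operator, operands))
    return kept
```

This is adequate for a text-extraction pass but not for rendering, because graphics-state operators inside a dropped artifact are discarded too.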

To support conforming readers in providing accessibility to users with disabilities, Tagged PDF documents should use the natural language specification (Lang), alternate description (Alt), replacement text (ActualText), and abbreviation expansion text (E) facilities described in 14.9, “Accessibility Support.”

14.8.2.2.3 偶然人工制品

14.8.2.2.3 Incidental Artifacts

除了明确标记为人工制品(artifacts)并从文档的逻辑结构中排除的对象外,页面的连续文本可能还包含一些逻辑上不属于文档真实内容的元素和关系,而仅仅是将内容布局到文档中时产生的附带结果。这些元素可能包括以下内容:

  • 连字(Hyphenation)
    由文本布局引入的人工制品之一是连字符,用于表示单词在行末的附带分割。在标记 PDF(Tagged PDF)中,这种附带的单词分割应当使用软连字符(soft hyphen)表示,Unicode 映射算法(参见 14.8.2.4,“字符属性提取”中的“标记 PDF 中的 Unicode 映射”)将该字符转换为 Unicode 值 U+00AD。
    (该字符不同于普通的硬连字符(hard hyphen),后者的 Unicode 值为 U+002D。)
    标记 PDF 的生成方应明确区分软连字符和硬连字符,以免使用方需要猜测某个字符的类型。

注意 1

在某些语言中,情况会更加复杂:可能存在多个不同的连字符,并且连字可能会改变单词的拼写。请参阅 14.9.4,“替换文本”中的示例。

  • 文本不连续性(Text discontinuities)
    页面中的连续文本(按照页面内容顺序表示,参见 14.8.2.3,“页面内容顺序”)可能会在某些位置发生文本进程的非正常中断。符合要求的阅读器可以通过检查文档的逻辑结构来识别此类不连续性。

注意 2

例如,一个页面可能包含两篇独立文章的开头(参见 12.4.3,“文章”),并且这两篇文章的剩余部分在文档的后续页面继续出现。在页面上,第一篇文章的最后几个单词不应与第二篇文章的开头单词连在一起。

  • 隐藏页面元素(Hidden page elements)
    由于多种原因,文档逻辑内容中的某些元素可能在页面上不可见:它们可能被裁剪(clipped),颜色可能与背景颜色相同,或者可能被其他重叠对象覆盖。

对于标记 PDF,页面内容应被视为包括所有文本和插图的完整内容,无论这些内容在文档显示或打印时是否可见。

注意 3

例如,以前不可见的元素在页面重排(reflow)时可能会变得可见,或者文本转语音(text-to-speech)引擎可能会朗读肉眼不可见的文本。

In addition to objects that are explicitly marked as artifacts and excluded from the document’s logical structure, the running text of a page may contain other elements and relationships that are not logically part of the document’s real content, but merely incidental results of the process of laying out that content into a document. They may include the following elements:

  • Hyphenation. Among the artifacts introduced by text layout is the hyphen marking the incidental division of a word at the end of a line. In Tagged PDF, such an incidental word division shall be represented by a soft hyphen character, which the Unicode mapping algorithm (see “Unicode Mapping in Tagged PDF” in 14.8.2.4, “Extraction of Character Properties”) translates to the Unicode value U+00AD. (This character is distinct from an ordinary hard hyphen, whose Unicode value is U+002D.) The producer of a Tagged PDF document shall distinguish explicitly between soft and hard hyphens so that the consumer does not have to guess which type a given character represents.

NOTE 1

In some languages, the situation is more complicated: there may be multiple hyphen characters, and hyphenation may change the spelling of words. See the Example in 14.9.4, “Replacement Text.”

  • Text discontinuities. The running text of a page, as expressed in page content order (see 14.8.2.3, “Page Content Order”), may contain places where the normal progression of text suffers a discontinuity. Conforming readers may recognize such discontinuities by examining the document’s logical structure.

NOTE 2

For example, the page may contain the beginnings of two separate articles (see 12.4.3, “Articles”), each of which is continued onto a later page of the document. The last words of the first article appearing on the page should not be run together with the first words of the second article.

  • Hidden page elements. For a variety of reasons, elements of a document’s logical content may be invisible on the page: they may be clipped, their colour may match the background, or they may be obscured by other, overlapping objects. For the purposes of Tagged PDF, page content shall be considered to include all text and illustrations in their entirety, regardless of whether they are visible when the document is displayed or printed.

NOTE 3

For example, formerly invisible elements may become visible when a page is reflowed, or a text-to-speech engine may choose to speak text that is not visible to a sighted reader.
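The hyphenation rule above makes it straightforward for a consumer to rejoin words divided at line ends. The Python sketch below assumes the text of each line has already been mapped to Unicode; the name `join_lines` is hypothetical. A soft hyphen (U+00AD) at the end of a line is removed and the word is rejoined, while hard hyphens (U+002D) are left untouched.

```python
SOFT_HYPHEN = "\u00ad"

def join_lines(lines):
    """Rejoin extracted lines of running text into one string."""
    out = ""
    for line in lines:
        line = line.strip()
        if out.endswith(SOFT_HYPHEN):
            # Incidental word division: drop the soft hyphen, rejoin the word.
            out = out[:-1] + line
        elif out:
            out += " " + line
        else:
            out = line
    return out
```

Because the producer distinguishes the two characters explicitly, no dictionary lookup or guessing is needed. Hard hyphens at a line end are not treated specially in this sketch.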

14.8.2.3 页面内容顺序

14.8.2.3 Page Content Order

14.8.2.3.1 概述

14.8.2.3.1 General

逐页处理页面内容时,一些符合 Tagged PDF 规范的阅读器可能会选择按照页面内容顺序(page content order)处理元素,而不是按照逻辑结构顺序(logical structure order)处理。页面内容顺序由页面内容流(content stream)中图形对象的排列顺序以及文本对象中字符的排列顺序决定,而逻辑结构顺序则是通过深度优先遍历(depth-first traversal)页面的逻辑结构层次(logical structure hierarchy)确定的。这两种顺序在逻辑上是独立的,可能相同,也可能不同。特别是,页面中包含的任何人工制品(artifacts)必须包含在页面内容顺序中,但不应包含在逻辑结构顺序中,因为它们不属于文档的逻辑结构。符合规范的生成方(conforming writer)负责为每个页面建立合适的页面内容顺序,并为整个文档建立合适的逻辑结构层次。

由于页面内容顺序的主要要求是使重排(reflow)能够保持正确的阅读顺序,因此通常(对于西方书写系统)应从上到下(在多栏布局中,还应逐栏)排列,并且人工制品应保持正确的相对位置。通常,同一篇文章出现在某一页上的所有部分应当保持在一起,即使这篇文章的内容分散在页面的不同位置。插图(illustrations)或脚注(footnotes)可以与相关文章的正文交错出现,也可以出现在文章内容的结尾(对于脚注,还可以出现在整个页面逻辑内容的末尾)。

在某些情况下,符合规范的生成方可能无法确定文档部分内容的正确页面内容顺序。在这种情况下,可以使用可疑标签(tag suspects,PDF 1.6)。符合规范的生成方应使用带有 TagSuspect 标签的标记内容(marked content)(参见 14.6,“标记内容”)来标识可疑内容,如下面的示例所示。该标记内容应当具有属性字典(properties dictionary),其中包含一个名称为 TagSuspect、值为 Ordering 的条目,表示所包含内容的排序不符合 Tagged PDF 规范。
注意

例如,当内容从其他应用程序提取时,或者文本输出存在歧义缺失信息时,可能会发生这种情况。

示例

/TagSuspect <</TagSuspect /Ordering>>
    BDC
    ....            % 存在问题的页面内容
    EMC

包含可疑标签(tag suspects)的文档必须在标记信息字典(mark information dictionary)(参见表 321)中包含 Suspects 条目,并且该条目的值必须为 true。

When dealing with material on a page-by-page basis, some Tagged PDF conforming readers may choose to process elements in page content order, determined by the sequencing of graphics objects within a page’s content stream and of characters within a text object, rather than in the logical structure order defined by a depth-first traversal of the page’s logical structure hierarchy. The two orderings are logically distinct and may or may not coincide. In particular, any artifacts the page may contain shall be included in the page content order but not in the logical structure order, since they are not considered part of the document’s logical structure. The conforming writer is responsible for establishing both an appropriate page content order for each page and an appropriate logical structure hierarchy for the entire document.

Because the primary requirement for page content order is to enable reflow to maintain elements in proper reading sequence, it should normally (for Western writing systems) proceed from top to bottom (and, in a multiple-column layout, from column to column), with artifacts in their correct relative places. In general, all parts of an article that appear on a given page should be kept together, even if the article flows to scattered locations on the page. Illustrations or footnotes may be interspersed with the text of the associated article or may appear at the end of its content (or, in the case of footnotes, at the end of the entire page’s logical content).

In some situations, a conforming writer may be unable to determine the correct page content order for part of a document’s contents. In such cases, tag suspects (PDF 1.6) can be used. The conforming writer shall identify suspect content by using marked content (see 14.6, “Marked Content”) with a tag of TagSuspect, as shown in the Example below. The marked content shall have a properties dictionary with an entry whose name is TagSuspect and whose value is Ordering, which indicates that the ordering of the enclosed marked content does not meet Tagged PDF specifications.

NOTE

This can occur, for example, if content was extracted from another application, or if there are ambiguities or missing information in text output.

EXAMPLE

/TagSuspect <</TagSuspect /Ordering>>
    BDC
    ....            % Problem page contents
    EMC

Documents containing tag suspects shall contain a Suspects entry with a value of true in the mark information dictionary (see Table 321).
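The consistency requirement between tag suspects and the Suspects entry can be sketched as a simple check. As an assumption for illustration, each page's content stream is modelled as a list of `(operator, operands)` tuples and the mark information dictionary as a plain dict. The function names are hypothetical.

```python
def has_tag_suspects(ops):
    """True if the operator list opens any TagSuspect marked-content sequence."""
    return any(
        operator == "BDC" and operands and operands[0] == "TagSuspect"
        for operator, operands in ops
    )

def suspects_entry_consistent(mark_info, pages_ops):
    """True unless some page contains tag suspects while Suspects is not true."""
    found = any(has_tag_suspects(ops) for ops in pages_ops)
    return (not found) or mark_info.get("Suspects", False) is True
```

A validator might run `suspects_entry_consistent` once per document, after tokenizing every page's content stream.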

14.8.2.3.2 注释的排序

14.8.2.3.2 Sequencing of Annotations

页面关联的注释(annotations)并不会直接嵌入页面的内容流(content stream)中,而是存放在页面对象(page object)的 Annots 数组中(参见 7.7.3.3,“页面对象”)。因此,注释在页面内容顺序中的正确位置无法直接从内容流中确定,而应从文档的逻辑结构中推断。

页面内容(即标记内容序列(marked-content sequences))和注释(annotations)都可以作为内容项(content items),由结构元素(structure elements)引用(参见 14.7.4,“结构内容”)。

  • Annot(PDF 1.5)、Link 或 Form 类型的结构元素(参见 14.8.4.4,“内联级结构元素”和 14.8.4.5,“插图元素”)显式指定了标记内容序列与对应注释之间的关联。
  • 在其他情况下,若某个对应于注释的结构元素在逻辑结构顺序中紧邻(前/后)另一个对应于标记内容序列的结构元素,则该注释在页面内容顺序中被视为位于该标记内容序列之前/之后。

注意

在必要时,符合规范的生成方(conforming writer)可以引入一个空的标记内容序列,仅用于充当结构元素,以便在页面内容顺序中正确定位相邻的注释。

Annotations associated with a page are not interleaved within the page’s content stream but shall be placed in the Annots array in its page object (see 7.7.3.3, “Page Objects”). Consequently, the correct position of an annotation in the page content order is not readily apparent but shall be determined from the document’s logical structure.

Both page content (marked-content sequences) and annotations may be treated as content items that are referenced from structure elements (see 14.7.4, “Structure Content”). Structure elements of type Annot (PDF 1.5), Link, or Form (see 14.8.4.4, “Inline-Level Structure Elements,” and 14.8.4.5, “Illustration Elements”) explicitly specify the association between a marked-content sequence and a corresponding annotation. In other cases, if the structure element corresponding to an annotation immediately precedes or follows (in the logical structure order) a structure element corresponding to a marked-content sequence, the annotation is considered to precede or follow the marked-content sequence, respectively, in the page content order.

NOTE

If necessary, a conforming writer may introduce an empty marked-content sequence solely to serve as a structure element for the purpose of positioning adjacent annotations in the page content order.
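The inference described above can be sketched with a depth-first traversal. In this simplified model, which is an assumption and not the actual PDF object layout, a structure element is a dict with a K list, and a content item is either `("mcid", n)` for a marked-content sequence or `("annot", name)` for an annotation. The sketch works on adjacent content items rather than adjacent structure elements, which is a simplification of the rule.

```python
def content_items(elem):
    """Yield content items in logical structure order (depth-first traversal)."""
    for kid in elem["K"]:
        if isinstance(kid, dict):      # nested structure element
            yield from content_items(kid)
        else:                          # content item tuple
            yield kid

def annotation_neighbours(root, annot_name):
    """Return the marked-content items immediately before and after an annotation."""
    items = list(content_items(root))
    i = items.index(("annot", annot_name))
    before = items[i - 1] if i > 0 and items[i - 1][0] == "mcid" else None
    after = items[i + 1] if i + 1 < len(items) and items[i + 1][0] == "mcid" else None
    return before, after
```

A reader could then place the annotation between the two returned marked-content sequences in the page content order.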

14.8.2.3.3 逆序显示字符串

14.8.2.3.3 Reverse-Order Show Strings

注意 1

在从右向左书写的文字系统(例如阿拉伯语或希伯来语)中,人们可能会期望字体中字形(glyphs)的原点位于右下角,并且它们的宽度(向右的水平位移)为负值。但由于各种技术和历史原因,许多此类字体仍然遵循西方文字系统的惯例:字形原点位于左下角,宽度为正值,如图 39 所示。因此,要在这些从右到左的书写系统中显示文本,需要采用以下两种方法之一:

  • 逐个字形定位(这既繁琐又成本高)。
  • 使用 show 字符串(参见 9.2,“字体的组织与使用”),但字符代码按相反顺序排列。

当使用第二种方法时,字符代码在页面内容顺序中的正确顺序与它们在 show 字符串中的顺序相反。

标记内容标签(marked-content tag)ReversedChars 用于通知符合规范的阅读器(conforming reader):标记内容序列中的 show 字符串所包含的字符顺序与页面内容顺序相反。如果该标记内容序列包含多个 show 字符串,那么仅应反转每个 show 字符串中的字符,而这些字符串本身仍保持自然的阅读顺序。


示例

下面的内容:

/ReversedChars
    BMC
        ( olleH ) Tj
        -200 0 Td
        ( . dlrow ) Tj
    EMC

表示的文本为:

Hello world .

show 字符串可以在开头或结尾包含 SPACE(U+0020)字符以指示单词断点(参见 14.8.2.5,“识别单词断点”),但其内部不得包含 SPACE。

注意 2

这一限制并不严重,因为 SPACE 提供了在没有可见影响的情况下重新对齐排版的机会,同时它还有助于限制文字处理类阅读器(word-processing conforming readers)执行字符顺序反转的范围。

NOTE 1

In writing systems that are read from right to left (such as Arabic or Hebrew), one might expect that the glyphs in a font would have their origins at the lower right and their widths (rightward horizontal displacements) specified as negative. For various technical and historical reasons, however, many such fonts follow the same conventions as those designed for Western writing systems, with glyph origins at the lower left and positive widths, as shown in Figure 39. Consequently, showing text in such right-to-left writing systems requires either positioning each glyph individually (which is tedious and costly) or representing text with show strings (see 9.2, “Organization and Use of Fonts”) whose character codes are given in reverse order. When the latter method is used, the character codes’ correct page content order is the reverse of their order within the show string.

The marked-content tag ReversedChars informs the conforming reader that show strings within a marked-content sequence contain characters in the reverse of page content order. If the sequence encompasses multiple show strings, only the individual characters within each string shall be reversed; the strings themselves shall be in natural reading order.

EXAMPLE

The sequence

/ReversedChars
    BMC
        ( olleH ) Tj
        -200 0 Td
        ( . dlrow ) Tj
    EMC

represents the text

Hello world .

The show strings may have a SPACE (U+0020) character at the beginning or end to indicate a word break (see 14.8.2.5, “Identifying Word Breaks”) but shall not contain interior SPACEs.

NOTE 2

This limitation is not serious, since a SPACE provides an opportunity to realign the typography without visible effect, and it serves the valuable purpose of limiting the scope of reversals for word-processing conforming readers.
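How a consumer might recover page content order from a ReversedChars sequence can be sketched in a few lines of Python. The sketch assumes the show strings have already been decoded to Unicode text and are passed in the order they occur in the content stream. The name `reversed_chars_text` is hypothetical, and collapsing runs of spaces is a normalization chosen here for readability, not something the rule itself requires.

```python
def reversed_chars_text(show_strings):
    """Reverse the characters of each show string; keep the strings in order."""
    text = "".join(s[::-1] for s in show_strings)
    # Collapse the leading/trailing SPACEs that mark word breaks.
    return " ".join(text.split())
```

Applied to the two show strings of the Example above, `reversed_chars_text([" olleH ", " . dlrow "])` yields `Hello world .`.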

14.8.2.4 字符属性的提取

14.8.2.4 Extraction of Character Properties

14.8.2.4.1 概述

14.8.2.4.1 General

Tagged PDF 允许将字符代码(character codes)无歧义地转换为 Unicode 值,以表示文本的信息内容。针对该转换存在多种方法,Tagged PDF 文档至少应符合其中一种(参见 14.8.2.4,“字符属性的提取”中的“Tagged PDF 中的 Unicode 映射”)。此外,Tagged PDF 还允许推导出相关字体的一些特性(参见 14.8.2.4,“字符属性的提取”中的“字体特性”)。

注意

这些 Unicode 值和字体特性可用于剪切和粘贴编辑(cut-and-paste editing)、文本搜索(searching)、文本转语音转换(text-to-speech conversion)以及导出至其他应用程序或文件格式(exporting to other applications or file formats)等操作。

Tagged PDF enables character codes to be unambiguously converted to Unicode values representing the information content of the text. There are several methods for doing this; a Tagged PDF document shall conform to at least one of them (see “Unicode Mapping in Tagged PDF” in 14.8.2.4, “Extraction of Character Properties”). In addition, Tagged PDF enables some characteristics of the associated fonts to be deduced (see “Font Characteristics” in 14.8.2.4, “Extraction of Character Properties”).

NOTE

These Unicode values and font characteristics can then be used for such operations as cut-and-paste editing, searching, text-to-speech conversion, and exporting to other applications or file formats.

14.8.2.4.2 标签化 PDF 中的 Unicode 映射

14.8.2.4.2 Unicode Mapping in Tagged PDF

Tagged PDF 要求文档中的每个字符代码(character code)都能够映射到相应的 Unicode 值。

注意 1

Unicode 为世界上大多数语言和书写系统所使用的字符定义了标量值(scalar values),并提供了一个专用使用区(private use area),用于存放特定应用的字符。有关 Unicode 的详细信息可参考 Unicode 联盟(Unicode Consortium)发布的《Unicode 标准》(Unicode Standard)(详见参考文献)。

字符代码到 Unicode 值的映射方法详见 9.10.2,“字符代码到 Unicode 值的映射”。合规写入器(conforming writer)应确保 PDF 文件包含足够的信息,以便按照该部分所述的方法之一将所有字符代码映射到 Unicode。

注意 2

在结构元素字典(structure element dictionary)或标记内容属性列表(marked-content property list)中指定的 Alt、ActualText 或 E 条目(参见 14.9.3,“替代描述”、14.9.4,“替换文本”和 14.9.5,“缩写和首字母缩略词的扩展”)可能会影响某些合规阅读器(conforming readers)实际使用的字符流。例如,一些合规阅读器可能会选择使用 Alt 或 ActualText 的值,而忽略与该结构元素及其子元素关联的所有文本和其他内容。

注意 3

某些 Tagged PDF 的应用场景需要使用并非所有字体都提供的字符,例如软连字符(soft hyphen)(参见 14.8.2.2.3,“偶然人工制品”)。此类字符可通过以下两种方式之一表示:

  • 将字符添加到字体的编码(encoding)或 CMap 中,并使用 ToUnicode 将其映射到适当的 Unicode 值
  • 在相关结构元素的 ActualText 条目中提供替代字符

Tagged PDF requires that every character code in a document can be mapped to a corresponding Unicode value.

NOTE 1

Unicode defines scalar values for most of the characters used in the world’s languages and writing systems, as well as providing a private use area for application-specific characters. Information about Unicode can be found in the Unicode Standard, by the Unicode Consortium (see the Bibliography).

The methods for mapping a character code to a Unicode value are described in 9.10.2, “Mapping Character Codes to Unicode Values.” A conforming writer shall ensure that the PDF file contains enough information to map all character codes to Unicode by one of the methods described there.

NOTE 2

An Alt, ActualText, or E entry specified in a structure element dictionary or a marked-content property list (see 14.9.3, “Alternate Descriptions,” 14.9.4, “Replacement Text,” and 14.9.5, “Expansion of Abbreviations and Acronyms”) may affect the character stream that some conforming readers actually use. For example, some conforming readers may choose to use the Alt or ActualText value and ignore all text and other content associated with the structure element and its descendants.

NOTE 3

Some uses of Tagged PDF require characters that may not be available in all fonts, such as the soft hyphen (see 14.8.2.2.3, “Incidental Artifacts”). Such characters may be represented either by adding them to the font’s encoding or CMap and using ToUnicode to map them to appropriate Unicode values, or by using an ActualText entry in the associated structure element to provide substitute characters.

14.8.2.4.3 字体特征

14.8.2.4.3 Font Characteristics

除了 Unicode 值之外,内容流(content stream)中的每个字符代码(character code)还具有一组关联的字体特性(font characteristics)。这些特性不会在 PDF 文件中显式指定,而是由合规阅读器(conforming reader)根据显示该字符时文本状态(text state)中所设置字体的字体描述符(font descriptor)推导得出。

注意

这些字体特性在将文本导出到可用字体范围有限的其他应用程序或文件格式时非常有用。

表 331 列出了一组常见的字体特性,对应于 CSS 和 XSL 中使用的字体特性;W3C 文档 Extensible Stylesheet Language (XSL) 1.0 提供了更多相关信息(详见参考文献)。每个字体特性均应从字体描述符(font descriptor)的 Flags 条目中的信息推导得出(见 9.8.2,“字体描述符标志”)。

表 331 – 字体特性的推导
特性(Characteristic) | 类型(Type) | 推导方式(Derivation)
Serifed(衬线体) | boolean | 字体描述符(font descriptor)Flags 条目中 Serif 标志的值
Proportional(比例字体) | boolean | 字体描述符 Flags 条目中 FixedPitch 标志的补值
Italic(斜体) | boolean | 字体描述符 Flags 条目中 Italic 标志的值
Smallcap(小型大写) | boolean | 字体描述符 Flags 条目中 SmallCap 标志的值


表中列出的字体特性仅适用于内容流(content stream)中显示字符串(show strings)所包含的字符代码,不适用于替代描述文本(Alt)、替换文本(ActualText)或缩写扩展文本(E)。

对于标准 14 种 Type 1 字体,字体描述符可能缺失;在这种情况下,应使用这些字体的公认默认值。

PDF 1.5 中的 Tagged PDF 定义了一组更广泛的字体特性,这些特性提供了将 PDF 转换为 RTF、HTML、XML、OEB 等其他文件格式时所需的信息,同时改进了可访问性(accessibility)和表格的重排(reflow)。表 332 列出了这些字体选择属性(font selector attributes),并说明了其值的推导方式。

如果字体描述符中缺少 FontFamily、FontWeight 和 FontStretch 字段,则这些值应由合规阅读器依据字体名称自行推导。

表 332 – 字体选择属性(Font selector attributes)
属性(Attribute) | 描述(Description)
FontFamily | 一个字符串,指定首选字体家族名称。该值来源于字体描述符(font descriptor)中的 FontFamily 条目(参见表 122)。
GenericFontFamily | 一个通用的字体分类,在 FontFamily 未找到时使用。该值根据字体描述符的 Flags 条目推导如下:Serif(衬线体):Serif 标志被设置,且 FixedPitch 和 Script 标志均未被设置;SansSerif(无衬线体):FixedPitch、Script 和 Serif 标志均未被设置;Cursive(手写体):Script 标志被设置,且 FixedPitch 标志未被设置;Monospace(等宽字体):FixedPitch 标志被设置。注意:无法推导 Decorative(装饰字体)和 Symbol(符号字体)这两种值。
FontSize | 字体大小:一个正数,指定字体的高度,以点(points)为单位。该值从当前文本矩阵(text matrix)的 a、b、c、d 字段推导而来。
FontStretch | 字体的拉伸值。该值来源于字体描述符中的 FontStretch(参见表 122)。
FontStyle | 字体的倾斜(italicization)属性。如果字体描述符的 Flags 字段中 Italic 标志被设置,则其值为 Italic;否则,其值为 Normal。
FontVariant | 字体的小型大写(small-caps)属性。如果字体描述符的 Flags 字段中 SmallCap 标志被设置,则其值为 SmallCaps;否则,其值为 Normal。
FontWeight | 字体的粗细(厚度)属性。该值来源于字体描述符中的 FontWeight(参见表 122)。注意:ForceBold 标志和 StemV 字段不应用于设置该属性。

In addition to a Unicode value, each character code in a content stream has an associated set of font characteristics. These characteristics are not specified explicitly in the PDF file. Instead, the conforming reader derives the characteristics from the font descriptor for the font that is set in the text state at the time the character is shown.

NOTE

These characteristics are useful when exporting text to another application or file format that has a limited repertoire of available fonts.

Table 331 lists a common set of font characteristics corresponding to those used in CSS and XSL; the W3C document Extensible Stylesheet Language (XSL) 1.0 provides more information (see the Bibliography). Each of the characteristics shall be derived from information available in the font descriptor’s Flags entry (see 9.8.2, “Font Descriptor Flags”).

Table 331 – Derivation of font characteristics
Characteristic Type Derivation
Serifed boolean The value of the Serif flag in the font descriptor’s Flags entry
Proportional boolean The complement of the FixedPitch flag in the font descriptor’s Flags entry
Italic boolean The value of the Italic flag in the font descriptor’s Flags entry
Smallcap boolean The value of the SmallCap flag in the font descriptor’s Flags entry

The characteristics shown in the table apply only to character codes contained in show strings within content streams. They do not exist for alternate description text (Alt), replacement text (ActualText), or abbreviation expansion text (E).

For the standard 14 Type 1 fonts, the font descriptor may be missing; the well-known values for those fonts shall be used.

Tagged PDF in PDF 1.5 defines a wider set of font characteristics, which provide information needed when converting PDF to other file formats such as RTF, HTML, XML, and OEB, and also improve accessibility and reflow of tables. Table 332 lists these font selector attributes and shows how their values shall be derived.

If the FontFamily, FontWeight and FontStretch fields are not present in the font descriptor, these values shall be derived from the font name in a manner of the conforming reader’s choosing.

Table 332 – Font selector attributes
Attribute Description
FontFamily A string specifying the preferred font family name. Derived from the FontFamily entry in the font descriptor (see Table 122).
GenericFontFamily A general font classification, used if FontFamily is not found. Derived from the font descriptor’s Flags entry as follows:

Serif   Chosen if the Serif flag is set and the FixedPitch and Script flags are not set
SansSerif   Chosen if the FixedPitch, Script and Serif flags are all not set
Cursive   Chosen if the Script flag is set and the FixedPitch flag is not set
Monospace   Chosen if the FixedPitch flag is set
NOTE   The values Decorative and Symbol cannot be derived
FontSize The size of the font: a positive number specifying the height of the typeface in points. Derived from the a, b, c, and d fields of the current text matrix.
FontStretch The stretch value of the font. Derived from FontStretch in the font descriptor (see Table 122).
FontStyle The italicization value of the font. It shall be Italic if the Italic flag is set in the Flags field of the font descriptor; otherwise, it shall be Normal.
FontVariant The small-caps value of the font. It shall be SmallCaps if the SmallCap flag is set in the Flags field of the font descriptor; otherwise, it shall be Normal.
FontWeight The weight (thickness) value of the font. Derived from FontWeight in the font descriptor (see Table 122).

The ForceBold flag and the StemV field should not be used to set this attribute.
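The derivations in Tables 331 and 332 reduce to bit tests on the Flags entry. The following is a minimal sketch, not a normative implementation: the bit positions are taken from 9.8.2, "Font Descriptor Flags" (bit 1 being the lowest-order bit), and the function names are hypothetical.

```python
# Bit masks for the font descriptor's Flags entry. The positions come from
# 9.8.2 (an assumption relative to this sub-clause); bit 1 is the lowest-order bit.
FIXED_PITCH = 1 << 0   # bit 1
SERIF = 1 << 1         # bit 2
SCRIPT = 1 << 3        # bit 4
ITALIC = 1 << 6        # bit 7
SMALL_CAP = 1 << 17    # bit 18


def font_characteristics(flags: int) -> dict:
    """Return the four Table 331 booleans for a Flags value."""
    return {
        "Serifed": bool(flags & SERIF),
        "Proportional": not (flags & FIXED_PITCH),  # complement of FixedPitch
        "Italic": bool(flags & ITALIC),
        "Smallcap": bool(flags & SMALL_CAP),
    }


def generic_font_family(flags: int) -> str:
    """Return GenericFontFamily following the Table 332 conditions.

    Testing FixedPitch first, then Script, then Serif reproduces the
    'flag set and the others not set' wording of the table.
    """
    if flags & FIXED_PITCH:
        return "Monospace"
    if flags & SCRIPT:
        return "Cursive"
    if flags & SERIF:
        return "Serif"
    return "SansSerif"


def font_style(flags: int) -> str:
    """Return FontStyle: Italic if the Italic flag is set, otherwise Normal."""
    return "Italic" if flags & ITALIC else "Normal"


def font_variant(flags: int) -> str:
    """Return FontVariant: SmallCaps if the SmallCap flag is set, otherwise Normal."""
    return "SmallCaps" if flags & SMALL_CAP else "Normal"
```

FontFamily, FontWeight, and FontStretch are not covered here because they come from dedicated font descriptor entries (or, if those are absent, from the font name in a manner of the reader's choosing) rather than from Flags.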

14.8.2.5 确定单词断点

14.8.2.5 Identifying Word Breaks

注意 1

文档的文本流不仅定义了页面文本中的字符,还定义了单词。与字符不同,单词的概念没有精确定义,而是依赖于文本处理的目的。换行工具需要确定在哪里可以将连续文本拆分成行;文本转语音引擎需要识别需要被朗读的单词;拼写检查器和其他应用程序也有各自的单词定义。对于标记化的 PDF 文档来说,识别文本流中的单词并不需要遵循一个单一且明确的定义来满足所有客户端的需求。重要的是,每个客户端应有足够的信息来自己做出这一判断。

标准兼容的阅读器可以通过顺序检查 Unicode 字符流(可能还包括由 ActualText(参见14.9.4,替换文本)指定的替换文本)来查找单词。为此,文本在纯文本表示中分隔单词的空格字符应当出现在标记化 PDF 文本表示中。

注意 2

标准兼容的阅读器不需要根据诸如字形在页面上的定位、字体变化或字形大小等信息来推测单词的分隔。

注意 3

什么构成一个单词的识别与文本如何在显示字符串中分组无关。显示字符串的划分没有语义意义。特别是,即使单词的断开正好发生在一个显示字符串的结尾,依然需要一个空格(U+0020)或其他能断开的字符。

注意 4

一些标准兼容的阅读器可能会通过简单地在每个空格字符处分隔单词来识别单词。其他阅读器可能稍微复杂一些,会将连字符或长破折号等标点符号视为单词分隔符。还有一些阅读器可能会使用类似于 Unicode 标准附录 #29(文本边界,来自 Unicode 联盟)中的算法来识别可能的换行机会(参见参考书目)。

NOTE 1

A document’s text stream defines not only the characters in a page’s text but also the words. Unlike a character, the notion of a word is not precisely defined but depends on the purpose for which the text is being processed. A reflow tool needs to determine where it can break the running text into lines; a text-to-speech engine needs to identify the words to be vocalized; spelling checkers and other applications all have their own ideas of what constitutes a word. It is not important for a Tagged PDF document to identify the words within the text stream according to a single, unambiguous definition that satisfies all of these clients. What is important is that there be enough information available for each client to make that determination for itself.

A conforming reader of a Tagged PDF document may find words by sequentially examining the Unicode character stream, perhaps augmented by replacement text specified with ActualText (see 14.9.4, “Replacement Text”). For this purpose the spacing characters that would be present to separate words in a pure text representation shall be present in the Tagged PDF representation of the text.

NOTE 2

The conforming reader does not need to guess about word breaks based on information such as glyph positioning on the page, font changes, or glyph sizes.

NOTE 3

The identification of what constitutes a word is unrelated to how the text happens to be grouped into show strings. The division into show strings has no semantic significance. In particular, a SPACE (U+0020) or other word-breaking character is still needed even if a word break happens to fall at the end of a show string.

NOTE 4

Some conforming readers may identify words by simply separating them at every SPACE character. Others may be slightly more sophisticated and treat punctuation marks such as hyphens or em dashes as word separators as well. Still others may identify possible line-break opportunities by using an algorithm similar to the one in Unicode Standard Annex #29, Text Boundaries, available from the Unicode Consortium (see the Bibliography).
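Two of the strategies mentioned in NOTE 4 can be sketched as follows. This is an illustration only: the function names are hypothetical, the input is assumed to be the already-decoded Unicode text of each show string, and the Annex #29 approach is not shown.

```python
import re


def join_show_strings(show_strings: list[str]) -> str:
    """Concatenate show strings without adding separators.

    The division into show strings has no semantic significance, so any
    word break must already be present as a character in the text.
    """
    return "".join(show_strings)


def words_by_space(text: str) -> list[str]:
    """Simplest strategy: separate words at every SPACE (U+0020)."""
    return [w for w in text.split(" ") if w]


def words_with_punctuation(text: str) -> list[str]:
    """Slightly richer strategy: also treat hyphens and em dashes (U+2014)
    as word separators."""
    return [w for w in re.split(r"[ \-\u2014]+", text) if w]


# The first show string ends with an explicit SPACE, so the word break
# survives concatenation.
text = join_show_strings(["well-known ", "fonts"])
print(words_by_space(text))          # ['well-known', 'fonts']
print(words_with_punctuation(text))  # ['well', 'known', 'fonts']
```

If the trailing SPACE were omitted from the first show string, both functions would see "well-knownfonts", which is why the space character is required even at the end of a show string.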

14.8.3 基础布局模型

14.8.3 Basic Layout Model

基本布局模型从参考区域的概念开始。参考区域是一个矩形区域,用作放置文档内容的框架或指南。一些标准结构属性,例如 StartIndentEndIndent(参见 14.8.5.4.3,“BLSE 的布局属性”),应当从参考区域的边界测量。参考区域没有明确指定,而是根据上下文推断的。通常,感兴趣的参考区域是通用文本布局中的列区域、表格及其组件单元格的外部边界框,以及插图或其他浮动元素的边界框。

注意 1

标记化 PDF 的标准结构类型和属性应当在描述页面上结构元素排列的基本布局模型的上下文中进行解释。该模型旨在捕捉文档基础结构的一般意图,并不一定与创建文档的应用程序实际用于页面布局的模型完全一致。(PDF 内容流指定了确切的外观。)其目标是为标准兼容的阅读器提供足够的信息,以便根据自己的布局模型做出布局决策,同时尽可能保留作者应用程序的意图。

注意 2

标记化 PDF 布局模型类似于 HTML、CSS、XSL 和 RTF 等标记语言中使用的模型,但与这些模型并不完全相同。该模型故意定义得较为宽松,以允许在转换为其他文档格式时对结构元素和属性的解释提供合理的自由度。不同格式之间的布局可能会有所变化。

标准结构类型根据它们在页面布局中所起的作用,分为四大类:

  • 分组元素(参见 14.8.4.2,“分组元素”)将其他元素分组成序列或层次结构,但不直接包含内容,也不直接影响布局。
  • 块级结构元素(BLSEs)(参见 14.8.4.3,“块级结构元素”)描述页面上内容的整体布局,沿 块进展方向 进行排列。
  • 行内级结构元素(ILSEs)(参见 14.8.4.4,“行内级结构元素”)描述在 BLSE 中内容的布局,沿 行内进展方向 进行排列。
  • 插图元素(参见 14.8.4.5,“插图元素”)是按页面内容顺序排列的紧凑内容序列,视为与页面布局相关的整体对象。插图可以视为 BLSE 或 ILSE。

块进展方向行内进展方向 的含义取决于所使用的书写系统,具体由标准属性 WritingMode(参见 14.8.5.4.2,“一般布局属性”)规定。在西方书写系统中,块方向为从上到下,行内方向为从左到右。其他书写系统则使用不同的方向来布局内容。

由于进展方向会根据书写系统的不同而有所变化,页面上的区域边缘和方向是通过与进展顺序无关的术语来标识的,而不是使用诸如 上、下、左、右 这样的常见术语。块布局是从 之前之后 进行的,行内布局则是从开始到结束进行的。因此,在西方书写系统中,参考区域的之前和之后边缘分别位于顶部和底部,开始和结束边缘分别位于左侧和右侧。另一个术语是 偏移方向(上标的偏移方向),指的是与块进展方向相反的方向——即从之后到之前(在西方书写系统中,从底部到顶部)。

BLSEs 应当在参考区域内按块进展顺序堆叠。通常,第一个 BLSE 应当放置在参考区域的之前边缘。随后的 BLSEs 应当堆叠在前一个 BLSE 上,朝向之后边缘进行排列,直到没有更多的 BLSE 可以放入参考区域。如果溢出的 BLSE 允许拆分——例如一个可以在文本行之间拆分的段落——它的一部分可以包含在当前参考区域中,剩余部分则转移到下一个参考区域(无论是在同一页面的其他位置,还是在文档的另一页面)。一旦确定了能够放入参考区域的内容量,可以调整各个 BLSE 的位置,使其偏向参考区域的之前边缘、中间或之后边缘,或者调整 BLSE 内部或之间的间距,以填满参考区域的整个范围。

BLSEs 可以嵌套,子 BLSEs 可以按与参考区域内 BLSE 相同的方式堆叠在父 BLSE 中。除非在少数特定情况下(如 BlockAlign 和 InlineAlign 元素),否则这种 BLSE 嵌套不会导致参考区域的嵌套;所有级别的嵌套 BLSE 都使用一个参考区域。

在 BLSE 内部,子 ILSEs 应当被打包成行。直接内容项——那些作为 BLSE 的直接子项而不是包含在子 ILSE 内的项——应当隐式地视为 ILSEs 进行打包。每一行应当被视为合成的 BLSE,并且应当在父 BLSE 中堆叠。行可以与父区域中的其他 BLSEs 交织。这个构建行的过程类似于在参考区域内堆叠 BLSEs,只不过它是沿行内进展方向进行的,而不是块进展方向:一行应当从包含 BLSE 的开始边缘开始打包 ILSEs,直到达到结束边缘,且该行已满。溢出的 ILSE 可以在语言学确定的或显式标记的断点处进行拆分(例如,单词中的连字符断点),剩余的部分将被转移到下一行。

某些元素的 Placement 属性值会将该元素从正常的堆叠或打包过程中移除,而是允许其漂浮到指定的参考区域或父 BLSE 的边缘;有关更多讨论,请参见 14.8.5.4,“布局属性”中的“常规布局属性”。

每个 BLSE 和 ILSE(包括被隐式处理为 ILSE 的直接内容项)应当关联两个包围矩形:

  • 内容矩形应当从包含内容的形状中推导出来,并定义用于布局任何包含的子元素的边界。
  • 配置矩形包括围绕元素的任何额外边框或间距,影响它与相邻元素以及包围的内容矩形或参考区域之间的位置关系。

这些矩形的定义应当由与结构元素相关的布局属性决定;有关更多讨论,请参见 14.8.5.4.5,“内容和配置矩形”。

The basic layout model begins with the notion of a reference area. This is a rectangular region used as a frame or guide in which to place the document’s content. Some of the standard structure attributes, such as StartIndent and EndIndent (see 14.8.5.4.3, “Layout Attributes for BLSEs”), shall be measured from the boundaries of the reference area. Reference areas are not specified explicitly but are inferred from context. Those of interest are generally the column area or areas in a general text layout, the outer bounding box of a table and those of its component cells, and the bounding box of an illustration or other floating element.

NOTE 1

Tagged PDF’s standard structure types and attributes shall be interpreted in the context of a basic layout model that describes the arrangement of structure elements on the page. This model is designed to capture the general intent of the document’s underlying structure and does not necessarily correspond to the one actually used for page layout by the application creating the document. (The PDF content stream specifies the exact appearance.) The goal is to provide sufficient information for conforming readers to make their own layout decisions while preserving the authoring application’s intent as closely as their own layout models allow.

NOTE 2

The Tagged PDF layout model resembles the ones used in markup languages such as HTML, CSS, XSL, and RTF, but does not correspond exactly to any of them. The model is deliberately defined loosely to allow reasonable latitude in the interpretation of structure elements and attributes when converting to other document formats. Some degree of variation in the resulting layout from one format to another is to be expected.

The standard structure types are divided into four main categories according to the roles they play in page layout:

  • Grouping elements (see 14.8.4.2, “Grouping Elements”) group other elements into sequences or hierarchies but hold no content directly and have no direct effect on layout.
  • Block-level structure elements (BLSEs) (see 14.8.4.3, “Block-Level Structure Elements”) describe the overall layout of content on the page, proceeding in the block-progression direction.
  • Inline-level structure elements (ILSEs) (see 14.8.4.4, “Inline-Level Structure Elements”) describe the layout of content within a BLSE, proceeding in the inline-progression direction.
  • Illustration elements (see 14.8.4.5, “Illustration Elements”) are compact sequences of content, in page content order, that are considered to be unitary objects with respect to page layout. An illustration can be treated as either a BLSE or an ILSE.

The meaning of the terms block-progression direction and inline-progression direction depends on the writing system in use, as specified by the standard attribute WritingMode (see 14.8.5.4.2, “General Layout Attributes”). In Western writing systems, the block direction is from top to bottom and the inline direction is from left to right. Other writing systems use different directions for laying out content.

Because the progression directions can vary depending on the writing system, edges of areas and directions on the page are identified by terms that are neutral with respect to the progression order rather than by familiar terms such as up, down, left, and right. Block layout proceeds from before to after, inline from start to end. Thus, for example, in Western writing systems, the before and after edges of a reference area are at the top and bottom, respectively, and the start and end edges are at the left and right. Another term, shift direction (the direction of shift for a superscript), refers to the direction opposite that for block progression—that is, from after to before (in Western writing systems, from bottom to top).
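The relationship between the neutral edge names and physical page edges can be written as a lookup table. The sketch below assumes the three WritingMode values defined in 14.8.5.4.2 (LrTb, RlTb, and TbRl); the table and function names are hypothetical.

```python
# Physical edge for each logical edge, per writing mode. The WritingMode
# names are assumed from 14.8.5.4.2: inline direction first, block second.
PHYSICAL_EDGES = {
    "LrTb": {"before": "top", "after": "bottom", "start": "left", "end": "right"},
    "RlTb": {"before": "top", "after": "bottom", "start": "right", "end": "left"},
    "TbRl": {"before": "right", "after": "left", "start": "top", "end": "bottom"},
}


def physical_edge(writing_mode: str, logical_edge: str) -> str:
    """Map before/after/start/end to a physical edge of the reference area."""
    return PHYSICAL_EDGES[writing_mode][logical_edge]


def shift_direction(writing_mode: str) -> tuple:
    """Return (from_edge, to_edge) for the shift direction, which runs
    opposite to block progression: from the after edge to the before edge."""
    edges = PHYSICAL_EDGES[writing_mode]
    return (edges["after"], edges["before"])


print(physical_edge("LrTb", "before"))  # top
print(shift_direction("LrTb"))          # ('bottom', 'top')
```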

BLSEs shall be stacked within a reference area in block-progression order. In general, the first BLSE shall be placed against the before edge of the reference area. Subsequent BLSEs shall be stacked against preceding ones, progressing toward the after edge, until no more BLSEs fit in the reference area. If the overflowing BLSE allows itself to be split—such as a paragraph that can be split between lines of text—a portion of it may be included in the current reference area and the remainder carried over to a subsequent reference area (either elsewhere on the same page or on another page of the document). Once the amount of content that fits in a reference area is determined, the placements of the individual BLSEs may be adjusted to bias the placement toward the before edge, the middle, or the after edge of the reference area, or the spacing within or between BLSEs may be adjusted to fill the full extent of the reference area.

BLSEs may be nested, with child BLSEs stacked within a parent BLSE in the same manner as BLSEs within a reference area. Except in a few instances noted (the BlockAlign and InlineAlign elements), such nesting of BLSEs does not result in the nesting of reference areas; a single reference area prevails for all levels of nested BLSEs.

Within a BLSE, child ILSEs shall be packed into lines. Direct content items—those that are immediate children of a BLSE rather than contained within a child ILSE—shall be implicitly treated as ILSEs for packing purposes. Each line shall be treated as a synthesized BLSE and shall be stacked within the parent BLSE. Lines may be intermingled with other BLSEs within the parent area. This line-building process is analogous to the stacking of BLSEs within a reference area, except that it proceeds in the inline-progression rather than the block-progression direction: a line shall be packed with ILSEs beginning at the start edge of the containing BLSE and continuing until the end edge is reached and the line is full. The overflowing ILSE may allow itself to be broken at linguistically determined or explicitly marked break points (such as hyphenation points within a word), and the remaining fragment shall be carried over to the next line.

Certain values of an element’s Placement attribute remove the element from the normal stacking or packing process and allow it instead to float to a specified edge of the enclosing reference area or parent BLSE; see “General Layout Attributes” in 14.8.5.4, “Layout Attributes,” for further discussion.

Two enclosing rectangles shall be associated with each BLSE and ILSE (including direct content items that are treated implicitly as ILSEs):

  • The content rectangle shall be derived from the shape of the enclosed content and defines the bounds used for the layout of any included child elements.
  • The allocation rectangle includes any additional borders or spacing surrounding the element, affecting how it shall be positioned with respect to adjacent elements and the enclosing content rectangle or reference area.

The definitions of these rectangles shall be determined by layout attributes associated with the structure element; see 14.8.5.4.5, “Content and Allocation Rectangles” for further discussion.
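The stacking of BLSEs into successive reference areas can be illustrated with a deliberately simplified one-dimensional sketch. It is not part of the layout model itself: the Block class, the stack_blocks function, the assumption that every reference area has the same extent, and the rule that a splittable block may be cut at any point are all simplifications made for illustration. Alignment adjustment, nesting, and floating placement are ignored.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    extent: float             # size along the block-progression direction
    splittable: bool = False  # e.g. a paragraph that can break between lines


def stack_blocks(blocks, area_extent):
    """Stack blocks from the before edge toward the after edge, carrying
    overflow to a new reference area of the same extent."""
    areas = [[]]
    remaining = area_extent
    queue = list(blocks)
    while queue:
        block = queue.pop(0)
        if block.extent <= remaining:
            # The whole block fits in the current reference area.
            areas[-1].append(block)
            remaining -= block.extent
        elif block.splittable and remaining > 0:
            # Keep a portion here and carry the remainder over.
            areas[-1].append(Block(block.name, remaining, True))
            queue.insert(0, Block(block.name, block.extent - remaining, True))
            areas.append([])
            remaining = area_extent
        elif areas[-1]:
            # Unsplittable overflow: retry the block in a fresh area.
            queue.insert(0, block)
            areas.append([])
            remaining = area_extent
        else:
            # Larger than an empty area and unsplittable: place it anyway
            # so that the loop always terminates.
            areas[-1].append(block)
            areas.append([])
            remaining = area_extent
    if len(areas) > 1 and not areas[-1]:
        areas.pop()  # drop a trailing empty area
    return areas


areas = stack_blocks(
    [Block("H", 20), Block("P", 100, splittable=True), Block("P2", 30)], 100
)
for index, area in enumerate(areas):
    print(index, [(b.name, b.extent) for b in area])
```

With a reference area extent of 100, the heading and the first 80 units of the paragraph land in the first area, and the remaining 20 units plus the second paragraph land in the next one.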

14.8.4 标准结构类型

14.8.4 Standard Structure Types

14.8.4.1 概述

14.8.4.1 General

标记化 PDF 的 标准结构类型 描述了内容元素在文档中的角色,并结合标准结构属性(见 14.8.5,“标准结构属性”)来说明该内容在页面上的布局。如 14.7.3,“结构类型” 中所讨论的,逻辑结构元素的结构类型应当通过其结构元素字典中的 S 条目来指定。为了被视为标准结构类型,该值应当是:

  • 14.8.4.2,“分组元素”中描述的标准结构类型名称之一。
  • 一个任意名称,应当通过文档的角色映射(见 14.7.3,“结构类型”)映射到其中一个标准名称,可能通过多级映射。

注意 1

从 PDF 1.5 开始,如果有角色映射,元素名称总是会映射到其相应的名称,即使原始名称是标准类型之一。这是为了允许该元素(例如)表示与标准角色相同名称的标签,即使其使用与标准角色不同。

通常,具有标准结构类型的结构元素无论是直接表达类型还是通过角色映射间接确定类型,都应当以相同的方式进行处理。然而,某些符合规范的阅读器可能会赋予非标准结构类型附加的语义,尽管角色映射将它们与标准类型关联起来。

注意 2

例如,S 条目的实际值可能在导出到标记表示(如 XML)时使用,而在转换为呈现格式(如 HTML 或 RTF)时,或用于诸如重排或为残障用户提供无障碍访问的目的时,应当使用相应的角色映射值。

注意 3

大多数标准元素类型主要设计用于文本布局;其术语反映了这一用途。然而,布局实际上可以包含任何类型的内容,例如路径或图像对象。

与结构元素相关联的内容项应当像文本块一样(对于 BLSE)或像文本行中的字符一样(对于 ILSE)在页面上进行布局。

Tagged PDF’s standard structure types characterize the role of a content element within the document and, in conjunction with the standard structure attributes (described in 14.8.5, “Standard Structure Attributes”), how that content is laid out on the page. As discussed in 14.7.3, “Structure Types,” the structure type of a logical structure element shall be specified by the S entry in its structure element dictionary. To be considered a standard structure type, this value shall be either:

  • One of the standard structure type names described in 14.8.4.2, “Grouping Elements.”
  • An arbitrary name that shall be mapped to one of the standard names by the document’s role map (see 14.7.3, “Structure Types”), possibly through multiple levels of mapping.

NOTE 1

Beginning with PDF 1.5, an element name is always mapped to its corresponding name in the role map, if there is one, even if the original name is one of the standard types. This is done to allow the element, for example, to represent a tag with the same name as a standard role, even though its use differs from the standard role.

Ordinarily, structure elements having standard structure types shall be processed the same way whether the type is expressed directly or is determined indirectly from the role map. However, some conforming readers may ascribe additional semantics to nonstandard structure types, even though the role map associates them with standard ones.

NOTE 2

For instance, the actual values of the S entries may be used when exporting to a tagged representation such as XML, and the corresponding role-mapped values shall be used when converting to presentation formats such as HTML or RTF, or for purposes such as reflow or accessibility to users with disabilities.

NOTE 3

Most of the standard element types are designed primarily for laying out text; the terminology reflects this usage. However, a layout may in fact include any type of content, such as path or image objects.

The content items associated with a structure element shall be laid out on the page as if they were blocks of text (for a BLSE) or characters within a line of text (for an ILSE).
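Resolving an S entry to a standard structure type through the role map, including multiple levels of mapping, might look like the sketch below. Several points are assumptions: the role map is modeled as a plain Python dict, the set of standard names lists only the types named in 14.8.4.2 and 14.8.4.3 and would need the ILSE and illustration types from 14.8.4.4 and 14.8.4.5 added, and the cycle guard is a defensive choice rather than something the text prescribes.

```python
# Partial set of standard structure types (grouping, block-level, and table
# types only). Extend with the ILSE and illustration types for real use.
STANDARD_TYPES = frozenset({
    "Document", "Part", "Art", "Sect", "Div", "BlockQuote", "Caption",
    "TOC", "TOCI", "Index", "NonStruct", "Private",
    "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD", "THead", "TBody", "TFoot",
})


def resolve_structure_type(s_value, role_map):
    """Follow the role map from an S value to a standard type.

    Following PDF 1.5 behaviour, a name is mapped whenever the role map has
    an entry for it, even if the name is itself a standard type. Returns
    None if the chain does not end on a standard type.
    """
    name = s_value
    seen = set()
    while name in role_map and name not in seen:
        seen.add(name)  # guard against circular mappings
        name = role_map[name]
    return name if name in STANDARD_TYPES else None


role_map = {"Chapter": "Section", "Section": "Sect", "Sidebar": "Div"}
print(resolve_structure_type("Chapter", role_map))  # Sect
print(resolve_structure_type("P", role_map))        # P
print(resolve_structure_type("Widget", role_map))   # None
```

A reader that exports to a tagged representation such as XML could keep the original S value alongside the resolved type, since both may be useful.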

14.8.4.2 组合元素

14.8.4.2 Grouping Elements

分组元素 应仅用于将其他结构元素分组;它们不直接与内容项关联。表 333 描述了此类别中元素的标准结构类型。H.8,“描述分层列表的结构化元素”提供了嵌套目录项的示例。

在标记化 PDF 文档中,结构树应包含一个单一的顶级元素;即,结构树的根元素(由文档目录中的 StructTreeRoot 条目标识)应在其 K(子元素)数组中仅有一个子元素。如果 PDF 文件包含完整文档,则应使用“Document”结构类型作为该顶级元素。如果文件包含格式良好的文档片段,则可以使用“Part”、“Art”、“Sect”或“Div”之一作为替代。

表 333 – 分组元素的标准结构类型
结构类型 描述
Document (文档)完整文档。这是包含多个部分或多个文章的任何结构树的根元素。
Part (部分)文档的大规模划分。这种类型的元素适用于分组文章或章节。
Art (文章)构成单一叙述或说明的相对独立的文本块。文章应当是互不重叠的;即,它们不应包含作为组成元素的其他文章。
Sect (章节)用于分组相关内容元素的容器。
注意 1   例如,一个章节可能包含一个标题、几段介绍性段落以及两个或更多作为子章节嵌套其中的其他章节。
Div (划分)通用的块级元素或元素组。
BlockQuote (块引文)由一段或多段文本组成,归属于除周围文本的作者之外的其他人。
Caption (说明文字)简短的文本段,描述表格或图形。
TOC (目录)由目录项条目(结构类型 TOCI)和/或其他嵌套目录项(TOC)组成的列表。
仅包含 TOCI 条目的目录项表示扁平的层次结构。包含其他嵌套 TOC 条目(可能还包括 TOCI 条目)的 TOC 条目表示更复杂的层次结构。理想情况下,顶级目录项的层次结构反映文档主体的结构。
注意 2   图表列表和表格列表以及参考书目,可以视为目录,作为标准结构类型的一部分。
TOCI (目录项)目录中的单个成员。此条目的子元素可以是以下任何结构类型:

Lbl   标签(见 14.8.4.3,“块级结构元素”中的“列表元素”)
Reference   引用标题和页码(见 14.8.4.4,“内联级结构元素”中的“内联级结构元素”)
NonStruct   非结构元素,用于包装前导符人工制品(见 14.8.4.2,“分组元素”)。
P   描述性文本(见 “段落类元素” 14.8.4.3,“块级结构元素”)
TOC   用于层次目录的目录元素,如 TOC 条目所描述
Index (索引)包含标识文本及引用元素(结构类型 Reference;见 14.8.4.4,“内联级结构元素”)的条目的序列,指示文档主体中指定文本的出现位置。
NonStruct (非结构元素)没有固有结构意义的分组元素;仅用于分组目的。这种类型的元素与“Div”(结构类型)不同,因为它不应当被解释或导出到其他文档格式中;然而,它的后代应正常处理。
Private (私有元素)包含属于创建它的应用程序的私有内容的分组元素。这种类型元素的结构意义未指定,完全由符合规范的写入者决定。私有元素及其任何后代不应当被解释或导出到其他文档格式中。

Grouping elements shall be used solely to group other structure elements; they are not directly associated with content items. Table 333 describes the standard structure types for elements in this category. H.8, “Structured Elements That Describe Hierarchical Lists” provides an example of nested table of contents items.

In a tagged PDF document, the structure tree shall contain a single top-level element; that is, the structure tree root (identified by the StructTreeRoot entry in the document catalogue) shall have only one child in its K (kids) array. If the PDF file contains a complete document, the structure type Document should be used for this top-level element in the logical structure hierarchy. If the file contains a well-formed document fragment, one of the structure types Part, Art, Sect, or Div may be used instead.

Table 333 – Standard structure types for grouping elements
Structure type Description
Document (Document) A complete document. This is the root element of any structure tree containing multiple parts or multiple articles.
Part (Part) A large-scale division of a document. This type of element is appropriate for grouping articles or sections.
Art (Article) A relatively self-contained body of text constituting a single narrative or exposition. Articles should be disjoint; that is, they should not contain other articles as constituent elements.
Sect (Section) A container for grouping related content elements.
NOTE 1   For example, a section might contain a heading, several introductory paragraphs, and two or more other sections nested within it as subsections.
Div (Division) A generic block-level element or group of elements.
BlockQuote (Block quotation) A portion of text consisting of one or more paragraphs attributed to someone other than the author of the surrounding text.
Caption (Caption) A brief portion of text describing a table or figure.
TOC (Table of contents) A list made up of table of contents item entries (structure type TOCI) and/or other nested table of contents entries (TOC).
A TOC entry that includes only TOCI entries represents a flat hierarchy. A TOC entry that includes other nested TOC entries (and possibly TOCI entries) represents a more complex hierarchy. Ideally, the hierarchy of a top level TOC entry reflects the structure of the main body of the document.
NOTE 2   Lists of figures and tables, as well as bibliographies, can be treated as tables of contents for purposes of the standard structure types.
TOCI (Table of contents item) An individual member of a table of contents. This entry’s children may be any of the following structure types:

Lbl   A label (see “List Elements” in 14.8.4.3, “Block-Level Structure Elements”)
Reference   A reference to the title and the page number (see 14.8.4.4, “Inline-Level Structure Elements”)
NonStruct   A nonstructural element for wrapping a leader artifact (see 14.8.4.2, “Grouping Elements”)
P   Descriptive text (see “Paragraphlike Elements” in 14.8.4.3, “Block-Level Structure Elements”)
TOC   Table of contents elements for hierarchical tables of contents, as described for the TOC entry
Index (Index) A sequence of entries containing identifying text accompanied by reference elements (structure type Reference; see 14.8.4.4, “Inline-Level Structure Elements”) that point out occurrences of the specified text in the main body of a document.
NonStruct (Nonstructural element) A grouping element having no inherent structural significance; it serves solely for grouping purposes. This type of element differs from a division (structure type Div) in that it shall not be interpreted or exported to other document formats; however, its descendants shall be processed normally.
Private (Private element) A grouping element containing private content belonging to the application producing it. The structural significance of this type of element is unspecified and shall be determined entirely by the conforming writer. Neither the Private element nor any of its descendants shall be interpreted or exported to other document formats.
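The single top-level element rule can be checked mechanically. The sketch below assumes a hypothetical in-memory model in which the structure tree root and each structure element are Python dicts with "K" and "S" keys, and in which S values have already been role-mapped; it is not tied to any particular PDF library.

```python
FRAGMENT_TOP_LEVEL_TYPES = {"Part", "Art", "Sect", "Div"}


def top_level_issues(struct_tree_root, complete_document=True):
    """Return a list of human-readable issues; an empty list means none found."""
    kids = struct_tree_root.get("K", [])
    if not isinstance(kids, list):
        kids = [kids]  # K may hold a single child rather than an array
    if len(kids) != 1:
        return [f"structure tree root has {len(kids)} children; exactly one is required"]
    s = kids[0].get("S")
    if complete_document:
        if s != "Document":
            return [f"top-level element is {s}; Document is recommended for a complete document"]
    elif s != "Document" and s not in FRAGMENT_TOP_LEVEL_TYPES:
        return [f"top-level element is {s}; expected Part, Art, Sect, or Div for a fragment"]
    return []


print(top_level_issues({"K": [{"S": "Document"}]}))            # []
print(top_level_issues({"K": [{"S": "Sect"}]}, complete_document=False))  # []
```

The complete-document case reports a recommendation rather than an error, because the text uses "should" for the Document type and "shall" only for the single-child rule.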

14.8.4.3 区块级结构元素

14.8.4.3 Block-Level Structure Elements

14.8.4.3.1 概述

14.8.4.3.1 General

块级结构元素(BLSE)是指在块进程方向上布局的任何文本或其他内容区域,如段落、标题、列表项或脚注。如果结构元素的结构类型(在有角色映射的情况下)是表 334中列出的类型,则该结构元素为 BLSE。所有其他标准结构类型应被视为内联级结构元素(ILSE),但有以下例外:

  • TR(表格行)、TH(表格头)、TD(表格数据)、THead(表格头部)、TBody(表格主体)和 TFoot(表格底部),这些元素用于在表格中分组其他元素,既不视为 BLSE 也不视为 ILSE。
  • 具有 Placement 属性(见“常规布局属性” 14.8.5.4,“布局属性”)且不具有默认值 Inline 的元素。

表 334 – 块级结构元素
类别 结构类型
类似段落的元素   P, H, H1, H2, H3, H4, H5, H6
列表元素   L, LI, Lbl, LBody
表格元素   Table

在许多情况下,BLSE 可能会显示为一块紧凑的连续页面内容;在其他情况下,它可能是非连续的。

注意

后者的示例包括跨页边界的 BLSE,或者在页面内容顺序中被其他嵌套的 BLSE 或直接包含的脚注所打断。必要时,符合标准的标记化阅读器可以从逻辑结构中识别出这种断开的 BLSE,并使用此信息重新组合它们并正确布局。

A block-level structure element (BLSE) is any region of text or other content that is laid out in the block-progression direction, such as a paragraph, heading, list item, or footnote. A structure element is a BLSE if its structure type (after role mapping, if any) is one of those listed in Table 334. All other standard structure types shall be treated as ILSEs, with the following exceptions:

  • TR (Table row), TH (Table header), TD (Table data), THead (Table head), TBody (Table body), and TFoot (Table footer), which shall be used to group elements within a table and shall be considered neither BLSEs nor ILSEs
  • Elements with a Placement attribute (see “General Layout Attributes” in 14.8.5.4, “Layout Attributes”) other than the default value of Inline

Table 334 – Block-level structure elements
Category Structure types
Paragraphlike elements   P, H, H1, H2, H3, H4, H5, H6
List elements   L, LI, Lbl, LBody
Table element   Table

In many cases, a BLSE may appear as one compact, contiguous piece of page content; in other cases, it may be discontiguous.

NOTE

Examples of the latter include a BLSE that extends across a page boundary or is interrupted in the page content order by another, nested BLSE or a directly included footnote. When necessary, conforming readers of Tagged PDF can recognize such fragmented BLSEs from the logical structure and use this information to reassemble them and properly lay them out.
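The classification rule of this sub-clause can be summarized in a few lines. This is a sketch under stated assumptions: the input type is taken to be already role-mapped, the grouping types of Table 333 are reported separately because they hold no content directly, a non-Inline Placement is taken to mean treatment as a BLSE, and the function name is hypothetical. The effect of an explicit Placement value on types that are already block-level is not modeled.

```python
BLSE_TYPES = {"P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
              "L", "LI", "Lbl", "LBody", "Table"}            # Table 334
TABLE_GROUPING_TYPES = {"TR", "TH", "TD", "THead", "TBody", "TFoot"}
GROUPING_TYPES = {"Document", "Part", "Art", "Sect", "Div", "BlockQuote",
                  "Caption", "TOC", "TOCI", "Index", "NonStruct", "Private"}


def layout_role(standard_type: str, placement: str = "Inline") -> str:
    """Classify a role-mapped standard structure type for layout purposes."""
    if standard_type in GROUPING_TYPES:
        return "grouping"
    if standard_type in BLSE_TYPES:
        return "BLSE"
    if standard_type in TABLE_GROUPING_TYPES:
        return "neither"   # groups elements within a table
    if placement != "Inline":
        return "BLSE"      # a non-default Placement overrides inline treatment
    return "ILSE"


print(layout_role("P"))                        # BLSE
print(layout_role("TD"))                       # neither
print(layout_role("Span"))                     # ILSE
print(layout_role("Note", placement="Block"))  # BLSE
```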

14.8.4.3.2 类段落元素

14.8.4.3.2 Paragraphlike Elements

表 335 描述了由运行文本和其他内容组成的 类似段落 元素的结构类型,这些内容以常规段落的形式布局(与更专业的布局,如列表和表格相对)。

表 335 – 类似段落元素的标准结构类型
结构类型 描述
H (标题)文档内容的一个子部分的标签。它应该是它所处的分区的第一个子元素。
H1–H6 具有特定层级的标题,供那些无法层级嵌套其章节的符合标准的编写者使用,因此无法根据嵌套层级来确定标题的级别。
P (段落)文本的低级划分。

Table 335 describes structure types for paragraphlike elements that consist of running text and other content laid out in the form of conventional paragraphs (as opposed to more specialized layouts such as lists and tables).

Table 335 – Standard structure types for paragraphlike elements
Structure Type Description
H (Heading) A label for a subdivision of a document’s content. It should be the first child of the division that it heads.
H1–H6 Headings with specific levels, for use in conforming writers that cannot hierarchically nest their sections and thus cannot determine the level of a heading from its level of nesting.
P (Paragraph) A low-level division of text.

14.8.4.3.3 列表元素

14.8.4.3.3 List Elements

表 336 描述了用于组织列表内容的结构类型。H.8,“描述层次列表的结构化元素”提供了嵌套列表条目的示例。

表 336 – 列表元素的标准结构类型
结构类型 描述
L (列表)具有相似意义和重要性的项目序列。它的直接子元素应为一个可选的标题(结构类型 Caption;见14.8.4.2,“分组元素”),后跟一个或多个列表项(结构类型 LI)。
LI (列表项)列表的单个成员。它的子元素可以是一个或多个标签、列表体,或者两者(结构类型 Lbl 或 LBody)。
Lbl (标签)用来区分同一列表或其他相似项目组中的特定项目的名称或编号。
NOTE   例如,在字典列表中,它包含被定义的术语;在带项目符号或编号的列表中,它包含项目的符号字符或编号及相关标点符号。
LBody (列表体)列表项的描述内容。例如,在字典列表中,它包含术语的定义。它可以直接包含内容,或具有其他 BLSEs 作为子元素,可能包括嵌套的列表。

Table 336 describes structure types for organizing the content of lists. H.8, “Structured Elements That Describe Hierarchical Lists” provides an example of nested list entries.

Table 336 – Standard structure types for list elements
Structure Type Description
L (List) A sequence of items of like meaning and importance. Its immediate children should be an optional caption (structure type Caption; see 14.8.4.2, “Grouping Elements”) followed by one or more list items (structure type LI).
LI (List item) An individual member of a list. Its children may be one or more labels, list bodies, or both (structure types Lbl or LBody).
Lbl (Label) A name or number that distinguishes a given item from others in the same list or other group of like items.
NOTE   In a dictionary list, for example, it contains the term being defined; in a bulleted or numbered list, it contains the bullet character or the number of the list item and associated punctuation.
LBody (List body) The descriptive content of a list item. In a dictionary list, for example, it contains the definition of the term. It may either contain the content directly or have other BLSEs, perhaps including nested lists, as children.

14.8.4.3.4 表格元素

14.8.4.3.4 Table Elements

表 337 描述了用于组织表格内容的结构类型。

NOTE 1

严格来说,Table 元素是一个 BLSE;该表中的其他元素既不是 BLSE 也不是 ILSE。

表 337 – 表格元素的标准结构类型
结构类型 描述
Table (表格)一个二维的矩形数据单元格布局,可能具有复杂的子结构。它包含一个或多个表格行(结构类型 TR)作为子元素;或者一个可选的表头(结构类型 THead),后跟一个或多个表格主体元素(结构类型 TBody),并且有一个可选的表格页脚(结构类型 TFoot)。此外,表格可以有一个标题(结构类型 Caption;见 14.8.4.2,“分组元素”)作为它的第一个或最后一个子元素。
TR (表格行)表格中的一行标题或数据。它可以包含表格头单元格和表格数据单元格(结构类型 TH 和 TD)。
TH (表格头单元格)一个包含描述表格一行或多行、一列或多列的表头文本的表格单元格。
TD (表格数据单元格)一个包含表格内容数据的表格单元格。
THead (表格头行组;PDF 1.5)构成表格头的行组。如果表格跨多个页面,这些行可以在每个表格片段的顶部重新绘制(尽管只有一个 THead 元素)。
TBody (表格主体行组;PDF 1.5)构成表格主要部分的行组。如果表格跨多个页面,主体区域可能会在行边界上拆分。一个表格可以有多个 TBody 元素,以便为一组行绘制边框或背景。
TFoot (表格页脚行组;PDF 1.5)构成表格页脚的行组。如果表格跨多个页面,这些行可以在每个表格片段的底部重新绘制(尽管只有一个 TFoot 元素)。

NOTE 2

表头与数据行和列的关联通常由应用程序通过启发式方法确定。对于复杂的表格,这种启发式方法可能会失败;表格中显示的标准属性(见表 348)可以用于明确该关联。

The structure types described in Table 337 shall be used for organizing the content of tables.

NOTE 1

Strictly speaking, the Table element is a BLSE; the others in this table are neither BLSEs nor ILSEs.

Table 337 – Standard structure types for table elements
Structure Type Description
Table (Table) A two-dimensional layout of rectangular data cells, possibly having a complex substructure. It contains either one or more table rows (structure type TR) as children; or an optional table head (structure type THead) followed by one or more table body elements (structure type TBody) and an optional table footer (structure type TFoot). In addition, a table may have a caption (structure type Caption; see 14.8.4.2, “Grouping Elements”) as its first or last child.
TR (Table row) A row of headings or data in a table. It may contain table header cells and table data cells (structure types TH and TD).
TH (Table header cell) A table cell containing header text describing one or more rows or columns of the table.
TD (Table data cell) A table cell containing data that is part of the table’s content.
THead (Table header row group; PDF 1.5) A group of rows that constitute the header of a table. If the table is split across multiple pages, these rows may be redrawn at the top of each table fragment (although there is only one THead element).
TBody (Table body row group; PDF 1.5) A group of rows that constitute the main body portion of a table. If the table is split across multiple pages, the body area may be broken apart on a row boundary. A table may have multiple TBody elements to allow for the drawing of a border or background for a set of rows.
TFoot (Table footer row group; PDF 1.5) A group of rows that constitute the footer of a table. If the table is split across multiple pages, these rows may be redrawn at the bottom of each table fragment (although there is only one TFoot element).

NOTE 2

The association of headers with rows and columns of data is typically determined heuristically by applications. Such heuristics may fail for complex tables; the standard attributes for tables shown in Table 348 can be used to make the association explicit.
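The two permitted arrangements of a Table element's children described in Table 337 can be checked as follows. This is a sketch only: the function name is hypothetical, the input is assumed to be the list of role-mapped structure types of the immediate children in order, and at most one Caption (first or last) is accepted.

```python
def table_children_ok(child_types) -> bool:
    """Check a Table's immediate children against the Table 337 description.

    Accepted forms, after removing one optional Caption at either end:
      - one or more TR elements, or
      - an optional THead, one or more TBody elements, an optional TFoot.
    """
    kids = list(child_types)
    # Strip a single caption from the first or the last position.
    if kids and kids[0] == "Caption":
        kids = kids[1:]
    elif kids and kids[-1] == "Caption":
        kids = kids[:-1]
    if not kids:
        return False
    if all(k == "TR" for k in kids):
        return True
    if kids[0] == "THead":
        kids = kids[1:]
    if kids and kids[-1] == "TFoot":
        kids = kids[:-1]
    return bool(kids) and all(k == "TBody" for k in kids)


print(table_children_ok(["TR", "TR"]))                                 # True
print(table_children_ok(["Caption", "THead", "TBody", "TFoot"]))       # True
print(table_children_ok(["TR", "TBody"]))                              # False
```

Because Tagged PDF does not enforce strict nesting rules, a failed check is better treated as a hint for authoring tools than as grounds for rejecting a file.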

14.8.4.3.5 区块级结构的使用指南

14.8.4.3.5 Usage Guidelines for Block-Level Structure

由于不同的符合规范的阅读器以不同的方式使用 PDF 的逻辑结构功能,标记 PDF 不强制执行使用标准结构类型的元素的顺序和嵌套规则。此外,每个导出格式都有其自己的逻辑结构约定。然而,遵循某些通用指南有助于在不同的标记 PDF 消费者之间实现最一致和可预测的解释。

14.8.4.2 “分组元素”中所述,标记 PDF 文档可以有一个或多个分组元素级别,例如 Document、Part、Art(文章)、Sect(章节)和 Div(分区)。这些元素的后代应该是 BLSE,例如 H(标题)、P(段落)和 L(列表),它们包含实际内容。它们的后代应该是内容项或进一步描述内容的 ILSE。

NOTE 1

如前所述,通常会被视为 ILSE 的元素可能具有一个 Placement 属性(见“通用布局属性”14.8.5.4,“布局属性”),这会导致它们被视为 BLSE。因此,这些元素可以像标题和段落一样作为 BLSE 包含在内。

块级结构可以遵循两种主要范式之一:

  • 强结构化。分组元素嵌套到所需的多个级别,以反映材料的组织结构,如文章、章节、小节等。在每个级别,分组元素的子元素应包括该级别的标题(H)、一个或多个段落(P)作为该级别的内容,可能还包括一个或多个其他分组元素作为嵌套的小节。
  • 弱结构化。文档相对较平坦,可能只有一到两个级别的分组元素,所有的标题、段落和其他 BLSE 都是它们的直接子元素。在这种情况下,材料的组织结构未在逻辑结构中反映;然而,可以通过使用具有特定级别的标题(H1–H6)来表示。

NOTE 2

强结构化范式被一些基于 XML 的丰富文档模型使用。弱结构化范式是 HTML 表示的文档的典型方式。

列表和表格应使用在“列表元素”14.8.4.3 “块级结构元素”和“表格元素”14.8.4.3 “块级结构元素”中描述的特定结构类型进行组织。同样,目录和索引应按 14.8.4.2 “分组元素”中对 TOC 和 Index 结构类型的描述进行组织。

Because different conforming readers use PDF’s logical structure facilities in different ways, Tagged PDF does not enforce any strict rules regarding the order and nesting of elements using the standard structure types. Furthermore, each export format has its own conventions for logical structure. However, adhering to certain general guidelines helps to achieve the most consistent and predictable interpretation among different Tagged PDF consumers.

As described under 14.8.4.2, “Grouping Elements,” a Tagged PDF document may have one or more levels of grouping elements, such as Document, Part, Art (Article), Sect (Section), and Div (Division). The descendants of these should be BLSEs, such as H (Heading), P (Paragraph), and L (List), that hold the actual content. Their descendants, in turn, should be either content items or ILSEs that further describe the content.

NOTE 1

As noted earlier, elements with structure types that would ordinarily be treated as ILSEs may have a Placement attribute (see “General Layout Attributes” in 14.8.5.4, “Layout Attributes”) that causes them to be treated as BLSEs instead. Such elements may be included as BLSEs in the same manner as headings and paragraphs.

The block-level structure may follow one of two principal paradigms:

  • Strongly structured. The grouping elements nest to as many levels as necessary to reflect the organization of the material into articles, sections, subsections, and so on. At each level, the children of the grouping element should consist of a heading (H), one or more paragraphs (P) for content at that level, and perhaps one or more additional grouping elements for nested subsections.
  • Weakly structured. The document is relatively flat, having perhaps only one or two levels of grouping elements, with all the headings, paragraphs, and other BLSEs as their immediate children. In this case, the organization of the material is not reflected in the logical structure; however, it may be expressed by the use of headings with specific levels (H1–H6).

NOTE 2

The strongly structured paradigm is used by some rich document models based on XML. The weakly structured paradigm is typical of documents represented in HTML.

Lists and tables should be organized using the specific structure types described under “List Elements” and “Table Elements” in 14.8.4.3, “Block-Level Structure Elements.” Likewise, tables of contents and indexes should be structured as described for the TOC and Index structure types in 14.8.4.2, “Grouping Elements.”
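
In a weakly structured document, the organization of the material can be recovered from the heading levels alone. The Python sketch below nests a flat sequence of BLSEs by heading level (H1–H6); the function name and the (structure type, text) input format are assumptions made for illustration, not part of Tagged PDF.

```python
def outline_from_headings(blocks):
    """Nest a flat list of (structure_type, text) pairs by heading level.

    Returns a list of nodes; each node is a dict with "title", "level",
    and "children". Blocks that are not H1..H6 headings are ignored.
    """
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for stype, text in blocks:
        if len(stype) == 2 and stype[0] == "H" and stype[1] in "123456":
            level = int(stype[1])
            node = {"title": text, "level": level, "children": []}
            # Pop until the top of the stack is a shallower heading.
            while stack[-1]["level"] >= level:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
    return root["children"]
```

A strongly structured document needs no such step, because the nesting of its grouping elements already carries the same information.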

14.8.4.4 内联级结构元素

14.8.4.4 Inline-Level Structure Elements

14.8.4.4.1 概述

14.8.4.4.1 General

一个 内联级结构元素(ILSE)包含一部分文本或其他内容,这些内容具有特定的样式特征或在文档中扮演特定的角色。在段落或由包含的 BLSE 定义的其他块中,连续的 ILSE(可能与作为父 BLSE 直接子元素的其他内容项混合)将按内联方向(在西方书写系统中是从左到右)连续排列。由此生成的内容可能会分成多个 ,这些行将沿块级进展方向堆叠。ILSE 可能包含一个 BLSE,该 BLSE 应视为内联方向上的单一布局项。表 338 列出了标准的内联级结构元素类型。

Table 338 – Standard structure types for inline-level structure elements
Structure Type Description
Span (Span) 一个没有特定固有特征的通用内联文本部分。例如,它可以用于用一组给定的样式属性来限定一段文本范围。

NOTE 1   并非所有的内联样式更改都需要标识为 span。文本颜色和字体更改(包括加粗、斜体和小型大写字母等修饰符)不需要这样标记,因为这些可以从 PDF 内容中派生(见 14.8.2.4“字符属性提取”中的“字体特征”)。然而,必须使用 span 来应用显式的布局属性,如 LineHeight、BaselineShift 或 TextDecorationType(见 14.8.5.4“布局属性”中的“内联结构元素的布局属性”)。
NOTE 2   标记内容序列使用 Span 标签也用于传递某些无障碍属性(Alt、ActualText、Lang 和 E;见14.9,“无障碍支持”)。此类序列没有 MCID 属性,并且不与任何结构元素关联。Span 标记内容标签的此使用方式不同于其作为结构类型的使用。
Quote (Quotation) 一个内联文本部分,归属于文档作者以外的人。
引用的文本应作为一个段落中的内联内容。这与块级元素 BlockQuote(见14.8.4.2,“分组元素”)不同,后者由一个或多个完整段落(或以完整段落方式呈现的其他元素)组成。
Note (Note) 一个解释性文本项,如脚注或尾注,它从文档正文中引用。它可以有一个标签(结构类型 Lbl;见“列表元素”14.8.4.3,“块级结构元素”)作为子元素。该注释可以作为引用它的结构元素的子元素包含在正文中,或者可以包含在其他位置(如尾注部分),并通过引用(结构类型 Reference)进行访问。
标记 PDF 并未规定脚注在页面内容顺序中的位置。它们可以是内联的,也可以放在页面的末尾,由符合规范的作者自行决定。
Reference (Reference) 对文档中其他位置内容的引用。
BibEntry (Bibliography entry) 一个标识某些引用内容外部来源的参考。它可以包含一个标签(结构类型 Lbl;见“列表元素”14.8.4.3,“块级结构元素”)作为子元素。
尽管书目条目可能包括标识引用内容的作者、作品、出版商等的组成部分,但在这一层次上没有定义标准的结构类型。
Code (Code) 计算机程序文本的片段。
Link (Link) 内联结构元素内容与相应的链接注释或注释之间的关联(见[12.5.6.5],“链接注释”)。它的子元素应包括一个或多个内容项或子 ILSE,以及一个或多个对象引用(见14.7.4.3,“PDF对象作为内容项”)标识关联的链接注释。有关进一步讨论,请参见“链接元素”14.8.4.3,“块级结构元素”。
Annot (Annotation; PDF 1.5) 内联结构元素内容与相应的 PDF 注释之间的关联(见[12.5],“注释”)。Annot 应用于所有 PDF 注释,除了链接注释(见 Link 元素)和小部件注释(见[表 340]中的表单元素)。有关进一步讨论,请参见“注释元素”14.8.4.4,“内联结构元素”。
Ruby (Ruby; PDF 1.5) 一种以较小字体书写并放置在基文本旁边的旁注(注释)。Ruby 元素也可以包含 RB、RT 和 RP 元素。有关更多细节,请参见“Ruby 和 Warichu 元素”14.8.4.4,“内联结构元素”。
Warichu (Warichu; PDF 1.5) 一种注释或说明,使用较小的字体书写并排版到包含文本行的两个较小的行中,放置在其引用的基文本后面(内联)。Warichu 元素也可以包含 WT 和 WP 元素。有关更多细节,请参见“Ruby 和 Warichu 元素”14.8.4.4,“内联结构元素”。

An inline-level structure element (ILSE) contains a portion of text or other content having specific styling characteristics or playing a specific role in the document. Within a paragraph or other block defined by a containing BLSE, consecutive ILSEs (possibly intermixed with other content items that are direct children of the parent BLSE) are laid out consecutively in the inline-progression direction (left to right in Western writing systems). The resulting content may be broken into multiple lines, which in turn shall be stacked in the block-progression direction. An ILSE may in turn contain a BLSE, which shall be treated as a unitary item of layout in the inline direction. Table 338 lists the standard structure types for ILSEs.

Table 338 – Standard structure types for inline-level structure elements
Structure Type Description
Span (Span) A generic inline portion of text having no particular inherent characteristics. It can be used, for example, to delimit a range of text with a given set of styling attributes.

NOTE 1   Not all inline style changes need to be identified as a span. Text colour and font changes (including modifiers such as bold, italic, and small caps) need not be so marked, since these can be derived from the PDF content (see “Font Characteristics” in 14.8.2.4, “Extraction of Character Properties”). However, it is necessary to use a span to apply explicit layout attributes such as LineHeight, BaselineShift, or TextDecorationType (see “Layout Attributes for ILSEs” in 14.8.5.4, “Layout Attributes”).
NOTE 2   Marked-content sequences having the tag Span are also used to carry certain accessibility properties (Alt, ActualText, Lang, and E; see 14.9, “Accessibility Support”). Such sequences lack an MCID property and are not associated with any structure element. This use of the Span marked-content tag is distinct from its use as a structure type.
Quote (Quotation) An inline portion of text attributed to someone other than the author of the surrounding text.
The quoted text should be contained inline within a single paragraph. This differs from the block-level element BlockQuote (see 14.8.4.2, “Grouping Elements”), which consists of one or more complete paragraphs (or other elements presented as if they were complete paragraphs).
Note (Note) An item of explanatory text, such as a footnote or an endnote, that is referred to from within the body of the document. It may have a label (structure type Lbl; see “List Elements” in 14.8.4.3, “Block-Level Structure Elements”) as a child. The note may be included as a child of the structure element in the body text that refers to it, or it may be included elsewhere (such as in an endnotes section) and accessed by means of a reference (structure type Reference).
Tagged PDF does not prescribe the placement of footnotes in the page content order. They may be either inline or at the end of the page, at the discretion of the conforming writer.
Reference (Reference) A citation to content elsewhere in the document.
BibEntry (Bibliography entry) A reference identifying the external source of some cited content. It may contain a label (structure type Lbl; see “List Elements” in 14.8.4.3, “Block-Level Structure Elements”) as a child.
Although a bibliography entry is likely to include component parts identifying the cited content’s author, work, publisher, and so forth, no standard structure types are defined at this level of detail.
Code (Code) A fragment of computer program text.
Link (Link) An association between a portion of the ILSE’s content and a corresponding link annotation or annotations (see 12.5.6.5, “Link Annotations”). Its children should be one or more content items or child ILSEs and one or more object references (see 14.7.4.3, “PDF Objects as Content Items”) identifying the associated link annotations. See “Link Elements” in 14.8.4.4, “Inline-Level Structure Elements,” for further discussion.
Annot (Annotation; PDF 1.5) An association between a portion of the ILSE’s content and a corresponding PDF annotation (see 12.5, “Annotations”). Annot shall be used for all PDF annotations except link annotations (see the Link element) and widget annotations (see the Form element in Table 340). See “Annotation Elements” in 14.8.4.4, “Inline-Level Structure Elements,” for further discussion.
Ruby (Ruby; PDF 1.5) A side-note (annotation) written in a smaller text size and placed adjacent to the base text to which it refers. A Ruby element may also contain the RB, RT, and RP elements. See “Ruby and Warichu Elements” in 14.8.4.4, “Inline-Level Structure Elements,” for more details.
Warichu (Warichu; PDF 1.5) A comment or annotation in a smaller text size and formatted onto two smaller lines within the height of the containing text line and placed following (inline) the base text to which it refers. A Warichu element may also contain the WT and WP elements. See “Ruby and Warichu Elements” in 14.8.4.4, “Inline-Level Structure Elements,” for more details.

14.8.4.4.2 链接元素

14.8.4.4.2 Link Elements

NOTE 1

链接注释(像所有 PDF 注释一样)与页面的几何区域相关联,而不是与其内容流中的特定对象相关联。链接与内容之间的任何连接完全基于视觉外观,而不是显式指定的关联。因此,仅凭链接注释,对于视力障碍的用户或需要确定哪些内容可以被激活以调用超文本链接的应用程序来说,链接注释本身是没有用的。

标记 PDF 链接元素(结构类型 Link)使用 PDF 的逻辑结构功能来建立内容项与链接注释之间的关联,提供类似 HTML 超文本链接的功能。链接元素的子元素可以包括以下项:

  • 一个或多个内容项或其他 ILSE(除其他链接外)
  • 对一个或多个与内容关联的链接注释的对象引用(见14.7.4.3,“PDF 对象作为内容项”)

Link 结构元素描述一个文本跨度与链接注释关联,并且该跨度从一行的结尾换到另一行的开头时,Link 结构元素应包含一个对象引用,该引用将该跨度与相应的链接注释关联起来。此外,链接注释应使用 QuadPoint 条目来表示页面上的活动区域。

EXAMPLE 1

Link 结构元素引用一个链接注释,其中包含一个 QuadPoint 条目,框住了字符串“with a”和“link”。也就是说,QuadPoint 条目包含16个数字:前8个数字描述“with a”的四边形,接下来的8个数字描述“link”的四边形。

<p>Here is some text <span style="color:blue;text-decoration: underline blue;">with a link</span> inside .</p>

NOTE 2

从 PDF 1.7 开始,使用 Link 结构元素来包含多个链接注释的做法已被弃用。

EXAMPLE 2

考虑以下 HTML 代码片段,它生成包含超文本链接的文本行:

<html>
    <body>
        <p>
        Here is some text <a href=http://www.adobe.com>with a link</a> inside .
    </body>
</html>

该代码示例显示了一个等效的 PDF 片段,使用链接元素显示的文本是蓝色并带下划线。

/P << /MCID 0 >>            % 标记内容序列 0(段落)
    BDC                                     % 开始标记内容序列
        BT                                  % 开始文本对象
            /T1_0 1 Tf                      % 设置文本字体和大小
            14 0 0 14 10.000 753.976 Tm     % 设置文本矩阵
            0.0 0.0 0.0 rg                  % 设置非描边颜色为黑色
            ( Here is some text ) Tj        % 显示文本(链接前的文本)
        ET                                  % 结束文本对象
    EMC                                    % 结束标记内容序列

/Link << /MCID 1 >>                        % 标记内容序列 1(链接)
    BDC                                    % 开始标记内容序列
        0.7 w                              % 设置线宽
        [] 0 d                             % 实线图案

        0.0 0.0 1.0 RG                     % 设置描边颜色为蓝色(在路径开始之前)
        111.094 751.8587 m                 % 移动到下划线起始位置
        174.486 751.8587 l                 % 画下划线
        S                                  % 描边下划线

        BT                                 % 开始文本对象
            14 0 0 14 111.094 753.976 Tm   % 设置文本矩阵
            0.0 0.0 1.0 rg                 % 设置非描边颜色为蓝色
            ( with a link ) Tj             % 显示链接文本
        ET                                 % 结束文本对象
    EMC                                    % 结束标记内容序列

/P << /MCID 2 >>                           % 标记内容序列 2(段落)
    BDC                                    % 开始标记内容序列
        BT                                 % 开始文本对象
            14 0 0 14 174.486 753.976 Tm   % 设置文本矩阵
            0.0 0.0 0.0 rg                 % 设置非描边颜色为黑色
            ( inside . ) Tj                % 显示文本(链接后的文本)
        ET                                 % 结束文本对象
    EMC                                    % 结束标记内容序列

EXAMPLE 3

这个例子显示了相关逻辑结构层次的摘录。

501 0 obj                                      % 段落的结构元素
    << /Type /StructElem
        /S /P
        ...
        /K [ 0                                  % 三个子元素:标记内容序列 0
            502 0 R                            % 链接
            2                                  % 标记内容序列 2
            ]
    >>
endobj

502 0 obj                                     % 链接的结构元素
    << /Type /StructElem
        /S /Link
        ...
        /K [ 1                                 % 两个子元素:标记内容序列 1
            503 0 R                           % 链接注释的对象引用
            ]
    >>
endobj

503 0 obj                                    % 链接注释的对象引用
    << /Type /OBJR
        /Obj 600 0 R                          % 链接注释(未显示)
    >>
endobj

NOTE 1

Link annotations (like all PDF annotations) are associated with a geometric region of the page rather than with a particular object in its content stream. Any connection between the link and the content is based solely on visual appearance rather than on an explicitly specified association. For this reason, link annotations alone are not useful to users with visual impairments or to applications needing to determine which content can be activated to invoke a hypertext link.

Tagged PDF link elements (structure type Link) use PDF’s logical structure facilities to establish the association between content items and link annotations, providing functionality comparable to HTML hypertext links. The following items may be children of a link element:

  • One or more content items or other ILSEs (except other links)
  • Object references (see 14.7.4.3, “PDF Objects as Content Items”) to one or more link annotations associated with the content

When a Link structure element describes a span of text to be associated with a link annotation and that span wraps from the end of one line to the beginning of another, the Link structure element shall include a single object reference that associates the span with that link annotation. Further, the link annotation shall use the QuadPoints entry to denote the active areas on the page.

EXAMPLE 1

The Link structure element references a link annotation that includes a QuadPoints entry that boxes the strings “with a” and “link”. That is, the QuadPoints entry contains 16 numbers: the first 8 numbers describe a quadrilateral for “with a”, and the next 8 describe a quadrilateral for “link”.

<p>Here is some text <span style="color:blue;text-decoration: underline blue;">with a link</span> inside .</p>
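
The grouping of the 16 numbers into two quadrilaterals can be sketched in Python. The function name is hypothetical and the coordinates in the usage example are made up for illustration; they are not taken from any real annotation.

```python
def split_quadpoints(quadpoints):
    """Group a flat array of numbers into quadrilaterals of four (x, y) points."""
    if len(quadpoints) % 8 != 0:
        raise ValueError("array length must be a multiple of 8")
    quads = []
    for i in range(0, len(quadpoints), 8):
        chunk = quadpoints[i:i + 8]
        # Each quadrilateral is eight numbers read as four x, y pairs.
        quads.append([(chunk[j], chunk[j + 1]) for j in range(0, 8, 2)])
    return quads


# Illustrative (made-up) coordinates: one quadrilateral per line of wrapped text.
example = [100, 700, 150, 700, 100, 714, 150, 714,
           10, 684, 40, 684, 10, 698, 40, 698]
quads = split_quadpoints(example)
```

Here `quads[0]` would box the text at the end of the first line and `quads[1]` the remainder at the start of the next line.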

NOTE 2

Beginning with PDF 1.7, use of the Link structure element to enclose multiple link annotations is deprecated.

EXAMPLE 2

Consider the following fragment of HTML code, which produces a line of text containing a hypertext link:

<html>
    <body>
        <p>
        Here is some text <a href="http://www.adobe.com">with a link</a> inside .
        </p>
    </body>
</html>

The following fragment of a PDF content stream is equivalent. It uses a link element, whose text is displayed in blue and underlined.

/P << /MCID 0 >>                           % Marked-content sequence 0 (paragraph)
    BDC                                    % Begin marked-content sequence
        BT                                 % Begin text object
            /T1_0 1 Tf                     % Set text font and size
            14 0 0 14 10.000 753.976 Tm    % Set text matrix
            0.0 0.0 0.0 rg                 % Set nonstroking colour to black
            ( Here is some text ) Tj       % Show text preceding link
        ET                                 % End text object
    EMC                                    % End marked-content sequence

/Link << /MCID 1 >>                        % Marked-content sequence 1 (link)
    BDC                                    % Begin marked-content sequence
        0.7 w                              % Set line width
        [] 0 d                             % Solid dash pattern

        0.0 0.0 1.0 RG                     % Set stroking colour to blue (before the path begins)
        111.094 751.8587 m                 % Move to beginning of underline
        174.486 751.8587 l                 % Draw underline
        S                                  % Stroke underline

        BT                                 % Begin text object
            14 0 0 14 111.094 753.976 Tm   % Set text matrix
            0.0 0.0 1.0 rg                 % Set nonstroking colour to blue
            ( with a link ) Tj             % Show text of link
        ET                                 % End text object
    EMC                                    % End marked-content sequence

/P << /MCID 2 >>                           % Marked-content sequence 2 (paragraph)
    BDC                                    % Begin marked-content sequence
        BT                                 % Begin text object
            14 0 0 14 174.486 753.976 Tm   % Set text matrix
            0.0 0.0 0.0 rg                 % Set nonstroking colour to black
            ( inside . ) Tj                % Show text following link
        ET                                 % End text object
    EMC                                    % End marked-content sequence

EXAMPLE 3

This example shows an excerpt from the associated logical structure hierarchy.

501 0 obj                                      % Structure element for paragraph
    << /Type /StructElem
       /S /P
       ...
       /K [ 0                                  % Three children: marked-content sequence 0
            502 0 R                            % Link
            2                                  % Marked-content sequence 2
          ]
    >>
endobj

502 0 obj                                     % Structure element for link
    << /Type /StructElem
       /S /Link
       ...
       /K [ 1                                 % Two children: marked-content sequence 1
            503 0 R                           % Object reference to link annotation
          ]
    >>
endobj

503 0 obj                                    % Object reference to link annotation
    << /Type /OBJR
       /Obj 600 0 R                          % Link annotation (not shown)
    >>
endobj
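
To see how a consumer might use the hierarchy in Example 3, the following Python sketch models the three objects as plain dictionaries and walks the K entries to collect marked-content identifiers and referenced annotations in order. The dictionary model, the ("ref", n) tuples, and the function name are assumptions for illustration; a real structure tree also allows a single non-array K value and marked-content reference dictionaries, which this sketch ignores.

```python
# Objects 501 to 503 from Example 3, modelled as plain dictionaries.
objects = {
    501: {"Type": "StructElem", "S": "P", "K": [0, ("ref", 502), 2]},
    502: {"Type": "StructElem", "S": "Link", "K": [1, ("ref", 503)]},
    503: {"Type": "OBJR", "Obj": ("ref", 600)},
}


def walk(objects, num):
    """Return (mcids, annotation_object_numbers) under structure element `num`."""
    mcids, annots = [], []
    for kid in objects[num]["K"]:
        if isinstance(kid, int):
            # An integer child is a marked-content identifier.
            mcids.append(kid)
            continue
        target = objects[kid[1]]
        if target["Type"] == "OBJR":
            # An object reference points at the annotation itself.
            annots.append(target["Obj"][1])
        else:
            sub_mcids, sub_annots = walk(objects, kid[1])
            mcids += sub_mcids
            annots += sub_annots
    return mcids, annots
```

Walking from object 502 yields the content that is activated by the link together with the annotation that defines its action.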

14.8.4.4.3 注释元素

14.8.4.4.3 Annotation Elements

标记 PDF 注释元素(结构类型 Annot;PDF 1.5)使用 PDF 的逻辑结构功能来建立内容项与 PDF 注释之间的关联。注释元素应适用于所有类型的注释,除了链接(见“链接元素”14.8.4.3,“块级结构元素”)和表单(见表 340)。

注释元素的子项可以包括以下项:

  • 对一个或多个注释字典的对象引用(见14.7.4.3,“PDF 对象作为内容项”)
  • 可选的一个或多个内容项(例如标记内容序列)或其他 ILSE(除其他注释外),这些内容项与注释关联

如果 Annot 元素没有其他子项,除了对象引用,其渲染应由引用的注释的外观定义,其文本内容应视为 Span 元素。它可以具有一个可选的 BBox 属性;如果提供,该属性将覆盖由注释字典的 Rect 条目指定的矩形。

如果 Annot 元素有子项内容项,这些子项表示注释的显示形式,且关联的注释的外观也可以被应用(例如,使用 Highlight 注释)。

可以有多个子项,引用不同的注释,前提是这些注释在 Rect 条目上是相同的。这与 Link 元素的处理方式类似;它允许注释与不连续的内容片段关联,例如换行的文本。

Tagged PDF annotation elements (structure type Annot; PDF 1.5) use PDF’s logical structure facilities to establish the association between content items and PDF annotations. Annotation elements shall be used for all types of annotations other than links (see “Link Elements” in 14.8.4.4, “Inline-Level Structure Elements”) and forms (see Table 340).

The following items may be children of an annotation element:

  • Object references (see 14.7.4.3, “PDF Objects as Content Items”) to one or more annotation dictionaries
  • Optionally, one or more content items (such as marked-content sequences) or other ILSEs (except other annotations) associated with the annotations

If an Annot element has no children other than object references, its rendering shall be defined by the appearance of the referenced annotations, and its text content shall be treated as if it were a Span element. It may have an optional BBox attribute; if supplied, this attribute overrides the rectangle specified by the annotation dictionary’s Rect entry.

If the Annot element has children that are content items, those children represent the displayed form of the annotation, and the appearance of the associated annotation may also be applied (for example, with a Highlight annotation).

There may be multiple children that are object references to different annotations, subject to the constraint that the annotations shall be the same except for their Rect entry. This is much the same as is done for the Link element; it allows an annotation to be associated with discontiguous pieces of content, such as line-wrapped text.
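
The constraint on multiple annotations can be expressed as a simple comparison. This Python sketch assumes annotation dictionaries are available as plain dicts keyed by entry name; the function name is hypothetical.

```python
def same_except_rect(annotations):
    """True if all annotation dictionaries are equal once Rect is ignored."""
    stripped = [{k: v for k, v in a.items() if k != "Rect"} for a in annotations]
    return all(s == stripped[0] for s in stripped[1:])
```

A checker might apply this to the annotations referenced by one Annot element and report the element if the result is false.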

14.8.4.4.4 Ruby 和 Warichu 元素

14.8.4.4.4 Ruby and Warichu Elements

Ruby 文本是一种侧注,采用较小的文本大小,并且紧邻基准文本放置,用于描述不常见单词的发音,或用于描述如缩写和商标等项目。Ruby 文本在日语和中文中使用。

Warichu 文本是一种注释或说明,采用较小的文本大小,并且在包含文本行的高度范围内格式化为两行,紧跟(内联)在基准文本后面。它用于日语中的描述性注释,以及用于 ruby 注释文本,但当文本太长时,无法以美观的方式格式化为 ruby 样式时,使用 Warichu。

表 339 – Ruby 和 Warichu 元素的标准结构类型(PDF 1.5)
结构类型 描述
Ruby (Ruby)整个 ruby 组件的包装器。它应包含一个 RB 元素,后跟一个 RT 元素,或者是由 RP、RT 和 RP 组成的三元素组。Ruby 元素及其内容元素不得跨越多行。
RB (Ruby 基本文本)应用 ruby 注释的全尺寸文本。RB 可以包含文本、其他内联元素或两者的混合。它可以具有 RubyAlign 属性。
RT (Ruby 注释文本)较小的文本,应紧邻 ruby 基本文本放置。它可以包含文本、其他内联元素或两者的混合。它可以具有 RubyAlign 和 RubyPosition 属性。
RP (Ruby 标点符号)围绕 ruby 注释文本的标点符号。仅当 ruby 注释无法正确格式化为 ruby 样式时,才使用此标点符号,或者当它被格式化为普通注释,或格式化为 warichu 时使用。它包含文本(通常是左括号或右括号或类似的括号字符)。
Warichu (Warichu)整个 warichu 组件的包装器。它可以包含由 WP、WT 和 WP 组成的三元素组。Warichu 元素(及其内容元素)可以跨越多行,根据日本工业标准(JIS)X 4051-1995 中描述的 warichu 换行规则进行换行。
WT (Warichu 文本)格式化为两行并放置在围绕 WP 元素的 Warichu 注释的较小文本。
WP (Warichu 标点符号)围绕 WT 文本的标点符号。它包含文本(通常是左括号或右括号或类似的括号字符)。根据 JIS X 4051-1995 标准,围绕 warichu 的括号可以根据格式化器的决定转换为空格(通常宽度为 ¼ EM)。

Ruby text is a side note, written in a smaller text size and placed adjacent to the base text to which it refers. It is used in Japanese and Chinese to describe the pronunciation of unusual words or to describe such items as abbreviations and logos.

Warichu text is a comment or annotation, written in a smaller text size and formatted onto two smaller lines within the height of the containing text line and placed following (inline) the base text to which it refers. It is used in Japanese for descriptive comments and for ruby annotation text that is too long to be aesthetically formatted as a ruby.

Table 339 – Standard structure types for Ruby and Warichu elements (PDF 1.5)
Structure Type Description
Ruby (Ruby) The wrapper around the entire ruby assembly. It shall contain one RB element followed by either an RT element or a three-element group consisting of RP, RT, and RP. Ruby elements and their content elements shall not break across multiple lines.
RB (Ruby base text) The full-size text to which the ruby annotation is applied. RB may contain text, other inline elements, or a mixture of both. It may have the RubyAlign attribute.
RT (Ruby annotation text) The smaller-size text that shall be placed adjacent to the ruby base text. It may contain text, other inline elements, or a mixture of both. It may have the RubyAlign and RubyPosition attributes.
RP (Ruby punctuation) Punctuation surrounding the ruby annotation text. It is used only when a ruby annotation cannot be properly formatted in a ruby style and instead is formatted as a normal comment, or when it is formatted as a warichu. It contains text (usually a single LEFT or RIGHT PARENTHESIS or similar bracketing character).
Warichu (Warichu) The wrapper around the entire warichu assembly. It may contain a three-element group consisting of WP, WT, and WP. Warichu elements (and their content elements) may wrap across multiple lines, according to the warichu breaking rules described in the Japanese Industrial Standard (JIS) X 4051-1995.
WT (Warichu text) The smaller-size text of a warichu comment that is formatted into two lines and placed between surrounding WP elements.
WP (Warichu punctuation) The punctuation that surrounds the WT text. It contains text (usually a single LEFT or RIGHT PARENTHESIS or similar bracketing character). According to JIS X 4051-1995, the parentheses surrounding a warichu may be converted to a SPACE (nominally ¼ EM in width) at the discretion of the formatter.
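
The two permitted child sequences of a Ruby element in Table 339 can be checked directly, and the RP elements give a plain-text fallback when ruby formatting is unavailable. The Python sketch below is illustrative; both function names and the list-based representation of children are assumptions.

```python
def ruby_children_ok(children):
    """True if the child structure types match one of the two Ruby layouts."""
    return list(children) in (["RB", "RT"], ["RB", "RP", "RT", "RP"])


def ruby_plain_text(children):
    """Concatenate (structure_type, text) children in order as a plain-text fallback."""
    return "".join(text for _, text in children)
```

With the three-element group present, the fallback text reads as the base text followed by the annotation in brackets.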

14.8.4.5 插图元素

14.8.4.5 Illustration Elements

Tagged PDF 将插图元素定义为结构类型(经过角色映射后,如果有的话)属于表 340 所列类型之一的任何结构元素。插图的内容应包含一个或多个完整的图形对象。它不得出现在限定文本对象的 BT 和 ET 操作符之间(见 9.4,“文本对象”)。它只能以所包含的标记裁剪序列的形式包含裁剪,如 14.6.3,“标记内容和裁剪”中所定义。在 Tagged PDF 中,所有这些标记裁剪序列应携带标记内容标签 Clip。

表 340 – 插图元素的标准结构类型
结构类型 描述
Figure (插图) 一项图形内容。其位置可通过 Placement 布局属性指定(见 14.8.5.4,“布局属性”)。
Formula (公式) 数学公式。
此结构类型仅用于标识整个内容元素为公式。没有标准结构类型用于标识公式中的单个组成部分。从格式化角度看,公式应类似于图形(结构类型为 Figure)。
Form (表单) 一个表示交互式表单字段的小部件注释(见 12.7,“交互式表单”)。如果元素包含 Role 属性,它可以包含表示表单字段(非交互式)的内容项。如果元素省略了 Role 属性(见 [表 348]),它应仅有一个子元素:一个对象引用(见 14.7.4.3,“PDF 对象作为内容项”),指向该小部件注释。注释的外观流(见 12.5.5,“外观流”)应描述表单元素的外观。

插图可以具有逻辑子结构,包括其他插图。然而,在重新排版时,它应作为一个整体移动(并可能调整大小),而不检查其内部内容。为了支持重新排版,它应具有 BBox 属性。它还可以具有 Placement、Width、Height 和 BaselineShift 属性(见 14.8.5.4,“布局属性”)。

通常,插图在逻辑上是文档中某一段落或其他元素的一部分或至少与之相关联。任何此类包含或附加应通过使用 Figure 结构类型表示。Figure 元素指示附加点,其 Placement 属性描述附加的方式。没有 Placement 属性的插图元素应视为 ILSE 并内联布局。

为了便于残障用户访问和其他文本提取目的,插图元素应在其结构元素字典中具有 Alt 条目或 ActualText 条目(或两者)(见 14.9.3,“替代描述”和 14.9.4,“替换文本”)。Alt 是插图的描述,而 ActualText 提供了图形插图的确切文本等价物,该图形插图看起来像文本。

Tagged PDF defines an illustration element as any structure element whose structure type (after role mapping, if any) is one of those listed in Table 340. The illustration’s content shall consist of one or more complete graphics objects. It shall not appear between the BT and ET operators delimiting a text object (see 9.4, “Text Objects”). It may include clipping only in the form of a contained marked clipping sequence, as defined in 14.6.3, “Marked Content and Clipping.” In Tagged PDF, all such marked clipping sequences shall carry the marked-content tag Clip.

Table 340 – Standard structure types for illustration elements
Structure Type Description
Figure (Figure) An item of graphical content. Its placement may be specified with the Placement layout attribute (see “General Layout Attributes” in 14.8.5.4, “Layout Attributes”).
Formula (Formula) A mathematical formula.
This structure type is useful only for identifying an entire content element as a formula. No standard structure types are defined for identifying individual components within the formula. From a formatting standpoint, the formula shall be treated similarly to a figure (structure type Figure).
Form (Form) A widget annotation representing an interactive form field (see 12.7, “Interactive Forms”). If the element contains a Role attribute, it may contain content items that represent the value of the (non-interactive) form field. If the element omits a Role attribute (see Table 348), it shall have only one child: an object reference (see 14.7.4.3, “PDF Objects as Content Items”) identifying the widget annotation. The annotation’s appearance stream (see 12.5.5, “Appearance Streams”) shall describe the appearance of the form element.

An illustration may have logical substructure, including other illustrations. For purposes of reflow, however, it shall be moved (and perhaps resized) as a unit, without examining its internal contents. To be useful for reflow, it shall have a BBox attribute. It may also have Placement, Width, Height, and BaselineShift attributes (see 14.8.5.4, “Layout Attributes”).

Often an illustration is logically part of, or at least attached to, a paragraph or other element of a document. Any such containment or attachment shall be represented through the use of the Figure structure type. The Figure element indicates the point of attachment, and its Placement attribute describes the nature of the attachment. An illustration element without a Placement attribute shall be treated as an ILSE and laid out inline.

For accessibility to users with disabilities and other text extraction purposes, an illustration element should have an Alt entry or an ActualText entry (or both) in its structure element dictionary (see 14.9.3, “Alternate Descriptions,” and 14.9.4, “Replacement Text”). Alt is a description of the illustration, whereas ActualText gives the exact text equivalent of a graphical illustration that has the appearance of text.
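
The recommendations for illustration elements lend themselves to a small lint. The Python sketch below assumes a simplified model in which a structure element is a dict, its attributes have already been flattened under a hypothetical "attributes" key, and role mapping is a single-level lookup; none of these names come from the specification.

```python
ILLUSTRATION_TYPES = {"Figure", "Formula", "Form"}


def illustration_issues(elem, role_map=None):
    """List guideline gaps for one structure element modelled as a dict."""
    # Apply (single-level) role mapping before testing the structure type.
    stype = (role_map or {}).get(elem["S"], elem["S"])
    if stype not in ILLUSTRATION_TYPES:
        return []
    issues = []
    if "Alt" not in elem and "ActualText" not in elem:
        issues.append("missing Alt or ActualText")
    if "BBox" not in elem.get("attributes", {}):
        issues.append("missing BBox attribute")
    return issues
```

An authoring tool could surface these messages as warnings, since Alt and ActualText are recommended rather than required.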

14.8.5 标准结构属性

14.8.5 Standard Structure Attributes

14.8.5.1 概述

14.8.5.1 General

除了标准结构类型外,Tagged PDF 定义了标准的布局和样式属性,用于这些类型的结构元素。这些属性使得在进行重新排版和将 PDF 内容导出到其他文档格式等操作时,能够应用可预测的格式化。

14.7.5,“结构属性”中所讨论的,属性应在属性对象中定义,这些对象是字典或流,并通过以下两种方式之一附加到结构元素上:

  • 结构元素字典中的 A 条目标识一个属性对象或多个此类对象的数组。
  • 结构元素字典中的 C 条目给出一个属性类的名称或多个此类名称的数组。类名称反过来会在类映射中查找,该映射是由结构树根中的 ClassMap 条目标识的字典,返回一个对应于该类的属性对象或对象数组。

除了 14.8.5.2,“标准属性拥有者”中描述的标准结构属性外,还有几个其他可选条目(Lang、Alt、ActualText 和 E),它们在 14.9,“无障碍支持”中描述,但对其他 PDF 使用者也同样有用。它们出现在 PDF 文件中的以下位置(而不是在属性字典中):

  • 作为结构元素字典中的条目(见 表 323
  • 作为附加到标记内容序列的属性列表中的条目,带有 Span 标签(见 14.6,“标记内容”)

14.7.6,“逻辑结构示例”中的示例演示了标准结构属性的使用。

In addition to the standard structure types, Tagged PDF defines standard layout and styling attributes for structure elements of those types. These attributes enable predictable formatting to be applied during operations such as reflow and export of PDF content to other document formats.

As discussed in 14.7.5, “Structure Attributes,” attributes shall be defined in attribute objects, which are dictionaries or streams attached to a structure element in either of two ways:

  • The A entry in the structure element dictionary identifies an attribute object or an array of such objects.
  • The C entry in the structure element dictionary gives the name of an attribute class or an array of such names. The class name is in turn looked up in the class map, a dictionary identified by the ClassMap entry in the structure tree root, yielding an attribute object or array of objects corresponding to the class.
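
The two attachment routes can be sketched as a lookup that gathers attribute objects from the A entry and, through the class map, from the C entry. This Python sketch models dictionaries as dicts and names as strings; the function name is hypothetical, and integer revision numbers that may appear in these arrays are simply skipped.

```python
def attribute_objects(elem, class_map):
    """Return (direct, via_class) attribute objects for a structure element.

    `direct` comes from the A entry; `via_class` comes from looking up each
    name in the C entry in the class map.
    """
    def as_list(value):
        if value is None:
            return []
        return list(value) if isinstance(value, list) else [value]

    # Keep only attribute objects; skip any interleaved integers.
    direct = [a for a in as_list(elem.get("A")) if isinstance(a, dict)]
    via_class = []
    for name in as_list(elem.get("C")):
        if isinstance(name, str):
            via_class += as_list(class_map.get(name))
    return direct, via_class
```

Keeping the two lists separate is a design choice that lets a caller give objects from the A entry precedence over those found through the class map.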

In addition to the standard structure attributes described in 14.8.5.2, “Standard Attribute Owners,” there are several other optional entries—Lang, Alt, ActualText, and E—that are described in 14.9, “Accessibility Support,” but are useful to other PDF consumers as well. They appear in the following places in a PDF file (rather than in attribute dictionaries):

  • As entries in the structure element dictionary (see Table 323)
  • As entries in property lists attached to marked-content sequences with a Span tag (see 14.6, “Marked Content”)

The example in 14.7.6, “Example of Logical Structure,” illustrates the use of standard structure attributes.

14.8.5.2 标准属性所有者

14.8.5.2 Standard Attribute Owners

每个属性对象都有一个 拥有者 ,由对象的 O 条目指定,决定了该对象字典中定义的属性的解释。多个拥有者可能会定义具有相同名称的属性,但其值类型或解释可能不同。Tagged PDF 定义了一组标准的属性拥有者,如 表 341 所示。

表 341 – 标准属性拥有者
拥有者 描述
Layout 控制内容布局的属性
List 控制列表编号的属性
PrintField (PDF 1.7) 控制非交互式表单字段的 Form 结构元素的属性
Table 控制表格中单元格组织的属性
XML-1.00 控制翻译为 XML 格式(版本 1.00)的附加属性
HTML-3.20 控制翻译为 HTML 格式(版本 3.20)的附加属性
HTML-4.01 控制翻译为 HTML 格式(版本 4.01)的附加属性
OEB-1.00 控制翻译为 OEB 格式(版本 1.0)的附加属性
RTF-1.05 控制翻译为 Microsoft Rich Text Format 格式(版本 1.05)的附加属性
CSS-1.00 控制翻译为使用 CSS 格式(版本 1.00)的附加属性
CSS-2.00 控制翻译为使用 CSS 格式(版本 2.00)的附加属性

由特定导出格式(例如 XML-1.00)拥有的属性对象仅在将 PDF 内容导出为该格式时应用。此类格式特定的属性将覆盖任何由 Layout、List、PrintField 或 Table 所拥有的对应属性。可能还会有其他格式特定的属性;可能的属性集合是开放式的,并没有明确指定或由 Tagged PDF 限制。

Each attribute object has an owner, specified by the object’s O entry, which determines the interpretation of the attributes defined in the object’s dictionary. Multiple owners may define like-named attributes with different value types or interpretations. Tagged PDF defines a set of standard attribute owners, shown in Table 341.

Table 341 – Standard attribute owners
Owner Description
Layout Attributes governing the layout of content
List Attributes governing the numbering of lists
PrintField (PDF 1.7) Attributes governing Form structure elements for non-interactive form fields
Table Attributes governing the organization of cells in tables
XML-1.00 Additional attributes governing translation to XML, version 1.00
HTML-3.20 Additional attributes governing translation to HTML, version 3.20
HTML-4.01 Additional attributes governing translation to HTML, version 4.01
OEB-1.00 Additional attributes governing translation to OEB, version 1.0
RTF-1.05 Additional attributes governing translation to Microsoft Rich Text Format, version 1.05
CSS-1.00 Additional attributes governing translation to a format using CSS, version 1.00
CSS-2.00 Additional attributes governing translation to a format using CSS, version 2.00

An attribute object owned by a specific export format, such as XML-1.00, shall be applied only when exporting PDF content to that format. Such format-specific attributes shall override any corresponding attributes owned by Layout, List, PrintField, or Table. There may also be additional format-specific attributes; the set of possible attributes is open-ended and is not explicitly specified or limited by Tagged PDF.

14.8.5.3 属性值和继承

14.8.5.3 Attribute Values and Inheritance

某些属性被定义为可继承的。可继承的属性会在结构树中传播;也就是说,指定在某个元素上的属性将应用于该元素在结构树中的所有后代,除非某个后代元素为该属性指定了明确的值。

注意 1

本子条款中对每个标准属性的描述指定了其值是否为可继承的。

可继承的属性可以在元素上指定,以便将其值传播到子元素,即使该属性对父元素来说没有意义。不可继承的属性只能在对其有意义的元素上指定。

以下列表显示了确定属性值的优先级。符合标准的阅读器应当将属性的值确定为以下列表中第一个适用的项:

a) 如果存在并且正在输出到该格式,元素的 A 条目中指定的、由某个导出格式(如 XML、HTML-3.20、HTML-4.01、OEB-1.0、CSS-1.00、CSS-2.0 和 RTF)拥有的属性值。

b) 如果存在,元素的 A 条目中指定的、由 Layout、PrintField、Table 或 List 拥有的属性值。

c) 如果存在,元素的 C 条目关联的类映射中指定的属性值。

d) 如果该属性是可继承的,父结构元素的已解析值。

e) 如果有默认值,属性的默认值。

注意 2

Lang、Alt、ActualText 和 E 属性不会出现在属性字典中。它们的应用规则在 14.9 “可访问性支持” 中讨论。

显式指定的属性和继承的属性之间没有语义区别。逻辑上,结构树中每个元素都有完全绑定的属性,即使其中一些属性是从祖先元素继承的。这与字体特性等属性的行为一致,后者虽未通过结构属性指定,但必须从内容中派生。

Some attributes are defined as inheritable. Inheritable attributes propagate down the structure tree; that is, an attribute that is specified for an element shall apply to all the descendants of the element in the structure tree unless a descendant element specifies an explicit value for the attribute.

NOTE 1

The description of each of the standard attributes in this sub-clause specifies whether their values are inheritable.

An inheritable attribute may be specified for an element for the purpose of propagating its value to child elements, even if the attribute is not meaningful for the parent element. Non-inheritable attributes may be specified only for elements on which they would be meaningful.

The following list shows the priority for determining attribute values. A conforming reader determines an attribute’s value to be the first item in the following list that applies:

a) The value of the attribute specified in the element’s A entry, owned by one of the export formats (such as XML, HTML-3.20, HTML-4.01, OEB-1.0, CSS-1.00, CSS-2.0, and RTF), if present, and if outputting to that format

b) The value of the attribute specified in the element’s A entry, owned by Layout, PrintField, Table or List, if present

c) The value of the attribute specified in a class map associated with the element’s C entry, if there is one

d) The resolved value of the parent structure element, if the attribute is inheritable

e) The default value for the attribute, if there is one

NOTE 2

The attributes Lang, Alt, ActualText, and E do not appear in attribute dictionaries. The rules governing their application are discussed in 14.9, “Accessibility Support.”

There is no semantic distinction between attributes that are specified explicitly and ones that are inherited. Logically, the structure tree has attributes fully bound to each element, even though some may be inherited from an ancestor element. This is consistent with the behaviour of properties (such as font characteristics) that are not specified by structure attributes but shall be derived from the content.
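The priority list above can be sketched in Python as a single lookup function. This is only an illustration under stated assumptions: `StructElem`, `resolve`, and the plain-dictionary attribute objects are hypothetical stand-ins for whatever a real PDF library exposes, and the choice to filter class-map objects by owner in step (c) is an assumption, since the text does not spell it out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Standard (non-export) owners named in step (b) of the priority list.
STANDARD_OWNERS = frozenset({"Layout", "List", "PrintField", "Table"})
_MISSING = object()  # sentinel, so that a stored None is not mistaken for "absent"


@dataclass
class StructElem:
    """Hypothetical in-memory structure element (not a real library type)."""
    attrs: list = field(default_factory=list)    # A entry: attribute objects as dicts with an "O" key
    classes: list = field(default_factory=list)  # C entry: attribute class names
    parent: Optional["StructElem"] = None


def _find(attr_objects, name, owners):
    """Return the first value of `name` among objects whose owner is in `owners`."""
    for obj in attr_objects:
        if obj.get("O") in owners and name in obj:
            return obj[name]
    return _MISSING


def resolve(elem, name, *, class_map, inheritable, defaults, export_owner=None):
    """Resolve one attribute value following steps (a) to (e)."""
    # (a) A entry, owned by the export format currently being written.
    if export_owner is not None:
        value = _find(elem.attrs, name, {export_owner})
        if value is not _MISSING:
            return value
    # (b) A entry, owned by Layout, PrintField, Table or List.
    value = _find(elem.attrs, name, STANDARD_OWNERS)
    if value is not _MISSING:
        return value
    # (c) Class map entries named by the C entry (a single object or an array of objects).
    allowed = STANDARD_OWNERS | ({export_owner} if export_owner else set())
    for class_name in elem.classes:
        entry = class_map.get(class_name, [])
        objects = [entry] if isinstance(entry, dict) else entry
        value = _find(objects, name, allowed)
        if value is not _MISSING:
            return value
    # (d) Resolved value of the parent, for inheritable attributes only.
    if name in inheritable and elem.parent is not None:
        return resolve(elem.parent, name, class_map=class_map, inheritable=inheritable,
                       defaults=defaults, export_owner=export_owner)
    # (e) Default value, if any (None when the attribute has no default).
    return defaults.get(name)


# Illustrative data only.
sect = StructElem(attrs=[{"O": "Layout", "WritingMode": "RlTb"}])
para = StructElem(classes=["Body"], parent=sect)
note = StructElem(attrs=[{"O": "Layout", "TextAlign": "Center"},
                         {"O": "HTML-4.01", "TextAlign": "End"}], parent=para)
ctx = dict(class_map={"Body": {"O": "Layout", "TextAlign": "Justify"}},
           inheritable={"WritingMode", "TextAlign"},
           defaults={"WritingMode": "LrTb", "TextAlign": "Start", "Placement": "Inline"})
```

With this data, `resolve(note, "TextAlign", **ctx)` takes the Layout-owned value, while passing `export_owner="HTML-4.01"` lets the format-specific object win. `WritingMode` is found by walking up to `sect`, and `Placement`, which is not inheritable, falls through to its default.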

14.8.5.4 布局属性

14.8.5.4 Layout Attributes

14.8.5.4.1 概述

14.8.5.4.1 General

布局属性指定了用于生成文档PDF内容所描述外观的布局过程的参数。此类别中的属性应在其 O(所有者)条目为 Layout(或表 341中列出的格式特定所有者名称之一)的属性对象中定义。

注意

这些参数的目的是使其能够用于重新流式处理内容或将其导出到其他文档格式时至少保留基本的样式。

表 342总结了标准布局属性及其适用的结构元素。以下子条款描述了这些属性的含义和用法。

14.8.5.3“属性值和继承”中所述,可以为任何元素指定可继承的属性,以将其传播到后代,无论该属性对该元素是否有意义。

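表 342 – 标准布局属性

| 结构元素 | 属性 | 是否可继承 |
|---|---|---|
| 任何结构元素 | Placement | 否 |
| | WritingMode | 是 |
| | BackgroundColor | 否 |
| | BorderColor | 是 |
| | BorderStyle | 否 |
| | BorderThickness | 是 |
| | Color | 是 |
| | Padding | 否 |
| 任何 BLSE;Placement 不为 Inline 的 ILSE | SpaceBefore | 否 |
| | SpaceAfter | 否 |
| | StartIndent | 是 |
| | EndIndent | 是 |
| 包含文本的 BLSE | TextIndent | 是 |
| | TextAlign | 是 |
| 插图元素(Figure、Formula、Form);Table | BBox | 否 |
| | Width | 否 |
| | Height | 否 |
| TH(表头);TD(表格数据) | Width | 否 |
| | Height | 否 |
| | BlockAlign | 是 |
| | InlineAlign | 是 |
| | TBorderStyle | 是 |
| | TPadding | 是 |
| 任何 ILSE;包含 ILSE 或包含直接或嵌套内容项的 BLSE | LineHeight | 是 |
| | BaselineShift | 否 |
| | TextDecorationType | 否 |
| | TextDecorationColor | 是 |
| | TextDecorationThickness | 是 |
| 分组元素(Art、Sect、Div) | ColumnCount | 否 |
| | ColumnWidths | 否 |
| | ColumnGap | 否 |
| 竖排文本 | GlyphOrientationVertical | 是 |
| Ruby 文本 | RubyAlign | 是 |
| | RubyPosition | 是 |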

Layout attributes specify parameters of the layout process used to produce the appearance described by a document’s PDF content. Attributes in this category shall be defined in attribute objects whose O (owner) entry has the value Layout (or is one of the format-specific owner names listed in Table 341).

NOTE

The intent is that these parameters can be used to reflow the content or export it to some other document format with at least basic styling preserved.

Table 342 summarizes the standard layout attributes and the structure elements to which they apply. The following sub-clauses describe the meaning and usage of these attributes.

As described in 14.8.5.3, “Attribute Values and Inheritance,” an inheritable attribute may be specified for any element to propagate it to descendants, regardless of whether it is meaningful for that element.

Table 342 – Standard layout attributes

| Structure elements | Attribute | Inheritable |
|---|---|---|
| Any structure element | Placement | No |
| | WritingMode | Yes |
| | BackgroundColor | No |
| | BorderColor | Yes |
| | BorderStyle | No |
| | BorderThickness | Yes |
| | Color | Yes |
| | Padding | No |
| Any BLSE; ILSEs with Placement other than Inline | SpaceBefore | No |
| | SpaceAfter | No |
| | StartIndent | Yes |
| | EndIndent | Yes |
| BLSEs containing text | TextIndent | Yes |
| | TextAlign | Yes |
| Illustration elements (Figure, Formula, Form); Table | BBox | No |
| | Width | No |
| | Height | No |
| TH (Table header); TD (Table data) | Width | No |
| | Height | No |
| | BlockAlign | Yes |
| | InlineAlign | Yes |
| | TBorderStyle | Yes |
| | TPadding | Yes |
| Any ILSE; BLSEs containing ILSEs or containing direct or nested content items | LineHeight | Yes |
| | BaselineShift | No |
| | TextDecorationType | No |
| | TextDecorationColor | Yes |
| | TextDecorationThickness | Yes |
| Grouping elements (Art, Sect, and Div) | ColumnCount | No |
| | ColumnWidths | No |
| | ColumnGap | No |
| Vertical text | GlyphOrientationVertical | Yes |
| Ruby text | RubyAlign | Yes |
| | RubyPosition | Yes |

14.8.5.4.2 通用布局属性

14.8.5.4.2 General Layout Attributes

表 343中描述的布局属性可能适用于块级结构元素(BLSEs)或内联结构元素(ILSEs)中的任何标准类型的结构元素。

表 343 – 适用于所有标准结构类型的标准布局属性
键 类型 值
Placement name (可选;不可继承)元素相对于封闭参考区域和其他内容的定位:

Block   在封闭参考区域或父级 BLSE 内,按块进展方向堆叠。
Inline   在封闭 BLSE 内,按内联进展方向排列。
Before   元素的分配矩形的前缘(见14.8.5.4中的“内容和分配矩形”,“布局属性”)与最近的封闭参考区域对齐。必要时,元素可以浮动,以实现指定的布局。元素应视为一个块,占据封闭参考区域的内联方向的完整范围。其他内容应堆叠,以便从元素分配矩形的后缘开始。
Start   元素的分配矩形的起始边缘(见14.8.5.4中的“内容和分配矩形”,“布局属性”)与最近的封闭参考区域对齐。必要时,元素可以浮动,以实现指定的布局。任何会侵入元素分配矩形的内容应采用回流布局。
End   元素的分配矩形的结束边缘(见14.8.5.4中的“内容和分配矩形”,“布局属性”)与最近的封闭参考区域对齐。必要时,元素可以浮动,以实现指定的布局。任何会侵入元素分配矩形的内容应采用回流布局。
应用于 ILSE 时,除 Inline 外的任何值应使该元素视为 BLSE 处理。默认值:Inline。
Placement 值为 Before、Start 或 End 的元素应从正常的堆叠或排列过程中移除,并允许浮动到封闭参考区域或父级 BLSE 的指定边缘。多个这样的浮动元素可以邻接放置在参考区域的指定边缘,或按遇到的顺序串联放置。复杂情况,如浮动元素互相干扰或无法在同一页面上放置,可能由不同的合规阅读器以不同方式处理。标记 PDF 仅标识这些元素为浮动,并指示其期望的放置位置。
WritingMode name (可选;可继承)用于排列 ILSE(内联进展)和堆叠 BLSE(块进展)的布局进展方向:

LrTb   内联进展从左到右;块进展从上到下。这是西方书写系统的典型写作模式。
RlTb   内联进展从右到左;块进展从上到下。这是阿拉伯语和希伯来语书写系统的典型写作模式。
TbRl   内联进展从上到下;块进展从右到左。这是中文和日语书写系统的典型写作模式。

指定的布局方向应应用于给定结构元素及其所有后代,直至任何嵌套级别。默认值:LrTb。

对于生成多个列的元素,写作模式定义了列在参考区域内的进展方向:内联方向决定列的堆叠方向及列间文本的默认流动顺序。对于表格,写作模式控制行和列的布局:表格行(结构类型 TR)应按块方向堆叠,行内的单元格(结构类型 TD)应按内联方向排列。

写作模式指定的内联进展方向可以在文本中进行局部覆盖,如[Unicode 标准附录 #9]《双向算法》中所描述的,详细信息见 Unicode 联盟(参见参考书目)。
BackgroundColor array (可选;不可继承;PDF 1.5)用于填充表格单元格或任何元素的内容矩形(可能由 Padding 属性调整)的背景色。值应为一个包含三个数字的数组,范围为 0.0 到 1.0,分别表示 RGB 色彩空间的红色、绿色和蓝色值。如果未指定此属性,元素应被视为透明。
BorderColor array (可选;可继承;PDF 1.5)在表格单元格或任何元素的内容矩形的边缘绘制的边框颜色(可能由 Padding 属性调整)。每个边缘的值应为一个包含三个数字的数组,范围为 0.0 到 1.0,分别表示 RGB 色彩空间的红色、绿色和蓝色值。可以有两种形式:
一个包含三个数字的单一数组,表示应用于四个边缘的 RGB 值;
一个包含四个数组的数组,每个数组指定一个边缘的 RGB 值,顺序为前、后、开始和结束边缘。任何边缘的 null 值表示该边缘不绘制。
如果未指定此属性,则该元素的边框颜色将为其关联内容开始时的当前文本填充颜色。
BorderStyle array 或 name (可选;不可继承;PDF 1.5)元素边框的样式。指定绘制表格单元格或任何元素内容矩形边缘的笔触模式(可能由 Padding 属性调整)。有两种形式:

- 一个来自以下列表的名称,表示应用于所有四个边缘的边框样式;
- 一个包含四个条目的数组,每个条目指定一个边缘的样式,顺序为前、后、开始和结束边缘。任何边缘的 null 值表示该边缘不绘制。

None   无边框。强制 BorderThickness 的计算值为 0。
Hidden   与 None 相同,除在表格元素的边框冲突解决方面。
Dotted   边框是由一系列点组成。
Dashed   边框是由一系列短线段组成。
Solid   边框是单一线段。
Double   边框是两条实线,两条线和它们之间的间距的总和等于 BorderThickness 的值。
Groove   边框看起来像是被雕刻进画布中的。
Ridge   边框看起来像是从画布中突出来的(与 Groove 相反)。
Inset   边框使整个框看起来像是嵌入到画布中的。
Outset   边框使整个框看起来像是从画布中突出的(与 Inset 相反)。

默认值:None

所有边框应绘制在框的背景之上。对于 Groove、Ridge、Inset 和 Outset 的边框,边框的颜色应根据结构元素的 BorderColor 属性以及绘制边框时的背景色来确定。

注意   合规的 HTML 应用程序可能会将 Dotted、Dashed、Double、Groove、Ridge、Inset 和 Outset 视为 Solid。
BorderThickness number 或 array (可选;可继承;PDF 1.5)绘制表格单元格或任何元素内容矩形边缘的边框厚度(可能由 Padding 属性调整)。每个边缘的值应为一个正数,表示边框的厚度(值为 0 表示该边框不绘制)。有两种形式:
一个数字,表示所有四个边缘的边框厚度;
一个包含四个条目的数组,每个条目指定一个边缘的厚度,顺序为前、后、开始和结束边缘。任何边缘的 null 值表示该边缘不绘制。
Padding number 或 array (可选;不可继承;PDF 1.5)指定元素内容矩形与周围边框之间的偏移量(见14.8.5.4中的“内容和分配矩形”,“布局属性”)。正值会扩大背景区域;负值会裁剪它,可能允许边框重叠到元素的文本或图形上。
值可以是一个数字,表示应用于四个边缘的填充宽度,使用默认用户空间单位,或者一个包含四个数字的数组,表示前、后、开始和结束边缘的填充宽度。默认值:0。
Color array (可选;可继承;PDF 1.5)用于绘制文本的颜色,以及表格边框和文本装饰的默认颜色。值应为一个包含三个数字的数组,范围为 0.0 到 1.0,分别表示 RGB 色彩空间的红色、绿色和蓝色值。如果未指定此属性,该元素的边框颜色将为关联内容开始时的当前文本填充颜色。

The layout attributes described in Table 343 may apply to structure elements of any of the standard types at the block level (BLSEs) or the inline level (ILSEs).

Table 343 – Standard layout attributes common to all standard structure types
Key Type Value
Placement name (Optional; not inheritable) The positioning of the element with respect to the enclosing reference area and other content:

Block   Stacked in the block-progression direction within an enclosing reference area or parent BLSE.
Inline   Packed in the inline-progression direction within an enclosing BLSE.
Before   Placed so that the before edge of the element’s allocation rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”) coincides with that of the nearest enclosing reference area. The element may float, if necessary, to achieve the specified placement. The element shall be treated as a block occupying the full extent of the enclosing reference area in the inline direction. Other content shall be stacked so as to begin at the after edge of the element’s allocation rectangle.
Start   Placed so that the start edge of the element’s allocation rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”) coincides with that of the nearest enclosing reference area. The element may float, if necessary, to achieve the specified placement. Other content that would intrude into the element’s allocation rectangle shall be laid out as a runaround.
End   Placed so that the end edge of the element’s allocation rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”) coincides with that of the nearest enclosing reference area. The element may float, if necessary, to achieve the specified placement. Other content that would intrude into the element’s allocation rectangle shall be laid out as a runaround.
When applied to an ILSE, any value except Inline shall cause the element to be treated as a BLSE instead. Default value: Inline.
Elements with Placement values of Before, Start, or End shall be removed from the normal stacking or packing process and allowed to float to the specified edge of the enclosing reference area or parent BLSE. Multiple such floating elements may be positioned adjacent to one another against the specified edge of the reference area or placed serially against the edge, in the order encountered. Complex cases such as floating elements that interfere with each other or do not fit on the same page may be handled differently by different conforming readers. Tagged PDF merely identifies the elements as floating and indicates their desired placement.
WritingMode name (Optional; inheritable) The directions of layout progression for packing of ILSEs (inline progression) and stacking of BLSEs (block progression):

LrTb   Inline progression from left to right; block progression from top to bottom. This is the typical writing mode for Western writing systems.
RlTb   Inline progression from right to left; block progression from top to bottom. This is the typical writing mode for Arabic and Hebrew writing systems.
TbRl   Inline progression from top to bottom; block progression from right to left. This is the typical writing mode for Chinese and Japanese writing systems.

The specified layout directions shall apply to the given structure element and all of its descendants to any level of nesting. Default value: LrTb.

For elements that produce multiple columns, the writing mode defines the direction of column progression within the reference area: the inline direction determines the stacking direction for columns and the default flow order of text from column to column. For tables, the writing mode controls the layout of rows and columns: table rows (structure type TR) shall be stacked in the block direction, cells within a row (structure type TD) in the inline direction.

The inline-progression direction specified by the writing mode is subject to local override within the text being laid out, as described in Unicode Standard Annex #9, The Bidirectional Algorithm, available from the Unicode Consortium (see the Bibliography).
BackgroundColor array (Optional; not inheritable; PDF 1.5) The colour to be used to fill the background of a table cell or any element’s content rectangle (possibly adjusted by the Padding attribute). The value shall be an array of three numbers in the range 0.0 to 1.0, representing the red, green, and blue values, respectively, of an RGB colour space. If this attribute is not specified, the element shall be treated as if it were transparent.
BorderColor array (Optional; inheritable; PDF 1.5) The colour of the border drawn on the edges of a table cell or any element’s content rectangle (possibly adjusted by the Padding attribute). The value of each edge shall be an array of three numbers in the range 0.0 to 1.0, representing the red, green, and blue values, respectively, of an RGB colour space. There are two forms:
A single array of three numbers representing the RGB values to apply to all four edges.
An array of four arrays, each specifying the RGB values for one edge of the border, in the order of the before, after, start, and end edges. A value of null for any of the edges means that it shall not be drawn.
If this attribute is not specified, the border colour for this element shall be the current text fill colour in effect at the start of its associated content.
BorderStyle array or name (Optional; not inheritable; PDF 1.5) The style of an element’s border. Specifies the stroke pattern of each edge of a table cell or any element’s content rectangle (possibly adjusted by the Padding attribute). There are two forms:

- A name from the list below representing the border style to apply to all four edges.
- An array of four entries, each entry specifying the style for one edge of the border in the order of the before, after, start, and end edges. A value of null for any of the edges means that it shall not be drawn.

None   No border. Forces the computed value of BorderThickness to be 0.
Hidden   Same as None, except in terms of border conflict resolution for table elements.
Dotted  The border is a series of dots.
Dashed  The border is a series of short line segments.
Solid   The border is a single line segment.
Double   The border is two solid lines. The sum of the two lines and the space between them equals the value of BorderThickness.
Groove  The border looks as though it were carved into the canvas.
Ridge  The border looks as though it were coming out of the canvas (the opposite of Groove).
Inset   The border makes the entire box look as though it were embedded in the canvas.
Outset  The border makes the entire box look as though it were coming out of the canvas (the opposite of Inset).

Default value: None

All borders shall be drawn on top of the box’s background. The colour of borders drawn for values of Groove, Ridge, Inset, and Outset shall depend on the structure element’s BorderColor attribute and the colour of the background over which the border is being drawn.

NOTE   Conforming HTML applications may interpret Dotted, Dashed, Double, Groove, Ridge, Inset, and Outset to be Solid.
BorderThickness number or array (Optional; inheritable; PDF 1.5) The thickness of the border drawn on the edges of a table cell or any element’s content rectangle (possibly adjusted by the Padding attribute). The value of each edge shall be a positive number in default user space units representing the border’s thickness (a value of 0 indicates that the border shall not be drawn). There are two forms:
A number representing the border thickness for all four edges.
An array of four entries, each entry specifying the thickness for one edge of the border, in the order of the before, after, start, and end edges. A value of null for any of the edges means that it shall not be drawn.
Padding number or array (Optional; not inheritable; PDF 1.5) Specifies an offset to account for the separation between the element’s content rectangle and the surrounding border (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”). A positive value enlarges the background area; a negative value trims it, possibly allowing the border to overlap the element’s text or graphic.
The value shall be either a single number representing the width of the padding, in default user space units, that applies to all four sides or a 4-element array of numbers representing the padding width for the before, after, start, and end edge, respectively, of the content rectangle. Default value: 0.
Color array (Optional; inheritable; PDF 1.5) The colour to be used for drawing text and the default value for the colour of table borders and text decorations. The value shall be an array of three numbers in the range 0.0 to 1.0, representing the red, green, and blue values, respectively, of an RGB colour space. If this attribute is not specified, the border colour for this element shall be the current text fill colour in effect at the start of its associated content.
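BorderColor, BorderStyle and BorderThickness each accept either one value for all four edges or a four-entry array in before, after, start, end order, and those logical edges depend on WritingMode. The Python sketch below expands both forms and maps the logical edges to physical sides. The helper names (`per_edge`, `border_sides`) are hypothetical, PDF arrays and names are modelled as Python lists and strings, and the caller is assumed to have already substituted defaults for unspecified attributes.

```python
# Logical edges in the order used by the four-entry array forms.
EDGES = ("before", "after", "start", "end")

# Physical side of each logical edge for the three WritingMode values:
# block progression fixes before/after, inline progression fixes start/end.
PHYSICAL_SIDE = {
    "LrTb": {"before": "top", "after": "bottom", "start": "left", "end": "right"},
    "RlTb": {"before": "top", "after": "bottom", "start": "right", "end": "left"},
    "TbRl": {"before": "right", "after": "left", "start": "top", "end": "bottom"},
}


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_rgb(value):
    """True for an array of exactly three numbers (the single-colour form)."""
    return isinstance(value, (list, tuple)) and len(value) == 3 and all(map(is_number, value))


def per_edge(value, is_single):
    """Expand a one-value or four-entry attribute into a dict keyed by logical edge."""
    if is_single(value):
        return dict.fromkeys(EDGES, value)
    if len(value) != 4:
        raise ValueError("expected a single value or an array of four entries")
    return dict(zip(EDGES, value))


def border_sides(writing_mode, color, thickness, style):
    """Return {physical side: (style, thickness, rgb)}, or None for an edge that is not drawn."""
    colors = per_edge(color, is_rgb)
    widths = per_edge(thickness, is_number)
    styles = per_edge(style, lambda v: isinstance(v, str))
    sides = {}
    for edge in EDGES:
        c, w, s = colors[edge], widths[edge], styles[edge]
        # A null entry, a zero thickness, or a None/Hidden style all mean "no border here".
        drawn = c is not None and w is not None and w > 0 and s not in (None, "None", "Hidden")
        sides[PHYSICAL_SIDE[writing_mode][edge]] = (s, w, c) if drawn else None
    return sides


vertical = border_sides("TbRl", [0, 0, 0], [1, 0, 2, None], "Solid")
western = border_sides("LrTb", [[1, 0, 0], None, [0, 1, 0], [0, 0, 1]], 0.5,
                       ["Solid", "Solid", "None", "Dashed"])
```

In the `vertical` case only the right (before) and top (start) sides are drawn, because the after edge has thickness 0 and the end edge is null. Border conflict resolution for tables, which is where Hidden differs from None, is deliberately left out of this sketch.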

14.8.5.4.3 区块级结构的布局属性

14.8.5.4.3 Layout Attributes for BLSEs

表 344 描述了仅适用于块级结构元素(BLSE)的布局属性。

Placement 属性值不是默认值 Inline 的内联级结构元素(ILSE)应被视为 BLSE,并同样受此处所述属性的约束。

表 344 – 仅适用于块级结构元素(BLSE)的附加标准布局属性
键 类型 值
SpaceBefore number (可选;不可继承) 在BLSE之前边缘前的额外空间,单位为默认用户空间单位,沿块进展方向测量。该值将添加到由BLSE内第一行的ILSE的LineHeight属性引起的任何调整(参见14.8.5.4中的“内联级结构的布局属性”)。如果前一个BLSE具有SpaceAfter属性,则应使用两者属性值中的较大者。默认值:0。该属性应忽略放置在给定参考区域中的第一个BLSE。
SpaceAfter number (可选;不可继承) 在BLSE之后边缘后的额外空间,单位为默认用户空间单位,沿块进展方向测量。该值将添加到由BLSE内最后一行的ILSE的LineHeight属性引起的任何调整(参见14.8.5.4中的“内联级结构的布局属性”)。如果后续的BLSE具有SpaceBefore属性,则应使用两者属性值中的较大者。默认值:0。
该属性应忽略放置在给定参考区域中的最后一个BLSE。
StartIndent number (可选;可继承) 从参考区域的开始边缘到BLSE的开始边缘的距离,单位为默认用户空间单位,沿内联进展方向测量。该属性仅适用于Placement属性为Block或Start的结构元素(参见14.8.5.4中的“一般布局属性”)。对于其他Placement值的元素应忽略此属性。默认值:0。
该属性的负值会使BLSE的开始边缘位于参考区域的外部。结果取决于实现,某些符合Tagged PDF标准的产品或特定导出格式可能不支持此行为。
如果具有StartIndent属性的结构元素与具有Placement属性为Start的浮动元素相邻,则用于元素起始缩进的实际值应为该元素自身的StartIndent属性或相邻浮动元素的内联范围,以较大者为准。如果有TextIndent属性,则该值可能会进一步调整。
EndIndent number (可选;可继承) 从BLSE的结束边缘到参考区域的结束边缘的距离,单位为默认用户空间单位,沿内联进展方向测量。该属性仅适用于Placement属性为Block或End的结构元素(参见14.8.5.4中的“一般布局属性”)。对于其他Placement值的元素应忽略此属性。默认值:0。
该属性的负值会使BLSE的结束边缘位于参考区域的外部。结果取决于实现,某些符合Tagged PDF标准的产品或特定导出格式可能不支持此行为。
如果具有EndIndent属性的结构元素与具有Placement属性为End的浮动元素相邻,则用于元素结束缩进的实际值应为该元素自身的EndIndent属性或相邻浮动元素的内联范围,以较大者为准。
TextIndent number (可选;可继承;仅适用于某些BLSE) 从BLSE的开始边缘(由StartIndent指定)到第一行文本的额外距离,单位为默认用户空间单位,沿内联进展方向测量。负值表示悬挂缩进。默认值:0。
该属性仅适用于类似段落的BLSE,以及结构类型为Lbl(标签)、LBody(列表主体)、TH(表头)和TD(表数据)的结构元素,只要它们包含非嵌套BLSE的内容。
TextAlign name (可选;可继承;仅适用于包含文本的BLSE) 文字及其他内容在BLSE内行内的对齐方式:

Start   与起始边缘对齐。
Center   在起始和结束边缘之间居中对齐。
End   与结束边缘对齐。
Justify   与起始和结束边缘对齐,如果需要,通过扩展每行中的内部间距来实现此对齐。最后一行(或唯一一行)仅与起始边缘对齐。

默认值:Start。
BBox rectangle (对 Annot 为可选;对于完整显示在单个页面上的任何图形或表格为必需;不可继承) 一个包含四个数字的数组,单位为默认用户空间单位,分别给出元素边界框(即完全包围其可见内容的矩形)的左、底、右、顶边缘的坐标。该属性适用于任何位于单一页面上并占据单一矩形的元素。
Width number 或 name (可选;不可继承;仅适用于插图、表格、表头和表格单元格;表格单元格应使用此属性) 元素内容矩形的宽度(参见14.8.5.4中的“内容和分配矩形”),单位为默认用户空间单位,沿内联进展方向测量。该属性仅适用于结构类型为Figure、Formula、Form、Table、TH(表头)或TD(表数据)的元素。
将Auto作为数值的替代表示,表示不对宽度施加具体约束;元素的宽度应由其内容的固有宽度决定。默认值:Auto。
Height number 或 name (可选;不可继承;仅适用于插图、表格、表头和表格单元格) 元素内容矩形的高度(参见14.8.5.4中的“内容和分配矩形”),单位为默认用户空间单位,沿块进展方向测量。该属性仅适用于结构类型为Figure、Formula、Form、Table、TH(表头)或TD(表数据)的元素。
将Auto作为数值的替代表示,表示不对高度施加具体约束;元素的高度应由其内容的固有高度决定。默认值:Auto。
BlockAlign name (可选;可继承;仅适用于表格单元格) 内容在表格单元格内的对齐方式:

Before   第一子项的分配矩形的前边缘与表格单元格的内容矩形对齐。
Middle   子项在表格单元格内居中。第一个子项的分配矩形的前边缘与表格单元格内容矩形的前边缘的距离与最后一个子项的分配矩形的后边缘与表格单元格内容矩形的后边缘的距离相同。
After   最后一个子项的分配矩形的后边缘与表格单元格的内容矩形对齐。
Justify   子项分别与表格单元格内容矩形的前后边缘对齐。第一个子项按照“Before”描述的位置放置,最后一个子项按照“After”描述的位置放置,子项之间的间距相等。如果只有一个子项,则它仅与起始边缘对齐,如“Before”所述。

该属性仅适用于结构类型为TH(表头)或TD(表数据)的元素,控制给定元素的所有BLSE子项的放置。表格单元格的内容矩形(参见14.8.5.4中的“内容和分配矩形”)将成为其所有后代的参考区域。默认值:Before。
InlineAlign name (可选;可继承;仅适用于表格单元格) 内容在表格单元格内的对齐方式:

Start   每个子项的分配矩形的起始边缘与表格单元格的内容矩形的对齐。
Center   每个子项在表格单元格内居中对齐。子项分配矩形的起始边缘与表格单元格内容矩形的起始边缘的距离与子项结束边缘与表格单元格内容矩形结束边缘的距离相同。
End   每个子项的分配矩形的结束边缘与表格单元格的内容矩形的对齐。

该属性仅适用于结构类型为TH(表头)或TD(表数据)的元素,控制给定元素的所有BLSE子项的放置。表格单元格的内容矩形(参见14.8.5.4中的“内容和分配矩形”)将成为其所有后代的参考区域。默认值:Start。
TBorderStyle name 或 array (可选;可继承;PDF 1.5) 表格单元格每个边缘绘制的边框样式。允许的值应与表 343 中指定的 BorderStyle 值相同。如果 TBorderStyle 和 BorderStyle 同时适用于给定表格单元格,BorderStyle 应优先于 TBorderStyle。默认值:None。
TPadding integer 或 array (可选;可继承;PDF 1.5) 指定表格单元格内容矩形与周围边框之间的分离量(参见14.8.5.4中的“内容和分配矩形”)。如果 TPadding 和 Padding 同时适用于给定表格单元格,则 Padding 应优先于 TPadding。正值会增大背景区域,负值会裁剪背景区域,边框可能会与元素的文本或图形重叠。该值应为一个单一数字,表示应用于表格单元格四个边缘的填充宽度,单位为默认用户空间单位,或一个包含四个数字的数组,表示内容矩形的前边缘、后边缘、开始边缘和结束边缘的填充宽度。默认值:0。

Table 344 describes layout attributes that shall apply only to block-level structure elements (BLSEs).

Inline-level structure elements (ILSEs) with a Placement attribute other than the default value of Inline shall be treated as BLSEs and shall also be subject to the attributes described here.

Table 344 – Additional standard layout attributes specific to block-level structure elements
Key Type Value
SpaceBefore number (Optional; not inheritable) The amount of extra space preceding the before edge of the BLSE, measured in default user space units in the block-progression direction. This value shall be added to any adjustments induced by the LineHeight attributes of ILSEs within the first line of the BLSE (see “Layout Attributes for ILSEs” in 14.8.5.4, “Layout Attributes”). If the preceding BLSE has a SpaceAfter attribute, the greater of the two attribute values shall be used. Default value: 0. This attribute shall be disregarded for the first BLSE placed in a given reference area.
SpaceAfter number (Optional; not inheritable) The amount of extra space following the after edge of the BLSE, measured in default user space units in the block-progression direction. This value shall be added to any adjustments induced by the LineHeight attributes of ILSEs within the last line of the BLSE (see “Layout Attributes for ILSEs” in 14.8.5.4, “Layout Attributes”). If the following BLSE has a SpaceBefore attribute, the greater of the two attribute values shall be used. Default value: 0.
This attribute shall be disregarded for the last BLSE placed in a given reference area.
StartIndent number (Optional; inheritable) The distance from the start edge of the reference area to that of the BLSE, measured in default user space units in the inline-progression direction. This attribute shall apply only to structure elements with a Placement attribute of Block or Start (see “General Layout Attributes” in 14.8.5.4, “Layout Attributes”). The attribute shall be disregarded for elements with other Placement values. Default value: 0.
A negative value for this attribute places the start edge of the BLSE outside that of the reference area. The results are implementation-dependent and may not be supported by all conforming products that process Tagged PDF or by particular export formats.
If a structure element with a StartIndent attribute is placed adjacent to a floating element with a Placement attribute of Start, the actual value used for the element’s starting indent shall be its own StartIndent attribute or the inline extent of the adjacent floating element, whichever is greater. This value may be further adjusted by the element’s TextIndent attribute, if any.
EndIndent number (Optional; inheritable) The distance from the end edge of the BLSE to that of the reference area, measured in default user space units in the inline-progression direction. This attribute shall apply only to structure elements with a Placement attribute of Block or End (see “General Layout Attributes” in 14.8.5.4, “Layout Attributes”). The attribute shall be disregarded for elements with other Placement values. Default value: 0.
A negative value for this attribute places the end edge of the BLSE outside that of the reference area. The results are implementation-dependent and may not be supported by all conforming products that process Tagged PDF or by particular export formats.
If a structure element with an EndIndent attribute is placed adjacent to a floating element with a Placement attribute of End, the actual value used for the element’s ending indent shall be its own EndIndent attribute or the inline extent of the adjacent floating element, whichever is greater.
TextIndent number (Optional; inheritable; applies only to some BLSEs) The additional distance, measured in default user space units in the inline-progression direction, from the start edge of the BLSE, as specified by StartIndent, to that of the first line of text. A negative value shall indicate a hanging indent. Default value: 0.
This attribute shall apply only to paragraphlike BLSEs and those of structure types Lbl (Label), LBody (List body), TH (Table header), and TD (Table data), provided that they contain content other than nested BLSEs.
TextAlign name (Optional; inheritable; applies only to BLSEs containing text) The alignment, in the inline-progression direction, of text and other content within lines of the BLSE:

Start  Aligned with the start edge.
Center  Centered between the start and end edges.
End  Aligned with the end edge.
Justify   Aligned with both the start and end edges, with internal spacing within each line expanded, if necessary, to achieve such alignment. The last (or only) line shall be aligned with the start edge only.

Default value: Start.
BBox rectangle (Optional for Annot; required for any figure or table appearing in its entirety on a single page; not inheritable) An array of four numbers in default user space units that shall give the coordinates of the left, bottom, right, and top edges, respectively, of the element’s bounding box (the rectangle that completely encloses its visible content). This attribute shall apply to any element that lies on a single page and occupies a single rectangle.
Width number or name (Optional; not inheritable; illustrations, tables, table headers, and table cells only; should be used for table cells) The width of the element’s content rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”), measured in default user space units in the inline-progression direction. This attribute shall apply only to elements of structure type Figure, Formula, Form, Table, TH (Table header), or TD (Table data).
The name Auto in place of a numeric value shall indicate that no specific width constraint is to be imposed; the element’s width shall be determined by the intrinsic width of its content. Default value: Auto.
Height number or name (Optional; not inheritable; illustrations, tables, table headers, and table cells only) The height of the element’s content rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”), measured in default user space units in the block-progression direction. This attribute shall apply only to elements of structure type Figure, Formula, Form, Table, TH (Table header), or TD (Table data).
The name Auto in place of a numeric value shall indicate that no specific height constraint is to be imposed; the element’s height shall be determined by the intrinsic height of its content. Default value: Auto.
BlockAlign name (Optional; inheritable; table cells only) The alignment, in the block-progression direction, of content within the table cell:

Before   Before edge of the first child’s allocation rectangle aligned with that of the table cell’s content rectangle.
Middle   Children centered within the table cell. The distance between the before edge of the first child’s allocation rectangle and that of the table cell’s content rectangle shall be the same as the distance between the after edge of the last child’s allocation rectangle and that of the table cell’s content rectangle.
After   After edge of the last child’s allocation rectangle aligned with that of the table cell’s content rectangle.
Justify   Children aligned with both the before and after edges of the table cell’s content rectangle. The first child shall be placed as described for Before and the last child as described for After, with equal spacing between the children. If there is only one child, it shall be aligned with the before edge only, as for Before.

This attribute shall apply only to elements of structure type TH (Table header) or TD (Table data) and shall control the placement of all BLSEs that are children of the given element. The table cell’s content rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”) shall become the reference area for all of its descendants. Default value: Before.
InlineAlign name (Optional; inheritable; table cells only) The alignment, in the inline-progression direction, of content within the table cell:

Start   Start edge of each child’s allocation rectangle aligned with that of the table cell’s content rectangle.
Center   Each child centered within the table cell. The distance between the start edges of the child’s allocation rectangle and the table cell’s content rectangle shall be the same as the distance between their end edges.
End   End edge of each child’s allocation rectangle aligned with that of the table cell’s content rectangle.

This attribute shall apply only to elements of structure type TH (Table header) or TD (Table data) and controls the placement of all BLSEs that are children of the given element. The table cell’s content rectangle (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”) shall become the reference area for all of its descendants. Default value: Start.
TBorderStyle name or array (Optional; inheritable; PDF 1.5) The style of the border drawn on each edge of a table cell. Allowed values shall be the same as those specified for BorderStyle (see Table 343). If both TBorderStyle and BorderStyle apply to a given table cell, BorderStyle shall supersede TBorderStyle. Default value: None.
TPadding integer or array (Optional; inheritable; PDF 1.5) Specifies an offset to account for the separation between the table cell’s content rectangle and the surrounding border (see “Content and Allocation Rectangles” in 14.8.5.4, “Layout Attributes”). If both TPadding and Padding apply to a given table cell, Padding shall supersede TPadding. A positive value shall enlarge the background area; a negative value shall trim it, and the border may overlap the element’s text or graphic. The value shall be either a single number representing the width of the padding, in default user space units, that applies to all four edges of the table cell or a 4-entry array representing the padding width for the before edge, after edge, start edge, and end edge, respectively, of the content rectangle. Default value: 0.
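The spacing and indent rules in Table 344 reduce to a few comparisons, sketched below in Python. The function names and the tuple layout are hypothetical, and the sketch makes simplifying assumptions: LineHeight-induced adjustments and TextIndent are ignored, and a disregarded StartIndent is treated as its default of 0.

```python
def stack_offsets(blocks):
    """Offsets of each BLSE's before edge from the before edge of the reference area.

    `blocks` is a list of (height, space_before, space_after) tuples, in
    block-progression order, all in default user space units.
    """
    offsets = []
    cursor = 0.0
    prev_space_after = None  # None marks "no previous block": first SpaceBefore is disregarded
    for height, space_before, space_after in blocks:
        if prev_space_after is not None:
            # Adjacent SpaceAfter and SpaceBefore do not add up; the greater one is used.
            cursor += max(prev_space_after, space_before)
        offsets.append(cursor)
        cursor += height
        prev_space_after = space_after  # the last block's SpaceAfter is never consumed
    return offsets


def effective_start_indent(start_indent=0.0, placement="Block", float_extent=0.0):
    """Start indent actually used for a BLSE.

    `float_extent` is the inline extent of an adjacent floating element whose
    Placement is Start (0 if there is none).
    """
    # StartIndent applies only to elements placed as Block or Start.
    own = start_indent if placement in ("Block", "Start") else 0.0
    return max(own, float_extent)


offsets = stack_offsets([(20, 5, 10), (30, 4, 0), (10, 12, 6)])
```

Here the gap between the first two blocks is 10 (SpaceAfter wins over SpaceBefore 4) and the gap before the third is 12, so the before edges fall at 0, 30 and 72. EndIndent would mirror `effective_start_indent` using Placement values Block or End.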

14.8.5.4.4 内联级结构的布局属性

14.8.5.4.4 Layout Attributes for ILSEs

描述在表345中的属性适用于行内级结构元素(ILSE)。它们也可以为块级元素(BLSE)指定,并可能适用于其直接子元素的任何内容项。

表345 – 特定于行内级结构元素的标准布局属性
键 类型 值
BaselineShift number (可选;不可继承) 元素基线相对于其父元素基线的偏移量,单位为默认用户空间单位。偏移方向应与当前WritingMode属性指定的块进程方向相反(见14.8.5.4节“通用布局属性”)。因此,正值将基线偏移到参考区域的前缘,负值则偏移到参考区域的后缘(在西方书写系统中分别是向上和向下)。默认值:0。

偏移的元素可以是上标、下标或行内图形。偏移应应用于该元素、其内容及所有后代元素。任何应用于此元素子元素的基线偏移应相对于此(父)元素的偏移基线进行度量。
LineHeight number 或 name (可选;可继承) 元素的首选高度,单位为默认用户空间单位,按块进程方向测量。行高由该元素包含的任何完整或部分ILSE的最大LineHeight值决定。
如果使用名称 Normal 或 Auto 代替数字值,则表示不对高度施加特定限制。元素的高度应根据内容的字体大小设置为合理值:

Normal   调整行高以包含任何非零值指定的BaselineShift
Auto   不会调整BaselineShift的值。

默认值:Normal。

此属性适用于所有ILSE(包括隐式的)作为此元素的子元素或其嵌套ILSE的子元素。它不适用于嵌套的BLSE。
当翻译为特定导出格式时,如果目标格式中可用,则应直接使用Normal和Auto值(如果指定)。术语“合理值”的含义留给符合规范的读者来确定。它应大约是字体大小的1.2倍,但该值可能会根据导出格式有所变化。

注意1   在没有数字值LineHeight或明确字体大小的情况下,从带标签PDF文件中的信息计算行高的合理方法是找到关联字体的 Ascent 和 Descent 值之间的差异(见9.8节“字体描述符”),将其从字形空间映射到默认用户空间(见9.4.4节“文本空间详细信息”),并使用该行中任何字符的最大结果值。
TextDecorationColor array (可选;可继承;PDF 1.5) 用于绘制文本装饰的颜色。值应为一个包含三个数字的数组,表示RGB颜色空间中红色、绿色和蓝色的值,范围为0.0到1.0。如果未指定此属性,则该元素文本装饰的颜色应为其关联内容开始时的当前填充颜色。
TextDecorationThickness number (可选;可继承;PDF 1.5) 作为文本装饰一部分绘制的每条线的粗细。值应为一个非负数字,单位为默认用户空间单位,表示线条的粗细(0被解释为最细的线)。如果未指定此属性,则应根据元素关联内容开始时的当前描边厚度推导出该值,并转换为默认用户空间单位。
TextDecorationType name (可选;不可继承) 应用到元素文本的文本装饰(如果有)。

None   无文本装饰
Underline   在文本下方画一条线
Overline   在文本上方画一条线
LineThrough   在文本中间画一条线

默认值:None。
此属性应应用于所有作为此元素的子元素或其嵌套ILSE的子元素的文本内容项。该属性不适用于嵌套的BLSE或非文本内容项。
装饰的颜色、位置和粗细应在所有子元素中保持一致,无论内容的文本特性(如颜色、字体大小等)发生何种变化。
RubyAlign name (可选;可继承;PDF 1.5) ruby组装中的行对齐方式:

Start   内容应在行内进程方向的起始边对齐。
Center   内容应在行内进程方向上居中对齐。
End   内容应在行内进程方向的结束边对齐。
Justify   内容应扩展以填充行内进程方向上的可用宽度。
Distribute   内容应扩展以填充行内进程方向上的可用宽度,但文本的起始边和结束边应插入空格。空格应按照1:2:1(起始:中间:结束)比例分配。如果ruby出现在文本行的开始,则比例应改为0:1:1;如果ruby出现在文本行的末尾,则比例应改为1:1:0。

默认值:Distribute。
此属性可以在RB和RT元素上指定。格式化ruby时,此属性应应用于这两个元素中较短的一行。(如果RT元素的宽度小于RB元素的宽度,则RT元素应按其RubyAlign属性指定的方式对齐。)
RubyPosition name (可选;可继承;PDF 1.5) 在ruby组装中,RT结构元素相对于RB元素的位置:

Before   RT内容应沿元素的前缘对齐。
After   RT内容应沿元素的后缘对齐。
Warichu   RT和关联的RP元素应作为warichu格式化,紧跟在RB元素后面。
Inline   RT和关联的RP元素应作为括号注释格式化,紧跟在RB元素后面。

默认值:Before。
GlyphOrientationVertical name (可选;可继承;PDF 1.5) 指定在行内进程方向为从上到下或从下到上的情况下,字形的方向。
此属性可以采用以下值之一:

angle 代表字形顶部相对于参考区域顶部的顺时针旋转角度,值应为-180到+360之间的90度倍数。
Auto   根据文本是否是全宽(宽度与高度相同)来指定默认的文本方向。全宽拉丁字母和全宽表意文字(不包括表意标点)应设置为0度。表意标点和具有水平和垂直形式的其他表意字符应使用字形的垂直形式。非全宽文本应设置为90度。
默认值:Auto。
注意2   此属性最常用于区分垂直书写的日文文档中字母(非表意)文本的首选方向(Auto或90)与西方标志和广告中表意字符和/或字母(非表意)文本的方向(90)。
此属性将影响字形的对齐和宽度。如果字形垂直于基线,其水平对齐点应与字形所属脚本的对齐基线对齐。字形区域的宽度应根据字形的水平宽度字体特征确定。

The attributes described in Table 345 apply to inline-level structure elements (ILSEs). They may also be specified for a block-level element (BLSE) and may apply to any content items that are its immediate children.

Table 345 – Standard layout attributes specific to inline-level structure elements
Key Type Value
BaselineShift number (Optional; not inheritable) The distance, in default user space units, by which the element’s baseline shall be shifted relative to that of its parent element. The shift direction shall be the opposite of the block-progression direction specified by the prevailing WritingMode attribute (see “General Layout Attributes” in 14.8.5.4, “Layout Attributes”). Thus, positive values shall shift the baseline toward the before edge and negative values toward the after edge of the reference area (upward and downward, respectively, in Western writing systems). Default value: 0.

The shifted element may be a superscript, a subscript, or an inline graphic. The shift shall apply to the element, its content, and all of its descendants. Any further baseline shift applied to a child of this element shall be measured relative to the shifted baseline of this (parent) element.
LineHeight number or name (Optional; inheritable) The element’s preferred height, measured in default user space units in the block-progression direction. The height of a line shall be determined by the largest LineHeight value for any complete or partial ILSE that it contains.
The name Normal or Auto in place of a numeric value shall indicate that no specific height constraint is to be imposed. The element’s height shall be set to a reasonable value based on the content’s font size:

Normal   Adjust the line height to include any nonzero value specified for BaselineShift.
Auto   Adjustment for the value of BaselineShift shall not be made.

Default value: Normal.

This attribute applies to all ILSEs (including implicit ones) that are children of this element or of its nested ILSEs, if any. It shall not apply to nested BLSEs.
When translating to a specific export format, the values Normal and Auto, if specified, shall be used directly if they are available in the target format. The meaning of the term “reasonable value” is left to the conforming reader to determine. It should be approximately 1.2 times the font size, but this value can vary depending on the export format.

NOTE 1   In the absence of a numeric value for LineHeight or an explicit value for the font size, a reasonable method of calculating the line height from the information in a Tagged PDF file is to find the difference between the associated font’s Ascent and Descent values (see 9.8, “Font Descriptors”), map it from glyph space to default user space (see 9.4.4, “Text Space Details”), and use the maximum resulting value for any character in the line.
TextDecorationColor array (Optional; inheritable; PDF 1.5) The colour to be used for drawing text decorations. The value shall be an array of three numbers in the range 0.0 to 1.0, representing the red, green, and blue values, respectively, of an RGB colour space. If this attribute is not specified, the colour of this element’s text decoration shall be the current fill colour in effect at the start of its associated content.
TextDecorationThickness number (Optional; inheritable; PDF 1.5) The thickness of each line drawn as part of the text decoration. The value shall be a non-negative number in default user space units representing the thickness (0 is interpreted as the thinnest possible line). If this attribute is not specified, it shall be derived from the current stroke thickness in effect at the start of the element’s associated content, transformed into default user space units.
TextDecorationType name (Optional; not inheritable) The text decoration, if any, to be applied to the element’s text.

None   No text decoration
Underline   A line below the text
Overline   A line above the text
LineThrough   A line through the middle of the text

Default value: None.
This attribute shall apply to all text content items that are children of this element or of its nested ILSEs, if any. The attribute shall not apply to nested BLSEs or to content items other than text.
The colour, position, and thickness of the decoration shall be uniform across all children, regardless of changes in colour, font size, or other variations in the content’s text characteristics.
RubyAlign name (Optional; inheritable; PDF 1.5) The justification of the lines within a ruby assembly:

Start   The content shall be aligned on the start edge in the inline-progression direction.
Center   The content shall be centered in the inline-progression direction.
End   The content shall be aligned on the end edge in the inline-progression direction.
Justify   The content shall be expanded to fill the available width in the inline-progression direction.
Distribute   The content shall be expanded to fill the available width in the inline-progression direction. However, space shall also be inserted at the start edge and end edge of the text. The spacing shall be distributed using a 1:2:1 (start:infix:end) ratio. It shall be changed to a 0:1:1 ratio if the ruby appears at the start of a text line or to a 1:1:0 ratio if the ruby appears at the end of the text line.

Default value: Distribute.
This attribute may be specified on the RB and RT elements. When a ruby is formatted, the attribute shall be applied to the shorter line of these two elements. (If the RT element has a shorter width than the RB element, the RT element shall be aligned as specified in its RubyAlign attribute.)
RubyPosition name (Optional; inheritable; PDF 1.5) The placement of the RT structure element relative to the RB element in a ruby assembly:

Before   The RT content shall be aligned along the before edge of the element.
After   The RT content shall be aligned along the after edge of the element.
Warichu   The RT and associated RP elements shall be formatted as a warichu, following the RB element.
Inline   The RT and associated RP elements shall be formatted as a parenthesis comment, following the RB element.

Default value: Before.
GlyphOrientationVertical name (Optional; inheritable; PDF 1.5) Specifies the orientation of glyphs when the inline-progression direction is top to bottom or bottom to top.
This attribute may take one of the following values:

angle A number representing the clockwise rotation in degrees of the top of the glyphs relative to the top of the reference area. Shall be a multiple of 90 degrees between -180 and +360.
Auto   Specifies a default orientation for text, depending on whether it is fullwidth (as wide as it is high). Fullwidth Latin and fullwidth ideographic text (excluding ideographic punctuation) shall be set with an angle of 0. Ideographic punctuation and other ideographic characters having alternate horizontal and vertical forms shall use the vertical form of the glyph. Non-fullwidth text shall be set with an angle of 90.
Default value: Auto.
NOTE 2   This attribute is used most commonly to differentiate between the preferred orientation of alphabetic (non-ideographic) text in vertically written Japanese documents (Auto or 90) and the orientation of the ideographic characters and/or alphabetic (non-ideographic) text in western signage and advertising (90).
This attribute shall affect both the alignment and width of the glyphs. If a glyph is perpendicular to the vertical baseline, its horizontal alignment point shall be aligned with the alignment baseline for the script to which the glyph belongs. The width of the glyph area shall be determined from the horizontal width font characteristic for the glyph.
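
The line-height estimate suggested in NOTE 1 can be sketched as follows. This is only an illustration: the `FontMetrics` class and its `scale` field are hypothetical names, and `scale` stands in for the full glyph-space to default-user-space mapping of 9.4.4 (the sample values assume a 1000-unit glyph space and no further transformation).

```python
from dataclasses import dataclass


@dataclass
class FontMetrics:
    ascent: float   # Ascent from the font descriptor, in glyph space units
    descent: float  # Descent from the font descriptor (usually negative)
    scale: float    # assumed glyph-space to user-space factor for this text


def estimate_line_height(fonts_on_line):
    """Return the largest (Ascent - Descent) extent, in default user space units."""
    if not fonts_on_line:
        raise ValueError("the line contains no text")
    return max((f.ascent - f.descent) * f.scale for f in fonts_on_line)


# Two fonts on one line: one set at 12 units, one at 10 units (assumed sizes).
line = [FontMetrics(718, -207, 12 / 1000), FontMetrics(683, -217, 10 / 1000)]
print(estimate_line_height(line))
```

The estimate applies only when neither a numeric LineHeight nor an explicit font size is available; otherwise those values take precedence.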

14.8.5.4.5 内容和分配矩形

14.8.5.4.5 Content and Allocation Rectangles

根据14.8.3 “基本布局模型”的定义,元素的内容矩形是一个包围矩形,它由元素内容的形状派生,定义了用于布局任何包含子元素的边界。分配矩形包括元素周围的任何额外边框或间距,影响元素在相邻元素和包围内容矩形或参考区域中的定位方式。

内容矩形的精确定义取决于元素的结构类型:

  • 对于表格单元格(结构类型 TH 或 TD),内容矩形应由单元格内容中所有图形对象的边界框决定,并考虑任何显式边界框(例如表单 XObject 中的 BBox 条目)。此隐式大小可以通过单元格的 Width 和 Height 属性显式覆盖。单元格的高度应调整为该行中任何单元格的最大高度;单元格的宽度应调整为该列中任何单元格的最大宽度。
  • 对于任何其他 BLSE,内容矩形的高度应为其包含的所有 BLSE 的高度总和,再加上这些元素之间的任何附加间距调整。
  • 对于包含文本的 ILSE,内容矩形的高度应由 LineHeight 属性设置。宽度应通过汇总所含字符的宽度来确定,并根据任何缩进、字距、词距或行尾条件进行调整。
  • 对于包含插图或表格的 ILSE,内容矩形应由内容中所有图形对象的边界框决定,并考虑任何显式边界框(例如表单 XObject 中的 BBox 条目)。此隐式大小可以通过元素的 Width 和 Height 属性显式覆盖。
  • 对于包含混合元素的 ILSE,内容矩形的高度应按如下方式确定:根据文本基线(对于文本 ILSE)或结束边缘(对于非文本 ILSE)将各子对象相互对齐,并考虑任何适用的 BaselineShift 属性(对于所有 ILSE),然后找出所有元素的最高点和最低点。

注意

一些符合要求的阅读器可能会将此过程应用于块内的所有元素;其他阅读器可能会逐行应用该过程。

分配矩形应根据结构类型的不同从内容矩形派生:

  • 对于 BLSE,分配矩形应等于内容矩形,并根据元素的 SpaceBefore 和 SpaceAfter 属性(如果有)调整其前后边缘,但不改变起始和结束边缘。
  • 对于 ILSE,分配矩形与内容矩形相同。

As defined in 14.8.3, “Basic Layout Model,” an element’s content rectangle is an enclosing rectangle derived from the shape of the element’s content, which shall define the bounds used for the layout of any included child elements. The allocation rectangle includes any additional borders or spacing surrounding the element, affecting how it shall be positioned with respect to adjacent elements and the enclosing content rectangle or reference area.

The exact definition of the content rectangle shall depend on the element’s structure type:

  • For a table cell (structure type TH or TD), the content rectangle shall be determined from the bounding box of all graphics objects in the cell’s content, taking into account any explicit bounding boxes (such as the BBox entry in a form XObject). This implied size may be explicitly overridden by the cell’s Width and Height attributes. The cell’s height shall be adjusted to equal the maximum height of any cell in its row; its width shall be adjusted to the maximum width of any cell in its column.
  • For any other BLSE, the height of the content rectangle shall be the sum of the heights of all BLSEs it contains, plus any additional spacing adjustments between these elements.
  • For an ILSE that contains text, the height of the content rectangle shall be set by the LineHeight attribute. The width shall be determined by summing the widths of the contained characters, adjusted for any indents, letter spacing, word spacing, or line-end conditions.
  • For an ILSE that contains an illustration or table, the content rectangle shall be determined from the bounding box of all graphics objects in the content, and shall take into account any explicit bounding boxes (such as the BBox entry in a form XObject). This implied size may be explicitly overridden by the element’s Width and Height attributes.
  • For an ILSE that contains a mixture of elements, the height of the content rectangle shall be determined by aligning the child objects relative to one another based on their text baseline (for text ILSEs) or end edge (for non-text ILSEs), along with any applicable BaselineShift attribute (for all ILSEs), and finding the extreme top and bottom for all elements.

NOTE

Some conforming readers may apply this process to all elements within the block; others may apply it on a line-by-line basis.

The allocation rectangle shall be derived from the content rectangle in a way that also depends on the structure type:

  • For a BLSE, the allocation rectangle shall be equal to the content rectangle with its before and after edges adjusted by the element’s SpaceBefore and SpaceAfter attributes, if any, but with no changes to the start and end edges.
  • For an ILSE, the allocation rectangle is the same as the content rectangle.
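
The derivation of the allocation rectangle can be sketched as below. The `Rect` type and the function name are hypothetical, and the sketch assumes a left-to-right, top-to-bottom writing mode in PDF coordinates (y increasing upward), so that the before edge is the top and the after edge is the bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    bottom: float
    right: float
    top: float


def allocation_rect(content, is_blse, space_before=0.0, space_after=0.0):
    """Derive the allocation rectangle from a content rectangle."""
    if not is_blse:
        # ILSE: the allocation rectangle is the content rectangle itself.
        return content
    # BLSE: only the before (top) and after (bottom) edges move;
    # the start and end edges are unchanged.
    return Rect(
        left=content.left,
        bottom=content.bottom - space_after,
        right=content.right,
        top=content.top + space_before,
    )


paragraph = Rect(left=72, bottom=600, right=300, top=700)
print(allocation_rect(paragraph, is_blse=True, space_before=12, space_after=6))
```

For other WritingMode values the same adjustment would apply to whichever physical edges are the before and after edges.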

14.8.5.4.6 插图属性

14.8.5.4.6 Illustration Attributes

插图元素(结构类型为 Figure、Formula 或 Form)在特定使用情况下应有额外的限制:

  • 当插图元素的 Placement 属性为 Block 时,它应具有显式指定数值的 Height 属性(不能为 Auto)。此数值应是插图在块进展方向上范围的唯一信息来源。
  • 当插图元素的 Placement 属性为 Inline 时,它应具有显式指定数值的 Width 属性(不能为 Auto)。此数值应是插图在行内进展方向上范围的唯一信息来源。
  • 当插图元素的 Placement 属性为 Inline、Start 或 End 时,其 BaselineShift 属性的值应用于确定其后缘相对于文本基线的位置;对于其他 Placement 属性值,BaselineShift 应被忽略。(Placement 值为 Start 的插图元素可用于创建下沉首字母;Placement 值为 Inline 的插图元素可用于创建上升首字母。)

Particular uses of illustration elements (structure types Figure, Formula, or Form) shall have additional restrictions:

  • When an illustration element has a Placement attribute of Block, it shall have a Height attribute with an explicitly specified numerical value (not Auto). This value shall be the sole source of information about the illustration’s extent in the block-progression direction.
  • When an illustration element has a Placement attribute of Inline, it shall have a Width attribute with an explicitly specified numerical value (not Auto). This value shall be the sole source of information about the illustration’s extent in the inline-progression direction.
  • When an illustration element has a Placement attribute of Inline, Start, or End, the value of its BaselineShift attribute shall be used to determine the position of its after edge relative to the text baseline; BaselineShift shall be ignored for all other values of Placement. (An illustration element with a Placement value of Start may be used to create a dropped capital; one with a Placement value of Inline may be used to create a raised capital.)
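
A checker for these restrictions might look like the following sketch. It assumes the element's layout attributes have already been collected into a plain dictionary keyed by attribute name; the function names are hypothetical, and a missing Placement entry is simply left unchecked.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def illustration_problems(attrs):
    """Return a list of violated restrictions for a Figure, Formula, or Form."""
    problems = []
    placement = attrs.get("Placement")
    if placement == "Block" and not _is_number(attrs.get("Height")):
        problems.append("Block placement requires a numeric Height")
    if placement == "Inline" and not _is_number(attrs.get("Width")):
        problems.append("Inline placement requires a numeric Width")
    return problems


def effective_baseline_shift(attrs):
    """BaselineShift counts only for Inline, Start, and End placement."""
    if attrs.get("Placement") in {"Inline", "Start", "End"}:
        return attrs.get("BaselineShift", 0)
    return 0


figure = {"Placement": "Block", "Height": "Auto", "BaselineShift": 4}
print(illustration_problems(figure))
print(effective_baseline_shift(figure))
```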

14.8.5.4.7 列属性

14.8.5.4.7 Column Attributes

表格346中描述的属性适用于分组元素Art、Sect和Div(参见14.8.4.2,“分组元素”)。这些属性在分组元素的内容被划分为列时使用。

表格346 – 标准列属性
键 类型 值
ColumnCount 整数 (可选;不可继承;PDF 1.6)分组元素内容中的列数。默认值:1。
ColumnGap 数字或数组 (可选;不可继承;PDF 1.6)相邻列之间的期望间距,以默认用户空间单位在行内进展方向上进行测量。如果值为数字,则指定所有列之间的间距。如果值为数组,则应包含数字,第一个元素指定第一列和第二列之间的间距,第二个指定第二列和第三列之间的间距,依此类推。如果数组中的元素少于ColumnCount - 1,则最后一个元素应指定所有剩余的间距;如果元素多于ColumnCount - 1,则多余的数组元素应被忽略。
ColumnWidths 数字或数组 (可选;不可继承;PDF 1.6)列的期望宽度,以默认用户空间单位在行内进展方向上进行测量。如果值为数字,则指定所有列的宽度。如果值为数组,则应包含数字,表示每一列的宽度,按顺序排列。如果数组中的元素少于ColumnCount,则最后一个元素应指定所有剩余的宽度;如果元素多于ColumnCount,则多余的数组元素应被忽略。

The attributes described in Table 346 shall be present for the grouping elements Art, Sect, and Div (see 14.8.4.2, “Grouping Elements”). They shall be used when the content in the grouping element is divided into columns.

Table 346 – Standard column attributes
Key Type Value
ColumnCount integer (Optional; not inheritable; PDF 1.6) The number of columns in the content of the grouping element. Default value: 1.
ColumnGap number or array (Optional; not inheritable; PDF 1.6) The desired space between adjacent columns, measured in default user space units in the inline-progression direction. If the value is a number, it specifies the space between all columns. If the value is an array, it should contain numbers, the first element specifying the space between the first and second columns, the second specifying the space between the second and third columns, and so on. If there are fewer than ColumnCount - 1 numbers, the last element shall specify all remaining spaces; if there are more than ColumnCount - 1 numbers, the excess array elements shall be ignored.
ColumnWidths number or array (Optional; not inheritable; PDF 1.6) The desired width of the columns, measured in default user space units in the inline-progression direction. If the value is a number, it specifies the width of all columns. If the value is an array, it shall contain numbers, representing the width of each column, in order. If there are fewer than ColumnCount numbers, the last element shall specify all remaining widths; if there are more than ColumnCount numbers, the excess array elements shall be ignored.
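
The padding and truncation rules for ColumnGap and ColumnWidths can be sketched as follows. The helper names are hypothetical, and rejecting an empty array is an assumption of this sketch, since the table does not say how an empty array is to be treated.

```python
def expand_per_column(value, count):
    """Expand a number or array into exactly `count` values."""
    if count <= 0:
        return []
    if isinstance(value, (int, float)):
        # A single number applies to every column (or every gap).
        return [float(value)] * count
    items = [float(v) for v in value]
    if not items:
        raise ValueError("array must contain at least one number")
    items = items[:count]                         # excess elements are ignored
    items += [items[-1]] * (count - len(items))   # last element fills the rest
    return items


def column_layout(column_count, column_gap, column_widths):
    """Return (widths, gaps) for a grouping element divided into columns."""
    widths = expand_per_column(column_widths, column_count)
    gaps = expand_per_column(column_gap, column_count - 1)
    return widths, gaps


print(column_layout(3, [18], [150, 100]))
print(column_layout(2, [10, 20, 30], 200))
```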

14.8.5.5 列表属性

14.8.5.5 List Attribute

如果存在,ListNumbering 属性(如 表格347 所述)应出现在L(列表)元素中。它控制在列表的LI(列表项)元素中对Lbl(标签)元素的解释(参见14.8.4.3中的“列表元素”)。此属性仅可以在其O(所有者)条目的值为List 或是表格341中列出的格式特定所有者名称之一的属性对象中定义。

表格347 – 标准列表属性
键 类型 值
ListNumbering 名称 (可选;可继承)用于生成自动编号列表中的Lbl(标签)元素内容的编号系统,或者用于标识无编号列表中每个项目的符号。ListNumbering 的值应为以下之一,并按照此处描述的方式应用:

None   无自动编号;Lbl元素(如果存在)包含不受任何编号方案约束的任意文本

Disc   实心圆形项目符号

Circle   空心圆形项目符号

Square   实心方形项目符号

Decimal   十进制阿拉伯数字(1–9,10–99,…)

UpperRoman   大写罗马数字(I,II,III,IV,…)

LowerRoman   小写罗马数字(i,ii,iii,iv,…)

UpperAlpha   大写字母(A,B,C,…)

LowerAlpha   小写字母(a,b,c,…)

默认值:None。

UpperAlpha和LowerAlpha使用的字母表应由当前的Lang条目确定(参见14.9.2,“自然语言规范”)。

随着Unicode识别更多的编号系统,可能会扩展可能的值集。符合规范的阅读器应忽略此表中未列出的任何值;它应表现得好像该值为None。

NOTE

此属性用于允许内容提取工具自动编号列表。然而,列表中的Lbl元素仍应显式包含生成的数字,以便文档可以重新流动或打印,而无需依赖自动编号。

If present, the ListNumbering attribute, described in Table 347, shall appear in an L (List) element. It controls the interpretation of the Lbl (Label) elements within the list’s LI (List item) elements (see “List Elements” in 14.8.4.3, “Block-Level Structure Elements”). This attribute may only be defined in attribute objects whose O (owner) entry has the value List or is one of the format-specific owner names listed in Table 341.

Table 347 – Standard list attribute
Key Type Value
ListNumbering name (Optional; inheritable) The numbering system used to generate the content of the Lbl (Label) elements in an autonumbered list, or the symbol used to identify each item in an unnumbered list. The value of the ListNumbering shall be one of the following, and shall be applied as described here.

None   No autonumbering; Lbl elements (if present) contain arbitrary text not subject to any numbering scheme

Disc   Solid circular bullet

Circle   Open circular bullet

Square   Solid square bullet

Decimal   Decimal arabic numerals (1–9, 10–99, … )

UpperRoman   Uppercase roman numerals (I, II, III, IV, … )

LowerRoman   Lowercase roman numerals (i, ii, iii, iv, … )

UpperAlpha   Uppercase letters (A, B, C, … )

LowerAlpha   Lowercase letters (a, b, c, … )

Default value: None.

The alphabet used for UpperAlpha and LowerAlpha shall be determined by the prevailing Lang entry (see 14.9.2, “Natural Language Specification”).

The set of possible values may be expanded as Unicode identifies additional numbering systems. A conforming reader shall ignore any value not listed in this table; it shall behave as though the value were None.

NOTE

This attribute is used to allow a content extraction tool to autonumber a list. However, the Lbl elements within the list should nevertheless contain the resulting numbers explicitly, so that the document can be reflowed or printed without the need for autonumbering.
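
A content extraction tool could generate labels from ListNumbering roughly as sketched below. Several choices here are assumptions rather than requirements of the table: the Unicode bullet characters, the use of the Latin alphabet for UpperAlpha and LowerAlpha (the alphabet actually depends on the prevailing Lang entry), and the continuation past 26 items as AA, AB, and so on. Unrecognized values are treated as None, as the table requires.

```python
BULLETS = {"Disc": "\u2022", "Circle": "\u25e6", "Square": "\u25aa"}  # assumed glyphs

ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
         (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    if not 1 <= n <= 3999:
        raise ValueError("roman numerals are supported for 1..3999 only")
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n):
    if n < 1:
        raise ValueError("item numbers start at 1")
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


def make_label(list_numbering, index):
    """Return generated label text for the 1-based item index, or None."""
    if list_numbering in BULLETS:
        return BULLETS[list_numbering]
    if list_numbering == "Decimal":
        return str(index)
    if list_numbering == "UpperRoman":
        return to_roman(index)
    if list_numbering == "LowerRoman":
        return to_roman(index).lower()
    if list_numbering == "UpperAlpha":
        return to_alpha(index)
    if list_numbering == "LowerAlpha":
        return to_alpha(index).lower()
    return None  # "None" and any unlisted value: no autonumbering


print([make_label("UpperRoman", i) for i in range(1, 5)])
```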

14.8.5.6 打印字段属性

14.8.5.6 PrintField Attributes

(PDF 1.7) 在表格348中描述的属性标识了非交互式PDF表单中字段的角色。这些表单可能最初包含了交互式字段,如文本字段和单选按钮,但随后被转换为非交互式PDF文件,或者它们可能是设计为打印出来并手动填写的。由于字段的角色无法从交互式元素中确定,因此使用PrintField属性定义这些角色。

NOTE

PrintField属性使屏幕阅读器能够识别表示表单字段(如单选按钮、复选框、按钮和文本字段)的页面内容。这些属性使得打印表单字段中的控件能够在逻辑结构树中呈现,并且能够被辅助技术呈现为只读交互式字段。

表格348 – PrintField属性
键 类型 值
Role 名称 (可选;不可继承)此图形表示的表单字段类型。Role的值应为以下之一,符合标准的阅读器应按照此处定义的含义进行解释:

rb   单选按钮

cb   复选框

pb   按钮

tv   文本值字段

tv角色应用于那些其值已被转换为文本的交互式字段,文本将成为Form元素的内容(见表格340)。

NOTE 1   示例包括文本编辑字段、数字字段、密码字段、数字签名和组合框。

默认值:未指定。
checked 名称 (可选;不可继承)单选按钮或复选框字段的状态。值应为以下之一:on、off(默认)或 neutral。

NOTE 2   此键的大小写不符合标准中其他地方使用的约定。
Desc 文本字符串 (可选;不可继承)字段的备用名称。

NOTE 3   类似于交互式字段字典中的TU条目提供的值(见表格220)。

(PDF 1.7) The attributes described in Table 348 identify the role of fields in non-interactive PDF forms. Such forms may have originally contained interactive fields such as text fields and radio buttons but were then converted into non-interactive PDF files, or they may have been designed to be printed out and filled in manually. Since the roles of the fields cannot be determined from interactive elements, the roles are defined using PrintField attributes.

NOTE

PrintField attributes enable screen readers to identify page content that represents form fields (radio buttons, check boxes, push buttons, and text fields). These attributes enable the controls in print form fields to be represented in the logical structure tree and to be presented to assistive technology as if they were read-only interactive fields.

Table 348 – PrintField attributes
Key Type Value
Role name (Optional; not inheritable) The type of form field represented by this graphic. The value of Role shall be one of the following, and a conforming reader shall interpret its meaning as defined herein.

rb   Radio button

cb   Check box

pb   Push button

tv   Text-value field

The tv role shall be used for interactive fields whose values have been converted to text in the non-interactive document. The text that is the value of the field shall be the content of the Form element (see Table 340).

NOTE 1   Examples include text edit fields, numeric fields, password fields, digital signatures, and combo boxes.

Default value: None specified.
checked name (Optional; not inheritable) The state of a radio button or check box field. The value shall be one of: on, off (default), or neutral.

NOTE 2   The case (capitalization) used for this key does not conform to the same conventions used elsewhere in this standard.
Desc text string (Optional; not inheritable) The alternate name of the field.

NOTE 3   Similar to the value supplied in the TU entry of the field dictionary for interactive fields (see Table 220).
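
As an illustration of how assistive technology might consume these attributes, the sketch below turns a PrintField attribute dictionary into a short spoken description. The function name and the output format are hypothetical; only the key names and allowed values come from Table 348.

```python
ROLE_NAMES = {
    "rb": "radio button",
    "cb": "check box",
    "pb": "push button",
    "tv": "text-value field",
}
CHECKED_STATES = {"on", "off", "neutral"}


def describe_print_field(attrs):
    """Return a description of a print field, or None if Role is not recognized."""
    role = attrs.get("Role")
    if role not in ROLE_NAMES:
        return None
    parts = [ROLE_NAMES[role]]
    if role in ("rb", "cb"):
        # Note the lowercase key "checked"; its default value is off.
        state = attrs.get("checked", "off")
        if state not in CHECKED_STATES:
            raise ValueError(f"unexpected checked value: {state!r}")
        parts.append(state)
    text = ", ".join(parts)
    desc = attrs.get("Desc")
    return f"{desc}: {text}" if desc else text


print(describe_print_field({"Role": "cb", "checked": "on", "Desc": "Subscribe"}))
```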

14.8.5.7 表格属性

14.8.5.7 Table Attributes

O(所有者)条目的值在 Table 属性元素中应为 Table 或 表 341 中列出的某个特定格式的所有者名称。

表 349 – 标准表格属性
键 类型 值
RowSpan 整数 (可选;不可继承)指定要跨越的外部表格中的行数。单元格应通过在由表格的 WritingMode 属性指定的块进程方向上添加行来扩展。如果此条目缺失,符合规范的读取器应假设值为 1。

仅当表格单元格的结构类型为 TH 或 TD,或其角色映射到结构类型 TH 或 TD 时(请参见 表 337),才应使用此条目。
ColSpan 整数 (可选;不可继承)指定要跨越的外部表格中的列数。单元格应通过在由表格的 WritingMode 属性指定的行内进程方向上添加列来扩展。如果此条目缺失,符合规范的读取器应假设值为 1。

仅当表格单元格的结构类型为 TH 或 TD,或其角色映射到结构类型 TH 或 TD 时(请参见 表 337),才应使用此条目。
Headers 数组 (可选;不可继承;PDF 1.5)一个字节字符串数组,其中每个字符串应为一个 TH 结构元素的元素标识符(请参见 表 323 中的 ID 条目),该元素应作为与此单元格关联的标题。

此属性适用于标题单元格(TH)以及数据单元格(TD)(请参见 表 337)。因此,任何单元格关联的标题应为其 Headers 数组中的标题,加上任何该数组中 TH 单元格的 Headers 数组中的标题,依此类推递归进行。
Scope 名称 (可选;不可继承;PDF 1.5)一个名称,其值应为以下之一:Row、Column 或 Both。此属性仅在元素的结构类型为 TH 时使用(请参见 表 337)。它应反映该标题单元格是否适用于包含它的行中的其余单元格、包含它的列,或包含它的行和列。
Summary 文本字符串 (可选;不可继承;PDF 1.7)表格目的和结构的摘要。此条目仅在 Table 结构元素中使用(请参见 表 337)。
注   用于非视觉渲染,如语音或盲文。

The value of the O (owner) entry of a Table attributes element shall be Table or one of the format-specific owner names listed in Table 341.

Table 349 – Standard table attributes
Key Type Value
RowSpan integer (Optional; not inheritable) The number of rows in the enclosing table that shall be spanned by the cell. The cell shall expand by adding rows in the block-progression direction specified by the table’s WritingMode attribute. If this entry is absent, a conforming reader shall assume a value of 1.

This entry shall only be used when the table cell has a structure type of TH or TD or one that is role mapped to structure type TH or TD (see Table 337).
ColSpan integer (Optional; not inheritable) The number of columns in the enclosing table that shall be spanned by the cell. The cell shall expand by adding columns in the inline-progression direction specified by the table’s WritingMode attribute. If this entry is absent, a conforming reader shall assume a value of 1.

This entry shall only be used when the table cell has a structure type of TH or TD or one that is role mapped to structure type TH or TD (see Table 337).
Headers array (Optional; not inheritable; PDF 1.5) An array of byte strings, where each string shall be the element identifier (see the ID entry in Table 323) for a TH structure element that shall be used as a header associated with this cell.

This attribute may apply to header cells (TH) as well as data cells (TD) (see Table 337). Therefore, the headers associated with any cell shall be those in its Headers array plus those in the Headers array of any TH cells in that array, and so on recursively.
Scope name (Optional; not inheritable; PDF 1.5) A name whose value shall be one of the following: Row, Column, or Both. This attribute shall only be used when the structure type of the element is TH (see Table 337). It shall reflect whether the header cell applies to the rest of the cells in the row that contains it, the column that contains it, or both the row and the column that contain it.
Summary text string (Optional; not inheritable; PDF 1.7) A summary of the table’s purpose and structure. This entry shall only be used within Table structure elements (see Table 337).
NOTE   For use in non-visual rendering such as speech or braille.
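
The recursive rule for Headers can be sketched as a traversal over element identifiers. The mapping `headers_by_id` is a hypothetical, pre-extracted view of the structure tree (element ID to the contents of that element's Headers array), and the guard against cycles is a defensive addition of this sketch, not something the table describes.

```python
def associated_headers(cell_id, headers_by_id):
    """Return the IDs of all TH elements associated with a cell, in discovery order."""
    result = []
    visited = {cell_id}
    queue = list(headers_by_id.get(cell_id, []))
    while queue:
        header_id = queue.pop(0)
        if header_id in visited:
            continue  # already collected, or a cycle back to the starting cell
        visited.add(header_id)
        result.append(header_id)
        # A TH may itself name further headers; follow them recursively.
        queue.extend(headers_by_id.get(header_id, []))
    return result


table = {
    "cell_q1_sales": ["hdr_q1", "hdr_sales"],
    "hdr_q1": ["hdr_2024"],   # the Q1 header sits under a 2024 header
    "hdr_sales": [],
    "hdr_2024": [],
}
print(associated_headers("cell_q1_sales", table))
```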