跳转至

14.6 标记内容

14.6 Marked Content

14.6.1 概述

14.6.1 General

标记内容操作符(PDF 1.2)可用于标识 PDF 内容流中的某个部分,使其成为特定符合标准的产品感兴趣的标记内容元素。标记内容元素及其标记操作符可分为以下两类:

  • MPDP 操作符用于在内容流中指定一个 标记内容点
  • BMCBDCEMC 操作符用于在内容流中括起一个 标记内容序列

注意 1

该序列不仅仅是内容流中的字节序列,而是完整的图形对象序列。每个对象都由其渲染时的图形状态参数完全限定。

注意 2

例如,图形应用程序可能会使用标记内容将一组相关对象标识为一个整体,以便作为单元处理。文本处理应用程序可能会利用标记内容来维护正文中的脚注标记与页面底部对应脚注文本之间的关联。PDF 的逻辑结构功能利用标记内容序列将图形内容与结构元素关联(参见 14.7.4,“结构内容”)。表 320 总结了标记内容操作符。

EMC 之外的所有标记内容操作符都应带有一个标签(tag)操作数,用于指示符合标准的阅读器对该标记内容元素的角色或意义。所有此类标签都应向 Adobe Systems 注册(参见 附录 E),以避免不同应用程序在同一内容流上标记冲突。除标签操作数外,DPBDC 操作符还应指定一个属性列表,其中包含与标记内容相关的额外信息。有关属性列表的更多内容,请参见 14.6.2,“属性列表”。

标记内容操作符只能出现在内容流中的图形对象之间,不得出现在图形对象内部,或图形状态操作符与其操作数之间。标记内容序列可以相互嵌套,但每个序列必须完全包含在单个内容流内。

注意 3

标记内容序列不得跨页。

注意 4

页面对象的 Contents 条目(参见 7.7.3.3,“页面对象”),可为单个流或流数组,但在标记内容序列的上下文中被视为单个流。

表 320 – 标记内容操作符
操作数 操作符 描述
标签(tag) MP 指定一个标记内容点。标签应为名称对象,指示该点的角色或意义。
标签(tag) 属性(properties) DP 指定带有属性列表的标记内容点。标签应为名称对象,指示该点的角色或意义。属性应为一个内联字典(包含属性列表)或当前资源字典的 Properties 子字典中的名称对象(参见 14.6.2,“属性列表”)。
标签(tag) BMC 开始一个由 EMC 操作符终止的标记内容序列。标签应为名称对象,指示该序列的角色或意义。
标签(tag) 属性(properties) BDC 开始一个带有属性列表的标记内容序列,由 EMC 操作符终止。标签应为名称对象,指示该序列的角色或意义。属性应为一个内联字典(包含属性列表)或当前资源字典的 Properties 子字典中的名称对象(参见 14.6.2,“属性列表”)。
-- EMC 结束由 BMCBDC 操作符开始的标记内容序列。

当标记内容操作符 BMCBDCEMC 与文本对象操作符 BTET(参见 9.4,“文本对象”)结合使用时,每对匹配的操作符(BMCEMCBDCEMCBTET)应当正确(分别)嵌套。因此,以下序列是有效的:

BMC                BT
    BT                 BMC
    ...      和       ...
    ET                 EMC
EMC                ET

但以下序列无效:

BMC                BT
    BT                 BMC
    ...      和       ...
    EMC                ET
BT                 EMC

Marked-content operators (PDF 1.2) may identify a portion of a PDF content stream as a marked-content element of interest to a particular conforming product. Marked-content elements and the operators that mark them shall fall into two categories:

  • The MP and DP operators shall designate a single marked-content point in the content stream.
  • The BMC, BDC, and EMC operators shall bracket a marked-content sequence of objects within the content stream.

NOTE 1

This is a sequence not simply of bytes in the content stream but of complete graphics objects. Each object is fully qualified by the parameters of the graphics state in which it is rendered.

NOTE 2

A graphics application, for example, might use marked content to identify a set of related objects as a group to be processed as a single unit. A text-processing application might use it to maintain a connection between a footnote marker in the body of a document and the corresponding footnote text at the bottom of the page. The PDF logical structure facilities use marked-content sequences to associate graphical content with structure elements (see 14.7.4, “Structure Content”). Table 320 summarizes the marked-content operators.

All marked-content operators except EMC shall take a tag operand indicating the role or significance of the marked-content element to the conforming reader. All such tags shall be registered with Adobe Systems (see Annex E) to avoid conflicts between different applications marking the same content stream. In addition to the tag operand, the DP and BDC operators shall specify a property list containing further information associated with the marked content. Property lists are discussed further in 14.6.2, “Property Lists.”

Marked-content operators may appear only between graphics objects in the content stream. They may not occur within a graphics object or between a graphics state operator and its operands. Marked-content sequences may be nested one within another, but each sequence shall be entirely contained within a single content stream.

NOTE 3

A marked-content sequence may not cross page boundaries.

NOTE 4

The Contents entry of a page object (see 7.7.3.3, “Page Objects”), which may be either a single stream or an array of streams, is considered a single stream with respect to marked-content sequences.

Table 320 – Marked-content operators
Operands Operator Description
tag MP Designate a marked-content point. tag shall be a name object indicating the role or significance of the point.
tag properties DP Designate a marked-content point with an associated property list. tag shall be a name object indicating the role or significance of the point. properties shall be either an inline dictionary containing the property list or a name object associated with it in the Properties subdictionary of the current resource dictionary (see 14.6.2, “Property Lists”).
tag BMC Begin a marked-content sequence terminated by a balancing EMC operator. tag shall be a name object indicating the role or significance of the sequence.
tag properties BDC Begin a marked-content sequence with an associated property list, terminated by a balancing EMC operator. tag shall be a name object indicating the role or significance of the sequence. properties shall be either an inline dictionary containing the property list or a name object associated with it in the Properties subdictionary of the current resource dictionary (see 14.6.2, “Property Lists”).
-- EMC End a marked-content sequence begun by a BMC or BDC operator.

When the marked-content operators BMC, BDC, and EMC are combined with the text object operators BT and ET (see 9.4, “Text Objects”), each pair of matching operators (BMCEMC, BDCEMC, or BTET) shall be properly (separately) nested. Therefore, the sequences

BMC                BT
  BT                 MBC
    ...      and       ...
  ET                 EMC
EMC                ET

are valid, but

BMC                BT
  BT                 MBC
    ...      and       ...
  EMC                ET
BT                 EMC

are not valid.

14.6.2 属性列表

14.6.2 Property Lists

标记内容操作符 DPBDC 可在内容流中将属性列表与标记内容元素关联。属性列表是一个字典,其中包含符合标准的编写器创建标记内容时的私有信息。符合标准的产品应一致地使用字典条目,并确保与特定键关联的值始终属于相同类型(或少量类型集合)。

如果属性列表字典中的所有值都是直接对象,则该字典可以作为直接对象内联写入内容流。如果任何值是对内容流外部对象的间接引用,则该属性列表字典必须在当前资源字典的 Properties 子字典中定义为一个命名资源(参见 7.8.3,“资源字典”),并通过名称引用,作为 DPBDC 操作符的属性(properties)操作数。

The marked-content operators DP and BDC associate a property list with a marked-content element within a content stream. The property list is a dictionary containing private information meaningful to the conforming writer creating the marked content. Conforming products should use the dictionary entries in a consistent way; the values associated with a given key should always be of the same type (or small set of types).

If all of the values in a property list dictionary are direct objects, the dictionary may be written inline in the content stream as a direct object. If any of the values are indirect references to objects outside the content stream, the property list dictionary shall be defined as a named resource in the Properties subdictionary of the current resource dictionary (see 7.8.3, “Resource Dictionaries”) and referenced by name as the properties operand of the DP or BDC operator.

14.6.3 标记内容和裁剪

14.6.3 Marked Content and Clipping

某些 PDF 路径对象和文本对象仅用于影响当前裁剪路径,而不会实际绘制在页面上。当路径对象使用操作符序列 W n 或 W* n 定义时(参见 8.5.4,“裁剪路径操作符”),或者当文本对象使用文本渲染模式 7 进行绘制时(参见 9.3.6,“文本渲染模式”),便会出现这种情况。这些被裁剪但未绘制的路径或文本对象称为裁剪对象(clipping objects)。当裁剪对象出现在标记内容序列中时,除非整个序列仅由裁剪对象组成,否则它们不应被视为该序列的一部分。例如,在示例 1 中,标记内容序列 Clip 包含文本字符串 (Clip me),但不包括定义裁剪边界的矩形路径。

示例 1

/Clip BMC
    100 100 10 10 re W n        % 裁剪路径
    ( Clip me ) Tj              % 需被裁剪的对象
EMC

仅当标记内容序列完全由裁剪对象组成时,裁剪对象才应被视为该序列的一部分。在这种情况下,该序列称为标记裁剪序列(marked clipping sequence)。这样的序列可以嵌套。例如,在示例 2 中,多个文本行被用于裁剪后续的图形对象(在本例中为填充路径)。每行文本应在一个单独的标记裁剪序列中进行括起,并标记为 Pgf。整个系列又由一个外部标记裁剪序列括起,并标记为 Clip。

注意

标记内容序列 ClippedText 不是标记裁剪序列,因为它包含了一个填充的矩形路径,而该路径不是裁剪对象。因此,Clip 和 Pgf 序列中的裁剪对象不被视为 ClippedText 序列的一部分。

示例 2

/ClippedText BMC
    /Clip << … >>
        BDC
            BT
                7 Tr                    % 开启文本裁剪模式
                /Pgf BMC
                    ( Line 1 ) Tj
                EMC
                    /Pgf BMC
                        ( Line ) '
                        ( 2 ) Tj
                EMC
            ET                            % 设置当前文本裁剪
        EMC
    100 100 10 10 re f                    % 填充路径
EMC

标记裁剪序列的具体规则如下:

  • 裁剪对象(clipping object) 指的是以 W nW* n 结束的路径对象,或使用文本渲染模式 7 进行绘制的文本对象。
  • 不可见图形对象(invisible graphics object) 指的是以 n 结束的路径对象(且之前没有 WW*),或者使用文本渲染模式 3 进行绘制的文本对象。
  • 可见图形对象(visible graphics object) 指的是以 n 以外的任何操作符结束的路径对象,或使用除 3 和 7 以外的文本渲染模式进行绘制的文本对象,或者是通过 Do 操作符调用的 XObject。
  • 空标记内容元素(empty marked-content element) 指的是不包含任何图形对象的标记内容点或标记内容序列。
  • 标记裁剪序列(marked clipping sequence) 指的是包含至少一个裁剪对象但不包含可见图形对象的标记内容序列。
  • 裁剪对象和标记裁剪序列仅在其外部包围的标记内容序列是标记裁剪序列时,才被视为该序列的一部分。
  • 不可见图形对象和空标记内容元素始终被视为其外部包围的标记内容序列的一部分,而不管该序列是否为标记裁剪序列。
  • q(保存)和 Q(恢复)操作符不得出现在标记裁剪序列中。

示例 3 说明了这些规则的应用。标记内容序列 S4 是一个标记裁剪序列,因为它包含一个裁剪对象(裁剪路径 2)且不包含可见图形对象。因此,裁剪路径 2 被视为 S4 序列的一部分。而标记内容序列 S1、S2 和 S3 不是 标记裁剪序列,因为它们每个都至少包含一个可见图形对象。因此,裁剪路径 1 和 2 不属于这三个序列中的任何一个。

示例 3

/S1 BMC
    /S2 BMC
        /S3 BMC
            0 0 m
            100 100 l
            0 100 l W n        % 裁剪路径 1
            0 0 m
            200 200 l
            0 100 l f          % 填充路径
        EMC

        /S4 BMC
            0 0 m
            300 300 l
            0 100 l W n        % 裁剪路径 2
        EMC
    EMC
    100 100 10 10 re f         % 填充路径
EMC

在示例 4 中,标记内容序列 S1 是一个标记裁剪序列,因为它包含的唯一图形对象是一个裁剪路径。因此,空的标记内容序列 S3 和标记内容点 P1 都属于序列 S2,而 S2、S3 和 P1 都属于序列 S1。

示例 4

/S1 BMC
    … 裁剪路径 …
    /S2 BMC
        /S3 BMC
        EMC
        /P1 DP
    EMC
EMC

在示例 5 中,标记内容序列 S1 和 S4 是标记裁剪序列,因为它们包含的唯一对象是裁剪路径。因此,该裁剪路径是 S1 和 S4 序列的一部分;S3 属于 S2;而 S2、S3 和 S4 都属于 S1。

示例 5

/S1 BMC
    /S2 BMC
        /S3 BMC
        EMC
    EMC

    /S4 BMC
        … 裁剪路径 …
    EMC
EMC

Some PDF path and text objects are defined purely for their effect on the current clipping path, without the objects actually being painted on the page. This occurs when a path object is defined using the operator sequence W n or W* n (see 8.5.4, “Clipping Path Operators”) or when a text object is painted in text rendering mode 7 (see 9.3.6, “Text Rendering Mode”). Such clipped, unpainted path or text objects are called clipping objects. When a clipping object falls within a marked-content sequence, it shall not be considered part of the sequence unless the entire sequence consists only of clipping objects. In Example 1, for instance, the marked- content sequence tagged Clip includes the text string ( Clip me ) but not the rectangular path that defines the clipping boundary.

EXAMPLE 1

/Clip BMC
    100 100 10 10 re W n        % Clipping path
    ( Clip me ) Tj              % Object to be clipped
EMC

Only when a marked-content sequence consists entirely of clipping objects shall the clipping objects be considered part of the sequence. In this case, the sequence is known as a marked clipping sequence. Such sequences may be nested. In Example 2, for instance, multiple lines of text are used to clip a subsequent graphics object (in this case, a filled path). Each line of text shall be bracketed within a separate marked clipping sequence, tagged Pgf. The entire series shall be bracketed in turn by an outer marked clipping sequence, tagged Clip.

NOTE

The marked-content sequence tagged ClippedText is not a marked clipping sequence, since it contains a filled rectangular path that is not a clipping object. The clipping objects belonging to the Clip and Pgf sequences are therefore not considered part of the ClippedText sequence.

EXAMPLE 2

/ClippedText BMC
    /Clip << … >>
        BDC
            BT
                7 Tr                    % Begin text clip mode
                /Pgf BMC
                    ( Line 1 ) Tj
                EMC
                    /Pgf BMC
                        ( Line ) '
                        ( 2 ) Tj
                EMC
            ET                            % Set current text clip
        EMC
    100 100 10 10 re f                    % Filled path
EMC

The precise rules governing marked clipping sequences shall be as follows:

  • A clipping object shall be a path object ended by the operator sequence W n or W* n or a text object painted in text rendering mode 7.
  • An invisible graphics object shall be a path object ended by the operator n only (with no preceding W or W*) or a text object painted in text rendering mode 3.
  • A visible graphics object shall be a path object ended by any operator other than n, a text object painted in any text rendering mode other than 3 or 7, or any XObject invoked by the Do operator.
  • An empty marked-content element shall be a marked-content point or a marked-content sequence that encloses no graphics objects.
  • A marked clipping sequence shall be a marked-content sequence that contains at least one clipping object and no visible graphics objects.
  • Clipping objects and marked clipping sequences shall be considered part of an enclosing marked-content sequence only if it is a marked clipping sequence.
  • Invisible graphics objects and empty marked-content elements shall always be considered part of an enclosing marked-content sequence, regardless of whether it is a marked clipping sequence.
  • The q (save) and Q (restore) operators may not occur within a marked clipping sequence.

Example 3 illustrates the application of these rules. Marked-content sequence S4 is a marked clipping sequence because it contains a clipping object (clipping path 2) and no visible graphics objects. Clipping path 2 is therefore considered part of sequence S4. Marked-content sequences S1, S2, and S3 are not marked clipping sequences, since they each include at least one visible graphics object. Thus, clipping paths 1 and 2 are not part of any of these three sequences.

EXAMPLE 3

/S1 BMC
    /S2 BMC
        /S3 BMC
            0 0 m
            100 100 l
            0 100 l W n        % Clipping path 1
            0 0 m
            200 200 l
            0 100 l f          % Filled path
        EMC

        /S4 BMC
            0 0 m
            300 300 l
            0 100 l W n        % Clipping path 2
        EMC
    EMC
    100 100 10 10 re f         % Filled path
EMC

In Example 4 marked-content sequence S1 is a marked clipping sequence because the only graphics object it contains is a clipping path. Thus, the empty marked-content sequence S3 and the marked-content point P1 are both part of sequence S2, and S2, S3, and P1 are all part of sequence S1.

EXAMPLE 4

/S1 BMC
    … Clipping path …
    /S2 BMC
        /S3 BMC
        EMC
        /P1 DP
    EMC
EMC

In Example 5 marked-content sequences S1 and S4 are marked clipping sequences because the only object they contain is a clipping path. Hence the clipping path is part of sequences S1 and S4; S3 is part of S2; and S2, S3, and S4 are all part of S1.

EXAMPLE 5

/S1 BMC
    /S2 BMC
        /S3 BMC
        EMC
    EMC

    /S4 BMC
        … Clipping path …
    EMC
EMC