14.9 无障碍支持¶
14.9 Accessibility Support
14.9.1 概述¶
14.9.1 General
PDF 包含多种功能,以支持残障用户访问文档。特别是,许多具有视觉障碍的计算机用户使用屏幕阅读器来朗读文档内容。为了使得文档能够正确朗读,无论是通过屏幕阅读器还是通过某种更直接的文本转语音引擎调用,PDF 支持以下功能:
- 指定 PDF 文档中文本使用的自然语言——例如,指定为英语或西班牙语,或用于隐藏或显示可选内容(参见 14.9.2,“自然语言规范”)
- 为图像或其他无法自然转换为文本的项目提供文字描述(参见 14.9.3,“替代描述”),或为可以转换为文本但以非标准方式表示的内容(如连字或装饰字符;参见 14.9.4,“替代文本”)提供替代文本
- 指定缩写或首字母缩略词的扩展(参见 14.9.5,“缩写和首字母缩略词的扩展”)
这种支持的核心在于能够通过逻辑结构和标记 PDF 确定 PDF 文档中内容的逻辑顺序,而不依赖于内容的外观或布局,如 [14.8.2.3] 中所述,“页面内容顺序”。辅助技术应用程序可以通过遍历结构层次并展示每个节点的内容,提取文档的内容以供残障用户呈现。因此,符合规范的创作人员确保文档中的所有信息都可以通过结构层次访问,并且应使用本小节中描述的功能。
注意 1
可以从标记 PDF 文档中提取文本,并将其检查或重新使用,用于除辅助功能之外的其他目的;参见 14.8,“标记 PDF”。
注意 2
关于 Web 上发布内容的辅助功能支持的附加指南,可以在 W3C 文档 Web 内容可访问性指南 及其引用的文档中找到(参见 参考书目)。
PDF includes several facilities in support of accessibility of documents to users with disabilities. In particular, many visually computer users with visual impairments use screen readers to read documents aloud. To enable proper vocalization, either through a screen reader or by some more direct invocation of a text-to-speech engine, PDF supports the following features:
- Specifying the natural language used for text in a PDF document—for example, as English or Spanish, or used to hide or reveal optional content (see 14.9.2, “Natural Language Specification”)
- Providing textual descriptions for images or other items that do not translate naturally into text (14.9.3, “Alternate Descriptions”), or replacement text for content that does translate into text but is represented in a nonstandard way (such as with a ligature or illuminated character; see 14.9.4, “Replacement Text”)
- Specifying the expansion of abbreviations or acronyms (Section 14.9.5, “Expansion of Abbreviations and Acronyms”)
The core of this support lies in the ability to determine the logical order of content in a PDF document, independently of the content’s appearance or layout, through logical structure and Tagged PDF, as described under [14.8.2.3], “Page Content Order.” An accessibility application can extract the content of a document for presentation to users with disabilities by traversing the structure hierarchy and presenting the contents of each node. For this reason, conforming writers ensure that all information in a document is reachable by means of the structure hierarchy, and they should use the facilities described in this sub-clause.
NOTE 1
Text can be extracted from Tagged PDF documents and examined or reused for purposes other than accessibility; see 14.8, “Tagged PDF.”
NOTE 2
Additional guidelines for accessibility support of content published on the Web can be found in the W3C document Web Content Accessibility Guidelines and the documents it points to (see the Bibliography).
14.9.2 自然语言规范¶
14.9.2 Natural Language Specification
14.9.2.1 概述¶
14.9.2.1 General
可以为文档中的文本或可选内容指定自然语言。
文档中文本使用的自然语言应以层级方式确定,基于是否在多个可能位置中的任何一个存在可选的 Lang 条目(PDF 1.4)。在最高层次,文档的默认语言(适用于文本字符串和内容流中的文本)可以通过文档目录中的 Lang 条目指定(参见 7.7.2,“文档目录”)。在此之下,可以为以下项目指定语言:
- 任何类型的结构元素(参见 14.7.2,“结构层次”),通过结构元素字典中的 Lang 条目。
- 不在结构层次中的标记内容序列(参见 14.6,“标记内容”),通过附加到带有 Span 标签的标记内容序列的属性列表中的 Lang 条目。
注意 1
尽管 Span 也是一种标准结构类型,如 14.8.4.4 中所述,“内联级结构元素”,但它在此的使用完全独立于逻辑结构。
注意 2
可选内容使用的自然语言允许基于 Language 字典中的 Lang 条目(PDF 1.5)隐藏或显示内容,位于可选内容使用字典中。
注意 3
以下子条款提供了 Lang 条目的值以及确定文档中文本语言的层次方式的详细信息。
用 Unicode 编码的文本字符串可以包括一个转义序列或语言标签,指示文本的语言,并覆盖当前的 Lang 条目(参见 7.9.2.2,“文本字符串类型”)。
Natural language may be specified for text in a document or for optional content.
The natural language used for text in a document shall be determined in a hierarchical fashion, based on whether an optional Lang entry (PDF 1.4) is present in any of several possible locations. At the highest level, the document’s default language (which applies to both text strings and text within content streams) may be specified by a Lang entry in the document catalogue (see 7.7.2, “Document Catalog”). Below this, the language may be specified for the following items:
- Structure elements of any type (see 14.7.2, “Structure Hierarchy”), through a Lang entry in the structure element dictionary.
- Marked-content sequences that are not in the structure hierarchy (see 14.6, “Marked Content”), through a Lang entry in a property list attached to the marked-content sequence with a Span tag.
NOTE 1
Although Span is also a standard structure type, as described under 14.8.4.4, “Inline-Level Structure Elements,” its use here is entirely independent of logical structure.
NOTE 2
The natural language used for optional content allows content to be hidden or revealed, based on the Lang entry (PDF 1.5) in the Language dictionary of an optional content usage dictionary.
NOTE 3
The following sub-clauses provide details on the value of the Lang entry and the hierarchical manner in which the language for text in a document is determined.
Text strings encoded in Unicode may include an escape sequence or language tag indicating the language of the text and overriding the prevailing Lang entry (see 7.9.2.2, “Text String Type”).
14.9.2.2 语言标识符¶
14.9.2.2 Language Identifiers
某些与语言相关的字典条目是指定语言标识符的文本字符串。此类文本字符串可能在以下结构或字典中显示为 Lang 条目:
- 文档目录、结构元素字典或属性列表
- 可选内容使用字典的语言字典,14.9.2.3“语言规范层次结构”中描述的层次结构问题不适用于此条目
语言标识符应为空文本字符串,以指示语言未知,或为 RFC 3066“语言标识标签”中定义的 语言标签。
虽然语言代码通常使用小写字母表示,国家代码通常使用大写字母表示,但所有标签均应视为不区分大小写。
Certain language-related dictionary entries are text strings that specify language identifiers. Such text strings may appear as Lang entries in the following structures or dictionaries:
- Document catalogue, structure element dictionary, or property list
- Optional content usage dictionary’s Language dictionary, the hierarchical issues described in 14.9.2.3, “Language Specification Hierarchy,” shall not apply to this entry
A language identifier shall either be the empty text string, to indicate that the language is unknown, or a Language-Tag as defined in RFC 3066, Tags for the Identification of Languages.
Although language codes are commonly represented using lowercase letters and country codes are commonly represented using uppercase letters, all tags shall be treated as case insensitive.
14.9.2.3 语言规范层次结构¶
14.9.2.3 Language Specification Hierarchy
Lang 条目在文档目录中应指定文档中所有文本的默认自然语言。语言规范可以出现在结构元素中,也可以出现在不在结构层次中的标记内容序列中。如果存在此类语言规范,则会覆盖默认值。
结构层次中的语言规范按以下顺序应用:
- 结构元素的语言规范。如果结构元素没有 Lang 条目,则该元素应继承其父元素的语言规范(如果父元素有)。
- 在结构元素内,为嵌套的结构元素或标记内容序列指定语言规范。
如果页面内容的部分位于结构层次中,并且结构化内容嵌套在适用不同语言规范的非结构化内容中,则结构元素的语言规范应优先适用。
附加到标记内容序列的语言标识符(使用 Span 标签)指定该序列中所有文本的语言,除非嵌套的标记内容包含在结构层次中(这种情况下适用结构元素的语言)或被其他嵌套标记内容的语言规范覆盖。
注意
本子条款中的示例说明了确定文档中文本语言的层次方式。
示例 1
此示例显示了如何通过在页面内容流内的标记内容序列中指定的语言覆盖为文档整体指定的语言,而不依赖于任何逻辑结构。在此情况下,文档目录中的 Lang 条目(未显示)具有值 en-US,表示美国英语,并通过附加在标记内容序列 "Hasta la vista" 上的 Lang 属性(使用 Span 标签)覆盖,该属性指定的语言为 es-MX,表示墨西哥西班牙语。
2 0 obj % 页面对象
<< /Type /Page
/Contents 3 0 R % 内容流
…
>>
endobj
3 0 obj % 页面内容流
<< /Length … >>
stream
BT
( See you later, or as Arnold would say, ) Tj
/Span << /Lang ( es-MX ) >> % 标记内容序列开始
BDC
( Hasta la vista . ) Tj
EMC % 标记内容序列结束
ET
endstream
endobj
示例 2
在以下示例中,结构元素字典中的 Lang 条目(指定英语)适用于在指定页面的内容流中具有 MCID(标记内容标识符)值为 0 的标记内容序列。然而,在该标记内容序列中嵌套了另一个标记内容序列,其中附加的 Lang 属性(使用 Span 标签,指定西班牙语)覆盖了结构元素的语言规范。
此示例省略了在用作内容项的对象中所需的 StructParents 条目(参见 14.7.4.4,“从内容项查找结构元素”)。
1 0 obj % 结构元素
<< /Type /StructElem
/S /P % 结构类型
/P … % 结构层次中的父级
/K << /Type /MCR
/Pg 2 0 R % 包含标记内容序列的页面
/MCID 0 % 标记内容标识符
>>
/Lang ( en-US ) % 此元素的语言规范
>>
endobj
2 0 obj % 页面对象
<< /Type /Page
/Contents 3 0 R % 内容流
…
>>
endobj
3 0 obj % 页面内容流
<< /Length … >>
stream
BT
/P << /MCID 0 >> % 标记内容序列开始
BDC
( See you later, or in Spanish you would say, ) Tj
/Span << /Lang ( es-MX ) >> % 嵌套标记内容序列开始
BDC
( Hasta la vista . ) Tj
EMC % 嵌套标记内容序列结束
EMC % 标记内容序列结束
ET
endstream
endobj
示例 3
页面内容流由一个标记内容序列组成,该序列通过 Span 标签和 Lang 属性指定西班牙语作为其语言。在该序列中嵌套的内容是一个结构元素的一部分(通过该属性列表中的 MCID 条目指示),并且适用于后者内容的语言规范是结构元素的语言,即英语。
此示例省略了在用作内容项的对象中所需的 StructParents 条目(参见 14.7.4.4,“从内容项查找结构元素”)。
1 0 obj % 结构元素
<< /Type /StructElem
/S /P % 结构类型
/P … % 结构层次中的父级
/K << /Type /MCR
/Pg 2 0 R % 包含标记内容序列的页面
/MCID 0 % 标记内容标识符
>>
/Lang ( en-US ) % 此元素的语言规范
>>
endobj
2 0 obj % 页面对象
<< /Type /Page
/Contents 3 0 R % 内容流
…
>>
endobj
3 0 obj % 页面内容流
<< /Length … >>
stream
/Span << /Lang ( es-MX ) >> % 标记内容序列开始
BDC
( Hasta la vista, ) Tj
/P << /MCID 0 >> % 结构化标记内容序列开始,
BDC % 适用结构元素的语言
( as Arnold would say. ) Tj
EMC % 结构化标记内容序列结束
EMC % 标记内容序列结束
endstream
endobj
The Lang entry in the document catalogue shall specify the default natural language for all text in the document. Language specifications may appear within structure elements, and they may appear within marked-content sequences that are not in the structure hierarchy. If present, such language specifications override the default.
Language specifications within the structure hierarchy apply in this order:
- A structure element’s language specification. If a structure element does not have a Lang entry, the element shall inherit its language from any parent element that has one.
- Within a structure element, a language specification for a nested structure element or marked-content sequence
If only part of the page content is contained in the structure hierarchy, and the structured content is nested within nonstructured content for which a different language specification applies, the structure element’s language specification shall take precedence.
A language identifier attached to a marked-content sequence with the Span tag specifies the language for all text in the sequence except for nested marked content that is contained in the structure hierarchy (in which case the structure element’s language applies) and except where overridden by language specifications for other nested marked content.
NOTE
Examples in this sub-clause illustrate the hierarchical manner in which the language for text in a document is determined.
EXAMPLE 1
This example shows how a language specified for the document as a whole could be overridden by one specified for a marked-content sequence within a page’s content stream, independent of any logical structure. In this case, the Lang entry in the document catalogue (not shown) has the value en-US, meaning U.S. English, and it is overridden by the Lang property attached (with the Span tag) to the marked-content sequence Hasta la vista. The Lang property identifies the language for this marked content sequence with the value es-MX, meaning Mexican Spanish.
2 0 obj % Page object
<< /Type /Page
/Contents 3 0 R % Content stream
…
>>
endobj
3 0 obj % Page's content stream
<< /Length … >>
stream
BT
( See you later, or as Arnold would say, ) Tj
/Span << /Lang ( es-MX ) >> % Start of marked-content sequence
BDC
( Hasta la vista . ) Tj
EMC % End of marked-content sequence
ET
endstream
endobj
EXAMPLE 2
In the following example, the Lang entry in the structure element dictionary (specifying English) applies to the marked-content sequence having an MCID (marked-content identifier) value of 0 within the indicated page’s content stream. However, nested within that marked-content sequence is another one in which the Lang property attached with the Span tag (specifying Spanish) overrides the structure element’s language specification.
This example omits required StructParents entries in the objects used as content items (see 14.7.4.4, “Finding Structure Elements from Content Items”).
1 0 obj % Structure element
<< /Type /StructElem
/S /P % Structure type
/P … % Parent in structure hierarchy
/K << /Type /MCR
/Pg 2 0 R % Page containing marked-content sequence
/MCID 0 % Marked-content identifier
>>
/Lang ( en-US ) % Language specification for this element
>>
endobj
2 0 obj % Page object
<< /Type /Page
/Contents 3 0 R % Content stream
…
>>
endobj
3 0 obj % Page's content stream
<< /Length … >>
stream
BT
/P << /MCID 0 >> % Start of marked-content sequence
BDC
( See you later, or in Spanish you would say, ) Tj
/Span << /Lang ( es-MX ) >> % Start of nested marked-content sequence
BDC
( Hasta la vista . ) Tj
EMC % End of nested marked-content sequence
EMC % End of marked-content sequence
ET
endstream
endobj
EXAMPLE 3
The page’s content stream consists of a marked-content sequence that specifies Spanish as its language by means of the Span tag with a Lang property. Nested within it is content that is part of a structure element (indicated by the MCID entry in that property list), and the language specification that applies to the latter content is that of the structure element, English.
This example omits required StructParents entries in the objects used as content items (see 14.7.4.4, “Finding Structure Elements from Content Items”).
1 0 obj % Structure element
<< /Type /StructElem
/S /P % Structure type
/P … % Parent in structure hierarchy
/K << /Type /MCR
/Pg 2 0 R % Page containing marked-content sequence
/MCID 0 % Marked-content identifier
>>
/Lang ( en-US ) % Language specification for this element
>>
endobj
2 0 obj % Page object
<< /Type /Page
/Contents 3 0 R % Content stream
…
>>
endobj
3 0 obj % Page's content stream
<< /Length … >>
stream
/Span << /Lang ( es-MX ) >> % Start of marked-content sequence
BDC
( Hasta la vista, ) Tj
/P << /MCID 0 >> % Start of structured marked-content sequence,
BDC % to which structure element's language applies
( as Arnold would say. ) Tj
EMC % End of structured marked-content sequence
EMC % End of marked-content sequence
endstream
endobj
14.9.2.4 多语言文本数组¶
14.9.2.4 Multi-language Text Arrays
多语言文本数组(PDF 1.5)允许为每个文本字符串指定多个语言标识符。(有关其用法的示例,请参见 表格 274 和 277 中的 Alt 条目。)
多语言文本数组应包含字符串对。每对中的第一个字符串应为语言标识符(参见 14.9.2.2,“语言标识符”)。语言标识符在数组中不得出现超过一次;任何无法识别的语言标识符应被忽略。空字符串指定默认文本,在数组中未找到适当的语言标识符时可使用。第二个字符串是与该语言关联的文本。
示例
[ (en-US) (My vacation) (fr) (mes vacances) ( ) (default text) ]
当符合规范的阅读器搜索多语言文本数组以查找给定语言的文本时,它应查找给定语言标识符与数组中的语言标识符之间的完全匹配(不区分大小写)。如果未找到完全匹配,则应按数组顺序尝试前缀匹配:如果给定标识符是数组中某个标识符的前缀且前缀后的第一个字符是连字符,则声明匹配。例如,给定标识符 en 可以匹配数组标识符 en-US,但给定标识符 en-US 既不能匹配 en 也不能匹配 en-GB。如果无法找到完全匹配或前缀匹配,则应使用默认文本(如果有)。
A multi-language text array (PDF 1.5) allows multiple text strings to be specified, each in association with language identifier. (See the Alt entry in Tables 274 and 277 for examples of its use.)
A multi-language text array shall contain pairs of strings. The first string in each pair shall be a language identifier (14.9.2.2, “Language Identifiers”). A language identifier shall not appear more than once in the array; any unrecognized language identifier shall be ignored. An empty string specifies default text that may be used when no suitable language identifier is found in the array. The second string is text associated with the language.
EXAMPLE
[ (en-US) (My vacation) (fr) (mes vacances) ( ) (default text) ]
When a conforming reader searches a multi-language text array to find text for a given language, it shall look for an exact (though case-insensitive) match between the given language’s identifier and the language identifiers in the array. If no exact match is found, prefix matching shall be attempted in increasing array order: a match shall be declared if the given identifier is a leading, case-insensitive, substring of an identifier in the array, and the first post-substring character in the array identifier is a hyphen. For example, given identifier en matches array identifier en-US, but given identifier en-US matches neither en nor en-GB. If no exact or prefix match can be found, the default text (if any) should be used.
14.9.3 替代描述¶
14.9.3 Alternate Descriptions
PDF 文档可以通过为图像、公式或其他无法自然转化为文本的项目提供替代描述来增强可访问性。
注意 1
替代描述是可由文本到语音引擎朗读的可读文本,旨在为视力受限的用户提供帮助。
可以为以下项目指定替代描述:
- 结构元素(参见 14.7.2,“结构层次”),通过结构元素字典中的 Alt 条目
- (PDF 1.5)标记内容序列(参见 14.6,“标记内容”),通过附加到带有 Span 标签的标记内容序列的属性列表中的 Alt 条目
- 任何类型的注释(参见 12.5,“注释”),如果该注释没有文本表示,则通过注释字典中的 Contents 条目
对于通常显示文本的注释类型,应使用注释字典中的 Contents 条目作为替代描述的来源。对于不显示文本的注释类型,可以包含 Contents 条目(PDF 1.4)来指定替代描述。声音注释不需要替代描述来进行朗读,但可以包括 Contents 条目,指定可以显示的描述,以便为有听力障碍的用户提供帮助。
可以为交互表单字段指定替代名称(参见 [12.7],“交互表单”)。如果存在此替代名称,它应在符合规范的阅读器识别字段时替代实际字段名称。此替代名称(如果提供)应通过字段字典中的 TU 条目指定。
注意 2
TU 条目对朗读目的非常有用。
替代描述是文本字符串,应使用 PDFDocEncoding 或 Unicode 字符编码进行编码。
注意 3
如 7.9.2.2 中所述,“文本字符串类型”,Unicode 定义了一个转义序列,用于指示文本的语言。此机制使替代描述能够从现行 Lang 条目指定的语言更改(如前述小节所述)。在替代描述中,指定语言的 Unicode 转义序列应覆盖现行的 Lang 条目。
当应用于结构元素时,替代描述文本应视为当前元素的完整(或整体)单词或短语替代。如果一系列中的两个(或更多)元素在其字典中有 Alt 条目,它们应视为之间存在换行符。同样的规则适用于连续的标记内容序列。
属性列表中的 Alt 条目可以与其他条目结合使用。
示例
该示例显示了 Alt 条目与 Lang 条目结合使用的情况。
/Span << /Lang (en-us) /Alt (six-point star) >> BDC (A) Tj EMC
PDF documents may be enhanced by providing alternate descriptions for images, formulas, or other items that do not translate naturally into text.
NOTE 1
Alternate descriptions are human-readable text that could, for example, be vocalized by a text-to-speech engine for the benefit of users with visual impairments.
An alternate description may be specified for the following items:
- A structure element (see 14.7.2, “Structure Hierarchy”), through an Alt entry in the structure element dictionary
- (PDF 1.5) A marked-content sequence (see 14.6, “Marked Content”), through an Alt entry in a property list attached to the marked-content sequence with a Span tag.
- Any type of annotation (see 12.5, “Annotations”) that does not already have a text representation, through a Contents entry in the annotation dictionary
For annotation types that normally display text, the Contents entry of the annotation dictionary shall be used as the source for an alternate description. For annotation types that do not display text, a Contents entry (PDF 1.4) may be included to specify an alternate description. Sound annotations, which need no alternate description for the purpose of vocalization, may include a Contents entry specifying a description that may be displayed for the benefit of users with hearing impairments.
An alternate name may be specified for an interactive form field (see [12.7], “Interactive Forms”) which, if present, shall be used in place of the actual field name when a conforming reader identifies the field in a user- interface. This alternate name, if provided, shall be specified using the TU entry of the field dictionary.
NOTE 2
The TU entry is useful for vocalization purposes.
Alternate descriptions are text strings, which shall be encoded in either PDFDocEncoding or Unicode character encoding.
NOTE 3
As described in 7.9.2.2, “Text String Type,” Unicode defines an escape sequence for indicating the language of the text. This mechanism enables the alternate description to change from the language specified by the prevailing Lang entry (as described in the preceding sub-clause). Within alternate descriptions, Unicode escape sequences specifying language shall override the prevailing Lang entry.
When applied to structure elements, the alternate description text shall be considered to be a complete (or whole) word or phrase substitution for the current element. If each of two (or more) elements in a sequence have an Alt entry in their dictionaries, they shall be treated as if a word break is present between them. The same applies to consecutive marked-content sequences.
The Alt entry in property lists may be combined with other entries.
EXAMPLE
This example shows the Alt entry combined with a Lang entry.
/Span << /Lang (en-us) /Alt (six-point star) >> BDC (A) Tj EMC
14.9.4 替换文本¶
14.9.4 Replacement Text
注意 1
就像可以为图像和其他无法自然转化为文本的项目提供替代描述一样(如前述小节所述),也可以为那些可以转化为文本但以非标准方式表示的内容指定替代文本。这些非标准表示可能包括例如连字的字形、定制字符,或与照亮手稿中的字母或下沉大写字母相对应的内联图形。
替代文本可以为以下项目指定:
- 结构元素(参见 14.7.2,“结构层次”),通过结构元素字典中的可选 ActualText 条目(PDF 1.4)。
- (PDF 1.5)标记内容序列(参见 14.6,“标记内容”),通过附加到带有 Span 标签的标记内容序列的属性列表中的 ActualText 条目。
ActualText 的值应作为内容的替代,而非描述,提供与用户在查看内容时看到的文本相等的文本。ActualText 的值应视为对结构元素或标记内容序列的字符替代。如果两个(或更多)连续的结构元素或标记内容序列都有 ActualText 条目,则它们应视为没有换行符。
注意 2
ActualText 被视为字符替代,这与 Alt 被视为整体单词或短语替代的处理方式不同。
示例
该示例展示了如何使用替代文本指示在拼写变动时正确的字符内容(在德语中,直到最近的拼写改革之前,单词“Drucker”在断字时被拼写为“Druk-” 和 “ker”)。
(Dru) Tj
/Span
<</Actual Text (c) >>
BDC
(k-) Tj
EMC
(ker) '
与替代描述(和其他文本字符串)一样,替代文本如果采用 Unicode 编码,也可以包括用于指示文本语言的转义序列。该序列应覆盖现行的 Lang 条目(参见 7.9.2.2,“文本字符串类型”)。
NOTE 1
Just as alternate descriptions can be provided for images and other items that do not translate naturally into text (as described in the preceding sub-clause), replacement text can be specified for content that does translate into text but that is represented in a nonstandard way. These nonstandard representations might include, for example, glyphs for ligatures or custom characters, or inline graphics corresponding to letters in an illuminated manuscript or to dropped capitals.
Replacement text may be specified for the following items:
- A structure element (see 14.7.2, “Structure Hierarchy”), by means of the optional ActualText entry (PDF 1.4) of the structure element dictionary.
- (PDF 1.5) A marked-content sequence (see 14.6, “Marked Content”), through an ActualText entry in a property list attached to the marked-content sequence with a Span tag.
The ActualText value shall be used as a replacement, not a description, for the content, providing text that is equivalent to what a person would see when viewing the content. The value of ActualText shall be considered to be a character substitution for the structure element or marked-content sequence. If each of two (or more) consecutive structure or marked-content sequences has an ActualText entry, they shall be treated as if no word break is present between them.
NOTE 2
The treatment of ActualText as a character replacement is different from the treatment of Alt, which is treated as a whole word or phrase substitution.
EXAMPLE
This example shows the use of replacement text to indicate the correct character content in a case where hyphenation changes the spelling of a word (in German, up until recent spelling reforms, the word “Drucker” when hyphenated was rendered as “Druk-” and “ker”).
(Dru) Tj
/Span
<</Actual Text (c) >>
BDC
(k-) Tj
EMC
(ker) '
Like alternate descriptions (and other text strings), replacement text, if encoded in Unicode, may include an escape sequence for indicating the language of the text. Such a sequence shall override the prevailing Lang entry (see 7.9.2.2, “Text String Type”).
14.9.5 缩写和首字母缩写的扩展¶
14.9.5 Expansion of Abbreviations and Acronyms
缩写或首字母缩略词的扩展可以为以下项目指定:
- 标记内容序列,通过附加到带有 Span 标签的序列的属性列表中的 E 属性(PDF 1.4)。
- 结构元素,通过结构元素字典中的 E 条目(PDF 1.5)。
注意 1
缩写和首字母缩略词可能会给文本转语音引擎带来问题。有时,缩写的完整发音可以通过上下文推断出来。例如,字典查询可能会揭示“Blvd.” 发音为“boulevard”,而“Ave.” 发音为“avenue”。然而,有些缩写较难解析,如在句子“Dr. Healwell works at 123 Industrial Dr.”中。
示例
BT
/Span << /E ( Doctor ) >>
BDC
( Dr. ) Tj
EMC
( Healwell works at 123 Industrial ) Tj
/Span << /E ( Drive ) >>
BDC
( Dr. ) Tj
EMC
ET
E 值(一个文本字符串)应视为对标记文本的单词或短语替代,因此应视为与周围文本之间有单词分隔。如果扩展文本采用 Unicode 编码,则可以包括用于指示文本语言的转义序列(参见 7.9.2.2,“文本字符串类型”)。该序列应覆盖现行的 Lang 条目。
注意 2
一些缩写或首字母缩略词通常不扩展为单词。例如,对于文本“XYZ”,要么不提供扩展(将其发音交给文本转语音引擎),要么为了安全起见,指定扩展为“X Y Z”。
The expansion of an abbreviation or acronym may be specified for the following items:
- Marked-content sequences, through an E property (PDF 1.4) in a property list attached to the sequence with a Span tag.
- Structure elements, through an E entry (PDF 1.5) in the structure element dictionary.
NOTE 1
Abbreviations and acronyms can pose a problem for text-to-speech engines. Sometimes the full pronunciation for an abbreviation can be divined without aid. For example, a dictionary search will probably reveal that “Blvd.” is pronounced “boulevard” and that “Ave.” is pronounced “avenue.” However, some abbreviations are difficult to resolve, as in the sentence “Dr. Healwell works at 123 Industrial Dr.”.
EXAMPLE
BT
/Span << /E ( Doctor ) >>
BDC
( Dr. ) Tj
EMC
( Healwell works at 123 Industrial ) Tj
/Span << /E ( Drive ) >>
BDC
( Dr. ) Tj
EMC
ET
The E value (a text string) shall be considered to be a word or phrase substitution for the tagged text and therefore shall be treated as if a word break separates it from any surrounding text.The expansion text, if encoded in Unicode, may include an escape sequence for indicating the language of the text (see 7.9.2.2, “Text String Type”). Such a sequence shall override the prevailing Lang entry.
NOTE 2
Some abbreviations or acronyms are conventionally not expanded into words. For the text “XYZ,” for example, either no expansion should be supplied (leaving its pronunciation up to the text-to-speech engine) or, to be safe, the expansion “X Y Z” should be specified.