9.7 Composite Fonts
9.7.1 General
A composite font, also called a Type 0 font, is one whose glyphs are obtained from a fontlike object called a CIDFont. A composite font shall be represented by a font dictionary whose Subtype value is Type0. The Type 0 font is known as the root font, and its associated CIDFont is called its descendant.
NOTE 1
Composite fonts in PDF are analogous to composite fonts in PostScript but with some limitations. In particular, PDF requires that the character encoding be defined by a CMap, which is only one of several encoding methods available in PostScript. Also, PostScript allows a Type 0 font to have multiple descendants, which might also be Type 0 fonts. PDF supports only a single descendant, which shall be a CIDFont.
When the current font is composite, the text-showing operators shall behave differently than with simple fonts. For simple fonts, each byte of a string to be shown selects one glyph, whereas for composite fonts, a sequence of one or more bytes is decoded to select a glyph from the descendant CIDFont.
NOTE 2
This facility supports the use of very large character sets, such as those for the Chinese, Japanese, and Korean languages. It also simplifies the organization of fonts that have complex encoding requirements.
This sub-clause first introduces the architecture of CID-keyed fonts, which are the only kind of composite font supported in PDF. Then it describes the CIDFont and CMap dictionaries, which are the PDF objects that represent the correspondingly named components of a CID-keyed font. Finally, it describes the Type 0 font dictionary, which combines a CIDFont and a CMap to produce a font whose glyphs may be accessed by means of variable-length character codes in a string to be shown.
9.7.2 CID-Keyed Fonts Overview
CID-keyed fonts provide a convenient and efficient method for defining multiple-byte character encodings and fonts with a large number of glyphs. These capabilities provide great flexibility for representing text in writing systems for languages with large character sets, such as Chinese, Japanese, and Korean (CJK).
The CID-keyed font architecture specifies the external representation of certain font programs, called CMap and CIDFont files, along with some conventions for combining and using those files. As mentioned earlier, PDF does not support the entire CID-keyed font architecture, which is independent of PDF; CID-keyed fonts may be used in other environments.
NOTE
For complete documentation on the architecture and the file formats, see Adobe Technical Notes #5092, CID-Keyed Font Technology Overview, and #5014, Adobe CMap and CIDFont Files Specification. This sub-clause describes only the PDF objects that represent these font programs.
The term CID-keyed font reflects the fact that CID (character identifier) numbers are used to index and access the glyph descriptions in the font. This method is more efficient for large fonts than the method of accessing by character name, as is used for some simple fonts. CIDs range from 0 to a maximum value that may be subject to an implementation limit (see Table C.1).
A character collection is an ordered set of glyphs. The order of the glyphs in the character collection shall determine the CID number for each glyph. Each CID-keyed font shall explicitly reference the character collection on which its CID numbers are based; see 9.7.3, "CIDSystemInfo Dictionaries".
A CMap (character map) file shall specify the correspondence between character codes and the CID numbers used to identify glyphs. It is equivalent to the concept of an encoding in simple fonts. Whereas a simple font allows a maximum of 256 glyphs to be encoded and accessible at one time, a CMap can describe a mapping from multiple-byte codes to thousands of glyphs in a large CID-keyed font.
EXAMPLE
A CMap can describe Shift-JIS, one of several widely used encodings for Japanese.
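To make the idea of a multiple-byte encoding concrete, the following Python sketch splits a byte string into character codes of one or two bytes. The codespace ranges are illustrative values chosen to resemble a Shift-JIS-style CMap, and the names `CODESPACE_RANGES` and `split_codes` are hypothetical; a real CMap file declares its own ranges.

```python
# Illustrative codespace ranges (assumed, Shift-JIS-like): each entry is a
# (low, high) pair of byte strings of equal length.
CODESPACE_RANGES = [
    (bytes([0x00]), bytes([0x80])),              # single-byte codes
    (bytes([0x81, 0x40]), bytes([0x9F, 0xFC])),  # two-byte codes
    (bytes([0xA0]), bytes([0xDF])),              # single-byte codes
    (bytes([0xE0, 0x40]), bytes([0xFC, 0xFC])),  # two-byte codes
]


def split_codes(data, ranges=CODESPACE_RANGES):
    """Split a byte string into character codes of one or more bytes."""
    codes = []
    i = 0
    while i < len(data):
        for low, high in ranges:
            n = len(low)
            chunk = data[i:i + n]
            # A code matches when every byte lies within the range for its position.
            if len(chunk) == n and all(
                lo <= b <= hi for lo, b, hi in zip(low, chunk, high)
            ):
                codes.append(chunk)
                i += n
                break
        else:
            # No range matched: consume one byte so that decoding always advances.
            codes.append(data[i:i + 1])
            i += 1
    return codes
```

Each resulting code would then be looked up in the CMap's mapping to obtain a CID; that lookup is omitted here.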
A CMap file may reference an entire character collection or a subset of a character collection. The CMap file’s mapping yields a font number (which in PDF shall be 0) and a character selector (which in PDF shall be a CID). Furthermore, a CMap file may incorporate another CMap file by reference, without having to duplicate it. These features enable character collections to be combined or supplemented and make all the constituent characters accessible to text-showing operations through a single encoding.
A CIDFont contains the glyph descriptions for a character collection. The glyph descriptions themselves are typically in a format similar to those used in simple fonts, such as Type 1. However, they are identified by CIDs rather than by names, and they are organized differently.
In PDF, the data from a CMap file and CIDFont shall be represented by PDF objects as described in 9.7.4, "CIDFonts" and 9.7.5, "CMaps". The CMap file and CIDFont programs themselves may be either referenced by name or embedded as stream objects in the PDF file.
A CID-keyed font, then, shall be the combination of a CMap with a CIDFont containing glyph descriptions. It shall be represented as a Type 0 font. It contains an Encoding entry whose value shall be a CMap dictionary, and its DescendantFonts entry shall reference the CIDFont dictionary with which the CMap has been combined.
9.7.3 CIDSystemInfo Dictionaries
CIDFont and CMap dictionaries shall contain a CIDSystemInfo entry specifying the character collection assumed by the CIDFont associated with the CMap—that is, the interpretation of the CID numbers used by the CIDFont. A character collection shall be uniquely identified by the Registry, Ordering, and Supplement entries in the CIDSystemInfo dictionary, as described in Table 116. In order to be compatible, the Registry and Ordering values must be the same.
The CIDSystemInfo entry in a CIDFont is a dictionary that shall specify the CIDFont’s character collection. The CIDFont need not contain glyph descriptions for all the CIDs in a collection; it may contain a subset. The CIDSystemInfo entry in a CMap file shall be either a single dictionary or an array of dictionaries, depending on whether it associates codes with a single character collection or with multiple character collections; see 9.7.5, "CMaps".
For proper behaviour, the CIDSystemInfo entry of a CMap shall be compatible with that of the CIDFont or CIDFonts with which it is used.
Key | Type | Value |
---|---|---|
Registry | ASCII string | (Required) A string identifying the issuer of the character collection. For information about assigning a registry identifier, contact the Adobe Solutions Network or consult the ASN Web site (see the Bibliography). |
Ordering | ASCII string | (Required) A string that uniquely names the character collection within the specified registry. |
Supplement | integer | (Required) The supplement number of the character collection. An original character collection has a supplement number of 0. Whenever additional CIDs are assigned in a character collection, the supplement number shall be increased. Supplements shall not alter the ordering of existing CIDs in the character collection. This value shall not be used in determining compatibility between character collections. |
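The compatibility rule above can be sketched in Python. This is a minimal illustration that assumes each CIDSystemInfo dictionary has already been parsed into a Python `dict`; the function name `collections_compatible` is hypothetical.

```python
def collections_compatible(a, b):
    """Return True when two CIDSystemInfo dictionaries name the same collection.

    Only Registry and Ordering are compared; Supplement is deliberately
    ignored, since it is not used to determine compatibility.
    """
    return a["Registry"] == b["Registry"] and a["Ordering"] == b["Ordering"]


cmap_info = {"Registry": "Adobe", "Ordering": "Japan1", "Supplement": 2}
font_info = {"Registry": "Adobe", "Ordering": "Japan1", "Supplement": 5}
print(collections_compatible(cmap_info, font_info))
```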
9.7.4 CIDFonts
9.7.4.1 General
A CIDFont program contains glyph descriptions that are accessed using a CID as the character selector. There are two types of CIDFonts:
- A Type 0 CIDFont contains glyph descriptions based on CFF.
NOTE
The term “Type 0” when applied to a CIDFont has a different meaning than for a “Type 0 font”.
- A Type 2 CIDFont contains glyph descriptions based on the TrueType font format.
A CIDFont dictionary is a PDF object that contains information about a CIDFont program. Although its Type value is Font, a CIDFont is not actually a font. It does not have an Encoding entry, it may not be listed in the Font subdictionary of a resource dictionary, and it may not be used as the operand of the Tf operator. It shall be used only as a descendant of a Type 0 font. The CMap in the Type 0 font shall be what defines the encoding that maps character codes to CIDs in the CIDFont. Table 117 lists the entries in a CIDFont dictionary.
Key | Type | Value |
---|---|---|
Type | name | (Required) The type of PDF object that this dictionary describes; shall be Font for a CIDFont dictionary. |
Subtype | name | (Required) The type of CIDFont shall be CIDFontType0 or CIDFontType2. |
BaseFont | name | (Required) The PostScript name of the CIDFont. For Type 0 CIDFonts, this shall be the value of the CIDFontName entry in the CIDFont program. For Type 2 CIDFonts, it shall be derived the same way as for a simple TrueType font; see 9.6.3, "TrueType Fonts". In either case, the name may have a subset prefix if appropriate; see 9.6.4, "Font Subsets". |
CIDSystemInfo | dictionary | (Required) A dictionary containing entries that define the character collection of the CIDFont. See Table 116. |
FontDescriptor | dictionary | (Required; shall be an indirect reference) A font descriptor describing the CIDFont’s default metrics other than its glyph widths (see 9.8, "Font Descriptors"). |
DW | integer | (Optional) The default width for glyphs in the CIDFont (see 9.7.4.3, "Glyph Metrics in CIDFonts"). Default value: 1000 (defined in user units). |
W | array | (Optional) A description of the widths for the glyphs in the CIDFont. NOTE: The array’s elements have a variable format that can specify individual widths for consecutive CIDs or one width for a range of CIDs (see 9.7.4.3, "Glyph Metrics in CIDFonts"). Default value: none (the DW value shall be used for all glyphs). |
DW2 | array | (Optional; applies only to CIDFonts used for vertical writing) An array of two numbers specifying the default metrics for vertical writing (see 9.7.4.3, "Glyph Metrics in CIDFonts"). Default value: [ 880 −1000 ]. |
W2 | array | (Optional; applies only to CIDFonts used for vertical writing) A description of the metrics for vertical writing for the glyphs in the CIDFont (see 9.7.4.3, "Glyph Metrics in CIDFonts"). Default value: none (the DW2 value shall be used for all glyphs). |
CIDToGIDMap | stream or name | (Optional; Type 2 CIDFonts only) A specification of the mapping from CIDs to glyph indices. If the value is a stream, the bytes in the stream shall contain the mapping from CIDs to glyph indices: the glyph index for a particular CID value c shall be a 2-byte value stored in bytes 2 × c and 2 × c + 1, where the first byte shall be the high-order byte. If the value of CIDToGIDMap is a name, it shall be Identity, indicating that the mapping between CIDs and glyph indices is the identity mapping. Default value: Identity. This entry may appear only in a Type 2 CIDFont whose associated TrueType font program is embedded in the PDF file. |
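The CIDToGIDMap stream layout described in the table can be sketched as follows. This assumes the stream has already been decoded into a `bytes` object; `gid_for_cid` is a hypothetical helper, and returning glyph index 0 for a CID beyond the end of the stream is an assumption of this sketch rather than something the table states.

```python
def gid_for_cid(cid, cid_to_gid_map=None):
    """Map a CID to a TrueType glyph index.

    cid_to_gid_map is the decoded stream data, or None for /Identity.
    """
    if cid_to_gid_map is None:
        # Identity: the CID is used directly as the glyph index.
        return cid
    offset = 2 * cid
    if offset + 1 >= len(cid_to_gid_map):
        # Assumption: CIDs past the end of the stream fall back to glyph 0.
        return 0
    # Two bytes per CID, high-order byte first.
    return (cid_to_gid_map[offset] << 8) | cid_to_gid_map[offset + 1]
```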
9.7.4.2 Glyph Selection in CIDFonts
Type 0 and Type 2 CIDFonts handle the mapping from CIDs to glyph descriptions in somewhat different ways.
For Type 0, the CIDFont program contains glyph descriptions that are identified by CIDs. The CIDFont program identifies the character collection by a CIDSystemInfo dictionary, which should be copied into the PDF CIDFont dictionary. CIDs shall be interpreted uniformly in all CIDFont programs supporting a given character collection, whether the program is embedded in the PDF file or obtained from an external source.
When the CIDFont contains an embedded font program that is represented in the Compact Font Format (CFF), the FontFile3 entry in the font descriptor (see Table 126) may be CIDFontType0C or OpenType. There are two cases, depending on the contents of the font program:
- The “CFF” font program has a Top DICT that uses CIDFont operators: The CIDs shall be used to determine the GID value for the glyph procedure using the charset table in the CFF program. The GID value shall then be used to look up the glyph procedure using the CharStrings INDEX table.
NOTE
Although in many fonts the CID value and GID value are the same, the CID and GID values may differ.
- The “CFF” font program has a Top DICT that does not use CIDFont operators: The CIDs shall be used directly as GID values, and the glyph procedure shall be retrieved using the CharStrings INDEX.
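The two CFF cases above can be sketched in Python. The sketch assumes the charset table has already been decoded into a list in which position `gid` holds the CID assigned to that glyph, with GID 0 included as CID 0; the function names are hypothetical, and falling back to glyph 0 for an unlisted CID is an assumption of this sketch.

```python
def build_cid_to_gid(charset):
    """Invert a decoded charset list, where charset[gid] is the CID of glyph gid."""
    return {cid: gid for gid, cid in enumerate(charset)}


def glyph_index(cid, charset=None):
    """Return the GID used to index the CharStrings INDEX for a CID."""
    if charset is None:
        # Top DICT without CIDFont operators: the CID is used directly as the GID.
        return cid
    # Top DICT with CIDFont operators: look the CID up through the charset table.
    # Assumption: CIDs not listed in the charset fall back to glyph 0.
    return build_cid_to_gid(charset).get(cid, 0)
```

A production implementation would build the inverted table once per font rather than on every lookup.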
For Type 2, the CIDFont program is actually a TrueType font program, which has no native notion of CIDs. In a TrueType font program, glyph descriptions are identified by glyph index values. Glyph indices are internal to the font and are not defined consistently from one font to another. Instead, a TrueType font program contains a “cmap” table that provides mappings directly from character codes to glyph indices for one or more predefined encodings.
TrueType font programs are integrated with the CID-keyed font architecture in one of two ways, depending on whether the font program is embedded in the PDF file:
- If the TrueType font program is embedded, the Type 2 CIDFont dictionary shall contain a CIDToGIDMap entry that maps CIDs to the glyph indices for the appropriate glyph descriptions in that font program.
- If the TrueType font program is not embedded but is referenced by name, the Type 2 CIDFont dictionary shall not contain a CIDToGIDMap entry, since it is not meaningful to refer to glyph indices in an external font program. In this case, CIDs shall not participate in glyph selection, and only predefined CMaps may be used with this CIDFont (see 9.7.5, "CMaps"). The conforming reader shall select glyphs by translating characters from the encoding specified by the predefined CMap to one of the encodings in the TrueType font’s “cmap” table. The means by which this is accomplished are implementation-dependent.
Even though the CIDs are not used to select glyphs in a Type 2 CIDFont, they shall always be used to determine the glyph metrics, as described in the next sub-clause.
Every CIDFont shall contain a glyph description for CID 0, which is analogous to the .notdef character name in simple fonts (see 9.7.6.3, "Handling Undefined Characters").
9.7.4.3 Glyph Metrics in CIDFonts
As discussed in 9.2.4, "Glyph Positioning and Metrics", the width of a glyph refers to the horizontal displacement between the origin of the glyph and the origin of the next glyph when writing in horizontal mode. In this mode, the vertical displacement between origins shall be 0. Widths for a CIDFont are defined using the DW and W entries in the CIDFont dictionary. These widths shall be consistent with the actual widths given in the CIDFont program.
The W array allows the definition of widths for individual CIDs. The elements of the array shall be organized in groups of two or three, where each group shall be in one of these two formats:
c [ w1 w2 … wn ]
cfirst clast w
In the first format, c shall be an integer specifying a starting CID value; it shall be followed by an array of n numbers that shall specify the widths for n consecutive CIDs, starting with c. The second format shall define the same width, w, for all CIDs in the range cfirst to clast.
EXAMPLE 1
In this example, the glyphs having CIDs 120, 121, and 122 are 400, 325, and 500 units wide, respectively. CIDs in the range 7080 through 8032 all have a width of 1000 units.
W entry example:
/W [ 120 [ 400 325 500 ]
7080 8032 1000
]
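As a minimal sketch of how the two W formats could be expanded, the Python below assumes the array has already been parsed into nested Python lists. The names `expand_widths` and `glyph_width` are hypothetical.

```python
def expand_widths(w):
    """Expand a parsed W array into a {cid: width} dictionary."""
    widths = {}
    i = 0
    while i < len(w):
        first = w[i]
        if isinstance(w[i + 1], list):
            # First format: c [ w1 w2 ... wn ] gives widths for consecutive CIDs.
            for k, width in enumerate(w[i + 1]):
                widths[first + k] = width
            i += 2
        else:
            # Second format: cfirst clast w gives one width for a range of CIDs.
            last, width = w[i + 1], w[i + 2]
            for cid in range(first, last + 1):
                widths[cid] = width
            i += 3
    return widths


def glyph_width(cid, widths, dw=1000):
    """Return the width for a CID, falling back to the DW default."""
    return widths.get(cid, dw)


widths = expand_widths([120, [400, 325, 500], 7080, 8032, 1000])
print(glyph_width(121, widths), glyph_width(5, widths, dw=600))
```

Expanding ranges eagerly keeps the lookup trivial; for very large ranges an interval table would use less memory.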
Glyphs from a CIDFont may be shown in vertical writing mode. This is selected by the WMode entry in the associated CMap dictionary; see 9.7.5, "CMaps". To be used in this way, the CIDFont shall define the vertical displacement for each glyph and the position vector that relates the horizontal and vertical writing origins.
The default position vector and vertical displacement vector shall be specified by the DW2 entry in the CIDFont dictionary. DW2 shall be an array of two values: the vertical component of the position vector v and the vertical component of the displacement vector w1 (see Figure 40). The horizontal component of the position vector shall be half the glyph width, and that of the displacement vector shall be 0.
EXAMPLE 2
If the DW2 entry is
/DW2 [ 880 −1000 ]
then a glyph’s position vector and vertical displacement vector are
v = (w0 ÷ 2, 880)
w1 = (0, −1000)
where w0 is the width (horizontal displacement) for the same glyph.
NOTE
A negative value for the vertical component places the origin of the next glyph below the current glyph because vertical coordinates in a standard coordinate system increase from bottom to top.
The W2 array shall define vertical metrics for individual CIDs. The elements of the array shall be organized in groups of two or five, where each group shall be in one of these two formats:
c [ w1_1y v_1x v_1y w1_2y v_2x v_2y … ]
cfirst clast w1_1y v_1x v_1y
In the first format, c is a starting CID and shall be followed by an array containing numbers interpreted in groups of three. Each group shall consist of the vertical component of the vertical displacement vector w1 (whose horizontal component shall be 0) followed by the horizontal and vertical components for the position vector v. Successive groups shall define the vertical metrics for consecutive CIDs starting with c. The second format defines a range of CIDs from cfirst to clast, which shall be followed by three numbers that define the vertical metrics for all CIDs in this range.
EXAMPLE 3
This W2 entry defines the vertical displacement vector for the glyph with CID 120 as (0, −1000) and the position vector as (250, 772). It also defines the displacement vector for CIDs in the range 7080 through 8032 as (0, −1000) and the position vector as (500, 900).
/W2 [ 120 [ −1000 250 772 ]
7080 8032 −1000 500 900
]
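The W2 formats and the DW2 fallback can be sketched together in Python. As before, this assumes the array is already parsed into nested lists; `expand_vertical_metrics` and `vertical_metrics` are hypothetical names, and each result is returned as a pair (displacement vector w1, position vector v).

```python
def expand_vertical_metrics(w2):
    """Expand a parsed W2 array into {cid: ((0, w1y), (vx, vy))}."""
    metrics = {}
    i = 0
    while i < len(w2):
        first = w2[i]
        if isinstance(w2[i + 1], list):
            # First format: groups of three numbers for consecutive CIDs.
            values = w2[i + 1]
            for k in range(0, len(values), 3):
                w1y, vx, vy = values[k:k + 3]
                metrics[first + k // 3] = ((0, w1y), (vx, vy))
            i += 2
        else:
            # Second format: cfirst clast followed by three numbers for the range.
            last = w2[i + 1]
            w1y, vx, vy = w2[i + 2:i + 5]
            for cid in range(first, last + 1):
                metrics[cid] = ((0, w1y), (vx, vy))
            i += 5
    return metrics


def vertical_metrics(cid, metrics, w0, dw2=(880, -1000)):
    """Return (w1, v) for a CID, using DW2 and the glyph width w0 as fallback."""
    if cid in metrics:
        return metrics[cid]
    vy, w1y = dw2
    # Default position vector: half the glyph width horizontally, DW2[0] vertically.
    return ((0, w1y), (w0 / 2, vy))


metrics = expand_vertical_metrics(
    [120, [-1000, 250, 772], 7080, 8032, -1000, 500, 900]
)
print(metrics[120], vertical_metrics(1, metrics, w0=1000))
```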
9.7.5 CMaps
9.7.5.1 General
A CMap shall specify the mapping from character codes to character selectors. In PDF, the character selectors shall be CIDs in a CIDFont (as mentioned earlier, PostScript CMaps can use names or codes as well). A CMap serves a function analogous to the Encoding dictionary for a simple font. The CMap shall not refer directly to a specific CIDFont; instead, it shall be combined with it as part of a CID-keyed font, represented in PDF as a Type 0 font dictionary (see 9.7.6, "Type 0 Font Dictionaries"). Within the CMap, the character mappings shall refer to the associated CIDFont by font number, which in PDF shall be 0.
PDF also uses a special type of CMap to map character codes to Unicode values (see 9.10.3, "ToUnicode CMaps").
A CMap shall specify the writing mode—horizontal or vertical—for any CIDFont with which the CMap is combined. The writing mode determines which metrics shall be used when glyphs are painted from that font.
NOTE
Writing mode is specified as part of the CMap because, in some cases, different shapes are used when writing horizontally and vertically. In such cases, the horizontal and vertical variants of a CMap specify different CIDs for a given character code.
A CMap shall be specified in one of two ways:
- As a name object identifying a predefined CMap, whose value shall be one of the predefined CMap names defined in Table 118.
- As a stream object whose contents shall be a CMap file.
9.7.5.2 Predefined CMaps
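Several of the CMaps define mappings from Unicode encodings to character collections. Unicode values appearing in a text string shall be represented in big-endian order (high-order byte first). CMap names containing “UCS2” use UCS-2 encoding; names containing “UTF16” use UTF-16BE (big-endian) encoding.
NOTE 1
Table 118 lists the names of the predefined CMaps. These CMaps map character codes to CIDs in a single descendant CIDFont. CMaps whose names end in H specify horizontal writing mode; those ending in V specify vertical writing mode.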
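Name | Description |
---|---|
Chinese (Simplified) | |
GB-EUC-H | Microsoft Code Page 936 (lfCharSet 0x86), GB 2312-80 character set, EUC-CN encoding |
GB-EUC-V | Vertical version of GB-EUC-H |
GBpc-EUC-H | Mac OS, GB 2312-80 character set, EUC-CN encoding, Script Manager code 19 |
GBpc-EUC-V | Vertical version of GBpc-EUC-H |
GBK-EUC-H | Microsoft Code Page 936 (lfCharSet 0x86), GBK character set, GBK encoding |
GBK-EUC-V | Vertical version of GBK-EUC-H |
GBKp-EUC-H | Same as GBK-EUC-H but replaces half-width Latin characters with proportional forms and maps character code 0x24 to a dollar sign ($) instead of a yuan symbol (¥) |
GBKp-EUC-V | Vertical version of GBKp-EUC-H |
GBK2K-H | GB 18030-2000 character set, mixed 1-, 2-, and 4-byte encoding |
GBK2K-V | Vertical version of GBK2K-H |
UniGB-UCS2-H | Unicode (UCS-2) encoding for the Adobe-GB1 character collection |
UniGB-UCS2-V | Vertical version of UniGB-UCS2-H |
UniGB-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-GB1 character collection; contains mappings for all characters in the GB18030-2000 character set |
UniGB-UTF16-V | Vertical version of UniGB-UTF16-H |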
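Chinese (Traditional) | |
B5pc-H | Mac OS, Big Five character set, Big Five encoding, Script Manager code 2 |
B5pc-V | Vertical version of B5pc-H |
HKscs-B5-H | Hong Kong SCS, an extension to the Big Five character set and encoding |
HKscs-B5-V | Vertical version of HKscs-B5-H |
ETen-B5-H | Microsoft Code Page 950 (lfCharSet 0x88), Big Five character set with ETen extensions |
ETen-B5-V | Vertical version of ETen-B5-H |
ETenms-B5-H | Same as ETen-B5-H but replaces half-width Latin characters with proportional forms |
ETenms-B5-V | Vertical version of ETenms-B5-H |
CNS-EUC-H | CNS 11643-1992 character set, EUC-TW encoding |
CNS-EUC-V | Vertical version of CNS-EUC-H |
UniCNS-UCS2-H | Unicode (UCS-2) encoding for the Adobe-CNS1 character collection |
UniCNS-UCS2-V | Vertical version of UniCNS-UCS2-H |
UniCNS-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-CNS1 character collection; contains mappings for all the characters in the HKSCS-2001 character set and contains both 2- and 4-byte character codes |
UniCNS-UTF16-V | Vertical version of UniCNS-UTF16-H |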
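Japanese | |
83pv-RKSJ-H | Mac OS, JIS X 0208 character set with KanjiTalk6 extensions, Shift-JIS encoding, Script Manager code 1 |
90ms-RKSJ-H | Microsoft Code Page 932 (lfCharSet 0x80), JIS X 0208 character set with NEC and IBM® extensions |
90ms-RKSJ-V | Vertical version of 90ms-RKSJ-H |
90msp-RKSJ-H | Same as 90ms-RKSJ-H but replaces half-width Latin characters with proportional forms |
90msp-RKSJ-V | Vertical version of 90msp-RKSJ-H |
90pv-RKSJ-H | Mac OS, JIS X 0208 character set with KanjiTalk7 extensions, Shift-JIS encoding, Script Manager code 1 |
Add-RKSJ-H | JIS X 0208 character set with Fujitsu FMR extensions, Shift-JIS encoding |
Add-RKSJ-V | Vertical version of Add-RKSJ-H |
EUC-H | JIS X 0208 character set, EUC-JP encoding |
EUC-V | Vertical version of EUC-H |
Ext-RKSJ-H | JIS C 6226 (JIS78) character set with NEC extensions, Shift-JIS encoding |
Ext-RKSJ-V | Vertical version of Ext-RKSJ-H |
H | JIS X 0208 character set, ISO-2022-JP encoding |
V | Vertical version of H |
UniJIS-UCS2-H | Unicode (UCS-2) encoding for the Adobe-Japan1 character collection |
UniJIS-UCS2-V | Vertical version of UniJIS-UCS2-H |
UniJIS-UCS2-HW-H | Same as UniJIS-UCS2-H but replaces proportional Latin characters with half-width forms |
UniJIS-UCS2-HW-V | Vertical version of UniJIS-UCS2-HW-H |
UniJIS-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-Japan1 character collection; contains mappings for all characters in the JIS X 0213:2000 character set |
UniJIS-UTF16-V | Vertical version of UniJIS-UTF16-H |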
Korean | |
KSC-EUC-H | KS X 1001:1992 character set, EUC-KR encoding |
KSC-EUC-V | Vertical version of KSC-EUC-H |
KSCms-UHC-H | Microsoft Code Page 949 (lfCharSet 0x81), KS X 1001:1992 character set plus 8822 additional hangul, Unified Hangul Code (UHC) encoding |
KSCms-UHC-V | Vertical version of KSCms-UHC-H |
KSCms-UHC-HW-H | Same as KSCms-UHC-H but replaces proportional Latin characters with half-width forms |
KSCms-UHC-HW-V | Vertical version of KSCms-UHC-HW-H |
KSCpc-EUC-H | Mac OS, KS X 1001:1992 character set with Mac OS KH extensions, Script Manager code 3 |
UniKS-UCS2-H | Unicode (UCS-2) encoding for the Adobe-Korea1 character collection |
UniKS-UCS2-V | Vertical version of UniKS-UCS2-H |
UniKS-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-Korea1 character collection |
UniKS-UTF16-V | Vertical version of UniKS-UTF16-H |
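Generic | |
Identity-H | The horizontal identity mapping for 2-byte CIDs; may be used with CIDFonts using any Registry, Ordering, and Supplement values. It maps 2-byte character codes ranging from 0 to 65,535 to the same 2-byte CID value, interpreted high-order byte first. |
Identity-V | Vertical version of Identity-H. The mapping is the same as for Identity-H. |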
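NOTE 2
The Identity-H and Identity-V CMaps can be used to refer to glyphs directly by their CIDs when showing a text string.
When the current font is a Type 0 font whose Encoding entry is Identity-H or Identity-V, the string to be shown shall contain pairs of bytes representing CIDs, high-order byte first. When the current font is a CIDFont, the string to be shown shall contain pairs of bytes representing CIDs, high-order byte first. When the current font is a Type 2 CIDFont in which the CIDToGIDMap entry is Identity and the TrueType font is embedded in the PDF file, the 2-byte CID values shall be identical to the glyph indices for the glyph descriptions in the TrueType font program.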
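The byte-pair interpretation used by Identity-H and Identity-V can be sketched in Python. The function name `cids_from_identity_string` is hypothetical, and raising an error on an odd-length string is a choice made for this sketch, not a behaviour the text prescribes.

```python
def cids_from_identity_string(data):
    """Decode a string shown with Identity-H or Identity-V into a list of CIDs."""
    if len(data) % 2:
        # Assumption of this sketch: reject strings that are not whole byte pairs.
        raise ValueError("string does not consist of whole 2-byte codes")
    # Each pair of bytes is one CID, high-order byte first.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
```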
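NOTE 3
Table 119 lists the character collections referenced by the predefined CMaps for the different versions of PDF. A dash (—) indicates that the CMap is not predefined in that PDF version.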
CMAP | PDF 1.2 | PDF 1.3 | PDF 1.4 | PDF 1.5 |
---|---|---|---|---|
Chinese (Simplified) | ||||
GB-EUC-H/V | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 |
GBpc-EUC-H | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 |
GBpc-EUC-V | — | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 |
GBK-EUC-H/V | — | Adobe-GB1-2 | Adobe-GB1-2 | Adobe-GB1-2 |
GBKp-EUC-H/V | — | — | Adobe-GB1-2 | Adobe-GB1-2 |
GBK2K-H/V | — | — | Adobe-GB1-4 | Adobe-GB1-4 |
UniGB-UCS2-H/V | — | Adobe-GB1-2 | Adobe-GB1-4 | Adobe-GB1-4 |
UniGB-UTF16-H/V | — | — | — | Adobe-GB1-4 |
Chinese (Traditional) | ||||
B5pc-H/V | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
HKscs-B5-H/V | — | — | Adobe-CNS1-3 | Adobe-CNS1-3 |
ETen-B5-H/V | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
ETenms-B5-H/V | — | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
CNS-EUC-H/V | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
UniCNS-UCS2-H/V | — | Adobe-CNS1-0 | Adobe-CNS1-3 | Adobe-CNS1-3 |
UniCNS-UTF16-H/V | — | — | — | Adobe-CNS1-4 |
Japanese | ||||
83pv-RKSJ-H | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
90ms-RKSJ-H/V | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 |
90msp-RKSJ-H/V | — | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 |
90pv-RKSJ-H | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
Add-RKSJ-H/V | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
EUC-H/V | — | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
Ext-RKSJ-H/V | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 |
H/V | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
UniJIS-UCS2-H/V | — | Adobe-Japan1-2 | Adobe-Japan1-4 | Adobe-Japan1-4 |
UniJIS-UCS2-HW-H/V | — | Adobe-Japan1-2 | Adobe-Japan1-4 | Adobe-Japan1-4 |
UniJIS-UTF16-H/V | — | — | — | Adobe-Japan1-5 |
Korean | ||||
KSC-EUC-H/V | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 |
KSCms-UHC-H/V | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 |
KSCms-UHC-HW-H/V | — | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 |
KSCpc-EUC-H | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 |
UniKS-UCS2-H/V | — | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 |
UniKS-UTF16-H/V | — | — | — | Adobe-Korea1-2 |
Generic | ||||
Identity-H/V | Adobe-Identity-0 | Adobe-Identity-0 | Adobe-Identity-0 | Adobe-Identity-0 |
符合要求的阅读器应支持表 119 中列出的所有字符集合。如9.7.3《CIDSystemInfo 字典》中所述,字符集合通过注册表、排序和补充号来标识,且补充是累积的;即,较高编号的补充包括较低编号补充中的 CID,以及一些额外的 CID。因此,根据给定 PDF 版本的预定义 CMap 编码的文本,在由支持相同或更高版本 PDF 的符合要求的阅读器中解释时应是有效的。当由支持较早 PDF 版本的符合要求的阅读器解释时,如果遇到未为该 PDF 版本预定义的 CMap,则该文本会导致错误。如果遇到的字符代码是在比支持的 PDF 版本对应的补充号更高的补充中添加的,则对于这些代码,不会显示任何字符;详见9.7.6.3《处理未定义字符》。
Identity-H 和 Identity-V CMap 不得与非嵌入字体一起使用。只能使用标准化字符集。
NOTE 4
如果符合要求的写入器在生成 PDF 文件时遇到使用来自高编号补充的 CID 的文本,该补充的编号高于正在生成的 PDF 版本所对应的补充号,则应用程序应嵌入高编号补充的 CMap,而不是引用预定义的 CMap。
定义预定义 CMap 的 CMap 程序可以通过 ASN 网站获取。
Several of the CMaps define mappings from Unicode encodings to character collections. Unicode values appearing in a text string shall be represented in big-endian order (high-order byte first). CMap names containing “UCS2” use UCS-2 encoding; names containing “UTF16” use UTF-16BE (big-endian) encoding.
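The byte-order rule above can be sketched in Python. The helper name `encode_for_unicode_cmap` is hypothetical and not part of any PDF library; the sketch assumes the CMap name alone distinguishes UCS-2 from UTF-16BE, as the paragraph states.

```python
def encode_for_unicode_cmap(text: str, cmap_name: str) -> bytes:
    """Encode text as big-endian code units for a UCS2 or UTF16 CMap."""
    if "UTF16" in cmap_name:
        # UTF-16BE: supplementary characters become surrogate pairs.
        return text.encode("utf-16-be")
    if "UCS2" in cmap_name:
        # UCS-2 has no surrogate pairs, so only BMP code points fit.
        if any(ord(ch) > 0xFFFF for ch in text):
            raise ValueError("code point outside the BMP cannot be encoded as UCS-2")
        return text.encode("utf-16-be")
    raise ValueError(f"{cmap_name} is not a Unicode CMap name")
```

For BMP characters the two encodings produce the same bytes; they differ only in whether code points above U+FFFF can be represented.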
NOTE 1
Table 118 lists the names of the predefined CMaps. These CMaps map character codes to CIDs in a single descendant CIDFont. CMaps whose names end in H specify horizontal writing mode; those ending in V specify vertical writing mode.
Name | Description |
---|---|
Chinese (Simplified) | |
GB-EUC-H | Microsoft Code Page 936 (lfCharSet 0x86), GB 2312-80 character set, EUC-CN encoding |
GB-EUC-V | Vertical version of GB-EUC-H |
GBpc-EUC-H | Mac OS, GB 2312-80 character set, EUC-CN encoding, Script Manager code 19 |
GBpc-EUC-V | Vertical version of GBpc-EUC-H |
GBK-EUC-H | Microsoft Code Page 936 (lfCharSet 0x86), GBK character set, GBK encoding |
GBK-EUC-V | Vertical version of GBK-EUC-H |
GBKp-EUC-H | Same as GBK-EUC-H but replaces half-width Latin characters with proportional forms and maps character code 0x24 to a dollar sign ($) instead of a yuan symbol (¥) |
GBKp-EUC-V | Vertical version of GBKp-EUC-H |
GBK2K-H | GB 18030-2000 character set, mixed 1-, 2-, and 4-byte encoding |
GBK2K-V | Vertical version of GBK2K-H |
UniGB-UCS2-H | Unicode (UCS-2) encoding for the Adobe-GB1 character collection |
UniGB-UCS2-V | Vertical version of UniGB-UCS2-H |
UniGB-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-GB1 character collection; contains mappings for all characters in the GB18030-2000 character set |
UniGB-UTF16-V | Vertical version of UniGB-UTF16-H |
Chinese (Traditional) | |
B5pc-H | Mac OS, Big Five character set, Big Five encoding, Script Manager code 2 |
B5pc-V | Vertical version of B5pc-H |
HKscs-B5-H | Hong Kong SCS, an extension to the Big Five character set and encoding |
HKscs-B5-V | Vertical version of HKscs-B5-H |
ETen-B5-H | Microsoft Code Page 950 (lfCharSet 0x88), Big Five character set with ETen extensions |
ETen-B5-V | Vertical version of ETen-B5-H |
ETenms-B5-H | Same as ETen-B5-H but replaces half-width Latin characters with proportional forms |
ETenms-B5-V | Vertical version of ETenms-B5-H |
CNS-EUC-H | CNS 11643-1992 character set, EUC-TW encoding |
CNS-EUC-V | Vertical version of CNS-EUC-H |
UniCNS-UCS2-H | Unicode (UCS-2) encoding for the Adobe-CNS1 character collection |
UniCNS-UCS2-V | Vertical version of UniCNS-UCS2-H |
UniCNS-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-CNS1 character collection; contains mappings for all the characters in the HKSCS-2001 character set and contains both 2- and 4-byte character codes |
UniCNS-UTF16-V | Vertical version of UniCNS-UTF16-H |
Japanese | |
83pv-RKSJ-H | Mac OS, JIS X 0208 character set with KanjiTalk6 extensions, Shift-JIS encoding, Script Manager code 1 |
90ms-RKSJ-H | Microsoft Code Page 932 (lfCharSet 0x80), JIS X 0208 character set with NEC and IBM® extensions |
90ms-RKSJ-V | Vertical version of 90ms-RKSJ-H |
90msp-RKSJ-H | Same as 90ms-RKSJ-H but replaces half-width Latin characters with proportional forms |
90msp-RKSJ-V | Vertical version of 90msp-RKSJ-H |
90pv-RKSJ-H | Mac OS, JIS X 0208 character set with KanjiTalk7 extensions, Shift-JIS encoding, Script Manager code 1 |
Add-RKSJ-H | JIS X 0208 character set with Fujitsu FMR extensions, Shift-JIS encoding |
Add-RKSJ-V | Vertical version of Add-RKSJ-H |
EUC-H | JIS X 0208 character set, EUC-JP encoding |
EUC-V | Vertical version of EUC-H |
Ext-RKSJ-H | JIS C 6226 (JIS78) character set with NEC extensions, Shift-JIS encoding |
Ext-RKSJ-V | Vertical version of Ext-RKSJ-H |
H | JIS X 0208 character set, ISO-2022-JP encoding |
V | Vertical version of H |
UniJIS-UCS2-H | Unicode (UCS-2) encoding for the Adobe-Japan1 character collection |
UniJIS-UCS2-V | Vertical version of UniJIS-UCS2-H |
UniJIS-UCS2-HW-H | Same as UniJIS-UCS2-H but replaces proportional Latin characters with half-width forms |
UniJIS-UCS2-HW-V | Vertical version of UniJIS-UCS2-HW-H |
UniJIS-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-Japan1 character collection; contains mappings for all characters in the JIS X 0213:2000 character set |
UniJIS-UTF16-V | Vertical version of UniJIS-UTF16-H |
Korean | |
KSC-EUC-H | KS X 1001:1992 character set, EUC-KR encoding |
KSC-EUC-V | Vertical version of KSC-EUC-H |
KSCms-UHC-H | Microsoft Code Page 949 (lfCharSet 0x81), KS X 1001:1992 character set plus 8822 additional hangul, Unified Hangul Code (UHC) encoding |
KSCms-UHC-V | Vertical version of KSCms-UHC-H |
KSCms-UHC-HW-H | Same as KSCms-UHC-H but replaces proportional Latin characters with half-width forms |
KSCms-UHC-HW-V | Vertical version of KSCms-UHC-HW-H |
KSCpc-EUC-H | Mac OS, KS X 1001:1992 character set with Mac OS KH extensions, Script Manager Code 3 |
UniKS-UCS2-H | Unicode (UCS-2) encoding for the Adobe-Korea1 character collection |
UniKS-UCS2-V | Vertical version of UniKS-UCS2-H |
UniKS-UTF16-H | Unicode (UTF-16BE) encoding for the Adobe-Korea1 character collection |
UniKS-UTF16-V | Vertical version of UniKS-UTF16-H |
Generic | |
Identity-H | The horizontal identity mapping for 2-byte CIDs; may be used with CIDFonts using any Registry, Ordering, and Supplement values. It maps 2-byte character codes ranging from 0 to 65,535 to the same 2-byte CID value, interpreted high-order byte first. |
Identity-V | Vertical version of Identity-H. The mapping is the same as for Identity-H. |
NOTE 2
The Identity-H and Identity-V CMaps may be used to refer to glyphs directly by their CIDs when showing a text string.
When the current font is a Type 0 font whose Encoding entry is Identity-H or Identity-V, the string to be shown shall contain pairs of bytes representing CIDs, high-order byte first. When the current font is a CIDFont, the string to be shown shall contain pairs of bytes representing CIDs, high-order byte first. When the current font is a Type 2 CIDFont in which the CIDToGIDMap entry is Identity and the TrueType font is embedded in the PDF file, the 2-byte CID values shall be identical to the glyph indices for the glyph descriptions in the TrueType font program.
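A minimal sketch of the byte-pair layout described above, using only the Python standard library. The function names are illustrative assumptions, not an existing API.

```python
import struct

def cids_to_identity_string(cids):
    """Pack CIDs as 2-byte big-endian codes for Identity-H / Identity-V."""
    # ">H" is an unsigned 16-bit integer, high-order byte first.
    return b"".join(struct.pack(">H", cid) for cid in cids)

def identity_string_to_cids(data):
    """Unpack a string shown with Identity-H / Identity-V back into CIDs."""
    if len(data) % 2:
        raise ValueError("Identity strings must contain whole byte pairs")
    return [cid for (cid,) in struct.iter_unpack(">H", data)]
```

Because the mapping is the identity, packing and unpacking are exact inverses for any CID in the range 0 to 65,535.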
NOTE 3
Table 119 lists the character collections referenced by the predefined CMaps for the different versions of PDF. A dash (—) indicates that the CMap is not predefined in that PDF version.
CMAP | PDF 1.2 | PDF 1.3 | PDF 1.4 | PDF 1.5 |
---|---|---|---|---|
Chinese (Simplified) | ||||
GB-EUC-H/V | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 |
GBpc-EUC-H | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 |
GBpc-EUC-V | — | Adobe-GB1-0 | Adobe-GB1-0 | Adobe-GB1-0 |
GBK-EUC-H/V | — | Adobe-GB1-2 | Adobe-GB1-2 | Adobe-GB1-2 |
GBKp-EUC-H/V | — | — | Adobe-GB1-2 | Adobe-GB1-2 |
GBK2K-H/V | — | — | Adobe-GB1-4 | Adobe-GB1-4 |
UniGB-UCS2-H/V | — | Adobe-GB1-2 | Adobe-GB1-4 | Adobe-GB1-4 |
UniGB-UTF16-H/V | — | — | — | Adobe-GB1-4 |
Chinese (Traditional) | ||||
B5pc-H/V | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
HKscs-B5-H/V | — | — | Adobe-CNS1-3 | Adobe-CNS1-3 |
ETen-B5-H/V | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
ETenms-B5-H/V | — | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
CNS-EUC-H/V | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 | Adobe-CNS1-0 |
UniCNS-UCS2-H/V | — | Adobe-CNS1-0 | Adobe-CNS1-3 | Adobe-CNS1-3 |
UniCNS-UTF16-H/V | — | — | — | Adobe-CNS1-4 |
Japanese | ||||
83pv-RKSJ-H | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
90ms-RKSJ-H/V | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 |
90msp-RKSJ-H/V | — | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 |
90pv-RKSJ-H | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
Add-RKSJ-H/V | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
EUC-H/V | — | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
Ext-RKSJ-H/V | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 | Adobe-Japan1-2 |
H/V | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 | Adobe-Japan1-1 |
UniJIS-UCS2-H/V | — | Adobe-Japan1-2 | Adobe-Japan1-4 | Adobe-Japan1-4 |
UniJIS-UCS2-HW-H/V | — | Adobe-Japan1-2 | Adobe-Japan1-4 | Adobe-Japan1-4 |
UniJIS-UTF16-H/V | — | — | — | Adobe-Japan1-5 |
Korean | ||||
KSC-EUC-H/V | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 |
KSCms-UHC-H/V | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 |
KSCms-UHC-HW-H/V | — | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 |
KSCpc-EUC-H | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 | Adobe-Korea1-0 |
UniKS-UCS2-H/V | — | Adobe-Korea1-1 | Adobe-Korea1-1 | Adobe-Korea1-1 |
UniKS-UTF16-H/V | — | — | — | Adobe-Korea1-2 |
Generic | ||||
Identity-H/V | Adobe-Identity-0 | Adobe-Identity-0 | Adobe-Identity-0 | Adobe-Identity-0 |
A conforming reader shall support all of the character collections listed in Table 119. As noted in 9.7.3, "CIDSystemInfo Dictionaries", a character collection is identified by registry, ordering, and supplement number, and supplements are cumulative; that is, a higher-numbered supplement includes the CIDs contained in lower-numbered supplements, as well as some additional CIDs. Consequently, text encoded according to the predefined CMaps for a given PDF version shall be valid when interpreted by a conforming reader supporting the same or a later PDF version. When interpreted by a conforming reader supporting an earlier PDF version, such text causes an error if a CMap is encountered that is not predefined for that PDF version. If character codes are encountered that were added in a higher-numbered supplement than the one corresponding to the supported PDF version, no characters are displayed for those codes; see 9.7.6.3, "Handling Undefined Characters".
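The version and supplement rules above can be illustrated with a small lookup. This is a sketch only: the dictionary holds a two-row excerpt of Table 119, and the function names are assumptions made for the example.

```python
# Excerpt of Table 119: CMap name -> {PDF version: supplement number}.
# Both rows use the Adobe-Japan1 collection; a missing version means the
# CMap is not predefined in that PDF version.
TABLE_119_EXCERPT = {
    "UniJIS-UCS2-H": {"1.3": 2, "1.4": 4, "1.5": 4},
    "UniJIS-UTF16-H": {"1.5": 5},
}

def predefined_supplement(cmap_name, pdf_version):
    """Return the supplement number, or None if the CMap is not predefined."""
    return TABLE_119_EXCERPT.get(cmap_name, {}).get(pdf_version)

def cid_is_displayable(cid_supplement, cmap_name, reader_version):
    """True if a CID introduced in cid_supplement is covered by the reader."""
    supported = predefined_supplement(cmap_name, reader_version)
    if supported is None:
        # Mirrors the error case: CMap unknown to this PDF version.
        raise LookupError(f"{cmap_name} is not predefined in PDF {reader_version}")
    # Supplements are cumulative: supplement N contains all CIDs of 0..N.
    return cid_supplement <= supported
```

A CID added in supplement 4 is therefore covered by a PDF 1.4 reader using UniJIS-UCS2-H, but not by a PDF 1.3 reader, which only knows supplement 2.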
The Identity-H and Identity-V CMaps shall not be used with a non-embedded font. Only standardized character sets may be used.
NOTE 4
If a conforming writer producing a PDF file encounters text to be included that uses CIDs from a higher-numbered supplement than the one corresponding to the PDF version being generated, the application should embed the CMap for the higher-numbered supplement rather than refer to the predefined CMap.
The CMap programs that define the predefined CMaps are available through the ASN Web site.
9.7.5.3 Embedded CMap Files
For character encodings that are not predefined, the PDF file shall contain a stream that defines the CMap. In addition to the standard entries for streams (listed in Table 5), the CMap stream dictionary contains the entries listed in Table 120. The data in the stream defines the mapping from character codes to a font number and a character selector. The data shall follow the syntax defined in Adobe Technical Note #5014, Adobe CMap and CIDFont Files Specification (see bibliography).
Key | Type | Value |
---|---|---|
Type | name | (Required) The type of PDF object that this dictionary describes; shall be CMap for a CMap dictionary. |
CMapName | name | (Required) The name of the CMap. It shall be the same as the value of CMapName in the CMap file. |
CIDSystemInfo | dictionary | (Required) A dictionary (see 9.7.3, "CIDSystemInfo Dictionaries") containing entries that define the character collection for the CIDFont or CIDFonts associated with the CMap. The value of this entry shall be the same as the value of CIDSystemInfo in the CMap file. (However, it does not need to match the values of CIDSystemInfo for the Identity-H or Identity-V CMaps.) |
WMode | integer | (Optional) A code that specifies the writing mode for any CIDFont with which this CMap is combined. The value shall be 0 for horizontal or 1 for vertical. Default value: 0. The value of this entry shall be the same as the value of WMode in the CMap file. |
UseCMap | name or stream | (Optional) The name of a predefined CMap, or a stream containing a CMap. If this entry is present, the referencing CMap shall specify only the character mappings that differ from the referenced CMap. |
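As a rough illustration of Table 120, the following sketch checks a CMap stream dictionary for the required entries and a legal WMode. It assumes the dictionary has already been parsed into a Python `dict` with plain string keys and name values as strings; that representation and the function name are assumptions, not a real parser's API.

```python
def check_cmap_stream_dict(d):
    """Return a list of problems found in a CMap stream dictionary."""
    problems = []
    if d.get("Type") != "CMap":
        problems.append("Type shall be CMap")
    for key in ("CMapName", "CIDSystemInfo"):
        if key not in d:
            problems.append(f"{key} is required")
    # WMode is optional and defaults to 0 (horizontal).
    if d.get("WMode", 0) not in (0, 1):
        problems.append("WMode shall be 0 or 1")
    return problems
```

The check does not compare the entries against the values inside the CMap file itself, which Table 120 also requires.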
9.7.5.4 CMap Example and Operator Summary
Embedded CMap files shall conform to the format documented in Adobe Technical Note #5014, subject to these additional constraints:
a) If the embedded CMap file contains a usecmap reference, the CMap indicated there shall also be identified by the UseCMap entry in the CMap stream dictionary.
b) The usefont operator, if present, shall specify a font number of 0.
c) The beginbfchar and endbfchar operators shall not appear in a CMap that is used as the Encoding entry of a Type 0 font; however, they may appear in the definition of a ToUnicode CMap.
d) A notdef mapping, defined using beginnotdefchar, endnotdefchar, beginnotdefrange, and endnotdefrange shall be used if the normal mapping produces a CID for which no glyph is present in the associated CIDFont.
e) The beginrearrangedfont, endrearrangedfont, beginusematrix, and endusematrix operators shall not be used.
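Constraints b), c) and e) lend themselves to a simple token scan. The sketch below is an assumption-laden illustration, not a full PostScript tokenizer: it splits on whitespace, strips comments with a basic pattern, and does not check constraint a) or d).

```python
import re

FORBIDDEN_OPERATORS = ("beginrearrangedfont", "endrearrangedfont",
                       "beginusematrix", "endusematrix")

def check_embedded_cmap(cmap_text, used_as_encoding=True):
    """Return a list of violations of constraints b), c) and e)."""
    # Strip PostScript comments; this simple pattern ignores the corner
    # case of a percent sign inside a string literal.
    tokens = re.sub(r"%.*", "", cmap_text).split()
    problems = []
    for i, token in enumerate(tokens):
        if token == "usefont" and (i == 0 or tokens[i - 1] != "0"):
            problems.append("usefont shall specify font number 0")
        elif token in ("beginbfchar", "endbfchar") and used_as_encoding:
            problems.append(f"{token} not allowed in an Encoding CMap")
        elif token in FORBIDDEN_OPERATORS:
            problems.append(f"{token} shall not be used")
    return problems
```

Passing `used_as_encoding=False` models a ToUnicode CMap, where beginbfchar and endbfchar are permitted.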
EXAMPLE
This example shows a sample CMap for a Japanese Shift-JIS encoding. Character codes in this encoding can be either 1 or 2 bytes in length. This CMap could be used with a CIDFont that uses the same CID ordering as specified in the CIDSystemInfo entry. Note that several of the entries in the stream dictionary are also replicated in the stream data.
22 0 obj
<< /Type /CMap
/CMapName /90ms-RKSJ-H
/CIDSystemInfo << /Registry (Adobe)
/Ordering (Japan1)
/Supplement 2
>>
/WMode 0
/Length 23 0 R
>>
stream
%!PS-Adobe-3.0 Resource-CMap
%%DocumentNeededResources: ProcSet (CIDInit)
%%IncludeResource: ProcSet (CIDInit)
%%BeginResource: CMap (90ms-RKSJ-H)
%%Title: (90ms-RKSJ-H Adobe Japan1 2)
%%Version: 10.001
%%Copyright: Copyright 1990-2001 Adobe Systems Inc.
%%Copyright: All Rights Reserved.
%%EndComments
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
3 dict dup begin
/Registry (Adobe) def
/Ordering (Japan1) def
/Supplement 2 def
end def
/CMapName /90ms-RKSJ-H def
/CMapVersion 10.001 def
/CMapType 1 def
/UIDOffset 950 def
/XUID [1 10 25343] def
/WMode 0 def
4 begincodespacerange
<00> <80>
<8140> <9FFC>
<A0> <DF>
<E040> <FCFC>
endcodespacerange
1 beginnotdefrange
<00> <1F> 231
endnotdefrange
100 begincidrange
<20> <7D> 231
<7E> <7E> 631
<8140> <817E> 633
<8180> <81AC> 696
<81B8> <81BF> 741
<81C8> <81CE> 749
…Additional ranges…
<FB40> <FB7E> 8518
<FB80> <FBFC> 8581
<FC40> <FC4B> 8706
endcidrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end
%%EndResource
%%EOF
endstream
endobj
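To show how the range sections of such a CMap might be read, here is a minimal Python sketch using regular expressions. It is not a PostScript interpreter: it assumes ranges are written as hex strings exactly as in the example, and the function names are hypothetical.

```python
import re

HEX_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")
CID_RANGE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(\d+)")

def parse_codespace_ranges(cmap_text):
    """Return [(low, high)] byte pairs from begincodespacerange sections."""
    ranges = []
    for section in re.findall(r"begincodespacerange(.*?)endcodespacerange",
                              cmap_text, re.S):
        for lo, hi in HEX_PAIR.findall(section):
            ranges.append((bytes.fromhex(lo), bytes.fromhex(hi)))
    return ranges

def parse_cid_ranges(cmap_text):
    """Return [(low, high, first_cid)] from begincidrange sections."""
    ranges = []
    for section in re.findall(r"begincidrange(.*?)endcidrange", cmap_text, re.S):
        for lo, hi, cid in CID_RANGE.findall(section):
            ranges.append((bytes.fromhex(lo), bytes.fromhex(hi), int(cid)))
    return ranges
```

Applied to the example, the codespace section yields two 1-byte ranges and two 2-byte ranges, which is what makes the encoding mixed-width.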
9.7.6 Type 0 Font Dictionaries
9.7.6.1 General
A Type 0 font dictionary contains the entries listed in Table 121.
Key | Type | Value |
---|---|---|
Type | name | (Required) The type of PDF object that this dictionary describes; shall be Font for a font dictionary. |
Subtype | name | (Required) The type of font; shall be Type0 for a Type 0 font. |
BaseFont | name | (Required) The name of the font. If the descendant is a Type 0 CIDFont, this name should be the concatenation of the CIDFont’s BaseFont name, a hyphen, and the CMap name given in the Encoding entry (or the CMapName entry in the CMap). If the descendant is a Type 2 CIDFont, this name should be the same as the CIDFont’s BaseFont name. NOTE In principle, this is an arbitrary name, since there is no font program associated directly with a Type 0 font dictionary. The conventions described here ensure maximum compatibility with existing readers. |
Encoding | name or stream | (Required) The name of a predefined CMap, or a stream containing a CMap that maps character codes to font numbers and CIDs. If the descendant is a Type 2 CIDFont whose associated TrueType font program is not embedded in the PDF file, the Encoding entry shall be a predefined CMap name (see 9.7.4.2, "Glyph Selection in CIDFonts"). |
DescendantFonts | array | (Required) A one-element array specifying the CIDFont dictionary that is the descendant of this Type 0 font. |
ToUnicode | stream | (Optional) A stream containing a CMap file that maps character codes to Unicode values (see 9.10, "Extraction of Text Content"). |
EXAMPLE
This code sample shows a Type 0 font.
14 0 obj
<< /Type /Font
/Subtype /Type0
/BaseFont /HeiseiMin-W5-90ms-RKSJ-H
/Encoding /90ms-RKSJ-H
/DescendantFonts [ 15 0 R ]
>>
endobj
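The BaseFont naming convention from Table 121 can be expressed as a short helper. This is a sketch with a hypothetical function name; it assumes the descendant's Subtype is given as the string CIDFontType0 or CIDFontType2.

```python
def type0_base_font(cidfont_base_font, cmap_name, cidfont_subtype):
    """Compose the conventional BaseFont name for a Type 0 font."""
    if cidfont_subtype == "CIDFontType0":
        # Type 0 CIDFont: CIDFont name, a hyphen, then the CMap name.
        return f"{cidfont_base_font}-{cmap_name}"
    if cidfont_subtype == "CIDFontType2":
        # Type 2 CIDFont: same name as the CIDFont.
        return cidfont_base_font
    raise ValueError(f"unexpected CIDFont subtype: {cidfont_subtype}")
```

With a CIDFont named HeiseiMin-W5 and the 90ms-RKSJ-H CMap, this reproduces the BaseFont value used in the example above.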
9.7.6.2 CMap Mapping
The Encoding entry of a Type 0 font dictionary specifies a CMap that specifies how text-showing operators (such as Tj) shall interpret the bytes in the string to be shown when the current font is the Type 0 font. This sub-clause describes how the characters in the string shall be decoded and mapped into character selectors, which in PDF are always CIDs.
The codespace ranges in the CMap (delimited by begincodespacerange and endcodespacerange) specify how many bytes are extracted from the string for each successive character code. A codespace range shall be specified by a pair of codes of some particular length giving the lower and upper bounds of that range. A code shall be considered to match the range if it is the same length as the bounding codes and the value of each of its bytes lies between the corresponding bytes of the lower and upper bounds. The code length shall not be greater than 4.
A sequence of one or more bytes shall be extracted from the string and matched against the codespace ranges in the CMap. That is, the first byte shall be matched against 1-byte codespace ranges; if no match is found, a second byte shall be extracted, and the 2-byte code shall be matched against 2-byte codespace ranges. This process continues for successively longer codes until a match is found or all codespace ranges have been tested. There will be at most one match because codespace ranges shall not overlap.
The code extracted from the string shall be looked up in the character code mappings for codes of that length. (These are the mappings defined by beginbfchar, endbfchar, begincidchar, endcidchar, and corresponding operators for ranges.) Failing that, it shall be looked up in the notdef mappings, as described in the next sub-clause.
The results of the CMap mapping algorithm are a font number and a character selector. The font number shall be used as an index into the Type 0 font’s DescendantFonts array to select a CIDFont. In PDF, the font number shall be 0 and the character selector shall be a CID; this is the only case described here. The CID shall then be used to select a glyph in the CIDFont. If the CIDFont contains no glyph for that CID, the notdef mappings shall be consulted, as described in 9.7.6.3, "Handling Undefined Characters".
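The extraction and lookup steps described in this sub-clause can be sketched as follows. The data layout (ranges as tuples of `bytes`) and the function names are assumptions for illustration; only cidrange-style mappings are modeled.

```python
def matches(code, lo, hi):
    """A code matches a range if lengths agree and every byte is in bounds."""
    return len(code) == len(lo) and all(
        l <= b <= h for b, l, h in zip(code, lo, hi))

def next_code(data, pos, codespace):
    """Return the code starting at pos, or None if no codespace range matches."""
    max_len = max(len(lo) for lo, _ in codespace)
    for n in range(1, max_len + 1):      # try 1-byte codes first, then longer
        code = data[pos:pos + n]
        if len(code) < n:
            break                        # ran out of bytes in the string
        if any(matches(code, lo, hi) for lo, hi in codespace):
            return code
    return None

def code_to_cid(code, cid_ranges):
    """Map a code to a CID using (low, high, first_cid) ranges, else None."""
    for lo, hi, first_cid in cid_ranges:
        if len(code) == len(lo) and lo <= code <= hi:
            offset = int.from_bytes(code, "big") - int.from_bytes(lo, "big")
            return first_cid + offset
    return None
```

Using the ranges from the Shift-JIS example in 9.7.5.4, the byte 0x41 is a 1-byte code that maps to CID 264 (231 plus an offset of 33), while 0x81 0x40 is a 2-byte code that maps to CID 633.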
9.7.6.3 Handling Undefined Characters
A CMap mapping operation can fail to select a glyph for a variety of reasons. This sub-clause describes those reasons and what happens when they occur.
If a code maps to a CID for which no such glyph exists in the descendant CIDFont, the notdef mappings in the CMap shall be consulted to obtain a substitute character selector. These mappings are delimited by the operators beginnotdefchar, endnotdefchar, beginnotdefrange, and endnotdefrange within an embedded CMap file. They shall always map to a CID. If a matching notdef mapping is found, the CID selects a glyph in the associated descendant, which shall be a CIDFont. If no glyph exists for that CID, the glyph for CID 0 (which shall be present) shall be substituted.
NOTE 5
The notdef mappings are similar to the .notdef character mechanism in simple fonts.
If the CMap does not contain either a character mapping or a notdef mapping for the code, descendant 0 shall be selected and the glyph for CID 0 shall be substituted from the associated CIDFont.
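The fallback order for a valid code can be summarized in a few lines. This sketch assumes the character mappings and notdef mappings have been flattened into dictionaries keyed by code, and that the set of CIDs with glyphs in the descendant CIDFont is known; those structures and the function name are illustrative.

```python
def select_cid(code, char_map, notdef_map, font_cids):
    """Pick the CID whose glyph is shown for a valid character code."""
    cid = char_map.get(code)
    if cid is not None and cid in font_cids:
        return cid
    # Either the code is unmapped or its CID has no glyph: consult notdef.
    substitute = notdef_map.get(code)
    if substitute is not None and substitute in font_cids:
        return substitute
    # Last resort: CID 0, which shall be present in every CIDFont.
    return 0
```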
If the code is invalid—that is, the bytes extracted from the string to be shown do not match any codespace range in the CMap—a substitute glyph is chosen as just described. The character mapping algorithm shall be reset to its original position in the string, and a modified mapping algorithm chooses the best partially matching codespace range:
a) If the first byte extracted from the string to be shown does not match the first byte of any codespace range, the range having the shortest codes shall be chosen.
b) Otherwise (that is, if there is a partial match), for each additional byte extracted, the code accumulated so far shall be matched against the beginnings of all longer codespace ranges until the longest such partial match has been found. If multiple codespace ranges have partial matches of the same length, the one having the shortest codes shall be chosen.
The length of the codes in the chosen codespace range determines the total number of bytes to consume from the string for the current mapping operation.
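One possible reading of rules a) and b) is sketched below. It assumes the caller has already established that no codespace range matches fully at `pos`, represents ranges as `(low, high)` tuples of `bytes`, and clamps the result at the end of the string; the function name and the clamping are assumptions, not stated in the text.

```python
def bytes_to_consume_invalid(data, pos, codespace):
    """Number of bytes to skip for an invalid code starting at pos."""
    best_match = -1     # length of the longest partial match so far
    best_code_len = 0   # code length of the range that produced it
    for lo, hi in codespace:
        n = len(lo)
        k = 0
        # Count leading bytes that fall inside this range's bounds.
        while k < n and pos + k < len(data) and lo[k] <= data[pos + k] <= hi[k]:
            k += 1
        # Prefer the longest partial match; on a tie, the shortest codes.
        if k > best_match or (k == best_match and n < best_code_len):
            best_match, best_code_len = k, n
    return min(best_code_len, len(data) - pos)
```

When no first byte matches, every range ties at a partial match of zero, so the tie-break selects the range with the shortest codes, which is rule a). With the Shift-JIS codespace from 9.7.5.4, the invalid sequence 0x81 0x20 consumes two bytes, while 0xFF consumes one.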