13.3 声音¶

13.3 Sounds

声音对象（PDF 1.2） 应为一个流，其中包含用于计算机扬声器播放的采样值。声音注释或声音操作字典中的 Sound（声音）条目（参见表 185 和表 208）应标识一个声音对象，该对象表示在激活注释时要播放的声音。

由于声音对象是一个流，因此它可能包含所有流通用的标准条目，如表 5 所述。特别是，如果它包含 F（文件规范）条目，则声音应在外部文件中定义。此声音文件应是自描述的，包含渲染声音所需的所有信息；PDF 文件中无需提供额外信息。

注意

AIFF、AIFF-C（Mac OS）、RIFF（.wav）和 snd（.au）文件格式均为自描述格式。

如果 F 条目不存在，则声音对象本身应包含采样数据以及定义声音所需的所有其他信息。表 294 显示了声音对象特有的附加字典条目。

**表 294 – 声音对象特有的附加条目**
键（Key）	类型（Type）	值（Value）
Type	name	（可选）此字典描述的 PDF 对象类型；如果存在，则对于声音对象，应为 Sound。
R	number	（必需）采样率，以每秒采样次数表示。
C	integer	（可选）声音通道数。默认值：1。
B	integer	（可选）每通道每个采样值的比特数。默认值：8。
E	name	（可选）采样数据的编码格式： Raw 未指定或无符号值，范围为 0 至 2B − 1 Signed 二补数表示的值 muLaw m-law 编码的采样值 ALaw A-law 编码的采样值默认值：Raw。
CO	name	（可选）对采样数据使用的声音压缩格式。（这与声音对象的 Filter 条目指定的任何流压缩格式是分开的；参见表 5 和 7.4，“过滤器。”）如果此条目不存在，则不应使用声音压缩；数据包含应以 R 采样率、每通道每秒播放的采样波形。
CP	various	（可选）与所使用的声音压缩格式相关的可选参数。 CO 和 CP 条目尚未定义标准值。

采样值应按最高有效位优先的方式存储在流中（对于大于 8 位的采样，采用大端字节序）。非 8 位倍数的采样值应打包到连续字节中，从最高有效位开始存储。如果一个采样值跨越字节边界，则最高有效位应存放在第一个字节中，随后是次高位，依此类推。对于双通道立体声，采样值应采用交错格式存储，每个左声道（通道 1）的采样值应在对应的右声道（通道 2）采样值之前。

为了最大程度提高包含嵌入声音的 PDF 文档的可移植性，符合规范的阅读器应至少支持以下格式（假设平台具备足够的硬件和操作系统支持以播放声音）：

R 8000、11,025 或 22,050 采样率（每秒采样次数）
C 1 或 2 通道
B 8 或 16 位每通道
E Raw、Signed 或 muLaw 编码格式

如果编码格式（E）为 Raw 或 Signed，则 R 应为 11,025 或 22,050 采样率每通道。
如果编码格式为 muLaw，则 R 应为 8000 采样率每通道，C 应为 1 通道，B 应为 8 位每通道。
声音播放器应能够在必要时转换格式、降低采样率以及合并通道，以便在目标平台上渲染声音。

A sound object (PDF 1.2) shall be a stream containing sample values that define a sound to be played through the computer’s speakers. The Sound entry in a sound annotation or sound action dictionary (see Table 185 and Table 208) shall identify a sound object representing the sound to be played when the annotation is activated.

Since a sound object is a stream, it may contain any of the standard entries common to all streams, as described in Table 5. In particular, if it contains an F (file specification) entry, the sound shall be defined in an external file. This sound file shall be self-describing, containing all information needed to render the sound; no additional information need be present in the PDF file.

NOTE

The AIFF, AIFF-C (Mac OS), RIFF (. wav), and snd (. au) file formats are all self-describing.

If no F entry is present, the sound object itself shall contain the sample data and all other information needed to define the sound. Table 294 shows the additional dictionary entries specific to a sound object.

**Table 294 – Additional entries specific to a sound object**
Key	Type	Value
Type	name	(Optional) The type of PDF object that this dictionary describes; if present, shall be Sound for a sound object.
R	number	(Required) The sampling rate, in samples per second.
C	integer	(Optional) The number of sound channels. Default value: 1.
B	integer	(Optional) The number of bits per sample value per channel. Default value: 8.
E	name	(Optional) The encoding format for the sample data: Raw Unspecified or unsigned values in the range 0 to 2B − 1 Signed Twos-complement values muLaw m-law–encoded samples ALaw A-law–encoded samples Default value: Raw.
CO	name	(Optional) The sound compression format used on the sample data. (This is separate from any stream compression specified by the sound object’s Filter entry; see Table 5 and 7.4, “Filters.”) If this entry is absent, sound compression shall not be used; the data contains sampled waveforms that shall be played at R samples per second per channel.
CP	(various)	(Optional) Optional parameters specific to the sound compression format used. No standard values have been defined for the CO and CP entries.

Sample values shall be stored in the stream with the most significant bits first (big-endian order for samples larger than 8 bits). Samples that are not a multiple of 8 bits shall be packed into consecutive bytes, starting at the most significant end. If a sample extends across a byte boundary, the most significant bits shall be placed in the first byte, followed by less significant bits in subsequent bytes. For dual-channel stereophonic sounds, the samples shall be stored in an interleaved format, with each sample value for the left channel (channel 1) preceding the corresponding sample for the right (channel 2).

To maximize the portability of PDF documents containing embedded sounds, conforming readers should support at least the following formats (assuming the platform has sufficient hardware and OS support to play sounds at all):

R 8000, 11,025, or 22,050 samples per second C 1 or 2 channels B 8 or 16 bits per channel E Raw, Signed, or muLaw encoding

If the encoding (E) is Raw or Signed, R shall be 11,025 or 22,050 samples per channel. If the encoding is muLaw, R shall be 8000 samples per channel, C shall be 1 channel, and B shall be 8 bits per channel. Sound players shall convert between formats, downsample rates, and combine channels as necessary to render sound on the target platform.