PDF Reference, version 1.7


159 SECTION 3.8 Common Data Structures

using PDFDocEncoding with the two characters thorn ydieresis, which is unlikely to be a meaningful beginning of a word or phrase).

Note: Applications that process PDF files containing Unicode text strings should be prepared to handle supplementary characters; that is, characters requiring more than two bytes to represent.

An escape sequence may appear anywhere in a Unicode text string to indicate the language in which subsequent text is written, which is useful when the language cannot be determined from the character codes used in the text. The escape sequence consists of the following elements, in order:

1. The Unicode value U+001B (that is, the byte sequence 0 followed by 27).
2. A 2-character ISO 639 language code (for example, en for English or ja for Japanese). Character in this context means byte (as in ASCII character), not Unicode character.
3. (Optional) A 2-character ISO 3166 country code (for example, US for the United States or JP for Japan).
4. The Unicode value U+001B.

The complete list of codes defined by ISO 639 and ISO 3166 can be obtained from the International Organization for Standardization (see the Bibliography).

PDFDocEncoded String Type

A PDFDocEncoded string is similar to a string object, but it is a character string where characters are represented in a single byte using PDFDocEncoding. Note that PDFDocEncoding does not support all Unicode characters whereas UTF-16BE does.

Note: This type is not a true type. Rather, it is a string type that represents data encoded using a specific convention.

Byte String Type

The byte string type is used for binary data represented as a series of 8-bit bytes, where each byte can be any value representable in 8 bits. The string may
