PDF Reference, version 1.7


CHAPTER 3 Syntax

The string types described in Table 3.32 specify increasingly specific encoding schemes, as shown in Figure 3.7.

[FIGURE 3.7 Relationship between string types. Labels in the figure: string type; text string type; ASCII string type; byte string type; PDFDocEncoded string type; UTF-16BE encoded string with a leading byte order marker]

Text String Type

The text string type is used for character strings that contain information intended to be human-readable, such as text annotations, bookmark names, article names, document information, and so forth. The term character strings is used to describe such strings independent of the encoding with which they are represented in a PDF document.

Note: This type is not a true type. Rather, it is a string type that represents data encoded using specific conventions.

The text string type is used for character strings that are encoded in either PDFDocEncoding or the UTF-16BE Unicode character encoding scheme. PDFDocEncoding can encode all of the ISO Latin 1 character set and is documented in Appendix D. UTF-16BE can encode all Unicode characters. UTF-16BE and Unicode character encoding are described in the Unicode Standard by the Unicode Consortium (see the Bibliography). Note that PDFDocEncoding does not support all Unicode characters whereas UTF-16BE does.

For text strings encoded in Unicode, the first two bytes must be 254 followed by 255. These two bytes represent the Unicode byte order marker, U+FEFF, indicating that the string is encoded in the UTF-16BE (big-endian) encoding scheme specified in the Unicode standard. (This mechanism precludes beginning a string

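The byte order marker rule for text strings can be illustrated with a minimal Python sketch. It is not taken from the PDF Reference: the function names are hypothetical, and because Python ships no PDFDocEncoding codec, the fallback branch uses Latin-1 as a labeled stand-in. That stand-in agrees with PDFDocEncoding for printable ASCII, but a faithful decoder would need the full table from Appendix D.

```python
# Bytes 254, 255 (0xFE 0xFF): the UTF-16BE form of the byte order marker U+FEFF.
UTF16BE_BOM = bytes([254, 255])


def classify_text_string(raw: bytes) -> str:
    """Name the encoding of a text string from its first two bytes."""
    if raw.startswith(UTF16BE_BOM):
        return "UTF-16BE"
    return "PDFDocEncoding"


def decode_text_string(raw: bytes) -> str:
    """Decode the raw bytes of a text string into a Python str."""
    if classify_text_string(raw) == "UTF-16BE":
        # Drop the two marker bytes, then decode the rest as big-endian UTF-16.
        return raw[len(UTF16BE_BOM):].decode("utf-16-be")
    # ASSUMPTION: Latin-1 stands in for PDFDocEncoding here. The real
    # mapping is defined in Appendix D and differs for some byte values.
    return raw.decode("latin-1")


def encode_text_string(text: str) -> bytes:
    """Encode text as a Unicode text string: marker bytes, then UTF-16BE."""
    return UTF16BE_BOM + text.encode("utf-16-be")
```

For example, `decode_text_string(encode_text_string("Hi"))` round-trips through the bytes 254, 255, 0, 72, 0, 105, while a string such as `b"Hello"` has no marker and is classified as PDFDocEncoding.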