

                                          158
CHAPTER 3                                                                        Syntax



The string types described in Table 3.32 specify increasingly specific encoding
schemes, as shown in Figure 3.7.


    string type
        text string type
            PDFDocEncoded string type
            UTF-16BE encoded string with a leading byte order marker
        ASCII string type
        byte string type

                     FIGURE 3.7 Relationship between string types


Text String Type

The text string type is used for character strings that contain information
intended to be human-readable, such as text annotations, bookmark names,
article names, document information, and so forth. The term character strings is
used to describe such strings independent of the encoding with which they are
represented in a PDF document.

Note: This type is not a true type. Rather, it is a string type that represents data
encoded using specific conventions.

The text string type is used for character strings that are encoded in either
PDFDocEncoding or the UTF-16BE Unicode character encoding scheme.
PDFDocEncoding can encode all of the ISO Latin 1 character set and is documented
in Appendix D. UTF-16BE can encode all Unicode characters; PDFDocEncoding
cannot. UTF-16BE and Unicode character encoding are described in the Unicode
Standard by the Unicode Consortium (see the Bibliography).
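The choice between the two encodings can be illustrated with a minimal Python sketch. It relies on the byte order marker (bytes 254, 255) that the next paragraph describes as the signal for UTF-16BE. The function names encode_text_string and decode_text_string are hypothetical and not part of any PDF library. As a simplifying assumption, the PDFDocEncoding branch handles only printable ASCII bytes; a complete decoder would need the full table from Appendix D.

```python
import codecs

# Byte order marker U+FEFF in big-endian form: bytes 254, 255.
BOM = codecs.BOM_UTF16_BE  # b"\xfe\xff"


def encode_text_string(text: str) -> bytes:
    """Encode text as UTF-16BE, prefixed with the byte order marker."""
    return BOM + text.encode("utf-16-be")


def decode_text_string(raw: bytes) -> str:
    """Decode the bytes of a text string (illustrative sketch only)."""
    if raw.startswith(BOM):
        # Leading marker present: the remainder is UTF-16BE.
        return raw[len(BOM):].decode("utf-16-be")
    # No marker: the string is PDFDocEncoded.
    # Assumption: only printable ASCII (0x20-0x7E) is handled here; other
    # byte values require the PDFDocEncoding table from Appendix D.
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    raise NotImplementedError("needs the PDFDocEncoding table from Appendix D")
```

For example, decode_text_string(encode_text_string("Résumé")) returns the original text, and the first two bytes of the encoded form are 254 and 255.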

For text strings encoded in Unicode, the first two bytes must be 254 followed by
255 (hexadecimal FE FF). These two bytes represent the Unicode byte order marker,
U+FEFF, indicating
that the string is encoded in the UTF-16BE (big-endian) encoding scheme
specified in the Unicode standard. (This mechanism precludes beginning a string
