

      SECTION 3.1                                                         Lexical Conventions



      encoding a specific set of 128 characters as binary numbers. However, a PDF file
      is not restricted to the ASCII character set; it can contain arbitrary 8-bit bytes,
      subject to the following considerations:

      • The tokens that delimit objects and that describe the structure of a PDF file are
        all written in the ASCII character set, as are all the reserved words and the
        names used as keys in standard dictionaries.
      • The data values of certain types of objects—strings and streams—can be but
        need not be written entirely in ASCII. For the purpose of exposition (as in this
        book), ASCII representation is preferred. However, in actual practice, data that
        is naturally binary, such as sampled images, is represented directly in binary for
        compactness and efficiency.
      • A PDF file containing binary data must be transported and stored by means
        that preserve all bytes of the file faithfully; that is, as a binary file rather than a
        text file. Such a file is not portable to environments that impose reserved
        character codes, maximum line lengths, end-of-line conventions, or other
        restrictions.
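
      The last consideration can be illustrated with a minimal Python sketch. The
      helper names read_pdf_bytes and has_non_ascii are hypothetical and chosen only
      for this illustration. The file is opened in binary mode so that no newline
      translation or text decoding alters its bytes.

```python
def read_pdf_bytes(path):
    """Read a file in binary mode so every byte is returned unchanged."""
    # "rb" disables newline translation and text decoding.
    with open(path, "rb") as f:
        return f.read()


def has_non_ascii(data):
    """Return True if any byte lies outside the 7-bit ASCII range 0-127."""
    # Iterating over a bytes object yields integers in the range 0-255.
    return any(b > 127 for b in data)
```

      A file for which has_non_ascii returns True contains binary data and would
      need a transport that preserves all 8 bits of every byte.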

      Note: In this chapter, the term character is synonymous with byte and merely refers
      to a particular 8-bit value. This usage is entirely independent of any logical meaning
      that the value may have when it is treated as data in specific contexts, such as
      representing human-readable text or selecting a glyph from a font.


3.1.1 Character Set

      The PDF character set is divided into three classes, called regular, delimiter, and
      white-space characters. This classification determines the grouping of characters
      into tokens, except within strings, streams, and comments; different rules apply
      in those contexts.
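
      As a rough sketch, the three-way classification could be expressed in Python as
      follows. The byte values are an assumption taken from the PDF specification's
      character tables, which are not reproduced in this excerpt, and the function
      name char_class is hypothetical. The sketch ignores the special contexts
      (strings, streams, and comments) where different rules apply.

```python
# Assumed white-space bytes: NUL, tab, line feed, form feed, carriage return, space.
WHITE_SPACE = frozenset(b"\x00\t\n\x0c\r ")

# Assumed delimiter bytes: ( ) < > [ ] { } / %
DELIMITERS = frozenset(b"()<>[]{}/%")


def char_class(byte):
    """Classify one byte value (0-255) as white-space, delimiter, or regular."""
    if byte in WHITE_SPACE:
        return "white-space"
    if byte in DELIMITERS:
        return "delimiter"
    # Every byte that is neither white-space nor a delimiter is regular.
    return "regular"
```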

      White-space characters (see Table 3.1) separate syntactic constructs such as names
      and numbers from each other. All white-space characters are equivalent, except
      in comments, strings, and streams. In all other contexts, PDF treats any sequence
      of consecutive white-space characters as one character.
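
      The equivalence of white-space runs can be sketched in Python as below. This is
      an illustration only, not a PDF tokenizer: it ignores delimiters and the special
      handling of comments, strings, and streams. The white-space byte values are
      assumed from the specification's Table 3.1, and the name split_on_white_space
      is hypothetical.

```python
import re

# One or more consecutive white-space bytes (assumed set: NUL, HT, LF, FF, CR, SP).
WHITE_SPACE_RUN = re.compile(rb"[\x00\t\n\x0c\r ]+")


def split_on_white_space(data):
    """Split bytes on runs of white-space, treating each run as one separator."""
    # Leading or trailing white-space yields empty pieces, which are dropped.
    return [piece for piece in WHITE_SPACE_RUN.split(data) if piece]
```

      Under these assumptions, b"1 0 obj" and b"1\r\n\t0   obj" split into the same
      three pieces.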
