




    the glyph descriptions. These tables, as well as the “cmap” table, are required to be
    present when embedding fonts. In addition, for OpenType fonts based on TrueType,
    the “head,” “hhea,” “loca,” “maxp,” “cvt ,” “prep,” “hmtx,” and “fpgm” tables are
    required.

    Note: Other tables, such as those used for advanced line layout, need not be present;
    however, their absence may prevent editing of text containing the font.
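
    As an illustration of the table requirement above, the following minimal sketch
    reads the table directory at the start of an OpenType (sfnt) font program and
    reports which of the tables named in this passage are absent. The function names
    are hypothetical, and the sketch assumes a single-font file, not a font
    collection. It checks only the tables listed here for TrueType-based fonts; the
    glyph description tables themselves are not checked.

```python
import struct

# Tables this passage lists as required for TrueType-based OpenType fonts,
# together with "cmap". Tags are always four bytes, so "cvt " keeps its space.
REQUIRED_TRUETYPE_TABLES = [
    "cmap", "head", "hhea", "loca", "maxp", "cvt ", "prep", "hmtx", "fpgm",
]


def list_table_tags(font_bytes):
    """Return the table tags found in an sfnt table directory.

    Layout assumed: a 12-byte header (4-byte version, 2-byte table count,
    6 further bytes), followed by one 16-byte record per table whose first
    4 bytes are the tag.
    """
    if len(font_bytes) < 12:
        raise ValueError("font program is too short to hold a table directory")
    _version, num_tables = struct.unpack(">4sH", font_bytes[:6])
    if len(font_bytes) < 12 + 16 * num_tables:
        raise ValueError("table directory is truncated")
    tags = []
    for i in range(num_tables):
        start = 12 + 16 * i
        tags.append(font_bytes[start:start + 4].decode("latin-1"))
    return tags


def missing_required_tables(font_bytes):
    """Return the required tags (in the order listed above) that are absent."""
    present = set(list_table_tags(font_bytes))
    return [tag for tag in REQUIRED_TRUETYPE_TABLES if tag not in present]
```

    An empty result from missing_required_tables means only that the directory lists
    every table named above; it says nothing about whether the tables are valid.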

    The process of finding glyph descriptions in OpenType fonts is the following:

    • For Type 1 fonts using “CFF” tables, the process is as described in “Encodings
      for Type 1 Fonts” on page 428.
    • For TrueType fonts using “glyf” tables, the process is as described in
      “Encodings for TrueType Fonts” on page 429. Since this process sometimes
      produces ambiguous results, it is strongly recommended that PDF creators,
      instead of using a simple font, use a Type 0 font with an Identity-H encoding
      and use the glyph indices as character codes, as described following
      Table 5.15 on page 442.
    • For CIDFontType0 fonts using “CFF” tables, the process is as described in the
      discussion of embedded Type 0 CIDFonts in “Glyph Selection in CIDFonts” on
      page 437.
    • For CIDFontType2 fonts using “glyf” tables, the process is as described in the
      discussion of embedded Type 2 CIDFonts in “Glyph Selection in CIDFonts” on
      page 437.
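
    The recommendation in the second bullet, using glyph indices as character codes
    under an Identity-H encoding, can be sketched as follows. The sketch assumes that
    each character code is two bytes, high-order byte first, and that the CIDFont maps
    CIDs to glyph indices identically, so that the code written into the string is the
    glyph index itself. The function name is hypothetical, and the output is only the
    hexadecimal string operand, not a complete content stream.

```python
def glyph_indices_to_hex_string(glyph_indices):
    """Encode glyph indices as 2-byte big-endian codes in a PDF hex string."""
    parts = []
    for gid in glyph_indices:
        # A 2-byte code can address glyph indices 0 through 65535 only.
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"glyph index {gid} does not fit in two bytes")
        parts.append(f"{gid:04X}")
    return "<" + "".join(parts) + ">"


# Glyph indices 36 and 43 become the codes 0024 and 002B.
print(glyph_indices_to_hex_string([36, 43]))  # prints <0024002B>
```

    Because the codes are glyph indices and not character identities, a string
    written this way carries no information about the text's meaning on its own.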

    As discussed in Section 5.5.3, “Font Subsets,” an embedded font program may
    contain only the subset of glyphs that are used in the PDF document. This may be
    indicated by the presence of a CharSet or CIDSet entry in the font descriptor that
    refers to the font file, although subset fonts are not always so identified.
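
    A consumer might collect the available evidence of subsetting along the lines of
    the sketch below. It assumes the font descriptor has already been parsed into a
    Python dictionary keyed by entry name without the leading slash, which is a
    hypothetical representation. The font name check relies on the subset tag
    convention from Section 5.5.3 (six uppercase letters followed by a plus sign),
    which is not restated in this passage.

```python
import re

# Subset tag convention assumed from Section 5.5.3: "ABCDEF+FontName".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def subset_evidence(font_descriptor):
    """Return the indications that the embedded font program is a subset.

    font_descriptor is a hypothetical dict form of a font descriptor.
    """
    evidence = []
    for key in ("CharSet", "CIDSet"):
        if key in font_descriptor:
            evidence.append(key)
    if SUBSET_TAG.match(font_descriptor.get("FontName", "")):
        evidence.append("FontName tag")
    return evidence
```

    An empty list is not proof that the font program is complete, since subset fonts
    are not always identified as such.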


5.9 Extraction of Text Content

    The preceding sections describe all the facilities for showing text and causing
    glyphs to be painted on the page. In addition to displaying text, consumer
    applications sometimes need to determine the information content of text; that is,
    its meaning according to some standard character identification as opposed to its
    rendered appearance. This need arises during operations such as searching,
    indexing, and exporting of text to other applications.
