PDF Reference, version 1.7


SECTION 5.9 Extraction of Text Content (page 471)

…the Adobe standard Latin character set and the set of named characters in the Symbol font (see Appendix D):

   1. Map the character code to a character name according to Table D.1 on page 996 and the font’s Differences array.
   2. Look up the character name in the Adobe Glyph List (see the Bibliography) to obtain the corresponding Unicode value.

• If the font is a composite font that uses one of the predefined CMaps listed in Table 5.15 on page 442 (except Identity-H and Identity-V) or whose descendant CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or Adobe-Korea1 character collection:

   1. Map the character code to a character identifier (CID) according to the font’s CMap.
   2. Obtain the registry and ordering of the character collection used by the font’s CMap (for example, Adobe and Japan1) from its CIDSystemInfo dictionary.
   3. Construct a second CMap name by concatenating the registry and ordering obtained in step 2 in the format registry-ordering-UCS2 (for example, Adobe-Japan1-UCS2).
   4. Obtain the CMap with the name constructed in step 3 (available from the ASN Web site; see the Bibliography).
   5. Map the CID obtained in step 1 according to the CMap obtained in step 4, producing a Unicode value.

Note: Type 0 fonts whose descendant CIDFonts use the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or Adobe-Korea1 character collection (as specified in the CIDSystemInfo dictionary) must have a supplement number corresponding to the version of PDF supported by the application. See Table 5.16 on page 446 for a list of the character collections corresponding to a given PDF version. (Other supplements of these character collections can be used, but if the supplement is higher-numbered than the one corresponding to the supported PDF version, only the CIDs in the latter supplement are considered to be standard CIDs.)

If these methods fail to produce a Unicode value, there is no way to determine what the character code represents.
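
The two-step procedure for simple fonts (character code to character name, then character name to Unicode) can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: WIN_ANSI_EXCERPT and AGL_EXCERPT are hypothetical, heavily truncated excerpts standing in for the full encoding column of Table D.1 and the full Adobe Glyph List, and the Differences array is modeled as a Python list in which an integer sets the next code and each following name is assigned to consecutive codes.

```python
from typing import Dict, List, Optional, Union

# Hypothetical excerpt of the WinAnsiEncoding column of Table D.1
# (character code -> character name). A real reader needs the full table.
WIN_ANSI_EXCERPT: Dict[int, str] = {
    32: "space",
    65: "A",
    97: "a",
    128: "Euro",
    147: "quotedblleft",
}

# Hypothetical excerpt of the Adobe Glyph List (character name -> Unicode).
AGL_EXCERPT: Dict[str, str] = {
    "space": "\u0020",
    "A": "\u0041",
    "a": "\u0061",
    "Euro": "\u20ac",
    "quotedblleft": "\u201c",
    "fi": "\ufb01",
    "bullet": "\u2022",
}


def apply_differences(base: Dict[int, str],
                      differences: List[Union[int, str]]) -> Dict[int, str]:
    """Overlay a Differences array on a base encoding without mutating it.

    An integer sets the current code; each name that follows is assigned
    to the current code, which is then incremented.
    """
    encoding = dict(base)
    code = 0
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            encoding[code] = item
            code += 1
    return encoding


def simple_font_to_unicode(code: int,
                           encoding: Dict[int, str]) -> Optional[str]:
    # Step 1: map the character code to a character name.
    name = encoding.get(code)
    if name is None:
        return None  # no name: the code cannot be interpreted this way
    # Step 2: look up the character name in the (excerpted) glyph list.
    return AGL_EXCERPT.get(name)


enc = apply_differences(WIN_ANSI_EXCERPT, [146, "fi", "bullet"])
print(simple_font_to_unicode(65, enc))   # A
print(simple_font_to_unicode(147, enc))  # the Differences entry "bullet" wins
```

Returning None rather than raising mirrors the closing rule of the section: when neither lookup succeeds, the code's meaning is simply unknown.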

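The five-step procedure for composite fonts can likewise be sketched in Python. The sketch is hedged in several ways. FONT_CMAP and UCS2_CMAPS are toy stand-ins with made-up values, not the contents of any real CMap. load_cmap is a hypothetical callable that replaces "obtain the CMap" (step 4). The cmap_is_predefined flag assumes the caller has already checked the CMap name against Table 5.15. Registry and Ordering are the CIDSystemInfo keys the text refers to.

```python
from typing import Callable, Dict, Optional

# Toy data for illustration only: NOT taken from any real predefined CMap.
FONT_CMAP: Dict[int, int] = {0x01: 10, 0x02: 11}      # character code -> CID
UCS2_CMAPS: Dict[str, Dict[int, str]] = {
    "Adobe-Japan1-UCS2": {10: "A", 11: "B"},           # CID -> Unicode
}

# Orderings of the Adobe character collections named in the text.
STANDARD_ORDERINGS = {"GB1", "CNS1", "Japan1", "Korea1"}


def composite_path_applies(cmap_name: str, cmap_is_predefined: bool,
                           registry: str, ordering: str) -> bool:
    """Return True if the composite-font procedure applies.

    cmap_is_predefined is assumed to come from a lookup against Table 5.15.
    """
    if cmap_is_predefined and cmap_name not in ("Identity-H", "Identity-V"):
        return True
    return registry == "Adobe" and ordering in STANDARD_ORDERINGS


def ucs2_cmap_name(registry: str, ordering: str) -> str:
    # Step 3: build "registry-ordering-UCS2", e.g. "Adobe-Japan1-UCS2".
    return f"{registry}-{ordering}-UCS2"


def composite_font_to_unicode(
    code: int,
    font_cmap: Dict[int, int],
    cid_system_info: Dict[str, str],
    load_cmap: Callable[[str], Optional[Dict[int, str]]],
) -> Optional[str]:
    # Step 1: map the character code to a CID through the font's CMap.
    cid = font_cmap.get(code)
    if cid is None:
        return None
    # Step 2: read registry and ordering from the CIDSystemInfo dictionary.
    registry = cid_system_info["Registry"]
    ordering = cid_system_info["Ordering"]
    # Steps 3-4: construct the UCS2 CMap name and obtain that CMap.
    ucs2_cmap = load_cmap(ucs2_cmap_name(registry, ordering))
    if ucs2_cmap is None:
        return None  # CMap unavailable: the code cannot be interpreted
    # Step 5: map the CID to a Unicode value.
    return ucs2_cmap.get(cid)


info = {"Registry": "Adobe", "Ordering": "Japan1"}
print(composite_font_to_unicode(0x01, FONT_CMAP, info, UCS2_CMAPS.get))  # A
```

Passing the CMap loader in as a parameter keeps the sketch independent of where the registry-ordering-UCS2 CMaps are actually stored or fetched from.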