

                                       892
CHAPTER 10                                                   Document Interchange



In addition, Tagged PDF documents must allow some characteristics of the
associated fonts to be deduced (see “Font Characteristics” on page 892). These
Unicode values and font characteristics can then be used for such operations as
cut-and-paste editing, searching, text-to-speech conversion, and exporting to
other applications or file formats.

Unicode Mapping in Tagged PDF

Tagged PDF requires that every character code in a document be mappable to a
corresponding Unicode value. Unicode defines scalar values for most of the
characters used in the world’s languages and writing systems, and it provides a
private use area for application-specific characters. Information about Unicode
can be found in the Unicode Standard, by the Unicode Consortium (see the
Bibliography).

The methods for mapping a character code to a Unicode value are described in
Section 5.9.1, “Mapping Character Codes to Unicode Values.” Tagged PDF
producers should ensure that the PDF file contains enough information to map
all character codes to Unicode by one of the methods described there.
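
As an illustration of the producer-side check described above, the following is a minimal sketch, not the algorithm of Section 5.9.1 itself. It assumes a simplified model in which a font offers two sources of mapping information: a ToUnicode table and an encoding that names each glyph. The names code_to_unicode, unmapped_codes, and GLYPH_NAME_TO_UNICODE are hypothetical, and the glyph-name table is a three-entry stand-in rather than a real glyph list.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical, deliberately tiny glyph-name table used only for this sketch.
GLYPH_NAME_TO_UNICODE: Dict[str, str] = {
    "A": "A",
    "hyphen": "-",
    "fi": "\ufb01",
}


def code_to_unicode(
    code: int,
    to_unicode: Dict[int, str],
    encoding: Dict[int, str],
) -> Optional[str]:
    """Return the Unicode text for a character code, or None if unmappable."""
    # Assumption: an explicit ToUnicode entry is consulted first.
    if code in to_unicode:
        return to_unicode[code]
    # Otherwise fall back to the glyph name supplied by the font's encoding.
    name = encoding.get(code)
    if name is not None:
        return GLYPH_NAME_TO_UNICODE.get(name)
    return None


def unmapped_codes(
    codes: Iterable[int],
    to_unicode: Dict[int, str],
    encoding: Dict[int, str],
) -> List[int]:
    """List the codes a producer would still need to supply a mapping for."""
    return [c for c in codes if code_to_unicode(c, to_unicode, encoding) is None]
```

A producer could run a check of this kind over every code used in a content stream and add ToUnicode entries for whatever unmapped_codes reports.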

An Alt, ActualText, or E entry specified in a structure element dictionary or a
marked-content property list (see Sections 10.8.2, “Alternate Descriptions,”
10.8.3, “Replacement Text,” and 10.8.4, “Expansion of Abbreviations and
Acronyms”) may affect the character stream that some Tagged PDF consumers
actually use. For example, some consumers may choose to use the Alt or
ActualText value and ignore all text and other content associated with the
structure element and its descendants.
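
The consumer behavior in this example can be sketched as follows. This is one possible policy under stated assumptions, not a required one: the StructElem class and consumer_text function are hypothetical, the E entry is left out for brevity, and preferring ActualText over Alt when both are present is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical, simplified stand-in for a structure element."""
    text: str = ""                      # text drawn directly by this element
    alt: Optional[str] = None           # Alt entry, if present
    actual_text: Optional[str] = None   # ActualText entry, if present
    kids: List["StructElem"] = field(default_factory=list)


def consumer_text(elem: StructElem) -> str:
    """Return the character stream this sketch's consumer would use."""
    # Assumption of this sketch: ActualText wins over Alt when both exist.
    if elem.actual_text is not None:
        return elem.actual_text
    if elem.alt is not None:
        return elem.alt
    # No override: use the element's own text, then recurse into descendants.
    return elem.text + "".join(consumer_text(kid) for kid in elem.kids)


figure = StructElem(alt="Sales chart", kids=[StructElem(text="Q1"), StructElem(text="Q2")])
paragraph = StructElem(kids=[StructElem(text="See "), figure])
stream = consumer_text(paragraph)  # the labels inside the figure are ignored
```

Because the figure carries an Alt value, the text of its descendants never reaches the resulting stream, which is the effect the paragraph above describes.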

Some uses of Tagged PDF require characters that may not be available in all
fonts, such as the soft hyphen (see “Incidental Artifacts” on page 888). Such
characters can be represented either by adding them to the font’s encoding or
CMap and using ToUnicode to map them to appropriate Unicode values, or by using
an ActualText entry in the associated structure element to provide substitute
characters.
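
The two representations just described can be compared in a small sketch. Everything here is illustrative: the code 0x80, the mapping tables, and the decode helper are assumptions made for the example, and the spans list is a simplified stand-in for content enclosed by structure elements. The point is only that both routes can yield the same Unicode stream, with U+00AD (soft hyphen) in place of the visible hyphen.

```python
from typing import Dict, List, Optional, Tuple

SOFT_HYPHEN = "\u00ad"


def decode(codes: bytes, to_unicode: Dict[int, str]) -> str:
    """Map each one-byte character code through a ToUnicode-style table."""
    return "".join(to_unicode[c] for c in codes)


# Route 1: a hypothetical extra code (0x80) is added to the font's encoding
# for the line-break hyphen, and ToUnicode maps it to U+00AD.
to_unicode_route1 = {0x68: "h", 0x79: "y", 0x80: SOFT_HYPHEN}
text_route1 = decode(b"hy\x80", to_unicode_route1)

# Route 2: the ordinary hyphen code is used, and an ActualText value on the
# enclosing element supplies the substitute character.
to_unicode_route2 = {0x68: "h", 0x79: "y", 0x2D: "-"}
spans: List[Tuple[bytes, Optional[str]]] = [
    (b"hy", None),          # plain text, no override
    (b"-", SOFT_HYPHEN),    # hyphen glyph with ActualText = U+00AD
]
text_route2 = "".join(
    actual if actual is not None else decode(codes, to_unicode_route2)
    for codes, actual in spans
)
```

Route 1 keeps the information in the font, so it applies wherever that code is shown; route 2 leaves the font untouched and attaches the substitution to one piece of content.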

Font Characteristics

In addition to a Unicode value, each character code in a content stream has an
associated set of font characteristics. These characteristics are useful when export-
