CHAPTER 10
892
Document Interchange
In addition, Tagged PDF documents must allow some characteristics of the
associated fonts to be deduced (see “Font Characteristics” on page 892). These
Unicode values and font characteristics can then be used for such operations as
cut-and-paste editing, searching, text-to-speech conversion, and exporting to
other applications or file formats.
Unicode Mapping in Tagged PDF
Tagged PDF requires that every character code in a document can be mapped to
a corresponding Unicode value. Unicode defines scalar values for most of the
characters used in the world’s languages and writing systems, as well as providing
a private use area for application-specific characters. Information about Unicode
can be found in the Unicode Standard, by the Unicode Consortium (see the
Bibliography).
The methods for mapping a character code to a Unicode value are described in
Section 5.9.1, “Mapping Character Codes to Unicode Values.” Tagged PDF
producers should ensure that the PDF file contains enough information to map all
character codes to Unicode by one of the methods described there.
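
The requirement above can be illustrated with a small check that a producer might run before writing a file. This is a minimal sketch in Python, not part of the specification: the function name find_unmapped_codes, the list of used character codes, and the dictionary standing in for a font's code-to-Unicode mapping are all hypothetical, and the sketch assumes the mapping has already been derived by one of the methods of Section 5.9.1.

```python
def find_unmapped_codes(used_codes, to_unicode):
    """Return, in ascending order, the character codes with no Unicode mapping.

    used_codes: iterable of integer character codes shown with one font
                (hypothetical input, gathered elsewhere).
    to_unicode: dict from character code to a Unicode string (hypothetical
                stand-in for whatever mapping the font provides).
    A missing entry or an empty string both count as "unmapped".
    """
    return sorted(code for code in set(used_codes) if not to_unicode.get(code))


# Hypothetical font data: code 0x0C is a ligature mapped to two characters.
to_unicode = {0x41: "A", 0x42: "B", 0x0C: "fi"}
used_codes = [0x41, 0x42, 0x0C, 0x80]

missing = find_unmapped_codes(used_codes, to_unicode)
print(missing)  # [128]
```

A nonempty result would indicate that the file does not yet carry enough information for every character code to be mapped to Unicode.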
An Alt, ActualText, or E entry specified in a structure element dictionary or a
marked-content property list (see Sections 10.8.2, “Alternate Descriptions,”
10.8.3, “Replacement Text,” and 10.8.4, “Expansion of Abbreviations and
Acronyms”) may affect the character stream that some Tagged PDF consumers
actually use. For example, some consumers may choose to use the Alt or ActualText
value and ignore all text and other content associated with the structure element
and its descendants.
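
The consumer behavior described above can be sketched as a recursive walk over a structure tree. This is only an illustration under stated assumptions: the nested dictionaries, the "kids" key, and the function extract_text are hypothetical stand-ins for a real structure tree, and preferring ActualText over Alt is one possible choice, not a rule stated here. The E entry is not handled.

```python
def extract_text(element):
    """Return the text a consumer might use for one structure element.

    element: dict with an optional "ActualText" or "Alt" string and an
             optional "kids" list holding strings (text content) or
             nested element dicts. This layout is hypothetical.
    If replacement or alternate text is present, it is returned and the
    element's own content and descendants are ignored.
    """
    for key in ("ActualText", "Alt"):  # assumed preference order
        if key in element:
            return element[key]
    parts = []
    for kid in element.get("kids", []):
        if isinstance(kid, str):
            parts.append(kid)
        else:
            parts.append(extract_text(kid))
    return "".join(parts)


# A paragraph whose middle span was hyphenated across a line break.
paragraph = {
    "S": "P",
    "kids": [
        "A ",
        {"S": "Span", "ActualText": "hyphenated", "kids": ["hyphen-", "ated"]},
        " word",
    ],
}

print(extract_text(paragraph))  # A hyphenated word
```

Here the span's ActualText replaces both text fragments beneath it, so the hyphen introduced by the line break does not reach the extracted character stream.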
Some uses of Tagged PDF require characters that may not be available in all fonts,
such as the soft hyphen (see “Incidental Artifacts” on page 888). Such characters
can be represented either by adding them to the font’s encoding or CMap and
using ToUnicode to map them to appropriate Unicode values, or by using an
ActualText entry in the associated structure element to provide substitute
characters.
Font Characteristics
In addition to a Unicode value, each character code in a content stream has an
associated set of font characteristics. These characteristics are useful when export-