PDF Format Reference - Adobe Portable Document Format

SECTION 10.7

891

Tagged PDF

various technical and historical reasons, however, many such fonts follow the same conventions as those designed for Western writing systems, with glyph origins at the lower left and positive widths, as shown in Figure 5.4 on page 394. Consequently, showing text in such right-to-left writing systems requires either positioning each glyph individually (which is tedious and costly) or representing text with show strings (see “Organization and Use of Fonts” on page 388) whose character codes are given in reverse order. When the latter method is used, the character codes’ correct page content order is the reverse of their order within the show string.

The marked-content tag ReversedChars informs the Tagged PDF consumer application that show strings within a marked-content sequence contain characters in the reverse of page content order. If the sequence encompasses multiple show strings, only the individual characters within each string are reversed; the strings themselves are in natural reading order. For example, the sequence

/ReversedChars BMC
    ( olleH ) Tj
    -200 0 Td
    ( . dlrow ) Tj
EMC

represents the text Hello world.

The show strings may have a space character at the beginning or end to indicate a word break (see “Identifying Word Breaks” on page 894) but may not contain interior spaces. This limitation is not serious, since a space provides an opportunity to realign the typography without visible effect, and it serves the valuable purpose of limiting the scope of reversals for word-processing consumer applications.
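
The reversal rule can be illustrated with a minimal sketch of how a consumer might recover page content order from the show strings of one ReversedChars sequence. This is an illustration only, not a prescribed algorithm: the function name recover_text is hypothetical, the show strings are assumed to be already decoded so that each Python character stands for one character code, and the input used in the usage note is made up for the sketch rather than taken from a real content stream.

```python
def recover_text(show_strings):
    """Return the text of one ReversedChars sequence in page content order.

    Assumption: each element of show_strings is an already-decoded show
    string in which one Python character corresponds to one character code.
    """
    parts = []
    for s in show_strings:
        # A space is allowed only as the first or last character of a show
        # string; a space anywhere between them is an interior space.
        if " " in s[1:-1]:
            raise ValueError(f"interior space in show string: {s!r}")
        # Reverse the characters within this string only.
        parts.append(s[::-1])
    # The strings themselves are already in natural reading order,
    # so they are concatenated without reordering.
    return "".join(parts)
```

With the hypothetical input [" olleH", ".dlrow"], the leading space of the first show string becomes the word break after "Hello" once that string is reversed, and the call returns "Hello world.". Rejecting interior spaces mirrors the restriction stated above; a more tolerant consumer might choose to split on them instead.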

Extraction of Character Properties

It is a requirement of Tagged PDF that character codes can be unambiguously converted to Unicode values representing the information content of the text. There are several methods for doing this; a Tagged PDF document must conform to at least one of them (see “Unicode Mapping in Tagged PDF,” below).