

                                        888
CHAPTER 10                                                    Document Interchange



Note: To support consumer applications in providing accessibility to users with
disabilities, Tagged PDF documents should use the natural language specification
(Lang), alternate description (Alt), replacement text (ActualText), and abbreviation
expansion text (E) facilities described in Section 10.8, “Accessibility Support.”

Incidental Artifacts

In addition to objects that are explicitly marked as artifacts and excluded from
the document’s logical structure, the running text of a page may contain other
elements and relationships that are not logically part of the document’s real
content but are merely incidental results of the process of laying out that content
into a document. These may include the following elements:

• Hyphenation. Among the artifacts introduced by text layout is the hyphen
  marking the incidental division of a word at the end of a line. In Tagged PDF,
  such an incidental word division must be represented by a soft hyphen
  character, which the Unicode mapping algorithm (see “Unicode Mapping in Tagged
  PDF” on page 892) translates to the Unicode value U+00AD. (This character is
  distinct from an ordinary hard hyphen, whose Unicode value is U+002D.) The
  producer of a Tagged PDF document must distinguish explicitly between soft
  and hard hyphens so that the consumer does not have to guess which type a
  given character represents.
  Note: In some languages, the situation is more complicated: there may be multiple
  hyphen characters, and hyphenation may change the spelling of words. See
  Example 10.24 on page 944.
• Text discontinuities. The running text of a page, as expressed in page content
  order (see “Page Content Order,” below), may contain places where the normal
  progression of text suffers a discontinuity. For example, the page may contain
  the beginnings of two separate articles (see Section 8.3.2, “Articles”), each of
  which is continued onto a later page of the document. The last words of the
  first article appearing on the page should not be run together with the first
  words of the second article. Consumer applications can recognize such
  discontinuities by examining the document’s logical structure.
• Hidden page elements. For a variety of reasons, elements of a document’s logical
  content may be invisible on the page: they may be clipped, their color may
  match the background, or they may be obscured by other, overlapping objects.
  Consumer applications must still be able to recognize and process such hidden
  elements. For example, formerly invisible elements may become visible when a
