




      404 0 obj                                         % ID tree leaf node
         << /Limits [ (Chap1) (Sec1.3) ]                % Least and greatest keys in tree
            /Names [ (Chap1)  301 0 R                   % Mapping from element identifiers
                     (Sec1.1) 302 0 R                   % to structure elements
                     (Sec1.2) 303 0 R
                     (Sec1.3) 304 0 R
                   ]
         >>
      endobj
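
    As an illustration of how a consumer might use such a leaf node, the following
    Python sketch resolves an element identifier to its structure element reference.
    It is not a PDF parser: the dictionary LEAF, the function lookup_id, and the
    modeling of an indirect reference such as 301 0 R as an (object number,
    generation) tuple are all assumptions made for this example. It also assumes
    that the keys are plain ASCII strings, so that Python string comparison agrees
    with the sorted order of the keys in the /Names array.

```python
from bisect import bisect_left

# Hypothetical in-memory model of the leaf node shown above.
# "Limits" holds the least and greatest keys; "Names" is a flat, sorted
# array of alternating keys and values, as in the /Names entry.
LEAF = {
    "Limits": ["Chap1", "Sec1.3"],
    "Names": [
        "Chap1", (301, 0),
        "Sec1.1", (302, 0),
        "Sec1.2", (303, 0),
        "Sec1.3", (304, 0),
    ],
}


def lookup_id(leaf, element_id):
    """Return the reference mapped to element_id, or None if it is absent."""
    least, greatest = leaf["Limits"]
    # The limits allow a quick rejection without scanning the names.
    if not (least <= element_id <= greatest):
        return None
    keys = leaf["Names"][0::2]    # every even slot is a key
    values = leaf["Names"][1::2]  # every odd slot is the matching value
    i = bisect_left(keys, element_id)  # binary search over the sorted keys
    if i < len(keys) and keys[i] == element_id:
        return values[i]
    return None


print(lookup_id(LEAF, "Sec1.2"))  # (303, 0)
print(lookup_id(LEAF, "Sec2.1"))  # None
```

    The check against the limits comes first because, in a tree with several
    nodes, it is what would let a reader skip a node whose key range cannot
    contain the identifier being sought.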


10.7 Tagged PDF

    Tagged PDF (PDF 1.4) is a stylized use of PDF that builds on the logical structure
    framework described in Section 10.6, “Logical Structure.” It defines a set of
    standard structure types and attributes that allow page content (text, graphics, and
    images) to be extracted and reused for other purposes. It is intended for use by
    tools that perform the following types of operations:

    • Simple extraction of text and graphics for pasting into other applications
    • Automatic reflow of text and associated graphics to fit a page of a different size
      than was assumed for the original layout
    • Processing text for such purposes as searching, indexing, and spell-checking
    • Conversion to other common file formats (such as HTML, XML, and RTF)
      with document structure and basic styling information preserved
    • Making content accessible to users with visual impairments (see Section 10.8,
      “Accessibility Support”)

    A tagged PDF document conforms to the following conventions:

    • Page content (Section 10.7.1, “Tagged PDF and Page Content”). Tagged PDF
      defines a set of rules for representing text in the page content so that characters,
      words, and text order can be determined reliably. All text is represented in a
      form that can be converted to Unicode. Word breaks are represented explicitly.
      Actual content is distinguished from artifacts of layout and pagination. Content
      is given in an order related to its appearance on the page, as determined by the
      authoring application.
    • A basic layout model (Section 10.7.2, “Basic Layout Model”). A set of rules for
      describing the arrangement of structure elements on the page.
