

                                                  953
        SECTION 10.9                                                                Web Capture



        If the name is used for an interactive form field, there is an additional encoding to
        ensure uniqueness and compatibility with interactive forms. Each byte in the
        source string, encoded as described above, is replaced by two bytes in the
        destination string. The first byte in each pair is 65 (corresponding to the ASCII
        character A) plus the high-order 4 bits of the source byte; the second byte is 65
        plus the low-order 4 bits of the source byte.
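
        The byte-pair encoding described above can be sketched in Python as follows.
        This is a minimal illustration, not a normative implementation: the function
        names are hypothetical, the input is assumed to be the source string already
        encoded as described earlier, and the decoder is an assumed inverse that the
        text itself does not describe.

```python
def encode_form_field_name(source: bytes) -> bytes:
    """Replace each source byte with two bytes in the range A..P."""
    encoded = bytearray()
    for byte in source:
        encoded.append(65 + (byte >> 4))    # 65 is ASCII 'A'; add the high-order 4 bits
        encoded.append(65 + (byte & 0x0F))  # 65 plus the low-order 4 bits
    return bytes(encoded)


def decode_form_field_name(encoded: bytes) -> bytes:
    """Hypothetical inverse (not described in the text): recombine each pair."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded name must contain an even number of bytes")
    return bytes(
        ((encoded[i] - 65) << 4) | (encoded[i + 1] - 65)
        for i in range(0, len(encoded), 2)
    )
```

        For example, the source byte 0x61 has high-order bits 6 and low-order bits 1,
        so it becomes the pair 71, 66 (the ASCII characters G and B). Every output
        byte falls between A and P, and the destination string is exactly twice the
        length of the source string.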


10.9.3 Content Sets

        A Web Capture content set is a dictionary describing a set of PDF objects
        generated from the same source data. It may include information common to all
        the objects in the set as well as information about the set itself. Table 10.38
        shows the contents of this type of dictionary.


        Page Sets

        A page set is a content set containing a group of PDF page objects generated from
        a common source, such as an HTML file. The pages are listed in the O array (see
        Table 10.38) in the same order in which they were initially added to the file. A
        single page object may not belong to more than one page set. Table 10.39 shows
        the content set dictionary entries specific to this type of content set.
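
        The rule that a page object may not belong to more than one page set can be
        checked with a short sketch like the one below. It rests on assumptions that
        are not part of the text: each page set is modeled as a Python dict with an
        "O" list, each page object is identified by an (object number, generation
        number) tuple, and the function name is hypothetical.

```python
def find_shared_pages(page_sets):
    """Map each page reference listed by more than one page set to the
    indices of the page sets that list it."""
    owners = {}
    for index, page_set in enumerate(page_sets):
        # dict.fromkeys drops repeats within one set while keeping the O order
        for page_ref in dict.fromkeys(page_set["O"]):
            owners.setdefault(page_ref, []).append(index)
    return {ref: sets for ref, sets in owners.items() if len(sets) > 1}
```

        An empty result means that no page is shared between page sets. A non-empty
        result names each offending page together with the page sets that claim it.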

        The optional TID (text identifier) entry may be used to store an identifier
        generated from the text of the pages belonging to the page set (see “Digital
        Identifiers” on page 950). This identifier may be used, for example, to
        determine whether the text of a document has changed. A text identifier may
        not be appropriate for some page sets (such as those with no text) and should
        be omitted in these cases.

                    TABLE 10.38 Entries common to all Web Capture content sets
KEY    TYPE          VALUE

Type   name          (Optional) The type of PDF object that this dictionary describes; if present, must be
                     SpiderContentSet for a Web Capture content set.

S      name          (Required) The subtype of content set that this dictionary describes:
                        SPS   (“Spider page set”) A page set
                        SIS   (“Spider image set”) An image set
