       CHAPTER 10                                                  Document Interchange



 10.9 Web Capture
       Web Capture is a PDF 1.3 feature that allows information from Internet-based or
       locally resident HTML, PDF, GIF, JPEG, and ASCII text files to be imported into
       a PDF file. This feature is implemented in Acrobat 4.0 and later viewers by a Web
       Capture plug-in extension (sometimes called AcroSpider). The information in
       the Web Capture data structures enables viewer applications to perform the
       following operations:
       • Save locally and preserve the visual appearance of material from the Web
       • Retrieve additional material from the Web and add it to an existing PDF file
       • Update or modify existing material previously captured from the Web
       • Find source information for material captured from the Web, such as the URL
         (if any) from which it was captured
       • Find all material in a PDF file that was generated from a given URL
       • Find all material in a PDF file that matches a given digital identifier (MD5
         hash)

       The information needed to perform these operations is recorded in two data
       structures in the PDF file:
       • The Web Capture information dictionary holds document-level information
         related to Web Capture.
       • The Web Capture content database keeps track of the material retrieved by Web
         Capture and where it came from, enabling Web Capture to avoid downloading
         material that is already present in the file.
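
       The role of the content database can be illustrated with a minimal sketch.
       It is not the PDF data structure itself: the class ContentDatabase and its
       methods are hypothetical names, and the sketch assumes that the digital
       identifier is simply the MD5 digest of the raw captured bytes. It shows
       lookup by URL, lookup by digest, and skipping a download for material
       that is already present.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentDatabase:
    """Toy in-memory index of captured material (hypothetical, for illustration)."""

    by_url: dict = field(default_factory=dict)     # URL -> MD5 hex digest
    by_digest: dict = field(default_factory=dict)  # MD5 hex digest -> set of URLs

    def needs_download(self, url: str) -> bool:
        # Material already recorded for this URL does not need to be fetched again.
        return url not in self.by_url

    def add(self, url: str, data: bytes) -> str:
        # Assumption: the digital identifier is the MD5 digest of the raw bytes.
        digest = hashlib.md5(data).hexdigest()
        self.by_url[url] = digest
        self.by_digest.setdefault(digest, set()).add(url)
        return digest

    def urls_for(self, digest: str) -> list:
        # All source URLs whose captured content matches the given digest.
        return sorted(self.by_digest.get(digest, ()))
```

       Keeping two indexes lets the same captured bytes be found either by the
       URL they came from or by their digest, so identical content retrieved
       from two different URLs can be recognized as a match.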

       The following sections describe these structures in detail. See
       Appendix C for information about implementation limits in Web Capture.

       Note: The following discussion centers on HTML and GIF files, although Web
       Capture handles other file types as well.

10.9.1 Web Capture Information Dictionary

       The optional SpiderInfo entry in the document catalog (see Section 3.6.1,
       “Document Catalog”) holds a Web Capture information dictionary containing
       document-level information related to Web Capture. Table 10.37 shows the
       contents of this dictionary.
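
       Because the SpiderInfo entry is optional, a consumer has to handle its
       absence. The sketch below assumes the document catalog has already been
       parsed into a plain Python dict keyed by entry name; the function name
       get_web_capture_info is hypothetical and does not belong to any PDF
       library.

```python
def get_web_capture_info(catalog: dict):
    """Return the Web Capture information dictionary, or None if it is absent.

    Assumption: `catalog` is the document catalog, already parsed into a
    plain dict whose keys are entry names without the leading slash.
    """
    info = catalog.get("SpiderInfo")
    # Treat a missing or malformed entry as "no Web Capture information".
    return info if isinstance(info, dict) else None
```

       A document that was never processed by Web Capture has no SpiderInfo
       entry, so the function returns None for it.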
