SECTION F.1                                             Background and Assumptions



• After a transaction has completed, obtaining more data requires a new
  request-response transaction. The connection between client and server does
  not ordinarily persist beyond the end of a transaction, although some
  implementations may attempt to cache the open connection to expedite
  subsequent transactions with the same server.
• Round-trip delay can be significant. A request-response transaction can take
  up to several seconds, independent of the amount of data requested.
• The data rate may be limited. A typical bottleneck is a slow modem link
  between the client and the Internet service provider.

These properties are generally shared by other wide-area network architectures
besides the Web. Also, CD-ROMs share some of these properties, since they have
relatively slow seek times and limited data rates compared to magnetic media.
The remainder of this appendix focuses on the Web.
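The cost implied by the round-trip delay and limited data rate described above can be estimated with a simple model: each transaction pays one round-trip delay plus the time needed to push its payload through the bottleneck link. The Python below is a minimal sketch under that assumption only. It ignores connection setup, TCP slow start, and server processing time, and the 1-second round trip and 28.8 kbit/s modem rate are hypothetical figures chosen for illustration. The function names are likewise hypothetical.

```python
def transaction_time(num_bytes: int, rtt_s: float, rate_bits_per_s: float) -> float:
    """Estimated seconds for one request-response transaction.

    Simplified model (an assumption): one round trip of latency plus the
    time to move the payload through the bottleneck link. Connection setup,
    TCP slow start, and server processing time are ignored.
    """
    return rtt_s + (num_bytes * 8) / rate_bits_per_s


def sequential_fetch_time(sizes: list, rtt_s: float, rate_bits_per_s: float) -> float:
    """Total time when each piece needs its own transaction, one after another."""
    return sum(transaction_time(n, rtt_s, rate_bits_per_s) for n in sizes)


if __name__ == "__main__":
    RTT = 1.0        # hypothetical round-trip delay, in seconds
    MODEM = 28_800   # hypothetical modem data rate, in bits per second

    # Ten 2 KB pieces fetched one per transaction vs. all in a single transaction.
    pieces = [2_048] * 10
    separate = sequential_fetch_time(pieces, RTT, MODEM)
    batched = transaction_time(sum(pieces), RTT, MODEM)
    print(f"separate: {separate:.2f} s, batched: {batched:.2f} s")
    # separate: 15.69 s, batched: 6.69 s
```

Under these assumed numbers, the ten separate transactions spend 10 seconds on round trips alone, while the single transaction pays that delay once. This is why, in such an environment, it pays to request many pieces of data per transaction rather than one piece per transaction.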

Some additional properties of the HTTP protocol are relevant to the problem of
accessing PDF files efficiently. These properties may not all be shared by other
protocols or network environments.
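Two of the HTTP properties listed next can be sketched without any network I/O. The first is recognizing a PDF response, and its length, from the response headers. The second is asking for several byte ranges in one request. On the wire, the HTTP/1.1 Range header names each range by its first and last byte positions (inclusive), or as a suffix length counted back from the end of the file. The Python below is a minimal illustration under those assumptions. `make_range_header` and `inspect_initial_response` are hypothetical helper names, not part of any library, and a real client would hand the header value to whatever HTTP library it uses.

```python
from __future__ import annotations

from typing import Optional


def make_range_header(spans: list[tuple[int, int]],
                      suffix_length: Optional[int] = None) -> str:
    """Build the value of an HTTP/1.1 Range request header (hypothetical helper).

    spans: (first, last) byte positions, inclusive, counted from the
        beginning of the file.
    suffix_length: if given, also request the final `suffix_length` bytes,
        i.e. a range relative to the end of the file.
    """
    parts = []
    for first, last in spans:
        if first < 0 or last < first:
            raise ValueError(f"invalid byte span {first}-{last}")
        parts.append(f"{first}-{last}")
    if suffix_length is not None:
        if suffix_length <= 0:
            raise ValueError("suffix_length must be positive")
        parts.append(f"-{suffix_length}")
    if not parts:
        raise ValueError("at least one range is required")
    return "bytes=" + ",".join(parts)


def inspect_initial_response(headers: dict[str, str]) -> tuple[bool, Optional[int]]:
    """Report whether a response is PDF and, if the server sent it, its length.

    headers: response header names mapped to values (an assumed input shape;
        real HTTP libraries expose headers through their own containers).
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=..." before comparing the media type.
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    raw_length = lowered.get("content-length")
    length = int(raw_length) if raw_length is not None and raw_length.strip().isdigit() else None
    return media_type == "application/pdf", length


if __name__ == "__main__":
    # Ask for the first 1 KB, one interior block, and the last 1 KB in one request.
    print(make_range_header([(0, 1023), (40960, 49151)], suffix_length=1024))
    # bytes=0-1023,40960-49151,-1024
    print(inspect_initial_response({"Content-Type": "application/pdf",
                                    "Content-Length": "512000"}))
    # (True, 512000)
```

A server is free to ignore the Range header and reply with status 200 and the whole file. A server that honors a multi-range request replies with status 206 (Partial Content) and a multipart/byteranges body, in which each part carries a Content-Range header identifying the bytes it holds.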

• When a PDF file is initially accessed (such as by following a URL hyperlink
  from some other document), the file type is not known to the client. Therefore,
  the client initiates a transaction to retrieve the entire document and then
  inspects the MIME type (the Content-Type header) of the response as it arrives.
  Only at that point is the document known to be PDF. Additionally, with a
  properly configured server environment, the length of the document (the
  Content-Length header) becomes known at that time.
• The client can abort a response while the transaction is still in progress if it
  decides that the remainder of the data is not of immediate interest. In HTTP,
  aborting the transaction requires closing the connection, which interferes with
  the strategy of caching the open connection between transactions.
• The client can request retrieval of portions of a document by specifying one or
  more byte ranges in the HTTP request headers. Each range can be relative to
  either the beginning of the file (given by its first and last byte positions)
  or the end of the file (given as a number of trailing bytes). The client can
  specify as many ranges as it wants in the request, and the response consists
  of multiple blocks, each tagged with the byte range it contains.
• The client can initiate multiple concurrent transactions in an attempt to
  obtain multiple responses in parallel. This is commonly done, for instance, to
  retrieve inline images referenced from an HTML document. This strategy is not
