In this article, I will share some knowledge about PDF structure, which I hope will be helpful when developing software around the PDF file format. A PDF document has many layers of abstraction, and you can work with a PDF file from different perspectives, each with its own advantages and disadvantages. On the surface, a PDF contains raw document data, like other document files. Below that is the COS layer, which organizes the data into a tree of simple objects. At the PD layer, these simple objects are put together to implement useful intermediate-level structures like fonts and images. These are in turn organized into higher-level constructs like annotations and pages. Some of these objects are also used to impose logical structure, like paragraphs and article threads. And there are more layers still.
When you only need to change data on the surface, an easy-to-use tool like VeryPDF PDF Editor is enough. When you need to edit the underlying objects, you may need a more professional toolkit such as Advanced PDF Tools SDK. Let us now look at the structure in more detail.
PDF syntax is best understood by considering it as four parts (the sub-clause numbers below refer to the PDF specification, ISO 32000-1):
- Objects. A PDF document is a data structure composed from a small set of basic types of data objects. Sub-clause 7.2, "Lexical Conventions," describes the character set used to write objects and other syntactic elements. Sub-clause 7.3, "Objects," describes the syntax and essential properties of the objects. Sub-clause 7.3.8, "Stream Objects," provides complete details of the most complex data type, the stream object.
- File structure. The PDF file structure determines how objects are stored in a PDF file, how they are accessed, and how they are updated. This structure is independent of the semantics of the objects. Sub-clause 7.5, "File Structure," describes the file structure. Sub-clause 7.6, "Encryption," describes a file-level mechanism for protecting a document’s contents from unauthorized access.
- Document structure. The PDF document structure specifies how the basic object types are used to represent components of a PDF document: pages, fonts, annotations, and so forth. Sub-clause 7.7, "Document Structure," describes the overall document structure; later clauses address the detailed semantics of the components.
- Content streams. A PDF content stream contains a sequence of instructions describing the appearance of a page or other graphical entity. These instructions, while also represented as objects, are conceptually distinct from the objects that represent the document structure and are described separately. Sub-clause 7.8, "Content Streams and Resources," discusses PDF content streams and their associated resources.
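To make the four parts more concrete, here is a minimal sketch in Python that writes a tiny one-page PDF by hand and then reads its cross-reference table back. It is only an illustration under some assumptions: the function names `build_minimal_pdf` and `read_xref_offsets` are made up for this article, the file uses the classic uncompressed cross-reference table with a single subsection, and there is no encryption or incremental update. Real files are usually more complicated than this.

```python
from typing import List


def build_minimal_pdf() -> bytes:
    """Assemble a one-page PDF in memory (hypothetical helper for illustration)."""
    # Content stream: drawing instructions for the page.
    content = b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET"

    # Objects and document structure: Catalog -> Pages -> Page -> Contents/Font.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    # File structure: header, body, cross-reference table, trailer.
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_xref_offsets(pdf: bytes) -> List[int]:
    """Return byte offsets of in-use objects (assumes one classic xref subsection)."""
    # The number after the last "startxref" keyword is the xref table's offset.
    tail = pdf[pdf.rindex(b"startxref") + len(b"startxref"):]
    xref_pos = int(tail.split()[0])

    lines = pdf[xref_pos:].split(b"\n")  # lines[0] is b"xref"
    _first, count = map(int, lines[1].split())
    entries = [line.split() for line in lines[2:2 + count]]
    return [int(fields[0]) for fields in entries if fields[2] == b"n"]


if __name__ == "__main__":
    data = build_minimal_pdf()
    for number, offset in enumerate(read_xref_offsets(data), start=1):
        print(number, offset, data[offset:offset + 12])
```

The point of the sketch is the separation of concerns: the dictionaries and the stream are objects, the header, cross-reference table and trailer are file structure, the chain from the catalog to the page is document structure, and the `BT ... ET` text is a content stream. A reader locates objects through the byte offsets in the cross-reference table rather than by scanning the file from the start.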
VeryPDF develops software around PDF, so if you have suggestions or more information about this file format, you are welcome to share them with us. We would be glad to discuss them with you, and if you have any questions, please contact us.