General Functions
- Information is extracted on the basis of the object type
- Able to extract all kinds of objects and their respective properties
- No need for other PDF software
- Support all versions of the PDF format
- Easy command line operation
- Process PDF files protected by user and owner passwords
- Extract embedded PDF fonts to TTF (TrueType), CFF, AFM font files
- Extract images to TIFF, JPG, PNG, PBM, PPM files
- Extract plain text to TXT files
- Extract text with positions to TXT files
- Extract metadata to XMP file
- Extract forms to FDF file
- Extract drawings to XML file
Extract Document Properties
- Query document attributes, including: Author, Title, Subject, Keywords, Application, PDF Producer, Creation date, Modification date, etc.
- Query Document Security settings
- Check whether the document is linearized (optimized for fast web view)
- Retrieve PDF version, e.g. 1.4, 1.5, 1.6, 1.7, 1.8
- Query the number of pages in the PDF file
- Read properties of bookmarks
- Query destinations of bookmarks
- Read page labels (e.g. "vii", "IX")
- Read properties of various resources, including images, colorspaces, fonts, paths, drawings, etc.
- List and extract embedded files
- List and set optional content groups (layers)
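As an illustration of what the version and linearization queries involve, here is a minimal Python sketch. It is not this tool's implementation, and the function name `sniff_pdf_header` is hypothetical. It assumes the version is read from the `%PDF-x.y` header comment and that a linearized file carries a `/Linearized` key near the start of the file; a full parser would also honor a `/Version` entry in the document catalog and validate the linearization dictionary.

```python
import re

def sniff_pdf_header(data: bytes):
    """Return (version, linearized) guessed from the first bytes of a PDF.

    Heuristic only: looks at the first 1024 bytes and nothing else.
    """
    head = data[:1024]
    # The header comment looks like "%PDF-1.7".
    match = re.search(rb"%PDF-(\d)\.(\d)", head)
    version = None
    if match:
        version = f"{match.group(1).decode()}.{match.group(2).decode()}"
    # A linearized file places its linearization dictionary first.
    linearized = b"/Linearized" in head
    return version, linearized


sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
print(sniff_pdf_header(sample))
```

To use it on a real file, pass the first kilobyte, for example `sniff_pdf_header(open(path, "rb").read(1024))`.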
Extract ColorSpace
- Query color space for each object
- Query color information
- Query components per pixel
- Color space (colorant, indexed, monochrome)
- Lookup color palette table
- Show colorspace name
Extract Image Objects
- Extract images and save to image files
- Query height and width in pixels
- Include height/width/bitcount in output filenames
- Read out image resolution (DPI)
- Number of bits per channel
- Colorspace (bi-tonal, monochrome, color)
- Convert any other colorspaces to RGB
- Extract image and set orientation
- Set the compression for output TIFF image files, including Flate, CCITT G3, G3-2D, G4, JPEG, LZW, none
- Retrieve JBIG2 encoded B/W images
- Retrieve mask image and transparency mask
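The resolution item above can be made concrete with a small calculation. The sketch below is an assumption about how an effective DPI is commonly derived, not a description of this tool's internals: it divides the pixel count by the placed size on the page, using the PDF default of 72 points per inch. The function name `effective_dpi` is hypothetical.

```python
def effective_dpi(pixels: int, placed_points: float) -> float:
    """Effective resolution of an image along one axis.

    pixels        -- image width (or height) in pixels
    placed_points -- width (or height) the image occupies on the page,
                     in PDF points (1 point = 1/72 inch)
    """
    if placed_points <= 0:
        raise ValueError("placed size must be positive")
    return pixels * 72.0 / placed_points


# A 600-pixel-wide image placed 144 points (2 inches) wide.
print(effective_dpi(600, 144))
```

The horizontal and vertical values can differ when an image is scaled unevenly, so it is usually worth computing both.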
Extract Transformation Matrix
- Transformation values
- Orientation
- Rotation
- Scaling in X and Y direction
- Positioning in X and Y direction
- Skewing in X and Y direction
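For readers who want to see how these values relate to the six numbers of a PDF matrix `[a b c d e f]`, the following Python sketch shows one common decomposition. It is an illustrative assumption, not the tool's own method, and `decompose_matrix` is a hypothetical name. It treats the matrix as mapping the unit X vector to `(a, b)` and the unit Y vector to `(c, d)`, with `(e, f)` as the translation.

```python
import math

def decompose_matrix(a, b, c, d, e, f):
    """Split a PDF matrix [a b c d e f] into readable components.

    Returns rotation and skew in degrees, scaling factors, and translation.
    """
    scale_x = math.hypot(a, b)              # length of the transformed X axis
    if scale_x == 0:
        raise ValueError("degenerate matrix: X axis has zero length")
    det = a * d - b * c                     # negative if the matrix mirrors
    return {
        "rotation": math.degrees(math.atan2(b, a)),
        "scale_x": scale_x,
        "scale_y": det / scale_x,           # Y extent perpendicular to X axis
        "skew": math.degrees(math.atan2(a * c + b * d, det)),
        "translate_x": e,
        "translate_y": f,
    }


# A 90-degree counterclockwise rotation moved to (100, 50).
print(decompose_matrix(0, 1, -1, 0, 100, 50))
```

A negative `scale_y` in the result indicates a mirrored object. Other decompositions exist (for example, reporting separate X and Y skew angles), so the numbers here need not match another tool's output exactly.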
Extract Annotation
- Annotation type (Text, Link, FreeText, Line, Square, Circle, Polygon, PolyLine, Highlight, Underline, Squiggly, StrikeOut, Stamp, Caret, Ink, Popup, FileAttachment, Sound, Movie, Widget, Screen, PrinterMark, TrapNet, Watermark, 3D, etc.)
- Color values
- Subject, Title
- Destination
- Contents
- Date
- Flags
- MarkUp Annotation
- Name in Unicode
- Position (Rectangle)
- TextLabel
- URL, Link Target
- Corner points if it is a polygon
Extract Page Properties
- Query Page Size (MediaBox), Visible Size (CropBox) and other dimensions of relevance to printing (TrimBox, ArtBox, BleedBox)
- Query Viewing Rotation Angle
- Retrieve Page Contents and save to .txt file
- Retrieve Text Positions and save to .txt file
- Retrieve Annotations and their properties
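Page boxes are expressed in PDF points, so a queried MediaBox usually needs converting before it is compared with paper sizes. The sketch below is a minimal example under stated assumptions: the box is given as `(llx, lly, urx, ury)` in default user space units of 1/72 inch, and the viewing rotation is a multiple of 90 degrees. The function name `page_size_mm` is hypothetical.

```python
def page_size_mm(box, rotate=0):
    """Convert a page box (llx, lly, urx, ury) in points to (width, height) in mm.

    rotate is the viewing rotation in degrees; 90 and 270 swap width and height.
    """
    llx, lly, urx, ury = box
    width = abs(urx - llx)
    height = abs(ury - lly)
    if rotate % 180 == 90:
        width, height = height, width
    points_to_mm = 25.4 / 72.0
    return round(width * points_to_mm, 1), round(height * points_to_mm, 1)


# An A4 page is commonly stored as [0 0 595 842].
print(page_size_mm((0, 0, 595, 842)))
print(page_size_mm((0, 0, 595, 842), rotate=90))
```

The same conversion applies to CropBox, TrimBox, ArtBox, and BleedBox values.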
Extract Page Contents
- Extract all objects (object, image, text, path, font, weight, annotation, etc.) and query their attributes
- Query current graphics state to XML file
Extract Text Contents
- Extract text as Unicode by the character, word or page, including visible and invisible text
- Support text that contains no space characters; spaces can be added automatically
- Extract words and their coordinates (X, Y)
- Extract text lines and their coordinates (X, Y)
- Extract bounding box (rectangle) for each word
- Extract font size and baseline in points
- Extract text line length in points
- Extract text line length in characters
- Extract rotation for each word
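The automatic spacing mentioned above is typically a positional heuristic. The Python sketch below shows one simple version, offered as an assumption rather than this tool's actual algorithm: a space is inserted whenever the horizontal gap between two consecutive glyphs exceeds a fraction of the font size. The function name `join_glyphs`, the input layout, and the `gap_ratio` default of 0.25 are all hypothetical.

```python
def join_glyphs(glyphs, font_size, gap_ratio=0.25):
    """Rebuild a text line from positioned glyphs, adding missing spaces.

    glyphs    -- list of (char, x_start, x_end) tuples, sorted left to right
    font_size -- font size in points
    gap_ratio -- gap, as a fraction of font_size, that counts as a word break
    """
    pieces = []
    previous_end = None
    for char, x_start, x_end in glyphs:
        if previous_end is not None and x_start - previous_end > gap_ratio * font_size:
            pieces.append(" ")
        pieces.append(char)
        previous_end = x_end
    return "".join(pieces)


line = [("H", 0, 6), ("i", 6, 8), ("y", 12, 17), ("o", 17, 22), ("u", 22, 27)]
print(join_glyphs(line, font_size=10))
```

A fixed ratio is easy to reason about but can misfire on tightly tracked or widely letter-spaced text, so the threshold is worth tuning per document.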
Extract Font Type
- Extract font data and save to font files
- Query font name
- Query font type, e.g., TrueType, Type1, etc.
- Height of uppercase and lowercase letters
- Available character names of the font subset
- Font Encoding
- Font Flags
- Font Bounding box
- Datastream of a font program
- Tilt angle of italic fonts
- Recommended distance between base line and following line (leading)
- Vertical and horizontal width of glyph stems
- Render TrueType (TTF) fonts to GIF images
- Font view tool ftview.exe included
- TTF to GIF conversion tool ttf2img.exe included
Extract Graphics State
- Retrieve Current transformation matrix
- Spacing between characters and words
- Elements and phase of a dash pattern
- Colorspace of fill and line colors
- Fill and line colors as RGB or CMYK value
- Overprint settings for fill and line colors
- Alpha constant of fill and line colors
- Query blend mode
- Flatness tolerance
- Query font name and font size
- Horizontal scaling
- Text style (leading, line spacing)
- Line style (line cap, line join, miter limit) and line width
- Name of the rendering intent
- Smoothness tolerance
- Soft mask
- Text strikeout
- Text rendering mode
- Text rise (baseline shifted up or down)
Extract Bookmarks
- Bookmark Title
- Bookmark Level
- Bookmark Page Number
- Bookmark Destination
- Quantity of bookmarks
Extract Destination
- Position (coordinates for bottom left and top right)
- Type
- Page Number
Extract Form Fields
- Retrieve name and values for all form fields
- Save form fields to FDF file
- Support Unicode characters in form fields
- Field Type (Text Field, Check Box, Radio Button, Combo Box, List Box or Push Button, etc.)
- Field Name
- Field Flags
- Field Justification (Left, Center, Right)
- Field State Option (On, Off, Yes, No, etc.)