PDF Extract Tool Command Line $79.95

VeryPDF PDF Extract Tool Command Line

  • Extract text with positions from PDF file
  • Extract font from PDF file
  • Extract image from PDF file

VeryPDF PDF Extract Tool Command Line is a command line tool designed for extracting font data, image data, text contents, page count, paper size, and other information from PDF files. It can extract fonts to TTF, CFF, and AFM files; images to TIFF, JPG, PNG, PBM, and PPM files; text to TXT files; metadata to XMP files; forms to FDF files; and drawings to XML files. You can integrate this command line application into your own product to reuse the contents of PDF files.

VeryPDF PDF Extract Tool Command Line extracts information from PDF documents quickly and efficiently. The extracted information can be stored in a database or a disk file for further processing.

 

PDF Extract Tool Command Line is the ultimate "get info" utility for your PDF documents. It can extract a comprehensive list of attributes from a PDF file into an XML-based format.

System Requirements

  • Windows 2000 / XP / Server 2003 / Vista / Server 2008 / 7 / 8 / 10 / 11 and later, both 32-bit and 64-bit
  • Linux (CentOS, SuSE and Red Hat on Intel)
  • IBM AIX, 32-bit and 64-bit
  • Mac OS X

Key Features

Extract Properties from PDF file

  • PDF Extract Tool is a command line tool for reading the contents and properties of PDF documents. It can query document attributes, including:
    Author, Title, Subject, Keywords, Application, PDF Producer, Creation date, Modification date, Fast Web View, Encryption, Viewer Preferences, MediaBox, CropBox, TrimBox, BleedBox, ArtBox, Rotation, etc.
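
Creation and modification dates are stored inside a PDF in the form D:YYYYMMDDHHmmSSOHH'mm'. The sketch below shows one way to turn such a string into a Python datetime for further processing. It assumes the date arrives in this raw PDF syntax; the tool may report dates in a different layout, and the function name parse_pdf_date is only illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date syntax: D:YYYYMMDDHHmmSSOHH'mm' (everything after the year is optional).
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(raw: str) -> datetime:
    """Parse a raw PDF date string (hypothetical helper, not part of the tool)."""
    m = _PDF_DATE.match(raw.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {raw!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None  # no offset given: return a naive datetime
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    # Missing month/day default to 1, missing time parts default to 0.
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    tzinfo=tz)
```

For example, parse_pdf_date("D:20240131120000+02'00'") yields 31 January 2024, 12:00:00 with a UTC offset of +02:00.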

Extract Fonts from PDF and Save to font files

  • VeryPDF PDF Extract Tool Command Line can extract embedded fonts in PDF files and then save the fonts to font files. It supports font file formats like TTF (TrueType), CFF (Compact Font Format), and AFM (Adobe Font Metrics).

Extract Text with Positions from PDF file

  • Extract text with X, Y, Width, Height positions from a PDF file. Extract text by character, word or page (including invisible text). Search for keywords and retrieve their positions. The command line tool is generally used to extract data and resources from a PDF document for further processing.

Features of VeryPDF PDF Extract Tool Command Line

General Functions
  • Information is extracted on the basis of the object type
  • Able to extract all kinds of objects and their respective properties
  • No need for other PDF software
  • Support all versions of PDF format
  • Easy command line operation
  • Process user and owner password protected PDF files
  • Extract embedded PDF fonts to TTF (TrueType), CFF, AFM font files
  • Extract images to TIFF, JPG, PNG, PBM, PPM files
  • Extract plain text to TXT files
  • Extract text with positions to TXT files
  • Extract metadata to XMP file
  • Extract forms to FDF file
  • Extract drawings to XML file

 

Extract Document Properties
  • Query document attributes, including: Author, Title, Subject, Keywords, Application, PDF Producer, Creation date, Modification date, etc.
  • Query Document Security settings
  • Check whether the document is linearized (optimized for fast web view)
  • Retrieve PDF version, e.g. 1.4, 1.5, 1.6, 1.7
  • Query the number of pages in the PDF file
  • Read properties of bookmarks
  • Query destinations of bookmarks
  • Read page labels (e.g. "vii", "IX")
  • Read properties of various resources, including images, colorspaces, fonts, paths, drawings, etc.
  • List and extract embedded files
  • List and set optional content groups (layers)
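
Two of the properties above can be illustrated with a few lines of Python: the PDF version appears in the file header as %PDF-x.y, and a linearized file carries a /Linearized dictionary within its first 1024 bytes. The following is a rough sketch of that idea, not the tool's implementation; the function name pdf_header_info is hypothetical, and a full check would also honor a /Version entry in the document catalog.

```python
import re


def pdf_header_info(data: bytes) -> dict:
    """Inspect the first 1024 bytes of a PDF (heuristic sketch only)."""
    head = data[:1024]
    # Header looks like "%PDF-1.7".
    m = re.search(rb"%PDF-(\d)\.(\d)", head)
    version = f"{m.group(1).decode()}.{m.group(2).decode()}" if m else None
    # A linearized file places its linearization dictionary at the very start.
    linearized_hint = b"/Linearized" in head
    return {"version": version, "linearized_hint": linearized_hint}


if __name__ == "__main__":
    sample = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
    print(pdf_header_info(sample))
```

To inspect a real document, read its leading bytes with open(path, "rb").read(1024) and pass them to the function.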

 

Extract ColorSpace

  • Query color space for each object
  • Query color information
  • Query components per pixel
  • Color space (colorant, indexed, monochrome)
  • Lookup color palette table
  • Show colorspace name

 

Extract Image Objects

  • Extract images and save to image files
  • Query height and width in pixels
  • Include height/width/bitcount in output filenames
  • Read out image resolution (DPI)
  • Number of bits per channel
  • Colorspace (bi-tonal, monochrome, color)
  • Convert any other colorspaces to RGB
  • Extract image and set orientation
  • Set the compression for output TIFF image files: Flate, CCITT G3, G3-2D, G4, JPEG, LZW, or none
  • Retrieve JBIG2 encoded B/W images
  • Retrieve mask image and transparency mask
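
An image's resolution in a PDF is not stored as a single number. It follows from the pixel dimensions and the size at which the image is placed on the page, measured in points (1 point = 1/72 inch). The sketch below shows that arithmetic. How the tool itself derives or reports DPI is not documented here, and the function name effective_dpi is an assumption for illustration.

```python
def effective_dpi(pixel_w: int, pixel_h: int,
                  placed_w_pt: float, placed_h_pt: float) -> tuple:
    """Return (horizontal_dpi, vertical_dpi) for an image placed on a page.

    placed_w_pt / placed_h_pt are the rendered width and height in points.
    """
    if placed_w_pt <= 0 or placed_h_pt <= 0:
        raise ValueError("placed size must be positive")
    # pixels per inch = pixels / (points / 72)
    return (pixel_w * 72.0 / placed_w_pt, pixel_h * 72.0 / placed_h_pt)


if __name__ == "__main__":
    # A 1200 x 600 pixel image drawn 4 inches (288 pt) wide and 2 inches tall.
    print(effective_dpi(1200, 600, 288, 144))
```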

 

Extract Transformation Matrix

  • Transformation values
  • Orientation
  • Rotation
  • Scaling in X and Y direction
  • Positioning in X and Y direction
  • Skewing in X and Y direction
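
A PDF transformation matrix is written as six numbers [a b c d e f], mapping a point (x, y) to (a·x + c·y + e, b·x + d·y + f). The values listed above can be derived from those six numbers. The following is a minimal sketch of one common decomposition, given as an illustration rather than as the tool's own method; the function name and the returned keys are hypothetical.

```python
import math


def decompose_matrix(a, b, c, d, e, f):
    """Split a PDF matrix [a b c d e f] into rotation, scale, skew, translation."""
    scale_x = math.hypot(a, b)          # length of the transformed X unit vector
    if scale_x == 0:
        raise ValueError("degenerate matrix: first column is zero")
    det = a * d - b * c                 # negative when the matrix mirrors
    return {
        "rotation_deg": math.degrees(math.atan2(b, a)),
        "scale_x": scale_x,
        "scale_y": det / scale_x,
        # Angle by which the Y axis leans away from perpendicular to the X axis.
        "skew_deg": math.degrees(math.atan2(a * c + b * d, abs(det))),
        "translate": (e, f),
    }


if __name__ == "__main__":
    # Rotate by 90 degrees counterclockwise, then move to (10, 20).
    print(decompose_matrix(0, 1, -1, 0, 10, 20))
```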

 

Extract Annotation

  • Annotation type (Text, Link, FreeText, Line, Square, Circle, Polygon, PolyLine, Highlight, Underline, Squiggly, StrikeOut, Stamp, Caret, Ink, Popup, FileAttachment, Sound, Movie, Widget, Screen, PrinterMark, TrapNet, Watermark, 3D, etc.)
  • Color values
  • Subject, Title
  • Destination
  • Contents
  • Date
  • Flags
  • MarkUp Annotation
  • Name in Unicode
  • Position (Rectangle)
  • TextLabel
  • URL, Link Target
  • Corner points if it is a polygon

Extract Page Properties
  • Query Page Size (MediaBox), Visible Size (CropBox) and other dimensions of relevance to printing (TrimBox, ArtBox, BleedBox)
  • Query Viewing Rotation Angle
  • Retrieve Page Contents and save to .txt file
  • Retrieve Text Positions and save to .txt file
  • Retrieve Annotations and their properties
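
Page boxes such as MediaBox and CropBox are rectangles given in points as lower-left and upper-right corners. Once those four numbers are extracted, converting them to millimeters and matching a standard paper size is simple arithmetic. The sketch below assumes the box is available as a tuple (llx, lly, urx, ury); the function name, the small size table, and the 2-point tolerance are illustrative choices, not part of the tool.

```python
PT_PER_INCH = 72.0
MM_PER_INCH = 25.4

# Nominal portrait sizes in points (short side, long side).
PAPER_SIZES_PT = {
    "A4": (595.28, 841.89),
    "A3": (841.89, 1190.55),
    "Letter": (612.0, 792.0),
    "Legal": (612.0, 1008.0),
}


def describe_box(box, tolerance_pt=2.0):
    """Convert a page box (llx, lly, urx, ury) in points to mm and a paper name."""
    llx, lly, urx, ury = box
    width_pt, height_pt = abs(urx - llx), abs(ury - lly)
    short, long_ = sorted((width_pt, height_pt))
    name = None
    for candidate, (w, h) in PAPER_SIZES_PT.items():
        if abs(short - w) <= tolerance_pt and abs(long_ - h) <= tolerance_pt:
            name = candidate
            break
    return {
        "width_mm": round(width_pt * MM_PER_INCH / PT_PER_INCH, 1),
        "height_mm": round(height_pt * MM_PER_INCH / PT_PER_INCH, 1),
        "paper": name,                       # None if no table entry matches
        "landscape": width_pt > height_pt,
    }


if __name__ == "__main__":
    print(describe_box((0, 0, 612, 792)))
```

Note that this looks only at the box itself. A page's viewing rotation, when it is 90 or 270 degrees, swaps the displayed width and height.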

 

Extract Page Contents

  • Extract all objects (object, image, text, path, font, weight, annotation, etc.) and query their attributes
  • Query the current graphics state and save it to an XML file

 

Extract Text Contents

  • Extract text as Unicode by the character, word or page, including visible and invisible text
  • Support text that contains no space characters by inserting spaces automatically
  • Extract words and their coordinates (X, Y)
  • Extract text lines and their coordinates (X, Y)
  • Extract bounding box (rectangle) for each word
  • Extract font size and baseline in points
  • Extract text line length in points
  • Extract text line length in characters
  • Extract rotation for each word
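
Word coordinates are typically post-processed into lines or searched for keywords. The sketch below groups words into text lines by their baseline Y value. It makes several assumptions that are not taken from the product description: the Word record and its field names are hypothetical, the words are already parsed from the tool's output, and Y grows upward as in native PDF coordinates (reverse the sort if the output uses a top-left origin).

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float       # left edge, in points
    y: float       # baseline, in points
    width: float
    height: float


def group_into_lines(words, y_tolerance=2.0):
    """Group words whose baselines differ by at most y_tolerance into lines."""
    lines = []
    # Top of the page first (largest Y), then left to right.
    for w in sorted(words, key=lambda w: (-w.y, w.x)):
        if lines and abs(lines[-1][0].y - w.y) <= y_tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, restore reading order by X.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]


if __name__ == "__main__":
    sample = [
        Word("world", 60, 700.5, 30, 10),
        Word("Hello", 10, 700.0, 30, 10),
        Word("Next", 10, 680.0, 25, 10),
    ]
    print(group_into_lines(sample))
```

The tolerance absorbs small baseline differences between words on the same line; rotated or multi-column text would need a more careful grouping.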

 

Extract Font Type

  • Extract font data and save to font files
  • Query font name
  • Query font type, e.g., TrueType, Type1, etc.
  • Height of uppercase and lowercase letters
  • Available character names of the font subset
  • Font Encoding
  • Font Flags
  • Font Bounding box
  • Datastream of a font program
  • Tilt angle of italic fonts
  • Recommended distance between base line and following line (leading)
  • Vertical and horizontal width of glyph stems
  • Render TrueType (TTF) fonts to GIF images
  • Font view tool ftview.exe included
  • TTF to GIF conversion tool ttf2img.exe included

 

Extract Graphics State

  • Retrieve Current transformation matrix
  • Spacing between characters and words
  • Elements and phase of a dash pattern
  • Colorspace of fill and line colors
  • Fill and line colors as RGB or CMYK value
  • Overprint settings for fill and line colors
  • Alpha constant of fill and line colors
  • Query blend mode
  • Flatness tolerance
  • Query font name and font size
  • Horizontal scaling
  • Text style (leading, line spacing)
  • Line style (line cap, line join, miter limit) and line width
  • Name of the rendering intent
  • Smoothness tolerance
  • Soft mask
  • Text strikeout
  • Text rendering mode
  • Text relocation (up or down)

 

Extract Bookmarks

  • Bookmark Title
  • Bookmark Level
  • Bookmark Page Number
  • Bookmark Destination
  • Quantity of bookmarks

 

Extract Destination

  • Position (coordinates for bottom left and top right)
  • Type
  • Page Number

 

Extract Form Fields

  • Retrieve name and values for all form fields
  • Save form fields to FDF file
  • Support unicode characters in form fields
  • Field Type (Text Field, Check Box, Radio Button, Combo Box, List Box or Push Button, etc.)
  • Field Name
  • Field Flags
  • Field Justification (Left, Center, Right)
  • Field State Option (On, Off, Yes, No, etc.)
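
Field flags are stored in a PDF as a single integer whose bits switch individual properties on or off. The sketch below decodes such an integer into flag names, using bit positions from the PDF specification. It assumes the flags are available as a raw integer, which may not match how the tool prints them, and the table covers only a subset of flags. Bits from 13 upward are specific to a field type (text, button or choice), so the names are meaningful only for the matching type.

```python
# Bit positions are 1-based, as in the PDF specification.
FIELD_FLAG_BITS = {
    1: "ReadOnly",
    2: "Required",
    3: "NoExport",
    13: "Multiline",      # text fields
    14: "Password",       # text fields
    15: "NoToggleToOff",  # radio buttons
    16: "Radio",          # button fields
    17: "Pushbutton",     # button fields
    18: "Combo",          # choice fields
    19: "Edit",           # choice fields
    20: "Sort",           # choice fields
    22: "MultiSelect",    # choice fields
}


def decode_field_flags(flags: int) -> list:
    """Return the names of all known flags set in the integer value."""
    return [name for bit, name in sorted(FIELD_FLAG_BITS.items())
            if flags & (1 << (bit - 1))]


if __name__ == "__main__":
    print(decode_field_flags(4097))   # bits 1 and 13 set
```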

Benefits

Properties and Benefits

Text extracted with VeryPDF PDF Extract Tool can be used for indexing documents or in search engines. You can also extract text positions, fonts, images, metadata, drawings, and other information from a PDF document for further processing.

Performance Characteristics

  • Extract visible and invisible text by the character, word or page from PDF file
  • Search for keywords and get their position
  • Extract images from PDF file
  • Retrieve name and values from form fields
  • Extract document information such as version, metadata, encryption, linearization, etc.
  • List fonts, colorspaces, embedded files, etc.
  • Extract page information and page descriptions
  • Extract bookmarks and their attributes

Technical Details

Input Formats

  • PDF

 

Output Formats

  • TXT (Text Extraction without Positions)
  • TXT (Text Extraction with Positions)
  • TXT (Document Properties, Security Options, Catalog, Metadata, MediaBox, CropBox, TrimBox, BleedBox, ArtBox, Rotation, Hyperlinks, etc.)
  • JPG, PNG, TIF, PBM, PPM, etc. (Image Extraction)
  • XML (Drawing Extraction)
  • FDF (Form Extraction)
  • XMP (Metadata Extraction)
  • TTF, CFF, AFM (Font Extraction)

Programming Languages

All program libraries are written in efficient and thread-safe C++. The API offers bindings for the following programming languages:

  • C#, VB .NET, J# via .NET
  • Java via JNI
  • MS Visual Basic, Borland Delphi, MS Office products such as Access and C++ via COM
  • C and C++ via native C

 

Product Variants

  • Shell Tool (Command Line)
  • API & SDK (Programming Interface)
  • COM & ActiveX (Programming Interface)