VeryPDF OCR to Any Converter Command Line is a Windows command line (console) application that batch converts scanned PDF, TIFF and image files (JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM) to editable Word, Excel, CSV, HTML, TXT, pure-text-layer PDF, invisible-text-layer PDF and other formats. It includes a Table Recovery Engine that recognizes tables in scanned PDF, TIFF and image files as table objects and inserts them into Word, Excel, HTML, text, CSV and other output formats.
Supported Operating Systems
- Windows 2000 / XP / Server 2003 / Vista / Server 2008 / 7 / 8, both 32-bit and 64-bit.
Supported Input Formats
- Text-based PDF files (searchable PDF files)
- PDF files that contain only embedded fonts
- Scanned PDF files (image-based PDF files)
- Scanned single page and multi-page TIFF files
- Scanned JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM files
Supported Output Formats
- Plain text files without layout (.txt)
- Plain text files with layout (.txt)
- Plain-text PDF files (the PDF contains a text layer only) (.pdf)
- Original PDF files with an OCRed text layer attached (.pdf)
- OCRed black-and-white PDF files with a hidden text layer (.pdf)
- OCRed color PDF files with a hidden text layer (.pdf)
- OCRed grayscale PDF files with a hidden text layer (.pdf)
- OCR scanned PDF, TIFF and image files to RTF format (.rtf)
- OCR scanned PDF, TIFF and image files to DOC format (.doc)
- OCR scanned PDF, TIFF and image files to tab-delimited text format (.txt)
- OCR scanned PDF, TIFF and image files to CSV format (.csv)
- OCR scanned PDF, TIFF and image files to MS Excel format (.xls)
- OCR scanned PDF, TIFF and image files to HTML format (.htm, .html)
- Output to TIFF, PNG, BMP, TGA and GIF image files, with Deskew, Despeckle, Noise Removal, Auto-Orientation, Dithering, Black Border Removal and other options
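The output options above include Despeckle and Noise Removal, but the filter the converter uses is not documented. As an illustration only, a 3x3 median filter is one common way to remove isolated speckles: each pixel is replaced by the median of its neighborhood, so a lone dark dot on a white page disappears while solid strokes survive. The sketch assumes a grayscale image stored as a list of rows of 0-255 integers; `median_despeckle` is a hypothetical name, not part of the converter.

```python
from statistics import median_low


def median_despeckle(img):
    """Return a copy of img where each pixel is the median of its 3x3 neighborhood.

    img is a list of rows of 0-255 ints. At the image edges the window is
    clipped to the pixels that exist. median_low always returns an actual
    pixel value from the window, so the output stays integer.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median_low(window)
    return out
```

A 3x3 median also erases strokes that are only one pixel wide, so filters that delete small connected components are a common alternative for text pages.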
Powerful—OCR and Convert various scanned file formats to editable Word, Excel, CSV, HTML, Text, RTF formats quickly
- Batch convert scanned PDF & TIFF & Image files to editable Word, Excel, CSV, HTML, RTF, Text, etc. documents with professional OCR technology.
- Table Recovery: Superior reconstruction of bordered and borderless tables as table objects, with formatting, in Word, Excel, HTML, CSV, RTF etc. formats.
Powerful—OCR and Convert non-searchable PDF files to searchable PDF files
- Convert scanned PDF files and image files to plain text files and searchable PDF files by OCR technology.
- Convert PDF files that use embedded fonts into new searchable PDF files.
- Keep the original colors when converting PDF, TIFF and image files to searchable PDF files.
- Repair & Reprocess scanned PDF files.
Powerful—Image to image conversion with more processing options
- Deskew, Despeckle and Noise Removal, Auto-Orientation, Dithering, Black Border Removal.
- Rotate, threshold, dither, resample, flip and mirror image files.
- Set the color depth of new image files to 1, 4, 8, 16, 24 or 32 bits per pixel.
- Skew is measured using text and line objects in the image.
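The converter's skew detector is not documented beyond the line above. A common text-based technique, used here as an assumption, is the projection profile: for each candidate angle, count black pixels per row along lines tilted by that angle and keep the angle whose counts are most concentrated, because text lines then collapse into a few dense rows. `estimate_skew` and its parameters are hypothetical; the sketch only estimates the angle and does not rotate the image.

```python
import math


def estimate_skew(black_pixels, max_angle=5.0, step=0.1):
    """Estimate the skew angle in degrees from (x, y) black-pixel coordinates.

    For each candidate angle a, a pixel lying on a text line y = y0 + x*tan(a)
    maps to the row y*cos(a) - x*sin(a) = y0*cos(a), i.e. the same row for the
    whole line. The angle whose row histogram is most concentrated wins.
    """
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(max_angle / step))
    for i in range(-n_steps, n_steps + 1):
        angle = i * step
        a = math.radians(angle)
        cos_a, sin_a = math.cos(a), math.sin(a)
        rows = {}
        for x, y in black_pixels:
            row = round(y * cos_a - x * sin_a)
            rows[row] = rows.get(row, 0) + 1
        # Sum of squared row counts: larger when pixels pile into fewer rows.
        score = sum(c * c for c in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Once the angle is known, the image would be rotated by that angle in the opposite direction to straighten the text lines. A finer `step` gives a more precise estimate at the cost of more passes over the pixels.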
Scanned PDF & TIFF & Image files to Office Documents with OCR Conversion
- Activate the Enhanced OCR Technology with the -ocr2 option.
- Use Enhanced OCR Technology to convert Scanned PDF, TIFF and Image files to RTF, DOC, TXT, CSV, Excel, HTML formats.
- MS Office is not required to create RTF, DOC, CSV or Excel files.
- Convert scanned PDF documents to MS Excel documents in one of several layouts:
1. One Excel Sheet per PDF page + One Excel Sheet for all PDF pages
2. One Excel Sheet per PDF page
3. One Excel Sheet contains all PDF pages
- PDF to Excel Converter: Batch convert tables from scanned PDF and Image files to Microsoft Excel spreadsheets.
- PDF to HTML Converter: Batch convert your scanned PDF and Image files to high quality reflowed HTML while preserving styles, tables, etc.
- PDF to Word Converter: Batch convert scanned PDF and Image files to Microsoft Word documents.
- Table Recovery Engine: Superior reconstruction of bordered and borderless tables as table objects, with formatting, into Word & HTML & Excel & CSV, etc.
- Auto-Orientation: Automatically detect text orientation and rotate landscape pages scanned in portrait (or portrait pages scanned in landscape) to the correct orientation, then OCR the rotated images to Office documents.
- Use wildcard characters (*.pdf, *.html, *.tif, *.jpg, *.png, *.doc, *.rtf, etc.) to batch convert scanned PDF, TIFF and image files to RTF, DOC, XLS, CSV, TXT and HTML documents.
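How the converter expands wildcards and names its output files is not specified here. Assuming Windows-style case-insensitive matching and output files that keep the input file name with a new extension, the batch plan for a pattern such as *.pdf can be previewed with Python's standard `fnmatch` module. `plan_batch` is a hypothetical helper that only lists input/output pairs; it does not call the converter.

```python
import fnmatch
import os


def plan_batch(filenames, pattern, out_ext):
    """Pair each file name matching a wildcard pattern with an output name.

    Matching is made case-insensitive by lowercasing both sides (an assumption
    mirroring Windows file systems). The output keeps the input stem and
    swaps the extension, e.g. scan.pdf -> scan.doc.
    """
    plan = []
    for name in sorted(filenames):
        if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
            stem, _ = os.path.splitext(name)
            plan.append((name, stem + out_ext))
    return plan
```

For a real folder, the file names could come from `os.listdir(folder)`.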
Scanned PDF & TIFF & Image files to "Searchable PDF" & "Plain Text PDF" with OCR Conversion
- Convert scanned PDF files and Image files to plain text files using OCR technology.
- Convert scanned PDF files and image files to searchable PDF files using OCR technology. The following types can be generated:
1. Grayscale searchable PDF files (Grayscale Image Layer + Invisible Text Layer);
2. Color searchable PDF files (Color Image Layer + Invisible Text Layer);
3. Black and White searchable PDF files (Black and White Image Layer + Invisible Text Layer);
4. Pure text layer PDF file (Visible Text Layer Only).
- Powerful OCR modes, including:
-ocrmode 0: output to text file
-ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
-ocrmode 2: output to plain text based PDF file
-ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
-ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
- Convert PDF files that use embedded fonts into new searchable PDF files with a hidden text layer.
- Create searchable PDF files that retain the original colors, with a hidden text layer inserted into the resulting PDF file.
- Create searchable black-and-white PDF files without images; the resulting PDF file contains a pure text layer only.
- Create searchable black-and-white PDF files with images, with a hidden text layer inserted into the resulting PDF file.
- Attach the OCRed text as a hidden text layer to the original PDF file.
- Create searchable PDF files with a specific color depth for the image layer, e.g., a True Color, Grayscale or Black and White image layer.
- Create a text file containing the coordinates of each word in the original PDF, TIFF or image file, e.g., [X, Y, Width, Height] TEXT.
- When outputting searchable PDF files, you can set an owner password to protect the PDF from unauthorized editing, printing and copying.
- When outputting searchable PDF files, you can set an open (user) password to protect the PDF from unauthorized opening.
- Support 40-bit or 128-bit encryption for output PDF files.
- Set document properties for output PDF files, such as title, subject, author, keywords, creation time, modification time, creator and producer.
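The exact layout of the word-coordinate text file is only hinted at by the [X, Y, Width, Height] TEXT example above. Assuming one word per line in that form with integer pixel coordinates, a parser and a simple region filter might look like this; `Word`, `parse_word_boxes` and `words_in_region` are hypothetical names, and lines that do not match the assumed layout are skipped.

```python
import re
from typing import NamedTuple


class Word(NamedTuple):
    x: int
    y: int
    width: int
    height: int
    text: str


# Assumed line layout: "[X, Y, Width, Height] TEXT", e.g. "[120, 45, 60, 14] Invoice"
_LINE = re.compile(r"^\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*(.*)$")


def parse_word_boxes(lines):
    """Parse word boxes from lines of text; non-matching lines are ignored."""
    words = []
    for line in lines:
        m = _LINE.match(line.strip())
        if m:
            x, y, w, h = (int(g) for g in m.groups()[:4])
            words.append(Word(x, y, w, h, m.group(5)))
    return words


def words_in_region(words, left, top, right, bottom):
    """Return words whose boxes lie entirely inside the given rectangle."""
    return [w for w in words
            if w.x >= left and w.y >= top
            and w.x + w.width <= right and w.y + w.height <= bottom]
```

Filtering by region is one way to pull a field, such as a total at a known position, out of a scanned form.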
Image to Image Conversion
- Deskew, Despeckle and Noise Removal, Auto-Orientation, Dithering, Black Border Removal.
- Set the color depth when rendering a text-based PDF file, an image-based PDF file or an image file to a new image file. The depth can be 1, 4, 8, 16, 24 or 32 bits; the default for PDF input is 8-bit.
- Deskew: Automatically detect the skew angle of an image and deskew it. The skew angle can be estimated accurately in two ways: by analyzing the text in the image or by finding the black border around the paper.
- Despeckle and Noise Removal: Clean images by removing speckles. Despeckling removes speckles (stray pixels) from images acquired with a scanner or captured by digital or conventional cameras. Users remove this so-called noise for reasons ranging from aesthetic to practical; the cleaned images look smoother and better overall.
- Black Border Removal and Auto Cropping: Remove black border around images automatically.
- Resizing and Scaling: Resize or scale images by setting a new width and height for the output images.
- Rotation and Flipping: Rotate images by any angle, or flip them horizontally or vertically.
- File Format Conversion: Batch convert files to and from TIFF, JPEG, BMP, PNG, GIF and PDF formats.
- Rich Dither Options: Convert color or grayscale images to black-and-white (monochrome) images using one of the following methods:
-dither 0: Floyd-Steinberg
-dither 1: Ordered-Dithering (4x4)
-dither 2: Burkes
-dither 3: Stucki
-dither 4: Jarvis-Judice-Ninke
-dither 5: Sierra
-dither 6: Stevenson-Arce
-dither 7: Bayer (4x4 ordered dithering)
- Threshold: Specify a threshold value for image binarization, or let the optimal threshold be found automatically.
- Convert specified pages of source PDF files.
- Specify the resolution (DPI) when rendering PDF pages to image files.
- Supply the owner password or user password to open encrypted PDF files.
- Support command line operation, which is useful for batch processing.
- No need for a third-party PDF reader application.
- No need for Adobe Reader and Adobe Acrobat applications.
- Support more than ten languages (additional language packages are available for download).
- Support multi-page PDF and TIFF files.
- Support scanned PDF, TIFF/TIF, JPG/JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM and PPM files. For best results, input should be produced by a scanner at a resolution of at least 300 DPI with black-and-white color depth.
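The Rich Dither Options and Threshold features listed above both turn grayscale pixels into black and white. The sketch below illustrates two of these approaches under stated assumptions: Floyd-Steinberg error diffusion (the method named for -dither 0), and automatic threshold selection using Otsu's method, which is an assumption because the converter does not name its algorithm. Images are assumed to be lists of rows of 0-255 integers, and `otsu_threshold`, `binarize` and `floyd_steinberg` are hypothetical names.

```python
def otsu_threshold(pixels):
    """Return the threshold t maximizing between-class variance (Otsu's method).

    Pixels below t form the dark class, pixels >= t the light class.
    Returns 0 if all pixels have the same value.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels below t
    sum0 = 0  # sum of pixel values below t
    for t in range(1, 256):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # constant 1/total^2 omitted
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img, t):
    """Map pixels below t to black (0) and the rest to white (255)."""
    return [[255 if p >= t else 0 for p in row] for row in img]


def floyd_steinberg(img, threshold=128):
    """Dither a grayscale image to 0/255 with Floyd-Steinberg error diffusion.

    Each pixel's quantization error is pushed to unprocessed neighbors with
    weights 7/16 (right), 3/16 (below-left), 5/16 (below), 1/16 (below-right).
    """
    h, w = len(img), len(img[0])
    buf = [[float(v) for v in row] for row in img]  # working copy with diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

A global threshold such as `binarize(img, otsu_threshold([p for row in img for p in row]))` suits text pages, where a clean black/white split is wanted; error diffusion suits photographs, where it preserves the impression of gray levels through the density of black dots.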