
PDF to HTML OCR Converter Command Line

Convert normal PDF files to HTML, and convert scanned PDF files to editable, searchable HTML with OCR technology

What is OCR technology?

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR technology is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. OCR technology is crucial to the computerization of printed texts so that they can be electronically searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and text mining. OCR technology is a field of research in pattern recognition, artificial intelligence and computer vision.


VeryPDF's PDF to HTML OCR Converter Command Line converts PDF to HTML, one file at a time or in batches, using OCR technology where needed. There are two types of PDF documents. Normal PDF files produced by PDF software contain searchable, editable text. PDF files created by scanning books, pages or other physical documents contain only page images, so they cannot be edited or searched as text. In the past, such documents had to be retyped in full. PDF to HTML OCR Converter Command Line converts scanned PDF files to editable, searchable HTML in seconds.

Download and Purchase PDF to HTML OCR Converter Command Line

Version | Quantity | Price (USD) | Download | Buy All
PDF to HTML OCR Converter Command Line | 1 Server License | 195 each | Download PDF to Text OCR Converter Command Line | Buy PDF to Text OCR Command Line Server License
PDF to HTML OCR Converter Command Line | 1 Developer License | 1495 each | | Buy PDF to Text OCR Command Line Developer License
OCR Language Packs | | Free | Download OCR Language Packs | Free

Note: PDF to HTML OCR Converter Command Line ships with OCR support for English only. Additional OCR language packs are available as a free download.

Features and Abilities of PDF to HTML OCR Converter Command Line:

PDF to HTML OCR Converter Command Line Options:
-------------------------------------------------------
Usage: pdf2txtocr.exe [options] <PDF> <Text>

-firstpage <int>   : first PDF page to convert
-lastpage <int>    : last PDF page to convert
-res <int>         : set resolution, the unit is DPI (default is 300 dpi)
-ownerpwd <string> : set owner password for encrypted PDF file
-userpwd <string>  : set user password for encrypted PDF file
-layout            : maintain original physical layout
-noc               : don't insert page breaks 0x0C between pages in text file
-bitcount <int>    : set color depth used when rendering PDF pages to images; valid values are 1, 8 and 24 (default is 8)
-ocr               : enable OCR function for scanned PDF file
-lang <string>     : choose the language for OCR engine
-text <string>     : add additional text at end of each text page, this parameter supports the following variables:
    %PageNumber%   : current page number
    %PageCount%    : total page count of PDF file
-$ <string>        : input your License Key
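Two of the options above change the shape of the output text file: the 0x0C page break that -noc suppresses, and the -text variables. The Python sketch below shows how a script reading the output might split it into pages, and what the variable substitution is expected to produce. The function names are hypothetical, and expand_page_text only imitates the documented variables; it is not the converter's own implementation.

```python
FORM_FEED = "\x0c"  # the 0x0C page break written between pages unless -noc is used


def split_pages(text):
    """Split converter output into per-page strings on form feed characters."""
    pages = text.split(FORM_FEED)
    # A trailing form feed leaves an empty final chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def expand_page_text(template, page_number, page_count):
    """Imitate the documented -text variables for one page (illustration only)."""
    return (template
            .replace("%PageNumber%", str(page_number))
            .replace("%PageCount%", str(page_count)))
```

If the file was produced with -noc there is no form feed to split on, so split_pages returns the whole text as a single page.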

Useful Examples:

pdf2txtocr.exe C:\in.pdf C:\out.html
pdf2txtocr.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.html
pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.html
pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
pdf2txtocr.exe -noc C:\in.pdf C:\out.txt
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.html
pdf2txtocr.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.html
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
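When the converter is called from a script rather than typed by hand, the examples above can be assembled from named settings. The following is a minimal Python sketch under stated assumptions: build_command and convert are hypothetical helper names, only options listed in the usage text are emitted, password and license key options are left out for brevity, and the converter's exit-code behavior is not documented here, so check=True is a precaution rather than a guarantee of error detection.

```python
import subprocess


def build_command(pdf, out, *, exe="pdf2txtocr.exe", first_page=None,
                  last_page=None, ocr=False, lang=None, res=None,
                  bitcount=None, layout=False, no_page_breaks=False):
    """Assemble an argument list using only options from the usage text."""
    if bitcount is not None and bitcount not in (1, 8, 24):
        raise ValueError("bitcount must be 1, 8 or 24")
    args = [exe]
    if first_page is not None:
        args += ["-firstpage", str(first_page)]
    if last_page is not None:
        args += ["-lastpage", str(last_page)]
    if ocr:
        args.append("-ocr")
    if lang is not None:
        args += ["-lang", lang]
    if res is not None:
        args += ["-res", str(res)]
    if bitcount is not None:
        args += ["-bitcount", str(bitcount)]
    if layout:
        args.append("-layout")
    if no_page_breaks:
        args.append("-noc")
    # Input and output paths always come last, as in the usage line.
    return args + [str(pdf), str(out)]


def convert(pdf, out, **options):
    """Run the converter; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(pdf, out, **options), check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces. For example, build_command(r"C:\in.pdf", r"C:\out.txt", ocr=True, res=300) yields the same arguments as the third example above.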

The following command line OCRs all PDF files in the D:\temp\ folder to text files:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"

The following command line OCRs all PDF files in the D:\temp\ folder and its subdirectories to text files:
for /r D:\temp %F in (*.pdf) do pdf2txtocr.exe -ocr "%F" "%~dpnF.txt"

The following command line OCRs all PDF files in the D:\temp\ folder and writes the text files to the C:\test folder:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr "%F" "C:\test\%~nF.txt"

These loops are written for the interactive command prompt. Inside a batch (.bat) file, double the percent sign of the loop variable (%%F, %%~dpnF, %%~nF).
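The same three batch patterns can be planned from a script before anything is run. This is a minimal Python sketch, assuming hypothetical helper names plan_jobs and to_command; it only computes input and output paths and the argument lists, and leaves running the converter to the caller.

```python
from pathlib import Path


def plan_jobs(src_dir, out_dir=None, recursive=False):
    """Return (pdf, txt) path pairs mirroring the for-loops shown above."""
    src = Path(src_dir)
    # rglob descends into subdirectories, like "for /r"; glob does not.
    pdfs = sorted(src.rglob("*.pdf") if recursive else src.glob("*.pdf"))
    jobs = []
    for pdf in pdfs:
        if out_dir is None:
            txt = pdf.with_suffix(".txt")               # like "%~dpnF.txt"
        else:
            txt = Path(out_dir) / (pdf.stem + ".txt")   # like "C:\test\%~nF.txt"
        jobs.append((pdf, txt))
    return jobs


def to_command(pdf, txt, lang=None):
    """Build the OCR command line for one job as an argument list."""
    cmd = ["pdf2txtocr.exe", "-ocr"]
    if lang:
        cmd += ["-lang", lang]
    return cmd + [str(pdf), str(txt)]
```

Each pair can then be passed to subprocess.run(to_command(pdf, txt)). Note that combining recursive=True with out_dir flattens the folder tree, so two PDF files with the same name in different subfolders would write to the same text file.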


Take a look at these other tools:

DocConverter COM Component (+HTML2PDF.exe): Convert HTML, DOC, RTF, XLS, PPT, TXT and other files to PDF files. It depends on the PDFcamp Printer product.
Image to PDF Converter: Convert 40+ image formats to PDF files.
HTML Converter: Convert HTML files to TIF, TIFF, JPG, JPEG, GIF, PNG, BMP, PCX, TGA, JP2 (JPEG2000), PNM, etc. formats.
PDF to HTML Converter: Convert PDF files to HTML documents.
PDF to Text Converter: Convert PDF files to plain text files.
PDF to Vector Converter: Convert PDF files to PS, EPS, WMF, EMF, XPS, PCL, HPGL, SWF, SVG, etc. vector files.
PDF to Image Converter: Convert PDF files to TIF, TIFF, JPG, GIF, PNG, BMP, EMF, PCX, TGA formats.


Email Us: support@verypdf.com




Copyright © 2002- VeryPDF.com, Inc. All rights reserved.