Convert PDF to text using advanced OCR technology

This article shows how to convert PDF to text from the command line. The software used is VeryPDF PDF to Text OCR Converter CMD, which converts PDF files of any version, whether image based or text based, to text files. It can also extract text from image files. The steps below use PDF to text conversion as the example.

Step 1. Download PDF to Text OCR Converter Command Line

  • Like other Windows command line software, this tool is delivered as a zip file, so you need to extract it once the download finishes.
  • In the extracted folder, you will find the executable file, the help document and bat files that let you run the examples right away.
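
If you prefer to script the extraction step, the following Python sketch shows one way to do it. It is not part of the VeryPDF package. The function name extract_package is made up for illustration, and the archive and folder names you pass in are your own, since the article does not state the name of the downloaded file.

```python
import zipfile
from pathlib import Path


def extract_package(zip_path, dest_dir):
    """Extract the downloaded zip and return the names of the files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Skip directory entries so that only real files are reported.
        return [name for name in archive.namelist() if not name.endswith("/")]
```

Calling extract_package with the path of your download and a target folder returns the list of extracted files, which should include the executable and readme.txt mentioned above.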

Step 2. Convert PDF to text by command line.

  • When you run the conversion, please refer to the usage and examples in readme.txt.
  • Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>

There are two cases to consider when converting PDF to text:
A: Convert text based PDF to text

A text based PDF file does not need the OCR function, so you can use the following command line templates.
pdf2txtocr.exe C:\in.pdf C:\out.txt
This command line converts a single PDF file to text.
pdf2txtocr.exe C:\*.pdf C:\*.txt
To convert text based PDF files to text in batch, use wildcard characters as in the command line above. To limit the conversion to a page range, add the following two parameters.
-firstpage <int>    : first PDF page to convert
-lastpage <int>     : last PDF page to convert
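
When the conversion is driven from a script, the page range parameters above can be assembled programmatically. The Python sketch below is an illustration only and is not shipped with the software. The helper names build_text_command and convert are hypothetical, and the sketch assumes pdf2txtocr.exe can be found on the PATH. It uses only the options listed above and places them before the input and output files, as the usage line requires.

```python
import subprocess


def build_text_command(pdf_path, txt_path, first_page=None, last_page=None,
                       exe="pdf2txtocr.exe"):
    """Build the argument list for converting a text based PDF (no OCR)."""
    if first_page is not None and last_page is not None and first_page > last_page:
        raise ValueError("first_page must not be greater than last_page")
    args = [exe]
    if first_page is not None:
        args += ["-firstpage", str(first_page)]
    if last_page is not None:
        args += ["-lastpage", str(last_page)]
    # Options come before the input and output files, as in the usage line.
    return args + [pdf_path, txt_path]


def convert(pdf_path, txt_path, **kwargs):
    """Run the converter; subprocess raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_text_command(pdf_path, txt_path, **kwargs), check=True)
```

For example, build_text_command(r"C:\in.pdf", r"C:\out.txt", first_page=2, last_page=5) corresponds to the command line pdf2txtocr.exe -firstpage 2 -lastpage 5 C:\in.pdf C:\out.txt. Whether the tool reports failures through its exit code is not stated here, so treat the check=True behavior as an assumption to verify.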

B: Convert image PDF to text

  • When converting an image based PDF file to text, use the following command line templates.
  • pdf2txtocr.exe -ocr -lang eng -ocrmode 0 C:\in.pdf C:\out.txt
    This command line converts an English image based PDF file to text.
    pdf2txtocr.exe -ocr -lang deu -ocrmode 0 C:\in.pdf C:\out.txt
    This command line converts a German image based PDF file to text.
    pdf2txtocr.exe -ocr -lang fra -ocrmode 0 C:\in.pdf C:\out.txt
    This command line converts a French image based PDF file to text.
    pdf2txtocr.exe -ocr -lang spa -ocrmode 0 C:\in.pdf C:\out.txt
    This command line converts a Spanish image based PDF file to text.

When converting an image PDF to text, you need to add the -ocr parameter and use OCR mode 0. The related parameters are:
-ocr                : enable OCR function for scanned PDF file
-lang <string>      : choose the language for OCR engine
-ocrmode <int>      : set OCR mode
  -ocrmode 0: output to text file
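
The OCR parameters above can also be combined from a script, for example to process a whole folder of scanned PDF files. The following Python sketch only builds the command lines and does not run them. The helper names build_ocr_command and build_batch are hypothetical, the language codes are the ones used in the examples above (eng, deu, fra, spa), and pdf2txtocr.exe is assumed to be on the PATH.

```python
from pathlib import Path


def build_ocr_command(pdf_path, txt_path, lang="eng", exe="pdf2txtocr.exe"):
    """Build the argument list for OCR conversion of an image based PDF."""
    # -ocrmode 0 writes the recognized text to a text file.
    return [exe, "-ocr", "-lang", lang, "-ocrmode", "0", str(pdf_path), str(txt_path)]


def build_batch(folder, lang="eng"):
    """Return one OCR command per *.pdf file in folder, writing a .txt beside each PDF."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    return [build_ocr_command(p, p.with_suffix(".txt"), lang) for p in pdfs]
```

Each returned list can be passed to subprocess.run to start the conversion. Building one command per file keeps the language choice explicit for every document, which is useful when a folder mixes, for instance, German and French scans.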

The snapshots below show the conversion result. If you have any questions while using the software, please contact us.

Input PDF file: snapshot of the image PDF

Output text file: snapshot of the output text file
