OCR to Any Converter: High-quality optical character recognition in 30 languages

Even though we have entered the era of electronic documents, paper documents have not disappeared. Handling them means spending a lot of time classifying, storing, finding and preserving them. Years ago, many companies began scanning paper documents to PDF or image files and saving them to disk. However, this leaves another problem: when you need to retrieve the content of a scanned file, you cannot simply copy and paste the text from it.

  To meet this need, VeryPDF developed OCR to Any Converter Command Line, software that extracts content from scanned PDF and image files and supports more than 30 OCR languages. The following steps show how to use it.

Step 1. Download OCR to Any Converter

  • This is a Windows command line application. Once the download finishes, you will have a zip file. Extract it to a folder, then check readme.txt and find the executable file.
  • Run the bat file to check the conversion effect and see more examples.

Step 2. OCR a scanned PDF file from the command line and download a language package

  • Here is the usage for your reference.
  • Usage:    ocr2any.exe [options] <PDF-file> <Text-file>
  • Here is the list of languages supported by this software. Before processing a file, please download the OCR language package that matches the language of the content in your PDF or image files.
  • Bulgarian   bul.zip    Catalan     cat.zip    Czech       ces.zip
    Danish      dan.zip    German      deu.zip    Greek       ell.zip
    English     eng.zip    Finnish     fin.zip    French      fra.zip
    Hungarian   hun.zip    Indonesian  ind.zip    Italian     ita.zip
    Latvian     lav.zip    Lithuanian  lit.zip    Dutch       nld.zip
    Norwegian   nor.zip    Polish      pol.zip    Portuguese  por.zip
    Romanian    ron.zip    Russian     rus.zip    Slovak      slk.zip
    Slovenian   slv.zip    Spanish     spa.zip    Serbian     srp.zip
    Swedish     swe.zip    Tagalog     tgl.zip    Turkish     tur.zip
    Ukrainian   ukr.zip    Vietnamese  vie.zip

  • When you launch the OCR engine, add the -lang parameter to choose the OCR language. Here are some examples for your reference.
  • pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.txt
    This command line converts an English PDF to a text file. Because English is the default language, you can also omit the -lang parameter.
    pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.txt
    This command line launches the OCR engine to convert a German PDF to a text file.
    pdf2txtocr.exe -lang spa C:\in.tif C:\out.txt
    This command line converts a Spanish TIFF file to a text file.
    Now let us look at the related parameters.
    -lang <string>    : choose the language for OCR engine
    -ocrmode <int>    : set OCR mode
      -ocrmode 0: output to text file
      -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
      -ocrmode 2: output to plain text based PDF file
      -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
      -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
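
    If you want to call the converter from a script instead of typing each command, the examples and parameters above could be wrapped roughly as follows. This is only a minimal Python sketch, not part of the VeryPDF package: the helper names (language_pack, build_ocr_command, run_ocr) are hypothetical, the executable name pdf2txtocr.exe and the -ocr, -lang and -ocrmode options are taken from the examples above, and it assumes that -ocrmode may be combined with -ocr and -lang and that the executable can be found on your PATH.

```python
import subprocess

# Language names and codes copied from the language package list in this
# article; each package is named <code>.zip (for example deu.zip for German).
LANGUAGE_CODES = {
    "Bulgarian": "bul", "Catalan": "cat", "Czech": "ces", "Danish": "dan",
    "German": "deu", "Greek": "ell", "English": "eng", "Finnish": "fin",
    "French": "fra", "Hungarian": "hun", "Indonesian": "ind", "Italian": "ita",
    "Latvian": "lav", "Lithuanian": "lit", "Dutch": "nld", "Norwegian": "nor",
    "Polish": "pol", "Portuguese": "por", "Romanian": "ron", "Russian": "rus",
    "Slovak": "slk", "Slovenian": "slv", "Spanish": "spa", "Serbian": "srp",
    "Swedish": "swe", "Tagalog": "tgl", "Turkish": "tur", "Ukrainian": "ukr",
    "Vietnamese": "vie",
}

VALID_OCR_MODES = range(5)  # -ocrmode 0 to 4, as listed above


def language_pack(language_name):
    """Return the language package file name, e.g. 'German' -> 'deu.zip'."""
    return LANGUAGE_CODES[language_name] + ".zip"


def build_ocr_command(input_file, output_file, lang="eng", ocrmode=None,
                      exe="pdf2txtocr.exe"):
    """Build the argument list for one conversion without running it."""
    if lang not in LANGUAGE_CODES.values():
        raise ValueError(f"unknown OCR language code: {lang}")
    command = [exe, "-ocr", "-lang", lang]
    if ocrmode is not None:
        if ocrmode not in VALID_OCR_MODES:
            raise ValueError(f"-ocrmode must be 0 to 4, got {ocrmode}")
        # Assumption: -ocrmode can be combined with -ocr and -lang.
        command += ["-ocrmode", str(ocrmode)]
    command += [input_file, output_file]
    return command


def run_ocr(input_file, output_file, **options):
    """Run the converter once.

    check=True makes Python raise CalledProcessError on a non-zero exit
    code; whether the tool reports failures that way is an assumption.
    """
    command = build_ocr_command(input_file, output_file, **options)
    return subprocess.run(command, check=True)
```

    For example, build_ocr_command(r"C:\in.pdf", r"C:\out.txt", lang="deu") yields the same arguments as the German example above, and language_pack("German") tells you that deu.zip is the package to download first. Building the argument list separately from running it lets you print and inspect a command before launching the OCR engine.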

    The following snapshot shows the conversion effect.

    [Snapshot: input German PDF and output text]

    All the German characters are kept correctly in the output. With this method, you can extract German content from a PDF to a text file. If you have any questions while using the software, please contact us.
