Convert scanned PDF file to a new PDF file with OCR and despeckle processing


We're trying to use pdf2txtocrcmd to OCR a PDF. Here are the parameters we're using:

pdf2txtocrcmd.exe -imageopt -ocrmode 3 c:\filein.pdf c:\fileout.pdf

The input file is three pages. Without the -imageopt flag, the PDF OCR works as expected. However, we want better OCR results, so we also want to use the -imageopt flag to despeckle the PDF before OCR. When we use the -imageopt flag, the first two pages of the output PDF are blank (except for your watermark). Only the third page contains the OCR result.

We obviously need all of the input pages in the output.

Thanks for your message. We will research this problem and try to fix it in a new version of the PDF to Text OCR Command Line software shortly.

In the meantime, please download the "Image to PDF OCR Converter Command Line" software from the following web page and try it,

After you download and unzip it to a folder, you can run the following command line to convert your scanned PDF file to a new PDF file with OCR and despeckle processing:

img2pdfnew.exe -ocr 1 -tsocr -despeckle D:\downloads\noOCR_sub.pdf D:\downloads\newOCR_despeckle.pdf

The speckles will be removed from the output PDF file, so the output pages look noticeably cleaner.
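If you need to process more than one scanned PDF, the command above can be wrapped in a small script. The following Python sketch is only an illustration: it uses exactly the flags shown in the command above (-ocr 1 -tsocr -despeckle), while the helper names `build_command` and `convert_folder`, the `_ocr` output suffix, and the folder layout are assumptions made for this example. The meaning of the tool's exit codes is not covered here, so the script only records them.

```python
import subprocess
from pathlib import Path

# Flags copied from the command line above; nothing else is assumed about the tool.
OCR_FLAGS = ["-ocr", "1", "-tsocr", "-despeckle"]


def build_command(exe, src, dst):
    """Return the argument list for one img2pdfnew.exe conversion."""
    return [str(exe), *OCR_FLAGS, str(src), str(dst)]


def convert_folder(exe, in_dir, out_dir, run=subprocess.run):
    """Run the converter once per PDF in in_dir and collect the exit codes.

    The `run` parameter exists only so the function can be exercised
    without the real executable; by default it is subprocess.run.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for src in sorted(in_dir.glob("*.pdf")):
        # Hypothetical naming scheme: scan.pdf -> scan_ocr.pdf
        dst = out_dir / (src.stem + "_ocr.pdf")
        completed = run(build_command(exe, src, dst))
        results.append((src.name, completed.returncode))
    return results
```

A call might look like `convert_folder(r"C:\tools\img2pdfnew.exe", r"D:\downloads", r"D:\downloads\ocr")`, where all three paths are placeholders to replace with your own. It is still worth opening each output file afterwards to confirm that every page is present.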

Here is the source PDF file. It contains speckles, and its text is not selectable,


Here is the OCRed PDF file. As you can see, the speckles are removed and the text is selectable, so you can select the text and copy it into MS Word easily,


