Hi,
We're trying to use pdf2txtocrcmd to OCR a PDF. Here are the parameters we're using:
pdf2txtocrcmd.exe -imageopt -ocrmode 3 c:\filein.pdf c:\fileout.pdf
The input file is three pages long. Without the -imageopt flag, the PDF OCR works as expected. However, we're trying to get better OCR results, so we also want to use the -imageopt flag to despeckle the PDF before OCR. When we use the -imageopt flag, the first two pages of the output PDF are blank (except for your watermark). Only the third page contains the OCR result.
We obviously need all of the input pages in the output.
Customer
-----------------------------------
Thanks for your message. We will research this problem and try to fix it in the next version of the PDF to Text OCR Command Line software shortly.
In the meantime, please download the "Image to PDF OCR Converter Command Line" software from the following web page and give it a try:
https://www.verypdf.com/app/image-to-pdf-ocr-converter/try-and-buy.html
https://www.verypdf.com/tif2pdf/image2pdf_cmd_ocr_trial.zip
After you download and unzip it to a folder, you can run the following command line to convert your scanned PDF file to a new PDF file with the OCR and despeckle functions applied:
img2pdfnew.exe -ocr 1 -tsocr -despeckle D:\downloads\noOCR_sub.pdf D:\downloads\newOCR_despeckle.pdf
The speckles will be removed from the output PDF file, so the output should look noticeably cleaner.
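If you have more than one scanned PDF to process, the command above can be wrapped in a small script. The following is only a minimal Python sketch and is not part of the VeryPDF package: the helper names `build_command` and `convert_folder`, the `_ocr` output suffix, and the folder layout are all assumptions made for illustration. The only flags used are the ones shown in the command above.

```python
import subprocess
from pathlib import Path

# Flags copied verbatim from the img2pdfnew.exe command shown above.
# No other options of the tool are assumed here.
OCR_FLAGS = ["-ocr", "1", "-tsocr", "-despeckle"]


def build_command(exe, src, dst):
    """Return the argument list for one conversion (hypothetical helper)."""
    return [str(exe), *OCR_FLAGS, str(src), str(dst)]


def convert_folder(exe, in_dir, out_dir, dry_run=True):
    """Build, and optionally run, one command per PDF found in in_dir.

    With dry_run=True nothing is executed; the commands are only returned
    so they can be inspected first.
    """
    out_dir = Path(out_dir)
    commands = []
    for src in sorted(Path(in_dir).glob("*.pdf")):
        # The "_ocr" suffix is an arbitrary naming choice for this sketch.
        dst = out_dir / f"{src.stem}_ocr.pdf"
        cmd = build_command(exe, src, dst)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # The meaning of the tool's exit code is not documented in this
            # thread, so the result is not checked here.
            subprocess.run(cmd)
    return commands
```

For example, `convert_folder(r"C:\tools\img2pdfnew.exe", r"D:\downloads", r"D:\downloads\ocr")` (paths are placeholders) returns the list of commands without running them; pass `dry_run=False` to actually execute them. Since the original problem was missing pages, it is worth opening each output file and checking that every page is present.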
Here is the source PDF file. It contains speckles, and its text is not selectable.
Here is the OCRed PDF file. As you can see, the speckles have been removed and the text is selectable, so you can select the text and copy it into MS Word easily.