Convert scanned PDF file to a new PDF file with OCR and despeckle processing
We're trying to use pdf2txtocrcmd to OCR a PDF. Here are the parameters we're using:
pdf2txtocrcmd.exe -imageopt -ocrmode 3 c:\filein.pdf c:\fileout.pdf
The input file is three pages. When we don't use the -imageopt flag, the PDF OCR works as expected. However, we're trying to get better results from the OCR, so we also want to use the -imageopt flag to despeckle the PDF before OCR. When we use the -imageopt flag, the first two pages of the output PDF are blank (except for your watermark). Only the third page has the OCR.
Obviously, we need every input page to appear in the output.
Thanks for your message. We will research this problem and try to fix it in the new version of the PDF to Text OCR Command Line software shortly.
In the meantime, please download the "Image to PDF OCR Converter Command Line" software from the following web page and try it.
After you download and unzip it to a folder, you can run the following command line to convert your scanned PDF file to a new PDF file with OCR and despeckle processing:
img2pdfnew.exe -ocr 1 -tsocr -despeckle D:\downloads\noOCR_sub.pdf D:\downloads\newOCR_despeckle.pdf
The speckles will be removed from the output PDF file, and the output looks clear.
Here is the source PDF file. It contains speckles, and its text is not selectable:
Here is the OCRed PDF file. As you can see, the speckles are removed and the text is selectable, so you can select text and copy it into MS Word easily:
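If you need to process more than one scanned PDF, the command above can be wrapped in a small script. The following is only a minimal Python sketch, not part of the product: it reuses exactly the flags shown above (-ocr 1 -tsocr -despeckle), and the function names, the "_ocr" output suffix, and the assumption that img2pdfnew.exe is on the PATH are all hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path

# Flags copied verbatim from the command line above; no other options are assumed.
OCR_DESPECKLE_FLAGS = ["-ocr", "1", "-tsocr", "-despeckle"]


def build_command(src, dst, exe="img2pdfnew.exe"):
    """Return the argument list for one OCR + despeckle conversion.

    `exe` is assumed to be on the PATH; pass a full path otherwise.
    """
    return [exe, *OCR_DESPECKLE_FLAGS, str(src), str(dst)]


def convert_folder(in_dir, out_dir, exe="img2pdfnew.exe"):
    """Convert every *.pdf in in_dir and return the names that produced no output.

    The "<name>_ocr.pdf" naming scheme is a hypothetical convention.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(in_dir).glob("*.pdf")):
        dst = out_dir / (src.stem + "_ocr.pdf")
        subprocess.run(build_command(src, dst, exe))
        # Only check that an output file appeared; the tool's exit codes
        # are not documented here, so they are not interpreted.
        if not dst.exists():
            failed.append(src.name)
    return failed
```

For example, `convert_folder(r"D:\downloads\scans", r"D:\downloads\ocr")` (hypothetical folder names) would run the same conversion once per PDF and list any files for which no output was written.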
- VeryPDF Image Processing SDK, Automatically clean-up images, including auto-rotation, auto-deskew, crop, noise removal, etc. operations.
- Convert scanned TIFF documents and multiple images to searchable PDF (OCR) with Image to PDF OCR Converter Command Line tools
- How to call pdf2txtocr software from Java source code to convert from Non Searchable PDF files to Searchable PDFs?
- I want to draw a region around the field (invoice number, invoice amount, invoice date …) to get its coordinates (x,y,width,height), and then OCR this field from command line.
- We are trying to convert scanned pdfs/images(tiff/jpgs) to searchable pdf files
- Split PDF file by search keyword in text contents, extract a group of pages from PDF file
- How to convert an image based PDF file to editable PDF file?
- How to replace a text word in a scanned PDF file or an image based PDF file or a graphics based PDF file or a AutoCAD drawing PDF file?
- Is there a way to convert scanned Color PDFs to searchable Color PDF?
- Our product does use OCR technology to extract content from Graphic Images
- How to convert PNM to PDF?
- Batch convert all TIFF files in sub-folders to PDF files
- How to convert PNG image to PDF file?
- Can I see the PDF file immediately after conversion if I use Image2PDF v3.2?
- I am testing your Image to PDF Converter GUI and Command Line products to convert incoming fax file on a server to pdf.