Despeckle PDF & image files, then convert a scanned PDF to a Word document

  When you need to convert a scanned PDF to Word, you may meet scan files with speckle, which will affect the conversion quality. Speckle is the presence of black points of noise in images acquired by a scanner or received by fax. Cleaning images is an important preprocessing step that improves the compression rate, the visual appearance and the OCR accuracy. Despeckle is a speckle detection and removal technique that is easy to perform: you specify the maximum width and height of isolated black elements to be considered speckle.
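
  To make the idea concrete, here is a minimal Python sketch of despeckle by maximum width and height. It is only an illustration, not the algorithm used inside VeryPDF OCR to Any Converter: the function name `despeckle`, the 8-connectivity rule and the list-of-lists image format (1 = black, 0 = white) are all assumptions made for this example.

```python
from collections import deque


def despeckle(image, max_width=2, max_height=2):
    """Return a copy of a binary image with small isolated black blobs removed.

    Assumed format (hypothetical, for illustration): a list of rows,
    where 1 is a black pixel and 0 is a white pixel.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    result = [row[:] for row in image]          # never modify the input
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one 8-connected black component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Measure the bounding box of the component.
            ys = [p[0] for p in component]
            xs = [p[1] for p in component]
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            # A component that fits inside the limits is treated as speckle.
            if width <= max_width and height <= max_height:
                for y, x in component:
                    result[y][x] = 0
    return result


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],   # a single isolated black pixel (speckle)
    [0, 0, 0, 1, 1, 1],   # a 3x3 block, too large to be speckle
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
cleaned = despeckle(page, max_width=2, max_height=2)
```

  In this sketch the lone pixel is cleared while the 3x3 block is kept, because only elements that fit inside the given width and height count as speckle. Raising the limits removes larger dirt but also risks deleting small real marks such as dots and commas.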

   VeryPDF OCR to Any Converter Command Line was developed with despeckle and OCR technology. With this software, you can despeckle PDF and image files, then convert the scanned PDF to Word. In the following part, I will show you how to use this software.

Step 1. Download OCR to Any Converter

  • When the download finishes, you will have a zip file. Extract this zip file to a folder, then you can check the elements in it.
  • If you are not familiar with command line software, a GUI version is also available.

Step 2. Despeckle PDF and convert PDF to Word

  • When you run the conversion, please refer to the usage and examples below.
  • Usage: ocr2any.exe [options] <PDF-file> <Text-file>
  • To despeckle a PDF or image, please refer to the following command line template.
  • ocr2any.exe -imageopt C:\in.tif C:\out.tif
    This command line removes dirt from a scanned TIFF file and deskews it.
    The following snapshot shows an example of despeckling a PDF or image. With this software, the black points of noise in images can be removed.
    [Snapshot: deskew image]
    ocr2any.exe -imageopt -rotate 45 C:\in.png C:\out.tif
    This command line despeckles the PDF or image and rotates it by 45 degrees.
    ocr2any.exe -imageopt -threshold 0 C:\in.tif C:\out.bmp
    This command line despeckles the PDF or image and sets the threshold to 0, which means auto.
    Related Parameters:
    -rotate <int>      : rotate pages before OCR
    -threshold <int>   : set the lightness threshold used to convert the image to B&W, from 1 to 255; 0 is auto, default is -1
    -imageopt          : deskew and despeckle images or PDF files automatically

Now let us check the conversion effect from the following snapshot.

[Snapshot: compare despeckle tiff]
  This snapshot shows the result of despeckling a PDF.
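
  If you call the despeckle commands from a script, a small helper can assemble the arguments for you. The following Python sketch is only a suggestion: the function name `build_imageopt_command` is hypothetical, and it uses nothing beyond the -imageopt, -rotate and -threshold options listed above. The range check simply follows the documented threshold values (0 for auto, 1 to 255 otherwise).

```python
def build_imageopt_command(src, dst, rotate=None, threshold=None,
                           exe="ocr2any.exe"):
    """Build the argument list for a despeckle/deskew run.

    Hypothetical helper: it only assembles options named in this article
    and does not start the program itself.
    """
    args = [exe, "-imageopt"]
    if rotate is not None:
        args += ["-rotate", str(int(rotate))]
    if threshold is not None:
        # Documented values: 0 is auto, 1 to 255 is an explicit threshold.
        if not 0 <= threshold <= 255:
            raise ValueError("threshold must be 0 (auto) or 1 to 255")
        args += ["-threshold", str(int(threshold))]
    args += [src, dst]
    return args


command = build_imageopt_command(r"C:\in.png", r"C:\out.tif", rotate=45)
# To execute it, pass the list to subprocess.run(command, check=True)
# on a machine where ocr2any.exe is installed and on the PATH.
```

  Passing the arguments as a list avoids quoting problems when a file path contains spaces.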

  • To convert the scanned PDF to Word, please refer to the following command line template.
    ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.doc
    Related Parameters:
    -ocr2    : use the enhanced OCR module to convert scanned PDF and image files to RTF, DOC, TXT, CSV, Excel, HTML files
    -ocr2aor : detect the page direction and rotate it automatically when -ocr2 is used
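
  When you have several scanned PDF files, you can generate one OCR command per file. The Python sketch below is a rough example under stated assumptions: the helper name `build_ocr_commands` is hypothetical, the output file is assumed to sit next to the input with a .doc extension, and only the -ocr2 and -ocr2aor options from this article are used.

```python
from pathlib import PureWindowsPath


def build_ocr_commands(pdf_paths, auto_rotate=True, exe="ocr2any.exe"):
    """Return one argument list per scanned PDF.

    Hypothetical helper: the output name is derived by swapping the
    extension to .doc, which is an assumption of this sketch.
    """
    commands = []
    for pdf in pdf_paths:
        src = PureWindowsPath(pdf)
        dst = src.with_suffix(".doc")       # C:\in.pdf -> C:\in.doc
        args = [exe, "-ocr2"]
        if auto_rotate:
            args.append("-ocr2aor")         # only meaningful together with -ocr2
        args += [str(src), str(dst)]
        commands.append(args)
    return commands


commands = build_ocr_commands([r"C:\in.pdf", r"C:\scans\page2.pdf"])
# Each entry can be passed to subprocess.run(entry, check=True)
# on a machine where ocr2any.exe is installed.
```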

Now you should have a rough idea of how to despeckle a PDF and convert it to Word. If you have any questions while using the software, please contact us as soon as possible.
