When you need to convert a scanned PDF to Word, you may meet some scanned files with speckle, which will affect the conversion quality. Speckle is the presence of black points of noise in images acquired by a scanner or received by fax. Cleaning images is a very important preprocessing step to improve the compression rate, the visual appearance and the OCR accuracy. Despeckle is a speckle detection and removal technique that is easy to perform: you specify the maximum width and height of isolated black elements to be considered as speckle.
VeryPDF OCR to Any Converter Command Line was developed with despeckle and OCR technology. With this software, you can despeckle PDF and image files, then convert the scanned PDF to Word. In the following part, I will show you how to use this software.
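The size rule described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code used inside OCR to Any Converter: the function name `despeckle`, the list-of-lists image format (1 = black, 0 = white) and the use of 8-connected components are all assumptions made for the example.

```python
from collections import deque


def despeckle(image, max_width, max_height):
    """Return a copy of a binary image (1 = black, 0 = white) in which every
    isolated black component no larger than max_width x max_height is erased."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    result = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one 8-connected black component starting at (r, c).
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Measure the bounding box of the component.
            ys = [p[0] for p in component]
            xs = [p[1] for p in component]
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            # Small isolated elements are treated as speckle and turned white.
            if width <= max_width and height <= max_height:
                for y, x in component:
                    result[y][x] = 0
    return result
```

With a limit of 2 x 2 pixels, a single stray black pixel would be erased while a 3 x 3 black block would be kept, because its bounding box exceeds the limit.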
Step 1. Download OCR to Any Converter
- When the download finishes, you will have a zip file. Please extract it to a folder, then you can check the files in it.
- If you are not familiar with command line software, there is also a GUI version available for you.
Step 2. Despeckle PDF and convert PDF to Word
- When you run the conversion, please refer to the usage and examples.
- Usage: ocr2any.exe [options] <PDF-file> <Text-file>
- To despeckle a PDF or image, please refer to the following command line template.
ocr2any.exe -imageopt C:\in.tif C:\out.tif
With this command line, you can remove dirt from scanned TIFF files and deskew them.
Now let us check an example of despeckling a PDF or image in the following snapshot. With this software, the black points of noise in images can be removed.
ocr2any.exe -imageopt -rotate 45 C:\in.png C:\out.tif
With this command line, you can despeckle a PDF or image and rotate it by 45 degrees.
ocr2any.exe -imageopt -threshold 0 C:\in.tif C:\out.bmp
With this command line, you can despeckle a PDF or image and set the threshold to 0, which means the threshold is chosen automatically.
-rotate <int> : rotate pages before OCR
-threshold <int> : adjust the lightness threshold used to convert the image to black and white, from 1 to 255; 0 is auto, default is -1
-imageopt : deskew and despeckle images or PDF files automatically
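If you call the image clean-up from a script, the templates above can be assembled programmatically. The following is a minimal sketch that uses only the options shown in this article (`-imageopt`, `-rotate`, `-threshold`). The helper names `build_imageopt_command` and `run_imageopt` are hypothetical, and the sketch assumes `ocr2any.exe` can be found on the PATH or is given by full path.

```python
import subprocess


def build_imageopt_command(src, dst, rotate=None, threshold=None,
                           exe="ocr2any.exe"):
    """Build the argument list for a despeckle/deskew run.

    Only options documented in the article are used. Leaving rotate or
    threshold as None omits that option from the command line.
    """
    if threshold is not None and not 0 <= threshold <= 255:
        # The article documents 1..255, with 0 meaning "auto".
        raise ValueError("threshold must be between 0 and 255")
    args = [exe, "-imageopt"]
    if rotate is not None:
        args += ["-rotate", str(rotate)]
    if threshold is not None:
        args += ["-threshold", str(threshold)]
    # Options come first, then the input file, then the output file.
    args += [src, dst]
    return args


def run_imageopt(src, dst, **options):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_imageopt_command(src, dst, **options), check=True)


# Example call (needs ocr2any.exe installed, so it is left commented out):
# run_imageopt(r"C:\in.png", r"C:\out.tif", rotate=45)
```

Passing the arguments as a list rather than one string avoids quoting problems when a file path contains spaces.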
Now let us check the conversion effect in the following snapshot.
- When you need to convert the scanned PDF to Word, please refer to the following command line template.
ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.doc
-ocr2 : use the enhanced OCR module to convert scanned PDF and image files to RTF, DOC, TXT, CSV, Excel and HTML files
-ocr2aor : detect the page direction and rotate the page automatically when -ocr2 is used
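When there are many scanned PDF files, the same template can be generated once per file. The sketch below only prepares the command lines and does not run them. The names `build_ocr_command` and `plan_batch` are hypothetical, the folder paths are examples, and writing each result as a `.doc` file with the same base name is an assumption chosen to match the template above.

```python
from pathlib import PureWindowsPath


def build_ocr_command(src, dst, auto_rotate=True, exe="ocr2any.exe"):
    """Build the argument list for one scanned-PDF-to-Word conversion."""
    args = [exe, "-ocr2"]
    if auto_rotate:
        # -ocr2aor only has an effect together with -ocr2.
        args.append("-ocr2aor")
    args += [str(src), str(dst)]
    return args


def plan_batch(pdf_paths, out_dir):
    """Return one command per input PDF, writing <name>.doc into out_dir."""
    out_dir = PureWindowsPath(out_dir)
    commands = []
    for path in pdf_paths:
        src = PureWindowsPath(path)
        dst = out_dir / (src.stem + ".doc")
        commands.append(build_ocr_command(src, dst))
    return commands


# Print the planned command lines for two example files.
for command in plan_batch([r"C:\scans\a.pdf", r"C:\scans\b.pdf"], r"C:\out"):
    print(" ".join(command))
```

Each returned list can then be passed to `subprocess.run(command, check=True)` on a machine where the converter is installed.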
Now you should have a rough idea of how to despeckle a PDF and convert it to Word. If you have any questions while using the software, please contact us as soon as possible.