This article introduces a way to convert images and scanned PDF files to searchable PDF while retaining the original color and layout. The recommended tool is VeryPDF PDF to Text OCR Converter Command Line V 3.0.
The former version of VeryPDF PDF to Text OCR Converter can only generate black and white PDF from a scanned PDF or image. To meet some customers' needs, the new version adds a new option -
You can upgrade the product for free. The new feature of V 3.0 enables you to convert an image or scanned PDF to black and white PDF, or to color PDF with searchable text. -ocrmode <int> accepts the values 0, 1, 2, 3, and 4.
How to convert scanned PDF to searchable PDF?
If the input is a scanned PDF, you need to use -ocrmode <int> together with -ocr.
The OCR modes cover the following situations:
- -ocrmode 0: To convert a scanned PDF to TXT, use -ocrmode 0 as in the following command line: pdf2txtocr.exe -ocr -ocrmode 0 input.pdf output.txt where
  - pdf2txtocr.exe is the executable file;
  - -ocr is used when the input is a scanned PDF;
  - -ocrmode 0 is for generating a text file;
  - input.pdf represents the input file; and
  - output.txt stands for the output file.
- -ocrmode 1: To convert a scanned PDF to searchable PDF with the original color retained, use -ocrmode 1 as in pdf2txtocr.exe -ocr -ocrmode 1 input.pdf 1.pdf
Figure 2. Result color PDF
- -ocrmode 2: To create a black & white searchable PDF without images, use -ocrmode 2 as in pdf2txtocr.exe -ocr -ocrmode 2 input.pdf 2.pdf
- -ocrmode 3: To create a black & white searchable PDF with image, use -ocrmode 3 as in pdf2txtocr.exe -ocr -ocrmode 3 input.pdf 3.pdf
- -ocrmode 4: To convert a scanned PDF or image to searchable PDF in color, use -ocrmode 4 as in pdf2txtocr.exe -ocr -ocrmode 4 input.tif 3.pdf
Choose whichever of these modes suits you to convert a scanned PDF to a searchable text file or PDF. The rest of this article shows how to convert an image to searchable PDF.
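The command lines above all follow the same pattern, so they can be generated from a small script. The following is a minimal Python sketch, not part of the VeryPDF product: the helper name build_scanned_pdf_command and the OCR_MODES table are hypothetical, and it assumes pdf2txtocr.exe is on the PATH. It uses only the -ocr and -ocrmode options described in this article.

```python
import subprocess

# Mode descriptions as given in this article (not taken from the tool itself).
OCR_MODES = {
    0: "plain text file",
    1: "searchable PDF with original color retained",
    2: "black & white searchable PDF without images",
    3: "black & white searchable PDF with image",
    4: "searchable PDF in color",
}


def build_scanned_pdf_command(input_pdf, output_file, ocrmode, exe="pdf2txtocr.exe"):
    """Return the argument list for converting a scanned PDF (hypothetical helper)."""
    if ocrmode not in OCR_MODES:
        raise ValueError(f"unsupported -ocrmode value: {ocrmode}")
    # -ocr is included because the input is a scanned PDF.
    return [exe, "-ocr", "-ocrmode", str(ocrmode), str(input_pdf), str(output_file)]


if __name__ == "__main__":
    cmd = build_scanned_pdf_command("input.pdf", "1.pdf", 1)
    print(" ".join(cmd))
    # Uncomment on a machine where the tool is installed:
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.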
How to convert image to searchable PDF?
The option -ocrmode <int> is required to convert an image to searchable PDF, but -ocr is not necessary when the input is an image. Taking a TIF image as an example, you can convert it to different types of searchable PDF with the following command lines:
- pdf2txtocr.exe -ocrmode 0 input.tif 0.pdf
- pdf2txtocr.exe -ocrmode 2 input.tif 2.pdf
- pdf2txtocr.exe -ocrmode 3 input.tif 3.pdf
- pdf2txtocr.exe -ocrmode 4 input.tif 4.pdf
I strongly recommend using -ocrmode 3 when converting an image that contains tables. The following shows the comparison between the original TIF (Figure 4) and the resulting searchable PDF (Figure 5).
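When there are many TIF files to convert, the same command can be generated once per file. The sketch below is only an illustration under stated assumptions: the helper name build_image_commands is hypothetical, each output PDF is simply named after its input image, and pdf2txtocr.exe is assumed to be on the PATH. It defaults to -ocrmode 3 and omits -ocr because the input is an image.

```python
import subprocess
from pathlib import Path


def build_image_commands(folder, ocrmode=3, exe="pdf2txtocr.exe"):
    """Build one command per .tif file in folder (hypothetical helper)."""
    commands = []
    for tif in sorted(Path(folder).glob("*.tif")):
        # Assumption: write the PDF next to the image, with the same base name.
        output_pdf = tif.with_suffix(".pdf")
        # No -ocr here: the article says it is not needed for image input.
        commands.append([exe, "-ocrmode", str(ocrmode), str(tif), str(output_pdf)])
    return commands


if __name__ == "__main__":
    for cmd in build_image_commands("."):
        print(" ".join(cmd))
        # Uncomment on a machine where the tool is installed:
        # subprocess.run(cmd, check=True)
```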
How to get VeryPDF PDF to Text OCR Converter?
If you want to try VeryPDF PDF to Text OCR Converter Command Line V 3.0, please click here to download.
Please feel free to leave a message if you have any questions. For more information, you can contact the VeryPDF support team.