This article shares tips on how to convert scanned PDF to searchable PDF without losing color. VeryPDF PDF to Text Command Line v3.0 offers different ways to create searchable PDF depending on the input file. Some input PDFs contain both scanned pages and editable pages; others contain only scanned pages. To convert such PDFs to searchable PDF without losing the original color, you can try the option -ocrmode <int>.
-ocrmode 1 vs -ocrmode 4
The -ocrmode <int> option accepts four values. To retain color, you can use -ocrmode 1 or -ocrmode 4. The following table compares these two modes, both of which generate color PDF with searchable text:
| |-ocrmode 1|-ocrmode 4|
|---|---|---|
|supported input formats|scanned PDF|PDF and images|
|text in output PDF|vector-based, searchable|raster-based, searchable|
|quality of the magnified text|high quality|loses clarity|
|text layer|under original PDF pages|hidden|
|original PDF pages|retained|removed|
When to use -ocrmode 1?
If the input PDF contains only scanned pages, -ocrmode 1 is recommended, as in pdf2txtocr.exe -ocr -ocrmode 1 ocr.pdf ocr1.pdf, where
- pdf2txtocr.exe calls VeryPDF PDF to Text Converter.
- -ocr calls the built-in OCR engine. This option is required when converting scanned PDF.
- -ocrmode 1 recognizes text in the scanned PDF and inserts a new text layer under the original PDF pages.
- ocr.pdf represents the input file.
- ocr1.pdf stands for the output file.
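For scripted use, the command above can be assembled and launched from Python. The following is only a minimal sketch: the helper names mode1_command and run_mode1 are hypothetical, pdf2txtocr.exe is assumed to be on the PATH, and the article does not document the tool's exit codes, so the return value is passed back without interpretation.

```python
import subprocess


def mode1_command(input_pdf, output_pdf, exe="pdf2txtocr.exe"):
    """Build the argument list for the -ocrmode 1 command shown above.

    Only options named in the article are used: -ocr and -ocrmode 1.
    The exe default assumes the converter is reachable via PATH.
    """
    return [exe, "-ocr", "-ocrmode", "1", str(input_pdf), str(output_pdf)]


def run_mode1(input_pdf, output_pdf, exe="pdf2txtocr.exe"):
    """Launch the converter and return its exit code.

    Assumption: the meaning of the exit code is not described in the
    article, so the caller decides how to treat it.
    """
    completed = subprocess.run(mode1_command(input_pdf, output_pdf, exe))
    return completed.returncode
```

Calling run_mode1("ocr.pdf", "ocr1.pdf") would launch the same command line as the one explained above. Passing the arguments as a list avoids quoting problems when file names contain spaces.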
The illustrations below show the effects of conversion from a scanned PDF to searchable PDF. The text in the result PDF can be magnified by any amount without lowering quality.
Fig. 1 Input scanned PDF
Fig. 2 After using -ocrmode 1 Fig. 3 Magnified 16 times
[Tips] -ocrmode 1 only recognizes text in scanned PDF. It can’t recognize text in images. If the input PDF has editable pages, the output may end up with two text layers: the newly created one and the one that belongs to the original editable pages. Using -ocrmode 4 avoids this problem.
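The choice between the two modes can be written down as a small decision rule. This is a rough sketch rather than anything the tool provides: choose_ocrmode is a hypothetical helper, the has_editable_pages flag must be supplied by the caller (the article does not describe a way to detect editable pages), and only the image extensions that appear in this article (.tif and .png) are recognized.

```python
from pathlib import Path

# Image extensions that appear in the article's examples; other formats
# are not covered here and are rejected rather than guessed at.
IMAGE_SUFFIXES = {".tif", ".png"}


def choose_ocrmode(input_path, has_editable_pages=False):
    """Return 1 or 4, following the guidance in this article.

    - images                      -> 4 (mode 1 cannot read images)
    - PDF with editable pages     -> 4 (avoids a second text layer)
    - PDF with only scanned pages -> 1 (text stays sharp when magnified)
    """
    suffix = Path(input_path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return 4
    if suffix == ".pdf":
        return 4 if has_editable_pages else 1
    raise ValueError(f"input type not covered by this sketch: {suffix!r}")
```

For example, choose_ocrmode("ocr.pdf") gives 1, while choose_ocrmode("mixed.pdf", has_editable_pages=True) and choose_ocrmode("ocr.png") both give 4.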
When to use -ocrmode 4?
-ocrmode 4 is provided to convert images to searchable PDF, scanned PDF to searchable PDF, and PDF that mixes scanned and searchable pages to searchable PDF. When you use -ocrmode 4 to convert scanned PDF, the text in the result PDF loses clarity as it is magnified. The illustrations below show the effects:
Fig. 4 After using -ocrmode 4 Fig. 5 Magnified 16 times
The following are two command lines for conversion from scanned PDF to searchable PDF:
- pdf2txtocr.exe -ocr -ocrmode 4 -bitcount 24 ocr.pdf color.pdf
- pdf2txtocr.exe -ocr -ocrmode 4 ocr.pdf grey.pdf
The illustrations below show the effects of the two command lines:
Fig. 7 1st command line Fig. 8 2nd command line
The following command lines convert images to searchable PDF:
- pdf2txtocr.exe -ocrmode 4 ocr.tif color.pdf
- pdf2txtocr.exe -ocrmode 4 ocr.png color.pdf
[Tips] When converting an image to PDF, -ocr is not required, as in the two image command lines above. When converting scanned PDF, -ocr must appear, as in the two scanned PDF command lines above. Moreover, to retain the original color when creating searchable PDF from scanned PDF, you need to use -bitcount 24. Otherwise, the result PDF will be grey, as in Fig. 8.
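The rules in this tip can be captured in one helper that builds the -ocrmode 4 command line for either kind of input. It is a sketch under stated assumptions: mode4_command is a hypothetical name, pdf2txtocr.exe is assumed to be on the PATH, and input type is judged by file extension alone. Following the article's examples, -ocr and -bitcount 24 are added only for PDF input.

```python
from pathlib import Path


def mode4_command(input_path, output_pdf, keep_color=True, exe="pdf2txtocr.exe"):
    """Build an -ocrmode 4 argument list following the tips above.

    - PDF input:   -ocr is required; -bitcount 24 keeps the original
                   color, and leaving it out yields a grey result.
    - image input: -ocr is not required, and the article's image
                   examples do not use -bitcount.
    """
    is_pdf = Path(input_path).suffix.lower() == ".pdf"
    args = [exe]
    if is_pdf:
        args.append("-ocr")
    args += ["-ocrmode", "4"]
    if is_pdf and keep_color:
        args += ["-bitcount", "24"]
    args += [str(input_path), str(output_pdf)]
    return args
```

With the file names used in this article, mode4_command("ocr.pdf", "color.pdf") reproduces the color command line, mode4_command("ocr.pdf", "grey.pdf", keep_color=False) reproduces the grey one, and mode4_command("ocr.tif", "color.pdf") reproduces the image command line. The resulting list can be handed to subprocess.run to launch the converter.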