Sometimes we need to extract text from a scanned file so that we can reuse its content. In this article, VeryPDF will show you one method of extracting text from scanned image files, taking a multipage TIFF file as an example. The software I use is PDF to Text OCR Converter Command Line, which can extract content from image-based PDF files and other files.
Step 1. Download PDF to Text OCR Converter Command Line.
- Our website lists only a server version and a developer version. If you are an ordinary user on a laptop or desktop computer, please use the server version.
- Once the download finishes, extract the zip file and open a Command Prompt (MS-DOS) window; then you can run the conversion.
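If you would rather script this step, the zip package can also be extracted with Python's standard `zipfile` module. This is a minimal sketch: the archive name and target folder are hypothetical, and the only thing assumed about the package is that it contains `pdf2txtocr.exe` somewhere inside it.

```python
import zipfile
from pathlib import Path
from typing import List, Optional


def extract_package(zip_path: str, dest_dir: str) -> List[str]:
    """Extract the downloaded zip package into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest)
    return names


def find_executable(dest_dir: str, exe_name: str = "pdf2txtocr.exe") -> Optional[Path]:
    """Search the extracted folder recursively; return the first match or None."""
    return next(Path(dest_dir).rglob(exe_name), None)
```

For example, `extract_package("pdf2txtocr.zip", r"C:\pdf2txtocr")` followed by `find_executable(r"C:\pdf2txtocr")` gives you the full path to pass to the command line (both paths are placeholders).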
Step 2. Extract text from a multipage TIFF file.
- When you use this software, please refer to the usage and examples.
- Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- When extracting text from a TIFF file, please refer to the following command line templates. You can convert a scanned TIFF file either to a text file or to a text-based (searchable) PDF file.
pdf2txtocr.exe C:\in.tif C:\out.txt
This command extracts the content of a scanned TIFF file directly to a text file.
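If you want to reuse the extracted text in your own program, the same command can be launched from a script. The sketch below uses Python's `subprocess.run` and assumes that `pdf2txtocr.exe` returns a non-zero exit code on failure and writes UTF-8 text; neither behavior is documented in this article, so check readme.txt before relying on them.

```python
import subprocess
from pathlib import Path


def tiff_to_text(in_path: str, out_path: str, exe: str = "pdf2txtocr.exe") -> str:
    """Run `pdf2txtocr.exe <in.tif> <out.txt>` and return the extracted text."""
    # Passing a list means paths containing spaces need no extra quoting.
    # check=True raises CalledProcessError if the tool exits with a non-zero code
    # (assumed to signal failure).
    subprocess.run([exe, in_path, out_path], check=True)
    # The output encoding is an assumption; undecodable bytes are replaced, not fatal.
    return Path(out_path).read_text(encoding="utf-8", errors="replace")
```

Usage: `text = tiff_to_text(r"C:\in.tif", r"C:\out.txt", exe=r"C:\pdf2txtocr\pdf2txtocr.exe")`, where the folder holding the executable is a placeholder.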
pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
This command converts the TIFF file to a searchable PDF file from which you can copy text freely. It also sets the lightness threshold (here 200) and, with -ocrmode 3, outputs an OCRed black-and-white (BW) PDF file with a hidden text layer.
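To see what a lightness threshold does when an image is converted to black and white, here is a small illustration on a grid of RGB pixels. It is not the converter's own algorithm: the luma weights (ITU-R BT.601), the 0-255 scale, and the "at or above the threshold becomes white" rule are all assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def lightness(pixel: Pixel) -> float:
    """Approximate perceived lightness with BT.601 luma weights (an assumption)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_black_and_white(page: List[List[Pixel]], threshold: int) -> List[List[int]]:
    """Return 255 (white) for pixels at or above the threshold, else 0 (black)."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold is assumed to be on a 0-255 scale")
    return [[255 if lightness(px) >= threshold else 0 for px in row] for row in page]
```

Under these assumptions, raising the threshold turns more light-gray pixels black: a mid-gray pixel `(180, 180, 180)` becomes black with a threshold of 200 but stays white with a threshold of 150, while dark ink and white paper are unaffected.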
pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.tif C:\out.pdf
This command rotates the pages by 90 degrees before OCR and then converts the TIFF file to a searchable PDF file. With -ocrmode 4, the output is an OCRed color PDF file with a hidden text layer.
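Rotating before OCR matters because a page scanned sideways has vertical text lines that OCR reads poorly. The sketch below rotates a page, modeled as a grid of pixels, by multiples of 90 degrees. Whether `-rotate 90` turns pages clockwise or counterclockwise, and whether other angles are accepted, is not stated here; this example simply assumes clockwise for positive values.

```python
from typing import List, TypeVar

T = TypeVar("T")


def rotate_page(page: List[List[T]], degrees: int) -> List[List[T]]:
    """Rotate a grid clockwise by a multiple of 90 degrees (negative = counterclockwise)."""
    if degrees % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    turns = (degrees // 90) % 4  # e.g. -90 becomes 3 clockwise turns
    result = [list(row) for row in page]
    for _ in range(turns):
        # Reverse the row order, then transpose: one 90-degree clockwise turn.
        result = [list(row) for row in zip(*result[::-1])]
    return result
```

For example, the 2x3 grid `[[1, 2, 3], [4, 5, 6]]` rotated by 90 becomes the 3x2 grid `[[4, 1], [5, 2], [6, 3]]`, so the original top row now runs down the right-hand column.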
Now let us look at the parameters related to the conversion.
-rotate <int> : rotate pages before OCR
-threshold <int> : lightness threshold used to convert the image to B&W
-ocrmode <int> : set OCR mode
-ocrmode 0: output to text file
-ocrmode 1: OCR PDF pages and insert a new text layer under the original PDF pages
-ocrmode 2: output to plain text based PDF file
-ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
-ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
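When these parameters are combined in scripts, it can help to assemble the command line in one place and reject mode values that are not in the list above. This is a hypothetical helper, not part of the product: it emits options in the same order as the templates, and it treats `-ocr` as a plain on/off switch because it appears in the templates without being described in the parameter list.

```python
from typing import List, Optional

# -ocrmode values as listed above.
OCR_MODES = {
    0: "output to text file",
    1: "OCR PDF pages and insert a new text layer under the original PDF pages",
    2: "output to plain text based PDF file",
    3: "output to OCRed PDF file (BW) with hidden text layer",
    4: "output to OCRed PDF file (Color) with hidden text layer",
}


def build_command(
    in_path: str,
    out_path: str,
    ocrmode: Optional[int] = None,
    threshold: Optional[int] = None,
    rotate: Optional[int] = None,
    ocr: bool = False,
    exe: str = "pdf2txtocr.exe",
) -> List[str]:
    """Build the argument list: pdf2txtocr.exe [options] <input> <output>."""
    cmd = [exe]
    if ocrmode is not None:
        if ocrmode not in OCR_MODES:
            raise ValueError(f"unknown -ocrmode {ocrmode}; listed modes are 0-4")
        cmd += ["-ocrmode", str(ocrmode)]
    if threshold is not None:
        cmd += ["-threshold", str(threshold)]
    if rotate is not None:
        cmd += ["-rotate", str(rotate)]
    if ocr:
        cmd.append("-ocr")  # switch copied from the templates above
    cmd += [in_path, out_path]
    return cmd
```

Calling `build_command(r"C:\in.tif", r"C:\out.pdf", ocrmode=3, threshold=200, ocr=True)` and joining the result with spaces reproduces the second template exactly; the list itself can be passed straight to `subprocess.run`.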
There are too many functions to list here; for the full set of parameters, please check readme.txt. Now let us check the extraction result in the following snapshot. If you have any questions while using the software, please contact us as soon as possible.