Many programs on the market can convert PDF files to text, but most of them support only English or a few other widely used languages. To convert PDF files in many more languages, VeryPDF developed PDF to Text OCR Converter Command Line, which uses Optical Character Recognition (OCR) technology to recognize text in scanned documents. The following steps show how to use it.
First, download PDF to Text OCR Converter Command Line.
- As its name suggests, it is a command line tool that can be used together with other applications. It can easily be called from server-side applications written in ASP, PHP, C#, .NET and so on.
- The download is a zip file. Unzip it and check its contents.
- The zip file includes only the English language package. To convert PDF files in other languages, download the corresponding language package from this page: http://www.verypdf.com/pdf2txt/ocr-language.htm. More than 50 language packages are listed there.
Second, run the conversion following the example in the readme.txt.
Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
There are two cases to cover here: image-based PDF and text-based PDF.
- To convert a text-based PDF file, simply enter the following command line:
pdf2txtocr.exe C:\in.pdf C:\out.txt
- To convert an image-based PDF file to text, add the -ocr option together with this parameter:
-lang <string> : choose the language for OCR engine
Example: pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.txt
eng: short for English
ell: short for Greek
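Since the tool is meant to be called from other applications, the two command forms above can be wrapped in a small helper. The following is only a minimal Python sketch. It assumes that pdf2txtocr.exe is on the PATH or in the working directory, and the names build_command and convert are hypothetical, not part of the VeryPDF package. It uses only the -ocr and -lang options shown above; the meaning of the exit code is not documented in this article, so it is returned as-is.

```python
import subprocess

# Assumption: the executable can be found under this name (PATH or current folder).
EXE = "pdf2txtocr.exe"


def build_command(pdf_path, txt_path, ocr=False, lang="eng", exe=EXE):
    """Return the argument list for one conversion.

    Only the options shown in the article (-ocr and -lang) are used.
    """
    cmd = [exe]
    if ocr:
        # Image-based PDF: switch on OCR and choose the language package.
        cmd += ["-ocr", "-lang", lang]
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def convert(pdf_path, txt_path, ocr=False, lang="eng", exe=EXE):
    """Run the converter and return its exit code unchanged."""
    completed = subprocess.run(build_command(pdf_path, txt_path, ocr, lang, exe))
    return completed.returncode
```

For example, convert(r"C:\in.pdf", r"C:\out.txt", ocr=True, lang="ell") would run the Greek OCR form of the command, provided the Greek language package has been installed.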
How to tell an image-based PDF from a text-based PDF?
Try to copy and paste some text from the PDF file. If you can, it is a text-based PDF file; otherwise, it is an image-based PDF file. There is one exception: some PDF files let you copy and paste, but the pasted text comes out as garbled characters. In that case, convert the file as if it were an image-based PDF.
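The copy and paste check above can be roughly automated once you have the copied text as a string. The following Python sketch is only a heuristic under stated assumptions: the function name needs_ocr and the 0.8 threshold are illustrative choices, not VeryPDF values. It treats empty text as image-based and text dominated by control or unmapped characters as garbled. It cannot detect garbling that happens to consist of ordinary letters.

```python
def needs_ocr(copied_text, min_readable_ratio=0.8):
    """Guess whether a PDF should be converted with the -ocr option.

    copied_text is whatever was obtained by copying text out of the PDF.
    The threshold is an arbitrary assumption and may need tuning.
    """
    text = copied_text.strip()
    if not text:
        # Nothing could be copied: treat the file as image-based.
        return True
    punctuation = ".,;:!?'\"()-"
    # Count letters, digits, whitespace and common punctuation as readable.
    readable = sum(
        1 for ch in text
        if ch.isalnum() or ch.isspace() or ch in punctuation
    )
    # A low share of readable characters suggests garbled text.
    return readable / len(text) < min_readable_ratio
```

When needs_ocr returns True, use the -ocr form of the command; otherwise the plain form should be enough.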
Now let us check the conversion result in the following snapshot.
This completes the conversion. If you have any questions while using the software, please contact us through the channels listed on our support website.