In this article, I will show you how to convert PDF to text from the command line. Many PDF readers now let you select text by dragging the mouse, copy it with Ctrl+C, and paste it wherever you want with Ctrl+V. But if you need to copy and paste efficiently, this manual approach is weak. So how can you make the process efficient and accurate? If you are looking for a solution, please follow my steps.
Analyzing your PDF is necessary before you choose software, as there are thousands of tools on the market for this conversion. Which one is better? Maybe you have tried some of them, only to find that they garble the text. As far as I know, PDF files fall into two kinds: image PDF and text PDF. Text in an image PDF cannot be selected in a PDF reader, while a text PDF allows you to copy and paste in a PDF reader. There is one exception: if the PDF file contains embedded fonts, you may still be able to copy and paste from it, but the output text is garbled characters.
First, convert a text-based PDF file to text from the command line.
- Download PDF to TXT Converter. This software can be used either as a GUI version or as a command line version. After downloading the exe, install it by double-clicking the exe file and following the installation prompts.
- If the installation succeeds, there will be an icon on the desktop. You then need to find the executable file in the installation folder.
- Call it from an MS-DOS window (Command Prompt) and press Enter to see the parameter list. If you need more details, please check the user guide on our website.
- This software is extremely easy to use, and it costs only $38.00.
Usage: PDF2TXT <input PDF file> [output TXT file]
Example: PDF2TXT C:\test\*.pdf C:\test\*.txt
You can use wildcard characters to do the conversion in batch.
Example: PDF2TXT C:\input.pdf C:\output.txt -open -silent -first 1 -last 10 -unicode
This command line sets the conversion page range to pages 1-10. Once the conversion finishes, the text file is opened automatically, and it is exported with Unicode (UTF-8) encoding. Now let us check the conversion result in the following snapshot.
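If you want to drive the converter from a script instead of typing the command each time, the options above can be assembled programmatically. The following is only a minimal Python sketch: the function names are hypothetical, it assumes the executable can be reached as PDF2TXT on the PATH, and it uses only the options shown in the example above. Whether the tool accepts the options in any order, or -first without -last, is an assumption to verify against the user guide.

```python
import subprocess


def build_pdf2txt_command(input_pdf, output_txt, first=None, last=None,
                          unicode_output=False, silent=False,
                          open_after=False, exe="PDF2TXT"):
    """Build the argument list for a PDF2TXT call (hypothetical helper).

    Only the options shown in the article's example are used:
    -open, -silent, -first, -last, -unicode.
    """
    if first is not None and last is not None and first > last:
        raise ValueError("first page must not be greater than last page")

    cmd = [exe, str(input_pdf), str(output_txt)]
    if open_after:
        cmd.append("-open")        # open the text file after conversion
    if silent:
        cmd.append("-silent")
    if first is not None:
        cmd += ["-first", str(first)]
    if last is not None:
        cmd += ["-last", str(last)]
    if unicode_output:
        cmd.append("-unicode")     # export the text as Unicode (UTF-8)
    return cmd


def convert(input_pdf, output_txt, **options):
    """Run the converter; assumes the executable is installed and on PATH."""
    cmd = build_pdf2txt_command(input_pdf, output_txt, **options)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call convert(...) to actually run it.
    print(build_pdf2txt_command(r"C:\input.pdf", r"C:\output.txt",
                                first=1, last=10, unicode_output=True))
```

Passing the arguments as a list rather than one string avoids quoting problems when a path contains spaces.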
Second, convert an image PDF to text from the command line.
- The software mentioned above can only convert text-based PDF to text. When it encounters an image PDF file, it will output garbled characters, because image PDF needs OCR technology.
- In that case, please choose PDF to Text OCR Converter, which can recognize text from scanned documents, image PDF, and image files. As it uses OCR technology, its price is a little higher (Server License $195.00), and it has more functions than the one above. I will list some of them below.
Input formats: TIFF, BMP, PNG, JPG, PCX, TGA, PDF, and others.
For more functions, please check our website. If you are interested, please download it and try the free trial.
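When you process many files, it helps to decide automatically whether the plain text conversion worked or the file should go to the OCR converter instead. The following Python sketch shows one possible heuristic. It is not part of either product: the function name and the 0.3 threshold are assumptions chosen for illustration. It treats the output as unusable when it is empty or when a large share of its characters are non-printable or Unicode replacement characters.

```python
def looks_garbled(text, threshold=0.3):
    """Return True if converted text is empty or mostly unreadable.

    Hypothetical heuristic: counts control characters and U+FFFD
    replacement characters; whitespace such as newlines is not counted.
    """
    stripped = text.strip()
    if not stripped:
        return True  # nothing was extracted, likely an image PDF

    bad = sum(
        1 for ch in stripped
        if ch == "\ufffd" or (not ch.isprintable() and not ch.isspace())
    )
    return bad / len(stripped) > threshold


if __name__ == "__main__":
    print(looks_garbled("Chapter 1\nIntroduction"))
    print(looks_garbled(""))
```

A file whose output is flagged this way is a candidate for the OCR converter. Garbled output made of printable but wrong characters, as can happen with embedded fonts, will not be caught by this check.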
Usage: pdf2txtocr [options] <input file> <output file>
Example: pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
With this command, you can add page numbers to the output text.
Example: for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"
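This command line will OCR all PDF files in the D:\temp\ folder to text files. In it, "%~dpnF.txt" expands to the drive, path, and base name of each PDF file with a .txt extension. If you run it from a batch file rather than typing it at the prompt, write %%F instead of %F.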
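To make the loop easier to follow, here is a rough Python equivalent that only builds the commands, one per PDF file. It is a sketch with hypothetical function names. It assumes pdf2txtocr.exe is on the PATH and uses only the -ocr and -lang options from the example above. PureWindowsPath is used so the path handling behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def txt_path_for(pdf_path):
    """Mimic "%~dpnF.txt": same drive, folder, and base name, .txt extension."""
    return PureWindowsPath(pdf_path).with_suffix(".txt")


def build_ocr_commands(pdf_paths, lang="deu", exe="pdf2txtocr.exe"):
    """Return one argument list per PDF, mirroring the for loop above."""
    commands = []
    for pdf in pdf_paths:
        pdf = PureWindowsPath(pdf)
        commands.append(
            [exe, "-ocr", "-lang", lang, str(pdf), str(txt_path_for(pdf))]
        )
    return commands


if __name__ == "__main__":
    for cmd in build_ocr_commands([r"D:\temp\a.pdf", r"D:\temp\b.pdf"]):
        print(cmd)  # pass each list to subprocess.run(cmd) to execute it
```

On a real Windows machine, the input list could come from pathlib.Path(r"D:\temp").glob("*.pdf").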
If you want to know more about this software, it is best to try it yourself. Now let us check the OCR conversion result in the following snapshot.
If you have any questions while using it, please contact us through the channels listed on our contact page.