Question: I want to convert a PDF file to text so that I can create a spreadsheet out of it. When I try to convert the PDF to text, all I get is a bunch of symbols and letters that don't make sense. When I convert other files, it works fine, but the one file that I really want to convert doesn't work. Any suggestions?
Answer: Based on your description, you need software with a stronger OCR function to process that PDF file. When text extraction or OCR recognition fails, the output is often garbled characters. You can try VeryPDF PDF to Text OCR Converter CMD, which offers a free trial and may convert the PDF to text with a better result. In the following part, I will show you how to use this software.
Step 1. Download and extract the software.
- When downloading, please make sure to download the right version for your needs. There is a server version and a developer version.
- The download is a zip file. Extract it to a folder, then call the executable file from an MS-DOS command prompt window.
Step 2. Convert a PDF to text when it is encrypted.
- When you use this software, please refer to the usage and examples.
- Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- When converting a password-protected PDF to text, please refer to the following command line templates.
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.txt
This command line converts a password-protected, text-based PDF file to text. Here the PDF is protected by both an open (user) password and an owner password. If the file has only one of the two, supply just that password with its parameter.
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 -ocr -lang eng C:\in.pdf C:\out.txt
This kind of command line converts a password-protected image (scanned) PDF to text. Enable the OCR function with the parameter -ocr and specify the OCR language with -lang according to the content of the PDF file.
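The two templates differ only in the optional password and OCR switches, so the argument list can be assembled by a small script. The following is a minimal Python sketch and not part of the VeryPDF package: the names build_command and convert are made up for illustration, only options shown in this answer are used, and it assumes pdf2txtocr.exe is in the current folder or on the PATH.

```python
import subprocess


def build_command(pdf_path, txt_path, owner_pwd=None, user_pwd=None,
                  ocr=False, lang="eng", exe="pdf2txtocr.exe"):
    """Assemble the argument list for pdf2txtocr.exe (hypothetical helper)."""
    cmd = [exe]
    # Pass only the passwords the file actually has.
    if owner_pwd is not None:
        cmd += ["-ownerpwd", owner_pwd]
    if user_pwd is not None:
        cmd += ["-userpwd", user_pwd]
    # Image (scanned) PDFs need OCR plus an OCR language.
    if ocr:
        cmd += ["-ocr", "-lang", lang]
    # Input and output files come last, as in the usage line.
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path, **options):
    """Run the converter. check=True raises if the tool exits with a
    non-zero code; whether it does so on failure is an assumption."""
    subprocess.run(build_command(pdf_path, txt_path, **options), check=True)
```

For example, build_command(r"C:\in.pdf", r"C:\out.txt", owner_pwd="123", user_pwd="456", ocr=True) produces the same arguments as the second template above.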
Now let us check the related parameters:
-ownerpwd <string> : the owner password for an encrypted PDF file.
-userpwd <string> : the user password for an encrypted PDF file.
-ocr : enable the OCR function for a scanned PDF file.
-lang <string> : choose the language for the OCR engine.
-ocrmode <int> : set the OCR mode.
-ocrmode 0: output to text file
To choose an OCR language, please download its language package.
- When converting PDF to text, this software also lets you add page numbers to the text file, specify a page range for conversion, rotate the input PDF, adjust the threshold of the PDF, and more. Please check the software homepage for more functions.
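Since -ocr is what separates the two templates, one way to decide whether a file needs it is to convert without OCR first and inspect the result. The sketch below is a rough Python heuristic of my own, not something the software provides: the names looks_garbled and choose_options and the 0.7 threshold are assumptions, and the check only catches output that is heavy on symbols, not meaningless runs of ordinary letters.

```python
def looks_garbled(text, threshold=0.7):
    """Return True if too few characters are letters, digits, or whitespace.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    if not text:
        # Empty output usually means there was no text layer to extract.
        return True
    readable = sum(ch.isalnum() or ch.isspace() for ch in text)
    return readable / len(text) < threshold


def choose_options(first_pass_text, lang="eng"):
    """Pick extra options for a second pass (hypothetical helper)."""
    if looks_garbled(first_pass_text):
        # Retry with OCR, using the flags listed in the parameter table.
        return ["-ocr", "-lang", lang]
    return []
```

If the first pass looks fine, no extra options are returned and the text file can be used as is.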
So with this software we can get text out of a password-protected PDF, including an image PDF, even when the file is encrypted. If you have any questions while using it, please contact us as soon as possible.