Question: I am converting PDF files to text with an application, but I found that if a PDF has embedded fonts or OpenType fonts, I cannot get the text out of it. After conversion, the output is full of garbled characters. Is there a VeryPDF solution for this? I just need to convert to text. Any help is appreciated. Thanks!
Answer: When a PDF with embedded fonts is converted to text, the output text file may contain garbled characters. For this kind of PDF, you can either change the embedded fonts to system fonts or convert the PDF to text with software that has an OCR function. To change embedded fonts to system fonts, use VeryPDF PDF Editor. To convert by OCR, try the free trial of VeryPDF OCR to Any Converter Command Line. Both can help extract text from PDFs with embedded fonts. PDF Editor is a GUI application, while OCR to Any Converter Command Line is a command line tool, so please choose the one that fits your needs. The two methods are introduced briefly below.
Method 1: Change PDF embedded fonts to system fonts with PDF Editor
- Download PDF Editor and install it by double-clicking the downloaded exe file. With this software, you can change the embedded fonts in a PDF to system fonts.
- The following snapshot shows the part of the software interface used to change the PDF font.
- Add the PDF file to the software interface. Click the Edit Content button, then draw a frame around the content of the whole page until a red box appears around the text. Click the red frame and a drop-down list will appear. Choose Properties and you will find the menu tab shown in the snapshot above. There you can change the font in the PDF to a system font. All the system fonts are listed, and you can choose any of them according to your needs.
- You can then easily convert the PDF, which now uses system fonts, to text.
Method 2: Convert PDF with embedded fonts to text by OCR
- To convert a PDF with embedded fonts to text by OCR, use VeryPDF OCR to Any Converter Command Line.
- This is a command line tool. When the download finishes, extract it to a folder, where you can check its usage, parameters and other related details.
- Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. Because OCR reads the text from the page image rather than from the font data, it can overcome the embedded font limitation.
- Usage: ocr2any.exe [options] <PDF-file> <Text-file>
- To convert a PDF with embedded fonts to text, refer to the following command template:
ocr2any.exe -ocr -res 300 C:\in.pdf C:\out.txt
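If you have many PDF files, the template above can be repeated for each one from a small script. The following Python sketch is only an illustration: it assumes ocr2any.exe is on the PATH (or in the current folder), it uses only the -ocr and -res options shown in the template, and the helper names build_command and convert_folder are hypothetical, not part of the VeryPDF product.

```python
import subprocess
from pathlib import Path

# Assumption: ocr2any.exe is on the PATH or in the current folder.
OCR2ANY = "ocr2any.exe"


def build_command(pdf_path, txt_path, resolution=300):
    """Return the argument list for one conversion, mirroring the template."""
    return [OCR2ANY, "-ocr", "-res", str(resolution), str(pdf_path), str(txt_path)]


def convert_folder(folder, resolution=300, dry_run=True):
    """Build one command per *.pdf file in folder; run them unless dry_run."""
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Write the text file next to the PDF, with a .txt extension.
        cmd = build_command(pdf, pdf.with_suffix(".txt"), resolution)
        commands.append(cmd)
        if not dry_run:
            # check=True raises an error if the converter exits with a failure code.
            subprocess.run(cmd, check=True)
    return commands
```

Calling convert_folder(r"C:\pdfs") only returns the commands so you can review them first. Pass dry_run=False to actually run the converter on each file.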
With this software, you can also convert a PDF with embedded fonts to text from the command line. Please choose the method that suits your needs. If you have any questions while using the software, please contact us as soon as possible.