Hello,
Please find attached the order invoice receipt for "VeryPDF PDF Parser & Modify Component for .NET Developer License".
Problem Description:
In the attached PDF sample, parsing the PDF with the VeryPDF_PDFParserSDK() API produces an output HTM file that contains some hexadecimal characters.
Please find attached the sample and the output HTM file.
Please provide the solution ASAP.
Customer
-------------------------------------------
We apologize for any inconvenience this may have caused you. We have double-checked your PDF file carefully, and it contains some special characters rendered with "Customized fonts". Please see the attached screenshot.
The characters rendered by "Customized fonts" are not real characters; they have been converted to outlines, so it is impossible to extract them as text from the PDF document.
You can confirm this by opening the PDF file in Adobe Reader: press CTRL+A to select all content, then press CTRL+C and CTRL+V to copy and paste it into Notepad. You will see the same garbage characters there.
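The same check can also be run on text that has already been extracted. The following is only a minimal Python sketch and is not part of any VeryPDF product; the function names `garbage_ratio` and `looks_garbled` and the 0.2 threshold are assumptions chosen for illustration. It counts private-use, control, unassigned, and replacement characters, which text extracted from custom-encoded fonts often contains:

```python
import unicodedata


def garbage_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like extraction garbage."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    suspicious = 0
    for ch in visible:
        category = unicodedata.category(ch)
        # Co = private use, Cc = control, Cn = unassigned; U+FFFD = replacement character
        if category in ("Co", "Cc", "Cn") or ch == "\ufffd":
            suspicious += 1
    return suspicious / len(visible)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    # The threshold is an arbitrary illustrative value, not a vendor recommendation.
    return garbage_ratio(text) > threshold
```

If `looks_garbled` returns True for the text copied out of the PDF, the file is probably a candidate for the OCR route rather than direct text extraction.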
If you do need to extract these characters from the PDF file to text or HTML, you can use the "PDF to Text OCR Converter Command Line" software. You can download it from the following links to try it:
https://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy
https://www.verypdf.com/pdf2txt/pdf2txtocrcmd.zip
After you download it, you can run the following command line to convert the special characters rendered by customized fonts to a text file:
pdf2txtocr.exe -ocr -lang eng D:\downloads\Sample4.pdf D:\downloads\Sample4.txt
You will be able to get a correct text file after a few seconds.
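If you need to run this for several files, the command above can be wrapped in a small script. The following Python sketch is only an illustration under stated assumptions: it assumes `pdf2txtocr.exe` is on the PATH, it uses only the `-ocr` and `-lang` options shown in the command above, and the helper names `build_ocr_command` and `convert` are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath


def build_ocr_command(pdf_path, txt_path=None, lang="eng", exe="pdf2txtocr.exe"):
    """Build the argument list for the OCR converter, mirroring the command line above."""
    if txt_path is None:
        # Default output: same folder and base name as the PDF, with a .txt extension
        txt_path = str(PureWindowsPath(pdf_path).with_suffix(".txt"))
    return [exe, "-ocr", "-lang", lang, str(pdf_path), str(txt_path)]


def convert(pdf_path, txt_path=None, **kwargs):
    """Run the converter and return the output path."""
    cmd = build_ocr_command(pdf_path, txt_path, **kwargs)
    # Assumption: the tool returns a non-zero exit code on failure (not confirmed here)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

For example, `convert(r"D:\downloads\Sample4.pdf")` would run the same command as shown above and write `D:\downloads\Sample4.txt`.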