VeryPDF PDF Parser & Modify Component for .NET Developer License fails to extract characters rendered with embedded subset and customized fonts

Hello,

Please find attached the order invoice receipt for "VeryPDF PDF Parser & Modify Component for .NET Developer License".

Problem Description:

When I parse the attached PDF sample with the VeryPDF_PDFParserSDK() API, the output HTM file contains some hexadecimal characters.

Please find attached the sample and the output HTM file.

Please provide a solution ASAP.

Customer
-------------------------------------------
We apologize for any inconvenience this may have caused you. We have double-checked your PDF file carefully: it contains some special characters rendered with "Customized fonts". Please look at the attached screenshot.

image

The characters rendered with "Customized fonts" are not real characters; they have been converted to outlines, so it is impossible to extract them from the PDF document.

You can verify this in Adobe Reader: open the PDF file, press CTRL+A to select all contents, then press CTRL+C and CTRL+V to copy and paste them into Notepad. You will see the same garbage characters there.

image
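
The copy-and-paste check above can be roughly approximated in code. The following is a minimal Python sketch, not part of any VeryPDF product: the function names and the 0.2 threshold are hypothetical, and it only flags text whose characters fall in Unicode control, private-use, unassigned or replacement ranges. Garbage that happens to map to ordinary printable characters will not be detected, so treat the result as a hint only.

```python
import unicodedata

# Unicode general categories that rarely appear in correctly extracted text:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like
    extraction garbage (hypothetical heuristic, see the note above)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    """True when the share of suspicious characters exceeds the threshold.
    The default threshold is an arbitrary assumption, not a tuned value."""
    return suspicious_ratio(text) > threshold
```

If `looks_garbled()` returns True for the text extracted from a page, that page is a candidate for the OCR route described below.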

If you do need to extract these characters from the PDF file to text or HTML, you can use the "PDF to Text OCR Converter Command Line" software. You can download it from the following web page to try it:

https://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy
https://www.verypdf.com/pdf2txt/pdf2txtocrcmd.zip

After you download it, you can run the following command line to convert the special characters rendered with customized fonts to a text file:

pdf2txtocr.exe -ocr -lang eng D:\downloads\Sample4.pdf D:\downloads\Sample4.txt

You will be able to get a correct text file after a few seconds.
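
If you need to run the command line above from a script, a small wrapper can assemble the same arguments. The following Python sketch is only an illustration: the helper names are hypothetical, it uses only the `-ocr` and `-lang` options shown above, and it assumes `pdf2txtocr.exe` is on the PATH or that you pass its full path. How the tool reports failures through its exit code is not documented here, so the caller should inspect the result rather than rely on it.

```python
import subprocess


def build_ocr_command(pdf_path, txt_path, lang="eng", exe="pdf2txtocr.exe"):
    """Build the argument list for the command line shown above.
    Only the -ocr and -lang options from that command are used."""
    return [exe, "-ocr", "-lang", lang, str(pdf_path), str(txt_path)]


def convert_with_ocr(pdf_path, txt_path, lang="eng", exe="pdf2txtocr.exe"):
    """Run the converter and return the CompletedProcess.
    The meaning of the exit code is an assumption left to the caller."""
    cmd = build_ocr_command(pdf_path, txt_path, lang=lang, exe=exe)
    # Passing a list (no shell) avoids quoting problems with spaces in paths.
    return subprocess.run(cmd, capture_output=True, text=True)


# Example call (requires the tool to be installed):
# result = convert_with_ocr(r"D:\downloads\Sample4.pdf", r"D:\downloads\Sample4.txt")
# print(result.returncode, result.stdout)
```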

VeryPDF
