How do I extract image elements and text elements from PDF pages using the PDF Parser & Modify Component for .NET (Developer License)?

I am currently using your "VeryPDF PDF Parser & Modify Component for .NET Developer License" product, and I have a question.

Query Description:

In the attached sample, the generated .htm file contains only the text information and no image element. The coordinates of the static text should be reported as image elements.

Please check.
Customer
----------------------------------

VeryPDF PDF Parse & Modify Component for .NET:

http://www.verypdf.com/app/pdftoolbox/pdf-parse-modify.html

Thanks for your sample files. I have checked your sample PDF file, and it doesn't contain any images. The following line in "Sample1_pg_0001.htm" is the page background image:

<div style="position: absolute;top:2;left:-1"><img width="2550" height="3300" src="Sample1_pg_0001.png"></img></div>

If you don't need the background image, you can simply ignore or skip it; this is easy to do in your HTML parser application.
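As a rough illustration of skipping the background image, here is a minimal sketch in Python using only the standard library. It is not part of the VeryPDF component; the class and function names (`BackgroundImageFinder`, `find_background_images`) are hypothetical, and it assumes the background is emitted as a plain `<img>` tag like the line quoted above.

```python
from html.parser import HTMLParser


class BackgroundImageFinder(HTMLParser):
    """Collects every <img> tag so the caller can skip page backgrounds."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        self.images.append({
            "src": a.get("src"),
            # Missing or empty size attributes fall back to 0.
            "width": int(a.get("width") or 0),
            "height": int(a.get("height") or 0),
        })


def find_background_images(html_text):
    """Return the <img> entries found in one generated .htm page."""
    finder = BackgroundImageFinder()
    finder.feed(html_text)
    return finder.images


sample = (
    '<div style="position: absolute;top:2;left:-1">'
    '<img width="2550" height="3300" src="Sample1_pg_0001.png"></img></div>'
)
for image in find_background_images(sample):
    print("skipping background:", image["src"])
```

An application would typically call `find_background_images` once per page and leave the returned files out of whatever it does with the remaining elements.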

The "Sample2.pdf" file, however, contains only one large picture and no text content, so the "Sample2.htm" file contains only the following lines for the image element:

<div style="position: absolute;top:2;left:-1">
<img width="10337" height="14617" src="Invoices(Invoices_FABRE_RAP.pdf_.pdf)_pg_0001.png">
</img>
</div>
<div type="ImageElement" style="position:absolute; border:2px solid blue; left: 149px; top: 149px; width: 10338px; height: 14617px;">
</div>

This "Sample2.htm" file is as expected: "ImageElement" describes the large picture in the PDF file. If you don't need the image information, you can simply skip it.
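If you do want the image information, the position and size can be read from the `style` attribute of each `ImageElement` div. The following is a minimal Python sketch using only the standard library, not the component's own API; `ImageElementParser` and `find_image_elements` are hypothetical names, and it assumes the style always uses `px` values in the form shown above.

```python
import re
from html.parser import HTMLParser

# Matches "left: 149px" style declarations; "border:2px" is not captured.
PX_VALUE = re.compile(r"(?<![\w-])(left|top|width|height)\s*:\s*(-?\d+)px")


class ImageElementParser(HTMLParser):
    """Collects the bounding box of every <div type="ImageElement">."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div" and a.get("type") == "ImageElement":
            style = a.get("style") or ""
            self.boxes.append({k: int(v) for k, v in PX_VALUE.findall(style)})


def find_image_elements(html_text):
    """Return one dict of left/top/width/height (in px) per image element."""
    parser = ImageElementParser()
    parser.feed(html_text)
    return parser.boxes


sample = """
<div style="position: absolute;top:2;left:-1">
<img width="10337" height="14617" src="Invoices(Invoices_FABRE_RAP.pdf_.pdf)_pg_0001.png">
</img>
</div>
<div type="ImageElement" style="position:absolute; border:2px solid blue; left: 149px; top: 149px; width: 10338px; height: 14617px;">
</div>
"""
for box in find_image_elements(sample):
    print(box)
```

The background `<div>` is ignored here because it has no `type="ImageElement"` attribute, so only the real image element is reported.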

VeryPDF
----------------------------------
Thanks for your information. I have another PDF file; please look at it below.

Please suggest a way to get the red labels, either as image elements, as text elements, or in any other way.

Regards
Customer
----------------------------------
Thanks for your sample PDF file. We have checked it carefully: the red labels in this PDF file are images, so it's impossible to convert these image elements to text elements. Thanks for your understanding.

Please look at the attached screenshot. When you zoom in on the page in Adobe Reader, you will notice that the quality of the red labels is poor, which indicates that they are images, not text elements, so it's impossible to extract them as text content.

VeryPDF
