How to get key-value pairs from a scanned PDF file?

Hi,

We are using your "VeryPDF PDF Parser & Modify Component for .NET Developer License" product currently.

Our PDF file contains only images and has no text content. I want to extract the text from this PDF file. Could you let me know how to do that?

As we are extracting data from PDFs using VeryPDF, we require the key-value pairs either as image or as text elements. Please suggest a way to get those elements along with their data. Our main requirement is to get the key-value pairs from this PDF.

Please suggest.

Regards
Customer
--------------------------------------------------

Thanks for your message. Because your PDF file contains only images, you need to use OCR technology to convert it to a text file. You can download "VeryPDF OCR to Any Converter Command Line" from this web page to try it:

http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html#buy

After you download it and unzip it to a folder, you can run the following command line to convert your PDF file to a text file with OCR:

ocr2any.exe -ocr2 -dumpwordpos D:\downloads\Sample1.pdf D:\downloads\Sample1.txt

After you run the above command line, you will get the following files:

Sample1.txt: This is a text file for your PDF file.
Sample1001.info: This file contains words and their coordinates, for page 1.
Sample1002.info: This file contains words and their coordinates, for page 2.
Sample1XXX.info: This file contains words and their coordinates, for page XXX.
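If you want to pick up the per-page files from a script, a minimal sketch in Python could look like the following. It assumes only the naming pattern shown above (output base name followed by a page number, with an .info extension); the function name page_info_files is hypothetical and not part of any VeryPDF product.

```python
import re
from pathlib import Path
from typing import List


def page_info_files(text_output: str) -> List[Path]:
    """Return the per-page .info files next to the OCR text output, in page order.

    Assumption: files are named <base><page digits>.info, e.g. Sample1001.info
    for D:\\downloads\\Sample1.txt, as in the file list above.
    """
    out = Path(text_output)
    pattern = re.compile(re.escape(out.stem) + r"(\d+)\.info")
    matches = []
    for candidate in out.parent.glob("*.info"):
        m = pattern.fullmatch(candidate.name)
        if m:
            # Sort numerically by page number rather than by file name.
            matches.append((int(m.group(1)), candidate))
    return [path for _, path in sorted(matches)]
```

Calling page_info_files with the path of the text output would then return the .info files for page 1, page 2, and so on, in that order.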

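You can parse the Sample1XXX.info files to get the position of each word, then build the key-value pairs in your application from those positions. For example, words that sit on the same line as a label and to its right are usually that label's value.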
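As a rough illustration of the pairing step, here is a minimal Python sketch. The layout of the .info files is not described here, so the sketch starts from words that have already been loaded as (text, x, y) records; the Word class, the function names, the y_tolerance value, and the rule "label ends with a colon" are all assumptions for illustration and will need adjusting to your documents.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    text: str
    x: float  # left edge of the word (assumed unit: whatever the .info file uses)
    y: float  # top edge of the word


def group_into_lines(words: List[Word], y_tolerance: float = 5.0) -> List[List[Word]]:
    """Group words whose y coordinate is close to the first word of a line."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    for line in lines:
        line.sort(key=lambda w: w.x)  # restore left-to-right reading order
    return lines


def extract_pairs(words: List[Word], y_tolerance: float = 5.0) -> Dict[str, str]:
    """Treat 'Label: value' lines as key-value pairs (assumed convention)."""
    pairs: Dict[str, str] = {}
    for line in group_into_lines(words, y_tolerance):
        text = " ".join(w.text for w in line)
        key, sep, value = text.partition(":")
        if sep and key.strip() and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs
```

Forms that place the value below the label, or in a separate column, would need a different rule, such as matching each label to the nearest word box in the expected direction.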

VeryPDF
