How to convert a scanned color PDF file to an Excel spreadsheet?

Hi,

I just purchased this software, but it is unable to load multi-page PDF files. The software stopped working when I tried to load a multi-page PDF.

Please let me know how to fix this.

I urgently need this for my dissertation work.

Thank you.
Customer
-------------------------------------------------------------

Thanks for your sample PDF file. The following is a snapshot of it. Your PDF file has a non-white background and contains only pictures, without any real text content.

image

In order to convert this PDF file to a text-based Excel spreadsheet, we first need to convert it to a black and white PDF or TIFF file. We use "PDF to Image Converter Command Line" for this step. It can be downloaded from the following web pages:

http://www.verypdf.com/app/pdf-to-image-converter/try-and-buy.html#buy-cmd
http://www.verypdf.com/dl2.php/pdf2image_win.zip

After downloading it, we run the following command line to convert this color PDF file to a black and white TIFF file:

pdf2img.exe -r 300 -compress 4 -gray -threshold 150 -multipage D:\test.pdf D:\new.pdf
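To illustrate why a threshold helps with a non-white background, here is a minimal Python sketch of thresholding a grayscale image. It is not the code used by pdf2img.exe, and the exact rule that "-threshold 150" applies is an assumption here: the sketch uses the common convention that pixels at or above the threshold become white and darker pixels become black. The function name and the sample pixel values are hypothetical.

```python
def binarize(gray_rows, threshold=150):
    """Map each 0-255 grayscale value to 0 (black) or 255 (white).

    Assumption: values >= threshold become white. The real tool may
    use a different rule; this only shows the general idea.
    """
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]


# A tiny hypothetical 2x4 "page": dark text pixels on a light gray background.
page = [
    [200, 40, 35, 210],
    [190, 180, 60, 205],
]

for row in binarize(page):
    print(row)
```

With a light gray background (values around 180 to 210) and dark text (values around 35 to 60), a threshold of 150 pushes the background to pure white and the text to pure black, which is the kind of input OCR generally handles best.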

If some PDF pages are skewed, you can add the "-imgopt" option to deskew and despeckle the images automatically, e.g.:

pdf2img.exe -imgopt -imgthreshold 150 -r 300 -compress 4 -gray -threshold 150 -multipage D:\test.pdf D:\new.pdf
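If you need to run the conversion on several files, the two command lines above can be assembled from a script. The following is a minimal Python sketch: it uses only the options shown in the commands above, and it assumes that pdf2img.exe is on the PATH and returns a non-zero exit code on failure. The function names are hypothetical.

```python
import subprocess


def build_pdf2img_args(src, dst, deskew=False, exe="pdf2img.exe"):
    """Build the argument list for the pdf2img.exe commands shown above."""
    args = [exe]
    if deskew:
        # Options from the second command: deskew and despeckle the images.
        args += ["-imgopt", "-imgthreshold", "150"]
    args += [
        "-r", "300",
        "-compress", "4",
        "-gray",
        "-threshold", "150",
        "-multipage",
        src,
        dst,
    ]
    return args


def convert(src, dst, deskew=False):
    """Run the conversion; raises CalledProcessError on a non-zero exit code.

    Assumption: the tool signals failure through its exit code.
    """
    subprocess.run(build_pdf2img_args(src, dst, deskew), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works anywhere.
    args = build_pdf2img_args(r"D:\test.pdf", r"D:\new.pdf", deskew=True)
    print(subprocess.list2cmdline(args))
```

Passing the arguments as a list avoids quoting problems when a file path contains spaces.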

After a few seconds, we get a new black and white TIFF file. The text in the new TIFF file looks clear enough:

image

Now we open this new TIFF file in the "VeryPDF Table Extractor OCR" application, which can be downloaded from the following web pages:

http://www.verypdf.com/app/pdf-to-table-extractor-ocr/try-and-buy.html
http://www.verypdf.com/dl2.php/verypdf-table-extractor-ocr.exe

We draw a rectangle and columns on every TIFF page, then OCR the table on each page. Please note that if some pages are skewed, you need to deskew them before running the OCR operation:

image
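To show why the columns matter, here is a minimal Python sketch of how column separators can split the recognized words of one table row into cells. This is only an illustration of the general idea, not the implementation used by VeryPDF Table Extractor OCR. The function name, the word positions, and the separator positions are all hypothetical.

```python
from bisect import bisect_right


def words_to_row(words, separators):
    """Group (x, text) words into cells split at the given x separators.

    `separators` must be sorted in ascending order; N separators
    produce N + 1 cells.
    """
    cells = [[] for _ in range(len(separators) + 1)]
    for x, text in sorted(words):
        # bisect_right gives the index of the column that contains x.
        cells[bisect_right(separators, x)].append(text)
    return [" ".join(cell) for cell in cells]


# Hypothetical OCR output for one row: (left x position, recognized text).
row_words = [(12, "Sample"), (60, "A"), (140, "12.5"), (230, "mg")]
separators = [100, 200]

print(words_to_row(row_words, separators))
```

On a skewed page the text of one row drifts across the column lines, so words can land in the wrong cell. This is why deskewing before OCR is important.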

After we OCR the text on all pages, click the "Save" button to save the result to an XLS file. The tables should then appear correctly in the output XLS file:

image

If you have any questions or encounter any problems with the above steps, please feel free to let us know and we will assist you as soon as possible.

VeryPDF
