How to use PDF Table Extractor OCR software to extract a table from a color PDF file and save it to an Excel (XLS, CSV) document?

I had been running a trial license with no problems. We recently purchased a license, and now the app crashes on startup. It usually takes two or three attempts before it starts, and then it crashes when trying to recognize text.

Should I uninstall and reinstall it? Any support would be much appreciated, as we are currently unable to use the software we just bought.

I reinstalled PDF Table Extractor OCR software and got the program to run character recognition, but the result is far from usable. Any suggestions on cleaning this up?

image

Customer

---------------------------------------------------------------

The original file (from a scan) is too big to send, so here is a reduced version, although it is not the one I was trying to pull the table from. The original is 23,075 KB. Is there a limit on the size of PDF the reader will bring in?

Customer

---------------------------------------------------------------

We have checked your PDF file carefully and found three issues in it:

1. Your PDF file contains color information. Our OCR engine works well on black-and-white documents; when a PDF file contains color information, the OCR engine may not work as well.

image

2. The background of the text area in your PDF file is not solid white, and this colored background also reduces OCR accuracy. You had better re-scan your document to a black-and-white PDF file at 300 DPI or higher.

image

3. The page width and height of this PDF file are very large. PDF page sizes are measured in points (72 points per inch), so if we render this PDF file to a TIFF file at 300 DPI, the width and height in pixels will be:

3167.3 * 300 / 72 = 13197.08 pixels (width)
2676.5 * 300 / 72 = 11152.08 pixels (height)

A page this large may also affect the OCR engine.
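The three issues above can be checked on your own scans before running OCR. The following is a minimal Python sketch that is not part of Table Extractor OCR: it assumes the page has already been rendered to a hypothetical in-memory image, a list of rows of (R, G, B) tuples, and the function names, the 10-level gray tolerance and the 250-level "white" cutoff are illustrative choices. It also assumes the background is the most frequent pixel color, which is usually true for text pages.

```python
from collections import Counter
from typing import List, Tuple

RGB = Tuple[int, int, int]
Pixels = List[List[RGB]]  # hypothetical in-memory page: rows of (R, G, B) pixels

POINTS_PER_INCH = 72  # PDF page sizes are given in points; 1 point = 1/72 inch


def contains_color(pixels: Pixels, tolerance: int = 10) -> bool:
    """Issue 1: True if any pixel is noticeably non-gray.

    Gray, black and white pixels have R == G == B; `tolerance` absorbs the
    small channel differences that scanner noise introduces.
    """
    return any(
        max(pixel) - min(pixel) > tolerance for row in pixels for pixel in row
    )


def background_is_white(pixels: Pixels, min_level: int = 250) -> bool:
    """Issue 2: True if the most frequent pixel (assumed background) is near white."""
    background, _count = Counter(p for row in pixels for p in row).most_common(1)[0]
    return min(background) >= min_level


def page_size_in_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> Tuple[float, float]:
    """Issue 3: pixel dimensions of a page rendered at `dpi`."""
    return width_pt * dpi / POINTS_PER_INCH, height_pt * dpi / POINTS_PER_INCH


def uncompressed_megabytes(width_px: int, height_px: int, bits_per_pixel: int) -> float:
    """Raw bitmap size in megabytes (10**6 bytes), before any compression."""
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


if __name__ == "__main__":
    width, height = page_size_in_pixels(3167.3, 2676.5)
    print(f"{width:.2f} x {height:.2f} pixels")  # 13197.08 x 11152.08 pixels
    print(f"24-bit color: {uncompressed_megabytes(13197, 11152, 24):.1f} MB")  # 441.5 MB
    print(f"1-bit black and white: {uncompressed_megabytes(13197, 11152, 1):.1f} MB")  # 18.4 MB
```

For the page in question this reproduces the pixel figures above. Held uncompressed, such a page is roughly 441.5 MB at 24-bit color versus about 18.4 MB at 1-bit black and white, which gives a sense of how much more data a color rendering carries.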

To get the best OCR result from your color PDF file, we suggest that you first cut the text area out of the color PDF file, convert it to a black-and-white TIFF file, and scale it by 200%~300% to increase the DPI resolution. You can then use Table Extractor OCR software to extract the table from the new TIFF file and save it to an editable document format easily.
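The three preparation steps (cut, convert to black and white, scale) can be sketched in plain Python, again assuming a hypothetical in-memory image made of rows of (R, G, B) tuples. This is an illustration only, not how Table Extractor OCR or Photoshop work internally: the luma weights are the ITU-R BT.601 ones, the threshold of 160 is an arbitrary starting point to tune per scan, and the enlargement uses simple nearest-neighbour copying.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

BLACK, WHITE = 0, 255


def crop(pixels: List[List[RGB]], left: int, top: int, right: int, bottom: int) -> List[List[RGB]]:
    """Cut out the text area: columns left..right-1 and rows top..bottom-1."""
    return [row[left:right] for row in pixels[top:bottom]]


def to_black_and_white(pixels: List[List[RGB]], threshold: int = 160) -> List[List[int]]:
    """Map each pixel to BLACK or WHITE by its luma.

    Light tinted backgrounds land above the threshold and become pure white;
    dark text lands below it and becomes pure black.
    """
    result = []
    for row in pixels:
        result.append([
            WHITE if 0.299 * r + 0.587 * g + 0.114 * b >= threshold else BLACK
            for r, g, b in row
        ])
    return result


def upscale(pixels: List[List[int]], factor: int) -> List[List[int]]:
    """Enlarge by an integer factor (2 = 200%, 3 = 300%) by repeating each pixel."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def prepare_for_ocr(pixels: List[List[RGB]], box: Tuple[int, int, int, int],
                    threshold: int = 160, factor: int = 2) -> List[List[int]]:
    """Cut the text area, convert it to black and white, then scale it up."""
    left, top, right, bottom = box
    return upscale(to_black_and_white(crop(pixels, left, top, right, bottom), threshold), factor)


if __name__ == "__main__":
    tinted, ink = (255, 240, 180), (30, 30, 30)  # light yellow background, dark text
    page = [[tinted, ink, tinted], [tinted, tinted, ink]]
    print(prepare_for_ocr(page, box=(1, 0, 3, 2)))
    # [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```

Nearest-neighbour copying keeps every pixel pure black or white; the smoother resampling a photo editor uses would produce gray edges that need thresholding again.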

If you have the source document, you can re-scan the paper document to a TIFF file at 300 DPI in black-and-white color depth, and you will get a better editable document with Table Extractor OCR software. This is the simplest solution.

We have created a new TIFF file from your color PDF file. This new TIFF file contains only the text area, in black and white on a white background, as shown below:

image

This TIFF file contains the text part of the PDF page. We removed the color background with Adobe Photoshop and scaled the image to 200%~300% to increase the image quality (increase the DPI). When we open the new TIFF file in PDF Table Extractor OCR software, the OCR engine works great:

image

Click the "Save" button in PDF Table Extractor OCR software and save to Excel XLS format. The result looks better in MS Excel:

image

The above steps are too complicated for non-professionals. If possible, we suggest that you rescan your paper document to a new PDF or TIFF file that contains only the text on a white background, at a resolution higher than 300 DPI. You can then use PDF Table Extractor OCR software to extract the table contents from the new file easily, and the OCRed text in the resulting Excel document will be more accurate.
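To confirm that a new scan meets the 300 DPI recommendation, you can compare the image's pixel width with the page width. A minimal sketch, assuming you already know both numbers (for example from your scanner settings or an image viewer); the function names are hypothetical:

```python
POINTS_PER_INCH = 72  # PDF page sizes are given in points; 1 point = 1/72 inch


def effective_dpi(image_width_px: int, page_width_pt: float) -> float:
    """Pixels per inch across the page width."""
    return image_width_px / (page_width_pt / POINTS_PER_INCH)


def dpi_is_sufficient(image_width_px: int, page_width_pt: float, minimum: float = 300.0) -> bool:
    """True if the scan reaches at least `minimum` DPI."""
    return effective_dpi(image_width_px, page_width_pt) >= minimum


if __name__ == "__main__":
    # A US Letter page is 612 points (8.5 inches) wide.
    print(effective_dpi(2550, 612))      # 300.0
    print(dpi_is_sufficient(1700, 612))  # False: only 200 DPI
```

The same calculation explains the scaling step described earlier: enlarging an image to 200% doubles its pixel width, so a 150 DPI scan of that page reports 300 DPI afterwards.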

The following is a sample PDF file, just for testing purposes:

http://online.verypdf.com/images/tiff/testocr.pdf

image

We open this PDF file in PDF Table Extractor software, draw the table and columns, and click the OCR button. The OCR result is great because the text on this PDF file is very clear:
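Drawing columns amounts to telling the software where the column boundaries lie, so that each recognized word can be placed in a cell. The sketch below illustrates that general idea and is not VeryPDF's implementation: it assumes OCR has already produced, for each table row, a list of hypothetical (x position, text) pairs, and that the drawn column dividers are given as sorted x positions. The function names are hypothetical; the CSV output uses Python's standard csv module.

```python
import bisect
import csv
import io
from typing import List, Sequence, Tuple

Word = Tuple[float, str]  # hypothetical OCR output: (x position, recognized text)


def assign_columns(words: Sequence[Word], dividers: Sequence[float]) -> List[str]:
    """Place each word of one table row into the column its x position falls in.

    A word exactly on a divider goes to the column on the right.
    """
    cells: List[List[str]] = [[] for _ in range(len(dividers) + 1)]
    for x, text in sorted(words):  # read left to right
        cells[bisect.bisect_right(dividers, x)].append(text)
    return [" ".join(parts) for parts in cells]


def rows_to_csv(rows: Sequence[Sequence[str]]) -> str:
    """Serialize table rows as CSV text; the csv module quotes cells when needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    dividers = [100.0]  # one drawn divider: column 0 is x < 100, column 1 is x >= 100
    ocr_rows = [
        [(10.0, "Part"), (40.0, "No."), (120.0, "Qty")],
        [(12.0, "A-1"), (125.0, "5")],
    ]
    table = [assign_columns(row, dividers) for row in ocr_rows]
    print(table)  # [['Part No.', 'Qty'], ['A-1', '5']]
    print(rows_to_csv(table), end="")
```

Because cells are decided purely by x position, clear text with well-separated columns is what makes this kind of assignment reliable.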

image

After we save it to an Excel document, the text contents look perfect:

image

So the biggest factor is the quality of the text: if the text is clear, you will easily get a great Excel document.

VeryPDF

This entry was posted in Table Extractor OCR.
