How to read tables from a PDF file using C#?

Question: I have a PDF file with a table inside and I want to extract that table. Which SDK can I use from C# to recognize tables inside a PDF and read them cell by cell? Can VeryPDF suggest an application that recognizes tables inside PDF files?

Answer: You can try the free trial of VeryPDF OCR to Any Converter Command Line Developer Version. It extracts tables from PDF files and can be called from C#, VB .NET, MS Visual Basic, Borland Delphi, VBA (MS Office products such as Access) and C++ via COM, and from C and C++ via native C. It can also extract tables from image files (JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM) and save them in editable formats such as Word, Excel, CSV, HTML, TXT, Pure Text Layer PDF and Invisible Text Layer PDF. More information is available on the software's homepage. The steps below show how to use it.

Step 1. Download OCR to Any Converter Command Line Developer License (free trial)

  • With the Developer License, you can integrate the SOFTWARE into your own software and redistribute it royalty-free. If the SOFTWARE contains source code, you have the right to modify and reuse that code under the Developer License.
  • When the download finishes, you will have a zip file. Extract it to a folder; you can then call the executable file from C#. The package contains the executable file, ocrsdk.dll, the ocrdata folder, a help document and other files.
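
Calling the extracted executable from C# can be sketched with the standard System.Diagnostics.Process class. This is a minimal sketch and not VeryPDF's own sample code: the class name OcrToolRunner is invented for illustration, the path of the executable is something you supply yourself, and the meaning of the tool's exit code is not described in this article, so check the help document before relying on it.

```csharp
using System;
using System.Diagnostics;

public static class OcrToolRunner
{
    // Describes how to launch a console tool. Nothing here is specific to
    // VeryPDF; exePath and arguments are supplied by the caller.
    public static ProcessStartInfo CreateStartInfo(string exePath, string arguments)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = arguments,
            UseShellExecute = false,       // must be false to redirect output
            RedirectStandardOutput = true, // capture what the tool prints
            CreateNoWindow = true          // do not open a console window
        };
    }

    // Starts the tool, collects its standard output, waits for it to exit
    // and returns the exit code.
    public static int Run(string exePath, string arguments, out string output)
    {
        using (Process process = Process.Start(CreateStartInfo(exePath, arguments)))
        {
            // Read the output before WaitForExit so a full pipe cannot block the tool.
            output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call would look like OcrToolRunner.Run(pathToExecutable, "input.pdf output.xls", out string log), where pathToExecutable points at the executable file in the folder you extracted and the two file names are placeholders.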

Step 2. Extract table from PDF by C#

  • When you extract a table from PDF, the following parameters may be useful:
    -ocr2                   : use the enhanced OCR module to convert scanned PDF and image files to RTF, DOC, TXT, CSV, Excel, HTML files
    -ocr2aor                : detect the page direction and rotate the page automatically when -ocr2 is used
    -ocr2autorotate         : same as -ocr2aor
    -ocr2excelmode <int>    : set the output Excel format when -ocr2 is used
      -ocr2excelmode 0: One big sheet + All page sheets
      -ocr2excelmode 1: All page sheets
      -ocr2excelmode 2: One big sheet, default mode
  • This software provides three modes for extracting tables from PDF.
    Mode 1: to extract a table from a text-based PDF file, you do not need to add any parameter to the conversion.
    Mode 2: to extract a table from an image-based (scanned) PDF file, add the parameter -ocr2, which uses the enhanced OCR module to convert scanned PDF and image files to RTF, DOC, TXT, CSV, Excel, HTML files.
    Mode 3: to extract a table from PDF and save the output as an Excel file, choose one of the sub-modes of -ocr2excelmode listed above.
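
The three modes can be turned into a command line with a small helper. The following is a hedged sketch: the option names come from the parameter list above, but the enum ExcelSheetMode, the class OcrArguments and the argument order (options first, then the input file, then the output file) are assumptions made for illustration. Please confirm the exact syntax in the help document.

```csharp
using System;
using System.Collections.Generic;

// Values mirror the -ocr2excelmode numbers listed in this article.
public enum ExcelSheetMode
{
    BigSheetAndPageSheets = 0, // one big sheet + all page sheets
    PageSheetsOnly = 1,        // all page sheets
    BigSheetOnly = 2           // one big sheet (default mode)
}

public static class OcrArguments
{
    // Builds the argument string for one conversion.
    // ASSUMPTION: options come first, followed by input and output paths.
    public static string Build(string inputPath, string outputPath,
        bool useOcr = false, bool autoRotate = false,
        ExcelSheetMode? excelMode = null)
    {
        var parts = new List<string>();
        if (useOcr)
        {
            parts.Add("-ocr2"); // Mode 2: image-based (scanned) PDF
            // The next two options are documented as applying only with -ocr2.
            if (autoRotate) parts.Add("-ocr2aor");
            if (excelMode.HasValue)
                parts.Add("-ocr2excelmode " + (int)excelMode.Value); // Mode 3
        }
        // Mode 1 (text-based PDF) adds no option at all.
        parts.Add(Quote(inputPath));
        parts.Add(Quote(outputPath));
        return string.Join(" ", parts);
    }

    // Wraps a path in double quotes so that spaces survive.
    private static string Quote(string path)
    {
        return "\"" + path + "\"";
    }
}
```

For example, OcrArguments.Build("scan.pdf", "table.xls", useOcr: true, excelMode: ExcelSheetMode.PageSheetsOnly) returns -ocr2 -ocr2excelmode 1 "scan.pdf" "table.xls". The file names and the .xls extension are placeholders.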

Please see the help document for more table options and parameters. The snapshots below show the extraction result. If you have any questions while using the software, please contact us.

[Snapshot: Input table PDF file]

[Snapshot: Output table Excel file]
