How can I convert a PDF file to HTML, Word, or Excel from C#?

Question: I am using C# 2.0 and developing an application that converts PDF files to HTML, Word, and Excel. Does VeryPDF offer a solution for this?

Answer: For this requirement, you can try the free trial of VeryPDF OCR to Any Converter Command Line. It converts text-based PDFs, image PDFs, scanned images, and other files to Word, Excel, HTML, Text, CSV, and other formats. A developer version is also available, so you can convert PDF to Word, Excel, HTML, and Text from Visual Basic, C/C++, Delphi, ASP, PHP, C#, and .NET. More information is available on the product homepage. The steps below show how to use the software.

Step 1. Download OCR to Any Converter Command Line Developer License

  • With the developer version, you can integrate the software into your own application and redistribute it royalty-free. It is intended for developers who build their own software on top of this converter.
  • The download is a zip file. Extract it to a folder, where you will find the executable file, the help document, and other related documents.

Step 2. Convert PDF to Word, HTML, Excel and other formats from C#

  • When you use this software, please refer to the usage and examples in the help documents.
  • When you run the conversion from PDF to Word, HTML, or Excel in C#, you can use the following parameters. Here are some examples for reference:
  • ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.rtf
    ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.doc
    ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.xls
    ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.html
    These command lines convert image-based or scanned PDF files to Word, Excel, HTML, and other editable formats. The -ocr2 parameter launches the OCR engine to run the conversion.

  • When converting a text-based PDF file, you do not need to add any parameters.
    ocr2any.exe C:\in.pdf C:\out.doc
    ocr2any.exe C:\in.pdf C:\out.xls
    ocr2any.exe C:\in.pdf C:\out.html
    ocr2any.exe C:\in.pdf C:\out.rtf
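
The command lines above can be launched from C# with System.Diagnostics.Process. The following is only a minimal sketch, not an official VeryPDF API: the class name Ocr2AnyRunner and its methods are hypothetical helpers, the only options used are -ocr2 and -ocr2aor exactly as they appear in the examples above, and the meaning of the returned exit code is not described here, so please check the help document before relying on it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the VeryPDF package) that starts ocr2any.exe.
static class Ocr2AnyRunner
{
    // Wraps a path in double quotes so that a path containing spaces stays one argument.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Builds the argument string. When useOcr is true, "-ocr2 -ocr2aor" is added,
    // as in the examples for image-based or scanned PDF files.
    public static string BuildArguments(string inputPdf, string outputFile, bool useOcr)
    {
        string prefix = useOcr ? "-ocr2 -ocr2aor " : "";
        return prefix + Quote(inputPdf) + " " + Quote(outputFile);
    }

    // Starts ocr2any.exe, waits until it finishes, and returns the process exit code.
    public static int Convert(string exePath, string inputPdf, string outputFile, bool useOcr)
    {
        ProcessStartInfo info = new ProcessStartInfo();
        info.FileName = exePath;
        info.Arguments = BuildArguments(inputPdf, outputFile, useOcr);
        info.UseShellExecute = false; // start the executable directly, without the shell
        info.CreateNoWindow = true;   // do not show a console window

        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For example, assuming the zip file was extracted to C:\ocr2any (a placeholder folder), a scanned PDF could be converted with Ocr2AnyRunner.Convert(@"C:\ocr2any\ocr2any.exe", @"C:\in.pdf", @"C:\out.doc", true). The output format follows the extension of the output file, as in the command lines above.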

Related parameters:
-ocrmode <int>  : set OCR mode
  -ocrmode 0: output to text file
  -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
  -ocrmode 2: output to plain text based PDF file
  -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
  -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
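
In C# code, the -ocrmode values listed above can be given readable names so that the calling code does not pass bare numbers. This is only a sketch: the OcrMode enum and the OcrModeArguments class are hypothetical names, not part of the VeryPDF package, and the code only builds the "-ocrmode <int>" fragment. Placing that fragment before the input and output paths, like -ocr2 in the examples, and whether it must be combined with other options, are assumptions to confirm in the help document.

```csharp
using System;

// Hypothetical names for the documented -ocrmode values 0 to 4.
enum OcrMode
{
    TextFile = 0,                // output to text file
    TextLayerUnderOriginal = 1,  // insert new text layer under original PDF pages
    PlainTextPdf = 2,            // output to plain text based PDF file
    HiddenTextPdfBlackWhite = 3, // OCRed PDF (BW) with hidden text layer
    HiddenTextPdfColor = 4       // OCRed PDF (Color) with hidden text layer
}

static class OcrModeArguments
{
    // Returns the option fragment for the given mode, for example "-ocrmode 3".
    // Values outside the documented range 0 to 4 are rejected.
    public static string Build(OcrMode mode)
    {
        int value = (int)mode;
        if (value < 0 || value > 4)
        {
            throw new ArgumentOutOfRangeException("mode");
        }
        return "-ocrmode " + value;
    }
}
```

For example, OcrModeArguments.Build(OcrMode.HiddenTextPdfColor) gives the fragment for a color OCRed PDF with a hidden text layer, which can then be joined with the other arguments passed to ocr2any.exe.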

You can use these parameters and command line templates when calling the converter from C#. There are too many functions and parameters to list all of them here. If you have any questions while using the software, please contact us.
