How to extract text from a PDF via an API?

Question: Can anyone recommend an API for extracting the text from a PDF? We need to be able to get at the text contained in a document. We're currently looking at PdfTextStream, which seems pretty good, but we would like to hear other people's experiences and suggestions. Are there alternatives (commercial or free) for extracting text from a PDF programmatically? Is there any solution from VeryPDF?

Answer: If you need to extract text from a PDF with a cloud-based API, you can try a free trial of VeryPDF PDF to Text Converter Cloud API, which converts text-based PDF files to plain text files. With an OCR module, scanned PDF files can also be converted to plain text, but the current version of this API cannot extract text from image-based PDFs on its own. For image-based PDFs, use VeryPDF OCR Cloud API instead, which converts images containing text (PDF, TIF, PNG, JPG) into editable, searchable text-based documents (PDF, TXT, RTF, DOC, XLS, PPT, XML, HTML). More information about both products is available on the homepage. The steps below show how to use the API.

Step 1. Get an API code

  • VeryPDF cloud-based applications are not free, but they are inexpensive: for less than $20 you get access to more than 20 applications. To use the APIs, you need an API code, which you get by registering an account on the registration page.
  • VeryPDF Cloud API is licensed per account. Once you purchase a plan, you can use your APIKEY to access all VeryPDF Cloud APIs, which include 20+ APIs and 200+ parameters.
  • Please make sure you use the same email address when registering and when paying online; otherwise the code cannot be sent to you correctly.

Step 2. Extract text from PDF

  • To extract text from a PDF, open a browser and enter a URL following the examples below:
  • You can use -f and -l to specify the page range when extracting text from a PDF file, for example,
    With this URL, a text-based PDF file is converted to text and the conversion page range is set from page 1 to page 1.
    To extract text from a PDF and maintain the layout, add the parameter -layout.
    Here are more parameters for your reference:
    -f <int>     : first page to convert
    -l <int>     : last page to convert
    -opw <string>: owner password (for encrypted files)
    -upw <string>: user password (for encrypted files)
    -layout      : maintain original physical layout in PDF to Text conversion
    -nopgbrk     : don't insert page breaks between pages in PDF to Text conversion
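
If you would rather build the request URL in a script than type it into a browser, the parameters above can be assembled as in the following Python sketch. Please note that this is only an illustration under assumptions: the endpoint in BASE_URL and the query field names (apikey, infile, options) are hypothetical placeholders and are not taken from the VeryPDF documentation, so replace them with the values shown on the API page. Only the option names (-f, -l, -opw, -upw, -layout, -nopgbrk) come from the list above.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; replace it with the real one
# from the VeryPDF Cloud API documentation.
BASE_URL = "https://example.invalid/api/"


def build_options(first_page=None, last_page=None, layout=False,
                  no_page_breaks=False, owner_password=None,
                  user_password=None):
    """Assemble the option string from the parameters listed above."""
    if (first_page is not None and last_page is not None
            and first_page > last_page):
        raise ValueError("first_page must not be greater than last_page")
    parts = []
    if first_page is not None:
        parts += ["-f", str(first_page)]      # first page to convert
    if last_page is not None:
        parts += ["-l", str(last_page)]       # last page to convert
    if owner_password is not None:
        parts += ["-opw", owner_password]     # owner password
    if user_password is not None:
        parts += ["-upw", user_password]      # user password
    if layout:
        parts.append("-layout")               # keep the physical layout
    if no_page_breaks:
        parts.append("-nopgbrk")              # no page breaks between pages
    return " ".join(parts)


def build_request_url(api_key, pdf_url, **options):
    """Build a request URL. The query field names are assumptions."""
    query = {
        "apikey": api_key,                    # assumed field name
        "infile": pdf_url,                    # assumed field name
        "options": build_options(**options),  # assumed field name
    }
    # urlencode percent-encodes the values so they are safe in a URL.
    return BASE_URL + "?" + urlencode(query)


if __name__ == "__main__":
    print(build_request_url("YOUR_APIKEY", "http://example.com/sample.pdf",
                            first_page=1, last_page=1, layout=True))
```

Keeping the option string in one helper means a bad page range is caught before any request is sent, and the same helper can be reused whichever way the real API expects the options to be passed.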

Now let us check the conversion result in the following snapshots. If you have any questions while using the API, please contact us as soon as possible.

[Snapshot: Input PDF file]

[Snapshot: Output text file]
