How to extract or recognize text from documents with a cloud API?

Question: I need to extract plain text from uploaded documents to make them searchable. The documents may be MS Word or PDF files (either scanned or containing text). The application runs on a LAMP stack, but installing other software could be an option. Is there any tool, service, library, or combination of those from VeryPDF that you could recommend for this task?

Answer: To extract plain text from uploaded documents and make them searchable on a LAMP stack, you can try VeryPDF OCR Cloud API, which offers a free trial. It is cloud based, so it imposes no system requirements of its own: you can use it under Linux together with the rest of the stack, such as Apache, MySQL, and Perl/PHP/Python. VeryPDF OCR Cloud API can extract or recognize text from files such as PDF, TIF, PNG, and JPG and make them searchable. You can find more information about the software on its homepage. The following sections show how to use it.

Extract text from uploaded files on a Linux system.

  • Most of the VeryPDF APIs run the conversion on the VeryPDF server and are called from a browser, so you do not need to download any application.
  • Simply open a browser and enter a URL that follows the example below to recognize text from an uploaded file.
    Here is an example:
    http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr
    &infile=http://online.verypdf.com/examples/cloud-api/multipage.tif
    &outfile=out&lang=eng&format
    The URL is wrapped here for readability; enter it as a single line. This example converts a multipage TIFF file to an HTML file using English as the recognition language. The output HTML file contains the position of each word and character. The following snapshots show the result of the conversion.

    [Image: the input TIFF file.]

    [Image: the output searchable HTML file.]

  • All of the text has been recognized correctly. You can also choose other output formats, such as text, PDF, or Word, to save the output file. HTML is used here because it is easy to show as an example.
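
The request URL described above can also be assembled in code instead of being typed by hand. The following is a minimal sketch, not an official VeryPDF client: `build_ocr_url()` and its `$extra` argument are hypothetical names, and the parameter names (`apikey`, `app`, `infile`, `outfile`, `lang`) are taken from the example URL. The value of the `format` parameter is not shown in the example, so the sketch leaves it to the caller to supply through `$extra`. It also assumes that the service accepts a percent-encoded `infile` value, as web servers normally do.

```php
<?php
// Hypothetical helper (not part of any VeryPDF SDK): assembles the request URL
// from the parameter names that appear in the example URL.
function build_ocr_url(string $apiKey, string $inFile, array $extra = []): string
{
    // Defaults mirror the example; values in $extra override them.
    $params = array_merge([
        'apikey'  => $apiKey,
        'app'     => 'ocr',
        'infile'  => $inFile,
        'outfile' => 'out',
        'lang'    => 'eng',
    ], $extra);

    // PHP_QUERY_RFC3986 percent-encodes reserved characters such as ':' and '/',
    // so the nested infile URL cannot be confused with the outer query string.
    return 'http://online.verypdf.com/api/?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = build_ocr_url(
    'XXXXXXXXXXXXX',
    'http://online.verypdf.com/examples/cloud-api/multipage.tif'
);
echo $url, PHP_EOL;
```

Building the query string with `http_build_query()` avoids the stray spaces and line breaks that creep in when a long URL is pasted together manually.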

Extract text from an uploaded file with PHP code.

  • To extract or recognize text from PHP code, refer to the following code template:
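    <?php
    // The code
    /* Gets the data from a URL; returns the response body, or false on failure. */
    function get_data($url)
    {
        $ch = curl_init();
        $timeout = 5; // seconds allowed for establishing the connection
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        if ($data === false) {
            // Log the transport error so that failures are not silent
            error_log('cURL error: ' . curl_error($ch));
        }
        curl_close($ch);
        return $data;
    }

    // The usage
    // The URL is split across lines for readability; concatenation keeps it
    // free of the line breaks and spaces that a multi-line string would add.
    $url = 'http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr'
        . '&infile=http://online.verypdf.com/examples/cloud-api/multipage.tif'
        . '&outfile=out&lang=eng&format';
    $returned_content = get_data($url);
    if ($returned_content !== false) {
        echo $returned_content;
    }
    ?>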
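
Because the goal in the question is searchable plain text, the HTML returned by the call above may need to be reduced to text before it is indexed. The sketch below shows one possible way to do that with standard PHP functions. `html_to_plain_text()` is a hypothetical helper, and the sample markup is invented for illustration; the actual structure of the HTML returned by the service is not shown in this article, so the list of block-level tags may need adjusting.

```php
<?php
// Hypothetical helper: reduces an HTML document to plain text for indexing.
function html_to_plain_text(string $html): string
{
    // Drop script and style elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Insert a space before block-level boundaries so that neighbouring
    // paragraphs do not run together; inline tags are left untouched so
    // that words split across inline elements stay whole.
    $html = preg_replace('#<(/p|/div|/li|/tr|/h[1-6]|br)\b[^>]*>#i', ' $0', $html);
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Invented sample markup, used only to demonstrate the helper.
$sample = '<div><p>Hello <b>OCR</b></p><p>wor<span>ld</span> &amp; text</p></div>';
echo html_to_plain_text($sample), PHP_EOL; // prints "Hello OCR world & text"
```

The resulting string can then be stored in whatever search index the application already uses, for example a MySQL full-text column.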

With this API, you can easily recognize text in a LAMP stack environment. If you have any questions while using it, please contact us.

