Question: I need to extract plain text from uploaded documents to make them searchable. The documents could be MS Word or PDF (either scanned or containing text). The application in question runs on a LAMP stack, but installing other software is an option. Is there any tool, service, library, or combination of those from VeryPDF that you could recommend for this task?
Answer: To extract plain text from uploaded documents on a LAMP stack and make them searchable, you can try VeryPDF OCR Cloud API, which offers a free trial. It is cloud-based, so it has no system requirements: you can use it under Linux together with other applications such as Apache, MySQL, and Perl/PHP/Python. VeryPDF OCR Cloud API can extract text from files such as PDF, TIF, PNG, and JPG and make them searchable, so it is worth trying. You can find more information about this software on its homepage. The following sections show how to make it work.
Extract text from uploaded files under a Linux system.
- Most VeryPDF APIs are called from a browser and run the conversion on the VeryPDF server, so you do not need to download any application.
- Simply open a browser and enter a request URL to extract text from the uploaded file. Here is an example:
- All of the text has been recognized correctly. You can choose an output format such as text, PDF, or Word for the output file. To keep the example easy to show, the output here is an HTML file.
With the example above, a multipage TIFF file in English is converted to an HTML file, and the output HTML file contains the position of each word and character. The following snapshot shows the conversion result.
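The request URL that is entered in the browser can also be assembled in code. The following is a minimal sketch, not a documented VeryPDF method: only the base address and the `apikey` and `app=ocr` parameters are taken from the PHP template in this article. The function name `build_ocr_url` and the `$extra` array are hypothetical, and the names of any further parameters have to be taken from the VeryPDF documentation.

```php
<?php
// Hypothetical helper: assembles a VeryPDF Cloud API request URL.
function build_ocr_url(string $apiKey, array $extra = []): string
{
    $base = 'http://online.verypdf.com/api/';
    // apikey and app=ocr are the only parameters shown in this article.
    $params = array_merge(['apikey' => $apiKey, 'app' => 'ocr'], $extra);
    // http_build_query URL-encodes every value and joins the pairs with "&".
    return $base . '?' . http_build_query($params, '', '&');
}

echo build_ocr_url('XXXXXXXXXXXXX'), "\n";
// prints: http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr
```

Building the query string this way keeps values such as file addresses correctly encoded, which is easy to get wrong when the URL is typed by hand.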
Extract text from an uploaded file with PHP code.
- When you need to extract or recognize text from PHP code, please refer to the following code template:
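/* gets the data from a URL */
function get_data($url) {
    $ch = curl_init();
    $timeout = 5; // seconds to wait while connecting
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch); // false on failure
    curl_close($ch);
    return $data;
}
// The rest of the query string (input file, output options) is not shown here.
$returned_content = get_data('http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr');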
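The template does not check whether the request succeeded. A slightly more defensive variant might look like the sketch below. The function name `get_data_checked`, the 60-second overall timeout, and the choice to throw exceptions are assumptions made for illustration; they are not part of the VeryPDF template.

```php
<?php
// Hypothetical variant of get_data() that reports failures to the caller.
function get_data_checked(string $url, int $connectTimeout = 5): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60); // assumed upper bound for one OCR request

    $data = curl_exec($ch);
    if ($data === false) {
        // Network-level failure: DNS, refused connection, timeout, and so on.
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException('Server returned HTTP status ' . $status);
    }
    return $data;
}
```

Throwing an exception lets the upload handler decide whether to retry the document later or to flag it as not indexed, instead of storing an empty result.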
With this API, you can easily recognize text in a LAMP stack environment. If you have any questions while using it, please contact us as soon as possible.
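Because the goal in the question is searchable plain text, HTML output like that in the example may need to be reduced to text before it is stored. The following is a minimal sketch that assumes the response is ordinary HTML markup. The function name `html_to_plain_text` is hypothetical, and the real output, which carries word and character positions, may need different handling.

```php
<?php
// Hypothetical helper: reduces an HTML fragment to plain text for indexing.
function html_to_plain_text(string $html): string
{
    $text = strip_tags($html); // drop all markup
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'); // "&amp;" becomes "&"
    // Collapse whitespace runs; keep the text unchanged if the regex fails on invalid UTF-8.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;
    return trim($text);
}

echo html_to_plain_text('<p>Invoice &amp; <b>receipt</b></p>'), "\n";
// prints: Invoice & receipt
```

On a LAMP stack, the resulting string could then be saved in a MySQL text column with a FULLTEXT index so that uploaded documents can be found by their content.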