VeryPDF Cloud OCR API converts scanned PDF files to text files and can also extract text positions from scanned PDF files

VeryPDF Cloud API Platform:

https://www.verypdf.com/online/cloud-api/index.html

VeryPDF Cloud API supports two types of OCR apps: "ocr" and "pdf2txtocr". The "ocr" app is for image formats only, and the "pdf2txtocr" app is for PDF format only, so please note the difference.

The "ocr" app does not support PDF format. It supports the .tif, .tiff, .png, .gif, .bmp and .jpg image formats, and it can convert these image files to plain text files or HTML files. It can also extract the position of each word in these image files.

The following URL converts a multi-page TIFF file to a single text file and outputs the word positions to the browser (the URL is wrapped here for readability; it is a single line):

http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX&app=ocr
&infile=https://dl.dropboxusercontent.com/u/5570462/multipage.tif
&outfile=out.txt&format=1&dumpwordpos=1&dumptofile=0

The following is the OCRed content from the input TIFF file. Each word line gives four coordinates in square brackets, followed by the recognized word:

***** page_1; image "20140823-102452-2088203320.tif"; bbox 0 0 2000 2388; ppageno 0
[23 32 262 74] 'Universal'
[278 31 569 73] 'Declaration'
[586 31 637 73] 'of'
[649 31 836 72] 'Human'
[853 30 1012 83] 'Rights'
[24 152 197 186] 'Whereas'
[212 151 438 196] 'recognition'
[453 150 497 185] 'of'
[507 150 567 185] 'the'
[582 150 745 185] 'inherent'
[759 150 899 195] 'dignity'
[914 149 983 184] 'and'
[998 149 1041 184] 'of'
[1051 149 1111 184] 'the'
[1126 149 1232 194] 'equal'
[1249 148 1319 183] 'and'
[1334 148 1551 183] 'inalienable'
[1565 148 1677 193] 'rights'
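If you need the word positions in a program rather than in the browser, the dump above can be parsed line by line. The following is a minimal sketch in Python, not an official client. The names Word and parse_word_positions are hypothetical, and reading the four numbers as left, top, right and bottom pixel coordinates is an assumption based on the "bbox" label in the page header line.

```python
import re
from typing import List, NamedTuple


class Word(NamedTuple):
    page: int   # page number taken from the "page_N" header line
    x0: int     # assumed: left edge
    y0: int     # assumed: top edge
    x1: int     # assumed: right edge
    y1: int     # assumed: bottom edge
    text: str


# Header line, e.g.: ***** page_1; image "..."; bbox 0 0 2000 2388; ppageno 0
PAGE_RE = re.compile(r"^\*+\s*page_(\d+);")
# Word line, e.g.: [23 32 262 74] 'Universal'
WORD_RE = re.compile(r"^\[(\d+) (\d+) (\d+) (\d+)\] '(.*)'$")


def parse_word_positions(dump: str) -> List[Word]:
    """Parse the word-position dump into a list of Word records."""
    words = []
    page = 0  # stays 0 if no page header has been seen yet
    for raw_line in dump.splitlines():
        line = raw_line.strip()
        header = PAGE_RE.match(line)
        if header:
            page = int(header.group(1))
            continue
        match = WORD_RE.match(line)
        if match:
            x0, y0, x1, y1 = (int(v) for v in match.groups()[:4])
            words.append(Word(page, x0, y0, x1, y1, match.group(5)))
    return words
```

Lines that match neither pattern are skipped, so blank lines in the dump do no harm.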

The following URL converts a multi-page TIFF file to a single text file and redirects the word positions to a disk file on the server; the URL of the converted file is output to the web browser:

http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX&app=ocr
&infile=https://dl.dropboxusercontent.com/u/5570462/multipage.tif
&outfile=out.txt&format=1&dumpwordpos=1&dumptofile=1

The following URL converts a multi-page TIFF file to a single text file without word positions:

http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX&app=ocr
&infile=https://dl.dropboxusercontent.com/u/5570462/multipage.tif
&outfile=out.txt
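The three "ocr" URLs above differ only in their query parameters, so they can be assembled in code. The following Python sketch only builds the URL string and does not send a request. The function name build_ocr_url and its arguments are hypothetical. The meaning of "format=1" is not described in this article, so the sketch simply mirrors the example URLs and sends it together with "dumpwordpos".

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs in this article.
API_ENDPOINT = "http://online.verypdf.com/api/"


def build_ocr_url(apikey, infile, outfile="out.txt",
                  word_positions=False, dump_to_file=False):
    """Build a request URL for the "ocr" app (image input only)."""
    params = {
        "apikey": apikey,
        "app": "ocr",
        "infile": infile,
        "outfile": outfile,
    }
    if word_positions:
        # Mirrors the examples: format=1&dumpwordpos=1&dumptofile=0 or 1
        params["format"] = 1
        params["dumpwordpos"] = 1
        params["dumptofile"] = 1 if dump_to_file else 0
    # safe=":/" keeps the infile URL readable, as in the examples above;
    # other reserved characters are still percent-encoded.
    return API_ENDPOINT + "?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    print(build_ocr_url(
        "XXXXXXXXXXXXXXXX",
        "https://dl.dropboxusercontent.com/u/5570462/multipage.tif",
        word_positions=True,
        dump_to_file=True,
    ))
```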

If you want to convert a scanned PDF file to a text file, please use the "pdf2txtocr" app instead of the "ocr" app. The "pdf2txtocr" app supports PDF format only. It can convert online PDF files to text or HTML files, and it can also extract the position of each word.
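Because the two apps accept different input formats, a program can pick the right "app" value from the file extension of the input URL. The following Python sketch shows one way to do this. The function name choose_app is hypothetical, and deciding by extension alone is an assumption; it does not inspect the actual file content.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Image extensions listed for the "ocr" app in this article.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".gif", ".bmp", ".jpg"}


def choose_app(infile_url: str) -> str:
    """Return "pdf2txtocr" for PDF input and "ocr" for supported images."""
    # Use only the path part so query strings do not affect the extension.
    extension = PurePosixPath(urlparse(infile_url).path).suffix.lower()
    if extension == ".pdf":
        return "pdf2txtocr"
    if extension in IMAGE_EXTENSIONS:
        return "ocr"
    raise ValueError(f"unsupported input extension: {extension!r}")


if __name__ == "__main__":
    print(choose_app("https://dl.dropboxusercontent.com/u/5570462/test-2pages.pdf"))
    print(choose_app("https://dl.dropboxusercontent.com/u/5570462/multipage.tif"))
```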

The following URL extracts the text contents and text positions from a PDF file and saves them to a disk file on the server; the URL of the converted file is output to the web browser:

http://online.verypdf.com/api/?apikey=XXXXXXXXXX&app=pdf2txtocr
&infile=https://dl.dropboxusercontent.com/u/5570462/test-2pages.pdf
&outfile=out.txt&format=1&dumpwordpos=1&dumptofile=1

The following URL extracts the text contents and text positions from a PDF file and shows the text positions in the web browser:

http://online.verypdf.com/api/?apikey=XXXXXXXXXX&app=pdf2txtocr
&infile=https://dl.dropboxusercontent.com/u/5570462/test-2pages.pdf
&outfile=out.txt&format=1&dumpwordpos=1&dumptofile=0

The following URL extracts the text contents from a PDF file without text positions:

http://online.verypdf.com/api/?apikey=XXXXXXXXXX&app=pdf2txtocr
&infile=https://dl.dropboxusercontent.com/u/5570462/test-2pages.pdf
&outfile=out.txt

Please note:

1. The input TIFF, JPG, PNG or PDF file should be 300 DPI or higher.

2. The input TIFF, JPG, PNG or PDF file should use black and white color depth.

3. The input TIFF, JPG, PNG or PDF file should be smaller than 3MB and have fewer than 5 pages. A file that is too big or has too many pages will cause a timeout or be killed by the server monitor.
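The size and page limits in the third note can be checked before a file is submitted. The following Python sketch is only an illustration. The function name check_input_limits is hypothetical, treating 3MB as 3 * 1024 * 1024 bytes is an assumption, and the page count must be supplied by the caller because counting pages depends on the file format.

```python
MAX_BYTES = 3 * 1024 * 1024  # assumption: "3MB" means 3 * 1024 * 1024 bytes
MAX_PAGES = 5                # the note asks for fewer than 5 pages


def check_input_limits(size_bytes: int, page_count: int) -> list:
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    if size_bytes >= MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is under {MAX_BYTES}")
    if page_count >= MAX_PAGES:
        problems.append(f"file has {page_count} pages, limit is under {MAX_PAGES}")
    return problems


if __name__ == "__main__":
    # Example: a 4MB file with 7 pages breaks both limits.
    for problem in check_input_limits(4 * 1024 * 1024, 7):
        print(problem)
```

For a local file, os.path.getsize can supply the size_bytes value.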

If you encounter any problem with the "ocr" or "pdf2txtocr" app, please feel free to let us know and we will assist you as soon as possible.

http://support.verypdf.com/open.php


