VeryPDF OCR Cloud API is part of the VeryPDF Cloud API Platform. It allows you to convert scanned PDF, TIFF, and other image formats (PNG, JPG, BMP, GIF, PCX, TGA, etc.) to plain text (TXT), editable Word (DOC, DOCX), Excel (XLS, XLSX), PowerPoint (PPT, PPTX), RTF, HTML, XML, PDF, and other document formats.
VeryPDF Cloud API Platform Home Page:
https://www.verypdf.com/online/cloud-api/index.html
The list of supported languages:
Language Code | Language Description |
grc | Ancient Greek language |
epo_alt | Esperanto alternative language |
eng | English language |
ukr | Ukrainian language |
tur | Turkish language |
tha | Thai language |
tgl | Tagalog language |
tel | Telugu language |
tam | Tamil language |
swe | Swedish language |
swa | Swahili language |
srp | Serbian (Latin) language |
sqi | Albanian language |
spa | Spanish language |
slv | Slovenian language |
slk | Slovakian language |
ron | Romanian language |
por | Portuguese language |
pol | Polish language |
nor | Norwegian language |
nld | Dutch language |
msa | Malay language |
mlt | Maltese language |
mkd | Macedonian language |
mal | Malayalam language |
lit | Lithuanian language |
lav | Latvian language |
kor | Korean language |
kan | Kannada language |
ita | Italian language |
isl | Icelandic language |
ind | Indonesian language |
chr | Cherokee language |
hun | Hungarian language |
hrv | Croatian language |
hin | Hindi language |
heb | Hebrew language |
glg | Galician language |
frm | Middle French (ca. 1400-1600) language |
frk | Frankish language |
fra | French language |
fin | Finnish language |
eus | Basque language |
est | Estonian language |
epo | Esperanto language |
enm | Middle English (1100-1500) language |
ell | Greek language |
deu | German language |
dan | Danish language |
ces | Czech language |
cat | Catalan language |
bul | Bulgarian language |
ben | Bengali language |
bel | Belarusian language |
aze | Azerbaijani language |
ara | Arabic language |
afr | Afrikaans language |
jpn | Japanese language |
chi_sim | Chinese (Simplified) language |
chi_tra | Chinese (Traditional) language |
rus | Russian language |
vie | Vietnamese language |
Use the lang=XXXX parameter (for example, &lang=eng) to set the OCR language, where XXXX is one of the language codes listed above.
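Because a mistyped language code is easy to miss, you may want to check the code before you build the request. The following is only a minimal sketch: the OCR_LANGUAGES array and the lang_param() function are hypothetical helpers written for this article and are not part of the VeryPDF API. The array holds just a few codes copied from the list above.

```php
<?php
// A few codes copied from the language list above; extend as needed.
// OCR_LANGUAGES is a local, hypothetical lookup table, not an API object.
const OCR_LANGUAGES = [
    'eng'     => 'English',
    'deu'     => 'German',
    'jpn'     => 'Japanese',
    'swe'     => 'Swedish',
    'chi_sim' => 'Chinese (Simplified)',
];

// Hypothetical helper: returns the "lang=XXXX" query fragment, or throws
// if the code is not present in the local list.
function lang_param(string $code): string
{
    if (!array_key_exists($code, OCR_LANGUAGES)) {
        throw new InvalidArgumentException("Unsupported OCR language code: $code");
    }
    return 'lang=' . $code;
}

echo lang_param('jpn'), "\n"; // prints lang=jpn
```

Keeping the list in one array means that a typo such as "jap" fails in your own code before any request is sent.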
The following URL converts a TIFF file to a text file using the English language:
The following URL converts a TIFF file to an HTML file using the English language; the output HTML file contains the position of each word and character:
The following URL converts a multipage TIFF file to a text file using the English language:
The following URL converts a multipage TIFF file to an HTML file using the English language; the output HTML file contains the position of each word and character:
Convert Japanese characters in a TIFF file to a Japanese text file:
Convert German characters in a TIFF file to a German text file:
More articles about VeryPDF Cloud API Platform:
https://www.verypdf.com/wordpress/category/verypdf-cloud-api
If you need any other functions which are not included in VeryPDF Cloud API Platform, please feel free to let us know.
We have implemented “Extract text from image rectangles” today. We have added a “rectangle” parameter that runs OCR only on the characters inside a rectangle of the image. You can use it as follows:
http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX&app=ocr&infile=https://dl.dropboxusercontent.com/u/5570462/49AD37032CCC2C0_newfilename10.tif&format=1&dumpwordpos=1&lang=swe&rectangle=200x1674+822+379
The value “200x1674+822+379” has the form WIDTHxHEIGHT+LEFT+TOP:
200 is the width,
1674 is the height,
822 is the left position,
379 is the top position.
When you call this URL from PHP code, it is best to encode the rectangle value with the urlencode() function, e.g.:
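// Fetch a URL with cURL and return the response body as a string.
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5; // seconds to wait while connecting
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch); // false on failure
    curl_close($ch);
    return $data;
}

// Build the request URL piece by piece.
$strURL = 'http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX&app=ocr&infile=';
$strURL .= 'https://dl.dropboxusercontent.com/u/5570462/49AD37032CCC2C0_newfilename10.tif';
$strURL .= '&format=1&dumpwordpos=1&lang=swe';
// Encode the rectangle so that its "+" signs are sent as %2B;
// a raw "+" in a query string may otherwise be decoded as a space.
$strURL .= '&rectangle=' . urlencode('200x1674+822+379');

$returned_content = get_data($strURL);
echo $returned_content;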
The “rectangle” option makes it easy to extract characters from a specific rectangle of an image file.
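If you compute the region in code, it may help to build the rectangle value from its four numbers instead of typing the string by hand. The sketch below assumes the WIDTHxHEIGHT+LEFT+TOP layout described above. The format_rectangle() function is a hypothetical helper, not part of the VeryPDF API, and its check for a positive width and height is our own assumption.

```php
<?php
// Hypothetical helper (not part of the VeryPDF API): formats a region as
// "WIDTHxHEIGHT+LEFT+TOP", the layout used by the rectangle parameter.
function format_rectangle(int $width, int $height, int $left, int $top): string
{
    // Assumption: a rectangle with zero or negative size is never intended.
    if ($width <= 0 || $height <= 0) {
        throw new InvalidArgumentException('Width and height must be positive.');
    }
    return sprintf('%dx%d+%d+%d', $width, $height, $left, $top);
}

$rectangle = format_rectangle(200, 1674, 822, 379);
echo $rectangle, "\n";            // prints 200x1674+822+379
echo urlencode($rectangle), "\n"; // prints 200x1674%2B822%2B379
```

The second line of output is the form to append after “&rectangle=” in the request URL.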
>>What products can be used to convert scanned PDF to searchable PDF file?
Thanks for your message. The following products can all convert scanned PDF files to searchable PDF files. The output PDF files contain a hidden text layer, so you can open the OCRed PDF files in Adobe Reader and search their text content:
Image to PDF OCR Converter Command Line,
http://www.verypdf.com/app/image-to-pdf-ocr-converter/try-and-buy.html#buy-ocr-cmd
PDF to Text OCR Converter Command Line,
http://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy
VeryPDF OCR to Any Converter Command Line,
http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html
“VeryPDF OCR Cloud API” can run OCR on scanned TIFF and PDF files. You can convert an online TIFF or PDF file to a text file using the following URLs:
http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr&infile=https://dl.dropboxusercontent.com/u/5570462/verypdf-cloud-api/table.tif&outfile=out.txt&lang=eng
http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr&infile=https://dl.dropboxusercontent.com/u/5570462/verypdf-cloud-api/table.pdf&outfile=out.txt&lang=eng
If you want the position of each word, add the “&format=1&dumpwordpos=1” parameters, for example:
http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXX&app=ocr&infile=https://dl.dropboxusercontent.com/u/5570462/verypdf-cloud-api/table.tif&outfile=out.txt&lang=eng&format=1&dumpwordpos=1
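The request URLs above differ only in a few parameters, so you may prefer to assemble them in one place. The following is a rough sketch rather than an official client: build_ocr_url() is a hypothetical function, and it uses only the parameters that appear in the example URLs (apikey, app, infile, outfile, lang, format, dumpwordpos). The infile value is passed through unchanged, as in the examples.

```php
<?php
// Hypothetical helper: assembles an OCR request URL from the parameters
// shown in the example URLs. It does not send the request.
function build_ocr_url(
    string $apiKey,
    string $infile,
    string $lang = 'eng',
    bool $wordPositions = false
): string {
    $url = 'http://online.verypdf.com/api/?apikey=' . $apiKey
         . '&app=ocr'
         . '&infile=' . $infile      // left unencoded, as in the examples
         . '&outfile=out.txt'
         . '&lang=' . urlencode($lang);

    // Ask for the position of each word only when requested.
    if ($wordPositions) {
        $url .= '&format=1&dumpwordpos=1';
    }
    return $url;
}

// YOUR_API_KEY is a placeholder for your own key.
echo build_ocr_url(
    'YOUR_API_KEY',
    'https://dl.dropboxusercontent.com/u/5570462/verypdf-cloud-api/table.tif',
    'eng',
    true
), "\n";
```

Passing false (or omitting the last argument) gives the plain text conversion URL without word positions.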
“VeryPDF OCR Cloud API” is an online, cloud-based application. If you want to do batch conversion, a desktop application may work better for you, because it is fast and avoids network connection problems. In that case, we suggest “VeryPDF OCR to Any Converter Command Line”, a powerful product that converts scanned TIFF and PDF files to plain text (TXT), editable Word (DOC, DOCX), Excel (XLS, XLSX), PowerPoint (PPT, PPTX), RTF, HTML, XML, plain-text-based PDF, and other document formats. The OCR engine in this software can reach 99.9% accuracy. You can download it from the following web page to try:
http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html