How to use the VeryPDF Cloud OCR API to OCR a typical invoice with a template?

Hi,

We are evaluating the VeryPDF Cloud OCR API. We already have TIFF to Text running with the help of the samples.

Questions:
a) Say the TIFF is a typical invoice. How can I tell which parts are the top header information and which part is the invoice detail? In other words, do you provide any sort of template capability? The OCR text is important, but we need to somehow have our users tell us which text represents which metadata, so we can use that to reduce data entry and improve the import process.

b) I may have missed it, but do you have a regular desktop SDK with which one can scan a document directly, e.g. from a TWAIN- or WIA-compliant device? That way we save the step of converting to multi-page TIFF before sending to your service.

Thanks for your help
Customer
-----------------------------------------------------


>>a) Say the TIFF is a typical invoice. How can I tell which parts are the top header information and which part is the invoice detail? In other words, do you provide any sort of template capability? The OCR text is important, but we need to somehow have our users tell us which text represents which metadata, so we can use that to reduce data entry and improve the import process.

Thanks for your message. The VeryPDF Cloud OCR API has an "Extract text from image rectangles" feature: we have added a "rectangle" parameter that OCRs only the characters inside a given rectangle on the image. You can use it as follows,

http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX
&app=ocr
&infile=https://dl.dropboxusercontent.com/u/5570462/test.tif
&format=1&dumpwordpos=1&lang=swe&rectangle=200x1674+822+379

The meaning of "200x1674+822+379" is:

200 is the width,
1674 is the height,
822 is the left position,
379 is the top position.
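The following is a minimal sketch of composing that value from four numbers. It assumes the layout is exactly width, a plain letter "x", height, then "+left+top", as in the list above. The helper name make_rectangle() is hypothetical and is not part of the VeryPDF API.

```php
<?php
// Hypothetical helper (not part of the VeryPDF API): builds the value of the
// "rectangle" parameter as WIDTHxHEIGHT+LEFT+TOP, following the list above.
function make_rectangle(int $width, int $height, int $left, int $top): string
{
    // Reject sizes and offsets that cannot describe a region on the image.
    if ($width <= 0 || $height <= 0 || $left < 0 || $top < 0) {
        throw new InvalidArgumentException('Invalid rectangle');
    }
    return sprintf('%dx%d+%d+%d', $width, $height, $left, $top);
}

echo make_rectangle(200, 1674, 822, 379), "\n"; // prints 200x1674+822+379
```

The returned string still needs to be URL-encoded before it is appended to the request URL.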

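It is best to encode the rectangle value with the urlencode() function when you call this URL from PHP code, because an unencoded "+" in a query string is commonly decoded as a space on the server side, e.g.,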

<?php
// Fetch a URL with cURL and return the response body as a string.
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// Build the request URL; the rectangle value is encoded so that "+" is sent as %2B.
$strURL = 'http://online.verypdf.com/api/?apikey=XXXXXXXXXXXXXXXX&app=ocr&infile=';
$strURL .= 'https://dl.dropboxusercontent.com/u/5570462/test.tif';
$strURL .= '&format=1&dumpwordpos=1&lang=swe';
$strURL .= '&rectangle=' . urlencode('200x1674+822+379');

$returned_content = get_data($strURL);
echo $returned_content;
?>

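You can use the "rectangle" option to get the characters from a specific rectangle on the image file easily.

You may OCR the same image several times with different rectangles, so that you get the text contents of each rectangle or region separately. For an invoice, this means one request for the header area and another for the invoice detail area.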
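The idea of one request per region can be sketched as a simple template, that is, a list of named rectangles. The region names, the coordinates, and the build_region_urls() function are all hypothetical placeholders. The sketch also encodes the infile value, which assumes the service decodes query values in the usual way.

```php
<?php
// Hypothetical invoice template: region name => rectangle (WIDTHxHEIGHT+LEFT+TOP).
// The coordinates are placeholders, not measured from a real invoice.
$template = [
    'header'  => '1200x300+100+80',
    'details' => '1200x900+100+420',
];

// Builds one OCR request URL per region. http_build_query() percent-encodes
// every value, so the "+" signs in the rectangle are sent as %2B.
function build_region_urls(string $apiKey, string $infile, string $lang, array $template): array
{
    $urls = [];
    foreach ($template as $name => $rectangle) {
        $query = http_build_query([
            'apikey'      => $apiKey,
            'app'         => 'ocr',
            'infile'      => $infile,
            'format'      => 1,
            'dumpwordpos' => 1,
            'lang'        => $lang,
            'rectangle'   => $rectangle,
        ]);
        $urls[$name] = 'http://online.verypdf.com/api/?' . $query;
    }
    return $urls;
}

$urls = build_region_urls(
    'XXXXXXXXXXXXXXXX',
    'https://dl.dropboxusercontent.com/u/5570462/test.tif',
    'swe',
    $template
);

foreach ($urls as $name => $url) {
    // Fetch each URL (for example with the get_data() function above) and
    // store the returned text under the region name.
    echo $name, ': ', $url, "\n";
}
```

Each returned text can then be mapped to the metadata field that the user assigned to that region.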

>>b) I may have missed it, but do you have a regular desktop SDK with which one can scan a document directly, e.g. from a TWAIN- or WIA-compliant device? That way we save the step of converting to multi-page TIFF before sending to your service.

Thanks for your message. The following products can all convert scanned PDF files to searchable PDF files. The output PDF files contain a hidden text layer, so you can open the OCRed PDF files in Adobe Reader and search their text contents properly,

Image to PDF OCR Converter Command Line,
https://www.verypdf.com/app/image-to-pdf-ocr-converter/try-and-buy.html#buy-ocr-cmd

PDF to Text OCR Converter Command Line,
https://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy

VeryPDF OCR to Any Converter Command Line,
https://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html

If you want to scan documents to TIFF files, you can try our "VeryPDF Scan to Word OCR Converter" software. It includes a "Document Imaging (Scan And Edit Documents)" application, which you can use to scan documents and save them as TIFF files easily,

https://www.verypdf.com/scan-image-pdf-to-word-ocr/index.html
https://www.verypdf.com/dl.php?file=verypdfscan2wordocr.exe

By the way, we do not yet have a desktop SDK that scans documents to PDF files directly. If you need this SDK, please feel free to let us know and we will develop a "Scan to PDF SDK" for you quickly. Once this "Scan to PDF SDK" is ready, you can call it from Java, C#, VB.NET, ASP.NET, VBScript, C++, and other programming languages.

VeryPDF
