Efficiently Analyze and Recognize Document Layouts with VeryPDF Layout Analysis SDK

VeryPDF Layout Analysis SDK: Automating Document Analysis and Recognition.

Managing large volumes of documents is often a daunting task. For example, scanning and digitizing old newspapers, forms, contracts, invoices, and other documents to make them searchable, editable, and accessible can be a time-consuming and error-prone process. The key to automating this process lies in understanding the layout of the document, which can be achieved using VeryPDF Layout Analysis SDK.

VeryPDF PaperTools COM/SDK:

https://www.verypdf.com/app/papertools/index.html

VeryPDF Layout Analysis SDK analyzes the layout of a document and recognizes the different types of areas on each page with high accuracy. It identifies text, inverted text, noise, images, tables (rows, columns, and cells), and horizontal and vertical lines. This information can then be used to sub-classify the areas and apply specific rules to each one. For instance, a text area can be classified as a caption if it is located immediately below a picture and has a smaller font size than the rest of the text on the page.

VeryPDF Layout Analysis SDK identifies the following types of areas:
* text
* inverted text
* noise
* images (pictures or drawings)
* tables (rows, columns and cells)
* horizontal and vertical lines
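
The sub-classification rule described above, for a text area sitting immediately below a picture, could be sketched roughly as follows. This is only an illustration: the `Area` struct, the `AreaType` enum, the `isCaption` function, and the `maxGap` tolerance are all hypothetical names chosen for this example and are not part of the VeryPDF SDK.

```cpp
#include <cassert>
#include <vector>

// Hypothetical area types mirroring the list above (not SDK identifiers).
enum class AreaType { Text, InvertedText, Noise, Image, Table, Line };

// Hypothetical description of one detected area.
// Coordinates are in pixels, with the origin at the top-left of the page.
struct Area {
    AreaType type;
    int left, top, right, bottom;
    double fontSize;  // only meaningful for text areas
};

// True when the horizontal extents of the two areas overlap.
bool overlapsHorizontally(const Area& a, const Area& b) {
    return a.left < b.right && b.left < a.right;
}

// A text area is treated as a caption when it starts within maxGap pixels
// below an image, overlaps that image horizontally, and uses a font smaller
// than the body font of the page. maxGap is an assumed tolerance.
bool isCaption(const Area& candidate, const std::vector<Area>& areas,
               double bodyFontSize, int maxGap = 20) {
    assert(maxGap >= 0);
    if (candidate.type != AreaType::Text) return false;
    if (candidate.fontSize >= bodyFontSize) return false;
    for (const Area& other : areas) {
        if (other.type != AreaType::Image) continue;
        const int gap = candidate.top - other.bottom;
        if (gap >= 0 && gap <= maxGap && overlapsHorizontally(candidate, other)) {
            return true;
        }
    }
    return false;
}
```

In practice the body font size and the gap tolerance would have to be tuned per document type; the values here are placeholders.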

Layout analysis
The layout analysis of documents is crucial for recognizing their structure automatically, extracting the areas of interest, and running optical recognition engines such as OCR, ICR, or BCR. This process allows you to convert the original image into a structured document while preserving the layout of the original document. A good example of this is a searchable PDF of an old newspaper.

To obtain the best results from the layout analysis, it is essential to ensure that the image quality is as high as possible. This can be achieved using VeryPDF Image Processing libraries, such as Deskew, Despeckle, and Black Border Removal SDK.

Deskew
The Deskew SDK automatically and quickly corrects the skew of a scanned document. It can deskew images rotated by up to 45 degrees and estimates the angle using one of two methods: text analysis or black border detection. This tool is especially useful for images from high-capacity scanners, where the automatic document feeder (ADF) may skew the paper.
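
To give a rough idea of what angle estimation by text analysis can involve, the sketch below fits a straight line through points sampled along one text baseline and converts its slope to degrees. This is a generic least-squares illustration, not the method used by the Deskew SDK; `Point`, `estimateSkewDegrees`, and `deskewPoint` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point { double x, y; };

const double kPi = std::acos(-1.0);

// Estimate the skew angle in degrees from points sampled along a text
// baseline, using an ordinary least-squares line fit. The angle is positive
// when y grows as x grows.
double estimateSkewDegrees(const std::vector<Point>& baseline) {
    assert(baseline.size() >= 2);
    const double n = static_cast<double>(baseline.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Point& p : baseline) {
        sx += p.x;
        sy += p.y;
        sxx += p.x * p.x;
        sxy += p.x * p.y;
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // the points must not all share one x coordinate
    const double slope = (n * sxy - sx * sy) / denom;
    return std::atan(slope) * 180.0 / kPi;
}

// Rotate p about center c by -angleDegrees, undoing the estimated skew.
Point deskewPoint(Point p, Point c, double angleDegrees) {
    const double a = -angleDegrees * kPi / 180.0;
    const double dx = p.x - c.x;
    const double dy = p.y - c.y;
    return {c.x + dx * std::cos(a) - dy * std::sin(a),
            c.y + dx * std::sin(a) + dy * std::cos(a)};
}
```

A real deskew step would apply the same rotation to every pixel and resample the image; the point version only shows the geometry.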

Despeckle and noise removal
The Despeckle SDK can remove small black points of noise from images acquired by scanners or received by fax. This process is designed to remove randomly distributed specks from the image, and the size of the speck can be specified by the user.
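
One common way to implement this kind of noise removal is to delete every connected group of black pixels that is smaller than a user-specified size. The sketch below shows that idea on a simple in-memory bitmap. It is an assumption about the general technique, not the Despeckle SDK's implementation; `Bitmap`, `despeckle`, and `minPixels` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical binary image: 1 = black (ink), 0 = white (background).
using Bitmap = std::vector<std::vector<int>>;

// Clear every 4-connected black component that has fewer than minPixels
// pixels. Returns the number of components removed.
int despeckle(Bitmap& img, int minPixels) {
    assert(minPixels >= 1);
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));
    const int dr[4] = {1, -1, 0, 0};
    const int dc[4] = {0, 0, 1, -1};
    int removed = 0;

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (img[r][c] == 0 || seen[r][c]) continue;

            // Collect one component with an iterative flood fill.
            std::vector<std::pair<int, int>> component;
            std::vector<std::pair<int, int>> stack;
            stack.push_back({r, c});
            seen[r][c] = true;
            while (!stack.empty()) {
                const std::pair<int, int> cur = stack.back();
                stack.pop_back();
                component.push_back(cur);
                for (int k = 0; k < 4; ++k) {
                    const int nr = cur.first + dr[k];
                    const int nc = cur.second + dc[k];
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (img[nr][nc] == 0 || seen[nr][nc]) continue;
                    seen[nr][nc] = true;
                    stack.push_back({nr, nc});
                }
            }

            // Components below the size limit are treated as specks.
            if (static_cast<int>(component.size()) < minPixels) {
                for (const auto& p : component) img[p.first][p.second] = 0;
                ++removed;
            }
        }
    }
    return removed;
}
```

Choosing `minPixels` is a trade-off: too small leaves noise behind, too large starts to erase punctuation and the dots of letters such as "i".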

Black border removal and auto-cropping
The Black Border Removal SDK can automatically detect and remove the black border in monochrome or grayscale images. Removing the border improves the compression rate, reduces file size, and makes the document easier to view.
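
As a minimal sketch of the auto-cropping idea, the code below shrinks a rectangle inward from each edge while the outermost row or column is almost entirely dark, then copies out what remains. It is a simplified illustration under assumed conventions (grayscale values 0 to 255, a fixed darkness threshold), not the SDK's algorithm; `Gray`, `Rect`, `findContentBox`, and `crop` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical grayscale image: 0 = black, 255 = white.
using Gray = std::vector<std::vector<int>>;

// Content rectangle; right and bottom are exclusive.
struct Rect { int left, top, right, bottom; };

// Shrink the box from each edge while the outermost row or column is border.
// A row or column counts as border when at least minDarkFraction of its
// pixels are darker than threshold (both values are assumed defaults).
Rect findContentBox(const Gray& img, int threshold = 64,
                    double minDarkFraction = 0.95) {
    assert(minDarkFraction > 0.0 && minDarkFraction <= 1.0);
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    Rect box{0, 0, cols, rows};

    auto rowIsBorder = [&](int r) {
        int dark = 0;
        for (int c = 0; c < cols; ++c) dark += (img[r][c] < threshold) ? 1 : 0;
        return dark >= minDarkFraction * cols;
    };
    // Columns are tested only over the rows that survived the row pass.
    auto colIsBorder = [&](int c) {
        int dark = 0;
        for (int r = box.top; r < box.bottom; ++r) {
            dark += (img[r][c] < threshold) ? 1 : 0;
        }
        return dark >= minDarkFraction * (box.bottom - box.top);
    };

    while (box.top < box.bottom && rowIsBorder(box.top)) ++box.top;
    while (box.bottom > box.top && rowIsBorder(box.bottom - 1)) --box.bottom;
    while (box.left < box.right && colIsBorder(box.left)) ++box.left;
    while (box.right > box.left && colIsBorder(box.right - 1)) --box.right;
    return box;
}

// Copy the pixels inside box into a new, smaller image.
Gray crop(const Gray& img, const Rect& box) {
    Gray out;
    for (int r = box.top; r < box.bottom; ++r) {
        out.emplace_back(img[r].begin() + box.left, img[r].begin() + box.right);
    }
    return out;
}
```

This version assumes the page is already deskewed, so that the border is aligned with the image edges; a skewed border would need the deskew step first.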

VeryPDF Layout Analysis SDK can recognize all areas automatically and distinguish between text areas, inverted text areas, images, lines, and tables, as shown in the example below:

VeryPDF Layout Analysis Examples:

[Two example images, not reproduced here]

The following excerpt from a Microsoft Visual C++ sample application shows how to call the VeryPDF Layout Analysis SDK:

    // Process the sample TIFF with the -ocr and -boxpic options.
    // The X's after -$ are a placeholder for your registration key.
    strInFile = strFolder + "\\sample\\test_table_ocr.tif";
    strOutFile = strFolder + "\\sample\\output\\_output_" + intToString(nFileIndex) + ".tif";
    strCmd = "-$ XXXXXXXXXXXXXXXXXX -ocr -boxpic \"" + strInFile + "\" \"" + strOutFile + "\"";
    printf("%s\n", strCmd.c_str());
    // Pass the command string to the COM object and collect its return text.
    strReturn = strReturn + VeryPDFCom.PaperTools(strCmd.c_str());
    nFileIndex = nFileIndex + 1;

    // Process the same input file with the -layout option.
    // nFileIndex was incremented, so the output goes to a new file.
    strInFile = strFolder + "\\sample\\test_table_ocr.tif";
    strOutFile = strFolder + "\\sample\\output\\_output_" + intToString(nFileIndex) + ".tif";
    strCmd = "-$ XXXXXXXXXXXXXXXXXX -layout \"" + strInFile + "\" \"" + strOutFile + "\"";
    printf("%s\n", strCmd.c_str());
    strReturn = strReturn + VeryPDFCom.PaperTools(strCmd.c_str());
    nFileIndex = nFileIndex + 1;
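
Both calls in the sample build the same kind of command string by hand, so the quoting could be factored into a small helper. The sketch below is one possible way to do that. `quote` and `buildPaperToolsCommand` are hypothetical helpers written for this example, not part of the SDK, and the only options assumed are the ones that appear in the sample above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Wrap a path in double quotes so that spaces in it survive parsing.
std::string quote(const std::string& path) {
    return "\"" + path + "\"";
}

// Assemble a command string in the same layout as the sample:
//   -$ <key> <options...> "<input file>" "<output file>"
std::string buildPaperToolsCommand(const std::string& registrationKey,
                                   const std::vector<std::string>& options,
                                   const std::string& inFile,
                                   const std::string& outFile) {
    assert(!inFile.empty() && !outFile.empty());
    std::string cmd = "-$ " + registrationKey;
    for (const std::string& opt : options) {
        cmd += " " + opt;
    }
    cmd += " " + quote(inFile) + " " + quote(outFile);
    return cmd;
}
```

With such a helper, the two calls in the sample would differ only in their option lists, `{"-ocr", "-boxpic"}` and `{"-layout"}`.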

VeryPDF Layout Analysis SDK is a powerful tool that can automate the document analysis and recognition process. By using complex algorithms to recognize the different types of areas on a page, you can sub-classify and apply specific rules to each area. Combined with VeryPDF Image Processing libraries, such as Deskew, Despeckle, and Black Border Removal SDK, you can achieve optimal results for your document management needs.

➤ Want to buy this product from VeryPDF?

Should you be interested in acquiring a license for our product or require assistance in developing a custom software solution based on it, please do not hesitate to reach out to us. Our team is always ready to assist you and provide you with the necessary support.

http://support.verypdf.com/

We look forward to the opportunity of working with you and providing developer assistance if required.
