Questions regarding the PDFParserSDK API: how to extract text from a PDF and render PDF pages to TIFF files at 300 DPI?

These are our queries regarding the PDFParserSDK API:

1. The separator in the output result file (htm) should be configurable; at present it is fixed to ";".

This is required because the extracted data can itself contain the ";" character, so parsing the output may produce incorrect results.

2. PNG image creation when calling the API should be optional.

When we use this API, a PNG file is created. Please make its creation optional, as we don't require this file for processing.

3. To support data PDFs in our product, we convert the PDF to TIFF (using the PDF2Image API) at a desired DPI, say 300, and also extract data from the PDF using your API.

TIFF creation is required because our applet viewer cannot display PDFs. However, the TIFF is generated at 300 DPI, while the text coordinates returned by the PDFParserSDK API are relative to the generated PNG image, which is at a different DPI. How can we map these coordinates to the TIFF?

Customer
---------------------------------------------------

>>1. The separator in the output result file (htm) should be configurable; at present it is fixed to ";". This is required because the extracted data can itself contain the ";" character, so parsing the output may produce incorrect results.
>>2. PNG image creation when calling the API should be optional.
>>When we use this API, a PNG file is created. Please make its creation optional, as we don't require this file for processing.

Thanks for your message. Issues #1 and #2 can be solved in the "PDF Parse & Modify Component for .NET Developer License". Once you have purchased the Developer License, please send us your Order ID; we will arrange for our engineer to work on issues #1 and #2 and provide a new version to you within three business days. We hope this offer is acceptable to you.

>>3. To support data PDFs in our product, we convert the PDF to TIFF (using the PDF2Image API) at a desired DPI, say 300, and also extract data from the PDF using your API.
>>TIFF creation is required because our applet viewer cannot display PDFs. However, the TIFF is generated at 300 DPI, while the text coordinates returned by the PDFParserSDK API are relative to the generated PNG image, which is at a different DPI. How can we map these coordinates to the TIFF?

You can use the "-r 300" option to convert the PDF file to a TIFF file at 300 DPI:

private static extern int VeryPDF_PDFParserSDK(string lpPDFFile, string lpOutFile, string lpOptions);

// "lpOptions" parameter supports following options:
//
// -f <int> : first page to convert
// -l <int> : last page to convert
// -r <int> : resolution for both X and Y, in DPI (default is 150)
// -opw <string> : owner password (for encrypted files)
// -upw <string> : user password (for encrypted files)
// -html : output text information in HTML format instead of CSV format
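
The options listed above are passed to the SDK as one space-separated string. A minimal sketch of assembling that string in C, using only the flags documented above; `build_parser_options` is a hypothetical helper and not part of the SDK, and the password options are left out for brevity:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the SDK): writes an lpOptions string
   such as "-html -f 1 -l 3 -r 300" into buf.
   Returns 0 on success, -1 if the string does not fit in buf. */
int build_parser_options(char *buf, size_t size,
                         int first_page, int last_page,
                         int dpi, int html)
{
    assert(buf != NULL && size > 0);

    /* snprintf never writes past size and reports the length it needed. */
    int n = snprintf(buf, size, "%s-f %d -l %d -r %d",
                     html ? "-html " : "", first_page, last_page, dpi);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Checking the return value before calling the SDK avoids passing a silently truncated option string.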

For example, you can use the "-r 300" option to extract text from the PDF file and render its pages to a TIFF file at 300 DPI:

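#include <stdio.h>

int Test_PDFParserSDK_1(char *pdf_filename, char *out_filename, char *parram)
{
        char parameter[256];

        /* Use 300 DPI, as recommended above, so the extracted text
           coordinates correspond to a 300 DPI rendering. Any extra options
           supplied by the caller are appended after the defaults. */
        int len = snprintf(parameter, sizeof(parameter), "-html -r 300 %s",
                           parram ? parram : "");
        if (len < 0 || (size_t)len >= sizeof(parameter))
                return -1;      /* option string would not fit in the buffer */

        printf("parameter is %s\n", parameter);
        return PDFParserSDK(pdf_filename, out_filename, parameter);
}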
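
If text has already been extracted at a different resolution (for example the 150 DPI default) and re-running the extraction at 300 DPI is not practical, the coordinates can in principle be rescaled by the ratio of the two resolutions. The sketch below assumes that the reported coordinates are pixel positions that scale linearly with DPI and that both images share the same origin and page area; the coordinate format is not confirmed in this thread, and `TextBox` and `rescale_box` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical container for one text element's bounding box, in pixels. */
typedef struct {
    double x, y, width, height;
} TextBox;

/* Map a box measured on an image rendered at src_dpi onto an image of the
   same page rendered at dst_dpi. Assumes a shared origin and linear scaling. */
TextBox rescale_box(TextBox box, double src_dpi, double dst_dpi)
{
    assert(src_dpi > 0.0 && dst_dpi > 0.0);

    double k = dst_dpi / src_dpi;   /* e.g. 300 / 150 = 2 */
    TextBox out = { box.x * k, box.y * k, box.width * k, box.height * k };
    return out;
}
```

Extracting directly at the target resolution with "-r 300" remains the simpler route, since it avoids any rounding introduced by rescaling.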

VeryPDF

