I have a question about the PDF2TXT COM for Table Analyzer version.

Hi,
Thanks for the good software.
I tested the PDF2TXT COM for Table Analyzer version and I have a question.
How do I use charmap.txt?
I am going to write software that supports CJK Unicode.
I would like a sample charmap.txt or a language pack.
Thank you for reading.
=============================
"charmap.txt"  is used to map some special Unicode characters to ANSI code.

Also, "charmap.txt" doesn't support CJK Unicode. If you wish to convert CJK characters from a PDF file to a text file, you need to download "PDF2TXT SDK" from the following web page:

https://www.verypdf.com/pdf2txt/pdf2txt.htm#dl
https://www.verypdf.com/pdf2txt/sdk/pdf2txt_trial_version.zip

You can call the PDF2TXTEx() function from the PDF2TXT SDK product; it converts CJK characters from a PDF file to a text file properly.
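
As a rough illustration of the kind of mapping "charmap.txt" is described as performing at the start of this reply, the sketch below replaces a few special Unicode characters with single ANSI bytes. The table entries, the names kCharMap and MapToAnsi, and the '?' fallback are all assumptions made for illustration; the real file format and contents of charmap.txt are not shown in this thread and may differ.

```cpp
#include <map>
#include <string>

// Hypothetical mapping table: Unicode code point -> single ANSI byte.
// These entries are illustrative only, not taken from a real charmap.txt.
static const std::map<char32_t, char> kCharMap = {
    {U'\u2018', '\''},  // left single quotation mark  -> apostrophe
    {U'\u2019', '\''},  // right single quotation mark -> apostrophe
    {U'\u201C', '"'},   // left double quotation mark  -> double quote
    {U'\u201D', '"'},   // right double quotation mark -> double quote
    {U'\u2013', '-'},   // en dash                     -> hyphen
    {U'\u2022', '*'},   // bullet                      -> asterisk
};

// Converts UTF-32 text to a single-byte string.
// ASCII passes through, mapped characters are replaced, and anything else
// (for example a CJK character) becomes '?' (an assumed fallback).
std::string MapToAnsi(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else {
            auto it = kCharMap.find(cp);
            out.push_back(it != kCharMap.end() ? it->second : '?');
        }
    }
    return out;
}
```

A one-byte-per-character table like this cannot represent the thousands of CJK characters, which is consistent with the advice above to use a different product for CJK text.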

VeryPDF
=============================
Thanks for the answer.

Let me explain my question again.
I tested both versions (PDF2TXT and PDF2TXT COM for Table Analyzer).

However, 'PDF2TXT COM for Table Analyzer' does not support CJK Unicode.

I need the word position (left, top, width, height) for the extracted text.

If PDF2TXT SDK supports word positions, I will buy it.

Thank you.
=============================
We suggest that you download and purchase PDF Parser SDK from the following web page:

http://www.verydoc.com/pdfparsersdk.html

http://www.verydoc.com/pdfparsersdk.zip

PDF Parser SDK can extract text with positions and other information, and it can also render PDF pages to image files. It has more functions than PDF2TXT SDK and 'PDF2TXT COM for Table Analyzer'. We hope this product will work well for you.

Also, PDF Parser SDK does support CJK PDF files. If you encounter a PDF file that PDF Parser SDK can't render or parse, please email that PDF file to us. After we have checked it, we will work out a solution for you as soon as possible.

VeryPDF
=============================
Thank you for the quick response.
I think the pdfparsersdk product is the right one for me.
I apologize, but I have more questions.
 
  1) When extracting with pdfparsersdk, can I separate the images and the text from each other?
      (The result I got was not separated.)
 
  2) When extracting with pdfparsersdk, can I set the size of the image?
      (I want to set the size of the images like this: [width = 1000px])
Thank you.

=============================
Hi,

>>  1) When the extract using pdfparsersdk, Could I separate image and text each other?
>>      ( I got the result that is not separated. )

Please refer to the following sample code. You can call the PDFParserSDK_GetImageData() function to get the image data and PDFParserSDK_GetTextInfoData() to get the text data, so you can separate them easily:
#include <windows.h>   // HANDLE, BYTE
#include <vector>
// The PDF Parser SDK header must also be included here.
using std::vector;

int Test_PDFParserSDK_3(char *pdf_filename, char *out_filename)
{
       int nRet = 0;
       // Open the PDF file; NULL means no extra options.
       HANDLE hPDFSDK = PDFParserSDK_GetHandle(pdf_filename, NULL);
       if(hPDFSDK == NULL)
              return nRet;
       int nCount = PDFParserSDK_GetCount(hPDFSDK);
       for(int i = 0; i < nCount; i++)
       {
              // Query the required buffer sizes first.
              int nImageDataLen = PDFParserSDK_GetImageLength(hPDFSDK, i);
              int nTextInfoLen = PDFParserSDK_GetTextInfoLength(hPDFSDK, i);
              vector<BYTE> vecImgData;
              vector<BYTE> vecTxtData;
              vecImgData.resize(nImageDataLen);
              vecTxtData.resize(nTextInfoLen);
              // Image data and text data are fetched into separate buffers.
              PDFParserSDK_GetImageData(hPDFSDK, i, vecImgData.data(), vecImgData.size());
              PDFParserSDK_GetTextInfoData(hPDFSDK, i, vecTxtData.data(), vecTxtData.size());
       }
       PDFParserSDK_Free(hPDFSDK);
       hPDFSDK = NULL;
       return nRet;
}
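
The sample above leaves the two buffers in memory and never uses out_filename. One possible way to keep the image and text results apart on disk is sketched below, using only the C++ standard library. WriteBuffer, PageFileName, and the "<base>_<index>.<ext>" naming scheme are hypothetical helpers, not part of the SDK, and the format of the bytes the SDK returns is not stated in this thread, so the file extensions are placeholders.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Writes a byte buffer to a file in binary mode; returns true on success.
bool WriteBuffer(const std::string& path, const std::vector<unsigned char>& data)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    if (!data.empty())
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size()));
    return out.good();
}

// Builds a per-item file name such as "out_0.img" or "out_0.txt".
// The naming scheme is an assumption made for this sketch.
std::string PageFileName(const std::string& base, int index, const std::string& ext)
{
    assert(index >= 0);
    return base + "_" + std::to_string(index) + "." + ext;
}
```

Inside the loop, one could then call WriteBuffer(PageFileName(out_filename, i, "img"), vecImgData) and WriteBuffer(PageFileName(out_filename, i, "txt"), vecTxtData), since BYTE is an unsigned char on Windows.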

>>  2) When the extract using pdfparsersdk, Could I set up the size of image?
>>      ( I want set the size of images like this [width = 1000px] )

You can't set the size of the image directly, but you can set the DPI of the image, and the DPI determines the image's size. For example:

-r <int>           : resolution for both X and Y, in DPI (default is 150)
#include <windows.h>   // HANDLE, BYTE
#include <vector>
// The PDF Parser SDK header must also be included here.
using std::vector;

int Test_PDFParserSDK_3(char *pdf_filename, char *out_filename)
{
       int nRet = 0;
       // "-r 150" sets the resolution to 150 DPI for both X and Y.
       HANDLE hPDFSDK = PDFParserSDK_GetHandle(pdf_filename, "-r 150");
       if(hPDFSDK == NULL)
              return nRet;
       int nCount = PDFParserSDK_GetCount(hPDFSDK);
       for(int i = 0; i < nCount; i++)
       {
              int nImageDataLen = PDFParserSDK_GetImageLength(hPDFSDK, i);
              int nTextInfoLen = PDFParserSDK_GetTextInfoLength(hPDFSDK, i);
              vector<BYTE> vecImgData;
              vector<BYTE> vecTxtData;
              vecImgData.resize(nImageDataLen);
              vecTxtData.resize(nTextInfoLen);
              PDFParserSDK_GetImageData(hPDFSDK, i, vecImgData.data(), vecImgData.size());
              PDFParserSDK_GetTextInfoData(hPDFSDK, i, vecTxtData.data(), vecTxtData.size());
       }
       PDFParserSDK_Free(hPDFSDK);
       hPDFSDK = NULL;
       return nRet;
}
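
Since only the DPI can be set, a target width such as 1000px has to be converted into a resolution. The sketch below shows that arithmetic, assuming the image covers the whole page and the page uses the default PDF unit of 72 points per inch. DpiForPixelWidth and ResolutionOption are hypothetical helper names, and the page width in points has to be obtained by some other means, because this thread does not show an SDK call for it.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// pixels = points * dpi / 72, so dpi = pixels * 72 / points.
// The result is rounded because the -r option takes an integer.
int DpiForPixelWidth(double pageWidthPt, int targetWidthPx)
{
    assert(pageWidthPt > 0.0);
    return static_cast<int>(std::lround(targetWidthPx * 72.0 / pageWidthPt));
}

// Builds the "-r <int>" option string in the form shown in the reply above.
std::string ResolutionOption(int dpi)
{
    return "-r " + std::to_string(dpi);
}
```

For a US Letter page (612 points wide) and a 1000px target, DpiForPixelWidth(612.0, 1000) gives 118, and rendering at 118 DPI yields about 1003 pixels. Because the DPI is an integer, an exact pixel width is generally not reachable; the image can be resampled afterwards if the exact size matters.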

VeryPDF
