Our product uses OCR technology to extract content from graphic images. We currently use one OCR engine for this and are evaluating others, and as part of that evaluation we found that VeryPDF satisfies most of our requirements. However, I have a couple of questions. If you can clarify them and it works out for us, I do not see why we should not go with your OCR product. Please find my questions below:
1) I am trying to extract data from FE.jpg (attached). When I use the -layout2 option, the application crashes. When I use the -layout option, the text is extracted, but it is not in the original document format. Can you please tell me how to get the text content in the same format as the source file?
2) Attached is a document, *First3Pages_Of_Noise.pdf*, that has text along with some table data. For this document, too, the extracted output does not retain the layout.
Please let me know how to get this to work as per our needs.
Looking forward to your early response. Thanks a lot.
Thanks for your sample files. We suggest that you download the VeryPDF OCR to Any Converter Command Line software from this web page and try again.
After you download it and unzip it to a folder, you can run the following command lines to convert your JPG and PDF files to text files with better layout and OCR precision:
ocr2any.exe -ocr2 D:\downloads\FE.jpg D:\downloads\FE2.txt
ocr2any.exe -ocr2 D:\downloads\First3pages_Of_Noise.pdf D:\downloads\First3pages_Of_Noise.txt
The "-ocr2" option uses a better OCR engine to recognize the characters in your JPG and PDF files and gives better OCR results, so we recommend it for converting your scanned JPG and PDF files to plain text files.
By the way, you can also convert your color FE.jpg to a black and white TIFF file first, then run the following command line to convert the B/W TIFF file to a text file. You will get a better text file with more accurate OCR results:
ocr2any.exe -ocr2 D:\downloads\FE.tif D:\downloads\FE.txt
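If you want to script the two steps above (convert the color image to B/W TIFF, then run the "-ocr2" conversion), the following Python sketch shows one possible way. It is only an illustration and is not part of VeryPDF's software: the helper names to_bw_tiff, build_ocr_command and run_ocr are hypothetical, the B/W conversion uses the third-party Pillow library, and the threshold of 128 is an assumed starting value that you may need to tune for your scans. The only ocr2any.exe option it uses is the "-ocr2" option shown above.

```python
import subprocess
from pathlib import PureWindowsPath


def to_bw_tiff(src_path, dst_path, threshold=128):
    """Save a 1-bit black and white TIFF copy of a color image."""
    # Imported here so the command helpers below work without Pillow.
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(src_path) as img:
        gray = img.convert("L")  # 8-bit grayscale
        # Pixels brighter than the threshold become white, the rest black.
        # The threshold value is an assumption, not a VeryPDF recommendation.
        bw = gray.point(lambda p: 255 if p > threshold else 0, mode="1")
        bw.save(dst_path, format="TIFF")


def build_ocr_command(source, exe="ocr2any.exe"):
    """Return the argument list for one `ocr2any.exe -ocr2` conversion."""
    src = PureWindowsPath(source)
    # The text file is written next to the source, with a .txt extension.
    return [exe, "-ocr2", str(src), str(src.with_suffix(".txt"))]


def run_ocr(source, exe="ocr2any.exe"):
    """Run the conversion (assumes Windows and that ocr2any.exe can be found)."""
    return subprocess.run(build_ocr_command(source, exe))


# Example usage, with the paths from the command line above:
#   to_bw_tiff(r"D:\downloads\FE.jpg", r"D:\downloads\FE.tif")
#   run_ocr(r"D:\downloads\FE.tif")
```

The command is built as a list of arguments instead of one string, so paths that contain spaces do not need extra quoting.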
Please look at the sample TIFF file below: