I’ve been browsing your site with interest – I suspect you will have the components to do what I would like.
I would like to take a folder (batch) of scanned images (TIFF or PDF) and extract as much text as practical from them – I don’t want to have to post-process the text, and close enough is good enough in this case. This seems straightforward with your tools or others'.
However, I would then like to be able to search that folder for documents containing specified words and display the original image and the extracted text (side by side?), assuming that we are never going to get a perfect conversion.
I am a Software Developer and could assemble something given the right SDKs.
Do you have any suggestions for how this might be done?
You can use our PDF to Text Converter Command Line product to convert your PDF or TIFF files to text files, and then you can easily search the output text files for keywords. You can download the trial version of the PDF to Text OCR Converter Command Line product from the following web page to try it,
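The search step described in the reply can be sketched in a few lines of Python. This is a minimal sketch, not part of the converter product: it assumes (hypothetically) that the converter writes one `.txt` file per scanned file, with the same base name, into the same folder (e.g. `scan001.tif` produces `scan001.txt`). The converter's own command-line options are not described here, so the sketch does not call it. `search_folder` returns each matching text file together with its source image, which is what a side-by-side viewer would need.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming convention: scan001.tif / scan001.pdf -> scan001.txt
SOURCE_EXTS = (".tif", ".tiff", ".pdf")


def words_in(text):
    """Lower-cased set of the word tokens in a block of OCR output."""
    return set(re.findall(r"\w+", text.lower()))


def find_source(txt_path):
    """Locate the scanned file that a text file was (assumed to be) made from."""
    for ext in SOURCE_EXTS + tuple(e.upper() for e in SOURCE_EXTS):
        candidate = txt_path.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None  # no matching image found next to the text file


def search_folder(folder, words):
    """Return (source_image, text_file) pairs whose text contains every word."""
    wanted = {w.lower() for w in words}
    matches = []
    for txt in sorted(Path(folder).glob("*.txt")):
        # OCR output may contain odd bytes, so ignore decoding errors
        found = words_in(txt.read_text(encoding="utf-8", errors="ignore"))
        if wanted <= found:  # every search word appears somewhere in the text
            matches.append((find_source(txt), txt))
    return matches


if __name__ == "__main__":
    # Self-contained demo with made-up files; point it at the real output folder instead.
    with tempfile.TemporaryDirectory() as folder:
        base = Path(folder)
        (base / "scan001.txt").write_text("Invoice number 42, total due", encoding="utf-8")
        (base / "scan001.tif").write_bytes(b"")  # placeholder for the scanned image
        (base / "scan002.txt").write_text("Delivery note", encoding="utf-8")
        for image, text in search_folder(folder, ["invoice", "total"]):
            print(f"{image} <-> {text}")
```

Matching is done on whole, case-insensitive words, so a search for "invoice" will not match "invoices". For a large archive, building an index once (for example a word-to-files dictionary or a full-text search engine) would avoid rereading every text file on each query.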