Question: I need to extract text from a specific rectangular region of a PDF. The workflow is as follows: first, the PDF is converted to a JPG image; then the user draws a selection rectangle on top of the picture; finally, I need to extract all the text in the PDF that falls within that selection. Can VeryPDF suggest a solution?
Answer: Once you have converted the PDF to a JPEG image, I would suggest using text recognition (OCR) to extract the text within the selected region. The following shows how this can be done reasonably reliably. If converting the PDF to JPEG and then to text is necessary, you can try the free trial of VeryPDF OCR to Any Converter Command Line. If that intermediate conversion is not necessary, you can try the free trial of VeryPDF Table Extractor OCR instead: you draw a rectangular region directly on the PDF, and the content inside it is recognized as text at once. Whether the input file is a PDF or a JPEG, this software lets you extract text from a specific rectangular region. The rest of this answer shows how to use Table Extractor OCR.
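As an aside for readers who script the workflow from the question themselves: a rectangle drawn on the JPEG is measured in pixels from the top-left corner, while PDF positions are by default measured in points (1/72 inch) from the bottom-left corner of the page. The sketch below shows that conversion. It assumes the image was rendered at a known DPI with no rotation, cropping, or page offset; the function name and parameters are illustrative only and are not part of any VeryPDF product.

```python
def pixel_rect_to_pdf_points(rect_px, dpi, page_height_pt):
    """Convert (left, top, right, bottom) in image pixels into
    (x0, y0, x1, y1) in PDF points with a bottom-left origin.

    Assumes the page was rendered at `dpi` without rotation or cropping.
    """
    left, top, right, bottom = rect_px

    def to_points(px):
        # 72 points per inch, `dpi` pixels per inch
        return px * 72.0 / dpi

    x0 = to_points(left)
    x1 = to_points(right)
    # Image y grows downward, PDF y grows upward, so flip the vertical axis.
    y0 = page_height_pt - to_points(bottom)
    y1 = page_height_pt - to_points(top)
    return (x0, y0, x1, y1)


# Example: a US Letter page (792 pt tall) rendered at 150 DPI.
selection = pixel_rect_to_pdf_points((150, 300, 450, 600), dpi=150, page_height_pt=792.0)
print(selection)
```

The resulting rectangle can then be passed to whatever PDF text extraction tool is in use, provided that tool expects bottom-left-origin point coordinates.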
Step 1. Download and install the software
- There are two versions of this software: Mac and Windows. Please download the version that matches your operating system.
- When the download finishes, you will have an exe file. Install the software by double-clicking the exe and following the installation prompts until a shortcut icon appears on the desktop. Click the icon to launch the software. The following snapshot shows the software interface.
Step 2. Extract text from PDF
- With the software open, click the Open button to add a PDF or image file.
- Click the rectangle icon, then draw a rectangle around the text you want to convert.
- Once you have drawn the region, the recognized content appears immediately in the content pane at the bottom.
- Then click the Save button to choose the output folder. You can save the result as text, Word, Excel, or other file formats.
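The idea behind region selection can also be sketched in code. The example below is not how Table Extractor OCR works internally; it is a minimal illustration that assumes some OCR engine or PDF text layer has already produced word bounding boxes in image pixel coordinates (top-left origin). The function name, the `line_tol` parameter, and the sample words are all hypothetical. A word is kept when its center lies inside the selection, and the kept words are then joined in reading order.

```python
def words_in_region(words, region, line_tol=5.0):
    """Return the text of all words whose center lies inside `region`.

    words:    iterable of (text, left, top, right, bottom) boxes
    region:   (left, top, right, bottom) selection rectangle
    line_tol: maximum vertical gap between consecutive word centers
              for them to count as the same text line
    """
    r_left, r_top, r_right, r_bottom = region
    picked = []
    for text, left, top, right, bottom in words:
        cx = (left + right) / 2
        cy = (top + bottom) / 2
        if r_left <= cx <= r_right and r_top <= cy <= r_bottom:
            picked.append((cy, left, text))

    picked.sort()  # top-to-bottom first, then left-to-right

    lines, current, last_cy = [], [], None
    for cy, left, text in picked:
        # Start a new line when the vertical jump exceeds the tolerance.
        if last_cy is not None and cy - last_cy > line_tol:
            lines.append(current)
            current = []
        current.append((left, text))
        last_cy = cy
    if current:
        lines.append(current)

    # Within each line, order the words by their left edge.
    return "\n".join(" ".join(t for _, t in sorted(line)) for line in lines)


sample_words = [
    ("Invoice", 100, 100, 160, 115),
    ("Total", 170, 101, 210, 116),
    ("42.00", 100, 130, 150, 145),
    ("Footer", 100, 700, 160, 715),  # outside the selection below
]
print(words_in_region(sample_words, region=(90, 90, 300, 200)))
```

Testing the word center rather than the whole box makes the selection forgiving when the user's rectangle clips the edge of a word; a stricter variant could require the full box to lie inside the region.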
I strongly recommend this software because it accepts PDF, TIFF, BMP, PNG, JPG, PCX, and TGA input files and supports more than 20 OCR languages. With it, you can easily extract text from a specific rectangular region of a PDF or image. If you have any questions while using it, please contact us as soon as possible.