How to search content in an image based PDF file?

Question: Is there an easy way to find specific text within a paragraph? I'm having trouble with a big PDF file. It has all the answers I need to pass, but the information is randomly placed: one paragraph talks about farming, and the next is about Communism. I only need to find certain things. Is there an application that lets you choose which text to analyze for a specific keyword and then tells you where it is?

Answer: One common suggestion is: “Hit Ctrl+F or look for the "Find" button, enter the word you're looking for and it'll jump through every point in the doc that the word comes up.” This works only when the PDF is text based. In an image based PDF (for example, a scanned document), each page is a picture with no text layer, so this kind of search finds nothing. The following steps show how to convert an image based PDF to a text based PDF file.

Step 1. Download OCR to Any Converter CMD

  • This is the command line version of the software. When the download finishes, unzip it and then call it from an MS-DOS (Command Prompt) window.
  • You can learn more about VeryPDF OCR to Any Converter Command Line on our website. In brief, this software can batch convert scanned PDF, TIFF and image files to editable Word, Excel, CSV, HTML, TXT, Pure Text Layer PDF and other formats.

Step 2. Convert image PDF to text based PDF for searching content in PDF

  • Before you use this software, read the parameter explanations carefully: there are several OCR conversion modes, and you should choose the one that fits your needs.
  • Usage: ocr2any.exe [options] <PDF-file> <Text-file>
  • To convert an image PDF to a text PDF with this software, refer to the following command line templates.
  • ocr2any.exe -subject "subject" C:\in.pdf C:\out.pdf
    ocr2any.exe -subject "subject" -title "title" C:\in.pdf C:\out.pdf
    These command lines convert an image PDF to a text based PDF and add a subject, title or other basic information to the new output PDF.
    ocr2any.exe -ownerpwdout 123 -keylen 2 -encryption 3900 C:\in.pdf C:\out.pdf
    This command line converts an image PDF to a text based PDF and sets a password to protect the output PDF.
    ocr2any.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
    ocr2any.exe -ocr -lang eng -ocrmode 2 C:\in.pdf C:\out.pdf
    ocr2any.exe -ocr -lang eng -ocrmode 3 C:\in.pdf C:\out.pdf
    ocr2any.exe -ocr -lang eng -ocrmode 2 -outboxfile C:\in.pdf C:\out.pdf
    The command line templates above convert an image PDF to a text based PDF using the OCR language that matches the text in the image PDF (for example, deu for German or eng for English). Choosing the matching language can greatly improve conversion accuracy.
    Related parameters:

    -subject <string>       : Set 'subject' to PDF file
    -title <string>         : Set 'title' to PDF file
    -author <string>        : Set 'author' to PDF file
    -keywords <string>      : Set 'keywords' to PDF file
    -ownerpwdout <string>   : Set 'owner password' to PDF file
    -openpwdout <string>    : Set 'open password' to PDF file
    -keylen <int>           : Key length (40 or 128 bit)
    -ocr2autorotate         : Same as -ocr2aor
    -ocr2excelmode <int>    : Set output Excel format when -ocr2 is used
      -ocr2excelmode 0: One big sheet + All page sheets
      -ocr2excelmode 1: All page sheets
      -ocr2excelmode 2: One big sheet, default mode
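
If you have many image PDF files, the OCR templates above can be wrapped in a small script. The following Python code is only a minimal sketch, not part of the product: the helper names build_ocr_command and convert_folder are made up for illustration, it assumes ocr2any.exe is on your PATH (or that you pass its full path), and it uses only the -ocr, -lang and -ocrmode options shown above. How the program's exit code should be interpreted is not described here, so the script just records it.

```python
import subprocess
from pathlib import Path


def build_ocr_command(in_pdf, out_pdf, lang="eng", ocrmode=2, exe="ocr2any.exe"):
    # Mirrors the template from this article:
    #   ocr2any.exe -ocr -lang eng -ocrmode 2 <input PDF> <output PDF>
    # Options come first, then the input file, then the output file.
    return [exe, "-ocr", "-lang", lang, "-ocrmode", str(ocrmode),
            str(in_pdf), str(out_pdf)]


def convert_folder(src_dir, dst_dir, lang="eng", ocrmode=2, exe="ocr2any.exe"):
    # Hypothetical batch helper: runs one conversion per *.pdf in src_dir
    # and writes each result to dst_dir under the same file name.
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    exit_codes = {}
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        cmd = build_ocr_command(pdf, dst / pdf.name, lang, ocrmode, exe)
        # The meaning of the exit code is an assumption; it is only recorded.
        exit_codes[pdf.name] = subprocess.run(cmd).returncode
    return exit_codes
```

For example, convert_folder("C:/scans", "C:/searchable", lang="deu", ocrmode=1) would run the German template once for every PDF in C:/scans (both folder names are placeholders).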

Now let's check the conversion result in the following snapshots and see whether you can search content in the output PDF file. If you have any questions while using the software, please contact us as soon as possible.

Input image PDF file

Output text based PDF file
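
Besides pressing Ctrl+F in a PDF viewer, you can also check from a script whether the output PDF is searchable and on which pages a keyword appears, which is what the original question asked for. The sketch below is an assumption-laden illustration rather than a feature of OCR to Any Converter: it relies on the separate pdftotext utility (from Xpdf or Poppler) being installed, and the helper names extract_text and pages_containing are invented for this example. pdftotext separates pages with a form feed character, which the script uses to work out page numbers.

```python
import subprocess


def extract_text(pdf_path, exe="pdftotext"):
    # "-" as the output file sends the extracted text to stdout;
    # "-enc UTF-8" asks pdftotext for UTF-8 output.
    done = subprocess.run([exe, "-enc", "UTF-8", str(pdf_path), "-"],
                          capture_output=True, check=True)
    return done.stdout.decode("utf-8", errors="replace")


def pages_containing(text, keyword):
    # pdftotext ends each page with a form feed ("\f"), so splitting on it
    # gives one chunk per page. The match is case-insensitive.
    needle = keyword.lower()
    hits = []
    for number, page in enumerate(text.split("\f"), start=1):
        if needle in page.lower():
            hits.append(number)
    return hits
```

Calling pages_containing(extract_text("C:/out.pdf"), "farming") would return the list of page numbers that mention the keyword. If the list is empty for a word you can see on the page, the PDF probably still has no text layer, or OCR misread that word.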
