[VeryPDF Release Notes] VeryPDF has released OCR to Any Converter Command Line v6.0 today

VeryPDF has released OCR to Any Converter Command Line v6.0 today. You can download the new version from the following web pages:

http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html#buy
http://www.verypdf.com/dl2.php/ocr2any_cmd.zip

OCR to Any Converter Command Line v6.0 includes the following new features:

  • Able to delete blank pages from PDF files.
  • Able to remove black borders whose width is less than a specified value (default is 8).
  • Able to remove black borders automatically.
  • Able to remove speckles whose size is less than a specified value (default is 20).
  • Upgraded OCR engine with improved recognition accuracy and speed.
  • Supports multiple languages in a single document, e.g., a document that contains both English and Chinese characters.
  • Supports a threshold method for converting a color image to a black-and-white image without halftoning.
  • Able to detect the document's orientation automatically.
  • Outputs a text file with coordinates for each character.
  • Outputs a text file with coordinates for each word.
  • Able to generate a debug image file when the -ocr2 option is used.
  • Outputs more document formats, including PDF, CSV, HTML, XLS, RTF, ASCII, WP50, WP51, etc.
  • And more minor upgrades.
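
The release notes do not say how speckle size is measured. The sketch below shows one common approach. It assumes "size" means the pixel count of a connected group of black pixels, which is not ocr2any's documented definition. Every 4-connected black component smaller than a threshold is whitened in a binary image. The function name is hypothetical and this is not VeryPDF's implementation.

```python
from collections import deque

BLACK, WHITE = 1, 0  # binary image convention used in this sketch


def remove_speckles(img, min_size=20):
    """Whiten every 4-connected black component smaller than min_size pixels.

    img is a list of rows of 0/1 values (1 = black). Measuring "size" as a pixel
    count is an assumption; ocr2any's -specklesize definition is not documented.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != BLACK or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected black component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == BLACK:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = WHITE
    return out


page = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
# The 9-pixel block is kept; the isolated pixel at row 1, column 5 is removed.
print(remove_speckles(page, min_size=2))
```

A connected-component pass like this never breaks large shapes such as characters. This is why a pixel-count threshold around the default of 20 can remove scanner noise without eroding text.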

The complete user manual of OCR to Any Converter Command Line v6.0 is included below:

VeryPDF OCR to Any Converter Command Line v5.3
Web: http://www.verypdf.com
Web: http://www.verydoc.com
Email: support@verypdf.com
Release Date: Jan  9 2017
-------------------------------------------------------
Description:
  1. Convert text-based PDF files to plain text files.
  2. Convert scanned PDF files and image files to plain text files and searchable PDF files using OCR technology.
  3. Convert PDF files with embedded fonts to new searchable PDF files.
  4. Preserve color when converting PDF, TIFF and image files to searchable PDF files.
  5. Deskew, Despeckle and Noise Removal, Auto-Orientation, Dithering, Black Border Removal.
  6. Use Enhanced OCR Technology to convert scanned PDF, TIFF and image files to RTF, DOC, TXT, CSV, Excel and HTML formats.
  7. Create MS Excel documents in several layouts.
  8. PDF to Excel Converter: convert tables from PDF and image files to Microsoft Excel spreadsheets.
  9. PDF to HTML Converter: convert your PDFs to high-quality reflowed HTML while preserving styles, tables, etc.
10. Table Recovery: superior reconstruction of bordered and borderless tables as table objects, with formatting, in Word and HTML.
Input formats:
  1. Text based PDF files
  2. Scanned PDF files
  3. Scanned single page and multi-page TIFF files
  4. Scanned JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM files
Output formats:
  1. Plain text files without layout
  2. Plain text files with layout
  3. Plain text-based PDF files (the PDF contains text only)
  4. OCRed text layer attached to the original PDF file
  5. OCRed BW PDF files with hidden text layer
  6. OCRed Color PDF files with hidden text layer
  7. OCRed Grayscale PDF files with hidden text layer
  8. Output to TIFF, PNG, BMP, TGA, GIF with Deskew, Despeckle, etc. options
  9. Scanned PDF, TIFF and image files to RTF format
10. Scanned PDF, TIFF and image files to DOC format
11. Scanned PDF, TIFF and image files to Tab Text format
12. Scanned PDF, TIFF and image files to CSV format
13. Scanned PDF, TIFF and image files to MS Excel format
14. Scanned PDF, TIFF and image files to HTML format
15. Extract X1, Y1, X2, Y2 coordinates for each character
16. Extract X1, Y1, X2, Y2 coordinates for each word
-------------------------------------------------------
Usage: ocr2any.exe [options] <PDF-file> <Text-file>
  -firstpage <int>  : first PDF page to convert
  -lastpage <int>   : last PDF page to convert
  -res <int>        : set resolution, the unit is DPI (default is 300 dpi)
  -ownerpwd <string>: set owner password for encrypted PDF file
  -userpwd <string> : set user password for encrypted PDF file
  -layout           : maintain original physical layout
  -layout2          : pdf to table conversion with Best Column Alignment
  -table            : same as -layout2
  -pdf2table        : same as -layout2
  -noc              : don't insert page breaks 0x0C between pages in text file
  -bitcount <int>   : set the color depth used when rendering PDF pages to images: 1, 8 or 24 (default is 8-bit)
  -rotate <int>     : rotate pages before OCR
  -threshold <int>  : lightness threshold used to convert images to B&W, from 1 to 255; 0 is auto, default is -1
  -imageopt         : deskew and despeckle images automatically
  -dither <int>     : convert the color image to B&W using the desired method:
    -dither 0: Floyd-Steinberg
    -dither 1: Ordered-Dithering (4x4)
    -dither 2: Burkes
    -dither 3: Stucki
    -dither 4: Jarvis-Judice-Ninke
    -dither 5: Sierra
    -dither 6: Stevenson-Arce
    -dither 7: Bayer (4x4 ordered dithering)
  -resizewidth <int>      : resize the image's width, only available when -resizeheight is also used
  -resizeheight <int>     : resize the image's height, only available when -resizewidth is also used
  -flip                   : flip the image vertically
  -mirror                 : mirror the image horizontally
  -ocr                    : enable OCR function for scanned PDF file
  -lang <string>          : choose the language for OCR engine
  -ocrmode <int>          : set OCR mode
    -ocrmode 0: output to text file
    -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
    -ocrmode 2: output to plain text based PDF file
    -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
    -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
  -text <string>          : append text at the end of each text page; this parameter supports the following variables:
    %PageNumber%: current page number
    %PageCount% : total page count of PDF file
  -outboxfile             : output [X, Y, Width, Height] information for each word during OCR
  -producer <string>      : Set the 'producer' of the output PDF file
  -creator <string>       : Set the 'creator' of the output PDF file
  -subject <string>       : Set the 'subject' of the output PDF file
  -title <string>         : Set the 'title' of the output PDF file
  -author <string>        : Set the 'author' of the output PDF file
  -keywords <string>      : Set the 'keywords' of the output PDF file
  -ownerpwdout <string>   : Set the 'owner password' of the output PDF file
  -openpwdout <string>    : Set the 'open password' of the output PDF file
  -keylen <int>           : Key length (40 or 128 bit)
        -keylen 0:  40 bit RC4 encryption (Acrobat 3 or higher)
        -keylen 1: 128 bit RC4 encryption (Acrobat 5 or higher)
        -keylen 2: 128 bit RC4 encryption (Acrobat 6 or higher)
  -encryption <int>       : Restrictions
        -encryption    0: Encrypt the file only
        -encryption 3900: Deny anything
        -encryption    4: Deny printing
        -encryption    8: Deny modification of contents
        -encryption   16: Deny copying of contents
        -encryption   32: No commenting
        ===128 bit encryption only -> ignored if 40 bit encryption is used
        -encryption  256: Deny FillInFormFields
        -encryption  512: Deny ExtractObj
        -encryption 1024: Deny Assemble
        -encryption 2048: Disable high res. printing
        -encryption 4096: Do not encrypt metadata
  -ocr2                   : use enhanced OCR module to convert scanned PDF and image files to PDF, RTF, DOC, TXT, XLS, CSV, Excel, HTML files
  -ocr2aor                : detect page orientation and rotate pages automatically when -ocr2 is used
  -ocr2autorotate         : same as -ocr2aor
  -ocr2excelmode <int>    : set the output Excel layout when -ocr2 is used
    -ocr2excelmode 0: One big sheet + All page sheets
    -ocr2excelmode 1: All page sheets
    -ocr2excelmode 2: One big sheet, default mode
  -dumpcharpos            : Output a text file with coordinates for each character
  -dumpwordpos            : Output a text file with coordinates for each word
  -outputformat <int>     : format of the output document; by default it is determined by the output file's extension
    -outputformat    1: output to RTF format
    -outputformat    2: output to ASCII format
    -outputformat    3: output to ASCIILB format, center some text lines
    -outputformat    4: output to 123V2 format
    -outputformat    5: output to AMIPRO1_2 format
    -outputformat    6: output to COMMAASCII format
    -outputformat    7: output to EXCELV2 format
    -outputformat    8: output to SMARTASCII format
    -outputformat    9: output to WORDWIN format, same as RTF format
    -outputformat   10: output to WP50 format
    -outputformat   11: output to WP51 format
    -outputformat   12: output to NATIVE format
    -outputformat   13: output to NATIVE_TEXT format
    -outputformat   14: output to TABASCII format
    -outputformat   15: output to HTML format
    -outputformat 8888: output to plain text based PDF format
    -outputformat 8889: output to plain text file with original layout
    -outputformat 8890: output to plain HTML file with absolute position
    -outputformat 8891: output to CSV file with perfect columns
  -outfmt <int>           : same as -outputformat
  -gendebugimage          : Generate a debug image file when -ocr2 is used
  -delblankpages          : Delete blank pages from PDF file
  -linewidth <int>        : Remove black borders whose width is less than this value, default is 8
  -specklesize <int>      : Remove speckles whose size is less than this value, default is 20
  -$ <string>             : input your License Key
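
The -encryption restriction values (other than 0) are powers of two. The "Deny anything" value 3900 is exactly the sum of the deny flags from 4 through 2048 (4 + 8 + 16 + 32 + 256 + 512 + 1024 + 2048). This suggests that restrictions are combined by adding, or bitwise OR-ing, their values, as in common PDF permission schemes. The manual does not state this explicitly, so treat it as an assumption and check the resulting permissions in a PDF viewer. The constant names and helper below are hypothetical; only the numbers come from the table above.

```python
# Restriction values copied from the -encryption table above.
# Assumption: ocr2any treats -encryption as a bit mask, so values combine by OR.
DENY_PRINTING = 4
DENY_MODIFY = 8
DENY_COPY = 16
NO_COMMENTING = 32
DENY_FILL_IN_FORM_FIELDS = 256    # 128-bit encryption only
DENY_EXTRACT_OBJ = 512            # 128-bit encryption only
DENY_ASSEMBLE = 1024              # 128-bit encryption only
DISABLE_HIGH_RES_PRINTING = 2048  # 128-bit encryption only
DO_NOT_ENCRYPT_METADATA = 4096    # 128-bit encryption only


def encryption_value(*flags):
    """Combine restriction flags into a single -encryption argument."""
    value = 0
    for flag in flags:
        value |= flag
    return value


deny_all = encryption_value(
    DENY_PRINTING, DENY_MODIFY, DENY_COPY, NO_COMMENTING,
    DENY_FILL_IN_FORM_FIELDS, DENY_EXTRACT_OBJ, DENY_ASSEMBLE,
    DISABLE_HIGH_RES_PRINTING,
)
print(deny_all)  # 3900, matching "-encryption 3900: Deny anything"

# Hypothetical combination: deny printing and copying only.
no_print_no_copy = encryption_value(DENY_PRINTING, DENY_COPY)
print(f"ocr2any.exe -ownerpwdout 123 -keylen 2 -encryption {no_print_no_copy} C:\\in.pdf C:\\out.pdf")
```

The flags from 256 upward are marked "128 bit encryption only" in the table. Under this assumption, they would only take effect together with -keylen 1 or -keylen 2.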
Examples:
  ocr2any.exe C:\in.pdf C:\out.txt
  ocr2any.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -res 300 C:\in.pdf C:\out.txt
  ocr2any.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.txt
  ocr2any.exe -layout C:\in.pdf C:\out.txt
  ocr2any.exe -layout2 C:\in.pdf C:\out.txt
  ocr2any.exe -table C:\in.pdf C:\out.txt
  ocr2any.exe -pdf2table C:\in.pdf C:\out.txt
  ocr2any.exe -noc C:\in.pdf C:\out.txt
  ocr2any.exe C:\in.tif C:\out.txt
  ocr2any.exe C:\in.jpg C:\out.txt
  ocr2any.exe C:\in.bmp C:\out.txt
  ocr2any.exe C:\in.png C:\out.txt
  ocr2any.exe -ocr -lang eng C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -lang eng+kor C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -lang eng+jpn C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -lang deu C:\in.pdf C:\out.txt
  ocr2any.exe -lang deu C:\in.tif C:\out.txt
  ocr2any.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
  ocr2any.exe -subject "subject" C:\in.pdf C:\out.pdf
  ocr2any.exe -ownerpwdout 123 -keylen 2 -encryption 3900 C:\in.pdf C:\out.pdf
  ocr2any.exe -subject "subject" -title "title" C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang eng -ocrmode 0 C:\in.pdf C:\out.txt
  ocr2any.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang eng -ocrmode 2 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang eng -ocrmode 3 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang eng -ocrmode 2 -outboxfile C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang fra -ocrmode 1 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang ita -ocrmode 1 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang nld -ocrmode 1 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang spa -ocrmode 1 C:\in.pdf C:\out.pdf
  ocr2any.exe -bitcount 24 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
  ocr2any.exe -bitcount 8 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
  ocr2any.exe -ocrmode 4 -ocr C:\in.tif C:\out.pdf
  ocr2any.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
  ocr2any.exe -ocrmode 4 -rotate 90 -ocr C:\in.tif C:\out.pdf
  ocr2any.exe -ocr -lang jpn -ocrmode 4 -bitcount 24 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang chi_sim -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang chi_tra -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang chi_sim+eng -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
  ocr2any.exe -ocr -lang chi_sim+deu -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
  ocr2any.exe -delblankpages D:\test.pdf D:\out.pdf
  ocr2any.exe -delblankpages -linewidth 8 D:\test.pdf D:\out.pdf
  ocr2any.exe -delblankpages -specklesize 20 D:\test.pdf D:\out.pdf
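
-res sets the resolution at which PDF pages are rasterized before OCR, and -bitcount sets the color depth of that raster. PDF page dimensions are given in points (1/72 inch), so you can estimate the size of the rendered image in advance. This helps when deciding between, say, -res 300 and -res 200 for a large batch. A small sketch of the arithmetic follows. The helper names are hypothetical, and the rounding and row-packing rules are assumptions, not documented ocr2any behavior.

```python
def rendered_size(width_pt, height_pt, dpi=300):
    """Pixel size of a PDF page rendered at `dpi` (the -res value).

    PDF page sizes are in points (1/72 inch): pixels = points / 72 * dpi.
    Rounding to the nearest pixel is an assumption about ocr2any.
    """
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


def raw_image_bytes(width_px, height_px, bitcount=8):
    """Uncompressed size of the rendered page for -bitcount 1, 8 or 24,
    assuming rows are packed to whole bytes with no extra padding."""
    bytes_per_row = (width_px * bitcount + 7) // 8
    return bytes_per_row * height_px


# US Letter is 612 x 792 pt (8.5 x 11 in).
w, h = rendered_size(612, 792)           # default -res 300 -> (2550, 3300)
print(w, h, raw_image_bytes(w, h, 8))    # 2550 3300 8415000
print(rendered_size(612, 792, dpi=200))  # (1700, 2200), as with -res 200
```

Doubling -res roughly quadruples the pixel count. -bitcount 24 triples the raw size compared with the default 8-bit.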

Using the Enhanced OCR options:
  ocr2any.exe -ocr2 -ocr2aor C:\in.tif C:\out.rtf
  ocr2any.exe -ocr2 -ocr2aor C:\in.tif C:\out.doc
  ocr2any.exe -ocr2 -ocr2aor C:\in.tif C:\out.xls
  ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.rtf
  ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.doc
  ocr2any.exe -ocr2 -ocr2excelmode 0 C:\in.pdf C:\out.xls
  ocr2any.exe -ocr2 -ocr2excelmode 1 C:\in.pdf C:\out.xls
  ocr2any.exe -ocr2 -ocr2excelmode 2 C:\in.pdf C:\out.xls
  ocr2any.exe -ocr2 C:\in.pdf C:\out.doc
  ocr2any.exe -ocr2 C:\in.pdf C:\out.rtf
  ocr2any.exe -ocr2 C:\in.png C:\out.xls
  ocr2any.exe -ocr2 C:\in.tif C:\out.csv
  ocr2any.exe -ocr2 C:\in.bmp C:\out.txt
  ocr2any.exe -ocr2 C:\in.gif C:\out.htm
  ocr2any.exe -ocr2 C:\in.pdf C:\out.html
  ocr2any.exe -ocr2 D:\temp\*.pdf D:\temp\*.html
  ocr2any.exe -ocr2 D:\temp\*.pdf D:\temp\*.doc
  ocr2any.exe -ocr2 -lang deu C:\in.pdf C:\out.doc
  ocr2any.exe -ocr2 -lang deu C:\in.pdf C:\out.xls
  ocr2any.exe -ocr2 -dumpcharpos C:\in.pdf C:\out.txt
  ocr2any.exe -ocr2 -dumpwordpos C:\in.pdf C:\out.txt
  ocr2any.exe -ocr2 -dumpcharpos C:\in.pdf C:\out.rtf
  ocr2any.exe -ocr2 -dumpwordpos C:\in.pdf C:\out.rtf
  ocr2any.exe -ocr2 C:\in.pdf C:\text.pdf
  ocr2any.exe -ocr2 C:\in.tif C:\out.pdf
  ocr2any.exe -ocr2 C:\in.png C:\out.pdf
  ocr2any.exe -ocr2 C:\in.jpg C:\out.pdf
  ocr2any.exe -ocr2 C:\in.tif C:\out.doc
  ocr2any.exe -ocr2 C:\in.tif C:\out.rtf
  ocr2any.exe -ocr2 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 C:\in.tif C:\out.xls
  ocr2any.exe -ocr2 -ocr2autorotate C:\in.tif C:\out.pdf
  ocr2any.exe -ocr2 -ocr2autorotate C:\in.tif C:\out.doc
  ocr2any.exe -ocr2 -outputformat 1 C:\in.tif C:\out.rtf
  ocr2any.exe -ocr2 -outputformat 2 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -outputformat 3 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -outputformat 6 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -outputformat 7 C:\in.tif C:\out.xls
  ocr2any.exe -ocr2 -outputformat 8 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -outputformat 9 C:\in.tif C:\out.doc
  ocr2any.exe -ocr2 -outputformat 13 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -outputformat 14 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -outputformat 15 C:\in.tif C:\out.html
  ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8888 C:\in.tif C:\out.pdf
  ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8889 C:\in.tif C:\out.txt
  ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8890 C:\in.tif C:\out.html
  ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8891 C:\in.tif C:\out.csv
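
This manual uses two coordinate conventions. -outboxfile reports [X, Y, Width, Height] for each word, while the character and word coordinate outputs (items 15 and 16 of the output formats list) use X1, Y1, X2, Y2 corners. The sketch below converts between the two forms and merges character boxes into a word box. It assumes X2 = X + Width, i.e. exclusive right/bottom edges; if the tool uses inclusive corners, subtract 1. The exact layout of the coordinate text files is not documented here, so no parser is shown. All names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Corner form: X1, Y1 (assumed top-left) and X2, Y2 (assumed bottom-right)."""
    x1: int
    y1: int
    x2: int
    y2: int


def from_xywh(x, y, width, height):
    """Convert an [X, Y, Width, Height] record (as written by -outboxfile) to corner form.
    Assumes exclusive right/bottom edges, i.e. X2 = X + Width."""
    return Box(x, y, x + width, y + height)


def to_xywh(box):
    """Inverse of from_xywh."""
    return box.x1, box.y1, box.x2 - box.x1, box.y2 - box.y1


def union(boxes):
    """Smallest box that encloses all given boxes, e.g. a word built from its characters."""
    return Box(min(b.x1 for b in boxes), min(b.y1 for b in boxes),
               max(b.x2 for b in boxes), max(b.y2 for b in boxes))


# Two hypothetical character boxes from the same word.
chars = [from_xywh(100, 50, 12, 20), from_xywh(113, 52, 10, 18)]
word = union(chars)
print(word)           # Box(x1=100, y1=50, x2=123, y2=70)
print(to_xywh(word))  # (100, 50, 23, 20)
```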

Process image files with Deskew, Despeckle and Noise Removal, and Black Border Removal options:
  ocr2any.exe -imageopt C:\in.tif C:\out.tif
  ocr2any.exe -imageopt -rotate 45 C:\in.png C:\out.tif
  ocr2any.exe -imageopt -rotate 90 C:\in.png C:\out.tif
  ocr2any.exe -imageopt -threshold 0 C:\in.tif C:\out.bmp
  ocr2any.exe -threshold 240 C:\in.tif C:\out.bmp
  ocr2any.exe -dither 0 C:\in.bmp C:\out.png
  ocr2any.exe -dither 7 C:\in.bmp C:\out.png
  ocr2any.exe -imageopt -resizewidth 800 -resizeheight 600 C:\in.gif C:\out.tga
  ocr2any.exe -imageopt -flip C:\in.png C:\out.gif
  ocr2any.exe -imageopt -mirror C:\in.tif C:\out.pcx
  ocr2any.exe -imageopt C:\in.bmp C:\out.tif
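
-threshold and -dither are two different ways of reducing an image to black and white. The sketch below applies both to a grayscale image stored as rows of values from 0 (black) to 255 (white). The threshold function assumes a pixel at or above the level becomes white. ocr2any's exact comparison, its automatic mode (0) and the meaning of the default -1 are not documented, so they are not modeled. The dithering function uses the standard Floyd-Steinberg error-diffusion weights, the method listed as -dither 0. This is a textbook version, not ocr2any's own code.

```python
def threshold(gray, level=128):
    """Global threshold: pixels at or above `level` become white (255), the rest black (0)."""
    return [[255 if v >= level else 0 for v in row] for row in gray]


def floyd_steinberg(gray):
    """Floyd-Steinberg error diffusion to pure black (0) / white (255).

    Each pixel's quantization error is pushed to not-yet-visited neighbours:
    right 7/16, below-left 3/16, below 5/16, below-right 1/16.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    buf = [[float(v) for v in row] for row in gray]  # working values incl. diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


mid_gray = [[128, 128], [128, 128]]
print(threshold(mid_gray, 128))   # [[255, 255], [255, 255]]
print(floyd_steinberg(mid_gray))  # [[255, 0], [0, 255]]
```

On uniform mid-gray input, thresholding collapses the area to a single color. Error diffusion instead produces a checkerboard-like pattern whose average brightness approximates the original. Thresholding therefore tends to suit text pages, and dithering suits photographs.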

The following command line OCRs all PDF files in the D:\temp\ folder to text files:
  for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr -lang deu "%F" "%~dpnF.txt"

The following command line OCRs all PDF files in the D:\temp\ folder and its subfolders to text files:
  for /r D:\temp %F in (*.pdf) do ocr2any.exe -ocr "%F" "%~dpnF.txt"

The following command line OCRs all PDF files in the D:\temp\ folder and writes the text files to the C:\test folder:
  for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr "%F" "C:\test\%~nF.txt"
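
These loops use standard cmd.exe FOR variable modifiers. %F is the current file, %~dpnF expands to its drive, path and file name without the extension, and %~nF expands to the file name alone. When the commands are placed in a .bat or .cmd file, the variable must be written with doubled percent signs (%%F, %%~dpnF). The Python sketch below reproduces the same output-path mapping so the commands can be generated or checked before they are run. The helper names are hypothetical, and the code only builds argument lists; it does not call ocr2any.exe.

```python
from pathlib import PureWindowsPath


def same_folder_output(pdf, ext=".txt"):
    """Equivalent of "%~dpnF.txt": same drive and folder, new extension."""
    return str(PureWindowsPath(pdf).with_suffix(ext))


def other_folder_output(pdf, out_dir, ext=".txt"):
    r"""Equivalent of "C:\test\%~nF.txt": file name only, placed in out_dir."""
    return str(PureWindowsPath(out_dir) / (PureWindowsPath(pdf).stem + ext))


def ocr_command(pdf, out, lang=None):
    """Argument list for one 'ocr2any.exe -ocr' call, as in the loops above.
    On Windows it could be passed to subprocess.run() to execute it."""
    args = ["ocr2any.exe", "-ocr"]
    if lang:
        args += ["-lang", lang]
    return args + [pdf, out]


pdf = r"D:\temp\report.pdf"
print(ocr_command(pdf, same_folder_output(pdf), lang="deu"))
# ['ocr2any.exe', '-ocr', '-lang', 'deu', 'D:\\temp\\report.pdf', 'D:\\temp\\report.txt']
print(other_folder_output(pdf, r"C:\test"))  # C:\test\report.txt
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces. This is the same reason the cmd examples wrap "%F" in quotes.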

The following command lines use the Enhanced OCR options:
  for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr2 -lang deu "%F" "%~dpnF.txt"
  for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr2 -lang eng "%F" "%~dpnF.doc"
  for %F in (D:\temp\*.tif) do ocr2any.exe -ocr2 "%F" "%~dpnF.doc"
  for %F in (D:\temp\*.tif) do ocr2any.exe -ocr2 -ocr2autorotate "%F" "%~dpnF.xls"
  for /r D:\temp %F in (*.pdf) do ocr2any.exe -ocr2 "%F" "%~dpnF.rtf"
  for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr2 "%F" "C:\test\%~nF.html"
  ocr2any.exe -ocr2 D:\temp\*.tif D:\temp\*.html
  ocr2any.exe -ocr2 -ocr2excelmode 0 D:\temp\*.pdf D:\temp\*.xls
  ocr2any.exe -ocr2 D:\temp\*.png D:\temp\*.rtf
  ocr2any.exe -ocr2 D:\temp\*.tif D:\temp\*.csv
  ocr2any.exe -ocr2 D:\temp\*.pdf D:\temp\*.doc

OCR to Any Converter Command Line v6.0 adds a new table analysis function. You can run the following command line to generate a debug image file that shows the table analysis result:

  ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8891 -gendebugimage E:\OCR\test_table_ocr2.tif D:\downloads\_out.csv

Even if your image doesn't contain table lines, ocr2any.exe is still able to detect the table and its cells.


You can download the trial version from our website for evaluation:

http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html#buy
http://www.verypdf.com/dl2.php/ocr2any_cmd.zip

If you encounter any problems, please feel free to let us know:

http://support.verypdf.com/open.php

This entry was posted in OCR Products.