What is OCR (Optical Character Recognition)?
Optical Character Recognition (OCR) is the process of converting printed
materials into text or word processing files that can be easily edited and
stored. Stored this way, the materials take up far less space than the hard
copies do. OCR technology has had a major impact on the way information is
stored, shared and edited. Before optical character recognition, anyone who
wanted to turn a book into a word processing file had to type each page word
for word.
OCR technology requires both hardware and software. In addition, some
sophisticated OCR systems may use an additional circuit board in the computer
itself to complete the process. An optical scanner scans the text on a page and
breaks it down into a series of dots called a bitmap. The software then analyzes
this bitmap: it can read most common fonts and distinguish where lines start and
stop, and it translates the bitmap into computer text.
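To make the bitmap step concrete, here is a minimal Python sketch of one way
software might find where lines of text start and stop: look at each row of the
bitmap and treat every run of rows that contain dark dots as one line of text.
The tiny hand-made bitmap and the function name find_text_lines are assumptions
for illustration only; real OCR software is considerably more elaborate.

```python
def find_text_lines(bitmap):
    """Return (start_row, end_row) pairs for each run of rows containing ink.

    `bitmap` is a list of equal-length strings: '#' is a dark dot, '.' is blank.
    """
    lines = []
    start = None  # row index where the current text line began, if any
    for row_index, row in enumerate(bitmap):
        has_ink = "#" in row
        if has_ink and start is None:
            start = row_index  # a new text line starts here
        elif not has_ink and start is not None:
            lines.append((start, row_index - 1))  # the line ended on the previous row
            start = None
    if start is not None:
        # The last text line runs to the bottom edge of the bitmap.
        lines.append((start, len(bitmap) - 1))
    return lines


# A hypothetical 7-row scan holding two "lines" of text separated by a blank row.
SAMPLE_BITMAP = [
    "........",
    ".##..#..",
    ".#.#.#..",
    "........",
    "..##.##.",
    "..#...#.",
    "........",
]

print(find_text_lines(SAMPLE_BITMAP))  # prints [(1, 2), (4, 5)]
```

Each pair gives the first and last bitmap row of one text line. Recognizing the
individual characters inside each line is a separate and much harder step that
this sketch does not attempt.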
While optical character recognition has made huge advances in recent years, it
still does not always perform well in recognizing handwriting or fonts that look
similar to handwriting. Some systems in the banking industry use OCR technology
to try to read the amounts on handwritten checks, complementing the computer's
ability to read the routing and account numbers.
To give an idea of the power of OCR, it can help to take a look at a real-world
example. Imagine a police department that has all its criminal records stored in
vast file cabinets. Although scanning millions of pages would be an expensive
and time-consuming undertaking, the benefits are huge.
Once the OCR system has converted the pages into computer-readable text, a
detective, for example, could search through the entire history in a few
seconds. Manually finding a particular record might not be too difficult, but
imagine a detective trying to search by hand for all the crimes committed at a
certain intersection between 8:00 and 8:30. This example only scratches the surface of
the power of searchable text, and it is only one reason that many companies and
institutions are spending millions of dollars to OCR their legacy data.
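The detective's query can be sketched in a few lines of Python once the records
exist as plain text. This is only an illustration under stated assumptions: the
sample records, the intersection name, and the function name search_records are
all made up, and the sketch assumes each record mentions the time in H:MM form.

```python
import re
from datetime import time

# Matches clock times such as 8:15 or 14:40 inside free text.
TIME_PATTERN = re.compile(r"\b(\d{1,2}):(\d{2})\b")


def search_records(records, place, start, end):
    """Return records that mention `place` and contain a time within [start, end]."""
    matches = []
    for text in records:
        if place.lower() not in text.lower():
            continue  # record does not mention the intersection
        for hour, minute in TIME_PATTERN.findall(text):
            h, m = int(hour), int(minute)
            # Skip digit pairs that are not valid clock times.
            if h < 24 and m < 60 and start <= time(h, m) <= end:
                matches.append(text)
                break  # one matching time is enough for this record
    return matches


# Hypothetical OCR output, one string per scanned record.
RECORDS = [
    "Burglary reported at Main St and 5th Ave at 8:15.",
    "Vandalism reported at Main St and 5th Ave at 14:40.",
    "Theft reported at Oak St and 2nd Ave at 8:20.",
]

hits = search_records(RECORDS, "Main St and 5th Ave", time(8, 0), time(8, 30))
print(hits)  # prints ['Burglary reported at Main St and 5th Ave at 8:15.']
```

A real archive would hold far more records and would need to cope with OCR
misreadings, but the principle is the same: text that a computer can read is
text that a computer can filter.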
PDF to Text OCR Converter:
Convert scanned PDF and image files to plain text files.
See Also:
PDF to HTML Converter: Convert PDF files to HTML documents.
PDF to Text Converter: Convert PDF files to plain text files.
PDF to Vector Converter: Convert PDF files to PS, EPS, WMF, EMF, XPS, PCL, HPGL,
SWF, SVG, etc. vector files.
PDF to Image Converter: Convert PDF files to TIF, TIFF, JPG, GIF, PNG, BMP, EMF,
PCX, TGA formats.
DocConverter COM Component (+HTML2PDF.exe): Convert HTML, DOC, RTF, XLS, PPT,
TXT, etc. files to PDF files. It depends on the PDFcamp Printer product.
Image to PDF Converter: Convert 40+ image formats to PDF files.
HTML Converter: Convert HTML files to TIF, TIFF, JPG, JPEG, GIF, PNG, BMP, PCX,
TGA, JP2 (JPEG2000), PNM, etc. formats.
Copyright © 2000- VeryPDF.com, Inc. All rights reserved.