List of optical character recognition software

An OCR SDK is a software development kit for adding optical character recognition capabilities to forms processing applications, document imaging management systems, e-discovery systems and records management solutions.

To ease the integration of OCR technology, some OCR SDKs expose a large number of APIs and support multiple operating systems and programming languages.

Here is a non-exhaustive comparison of optical character recognition software:

| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Notes |
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | No | Yes | C/C++ | Yes | 186[1] | ? | ABBYY also supplies SDKs for embedded or mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[2] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | Works with structured, semi-structured, and unstructured documents. |
| CuneiForm/OpenOCR | ? | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | Enterprise-class system; preserves text formatting and recognizes complex tables of any structure. |
| ExperVision TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 17 | 2618 | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[3][citation needed] PC Magazine is quoted as saying that ExperVision's OpenRTK is four to eight times faster than the competition[citation needed], but also: "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats."[4] |
| GOCR | ? | 0.47 | 2009 | GPL | No | Yes | Yes | Yes | Yes | C | ? | ? | ? |  |
| LEADTOOLS[5] | 1990[6] | 17 | 2010 | Proprietary | No | Yes | No | No | No | various | Yes | 56[7] | Any printed font | Supports Latin, Asian, Arabic, and MICR character sets.[5] For full-page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[8] ICR (handwritten text recognition) is supported.[9] |
| Java OCR | ? | Java OCR | 2010 | ? | No | Yes | No | No | No | ? | ? | ? | ? | Uses Java[citation needed] |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Uses OmniPage[citation needed] |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  |
| Ocrad | ? | 0.20 | 2010 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | Command line |
| OCRopus | ? | 0.3.1 | 2008 | Apache | No | No | No | Yes | No | C++ and Lua | ? | ? | ? | Pluggable framework which can use Tesseract |
| OCRFeeder | ? | 0.7.6 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | Features a full user interface and a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines such as Tesseract or Ocrad. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | No | No | C/C++/C#[10] | Yes | ? | ? | Product of Nuance Communications |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
| Readiris | ? | 12 Pro | 2009 | Proprietary | No | Yes | Yes | No | No | C++ | Yes | ? | ? | Product of I.R.I.S. Group of Belgium. Asian and Middle Eastern editions. |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Scans, captures and classifies business documents such as invoices, forms and purchase orders; integrates with business processes. |
| RelayFax | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | Many | ? | Converts faxed pages into editable document formats (DOC, PDF, etc.). |
| Scantron | ? | Cognition | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | For working with localized interfaces, corresponding language support is required. |
| SimpleOCR | 2002 | 3.5 | 2008 | Freeware and Commercial | No | Yes | No | No | No | ? | ? | ? | ? |  |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | For musical scores |
| Tesseract | ? | 3.00 | 2010 | Apache | No | Yes[11] | Yes | Yes | No | C++, C | ? | 35+[12] | ? | Created by Hewlett-Packard; under further development by Google |
| Transym OCR | ? | 3.0 | 2008 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? |  |
| Zonal OCR | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  |
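
Some of the engines listed above, such as Tesseract and Ocrad, are command-line programs that a front end such as OCRFeeder can call as a system-wide OCR engine. The following is a minimal sketch in Python of such a call, not the method used by any listed product. It assumes the `tesseract` executable is installed and on the PATH; the helper names `build_tesseract_command` and `run_ocr` are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def run_ocr(image_path, output_base, language="eng"):
    """Run Tesseract if it is installed and return the recognized text.

    Returns None when no `tesseract` executable is found on the PATH
    (an assumption of this sketch, not a requirement of any listed SDK).
    """
    if shutil.which("tesseract") is None:
        return None
    command = build_tesseract_command(image_path, output_base, language)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    # Tesseract writes its plain-text result to OUTPUTBASE.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, `run_ocr("page.png", "page")` would, under these assumptions, return the text recognized in `page.png`, or `None` if Tesseract is not available.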

Wikimedia Foundation. 2010.
