- HOCR (software)
Infobox Software
name = HOCR
caption =
author = Yaacov Zamir
developer =
released =
latest release version =
latest release date =
latest preview version =
latest preview date =
programming language = C, Python and C++
operating system = Linux and (unofficially) Mac OS X
platform =
language =
status =
genre = Optical character recognition
license = GPL v3.0
website = http://hocr.berlios.de
In computer software, HOCR is free Hebrew optical character recognition software. It is based on the libhocr Hebrew optical character recognition engine. Logo graphics by Shlomi Israel.
About the libhocr OCR Engine
libhocr is a GNU Hebrew optical character recognition engine. It is designed for use with old, yellow-stained Hebrew poetry and religious texts. libhocr includes an image processing unit that removes yellow stains and fixes page images. It can understand the complex page layouts that are frequent in old religious texts (for example, Talmud pages). It can also read and understand Nikud, which is essential for optical character recognition of Hebrew poetry.
libhocr can use the GTK toolkit to load images. It can load png, jpeg, tiff, bmp, pnm and any other image format supported by GTK. libhocr can automatically fix stained, dark, bright and rotated images.
libhocr outputs the recognized text using UTF-8 encoding. It can output the text as plain text or using Google's hocrhtml format for OCR output.
User interfaces
HOCR includes two user interfaces: a graphical user interface and a command line tool.
* hocr-gtk is a graphical user interface built using GTK and Python. It is a simple, easy-to-use interface, designed by Yuval Tanny. hocr can process old, yellow-stained images and rotated texts, and it can understand texts with Nikud.
* hocr is a command line tool. It is a more powerful tool designed for automation of the OCR process.
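In an automated workflow built around the command line tool, the HTML-based OCR output mentioned earlier would typically be read back by a script. The following is a minimal sketch in Python, not part of HOCR itself. It assumes the output follows the common hOCR convention in which each text line is an element with class "ocr_line" and a title attribute of the form "bbox x0 y0 x1 y1". Which elements libhocr actually emits is an assumption here, and the SAMPLE fragment is hand-written for illustration, not real HOCR output.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property used by the hOCR convention.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

# Tags that never have a closing tag; ignored so nesting depth stays correct.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HocrLineParser(HTMLParser):
    """Collect (bbox, text) pairs for every element with class "ocr_line"."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished (bbox, text) tuples
        self._depth = 0    # nesting depth inside the current ocr_line
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            # Already inside a line: track nested elements such as words.
            self._depth += 1
        elif "ocr_line" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None
            self._chunks = []
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Line element closed: normalize whitespace and store the result.
            text = " ".join("".join(self._chunks).split())
            self.lines.append((self._bbox, text))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def parse_hocr_lines(html_text):
    """Return a list of (bbox, text) tuples, one per recognized text line."""
    parser = HocrLineParser()
    parser.feed(html_text)
    parser.close()
    return parser.lines


# Hand-written illustrative fragment (hypothetical, not produced by HOCR).
SAMPLE = """
<div class="ocr_page" title="bbox 0 0 800 600">
  <span class="ocr_line" title="bbox 40 30 760 70">שלום עולם</span>
  <span class="ocr_line" title="bbox 40 90 760 130">בְּרֵאשִׁית</span>
</div>
"""
```

Calling parse_hocr_lines(SAMPLE) yields one tuple per line, pairing the bounding box with the recognized text. Because the text is decoded as Unicode, Nikud marks arrive as combining characters attached to their base letters and pass through unchanged.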
See also
* Document Layout Analysis
Wikimedia Foundation. 2010.