Document Layout Analysis
Document layout analysis is the area of computer vision concerned with identifying and categorizing the regions of interest in a document image, such as a scanned page. A reading system needs to separate text zones from non-textual ones and arrange them in their correct reading order.[1] Detecting and labeling the different zones (or blocks) embedded in a document, such as body text, pictures, mathematical symbols, and tables, is called geometric layout analysis. Text zones, however, also play different logical roles inside the document (titles, captions, footnotes, etc.), and this kind of semantic labeling is the scope of logical layout analysis.
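Geometric segmentation and reading order can be illustrated with the recursive X-Y cut, one classic top-down technique (not the only one, and not necessarily what any particular reading system uses). The sketch below assumes an already binarized page stored as a list of rows of 0/1 pixels; the function names, the `min_gap` threshold and the "cut across the widest gap" rule are illustrative choices, not a standard API. It splits the page at wide white gaps in its row and column projection profiles and returns the leaf zones in an approximate reading order.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a zone is (top, left, bottom, right) with
# half-open pixel ranges; the page is a binarized image, 1 = ink, 0 = paper.
Block = Tuple[int, int, int, int]
Image = List[List[int]]


def runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of non-zero profile entries.

    White gaps narrower than min_gap (e.g. line spacing) are bridged, so only
    wider gaps become candidate cut positions.
    """
    result: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end = gap = 0
    for i, value in enumerate(profile):
        if value:
            if start is None:
                start = i
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                result.append((start, end))
                start = None
    if start is not None:
        result.append((start, end))
    return result


def widest_gap(spans: List[Tuple[int, int]]) -> int:
    """Width of the largest white gap between consecutive runs (0 if none)."""
    return max((b[0] - a[1] for a, b in zip(spans, spans[1:])), default=0)


def xy_cut(img: Image, block: Block, min_gap: int = 2) -> List[Block]:
    """Recursive X-Y cut: returns leaf zones in approximate reading order."""
    top, left, bottom, right = block
    # Projection profiles: ink count per row and per column of this block.
    row_profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    col_profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    row_runs, col_runs = runs(row_profile, min_gap), runs(col_profile, min_gap)
    if not row_runs:
        return []  # blank region, nothing to report
    if len(row_runs) == 1 and len(col_runs) == 1:
        # No wide enough gap either way: a leaf zone, trimmed to its ink.
        (r0, r1), (c0, c1) = row_runs[0], col_runs[0]
        return [(top + r0, left + c0, top + r1, left + c1)]
    # Cut across the widest white gap (ties favour a horizontal cut). Children
    # are visited top to bottom or left to right, which gives the reading order.
    if widest_gap(row_runs) >= widest_gap(col_runs):
        children = [(top + s, left, top + e, right) for s, e in row_runs]
    else:
        children = [(top, left + s, bottom, left + e) for s, e in col_runs]
    zones: List[Block] = []
    for child in children:
        zones.extend(xy_cut(img, child, min_gap))
    return zones


# Toy page: a title line above two text columns separated by a gutter.
SAMPLE_PAGE = [
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0] * 12,
    [0] * 12,
    [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0],
    [0] * 12,
    [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0],
]
print(xy_cut(SAMPLE_PAGE, (0, 0, len(SAMPLE_PAGE), len(SAMPLE_PAGE[0]))))
# → [(0, 1, 1, 11), (3, 1, 7, 5), (3, 7, 7, 11)]
```

The title comes out first, then the left column, then the right one, because the one-pixel line spacing inside the columns is bridged while the wider gaps are cut. On real scans the gap threshold has to be tuned to the resolution, and pure X-Y cuts struggle with non-rectangular layouts such as text wrapped around an irregular picture.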
Document layout analysis is the union of geometric and logical labeling. It is typically performed before a document image is sent to an OCR engine, but it can also be used to detect duplicate copies of the same document in large archives, or to index documents by their structure or pictorial content.
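Logical labeling can be layered on top of the zones produced by a geometric step. The following is a toy rule-based sketch, assuming each zone already carries a geometric kind and a median text height; the `Zone` class, its fields, the function name and every threshold are hypothetical, and practical systems often rely on richer features or learned classifiers instead of fixed rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Zone:
    """A zone from geometric analysis plus hypothetical features."""
    top: int
    left: int
    bottom: int
    right: int
    kind: str                   # geometric label: "text", "picture", "table", ...
    text_height: float = 0.0    # median glyph height in pixels (assumed given)
    role: Optional[str] = None  # logical label, filled in below


def label_logical_roles(zones: List[Zone], page_height: int,
                        body_height: float) -> List[Zone]:
    """Toy rule-based logical labeling; all thresholds are illustrative."""
    pictures = [z for z in zones if z.kind == "picture"]
    for z in zones:
        if z.kind != "text":
            z.role = z.kind  # non-text zones keep their geometric label
        elif z.text_height >= 1.5 * body_height and z.top < 0.2 * page_height:
            z.role = "title"  # large type near the top of the page
        elif any(0 <= z.top - p.bottom <= 2 * body_height
                 and z.left < p.right and p.left < z.right for p in pictures):
            z.role = "caption"  # just below a picture, overlapping it horizontally
        elif z.text_height < 0.8 * body_height and z.top > 0.85 * page_height:
            z.role = "footnote"  # small type near the bottom of the page
        else:
            z.role = "body"
    return zones


PAGE_ZONES = [
    Zone(40, 100, 80, 900, "text", 24.0),
    Zone(120, 100, 400, 480, "text", 10.0),
    Zone(120, 520, 360, 900, "picture"),
    Zone(370, 520, 400, 900, "text", 8.0),
    Zone(920, 100, 960, 900, "text", 7.0),
]
for zone in label_logical_roles(PAGE_ZONES, page_height=1000, body_height=10.0):
    print(zone.role)
# prints: title, body, picture, caption, footnote (one per line)
```

The rules only read geometry and type size, which is why the two labeling levels combine naturally: the geometric step supplies the zones, and the logical step assigns each text zone a role.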
Document layout is formally defined in the international standard ISO 8613-1:1989.[2]
External links
- High Performance Document Layout Analysis by Thomas M. Breuel, at PARC, Palo Alto, CA, USA
- Geometric Layout Analysis Techniques for Document Image Understanding: a Review, ITC-irst Technical Report TR#9703-09
Notes
Wikimedia Foundation. 2010.