Hyphenation algorithm

A hyphenation algorithm is a set of rules (especially one codified for implementation in a computer program) that decides at which points a word can be broken across two lines with a hyphen. For example, a hyphenation algorithm might decide that "impeachment" can be broken as "impeach-ment" or "im-peachment", but not, say, as "impe-achment".

One reason the rules of word-breaking are complex is that different 'dialects' of English tend to differ on them: American English tends to break words by sound, while British English tends to look first to the origins of the word and then to sound. There are also a large number of exceptions, which further complicate matters.

Some rules of thumb can be found in the reference 'On Hyphenation - Anarchy of Pedantry.' Among algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of Computers and Typesetting and in Frank Liang's dissertation. Contrary to the belief that TeX relies on a large dictionary of exceptions, the point of Liang's work was to get the algorithm as accurate as he practically could and keep any exception dictionary small. In TeX's original hyphenation patterns for US English, the exception list contains fourteen words.
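
The pattern-plus-exceptions idea can be illustrated with a minimal Python sketch. It is not TeX's implementation: the function names (parse_pattern, hyphenate), the nine-pattern table, and the one-entry exception dictionary are assumptions made for this example, standing in for a much larger pattern file. Each pattern is a run of letters with digits between them; the word is wrapped in "." markers, every matching pattern contributes its digits, the largest digit wins at each position, and an odd result permits a break. The minimum fragment lengths (two letters before a break, three after) are also parameters of this sketch.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter values."""
    letters = []
    values = [0]  # values[k] applies to the gap just before letters[k]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


# A tiny illustrative pattern set (an assumption of this sketch, not a full pattern file).
RAW_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
PATTERNS = dict(parse_pattern(p) for p in RAW_PATTERNS)

# Illustrative exception dictionary: words whose breaks are listed explicitly.
EXCEPTIONS = {"table": "ta-ble"}


def hyphenate(word, patterns, exceptions=None, left_min=2, right_min=3):
    """Return the fragments of `word` between permitted break points."""
    lower = word.lower()
    if exceptions and lower in exceptions:
        # Exceptions bypass the patterns (this sketch returns them in lower case).
        return exceptions[lower].split("-")

    dotted = "." + lower + "."          # '.' marks the word boundaries
    points = [0] * (len(dotted) + 1)    # points[i] is the gap before dotted[i]

    # Try every substring; where it matches a pattern, keep the largest value per gap.
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            values = patterns.get(dotted[start:end])
            if values is None:
                continue
            for offset, value in enumerate(values):
                points[start + offset] = max(points[start + offset], value)

    # An odd value allows a break; respect the minimum fragment lengths.
    pieces = []
    last = 0
    for pos in range(left_min, len(word) - right_min + 1):
        if points[pos + 1] % 2 == 1:    # +1 skips the leading '.'
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))        # hy-phen-ation
print("-".join(hyphenate("table", PATTERNS, EXCEPTIONS)))  # ta-ble
```

Even digits in a pattern suppress a break and odd digits allow one, so a more specific pattern can override a more general one simply by carrying a larger number. That is what lets a compact pattern table cover most words and leaves only a few cases for the exception dictionary.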

Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Perl, Ruby, PostScript, Java, Python, and JavaScript.

References

* "On Hyphenation - Anarchy of Pedantry". PC Update, the magazine of Melbourne PC User Group, Australia. http://www.melbpc.org.au/pcupdate/9100/9112article4.htm. Accessed October 6, 2005.
* Liang, Franklin Mark (1983). "Word Hy-phen-a-tion by Com-put-er". Stanford University. http://www.tug.org/docs/liang/
* "TeX-Hyphen". Comprehensive Perl Archive Network. http://www.cpan.org/modules/by-module/TeX/. Accessed October 18, 2005.
* "text-hyphen". RubyForge. http://rubyforge.org/frs/?group_id=294. Accessed October 18, 2005.
* "Knuth-Liang hyphenation for the PostScript language". anastigmatix.net. http://www.anastigmatix.net/postscript/Hyphenate.html. Accessed October 6, 2005.
* "TeXHyphenator-J: TeX Hyphenator in Java". http://texhyphj.sourceforge.net/. Accessed September 14, 2006.
* "Hyphenation in Python, using Frank Liang's algorithm". http://www.nedbatchelder.com/code/modules/hyphenate.py. Accessed July 10, 2007.
* "Hyphenator.js - Hyphenation in JavaScript, using Frank Liang's algorithm". http://code.google.com/p/hyphenator/. Accessed January 3, 2008.


Wikimedia Foundation. 2010.
