Homoglyph

In typography, a homoglyph is one of two or more characters whose shapes either are identical or cannot be told apart by quick visual inspection. The term is also applied to sequences of characters that share these properties. The antonym is a synoglyph, which refers to glyphs that look different but mean the same thing; synoglyphs are also known as "display variants". Synoglyphs are thus the glyph-level counterpart of synonyms, which are words that mean the same thing.

The term homograph is sometimes used synonymously with homoglyph, but this typographic sense is not part of the definition normally used in linguistics. There, homography is a property of words, not characters, and homographs are a type of homonym. When referring to characters that look alike, it may therefore be best to avoid specialized vocabulary and use plain descriptions such as 'seemingly identical', 'visually similar', 'visually confusable' or 'look-alike' characters. The Unicode Consortium has published Technical Report #36 [http://www.unicode.org/reports/tr36/] on a range of issues arising from the visual similarity of characters, both within a single script and across different scripts.

Zero and O

Two common and important pairs of homoglyphs in use today are the digit zero and the capital letter O (0 and O), and the digit one and the lowercase letter L (1 and l). On mechanical typewriters there was little or no visual difference between the glyphs in each pair (some machines omitted 1 and 0 entirely), and typists treated them as interchangeable keyboarding shortcuts. In fact, many typewriters had no "1" key at all, so users typed a lowercase l instead. When these typists moved to computer keyboards in the 1970s and 1980s, their old habits became a source of great confusion, because to a computer the letter and the digit are different characters.

Ensuring that these two pairs are never confused is therefore very important. Most current type designs carefully distinguish them, usually by drawing the digit zero narrower than the letter O and by giving the digit one prominent serifs. Early computer printouts went even further and marked the zero with a slash or a dot, which created a new conflict with the Scandinavian letter "Ø". The redrawing of type designs to separate these homoglyphs, combined with the gradual disappearance of keyboard operators trained on mechanical typewriters, has greatly reduced the prevalence of these particular homoglyph typos.

I, l and 1

Despite the splitting of O from 0, lowercase L often resembles the digit 1 in serif fonts (l & 1) and capital I in sans-serif fonts (l & I).
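Confusions like these can be checked mechanically. The Python sketch below is a minimal illustration, assuming a small hand-picked table; the names ASCII_CONFUSABLES, fold and looks_alike are hypothetical and not taken from any library. Each look-alike character is folded to one representative, and two strings are flagged when their folded forms match. Unicode's own confusable data (UTS #39) applies a similar idea at much larger scale.

```python
# Hypothetical hand-picked table of the single-character look-alikes above.
# Each character is folded to one representative ("prototype") so that strings
# differing only by these substitutions compare equal after folding.
ASCII_CONFUSABLES = {
    "0": "O",  # digit zero vs capital letter O
    "1": "l",  # digit one vs lowercase L (serif fonts)
    "I": "l",  # capital I vs lowercase L (sans-serif fonts)
}


def fold(text: str) -> str:
    """Replace every listed look-alike character with its prototype."""
    return "".join(ASCII_CONFUSABLES.get(ch, ch) for ch in text)


def looks_alike(a: str, b: str) -> bool:
    """True if a and b differ only by substitutions listed in the table."""
    return fold(a) == fold(b)


print(looks_alike("PayPa1", "PayPal"))  # True
print(looks_alike("C0DE", "CODE"))      # True
```

Because I and 1 are both folded to l, the table also treats I and 1 as look-alikes of each other, which may over-report in fonts where they are clearly distinct; a per-font table would avoid this.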

Multi-letter homoglyphs

Some other combinations of letters look similar: for instance, rn resembles m, and vv resembles w.

In certain narrowly spaced fonts (such as Tahoma), placing the letter c next to j, l or i can create a homoglyph: cj, cl and ci can resemble g, d and a respectively.
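Multi-letter look-alikes can be handled in the same spirit. In this hypothetical sketch, each single letter that has a multi-letter look-alike is expanded to that sequence: m to rn, w to vv and, assuming a narrow font such as Tahoma, g, d and a to cj, cl and ci. Expanding single letters, rather than searching for letter pairs, keeps the folding a simple one-pass, per-character map, so overlapping runs such as "vvv" need no special handling. The table and function names are illustrative only.

```python
# Hypothetical table: each letter maps to the letter sequence it resembles.
# The last three entries only make sense for narrow fonts such as Tahoma.
MULTI_CONFUSABLES = {
    "m": "rn",
    "w": "vv",
    "g": "cj",
    "d": "cl",
    "a": "ci",
}


def fold_sequences(text: str) -> str:
    """Expand each listed letter into its look-alike sequence (single pass)."""
    return "".join(MULTI_CONFUSABLES.get(ch, ch) for ch in text)


def looks_alike(a: str, b: str) -> bool:
    """True if a and b render alike under the table above."""
    return fold_sequences(a) == fold_sequences(b)


print(looks_alike("modern", "rnodern"))  # True
print(looks_alike("clear", "dear"))      # True
```

A practical tool would choose the table according to the font in use, since pairs like ci and a are only confusable in some typefaces.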

Some typographic ligatures can look similar to standalone glyphs: for example, the fi ligature (ﬁ) can look similar to A in some typefaces. This potential for confusion is sometimes cited as an argument against the use of ligatures.
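Besides resembling other glyphs, the fi ligature also exists in Unicode as its own compatibility character, U+FB01 LATIN SMALL LIGATURE FI, so a word typed with it is a different string from the same word typed with separate f and i. The minimal Python sketch below uses the standard unicodedata module to show this, and how NFKC normalization expands the ligature; the helper name expand_ligatures is hypothetical. NFKC also rewrites many other compatibility characters, and it does nothing about purely visual resemblances such as the one between fi and A.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Hypothetical helper: apply NFKC normalization, which, among other
    compatibility mappings, replaces ligature characters such as U+FB01
    with their component letters."""
    return unicodedata.normalize("NFKC", text)


word = "\ufb01le"  # U+FB01 LATIN SMALL LIGATURE FI followed by "le"

print(unicodedata.name("\ufb01"))        # LATIN SMALL LIGATURE FI
print(len(word), word == "file")         # 3 False
print(expand_ligatures(word) == "file")  # True
```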

Unicode homoglyphs

The Unicode character set contains many strongly homoglyphic characters. These present security risks in a variety of situations (addressed in UTR #36) and have drawn particular attention in connection with internationalized domain names. An attacker can deliberately spoof a domain name by substituting a homoglyph for one of its characters, creating a second domain name that is not readily distinguishable from the first and can be exploited in phishing (see the main article IDN homograph attack). In many fonts the Greek letter 'Α', the Cyrillic letter 'А' and the Latin letter 'A' are visually identical, as are the Latin letter 'a' and the Cyrillic letter 'а'. A domain name can be spoofed simply by registering a separate name in which one of these forms is substituted for another. There are also many near-homoglyphs within a single script, such as 'í' (i with an acute accent) and 'i'.

When discussing this specific security issue, any two sequences of similar characters may be assessed for their potential to be taken as a 'homoglyph pair' or, if the sequences clearly appear to be words, as 'pseudo-homographs' (noting again that these terms may themselves cause confusion in other contexts).
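The Greek, Cyrillic and Latin capital A's mentioned above are distinct code points even where they render identically. The Python sketch below prints their names and adds a rough mixed-script check of the kind used to flag suspicious labels. As an assumption made to stay within the standard library, the script of each letter is approximated by the first word of its Unicode name; this is not the Unicode Script property, and the function names are hypothetical.

```python
import unicodedata

# The three capital A's are different characters despite looking alike.
for ch in ("A", "\u0410", "\u0391"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0041 LATIN CAPITAL LETTER A
# U+0410 CYRILLIC CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA


def rough_script(ch: str) -> str:
    """Heuristic only: the first word of the character's Unicode name.
    The standard library does not expose the real Script property."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_used(label: str) -> set[str]:
    """Approximate set of scripts among the letters of a label."""
    return {rough_script(ch) for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True if the label's letters appear to come from more than one script."""
    return len(scripts_used(label)) > 1
```

For example, is_mixed_script("p\u0430ypal") is True because its second letter is Cyrillic. A label spelled entirely with look-alike Cyrillic letters is single-script and passes this check, so mixed-script detection alone does not catch every spoof.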

TLD registries and Web browser developers are working to minimize the risk of homoglyphic confusion as far as possible. Relevant documentation can be found both on the developers' Web sites and on an IDN Forum [http://icann.org/announcements/announcement-20sep05.htm] provided by ICANN.
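One mitigation open to browsers is to display a suspicious label in its ASCII-compatible (Punycode) form, which makes a substituted letter obvious. The Python sketch below uses the standard library's built-in "idna" codec, which implements the older IDNA 2003 rules, together with a deliberately strict, hypothetical display policy that shows every non-ASCII label in Punycode. Actual browsers apply more nuanced, script-aware rules, and a policy this strict would also obscure legitimate non-Latin domain names.

```python
def ascii_form(label: str) -> str:
    """ASCII-compatible form of a domain label via Python's 'idna' codec
    (IDNA 2003). Non-ASCII labels come back with an 'xn--' prefix."""
    return label.encode("idna").decode("ascii")


def display_label(label: str) -> str:
    """Hypothetical strict policy: show any non-ASCII label as Punycode."""
    return label if label.isascii() else ascii_form(label)


genuine = "apple"
spoof = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A

print(genuine == spoof)        # False
print(display_label(genuine))  # apple
print(display_label(spoof))    # an "xn--" label, unlike the genuine one
```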

A historical example of homoglyphic confusion comes from the use of 'y' to represent 'þ' (thorn) when setting older English texts in typefaces that lack the latter character. In modern times this has led to such phenomena as "Ye olde shoppe", implying that the word "the" was formerly written "ye" (and pronounced /jiː/). For further discussion, see thorn.

See also

*Duplicate characters in Unicode


Wikimedia Foundation. 2010.
