IDN homograph attack

The internationalized domain name (IDN) homograph attack is a means by which a malicious party may seek to deceive computer users about what remote system they are communicating with, by exploiting the fact that many different characters may have nearly (or wholly) indistinguishable glyphs.

Homographs

In multilingual computer systems, different logical characters may have identical or very similar appearances. For example, Unicode character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode character U+0061, Latin small letter a ("a"), which is the lowercase "a" used in English. Technically, characters that look alike in this way are known as "homoglyphs" (a subgroup of homographs). Spoofing attacks based on these similarities are known as homograph spoofing attacks.

The problem arises from the different treatment of the characters in the user's mind and in the computer's programming. From the viewpoint of the user, a Cyrillic "а" within a Latin string is a Latin "a"; there is no difference in the glyphs for these characters in most fonts. However, the computer treats them differently when processing the character string as an identifier. Thus, the user's assumption of a one-to-one correspondence between the visual appearance of a name and the named entity breaks down.
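The distinction the computer makes can be shown with a minimal Python sketch. It uses only the standard `unicodedata` module; the helper name `describe` is an illustrative choice, not part of any library.

```python
import unicodedata

latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a

def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False: identical glyphs, different identifiers
```

Although the two characters may render identically, any string comparison or DNS lookup treats them as unrelated.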

In a typical hypothetical attack, someone could register a domain name that appears identical to an existing domain but leads somewhere else. For example, the spoofed domain "pаypal.com" contains a Cyrillic "а", not a Latin "a".

In many ways, this is not new. Even within the old character set of A-Z, 0-9 and hyphen, "G00GLE.COM" looks much like "GOOGLE.COM" in some fonts; or, mixing uppercase and lowercase characters, "googIe.com" (capital "i", not small "L") looks much like "google.com" in some fonts. PayPal itself was the target of a phishing scam exploiting this, using the domain PayPaI.com. With lowercase characters alone, "rnozilla.org" ("RNOZILLA.ORG") looks very much like "mozilla.org" in many fonts; similarly, in certain narrow-spaced fonts such as Tahoma (the default address bar font in Windows XP), placing a c in front of a j, l or i produces homoglyphs such as cl, cj and ci (resembling d, g and a). A unique homograph issue is that of the long s (ſ), which has long been confused with "f" but is recognized as "s" in URLs.

What is new is that the internationalized domain name system expanded the character repertoire from a few dozen characters in a single alphabet to many thousands of characters in many scripts, which greatly increased the scope for homograph attacks.

Homographs in internationalized domain names

The limitation of domain names to ASCII characters may not last forever, and is coming under pressure from organizations based in regions that do not use Latin characters. Internationalized domain names provide a backward-compatible way for domain names to use the full Unicode character set, and this standard is already widely supported.

For example, the Russian newspaper website gazeta.ru (http://gazeta.ru/) may wish to use the URL газета.рф, reflecting the newspaper's name spelled in Cyrillic. The disadvantage in this example is that the Cyrillic letters 'а', 'е', and 'р' all strongly resemble (or are indistinguishable from, depending on the font) the Latin letters 'a', 'e', and 'p'. Some of these pairings (such as а-a) are of two letters that are close etymologically, while others look similar by coincidence. For instance, the Cyrillic letter 'р' represents a phoneme similar to the English 'r', but the glyph strongly resembles the Latin letter 'p' in most fonts.

This opens a rich vein of opportunities for phishing and other varieties of fraud. An attacker could register a domain name that "looks" just like that of a legitimate website, but in which some of the letters have been replaced by homographs in another alphabet. The attacker could then send e-mail messages purporting to come from the original site, but directing people to the bogus site. The spoof site could then record information such as passwords or account details, while passing traffic through to the real site. The victims may never notice the difference, until suspicious or criminal activity occurs with their accounts.

The following alphabets have characters that can be used for spoofing attacks. Only the most obvious and common substitutions are listed; depending on how loose a resemblance the spoofer accepts and how much risk of detection the spoofer is willing to take, the possibilities are far more numerous than can be listed here.

Cyrillic

Cyrillic is by far the most commonly used alphabet for homoglyphs, largely because it contains 11 lowercase glyphs that are identical (or nearly identical) to Latin counterparts. The following Cyrillic letters have optical counterparts in the basic Latin alphabet: асһеіјорѕху, which look close or identical to acheijopsxy; in addition, Cyrillic З resembles the numeral 3. Italic type generates more homoglyphs: set in italics, "тпи" resembles "mnu". Cyrillic ёї can also be used if an IDN itself is being spoofed, to fake ëï.

If capital letters are counted, ВНКМТ can substitute BHKMT, in addition to the capitals for the lowercase Cyrillic homoglyphs.
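One way to reason about such substitutions is to fold every look-alike onto a single representative and compare the results. The sketch below does this for the lowercase Cyrillic letters listed above only. The names `skeleton` and `looks_like` are hypothetical, and the table is a small illustration, not a complete confusables list.

```python
# Cyrillic lowercase look-alikes from the list above, written as escapes so the
# code points are unambiguous, paired in order with the Latin letters they resemble.
CYRILLIC = "\u0430\u0441\u04bb\u0435\u0456\u0458\u043e\u0440\u0455\u0445\u0443"
LATIN = "acheijopsxy"
TO_LATIN = str.maketrans(CYRILLIC, LATIN)

def skeleton(name: str) -> str:
    """Lowercase the name and replace each listed Cyrillic look-alike with its Latin twin."""
    return name.lower().translate(TO_LATIN)

def looks_like(candidate: str, trusted: str) -> bool:
    """True if candidate differs from trusted but folds to the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)

spoof = "p\u0430ypal.com"  # second letter is Cyrillic U+0430
print(looks_like(spoof, "paypal.com"))         # True
print(looks_like("paypal.com", "paypal.com"))  # False: it is the trusted name itself
```

A registry or mail filter could run a check of this kind against a list of protected names; a real deployment would need a far larger mapping covering capitals, Greek, Armenian and other scripts.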

Greek

From the Greek alphabet, only omicron ο and sometimes nu ν qualify in the lowercase used for URLs. In italic type, Greek alpha "α" also looks like a Latin "a".

This list increases if close matches are also allowed (such as Greek εικηρτυωχγ for eiknptuwxy). Using capital letters, the list expands greatly. Greek ΑΒΕΗΙΚΜΝΟΡΤΧΥΖ looks identical to Latin ABEHIKMNOPTXYZ.

If an IDN itself is being spoofed, Greek beta β can substitute for German eszett ß in some fonts (and in fact, code page 437 treats them as equivalent), as can Greek sigma ς for ç. Accented Greek "όίά" can stand in for "óíá" in many fonts, with the last of these (alpha) again only resembling "a" in italic type.

Armenian

The Armenian alphabet can also contribute critical characters: ցհոօզս, which look like ghnoqu; յ, which resembles j (albeit dotless); and ք, which can resemble either p or f depending on the font. However, the use of Armenian is problematic. Most standard fonts do not feature the Armenian glyphs (whereas the Greek and Cyrillic scripts are in most standard fonts). Because of this, Windows normally renders Armenian in a distinct font, Sylfaen, which supports Armenian, and a mix of Armenian with Latin will appear obviously different if the Latin text is in a font other than Sylfaen or a Unicode typeface. Furthermore, Sylfaen itself differentiates Latin g from Armenian ց.

Two letters in Armenian (Ձշ) also can resemble the number 2, while another (վ) sometimes resembles the number 4.

Hebrew

Hebrew spoofing is generally rare. Only two letters from that alphabet can reliably be used: samekh (ס), which sometimes resembles o, and vav with diacritic, וֹ, which resembles an i. Less accurate approximants for some other alphanumerics can also be found, but these are usually only accurate enough to use for the purposes of foreign branding and not for substitution. Furthermore, the Hebrew alphabet is written from right to left and trying to mix it with left-to-right glyphs may cause problems.

Defending against the attack

The simplest defense is for web browsers not to support IDNA or other similar mechanisms, or for users to turn off whatever support their browsers have. That could mean blocking access to IDNA sites, but generally browsers permit access and just display IDNs in Punycode. Either way, this amounts to abandoning non-ASCII domain names.
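As a rough illustration of what "displaying IDNs in Punycode" means, the following Python sketch converts a hostname to its ASCII form with the standard library's built-in `idna` codec, which implements the older IDNA 2003 rules. The function name `to_ascii_display` is hypothetical, and the sketch does not claim to match any particular browser's behavior.

```python
def to_ascii_display(hostname: str) -> str:
    """Return the ASCII-compatible (Punycode) form of a hostname."""
    return hostname.encode("idna").decode("ascii")

spoof = "p\u0430ypal.com"  # second letter is Cyrillic U+0430

print(to_ascii_display("paypal.com"))  # paypal.com (pure ASCII is unchanged)
print(to_ascii_display(spoof))         # an "xn--" label that no longer resembles paypal.com
```

Shown in this form, the spoofed name is visibly different from the genuine one, at the cost of making every legitimate non-ASCII name unreadable as well.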

Firefox and Opera display Punycode for IDNs unless the top-level domain (TLD, for example, .ac or .museum) prevents homograph attacks by restricting which characters can be used in domain names (Opera, "Advisory: Internationalized domain names (IDN) can be used for spoofing", 2005-02-25, http://www.opera.com/support/search/view/788/, accessed 2007-02-24). They both also allow users to manually add TLDs to the allowed list (Mozilla, "IDN-enabled TLDs", 2006-08-07, http://www.mozilla.org/projects/security/tld-idn-policy-list.html, accessed 2006-11-30; Opera Software, "Opera's Settings File Explained: IDNA White List", 2006-12-18, http://www.opera.com/support/usingopera/operaini/#network, accessed 2007-02-24).

Internet Explorer 7 allows IDNs except for labels that mix scripts for different languages; labels that mix scripts are displayed in Punycode. Exceptions are made for locales where ASCII characters are commonly mixed with localized scripts (Tariq Sharif, "Changes to IDN in IE7 to now allow mixing of scripts", IEBlog, Microsoft, 2006-07-31, http://blogs.msdn.com/ie/archive/2006/07/31/684337.aspx, accessed 2006-11-30).
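A mixed-script check of this general kind can be sketched in Python. Python's standard library does not expose the Unicode Script property, so the sketch approximates a character's script by the first word of its Unicode name; that heuristic, and the function names, are assumptions made for illustration and do not describe Internet Explorer's actual algorithm.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts in one label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are common to all scripts
        # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def has_mixed_script_label(hostname: str) -> bool:
    """True if any dot-separated label combines more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in hostname.split("."))

print(has_mixed_script_label("p\u0430ypal.com"))  # True: Latin and Cyrillic in one label
print(has_mixed_script_label("paypal.com"))       # False
```

Note that a label written entirely in one non-Latin script passes this check, so it offers no protection against whole-script look-alikes.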

As an additional defense, Internet Explorer 7, Firefox 2.0 and Opera 9.10 include phishing filters to alert users when they visit malicious websites (Tariq Sharif, "Phishing Filter in IE7", IEBlog, Microsoft, 2005-09-09, http://blogs.msdn.com/ie/archive/2005/09/09/463204.aspx, accessed 2006-11-30; Mozilla, "Firefox 2 Phishing Protection", 2006, http://www.mozilla.com/en-US/firefox/phishing-protection/, accessed 2006-11-30; Opera Software, "Opera Fraud Protection", 2006-12-18, http://www.opera.com/docs/fraudprotection/, accessed 2007-02-24).

Another possible defense would be for web browsers to display non-ASCII characters in URLs distinctively, perhaps by changing their color or that of their background. This would not protect against spoofing in which one non-ASCII character is replaced by another similar-looking one (for example, a Greek ο replaced by a Cyrillic о or vice versa). (A solution to this problem would be to use a different color for each character group, but no software implements it that way.) The highlighting approach was adopted, as of July 9, 2005, by the Quero Toolbar plug-in for Internet Explorer. Besides IDN highlighting, Quero has implemented several other techniques to mitigate IDN spoofing attacks, such as mixed-script and missing-glyph detection, IDN/digit indication and "core domain" highlighting.
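The idea of distinctive display can be illustrated in plain text, with brackets and code points standing in for color. This is a minimal sketch; the function name `mark_non_ascii` and the bracket notation are invented for the example and are not taken from Quero or any browser.

```python
def mark_non_ascii(hostname: str) -> str:
    """Wrap every non-ASCII character in brackets together with its code point."""
    return "".join(
        ch if ord(ch) < 128 else f"[{ch}|U+{ord(ch):04X}]"
        for ch in hostname
    )

print(mark_non_ascii("paypal.com"))        # paypal.com
print(mark_non_ascii("p\u0430ypal.com"))   # p[а|U+0430]ypal.com
```

As the text notes, marking of this kind flags a foreign character inside an ASCII name but treats all non-ASCII characters alike, so it cannot distinguish a Greek letter from a Cyrillic one unless the marker also encodes the script.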

Using fonts that differentiate between homoglyphs can help identify a phony character in a URL. For instance, Courier New, which is widely available as a standard monospace font and is the default font for text-based e-mails, constructs its characters in such a way that some characters that appear to be homoglyphs in other fonts look distinctly different (although several characters still appear identical). However, changing the font of the address bar is not yet a widespread option, nor is it easy for the typical Internet user to do.

There is not yet (as of March 2005) a clear consensus as to the best way to balance the needs of the international community with protection against domain-name spoofing.

See also

* Homoglyph
* Internationalizing Domain Names in Applications
* Phishing
* Punycode

External links

* "The state of homograph attacks", by 3ric Johanson: http://www.shmoo.com/idn/homograph.txt
* Secunia advisories about IDN spoofing: http://secunia.com/advisories/14163/, http://secunia.com/advisories/14209/
* "CENTR statement on IDN homograph attacks", issued by the Council of European National TLD Registries: http://www.centr.org/docs/2005/02/homographs.html
* "The Homograph Attack", Evgeniy Gabrilovich and Alex Gontmakher, Communications of the ACM, 45(2):128, February 2002: http://www.cs.technion.ac.il/~gabr/papers/homograph_full.pdf
* Quero Toolbar, an IDN-enabling plug-in for Internet Explorer with anti-spoofing techniques: http://www.quero.at
* Erik van der Poel's Unofficial Nameprep/IDNA/Stringprep website: http://nameprep.org/


Wikimedia Foundation. 2010.
