IDN homograph attack
The internationalized domain name (IDN) homograph attack is a means by which a malicious party may seek to deceive computer users about which remote system they are communicating with, by exploiting the fact that many different characters have nearly (or wholly) indistinguishable glyphs.
Homographs
In multilingual computer systems, different logical characters may have identical or very similar appearances. For example, Unicode character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode character U+0061, Latin small letter a ("a"), which is the lowercase "a" used in English. Technically, characters that look alike in this way are known as "homoglyphs" (a subgroup of homographs). Spoofing attacks based on these similarities are known as homograph spoofing attacks.
The problem arises from the different treatment of the characters in the user's mind and in the computer's programming. From the viewpoint of the user, a Cyrillic "а" within a Latin string is a Latin "a"; there is no difference in the glyphs for these characters in most fonts. However, the computer treats them differently when processing the character string as an identifier. Thus, the user's assumption of a one-to-one correspondence between the visual appearance of a name and the named entity breaks down.
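The difference the computer sees can be made visible by inspecting code points. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a

for ch in (latin_a, cyrillic_a):
    # Show the code point and the official Unicode character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The two characters render alike in most fonts but compare unequal.
print(latin_a == cyrillic_a)
```

Any identifier comparison, such as a DNS lookup, sees two different strings even when the rendered text looks the same to the user.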
In a typical example of a hypothetical attack, someone could register a domain name that appears identical to an existing domain but goes somewhere else. For example, the spoofed domain "pаypal.com" contains a Cyrillic "а", not a Latin "a". In many ways, this is not a new thing. For example, even staying within the old character set of A-Z, 0-9 and hyphen, "G00GLE.COM" looks much like "GOOGLE.COM" in some fonts; or, using a mix of uppercase and lowercase characters, "googIe.com" (capital "i", not small "L") looks much like "google.com" in some fonts. PayPal itself was a target of a phishing scam exploiting this, using the domain PayPaI.com. Or, displaying characters in lowercase alone, "rnozilla.org" ("RNOZILLA.ORG") looks very much like "mozilla.org" in many fonts; similarly, in certain narrow-spaced fonts such as Tahoma (the default address bar font in Windows XP), placing a c in front of a j, l or i will produce homoglyphs such as cl, cj and ci (resembling d, g and a). A unique homograph issue is that of the long s (ſ), which has long been confused with "f" but is recognized as "s" in URLs.
What is new is that the expansion by the internationalized domain name system of the character repertoire, from a few dozen characters in a single alphabet to many thousands of characters in many scripts, has greatly increased the scope for homograph attacks.
Homographs in internationalized domain names
The limitation of domain names to ASCII characters may not last forever, and is coming under pressure from organizations based in regions that do not use Latin characters. Internationalized domain names provide a backward-compatible way for domain names to use the full Unicode character set, and this standard is already widely supported.
For example, the Russian newspaper website [http://gazeta.ru/ gazeta.ru] may wish to use the URL газета.рф, reflecting the newspaper's name spelled in Cyrillic. The disadvantage in this example is that the Cyrillic letters 'а', 'е', and 'р' all strongly resemble (or are indistinguishable from, depending on the font) the Latin letters 'a', 'e', and 'p'. Some of these pairings (such as а-a) are of two letters that are close etymologically, while others look similar by coincidence. For instance, the Cyrillic letter 'р' represents a phoneme similar to the English 'r', but the glyph strongly resembles the Latin letter 'p' in most fonts.
This opens a rich vein of opportunities for phishing and other varieties of fraud. An attacker could register a domain name that looks just like that of a legitimate website, but in which some of the letters have been replaced by homographs in another alphabet. The attacker could then send e-mail messages purporting to come from the original site, but directing people to the bogus site. The spoof site could then record information such as passwords or account details, while passing traffic through to the real site. The victims may never notice the difference until suspicious or criminal activity occurs with their accounts.
The following alphabets have characters that can be used for spoofing attacks (these are only the most obvious and common cases, given artistic license and how much risk of getting caught the spoofer will take; the possibilities are far more numerous than can be listed here):
Cyrillic
Cyrillic is by far the most commonly used alphabet for homoglyphs, largely because it contains 11 lowercase glyphs that are identical (or nearly identical) to Latin counterparts. The following Cyrillic letters have optical counterparts in the basic Latin alphabet: асһеіјорѕху, which look close or identical to acheijopsxy, and Cyrillic З resembles the numeral 3.
Italic type generates more homoglyphs: "тпи" (тпи in standard type), resembling mnu. Cyrillic ёї can also be used to fake ëï if an IDN itself is being spoofed.
If capital letters are counted, ВНКМТ can substitute for BHKMT, in addition to the capitals for the lowercase Cyrillic homoglyphs.
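The lowercase list above can be turned into a simple look-alike check. The sketch below is illustrative only: the table name `CYRILLIC_LOOKALIKES` and the helper functions are hypothetical, the table covers just the eleven lowercase letters listed above, and a real confusables table would be far larger.

```python
# Hypothetical lookup table built from the lowercase Cyrillic letters listed
# above (U+0430, U+0441, U+04BB, U+0435, U+0456, U+0458, U+043E, U+0440,
# U+0455, U+0445, U+0443), paired in order with their Latin look-alikes.
CYRILLIC_LOOKALIKES = dict(zip(
    "\u0430\u0441\u04bb\u0435\u0456\u0458\u043e\u0440\u0455\u0445\u0443",
    "acheijopsxy",
))

def latin_skeleton(name: str) -> str:
    """Replace each listed Cyrillic homoglyph with its Latin look-alike."""
    return "".join(CYRILLIC_LOOKALIKES.get(ch, ch) for ch in name)

def imitates(candidate: str, trusted: str) -> bool:
    """True if candidate is a different string that collapses to the same skeleton."""
    return candidate != trusted and latin_skeleton(candidate) == latin_skeleton(trusted)

spoof = "p\u0430ypal.com"  # second character is Cyrillic U+0430
print(imitates(spoof, "paypal.com"))
print(imitates("paypal.com", "paypal.com"))
```

Comparing skeletons rather than raw strings is what lets the check catch a name that differs from the trusted one only in homoglyphs.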
Greek
From the Greek alphabet, only omicron ο and sometimes nu ν qualify in the lowercase used for URLs. In fonts that are in italic type, Greek alpha "α" looks like a Latin "a".
This list increases if close matches are also allowed (such as Greek εικηρτυωχγ for eiknptuwxy). Using capital letters, the list expands greatly. Greek ΑΒΕΗΙΚΜΝΟΡΤΧΥΖ looks identical to Latin ABEHIKMNOPTXYZ.
If an IDN itself is being spoofed, Greek beta β can be a substitute for German eszett ß in some fonts (and in fact, code page 437 treats them as equivalent), as can Greek sigma ς for ç; accented Greek substitutes "όίά" can be used for "óíá" in many fonts, with the last of these (alpha) again only resembling "a" in italic type.
Armenian
The Armenian alphabet can also contribute critical characters: ցհոօզս, which look like ghnoqu; յ, which resembles j (albeit dotless); and ք, which can resemble either p or f depending on the font. However, the use of Armenian is problematic. Most standard fonts do not feature the Armenian glyphs (whereas the Greek and Cyrillic scripts are in most standard fonts). Because of this, Windows normally renders Armenian in a distinct font, Sylfaen, which supports Armenian, and a mixture of Armenian and Latin will appear obviously different if the surrounding text uses a font other than Sylfaen or a Unicode typeface. Furthermore, this font differentiates Latin g from Armenian ց.
Two letters in Armenian (Ձշ) can also resemble the number 2, while another (վ) sometimes resembles the number 4.
Hebrew
Hebrew spoofing is generally rare. Only two letters from that alphabet can reliably be used: samekh (ס), which sometimes resembles o, and vav with diacritic, וֹ, which resembles an i. Less accurate approximants for some other alphanumerics can also be found, but these are usually only accurate enough to use for the purposes of foreign branding and not for substitution. Furthermore, the Hebrew alphabet is written from right to left, and trying to mix it with left-to-right glyphs may cause problems.
Defending against the attack
The simplest defense is for web browsers not to support IDNA or other similar mechanisms, or for users to turn off whatever support their browsers have. That could mean blocking access to IDNA sites, but generally browsers permit access and just display IDNs in Punycode. Either way, this amounts to abandoning non-ASCII domain names.
Firefox and Opera display Punycode for IDNs unless the top-level domain (TLD, for example .ac or .museum) prevents homograph attacks by restricting which characters can be used in domain names. [cite web |url=http://www.opera.com/support/search/view/788/ |title=Advisory: Internationalized domain names (IDN) can be used for spoofing. |accessdate=2007-02-24 |date=2005-02-25 |publisher=Opera] They both also allow users to manually add TLDs to the allowed list. [cite web |url=http://www.mozilla.org/projects/security/tld-idn-policy-list.html |title=IDN-enabled TLDs |accessdate=2006-11-30 |date=2006-08-07 |publisher=Mozilla] [cite web |url=http://www.opera.com/support/usingopera/operaini/#network |date=2006-12-18|title=Opera's Settings File Explained: IDNA White List |accessdate=2007-02-24 |publisher=Opera Software]
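To illustrate what displaying an IDN in Punycode means, here is a minimal sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules. The helper name `to_punycode` is hypothetical, and the code is not how any particular browser performs the conversion.

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII form of a domain name via Python's built-in IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")

spoof = "p\u0430ypal.com"  # Cyrillic U+0430 in place of the Latin "a"

print(to_punycode("paypal.com"))  # a pure ASCII name passes through unchanged
print(to_punycode(spoof))         # the spoofed label is rewritten with an "xn--" prefix
```

Shown this way, the spoofed name no longer resembles the genuine one, which is why falling back to Punycode works as a defense, at the cost of making legitimate non-ASCII names unreadable.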
Internet Explorer 7 allows IDNs except for labels that mix scripts for different languages. Labels that mix scripts are displayed in Punycode. Exceptions are made for locales where ASCII characters are commonly mixed with localized scripts. [cite web |url=http://blogs.msdn.com/ie/archive/2006/07/31/684337.aspx |title=Changes to IDN in IE7 to now allow mixing of scripts |accessdate=2006-11-30 |last=Sharif |first=Tariq |date=2006-07-31 |work=IEBlog |publisher=Microsoft]
As an additional defense, Internet Explorer 7, Firefox 2.0 and Opera 9.10 include phishing filters to alert users when they visit malicious websites. [cite web |url=http://blogs.msdn.com/ie/archive/2005/09/09/463204.aspx |title=Phishing Filter in IE7 |accessdate=2006-11-30 |last=Sharif |first=Tariq |date=2005-09-09 |work=IEBlog |publisher=Microsoft] [cite web |url=http://www.mozilla.com/en-US/firefox/phishing-protection/ |title=Firefox 2 Phishing Protection |accessdate=2006-11-30 |year=2006 |publisher=Mozilla] [cite web |url=http://www.opera.com/docs/fraudprotection/ |title=Opera Fraud Protection |accessdate=2007-02-24 |date=2006-12-18 |publisher=Opera Software]
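The mixed-script rule described above can be approximated in a few lines. This is a sketch under stated assumptions, not Internet Explorer's actual logic: Python's standard library does not expose the Unicode script property, so the hypothetical helper `scripts_in_label` uses the first word of each character's Unicode name as a rough stand-in for its script.

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()  # digits and hyphens are shared by all scripts
    }

def is_mixed_script(label: str) -> bool:
    """True if the label draws letters from more than one (approximate) script."""
    return len(scripts_in_label(label)) > 1

print(is_mixed_script("paypal"))        # all Latin
print(is_mixed_script("p\u0430ypal"))   # Latin with one Cyrillic letter
```

A check of this kind does not catch a name written entirely in one non-Latin script, so it complements rather than replaces the other defenses discussed here.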
Another possible defense would be for web browsers to display non-ASCII characters in URLs distinctively, perhaps by changing their color or that of their background. This would not protect against spoofing that changes one non-ASCII character to another similar-looking one (for example, replacing a Greek ο with a Cyrillic о or vice versa). (A solution to this problem would be to use a different color for each character group, but no software implements it that way.) This approach was adopted, as of July 9, 2005, by the plug-in Quero Toolbar for Internet Explorer. Besides IDN highlighting, Quero has implemented several other techniques to mitigate IDN spoofing attacks, such as mixed-script and missing-glyph detection, IDN/digit indication and "core domain" highlighting.
Using certain fonts that differentiate between homoglyphs can help identify a phony character in a URL. For instance, Courier New, which is widely available as a standard monospace font and is the default font for text-based e-mails, constructs its characters in such a way that some characters that appear to be homoglyphs in other fonts look distinctly different in Courier New (although several characters still appear identical). However, the ability to readily change the font of the address bar is not yet widespread, nor is it easy for the typical Internet user to do.
There is not yet (as of March 2005) a clear consensus as to the best way to balance the needs of the international community with protection against domain-name spoofing.
See also
*Homoglyph
*Internationalizing Domain Names in Applications
*Phishing
*Punycode
References
External links
* http://www.shmoo.com/idn/homograph.txt "The state of homograph attacks", by 3ric Johanson.
* http://secunia.com/advisories/14163/, http://secunia.com/advisories/14209/ "Secunia advisories about IDN spoofing"
* http://www.centr.org/docs/2005/02/homographs.html "CENTR statement on IDN homograph attacks", issued by the Council of European National TLD registries.
* [http://www.cs.technion.ac.il/~gabr/papers/homograph_full.pdf The Homograph Attack], Evgeniy Gabrilovich and Alex Gontmakher, "Communications of the ACM", 45(2):128, February 2002
* [http://www.quero.at Quero Toolbar] - An IDN-enabling plug-in for Internet Explorer with anti-spoofing techniques.
* [http://nameprep.org/ Erik van der Poel's Unofficial Nameprep/IDNA/Stringprep website]
Wikimedia Foundation. 2010.