IETF language tag

IETF language tags are defined by BCP 47, which currently comprises RFC 4646 and RFC 4647. These language tags are used in a number of modern standards, such as HTTP [ [http://tools.ietf.org/html/rfc2616#section-3.10 RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1, section 3.10] ], HTML [ [http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1 HTML 4.01 Specification, section 8.1] ], XML [ [http://www.w3.org/TR/REC-xml/] ] and PNG [ [http://www.w3.org/TR/PNG/#11iTXt Portable Network Graphics (PNG) Specification (Second Edition), section 11.3.4.5] ].

Each language tag is composed of one or more “subtags” separated by hyphens. With the exception of private use language tags and grandfathered language tags, the subtags occur in the following order:
*a language subtag (potentially followed by up to three extended language subtags)
*an optional script subtag
*an optional region subtag
*optional variant subtags
*optional extension subtags
*optional private use subtags

Language subtags are mainly derived from ISO 639-1 and ISO 639-2, script subtags from ISO 15924, and region subtags from ISO 3166-1 alpha-2 and UN M.49. Variant subtags are not derived from any standard. No extension subtags have yet been defined. The Language Subtag Registry, maintained by IANA, lists the current valid public subtags.
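
The subtag order listed above can be sketched as a regular expression. The following is a minimal Python sketch, not a reference implementation: it follows the RFC 4646 syntax for ordinary tags only, does not handle grandfathered tags or tags that consist solely of private use subtags, and does not check any subtag against the Language Subtag Registry. The name `parse_language_tag` is hypothetical.

```python
import re

# Syntax of an ordinary (non-grandfathered) language tag, per the order above.
# re.ASCII keeps case-insensitive matching restricted to ASCII letters.
_LANGTAG = re.compile(
    r"""
    ^(?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*)
    (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?$
    """,
    re.IGNORECASE | re.VERBOSE | re.ASCII,
)

# One extension: a singleton (anything but "x") followed by its subtags.
_EXTENSION = re.compile(r"[0-9a-wyz](?:-[a-z0-9]{2,8})+", re.IGNORECASE | re.ASCII)


def parse_language_tag(tag):
    """Split a tag into its parts, or return None if it is not well-formed."""
    match = _LANGTAG.match(tag)
    if match is None:
        return None
    language_parts = match.group("language").split("-")
    return {
        "language": language_parts[0],
        "extlang": language_parts[1:],  # up to three extended language subtags
        "script": match.group("script"),
        "region": match.group("region"),
        "variants": [v for v in match.group("variants").split("-") if v],
        "extensions": _EXTENSION.findall(match.group("extensions")),
        "privateuse": match.group("privateuse"),
    }
```

Under these assumptions, `parse_language_tag("zh-Hant-TW")` yields the language `zh`, the script `Hant` and the region `TW`, while a malformed tag such as `en--CA` yields `None`. A well-formed result only means the tag has the right shape; whether each subtag is actually registered is a separate check.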

The most commonly seen language tags consist of just a language subtag, or a language subtag and a region subtag. For example, en represents English, and consists of a single language subtag (from ISO 639-1), while en-CA represents Canadian English, and consists of the language subtag en followed by the region subtag CA (from ISO 3166-1).

Subtags are not case sensitive, but the specification recommends using the same case as in the Language Subtag Registry, where region subtags are uppercase, script subtags are titlecase and all other subtags are lowercase. This capitalization follows the recommendations of the underlying ISO standards.
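
The recommended capitalization can be applied mechanically. The sketch below is a positional heuristic rather than a registry lookup, and the function name `format_language_tag` is hypothetical. It assumes the input is already a well-formed tag, treats a two-letter subtag after the first position as a region and a four-letter alphabetic subtag after the first position as a script, and lowercases everything that follows a single-character subtag (an extension singleton or `x`).

```python
def format_language_tag(tag):
    """Return the tag with the recommended case applied to each subtag."""
    formatted = []
    after_singleton = False  # True once an extension or private use part begins
    for index, subtag in enumerate(tag.split("-")):
        positional = index > 0 and not after_singleton and subtag.isalpha()
        if positional and len(subtag) == 2:
            formatted.append(subtag.upper())        # region, e.g. "CA"
        elif positional and len(subtag) == 4:
            formatted.append(subtag.capitalize())   # script, e.g. "Hant"
        else:
            formatted.append(subtag.lower())        # everything else
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(formatted)
```

Because subtags are not case sensitive, comparing two tags should not rely on this formatting; lowercasing both sides before comparison is enough.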

History

IETF language tags were first defined in RFC 1766, published in March 1995. In January 2001 this was superseded by RFC 3066, which added the use of ISO 639-2 codes (whereas previously only ISO 639-1 codes had been allowed), permitted subtags with digits for the first time, and adopted the concept of language ranges from HTTP/1.1 to help with matching of language tags.

The next revision of the specification came in September 2006 with the publication of RFC 4646 (the main part of the specification) and RFC 4647 (which deals with matching behaviour). RFC 4646 introduced a more structured format for language tags and replaced the old register of tags with a new register of subtags that utilizes ISO 15924 and UN M.49 in addition to the previously used ISO 639 and ISO 3166. The small number of previously defined tags that did not conform to the new structure were grandfathered in order to maintain compatibility with RFC 3066.

An IETF Working Group is currently preparing the next version of the specification. The main purpose of this revision is to incorporate codes from ISO 639-3 into the Language Subtag Registry. [ [http://www.ietf.org/html.charters/ltru-charter.html Language Tag Registry Update charter] ]

Relation to other standards

Although subtags are often derived from ISO standards, they do not track these standards exactly, since doing so could cause the meaning of language tags to change over time.

In particular, a subtag derived from a code assigned by ISO 639, ISO 15924 or ISO 3166 remains a valid (though deprecated) subtag even if the code is withdrawn from the corresponding ISO standard. If the ISO standard later assigns a new meaning to the withdrawn code, the corresponding subtag will still retain its old meaning.

This stability was introduced in RFC 4646. Before RFC 4646, changes in the meaning of ISO codes could cause changes in the meaning of language tags.

Issues with ISO 3166-1 and UN M.49

If a new ISO 3166-1 alpha-2 code would conflict with an existing region subtag (due to the code having previously had a different meaning), a UN M.49 code can be used instead. This rule was introduced in RFC 4646 and so far there has been no need to use it. UN M.49 is also the source for region subtags such as 005 for South America, as ISO 3166 does not provide codes for supranational regions.
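
The two sources of region subtags can be told apart by shape alone: two letters for ISO 3166-1 alpha-2 codes and three digits for UN M.49 codes. The following is a small sketch under that assumption; the function name `region_subtag_source` is hypothetical, and the check says nothing about whether a subtag is actually registered.

```python
import re


def region_subtag_source(subtag):
    """Name the standard a region subtag's shape corresponds to, or None."""
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        return "ISO 3166-1 alpha-2"
    if re.fullmatch(r"[0-9]{3}", subtag):
        return "UN M.49"
    return None
```

For example, `region_subtag_source("CA")` gives `"ISO 3166-1 alpha-2"` and `region_subtag_source("005")` gives `"UN M.49"`.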

Further information: The CS controversy

Relation to ISO 639-3

RFC 4646, unlike its predecessors, defines the concept of an “extended language subtag”, although it does not permit the registration of such subtags. The next version of the specification (currently in draft) is expected to require certain ISO 639-3 codes to be registered as extended language subtags, and to require other ISO 639-3 codes to be registered as (primary) language subtags. [ [http://tools.ietf.org/html/draft-ietf-ltru-4646bis A. Phillips, M. Davis (2008): Tags for Identifying Languages. IETF WG LTRU (http://tools.ietf.org/wg/ltru), accessed 2008-06-23] ] [ [http://tools.ietf.org/html/draft-ietf-ltru-4645bis D. Ewell (2008): Update to the Language Subtag Registry (1 MB). IETF WG LTRU (http://tools.ietf.org/wg/ltru), accessed 2008-06-23] ]

See also

*Language code

External links

* [http://tools.ietf.org/rfc/bcp/bcp47.txt BCP 47] - current specification
* [http://www.iana.org/assignments/language-subtag-registry Language Subtag Registry] - maintained by IANA
* [http://www.w3.org/International/articles/language-tags/Overview.en.php Language tags in HTML and XML] - from the W3C
* [http://www.langtag.net/ Language Tags] - an unofficial site (includes various tools)
* [http://rishida.net/utils/subtags/ IANA Language Subtag Registry Search] - an unofficial tool for users to find subtags and view entries in the registry


Wikimedia Foundation. 2010.
