HZ (character encoding)

The HZ character encoding is an encoding of GB2312 that was once widely used in email and Usenet postings. It was designed in 1989 by Fung Fung Lee (李楓峰) of Stanford University and was codified in 1995 as RFC 1843.

The HZ (short for Hanzi) encoding was invented to facilitate the use of Chinese characters through e-mail, which at that time only allowed 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences (as in the case of ISO-2022-JP) or 8-bit characters (as in the case of EUC), the HZ code uses only printable, 7-bit characters to represent Chinese characters.

It was also popular on Usenet, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.

Structure and use

In the HZ encoding system, the character sequences "~{" and "~}" act as escape sequences; anything between them is interpreted as Chinese text encoded in GB2312, two bytes per character, with the most significant bit of each byte ignored. Outside the escape sequences, characters are assumed to be ASCII.

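An example helps illustrate the relationship between GB2312, EUC-CN, and the HZ code. The character 中 sits at row 54, cell 48 of GB2312. In EUC-CN it is the two bytes 0xD6 0xD0; clearing the high bit of each byte gives 0x56 0x50, the ASCII letters "V" and "P", so in HZ the character is written "~{VP~}".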
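As a rough sketch of this relationship, the Python snippet below uses the built-in `gb2312` and `hz` codecs (assumed to be present, as they are in standard CPython builds) to show one character in its EUC-CN, 7-bit GB2312, and HZ forms. The sample character 中 and the variable names are chosen purely for illustration.

```python
# Illustrative only: relies on Python's built-in "gb2312" and "hz" codecs.
ch = "中"

# EUC-CN form: the two GB2312 bytes with the high bit set on each.
euc = ch.encode("gb2312")

# 7-bit GB2312 form: the same bytes with the high bit cleared.
seven_bit = bytes(b & 0x7F for b in euc)

# HZ form: the 7-bit bytes wrapped in the "~{" and "~}" escape sequences.
hz = ch.encode("hz")

print(euc.hex())   # d6d0
print(seven_bit)   # b'VP'
print(hz)          # b'~{VP~}'
```

The 7-bit bytes happen to be printable ASCII, which is what lets HZ text pass through channels that strip or reject 8-bit data.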

HZ was originally designed to be used purely as a 7-bit code. However, when situations allow, the escape sequences "~{" and "~}" sometimes surround characters represented in EUC-CN; this alternative use allows Chinese to be readable either with the help of HZ decoder software, or with a system that understands EUC-CN.
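The dual readability of this 8-bit variant can be sketched in Python as follows. The snippet is an illustration under the assumption that the built-in `gb2312` codec stands in for "a system that understands EUC-CN"; the variable names are hypothetical.

```python
# Strict 7-bit HZ text for the character 中.
seven = b"~{VP~}"

# 8-bit variant: the same escape sequences around EUC-CN bytes.
eight = b"~{" + "中".encode("gb2312") + b"~}"

# An EUC-CN-aware reader sees the escapes as plain ASCII around the character.
as_euc = eight.decode("gb2312")
print(as_euc)  # ~{中~}

# An HZ reader that ignores the most significant bit recovers the 7-bit form.
inner = bytes(b & 0x7F for b in eight[2:-2])
recovered = b"~{" + inner + b"~}"
print(recovered == seven)  # True
```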

Additionally, the specification defines that
* the sequence "~~" is to be treated as encoding a single ASCII "~"
* the character "~" followed by a newline is to be discarded.

However, not all HZ decoders follow these two rules.
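The rules described in this section can be put together in a small decoder. What follows is a minimal sketch in Python, not a reference implementation: the function name `hz_decode` is hypothetical, the built-in `gb2312` codec is assumed for the final character lookup, and passing unknown "~x" sequences through unchanged is an assumption rather than something the text above specifies.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ bytes to text (hypothetical helper, illustrative only)."""
    out = []         # decoded text pieces
    gb_mode = False  # True while between "~{" and "~}"
    i = 0
    while i < len(data):
        b = data[i]
        if not gb_mode:
            if b == 0x7E and i + 1 < len(data):
                nxt = data[i + 1]
                if nxt == 0x7E:      # "~~" encodes a single "~"
                    out.append("~")
                elif nxt == 0x7B:    # "~{" switches to GB2312 mode
                    gb_mode = True
                elif nxt == 0x0A:    # "~" + newline is discarded
                    pass
                else:                # assumption: pass unknown escapes through
                    out.append("~" + chr(nxt))
                i += 2
            else:
                out.append(chr(b))   # ordinary ASCII byte
                i += 1
        elif b == 0x7E and data[i + 1:i + 2] == b"}":
            gb_mode = False          # "~}" switches back to ASCII mode
            i += 2
        else:
            # Two bytes form one character. Setting the high bit yields the
            # EUC-CN form, so input with or without that bit decodes alike.
            euc = bytes(x | 0x80 for x in data[i:i + 2])
            out.append(euc.decode("gb2312", errors="replace"))
            i += 2
    return "".join(out)


print(hz_decode(b"HZ: ~{VPND~} ok"))  # HZ: 中文 ok
print(hz_decode(b"a~~b"))             # a~b
```

Because the high bit is forced on before lookup, the same function also accepts the 8-bit variant in which EUC-CN bytes appear between the escape sequences.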

HZ decoders

The first HZ decoder was written in 1989 by the code's inventor for the Unix operating system.

The hztty program, also written for Unix, was among the first HZ decoders and one of the most popular. It deviates from the specification in that it displays the escape sequences "~{" and "~}", and it does not treat "~~" or "~" followed by a newline specially. This was probably done so that software which assumes each character occupies one screen position (on a text screen) could work correctly without modification.

Support on Microsoft Windows came later, and a number of third-party "Chinese systems" support HZ. These systems may provide an option to hide the escape sequences.

References

* [http://quimby.gnus.org/rfc/rfc1843.txt RFC 1843]
* [http://umunhum.stanford.edu/~lee/chicomp/HZ_spec.html HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII Characters] ([http://web.archive.org/web/20051027040810/umunhum.stanford.edu/~lee/chicomp/HZ_spec.html archived version])


Wikimedia Foundation. 2010.
