Combining grapheme joiner

The combining grapheme joiner (CGJ), U+034F COMBINING GRAPHEME JOINER (HTML: &#847;), is a Unicode character that has no visible glyph and is "default ignorable" by applications. Its name is a misnomer: it does not join graphemes.[1] Its purpose is to separate characters that should not be treated as digraphs.

For example, in a Hungarian-language context, the adjacent characters c and s would normally be treated as equivalent to the digraph cs. If they are separated by a CGJ, they are treated as two separate graphemes.
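The digraph behaviour described above can be sketched with a toy tokenizer. This is only an illustration under stated assumptions: the names collation_units and DIGRAPHS are hypothetical, the digraph set contains just "cs", and real software would rely on locale-specific collation data rather than a hand-written loop.

```python
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

# Hypothetical, deliberately incomplete digraph set, for illustration only.
DIGRAPHS = {"cs"}


def collation_units(text):
    """Split text into units, treating 'cs' as one unit unless a CGJ intervenes."""
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Adjacent c + s: keep them together as a single digraph unit.
            units.append(pair)
            i += 2
        elif text[i] == CGJ:
            # The CGJ is ignorable as a unit; its only effect here is that
            # it breaks the adjacency of the characters on either side.
            i += 1
        else:
            units.append(text[i])
            i += 1
    return units


print(collation_units("cs"))
print(collation_units("c" + CGJ + "s"))
```

In this sketch the first call yields a single unit for the digraph, while the second yields c and s as separate units, because the CGJ between them prevents the pair from matching.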

It is also needed for complex scripts. For example, the Hebrew cantillation accent Metheg is in most cases supposed to appear to the left of the vowel point, and by default most display systems render it there even if it is typed before the vowel. In some Biblical Hebrew words, however, the Metheg appears to the right of the vowel. To tell the display engine to render it on the right, a CGJ must be typed between the Metheg and the vowel. Compare:

he + pathah + metheg הַֽ
he + metheg + pathah הַֽ
he + metheg + CGJ + pathah הֽ͏ַ

(The examples above may not display correctly without a font that properly supports Hebrew cantillation marks. Ezra SIL SR is recommended.)

In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.[1]
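The effect on canonical reordering can be checked with Python's standard unicodedata module. The sketch below reuses the he, metheg and pathah sequence from the Hebrew example. The constant names and the code_points helper are illustrative choices, not part of any API.

```python
import unicodedata

HE = "\u05D4"      # HEBREW LETTER HE
PATHAH = "\u05B7"  # HEBREW POINT PATAH (lower combining class)
METHEG = "\u05BD"  # HEBREW POINT METEG (higher combining class)
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER (combining class 0)


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


without_cgj = HE + METHEG + PATHAH
with_cgj = HE + METHEG + CGJ + PATHAH

# Normalization applies canonical ordering to runs of combining marks.
nfd_without = unicodedata.normalize("NFD", without_cgj)
nfd_with = unicodedata.normalize("NFD", with_cgj)

print(code_points(nfd_without))
print(code_points(nfd_with))
```

Without the CGJ, normalization should move the pathah in front of the metheg, since marks are sorted by combining class. With the CGJ in between, the typed order should survive, because a character of combining class 0 ends the run of marks that may be reordered.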

Compare this with the zero-width non-joiner (ZWNJ), U+200C in the General Punctuation range, which acts like a space of zero width and prevents adjacent characters from being joined into a ligature.

Wikimedia Foundation. 2010.
