Bi-directional text

Bi-directional text is text that mixes both writing directions, right-to-left (RTL) and left-to-right (LTR). It generally involves text written in different scripts, but the term may also refer to boustrophedon, in which the writing direction changes on each successive line.

Some writing systems of the world, notably the Arabic and Hebrew scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction used by most languages in the world. When LTR text is mixed with RTL in the same paragraph, each type of text is written in its own direction, which is known as bi-directional text. This can get rather complex when multiple levels of quotation are used.

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) is spelled shin (ש) resh (ר) heh (ה) from right to left. Some Web browsers may display the Hebrew text in this article in the opposite direction.

Unicode support

Bidirectional script support is the capability of a computer system to correctly display bi-directional text. The term is often shortened to the jargon term BiDi or bidi.

Early computer installations were designed to support only a single writing system, typically a left-to-right script based on the Latin alphabet. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings such as ISO/IEC 8859-6 (Arabic) and ISO/IEC 8859-8 (Hebrew), storing the letters (usually) in writing and reading order. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction.

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

In Unicode encoding, all non-punctuation characters are stored in writing order. The writing direction is a property of the character itself, and a character that carries a direction in this way is called "strong". Punctuation characters, however, can appear in both LTR and RTL scripts. They are called "weak" characters because they carry no directional information, so it is up to the software to decide in which direction they will be placed. In mixed-direction text this sometimes leads to display errors: the BiDi algorithm runs through the text, identifies the LTR and RTL strong characters, and assigns a direction to each weak character according to its rules, and the result does not always match what the author intended.

In the algorithm, each sequence of concatenated strong characters is called a "run". A weak character located between two strong characters with the same orientation inherits their orientation. A weak character located between two strong characters with different writing directions inherits the main context's writing direction (LTR in an LTR document, RTL in an RTL document). If a weak character is followed by another weak character, the algorithm looks at the first neighbouring strong character. Sometimes this leads to unintentional display errors, which are corrected or prevented with "pseudo-strong" characters. Such Unicode control characters are called marks. A mark, either U+200E left-to-right mark (LRM; HTML: &#8206; or &lrm;) or U+200F right-to-left mark (RLM; HTML: &#8207; or &rlm;), is inserted at a location so that the weak character enclosed between the mark and a strong character of the same direction inherits that direction.
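The neighbour rule described above can be sketched in a few lines of Python. This is only an illustration of the simplified rule as stated in this article, not an implementation of the full Unicode algorithm (UAX #9), which treats digits, explicit embeddings and several other cases separately. The function names are hypothetical, every character that is not strong is treated as "weak", and the start and end of the text are assumed to take the base direction.

```python
import unicodedata

def strong_direction(ch):
    """Return 'L' or 'R' for strong characters, None for everything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None

def resolve_directions(text, base="L"):
    """Assign 'L' or 'R' to every character using the simplified neighbour rule."""
    strong = [strong_direction(ch) for ch in text]
    resolved = []
    for i, direction in enumerate(strong):
        if direction is None:
            # Nearest strong character on each side; fall back to the base
            # direction at the edges of the text (an assumption of this sketch).
            before = next((d for d in reversed(strong[:i]) if d), base)
            after = next((d for d in strong[i + 1:] if d), base)
            direction = before if before == after else base
        resolved.append(direction)
    return "".join(resolved)

hebrew = "\u05e9\u05e8\u05d4"  # the Hebrew name Sarah
print(resolve_directions("Sarah! " + hebrew, base="L"))        # LLLLLLLRRR
print(resolve_directions("Sarah! " + hebrew, base="R"))        # LLLLLRRRRR
print(resolve_directions("Sarah!\u200e " + hebrew, base="R"))  # LLLLLLLRRRR
```

In the second call the exclamation mark sits between an LTR and an RTL strong character and therefore takes the RTL base direction. In the third call the inserted LRM gives it two LTR neighbours, so it stays with the Latin text.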

For example, to correctly display the U+2122 ™ trade mark sign after an English (LTR) brand name in an Arabic (RTL) passage, an LRM is inserted after the trademark symbol if the symbol is not followed by LTR text. Without the LRM, the weak character ™ is neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it is treated as RTL and displayed in the wrong position.
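A minimal sketch of that fix in Python follows. The helper name `add_lrm_after` and the brand name "Acme" are made up for illustration, and the check is deliberately simple: it looks only at the character immediately after the symbol, so it may add an LRM that is not strictly needed (for instance before a space that is followed by LTR text), which is harmless.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def add_lrm_after(text, symbol):
    """Insert an LRM after each `symbol` that is not directly followed by a
    strong LTR character (simplified check: next character only)."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == symbol:
            following = text[i + 1] if i + 1 < len(text) else ""
            if not following or unicodedata.bidirectional(following) != "L":
                out.append(LRM)
    return "".join(out)

# Hypothetical brand name followed by an Arabic word, stored in logical order.
passage = "Acme\u2122 \u0634\u0631\u0643\u0629"
fixed = add_lrm_after(passage, "\u2122")
print([f"U+{ord(c):04X}" for c in fixed[:7]])
```

Because the LRM itself has the strong type L, running the function a second time leaves the text unchanged.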

Possible BiDi-types of a character, to be used by the BiDi algorithm, are:

Bidirectional character type (Unicode character property Bidi_Class)^[1]^[2]
Type	Description	Strong/Weak effect	General scope	Bidi_Control character^[3]
L	Left-to-Right	Strong	Most alphabetic and syllabic characters, Han ideographs, non-European or non-Arabic digits, LRM character, ...	U+200E left-to-right mark (LRM)
LRE	Left-to-Right Embedding	Strong	LRE character only	U+202A left-to-right embedding (LRE)
LRO	Left-to-Right Override	Strong	LRO character only	U+202D left-to-right override (LRO)
R	Right-to-Left	Strong	Hebrew alphabet and related punctuation, RLM character	U+200F right-to-left mark (RLM)
AL	Right-to-Left Arabic	Strong	Arabic, Thaana and Syriac alphabets, and most punctuation specific to those scripts
RLE	Right-to-Left Embedding	Strong	RLE character only	U+202B right-to-left embedding (RLE)
RLO	Right-to-Left Override	Strong	RLO character only	U+202E right-to-left override (RLO)
PDF	Pop Directional Format	Weak	PDF character only	U+202C pop directional formatting (PDF)
EN	European Number	Weak	European digits, Eastern Arabic-Indic digits, ...
ES	European Separator	Weak	plus sign, minus sign, ...
ET	European Number Terminator	Weak	degree sign, currency symbols, ...
AN	Arabic Number	Weak	Arabic-Indic digits, Arabic decimal and thousands separators, ...
CS	Common Number Separator	Weak	colon, comma, full stop, no-break space, ...
NSM	Nonspacing Mark	Weak	Characters in General Categories Mark, nonspacing and Mark, enclosing (Mn, Me)
BN	Boundary Neutral	Weak	Default ignorables, non-characters, control characters other than those explicitly given other types
B	Paragraph Separator	Neutral	paragraph separator, appropriate Newline Functions, higher-level protocol paragraph determination
S	Segment Separator	Neutral	Tab
WS	Whitespace	Neutral	space, figure space, line separator, form feed, General Punctuation block spaces (this set is smaller than the Unicode whitespace list)
ON	Other Neutrals	Neutral	All other characters, including object replacement character

Notes
1. ^ Unicode Bidirectional Algorithm (UAX#9), as of version 6.0.0
2. ^ Possible bidirectional character types for the character property Bidi_Class, or 'type'
3. ^ Bidi_Control characters: seven Bidi_Control formatting characters are defined. They are invisible and have no effect apart from directionality. Five of them have a unique, overruling BiDi type that is used by the algorithm; their type is also their acronym (e.g. character 'LRE' has BiDi type 'LRE').
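The Bidi_Class of a character can be looked up programmatically. The sketch below assumes Python and its standard `unicodedata` module, whose `bidirectional()` function returns the type abbreviation as a string. The sample characters and their informal labels are chosen here for illustration only. The values come from the Unicode database bundled with the interpreter, which may be newer than version 6.0 and may therefore define types that are not in the table above.

```python
import unicodedata

# Sample characters picked for illustration; the labels are informal.
SAMPLES = {
    "Latin letter A": "A",
    "Hebrew letter shin": "\u05e9",
    "Arabic letter beh": "\u0628",
    "European digit 5": "5",
    "Arabic-Indic digit three": "\u0663",
    "plus sign": "+",
    "comma": ",",
    "space": " ",
    "tab": "\t",
    "trade mark sign": "\u2122",
    "left-to-right mark": "\u200e",
    "right-to-left override": "\u202e",
}

def bidi_classes(samples):
    """Map each label to the Bidi_Class abbreviation of its character."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in bidi_classes(SAMPLES).items():
    print(f"{label:26} {bidi_class}")
```

The trade mark sign is reported as ON (Other Neutrals), while the left-to-right mark is reported as L, which is why it can act as a strong neighbour for an adjacent weak character.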

Scripts using bi-directional text

There are very few scripts that can be written in either direction.

Boustrophedon writing typically uses mirrored glyphs on every second line.

Egyptian hieroglyphs could also be written in either direction; the signs had a distinct "head" that faced the beginning of a line and a "tail" that faced the end.

Chinese characters can also be written in either direction as well as vertically (top to bottom then right to left), especially in signs (such as plaques), but the orientation of the individual characters is never changed. This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear - that is, from right to left on the right side of the bus, and from left to right on the left side of the bus.

The same convention can be seen on aircraft and other vehicles. On the right side of a Hainan Airlines aircraft, the text runs from right to left ( 空航南海 ), while the left side shows the text running from left to right ( 海南航空 ). A China Post vehicle likewise carries text in opposite directions on its two sides.

Another variety of writing style, called boustrophedon, was used in some ancient Greek inscriptions, Tuareg, and Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line.
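The alternation of direction can be imitated in plain text by reversing every second line. The following Python sketch uses a hypothetical function name and assumes simple text without combining marks, since it reverses code points one by one. It does not mirror the glyphs themselves, which plain text cannot express.

```python
def boustrophedon(lines):
    """Reverse the character order of every second line (glyphs are not mirrored)."""
    return [line if i % 2 == 0 else line[::-1] for i, line in enumerate(lines)]

for line in boustrophedon(["FIRST LINE", "SECOND LINE", "THIRD LINE"]):
    print(line)
```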

External links

Unicode Standards Annex #9 The Bidirectional Algorithm
W3C guidelines on authoring techniques for bi-directional text - includes examples and good explanations
GNU FriBidi A free implementation of the Unicode bidirectional algorithm
ICU International Components for Unicode contains an implementation of the bidirectional algorithm — along with other internationalization services
UCData: "Pretty Good Bidi Algorithm Library" A small and fast bidirectional reordering algorithm that works well but is not necessarily compliant with the Unicode algorithm
Bidirectional Scripts in Desktop Software Working group for supporting BiDi in Free Software. Contains several links to readings and implementation regarding BiDi in computer systems.
Another Wiki about BiDi
Bidirectional text - Examples and practical advice
.Net BiDi Implementation
A freely available rather final version of Israeli standard 5194 - bidirectional text editing
Work in progress on new version of Bidi editing standard + reference implementation
Series of articles about pitfalls of BiDi programming
