A bidirectional text contains two text directionalities, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, in which the text direction changes with each row.
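As an illustration, the following minimal Python sketch checks whether a string mixes both directionalities. It relies on the standard library function `unicodedata.bidirectional`, which returns the Unicode bidirectional class of a character; the helper names `strong_directions` and `is_bidirectional` are hypothetical and chosen only for this example, and the check looks only at strong characters rather than running the full bidirectional algorithm.

```python
import unicodedata

# Bidirectional classes of strong characters, as reported by unicodedata:
# "L" is strong left-to-right; "R" and "AL" are strong right-to-left
# (Hebrew-style and Arabic-style letters respectively).
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def strong_directions(text):
    """Return the set of strong directions ("LTR", "RTL") found in text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            found.add("LTR")
        elif bidi_class in STRONG_RTL:
            found.add("RTL")
    return found


def is_bidirectional(text):
    """True if text contains both strong LTR and strong RTL characters."""
    return strong_directions(text) == {"LTR", "RTL"}


# The Hebrew name Sarah, written with escapes to avoid display-order issues:
# shin (U+05E9), resh (U+05E8), he (U+05D4).
SARAH_HEBREW = "\u05e9\u05e8\u05d4"

print(is_bidirectional("Sarah " + SARAH_HEBREW))
print(is_bidirectional("Sarah 123"))
```

Digits and spaces are not strong characters, so a string such as "Sarah 123" is treated as purely left-to-right by this sketch.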
An example is the RTL Hebrew name Sarah: שרה, spelled sin (ש) on the right, resh (ר) in the middle, and heh (ה) on the left. Many computer programs failed to display this correctly, because they were designed to display text in one direction only. Some so-called right-to-left scripts such as the Persian script and Arabic are mostly, but not exclusively, right-to-left; mathematical expressions, numeric dates and numbers bearing units are embedded from left to right. That also happens if text from
a left-to-right language such as English is embedded in them; or vice versa, if Arabic is embedded in a left-to-right script such as English. Bidirectional script support is the capability of a computer system to correctly display bidirectional text. The term is often shortened to "BiDi" or "bidi". Early computer installations were designed only to support a single writing system, typically for left-to-right scripts based on
A paragraph separator, or a "pop" character. If a "weak" character is followed by another "weak" character, the algorithm will look at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such Unicode control characters are called marks. The mark (U+200E LEFT-TO-RIGHT MARK (LRM) or U+200F RIGHT-TO-LEFT MARK (RLM))
A piece of text is to be treated as directionally distinct. The text within the scope of the embedding formatting characters is not independent of the surrounding text. Also, characters within an embedding can affect the ordering of characters outside. Unicode 6.3 recognized that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use. The "isolate" directional formatting characters signal that
a piece of text is to be treated as directionally isolated from its surroundings. As of Unicode 6.3, these are the formatting characters that are being encouraged in new documents, once target platforms are known to support them. These formatting characters were introduced after it became apparent that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use. Unlike
Is not added, the weak character ™ will be neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL, and displayed in an incorrect order (e.g. "قرأ Wikipedia™ طوال اليوم."). The "embedding" directional formatting characters are the classical Unicode method of explicit formatting, and as of Unicode 6.3, are being discouraged in favor of "isolates". An "embedding" signals that
Is not always possible to classify some ancient writing systems as purely RTL or LTR. Right-to-left, top-to-bottom text is supported in common computer software. Often, this support must be explicitly enabled. Right-to-left text can be mixed with left-to-right text in bi-directional text. Examples of right-to-left scripts (with ISO 15924 codes in brackets) are:
Is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction. In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed. The Unicode standard calls for characters to be ordered 'logically', i.e. in
Is similar, but developed from Proto-Hebrew rather than Aramaic. Many other ancient and historic scripts derived from Aramaic inherited its right-to-left direction. Several languages have both Arabic RTL and non-Arabic LTR writing systems. For example, Sindhi is commonly written in Arabic and Devanagari scripts, and a number of others have been used. Kurdish may be written in the Arabic or Latin script. Thaana appeared around 1600 CE. Most modern scripts are LTR, but N'Ko (1949), Mende Kikakui (19th century), Adlam (1980s) and Hanifi Rohingya (1980s) were created in modern times and are RTL. Ancient examples of text using alphabets such as Phoenician, Greek, or Old Italic may exist variously in left-to-right, right-to-left, or boustrophedon order; therefore, it
Is to be inserted into a location to make an enclosed weak character inherit its writing direction. For example, to correctly display the U+2122 ™ TRADE MARK SIGN for an English name brand (LTR) in an Arabic (RTL) passage, an LRM mark is inserted after the trademark symbol if the symbol is not followed by LTR text (e.g. "قرأ Wikipedia™ طوال اليوم."). If the LRM mark
Is true of the other directional formatting characters, "overrides" can be nested one inside another, and in embeddings and isolates. The character U+202D LEFT-TO-RIGHT OVERRIDE (LRO) forces the characters that follow it to be treated as strong left-to-right characters. Similarly, U+202E RIGHT-TO-LEFT OVERRIDE (RLO) forces the characters that follow it to be treated as strong right-to-left characters. Refer to the Unicode Bidirectional Algorithm. The "pop" directional formatting characters terminate
The Latin alphabet only. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings like ISO/IEC 8859-6 and ISO/IEC 8859-8, storing the letters (usually) in writing and reading order. It
The Unicode encoding standard assigns each of its characters to one of four types: 'strong', 'weak', 'neutral', and 'explicit formatting'. Strong characters are those with a definite direction. Examples of this type of character include most alphabetic characters, syllabic characters, Han ideographs, non-European or non-Arabic digits, and punctuation characters that are specific to only those scripts. Weak characters are those with vague direction. Examples of this type of character include European digits, Eastern Arabic-Indic digits, arithmetic symbols, and currency symbols. Neutral characters have direction indeterminable without context. Examples include paragraph separators, tabs, and most other whitespace characters. Punctuation symbols that are common to many scripts, such as
The character will become LTR; in an RTL document, it will become RTL). Unicode bidirectional characters are used in the Trojan Source vulnerability. Visual Studio Code has highlighted BiDi control characters since version 1.62, released in October 2021. Visual Studio has highlighted BiDi control characters since version 17.0.3, released on December 14, 2021. Egyptian hieroglyphs were written bidirectionally, where
The colon, comma, full stop, and the no-break space also fall within this category. Explicit formatting characters, also referred to as "directional formatting characters", are special Unicode sequences that direct the algorithm to modify its default behavior. These characters are subdivided into "marks", "embeddings", "isolates", and "overrides". Their effects continue until the occurrence of either
The legacy 'embedding' directional formatting characters, 'isolate' characters have no effect on the ordering of the text outside their scope. Isolates can be nested, and may be placed within embeddings and overrides. The "override" directional formatting characters allow for special cases, such as for part numbers (e.g. to force a part number made of mixed English, digits and Hebrew letters to be written from right to left), and should be avoided wherever possible. As
The most widespread RTL writing systems in modern times. As usage of the Arabic script spread, the repertoire of 28 characters used to write the Arabic language was supplemented to accommodate the sounds of many other languages such as Kashmiri, Pashto, etc. While the Hebrew alphabet is used to write the Hebrew language, it is also used to write other Jewish languages such as Yiddish and Ladino. Syriac and Mandaean (Mandaic) scripts are derived from Aramaic and are written RTL. Samaritan
The right of the page and continues to the left, proceeding from top to bottom for new lines. Arabic and Hebrew are the most widespread RTL writing systems in modern times. Right-to-left can also refer to top-to-bottom, right-to-left (TB-RL or vertical) scripts of tradition, such as Chinese, Japanese, and Korean, though in modern times they are also commonly written left to right (with lines going from top to bottom). Books designed for predominantly vertical TBRL text open in
The same direction as those for RTL horizontal text: the spine is on the right and pages are numbered from right to left. These scripts can be contrasted with many common modern left-to-right writing systems, where writing starts from the left of the page and continues to the right. The Arabic script is mostly but not exclusively right-to-left; mathematical expressions, numeric dates and numbers bearing units are embedded from left to right. Hebrew and Arabic are
The scope of the most recent "embedding", "override", or "isolate". In the algorithm, each sequence of concatenated strong characters is called a "run". A "weak" character that is located between two "strong" characters with the same orientation will inherit their orientation. A "weak" character that is located between two "strong" characters with a different writing direction will inherit the main context's writing direction (in an LTR document
The sequence in which they are intended to be interpreted, as opposed to 'visually', the sequence in which they appear. This distinction is relevant for bidi support because at any bidi transition, the visual presentation ceases to be the 'logical' one. Thus, in order to offer bidi support, Unicode prescribes an algorithm for how to convert the logical sequence of characters into the correct visual presentation. For this purpose,
The signs that had a distinct "head" or "tail" faced the beginning of the line. Chinese characters can be written in either direction as well as vertically (top to bottom then right to left), especially in signs (such as plaques), but the orientation of the individual characters does not change. This can often be seen on tour buses in China, where the company name customarily runs from the front of
The text changed direction (but not character orientation) at the end of the lines. Special embossed lines connected the end of a line and the beginning of the next. Around 1990, it changed to a left-to-right orientation.

Right-to-left script

In a right-to-left, top-to-bottom script (commonly shortened to right to left or abbreviated RTL, RL-TB or RLTB), writing starts from
The two most common forms. Boustrophedon is a writing style found in ancient Greek inscriptions, in Old Sabaic (an Old South Arabian language) and in Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line. Moon type is an embossed adaptation of the Latin alphabet invented as a tactile alphabet for the blind. Initially
The vehicle to its rear; that is, from right to left on the right side of the bus, and from left to right on the left side of the bus. English texts on the right side of the vehicle are also quite commonly written in reverse order. (See pictures of tour bus and post vehicle below.) Likewise, other CJK scripts made up of the same square characters, such as the Japanese writing system and Korean writing system, can also be written in any direction, although horizontally left-to-right, top-to-bottom and vertically top-to-bottom right-to-left are