Misplaced Pages

OCR-A

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

OCR-A is a font issued in 1966 and first implemented in 1968. In the early days of computer optical character recognition, a special font was needed that could be recognized not only by the computers of that day but also by humans. OCR-A uses simple, thick strokes to form recognizable characters. The font is monospaced (fixed-width), with the printer required to place glyphs 0.254 cm (0.10 inch) apart, and the reader required to accept any spacing between 0.2286 cm (0.09 inch) and 0.4572 cm (0.18 inch).
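The spacing rule above can be expressed as a simple tolerance check. This is an illustrative sketch only: the function names are hypothetical, pitches are taken in inches, and the exact conversion 1 inch = 2.54 cm is used to reproduce the metric figures.

```python
CM_PER_INCH = 2.54

# OCR-A glyph pitch figures quoted above, in inches.
PRINTER_PITCH_IN = 0.10
READER_MIN_IN = 0.09
READER_MAX_IN = 0.18


def inch_to_cm(inches: float) -> float:
    """Convert a length in inches to centimetres."""
    return inches * CM_PER_INCH


def reader_accepts(pitch_in: float) -> bool:
    """Return True if a measured glyph pitch (inches) lies within the reader tolerance."""
    return READER_MIN_IN <= pitch_in <= READER_MAX_IN


if __name__ == "__main__":
    for pitch in (0.085, PRINTER_PITCH_IN, 0.15, 0.20):
        print(f"{pitch:.3f} in ({inch_to_cm(pitch):.4f} cm): accepted={reader_accepts(pitch)}")
```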

The OCR-A font was standardized by the American National Standards Institute (ANSI) as ANSI X3.17-1981. The X3 committee has since become INCITS, and the OCR-A standard is now called ISO 1073-1:1976. In 1968, American Type Founders produced OCR-A, one of the first optical character recognition typefaces to meet the criteria set by the U.S. Bureau of Standards. The design is simple so that it can be easily read by

A Debian package from this implementation. In 2008, Luc Devroye corrected the vertical positioning in John Sauter's implementation and fixed the name of lowercase z. Independently, Matthew Skala used mftrace to convert the Metafont definitions to TrueType format in 2006. In 2011 he released a new version created by rewriting the Metafont definitions to work with METATYPE1, generating outlines directly without an intermediate tracing step. On September 27, 2012, he updated his implementation to version 0.2. In addition to these free implementations of OCR-A, there are also implementations sold by several vendors. As

A 7-bit or 8-bit environment), but not both. Which style of C1 invocation is used must be specified in the definition of the code version. For example, ISO/IEC 4873 specifies CR bytes for the C1 controls which it uses (SS2 and SS3). If necessary, which invocation is used may be communicated using announcer sequences. In the latter case, single control functions from the C1 control code set are invoked using "type Fe" escape sequences, meaning those where

A graphical set designation sequence, if the second I byte (for a single-byte set) or the third I byte (for a double-byte set) is 0x20 (space), the set denoted is a "dynamically redefinable character set" (DRCS) defined by prior agreement, which is also considered private use. A graphical set being considered a DRCS implies that it represents a font of exact glyphs, rather than a set of abstract characters. The manner in which DRCS sets and associated fonts are transmitted, allocated and managed

A joke, Tobias Frere-Jones in 1995 created Estupido-Espezial, a redesign with swashes and a long s. It was used in a "technology"-themed section of Rolling Stone. Maxitype designed the OCR-X typeface, based on the OCR-A typeface, with OpenType features and alien/technology-themed dingbats, available in six weights (Thin, Light, Regular, Medium, Bold, Black). Although optical character recognition technology has advanced to

A line. Furthermore, the escape sequences declaring the national character sets may be absent if a specific ISO-2022-based encoding permits or requires this, and dictates that particular national character sets are to be used. For example, ISO-8859-1 states that no defining escape sequence is needed. To represent large character sets, ISO/IEC 2022 builds on ISO/IEC 646's property that a seven-bit character representation will normally be able to represent 94 graphic (printable) characters (in addition to space and 33 control characters); if only

A machine, but it is more difficult for the human eye to read. As metal type gave way to computer-based typesetting, Tor Lillqvist used Metafont to describe the OCR-A font. That definition was subsequently improved by Richard B. Wales. Their work is available from CTAN. To make the free version of the font more accessible to users of Microsoft Windows, John Sauter converted the Metafont definitions to TrueType using potrace and FontForge in 2004. In 2007, Gürkan Sengün created

A modified 7-bit ASCII set (also known by its ISO-IR number ISO-IR-91) including only uppercase letters, digits, a subset of the punctuation and symbols, and some additional symbols. Codes which are redefined relative to ASCII, as opposed to simply omitted, are listed below: Additionally, the long vertical mark is encoded at 0x7C, corresponding to the ASCII vertical bar (|). The following characters have been defined for control purposes and are now in

A national standards organization. According to Adam Stanton, the first permanent secretary and head of staff in 1919, AESC started as an ambitious program and little else. Staff for the first year consisted of one executive, Clifford B. LePage, who was on loan from a founding member, ASME. An annual budget of $7,500 was provided by the founding bodies. In 1931, the organization (renamed ASA in 1928) became affiliated with

A single byte, regardless of the number of bytes used for graphical characters. CJK encodings used in 7-bit environments which use ISO 2022 mechanisms to switch between character sets are often given names starting with "ISO-2022-", most notably ISO-2022-JP, although some other CJK encodings such as EUC-JP also make use of ISO 2022 mechanisms. Since the first 256 code points of Unicode were taken from ISO 8859-1, Unicode inherits

A standard document; however, registration does not create a new ISO standard, does not commit the ISO or IEC to adopt it as an international standard, and does not commit the ISO or IEC to add any of its characters to the Universal Coded Character Set. ISO-IR registered escape sequences are also used encapsulated in a Formal Public Identifier to identify character sets used for numeric character references in SGML (ISO 8879). For example,

A syntax for escape sequences, multiple-byte sequences beginning with the ESC control code, which can likewise be used for in-band instructions. Specific sets of control codes and escape sequences designed to be used with ISO 2022 include ISO/IEC 6429, portions of which are implemented by ANSI.SYS and terminal emulators. ISO 2022 itself also defines particular control codes and escape sequences which can be used for switching between different coded character sets (for example, between ASCII and

Is Unicode, also known as ISO 10646. Unicode contains ASCII and has special provisions for OCR characters, so some implementations of OCR-A have looked to Unicode for guidance on character code assignments. The ISO standard ISO 2033:1983, and the corresponding Japanese Industrial Standard JIS X 9010:1984 (originally JIS C 6229-1984), define character encodings for OCR-A, OCR-B and E-13B. For OCR-A, they define

Is copyright infringement for them to be provided to the public by others free of charge. These assertions have been the subject of criticism and litigation. ANSI was most likely formed in 1918, when five engineering societies and three government agencies founded the American Engineering Standards Committee (AESC). In 1928, the AESC became the American Standards Association (ASA). In 1966,

Is a private nonprofit organization that oversees the development of voluntary consensus standards for products, services, processes, systems, and personnel in the United States. The organization also coordinates U.S. standards with international standards so that American products can be used worldwide. ANSI accredits standards that are developed by representatives of other standards organizations, government agencies, consumer groups, companies, and others. These standards ensure that

Is an ISO/IEC standard in the field of character encoding. It is equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202. Originating in 1971, it was most recently revised in 1994. ISO 2022 specifies a general structure which character encodings can conform to, dedicating particular ranges of bytes (0x00–1F and 0x7F–9F) to be used for non-printing control codes for formatting and in-band instructions (such as line breaks or formatting instructions for text terminals), rather than graphical characters. It also specifies

Is funded by the sale of publications, membership dues and fees, accreditation services, fee-based programs, and international standards programs. Many ANSI standards are incorporated by reference into United States federal law, for example by OSHA regulations that refer to individual ANSI specifications. ANSI does not make these standards publicly available, and charges money for access to these documents; it further claims that it

Is in turn conformed to by ISO/IEC 8859, and Extended Unix Code, which is used for East Asian languages. More specialised applications of ISO 2022 include the MARC-8 encoding system used in MARC 21 library records. The escape sequences for switching to particular character sets or encodings are registered with the ISO-IR registry (except for those set apart for private use, the meanings of which are defined by vendors, or by protocol specifications such as ARIB STD-B24) and follow

Is not stipulated by ISO/IEC 2022 / ECMA-35 itself, although it recommends allocating them sequentially starting with F byte 0x40 (@); however, a manner for transmitting DRCS fonts is defined within some telecommunication protocols such as World System Teletext. There are also three special cases for multi-byte codes. The code sequences ESC $ @, ESC $ A, and ESC $ B were all registered when

Is required that any C0 character set include the ESC character at position 0x1B, so that further changes are possible. The control set designation sequences (as opposed to the graphical set ones) may also be used from within ISO/IEC 10646 (UCS/Unicode), in contexts where processing ANSI escape codes is appropriate, provided that each byte in the sequence is padded to the code unit size of the encoding. A table of escape sequence I bytes and
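The padding rule can be illustrated by widening each byte of a sequence to one code unit. This is a sketch under stated assumptions: ESC ! @ is used only as an example of the ESC ! F form that ECMA-35 uses for C0 set designation (taken from that standard, not from this text), big-endian code units are the default, and `pad_escape` is a hypothetical name.

```python
def pad_escape(seq: bytes, code_unit_bytes: int, byteorder: str = "big") -> bytes:
    """Widen every byte of an escape sequence to one code unit of the target encoding.

    Assumption: 2-byte units model UTF-16 and 4-byte units model UTF-32.
    """
    if any(b > 0x7F for b in seq):
        raise ValueError("this sketch only handles 7-bit escape sequences")
    return b"".join(b.to_bytes(code_unit_bytes, byteorder) for b in seq)


if __name__ == "__main__":
    c0_designation = b"\x1b!@"  # ESC ! F form; F = '@' chosen purely as an example
    print(pad_escape(c0_designation, 2).hex(" "))  # 00 1b 00 21 00 40
```

Because ESC, `!` and `@` are all below 0x80, padding each byte yields the same units that UTF-16 or UTF-32 would produce for the corresponding characters.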

Is the difficulty of balancing "the interests of both the nation's industrial and commercial sectors and the nation as a whole." Although ANSI itself does not develop standards, the Institute oversees the development and use of standards by accrediting the procedures of standards developing organizations. ANSI accreditation signifies that the procedures used by standards developing organizations meet

Is used for the subtitles in films and television series such as The Blacklist and for the main titles in The Pretender. Additionally, OCR-A is used in the films Crimson Tide and 13 Hours: The Secret Soldiers of Benghazi. A font is a set of character shapes, or glyphs. For a computer to use a font, each glyph must be assigned a code point in a character set. When OCR-A was being standardized

The "Optical Character Recognition" Unicode range U+2440–U+245F: All implementations of OCR-A use U+0020 for space, U+0030 through U+0039 for the decimal digits, U+0041 through U+005A for the unaccented upper case letters, and U+0061 through U+007A for the unaccented lower case letters. In addition to the digits and unaccented letters, many of the characters of OCR-A have obvious code points in ASCII. Of those that do not, most, including all of OCR-A's accented letters, have obvious code points in Unicode. Linotype coded
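Two of the points above can be sketched with Python's standard `unicodedata` module. The first helper checks whether a string sticks to the code points that every OCR-A implementation shares. The second lists the characters assigned in U+2440–U+245F according to the Unicode database bundled with the interpreter. Both helper names are hypothetical.

```python
import unicodedata


def uses_only_shared_code_points(text: str) -> bool:
    """True if every character is a space, an ASCII digit, or an unaccented
    A-Z / a-z letter: the code points all OCR-A implementations agree on."""
    return all(
        ch == " " or "0" <= ch <= "9" or "A" <= ch <= "Z" or "a" <= ch <= "z"
        for ch in text
    )


def ocr_block_names() -> dict[int, str]:
    """Map each assigned code point in U+2440-U+245F to its Unicode name."""
    names = {}
    for cp in range(0x2440, 0x2460):
        name = unicodedata.name(chr(cp), None)  # None for unassigned code points
        if name is not None:
            names[cp] = name
    return names


if __name__ == "__main__":
    print(uses_only_shared_code_points("ACCOUNT 0123 ref"))  # True
    print(uses_only_shared_code_points("AMOUNT: $12.50"))    # False
    for cp, name in ocr_block_names().items():
        print(f"U+{cp:04X} {name}")
```

Text that passes the first check should render with the same code points in any OCR-A font. Anything else may depend on the vendor-specific assignments described in this article.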

The English alphabet), and does not provide good support for languages which use additional letters, or which use a different writing system altogether. Other writing systems with relatively few characters, such as Greek, Cyrillic, Arabic or Hebrew, as well as forms of the Latin script using diacritics or letters absent from the ISO Basic Latin alphabet, have historically been represented on personal computers with different 8-bit, single-byte, extended ASCII encodings, which follow ASCII when

The VT100, and are thus supported by terminal emulators. By default, GL codes specify G0 characters and GR codes (where available) specify G1 characters; this may be otherwise specified by prior agreement. The set invoked over each area may also be modified with control codes referred to as shifts, as shown in the table below. An 8-bit code may have GR codes specifying G1 characters, i.e. with its corresponding 7-bit code using Shift In and Shift Out to switch between

The most significant bit is 0 (i.e. bytes 0x00–7F, when represented in hexadecimal), and include additional characters for a most significant bit of 1 (i.e. bytes 0x80–FF). Some of these, such as the ISO 8859 series, conform to ISO 2022, while others such as DOS code page 437 do not, usually due to not reserving the bytes 0x80–9F for control codes. Certain East Asian languages, specifically Chinese, Japanese, and Korean (collectively "CJK"), are written using far more characters than
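Whether a single-byte code page leaves 0x80–0x9F free for control codes can be probed heuristically with Python's codecs. In the sketch below, `reserves_c1_range` is a hypothetical name. It treats a codec as reserving the range if every byte there decodes to a Unicode control character (category Cc), and treats a byte the codec cannot decode as not reserved.

```python
import unicodedata


def reserves_c1_range(codec: str) -> bool:
    """Heuristic: do bytes 0x80-0x9F decode to C1 control characters?"""
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            return False  # undecodable byte: not treated as a reserved control
        if unicodedata.category(ch) != "Cc":
            return False  # a graphic character occupies the C1 range
    return True


if __name__ == "__main__":
    for name in ("latin-1", "cp437"):
        print(name, reserves_c1_range(name))
```

With these two codecs the check reproduces the distinction drawn above. ISO 8859-1 decodes the range to U+0080–U+009F, while code page 437 places graphic characters such as Ç at 0x80.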

The 0x20/A0 and 0x7F/FF bytes are actually assigned by the set; some examples of graphical character sets which are registered as 96-sets but do not use those bytes include the G1 set of I.S. 434, the box drawing set from ISO/IEC 10367, and ISO-IR-164 (a subset of the G1 set of ISO-8859-8 with only the letters, used by CCITT). Characters are expected to be spacing characters, not combining characters, unless specified otherwise by

The ASA was reorganized and became the United States of America Standards Institute (USASI). The present name was adopted in 1969. Prior to 1918, the five founding engineering societies had been members of the United Engineering Society (UES). At the behest of the AIEE, they invited the U.S. government Departments of War, Navy (combined in 1947, later named the Department of Defense or DOD) and Commerce to join in founding

The C0 control codes (narrowly defined) are excluded, this can be expanded to 96 characters. Using two bytes, it is thus possible to represent up to 8,836 (94×94) characters; and, using three bytes, up to 830,584 (94×94×94) characters. Though the standard defines it, no registered character set uses three bytes (although EUC-TW's unregistered G2 does, as does the similarly unregistered CCCII). For
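The capacities quoted above follow from raising the 94 usable values per byte to the number of bytes per character:

```latex
\begin{aligned}
94^2 &= 8\,836 \\
94^3 &= 94 \times 8\,836 = 830\,584
\end{aligned}
```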

The C0 set, besides the ten included by ISO 6429 / ECMA-48 (namely SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN and ETB), or inclusion of any of those ten in the C1 set, is also prohibited by the ISO/IEC 2022 / ECMA-35 standard. A C0 control set is invoked over the CL range 0x00 through 0x1F, whereas a C1 control function may be invoked over the CR range 0x80 through 0x9F (in an 8-bit environment) or by using escape sequences (in

The CR range always either invokes the secondary (C1) controls or is unused. The delete character DEL (0x7F), the escape character ESC (0x1B) and the space character SP (0x20) are designated "fixed" coded characters and are always available when G0 is invoked over GL, irrespective of what character sets are designated. They may not be included in graphical character sets, although other sizes or types of whitespace character may be. Sequences using

The ESC (escape) character take the form ESC [ I ...] F, where the ESC character is followed by zero or more intermediate bytes (I) from the range 0x20–0x2F, and one final byte (F) from the range 0x30–0x7E. The first I byte, or absence thereof, determines the type of escape sequence; it might, for instance, designate a working set, or denote a single control function. In all types of escape sequences, F bytes in

The ESC (escape) control character at 0x1B (a C0 set containing only ESC is registered as ISO-IR-104), whereas a C1 control set may not contain the escape control whatsoever. Hence, they are entirely separate registrations, with a C0 set being only a C0 set and a C1 set being only a C1 set. If codes from the C0 set of ISO 6429 / ECMA-48, i.e. the ASCII control codes, appear in the C0 set, they are required to appear at their ISO 6429 / ECMA-48 locations. Inclusion of transmission control characters in

The ESC control character is followed by a byte from columns 04 or 05 (that is to say, ESC 0x40 (@) through ESC 0x5F (_)). Additional control functions are assigned to "type Fs" escape sequences (in the range ESC 0x60 (`) through ESC 0x7E (~)); these have permanently assigned meanings rather than depending on the C0 or C1 designations. Registration of control functions to type "Fs" sequences must be approved by ISO/IEC JTC 1/SC 2. Other single control functions may be registered to type "3Ft" escape sequences (in
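Columns 08–09 (the CR range) and columns 04–05 differ by a fixed offset. A C1 byte 0x80 + n corresponds to ESC followed by 0x40 + n, so 0x9B (CSI) becomes ESC [ and 0x8E (SS2) becomes ESC N. A minimal sketch, with hypothetical helper names:

```python
ESC = 0x1B


def c1_to_fe(byte: int) -> bytes:
    """Rewrite an 8-bit C1 control (0x80-0x9F) as its 7-bit 'type Fe' escape sequence."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError(f"0x{byte:02X} is not in the CR range 0x80-0x9F")
    return bytes([ESC, byte - 0x40])


def fe_to_c1(seq: bytes) -> int:
    """Inverse mapping: ESC followed by 0x40-0x5F gives the C1 byte value."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError(f"{seq!r} is not a type Fe escape sequence")
    return seq[1] + 0x40


def c1_bytes_to_escapes(data: bytes) -> bytes:
    """Replace every byte in 0x80-0x9F with ESC Fe; other bytes (including GR) are untouched.

    Assumes every 0x80-0x9F byte in data really is a C1 control.
    """
    out = bytearray()
    for b in data:
        out += c1_to_fe(b) if 0x80 <= b <= 0x9F else bytes([b])
    return bytes(out)


if __name__ == "__main__":
    print(c1_to_fe(0x9B), c1_to_fe(0x8E))  # b'\x1b[' b'\x1bN'
```

Only one representation may be used in a given code version, so a converter like this would be applied to a whole stream rather than mixed.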

The ISO and the IEC, and administers many key committees and subgroups. In many instances, U.S. standards are taken forward to ISO and IEC, through ANSI or the USNC, where they are adopted in whole or in part as international standards. Adoption of ISO and IEC standards as American standards increased from 0.2% in 1986 to 15.5% in May 2012. The Institute administers nine standards panels: Each of

The ISO-IR registry is specified by ISO/IEC 2375. Each registration receives a unique escape sequence, and a unique registry entry number to identify it. For example, the CCITT character set for Simplified Chinese is known as ISO-IR-165. Registration of coded character sets with the ISO-IR registry identifies the documents specifying the character set or control function associated with an ISO/IEC 2022 non-private-use escape sequence. This may be

The ISO/IEC 2022 / ECMA-35 standard itself. They may be described elsewhere using hexadecimal, as is often used in this article, or using the corresponding ASCII characters, although the escape sequences are actually defined in terms of byte values, and the graphic assigned to that byte value may be altered without affecting the control sequence. Byte values from the 7-bit ASCII graphic range (hexadecimal 0x20–0x7F), being on

The Japanese JIS X 0208) so as to use multiple character sets in a single document, effectively combining them into a single stateful encoding (a feature less important since the advent of Unicode). It is designed to be usable in both 8-bit environments and 7-bit environments (those where only seven bits are usable in a byte, such as e-mail without 8BITMIME). The ASCII character set supports the ISO Basic Latin alphabet (equivalent to

The U.S. National Committee of the International Electrotechnical Commission (IEC), which had been formed in 1904 to develop electrical and electronics standards. ANSI's members are government agencies, organizations, academic and international bodies, and individuals. In total, the Institute represents the interests of more than 270,000 companies and organizations and 30 million professionals worldwide. ANSI's market-driven, decentralized approach has been criticized in comparison with more planned and organized international approaches to standardization. An underlying issue

The adoption of international standards as national standards where appropriate. The institute is the official U.S. representative to the two major international standards organizations, the International Organization for Standardization (ISO), as a founding member, and the International Electrotechnical Commission (IEC), via the U.S. National Committee (USNC). ANSI participates in almost the entire technical program of both

The basis that it leaves the graphical character repertoire undefined. ISO/IEC 4873 / ECMA-43 does, however, permit the use of the GCC function provided that the sequence of characters is kept the same and merely displayed in one space, rather than being over-stamped to form a character with a different meaning. Control character sets are classified as "primary" or "secondary" control code sets, respectively also called "C0" and "C1" control code sets. A C0 control set must contain

The characteristics and performance of products are consistent, that people use the same definitions and terms, and that products are tested the same way. ANSI also accredits organizations that carry out product or personnel certification in accordance with requirements defined in international standards. The organization's headquarters are in Washington, D.C. ANSI's operations office is located in New York City. The ANSI annual operating budget

The concept of C0 and C1 control codes from ISO 2022, although it adds other non-printing characters besides the ISO 2022 control codes. However, Unicode transformation formats such as UTF-8 generally deviate from the ISO 2022 structure in various ways, including: ISO 2022 escape sequences do, however, exist for switching to and from UTF-8 as a "coding system different from that of ISO 2022", which are supported by certain terminal emulators such as xterm. ISO/IEC 2022 specifies

The contemporary version of the standard allowed multi-byte sets only in G0, so must be accepted in place of the sequences ESC $ ( @ through ESC $ ( B to designate to the G0 character set. There are additional (rarely used) features for switching control character sets, but this is a single-level lookup, in that (as noted above) the C0 set is always invoked over CL, and the C1 set is always invoked over CR or by using escape codes. As noted above, it

The designation or other function which they perform is below. Note that the registry of F bytes is independent for the different types. The 94-character graphic set designated by ESC ( A through ESC + A is not related in any way to the 96-character set designated by ESC - A through ESC / A. And neither of those is related to the 94ⁿ-character set designated by ESC $ ( A through ESC $ + A, and so on;
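The following sketch shows how a decoder might read a graphic-set designation sequence, following the layout described in this article. The first I byte (after an optional $ for multi-byte sets) selects the working set and whether the set has 94 or 96 characters. Any further I bytes extend the F byte range, with a leading 0x20 marking a DRCS, and an F byte in 0x30–0x3F marks a private-use set. The introducer table is an assumption drawn from the examples quoted here (ESC ( through ESC + for 94-character sets, ESC - through ESC / for 96-character sets). `parse_designation` and `Designation` are hypothetical names.

```python
from typing import NamedTuple

ESC = 0x1B
# First I byte -> (working set, characters per byte) for graphic set designation.
INTRODUCERS = {
    0x28: ("G0", 94), 0x29: ("G1", 94), 0x2A: ("G2", 94), 0x2B: ("G3", 94),  # ( ) * +
    0x2D: ("G1", 96), 0x2E: ("G2", 96), 0x2F: ("G3", 96),                    # - . /
}


class Designation(NamedTuple):
    working_set: str  # "G0" to "G3"
    kind: str         # "94", "96", "94^n" or "96^n"
    final: str        # the F byte, as a character
    extra: bytes      # additional I bytes between the introducer and F
    private: bool     # F in 0x30-0x3F: meaning defined by prior agreement
    drcs: bool        # additional I bytes start with 0x20 (SP)


def parse_designation(seq: bytes) -> Designation:
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not an escape sequence")
    body = seq[1:]
    multi = body[0] == 0x24  # '$' marks a multi-byte set
    if multi:
        body = body[1:]
        # Legacy short forms ESC $ @, ESC $ A and ESC $ B designate to G0.
        if len(body) == 1 and body[0] in b"@AB":
            return Designation("G0", "94^n", chr(body[0]), b"", False, False)
    if len(body) < 2 or body[0] not in INTRODUCERS:
        raise ValueError("not a graphic set designation")
    working_set, size = INTRODUCERS[body[0]]
    extra, final = body[1:-1], body[-1]
    if any(not 0x20 <= b <= 0x2F for b in extra) or not 0x30 <= final <= 0x7E:
        raise ValueError("malformed intermediate or final byte")
    kind = f"{size}^n" if multi else str(size)
    return Designation(working_set, kind, chr(final), extra,
                       0x30 <= final <= 0x3F, extra[:1] == b" ")


if __name__ == "__main__":
    for s in (b"\x1b(B", b"\x1b-A", b"\x1b$B", b"\x1b$)A", b"\x1b(0"):
        print(s, parse_designation(s))
```

Because the F byte registry is independent per type, the sketch returns the F byte together with the set kind rather than looking it up on its own.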

The escape sequences listed below, whereas the others are part of a C0 or C1 control code set (as shown below, SI (LS0) and SO (LS1) are C0 controls and SS2 and SS3 are C1 controls), meaning that their coding and availability may vary depending on which control sets are designated: they must be present in the designated control sets if their functionality is used. The C1 controls themselves, as mentioned above, may be represented using escape sequences or 8-bit bytes, but not both. Alternative encodings of

The following non-standard code points: The Barcodesoft implementation of OCR-A has the following non-standard code points: The Morovia implementation of OCR-A has the following non-standard code points: The IDAutomation implementation of OCR-A has the following non-standard code points: The MS-DOS OCR-A encoding is code page 876. Characters not in Unicode:

American National Standards Institute

The American National Standards Institute (ANSI /ˈænsi/ AN-see)

The following: A specific implementation does not have to implement all of the standard; the conformance level and the supported character sets are defined by the implementation. Although many of the mechanisms defined by the ISO/IEC 2022 standard are infrequently used, several established encodings are based on a subset of the ISO/IEC 2022 system. In particular, 7-bit encoding systems using ISO/IEC 2022 mechanisms include ISO-2022-JP (or JIS encoding), which has primarily been used in Japanese-language e-mail. 8-bit encoding systems conforming to ISO/IEC 2022 include ISO/IEC 4873 (ECMA-43), which

The form ESC ( ! F have been assigned. At the other extreme, no multibyte 96-sets have been registered, so the sequences below are strictly theoretical. As with other escape sequence types, the range 0x30–0x3F is reserved for private-use F bytes, in this case for private-use character set definitions (which might include unregistered sets defined by protocols such as ARIB STD-B24 or MARC-8, or vendor-specific sets such as DEC Special Graphics). However, in

The graphical set in question. ISO 2022 / ECMA-35 also recognizes the use of the backspace and carriage return control characters as means of combining otherwise spacing characters, as well as the CSI sequence "Graphic Character Combination" (GCC) (CSI 0x20 (SP) 0x5F (_)). Use of the backspace and carriage return in this manner is permitted by ISO/IEC 646 but prohibited by ISO/IEC 4873 / ECMA-43 and by ISO/IEC 8859, on

The institute's requirements for openness, balance, consensus, and due process. ANSI also designates specific standards as American National Standards, or ANS, when the Institute determines that the standards were developed in an environment that is equitable, accessible and responsive to the requirements of various stakeholders. Voluntary consensus standards quicken the market acceptance of products while making clear how to improve

The left side of a character code table, are referred to as "GL" codes (with "GL" standing for "graphics left") while bytes from the "high ASCII" range (0xA0–0xFF), if available (i.e. in an 8-bit environment), are referred to as the "GR" codes ("graphics right"). The terms "CL" (0x00–0x1F) and "CR" (0x80–0x9F) are defined for the control ranges, but the CL range always invokes the primary (C0) controls, whereas
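The four regions, and the bytes that a 94- or 96-character set occupies once invoked over GL or GR, can be summarised in a few lines. This is a sketch based on the ranges given in this article; the function names are hypothetical.

```python
def byte_region(b: int) -> str:
    """Name the code-table region (CL, GL, CR or GR) that a byte value falls in."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x1F:
        return "CL"  # always the primary (C0) control set
    if b <= 0x7F:
        return "GL"  # 0x20 and 0x7F are SP and DEL unless a 96-set is invoked here
    if b <= 0x9F:
        return "CR"  # secondary (C1) controls, or unused
    return "GR"      # only available in 8-bit environments


def graphic_bytes(set_size: int, region: str) -> range:
    """Byte values a 94- or 96-character set uses when invoked over GL or GR."""
    base = {"GL": 0x20, "GR": 0xA0}[region]
    if set_size == 94:
        return range(base + 0x01, base + 0x5F)  # e.g. 0x21-0x7E: SP/DEL positions unused
    if set_size == 96:
        return range(base, base + 0x60)         # e.g. 0x20-0x7F: all 96 positions
    raise ValueError("set_size must be 94 or 96")


if __name__ == "__main__":
    print([byte_region(b) for b in "a\u00e9".encode("latin-1")])  # ['GL', 'GR']
```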

The maximum of 256 which can be represented in a single byte, and were first represented on computers with language-specific double-byte encodings or variable-width encodings; some of these (such as the Simplified Chinese encoding GB 2312) conform to ISO 2022, while others (such as the Traditional Chinese encoding Big5) do not. Control codes in ISO 2022 are always represented with

The panels works to identify, coordinate, and harmonize voluntary standards relevant to these areas. In 2009, ANSI and the National Institute of Standards and Technology (NIST) formed the Nuclear Energy Standards Coordination Collaborative (NESCC). NESCC is a joint initiative to identify and respond to the current need for standards in the nuclear industry.

ISO/IEC 2022 Information technology – Character code structure and extension techniques,

The patterns defined within the standard. Character encodings making use of these escape sequences require data to be processed sequentially in a forward direction, since the correct interpretation of the data depends on previously encountered escape sequences. Specific profiles such as ISO-2022-JP may impose extra conditions, such as that the current character set is reset to US-ASCII before the end of
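Python's standard `iso2022_jp` codec can be used to observe both points. The meaning of each byte depends on the last designation seen, and the encoder designates ASCII again (ESC ( B) before a line feed. The helper below is a hypothetical sketch. It walks the bytes forward and records which G0 designation is in effect at each line end, assuming (as in ISO-2022-JP) that every designation is three bytes long and that the stream starts in ASCII.

```python
def g0_at_line_ends(data: bytes) -> list[bytes]:
    """Walk an ISO-2022-JP byte stream forward and report, for each line feed,
    the G0 designation sequence in effect just before it."""
    current = b"\x1b(B"  # assumption: the stream starts with ASCII designated to G0
    states = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            # ISO-2022-JP designations (ESC ( B, ESC ( J, ESC $ @, ESC $ B) are 3 bytes.
            current = data[i:i + 3]
            i += 3
            continue
        if data[i] == 0x0A:
            states.append(current)
        i += 1
    return states


if __name__ == "__main__":
    encoded = "abc あ\nxyz\n".encode("iso2022_jp")
    print(encoded)
    print(g0_at_line_ends(encoded))
```

Every entry of the result should be ESC ( B for conforming output. A stream that ended a line while still in JIS X 0208 would show ESC $ B instead.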

The point where such simple fonts are no longer necessary, the OCR-A font has remained in use. Its usage remains widespread in the encoding of checks around the world. Some lock box companies still insist that the account number and amount owed on a bill return form be printed in OCR-A. Also, because of its unusual look, it is sometimes used in advertising and display graphics. Notably, it

The range ESC 0x23 (#) [ I ...] 0x40 (@) through ESC 0x23 (#) [ I ...] 0x7E (~)), although no "3Ft" sequences are currently assigned (as of 2019). Some of these are specified in ECMA-35 (ISO 2022 / ANSI X3.41), others in ECMA-48 (ISO 6429 / ANSI X3.64). ECMA-48 refers to these as "independent control functions". Escape sequences of type "Fp" (ESC 0x30 (0) through ESC 0x3F (?)) or of type "3Fp" (ESC 0x23 (#) [ I ...] 0x30 (0) through ESC 0x23 (#) [ I ...] 0x3F (?)) are reserved for single private use control codes, by prior agreement between parties. Several such sequences of both types are used by DEC terminals such as
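The escape-sequence categories described here depend only on the I bytes and the F byte, so they can be told apart mechanically. In the sketch below (hypothetical names), `split_escape` enforces the general ESC [I ...] F shape, and `escape_type` applies the Fp/Fe/Fs and 3Fp/3Ft ranges. Any other sequence with an I byte, such as a character set designation, is given the catch-all label "other", which is not a term from the standard. ESC 7 and ESC # 8 in the example are commonly documented VT100 sequences (save cursor and screen alignment test).

```python
ESC = 0x1B


def split_escape(seq: bytes) -> tuple[bytes, int]:
    """Split ESC I... F into (intermediate bytes, final byte), checking the ranges."""
    if len(seq) < 2 or seq[0] != ESC:
        raise ValueError("escape sequences start with ESC (0x1B)")
    intermediates, final = seq[1:-1], seq[-1]
    if any(not 0x20 <= b <= 0x2F for b in intermediates):
        raise ValueError("I bytes must be in 0x20-0x2F")
    if not 0x30 <= final <= 0x7E:
        raise ValueError("F byte must be in 0x30-0x7E")
    return intermediates, final


def escape_type(seq: bytes) -> str:
    intermediates, final = split_escape(seq)
    if not intermediates:
        if final <= 0x3F:
            return "Fp"  # private use
        if final <= 0x5F:
            return "Fe"  # C1 control in 7-bit form
        return "Fs"      # independent control function
    if intermediates[0] == 0x23:  # '#'
        return "3Fp" if final <= 0x3F else "3Ft"
    return "other"  # catch-all label, e.g. designations


if __name__ == "__main__":
    for example in (b"\x1b[", b"\x1bc", b"\x1b7", b"\x1b#8", b"\x1b(B"):
        print(example, escape_type(example))
```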

The range 0x20–0x2F, then by a single byte in the range 0x40–0x7E, the entire sequence being called a "control sequence". Each of the four working sets G0 through G3 may be a 94-character set or a 94ⁿ-character multi-byte set. Additionally, G1 through G3 may be a 96- or 96ⁿ-character set. In a 96- or 96ⁿ-character set, the bytes 0x20 through 0x7F when GL-invoked, or 0xA0 through 0xFF when GR-invoked, are allocated to and may be used by

The range 0x30–0x3F are reserved for unregistered private uses defined by prior agreement between parties. Control functions from some sets may make use of further bytes following the escape sequence proper. For example, the ISO 6429 control function "Control Sequence Introducer", which can be represented using an escape sequence, is followed by zero or more bytes in the range 0x30–0x3F, then zero or more bytes in
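This article gives the byte ranges for an ISO 6429 control sequence: CSI, then parameter bytes 0x30–0x3F, intermediate bytes 0x20–0x2F and one final byte 0x40–0x7E. With those ranges, control sequences can be located with a regular expression. The sketch below accepts CSI either as ESC [ (its type Fe form) or as the 8-bit C1 byte 0x9B. `find_control_sequences` is a hypothetical name.

```python
import re

# CSI as ESC [ (7-bit) or 0x9B (8-bit), then parameters, intermediates and a final byte.
CONTROL_SEQUENCE = re.compile(rb"(?:\x1b\[|\x9b)([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])")


def find_control_sequences(data: bytes) -> list[tuple[bytes, bytes, str]]:
    """Return (parameter bytes, intermediate bytes, final character) for each match."""
    return [(m.group(1), m.group(2), chr(m.group(3)[0]))
            for m in CONTROL_SEQUENCE.finditer(data)]


if __name__ == "__main__":
    print(find_control_sequences(b"\x1b[1;31mred\x1b[0m"))
    # [(b'1;31', b'', 'm'), (b'0', b'', 'm')]
```

The GCC sequence mentioned in this article (CSI 0x20 0x5F) is parsed as an empty parameter string, one intermediate byte (SP) and the final byte `_`.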

The remaining characters of OCR-A as follows: The fonts that descend from the work of Tor Lillqvist and Richard B. Wales define four characters not in OCR-A to fill out the ASCII character set. These shapes use the same style as the OCR-A character shapes. They are: Linotype also defines additional characters. Some implementations do not use the above code point assignments for some characters. The PrecisionID implementation of OCR-A has

The safety of those products for the protection of consumers. There are approximately 9,500 American National Standards that carry the ANSI designation. The American National Standards process involves: In addition to facilitating the formation of standards in the United States, ANSI promotes the use of U.S. standards internationally, advocates U.S. policy and technical positions in international and regional standards organizations, and encourages

The same pair of C0 control characters (0x0F and 0x0E) as the names "shift in" (SI) and "shift out" (SO). However, the standard refers to them as LS0 and LS1 when they are used in 8-bit environments and as SI and SO when they are used in 7-bit environments. The ISO/IEC 2022 / ECMA-35 standard permits, but discourages, invoking G1, G2 or G3 in both GL and GR simultaneously. The ISO International register of coded character sets to be used with escape sequences (ISO-IR) lists graphical character sets, control code sets, single control codes and so forth which have been registered for use with ISO/IEC 2022. The procedure for registering codes and sets with

The set is single-byte or multi-byte (although not how many bytes it uses if it is multi-byte), and also whether each byte has 94 or 96 permitted values. ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. Escape sequences allow any of a large registry of graphic character sets to be "designated" into one of four working sets, named G0 through G3, and shorter control sequences specify

The set. In a 94- or 94ⁿ-character set, the bytes 0x20 and 0x7F are not used. When a 96- or 96ⁿ-character set is invoked in the GL region, the space and delete characters (codes 0x20 and 0x7F) are not available until a 94- or 94ⁿ-character set (such as the G0 set) is invoked in GL. 96-character sets cannot be designated to G0. Registration of a set as a 96-character set does not necessarily mean that

The sets (e.g. JIS X 0201), although some instead have GR codes specifying G2 characters, with the corresponding 7-bit code using a single-shift code to access the second set (e.g. T.51). The codes shown in the table below are the most common encodings of these control codes, conforming to ISO/IEC 6429. The LS2, LS3, LS1R, LS2R and LS3R shifts are registered as single control functions and are always encoded as

The single-shift area. This must be specified in the definition of the code version. For instance, ISO/IEC 4873 specifies GL, whereas packed EUC specifies GR. In 7-bit environments, only GL is used as the single-shift area. If necessary, which single-shift area is used may be communicated using announcer sequences. The names "locking shift zero" (LS0) and "locking shift one" (LS1) refer to

The single-shifts as C0 control codes are available in certain control code sets. For example, SS2 and SS3 are usually available at 0x19 and 0x1D respectively in T.51 and T.61. This coding is currently recommended by ISO/IEC 2022 / ECMA-35 for applications requiring 7-bit single-byte representations of SS2 and SS3, and may also be used for SS2 only, although older code sets with SS2 at 0x1C also exist, and were mentioned as such in an earlier edition of

The standard. The 0x8E and 0x8F coding of the single shifts as shown below is mandatory for ISO/IEC 4873 levels 2 and 3. Although officially considered shift codes and named accordingly, single-shift codes are not always viewed as shifts, and they may simply be viewed as prefix bytes (i.e. the first bytes in a multi-byte sequence), since they do not require the encoder to keep the currently active set as state, unlike locking shift codes. In 8-bit environments, either GL or GR, but not both, may be used as
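The difference between locking and single shifts shows up in a small routine that tags each byte of a 7-bit stream with the working set that interprets it. This is a sketch under stated assumptions: all sets are single-byte, the stream contains no designation sequences, SO and SI act as LS1 and LS0, and SS2 and SS3 appear in their 7-bit escape forms ESC N and ESC O (0x8E and 0x8F minus 0x40). `invoke_7bit` is a hypothetical name.

```python
SO, SI, ESC = 0x0E, 0x0F, 0x1B
SINGLE_SHIFTS = {ord("N"): "G2", ord("O"): "G3"}  # SS2 = ESC N, SS3 = ESC O


def invoke_7bit(data: bytes) -> list[tuple[str, int]]:
    """Tag each graphic byte with the working set that interprets it.

    Assumes single-byte sets only and no designation escapes in data.
    """
    gl = "G0"  # locking-shift state: persists until the next SO/SI
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SO:    # SO (LS1): lock G1 into GL
            gl = "G1"
        elif b == SI:  # SI (LS0): lock G0 into GL
            gl = "G0"
        elif b == ESC and i + 2 < len(data) and data[i + 1] in SINGLE_SHIFTS:
            # Single shift: only the next byte comes from G2/G3; gl is not changed.
            out.append((SINGLE_SHIFTS[data[i + 1]], data[i + 2]))
            i += 3
            continue
        elif 0x20 <= b <= 0x7F:
            out.append((gl, b))
        i += 1
    return out


if __name__ == "__main__":
    print(invoke_7bit(b"A\x0eB\x0fC\x1bND"))
```

Only the locking shifts update `gl`. The single-shift branch reads one byte and leaves the state alone, which is why single shifts can be treated as prefix bytes.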

The string ISO 646-1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0 can be used to identify the International Reference Version of ISO 646-1983, and the HTML 4.01 specification uses ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6 to identify Unicode. The textual representation of the escape sequence, included in

The third element of the FPI, will be recognised by SGML implementations for supported character sets. Escape sequences to designate character sets take the form ESC I [ I ...] F. As mentioned above, the intermediate (I) bytes are from the range 0x20–0x2F, and the final (F) byte is from the range 0x30–0x7E. The first I byte (or, for a multi-byte set, the first two) identifies

The two-byte character sets, the code point of each character is normally specified in so-called row-cell or kuten form, which comprises two numbers between 1 and 94 inclusive, specifying a row and cell of that character within the zone. For a three-byte set, an additional plane number is included at the beginning. The escape sequences do not only declare which character set is being used, but also whether
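Because each row and cell number runs from 1 to 94, a two-byte code can be obtained by adding 0x20 to each number. That gives 0x21–0x7E, the bytes of a GL-invoked 94-character set. Adding 0xA0 instead gives 0xA1–0xFE for GR. The sketch below uses hypothetical helper names. It checks itself against Python's `euc_jp` (GR) and `iso2022_jp` (GL) codecs, on the assumption that JIS X 0208 row 4, cell 2 is the hiragana あ.

```python
def kuten_to_bytes(row: int, cell: int, gr: bool = False) -> bytes:
    """Encode a row-cell (kuten) pair of a 94x94 set as two bytes in GL or GR."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    offset = 0xA0 if gr else 0x20  # row/cell 1 lands on 0x21 (GL) or 0xA1 (GR)
    return bytes([row + offset, cell + offset])


def bytes_to_kuten(pair: bytes) -> tuple[int, int]:
    """Inverse of kuten_to_bytes; accepts either the GL or the GR form."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    row, cell = ((b & 0x7F) - 0x20 for b in pair)
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("bytes outside the 94x94 grid")
    return row, cell


if __name__ == "__main__":
    # JIS X 0208 row 4, cell 2 (HIRAGANA LETTER A), compared with Python's codecs.
    print("あ".encode("euc_jp") == kuten_to_bytes(4, 2, gr=True))
    print("あ".encode("iso2022_jp") == b"\x1b$B" + kuten_to_bytes(4, 2) + b"\x1b(B")
```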

The type of character set and the working set it is to be designated to, whereas the F byte (and any additional I bytes) identify the character set itself, as assigned in the ISO-IR register (or, for the private-use escape sequences, by prior agreement). Additional I bytes may be added before the F byte to extend the F byte range. This is currently only used with 94-character sets, where codes of

The usual character coding was the American Standard Code for Information Interchange or ASCII. Not all of the glyphs of OCR-A fit into ASCII, and for five of the characters there were alternate glyphs, which might have suggested the need for a second font. However, for convenience and efficiency all of the glyphs were expected to be accessible in a single font using ASCII coding, with the additional characters placed at coding points that would otherwise have been unused. The modern descendant of ASCII

The working set that is "invoked" to interpret bytes in the stream. Encoding byte values ("bit combinations") are often given in column-line notation, where two decimal numbers in the range 00–15 (each corresponding to a single hexadecimal digit) are separated by a slash. Hence, for instance, codes 2/0 (0x20) through 2/15 (0x2F) inclusive may be referred to as "column 02". This is the notation used in
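Column/line notation writes the high and low nibble of a byte in decimal. The sketch below uses hypothetical helper names. It includes a parser for strings such as the ESC 2/5 4/0 found in the SGML public identifiers quoted in this article, with ESC written by name.

```python
def to_column_line(byte: int) -> str:
    """Format a byte value in column/line notation, e.g. 0x2F -> '2/15'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    return f"{byte >> 4}/{byte & 0x0F}"


def from_column_line(text: str) -> int:
    """Parse '2/15' or '02/15' back into a byte value."""
    column, line = (int(part) for part in text.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"bad column/line pair: {text!r}")
    return (column << 4) | line


def escape_from_notation(text: str) -> bytes:
    """Turn text such as 'ESC 2/5 4/0' into raw bytes (ESC written by name)."""
    return bytes(0x1B if tok == "ESC" else from_column_line(tok) for tok in text.split())


if __name__ == "__main__":
    print(to_column_line(0x2F))                   # 2/15
    print(escape_from_notation("ESC 2/5 4/0"))    # b'\x1b%@'
```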
