Misplaced Pages

EBCDIC

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.


56-407: Extended Binary Coded Decimal Interchange Code (EBCDIC; /ˈɛbsɪdɪk/) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s. It

112-717: A binit as an arbitrary information unit equivalent to some fixed but unspecified number of bits. C0 and C1 control codes#C0 controls C0 codes are the range 00 HEX –1F HEX and the default C0 set was originally defined in ISO 646 ( ASCII ). C1 codes are the range 80 HEX –9F HEX and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used. ASCII defined 32 control characters, plus

168-409: A byte or word , is referred to, it is usually specified by a number from 0 upwards corresponding to its position within the byte or word. However, 0 can refer to either the most or least significant bit depending on the context. Similar to torque and energy in physics; information-theoretic information and data storage size have the same dimensionality of units of measurement , but there

224-509: A unit of information , the bit is also known as a shannon , named after Claude E. Shannon . The symbol for the binary digit is either "bit", per the IEC 80000-13 :2008 standard, or the lowercase character "b", per the IEEE 1541-2002 standard. Use of the latter may create confusion with the capital "B" which is the international standard symbol for the byte. The encoding of data by discrete bits

280-446: A "control picture" for any of these. There is also no well-known variation of Caret notation for them either. Some terminal emulators , including xterm , use OSC sequences for setting the window title and changing the colour palette. They may also support terminating an OSC sequence with BEL instead of ST. Kermit used APC to transmit commands. The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change

336-559: A 7-bit environment, thus it was decided that no alternative character set could use them, and that these codes should be additional control codes, which become known as the C1 control codes . To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent. The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters. The first C1 control code set to be registered for use with ISO 2022
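The equivalence between the 7-bit sequences ESC @ through ESC _ and the 8-bit C1 range can be illustrated with a small Python sketch. It assumes the conventional offset of 0x40 between the final byte of the escape sequence and the C1 byte; the function names are hypothetical and chosen only for this illustration.

```python
ESC = 0x1B  # the C0 Escape control code

def c1_from_escape(final_byte: int) -> int:
    """Map the final byte of a 7-bit ESC sequence ('@'..'_') to its 8-bit C1 code."""
    if not 0x40 <= final_byte <= 0x5F:
        raise ValueError("final byte must be between '@' (0x40) and '_' (0x5F)")
    return final_byte + 0x40

def escape_from_c1(c1_byte: int) -> bytes:
    """Map an 8-bit C1 code (0x80..0x9F) to the equivalent two-byte ESC sequence."""
    if not 0x80 <= c1_byte <= 0x9F:
        raise ValueError("C1 codes lie in the range 0x80 to 0x9F")
    return bytes([ESC, c1_byte - 0x40])
```

For example, `escape_from_c1(0x9B)` yields the two bytes ESC and `[`, the 7-bit form of CSI that terminals commonly accept.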

392-622: A Belgian bank was still using EBCDIC internally in 2019. A customer insisted that the correct spelling of his surname included an umlaut , which the bank omitted, and the customer filed a complaint citing the guarantee in the General Data Protection Regulation of the right to timely "rectification of inaccurate personal data." The bank's argument included the fact that their system used EBCDIC, as well as that it did not support letters with diacritics (or lower case, for that matter). The appeals court ruled in favor of

448-482: A Bell Labs memo on 9 January 1947 in which he contracted "binary information digit" to simply "bit". A bit can be stored by a digital device or other physical system that exists in either of two possible distinct states . These may be the two stable states of a flip-flop , two positions of an electrical switch , two distinct voltage or current levels allowed by a circuit , two distinct levels of light intensity , two directions of magnetization or polarization ,

504-429: A bit was represented by the polarity of magnetization of a certain area of a ferromagnetic film, or by a change in polarity from one direction to the other. The same principle was later used in the magnetic bubble memory developed in the 1980s, and is still found in various magnetic strip items such as metro tickets and some credit cards . In modern semiconductor memory , such as dynamic random-access memory ,

560-620: A manner specified by IBM's Character Data Representation Architecture (CDRA). Although the default mapping of New Line (NL) corresponds to the ISO/IEC 6429 Next Line (NEL) character (the behaviour of which is also specified, but not required, in Unicode Annex 14), most of these C1-mapped controls match neither those in the ISO/IEC 6429 C1 set , nor those in other registered C1 control sets such as ISO 6630 . Although this effectively makes
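The default mapping of EBCDIC New Line (NL) to Next Line (NEL, U+0085) can be observed with Python's built-in `cp037` codec. This is a minimal sketch that assumes code page 37, where NL is byte 0x15; the helper name and sample record are made up for illustration.

```python
EBCDIC_NL = 0x15  # New Line in code page 37 (assumed code page for this sketch)

def decode_cp037(data: bytes) -> str:
    """Decode EBCDIC code page 37 bytes to a Python (Unicode) string."""
    return data.decode("cp037")

# Two short records separated by an EBCDIC NL byte.
record = "LINE1".encode("cp037") + bytes([EBCDIC_NL]) + "LINE2".encode("cp037")
text = decode_cp037(record)

# str.splitlines treats U+0085 (NEL) as a line boundary.
lines = text.splitlines()
```

After decoding, the separator appears as U+0085 rather than U+000A, so code that splits only on `"\n"` would not see two lines here.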

616-554: A method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa . In a 7-bit environment, the Shift Out ( SO ) would change the meaning of the 96 bytes 0x20 through 0x7F (i.e. all but the C0 control codes), to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in


672-479: A necessary extra character for the DEL character, 7F HEX or 01111111 BIN (needed to punch out all the holes on a paper tape and erase it). This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals. Only a few codes have maintained their use: BEL, ESC, and

728-404: A time in serial transmission , and by a multiple number of bits in parallel transmission . A bitwise operation optionally processes bits one at a time. Data transfer rates are usually measured in decimal SI multiples of the unit bit per second (bit/s), such as kbit/s. In the earliest non-electronic information processing devices, such as Jacquard's loom or Babbage's Analytical Engine , a bit

784-486: Is in general no meaning to adding, subtracting or otherwise combining the units mathematically, although one may act as a bound on the other. Units of information used in information theory include the shannon (Sh), the natural unit of information (nat) and the hartley (Hart). One shannon is the maximum amount of information needed to specify the state of one bit of storage. These are related by 1 Sh ≈ 0.693 nat ≈ 0.301 Hart. Some authors also define
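The relation 1 Sh ≈ 0.693 nat ≈ 0.301 Hart follows from changing the base of the logarithm: one shannon equals ln 2 nats and log10 2 hartleys. The sketch below checks these factors numerically; the constant and function names are illustrative only.

```python
import math

# Conversion factors derived from the change of logarithm base.
NATS_PER_SHANNON = math.log(2)        # ln 2
HARTLEYS_PER_SHANNON = math.log10(2)  # log10 2

def shannons_to_nats(shannons: float) -> float:
    """Convert an amount of information from shannons to nats."""
    return shannons * NATS_PER_SHANNON

def shannons_to_hartleys(shannons: float) -> float:
    """Convert an amount of information from shannons to hartleys."""
    return shannons * HARTLEYS_PER_SHANNON
```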

840-554: Is more compressed—the same bucket can hold more. For example, it is estimated that the combined technological capacity of the world to store information provides 1,300 exabytes of hardware digits. However, when this storage space is filled and the corresponding content is optimally compressed, this only represents 295 exabytes of information. When optimally compressed, the resulting carrying capacity approaches Shannon information or information entropy . Certain bitwise computer processor instructions (such as bit set ) operate at

896-541: Is supported by various non-IBM platforms, such as Fujitsu-Siemens ' BS2000/OSD , OS-IV, MSP, and MSP-EX, the SDS Sigma series , Unisys VS/9 , Unisys MCP and ICL VME . EBCDIC was devised in 1963 and 1964 by IBM and was announced with the release of the IBM System/360 line of mainframe computers . It is an eight-bit character encoding, developed separately from the seven-bit ASCII encoding scheme. It

952-506: The Unix info format and Python 's splitlines string method. The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for

1008-410: The yottabit (Ybit). When the information capacity of a storage system or a communication channel is presented in bits or bits per second , this often refers to binary digits, which is a computer hardware capacity to store binary data ( 0 or 1 , up or down, current or not, etc.). Information capacity of a storage system is only an upper bound to the quantity of information stored therein. If

1064-436: The "Format effector " (FE n ) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings such as NUL being the C string terminator . Some data transfer protocols such as ANPA-1312 , Kermit , and XMODEM do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions; and some file formats use the "Information Separators" (IS n ) such as

1120-449: The 1940s, computer builders experimented with a variety of storage methods, such as pressure pulses traveling down a mercury delay line , charges stored on the inside surface of a cathode-ray tube , or opaque spots printed on glass discs by photolithographic techniques. In the 1950s and 1960s, these methods were largely supplanted by magnetic storage devices such as magnetic-core memory , magnetic tapes , drums , and disks , where

1176-430: The 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays, if these codes are encountered, it is far more likely that they are intended to be printing characters from that position of Windows-1252 or Mac OS Roman. Except for NEL, Unicode does not provide


1232-577: The C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @ and the above C1 set is chosen with the sequence ESC " C . Several official and unofficial alternatives have been defined, but these are now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC, SP and DEL "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to

1288-507: The C0 format controls HT, LF, VT, FF, and CR (note BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL. The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default. Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and

1344-576: The EBCDIC character set are made in the 1979 computer game series Zork . In the "Machine Room" in Zork II , EBCDIC is used to imply an incomprehensible language: This is a large room full of assorted heavy machinery, whirring noisily. The room smells of burned resistors. Along one wall are three buttons which are, respectively, round, triangular, and square. Naturally, above these buttons are instructions written in EBCDIC... In 2021, it became public that

1400-605: The EBCDIC variants and how to convert between them is still internally classified top-secret, burn-before-reading. Hackers blanch at the very name of EBCDIC and consider it a manifestation of purest evil. EBCDIC design was also the source of many jokes. One such joke, found in the Unix fortune file of 4.3BSD Reno (1990) went: Professor: "So the American government went to IBM to come up with an encryption standard , and they came up with—" Student: "EBCDIC!" References to

1456-469: The absence of several ASCII punctuation characters fairly important for modern computer languages (exactly which characters are absent varies according to which version of EBCDIC you're looking at). IBM adapted EBCDIC from punched card code in the early 1960s and promulgated it as a customer-control tactic (see connector conspiracy ), spurning the already established ASCII standard. Today, IBM claims to be an open-systems company, but IBM's own description of

1512-407: The absence of use for other purposes), so this mapping is permissible in, but not specified by, Unicode. The following code pages have the full Latin-1 character set (ISO/IEC 8859-1). The first column gives the original code page number. The second column gives the number of the code page updated with the euro sign (€) replacing the universal currency sign (¤) (or in the case of EBCDIC 924, with

1568-409: The ambiguity of relying on the underlying hardware design, the unit octet was defined to explicitly denote a sequence of eight bits. Computers usually manipulate bits in groups of a fixed size, conventionally named " words ". Like the byte, the number of bits in a word also varies with the hardware design, and is typically between 8 and 80 bits, or even more in some specialized computers. In

1624-424: The average. This principle is the basis of data compression technology. Using an analogy, the hardware binary digits refer to the amount of storage space available (like the number of buckets available to store things), and the information content the filling, which comes in different levels of granularity (fine or coarse, that is, compressed or uncompressed information). When the granularity is finer—when information

1680-464: The customer. Bit The bit is the most basic unit of information in computing and digital communication . The name is a portmanteau of binary digit . The bit represents a logical state with one of two possible values . These values are most commonly represented as either " 1 " or " 0 " , but other representations such as true / false , yes / no , on / off , or + / − are also widely used. The relation between these values and

1736-415: The early 21st century, retail personal or server computers have a word size of 32 or 64 bits. The International System of Units defines a series of decimal prefixes for multiples of standardized units which are commonly also used with the bit and the byte. The prefixes kilo (10³) through yotta (10²⁴) increment by multiples of one thousand, and the corresponding units are the kilobit (kbit) through


1792-506: The hardware level, to accelerate translation between character sets. Not all operating systems running on IBM hardware use EBCDIC; IBM AIX, Linux on IBM Z, and Linux on Power all use ASCII, as do all operating systems that run on the IBM Personal Computer and its successors. There were numerous difficulties in writing software that would work in both ASCII and EBCDIC. There are hundreds of EBCDIC code pages based on

1848-791: The integrity of the physical card. While IBM was a chief proponent of the ASCII standardization committee, the company did not have time to prepare ASCII peripherals (such as card punch machines) to ship with its System/360 computers, so the company settled on EBCDIC. The System/360 became wildly successful, together with clones such as RCA Spectra 70, ICL System 4, and Fujitsu FACOM, and so did EBCDIC. All of IBM's mainframe operating systems, and its IBM i operating system for midrange computers, use EBCDIC as their inherent encoding (with toleration for ASCII; for example, ISPF in z/OS can browse and edit both EBCDIC and ASCII encoded files). Software can translate to and from encodings, and modern mainframes (such as IBM Z) include processor instructions, at
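Software translation between EBCDIC and other encodings can be sketched with Python's standard codecs. The example assumes code page 37 (Python's `cp037` codec), which is only one of the many EBCDIC variants; the function names are hypothetical.

```python
def to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode a string into EBCDIC bytes (code page 37 by default)."""
    return text.encode(codepage)

def from_ebcdic(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes (code page 37 by default) back into a string."""
    return data.decode(codepage)

def ebcdic_to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Translate EBCDIC bytes to ASCII bytes; raises if a character has no ASCII form."""
    return from_ebcdic(data, codepage).encode("ascii")

hello = to_ebcdic("HELLO")
# In code page 37 the Latin letters are not contiguous:
# 'I' is 0xC9 but 'J' is 0xD1, and 'R' is 0xD9 but 'S' is 0xE2.
gap_bytes = [to_ebcdic(ch)[0] for ch in "IJRS"]
```

Because the letters sit in three separate runs, code that assumes `ord('Z') - ord('A') == 25` on the raw byte values does not carry over from ASCII to EBCDIC.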

1904-519: The invariant subset works only for languages using the ISO basic Latin alphabet , such as English (excluding loanwords and some uncommon orthographic variations) and Dutch (if the "ij" and "IJ" ligatures are written as two characters). Following are the definitions of EBCDIC control characters which either do not map onto the ASCII control characters , or have additional uses. When mapped to Unicode, these are mostly mapped to C1 control character codepoints in

1960-409: The level of manipulating bits rather than manipulating data interpreted as an aggregate of bits. In the 1980s, when bitmapped computer displays became popular, some computers provided specialized bit block transfer instructions to set or copy the bits that corresponded to a given rectangular area on the screen. In most computers and programming languages, when a bit within a group of bits, such as

2016-656: The non-ASCII EBCDIC controls a unique C1 control set, they are not among the C1 control sets registered in the ISO-IR registry, meaning that they do not have an assigned control set designation sequence (as specified by ISO/IEC 2022 , and optionally permitted in ISO/IEC 10646 (Unicode)). Besides U+0085 (Next Line), the Unicode Standard does not prescribe an interpretation of C1 control characters, leaving their interpretation to higher level protocols (it suggests, but does not require, their ISO/IEC 6429 interpretations in

2072-408: The orientation of reversible double stranded DNA , etc. Bits can be implemented in several forms. In most modern computing devices, a bit is usually represented by an electrical voltage or current pulse, or by the electrical state of a flip-flop circuit. For devices using positive logic , a digit value of 1 (or a logical value of true) is represented by a more positive voltage relative to

2128-467: The original EBCDIC character encoding; there are a variety of EBCDIC code pages intended for use in different parts of the world, including code pages for non-Latin scripts such as Chinese, Japanese (e.g., EBCDIC 930, JEF, and KEIS), Korean, and Greek (EBCDIC 875). There is also a huge number of variations with the letters swapped around for no discernible reason. The table below shows the "invariant subset" of EBCDIC, which are characters that should have

2184-443: The physical states of the underlying storage or device is a matter of convention, and different assignments may be used even within the same device or program . It may be physically implemented with a two-state device. A contiguous group of binary digits is commonly called a bit string , a bit vector, or a single-dimensional (or multi-dimensional) bit array . A group of eight bits is called one  byte , but historically

2240-437: The renamed controls (the old name is the one matching the abbreviation). Unicode provides Control Pictures that can replace C0 control characters to make them visible on screen. However caret notation is used more often. Teletype used these for the paper tape reader and the paper tape punch. The first use became the de facto standard for software flow control . In 1973, ECMA-35 and ISO 2022 attempted to define

2296-517: The representation of 0 . Different logic families require different voltages, and variations are allowed to account for component aging and noise immunity. For example, in transistor–transistor logic (TTL) and compatible circuits, digit values 0 and 1 at the output of a device are represented by no higher than 0.4 V and no lower than 2.6 V, respectively; while TTL inputs are specified to recognize 0.8 V or below as 0 and 2.2 V or above as 1 . Bits are transmitted one at


2352-584: The same assignments on all EBCDIC code pages that use the Latin alphabet. (This includes most of the ISO/IEC 646 invariant repertoire, except the exclamation mark .) It also shows (in gray) missing ASCII and EBCDIC punctuation, located where they are in Code Page 37 (one of the code page variants of EBCDIC). The blank cells are filled with region-specific characters in the variants, but the characters in gray are often swapped around or replaced as well. Like ASCII,

2408-533: The set changed to match ISO 8859-15 ) Different countries have different code pages because these code pages originated as code pages with country-specific character repertoires, and were later expanded to contain the entire ISO 8859-1 repertoire, meaning that a given ISO 8859-1 character may have different code point values in different code pages. They are known as Country Extended Code Pages ( CECP s). Open-source software advocate and software developer Eric S. Raymond writes in his Jargon File that EBCDIC

2464-424: The size of the byte is not strictly defined. Frequently, half, full, double and quadruple words consist of a number of bytes which is a low power of two. A string of four bits is usually a nibble . In information theory , one bit is the information entropy of a random binary variable that is 0 or 1 with equal probability, or the information that is gained when the value of such a variable becomes known. As

2520-443: The standard. It also specifies that if a C0 set included transmission control (TC n ) codes, they must be encoded at their ASCII locations and could not be put in a C1 set, and any new transmission controls must be in a C1 set. Unicode reserves the 65 code points described above for compatibility with the C0 and C1 control codes, giving them the general category Cc (control). These are: Unicode only specifies semantics for

2576-577: The thickness of alternating black and white lines. The bit is not defined in the International System of Units (SI). However, the International Electrotechnical Commission issued standard IEC 60027 , which specifies that the symbol for binary digit should be 'bit', and this should be used in all multiples, such as 'kbit', for kilobit. However, the lower-case letter 'b' is widely used as well and

2632-556: The two possible values of one bit of storage are not equally likely, that bit of storage contains less than one bit of information. If the value is completely predictable, then the reading of that value provides no information at all (zero entropic bits, because no resolution of uncertainty occurs and therefore no information is available). If a computer file that uses n  bits of storage contains only m  <  n  bits of information, then that information can in principle be encoded in about m  bits, at least on
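The claim that an unevenly distributed bit of storage carries less than one bit of information can be checked with the binary entropy function. This is a minimal sketch; `binary_entropy` is an illustrative name, and `p` is taken to be the probability that the stored value is 1.

```python
import math

def binary_entropy(p: float) -> float:
    """Return the entropy, in bits (shannons), of a binary variable that is 1 with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if p in (0.0, 1.0):
        return 0.0  # a fully predictable value conveys no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

With equal probabilities the function returns exactly one bit; at `p = 0.9` it returns roughly 0.47 bits, and at `p = 1.0` it returns zero.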

2688-444: The two values of a bit may be represented by two levels of electric charge stored in a capacitor . In certain types of programmable logic arrays and read-only memory , a bit may be represented by the presence or absence of a conducting path at a certain point of a circuit. In optical discs , a bit is encoded as the presence or absence of a microscopic pit on a reflective surface. In one-dimensional bar codes , bits are encoded as

2744-660: Was DIN 31626 , a specialised set for bibliographic use which was registered in 1979. The more common general-use ISO/IEC 6429 set was registered in 1983, although the ECMA-48 specification upon which it was based had been first published in 1976 and JIS X 0211 (formerly JIS C 6323). Symbolic names defined by RFC   1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 ( PAD , HOP and SGC ) are also used. Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC ,

2800-451: Was also used in Morse code (1844) and early digital communications machines such as teletypes and stock ticker machines (1870). Ralph Hartley suggested the use of a logarithmic measure of information in 1928. Claude E. Shannon first used the word "bit" in his seminal 1948 paper " A Mathematical Theory of Communication ". He attributed its origin to John W. Tukey , who had written

2856-407: Was created to extend the existing Binary-Coded Decimal (BCD) Interchange Code, or BCDIC , which itself was devised as an efficient means of encoding the two zone and number punches on punched cards into six bits. The distinct encoding of 's' and 'S' (using position 2 instead of 1) was maintained from punched cards where it was desirable not to have hole punches too close to each other to ensure


2912-437: Was loathed by hackers, by which he meant members of a subculture of enthusiastic programmers. The Jargon File 4.4.7 gives the following definition: EBCDIC: /eb´s@·dik/, /eb´see`dik/, /eb´k@·dik/, n. [abbreviation, Extended Binary Coded Decimal Interchange Code] An alleged character set used on IBM dinosaurs. It exists in at least six mutually incompatible versions, all featuring such delights as non-contiguous letter sequences and

2968-460: Was often stored as the position of a mechanical lever or gear, or the presence or absence of a hole at a specific point of a paper card or tape . The first electrical devices for discrete logic (such as elevator and traffic light control circuits , telephone switches , and Konrad Zuse's computer) represented bits as the states of electrical relays which could be either "open" or "closed". When relays were replaced by vacuum tubes , starting in

3024-507: Was recommended by the IEEE 1541 Standard (2002) . In contrast, the upper case letter 'B' is the standard and customary symbol for byte. Multiple bits may be expressed and represented in several ways. For convenience of representing commonly reoccurring groups of bits in information technology, several units of information have traditionally been used. The most common is the unit byte , coined by Werner Buchholz in June 1956, which historically

3080-541: Was used in the punched cards invented by Basile Bouchon and Jean-Baptiste Falcon (1732), developed by Joseph Marie Jacquard (1804), and later adopted by Semyon Korsakov , Charles Babbage , Herman Hollerith , and early computer manufacturers like IBM . A variant of that idea was the perforated paper tape . In all those systems, the medium (card or tape) conceptually carried an array of hole positions; each position could be either punched through or not, thus carrying one bit of information. The encoding of text by bits

3136-405: Was used to represent the group of bits used to encode a single character of text (until UTF-8 multibyte encoding took over) in a computer and for this reason it was used as the basic addressable element in many computer architectures . The trend in hardware design converged on the most common implementation of using eight bits per byte, as it is widely used today. However, because of
