Electronics is a scientific and engineering discipline that studies and applies the principles of physics to design, create, and operate devices that manipulate electrons and other electrically charged particles. It is a subfield of physics and electrical engineering that uses active devices such as transistors, diodes, and integrated circuits to control and amplify the flow of electric current and to convert it from one form to another, such as from alternating current (AC) to direct current (DC) or from analog signals to digital signals.
OCR may refer to: Science and technology: Optical character recognition, conversion of images of text into characters; Organically moderated and cooled reactor, a type of nuclear reactor; Oxidizable carbon ratio dating, a method of absolute dating; Transvaginal oocyte retrieval,
A dropout color which can be easily removed by the OCR system. Palm OS used a special set of glyphs, known as Graffiti, which are similar to printed English characters but simplified or modified for easier recognition on the platform's computationally limited hardware. Users would need to learn how to write these special glyphs. Zone-based OCR restricts the image to a specific part of a document. This
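The dropout-color and zone ideas can be sketched on a toy in-memory image. This is a minimal illustration, not any particular OCR system's implementation: it assumes the page is a list of rows of (R, G, B) tuples, and the red dropout color, the distance tolerance, and the helper names `remove_dropout` and `crop_zone` are all hypothetical.

```python
# Hypothetical dropout color for pre-printed comb fields (an assumption, not a standard value).
DROPOUT_RGB = (220, 30, 30)
WHITE = (255, 255, 255)


def remove_dropout(image, dropout=DROPOUT_RGB, tolerance=60):
    """Replace pixels within `tolerance` (Euclidean RGB distance) of the dropout color with white."""
    limit = tolerance ** 2
    cleaned = []
    for row in image:
        new_row = []
        for pixel in row:
            dist_sq = sum((p - d) ** 2 for p, d in zip(pixel, dropout))
            new_row.append(WHITE if dist_sq <= limit else pixel)
        cleaned.append(new_row)
    return cleaned


def crop_zone(image, top, left, bottom, right):
    """Zone-based OCR: keep only rows top..bottom-1 and columns left..right-1 of the page."""
    return [row[left:right] for row in image[top:bottom]]
```

A slightly-off red box pixel such as (230, 40, 40) falls inside the tolerance and becomes white, while black ink is far from the dropout color and survives; `crop_zone` then hands only the chosen rectangle to recognition.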
A mass-production basis, which limited them to a number of specialised applications. The MOSFET was invented at Bell Labs between 1955 and 1960. It was the first truly compact transistor that could be miniaturised and mass-produced for a wide range of uses. Its advantages include high scalability, affordability, low power consumption, and high density. It revolutionized the electronics industry, becoming
A "Statistical Machine" for searching microfilm archives using an optical code recognition system. In 1931, he was granted US Patent number 1,838,389 for the invention. The patent was acquired by IBM. In 1974, Ray Kurzweil started the company Kurzweil Computer Products, Inc. and continued development of omni-font OCR, which could recognize text printed in virtually any font. (Kurzweil is often credited with inventing omni-font OCR, but it
A character error rate of 1% (99% accuracy) may result in an error rate of 5% or worse if the measurement is based on whether each whole word was recognized with no incorrect letters. Using a large enough dataset is important for neural-network-based handwriting recognition solutions. On the other hand, producing natural datasets is very complicated and time-consuming. An example of the difficulties inherent in digitizing old text
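The gap between character-level and word-level accuracy follows from simple probability. The sketch below assumes character errors are independent and evenly spread, which real OCR errors are not, and its `error_rates` helper assumes substitution-only output (same length, spaces recognized correctly) so that no alignment step is needed; both function names are illustrative.

```python
def expected_word_error_rate(char_error_rate, word_length):
    """Probability that a word of `word_length` characters contains at least one error,
    assuming each character is wrong independently with probability `char_error_rate`."""
    return 1 - (1 - char_error_rate) ** word_length


def error_rates(reference, hypothesis):
    """Character and word error rates for substitution-only OCR output.

    Assumes the hypothesis has the same length as the reference and that word
    boundaries (spaces) were recognized correctly, so positions line up 1:1.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("this sketch only handles equal-length, substitution-only output")
    char_errors = sum(r != h for r, h in zip(reference, hypothesis))
    ref_words, hyp_words = reference.split(), hypothesis.split()
    word_errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return char_errors / len(reference), word_errors / len(ref_words)
```

Under these assumptions `expected_word_error_rate(0.01, 5)` is roughly 0.049, in line with the 5% figure above; longer average words push the word error rate higher. A single misread letter in "the cat sat on the mat" costs about 4.5% at the character level but about 16.7% at the word level.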
A font designed to simplify character recognition; OCR-B.
A list of optical character recognition software, see Comparison of optical character recognition software. OCR accuracy can be increased if the output is constrained by a lexicon – a list of words that are allowed to occur in a document. This might be, for example, all the words in the English language, or a more technical lexicon for a specific field. This technique can be problematic if
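Lexicon-constrained output can be approximated by snapping each recognized token to its closest allowed word. This sketch uses Python's standard `difflib.get_close_matches`, whose similarity score is a Ratcliff/Obershelp-style ratio rather than any specific OCR engine's scoring; the tiny lexicon and the 0.8 cutoff are assumptions. Tokens with no close match are left unchanged, which is one simple way to avoid mangling proper nouns.

```python
import difflib

# Hypothetical domain lexicon; a real system might load a full dictionary.
LEXICON = ["optical", "character", "recognition", "document", "scanner"]


def constrain_to_lexicon(word, lexicon=LEXICON, cutoff=0.8):
    """Snap an OCR token to the most similar lexicon word, or keep it unchanged."""
    if word.lower() in lexicon:
        return word
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    # No sufficiently similar lexicon word (e.g. a proper noun): keep the raw token.
    return matches[0] if matches else word
```

Here the misread "recognltion" is corrected to "recognition", while "Kurzweil", which resembles nothing in the lexicon, passes through untouched. Raising the cutoff trades fewer false corrections for more uncorrected errors.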
A mix of the two types. Analog circuits are becoming less common, as many of their functions are being digitized. Analog circuits use a continuous range of voltage or current for signal processing, as opposed to the discrete levels used in digital circuits. Analog circuits were common throughout an electronic device in the early years in devices such as radio receivers and transmitters. Analog electronic computers were valuable for solving problems with continuous variables until digital processing advanced. As semiconductor technology developed, many of
A noun, for example, allowing greater accuracy. The Levenshtein distance algorithm has also been used in OCR post-processing to further optimize results from an OCR API. In recent years, the major OCR technology providers began to tweak OCR systems to deal more efficiently with specific types of input. Beyond an application-specific lexicon, better performance may be had by taking into account business rules, standard expression, or rich information contained in color images. This strategy
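For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal dynamic-programming version follows, with a hypothetical post-processing step that picks the candidate word closest to a raw OCR token; the candidate list is invented for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Post-processing sketch: choose the candidate closest to the raw OCR token.
candidates = ["invoice", "involve", "voice"]  # hypothetical candidate list
best = min(candidates, key=lambda c: levenshtein("lnvoice", c))
```

Keeping only two rows of the distance table keeps memory proportional to the shorter dimension; here "lnvoice" is one substitution away from "invoice", so that candidate wins.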
A physical space, although in more recent years the trend has been towards electronics lab simulation software, such as CircuitLogix, Multisim, and PSpice. Today's electronics engineers have the ability to design circuits using premanufactured building blocks such as power supplies, semiconductors (i.e. semiconductor devices, such as transistors), and integrated circuits. Electronic design automation software programs include schematic capture programs and printed circuit board design programs. Popular names in
A ranked list of candidate characters. Software such as Cuneiform and Tesseract use a two-pass approach to character recognition. The second pass is known as adaptive recognition and uses the letter shapes recognized with high confidence on the first pass to better recognize the remaining letters on the second pass. This is advantageous for unusual fonts or low-quality scans where the font
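The two-pass idea can be illustrated with a toy nearest-template classifier. This is only a sketch of the general principle, not how Cuneiform or Tesseract implement adaptive recognition: glyphs are reduced to hypothetical 2-D feature vectors, confidence is taken as the margin between the best and second-best template distances, and the `min_margin` threshold is an arbitrary assumption. Confident first-pass results are averaged into document-specific templates, which are then used for the second pass.

```python
import math

# Hypothetical generic templates: each letter reduced to a 2-D feature vector.
GENERIC_TEMPLATES = {"a": (0.0, 0.0), "b": (10.0, 0.0)}


def classify(glyph, templates):
    """Nearest-template label plus a confidence margin (runner-up distance minus best distance)."""
    ranked = sorted((math.dist(glyph, t), label) for label, t in templates.items())
    best_distance, best_label = ranked[0]
    margin = ranked[1][0] - best_distance if len(ranked) > 1 else math.inf
    return best_label, margin


def two_pass(glyphs, templates, min_margin=1.5):
    """First pass with generic templates, second pass with templates adapted to this document."""
    first = [classify(g, templates) for g in glyphs]

    # Keep only glyphs recognized with high confidence on the first pass.
    confident = {}
    for glyph, (label, margin) in zip(glyphs, first):
        if margin >= min_margin:
            confident.setdefault(label, []).append(glyph)

    # Adapt: replace each template with the mean of its confident samples.
    adapted = dict(templates)
    for label, samples in confident.items():
        adapted[label] = tuple(sum(axis) / len(samples) for axis in zip(*samples))

    second = [classify(g, adapted)[0] for g in glyphs]
    return [label for label, _ in first], second


# Hypothetical glyphs in an unusual font; the last one is really an "a".
GLYPHS = [(3, 3), (4, 2), (13, 3), (12, 4), (5.5, 3)]
first_pass, second_pass = two_pass(GLYPHS, GENERIC_TEMPLATES)
```

With these sample glyphs, the ambiguous glyph at (5.5, 3) is labelled "b" on the first pass because the generic templates fit the unusual font poorly, and "a" on the second pass once the templates have moved toward the confidently recognized shapes.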
A single character) – are still the subject of active research. The MNIST database is commonly used for testing systems' ability to recognize handwritten digits. Accuracy rates can be measured in several ways, and how they are measured can greatly affect the reported accuracy rate. For example, if word context (a lexicon of words) is not used to correct software finding non-existent words,
A technique used in in vitro fertilization; Oil control ring, a piston ring; Over consolidation ratio, a consolidation measurement in geotechnical engineering. Offices of civil rights: Office for Civil Rights, U.S. Department of Education; State Office of Civil Rights, United States Department of State; GSA Office of Civil Rights, General Services Administration; HHS Office for Civil Rights, United States Department of Health and Human Services; DOJ Office for Civil Rights, Office of Justice Programs. Economics: Official cash rate,
A time. Advanced systems capable of producing a high degree of accuracy for most fonts are now common, with support for a variety of image file formats as input. Some systems are capable of reproducing formatted output that closely approximates the original page including images, columns, and other non-textual components. Early optical character recognition may be traced to technologies involving telegraphy and creating reading devices for
Is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as cognitive computing, machine translation, (extracted) text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. Early versions needed to be trained with images of each character, and worked on one font at
Is accomplished relatively simply by aligning the image to a uniform grid based on where vertical grid lines will least often intersect black areas. For proportional fonts, more sophisticated techniques are needed because whitespace between letters can sometimes be greater than that between words, and vertical lines can intersect more than one character. There are two basic types of core OCR algorithm, which may produce
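The grid-alignment step for fixed-pitch fonts can be sketched with a vertical projection profile: count the ink in each pixel column, then choose the grid offset whose lines cross the least ink. The sketch assumes a binarized text line given as strings of `#` (ink) and `.` (background) and a known character pitch in pixels; both are simplifying assumptions, and the function names and sample bitmap are illustrative.

```python
SAMPLE = [
    ".###.#.#.###",
    ".#.#.##..#..",
    ".###.#.#.###",
]  # three 4-pixel-wide character cells, each with a blank leading column


def column_ink(bitmap):
    """Vertical projection profile: number of ink pixels in each column."""
    return [sum(row[x] == "#" for row in bitmap) for x in range(len(bitmap[0]))]


def best_grid_offset(bitmap, pitch):
    """Grid offset (0..pitch-1) whose vertical lines cross the least ink."""
    ink = column_ink(bitmap)
    return min(range(pitch), key=lambda off: sum(ink[off::pitch]))


def segment_fixed_pitch(bitmap, pitch):
    """Cut the line into (start, end) column ranges along the best-aligned grid."""
    width = len(bitmap[0])
    offset = best_grid_offset(bitmap, pitch)
    cuts = sorted({0, width, *range(offset, width, pitch)})
    ink = column_ink(bitmap)
    # Drop cells that contain no ink at all (e.g. spaces).
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if any(ink[a:b])]


cells = segment_fixed_pitch(SAMPLE, pitch=4)
```

For the sample line, offset 0 places the grid lines on the three blank columns, giving the cells (0, 4), (4, 8) and (8, 12). This only works because every cell has the same width, which is exactly what proportional fonts lack.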
Is called "Application-Oriented OCR" or "Customized OCR", and has been applied to OCR of license plates, invoices, screenshots, ID cards, driver's licenses, and automobile manufacturing. The New York Times has adapted the OCR technology into a proprietary tool it calls Document Helper, which enables its interactive news team to accelerate the processing of documents that need to be reviewed. They note that it enables them to process as many as 5,400 pages per hour in preparation for reporters to review
Is defined as unwanted disturbances superposed on a useful signal that tend to obscure its information content. Noise is not the same as signal distortion caused by a circuit. Noise is associated with all electronic circuits. Noise may be generated electromagnetically or thermally; thermal noise can be decreased by lowering the operating temperature of the circuit. Other types of noise, such as shot noise, cannot be removed because they are due to limitations in physical properties. Many different methods of connecting components have been used over
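The difference between thermal and shot noise can be made concrete with the standard formulas: the Johnson–Nyquist noise voltage of a resistor is sqrt(4 k T R Δf), which grows with absolute temperature T, while shot noise current is sqrt(2 q I Δf), which depends on the DC current I and bandwidth Δf but not on temperature. The resistor value, bandwidth, current, and temperatures below are hypothetical example values.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)


def thermal_noise_vrms(resistance_ohm, temperature_k, bandwidth_hz):
    """Johnson–Nyquist RMS noise voltage of a resistor: sqrt(4 k T R Δf)."""
    return math.sqrt(4 * BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)


def shot_noise_irms(dc_current_a, bandwidth_hz):
    """RMS shot noise current: sqrt(2 q I Δf). Temperature does not appear."""
    return math.sqrt(2 * ELEMENTARY_CHARGE * dc_current_a * bandwidth_hz)


# Hypothetical example: 10 kΩ resistor, 10 kHz bandwidth, room temperature vs liquid nitrogen.
room = thermal_noise_vrms(10e3, 300.0, 10e3)   # about 1.29 µV
cooled = thermal_noise_vrms(10e3, 77.0, 10e3)  # about 0.65 µV
```

Cooling from 300 K to 77 K cuts the thermal noise voltage by a factor of sqrt(77/300), roughly a half, whereas `shot_noise_irms` has no temperature input at all, which is why cooling does not remove shot noise.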
Optical character recognition
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example
Is distorted (e.g. blurred or faded). As of December 2016, modern OCR software includes Google Docs OCR, ABBYY FineReader, and Transym. Others like OCRopus and Tesseract use neural networks which are trained to recognize whole lines of text instead of focusing on single characters. A technique known as iterative OCR automatically crops a document into sections based on the page layout. OCR
Is generally an offline process, which analyses a static document. There are cloud-based services which provide an online OCR API service. Handwriting movement analysis can be used as input to handwriting recognition. Instead of merely using the shapes of glyphs and words, this technique is able to capture motion, such as the order in which segments are drawn, the direction, and the pattern of putting
Is often referred to as Template OCR. Crowdsourcing humans to perform the character recognition can quickly process images like computer-driven OCR, but with higher accuracy for recognizing images than that obtained via computers. Practical systems include the Amazon Mechanical Turk and reCAPTCHA. The National Library of Finland has developed an online interface for users to correct OCRed texts in
Is the inability of OCR to differentiate between the "long s" and "f" characters. Web-based OCR systems for recognizing hand-printed text on the fly have become well known as commercial products in recent years (see Tablet PC history). Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved by pen computing software, but that accuracy rate still translates to dozens of errors per page, making
Is the voltage comparator which receives a continuous range of voltage but only outputs one of two levels as in a digital circuit. Similarly, an overdriven transistor amplifier can take on the characteristics of a controlled switch, having essentially two levels of output. Analog circuits are still widely used for signal amplification, such as in the entertainment industry, and conditioning signals from analog sensors, such as in industrial measurement and control. Digital circuits are electric circuits based on discrete voltage levels. Digital circuits use Boolean algebra and are
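The comparator and the overdriven amplifier can both be modelled very simply. This is an idealized sketch, not a circuit simulation: the comparator has no hysteresis or propagation delay, the amplifier is treated as a pure gain clipped at hypothetical ±5 V rails, and all voltages and gains are made-up example values.

```python
def comparator(v_in, v_ref, v_low=0.0, v_high=5.0):
    """Ideal voltage comparator: continuous input, one of two output levels."""
    return v_high if v_in > v_ref else v_low


def amplifier(v_in, gain, rail=5.0):
    """Idealized amplifier: output = gain * input, clipped at the supply rails (±rail)."""
    return max(-rail, min(rail, gain * v_in))


inputs = [-0.5, -0.1, 0.1, 0.5]
linear = [amplifier(v, gain=2) for v in inputs]         # stays proportional to the input
overdriven = [amplifier(v, gain=1000) for v in inputs]  # saturates at -5.0 or +5.0
```

With a modest gain the output tracks the input, but with a very large gain almost every input drives the amplifier into one rail or the other, so its output collapses to two levels, much like the comparator's.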
Is then performed on each section individually using variable character confidence level thresholds to maximize page-level OCR accuracy. A patent from the United States Patent Office has been issued for this method. The OCR result can be stored in the standardized ALTO format, a dedicated XML schema maintained by the United States Library of Congress. Other common formats include hOCR and PAGE XML. For
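Per-section confidence thresholds can be sketched as a simple filter that routes low-confidence words to a human reviewer. The word records, section names, and threshold values below are hypothetical; real engines report confidence in their own formats and ranges.

```python
# Hypothetical OCR output: each recognized word with its page section and confidence (0..1).
words = [
    {"section": "header", "text": "INVOICE", "confidence": 0.97},
    {"section": "body", "text": "Tota1", "confidence": 0.62},
    {"section": "body", "text": "Amount", "confidence": 0.91},
    {"section": "footer", "text": "Page", "confidence": 0.74},
]

# Assumed per-section thresholds: a stricter bar where errors matter most.
THRESHOLDS = {"header": 0.90, "body": 0.85, "footer": 0.70}


def needs_review(words, thresholds, default=0.80):
    """Return the words whose confidence is below the threshold for their section."""
    return [w for w in words if w["confidence"] < thresholds.get(w["section"], default)]
```

Only "Tota1" is flagged here; the footer word at 0.74 passes because the footer's assumed threshold is lower, and sections without an entry fall back to `default`.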
Is therefore the process of defining and developing complex electronic devices to satisfy specified requirements of the user. Due to the complex nature of electronics theory, laboratory experimentation is an important part of the development of electronic devices. These experiments are used to test or verify the engineer's design and detect errors. Historically, electronics labs have consisted of electronics devices and equipment located in
The Amount line of a check (which is always a written-out number) is an example where using a smaller dictionary can increase recognition rates greatly. The shapes of individual cursive characters themselves simply do not contain enough information to accurately (greater than 98%) recognize all handwritten cursive script. Most programs allow users to set "confidence rates". This means that if
The IBM 608 was the first IBM product to use transistor circuits without any vacuum tubes and is believed to be the first all-transistorized calculator to be manufactured for the commercial market. The 608 contained more than 3,000 germanium transistors. Thomas J. Watson Jr. ordered all future IBM products to use transistors in their design. From that time on transistors were almost exclusively used for computer logic circuits and peripheral devices. However, early junction transistors were relatively bulky devices that were difficult to manufacture on
The 1960s, U.S. manufacturers were unable to compete with Japanese companies such as Sony and Hitachi who could produce high-quality goods at lower prices. By the 1980s, however, U.S. manufacturers became the world leaders in semiconductor development and assembly. However, during the 1990s and subsequently, the industry shifted overwhelmingly to East Asia (a process begun with the initial movement of microchip mass-production there in
The 1970s), as plentiful cheap labor and increasing technological sophistication became widely available there. Over three decades, the United States' global share of semiconductor manufacturing capacity fell from 37% in 1990 to 12% in 2022. America's pre-eminent semiconductor manufacturer, Intel Corporation, fell far behind its subcontractor Taiwan Semiconductor Manufacturing Company (TSMC) in manufacturing technology. By that time, Taiwan had become
The 2000s, OCR was made available online as a service (WebOCR), in a cloud computing environment, and in mobile applications like real-time translation of foreign-language signs on a smartphone. With the advent of smartphones and smartglasses, OCR can be used in internet-connected mobile device applications that extract text captured using the device's camera. Devices that do not have built-in OCR functionality typically use an OCR API to extract
The EDA software world are NI Multisim, Cadence (OrCAD), EAGLE PCB and Schematic, Mentor (PADS PCB and LOGIC Schematic), Altium (Protel), LabCentre Electronics (Proteus), gEDA, KiCad and many others. Heat generated by electronic circuitry must be dissipated to prevent immediate failure and improve long-term reliability. Heat dissipation is mostly achieved by passive conduction/convection. Means to achieve greater dissipation include heat sinks and fans for air cooling, and other forms of computer cooling such as water cooling. These techniques use convection, conduction, and radiation of heat energy. Electronic noise
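Whether a heat sink is needed is commonly estimated with a series thermal-resistance model: junction temperature equals ambient temperature plus dissipated power times the sum of the thermal resistances along the heat path. The device power and resistance values below are hypothetical, chosen only to show the arithmetic.

```python
def junction_temperature(power_w, ambient_c, *thermal_resistances_c_per_w):
    """Steady-state junction temperature: T_j = T_a + P * (sum of series thermal resistances)."""
    return ambient_c + power_w * sum(thermal_resistances_c_per_w)


# Hypothetical 10 W device in 25 °C air.
bare = junction_temperature(10.0, 25.0, 62.0)                # junction-to-ambient, no heat sink
with_sink = junction_temperature(10.0, 25.0, 1.5, 0.5, 5.0)  # junction-case, case-sink, sink-ambient
```

With these assumed values the bare device would reach 645 °C, far above typical maximum junction ratings, while the heat-sinked version settles at 95 °C. The model ignores transients and treats each resistance as a constant.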
The basis of all digital computers and microprocessor devices. They range from simple logic gates to large integrated circuits, employing millions of such gates. Digital circuits use a binary system with two voltage levels labelled "0" and "1" to indicate logical status. Often logic "0" will be a lower voltage and referred to as "Low" while logic "1" is referred to as "High". However, some systems use
The blind. In 1914, Emanuel Goldberg developed a machine that read characters and converted them into standard telegraph code. Concurrently, Edmund Fournier d'Albe developed the Optophone, a handheld scanner that, when moved across a printed page, produced tones that corresponded to specific letters or characters. In the late 1920s and into the 1930s, Emanuel Goldberg developed what he called
The circuit, thus slowing the computer. The invention of the integrated circuit by Jack Kilby and Robert Noyce solved this problem by making all the components and the chip out of the same block (monolith) of semiconductor material. The circuits could be made smaller, and the manufacturing process could be automated. This led to the idea of integrating all components on a single-crystal silicon wafer, which led to small-scale integration (SSI) in
The contents. There are several techniques for solving the problem of character recognition by means other than improved OCR algorithms. Special fonts like OCR-A, OCR-B, or MICR fonts, with precisely specified sizing, spacing, and distinctive character shapes, allow a higher accuracy rate during transcription in bank check processing. Several prominent OCR engines were designed to capture text in popular fonts such as Arial or Times New Roman, and are incapable of capturing text in these specialized fonts, which are very different from popularly used fonts. As Google Tesseract can be trained to recognize new fonts, it can recognize OCR-A, OCR-B and MICR fonts. Comb fields are pre-printed boxes that encourage humans to write more legibly – one glyph per box. These are often printed in
The development of many aspects of modern society, such as telecommunications, entertainment, education, health care, industry, and security. The main driving force behind the advancement of electronics is the semiconductor industry, which in response to global demand continually produces ever-more sophisticated electronic devices and circuits. The semiconductor industry is one of the largest and most profitable sectors in
The document contains words not in the lexicon, like proper nouns. Tesseract uses its dictionary to influence the character segmentation step, for improved accuracy. The output stream may be a plain text stream or file of characters, but more sophisticated OCR systems can preserve the original layout of the page and produce, for example, an annotated PDF that includes both the original image of
The early 1960s, and then medium-scale integration (MSI) in the late 1960s, followed by VLSI. In 2008, billion-transistor processors became commercially available. An electronic component is any component in an electronic system, either active or passive. Components are connected together, usually by being soldered to a printed circuit board (PCB), to create an electronic circuit with a particular function. Components may be packaged singly, or in more complex groups as integrated circuits. Passive electronic components include capacitors, inductors, and resistors, whereas active components include semiconductor devices such as transistors and thyristors, which control current flow at the electron level. Electronic circuit functions can be divided into two function groups: analog and digital. A particular device may consist of circuitry that has either or
The electronic logic gates to generate binary states. Highly integrated devices: Electronic systems design deals with the multi-disciplinary design issues of complex electronic devices and systems, such as mobile phones and computers. The subject covers a broad spectrum, from the design and development of an electronic system (new product development) to assuring its proper function, service life and disposal. Electronic systems design
The field of electronics and the electron age. Practical applications started with the invention of the diode by Ambrose Fleming and the triode by Lee De Forest in the early 1900s, which made the detection of small electrical voltages, such as radio signals from a radio antenna, practicable. Vacuum tubes (thermionic valves) were the first active electronic components which controlled current flow by influencing
The flow of individual electrons, and enabled the construction of equipment that used current amplification and rectification to give us radio, television, radar, long-distance telephony and much more. The early growth of electronics was rapid, and by the 1920s, commercial radio broadcasting and telecommunications were becoming widespread and electronic amplifiers were being used in such diverse applications as long-distance telephony and
The following ways: The electronics industry consists of various sectors. The central driving force behind the entire electronics industry is the semiconductor industry sector, which has annual sales of over $481 billion as of 2018. The largest industry sector is e-commerce, which generated over $29 trillion in 2017. The most widely manufactured electronic device is the metal-oxide-semiconductor field-effect transistor (MOSFET), with an estimated 13 sextillion MOSFETs having been manufactured between 1960 and 2018. In
The functions of analog circuits were taken over by digital circuits, and modern circuits that are entirely analog are less common; their functions are being replaced by a hybrid approach that, for instance, uses analog circuits at the front end of a device receiving an analog signal and then uses digital processing with microprocessor techniques thereafter. Sometimes it may be difficult to classify some circuits that have elements of both linear and non-linear operation. An example
The global economy, with annual revenues exceeding $481 billion in 2018. The electronics industry also encompasses other sectors that rely on electronic devices and systems, such as e-commerce, which generated over $29 trillion in online sales in 2017. The identification of the electron in 1897 by Sir Joseph John Thomson, along with the subsequent invention of the vacuum tube which could amplify and rectify small electrical signals, inaugurated
The interest rate paid by banks in the overnight money market; Optimum currency region, a theoretical optimal area in which a single currency would bring the most benefit. Other uses: Original cast recording, a recording of a stage musical featuring the show's original cast; Otago Central Railway, now a heritage railway in Otago, New Zealand; Ottawa Central Railway, a Canadian shortline railway owned by CN Rail; OverClocked ReMix, an organization and website dedicated to preserving and paying tribute to video game music through re-orchestration and reinterpretation; Oxford, Cambridge and RSA Examinations, an exam board in England, Wales and Northern Ireland; Obstacle course racing. See also: OCR-A,
The leaders of the National Federation of the Blind. In 1978, Kurzweil Computer Products began selling a commercial version of the optical character recognition computer program. LexisNexis was one of the first customers, and bought the program to upload legal paper and news documents onto its nascent online databases. Two years later, Kurzweil sold his company to Xerox, which eventually spun it off as Scansoft, which merged with Nuance Communications. In
The most authoritative of the Annual Test of OCR Accuracy from 1992 to 1996. Recognition of typewritten, Latin script text is still not 100% accurate even where clear imaging is available. One study based on recognition of 19th- and early 20th-century newspaper pages concluded that character-by-character OCR accuracy for commercial OCR software varied from 81% to 99%; total accuracy can be achieved by human review or Data Dictionary Authentication. Other areas – including recognition of hand printing, cursive handwriting, and printed text in other scripts (especially those East Asian language characters which have many strokes for
The most widely used electronic device in the world. The MOSFET is the basic element in most modern electronic equipment. As the complexity of circuits grew, problems arose. One problem was the size of the circuit. A complex circuit like a computer was dependent on speed. If the components were large, the wires interconnecting them had to be long. The electric signals took time to go through
The music recording industry. The next big technological step took several decades to appear, when the first working point-contact transistor was invented by John Bardeen and Walter Houser Brattain at Bell Labs in 1947. However, vacuum tubes played a leading role in the field of microwave and high power transmission as well as television receivers until the middle of the 1980s. Since then, solid-state devices have all but completely taken over. Vacuum tubes are still used in some specialist applications such as high power RF amplifiers, cathode-ray tubes, specialist audio equipment, guitar amplifiers and some microwave devices. In April 1955,
The page and a searchable textual representation. Near-neighbor analysis can make use of co-occurrence frequencies to correct errors, by noting that certain words are often seen together. For example, "Washington, D.C." is generally far more common in English than "Washington DOC". Knowledge of the grammar of the language being scanned can also help determine if a word is likely to be a verb or
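Co-occurrence correction can be sketched as choosing, among candidate readings of an uncertain token, the one that most often follows the preceding word. The bigram counts below are invented for illustration (a real system would estimate them from a large corpus), and punctuation such as the comma in "Washington, D.C." is assumed to have been stripped during tokenization.

```python
# Invented bigram counts standing in for corpus statistics.
BIGRAM_COUNTS = {
    ("washington", "d.c."): 950,
    ("washington", "doc"): 3,
}


def pick_by_cooccurrence(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Return the candidate seen most often right after `previous_word`.

    Ties (including "never seen") keep the earliest candidate, so listing the
    recognizer's own top choice first makes it the fallback.
    """
    return max(candidates,
               key=lambda c: counts.get((previous_word.lower(), c.lower()), 0))
```

After "Washington", the reading "D.C." wins over "DOC"; after a word with no recorded bigrams, the first candidate is returned unchanged.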
The pen down and lifting it. This additional information can make the process more accurate. This technology is also known as "online character recognition", "dynamic character recognition", "real-time character recognition", and "intelligent character recognition". OCR software often pre-processes images to improve the chances of successful recognition. Techniques include: Segmentation of fixed-pitch fonts
The reverse definition ("0" is "High") or are current based. Quite often the logic designer may reverse these definitions from one circuit to the next as they see fit to facilitate their design. The definition of the levels as "0" or "1" is arbitrary. Ternary (with three states) logic has been studied, and some prototype computers made, but it has not gained any significant practical acceptance. Universally, computers and digital signal processors are constructed with digital circuits using transistors such as MOSFETs in
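The arbitrariness of the 0/1 assignment is easy to see in a voltage-to-logic mapping. In the sketch below, the 0.8 V and 2.0 V input thresholds are assumed TTL-style values, voltages between them are treated as undefined, and the `active_high` flag switches between the usual convention and the reverse one; the function name is illustrative.

```python
def to_logic(voltage, v_il=0.8, v_ih=2.0, active_high=True):
    """Map an input voltage to a logic value, or None if it lies in the undefined band.

    v_il: highest voltage read as "Low"; v_ih: lowest voltage read as "High".
    active_high=True uses the common convention (High = 1); False uses the reverse.
    """
    if voltage <= v_il:
        is_high = False
    elif voltage >= v_ih:
        is_high = True
    else:
        return None  # between thresholds: neither a valid Low nor a valid High
    return int(is_high) if active_high else int(not is_high)
```

The same 4.5 V input reads as 1 under the usual convention and as 0 under the reversed one; only the labelling changes, not the circuit.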
The software does not achieve the desired level of accuracy, a user can be notified for manual review. An error introduced by OCR scanning is sometimes termed a scanno (by analogy with the term typo). Characters to support OCR were added to the Unicode Standard in June 1993, with the release of version 1.1. Some of these characters are mapped from fonts specific to MICR, OCR-A or OCR-B.
Electronics
Electronic devices have hugely influenced
The standardized ALTO format. Crowd sourcing has also been used not to perform character recognition directly but to invite software developers to develop image processing algorithms, for example, through the use of rank-order tournaments. Commissioned by the U.S. Department of Energy (DOE), the Information Science Research Institute (ISRI) had the mission to foster the improvement of automated technologies for understanding machine printed documents, and it conducted
The technology useful only in very limited applications. Recognition of cursive text is an active area of research, with recognition rates even lower than those for hand-printed text. Higher rates of recognition of general cursive script will likely not be possible without the use of contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to parse individual characters from script. Reading
The text from the image file captured by the device. The OCR API returns the extracted text, along with information about the location of the detected text in the original image back to the device app for further processing (such as text-to-speech) or display. Various commercial and open source OCR systems are available for most common writing systems, including Latin, Cyrillic, Arabic, Hebrew, Indic, Bengali (Bangla), Devanagari, Tamil, Chinese, Japanese, and Korean characters. OCR engines have been developed into software applications specializing in various subjects such as receipts, invoices, checks, and legal billing documents. The software can be used for: OCR
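The text-plus-location result can be turned into readable lines on the device. The JSON shape below (`words`, `text`, and a `bbox` of `[x0, y0, x1, y1]` pixel coordinates) is a hypothetical example, not the response format of any particular OCR API, and the line-grouping tolerance is also an assumption.

```python
import json

# Hypothetical OCR API response; real services each define their own schema.
RESPONSE = """
{"words": [
  {"text": "World",  "bbox": [60, 10, 110, 30]},
  {"text": "Hello",  "bbox": [5, 12, 55, 30]},
  {"text": "Second", "bbox": [5, 50, 70, 70]}
]}
"""


def reading_order(words, line_tolerance=8):
    """Group words into lines by their top edge, then order each line left to right."""
    lines = []
    for word in sorted(words, key=lambda w: (w["bbox"][1], w["bbox"][0])):
        top = word["bbox"][1]
        # Same line if the top edge is close to that of the line's first word.
        if lines and abs(top - lines[-1][0]["bbox"][1]) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w["text"] for w in sorted(line, key=lambda w: w["bbox"][0]))
            for line in lines]


lines = reading_order(json.loads(RESPONSE)["words"])
```

Although "World" is returned first and sits slightly higher, both words share a line within the tolerance, so the result reads "Hello World" followed by "Second". This simple grouping assumes horizontal, single-column text.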
The text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example: from a television broadcast). Widely used as a form of data entry from printed paper data records – whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printed data, or any suitable documentation – it
The years. For instance, early electronics often used point-to-point wiring with components attached to wooden breadboards to construct circuits. Cordwood construction and wire wrap were other methods used. Most modern-day electronics now use printed circuit boards made of materials such as FR4, or the cheaper (and less hard-wearing) Synthetic Resin Bonded Paper (SRBP, also known as Paxoline/Paxolin (trade marks) and FR2) – characterised by its brown colour. Health and environmental concerns associated with electronics assembly have gained increased attention in recent years, especially for products destined to go to European markets. Electrical components are generally mounted in
Was in use by companies, including CompuScan, in the late 1960s and 1970s.) Kurzweil used the technology to create a reading machine for blind people to have a computer read text to them out loud. The device included a CCD-type flatbed scanner and a text-to-speech synthesizer. On January 13, 1976, the finished product was unveiled during a widely reported news conference headed by Kurzweil and