Misplaced Pages

Data processing

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

Data processing is the collection and manipulation of digital data to produce meaningful information. Data processing is a form of information processing , which is the modification (processing) of information in any manner detectable by an observer.

#65934

72-445: Data processing may involve various processes, including: The United States Census Bureau history illustrates the evolution of data processing from manual through electronic procedures. Although widespread use of the term data processing dates only from the 1950s, data processing functions have been performed manually for millennia. For example, bookkeeping involves functions such as posting transactions and producing reports like

144-488: A mass noun in singular form. This usage is common in everyday language and in technical and scientific fields such as software development and computer science . One example of this usage is the term " big data ". When used more specifically to refer to the processing and analysis of sets of data, the term retains its plural form. This usage is common in the natural sciences, life sciences, social sciences, software development and computer science, and grew in popularity in

216-436: A basis for calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements, including, but not limited to, statistics . Thematically connected data presented in some relevant context can be viewed as information . Contextually connected pieces of information can then be described as data insights or intelligence . The stock of insights and intelligence that accumulate over time resulting from

288-486: A check in the appropriate box on the form. From 1850 to 1880 the Census Bureau employed "a system of tallying, which, by reason of the increasing number of combinations of classifications required, became increasingly complex. Only a limited number of combinations could be recorded in one tally, so it was necessary to handle the schedules 5 or 6 times, for as many independent tallies." "It took over 7 years to publish

360-584: A climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". "Information" bears a diversity of meanings that range from everyday usage to technical use. This view, however, has also been argued to reverse how data emerges from information, and information from knowledge. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern , perception, and representation. Beynon-Davies uses

432-404: A common view, data is collected and analyzed; data only becomes information suitable for making decisions once it has been analyzed in some fashion. One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy . Knowledge

504-668: A description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books. Whenever data needs to be registered, data exists in the form of a data document . Kinds of data documents include: Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes , while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index . Gathering data can be accomplished through

576-553: A few decades. Scientific publishers and libraries have been struggling with this problem for a few decades, and there is still no satisfactory solution for the long-term storage of data over centuries or even for eternity. Data accessibility . Another problem is that much scientific data is never published or deposited in data repositories such as databases . In a recent survey, data was requested from 516 studies that were published between 2 and 22 years earlier, but less than one out of five of these studies were able or willing to provide

648-448: A ledger. Beginning in 1970 information was gathered via mailed forms. To reduce paper usage, reduce payroll expense and acquire the most comprehensive list of addresses ever compiled, 500,000 handheld computers (HHCs) (specifically designed, single-purpose devices) were used for the first time in 2009 during the address canvassing portion of the 2010 Decennial Census Project. Projected savings were estimated to be over $ 1 billion. The HHC

720-511: A northern and southern half called "divisions". In the following decades, several other systems were used, until the current one was introduced in 1910. This system has seen only minor changes: New Mexico and Arizona were both added to the Mountain division upon statehood in 1912, the North region was divided into a Northeast and a North Central region in 1940, Alaska and Hawaii were both added to

792-470: A primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains the data that has already been collected by other sources, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation. The latter offers an articulate method of collecting, classifying, and analyzing data using five possible angles of analysis (at least three) to maximize

SECTION 10

#1732772896066

864-458: A real-time estimate in U.S. and World Population Clock. Only peoples whose live in the 50 states and within the District of Columbia are included in the estimation. The United States Census Bureau is committed to confidentiality and guarantees non-disclosure of any addresses or personal information related to individuals or establishments. Title 13 of the U.S. Code establishes penalties for

936-433: A zero and uses the term " decennial " to describe the operation. Between censuses, the Census Bureau makes population estimates and projections. In addition, census data directly affects how more than $ 400 billion per year in federal and state funding is allocated to communities for neighborhood improvements, public health , education, transportation and more. The Census Bureau is mandated with fulfilling these obligations:

1008-420: Is a combination of machines , people, and processes that for a set of inputs produces a defined set of outputs . The inputs and outputs are interpreted as data , facts , information etc. depending on the interpreter's relation to the system. A term commonly used synonymously with data or storage (codes) processing system is information system . With regard particularly to electronic data processing ,

1080-519: Is an individual value in a collection of data. Data are usually organized into structures such as tables that provide additional context and meaning, and may themselves be used as data in larger structures. Data may be used as variables in a computational process . Data may represent abstract ideas or concrete measurements. Data are commonly used in scientific research , economics , and virtually every other form of human organizational activity. Examples of data sets include price indices (such as

1152-482: Is at stake, the census also runs the risk of being politicized." Such political tensions highlight the complexity of identity and classification ; some argue that unclear results from the population data "is due to distortions brought about by political pressures." One frequently used example includes ambiguous ethnic counts, which often involves underenumeration and/or undercounting of minority populations. Ideas about race, ethnicity and identity have also evolved in

1224-553: Is pervasive. The territories are not included, but the District of Columbia is. Regional divisions used by the United States Census Bureau: The first census was collected in 1790 and published in 1791. It was 56 pages and cost $ 44,377.28. The current system was introduced for the 1910 census, but other ways of grouping states were used historically by the Census Bureau. The first of these

1296-405: Is the awareness of its environment that some entity possesses, whereas data merely communicates that knowledge. For example, the entry in a database specifying the height of Mount Everest is a datum that communicates a precisely-measured value. This measurement may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on

1368-443: Is the longevity of data. Scientific research generates huge amounts of data, especially in genomics and astronomy , but also in the medical sciences , e.g. in medical imaging . In the past, scientific data has been published in papers and books, stored in libraries, but more recently practically all data is stored on hard drives or optical discs . However, in contrast to paper, these storage devices may become unreadable after

1440-404: Is the plural of datum , "(thing) given," and the neuter past participle of dare , "to give". The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954. When "data" is used more generally as a synonym for "information", it is treated as

1512-581: The 1950 United States Census , using a UNIVAC I system, delivered in 1952. The term data processing has mostly been subsumed by the more general term information technology (IT). The older term "data processing" is suggestive of older technologies. For example, in 1996 the Data Processing Management Association (DPMA) changed its name to the Association of Information Technology Professionals . Nevertheless,

SECTION 20

#1732772896066

1584-702: The American people and economy . The U.S. Census Bureau is part of the U.S. Department of Commerce and its director is appointed by the President of the United States . Currently, Robert Santos is the Director of the U.S. Census Bureau and Ron S. Jarmin is the Deputy Director. The Census Bureau's primary mission is conducting the U.S. census every ten years, which allocates the seats of

1656-795: The Bureau of Justice Statistics (BJS), the Department of Housing and Urban Development (HUD), the National Center for Education Statistics (NCES), and the National Science Foundation (NSF), among others. Since 1903, the official census-taking agency of the United States government has been the Bureau of the Census. The Census Bureau is headed by a director, assisted by a deputy director and an executive staff composed of

1728-494: The Census Information Center cooperative program that involves 58 "national, regional, and local non-profit organizations". The CIC program aims to represent the interests of underserved communities. The 1890 census was the first to use the electric tabulating machines invented by Herman Hollerith . For 1890–1940 details, see Truesdell, Leon E. (1965). The Development of Punch Card Tabulation in

1800-646: The Topologically Integrated Geographic Encoding and Referencing (TIGER) database system. Census officials were able to evaluate the more sophisticated and detailed results that the TIGER system produced; furthermore, TIGER data is also available to the public. And while the TIGER system does not directly amass demographic data, as a geographic information system (GIS), it can be used to merge demographics to conduct more accurate geospatial and mapping analysis. In July 2019,

1872-456: The U.S. House of Representatives to the states based on their population. The bureau's various censuses and surveys help allocate over $ 675 billion in federal funds every year and it assists states, local communities, and businesses make informed decisions. The information provided by the census informs decisions on where to build and maintain schools, hospitals, transportation infrastructure, and police and fire departments. In addition to

1944-421: The balance sheet and the cash flow statement . Completely manual methods were augmented by the application of mechanical or electronic calculators . A person whose job was to perform calculations manually or using a calculator was called a " computer ." The 1890 United States Census schedule was the first to gather data by individual rather than household . A number of questions could be answered by making

2016-526: The consumer price index ), unemployment rates , literacy rates, and census data. In this context, data represent the raw facts and figures from which useful information can be extracted. Data are collected using techniques such as measurement , observation , query , or analysis , and are typically represented as numbers or characters that may be further processed . Field data are data that are collected in an uncontrolled, in-situ environment. Experimental data are data that are generated in

2088-421: The 1880 census. It is estimated that using Hollerith's system saved some $ 5 million in processing costs" in 1890 dollars even though there were twice as many questions as in 1880. Computerized data processing, or electronic data processing represents a later development, with a computer used instead of several independent pieces of equipment. The Census Bureau first made limited use of electronic computers for

2160-443: The 20th and 21st centuries. Some style guides do not recognize the different meanings of the term and simply recommend the form that best suits the target audience of the guide. For example, APA style as of the 7th edition requires "data" to be treated as a plural form. Data, information , knowledge , and wisdom are closely related concepts, but each has its role concerning the other, and each term has its meaning. According to

2232-428: The Bureau of the Census, 1890–1940: With outlines of actual tabulation programs . U.S. GPO . In 1946, knowing of the bureau's funding of Hollerith and, later, Powers , John Mauchly approached the bureau about early funding for UNIVAC development. A UNIVAC I computer was accepted by the bureau in 1951. Historically, the census information was gathered by census takers going door-to-door collecting information in

Data processing - Misplaced Pages Continue

2304-491: The Bureau pretests surveys and digital products before they are fielded and then evaluates them after they have been conducted. Data In common usage , data ( / ˈ d eɪ t ə / , also US : / ˈ d æ t ə / ) is a collection of discrete or continuous values that convey information , describing the quantity , quality , fact , statistics , other basic units of meaning, or simply sequences of symbols that may be further interpreted formally . A datum

2376-559: The Census Bureau released individual information regarding several hundred young men to the Justice Department and Selective Service system for the purpose of prosecutions for draft evasion. During World War II , the United States Census Bureau assisted the government's Japanese American internment efforts by providing confidential neighborhood information on Japanese-Americans . The bureau's role

2448-458: The Census Bureau stopped releasing new data via American FactFinder, which was decommissioned in March 2020 after 20 years of being the agency's primary tool for data dissemination. The new platform is data.census.gov. Throughout the decade between censuses, the bureau conducts surveys to produce a general view and comprehensive study of the United States' social and economic conditions. Staff from

2520-524: The Current Surveys Program conduct over 130 ongoing and special surveys about people and their characteristics. A network of professional field representatives gathers information from a sample of households, responding to questions about employment, consumer expenditures, health, housing, and other topics. Surveys conducted between decades: The Census Bureau also collects information on behalf of survey sponsors. These sponsors include

2592-501: The HHC. Since the units were updated nightly with important changes and updates, operator implementation of proper procedure was imperative. Census Bureau stays current by conducting research studies to improve the work that they do. Census researchers explore topics about survey innovations, participation, and data accuracy, such as undercount, overcount, the use of technologies, multilingual research, and ways to reduce costs. In addition,

2664-455: The Nation's people and economy." Only after 72 years does the information collected become available to other agencies or the general public. Seventy-two years was picked because usually by 72 years since the census is taken, most participants would be deceased. Despite these guarantees of confidentiality, the Census Bureau has some history of disclosures to other government agencies. In 1918,

2736-592: The Pacific division upon statehood in 1959, and the North Central region was renamed the Midwest in 1984. Many federal, state, local and tribal governments use census data to: Census data is used to determine how seats of Congress are distributed to states. Census data is not used to determine or define race genetically, biologically or anthropologically. The census data is also used by the Bureau to obtain

2808-498: The U.S. Code. By law, the Census Bureau must count everyone and submit state population totals to the U.S. president by December 31 of any year ending in a zero. States within the Union receive the results in the spring of the following year. The United States Census Bureau defines four statistical regions, with nine divisions. The Census Bureau regions are "widely used...for data collection and analysis". The Census Bureau definition

2880-516: The United States or foreign governments, or law enforcement agencies such as the IRS or the FBI or Interpol . "Providing quality data, for public good—while respecting individual privacy and, at the same time, protecting confidentiality—is the Census Bureau's core responsibility"; "Keeping the public's trust is critical to the Census's ability to carry out the mission as the leading source of quality data about

2952-581: The United States, and such changes warrant examination of how these shifts have impacted the accuracy of census data over time. The United States Census Bureau began pursuing technological innovations to improve the precision of its census data collection in the 1980s. Robert W. Marx, the Chief of the Geography Division of the USCB teamed up with the U.S. Geological Survey and oversaw the creation of

Data processing - Misplaced Pages Continue

3024-570: The associate directors. The Census Bureau headquarters has been in Suitland, Maryland , since 1942. A new headquarters complex completed there in 2007 supports over 4,000 employees. > The bureau operates regional offices in 6 cities: > New York City , Philadelphia , Chicago , Atlanta , Denver , and Los Angeles . The National Processing Center is in Jeffersonville, Indiana . Additional temporary processing facilities facilitate

3096-413: The bank. A more sophisticated record keeping system might further identify the transactions— for example deposits by source or checks by type, such as charitable contributions. This information might be used to obtain information like the total of all contributions for the year. The important thing about this example is that it is a system , in which, all transactions are recorded consistently, and

3168-433: The best method to climb it. Awareness of the characteristics represented by this data is knowledge. Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract. In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest geological characteristics may be considered "information", and

3240-434: The binary alphabet. Some special forms of data are distinguished. A computer program is a collection of data, that can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata , that is,

3312-515: The census was taken by marshals of the judicial districts . The Census Act of 1840 established a central office which became known as the Census Office. Several acts followed that revised and authorized new censuses, typically at the 10-year intervals. In 1902, the temporary Census Office was moved under the Department of Interior , and in 1903 it was renamed the Census Bureau under the new Department of Commerce and Labor . The department

3384-792: The collecting of statistics about the nation, its people, and economy. The Census Bureau's legal authority is codified in Title 13 of the United States Code . The Census Bureau also conducts surveys on behalf of various federal government and local government agencies on topics such as employment, crime, health, consumer expenditures , and housing. Within the bureau, these are known as "demographic surveys" and are conducted perpetually between and during decennial (10-year) population counts. The Census Bureau also conducts economic surveys of manufacturing, retail, service, and other establishments and of domestic governments. Between 1790 and 1840,

3456-617: The concept of a sign to differentiate between data and information; data is a series of symbols, while information occurs when the symbols are used to refer to something. Before the development of computing devices and machines, people had to manually collect data and impose patterns on it. With the development of computing devices and machines, these devices can also collect data. In the 2010s, computers were widely used in many fields to collect data and sort or process it, in disciplines ranging from marketing , analysis of social service usage by citizens to scientific research. These patterns in

3528-439: The corresponding concept is referred to as electronic data processing system . A very simple example of a data processing system is the process of maintaining a check register. Transactions— checks and deposits— are recorded as they occur and the transactions are summarized to determine a current balance. Monthly the data recorded in the register is reconciled with a hopefully identical list of transactions processed by

3600-444: The course of a controlled scientific experiment. Data are analyzed using techniques such as calculation , reasoning , discussion, presentation , visualization , or other forms of post-analysis. Prior to analysis, raw data (or unprocessed data) is typically cleaned: Outliers are removed, and obvious instrument or data entry errors are corrected. Data can be seen as the smallest units of factual information that can be used as

3672-408: The data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as " truth " (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between

SECTION 50

#1732772896066

3744-628: The decennial census, the Census Bureau continually conducts over 130 surveys and programs a year, including the American Community Survey , the U.S. Economic Census , and the Current Population Survey . The U.S. Economic Census occurs every five years and reports on American Business and the American economy in order to plan business decisions. Furthermore, economic and foreign trade indicators released by

3816-500: The decennial census, which employs more than a million people. The cost of the 2000 census was $ 4.5 billion. During the years just prior to the decennial census, parallel census offices, known as "Regional Census Centers" are opened in the field office cities. The decennial operations are carried out from these facilities. The Regional Census Centers oversee the openings and closings of smaller "Area Census Offices" within their collection jurisdictions. In 2020, Regional Census Centers oversaw

3888-529: The disclosure of this information. All census employees must sign an affidavit of non-disclosure prior to employment. This non-disclosure states "I will not disclose any information contained in the schedules, lists, or statements obtained for or prepared by the Census Bureau to any person or persons either during or after employment." The punishment for breaking the non-disclosure is a fine up to $ 250,000 or 5 years in prison. The bureau cannot share responses, addresses or personal information with anyone, including

3960-575: The ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere , "to take") to distinguish between an immense number of possible data and a sub-set of them, to which attention is oriented. Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce assumptions that are counterproductive, for example that phenomena are discrete or are observer-independent. The term capta , which emphasizes

4032-658: The federal government typically contain data produced by the Census Bureau. Article One of the United States Constitution (section II) directs the population be enumerated at least once every ten years and the resulting counts used to set the number of members from each state in the House of Representatives and, by extension, in the Electoral College . The Census Bureau now conducts a full population count every ten years in years ending with

4104-538: The mark and observation is broken. Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet . The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from

4176-638: The operation of 248 Area Census Offices, The estimated cost of the 2010 census is $ 14.7 billion. On January 1, 2013, the Census Bureau consolidated its twelve regional offices into six. Increasing costs of data collection, changes in survey management tools such as laptops and the increasing use of multi-modal surveys (i.e. internet, telephone, and in-person) led the Bureau to consolidate. The six regional offices that closed were Boston, Charlotte, Dallas, Detroit, Kansas City and Seattle. The remaining regional offices are New York City, Philadelphia, Chicago, Atlanta, Denver, and Los Angeles. The Census Bureau also runs

4248-522: The petabyte scale. Using traditional data analysis methods and computing, working with such large (and growing) datasets is difficult, even impossible. (Theoretically speaking, infinite data would yield infinite information, which would render extracting insights or intelligence impossible.) In response, the relatively new field of data science uses machine learning (and other artificial intelligence (AI)) methods that allow for efficient applications of analytic methods to big data. The Latin word data

4320-445: The populace's private information. Enumerators (information gatherers) that had operational problems with the device understandably made negative reports. During the 2009 Senate confirmation hearings for Robert Groves , President Obama's Census Director appointee, there was much mention of problems but very little criticism of the units. In rural areas, the sparsity of cell phone towers caused problems with data transmission to and from

4392-405: The problem of reproducibility is the attempt to require FAIR data , that is, data that is Findable, Accessible, Interoperable, and Reusable. Data that fulfills these requirements can be used in subsequent research and thus advances science and technology. Although data is also increasingly used in other fields, it has been suggested that the highly interpretive nature of them might be at odds with

SECTION 60

#1732772896066

4464-452: The requested data. Overall, the likelihood of retrieving data dropped by 17% each year after publication. Similarly, a survey of 100 datasets in Dryad found that more than half lacked the details to reproduce the research results from these studies. This shows the dire situation of access to scientific data that is not published or does not have enough details to be reproduced. A solution to

4536-457: The research's objectivity and permit an understanding of the phenomena under investigation as complete as possible: qualitative and quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. The data is thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information. An important field in computer science , technology , and library science

4608-451: The results of the 1880 census" using manual processing methods. The term automatic data processing was applied to operations performed by means of unit record equipment , such as Herman Hollerith 's application of punched card equipment for the 1890 United States Census . "Using Hollerith's punchcard equipment, the Census Office was able to complete tabulating most of the 1890 census data in 2 to 3 years, compared with 7 to 8 years for

4680-463: The same method of bank reconciliation is used each time. This is a flowchart of a data processing system combining manual and computerized processing to handle accounts receivable , billing, and general ledger [REDACTED] United States Census Bureau The United States Census Bureau ( USCB ), officially the Bureau of the Census , is a principal agency of the U.S. Federal Statistical System , responsible for producing data about

4752-474: The synthesis of data into information, can then be described as knowledge . Data has been described as "the new oil of the digital economy ". Data, as a general concept , refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing . Advances in computing technologies have led to the advent of big data , which usually refers to very large quantities of data, usually at

4824-596: The term data processing is typically used for the initial stage followed by a data analysis in the second stage of the overall data handling. Data analysis uses specialized algorithms and statistical calculations that are less often observed in a typical general business environment. For data analysis, software suites like SPSS or SAS , or their free counterparts such as DAP , gretl , or PSPP are often used. These tools are usually helpful for processing various huge data sets, as they are able to handle enormous amount of statistical analysis. A data processing system

4896-449: The terms are approximately synonymous. Commercial data processing involves a large volume of input data, relatively few computational operations, and a large volume of output. For example, an insurance company needs to keep records on tens or hundreds of thousands of policies, print and mail bills, and receive and post payments. In science and engineering, the terms data processing and information systems are considered too broad, and

4968-552: Was denied for decades but was finally proven in 2007. United States census data are valuable for the country's political parties; Democrats and Republicans are highly interested in knowing the accurate number of persons in their respective districts. These insights are often linked to financial and economic strategies that are central to federal, state and city investments for locations of particular populations. Such apportionments are designed to distribute political power across neutral spatial allocations; however, "because so much

5040-511: Was intended to consolidate overlapping statistical agencies, but Census Bureau officials were hindered by their subordinate role in the department. An act in 1920 changed the date and authorized manufacturing censuses every two years and agriculture censuses every 10 years. In 1929, a bill was passed mandating the House of Representatives be reapportioned based on the results of the 1930 census . In 1954, various acts were codified into Title 13 of

5112-721: Was introduced after the 1850 census by statistician and later census superintendent J. D. B. De Bow . He published a compendium where the states and territories were grouped into five "great division", namely the Middle, New England, the Northwestern, the Southern, and the Southwestern great divisions. Unsatisfied with this system, De Bow devised another one four years later, with states and territories grouped into an Eastern, Interior, and Western "great section", each divided into

5184-414: Was manufactured by Harris Corporation , an established Department of Defense contractor, via a controversial contract with the Department of Commerce . Secured access via a fingerprint swipe guaranteed only the verified user could access the unit. A GPS capacity was integral to the daily address management and the transfer of gathered information. Of major importance was the security and integrity of

#65934