Misplaced Pages

Data model

Article snapshot taken from Wikipedia under a Creative Commons Attribution-ShareAlike license.

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.
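
The car example can be sketched in code. The following is a minimal illustration only, assuming Python dataclasses as the notation; the class names `Car` and `Person`, the field names, and the sample values are hypothetical and do not come from the article.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical owner entity; the single 'name' field is an assumption."""
    name: str


@dataclass
class Car:
    """A car data element composed of other elements: color, size, and owner."""
    color: str
    size: str      # free-text size such as "compact"; the article gives no units
    owner: Person  # relationship from the car to the entity that owns it


# Build one instance of the model with made-up sample values.
car = Car(color="red", size="compact", owner=Person(name="Alice"))
print(car.owner.name)
```

The point of the sketch is that the model fixes which elements a car record has and how it relates to its owner, independently of any particular car.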

77-418: The corresponding professional activity is generally called data modeling or, more specifically, database design. Data models are typically specified by a data expert, data specialist, data scientist, data librarian, or data scholar. A data modeling language and notation are often represented in graphical form as diagrams. A data model can sometimes be referred to as a data structure, especially in

154-498: A data processing problem". They wanted to create "a notation that should enable the analyst to organize the problem around any piece of hardware". Their work was the first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. The next step in IS modeling was taken by CODASYL, an IT industry consortium formed in 1959, who essentially aimed at

231-511: A 'classification relation', being a binary relation between an individual thing and a kind of thing (a class) and a 'part-whole relation', being a binary relation between two things, one with the role of part, the other with the role of whole, regardless of the kinds of things that are related. Given an extensible list of classes, this allows any individual thing to be classified and part-whole relations to be specified for any individual object. By standardization of an extensible list of relation types,

308-517: A carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type. A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated grammar for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which

385-440: A cohesive, inseparable, whole by eliminating unnecessary data redundancies and by relating data structures with relationships. A different approach is to use adaptive systems such as artificial neural networks that can autonomously create implicit models of data. A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often

462-535: A company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system. The entities represented by a data model can be tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example,

539-561: A data model for XML documents. The main aim of data models is to support the development of information systems by providing the definition and format of data. According to West and Fowler (1999) "if this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain

616-455: A data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns. While data analysis

693-415: A data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people. The term data model can have two meanings: A data model theory has three main components: For example, in the relational model,

770-412: A data modeling language.[3] A data model instance may be one of three kinds according to ANSI in 1975: The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting

847-427: A database involves producing the previously described three types of schemas – conceptual, logical, and physical. The database design documented in these schemas is converted through a Data Definition Language, which can then be used to generate a database. A fully attributed data model contains detailed attributes (descriptions) for every entity within it. The term "database design" can describe many different parts of

924-407: A design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly. One of the earliest pioneering works in modeling information systems was done by Young and Kent (1958), who argued for "a precise and abstract way of specifying the informational and time characteristics of

1001-460: A generic data model enables the expression of an unlimited number of kinds of facts and will approach the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model only allows expressions of kinds of facts that are predefined in the model. The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot totally satisfy

1078-449: A later development, with a computer used instead of several independent pieces of equipment. The Census Bureau first made limited use of electronic computers for the 1950 United States Census, using a UNIVAC I system, delivered in 1952. The term data processing has mostly been subsumed by the more general term information technology (IT). The older term "data processing" is suggestive of older technologies. For example, in 1996

1155-502: A semantic logical data model. This is transformed into a physical data model instance from which is generated a physical database. For example, a data modeler may use a data modeling tool to create an entity–relationship model of the corporate data repository of some business enterprise. This model is transformed into a relational model, which in turn generates a relational database. Patterns are common data modeling structures that occur in many data models. A data-flow diagram (DFD)

1232-414: A system by system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. Most systems within an organization contain the same basic data, redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can minimize rework with minimal modifications for the purposes of different systems within

1309-414: A technique for detailing business requirements for specific databases. It is sometimes called database modeling because a data model is eventually implemented in a database. Data models provide a framework for data to be used within information systems by providing specific definitions and formats. If a data model is used consistently across systems then compatibility of data can be achieved. If

1386-440: A type of data model, but more or less an alternative model. Within the field of software engineering, both a data model and an information model can be abstract, formal representations of entity types that include their properties, relationships and the operations that can be performed on them. The entity types in the model may be kinds of real-world objects, such as devices in a network, or they may themselves be abstract, such as for

1463-421: Is a combination of machines, people, and processes that for a set of inputs produces a defined set of outputs. The inputs and outputs are interpreted as data, facts, information etc. depending on the interpreter's relation to the system. A term commonly used synonymously with data or storage (codes) processing system is information system. With regard particularly to electronic data processing,

1540-440: Is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). (Presumably we call ourselves systems analysts because no one can say systems synthesists.) Data modeling strives to bring the data structures of interest together into

1617-469: Is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program. A data-flow diagram can also be used for the visualization of data processing (structured design). Data-flow diagrams were invented by Larry Constantine, the original developer of structured design, based on Martin and Estrin's "data-flow graph" model of computation. It

1694-513: Is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database. The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and

1771-420: Is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion. These models are being used in the first stage of information system design during

1848-402: Is common practice to draw a context-level data-flow diagram first which shows the interaction between the system and outside entities. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. This context-level data-flow diagram is then "exploded" to show more detail of the system being modeled. An information model is not

1925-401: Is no such thing as the final data model for a business or application. Instead a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. Whitten et al. (2004) determined two types of data modeling: Data modeling is also used as

2002-531: Is the process of creating a data model for an information system by applying certain formal techniques. It may be applied as part of the broader model-driven engineering (MDE) concept. Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of

2079-478: Is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model to a physical data model that organizes the data into tables, and accounts for access, performance and storage details. Data modeling defines not just data elements, but also their structures and

2156-400: Is to be stored in a database. This technique can describe any ontology, i.e., an overview and classification of concepts and their relationships, for a certain area of interest. In the 1970s G.M. Nijssen developed the "Natural Language Information Analysis Method" (NIAM), and developed this in the 1980s in cooperation with Terry Halpin into Object–Role Modeling (ORM). However, it

2233-440: Is to create a structural model of a piece of the real world, called "universe of discourse". For this, three fundamental structural relations are considered: A semantic data model can be used to serve many purposes, such as: The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful abstraction concepts known from the artificial intelligence field. The idea

2310-547: Is to provide high-level modeling primitives as an integral part of a data model in order to facilitate the representation of real-world situations.

Data processing

Data processing is the collection and manipulation of digital data to produce meaningful information. Data processing is a form of information processing, which is the modification (processing) of information in any manner detectable by an observer. Data processing may involve various processes, including: The United States Census Bureau history illustrates

2387-663: The Data Processing Management Association (DPMA) changed its name to the Association of Information Technology Professionals . Nevertheless, the terms are approximately synonymous. Commercial data processing involves a large volume of input data, relatively few computational operations, and a large volume of output. For example, an insurance company needs to keep records on tens or hundreds of thousands of policies, print and mail bills, and receive and post payments. In science and engineering,

2464-480: The constraints that bind them. The basic graphic elements of DSDs are boxes, representing entities, and arrows, representing relationships. Data structure diagrams are most useful for documenting complex data entities. Data structure diagrams are an extension of the entity–relationship model (ER model). In DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as boxes composed of attributes which specify

2541-441: The objects and relationships found in a particular application domain: for example the customers, products, and orders found in a manufacturing organization. At other times it refers to the set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity–relationship "data model". This article uses

2618-422: The relational model for database management based on first-order predicate logic. In the 1970s entity–relationship modeling emerged as a new type of conceptual data modeling, originally formalized in 1976 by Peter Chen. Entity–relationship models were being used in the first stage of information system design during the requirements analysis to describe information needs or the type of information that

2695-459: The requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. That is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in

2772-491: The requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e., an overview and classification of the terms used and their relationships) for a certain universe of discourse, i.e., area of interest. Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using

2849-521: The application of mechanical or electronic calculators. A person whose job was to perform calculations manually or using a calculator was called a "computer." The 1890 United States Census schedule was the first to gather data by individual rather than household. A number of questions could be answered by making a check in the appropriate box on the form. From 1850 to 1880 the Census Bureau employed "a system of tallying, which, by reason of

2926-413: The bank. A more sophisticated record keeping system might further identify the transactions, for example deposits by source or checks by type, such as charitable contributions. This information might be used to obtain information like the total of all contributions for the year. The important thing about this example is that it is a system, in which all transactions are recorded consistently, and

3003-496: The business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor". The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent. A data model explicitly determines the structure of data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually, data models are specified in

3080-616: The cardinality. A data model in Geographic information systems is a mathematical construct for representing geographic objects or surfaces as data. For example, Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solving some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of

3157-414: The conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such

3234-453: The constraints that bind entities together. DSDs differ from the ER model in that the ER model focuses on the relationships between different entities, whereas DSDs focus on the relationships of the elements within an entity and enable users to fully see the links and relationships between each entity. There are several styles for representing data structure diagrams, with the notable difference in

3311-531: The context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models. A data model explicitly determines the structure of data; conversely, structured data is data organized according to an explicit data model or data structure. Structured data is in contrast to unstructured data and semi-structured data. The term data model can refer to two distinct but closely related concepts. Sometimes it refers to an abstract formalization of

3388-439: The corresponding concept is referred to as electronic data processing system. A very simple example of a data processing system is the process of maintaining a check register. Transactions (checks and deposits) are recorded as they occur and the transactions are summarized to determine a current balance. Monthly the data recorded in the register is reconciled with a hopefully identical list of transactions processed by

3465-413: The data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data." During the early 1990s, three Dutch mathematicians, Guido Bakema, Harm van der Lek, and JanPieter Zwart, continued the development of the work of G.M. Nijssen. They focused more on the communication part of the semantics. In 1997 they formalized

3542-424: The definitions of those objects. This is then used as the start point for interface or database design. Some important properties of data for which requirements need to be met are: Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such

3619-411: The design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term "database design" could also be used to apply to

3696-505: The differences less significant. A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction that defines how the stored symbols relate to the real world. A semantic data model is sometimes called a conceptual data model. The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy

3773-427: The distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools don't make a distinction between logical and physical data models. There are several notations for data modeling. The actual model is frequently called "entity–relationship model", because it depicts data in terms of the entities and relationships described in the data. An entity–relationship model (ERM)

3850-481: The domain context. More generally, the term information model is used for models of individual things, such as facilities, buildings, process plants, etc. In those cases the concept is specialised to Facility Information Model, Building Information Model, Plant Information Model, etc. Such an information model is an integration of a model of the facility with the data and documents about the facility.

Data modeling

Data modeling in software engineering

3927-473: The entities used in a billing system. Typically, they are used to model a constrained domain that can be described by a closed set of entity types, properties, relationships and operations. According to Lee (1999) an information model is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide sharable, stable, and organized structure of information requirements for

4004-498: The entity boxes rather than outside of them, while relationships are drawn as lines, with the relationship constraints as descriptions on the line. The E-R model, while robust, can become visually cumbersome when representing entities with several attributes. There are several styles for representing data structure diagrams, with a notable difference in the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of

4081-419: The essential messiness of the real world, and the task of the data modeler to create order out of chaos without excessively distorting the truth. In the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately:

4158-427: The evolution of data processing from manual through electronic procedures. Although widespread use of the term data processing dates only from the 1950s, data processing functions have been performed manually for millennia. For example, bookkeeping involves functions such as posting transactions and producing reports like the balance sheet and the cash flow statement. Completely manual methods were augmented by

4235-413: The figure. The real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction that defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world. Data architecture is the design of data for use in defining the target state and the subsequent planning needed to hit

4312-397: The increasing number of combinations of classifications required, became increasingly complex. Only a limited number of combinations could be recorded in one tally, so it was necessary to handle the schedules 5 or 6 times, for as many independent tallies." "It took over 7 years to publish the results of the 1880 census" using manual processing methods. The term automatic data processing

4389-449: The information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, were proposed during this period of time". Towards the end of the 1960s, Edgar F. Codd worked out his theories of data arrangement, and proposed

4466-436: The information system. There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system. The data requirements are initially recorded as a conceptual data model which is essentially a set of technology independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model

4543-524: The manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality. An entity–relationship model (ERM), sometimes referred to as an entity–relationship diagram (ERD), could be used to represent an abstract conceptual data model (or semantic data model or physical data model) used in software engineering to represent structured data. There are several notations used for ERMs. Like DSDs, attributes are specified inside

4620-453: The meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined by its description within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world. The purpose of semantic data modeling

4697-419: The method Fully Communication Oriented Information Modeling FCO-IM. A database model is a specification describing how a database is structured and used. Several such models have been suggested. Common models include: A data structure diagram (DSD) is a diagram and data model used to describe conceptual data models by providing graphical notations which document entities and their relationships, and

4774-429: The organization. Data models represent information areas of interest. While there are many ways to create data models, according to Len Silverston (1997) only two modeling methodologies stand out, top-down and bottom-up: Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. In many environments

4851-517: The overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the Database Management System or DBMS. In the process, system interfaces account for 25% to 70% of the development and support costs of current systems. The primary reason for this cost is that these systems do not share a common data model . If data models are developed on

4928-505: The relationships between them. Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g., using data modeling: Data modeling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there

5005-481: The requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. That is unless the semantic data model is implemented in the database on purpose, a choice which may slightly impact performance but generally vastly improves productivity. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. That is, techniques to define

5082-565: The same data structures are used to store and access data then different applications can share data seamlessly. The results of this are indicated in the diagram. However, systems and interfaces are often expensive to build, operate, and maintain. They may also constrain the business rather than support it. This may occur when the quality of the data models implemented in systems and interfaces is poor. Some common problems found in data models are: In 1975 ANSI described three kinds of data-model instance: According to ANSI, this approach allows

5159-495: The same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements that are to be rendered more concretely, in order to make

5236-445: The same methodology will often come up with very different results. Most notable are: Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. The definition of generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as

5313-410: The same thing as Young and Kent: the development of "a proper structure for machine-independent problem definition language, at the system level of data processing". This led to the development of a specific IS information algebra. In the 1960s data modeling gained more significance with the initiation of the management information system (MIS) concept. According to Leondes (2002), "during that time,

5390-433: The structural part is based on a modified concept of the mathematical relation; the integrity part is expressed in first-order logic and the manipulation part is expressed using the relational algebra, tuple calculus and domain calculus. A data model instance is created by applying a data model theory. This is typically done to solve some business enterprise requirement. Business requirements are normally captured by
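
The three parts named above (structure, integrity, manipulation) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not how any particular DBMS works: a relation is a set of rows, the integrity rule is a universally quantified check, and the helper names (`row`, `select`, `project`, `natural_join`, `referential_integrity`) and the `cars`/`owners` data are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Row = Tuple[Tuple[str, object], ...]   # sorted (attribute, value) pairs
Relation = Set[Row]                    # structural part: a set of rows


def row(**attrs: object) -> Row:
    """Build a hashable row from attribute names and values."""
    return tuple(sorted(attrs.items()))


def select(relation: Relation, predicate: Callable[[Dict[str, object]], bool]) -> Relation:
    """Relational selection: keep the rows that satisfy the predicate."""
    return {r for r in relation if predicate(dict(r))}


def project(relation: Relation, attrs: Set[str]) -> Relation:
    """Relational projection: keep only the named attributes (duplicates collapse)."""
    return {tuple((k, v) for k, v in r if k in attrs) for r in relation}


def natural_join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine rows that agree on all shared attributes."""
    result: Relation = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result


def referential_integrity(cars: Relation, owners: Relation) -> bool:
    """Integrity part: for every car there exists an owner with the same owner_id."""
    return all(any(dict(c)["owner_id"] == dict(o)["owner_id"] for o in owners)
               for c in cars)


owners = {row(owner_id=1, name="Ada"), row(owner_id=2, name="Bo")}
cars = {
    row(car_id=10, color="red", owner_id=1),
    row(car_id=11, color="blue", owner_id=2),
    row(car_id=12, color="red", owner_id=2),
}

# Manipulation part: names of everyone who owns a red car.
red_cars = select(cars, lambda r: r["color"] == "red")
red_car_owner_names = project(natural_join(red_cars, owners), {"name"})
```

Here `red_car_owner_names` is computed purely by composing algebra operators, while `referential_integrity` states a constraint that must hold regardless of which queries are run.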

5467-423: The target state. Data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system. Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling

5544-478: The target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture. A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups, and data items; and mappings of those data artifacts to data qualities, applications, locations, etc. Essential to realizing

5621-460: The term in both senses. Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe the structure, manipulation, and integrity aspects of the data stored in data management systems such as relational databases. They may also describe data with a looser structure, such as word processing documents, email messages, pictures, digital audio, and video: XDM, for example, provides

5698-683: The terms data processing and information systems are considered too broad, and the term data processing is typically used for the initial stage followed by a data analysis in the second stage of the overall data handling. Data analysis uses specialized algorithms and statistical calculations that are less often observed in a typical general business environment. For data analysis, software suites like SPSS or SAS, or their free counterparts such as DAP, gretl, or PSPP are often used. These tools are usually helpful for processing very large data sets, as they can handle an enormous amount of statistical analysis. A data processing system

5775-554: The three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model. In the context of business process integration (see figure), data modeling complements business process modeling, and ultimately results in database generation. The process of designing
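
The independence between the perspectives can be sketched in Python. This is a toy illustration under stated assumptions, not the ANSI architecture itself: `Car` stands in for the conceptual schema, `CAR_COLUMNS` with `to_row`/`from_row` for the logical (table/column) schema, and two interchangeable storage classes for the physical schema. All of these names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Car:
    """Conceptual view: what the business means by a car."""
    color: str
    owner: str


# Logical view: the column layout used to hold a Car.
CAR_COLUMNS = ("color", "owner_name")


def to_row(car: Car) -> Tuple[str, str]:
    """Map a conceptual Car onto the logical column layout."""
    return (car.color, car.owner)


def from_row(row: Tuple[str, str]) -> Car:
    """Map a logical row back to a conceptual Car."""
    return Car(color=row[0], owner=row[1])


class ListStorage:
    """Physical view, option 1: rows kept as tuples in a list."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, str]] = []

    def write(self, row: Tuple[str, str]) -> None:
        self._rows.append(tuple(row))

    def read_all(self) -> List[Tuple[str, str]]:
        return list(self._rows)


class JsonLinesStorage:
    """Physical view, option 2: each row serialized as one JSON line."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def write(self, row: Tuple[str, str]) -> None:
        self._lines.append(json.dumps(list(row)))

    def read_all(self) -> List[Tuple[str, str]]:
        return [tuple(json.loads(line)) for line in self._lines]


def save_and_load(storage, cars: List[Car]) -> List[Car]:
    """Round-trip cars through the logical layout and a given physical storage."""
    for car in cars:
        storage.write(to_row(car))
    return [from_row(r) for r in storage.read_all()]


fleet = [Car(color="red", owner="Ada"), Car(color="blue", owner="Bo")]
via_list = save_and_load(ListStorage(), fleet)
via_json = save_and_load(JsonLinesStorage(), fleet)
```

Swapping `ListStorage` for `JsonLinesStorage` changes only the physical layer; `Car` and the column mapping are untouched, which is the kind of independence the passage describes.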

5852-497: Was Terry Halpin's 1989 PhD thesis that created the formal foundation on which Object–Role Modeling is based. Bill Kent, in his 1978 book Data and Reality, compared a data model to a map of a territory, emphasizing that in the real world, "highways are not painted red, rivers don't have county lines running down the middle, and you can't see contour lines on a mountain". In contrast to other researchers who tried to create models that were mathematically clean and elegant, Kent emphasized

5929-665: Was applied to operations performed by means of unit record equipment, such as Herman Hollerith's application of punched card equipment for the 1890 United States Census. "Using Hollerith's punchcard equipment, the Census Office was able to complete tabulating most of the 1890 census data in 2 to 3 years, compared with 7 to 8 years for the 1880 census. It is estimated that using Hollerith's system saved some $5 million in processing costs" in 1890 dollars even though there were twice as many questions as in 1880. Computerized data processing, or electronic data processing represents
