Misplaced Pages

Relational model

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

The relational model ( RM ) is an approach to managing data using a structure and language consistent with first-order predicate logic , first described in 1969 by English computer scientist Edgar F. Codd , where all data is represented in terms of tuples , grouped into relations . A database organized in terms of the relational model is a relational database .

#950049

69-570: The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries. Most relational databases use the SQL data definition and query language; these systems implement what can be regarded as an engineering approximation to

138-816: A {\displaystyle a} , b {\displaystyle b} , c {\displaystyle c} , where the values of b {\displaystyle b} are numbers, then Π a ,   b × 0.5 ,   c ( R ) {\displaystyle \Pi _{a,\ b\times 0.5,\ c}(R)} is like R {\displaystyle R} , but with all b {\displaystyle b} -values halved. The closely related concept in set theory (see: projection (set theory) ) differs from that of relational algebra in that, in set theory, one projects onto ordered components, not onto attributes. For instance, projecting ( 3 , 7 ) {\displaystyle (3,7)} onto

207-426: A 1 , . . . , a n ] {\displaystyle t[a_{1},...,a_{n}]} is the restriction of the tuple t {\displaystyle t} to the set { a 1 , . . . , a n } {\displaystyle \{a_{1},...,a_{n}\}} so that where ( a ′ , v ) {\displaystyle (a',v)}

276-421: A , b , c {\displaystyle a,b,c} to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings. Our first definition concerns the notion of tuple , which formalizes the notion of row or record in a table: The next definition defines relation that formalizes the contents of a table as it is defined in the relational model. Such

345-453: A first-class object in the program. Although pure functional languages are non-imperative, they often provide a facility for describing the effect of a function as a series of steps. Other functional languages, such as Lisp , OCaml and Erlang , support a mixture of procedural and functional programming. Some logic programming languages, such as Prolog , and database query languages, such as SQL, while declarative in principle, also support

414-411: A tuple allows for a unique empty tuple with no values, corresponding to the empty set of attributes. If a relation has a degree of 0 (i.e. its heading contains no attributes), it may have either a cardinality of 0 (a body containing no tuples) or a cardinality of 1 (a body containing the single empty tuple). These relations represent Boolean truth values . The relation with degree 0 and cardinality 0

483-418: A declarative programming style. In logic programming, programs consist of sentences expressed in logical form, and computation uses those sentences to solve problems, which are also expressed in logical form. In a pure functional language , such as Haskell , all functions are without side effects , and state changes are only represented as functions that transform the state, which is explicitly represented as

552-448: A description of some relvars ( relation variables) and their attributes: In this design we have three relvars: Customer, Order, and Invoice. The bold, underlined attributes are candidate keys . The non-bold, underlined attributes are foreign keys . Usually one candidate key is chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys . A candidate key

621-467: A function to output the first n square numbers in Racket can be done accordingly: The map function accepts a function and a list; the output is a list of results of the input function on each element of the input list. ML (1973) stands for "Meta Language." ML is statically typed, and function arguments and return types may be annotated. ML is not as bracket-centric as Lisp , and instead uses

690-505: A general model of data, and subsequently promoted by Chris Date and Hugh Darwen among others. In their 1995 The Third Manifesto , Date and Darwen try to demonstrate how the relational model can accommodate certain "desired" object-oriented features. Some years after publication of his 1970 model, Codd proposed a three-valued logic (True, False, Missing/ NULL ) version of it to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went

759-464: A goal is a logical consequence of the program, or by showing that the goal is true in a model defined by the program. Prolog computes by reducing goals to subgoals, top-down using backward reasoning , whereas most Datalog systems compute bottom-up using forward reasoning . Answer set programs typically use SAT solvers to generate a model of the program. Models, or mathematical representations, of physical systems may be implemented in computer code that

SECTION 10

#1732779597951

828-450: A key. For example, a relation describing a company's employees may have two attributes: ID and Name. Even if no employees currently share a name, if it is possible to eventually hire a new employee with the same name as a current employee, the attribute subset {Name} is not a key. Conversely, if the subset {ID} is a key, this means not only that no employees currently share an ID, but that no employees will ever share an ID. A foreign key

897-517: A number of other operators – many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit relation values as attributes (relation-valued attribute), then operators such as group and ungroup. The flexibility of relational databases allows programmers to write queries that were not anticipated by

966-798: A particular Order, we can query for all orders where Order ID in the Order relation equals the Order ID in OrderInvoice, and where Invoice ID in OrderInvoice equals the Invoice ID in Invoice. A data type in a relational database might be the set of integers, the set of character strings, the set of dates, etc. The relational model does not dictate what types are to be supported. Attributes are commonly represented as columns , tuples as rows , and relations as tables . A table

1035-619: A procedural style of programming. Declarative programming is an umbrella term that includes a number of better-known programming paradigms . Constraint programming states relations between variables in the form of constraints that specify the properties of the target solution. The set of constraints is solved by giving a value to each variable so that the solution is consistent with the maximum number of constraints. Constraint programming often complements other paradigms: functional, logical, or even imperative programming. Well-known examples of declarative domain-specific languages (DSLs) include

1104-426: A relation closely corresponds to what is usually called the extension of a predicate in first-order logic except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema. One of

1173-514: A relational database by sending it a query . In response to a query, the database returns a result set. Often, data from multiple tables are combined into one, by doing a join . Conceptually, this is done by taking all possible combinations of rows (the Cartesian product ), and then filtering out everything except the answer. There are a number of relational operations in addition to join. These include project (the process of eliminating some of

1242-405: A separate area around 1977. Syntactically and semantically , it is a subset of Prolog. But because it does not have compound terms , it is not Turing-complete . Most Datalog systems execute programs bottom-up, using rules to reason forwards , deriving new facts from existing facts, and terminating when there are no new facts that can be derived, or when the derived facts unify with the query. In

1311-435: A step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. A relation consists of a heading and a body . The heading defines a set of attributes , each with a name and data type (sometimes called a domain ). The number of attributes in this set is the relation's degree or arity . The body is a set of tuples . A tuple is a collection of n values , where n

1380-468: A unique ID, since the Name field is not part of the primary key. Foreign keys are integrity constraints enforcing that the value of the attribute set is drawn from a candidate key in another relation . For example, in the Order relation the attribute Customer ID is a foreign key. A join is the operation that draws on information from several relations at once. By joining relvars from

1449-520: A value that can be attributed to a relvar. If we attempted to insert a new customer with the ID 123 , this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 123 . The DBMS must reject a transaction such as this that would render the database inconsistent by a violation of an integrity constraint . However, it is possible to insert another customer named Alice , as long as this new customer has

SECTION 20

#1732779597951

1518-693: A webpage - it specifies neither control flow for rendering a page nor the page's possible interactions with a user . As of 2013 , some software systems combine traditional user-interface markup languages (such as HTML) with declarative markup that defines what (but not how) the back-end server systems should do to support the declared interface. Such systems, typically using a domain-specific XML namespace , may include abstractions of SQL database syntax or parameterized calls to web services using representational state transfer (REST) and SOAP . Functional programming languages such as Haskell , Scheme , and ML evaluate expressions via function application. Unlike

1587-488: A wider variety of syntax to codify the relationship between code elements, rather than appealing to list ordering and nesting to express everything. The following is an application of times_10 : It returns "20 : int", that is, 20 , a value of type int . Like Lisp , ML is tailored to process lists, though all elements of a list must be the same type. Prolog (1972) stands for "PROgramming in LOGic." It

1656-401: Is False , while the relation with degree 0 and cardinality 1 is True . If a relation of Employees contains the attributes {Name, ID} , then the tuple {Alice, 1} represents the proposition: "There exists an employee named Alice with ID 1 ". This proposition may be true or false. If this tuple exists in the relation's body, the proposition is true (there is such an employee). If this tuple

1725-430: Is age years old and weighs weight ." Then the given projection represents the predicate, "There exists Name such that Name is age years old and weighs weight ." Note that Harry and Peter have the same age and weight, but since the result is a relation, and therefore a set, this combination only appears once in the result. More formally the semantics of projection are defined as follows: where t [

1794-416: Is a many-to-many relationship between Order and Invoice (also called a non-specific relationship ). To represent this relationship in the database a new relvar should be introduced whose role is to specify the correspondence between Orders and Invoices: Now, the Order relvar has a one-to-many relationship to the OrderInvoice table, as does the Invoice relvar. If we want to retrieve every Invoice for

1863-429: Is a formal system . A relation's attributes define a set of logical propositions . Each proposition can be expressed as a tuple. The body of a relation is a subset of these tuples, representing which propositions are true. Constraints represent additional propositions which must also be true. Relational algebra is a set of logical rules that can validly infer conclusions from these propositions. The definition of

1932-473: Is a relation and a 1 , . . . , a n {\displaystyle a_{1},...,a_{n}} are attribute names. Its result is defined as the set obtained when the components of the tuples in R {\displaystyle R} are restricted to the set { a 1 , . . . , a n } {\displaystyle \{a_{1},...,a_{n}\}} – it discards (or excludes )

2001-455: Is a database definition language, which combines a relational view of data, as in the relational model, with a logical view, as in logic programming . Whereas relational databases use a relational calculus or relational algebra, with relational operations , such as union , intersection , set difference and cartesian product to specify queries, Datalog uses logical connectives, such as if , or , and and not to define relations as part of

2070-441: Is a subset of attributes {A} in a relation R 1 that corresponds with a key of another relation R 2 , with the property that the projection of R 1 on {A} is a subset of the projection of R 2 on {A} . In other words, if a tuple in R 1 contains values for a foreign key, there must be a corresponding tuple in R 2 containing the same values for the corresponding key. Users (or programs) request data from

2139-420: Is a unique identifier enforcing that no tuple will be duplicated; this would make the relation into something else, namely a bag , by violating the basic definition of a set . Both foreign keys and superkeys (that includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as

Relational model - Misplaced Pages Continue

2208-410: Is an attribute value, a ′ {\displaystyle a'} is an attribute name, and v {\displaystyle v} is an element of that attribute's domain — see Relation (database) . The result of a projection Π a 1 , . . . , a n ( R ) {\displaystyle \Pi _{a_{1},...,a_{n}}(R)}

2277-407: Is declarative. The code contains a number of equations, not imperative assignments, that describe ("declare") the behavioral relationships. When a model is expressed in this formalism, a computer is able to perform algebraic manipulations to best formulate the solution algorithm. The mathematical causality is typically imposed at the boundaries of the physical system, while the behavioral description of

2346-446: Is defined only if { a 1 , . . . , a n } {\displaystyle \{a_{1},...,a_{n}\}} is a subset of the header of R {\displaystyle R} . Projection over no attributes at all is possible, yielding a relation of degree zero . In this case the cardinality of the result is zero if the operand is empty, otherwise one. The two relations of degree zero are

2415-620: Is in contrast with imperative programming , which implements algorithms in explicit steps. Declarative programming often considers programs as theories of a formal logic , and computations as deductions in that logic space. Declarative programming may greatly simplify writing parallel programs . Common declarative languages include those of database query languages (e.g., SQL , XQuery ), regular expressions , logic programming (e.g. Prolog , Datalog , answer set programming ), functional programming , configuration management , and algebraic modeling systems. Declarative programming

2484-422: Is in the first normal form is vulnerable to all types of anomalies, while a database that is in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms. The relational model

2553-462: Is not in the relation's body, the proposition is false (there is no such employee). Furthermore, if {ID} is a key, then a relation containing the tuples {Alice, 1} and {Bob, 1} would represent the following contradiction : Under the principle of explosion , this contradiction would allow the system to prove that any arbitrary proposition is true. The database must enforce the key constraint to prevent this. An idealized, very simple example of

2622-485: Is often defined as any style of programming that is not imperative. A number of other common definitions attempt to define it by simply contrasting it with imperative programming. For example: These definitions overlap substantially. Declarative programming is a non-imperative style of programming in which programs describe their desired results without explicitly listing commands or steps that must be performed. Functional and logic programming languages are characterized by

2691-537: Is represented solely by values within tuples, corresponding to attributes, in relations identified by relvars. A database may define arbitrary boolean expressions as constraints . If all constraints evaluate as true , the database is consistent ; otherwise, it is inconsistent . If a change to a database's relvars would leave the database in an inconsistent state, that change is illegal and must not succeed. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆),

2760-401: Is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row. A database relvar (relation variable) is commonly known as a base table . The heading of its assigned value at any time is as specified in the table declaration and its body

2829-419: Is that most recently assigned to it by an update operator (typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluating a query are determined by the definitions of the operators used in that query. SQL, initially pushed as the standard language for relational databases , deviates from the relational model in several places. The current ISO SQL standard doesn't mention

Relational model - Misplaced Pages Continue

2898-496: Is the property that a value in a tuple may be derived from another value in that tuple. Other models include the hierarchical model and network model . Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex and abstract that it would be cost-prohibitive to migrate to systems employing the relational model. Also of note are newer object-oriented databases . and Datalog . Datalog

2967-483: Is the relation's degree, and each value in the tuple corresponds to a unique attribute. The number of tuples in this set is the relation's cardinality . Relations are represented by relational variables or relvars , which can be reassigned. A database is a collection of relvars. In this model, databases follow the Information Principle : At any given time, all information in the database

3036-539: Is theoretically sufficient. Two special cases of constraints are expressed as keys and foreign keys : A candidate key , or simply a key , is the smallest subset of attributes guaranteed to uniquely differentiate each tuple in a relation. Since each tuple in a relation must be unique, every relation necessarily has a key, which may be its complete set of attributes. A relation may have multiple keys, as there may be multiple ways to uniquely differentiate each tuple. An attribute may be unique across tuples without being

3105-551: The DPLL algorithm to generate one or more models of the program. Its applications are oriented towards solving difficult search problems and knowledge representation . Projection (relational algebra) In relational algebra , a projection is a unary operation written as Π a 1 , . . . , a n ( R ) {\displaystyle \Pi _{a_{1},...,a_{n}}(R)} , where R {\displaystyle R}

3174-440: The factorial function can be defined as follows: This defines the factorial function using its recursive definition. In contrast, it is more typical to define a procedure for an imperative language. In lisps and lambda calculus, functions are generally first-class citizens . Loosely, this means that functions can be inputs and outputs for other functions. This can simplify the definition of some functions. For example, writing

3243-642: The yacc parser generator input language, QML , the Make build specification language, Puppet 's configuration management language, regular expressions , Datalog , answer set programming and a subset of SQL (SELECT queries, for example). DSLs have the advantage of being useful while not necessarily needing to be Turing-complete , which makes it easier for a language to be purely declarative. Many markup languages such as HTML , MXML , XAML , XSLT or other user-interface markup languages are often declarative. HTML, for example, only describes what should appear on

3312-455: The "default projection" returns a multiset instead of a set, and the π projection is obtained by the addition of the DISTINCT keyword to eliminate duplicate data. For an example, consider the relations depicted in the following two tables which are the relation Person and its projection on (some say "over") the attributes Age and Weight : Suppose the predicate of Person is " Name

3381-556: The Invoice relvar will have one Order ID, which implies that there is precisely one Order for each Invoice. But in reality an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar contains an Invoice ID attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words, there can be many Invoices per Order and many Orders per Invoice. This

3450-571: The ML and Lisp families, are not purely functional , and thus allow the introduction of stateful effects in programs. Makefiles, for example, specify dependencies in a declarative fashion, but include an imperative list of actions to take as well. Similarly, yacc specifies a context free grammar declaratively, but includes code snippets from a host language, which is usually imperative (such as C ). Logic programming languages, such as Prolog , Datalog and answer set programming , compute by proving that

3519-487: The above example, a typical Datalog system would first derive the new facts: Using these facts, it would then derive the additional fact: It would then terminate, both because no new, additional facts can be derived, and because the newly derived fact unifies with the query Datalog has been applied to such problems as data integration , information extraction , networking , security , cloud computing and machine learning . Answer set programming (ASP) evolved in

SECTION 50

#1732779597951

3588-543: The answer substitution X = tom . Prolog executes programs top-down, using SLD resolution to reason backwards , reducing goals to subgoals. In this example, it uses the last rule of the program to reduce the goal of answering the query eat ( X , jerry ) to the subgoals of first finding an X such that big ( X ) holds and then of showing that small ( jerry ) holds. It repeatedly uses rules to further reduce subgoals to other subgoals, until it eventually succeeds in unifying all subgoals with facts in

3657-417: The columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (that lists the rows in one table that are not found in the other), intersect (that lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are

3726-452: The database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses. Relations are classified based upon the types of anomalies to which they're vulnerable. A database that

3795-510: The database itself. In contrast with the relational model, which cannot express recursive queries without introducing a least-fixed-point operator, recursive relations can be defined in Datalog, without introducing any new logical connectives or operators. Declarative programming In computer science , declarative programming is a programming paradigm —a style of building the structure and elements of computer programs—that expresses

3864-594: The example above we could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition . If we wanted to retrieve all of the Orders for Customer 123 , we could query the database to return every row in the Order table with Customer ID 123 . There is a flaw in our database design above. The Invoice relvar contains an Order ID attribute. So, each tuple in

3933-477: The excluded attributes. In a computer language it is of course possible to provide notations for both, and that was done in ISBL and several languages that have taken their cue from ISBL. A nearly identical concept occurs in the category of monoids , called a string projection , which consists of removing all of the letters in the string that do not belong to a given alphabet . When implemented in SQL standard

4002-402: The late 1990s, based on the stable model (answer set) semantics of logic programming. Like Datalog, it is a subset of Prolog; and, because it does not have compound terms, it is not Turing-complete. Most implementations of ASP execute a program by first "grounding" the program, replacing all variables in rules by constants in all possible ways, and then using a propositional SAT solver, such as

4071-408: The logic of a computation without describing its control flow . Many languages that apply this style attempt to minimize or eliminate side effects by describing what the program must accomplish in terms of the problem domain , rather than describing how to accomplish it as a sequence of the programming language primitives (the how being left up to the language's implementation ). This

4140-489: The other attributes. In practical terms, if a relation is thought of as a table, then projection can be thought of as picking a subset of its columns. For example, if the attributes are (name, age), then projection of the relation {(Alice, 5), (Bob, 8)} onto attribute list (age) yields {5,8} – we have discarded the names, and only know what ages are present. Projections may also modify attribute values. For example, if R {\displaystyle R} has attributes

4209-467: The program. This backward reasoning, goal-reduction strategy treats rules in logic programs as procedures, and makes Prolog both a declarative and procedural programming language. The broad range of Prolog applications is highlighted in the Year of Prolog Book, celebrating the 50 year anniversary of Prolog. The origins of Datalog date back to the beginning of logic programming, but it was identified as

SECTION 60

#1732779597951

4278-409: The related but more imperative paradigm of procedural programming , functional programming places little emphasis on explicit sequencing. Instead, computations are characterised by various kinds of recursive higher-order function application and composition , and as such can be regarded simply as a set of mappings between domains and codomains . Many functional languages, including most of those in

4347-433: The relational model or use relational terms or concepts. According to the relational model, a Relation's attributes and tuples are mathematical sets , meaning they are unordered and unique. In a SQL table, neither rows nor columns are proper sets. A table may contain both duplicate rows and duplicate columns, and a table's columns are explicitly ordered. SQL uses a Null value to indicate missing data, which has no analog in

4416-434: The relational model. A table in a SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases deviate from the relational model in many details , and Codd fiercely argued against deviations that compromise the original principles. The relational model was developed by Edgar F. Codd as

4485-428: The relational model. Because a row can represent unknown information, SQL does not adhere to the relational model's Information Principle . Basic notions in the relational model are relation names and attribute names . We will represent these as strings such as "Person" and "name" and we will usually use the variables r , s , t , … {\displaystyle r,s,t,\ldots } and

4554-444: The second component yields 7. Projection is relational algebra's counterpart of existential quantification in predicate logic . The attributes not included correspond to existentially quantified variables in the predicate whose extension the operand relation represents. The example below illustrates this point. Because of the correspondence with existential quantification, some authorities prefer to define projection in terms of

4623-468: The simplest and most important types of relation constraints is the key constraint . It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes. A superkey is a set of column headers for which the values of those columns concatenated are unique across all rows. Formally: A candidate key is a superkey that cannot be further subdivided to form another superkey. Functional dependency

4692-449: The system itself is declarative or acausal. Declarative modeling languages and environments include Analytica , Modelica and Simile . Lisp is a family of programming languages loosely inspired by mathematical notation and Alonzo Church 's lambda calculus . Some dialects, such as Common Lisp , are primarily imperative but support functional programming. Others, such as Scheme , are designed for functional programming. In Scheme,

4761-436: Was developed for natural language question answering , using SL resolution both to deduce answers to queries and to parse and generate natural language sentences. The building blocks of a Prolog program are facts and rules . Here is a simple example: Given this program, the query eat ( tom , jerry ) succeeds, while eat ( jerry , tom ) fails. Moreover, the query eat ( X , jerry ) succeeds with

#950049