Null (SQL) - Misplaced Pages

In SQL , null or NULL is a special marker used to indicate that a data value does not exist in the database . Introduced by the creator of the relational database model, E. F. Codd , SQL null serves to fulfill the requirement that all true relational database management systems ( RDBMS ) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory . In SQL, NULL is a reserved word used to identify this marker.

#368631

54-429: A null should not be confused with a value of 0 . A null indicates a lack of a value, which is not the same as a zero value. For example, consider the question "How many books does Adam own?" The answer may be "zero" (we know that he owns none ) or "null" (we do not know how many he owns). In a database table, the column reporting this answer would start with no value (marked by null), and it would not be updated with

108-746: A ternary (three-valued) logic when comparing to nulls; it also details the treatment of Nulls on other set operations (the latter issue still controversial today). In database theory circles, the original proposal of Codd (1975, 1979) is now referred to as "Codd tables". Codd later reinforced his requirement that all RDBMSs support Null to indicate missing data in a 1985 two-part article published in Computerworld magazine. The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in IBM System R . Although Don Chamberlin recognized nulls (alongside duplicate rows) as one of

162-403: A cell) to store one value (the field value). The terms record and field come from the more practical field of database usage and traditional DBMS system usage (This was linked into business like terms used in manual databases e.g. filing cabinet storage with records for each customer). The terms row and column come from the more theoretical study of relational theory. Another distinction between

216-541: A comparison with Null.) Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates (which use a postfix syntax) test whether data is, or is not, Null. The SQL standard contains the optional feature F571 "Truth value tests" that introduces three additional logical unary operators (six in fact, if we count their negation, which

270-477: A data value for each column and would then be understood as a single structured data value. For example, a database that represents company contact information might have the following columns: ID, Company Name, Address Line 1, Address Line 2, City, and Postal Code. More formally, a row is a tuple containing a specific value for each column, for example: (1234, 'Big Company Inc.', '123 East Example Street', '456 West Example Drive', 'Big City', 98765). The word 'field'

324-605: A database that quickly needs to support electric cars while having a miles-per-gallon column. Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system be expanded to accommodate

378-507: A four-valued logic system. Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' domain. It remains an active field of research though, with numerous papers still being published. Null has been the focus of controversy and a source of debate because of its associated three-valued logic (3VL), special requirements for its use in SQL joins , and

432-410: A framework in which to evaluate the intended semantics of various proposals to implement missing-value semantics, that is referred to as Imieliński-Lipski Algebras . This section roughly follows chapter 19 of the "Alice" textbook. A similar presentation appears in the review of Ron van der Meyden, §10.4. Constructs representing missing information, such as Codd tables, are actually intended to represent

486-419: A set of relations, one for each possible instantiation of their parameters; in the case of Codd tables, this means replacement of Nulls with some concrete value. For example, A construct (such as a Codd table) is said to be a strong representation system (of missing information) if any answer to a query made on the construct can be particularized to obtain an answer for any corresponding query on

540-410: A specific sequence to produce a grammatically correct sentence that is announced throughout the facility. One of the principles of relational database design is that the fields of data tables should reflect a single characteristic of the table's subject, which means that they should not contain concatenated strings. When concatenation is desired in a report, it should be provided at the time of running

594-519: A table containing any nulls is not equal with a natural self-join of itself, meaning that whereas R ⋈ R = R {\displaystyle R\bowtie R=R} is true for any relation R in relational algebra , a SQL self-join will exclude all rows having a Null anywhere. An example of this behavior is given in the section analyzing the missing-value semantics of Nulls. The SQL COALESCE function or CASE expressions can be used to "simulate" Null equality in join criteria, and

SECTION 10

#1732798668369

648-486: A third logical result, Unknown. The logical result of the expression below, which compares the value 10 to Null, is Unknown: However, certain operations on Null can return values if the absent value is not relevant to the outcome of the operation. Consider the following example: In this case, the fact that the value on the left of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of

702-452: Is "snowball". In certain formalizations of concatenation theory , also called string theory, string concatenation is a primitive notion . In many programming languages , string concatenation is a binary infix operator , and in some it is written without an operator. This is implemented in different ways: In programming, string concatenation generally occurs at run time, as string values are typically not known until run time. However, in

756-577: Is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT , UPDATE , or DELETE DML statements, and are discarded by SELECT queries. Interpreting Unknown and False as

810-523: Is most commonly cited with the semantics of Null (as adopted in SQL) is his 1979 paper in the ACM Transactions on Database Systems , in which he also introduced his Relational Model/Tasmania , although much of the other proposals from the latter paper have remained obscure. Section 2.3 of his 1979 paper details the semantics of Null propagation in arithmetic operations as well as comparisons employing

864-403: Is normally used interchangeably with 'column'. However, database perfectionists tend to favor using 'field' to signify a specific cell of a given row. This is to enable accuracy in communicating with other developers. Columns (really column names) being referred to as field names (common for each row/record in the table). Then a field refers to a single storage location in a specific record (like

918-510: Is not a data value, but a marker for an absent value, using mathematical operators on Null gives an unknown result, which is represented by Null. In the following example, multiplying 10 by Null results in Null: This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception – division by zero". Though this behavior

972-574: Is not defined by the ISO SQL standard many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following: String concatenation operations, which are common in SQL, also result in Null when one of the operands is Null. The following example demonstrates the Null result returned by using Null with

1026-564: Is part of their syntax), also using postfix notation. They have the following truth tables: The F571 feature is orthogonal to the presence of the Boolean datatype in SQL (discussed later in this article) and, despite syntactic similarities, F571 does not introduce Boolean or three-valued literals in the language. The F571 feature was actually present in SQL92 , well before the Boolean datatype

1080-417: Is the sure information, i.e. information which can be certainly extracted from the database regardless of what values are used to replace Nulls in the database. In the example we considered above, it's easy to see that the intersection of all possible models (i.e. the sure information) of the query selecting WHERE Age = 22 is actually empty because, for instance, the (unlifted) query returns no rows for

1134-399: Is therefore desirable. Imielinski and Lipski introduced the notion of weak representation , which essentially allows (lifted) queries over a construct to return a representation only for sure information, i.e. if it's valid for all " possible world " instantiations (models) of the construct. Concretely, a construct is a weak representation system if The right-hand side of the above equation

SECTION 20

#1732798668369

1188-420: Is unknown precisely when p is unknown and true otherwise. Because direct comparisons with Null result in the unknown logical value, the following query is not equivalent in SQL with if the column x contains any Nulls; in that case, the second query would return some rows the first one does not return, namely all those in which x is Null. In classical two-valued logic, the law of the excluded middle would allow

1242-404: Is usually expressed as simple juxtaposition (as with multiplication ). The strings over an alphabet, with the concatenation operation, form an associative algebraic structure with identity element the null string —a free monoid . Sets of strings with concatenation and alternation form a semiring , with concatenation (*) distributing over alternation (+); 0 is the empty set and 1

1296-543: The IS NULL and IS NOT NULL predicates can be used in the join criteria as well. The following predicate tests for equality of the values A and B and treats Nulls as being equal. SQL provides two flavours of conditional expressions . One is called "simple CASE" and operates like a switch statement . The other is called a "searched CASE" in the standard, and operates like an if...elseif . The simple CASE expressions use implicit equality comparisons which operate under

1350-469: The concatenation S 1 S 2 consists of all strings of the form vw where v is a string from S 1 and w is a string from S 2 , or formally S 1 S 2 = { vw : v ∈ S 1 , w ∈ S 2 } . Many authors also use concatenation of a string set and a single string, and vice versa, which are defined similarly by S 1 w = { vw : v ∈ S 1 } and vS 2 = { vw : w ∈ S 2 } . In these definitions,

1404-498: The semantics section of this article). Chamberlin also argued that besides providing some missing-value functionality, practical experience with Nulls also led to other language features that rely on Nulls, like certain grouping constructs and outer joins. Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existing schema when it needs to evolve beyond its original intent, coding not for missing but rather for inapplicable information; for example,

1458-470: The F571 feature, it is possible to emulate IS UNKNOWN p by going over every argument that could make the expression p Unknown and test those arguments with IS NULL or other NULL-specific functions, although this may be more cumbersome. In SQL's three-valued logic the law of the excluded middle , p OR NOT p , no longer evaluates to true for all p . More precisely, in SQL's three-valued logic p OR NOT p

1512-502: The SQL || string concatenation operator. This is not true for all database implementations. In an Oracle RDBMS, for example, NULL and the empty string are considered the same thing and therefore 'Fish ' || NULL || 'Chips' results in 'Fish Chips'. Since Null is not a member of any data domain , it is not considered a "value", but rather a marker (or placeholder) indicating the undefined value . Because of this, comparisons with Null can never result in either True or False, but always in

1566-541: The argument set and 1<>NULL is Unknown, just as a 1=NULL is Unknown. (The CAST in this example is needed only in some SQL implementations like PostgreSQL, which would reject it with a type checking error otherwise. In many systems plain SELECT NULL works in the subquery.) The missing case above is of course: Joins evaluate using the same comparison rules as for WHERE clauses. Therefore, care must be taken when using nullable columns in SQL join criteria. In particular

1620-467: The case of string literals, the values are known at compile time, and thus string concatenation can be done at compile time, either via string literal concatenation or via constant folding , a potential run-time optimization. In formal language theory and pattern matching (including regular expressions ), the concatenation operation on strings is generalised to an operation on sets of strings as follows: For two sets of strings S 1 and S 2 ,

1674-418: The concatenation of the seven fields should happen upon running the report. The reason for such principles is that without them, the entry and updating of large volumes of data becomes error-prone and labor-intensive. Separately entering the city, state, ZIP code, and nation allows data-entry validation (such as detecting an invalid state abbreviation). Then those separate items can be used for sorting or indexing

Null (SQL) - Misplaced Pages Continue

1728-399: The condition column is interpreted as the row doesn't exist if the condition is false. It turns out that because the formulas in the condition column of a c-table can be arbitrary propositional logic formulas, an algorithm for the problem whether a c-table represents some concrete relation has a co-NP-complete complexity, thus is of little practical worth. A weaker notion of representation

1782-465: The law of excluded middle can be written. Assuming the IS UNKNOWN operator is present, p OR (NOT p ) OR ( p IS UNKNOWN) is true for every predicate p . Among logicians, this is called law of excluded fourth . There are some SQL expressions in which it is less obvious where the false dilemma occurs, for example: produces no rows because IN translates to an iterated version of equality over

1836-454: The most controversial features of SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form of system support for missing information, saving the programmer from many duplicative application-level checks (see semipredicate problem ) while at the same time providing the database designer with the option not to use Nulls if they so desire; for example, to avoid well-known anomalies (discussed in

1890-431: The other hand, a "searched" CASE expression can use predicates like IS NULL and IS NOT NULL in its conditions. The following example shows how to use a searched CASE expression to properly check for Null: In the searched CASE expression, the string 'Null Result' is returned for all rows in which i is Null. Oracle's dialect of SQL provides a built-in function DECODE which can be used instead of

1944-556: The other provides a grammatically correct sentence to the listener. This technique is also used in number change announcements, voice mail systems, or most telephony applications that provide dynamic feedback to the caller (e.g. moviefone , tellme , and others). Programming for any kind of computerised public address system can also employ concatenation for dynamic public announcements (for example, flights in an airport). The system would archive recorded speech of numbers, routes or airlines, destinations, times, etc. and play them back in

1998-532: The previous section: Column (database) In a relational database , a column is a set of data values of a particular type , one value for each row of a table . A column may contain text values, numbers, or even pointers to files in the operating system. Columns typically contain simple types , though some relational database systems allow columns to contain more complex data types, such as whole documents, images, or even video clips. A column can also be called an attribute . Each row would provide

2052-415: The records, such as all with "Boulder" as the city name. In recreational mathematics , many problems concern the properties of numbers under concatenation of their numerals in some base . Examples include home primes (primes obtained by repeatedly factoring the increasing concatenation of prime factors of a given number), Smarandache–Wellin numbers (the concatenations of the first prime numbers ), and

2106-422: The relation EmpH37. More generally, it was shown by Imielinski and Lipski that Codd tables are a weak representation system if the query language is restricted to projections, selections (and renaming of columns). However, as soon as we add either joins or unions to the query language, even this weak property is lost, as evidenced in the next section. Consider the following query over the same Codd table Emp from

2160-477: The relations it represents, which are seen as models of the construct. More precisely, if q is a query formula in the relational algebra (of "pure" relations) and if q is its lifting to a construct intended to represent missing information, a strong representation has the property that for any query q and (table) construct T , q lifts all the answers to the construct, i.e.: (The above has to hold for queries taking any number of tables as arguments, but

2214-403: The report. For example, to display the physical address of a certain customer, the data might include building number, street name, building sub-unit number, city name, state/province name, postal code, and country name, e.g., "123 Fake St Apt 4, Boulder, CO 80302, USA", which combines seven fields. However, the customers data table should not use one field to store that concatenated string; rather,

Null (SQL) - Misplaced Pages Continue

2268-512: The restriction to one table suffices for this discussion.) Clearly Codd tables do not have this strong property if selections and projections are considered as part of the query language. For example, all the answers to should include the possibility that a relation like EmpH22 may exist. However, Codd tables cannot represent the disjunction "result with possibly 0 or 1 rows". A device, mostly of theoretical interest, called conditional table (or c-table) can however represent such an answer: where

2322-508: The same logical result is a common error encountered while dealing with Nulls. The following simple example demonstrates this fallacy: The example query above logically always returns zero rows because the comparison of the i column with Null always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to summarily discard every row. (However, in practice, some SQL tools will retrieve rows using

2376-535: The same rules as the DML WHERE clause rules for Null. Thus, a simple CASE expression cannot check for the existence of Null directly. A check for Null in a simple CASE expression always results in Unknown, as in the following: Because the expression i = NULL evaluates to Unknown no matter what value column i contains (even if it contains Null), the string 'Is Null' will never be returned. On

2430-482: The set consisting of just the null string. In programming for telephony, concatenation is used to provide dynamic audio feedback to a user. For example, in a "time of day" speaking clock , concatenation is used to give the correct time by playing the appropriate recordings concatenated together. For example: "at the tone, the time will be", "eight", "thirty", "five", "and", "twenty", "five", "seconds". The recordings themselves exist separately, but playing them one after

2484-638: The simple CASE expressions and considers two nulls equal. Finally, all these constructs return a NULL if no match is found; they have a default ELSE NULL clause. SQL/PSM (SQL Persistent Stored Modules) defines procedural extensions for SQL, such as the IF statement. However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements and queries. The following code fragment, in ISO SQL standard format, demonstrates

2538-471: The simplification of the WHERE clause predicate, in fact its elimination. Attempting to apply the law of the excluded middle to SQL's 3VL is effectively a false dichotomy . The second query is actually equivalent with: Thus, to correctly simplify the first statement in SQL requires that we return all rows in which x is not null. In view of the above, observe that for SQL's WHERE clause a tautology similar to

2592-469: The special handling required by aggregate functions and SQL grouping operators. Computer science professor Ron van der Meyden summarized the various issues as: "The inconsistencies in the SQL standard mean that it is not possible to ascribe any intuitive logical semantics to the treatment of nulls in SQL." Although various proposals have been made for resolving these issues, the complexity of the alternatives has prevented their widespread adoption. Because Null

2646-460: The string vw is the ordinary concatenation of strings v and w as defined in the introductory section. For example, if F = { a, b, c, d, e, f, g, h } , and R = { 1, 2, 3, 4, 5, 6, 7, 8 } , then FR denotes the set of all chess board coordinates in algebraic notation , while e R denotes the set of all coordinates of the kings' file . In this context, sets of strings are often referred to as formal languages. The concatenation operator

2700-419: The terms 'column' and 'field' is that the term 'column' does not apply to certain databases, for instance key-value stores , that do not conform to the traditional relational database structure. Concatenation In formal language theory and computer programming , string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball"

2754-648: The use of Null 3VL in an IF statement. The IF statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to False or Unknown, the IF statement passes control to the ELSEIF clause, and finally to the ELSE clause. The result of the code above will always be the message 'Result is Unknown' since the comparisons with Null always evaluate to Unknown. The groundbreaking work of T. Imieliński and W. Lipski Jr. (1984) provided

SECTION 50

#1732798668369

2808-555: The value on the left. SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL) . The rules governing SQL three-valued logic are shown in the tables below ( p and q represent logical states)" The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Łukasiewicz three-valued logic (which differ in their definition of implication, however, SQL defines no such operation). SQL three-valued logic

2862-476: The value zero until it is ascertained that Adam owns no books. In SQL, null is a marker, not a value. This usage is quite different from most programming languages, where a null value of a reference means it is not pointing to any object . E. F. Codd mentioned nulls as a method of representing missing data in the relational model in a 1975 paper in the FDT Bulletin of ACM - SIGMOD . Codd's paper that

2916-415: Was introduced to the standard in 1999. The F571 feature is implemented by few systems however; PostgreSQL is one of those implementing it. The addition of IS UNKNOWN to the other operators of SQL's three-valued logic makes the SQL three-valued logic functionally complete , meaning its logical operators can express (in combination) any conceivable three-valued logical function. On systems that don't support

#368631