Misplaced Pages

Berkeley DB

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

Berkeley DB (BDB) is an embedded database software library for key/value data, historically significant in open-source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitrary key/data pairs as byte arrays and supports multiple data items for a single key. Berkeley DB is not a relational database, although it has database features including database transactions, multiversion concurrency control and write-ahead logging. BDB runs on a wide variety of operating systems, including most Unix-like and Windows systems, and real-time operating systems.
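The key/value model described above can be illustrated with a minimal sketch. It uses Python's standard `dbm.dumb` module, which is a simple dbm-style store and not Berkeley DB itself; the file name and keys are hypothetical. It shows only the idea of storing opaque byte arrays under byte-array keys in an embedded, in-process store, and does not cover Berkeley DB features such as multiple data items per key or transactions.

```python
import dbm.dumb
import os
import tempfile

# Assumption: dbm.dumb stands in for an embedded key/value library.
# It is NOT Berkeley DB; it only mimics the "bytes in, bytes out" model.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")  # hypothetical database name

    # Create the store and write two records in-process (no server involved).
    with dbm.dumb.open(path, "c") as db:
        db[b"user:42"] = b"\x00\x01binary payload"  # arbitrary bytes as value
        db["greeting"] = "hello"  # str keys/values are encoded to bytes

    # Reopen read-only and fetch the records back as bytes.
    with dbm.dumb.open(path, "r") as db:
        payload = db[b"user:42"]
        greeting = db[b"greeting"]
        keys = sorted(db.keys())
```

The store places no structure on the values; interpreting the bytes is left entirely to the calling program.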


BDB was commercially supported and developed by Sleepycat Software from 1996 to 2006. Sleepycat Software was acquired by Oracle Corporation in February 2006, which continued to develop and sell the C Berkeley DB library. In 2013 Oracle re-licensed BDB under the AGPL license and released new versions until May 2020. Bloomberg L.P. continues to develop a fork of the 2013 version of BDB within their Comdb2 database, under

A distributed database, where no single node is responsible for all data affecting a transaction, presents additional complications. Network connections might fail, or one node might successfully complete its part of the transaction and then be required to roll back its changes because of a failure on another node. The two-phase commit protocol (not to be confused with two-phase locking) provides atomicity for distributed transactions to ensure that each participant in

A proprietary software license that included standard commercial features, and simultaneously under the newly created Sleepycat License, which allows open source use and distribution of Berkeley DB with a copyleft redistribution condition similar to the GNU General Public License. Sleepycat had offices in California, Massachusetts and the United Kingdom, and was profitable during its entire existence.

ACID

In computer science, ACID (atomicity, consistency, isolation, durability)

A consequence, the transaction cannot be observed to be in progress by another database client. At one moment in time, it has not yet happened, and at the next, it has already occurred in whole (or nothing happened if the transaction was cancelled in progress). Consistency ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants: any data written to

A new database, unencumbered by any AT&T patents: an on-disk hash table that outperformed the existing dbm libraries. Berkeley DB itself was first released in 1991 and later included with 4.4BSD. In 1996 Netscape requested that the authors of Berkeley DB improve and extend the library, then at version 1.86, to suit Netscape's requirements for an LDAP server and for use in the Netscape browser. That request led to

A record. Berkeley DB puts no constraints on the record's data. The record and its key can both be up to four gigabytes long. Berkeley DB supports database features such as ACID transactions, fine-grained locking, hot backups and replication. The name "Berkeley DB" is used by Oracle Corporation for three different products, only one of which is BDB: BDB was once very widespread, but usage dropped steeply from 2013 (see licensing section). Notable software that still uses Berkeley DB for data storage includes: Open-source operating systems and languages such as Perl and Python still support old Berkeley DB interfaces. The FreeBSD and OpenBSD operating systems ship Berkeley DB 1.8x to support

A row in one table whose primary key is referred to by at least one foreign key in other tables. To demonstrate isolation, we assume two transactions execute at the same time, each attempting to modify the same data. One of the two must wait until the other completes in order to maintain isolation. Consider two transactions: Combined, there are four actions: If these operations are performed in order, isolation

A table at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. Isolation is the main goal of concurrency control; depending on the isolation level used, the effects of an incomplete transaction might not be visible to other transactions. Durability guarantees that once

A transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash). This usually means that completed transactions (or their effects) are recorded in non-volatile memory. The following examples further illustrate the ACID properties. In these examples, the database table has two columns, A and B. An integrity constraint requires that

Is freely-licensed database software originally developed at the University of California, Berkeley for 4.4BSD Unix. Developers from that project founded Sleepycat in 1996 to provide commercial support after a request by Netscape to provide new features in the software. In February 2006, Sleepycat was acquired by Oracle Corporation, which continued developing Berkeley DB. The founders of

Is a monetary transfer from bank account A to account B. It consists of two operations, withdrawing the money from account A and depositing it to account B. We would not want to see the amount removed from account A before we are sure it has also been transferred into account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money
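The bank transfer described here can be sketched with Python's standard `sqlite3` module. This is an illustration only: the `accounts` table, the `transfer` function and the `fail_midway` flag are hypothetical names, and SQLite stands in for any transactional database. The connection's context manager commits if the block completes and rolls back if an exception escapes it, so the withdrawal never persists without the matching deposit.

```python
import sqlite3

# Hypothetical schema: one row per account, balances as integers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move `amount` from A to B as one atomic transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,)
        )
        if fail_midway:
            # Simulated failure between the withdrawal and the deposit.
            raise RuntimeError("simulated crash")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,)
        )


# A failed transfer leaves both balances untouched.
try:
    transfer(conn, 30, fail_midway=True)
except RuntimeError:
    pass
balances_after_failure = dict(conn.execute("SELECT name, balance FROM accounts"))

# A successful transfer applies both updates together.
transfer(conn, 30)
balances_after_success = dict(conn.execute("SELECT name, balance FROM accounts"))
```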


Is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another,

Is a single transaction. In 1983, Andreas Reuter and Theo Härder coined the acronym ACID, building on earlier work by Jim Gray who named atomicity, consistency, and durability, but not isolation, when characterizing the transaction concept. These four properties are the major guarantees of the transaction paradigm, which has influenced many aspects of development in database systems. According to Gray and Reuter,

Is checked after each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90, which is inconsistent with the rules of the database. The entire transaction must be canceled and the affected rows rolled back to their pre-transaction state. If there had been other constraints, triggers, or cascades, every single change operation would have been checked in
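The validation failure described here can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions: the table name `acidtest`, the starting values 60 and 40, and the `subtract_from_a` helper are hypothetical, and the A + B = 100 rule is checked in application code before commit rather than by a database constraint.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the explicit BEGIN/COMMIT/ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE acidtest (A INTEGER, B INTEGER)")  # hypothetical table
conn.execute("INSERT INTO acidtest VALUES (60, 40)")  # A + B = 100 holds


def subtract_from_a(conn, amount):
    """Subtract `amount` from A without altering B; keep A + B = 100."""
    conn.execute("BEGIN")
    conn.execute("UPDATE acidtest SET A = A - ?", (amount,))
    a, b = conn.execute("SELECT A, B FROM acidtest").fetchone()
    if a + b != 100:
        # Validation failed: cancel the whole transaction and restore
        # the rows to their pre-transaction state.
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


committed = subtract_from_a(conn, 10)
row = conn.execute("SELECT A, B FROM acidtest").fetchone()
```

Because the update is rolled back, the table still holds the original values and the invariant is preserved.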

Is maintained, although T2 must wait. Consider what happens if T1 fails halfway through. The database eliminates T1's effects, and T2 sees only valid data. By interleaving the transactions, the actual order of actions might be: Again, consider what happens if T1 fails while modifying B in Step 4. By the time T1 fails, T2 has already modified A; it cannot be restored to

Is neither debited nor credited if either of those two operations fails. Consistency is a very general term, which demands that the data must meet all validation rules. In the previous example, the validation is a requirement that A + B = 100. All validation rules must be checked to ensure consistency. Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency

Is not based on a server/client model, and does not provide support for network access – programs access the database using in-process API calls. Oracle added support for SQL in 11g R2 release based on the popular SQLite API by including a version of SQLite in Berkeley DB (it uses Berkeley DB for storage). A program accessing the database is free to decide how the data is to be stored in

Is running a transaction that has to read a row of data that user B wants to modify, user B must wait until user A's transaction completes. Two-phase locking is often applied to guarantee full isolation. An alternative to locking is multiversion concurrency control, in which the database provides each reading transaction the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks, i.e., writing transactions do not block reading transactions, and readers do not block writers. Going back to

The dbopen() library call used by password programs such as pwd_mkdb. Linux operating systems, including those based on Debian and Fedora, ship Berkeley DB 5.3 libraries. Berkeley DB V2.0 and higher is available under a dual license: Switching the open source license in 2013 from the Sleepycat license to the AGPL had a major effect on open source software. Since BDB is a library, any application linking to it must be under an AGPL-compatible license. Many open source applications and all closed source applications would need to be relicensed to become AGPL-compatible, which

The IBM Information Management System supported ACID transactions as early as 1973 (although the acronym was created later). The characteristics of these four properties as defined by Reuter and Härder are as follows: Transactions are often composed of multiple statements. Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely: if any of

The ability to replicate log records and create a distributed highly available single-master multi-replica database. This is called the "High Availability" (HA) feature set. Berkeley DB's evolution has sometimes led to minor API changes or log format changes, but very rarely have database formats changed. Berkeley DB HA supports online upgrades from one version to the next by maintaining the ability to read and apply


The company were spouses Margo Seltzer and Keith Bostic, who are also original authors of Berkeley DB. Another original author, Michael Olson, was the President and CEO of Sleepycat. They were all at University of California, Berkeley, where they developed the software that grew to become Berkeley DB. Sleepycat was originally based in Carlisle, Massachusetts and moved to Lincoln, Massachusetts. Sleepycat distributed Berkeley DB under

The creation of Sleepycat Software. This company was acquired by Oracle Corporation in February 2006. Berkeley DB 1.x releases focused on managing key/value data storage and are referred to as "Data Store" (DS). The 2.x releases added a locking system enabling concurrent access to data. This is what is known as "Concurrent Data Store" (CDS). The 3.x releases added a logging system for transactions and recovery, called "Transactional Data Store" (TDS). The 4.x releases added

The database from occurring only partially, which can cause greater problems than rejecting the whole series outright. In other words, atomicity means indivisibility and irreducibility. Alternatively, we may say that a logical transaction may be composed of several physical transactions. Unless and until all component physical transactions are executed, the logical transaction will not have occurred. An example of an atomic transaction

The database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This prevents database corruption by an illegal transaction. An example of a database invariant is referential integrity, which guarantees the primary key – foreign key relationship. Transactions are often executed concurrently (e.g., multiple transactions reading and writing to

The example, when user A's transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data. One implementation, namely snapshot isolation, relaxes the isolation property. Guaranteeing ACID properties in a distributed transaction across

The level of isolation, possibly on all data that may be read as well. In write-ahead logging, durability is guaranteed by writing the prospective change to a persistent log before changing the database. That allows the database to return to a consistent state in the event of a crash. In shadowing, updates are applied to a partial copy of the database, and the new copy is activated when the transaction commits. Many databases rely upon locking to provide ACID capabilities. Locking means that

The licensing terms have led to its use in a multitude of free and open-source software. Those who do not wish to abide by the terms of the GNU AGPL, or use an older version with the Sleepycat Public License, have the option of purchasing another proprietary license for redistribution from Oracle Corporation. This technique is called dual licensing. Berkeley DB includes compatibility interfaces for some historic Unix database libraries: dbm, ndbm and hsearch (a System V and POSIX library for creating in-memory hash tables). Berkeley DB has an architecture notably simpler than relational database management systems. Like SQLite and LMDB, it

The original Sleepycat permissive license. Berkeley DB originated at the University of California, Berkeley as part of BSD, Berkeley's version of the Unix operating system. After 4.3BSD (1986), the BSD developers attempted to remove or replace all code originating in the original AT&T Unix from which BSD was derived. In doing so, they needed to rewrite the Unix database package. Seltzer and Yigit created

The prior release's log records. Starting with the 6.0.21 (Oracle 12c) release, all Berkeley DB products are licensed under the GNU AGPL. Previously, Berkeley DB was redistributed under the 4-clause BSD license (before version 2.0), and the Sleepycat Public License, which is an OSI-approved open-source license as well as an FSF-approved free software license. The product ships with complete source code, build script, test suite, and documentation. The comprehensive feature set along with

The same way as above before the transaction was committed. Similar issues may arise with other constraints. We may have required the data types of both A and B to be integers. If we were then to enter, say, the value 13.5 for A, the transaction will be canceled, or the system may give rise to an alert in the form of a trigger (if/when the trigger has been written to this effect). Another example would be integrity constraints, which would not allow us to delete


The statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. A guarantee of atomicity prevents updates to the database from occurring only partially, which can cause greater problems than rejecting the whole series outright. As

The transaction marks the data that it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction succeeds or fails. The lock must always be acquired before processing data, including data that is read but not modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well as blocking other transactions. For example, if user A

The user is told the transaction was a success. However, the changes are still queued in the disk buffer waiting to be committed to disk. Power fails and the changes are lost, but the user assumes (understandably) that the changes persist. Processing a transaction often requires a sequence of operations that is subject to failure for a number of reasons. For instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time. There are two popular families of techniques: write-ahead logging and shadow paging. In both cases, locks must be acquired on all information to be updated, and depending on

The value in A and the value in B must sum to 100. The following SQL code creates a table as described above: Atomicity is the guarantee that a series of database operations in an atomic transaction will either all occur (a successful operation), or none will occur (an unsuccessful operation). The series of operations cannot be separated with only some of them being executed, which makes the series of operations "indivisible". A guarantee of atomicity prevents updates to

The value it had before T1 without leaving an invalid database. This is known as a write-write contention, because two transactions attempted to write to the same data field. In a typical system, the problem would be resolved by reverting to the last known good state, canceling the failed transaction T1, and restarting the interrupted transaction T2 from the good state. Consider a transaction that transfers 10 from A to B. First, it removes 10 from A, then it adds 10 to B. At this point,

Was not acceptable to many developers and open source operating systems. By 2013 there were many alternatives to BDB, and Debian Linux was typical in its decision to completely phase out Berkeley DB, with a preference for the Lightning Memory-Mapped Database (LMDB).

Sleepycat Software

Sleepycat Software, Inc. was the software company primarily responsible for maintaining the Berkeley DB packages from 1996 to 2006. Berkeley DB
