Version control

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

Version control (also known as revision control, source control, and source code management) is the software engineering practice of controlling, organizing, and tracking different versions in the history of computer files; primarily source code text files, but generally any type of file.

Version control is a component of software configuration management. A version control system is a software tool that automates version control. Alternatively, version control is embedded as a feature of some systems such as word processors, spreadsheets, collaborative web docs, and content management systems, e.g., Wikipedia's page history. Version control includes viewing old versions and enables reverting

A patch, which is applied to HEAD (of the trunk), creating a new revision without any explicit reference to the branch, and preserving the tree structure. Thus, while the actual relations between versions form a DAG, this can be considered a tree plus merges, and the trunk itself is a line. In distributed revision control, in the presence of multiple repositories these may be based on a single original version (a root of

A peer-to-peer approach to version control, as opposed to the client–server approach of centralized systems. Distributed revision control synchronizes repositories by transferring patches from peer to peer. There is no single central version of the codebase; instead, each user has a working copy and the full change history. Advantages of DVCS (compared with centralized systems) include: Disadvantages of DVCS (compared with centralized systems) include: Some originally centralized systems now offer some distributed features. Team Foundation Server and Visual Studio Team Services now host centralized and distributed version control repositories via hosting Git. Similarly, some distributed systems now offer features that mitigate

A software system; part of the larger cross-disciplinary field of configuration management (CM). SCM includes version control and the establishment of baselines. The goals of SCM include: With the introduction of cloud computing and DevOps the purposes of SCM tools have become merged in some cases. The SCM tools themselves have become virtual appliances that can be instantiated as virtual machines and saved with state and version. The tools can model and manage cloud-based virtual resources, including virtual appliances, storage units, and software bundles. The roles and responsibilities of

A corollary to this is to commit only code which works and does not knowingly break existing functionality; utilizing branching to complete functionality before release; writing clear and descriptive commit messages, making the what, why, and how clear in either the commit description or the code; and using a consistent branching strategy. Other best software development practices such as code review and automated regression testing may assist in

A different repository, this is interpreted as a merge or patch. In terms of graph theory, revisions are generally thought of as a line of development (the trunk) with branches off of this, forming a directed tree, visualized as one or more parallel lines of development (the "mainlines" of the branches) branching off a trunk. In reality the structure is more complicated, forming a directed acyclic graph, but for many purposes "tree with merges"

A difficult manual merge when the other changes are finally checked in. In a large organization, files can be left "checked out" and locked and forgotten about as developers move between projects - these tools may or may not make it easy to see who has a file checked out. Most version control systems allow multiple developers to edit the same file at the same time. The first developer to "check in" changes to

A document or source file to which subsequent changes can be made. See baselines, labels and tags. A search for the author and revision that last modified a particular line.

Software configuration management

Software configuration management (SCM), a.k.a. software change and configuration management (SCCM), is the software engineering practice of tracking and controlling changes to

A file for exclusive write access, even when a merging capability exists. Most revision control tools will use only one of these similar terms (baseline, label, tag) to refer to the action of identifying a snapshot ("label the project") or the record of the snapshot ("try it with baseline X"). Typically only one of the terms baseline, label, or tag is used in documentation or discussion; they can be considered synonyms. In most projects, some snapshots are more significant than others, such as those used to indicate published releases, branches, or milestones. When both

A file to a previous version. As teams develop software, it is common for multiple versions of the same software to be deployed in different sites and for the developers to work simultaneously on updates. Bugs or features of the software are often only present in certain versions (because of the fixing of some problems and the introduction of others as the program develops). Therefore, for the purposes of locating and fixing bugs, it

A group of changes final, and available to all users. Not all revision control systems have atomic commits; Concurrent Versions System lacks this feature. The simplest method of preventing "concurrent access" problems involves locking files so that only one developer at a time has write access to the central "repository" copies of those files. Once one developer "checks out" a file, others can read that file, but no one else may change that file until that developer "checks in"
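The check-out and check-in locking model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular tool implements locking; the `LockingRepository` class and its method names are hypothetical.

```python
class LockingRepository:
    """Toy central repository with exclusive file locks (hypothetical API)."""

    def __init__(self, files):
        self.files = dict(files)  # path -> current content
        self.locks = {}           # path -> user currently holding the lock

    def read(self, path):
        # Reading never requires a lock.
        return self.files[path]

    def check_out(self, path, user):
        # Grant write access only if nobody else holds the lock.
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is checked out by {holder}")
        self.locks[path] = user
        return self.files[path]

    def check_in(self, path, user, content):
        # Only the lock holder may store a new version; this releases the lock.
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} has not checked out {path}")
        self.files[path] = content
        del self.locks[path]
```

In this sketch a second developer who tries to check out a locked file gets a `PermissionError` until the first developer checks in, which mirrors the "one writer at a time" rule.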

A peer-to-peer approach, as opposed to the client–server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a bona fide repository. Distributed revision control conducts synchronization by exchanging patches (change-sets) from peer to peer. This results in some important differences from a centralized system: Rather, communication
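Synchronization by exchanging change-sets between peers can be illustrated with a small Python sketch. The `Peer` class, its methods, and the change-set identifiers are assumptions made for illustration; real systems identify and order change-sets in more elaborate ways.

```python
class Peer:
    """Toy peer repository holding change-sets keyed by an identifier."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}  # change-set id -> patch text

    def commit(self, change_id, patch):
        # Record a change locally; no network access is needed.
        self.changesets[change_id] = patch

    def pull_from(self, other):
        # Copy over only the change-sets this peer does not have yet.
        missing = {
            cid: patch
            for cid, patch in other.changesets.items()
            if cid not in self.changesets
        }
        self.changesets.update(missing)
        return sorted(missing)
```

Each peer keeps its full set of change-sets, so two peers can commit independently and reconcile later by pulling from each other in either direction.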

A pull request to notify maintainers of a new change; a comment thread is associated with each pull request. This allows for focused discussion of code changes. Submitted pull requests are visible to anyone with repository access. A pull request can be accepted or rejected by maintainers. Once the pull request is reviewed and approved, it is merged into the repository. Depending on the established workflow,

A revision older than its immediate predecessor, then the resulting graph is instead a directed tree (each node can have more than one child), and has multiple tips, corresponding to the revisions without children ("latest revision on each branch"). In principle the resulting tree need not have a preferred tip ("main" latest revision) – just various different revisions – but in practice one tip

A set of developers, and this adds the pressure of someone managing permissions so that the code base is not compromised, which adds more complexity. Consequently, systems to automate some or all of the revision control process have been developed. This ensures that the majority of management of version control steps is hidden behind the scenes. Moreover, in software development, legal and business practice, and other environments, it has become increasingly common for

A simple example, when editing a computer file, the data stored in memory by the editing program is the working copy, which is committed by saving. Concretely, one may print out a document, edit it by hand, and only later manually input the changes into a computer and save it. For source code control, the working copy is instead a copy of all files in a particular revision, generally stored locally on

A simple line, with a single latest version, the "HEAD" revision or tip. In graph theory terms, drawing each revision as a point and each "derived revision" relationship as an arrow (conventionally pointing from older to newer, in the same direction as time), this is a linear graph. If there is branching, so multiple future revisions are based on a past revision, or undoing, so a revision can depend on

A single document or snippet of code to be edited by a team, the members of which may be geographically dispersed and may pursue different and even contrary interests. Sophisticated revision control that tracks and accounts for ownership of changes to documents and code may be extremely helpful or even indispensable in such situations. Revision control may also track changes to configuration files, such as those typically stored in /etc or /usr/local/etc on Unix systems. This gives system administrators another way to easily track changes made and

A source code repository that uses a distributed version control system are commonly made by means of a pull request, also known as a merge request. The contributor requests that the project maintainer pull the source code change, hence the name "pull request". The maintainer has to merge the pull request if the contribution should become part of the source base. The developer creates

A truly distributed project, such as Linux, every contributor maintains their own version of the project, with different contributors hosting their own respective versions and pulling in changes from other users as needed, resulting in a general consensus emerging from multiple different nodes. This also makes the process of "forking" easy, as all that is required is for one contributor to stop accepting pull requests from other contributors and let

A way to roll back to earlier versions should the need arise. Many version control systems identify the version of a file as a number or letter, called the version number, version, revision number, revision, or revision level. For example, the first version of a file might be version 1. When the file is changed the next version is 2. Each version is associated with a timestamp and
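The numbering scheme described here, where each saved version gets the next integer and carries a timestamp and an author, can be sketched as follows. The `Revision` and `FileHistory` names are hypothetical and chosen only for this example; real tools may use letters, dotted numbers, or hashes instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int          # 1 for the first version, 2 for the next, ...
    author: str          # person who made the change
    timestamp: datetime  # when the revision was recorded
    content: str


class FileHistory:
    """Toy single-file history with sequential revision numbers."""

    def __init__(self):
        self.revisions = []

    def commit(self, author, content):
        rev = Revision(
            number=len(self.revisions) + 1,
            author=author,
            timestamp=datetime.now(timezone.utc),
            content=content,
        )
        self.revisions.append(rev)
        return rev

    def get(self, number):
        # Retrieve an earlier version, e.g. to roll back to it.
        return self.revisions[number - 1]
```

Rolling back in this sketch is simply committing the content of an older revision again, which keeps the full history intact.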

Is an adequate approximation. Revisions occur in sequence over time, and thus can be arranged in order, either by revision number or timestamp. Revisions are based on past revisions, though it is possible to largely or completely replace an earlier revision, such as "delete all existing text, insert new text". In the simplest case, with no branching or undoing, each revision is based on its immediate predecessor alone, and they form

Is generally identified as HEAD. When a new revision is based on HEAD, it is either identified as the new HEAD, or considered a new branch. The list of revisions from the start to HEAD (in graph theory terms, the unique path in the tree, which forms a linear graph as before) is the trunk or mainline. Conversely, when a revision can be based on more than one previous revision (when a node can have more than one parent),

Is only necessary when pushing or pulling changes to or from other peers. Following best practices is necessary to obtain the full benefits of version control. Best practice may vary by version control tool and the field to which version control is applied. The generally accepted best practices in software development include: making incremental, small, changes; making commits which involve only one task or fix --

Is vitally important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently: for instance, where one version has bugs fixed, but no new features (branch), while the other version is where new features are worked on (trunk). At the simplest level, developers could simply retain multiple copies of

The ability to work offline, and does not rely on a single location for backups. Git, the world's most popular version control system, is a distributed version control system. In 2010, software development author Joel Spolsky described distributed version control systems as "possibly the biggest advance in software development technology in the [past] ten years". Distributed version control systems (DVCS) use

The actors have become merged as well with developers now being able to dynamically instantiate virtual servers and related resources.

Distributed revision control

In software development, distributed version control (also known as distributed revision control) is a form of version control in which the complete codebase, including its full history, is mirrored on every developer's computer. Compared to centralized version control (cf. monorepo), this enables automatic management of branching and merging, speeds up most operations (except pushing and fetching), improves

The central repository always succeeds. The system may provide facilities to merge further changes into the central repository, and preserve the changes from the first developer when other developers check in. Merging two files can be a very delicate operation, and usually possible only if the data structure is simple, as in text files. The result of a merge of two image files might not result in an image file at all. The second developer checking in

The code may need to be tested before being included in an official release. Therefore, some projects contain a special branch for merging untested pull requests. Other projects run an automated test suite on every pull request, using a continuous integration tool, and the reviewer checks that any new code has appropriate test coverage. The first open-source DVCS systems included Arch, Monotone, and Darcs. However, open source DVCSs were never very popular until

The code will need to take care with the merge, to make sure that the changes are compatible and that the merge operation does not introduce its own logic errors within the files. These problems limit the availability of automatic or semi-automatic merge operations mainly to simple text-based documents, unless a specific merge plugin is available for the file types. The concept of a reserved edit can provide an optional means to explicitly lock

The codebases gradually grow apart. This arrangement, however, can be difficult to maintain, resulting in many projects choosing to shift to a paradigm in which one contributor is the universal "upstream", a repository from which changes are almost always pulled. Under this paradigm, development is somewhat recentralized, as every project now has a central repository that is informally considered as

The creation and adaptation of custom source code branches (forks) whose purpose might differ from the original project. In addition, it permits developers to locally clone an existing code repository and work on such from a local environment where changes are tracked and committed to the local repository allowing for better tracking of changes before being committed to the master branch of the repository. Such an approach enables developers to work in local and disconnected branches, making it more convenient for larger distributed teams. In

The data as a whole, which is less intuitive for simple changes but simplifies more complex changes. When data that is under revision control is modified, after being retrieved by checking out, this is not in general immediately reflected in the revision control system (in the repository), but must instead be checked in or committed. A copy outside revision control is known as a "working copy". As

The developer to easily undo changes. This gives the developer more opportunity to experiment, eliminating the fear of breaking existing code. Branching assists with deployment. Branching and merging, the production, packaging, and labeling of source code patches and the easy application of patches to code bases, simplify the maintenance and concurrent development of the multiple code bases associated with

The developer's computer; in this case saving the file only changes the working copy, and checking into the repository is a separate step. If multiple people are working on a single data set or document, they are implicitly creating branches of the data (in their working copies), and thus issues of merging arise, as discussed below. For simple collaborative document editing, this can be prevented by using file locking or simply avoiding working on

The developers may end up overwriting each other's work. Centralized revision control systems solve this problem in one of two different "source management models": file locking and version merging. An operation is atomic if the system is left in a consistent state even if the operation is interrupted. The commit operation is usually the most critical in this sense. Commits tell the revision control system to make
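The idea of an atomic commit, where either every change in a group becomes visible or none does, can be illustrated with a small in-memory sketch. The `Repository` class below is hypothetical; it stages all changes on a copy and swaps the copy in only after every change has been validated, which is one simple way to get all-or-nothing behavior and not a description of how any specific system does it.

```python
class Repository:
    """Toy repository whose commits are all-or-nothing (hypothetical API)."""

    def __init__(self):
        self.files = {}    # path -> content
        self.revision = 0  # number of successful commits

    def commit(self, changes):
        # Work on a copy so a failure part-way leaves self.files untouched.
        staged = dict(self.files)
        for path, content in changes.items():
            if not isinstance(content, str):
                raise TypeError(f"content of {path} must be text")
            staged[path] = content
        # Publish the whole group of changes in one step.
        self.files = staged
        self.revision += 1
        return self.revision
```

If any single change in the group is rejected, the repository keeps its previous state and revision number, so other users never see a half-applied commit.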

The different versions of the program, and label them appropriately. This simple approach has been used in many large software projects. While this method can work, it is inefficient as many near-identical copies of the program have to be maintained. This requires a lot of self-discipline on the part of developers and often leads to mistakes. Since the code base is the same, it also requires granting read-write-execute permission to

The electronic tracking of changes to CAD files (see product data management), supplanting the "manual" electronic implementation of traditional revision control. Traditional revision control systems use a centralized model where all the revision control functions take place on a shared server. If two developers try to change the same file at the same time, without some method of managing access

The entire code base and can focus instead on the code that introduced the problem. Version control enhances collaboration in multiple ways. Since version control can identify conflicting changes, i.e. incompatible changes made to the same lines of code, there is less need for coordination among developers. The packaging of commits, branches, and all the associated commit messages and version labels, improves communication between developers, both in

The following of version control best practices. Costs and benefits will vary depending on the version control tool chosen and the field in which it is applied. This section speaks to the field of software development, where version control is widely applied. In addition to the costs of licensing the version control software, using version control requires time and effort. The concepts underlying version control must be understood and

The identification of what problems exist, how long they have existed, and determining problem scope and solutions. Previous versions can be installed and tested to verify conclusions reached by examination of code and commit messages. Version control can greatly simplify debugging. The application of a test case to multiple versions can quickly identify the change which introduced a bug. The developer need not be familiar with
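Applying a test case across versions to find the change that introduced a bug is often done as a binary search over the history. The sketch below assumes the versions are ordered oldest to newest and that once the bug appears it stays present in every later version; the function name `first_bad_version` and the `is_bad` callback are hypothetical.

```python
def first_bad_version(versions, is_bad):
    """Return the earliest version for which is_bad(version) is True.

    Assumes versions are ordered oldest to newest and that every
    version after the first bad one is also bad. Returns None if no
    version is bad.
    """
    if not versions:
        return None
    lo, hi = 0, len(versions) - 1
    if is_bad(versions[lo]):
        return versions[lo]
    if not is_bad(versions[hi]):
        return None
    # Invariant: versions[lo] is good and versions[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

With a history of n versions this runs the test roughly log2(n) times instead of n times, which is why the approach scales to long histories.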

The issues of checkout times and storage costs, such as the Virtual File System for Git developed by Microsoft to work with very large codebases, which exposes a virtual file system that downloads files to local storage only as they are needed. A distributed model is generally better suited for large projects with partly independent developers, such as the Linux Kernel. It allows developers to work in independent branches and apply changes that can later be committed, audited and merged (or rejected) by others. This model allows for better flexibility and permits

The local repository, and once the development is done, the change should be integrated into the central repository as soon as possible. Organizations utilizing this centralized pattern often choose to host the central repository on a third party service like GitHub, which offers not only more reliable uptime than self-hosted repositories, but can also add centralized features like issue trackers and continuous integration. Contributions to

The moment and over time. Better communication, whether instant or deferred, can improve the code review process, the testing process, and other critical aspects of the software development process. Some of the more advanced revision-control tools offer many other facilities, allowing deeper integration with other tools and software-engineering processes. Plugins are often available for IDEs such as Oracle JDeveloper, IntelliJ IDEA, Eclipse, Visual Studio, Delphi, NetBeans IDE, Xcode, and GNU Emacs (via vc.el). Advanced research prototypes generate appropriate commit messages. Terminology can vary from system to system, but some terms in common usage include: An approved revision of

The official repository, managed by the project maintainers collectively. While distributed version control systems make it easy for new developers to "clone" a copy of any other contributor's repository, in a central model, new developers always clone the central repository to create identical local copies of the code base. Under this system, code changes in the central repository are periodically synchronized with

The other as secondary, merged into the first with or without its own revision history. Engineering revision control developed from formalized processes based on tracking revisions of early blueprints or bluelines. This system of control implicitly allowed returning to an earlier state of the design, for cases in which an engineering dead-end was reached in the development of the design. A revision table

The person making the change. Revisions can be compared, restored, and, with some types of files, merged. IBM's OS/360 IEBUPDTE software update tool dates back to 1962, arguably a precursor to version control system tools. Two source management and version control packages that were heavily used by IBM 360/370 installations were The Librarian and Panvalet. A full system designed for source code control

The presence of merges, the resulting graph is no longer a tree, as nodes can have multiple parents, but is instead a rooted directed acyclic graph (DAG). The graph is acyclic since parents are always backwards in time, and rooted because there is an oldest version. Assuming there is a trunk, merges from branches can be considered as "external" to the tree – the changes in the branch are packaged up as
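The graph view of a revision history can be made concrete with a small Python sketch. The history below is invented for illustration (the revision names `r1`, `b1`, and so on are assumptions), and each revision is mapped to the revisions it is based on. The helper functions find the root, the tips, and the merge revisions, and check that the graph is acyclic using Kahn's topological-sort algorithm.

```python
from collections import deque

# Hypothetical history: revision -> list of parent revisions.
PARENTS = {
    "r1": [],            # root: the oldest version
    "r2": ["r1"],
    "r3": ["r2"],        # trunk continues
    "b1": ["r2"],        # branch started from r2
    "r4": ["r3", "b1"],  # merge: a revision with two parents
}


def roots(parents):
    # Revisions that are not based on any earlier revision.
    return sorted(r for r, ps in parents.items() if not ps)


def tips(parents):
    # Revisions without children ("latest revision on each branch").
    with_children = {p for ps in parents.values() for p in ps}
    return sorted(r for r in parents if r not in with_children)


def merges(parents):
    # Revisions based on more than one previous revision.
    return sorted(r for r, ps in parents.items() if len(ps) > 1)


def is_acyclic(parents):
    # Kahn's algorithm: process a revision once all its parents are done.
    pending = {r: len(ps) for r, ps in parents.items()}
    children = {r: [] for r in parents}
    for r, ps in parents.items():
        for p in ps:
            children[p].append(r)
    queue = deque(r for r, n in pending.items() if n == 0)
    processed = 0
    while queue:
        r = queue.popleft()
        processed += 1
        for child in children[r]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return processed == len(parents)
```

In this example history there is one root, one tip after the merge, and one merge revision, so the graph is a rooted DAG and not a tree.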

The resulting process is called a merge, and is one of the most complex aspects of revision control. This most often occurs when changes occur in multiple branches (most often two, but more are possible), which are then merged into a single branch incorporating both changes. If these changes overlap, it may be difficult or impossible to merge, and require manual intervention or rewriting. In
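Why overlapping changes need manual intervention can be seen in a deliberately simplified three-way merge. The sketch below assumes the common ancestor and both branches have the same number of lines, so lines can be compared position by position; real merge tools first align the files with a diff algorithm. The function name `merge_lines` is hypothetical.

```python
def merge_lines(base, ours, theirs):
    """Line-by-line three-way merge for equally long versions.

    Returns (merged, conflicts): merged holds the chosen line for each
    position (None where the branches disagree), and conflicts lists
    the positions that need manual resolution.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equally long versions")
    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)      # both sides agree (or neither changed)
        elif o == b:
            merged.append(t)      # only their side changed this line
        elif t == b:
            merged.append(o)      # only our side changed this line
        else:
            merged.append(None)   # both changed it differently: conflict
            conflicts.append(index)
    return merged, conflicts
```

Changes to different lines merge automatically, while a line changed differently on both branches is reported as a conflict and left for a person to resolve.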

The rise of distributed revision control tools such as Git. Revision control manages changes to a set of data over time. These changes can be structured in various ways. Often the data is thought of as a collection of many individual items, such as files or documents, and changes to individual files are tracked. This accords with intuitions about separate files but causes problems when identity changes, such as during renaming, splitting or merging of files. Accordingly, some systems such as Git, instead consider changes to

The same document that someone else is working on. Revision control systems are often centralized, with a single authoritative data store, the repository, and check-outs and check-ins done with reference to this central repository. Alternatively, in distributed revision control, no single repository is authoritative, and data can be checked out and checked into any repository. When checking into

The technical particulars required to operate the version control software chosen must be learned. Version control best practices must be learned and integrated into the organization's existing software development practices. Management effort may be required to maintain the discipline needed to follow best practices in order to obtain useful benefit. A core benefit is the ability to keep history and revert changes, allowing

The term baseline and either of label or tag are used together in the same context, label and tag usually refer to the mechanism within the tool of identifying or making the record of the snapshot, and baseline indicates the increased significance of any given label or tag. Most formal discussion of configuration management uses the term baseline. Distributed revision control systems (DRCS) take

The tree), but there need not be an original root - instead there can be a separate root (oldest revision) for each repository. This can happen, for example, if two people start working on a project separately. Similarly, in the presence of multiple data sets (multiple projects) that exchange data or merge, there is no single root, though for simplicity one may think of one project as primary and

The updated version (or cancels the checkout). File locking has both merits and drawbacks. It can provide some protection against difficult merge conflicts when a user is making radical changes to many sections of a large file (or group of files). If the files are left exclusively locked for too long, other developers may be tempted to bypass the revision control software and change the files locally, forcing

The various stages of the deployment process: development, testing, staging, production, etc. There can be damage mitigation, accountability, process and design improvement, and other benefits associated with the record keeping provided by version control, the tracking of who did what, when, why, and how. When bugs arise, knowing what was done when helps with damage mitigation and recovery by assisting in

Was started in 1972, Source Code Control System for the same system (OS/360). Source Code Control System's introduction, having been published on December 4, 1975, historically implied it was the first deliberate revision control system. RCS followed just after, with its networked version Concurrent Versions System. The next generation after Concurrent Versions System was dominated by Subversion, followed by

Was used to keep track of the changes made. Additionally, the modified areas of the drawing were highlighted using revision clouds. Version control is widespread in business and law. Indeed, "contract redline" and "legal blackline" are some of the earliest forms of revision control, and are still employed in business and law with varying degrees of sophistication. The most sophisticated techniques are beginning to be used for
