Misplaced Pages

Git

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

In software development, distributed version control (also known as distributed revision control) is a form of version control in which the complete codebase, including its full history, is mirrored on every developer's computer. Compared to centralized version control (cf. monorepo), this enables automatic management of branching and merging, speeds up most operations (except pushing and fetching), improves the ability to work offline, and does not rely on a single location for backups. Git, the world's most popular version control system, is a distributed version control system.

66-501: Git (/ɡɪt/) is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively. Design goals of Git include speed, data integrity, and support for distributed, non-linear workflows (thousands of parallel branches running on different computers). As with most other distributed version control systems, and unlike most client–server systems, Git maintains

132-771: A Tcl/Tk GUI, which allows users to perform actions such as creating and amending commits, creating and merging branches, and interacting with remote repositories. In addition to the official GUI, many third-party interfaces provide similar features, such as GitHub Desktop, SourceTree, and TortoiseGit. GUI clients make Git easier to learn and use, improving workflow efficiency and reducing errors. Popular options include the cross-platform GitKraken Desktop (freemium) and Sourcetree (free/paid), and platform-specific choices such as GitHub Desktop (free) for Windows/macOS and TortoiseGit (free) for Windows. While Git provides built-in GUI tools (git-gui, gitk),

198-427: A source-code management system. Torvalds explains: In many ways you can just see git as a filesystem—it's content-addressable , and it has a notion of versioning, but I really designed it coming at the problem from the viewpoint of a filesystem person (hey, kernels is what I do), and I actually have absolutely zero interest in creating a traditional SCM system. From this initial design approach, Git has developed

264-697: A build of Git for Windows, still using the MSYS2 environment. The JGit implementation of Git is a pure Java software library, designed to be embedded in any Java application. JGit is used in the Gerrit code-review tool, and in EGit, a Git client for the Eclipse IDE. Go-git is an open-source implementation of Git written in pure Go. It currently backs projects such as a SQL interface for Git code repositories and a tool providing encryption for Git. Dulwich

330-400: A change request typically originating from an end user. That request is evaluated and if it is decided to implement it, the programmer studies the existing code to understand how it works before implementing the change. Testing to make sure the existing functionality is retained and the desired new functionality is added often comprises the majority of the maintenance cost. Software maintenance

396-588: A distributed system that he could use like BitKeeper, but none of the available free systems met his needs. He cited an example of a source-control management system needing 30 seconds to apply a patch and update all associated metadata, and noted that this would not scale to the needs of Linux kernel development, where synchronizing with fellow maintainers could require 250 such actions at once. For his design criterion, he specified that patching should take no more than three seconds, and added three more goals: These criteria eliminated every version-control system in use at

462-426: A fixed meaning, but often refers to older systems which are large, difficult to modify, and also necessary for current business needs. Often legacy systems are written in obsolete programming languages, lack documentation, have a deteriorating structure after years of changes, and depend on experts to keep them operational. When dealing with these systems, at some point so much technical debt accumulates that maintenance

528-483: A large number of contributors to understand the code base and fix bugs efficiently. An additional problem with maintenance is that nearly every change to code will introduce new bugs or unexpected ripple effects, which require another round of fixes. Testing can consume the majority of maintenance resources for safety-critical code, due to the need to revalidate the entire software if any changes are made. Revalidation may include code review, regression testing with

594-502: A local copy of the entire repository, a.k.a. repo, with history and version-tracking abilities, independent of network access or a central server. A repo is stored on each computer in a standard directory with additional, hidden files to provide version control capabilities. Git provides features to synchronize changes between repos that share history, that is, repos that were copied (cloned) from each other. For collaboration, Git supports synchronizing with repos on remote machines. Although all repos (with

660-518: A non-default strategy can be selected at merge time: When there is more than one common ancestor that can be used for a three-way merge, it creates a merged tree of the common ancestors and uses that as the reference tree for the three-way merge. In tests on prior merge commits taken from the Linux 2.6 kernel development history, this has been reported to result in fewer merge conflicts without causing mis-merges. This strategy can also detect and handle merges involving renames. Git's primitives are not inherently
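The three-way merge step mentioned above can be sketched as follows. This is a minimal illustration, not Git's implementation: it assumes a tree is a plain mapping from path to content, the names `Tree` and `three_way_merge` are hypothetical, and it does not model how the reference tree is built from several common ancestors or how renames are detected.

```python
from typing import Dict, Set, Tuple

# Hypothetical simplification: a tree maps a path to its content.
Tree = Dict[str, str]


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tuple[Tree, Set[str]]:
    """Merge two trees against a common reference tree, path by path."""
    merged: Tree = {}
    conflicts: Set[str] = set()
    for path in sorted(set(base) | set(ours) | set(theirs)):
        # A missing path is represented as None (absent or deleted).
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree
        elif o == b:
            result = t  # only their side changed the path
        elif t == b:
            result = o  # only our side changed the path
        else:
            conflicts.add(path)  # both sides changed it differently
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

A path changed on only one side is taken from that side; a path changed differently on both sides is reported as a conflict rather than resolved silently.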

726-415: A pull request to notify maintainers of a new change; a comment thread is associated with each pull request. This allows for focused discussion of code changes . Submitted pull requests are visible to anyone with repository access. A pull request can be accepted or rejected by maintainers. Once the pull request is reviewed and approved, it is merged into the repository. Depending on the established workflow,

792-512: A scheduled release and implemented. Although agile methodology does not have a maintenance phase, the change cycle can be enacted as a scrum sprint . Understanding existing code is an essential step before modifying it. The rate of understanding depends both on the code base as well as the skill of the programmer. Following coding conventions such as using clear function and variable names that correspond to their purpose makes understanding easier. Use of conditional loop statements only if

858-586: A service. The most popular are GitHub , SourceForge , Bitbucket and GitLab . Git, a powerful version control system, can be daunting with its command-line interface. Git GUI clients offer a graphical user interface (GUI) to simplify interaction with Git repositories. These GUIs provide visual representations of your project's history, including branches, commits, and file changes. They also streamline actions like staging changes, creating commits, and managing branches. Visual diff tools help resolve merge conflicts arising from concurrent development. Git comes with

924-547: A similar uptake among open-source projects.

Distributed version control

In 2010, software development author Joel Spolsky described distributed version control systems as "possibly the biggest advance in software development technology in the [past] ten years". Distributed version control systems (DVCS) use a peer-to-peer approach to version control, as opposed to the client–server approach of centralized systems. Distributed revision control synchronizes repositories by transferring patches from peer to peer. There

990-399: A source code repository that uses a distributed version control system are commonly made by means of a pull request , also known as a merge request . The contributor requests that the project maintainer pull the source code change, hence the name "pull request". The maintainer has to merge the pull request if the contribution should become part of the source base. The developer creates

1056-417: A subset of unit tests , integration tests , and system tests . The goal of the testing is to verify that previous functionality is retained, and the new functionality has been added. The key purpose of software maintenance is ensuring that the product continues to meet usability requirements. At times, this may mean extending the product's capabilities beyond what was initially envisioned. According to

1122-503: A subset of Git. GameOfTrees is an open-source implementation of Git for the OpenBSD project. As Git is a distributed version control system, it can be used as a server out of the box. It ships with a built-in command, git daemon, which starts a simple TCP server running on the Git protocol. Dedicated Git HTTP servers help (amongst other features) by adding access control, displaying

1188-452: A truly distributed project, such as Linux, every contributor maintains their own version of the project, with different contributors hosting their own respective versions and pulling in changes from other users as needed, resulting in a general consensus emerging from multiple different nodes. This also makes the process of "forking" easy, as all that is required is for one contributor to stop accepting pull requests from other contributors and let

1254-461: A unique blob. The relationships between the blobs can be found through examining the tree and commit objects. Newly added objects are stored in their entirety using zlib compression. This can consume a large amount of disk space quickly, so objects can be combined into packs , which use delta compression to save space, storing blobs as their changes relative to other blobs. Additionally, Git stores labels called refs (short for references) to indicate
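The zlib storage of newly added objects can be sketched as a round trip. This is a minimal sketch under stated assumptions: the function names are hypothetical, the `<type> <size>` header followed by a NUL byte reflects how Git frames an object before compressing it, and delta compression into packs is not shown.

```python
import zlib
from typing import Tuple


def encode_loose_object(kind: str, payload: bytes) -> bytes:
    """Frame the payload with a '<kind> <size>' header and NUL, then zlib-compress it."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\x00"
    return zlib.compress(header + payload)


def decode_loose_object(stored: bytes) -> Tuple[str, bytes]:
    """Decompress a stored object and split it back into kind and payload."""
    raw = zlib.decompress(stored)
    # Split at the first NUL only; the payload itself may contain NUL bytes.
    header, _, payload = raw.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload length")
    return kind, payload
```

Each object is stored whole here, which is why many similar revisions add up quickly and why combining objects into packs saves space.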

1320-538: A wider range of third-party options caters to platform-specific user preferences. The Eclipse Foundation reported in its annual community survey that, as of May 2014, Git was the most widely used source-code management tool, with 42.9% of professional software developers reporting Git as their primary source-control system, compared with 36.3% in 2013 and 32% in 2012; excluding use of GitHub, the figures for Git were 33.3% in 2014, 30.3% in 2013, 27.6% in 2012 and 12.8% in 2011. Open-source directory Open Hub reports

1386-431: A working system in short order. These influences led to the following implementation choices: Another property of Git is that it snapshots directory trees of files. The earliest systems for tracking versions of source code, Source Code Control System (SCCS) and Revision Control System (RCS), worked on individual files and emphasized the space savings to be gained from interleaved deltas (SCCS) or delta encoding (RCS)

1452-413: Is an implementation of Git written in pure Python with support for CPython 3.6 and later and PyPy. The libgit2 implementation of Git is an ANSI C software library with no other dependencies, which can be built on multiple platforms, including Windows, Linux, macOS, and BSD. It has bindings for many programming languages, including Ruby, Python, and Haskell. JS-Git is a JavaScript implementation of

1518-467: Is an ongoing process that is essential for the longevity of a software system, to keep it effective, adaptable and relevant in an ever-evolving technological landscape. Software maintenance is often considered lower skilled and less rewarding than new development. As such, it is a common target for outsourcing or offshoring . Usually, the team developing the software is different from those who will be maintaining it. The developers lack an incentive to write

1584-468: Is compromised by a change. A challenge with maintainability is that many software engineering courses do not emphasize it, and give out one-and-done assignments that have clear and unchanging specifications. Software engineering courses do not cover systems as complex as those that occur in the real world. Development engineers who know that they will not be responsible for maintaining the software do not have an incentive to build in maintainability. Maintenance

1650-424: Is incurred when programmers, often out of laziness or urgency to meet a deadline, choose quick and dirty solutions rather than build maintainability into their code. A common cause is underestimates in software development effort estimation , leading to insufficient resources allocated to development. One important aspect is having a large amount of automated software tests that can detect if existing functionality

1716-537: Is no single central version of the codebase; instead, each user has a working copy and the full change history. Advantages of DVCS (compared with centralized systems) include: Disadvantages of DVCS (compared with centralized systems) include: Some originally centralized systems now offer some distributed features. Team Foundation Server and Visual Studio Team Services now host centralized and distributed version control repositories via hosting Git. Similarly, some distributed systems now offer features that mitigate

1782-425: Is not as well studied as other phases of the software life cycle, despite comprising the majority of costs. Understanding has not changed significantly since the 1980s. Software maintenance can be categorized into several types depending on whether it is preventive or reactive and whether it is seeking to add functionality or preserve existing functionality, the latter typically in the face of a changed environment. In

1848-477: Is not practical or economical. Other choices include: Despite taking up the lion's share of software development resources, maintenance is the least studied phase of software development. Much of the literature has focused on how to develop maintainable code from the outset, with less focus on motivating engineers to make maintainability a priority. As of 2020 , automated solutions for code refactoring to reduce maintenance effort are an active area of research, as

1914-448: Is often called software evolution instead of maintenance. Despite testing and quality assurance , virtually all software contains bugs where the system does not work as intended. Post-release maintenance is necessary to remediate these bugs when they are found. Most software is a combination of pre-existing commercial off-the-shelf (COTS) and open-source software components with custom-written code. COTS and open-source software

1980-433: Is often considered an unrewarding job for software engineers , who, if assigned to maintenance, were more likely to quit. It often pays less than a comparable job in software development. The task is often assigned to temporary workers or lesser-skilled staff, although maintenance engineers are also typically older than developers, partly because they must be familiar with outdated technologies. In 2008, around 900,000 of

2046-583: Is the modification of software after delivery. As per the IEEE standard glossary of software engineering terminology, software maintenance refers to the process of modifying and updating software after its initial development and deployment, to correct faults, improve performance or other attributes, add new features to meet evolving user requirements, or adapt to a changed environment. It is important to emphasize that software maintenance thus involves many activities that go beyond mere bug fixing. Software maintenance

2112-420: Is typically updated over time, which can reduce the maintenance burden, but the modifications to these software components will need to be adjusted for in the final product. Unlike software development , which is focused on meeting specified requirements, software maintenance is driven by events—such as user requests or detection of a bug. Its main purpose is to preserve the usefulness of the software, usually in

2178-722: The ISO/IEC 14764 specification, software maintenance can be classified into four types: According to some estimates, enhancement (the latter two categories) comprises some 80 percent of software maintenance. Maintainability is the quality of software enabling it to be easily modified without breaking existing functionality. According to the ISO/IEC 14764 specification, activity to ensure software maintainability prior to release counts as part of software maintenance. Many software development organizations neglect maintainability, even though doing so will increase long-term costs. Technical debt

2244-593: The code documentation. On the other hand, structured iterative enhancement can begin by changing the top-level requirements document and propagating the change down to lower levels of the system. Modification often includes code refactoring (improving the structure without changing functionality) and restructuring (improving structure and functionality at the same time). Unlike commercial software, free and open source software change cycles are largely restricted to coding and testing, with minimal documentation. Open-source software projects instead rely on mailing lists and

2310-548: The open-source community. Today, Git is the de facto standard version control system. It is the most popular distributed version control system, with nearly 95% of developers reporting it as their primary version control system as of 2022. It is the most widely used source-code management tool among professional developers. There are offerings of Git repository services, including GitHub , SourceForge , Bitbucket and GitLab . Torvalds started developing Git in April 2005 after

2376-429: The (mostly similar) versions. Later revision-control systems maintained this notion of a file having an identity across multiple revisions of a project. However, Torvalds rejected this concept. Consequently, Git does not explicitly record file revision relationships at any level below the source-code tree. These implicit revision relationships have some significant consequences: Git implements several merging strategies;

2442-507: The 1.3 million software engineers and programmers working in the United States were doing maintenance. Companies started separate teams for maintenance, which led to outsourcing this work to a different company, and by the turn of the twenty-first century, sometimes offshoring the work to another country—whether as part of the original company or a separate entity. The typical sources of outsourcing are developed countries such as

2508-887: The BSDs (DragonFly BSD, FreeBSD, NetBSD, and OpenBSD), Solaris, macOS, and Windows. The first Windows port of Git was primarily a Linux-emulation framework that hosts the Linux version. Installing Git under Windows creates a similarly named Program Files directory containing the Mingw-w64 port of the GNU Compiler Collection, Perl 5, MSYS2 (itself a fork of Cygwin, a Unix-like emulation environment for Windows) and various other Windows ports or emulations of Linux utilities and libraries. Currently, native Windows builds of Git are distributed as 32- and 64-bit installers. The official Git website currently maintains

2574-523: The Git database that is not referred to may be cleaned up by using a garbage collection command or automatically. An object may be referenced by another object or an explicit reference. Git has different types of references. The commands to create, move, and delete references vary. git show-ref lists all references. Some types are: Git (the main implementation in C) is primarily developed on Linux , although it also supports most major operating systems, including
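The idea that unreferenced objects may be cleaned up can be sketched as a reachability walk. This is a minimal sketch assuming the object database is a mapping from an object ID to the IDs it references, and that refs map a name to an object ID; the names `reachable` and `collect_garbage` are hypothetical, and Git's actual garbage collection applies additional rules that this sketch ignores.

```python
from typing import Dict, Iterable, List, Set


def reachable(objects: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Return every object ID reachable from the given root IDs."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])  # follow references to other objects
    return seen


def collect_garbage(objects: Dict[str, List[str]], refs: Dict[str, str]) -> Dict[str, List[str]]:
    """Keep only objects reachable from some explicit reference."""
    live = reachable(objects, refs.values())
    return {oid: children for oid, children in objects.items() if oid in live}
```

An object survives if an explicit reference points to it or if another surviving object references it; everything else is dropped.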

2640-644: The Linux kernel tree at a rate of 6.7 patches per second. On 16 June, Git managed the kernel 2.6.12 release. Torvalds turned over maintenance on 26 July 2005 to Junio Hamano, a major contributor to the project. Hamano was responsible for the 1.0 release on 21 December 2005. Torvalds sarcastically quipped about the name git (which means "unpleasant person" in British English slang): "I'm an egotistical bastard, and I name all my projects after myself. First ' Linux ', now 'git'." The man page describes Git as "the stupid content tracker". The read-me file of

2706-480: The United States, the United Kingdom, Japan, and Australia, while destinations are usually lower-cost countries such as China, India, Russia, and Ireland. Reasons for offshoring include taking advantage of lower labor costs, enabling around-the-clock support, reducing time pressure on developers, and moving support closer to the market for the product. Downsides of offshoring include communication barriers in

2772-423: The change. This may require input from multiple departments; for example, the marketing team can help evaluate whether the change is expected to bring more business. Software development effort estimation is a difficult problem, including for maintenance change requests, but the request is likely to be declined if it is too expensive or infeasible. If it is decided to implement the request, it can be assigned to

2838-449: The code could execute more than once, and eliminating code that will never execute can also increase understandability. Experienced programmers have an easier time understanding what the code does at a high level. Software visualization is sometimes used to speed up this process. Modification to the code may take place in any way. On the one hand, it is common to haphazardly apply a quick fix without being granted enough time to update

2904-477: The code may need to be tested before being included in an official release. Therefore, some projects contain a special branch for merging untested pull requests. Other projects run an automated test suite on every pull request, using a continuous integration tool, and the reviewer checks that any new code has appropriate test coverage. The first open-source DVCSs included Arch, Monotone, and Darcs. However, open-source DVCSs were never very popular until

2970-418: The code to be easily maintained. Software is often delivered incomplete and almost always contains some bugs that the maintenance team must fix. Software maintenance often initially includes the development of new functionality, but as the product nears the end of its lifespan, maintenance is reduced to the bare minimum and then cut off entirely before the product is withdrawn. Each maintenance cycle begins with

3036-409: The code, modifying it, and revalidating it. Frequently, software is delivered in an incomplete state. Developers will test a product until running out of time or funding, because they face fewer consequences for an imperfect product than going over time or budget. The transition from the development to the maintenance team is often inefficient, without lists of known issues or validation tests, which

3102-403: The codebases gradually grow apart. This arrangement, however, can be difficult to maintain, resulting in many projects choosing to shift to a paradigm in which one contributor is the universal "upstream", a repository from which changes are almost always pulled. Under this paradigm, development is somewhat recentralized, as every project now has a central repository that is informally considered as

3168-418: The company may decide that it is no longer profitable to make functional improvements, and restrict support to bug fixing and emergency updates. Changes become increasingly difficult and expensive due to lack of expertise or decaying architecture due to software aging . After a product is no longer maintained, and does not receive even this limited level of updating, some vendors will seek to extract revenue from

3234-408: The contents of a Git repository via web interfaces, and managing multiple repositories. Existing Git repositories can be cloned and shared to be used by others as a centralized repo. A repository can also be accessed via remote shell just by having the Git software installed and allowing a user to log in. Git servers typically listen on TCP port 9418. There are many offerings of Git repositories as

3300-554: The creation and adaptation of custom source code branches (forks) whose purpose might differ from the original project. In addition, it permits developers to locally clone an existing code repository and work on it in a local environment, where changes are tracked and committed to the local repository, allowing for better tracking of changes before they are committed to the master branch of the repository. Such an approach enables developers to work in local and disconnected branches, making it more convenient for larger distributed teams. In

3366-630: The early 1970s, companies began to separate out software maintenance with its own team of engineers to free up software development teams from support tasks. In 1972, R. G. Canning published "The Maintenance 'Iceberg ' ", in which he contended that software maintenance was an extension of software development with an additional input: the existing system. The discipline of software maintenance has changed little since then. One twenty-first century innovation has been companies deliberately releasing incomplete software and planning to finish it post-release. This type of change, and others that expand functionality,

3432-405: The face of changing requirements. If conceived of as part of the software development life cycle , maintenance is the last and typically the longest phase of the cycle, comprising 80 to 90 percent of the lifecycle cost. Other models consider maintenance separate from software development, instead as part of the software maintenance life cycle (SMLC). SMLC models typically include understanding

3498-433: The form of such factors as time zone and organizational disjunction and cultural differences. Despite many employers considering maintenance lower-skilled work and the phase of software development most suited to offshoring, it requires close communication with the customer and rapid response, both of which are hampered by these communication difficulties. In software engineering, the term legacy system does not have

3564-488: The free license for BitKeeper , the proprietary source-control management (SCM) system used for Linux kernel development since 2002, was revoked for Linux. The copyright holder of BitKeeper, Larry McVoy , claimed that Andrew Tridgell had created SourcePuller by reverse engineering the BitKeeper protocols . The same incident also spurred the creation of Mercurial , another version-control system. Torvalds wanted

3630-418: The full set of features expected of a traditional SCM, with features mostly being created as needed, then refined and extended over time. Git has two data structures : a mutable index (also called stage or cache ) that caches information about the working directory and the next revision to be committed; and an object database that stores immutable objects. The index serves as a connection point between

3696-711: The issues of checkout times and storage costs, such as the Virtual File System for Git developed by Microsoft to work with very large codebases, which exposes a virtual file system that downloads files to local storage only as they are needed. A distributed model is generally better suited for large projects with partly independent developers, such as the Linux kernel. It allows developers to work in independent branches and apply changes that can later be committed, audited and merged (or rejected) by others. This model allows for better flexibility and permits

3762-455: The local repository, and once the development is done, the change should be integrated into the central repository as soon as possible. Organizations utilizing this centralized pattern often choose to host the central repository on a third-party service like GitHub, which not only offers more reliable uptime than self-hosted repositories but can also add centralized features like issue trackers and continuous integration. Contributions to

3828-484: The locations of various commits. They are stored in the reference database and are respectively: Frequently used commands for Git's command-line interface include: A .gitignore file may be created in a Git repository as a plain text file . The files listed in the .gitignore file will not be tracked by Git. This feature can be used to ignore files with keys or passwords, various extraneous files, and large files (which GitHub will refuse to upload). Every object in

3894-462: The maintenance team will likely recreate. After release, members of the development team are likely to be reassigned or otherwise become unavailable. The maintenance team will require additional resources for the first year after release, both for technical support and fixing defects left over from development. Initially, software may go through a period of enhancements after release. New features are added according to user feedback. At some point,

3960-407: The object database and the working tree. The object store contains five types of objects: Each object is identified by a SHA-1 hash of its contents. Git computes the hash and uses this value for the object's name. The object is put into a directory matching the first two characters of its hash. The rest of the hash is used as the file name for that object. Git stores each revision of a file as
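The naming and placement of objects described above can be sketched in a few lines. This is a minimal sketch, not Git's code: the function names are hypothetical, the `.git/objects` prefix assumes a standard non-bare repository layout, and the `<type> <size>` header followed by a NUL byte is the framing Git prepends to the content before hashing.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Compute the SHA-1 name of an object from its type and content."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Use the first two hex characters as the directory, the rest as the file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = git_object_id("blob", b"hello world\n")
    print(oid)
    print(loose_object_path(oid))
```

Because the name depends only on the content, two files with identical content map to the same object, which is what makes the store content-addressable.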

4026-419: The official repository, managed by the project maintainers collectively. While distributed version control systems make it easy for new developers to "clone" a copy of any other contributor's repository, in a central model, new developers always clone the central repository to create identical local copies of the code base. Under this system, code changes in the central repository are periodically synchronized with

4092-496: The release of Git and Mercurial. BitKeeper was used in the development of the Linux kernel from 2002 to 2005. The development of Git, now the world's most popular version control system, was prompted by the decision of the company that made BitKeeper to rescind the free license that Linus Torvalds and some other Linux kernel developers had previously taken advantage of.

Software maintenance

Software maintenance

4158-575: The same history) are peers, developers often use a central server to host a repo to hold an integrated copy. Git is a free and open-source software shared under the GPL-2.0-only license . Git was originally created by Linus Torvalds for version control during the development of the Linux kernel . The trademark "Git" is registered by the Software Freedom Conservancy , marking its official recognition and continued evolution in

4224-407: The software as long as possible, even though the product is likely to become increasingly avoided. Eventually, the software will be withdrawn from the market, although it may remain in use. During this process, the software becomes a legacy system . The first step in the change cycle is receiving a change request from a customer and analyzing it to confirm the problem and decide whether to implement

4290-417: The source code elaborates further: "git" can mean anything, depending on your mood. The source code for Git refers to the program as "the information manager from hell". Git's design is a synthesis of Torvalds's experience with Linux in maintaining a large distributed development project, along with his intimate knowledge of file-system performance gained from the same project and the urgent need to produce

4356-412: The time, so immediately after the 2.6.12-rc2 Linux kernel development release, Torvalds set out to write his own. The development of Git began on 3 April 2005. Torvalds announced the project on 6 April, and it became self-hosting the next day. The first merge of multiple branches took place on 18 April. Torvalds achieved his performance goals; on 29 April, the nascent Git was benchmarked recording patches to
