Version control systems have long been an integral part of the software development process in the IT world. Industry players regard them as a best practice, and using one has become the norm rather than the exception. As with IDEs, people understand that the benefits of using one far outweigh the cost and the additional learning curve that come with adopting such a system.
For years now, Subversion (SVN) has been the king of version control systems. Originally created by CollabNet and now developed as an Apache project, it has been the go-to system for most developers. Recently, however, a newer VCS has been making waves, bringing with it a different approach to versioning and the promise of a faster, more powerful and more efficient system. This is Git.
Git was initially designed and developed by Linus Torvalds for the Linux kernel. Development began after many Linux kernel developers chose to give up access to the proprietary BitKeeper system. Although primarily developed on Linux, Git can be used on other Unix-like operating systems. For Windows, a native port called msysgit is available, as well as a GUI client called TortoiseGit.
HOW GIT WORKS
Git stores and thinks about information very differently from other version control systems, even though the user interface is fairly similar. Most other systems store information as a list of file-based changes: they keep a set of files and the changes made to each file over time. Git, however, treats its data more like a set of snapshots of a mini file system. For every commit, it takes a picture of what all the files look like at that moment and saves a reference to that snapshot.
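To make the snapshot idea concrete, here is a toy model in Python. It is only a sketch: the names `snapshot` and `store` are made up for illustration, and Git's real storage (tree and commit objects, compression) is more involved. What it shows is that each snapshot records every file, while unchanged content is stored only once.

```python
import hashlib

def snapshot(files, store):
    """Record a snapshot: map every path to the hash of its content.

    `files` maps path -> bytes; `store` is a dict acting as a toy
    object database (hypothetical, not Git's real on-disk format).
    """
    tree = {}
    for path, content in files.items():
        digest = hashlib.sha1(content).hexdigest()
        store.setdefault(digest, content)  # identical content is stored once
        tree[path] = digest
    return tree

store = {}
first = snapshot(
    {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"},
    store,
)
second = snapshot(
    {"README": b"hello, world\n", "main.c": b"int main(void) { return 0; }\n"},
    store,
)

# The unchanged file points at the same stored object in both snapshots.
unchanged_shared = first["main.c"] == second["main.c"]
# Two README versions plus one main.c.
objects_stored = len(store)
```

Both snapshots list both files, yet only the file that changed adds a new object to the store.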
To uniquely identify all files and objects, Git uses SHA-1 hashes. In addition to providing unique identifiers, this also protects the integrity of the files: because an object’s identifier is computed from its content, a file cannot be modified without Git knowing about it.
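As a rough illustration of how content becomes an identifier, the sketch below computes the ID Git assigns to a file's content (a "blob"): the SHA-1 of a short header followed by the bytes themselves. The function name `git_blob_id` is invented for this example, and the sample contents are arbitrary.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 ID Git would give a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

empty_id = git_blob_id(b"")           # ID of an empty file
original_id = git_blob_id(b"hello\n")
tampered_id = git_blob_id(b"hellO\n")  # one character changed

# Any change to the content yields a different 40-character ID.
ids_differ = original_id != tampered_id
```

Because the ID depends on every byte, changing a single character produces a completely different identifier, which is how a modification gets noticed.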
To understand how Git works, it is important to know the three states a file can be in:
- Committed – means the data is safely stored in the local database
- Modified – means that you have changed the file but have not committed it to your database yet
- Staged – means that you have marked a modified file in its current version to go into your next commit snapshot
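The three states can be pictured as a comparison between three copies of a file: the one in the last commit, the one marked for the next commit, and the one in the working directory. The following is a simplified sketch with hypothetical names (`file_state` and its parameters). In real Git a file can be staged and then modified again, so it can be in two of these states at once; this toy version simply reports "modified" in that case.

```python
def file_state(committed_id, staged_id, working_id):
    """Classify a tracked file by comparing content IDs.

    committed_id: ID of the file in the last commit
    staged_id:    ID of the file marked for the next commit
    working_id:   ID of the file as it currently is on disk
    """
    if working_id != staged_id:
        return "modified"   # changed on disk, not yet marked for commit
    if staged_id != committed_id:
        return "staged"     # marked to go into the next commit snapshot
    return "committed"      # safely stored in the local database

states = [
    file_state("a1", "a1", "a1"),
    file_state("a1", "a1", "b2"),
    file_state("a1", "b2", "b2"),
]
```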
GIT ADVANTAGES
Git’s most compelling feature is its branching and merging. In many other VCSs, a branch is basically a copy of the repository in a new directory – Git does not work like that. A branch in Git is simply a file that contains the 40-character SHA-1 checksum of the commit it points to.
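To show why such a branch is cheap, here is a small sketch that mimics one in a temporary directory. The helper names and the commit ID are placeholders invented for the example; in a real repository Git manages these files itself (typically under `.git/refs/heads`), and they should not be edited by hand like this.

```python
import os
import tempfile

def create_branch(refs_dir, name, commit_id):
    """Create a toy branch: a file holding a 40-character commit ID."""
    if len(commit_id) != 40:
        raise ValueError("expected a 40-character SHA-1 checksum")
    path = os.path.join(refs_dir, name)
    with open(path, "w", newline="\n") as f:
        f.write(commit_id + "\n")
    return path

def read_branch(refs_dir, name):
    """Return the commit ID a toy branch points to."""
    with open(os.path.join(refs_dir, name)) as f:
        return f.read().strip()

refs_dir = tempfile.mkdtemp()
commit_id = "0123456789abcdef0123456789abcdef01234567"  # placeholder ID

master_path = create_branch(refs_dir, "master", commit_id)
create_branch(refs_dir, "experiment", read_branch(refs_dir, "master"))

# Creating a branch wrote 41 bytes: the checksum plus a newline.
branch_size = os.path.getsize(master_path)
same_commit = read_branch(refs_dir, "experiment") == commit_id
```

Creating a branch copies no project files at all, which is why branching feels instantaneous compared with copying a whole directory.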
This means that a user can do things like:
- Create a branch to try out an idea, commit a few times, switch back to where you branched from, apply a patch, switch back to where you are experimenting, then merge it in.
- Create a branch that always contains only what goes to production.
- Create new branches for each new feature that can be merged to the master branch later on.
- Create a branch to experiment in and either merge that or delete it – without it being seen by others.
Another advantage of Git is that nearly everything is local. Apart from the “fetch”, “pull” and “push” commands, very little communicates with anything other than the hard disk – almost all operations are done locally. This means that, besides having everything at one’s fingertips at all times, Git operations are blazingly fast, limited in most cases only by the speed of the hard drive. Speed has been, and continues to be, a design goal of the application.
Last, but definitely not least, is Git’s ability to adapt to any workflow. Because of Git’s distributed nature and superior branching system, one can implement pretty much any workflow one can think of with relative ease.
It can be the traditional Subversion-style workflow, with one repository acting as the central server.
It can be used with the Integration Manager workflow, in which one person acts as the integration manager and is the only one who commits to the “blessed” repository. Developers clone that repository, push to their own independent repositories, and then ask the integration manager to pull from those repositories.
It can also implement the Dictator and Lieutenants workflow, in which the work is broken down per module. “Lieutenants”, the people in charge of a specific module of the project, are responsible for merging all the changes for their respective modules. Above them are the “dictators”, the people responsible for pulling the changes from their lieutenants and pushing them to the blessed repository. With Git’s flexibility, this can be set up in many ways: one dictator for the whole project or several, and one lieutenant per dictator or many.
Whether it is a Subversion-style workflow, an Integration Manager workflow, or even a Dictator and Lieutenants workflow, Git is flexible and powerful enough to adapt and remain useful.
This is just a quick overview of Git, the version control system. It barely scratches the surface of Git’s usefulness, flexibility and usability. Git also offers more advanced functionality, such as complex diffing and merging. It has shown itself to be a well-thought-out and well-designed system that aims to make versioning files and code efficient. With all it offers, it is only prudent that industry players take a look at Git and see how it fits their current set-up.
References:
- http://git-scm.com/
- http://blogs.atlassian.com/2012/02/version-control-centralized-dvcs/
- http://whygitisbetterthanx.com/
- http://thinkvitamin.com/code/why-you-should-switch-from-subversion-to-git/
- http://gaveen.owain.org/2008/05/simple-diagram-on-distributed-vcs-hint.html
- http://progit.org/
- http://www.youtube.com/watch?v=8dhZ9BXQgc4