Understanding Versioning

I ran across a great book on this topic, Version Control by Example, written by Eric Sink. It is beautifully written, with a great balance of history, concepts, and technical details. Here’s a summary of its main concepts on version control systems.
A version control system is a piece of software that helps the developers on a software team work together and also archives a complete history of their work.
There are three basic goals of a version control system (VCS):
  1. We want people to be able to work simultaneously, not serially. Think of your team as a multi-threaded piece of software with each developer running in his own thread. The key to high performance in a multi-threaded system is to maximize concurrency. Our goal is to never have a thread which is blocked on some other thread.
  2. When people are working at the same time, we want their changes to not conflict with each other. Multi-threaded programming requires great care on the part of the developer and special features such as critical sections, locks, and a test-and-set instruction on the CPU. Without these kinds of things, the threads would overwrite each other’s data. A multi-threaded software team needs things too, so that developers can work without messing each other up. That is what the version control system provides.
  3. We want to archive every version of everything that has ever existed — ever. And who did it. And when. And why.
There are 18 basic operations you can do with a version control system. In this chapter, I will introduce each of these operations as an abstract notion which can be implemented by the actual commands of a specific version control tool. Usually, the name of my abstract operation is the most common name for the command that implements the operation. For example, since the action of committing changes to the repository is called “commit” by Subversion, Veracity, Git, Mercurial, and Bazaar, it seemed like a good idea to use that term here as well.


A repository is the official place where you store all your work. It keeps track of your tree, by which I mean all your files, as well as the layout of the directories in which they are stored.
But there has to be more. If the definition in the previous paragraph were the whole story, then a version control repository would be no more than a network filesystem. A repository is much more than that. A repository contains history.
A filesystem is two-dimensional: Its space is defined by directories and files. In contrast, a repository is three dimensional: It exists in a continuum defined by directories, files, and time. A version control repository contains every version of your source code that has ever existed.
A consequence of this idea is that nothing is ever really destroyed. Every time you make some kind of change to your repository, even if that change is to delete something, the repository gets larger because the history is longer. Each change adds to the history of the repository. We never subtract anything from that history.
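To make the idea of an ever-growing history concrete, here is a minimal Python sketch. The `Repository` class, its `commit` and `tree_at` methods, and the file path are hypothetical names invented for illustration; real tools store history far more compactly than a list of full snapshots.

```python
class Repository:
    """Toy repository: an append-only list of tree snapshots."""

    def __init__(self):
        # Version 0 is the empty tree.
        self.history = [{}]

    def commit(self, tree):
        # Store a copy so later edits to `tree` cannot alter history.
        self.history.append(dict(tree))
        return len(self.history) - 1  # the new version number

    def tree_at(self, version):
        return dict(self.history[version])


repo = Repository()
v1 = repo.commit({"src/main.c": "int main() { return 0; }"})
v2 = repo.commit({})  # a commit that deletes the only file

print(len(repo.history))  # 3: history grew even though the tree shrank
print(repo.tree_at(v1))   # the deleted file is still reachable at version 1
```

Nothing in this sketch ever removes an entry from `history`, which is the property the paragraph above describes.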
The create operation is used to create a new repository. This is one of the first operations you will use, and after that, it gets used a lot less often. When you create a new repository, your VCS will expect you to say something to identify it, such as where you want it to be created, or what its name should be.


The checkout operation is used when you need to make a new working copy for a repository that already exists.
A working copy is a copy used for, er, working. A working copy is a snapshot of the repository used by a developer as a place to make changes. The repository is shared by the whole team, but people do not modify it directly. Rather, each individual developer works by using a working copy. The working copy provides her with a private workspace where she can do her work isolated from the rest of the team.
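As a rough illustration of that isolation, the sketch below models checkout as copying one revision out of a shared list of snapshots. The `repository` list, the `checkout` function, and the `parent` and `files` keys are assumptions made for this example, not the layout of any real tool.

```python
# Hypothetical shared repository: a list of tree snapshots, newest last.
repository = [
    {"README": "v1"},
    {"README": "v2", "main.py": "print('hi')"},
]


def checkout(repository, revision=None):
    """Return a private working copy plus the revision it was taken from."""
    if revision is None:
        revision = len(repository) - 1  # default to the latest revision
    files = dict(repository[revision])  # copy, so edits stay private
    return {"parent": revision, "files": files}


wc = checkout(repository)
wc["files"]["README"] = "my local edit"

print(wc["parent"])             # 1
print(repository[1]["README"])  # v2 (the shared repository is untouched)
```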


Commit is the operation that actually modifies the repository. Several other operations modify the working copy and add an entry to a list we call the pending changeset, a place where changes wait to be committed. The commit operation takes the pending changeset and uses it to create a new version of the tree in the repository.
All modern version control tools perform this operation atomically. In other words, no matter how many individual modifications are in your pending changeset, the repository will either end up with all of them (if the operation is successful), or none of them (if the operation fails). It is impossible for the repository to end up in a state with only half of the operations done. The integrity of the repository is assured.
It is typical to provide a log message (or comment) when you commit, explaining the changes you have made.
This log message becomes part of the history of the repository.
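One simple way to get all-or-nothing behavior is to build the new tree on the side and publish it in a single final step. The sketch below assumes a pending changeset made of `(operation, path, content)` tuples and a `history` list of dictionaries; both shapes are invented for this example, and real tools use more robust mechanisms.

```python
def commit(history, pending, message):
    """Append a new tree built from `pending`, or raise and change nothing."""
    new_tree = dict(history[-1]["tree"])  # work on a copy of the latest tree
    for op, path, content in pending:
        if op == "add" and path not in new_tree:
            new_tree[path] = content
        elif op == "edit" and path in new_tree:
            new_tree[path] = content
        elif op == "delete" and path in new_tree:
            del new_tree[path]
        else:
            # Any invalid operation aborts before history is touched.
            raise ValueError(f"cannot {op} {path}")
    # Reached only if every operation succeeded: publish in one step.
    history.append({"tree": new_tree, "message": message})
    return len(history) - 1


history = [{"tree": {}, "message": "create repository"}]
commit(history, [("add", "a.txt", "hello"), ("add", "b.txt", "world")], "Add two files")

try:
    # The second operation is invalid, so the first must not be applied either.
    commit(history, [("edit", "a.txt", "changed"), ("delete", "missing.txt", None)], "Bad commit")
except ValueError as error:
    print("commit rejected:", error)

print(len(history))                  # 2
print(history[-1]["tree"]["a.txt"])  # hello
```

The failed commit leaves `history` exactly as it was, and the log message travels with each successful entry.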


Update brings your working copy up-to-date by applying changes from the repository, merging them with any changes you have made to your working copy if necessary. When the working copy was first created, its contents exactly reflected a specific revision of the repository. The VCS remembers that revision so that it can keep careful track of where you started making your changes.
This revision is often referred to as the parent of the working copy, because if you commit changes from the working copy, that revision will be the parent of the new changeset.


Use the add operation when you have a file or directory in your working copy that is not yet under version control and you want to add it to the repository. The item is not actually added immediately. Rather, the item becomes part of the pending changeset, and is added to the repository when you commit.


Edit is the most common operation when using a version control system. When you check out, your working copy contains a bunch of files from the repository. You modify those files, expecting to make your changes a part of the repository.
With most version control tools, the edit operation doesn’t actually involve the VCS directly. You simply edit the file using your favorite text editor or development environment and the VCS will notice the change and make the modified file part of the pending changeset.
On the other hand, some version control tools want you to be more explicit. Such tools usually set the filesystem read-only bit on all files in the working copy. Later, when you notify the VCS that you want to modify a file, it will make the working copy of that file writable.
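The read-only approach can be sketched with ordinary file permissions. The helper names `mark_read_only` and `open_for_edit` and the file `report.txt` are hypothetical; this only shows the permission flip, not how any particular tool tracks the notification.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def mark_read_only(path):
    """What such a tool might do to each file at checkout time."""
    os.chmod(path, os.stat(path).st_mode & ~WRITE_BITS)


def open_for_edit(path):
    """What it might do when told a file is about to be modified."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


def is_owner_writable(path):
    return bool(os.stat(path).st_mode & stat.S_IWUSR)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "report.txt")
    with open(path, "w") as f:
        f.write("draft\n")

    mark_read_only(path)
    after_checkout = is_owner_writable(path)

    open_for_edit(path)
    after_edit_request = is_owner_writable(path)

print(after_checkout, after_edit_request)
```

The check reads the permission bits directly rather than trying to write, since a privileged user can often write to a read-only file anyway.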


Use the delete operation when you want to remove a file or directory from the repository.
If you try to delete a file which has been modified in your working copy, your VCS might complain. Typically, the delete operation will immediately delete the working copy of the file, but the actual deletion of the file in the repository is simply added to the pending changeset.
Recall that in the repository the file is not really deleted. When you commit a changeset containing a delete, you are simply creating a new version of the tree which does not contain the deleted file. The previous version of the tree is still in the repository, and that version still contains the file.


Use the rename operation when you want to change the name of a file or directory. The operation is added to the pending changeset, but the item in the working copy typically gets renamed immediately.
There is a lot of variety in how version control tools support rename. Some of the earlier tools had no support for rename at all. Some tools (including Bazaar and Veracity) implement rename formally, requiring that they be notified explicitly when something is to be renamed. Such tools treat the name of a file or directory as simply one of its attributes, subject to change over time.
Still other tools (including Git) implement rename informally, detecting renames by observing changes rather than by keeping track of the identity of a file. Rename detection usually works well in practice, but if a file has been both renamed and modified, there is a chance the VCS will do the wrong thing.
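To show what detecting a rename by observation can look like, here is a small sketch that pairs each vanished path with the most similar newly appeared path. It is not Git's actual algorithm: the `detect_renames` function, the use of `difflib.SequenceMatcher`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def detect_renames(old_tree, new_tree, threshold=0.5):
    """Guess renames by comparing contents of vanished and new paths."""
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    renames = []
    for old_path, old_content in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_content in added.items():
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path))
            del added[best_path]  # each new file explains at most one rename
    return renames


old_tree = {"util.py": "def add(a, b):\n    return a + b\n", "README": "docs"}
new_tree = {
    "helpers.py": "def add(a, b):\n    return a + b\n# moved here\n",
    "README": "docs",
}

print(detect_renames(old_tree, new_tree))
```

Here the file was renamed and lightly edited, and it is still matched. If the edit had been heavy enough to push the similarity below the threshold, the sketch would report an unrelated delete and add instead, which is the failure mode mentioned above.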


Use the move operation when you want to move a file or directory from one place in the tree to another. The operation is added to the pending changeset, but the item in the working copy typically gets moved immediately.
Some tools treat rename and move as the same operation (in the Unix tradition of treating the file’s entire path as its name), while others keep them separate (by thinking of the file’s name and its containing directory as separate attributes).


As you make changes in your working copy, each change is added to the pending changeset. The status operation is used to see the pending changeset.
Or to put it another way, status shows you what changes would be applied to the repository if you were to commit.
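Conceptually, status compares the working copy against the revision it started from. The sketch below assumes both are plain `{path: content}` dictionaries and that `status` is a hypothetical helper. It also simplifies by treating every new file as added, whereas the add operation described earlier is explicit.

```python
def status(parent_tree, working_copy):
    """List the changes a commit would apply, without showing their details."""
    changes = []
    for path in sorted(set(parent_tree) | set(working_copy)):
        if path not in parent_tree:
            changes.append(("added", path))
        elif path not in working_copy:
            changes.append(("deleted", path))
        elif parent_tree[path] != working_copy[path]:
            changes.append(("modified", path))
    return changes


parent_tree = {"a.txt": "one", "b.txt": "two", "c.txt": "three"}
working_copy = {"a.txt": "one", "b.txt": "TWO", "d.txt": "four"}

for kind, path in status(parent_tree, working_copy):
    print(kind, path)
```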


Status provides a list of changes but no details about them. To see exactly what changes have been made to the files, you need to use the diff operation.
Your VCS may implement diff in a number of different ways. For a command-line application, it may simply print out a diff to the console. Or your VCS might launch a visual diff application.
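For a feel of what a console diff contains, Python's standard `difflib` module can produce the common unified format. The file name `notes.txt` and the sample lines are made up for this example.

```python
import difflib

old = "line one\nline two\nline three\n".splitlines(keepends=True)
new = "line one\nline 2\nline three\n".splitlines(keepends=True)

diff = list(
    difflib.unified_diff(old, new, fromfile="a/notes.txt", tofile="b/notes.txt")
)
print("".join(diff))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added; unchanged lines appear as context around them.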


Sometimes I make changes to my working copy that I simply don’t intend to keep. Perhaps I tried to fix a bug and discovered that my fix introduced five new bugs which are worse than the one I started with. Or perhaps I just changed my mind. In any case, a very nice feature of a working copy is the ability to revert the changes I have made.

A complete revert of the working copy will throw away all your pending changes and return the working copy to the way it was just after you did the checkout.


Your repository keeps track of every version that has ever existed. The log operation is the way to see those records. It displays each changeset along with additional data such as:
  • Who made the change?
  • When was the change made?
  • What was the log message?
Most version control tools present ways of slicing and dicing this information. For example, you can ask log to list all the changesets made by the user named Leonardo, or all the changesets made during April 2010.
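Slicing the log amounts to filtering changeset records. In the sketch below, the record fields, the sample data, and the `log` function with its `author`, `since`, and `until` parameters are hypothetical and do not mirror any tool's real options.

```python
from datetime import date

changesets = [
    {"id": 1, "author": "Leonardo", "date": date(2010, 3, 30), "message": "Initial import"},
    {"id": 2, "author": "Raphael", "date": date(2010, 4, 2), "message": "Fix typo"},
    {"id": 3, "author": "Leonardo", "date": date(2010, 4, 15), "message": "Add parser"},
    {"id": 4, "author": "Raphael", "date": date(2010, 5, 1), "message": "Release notes"},
]


def log(changesets, author=None, since=None, until=None):
    """Return the changesets matching every filter that was supplied."""
    result = []
    for cs in changesets:
        if author is not None and cs["author"] != author:
            continue
        if since is not None and cs["date"] < since:
            continue
        if until is not None and cs["date"] > until:
            continue
        result.append(cs)
    return result


by_leonardo = log(changesets, author="Leonardo")
in_april = log(changesets, since=date(2010, 4, 1), until=date(2010, 4, 30))

print([cs["id"] for cs in by_leonardo])  # [1, 3]
print([cs["id"] for cs in in_april])     # [2, 3]
```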


Version control tools provide a way to mark a specific instant in the history of the repository with a meaningful name. This is the tag operation.
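At its simplest, such a mark can be thought of as a mapping from a name to a revision. The `tag` helper below is a hypothetical sketch; refusing to reuse an existing name is a choice made here for the example, and tools differ on that point.

```python
def tag(tags, name, revision):
    """Attach a meaningful name to one revision."""
    if name in tags:
        raise ValueError(f"tag {name!r} already exists")
    tags[name] = revision


tags = {}
tag(tags, "release-3.0", 42)

print(tags["release-3.0"])  # 42
```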


The branch operation is what you use when you want your development process to fork off into two different directions. For example, when you release version 3.0, you might want to create a branch so that development of 4.0 features can be kept separate from 3.0.x bug-fixes.
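One common way to model this fork is to give every changeset a parent and treat a branch as a name for the newest changeset on a line of development. The dictionaries and the `branch` and `commit` helpers below are assumptions for illustration; not every tool represents branches this way.

```python
# Each changeset records its parent; a branch names the newest changeset.
changesets = {1: {"parent": None, "message": "Release 3.0"}}
branches = {"main": 1}


def branch(branches, new_name, from_name):
    """Start a new line of development at the same changeset."""
    branches[new_name] = branches[from_name]


def commit(changesets, branches, name, message):
    """Add a changeset on top of the named branch and advance the branch."""
    new_id = max(changesets) + 1
    changesets[new_id] = {"parent": branches[name], "message": message}
    branches[name] = new_id
    return new_id


branch(branches, "3.0.x", "main")
commit(changesets, branches, "main", "Start 4.0 feature")
commit(changesets, branches, "3.0.x", "Fix crash in 3.0")

print(branches)  # {'main': 2, '3.0.x': 3}
print(changesets[2]["parent"], changesets[3]["parent"])  # 1 1
```

Both new changesets share changeset 1 as their parent, which is where the two directions diverge.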


Typically when you have used branch to enable your development to diverge, you later want it to converge again, at least partially. For example, if you created a branch for 3.0.x bug-fixes, you probably want those bug-fixes to happen in the main line of development as well. Without the merge operation, you could still achieve this by manually doing the bug-fixes in both branches. Merge makes this operation simpler by automating things as much as possible.


In some cases, the merge operation requires human intervention. Merge automatically deals with everything that can be done safely. Everything else is considered a conflict. For example, what if the file foo.js was modified in one branch and deleted in the other? This kind of situation requires a person to make the decisions. The resolve operation is used to help the user figure things out and to inform the VCS how the conflict should be handled.
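The split between what is safe and what is a conflict can be sketched as a three-way comparison against the common ancestor. The `merge_trees` function below is a hypothetical, whole-file simplification; real tools also merge changes inside a file.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of {path: content} trees; a missing path reads as None."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (including both deleted)
        elif o == b:
            result = t  # only their side changed
        elif t == b:
            result = o  # only our side changed
        else:
            conflicts.append(path)  # both changed differently: needs a person
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"foo.js": "v1", "bar.js": "v1"}
ours = {"foo.js": "v2", "bar.js": "v1"}  # modified foo.js
theirs = {"bar.js": "v1-fixed"}          # deleted foo.js, fixed bar.js

merged, conflicts = merge_trees(base, ours, theirs)
print(merged)     # {'bar.js': 'v1-fixed'}
print(conflicts)  # ['foo.js']
```

The fix to `bar.js` merges automatically because only one side touched it, while `foo.js` is left for a person to resolve.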


The lock operation is used to get exclusive rights to modify a file. Not all version control tools include this feature. In some cases, it is provided but is intended to be rarely used. For any files that are in a format based on plain text (source code, XML, etc.), it is usually best to just let the VCS handle the concurrency issues. But for binary files which cannot be automatically merged, it can be handy to grab a lock on a file.
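Exclusive locking can be pictured as a table recording who holds each file. The `LockTable` class, its methods, and the user names below are hypothetical; this is an in-memory sketch, not how a server-side lock is actually implemented.

```python
class LockTable:
    """Toy lock registry: at most one owner per path."""

    def __init__(self):
        self._owners = {}  # path -> user holding the lock

    def lock(self, path, user):
        # setdefault records `user` only if nobody holds the lock yet.
        return self._owners.setdefault(path, user) == user

    def unlock(self, path, user):
        if self._owners.get(path) == user:
            del self._owners[path]
            return True
        return False  # only the owner may release the lock


locks = LockTable()
print(locks.lock("logo.png", "alice"))    # True
print(locks.lock("logo.png", "bob"))      # False, alice holds it
print(locks.unlock("logo.png", "alice"))  # True
print(locks.lock("logo.png", "bob"))      # True
```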