Version Control and Leadership (by alaric)
For many years now, most of my home directory has been under version control of one form or another. I have a laptop, a desktop machine, and a server I ssh to; keeping stuff in synch between three working environments is very valuable, as is having efficient offsite backups and history.
I started my version control career, like most folks, with CVS - since for a long time CVS was the only open-source version control system in widespread use.
Then along came Subversion, which was clearly Much Nicer, and I quickly switched my personal version control over to it. As a freelance software engineer I use it commercially too: I now run a virtualised Trac/Subversion hosting system that lets me add new projects easily, and many of the projects I'm involved with are hosted on it. My open source projects run on a similar platform.
However, more recently, there's been an explosion of interest in the distributed version control model, with lots of products appearing, such as Darcs, Mercurial, Monotone and Git.
The distributed model interests me. Sure, Subversion is working well for me, but the distributed model is more general. You can set up a central repository and push all your changes to it so it's the central synch point, like a Subversion repository, but you don't have to; you can synch changes between arbitrary copies of your stuff without going through a central point. And given two approaches, one of which offers a superset of the other's functionality, I'm naturally drawn towards the superset, even if I only need the features of the subset - because I can't predict what my future needs will be.
Also, these distributed version control systems seemed to have better branch merging than Subversion, which until recently required manual tracking of which changes had been merged into a branch from other branches. And being able to make 'local commits' to a local repository while working offline on my laptop on a train, then send them to the server as a batch, would be great. Subversion really can't do very much without a network connection to its server at the moment.
Now, I was starting to gravitate towards Mercurial, since it's written in Python and seems quite widely available. But then I watched a talk by Linus Torvalds on git (which he originally wrote).
Two things struck me.
- I do like the architecture of git. Subversion stores history as a set of deltas: each version the files have been through is encoded in terms of its differences from the next version. Git, by contrast, just stores as-is snapshots of the state in a content-addressable file system not unlike Venti, which automatically replaces multiple copies of identical data with references to a single copy. So it can pull out any version of the files very quickly, and doesn't have to worry much about how versions are related; Subversion stores everything as explicit chains of diffs and has to walk those chains to get anywhere. Git does note which revision led to which revision(s) (there can be more than one if there was a branch, and more than one revision can lead to the same revision if there was a merge), but that's just used to work out the common ancestor of two arbitrary revisions in order to merge them. Git can efficiently and reliably merge arbitrary points in arbitrary branches by following the links back to the nearest common ancestor, generating diffs from that ancestor to the source of the merge, then applying those diffs to the target of the merge. There's none of the complex bookkeeping Subversion has to do to track which changes have already been applied. NOTE: I'm talking about "Subversion vs. Git" here since those are the examples of each model I know much about; I'm really comparing the models, not the precise products.
- Linus Torvalds puts on an act of calling people who disagree with him "stupid and ugly", making somewhat grand claims such as that centralised version control just can't work, and generally acting as though he's smarter than everyone else. Now, he does it in a tongue-in-cheek way; I get the impression he's not really a git (even though he claims he named git after himself), although I couldn't be sure unless I met him. Indeed, I used to think he was a bit of a git from reading things he'd said, but seeing him in action on video for the first time made me realise that he seems to be joking after all. BUT, I think this may be part of why he has become famous and well-respected in some circles. There are a few quite cocky people in the software world who push their ideas with arrogance rather than humility, steamrolling their intellectual opponents with insults; Richard Stallman comes to mind as another. People who do this but are notably and demonstrably wrong get 'outed' as gits and lose a lot of respect; but if you do this and are generally right, it seems to win you vehement followers who believe what you say quite uncritically. Which is interesting.
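To make the architecture in the first point a bit more concrete, here is a toy sketch in Python of its two ideas: a content-addressable store that keeps identical data only once, and finding the nearest common ancestor of two revisions by following parent links. It is an illustration under simplifying assumptions, not how git is actually implemented. The names `ToyStore`, `ancestors` and `merge_bases` are made up; only the blob hashing mimics git (SHA-1 over a "blob <size>\0" header plus the contents). The revision history is just a dict from revision to its parents, and the sketch stops before the actual diff-and-apply step of the merge.

```python
import hashlib


class ToyStore:
    """Hypothetical toy content-addressable store: objects are keyed by a hash of their content."""

    def __init__(self):
        self.objects = {}  # hex digest -> raw bytes

    def put_blob(self, data: bytes) -> str:
        # Mimics git's blob hashing: SHA-1 over b"blob <size>\0" followed by the data.
        header = b"blob " + str(len(data)).encode() + b"\0"
        key = hashlib.sha1(header + data).hexdigest()
        # Identical content always yields the same key, so it is only stored once.
        self.objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def ancestors(parents, rev):
    """Every revision reachable from rev via parent links, including rev itself."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents.get(r, []))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not themselves ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


# Usage: a hypothetical history where C and D branch off B, E merges C and D,
# and F carries on from D.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C", "D"], "F": ["D"]}
print(merge_bases(history, "E", "F"))  # {'D'}
print(merge_bases(history, "C", "D"))  # {'B'}
```

Note that the merge never needs a record of which changes were already applied: a later merge between E and F starts from D, so the changes from B to D are not replayed.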
But I still can't choose. I see a lot of git vs. svn vs. hg vs. monotone vs. darcs comparisons, most of them complaining about problems with the loser that have been fixed in more recent versions. They're all rapidly moving targets! It looks like the only way to actually choose one is to spend a few months working on a major project with recent versions of each... in parallel. NOT GOING TO HAPPEN!
I dunno. I'm kinda leaning towards moving to git, but I'm worried that this might just be Linus Torvalds' reality distortion field pulling me in. Next I'll be using Linux if I'm not careful...




