Managing lots of servers

Operating systems were largely designed around the notion of large centralised systems, a notion that is increasingly out of date. There's too much per-node configuration scattered all over the place, and you have to manually set up processes to deal with it all, or you start to lose the ability to replace machines: their configuration accumulates mysteries that you may or may not be able to recreate if you rebuild the OS install. You really need to be able to regenerate any machine from scratch in as little time as possible - and not just by restoring from a backup; you might need to recreate the function of a machine on new hardware, since you can't always get replacements for old setups.

Now, if you can get away with it, you can build your application to support distributed operation, then just set up a sea of identical OS installs with a configuration script that sets up a server and runs an instance of your distributed app. Sadly, a lot of off-the-shelf software fails to support that kind of operation; if you want to run standard apps that use filesystems and SQL databases to store their data, you need to be cunning.

How can we do better? Here are a few (rather brief) notes on an approach I'm experimenting with.

Boot your OS from a CD

Not quite a liveCD, though, since we do want to actually use the local disk for stuff.

  • A modified rc chain that (union?) mounts /etc from the hard disk, then runs with that to pick up local network configuration and the like (see the sketch after this list).
  • swap, /tmp, /var etc on the hard disk, obviously.
  • Makes rolling out new versions of the OS easy, and forces you to prototype the system on a hard disk in a staging server or VM; when it's ready, burn a CD, test the CD in a staging server, and if it works, burn a hundred copies and roll them out. USB sticks are another option, but a little more awkward in datacentre environments. The cost of having a human go and re-CD every server exists, but it's low, and it provides safety compared to automatic rollouts that could go disastrously wrong. And being able to roll back by putting the old CD back in, and having a truly read-only root filesystem and OS (making it harder to hide rootkits), is great!
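
The key trick there is NetBSD's union mount. As a minimal sketch (not the real rc code, and with a hypothetical device name), the interesting bit amounts to:

    mount -t ffs /dev/wd0a /config       # node-local state lives on the hard disk
    mount_union /config/local/etc /etc   # local /etc shadows the CD's read-only /etc

After that, the rest of the boot reads its configuration from the merged /etc as normal.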

Use Xen

  • The actual loaded OS is just a Xen dom0 setup
  • Prebuilt domU root images exist on the CD-ROM, and are spun up based on settings in /etc on the hard disk. Each root image is given a partition from the hard disk containing its /etc and swap, and any local storage it needs - in much the same way as dom0 itself boots from the CD but keeps its mutable state on the hard disk (see the hypothetical domU definition after this list).
  • Alternatively, the read-only domU root images could be stored on the servers' hard disks and rolled out via the network; the advantages of distributing them on CD-ROM are much smaller than for the dom0 OS, since dom0 can enforce the read-only nature of the domU images, provide remote access to roll back to an earlier version and try again if an upgrade turns out to be bad, and so on.
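
To make that concrete, here's a hypothetical example of writing such a domU definition (Xen 3 xm config syntax; every name, path, and MAC address below is made up):

    cat > /config/local/domUs/www1 <<'EOF'
    kernel = "/images/netbsd-XEN3_DOMU"
    name = "www1"
    memory = 256
    vif = [ 'mac=00:16:3e:00:00:01, bridge=bridge0' ]
    disk = [ 'file:/images/domU-root.img,0x0,r',
             'phy:/dev/wd0g,0x1,w' ]
    EOF

The root image is attached read-only, enforced from dom0; the second, writable disk is the domU's own slice of the hard disk for its /etc, swap, and local data.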

Virtualise storage

  • Use local storage on servers only for cached and temporary data. For example, each server's configuration is stored on local disk so it can boot, but that's just checked out from Subversion, and we put swap on local disks. The contents of any server's disks should be recreatable by checking the configuration out from SVN again and/or rsyncing any shared, mainly-read-only data (domU images etc.) from authoritative copies.
  • For actual data that we care about, use network protocols (iSCSI, NFS, SQL, etc) to talk to special reliable storage services.
  • For domUs that have a critical local filesystem, we use iSCSI. However, we use software RAID to mirror (or parity-protect) the filesystem over more than one physical iSCSI server, so that any one of them can fail without losing data or causing downtime (see the sketch after this list). Since the domU itself then stores nothing, should it fail (or the physical server hosting it fail), an exact duplicate can be brought up on another physical server, and it will connect to the same iSCSI servers to provide access to the same data (and we hope that the filesystem used can recover from any corruption that arose during the failure, or else we're toast anyway).
  • Higher level storage protocols (NFS, SQL, etc) are served out from domUs that, as above, have stable block-level storage from software-RAIDed iSCSI backends. And, likewise, should the NFS server go down, we can resurrect an identical clone of it from the same iSCSI backend disks and it will carry on with the state the failed one left behind.
  • But where possible, use proper distributed/replicated databases!
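
As a sketch of the domU-side setup, assuming NetBSD's raidframe and that the iSCSI initiator has already attached disks from two different iSCSI servers as sd0 and sd1 (all names here are hypothetical):

    cat > /etc/raid0.conf <<'EOF'
    START array
    # numRow numCol numSpare
    1 2 0
    START disks
    /dev/sd0e
    /dev/sd1e
    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
    128 1 1 1
    START queue
    fifo 100
    EOF
    raidctl -C /etc/raid0.conf raid0   # configure the RAID 1 set
    raidctl -I 2007042601 raid0        # write component labels (arbitrary serial)
    raidctl -iv raid0                  # initialise the mirror
    # ... disklabel raid0 as usual, then:
    newfs /dev/rraid0a
    mount /dev/raid0a /data

A resurrected clone of the domU just reassembles raid0 from the same two iSCSI targets and carries on where the dead one left off.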

Details

  • The dom0 ISO contains a bootloader, Xen, and NetBSD set up as a dom0 kernel, with /usr/pkg containing a bunch of useful core packages (sudo, subversion-base, screen, xentools, etc)
  • The dom0 ISO's rc chain will (sketched in shell after this list):
    1. Mount, as /config, the first partition on the first disk in the server that contains a special marker file
    2. union-mount /config/local/etc/ over /etc
    3. Now read /etc/rc.conf
    4. Run the normal /etc/rc tasks, including mounting /var and /tmp from the hard disk, mounting data partitions and setting up networking, ipnat, ipf, etc.
    5. Scan the list of Xen domUs to start from /config/local/domUs/* and start them, each with the correct disk images (from the data partitions), MAC address, and memory allocations.
  • /config/local and /config/global are svn checkouts
  • On all machines (dom0s and domUs), /etc/hosts is a symlink to /config/global/hosts, and likewise for any other such usefully-shared files.
  • domUs run pkg_chk, but don't have /usr/pkgsrc; they fetch compiled binary packages from a repository domU running the same base OS, which builds every package in pkg_chk.conf. That domU might need to be the NIS master, since that would be the only way to keep pkgsrc-created role-user UIDs in synch.
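
In shell terms, the interesting part of that rc chain might look something like this minimal sketch (not the real code; the marker-file name and device list are hypothetical):

    # 1. Find the config partition: the first disk partition carrying the marker file.
    for disk in wd0 wd1 sd0 sd1; do
        mount -t ffs "/dev/${disk}a" /config 2>/dev/null || continue
        [ -f /config/.this-is-a-config-partition ] && break
        umount /config
    done
    # 2. Union-mount the node-local /etc over the CD's read-only /etc.
    mount_union /config/local/etc /etc
    # 3./4. Read the local rc.conf, then run the normal rc tasks.
    . /etc/rc.conf
    # ... mount /var, /tmp and data partitions; set up networking, ipnat, ipf ...
    # 5. Start each configured domU with its own disks, MAC address, and memory.
    for domU in /config/local/domUs/*; do
        xm create "$domU"
    done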

How to bootstrap it

  • We need documented procedures for setting up a dom0 ISO image, to make sure no important steps are missed...
    • Make a working directory
    • Install NetBSD source sets
    • Set up a custom /etc/rc that finds a suitable filesystem to locate /etc from and mounts it as /config - or drops to a shell if none can be found.
    • Make a Xen3 dom0 kernel with "config netbsd root on cd0a type cd9660 dumps on none" and "options INCLUDE_CONFIG_FILE"
    • Put in the Xen 3 hypervisor kernel (xen.gz)
    • Configure grub menu.lst to load NetBSD on top of Xen.
    • Install core packages (xen tools) - /var/db/pkg and /usr/pkg will union mount over what we provide to allow for node-local extensions, although we shouldn't need too many in dom0.
    • Install grub and mkisofs as per http://www.gnu.org/software/grub/manual/html_node/Making-a-GRUB-bootable-CD-ROM.html (see the sketch after this list)
  • We need domU read-only root filesystem images created along a similar theme
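
Following the GRUB manual page linked above, the final image-building step might look like this (a sketch; the directory layout and the grub install path are assumptions):

    mkdir -p iso/boot/grub
    cp /usr/pkg/lib/grub/*/stage2_eltorito iso/boot/grub/   # GRUB's El Torito boot stage
    cp menu.lst iso/boot/grub/
    # ... copy in xen.gz, the dom0 kernel, the NetBSD sets, /usr/pkg, and so on ...
    mkisofs -R -b boot/grub/stage2_eltorito -no-emul-boot \
            -boot-load-size 4 -boot-info-table -o dom0.iso iso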

Subversion layout:

  • /pservers/ - root of /config/local for each physical server
  • /vservers/ - root of /config/local for each virtual server
  • /global - /config/global
  • /docs - configuration checklists and scripts for dom0 and domU master images
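
So bootstrapping a given machine's /config is just a couple of checkouts (the repository URL is hypothetical):

    svn checkout svn://svn.example.org/config/pservers/$(hostname) /config/local
    svn checkout svn://svn.example.org/config/global /config/global
    # a virtual server checks out /vservers/<name> as its /config/local instead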

Living in Groups

Until relatively recently in human history, people tended to live in small but relatively intimate groups, sharing a lot of domestic arrangements; one party would go out hunting, another would go out gathering, others would look after the children, others would cook the food, and so on.

This was, quite simply, more efficient. Economies of scale meant that a small team could cook for a large group in fewer total person-hours than everyone cooking for themselves - especially when you count the time spent making and maintaining cooking equipment and the like.

These days, the same economies of scale have had the opposite effect - food is now produced in factories, and easy-to-cook ingredients and ready meals are cheaply available. This, combined with all sorts of other socio-economic factors, has led to it now being quite practical to live entirely alone, spending your days working then coming home to a small meal you cook for yourself in minutes, cleaning your dishes and clothes in a machine, cleaning your floors with a machine, and so on.

And, thus, I suspect the loneliness of bachelor living is probably a modern phenomenon. Without ready meals and domestic appliances, moving away from home would be an unattractive prospect until you had a partner to team up with in order to form a breadwinning/homemaking duo (and the fact that sexist role models enforced a certain split of duties is, I think, entirely orthogonal to this issue) - and when you team up with a partner is precisely when you really start to want to be away from your parents...

Some of the best living arrangements I've had have been as a student, when the also-interesting economics of the cost of a place to live in London would force us to share houses (and sometimes rooms). Although we rarely actually cooked for each other, living in the same house as several other people was psychologically comforting for me. I really don't function well at all when living on my own; I've never officially done it, but in situations where all my housemates happen to be away for a few days, I've definitely started to slide into depression.

I currently live with my wife and daughter, so I'm basically OK, but even then, we still wish there were other similar couples we could share some resources with; if our house was larger we'd have lodgers. Personally I think my ideal would be having my own bedroom, office, bathroom, and kitchen (although I'd often cook for others), but sharing a big living room and garden, and being in the same physical building. There's increased security in a house that's rarely totally empty, and efficiencies in sharing resources (such a house would take up much less space than several individual ones, and consume much less energy), and increased convenience (you'd be quite likely to be able to find somebody to help you with something). And you'd have good times together.

This was a nice thing about what I did last weekend, which was to go on camp with my cub scouts; you might think that, with my legendarily complicated and busy life, the last thing I need is to donate my time to a voluntary organisation. But my work is solving complex mental problems (mainly on my own), and I face the difficult challenge of supporting my family under trying circumstances (shared with my wife, but we still feel quite 'alone' as a small family, without much support from our extended family); after weeks of that, a weekend of hard work solving relatively simple problems as part of a team (how to wash the puddle of sick away from outside a tent full of sleeping children, when it's raining heavily and the sick is slowly being washed downhill towards the tent? Answer: get digging equipment, dig a trench in the little gap between the tent and the puddle, scrape it all in with the spade, wash the remains in with water, then close the trench) is a delightfully refreshing change. Much more refreshing than a holiday spent just doing nothing; I'd be fretting too much about all the jobs I should be doing at home. Volunteering means I'm doing something somebody needs me to do, but working with others, so it's a fun team activity rather than an ordeal.

But I wonder how many people would be happier living in 'communes'. A friend of mine is a Hare Krishna; I'm indifferent to the religion, but their culture is excellent - and part of it seems to be a high acceptance of living in groups sharing resources, which I think is very healthy.

Perhaps there's an opening for a property developer to set up some buildings with little apartments that then share living areas. Obviously they couldn't just be sold as independent units; perhaps they'd need to be owned by a limited company of some kind and the mortgage repayments, rent, or other expenses paid by all the residents paying a share, since the residents would need to be able to vet and veto potential new housemates, as rifts occurring in such a community would be fatal.

In the meantime, I wish our house had room for lodgers 😉

Version Control and Leadership

For many years now, most of my home directory has been under version control of one form or another. I have a laptop, a desktop machine, and a server I ssh to; keeping stuff in synch between three working environments is very valuable, as is having efficient offsite backups and history.

I started my version control career, like most folks, with CVS, since for a long time CVS was the only open-source version control system in widespread use.

Then along came Subversion, which was clearly Much Nicer, and I quickly switched my personal version control over to it. As a freelance software engineer I use it commercially, and I now run a virtualised trac/svn hosting system that lets me easily add new projects, on which many of the projects I'm involved with are hosted. My open source projects run on a similar platform.

However, more recently, there's been an explosion of interest in the distributed version control model, with lots of products appearing, such as Darcs, Mercurial, Monotone and Git.

I've been quite interested in the distributed model; sure, Subversion is working well for me, but the distributed model interests me because it's more general. You can set up a central repository and push all your changes to it so it's the central synch point, like a Subversion repository, but you don't have to; you can synch changes between arbitrary copies of your stuff without having to go through a central point. And given two approaches, one of which has a superset of the functionality of the other, I'm naturally drawn towards the superset, even if I only need the features of the subset - because I can't predict what my future needs will be.
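
For example, in git's terms (repository locations hypothetical), both workflows are just different wirings of the same operations:

    # Centralised style: one blessed repository that everybody pushes to.
    git clone ssh://server/repos/project.git
    git push origin master

    # Peer-to-peer: pull changes straight from another working copy,
    # with no central server involved at all.
    git pull ssh://laptop/home/me/project master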

Also, these distributed version control systems seemed to have better branch merging than Subversion, which until recently required manual tracking of which changes had been merged into a branch from other branches. And being able to do 'local commits' to a local repository, while working offline on my laptop on a train, then commit them to the server as a batch would be great. Subversion really can't do very much without a network connection to its server at the moment.

Now, I was starting to gravitate towards Mercurial, since it's written in Python and seems quite widely available. But then I saw a talk by Linus Torvalds on git (which he originally wrote).

Two things struck me.

  1. I do like the architecture of git. Subversion stores history as a set of deltas: each version the files have been through is encoded in terms of its differences from the next version, while git just stores multiple as-is snapshots of the state in a content-addressable file system not unlike Venti, which automatically replaces multiple copies of identical data with references to a single copy of it (a toy sketch of the idea follows this list). So git can pull out any version of the files very quickly, and doesn't really have to worry too much about how versions are related, whereas Subversion stores everything as explicit chains of diffs and has to walk those chains to get anywhere. Git makes a note of which revision led to which revision(s) - it can be more than one if there was a branch, and more than one revision can lead to the same revision if there was a merge - but that's just used for working out the common ancestor of two arbitrary revisions in order to merge them; git can efficiently and reliably merge arbitrary points in arbitrary branches by skipping along the links to find the nearest common ancestor, generating diffs from that to the source of the merge, then applying those diffs to the target of the merge. There's none of the complex bookkeeping Subversion has to do to track which changes have been applied where. NOTE: I'm talking about "Subversion vs. Git" here since those are the examples of each model I know much about - I'm really comparing the models, not the precise products.
  2. Linus Torvalds puts on an act of calling people who disagree with him "stupid and ugly", making somewhat grand claims such as stating that centralised version control just can't work, and generally behaving as though he's smarter than everyone else. Now, he does that in a tongue-in-cheek way; I get the impression he's not really a git (even though he claims he named git after himself), although I couldn't be sure unless I met him. Indeed, I used to think he was a bit of a git from reading things he'd said, but seeing him in action on video for the first time made me realise that he seems to be joking after all. BUT, I think this may be part of why he has become famous and well-respected in some circles. There are a few quite cocky people in the software world who push their ideas with arrogance rather than humility, steamrolling their intellectual opponents with insults; Richard Stallman comes to mind as another. People who do this but are notably and demonstrably wrong get 'outed' as gits and lose a lot of respect; but if you're generally right and do this, it seems to lead to you having vehement followers who believe what you say quite uncritically. Which is interesting.
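
Here's a toy sketch of the content-addressable idea from point 1 (nothing like git's real code; it assumes GNU sha1sum and skips compression, but the two-character fan-out directory mimics git's object store):

    # Store a file under the SHA-1 hash of its contents; storing identical
    # content twice lands on the same path, so duplicates cost nothing.
    store() {
        hash=$(sha1sum "$1" | cut -d' ' -f1)
        dir="objects/$(printf %.2s "$hash")"   # 2-character fan-out directory
        mkdir -p "$dir"
        cp "$1" "$dir/${hash#??}"
    }

    # Fetch any version back by hash - no delta chains to walk.
    fetch() {
        cat "objects/$(printf %.2s "$1")/${1#??}"
    }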

But I still can't choose. I see a lot of git vs. svn vs. hg vs. monotone vs. darcs comparisons - most of them complaining about problems with the 'loser' that have been fixed in more recent versions. They're all rapidly moving targets! It looks like the only way to actually choose one is to spend a few months working on a major project with recent versions of each... in parallel. NOT GOING TO HAPPEN!

I dunno. I'm kinda leaning towards moving to git, but I'm worried that this might just be Linus Torvalds' reality distortion field pulling me in. Next I'll be using Linux if I'm not careful...
