Operating systems were largely designed around the notion of large centralised systems, which is increasingly out of date. There's too much per-node configuration splattered all over the place; you have to manually set up processes to deal with it all, or you start to lose the ability to replace machines, since their configuration comes to contain mysteries that you may or may not be able to recreate if you rebuild the OS install. You really need to be able to regenerate any machine from scratch in as little time as possible - and not just by restoring from a backup: you might need to recreate the function of a machine on new hardware, since you can't always get replacements for old setups.
Now, if you can get away with it, you can build your application to support distributed operation - then just set up a sea of identical OS installs by using a configuration script that sets up a server and runs an instance of your distributed app, but a lot of off-the-shelf software sadly fails to support that kind of operation. If you want to run standard apps that use filesystems and SQL databases to store their data, you need to be cunning.
How can we do better? Here are a few (rather brief) notes on an approach I'm experimenting with.
Boot your OS from a CD
Not quite a liveCD, though, since we do want to actually use the local disk for stuff.
- A modified rc chain that (union?) mounts /etc from the hard disk, then runs with that to pick up local network and other configuration.
- Swap, /tmp, /var, etc. on the hard disk, obviously.
- Makes rolling out new versions of the OS easy; it forces you to prototype the system on a hard disk in a staging server or VM, then, when it's ready, burn a CD, test the CD in a staging server, and if it works, burn a hundred copies and roll them out. USB sticks are another option, but a little more awkward in datacentre environments. Having a human go and re-CD every server has a cost, but it's a low one, and it provides safety compared to automatic rollouts that could go disastrously wrong. And being able to roll back by putting the old CD back in, with a truly read-only root filesystem and OS (making it harder to hide rootkits), is great!
- The actual loaded OS is just a Xen dom0 setup
- Prebuilt domU root images exist on the CD-ROM, which are then spun up (based on settings in /etc on the hard disk). The root images get given a partition from the hard disk which contains their /etc and swap, and any local storage they need, in much the same way as dom0 boots directly from the CD.
- Or your read-only domU root images could be stored on the hard disks of the servers and rolled out via the network; the advantages of distributing them on CD-ROM are a lot smaller than for the dom0 OS, as dom0 can enforce the read-only nature of the domU images, provide remote access to roll back to an earlier version and try again if an upgrade turns out to be bad, etc.
- Use local storage on servers just for cached stuff and temporary storage. E.g., we have each server's configuration stored on local disk so it can boot, but that's just checked out from Subversion. We put swap on local disks. But the contents of any server's disks should be recreatable by checking the configuration out from SVN again and/or rsyncing any shared, mainly-read-only data (domU images, etc.) from authoritative copies.
- For actual data that we care about, use network protocols (iSCSI, NFS, SQL, etc) to talk to special reliable storage services.
- For domUs that have a critical local filesystem, we use iSCSI. However, we use software RAID to mirror (or parity-protect) the filesystem over more than one physical iSCSI server, so that any one of them can fail without losing data or causing downtime. Since the domU itself then stores nothing, should it fail (or the physical server hosting it fail), an exact duplicate can be brought up on another physical server; it will connect to the same iSCSI servers to provide access to the same data (and we hope that the filesystem used can recover from any corruption that arose during the failure, or else we're toast anyway).
- Higher level storage protocols (NFS, SQL, etc) are served out from domUs that, as above, have stable block-level storage from software-RAIDed iSCSI backends. And, likewise, should the NFS server go down, we can resurrect an identical clone of it from the same iSCSI backend disks and it will carry on with the state the failed one left behind.
- But where possible, use proper distributed/replicated databases!
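To make the software-RAID-over-iSCSI idea concrete, here's a rough sketch using NetBSD's RAIDframe to mirror two iSCSI-backed disks. This is an illustration only: the device names (sd0e, sd1e), the component-label serial, and the assumption that the iSCSI initiator has already attached the two targets as sd0 and sd1 are all hypothetical.

```shell
# Build a RAID 1 mirror over two iSCSI-backed disks with NetBSD's RAIDframe.
# Assumes the iSCSI initiator has attached the targets as sd0 and sd1
# (device names and serial number are illustrative).
cat > /tmp/raid0.conf <<'EOF'
START array
# numRow numCol numSpare
1 2 0
START disks
/dev/sd0e
/dev/sd1e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1
START queue
fifo 100
EOF

raidctl -C /tmp/raid0.conf raid0  # configure the set
raidctl -I 2006042901 raid0       # initialise component labels
raidctl -iv raid0                 # build the initial mirror
newfs /dev/rraid0a                # then hand raid0a to the domU as its disk
```

The nice property is that the RAID set is reassembled purely from the component labels, so a replacement domU on different hardware can reattach the same iSCSI targets and pick up where the failed one left off.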
- The dom0 ISO contains a bootloader, Xen, and NetBSD set up as a dom0 kernel, with /usr/pkg containing a bunch of useful core packages (sudo, subversion-base, screen, xentools, etc)
- The dom0 ISO chain will:
- Mount the first partition on the first disk in the server that has a special marker file in it as /config
- union-mount /config/local/etc/ over /etc
- now read /etc/rc.conf
- Run the normal /etc/rc tasks, including mounting /var and /tmp from the hard disk, mounting data partitions and setting up networking, ipnat, ipf, etc.
- Scan the list of Xen domUs to start from /config/local/domUs/* and start them, each with the correct disk images (from the data partitions), MAC address, and memory allocations.
- /config/local and /config/global are svn checkouts
- On all machines (dom0s and domUs), /etc/hosts is a symlink to /config/global/hosts, and any other such useful files.
- domUs run pkg_chk, but don't have /usr/pkgsrc; they fetch compiled binary packages from a repository domU running the same base OS, which builds every package in pkg_chk.conf. This domU might need to be the NIS master, since that would be the only way to keep pkgsrc-created role user UIDs in sync.
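The /config discovery and domU startup steps of the dom0 rc chain might look something like this. It's only a sketch: the marker filename (.dom0-config), the disk/partition list, and the per-domU config file layout are assumptions based on the notes above.

```shell
#!/bin/sh
# Sketch of the modified dom0 rc chain described above.

# 1. Find the first partition carrying the marker file; mount it as /config.
for disk in wd0 wd1 sd0 sd1; do
  for part in a e f g; do
    dev=/dev/${disk}${part}
    if mount -o ro "$dev" /mnt 2>/dev/null; then
      if [ -f /mnt/.dom0-config ]; then   # marker filename is an assumption
        umount /mnt
        mount "$dev" /config
        break 2
      fi
      umount /mnt
    fi
  done
done

# No config partition found: drop to a shell rather than boot half-configured.
[ -d /config/local ] || { echo "no /config partition found"; exec /bin/sh; }

# 2. Union-mount the node-local /etc over the read-only one from the CD.
mount_union /config/local/etc /etc

# 3. Normal rc tasks would run here (mount /var and /tmp, networking,
#    ipf/ipnat, ...), then each configured domU is started.
for cfg in /config/local/domUs/*; do
  xm create "$cfg"    # Xen 3 xm tool; each file names disks, MAC, memory
done
```

From here on, everything node-specific lives under /config, which is exactly the property that makes a dead server reproducible from the CD plus an svn checkout.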
How to bootstrap it
- We need documented procedures for setting up a dom0 iso image, to make sure no important steps are missed...
- Make a working directory
- Install NetBSD source sets
- Set up custom /etc/rc that finds a suitable filesystem to locate /etc from and mounts it as /config - or drops to a shell if none can be found.
- Make a Xen3 dom0 kernel with "config netbsd root on cd0a type cd9660 dumps on none" and "options INCLUDE_CONFIG_FILE"
- Put in the Xen 3 kernel
- Configure grub menu.lst to load NetBSD on top of Xen.
- Install core packages (xen tools) - /var/db/pkg and /usr/pkg will union mount over what we provide to allow for node-local extensions, although we shouldn't need too many in dom0.
- Install grub and mkisofs as per http://www.gnu.org/software/grub/manual/html_node/Making-a-GRUB-bootable-CD-ROM.html
- We need domU read-only root filesystem images created along a similar theme
- root of /config/local for each physical server
- root of /config/local for each virtual server
- /global - /config/global
- /docs - configuration checklists and scripts for dom0 and domU master images
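Following the GRUB manual page linked above, the final ISO assembly step might look like this. Paths, kernel filenames, and the menu.lst entries are illustrative; the real names depend on how the working directory was populated in the earlier steps.

```shell
# Assemble the dom0 ISO: GRUB El Torito boot image + Xen + NetBSD dom0 kernel.
# $WORKDIR is the working directory built up in the bootstrap steps above.
mkdir -p "$WORKDIR/boot/grub"
cp /usr/pkg/lib/grub/*/stage2_eltorito "$WORKDIR/boot/grub/"

cat > "$WORKDIR/boot/grub/menu.lst" <<'EOF'
default 0
timeout 5
title Xen 3 / NetBSD dom0
  kernel /boot/xen.gz dom0_mem=262144
  module /boot/netbsd-XEN3_DOM0.gz console=pc
EOF

mkisofs -R -b boot/grub/stage2_eltorito \
        -no-emul-boot -boot-load-size 4 -boot-info-table \
        -o dom0.iso "$WORKDIR"
```

Burn the result, test it in the staging server, and only then duplicate it for the fleet, as per the rollout procedure above.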