
Block device snapshots in NetBSD

fss is a neat feature in NetBSD that I hadn't noticed (and I suspect few people have). It's a filesystem snapshot driver.

You attach it to a mounted filesystem, and whenever a block in the underlying disk partition is written to, it copies the old block to a snapshot file. It also creates a snapshot block device, and any reads to that device look in the snapshot file first, then look in the underlying disk partition if there isn't a corresponding block in the snapshot file. End result? The snapshot block device looks like a frozen-in-time copy of the underlying disk partition, without the space and time requirements of actually copying the whole thing.

It works like this:

        -bash-3.2$ cat /home/alaric/test.txt
        THIS IS A TEST
        -bash-3.2$ sudo fssconfig fss0 / /tmp/back
        Password:
        -bash-3.2$ sudo mount /dev/fss0 /mnt
        -bash-3.2$ cat /mnt/home/alaric/test.txt
        THIS IS A TEST
        -bash-3.2$ cat > /home/alaric/test.txt
        THIS IS NOT A TEST
        -bash-3.2$ cat /mnt/home/alaric/test.txt
        THIS IS A TEST
        -bash-3.2$ cat /home/alaric/test.txt
        THIS IS NOT A TEST
        -bash-3.2$ mount
        /dev/wd0a on / type ffs (local)
        kernfs on /kern type kernfs (local)
        procfs on /proc type procfs (local)
        /dev/fss0 on /mnt type ffs (local)
        -bash-3.2$ sudo umount /mnt     
        -bash-3.2$ sudo fssconfig -lv
        fss0: /, taken 2008-11-29 01:21:23, file system internal
        fss1: not in use
        fss2: not in use
        fss3: not in use
        -bash-3.2$ sudo fssconfig -u fss0
        -bash-3.2$ cat /home/alaric/test.txt 
        THIS IS NOT A TEST
        -bash-3.2$ sudo rm /tmp/back
        override rw-------  root/wheel for '/tmp/back'? y

...and you'll have to take it from me that there were no long pauses after any command in that sequence; no operation took time noticeably proportional to the 8GB size of the disk partition.

I created the snapshot file on the same filesystem I was snapshotting, which is only allowed for the ffs filesystem, since it needs special locking support; fss can snapshot any filesystem if the snapshot file is kept on a different filesystem.

This is useful for various things:

  1. Consistent backups. While a backup that might take hours runs, the filesystem is changing underneath it, meaning that you can end up with a broken backup when your applications change groups of files together; the backup can end up with non-corresponding versions of things. Imagine running a backup while reinstalling a load of applications and their shared libraries, and it getting a mixture of old and new versions of libraries and binaries. Ick.
  2. Short-term onsite backups. Imagine running a snapshot every hour from a cron job, deleting the oldest snapshot each time, so you have four hourly snapshots on the go (see the sketch after this list). If you do something stupid, you can go back and retrieve old versions of your stuff. Or perhaps a week's nightly snapshots. Not a backup in the sense that it won't protect against system failures, but it's the kind of backup you can go back to when you mess something up at the filesystem level.
  3. Trialling potentially disastrous operations, like major software upgrades. Take a snapshot beforehand. If it fails, then copy the afflicted files back from the snapshot.
  4. Security auditing. Take regular nightly snapshots, then you can compare them to the live system to see what's changed, to help analyse successful breakins.
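
To make the hourly-snapshot idea in point 2 concrete, here's a rough sketch of the kind of script a cron job could run each hour; the /snap/back.N paths and the four-slot rotation are my own invention for illustration, not anything fss ships with:

  #!/bin/sh
  # Hypothetical hourly rotation: four fss devices (fss0-fss3), each holding
  # one hour's snapshot of /. The /snap/back.N backing file names are invented.
  slot=$(( $(date +%s) / 3600 % 4 ))     # which of the four slots is "now"
  fssconfig -u fss$slot 2>/dev/null      # release that slot's old snapshot
  rm -f /snap/back.$slot
  sync                                   # flush dirty buffers first
  fssconfig fss$slot / /snap/back.$slot  # take a fresh snapshot

Run it from root's crontab with something like 0 * * * * /root/rotate-snapshots.sh and the last four hours of / stay browsable via /dev/fss0 to /dev/fss3.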

There is one caveat: a snapshot taken from a mounted filesystem will, when itself mounted, of course give you a log warning:

  /dev/fss0: file system not clean (fs_clean=4); please fsck(8)
  /dev/fss0: lost blocks 0 files 0

...and you can't fsck it since it's read-only, so you might run into trouble with that, but I think a good sync before taking the snapshot should make the window of opportunity for problems quite small.

This is really neat stuff. It's been in since NetBSD 2.0, and is still marked as experimental - so more people need to try it out, find any bugs, and otherwise confirm it works fine so it can lose that experimental tag 😉

I’m missing Scheme

I've not done any Scheme programming for ages. In fact, the past few months have been quite a haze of relentless hard work; I'm liking what I'm actually doing for a living, except that lately I've been doing rather a lot of faffing about with recruitment rather than actually doing it.

I'm having to spend half of my week in London, and the other half working from home - but with Sarah away that half of the week doing her course, I'm working from home alone by day and looking after Jean in the evening, with my working day bracketed by taking Jean to and from nursery, a half-hour round trip each time. All the thrill of commuting without the fun of working somewhere different to where you sleep, or with people.

So, no programming-for-fun lately! But that can't last forever, since trying to stop my mind from going exploring for too many months in a row is always rather futile.

So I came across Ventonegro's post on and-let* and it set me thinking. The Lisp family of languages (which includes Scheme) is renowned for its macros, which are the key rationale for the minimalist syntax; without things like if holding a special place in the language, user-written macros are just as powerful as anything that comes built into the language. This lets you extend the language with features that you'd be mad to build into a language core, but which are nonetheless useful reusable constructs, such as and-let*.

As an aside, let me just explain and-let* - the name is a terse mnemonic that makes sense to Schemers and nobody else, but it's a way of compactly writing bits of code that attempt to compute something in steps, where the trail might end at any step and fall back to some default. The example Ventonegro gives is rather good:

  (define (get-session request)
    (and-let* ((cookies (request-cookies request))
               (p (assoc "session_id" cookies))
               (sid-str (cdr p))
               (sid (string->number sid-str))
               ((integer? sid))
               ((exact? sid))
               (sp (assq sid *sessions*)))
      (cdr sp)))

Which translates to:

  • If there are cookies available
  • and there is one called session_id
  • and parsing it as a session id succeeds
  • and the session id is a number
  • and that number is an integer
  • and that number is exact (eg, 3 rather than 3.0)
  • and that number is the ID of an existing session
  • ...return that session

A few languages happen to make that pattern easy to write natively by putting assignments inside an and, as Peter Bex points out, but with Lisp you don't need to rely on that piece of luck; you can roll your own luck.
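
To show what rolling your own luck looks like, here's a simplified sketch of how and-let* can be defined with syntax-rules; the real SRFI-2 reference implementation also supports a bare-variable clause form and does more error checking, so treat this as illustrative rather than definitive:

  ;; Simplified and-let*: evaluate each clause in turn, binding variables as
  ;; we go; if any clause yields #f the whole form yields #f, otherwise the
  ;; body runs with all the bindings in scope.
  (define-syntax and-let*
    (syntax-rules ()
      ((_ ()) #t)
      ((_ () body ...) (begin body ...))
      ((_ ((var expr) clause ...) body ...)
       (let ((var expr))
         (and var (and-let* (clause ...) body ...))))
      ((_ ((expr) clause ...) body ...)
       (and expr (and-let* (clause ...) body ...)))))

Each (var expr) clause expands into a let whose result is tested before recursing on the remaining clauses - exactly the nested let-and-test boilerplate you'd otherwise write by hand.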

There's a whole library of useful macros and combinators (another handy higher-order programming tool) in most Scheme systems, and any your system lacks can be copied in easily enough. But it occurs to me that there are very few educational resources on actually using them. I think a definite theme, if not a whole chapter, in a "practical Scheme" book would have to be the introduction and then the applied use of such handy macros (plus a damned good reference guide to them all), because reading the definition of and-let* failed to fill me with inspiration for situations I'd use it in, while reading Ventonegro's example reminded me of some ugly code I'd written that could be tidied up just by using and-let*.

It's great to be able to assemble your own syntactic tools, but presenting them as one unorganised mass will just make your language seem as complex and messy as C++ and Perl combined. Yet sticking only to the core base language, and expecting programmers to spot their own patterns and abstract them out, results in duplication of effort and in every piece of source code starting with a preamble full of rather general macros; neither outcome is good. Rather than choosing a tradeoff between the two, as static programming language designers are forced to, we need to find a way of cataloguing such tools so they can easily be split by category and by priority: the most widely useful ones can be learnt wholesale, while the ones that are more useful in certain niches can be glanced at once, then gone back to and studied if required.

Fun in computer games

I've noticed a pattern in the computer games I find fun. Not that all fun games share it; games can be fun in different ways. I'm just saying I've noticed a particular element which subtly contributes towards the funness.

Namely, having to make a tradeoff between two or more competing requirements.

Let's have an example - Desktop Tower Defence. It's a tower defence game, which means that you use your resources to build a set of defences that waves of attackers then flow into.

Firstly, clever placement of defences has a much greater effect than simply how much you spend on them. So the game requires some measure of thought, rather than repetitive accumulation of resources followed by spending them.

But the crux of the matter is that there are different kinds of attackers, which have different weaknesses. A defence set up in the way that would be the strongest against land-based attackers - a long winding zig-zag with turrets along it - would be weak against flying attackers, since they just fly over your layout in a straight line rather than being constrained to the paths. Against them, you want a solid block of turrets in a cross, under the two orthogonal lines they fly along. So you need to establish some tradeoff between the two challenges. Not to mention that there are turrets which only attack air targets, but have a high damage per cost ratio, and turrets which only attack ground targets, and turrets that attack both but have a worse damage per cost ratio. And turrets with long ranges, or high fire rates, or that do a lot of damage per shot, or damage neighbouring targets due to a splash effect, and so on.

Sometimes you can have a tradeoff that's too simple - it's amenable to mathematical analysis to find an optimal result. That's no good. It has to be too complex to work out on paper, but not too complex to grasp. The middle ground between the two is the area where experimentation is rewarding.

I probably ought to read A Theory of Fun for Game Design...

Managing lots of servers

The way OSes work was designed around the notion of large centralised systems, which is increasingly out of date. There's too much per-node configuration splattered around all over the place; you have to manually set up processes to deal with it all, otherwise you start to lose the ability to replace machines, since their configuration contains mysteries that you may or may not be able to recreate if you rebuild the OS install. You really need to be able to regenerate any machine from scratch in as little time as possible - and not just by restoring from a backup; you might need to recreate the function of a machine on new hardware, since you can't always get replacements for old setups.

Now, if you can get away with it, you can build your application to support distributed operation - then just set up a sea of identical OS installs by using a configuration script that sets up a server and runs an instance of your distributed app, but a lot of off-the-shelf software sadly fails to support that kind of operation. If you want to run standard apps that use filesystems and SQL databases to store their data, you need to be cunning.

How can we do better? Here's a few (rather brief) notes on an approach I'm experimenting with.

Boot your OS from a CD

Not quite a liveCD, though, since we do want to actually use the local disk for stuff.

  • modified RC chain that (union?) mounts /etc from the hard disk and then runs with that to get local network etc. configuration.
  • swap, /tmp, /var etc on the hard disk, obviously.
  • Makes rolling out new versions of the OS easy; forces you to prototype the system on a hard disk on a staging server or VM, then when it's ready, burn a CD, test the CD in a staging server, then if it works, burn a hundred copies and roll them out. USB sticks are another option, but a little more awkward in datacentre environments. The cost of having a human go and re-CD every server exists, but it's low, and it provides safety compared to automatic rollouts that could go disastrously wrong. And being able to roll back by putting the old CD back in, plus having a truly read-only root filesystem and OS (making it harder to hide rootkits), is great!

Use Xen

  • The actual loaded OS is just a Xen dom0 setup
  • Prebuilt domU root images exist on the CD-ROM, which are then spun up (based on settings in /etc on the hard disk); a sketch of what those settings might look like follows this list. The root images get given a partition from the hard disk which contains their /etc and swap, and any local storage they need, in much the same way as dom0 boots directly from the CD.
  • Or your read-only domU root images could be stored on the hard disks of the servers and rolled out via the network; the advantages of distributing them on CD-ROM are a lot smaller than for the dom0 OS, as dom0 can enforce the read-only nature of the domU images, provide remote access to roll back to an earlier version and try again if an upgrade turns out to be bad, etc.
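
As promised above, here's a rough sketch of what one domU's settings might boil down to, in Xen 3 xm-style configuration syntax; every name, path, and MAC address here is invented for illustration:

  # Hypothetical domU configuration (Xen 3 xm syntax); all names invented.
  kernel = "/netbsd-XEN3_DOMU"
  memory = 256
  name   = "nfs1"
  vif    = [ 'mac=00:16:3e:00:00:01, bridge=bridge0' ]
  disk   = [ 'file:/data/images/domu-root.img,0x03,r',  # shared read-only root
             'phy:/dev/wd0g,0x04,w' ]                    # node-local /etc, swap, data
  root   = "xbd0"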

Virtualise storage

  • Use local storage on servers just for cached stuff and temporary storage. Eg, we have each server's configuration stored on local disk so it can boot, but that's just checked out from subversion. We put swap on local disks. But the contents of any server's disks should be recreatable by checking the configuration out from SVN again and/or rsyncing any shared mainly-read-only data (domU images etc) from authoritative copies.
  • For actual data that we care about, use network protocols (iSCSI, NFS, SQL, etc) to talk to special reliable storage services.
  • For domUs that have a critical local filesystem, we use iSCSI. However, we use software RAID to mirror (or parity-protect) the filesystem over more than one physical iSCSI server, so that either can fail without losing data or causing downtime (see the sketch after this list). Since the domU itself then stores nothing, should it fail (or the physical server hosting it fail), an exact duplicate can be brought up on another physical server and it will connect to the same iSCSI servers to provide access to the same data (and we hope that the filesystem used can recover from any corruption that arose during the failure, or else we're toast anyway).
  • Higher level storage protocols (NFS, SQL, etc) are served out from domUs that, as above, have stable block-level storage from software-RAIDed iSCSI backends. And, likewise, should the NFS server go down, we can resurrect an identical clone of it from the same iSCSI backend disks and it will carry on with the state the failed one left behind.
  • But where possible, use proper distributed/replicated databases!
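
As a sketch of the mirroring idea: on NetBSD the software RAID would be RAIDframe, and a config like the following could mirror two iSCSI-backed disks. That the two LUNs attach as sd0 and sd1, and the partition letters, are assumptions for illustration:

  # Hypothetical raid0.conf: RAID 1 mirror over two disks, each assumed to
  # be a LUN from a different physical iSCSI server.
  START array
  1 2 0
  START disks
  /dev/sd0e
  /dev/sd1e
  START layout
  # sectPerSU, SUsPerParityUnit, SUsPerReconUnit, RAID level
  128 1 1 1
  START queue
  fifo 100

Configure it with raidctl -C raid0.conf raid0, initialise the component labels with raidctl -I plus a serial number, and the resulting raid0 device keeps going if either backing iSCSI target vanishes.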

Details

  • The dom0 ISO contains a bootloader, Xen, and NetBSD set up as a dom0 kernel, with /usr/pkg containing a bunch of useful core packages (sudo, subversion-base, screen, xentools, etc)
  • The dom0 ISO rc chain will (see the sketch after this list):
    1. Mount the first partition on the first disk in the server that has a special marker file in it as /config
    2. union-mount /config/local/etc/ over /etc
    3. now read /etc/rc.conf
    4. Run the normal /etc/rc tasks, including mounting /var and /tmp from the hard disk, mounting data partitions and setting up networking, ipnat, ipf, etc.
    5. Scan the list of Xen domUs to start from /config/local/domUs/* and start them, each with the correct disk images (from the data partitions), MAC address, and memory allocations.
  • /config/local and /config/global are svn checkouts
  • On all machines (dom0s and domUs), /etc/hosts is a symlink to /config/global/hosts, and any other such useful files.
  • domUs run pkg_chk, but don't have /usr/pkgsrc; they fetch compiled binary packages from a repository domU running the same base OS, which builds every package in pkg_chk.conf. This domU might need to be the NIS master, since that would be the only way to keep pkgsrc-created role user UIDs in synch.
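
A minimal sketch of steps 1 and 2 of that rc chain, assuming an invented marker file name and a short list of candidate disks, might be:

  #!/bin/sh
  # Hypothetical early-boot fragment: find the config partition, mount it
  # as /config, then overlay the node-local /etc over the CD's /etc.
  for disk in wd0 wd1 sd0 sd1; do
    if mount /dev/${disk}a /config 2>/dev/null; then
      if [ -f /config/this-is-the-config-partition ]; then
        break                      # found it; leave it mounted
      fi
      umount /config               # wrong partition, keep looking
    fi
  done
  mount_union /config/local/etc /etc    # union mount: the local files win
  . /etc/rc.conf                        # now read node-local configuration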

How to bootstrap it

  • We need documented procedures for setting up a dom0 iso image, to make sure no important steps are missed...
    • Make a working directory
    • Install NetBSD source sets
    • Set up custom /etc/rc that finds a suitable filesystem to locate /etc from and mounts it as /config - or drops to a shell if none can be found.
    • Make a Xen3 dom0 kernel with "config netbsd root on cd0a type cd9660 dumps on none" and "options INCLUDE_CONFIG_FILE"
    • Put in the Xen 3 kernel
    • Configure grub menu.lst to load NetBSD on top of Xen.
    • Install core packages (xen tools) - /var/db/pkg and /usr/pkg will union mount over what we provide to allow for node-local extensions, although we shouldn't need too many in dom0.
    • Install grub and mkisofs as per http://www.gnu.org/software/grub/manual/html_node/Making-a-GRUB-bootable-CD-ROM.html (the resulting command is sketched after this list)
  • We need domU read-only root filesystem images created along a similar theme
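
The GRUB manual page referenced above boils down to placing stage2_eltorito in the tree and running mkisofs; assuming the working directory is called iso and that the pkgsrc grub package provides the boot image, it's roughly:

  # Lay out the El Torito boot image and build the ISO, as per the GRUB
  # manual; the stage2_eltorito path assumes the pkgsrc grub package.
  mkdir -p iso/boot/grub
  cp /usr/pkg/lib/grub/*/stage2_eltorito iso/boot/grub/
  mkisofs -R -b boot/grub/stage2_eltorito -no-emul-boot \
      -boot-load-size 4 -boot-info-table -o dom0.iso iso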

Subversion layout:

  • /pservers/ - root of /config/local for each physical server
  • /vservers/ - root of /config/local for each virtual server
  • /global - /config/global
  • /docs - configuration checklists and scripts for dom0 and domU master images

Version Control and Leadership

For many years now, most of my home directory has been under version control of one form or another. I have a laptop, a desktop machine, and a server I ssh to; keeping stuff in synch between three working environments is very valuable, as is having efficient offsite backups and history.

I started my version control career, like most folks, with CVS - since for a long time CVS was the only open-source version control system in widespread usage.

Then along came Subversion, which was clearly Much Nicer, and I quickly switched my personal version control system over to using it. As a freelance software engineer I use it commercially, and now run a virtualised trac/svn hosting system that lets me easily add new projects, which many projects I'm involved with are hosted on. And my open source projects run on a similar platform.

However, more recently, there's been an explosion of interest in the distributed version control model, with lots of products appearing, such as Darcs, Mercurial, Monotone and Git.

I've been quite interested in the distributed model; sure, Subversion is working well for me, but the distributed model interests me because it's more general. You can set up a central repository and push all your changes to it so it's the central synch point, like a Subversion repository, but you don't have to; you can synch changes between arbitrary copies of your stuff without having to go through a central point. And given two approaches, one of which has a superset of the functionality of the other, I'm naturally drawn towards the superset, even if I only need the features of the subset - because I can't predict what my future needs will be.

Also, these distributed version control systems seemed to have better branch merging than Subversion, which until recently required manual tracking of which changes had been merged into a branch from other branches. And being able to do 'local commits' to a local repository, while working offline on my laptop on a train, then commit them to the server as a batch would be great. Subversion really can't do very much without a network connection to its server at the moment.

Now, I was starting to gravitate towards Mercurial, since it's written in Python and seems quite widely available. But then I saw a talk by Linus Torvalds on git (which he originally wrote).

Two things struck me.

  1. I do like the architecture of git. Subversion stores history as a set of deltas; each version the files have been through is encoded in terms of its differences from the next version, while git just stores multiple as-is snapshots of the state in a content-addressable file system not unlike Venti, which automatically replaces multiple copies of identical data with references to a single copy of it. So it can pull out any version of the files very quickly, and doesn't really have to worry too much about how versions are related; Subversion stores everything as explicit chains of diffs and has to walk those chains to get anywhere. Git makes a note of which revision led to which revision(s) - it can be more than one if there was a branch, and more than one revision can lead to the same revision if there was a merge - but that's just used for working out the common ancestor of two arbitrary revisions in order to merge them; git can efficiently and reliably merge arbitrary points in arbitrary branches by skipping along the links to find the nearest common ancestor, generating diffs from that to the source of the merge, then applying those diffs to the target of the merge (see the example after this list). There's none of the complex stuff that Subversion has to do with tracking which changes have been applied and all that. NOTE: I'm talking about "Subversion vs. Git" here since those are the examples of each model I know much about - I'm really comparing the models, not the precise products, here.
  2. Linus Torvalds makes a show of calling people who disagree with him "stupid and ugly", of making somewhat grand claims such as stating that centralised version control just can't work, and generally of acting as though he's smarter than everyone else. Now, he does that in a tongue-in-cheek way; I get the impression he's not really a git (even though he claims he named git after himself), although I couldn't be sure unless I met him. Indeed, I used to think he was a bit of a git from reading things he'd said, but seeing him in action on video for the first time made me realise that he seems to be joking after all. BUT, I think this may be part of why he has become famous and well-respected in some circles. There are a few quite cocky people in the software world who push their ideas with arrogance rather than humility, steamrolling their intellectual opponents with insults; Richard Stallman comes to mind as another. Now, people who do this but are notably and demonstrably wrong get 'outed' as gits and lose a lot of respect; but if you're generally right and do this, it seems to lead to you having vehement followers who believe what you say quite uncritically. Which is interesting.
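
The merge machinery described in point 1 is easy to poke at from the command line; in a hypothetical repository with branches master and topic, the ancestor-finding step git performs is exposed directly as merge-base:

  # In some repository with branches 'master' and 'topic':
  git merge-base master topic   # prints the nearest common ancestor commit
  git checkout master
  git merge topic               # three-way merge starting from that ancestor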

But I still can't choose. I see a lot of git vs. svn vs. hg vs. monotone vs. darcs comparisons - most of them complaining about problems with the loser that have been fixed in more recent versions. They're all rapidly moving targets! It looks like the only way to actually choose one is to spend a few months working on a major project with recent versions of each... in parallel. NOT GOING TO HAPPEN!

I dunno. I'm kinda leaning towards moving to git, but I'm worried that this might just be Linus Torvalds' reality distortion field pulling me in. Next I'll be using Linux if I'm not careful...
