Block device snapshots in NetBSD

fss is a neat feature in NetBSD that I hadn't noticed (and I suspect few people have). It's a filesystem snapshot driver.

You attach it to a mounted filesystem, and whenever a block in the underlying disk partition is written to, it copies the old block to a snapshot file. It also creates a snapshot block device, and any reads from that device look in the snapshot file first, then look in the underlying disk partition if there isn't a corresponding block in the snapshot file. End result? The snapshot block device looks like a frozen-in-time copy of the underlying disk partition, without the space and time requirements of actually copying the whole thing.
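To make the mechanism concrete, here's a toy model of that copy-on-write scheme in Python. It is only a sketch of the idea as described above, not how the fss driver is actually written; the class name `CowSnapshot`, the list standing in for the partition and the dict standing in for the snapshot file are all made up for illustration.

```python
class CowSnapshot:
    """Toy model of a copy-on-write block snapshot (not the real fss code)."""

    def __init__(self, disk):
        self.disk = disk      # the live "partition": a mutable list of blocks
        self.saved = {}       # the "snapshot file": block number -> old contents

    def write(self, blockno, data):
        # First write to a block since the snapshot: preserve the old contents.
        if blockno not in self.saved:
            self.saved[blockno] = self.disk[blockno]
        self.disk[blockno] = data

    def read_snapshot(self, blockno):
        # Prefer the preserved copy; fall back to the untouched live block.
        if blockno in self.saved:
            return self.saved[blockno]
        return self.disk[blockno]


disk = [b"THIS IS A TEST", b"other data"]
snap = CowSnapshot(disk)
snap.write(0, b"THIS IS NOT A TEST")
snap.write(0, b"THIS IS STILL NOT A TEST")

print(disk[0])                 # live view: latest contents
print(snap.read_snapshot(0))   # frozen view: contents at snapshot time
print(len(snap.saved))         # only one block was ever copied
```

Note that the second write to block 0 copies nothing: the old contents were already saved, which is why the snapshot costs space only in proportion to how much has changed since it was taken.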

It works like this:

        -bash-3.2$ cat /home/alaric/test.txt
        THIS IS A TEST
        -bash-3.2$ sudo fssconfig fss0 / /tmp/back
        Password:
        -bash-3.2$ sudo mount /dev/fss0 /mnt
        -bash-3.2$ cat /mnt/home/alaric/test.txt
        THIS IS A TEST
        -bash-3.2$ cat > /home/alaric/test.txt
        THIS IS NOT A TEST
        -bash-3.2$ cat /mnt/home/alaric/test.txt
        THIS IS A TEST
        -bash-3.2$ cat /home/alaric/test.txt
        THIS IS NOT A TEST
        -bash-3.2$ mount
        /dev/wd0a on / type ffs (local)
        kernfs on /kern type kernfs (local)
        procfs on /proc type procfs (local)
        /dev/fss0 on /mnt type ffs (local)
        -bash-3.2$ sudo umount /mnt     
        -bash-3.2$ sudo fssconfig -lv
        fss0: /, taken 2008-11-29 01:21:23, file system internal
        fss1: not in use
        fss2: not in use
        fss3: not in use
        -bash-3.2$ sudo fssconfig -u fss0
        -bash-3.2$ cat /home/alaric/test.txt 
        THIS IS NOT A TEST
        -bash-3.2$ sudo rm /tmp/back
        override rw-------  root/wheel for '/tmp/back'? y

...take it from me that there were no long pauses after any command in that sequence; no operation took time noticeably proportional to the 8GB size of the disk partition.

I created the snapshot file on the same filesystem I was snapshotting, which is only allowed for ffs because it needs special locking; but fss can snapshot any filesystem if I put the snapshot file on a different filesystem.

This is useful for various things:

  1. Consistent backups. While a backup that might take hours is running, the filesystem is changing underneath it, so you can end up with a broken backup when your applications change groups of files together; the backup can end up with non-corresponding versions of things. Imagine running a backup while reinstalling a load of applications and their shared libraries, and it getting a mixture of old and new versions of libraries and binaries. Ick.
  2. Short-term onsite backups. Imagine running a snapshot every hour, from a cron job, and deleting the oldest snapshot, so you have four hourly snapshots on the go. If you do something stupid, you can go back and retrieve old versions of your stuff. Or perhaps a week's nightly snapshots. Not a backup in that it won't protect against system failures, but it's the kind of backup you can go back to when you mess something up at the filesystem level.
  3. Trialling potentially disastrous operations, like major software upgrades. Take a snapshot beforehand. If it fails, then copy the afflicted files back from the snapshot.
  4. Security auditing. Take regular nightly snapshots, then you can compare them to the live system to see what's changed, to help analyse successful breakins.
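For the security auditing case, the comparison itself needs nothing special once the snapshot is mounted: it's just a diff of two directory trees. Here's a rough sketch in Python, assuming the snapshot is mounted somewhere like /mnt as in the transcript above. The function names `tree_digests` and `compare_trees` are my own invention, and it only looks at the contents of regular files, not at permissions, ownership or timestamps.

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks, devices, sockets and the like
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(snapshot_root, live_root):
    """Return (added, removed, changed) path lists, live relative to snapshot."""
    old = tree_digests(snapshot_root)
    new = tree_digests(live_root)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

You'd want to point it at a subtree, something like `compare_trees("/mnt/etc", "/etc")`, rather than at / itself, since walking / would also descend into the mounted snapshot and into /proc and /kern.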

There is one caveat: a snapshot taken from a mounted filesystem will, when itself mounted, of course give you a log warning:

        /dev/fss0: file system not clean (fs_clean=4); please fsck(8)
        /dev/fss0: lost blocks 0 files 0

...and you can't fsck it since it's read-only, so you might run into trouble with that, but I think a good sync before taking the snapshot should make the window of opportunity for problems quite small.
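The "sync first" idea might look something like this if you were driving it from a script. This is a sketch under assumptions: the helper names `snapshot_command` and `take_snapshot` are hypothetical, the `fssconfig` arguments are simply the ones from the transcript above, and it has to run as root. It is not atomic either: writes can still land between the sync and the snapshot, so it only narrows the window rather than closing it.

```python
import os
import subprocess


def snapshot_command(device, mount_point, backing_file):
    """Build the fssconfig invocation used in the transcript above."""
    return ["fssconfig", device, mount_point, backing_file]


def take_snapshot(device, mount_point, backing_file, run=subprocess.run):
    """Flush pending writes, then configure the snapshot (needs root)."""
    os.sync()  # ask the kernel to write out dirty buffers first
    return run(snapshot_command(device, mount_point, backing_file), check=True)
```

Calling `take_snapshot("fss0", "/", "/tmp/back")` would then do the same as the `fssconfig fss0 / /tmp/back` line in the transcript, just with a sync immediately beforehand. The `run` parameter is only there so the command can be swapped out for a dry run.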

This is really neat stuff. It's been in since NetBSD 2.0, and is still marked as experimental - so more people need to try it out, find any bugs, and otherwise confirm it works fine so it can lose that experimental tag 😉

6 Comments

  • By Ben, Thu 4th Dec 2008 @ 10:06 am

    Sounds like a horrid hack.

    Shouldn't it be supported at the FS level? Like ZFS and the NetApp stuff.

  • By alaric, Wed 10th Dec 2008 @ 11:23 pm

    I think it's bad layering to put snapshots in at the FS level.

    For a start, it's much more complex to implement at that level. More bloated code to harbour bugs. Block-level snapshotting is much easier to validate, which is nice for something trusted with our MP3 stashes.

    And snapshots at the block level make your snapshot system orthogonal to your file system, so they can be chosen and developed independently. Under NetBSD's system, I can happily snapshot a FAT32 filesystem.

    What the fss driver does is the same thing NetApp does when used as an iSCSI target rather than an NFS server, or what virtualisation systems do when snapshotting a virtual disk - it's a tried and tested technique. It's just nice to have it available at the OS level...

  • By Ben, Thu 11th Dec 2008 @ 10:04 am

    Consistency: How do you know the FS is consistent before making a snapshot of it? Can you ever snapshot a busy FS?

    Duplicate databases: Both the FS and volume manager do block allocation. So two bits of duplicate code to go wrong, plus the volume manager doesn't know it can reuse the blocks when you delete that 2Gb file.

    Performance: The volume manager hides details of the discs from the FS which can be used to increase performance. See how fast Sun's ZFS + write optimised flash + read optimised flash + slow SATA discs combination goes in its new Amber Road products for an example of why you'd want such a blatant layering violation.

  • By alaric, Thu 11th Dec 2008 @ 10:19 am

    Ok!

    Consistency: I was thinking that it'd be nice if the kernel atomically did a sync before taking the snapshot, in order to flush all pending data and metadata to the filesystem so it's in a consistent state, then it occurred to me that perhaps it does; I'd have to read the source to check. In which case, the warning about mounting a non-cleanly-unmounted filesystem when mounting a snapshot can safely be ignored. Even for a busy FS, this would mean that a snapshot just meant a short interruption in throughput.

    Duplication: What volume manager? The fss snapshot driver just stores the snapshot in a file. Thus reusing the filesystem's block allocator. And what 2Gb file are you referring to?

    Performance: Again, what volume manager? fss will hurt performance of the original disk during operation since it's doing a copy-on-write snapshot, so every write to the original disk becomes a read-write-write cycle, but not reads as they can just go straight to the original disk; while the snapshot virtual-disk is read-only, its reads will be a bit slower since they amount to looking the block up in an index structure in the snapshot file or falling back to the original disk. However, performance of the snapshot virtual-disk isn't really critical; the only real cost is the copy-on-write on the original disk. But this isn't hiding any details of the discs from the FS, it's just interposing an extra copy on write, and one that goes away as soon as you've removed the snapshot.

  • By Ben, Thu 11th Dec 2008 @ 11:48 am

    There's still a race condition even with an automatic sync, unless the world stops.

    Volume manager: My bad. But fss is even more of a nasty design than a volume manager. I hate to think what it's doing to allow the snapshot to live on the FS it's snapshotting.

  • By alaric, Thu 11th Dec 2008 @ 12:11 pm

    I meant an atomic sync rather than an automatic sync. In a single operation, sync the filesystem and then snapshot it. So, yes, stop the world for a few ms! I suspect that's how VMware does it? When you install the VMware tools, it's meant to make "better" snapshots in some way, which might be just that, forcing a complete sync.

    That's only supported on FSes that declare they are re-entrant enough to support it - just FFS for now 🙂 For FAT et al, the snapshot needs to be on another filesystem (which can be another of the same type, as long as it's a different mount).

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales