Block device snapshots in NetBSD (by alaric)
fss is a neat feature in NetBSD that I hadn't noticed (and I suspect few people have). It's a filesystem snapshot driver.
You attach it to a mounted filesystem, and the first time each block in the underlying disk partition is written to after that, it copies the block's old contents to a snapshot file. It also creates a snapshot block device: any read from that device looks in the snapshot file first, and falls back to the underlying disk partition if the snapshot file has no copy of that block. End result? The snapshot block device looks like a frozen-in-time copy of the underlying disk partition, without the space and time cost of actually copying the whole thing.
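To make the copy-on-write idea concrete, here is a toy model in plain sh where each "block" is a small file. It is only an illustration of the mechanism described above, not how fss(4) is actually implemented (the real driver does this inside the kernel); DISK, SNAP, write_block and read_snapshot are all hypothetical names standing in for the disk partition, the snapshot file, a disk write and a read through the snapshot device.

```sh
#!/bin/sh
# Toy model of copy-on-write snapshots: each "block" is a small file.
# Illustration only; fss(4) does the real work inside the kernel.
# DISK stands in for the disk partition, SNAP for the snapshot file.
DISK=$(mktemp -d)
SNAP=$(mktemp -d)

# Write new contents ($2) to block $1 of the live "disk".
write_block() {
    # Save the old contents only on the first write after the snapshot;
    # later writes must not replace the saved copy.
    if [ ! -e "$SNAP/$1" ]; then
        cp "$DISK/$1" "$SNAP/$1"
    fi
    printf '%s\n' "$2" > "$DISK/$1"
}

# Read block $1 through the snapshot "device".
read_snapshot() {
    if [ -e "$SNAP/$1" ]; then
        cat "$SNAP/$1"    # changed since the snapshot: use the saved copy
    else
        cat "$DISK/$1"    # unchanged: read straight from the disk
    fi
}

# Replay the test.txt example from the session below.
printf '%s\n' 'THIS IS A TEST' > "$DISK/7"
write_block 7 'THIS IS NOT A TEST'
read_snapshot 7    # prints THIS IS A TEST
cat "$DISK/7"      # prints THIS IS NOT A TEST
```

The important detail is the "first write only" check: if every write copied the old block, a second change would overwrite the saved copy and the snapshot would no longer be frozen in time.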
It works like this:
-bash-3.2$ cat /home/alaric/test.txt
THIS IS A TEST
-bash-3.2$ sudo fssconfig fss0 / /tmp/back
Password:
-bash-3.2$ sudo mount /dev/fss0 /mnt
-bash-3.2$ cat /mnt/home/alaric/test.txt
THIS IS A TEST
-bash-3.2$ cat > /home/alaric/test.txt
THIS IS NOT A TEST
-bash-3.2$ cat /mnt/home/alaric/test.txt
THIS IS A TEST
-bash-3.2$ cat /home/alaric/test.txt
THIS IS NOT A TEST
-bash-3.2$ mount
/dev/wd0a on / type ffs (local)
kernfs on /kern type kernfs (local)
procfs on /proc type procfs (local)
/dev/fss0 on /mnt type ffs (local)
-bash-3.2$ sudo umount /mnt
-bash-3.2$ sudo fssconfig -lv
fss0: /, taken 2008-11-29 01:21:23, file system internal
fss1: not in use
fss2: not in use
fss3: not in use
-bash-3.2$ sudo fssconfig -u fss0
-bash-3.2$ cat /home/alaric/test.txt
THIS IS NOT A TEST
-bash-3.2$ sudo rm /tmp/back
override rw------- root/wheel for '/tmp/back'? y
You'll have to take my word for it that none of those commands paused noticeably; no operation took time proportional to the 8GB size of the disk partition.
I created the snapshot file on the same filesystem I was snapshotting. That is only allowed for ffs, because it needs special locking; any other filesystem can be snapshotted too, as long as the snapshot file lives on a different filesystem.
This is useful for various things:
- Consistent backups. While a backup that might take hours is running, the filesystem is changing underneath it, so you can end up with a broken backup when your applications change groups of files together: the backup can capture non-corresponding versions of things. Imagine running a backup while reinstalling a load of applications and their shared libraries, and ending up with a mixture of old and new versions of libraries and binaries. Ick. Backing up from a snapshot instead gives you everything as it was at a single moment.
- Short-term onsite backups. Imagine taking a snapshot every hour from a cron job and deleting the oldest one, so you always have four hourly snapshots on the go. If you do something stupid, you can go back and retrieve old versions of your stuff. Or perhaps keep a week's worth of nightly snapshots. It's not a real backup, in that it won't protect you against disk or system failures (the snapshot still reads every unchanged block from the original partition), but it's the kind of backup you can go back to when you mess something up at the filesystem level.
- Trialling potentially disastrous operations, like major software upgrades. Take a snapshot beforehand. If it fails, then copy the afflicted files back from the snapshot.
- Security auditing. Take regular nightly snapshots; you can then compare them with the live system to see what has changed, which helps when analysing a successful break-in.
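The hourly rotation from the short-term backups item could look roughly like this POSIX sh sketch. It cycles through the four units fss0 to fss3 that "fssconfig -lv" listed in the session above, so the unit being reused always holds the oldest of the four snapshots. The script name, the SNAPDIR and SNAPFS variables and the /var/snapshots location are hypothetical, and it assumes none of the old snapshots is mounted when it runs.

```sh
#!/bin/sh
# Sketch: four rolling hourly snapshots using units fss0..fss3.
# Hypothetical choices (not from the post): SNAPDIR for the backing
# files, SNAPFS for the filesystem being snapshotted.
SNAPDIR=${SNAPDIR:-/var/snapshots}
SNAPFS=${SNAPFS:-/}

# Map an hour (00-23, as printed by "date +%H") to an fss unit.
snap_unit() {
    h=${1#0}                   # strip a leading zero: "08" is not valid octal
    echo "fss$(( h % 4 ))"
}

rotate_snapshot() {
    unit=$(snap_unit "$1")
    # Drop the oldest snapshot held in this unit (an error here just means
    # the unit was not configured yet), then delete its backing file.
    fssconfig -u "$unit" 2>/dev/null || true
    rm -f "$SNAPDIR/$unit"
    sync                       # flush pending writes first (see the caveat below)
    fssconfig "$unit" "$SNAPFS" "$SNAPDIR/$unit"
}

# Example crontab entry (hypothetical install path; cron needs % escaped):
#   0 * * * * /usr/local/sbin/fss-rotate.sh $(date +\%H)
if [ $# -gt 0 ]; then
    rotate_snapshot "$1"
fi
```

Using the hour modulo four means no state file is needed to remember which snapshot is oldest; a nightly variant would cycle by weekday instead, but would need more than the four units shown above.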
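For the trial-upgrade item, copying files back is just cp from the mounted snapshot. This sketch assumes a snapshot of / is already mounted at /mnt, as in the session above (the hypothetical SNAPMNT variable overrides that), so a live path like /etc/rc.conf corresponds to /mnt/etc/rc.conf. For a snapshot of some other filesystem the path mapping would need adjusting.

```sh
#!/bin/sh
# Sketch: copy files back from a snapshot of / that is already mounted.
# SNAPMNT is a hypothetical variable; /mnt matches the session above.
SNAPMNT=${SNAPMNT:-/mnt}

restore_from_snapshot() {
    for f in "$@"; do
        case $f in
            /*) ;;    # absolute path: fine
            *)  echo "restore_from_snapshot: not an absolute path: $f" >&2
                return 1 ;;
        esac
        # Live /etc/rc.conf is $SNAPMNT/etc/rc.conf in the snapshot;
        # -p keeps mode and timestamps (and ownership, when run as root).
        cp -p "$SNAPMNT$f" "$f" || return 1
    done
}
```

After a failed upgrade you would call it with the affected files, for example "restore_from_snapshot /etc/rc.conf" (an illustrative path).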
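For the auditing item, a crude first pass is a recursive diff between a mounted snapshot and the live tree. As before, this assumes a snapshot of / mounted at /mnt (hypothetical SNAPMNT variable), and /etc in the usage line is only an example of a directory worth watching. For serious auditing, a checksum-based tool such as mtree(8) would be a better fit.

```sh
#!/bin/sh
# Sketch: list what changed in a directory since the snapshot was taken.
# SNAPMNT is a hypothetical variable; /mnt matches the session above.
SNAPMNT=${SNAPMNT:-/mnt}

changed_since_snapshot() {
    # -r recurses; -q reports only which files differ or exist on one
    # side only, not the differences themselves. Like diff, the exit
    # status is 1 when anything changed.
    diff -rq "$SNAPMNT$1" "$1"
}
```

Usage would be something like "changed_since_snapshot /etc", which prints one line per modified, added or removed file.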
There is one caveat: because the filesystem was mounted when the snapshot was taken, mounting the snapshot itself logs a warning:
/dev/fss0: file system not clean (fs_clean=4); please fsck(8)
/dev/fss0: lost blocks 0 files 0
...and since the snapshot is read-only you can't fsck it, so you might run into trouble with that. Still, I think a good sync just before taking the snapshot should make the window of opportunity for problems quite small.
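Putting the consistent-backups idea together with that sync advice, a backup run could look like this sketch. The unit fss0, the backing file /tmp/back and the mount point /mnt are taken from the session above; tar is just one possible backup tool, and the archive path is whatever you pass as the second argument (ideally on another disk).

```sh
#!/bin/sh
# Sketch: archive a frozen snapshot of a filesystem instead of the live tree.
# fss0, /tmp/back and /mnt come from the session above; tar is only an
# example backup tool.
backup_from_snapshot() {
    fs=$1      # filesystem to back up, e.g. /
    dest=$2    # where to write the archive
    sync       # flush pending writes so the snapshot is as clean as possible
    fssconfig fss0 "$fs" /tmp/back || return 1
    if mount -o ro /dev/fss0 /mnt; then
        # The snapshot is frozen, so tar sees one consistent moment even
        # if the live filesystem changes while it runs.
        tar -C /mnt -czf "$dest" .
        rc=$?
        umount /mnt
    else
        rc=1
    fi
    # Always release the snapshot and its backing file, as in the session.
    fssconfig -u fss0
    rm -f /tmp/back
    return $rc
}
```

A call such as "backup_from_snapshot / /backup/root.tgz" (hypothetical destination) then produces an archive of / as it was at one instant, however long tar takes.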
This is really neat stuff. It's been in NetBSD since 2.0 and is still marked as experimental, so more people need to try it out, find any bugs, and otherwise confirm that it works fine so it can lose that experimental tag 😉