<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Backups and Archives</title>
	<atom:link href="http://www.snell-pym.org.uk/archives/2008/07/11/backups-and-archives/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.snell-pym.org.uk/archives/2008/07/11/backups-and-archives/</link>
	<description>Sarah and Alaric Snell-Pym living in interesting times</description>
	<pubDate>Wed, 19 Nov 2008 22:04:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: David Cantrell</title>
		<link>http://www.snell-pym.org.uk/archives/2008/07/11/backups-and-archives/#comment-76206</link>
		<dc:creator>David Cantrell</dc:creator>
		<pubDate>Tue, 15 Jul 2008 10:50:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.snell-pym.org.uk/?p=799#comment-76206</guid>
		<description>&lt;p&gt;I just back up everything.  It's easier than figuring out what should be backed up.  I use rsnapshot to do it, which Does The Right Thing for data that rarely changes.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I just back up everything.  It's easier than figuring out what should be backed up.  I use rsnapshot to do it, which Does The Right Thing for data that rarely changes.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: alaric</title>
		<link>http://www.snell-pym.org.uk/archives/2008/07/11/backups-and-archives/#comment-76113</link>
		<dc:creator>alaric</dc:creator>
		<pubDate>Fri, 11 Jul 2008 21:15:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.snell-pym.org.uk/?p=799#comment-76113</guid>
		<description>&lt;blockquote&gt;
  &lt;p&gt;I think I disagree with you here.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Oooh!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You should always have backups of everything that you can't afford to lose and you certainly can't afford to lose your OS and installed apps (group 1) even if you can restore them from the original media.&lt;/p&gt;
  
  &lt;p&gt;If you adopt the above then your backup strategy dictates your application vendor strategy, i.e. you have to use pkgsrc for everything and you can't make any local modifications; otherwise you end up in the position of having to back up the third-party apps or modified sources anyway, and then you may as well be backing up everything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ah, not necessarily. pkgsrc installs stuff into /usr/pkg - anything else should go into /usr/local, so can be backed up.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Group 2: /tmp is wiped on every boot anyway.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yep, all the more reason to exclude it from backups ;-)&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;/var: This contains lots of important stuff like mail spools and other transient but critical things. It's also the hardest to back up as you need to ensure that, for example, you don't catch your mail server or db server in the middle of a write. So you need some kind of freezer and that can affect your HA requirements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yeah... I try and avoid databases that can't recover from a bad shutdown, though. When I have the choice.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If you elect to reinstall your OS and apps via a download from, say, the NetBSD project server or from a vendor CD then you have no guarantee that when you come to do the restore you'll have access to it. You also have no way of ensuring that your post restore is exactly the same as your pre disaster environment and that is very important.... and running the install programs for every little thing always takes longer than "restore /dev/sd1" and going for a coffee.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But all the software on a UNIX server gets recreated from scratch fairly regularly anyway. How many weeks go by before some fundamental component grows a security vulnerability and you need to reinstall it and the things that depend upon it? Having the system come back neatly from a rebuild is as important as having it come back after a reboot - failing to do so is a sign of sloppiness!&lt;/p&gt;

&lt;p&gt;Heck, I did a reinstall of most of pkgsrc on infatuation at the start of last week, and since then, vulnerabilities have been found in its pcre and ruby packages... not to mention that mutt's been vulnerable to a 'signature spoofing' attack (whatever that is) for ages now, but there's no fixed version yet available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If you suffer a failure in the middle of something important then you want to be able to get back to exactly where you were as quickly as possible. You don't want to have to worry about whether you're going to get unexpected package upgrades when you reinstall from the pkgsrc config file and you certainly don't want to discover that a hastily tweaked vendor config file is now back in its default state.&lt;/p&gt;
  
  &lt;p&gt;So, when you install a machine, and every so often (maybe every year) you should do a complete dump of the OS and the apps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But such a snapshot will be dangerously out of date after a couple of months, and not &lt;em&gt;safe&lt;/em&gt; to restore onto a network-connected system :-(&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I'm all in favour of a layered strategy and possibly having slightly different strategies for group 1/5 data than for /home and /var data. However, if you design your layers properly I think you can just apply the same thing everywhere and group 1/5 data will just produce lots of zero length backups which are essentially free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My main argument, though, is that it's a pain to separate things into the groups, due to the way they're spread all over the filesystem.&lt;/p&gt;

&lt;p&gt;In an Ideal World, I'd love a system with top-level directories for Installed Stuff (OS &lt;em&gt;and&lt;/em&gt; apps), Temporary Stuff, and Interesting Stuff ;-) Yeah, having VCS checkouts handled specially is a small win; it may just be easier and safer to back them up with everything else.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;/ and /usr: Make this a partition. Make it read-only if you like. A read-only partition means you can back it up less frequently than a read-write one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have a new blog post in the pipeline about configuring systems to boot from a CD-ROM, actually, containing a 'live filesystem' for / and /usr, over which /etc is union-mounted from the hard disk (so the on-disk /etc only contains changed files)... but the reasons and tradeoffs involved will have to wait for that posting to be gone over in detail ;-)&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;/export: I tend to make this a partition as well since that's where I store all my really big non user specific data. I use it as a place to send backups from other machines and as a place for OS install media and general network shares and stuff. You might call this /data or something else.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;nod&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;[Andy's backup rota]&lt;/p&gt;

&lt;p&gt;I have a vaguely similar approach. Since I have two machines side by side in the datacentre joined by a cable, backups between them are cheap, so my most important data - the svn repositories, trac databases, and SQL databases - are dumped (not at the filesystem level, but at a logical level respecting locks and stuff) from the fileserver (infatuation) to the other machine (fear) on a nightly basis, with generations of dumps.&lt;/p&gt;

&lt;p&gt;Then the interesting filesystems of both machines (excluding a whole bunch of cache/temp directories, pkgsrc, the actual SQL database files, and so on, but definitely including the nightly dump files on fear) are rsynced down to a 500GB disk at home. I rsync infatuation (it being the fileserver) most weeks, and occasionally do fear or pain if either has been 'worked on' recently so will have any interesting changes. I like rsync, since it gives me a real filesystem I can go and look into for interesting things without needing to mess with restore apps.&lt;/p&gt;

&lt;p&gt;Pah, looking over rsync's logs, it looks like I need to add a new directory to the ignore list: /var/tmp. /var really is a worst-case for backing up ;-)&lt;/p&gt;

&lt;p&gt;But I want to do more than just have a single rsync snapshot - I'd really like to set up something like venti on a separate disk array. Why venti and not rsync-with-hardlink-snapshots? Well, because a venti-like system can just grow linearly by adding extra disks without any nasty filesystem reshuffling! I'm not so keen on tapes since the size of them is so limited, the hardware to read them is expensive, etc - pools of USB hard disks stacked up on an isolated pair of mini-ITX fanless backup servers would be better, I think!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I'm wondering if encrypted backups are a good idea. Obviously I'd have to back up the keys somehow. I could encrypt the content of the tapes and I'd be able to send them out to any one of my "friends" who'd take them. I could then send USB keys containing the keys to, say, my parents' house.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yeah, I've wondered about that. I'm feeling compelled to err on the side of safety for now, though.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;So in summary, + You want to design a strategy that doesn't require any maintenance once it's set up: you just feed blank media and rest assured that it'll get all of the important stuff. + It needs to be essentially free (in MB and time) to run backups on stuff that hasn't changed: if it isn't then you'll try to economise by grouping things that are backed up and then stuff will invariably fall through the cracks. This is kind of like a special case of the first point.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem is that the vast bulk of the changes to a system are just software being reinstalled due to security loopholes... /usr/pkg is about a gigabyte on most of my machines, but with a high turnover.&lt;/p&gt;

&lt;p&gt;Take infatuation as an example. We have 760MiB in /usr/pkg. /usr/pkgsrc, which is a combination of stuff out of NetBSD's cvs server and the downloaded source tarballs, is 927MiB - 622MiB of that being the source tarballs. And the pkgsrc work directory, where the tarballs are extracted and the build process occurs (but is kept around in case I need to reinstall the packages later, generally) is currently 477MiB. So that's (760+927+477)MiB = ~2GiB that can all be regenerated by downloading pkgsrc and dropping in the configuration files and telling pkg_chk to do its thing... while saving me 2GiB in data transfer ;-)&lt;/p&gt;

&lt;blockquote&gt;
  &lt;ul&gt;
  &lt;li&gt;You need to have a single action restore:
  just unpack and go. If you have to configure anything or remember anything when you're stressed then it will make life very unpleasant and error prone. If someone else can't do it for you then it's not simple enough.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ok, but I'm &lt;em&gt;used&lt;/em&gt; to rebuilding my systems from base configurations. Certainly, I need to script it more, though ;-) But wait until you see my boot-from-CD plan, which will mean all my servers share a SINGLE boot CD-ROM with the core OS on, and the hard disk partitioned into lots of Xen images...&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<blockquote>
  <p>I think I disagree with you here.</p>
</blockquote>

<p>Oooh!</p>

<blockquote>
  <p>You should always have backups of everything that you can't afford to lose and you certainly can't afford to lose your OS and installed apps (group 1) even if you can restore them from the original media.</p>
  
  <p>If you adopt the above then your backup strategy dictates your application vendor strategy, i.e. you have to use pkgsrc for everything and you can't make any local modifications; otherwise you end up in the position of having to back up the third-party apps or modified sources anyway, and then you may as well be backing up everything.</p>
</blockquote>

<p>Ah, not necessarily. pkgsrc installs stuff into /usr/pkg - anything else should go into /usr/local, so can be backed up.</p>

<blockquote>
  <p>Group 2: /tmp is wiped on every boot anyway.</p>
</blockquote>

<p>Yep, all the more reason to exclude it from backups <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>

<blockquote>
  <p>/var: This contains lots of important stuff like mail spools and other transient but critical things. It's also the hardest to back up as you need to ensure that, for example, you don't catch your mail server or db server in the middle of a write. So you need some kind of freezer and that can affect your HA requirements.</p>
</blockquote>

<p>Yeah... I try and avoid databases that can't recover from a bad shutdown, though. When I have the choice.</p>

<blockquote>
  <p>If you elect to reinstall your OS and apps via a download from, say, the NetBSD project server or from a vendor CD then you have no guarantee that when you come to do the restore you'll have access to it. You also have no way of ensuring that your post restore is exactly the same as your pre disaster environment and that is very important.... and running the install programs for every little thing always takes longer than "restore /dev/sd1" and going for a coffee.</p>
</blockquote>

<p>But all the software on a UNIX server gets recreated from scratch fairly regularly anyway. How many weeks go by before some fundamental component grows a security vulnerability and you need to reinstall it and the things that depend upon it? Having the system come back neatly from a rebuild is as important as having it come back after a reboot - failing to do so is a sign of sloppiness!</p>

<p>Heck, I did a reinstall of most of pkgsrc on infatuation at the start of last week, and since then, vulnerabilities have been found in its pcre and ruby packages... not to mention that mutt's been vulnerable to a 'signature spoofing' attack (whatever that is) for ages now, but there's no fixed version yet available.</p>

<blockquote>
  <p>If you suffer a failure in the middle of something important then you want to be able to get back to exactly where you were as quickly as possible. You don't want to have to worry about whether you're going to get unexpected package upgrades when you reinstall from the pkgsrc config file and you certainly don't want to discover that a hastily tweaked vendor config file is now back in its default state.</p>
  
  <p>So, when you install a machine, and every so often (maybe every year) you should do a complete dump of the OS and the apps.</p>
</blockquote>

<p>But such a snapshot will be dangerously out of date after a couple of months, and not <em>safe</em> to restore onto a network-connected system <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>

<blockquote>
  <p>I'm all in favour of a layered strategy and possibly having slightly different strategies for group 1/5 data than for /home and /var data. However, if you design your layers properly I think you can just apply the same thing everywhere and group 1/5 data will just produce lots of zero length backups which are essentially free.</p>
</blockquote>

<p>My main argument, though, is that it's a pain to separate things into the groups, due to the way they're spread all over the filesystem.</p>

<p>In an Ideal World, I'd love a system with top-level directories for Installed Stuff (OS <em>and</em> apps), Temporary Stuff, and Interesting Stuff <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> Yeah, having VCS checkouts handled specially is a small win; it may just be easier and safer to back them up with everything else.</p>

<blockquote>
  <p>/ and /usr: Make this a partition. Make it read-only if you like. A read-only partition means you can back it up less frequently than a read-write one.</p>
</blockquote>

<p>I have a new blog post in the pipeline about configuring systems to boot from a CD-ROM, actually, containing a 'live filesystem' for / and /usr, over which /etc is union-mounted from the hard disk (so the on-disk /etc only contains changed files)... but the reasons and tradeoffs involved will have to wait for that posting to be gone over in detail <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>

<blockquote>
  <p>/export: I tend to make this a partition as well since that's where I store all my really big non user specific data. I use it as a place to send backups from other machines and as a place for OS install media and general network shares and stuff. You might call this /data or something else.</p>
</blockquote>

<p><em>nod</em></p>

<p>[Andy's backup rota]</p>

<p>I have a vaguely similar approach. Since I have two machines side by side in the datacentre joined by a cable, backups between them are cheap, so my most important data - the svn repositories, trac databases, and SQL databases - are dumped (not at the filesystem level, but at a logical level respecting locks and stuff) from the fileserver (infatuation) to the other machine (fear) on a nightly basis, with generations of dumps.</p>

<p>Then the interesting filesystems of both machines (excluding a whole bunch of cache/temp directories, pkgsrc, the actual SQL database files, and so on, but definitely including the nightly dump files on fear) are rsynced down to a 500GB disk at home. I rsync infatuation (it being the fileserver) most weeks, and occasionally do fear or pain if either has been 'worked on' recently so will have any interesting changes. I like rsync, since it gives me a real filesystem I can go and look into for interesting things without needing to mess with restore apps.</p>

<p>Pah, looking over rsync's logs, it looks like I need to add a new directory to the ignore list: /var/tmp. /var really is a worst-case for backing up <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
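
<p>A minimal sketch of keeping that ignore list in one place, assuming the rsync runs are driven from a small Python wrapper. The destination path, the <code>--delete</code> flag and the exact exclude entries are assumptions for illustration; only the directories themselves come from the discussion above.</p>

```python
import shlex

# Hypothetical exclude list: the directories are the ones mentioned in this
# thread, but the real ignore list is not shown anywhere in the post.
EXCLUDES = ["/tmp", "/var/tmp", "/usr/pkg", "/usr/pkgsrc"]


def rsync_command(host, dest, excludes=EXCLUDES):
    """Build an rsync argv that pulls a host's root filesystem into dest."""
    # -a preserves permissions, times and links; --delete is an assumption
    # that makes the copy mirror deletions on the source.
    argv = ["rsync", "-a", "--delete"]
    # A leading "/" anchors each pattern at the root of the transfer.
    argv += ["--exclude=" + pattern for pattern in excludes]
    argv += [host + ":/", dest]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(rsync_command("infatuation", "/backup/infatuation/")))
```

<p>Adding a newly noticed temp directory then means editing one list rather than every invocation.</p>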

<p>But I want to do more than just have a single rsync snapshot - I'd really like to set up something like venti on a separate disk array. Why venti and not rsync-with-hardlink-snapshots? Well, because a venti-like system can just grow linearly by adding extra disks without any nasty filesystem reshuffling! I'm not so keen on tapes since the size of them is so limited, the hardware to read them is expensive, etc - pools of USB hard disks stacked up on an isolated pair of mini-ITX fanless backup servers would be better, I think!</p>
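
<p>To illustrate why a venti-like store grows so easily, here is a toy content-addressed store in Python. It is only a sketch of the idea: blocks are keyed by the SHA-1 of their contents, as venti does, but the <code>BlockStore</code> class, the in-memory dict standing in for the disk pool, and the use of whole files as "blocks" are simplifications assumed for illustration.</p>

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: blocks are keyed by their SHA-1 digest."""

    def __init__(self):
        # Hypothetical stand-in for a pool of disks.
        self.blocks = {}

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        # Writing the same content twice stores it only once.
        self.blocks.setdefault(key, data)
        return key

    def get(self, key):
        return self.blocks[key]


def snapshot(store, files):
    """Store a dict of {path: bytes} and return {path: block key}."""
    return {path: store.put(data) for path, data in files.items()}


if __name__ == "__main__":
    store = BlockStore()
    monday = snapshot(store, {"a.txt": b"same", "b.txt": b"old"})
    tuesday = snapshot(store, {"a.txt": b"same", "b.txt": b"new"})
    # Two snapshots of two files each, but only three distinct blocks.
    print(len(store.blocks))
```

<p>Because a block's key depends only on its contents, an unchanged file costs nothing in later snapshots, and in principle extra disks just provide more room for blocks without the existing ones having to move.</p>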

<blockquote>
  <p>I'm wondering if encrypted backups are a good idea. Obviously I'd have to back up the keys somehow. I could encrypt the content of the tapes and I'd be able to send them out to any one of my "friends" who'd take them. I could then send USB keys containing the keys to, say, my parents' house.</p>
</blockquote>

<p>Yeah, I've wondered about that. I'm feeling compelled to err on the side of safety for now, though.</p>

<blockquote>
  <p>So in summary, + You want to design a strategy that doesn't require any maintenance once it's set up: you just feed blank media and rest assured that it'll get all of the important stuff. + It needs to be essentially free (in MB and time) to run backups on stuff that hasn't changed: if it isn't then you'll try to economise by grouping things that are backed up and then stuff will invariably fall through the cracks. This is kind of like a special case of the first point.</p>
</blockquote>

<p>The problem is that the vast bulk of the changes to a system are just software being reinstalled due to security loopholes... /usr/pkg is about a gigabyte on most of my machines, but with a high turnover.</p>

<p>Take infatuation as an example. We have 760MiB in /usr/pkg. /usr/pkgsrc, which is a combination of stuff out of NetBSD's cvs server and the downloaded source tarballs, is 927MiB - 622MiB of that being the source tarballs. And the pkgsrc work directory, where the tarballs are extracted and the build process occurs (but is kept around in case I need to reinstall the packages later, generally) is currently 477MiB. So that's (760+927+477)MiB = ~2GiB that can all be regenerated by downloading pkgsrc and dropping in the configuration files and telling pkg_chk to do its thing... while saving me 2GiB in data transfer <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
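
<p>As a quick check of that sum, here is a throwaway Python snippet; the dictionary keys are just labels chosen for illustration.</p>

```python
MIB_PER_GIB = 1024

# Sizes quoted above, in MiB.
sizes_mib = {"/usr/pkg": 760, "/usr/pkgsrc": 927, "pkgsrc work directory": 477}

total_mib = sum(sizes_mib.values())
total_gib = total_mib / MIB_PER_GIB

print(f"{total_mib} MiB = {total_gib:.2f} GiB")  # prints "2164 MiB = 2.11 GiB"
```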

<blockquote>
  <ul>
  <li>You need to have a single action restore:
  just unpack and go. If you have to configure anything or remember anything when you're stressed then it will make life very unpleasant and error prone. If someone else can't do it for you then it's not simple enough.</li>
  </ul>
</blockquote>

<p>Ok, but I'm <em>used</em> to rebuilding my systems from base configurations. Certainly, I need to script it more, though <img src='http://www.snell-pym.org.uk/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> But wait until you see my boot-from-CD plan, which will mean all my servers share a SINGLE boot CD-ROM with the core OS on, and the hard disk partitioned into lots of Xen images...</p>]]></content:encoded>
	</item>
	<item>
		<title>By: @ndy</title>
		<link>http://www.snell-pym.org.uk/archives/2008/07/11/backups-and-archives/#comment-76108</link>
		<dc:creator>@ndy</dc:creator>
		<pubDate>Fri, 11 Jul 2008 16:38:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.snell-pym.org.uk/?p=799#comment-76108</guid>
		<description>&lt;p&gt;I think I disagree with you here.&lt;/p&gt;

&lt;p&gt;You should always have backups of everything that you can't afford to lose and you certainly &lt;em&gt;can't&lt;/em&gt; afford to lose your OS and installed apps (group 1) even if you can restore them from the original media.&lt;/p&gt;

&lt;p&gt;If you adopt the above then your backup strategy dictates your application vendor strategy, i.e. you have to use pkgsrc for everything and you can't make any local modifications; otherwise you end up in the position of having to back up the third-party apps or modified sources anyway, and then you may as well be backing up everything.&lt;/p&gt;

&lt;p&gt;Group 2: /tmp is wiped on every boot anyway. /var: This contains lots of important stuff like mail spools and other transient but critical things. It's also the hardest to back up as you need to ensure that, for example, you don't catch your mail server or db server in the middle of a write. So you need some kind of freezer and that can affect your HA requirements.&lt;/p&gt;

&lt;p&gt;As everyone knows, most backup strategies fail at the restore stage because no one ever tests them properly.&lt;/p&gt;

&lt;p&gt;If you elect to reinstall your OS and apps via a download from, say, the NetBSD project server or from a vendor CD then you have no guarantee that when you come to do the restore you'll have access to it. You also have no way of ensuring that your post restore is &lt;em&gt;exactly&lt;/em&gt; the same as your pre disaster environment and that is &lt;em&gt;very&lt;/em&gt; important.... and running the install programs for every little thing always takes longer than "restore /dev/sd1" and going for a coffee.&lt;/p&gt;

&lt;p&gt;If you suffer a failure in the middle of something important then you want to be able to get back to exactly where you were as quickly as possible. You don't want to have to worry about whether you're going to get unexpected package upgrades when you reinstall from the pkgsrc config file and you certainly don't want to discover that a hastily tweaked vendor config file is now back in its default state.&lt;/p&gt;

&lt;p&gt;So, when you install a machine, and every so often (maybe every year) you should do a complete dump of the OS and the apps. Every week or month you should do an incremental dump: these things don't change very often... hell... I just talked myself into doing this dump every day: most days it'll be zero.&lt;/p&gt;

&lt;p&gt;Group 3: So what if it came from the server? There are &lt;em&gt;always&lt;/em&gt; changes in there that you don't really want to commit yet but would really rather not lose. Don't rely on your ability to remember to save each one of them in a specially named patch file: one day you will forget.&lt;/p&gt;

&lt;p&gt;Group 4: This is going to be data in /var or data in /home. It'll get covered by default by a good backup strategy so doesn't need a section of its own.&lt;/p&gt;

&lt;p&gt;Group 5: Very similar data life model to group 1. Therefore same strategy is required.&lt;/p&gt;

&lt;p&gt;I'm all in favour of a layered strategy and possibly having slightly different strategies for group 1/5 data than for /home and /var data. However, if you design your layers properly I think you can just apply the same thing everywhere and group 1/5 data will just produce lots of zero length backups which are essentially free.&lt;/p&gt;

&lt;p&gt;My backup strategy starts when I first install a machine. Lots of people argue that everything should be in a single large partition. I'm more aligned to the old school philosophy of different partitions and file systems for different types of data. wrt backups it gives you lots of flexibility when it comes to restore and quota management.&lt;/p&gt;

&lt;p&gt;/ and /usr: Make this a partition. Make it read-only if you like. A read-only partition means you can back it up less frequently than a read-write one.&lt;/p&gt;

&lt;p&gt;/var: Make this a partition.&lt;/p&gt;

&lt;p&gt;/home: Make this a network mount or a partition.&lt;/p&gt;

&lt;p&gt;/usr/local: Make this a partition as well: it stops you reaching for your backups every time a badly behaved system installer stamps all over /usr/.&lt;/p&gt;

&lt;p&gt;/export: I tend to make this a partition as well since that's where I store all my really big non user specific data. I use it as a place to send backups from other machines and as a place for OS install media and general network shares and stuff. You might call this /data or something else.&lt;/p&gt;

&lt;p&gt;Layers are important and not every layer of backup needs to exist for each partition.&lt;/p&gt;

&lt;p&gt;There are offsite tape backups that need to exist for everything but you can let them get quite old in some instances. These guard against complete system failures.&lt;/p&gt;

&lt;p&gt;There are onsite tape backups that could be the incremental versions of your offsite set. These guard against complete system failures and week-to-week data loss such as viruses.&lt;/p&gt;

&lt;p&gt;There are onsite online backups that are made frequently. These guard against momentary brain absences such as "rm myfile" instead of "rm myfile~".&lt;/p&gt;

&lt;p&gt;So, for me, the layers work like this:&lt;/p&gt;

&lt;p&gt;I (should) have a set of tape backups. (I've designed how they will work but I've yet to actually make any.)
I make a "level 0" dump of all my filesystems every 6 months. That takes as much tape as I have data. I &lt;em&gt;think&lt;/em&gt; I can get it onto no more than a couple of DDS3s per machine.
I then do a weekly "Towers of Hanoi" incremental backup.
The first 9 weeks go like this:&lt;/p&gt;

&lt;p&gt;0 3 2 5 4 7 6 9 8&lt;/p&gt;

&lt;p&gt;Then I do a level 1 backup:&lt;/p&gt;

&lt;p&gt;1 3 2 5 4 7 6 9 8&lt;/p&gt;

&lt;p&gt;and again:&lt;/p&gt;

&lt;p&gt;1 3 2 5 4 7 6 9&lt;/p&gt;

&lt;p&gt;By that time 26 weeks have passed so I take another level 0 backup, but you can keep doing it for as long as you want. When the level 1 backup gets bigger than 1 tape it starts to get messy. So, given that I use DDS3 I can generate between 12 and 24GB of new data every 6 months and not use more than 1 tape for anything other than the level 0 backups.&lt;/p&gt;
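
&lt;p&gt;The schedule above can be sketched in a few lines of Python. This is only an illustration of the level sequence as listed: the function name and the week numbering (week 0 being the level 0 dump) are assumptions, and it does not run any dump command itself.&lt;/p&gt;

```python
# Weekly levels that follow each level 0 or level 1 dump, as listed above.
CYCLE = [3, 2, 5, 4, 7, 6, 9, 8]


def dump_level(week):
    """Return the dump level for a week number, where week 0 is the level 0 dump."""
    week %= 26           # a fresh level 0 dump every 26 weeks
    position = week % 9  # each run is one anchor dump followed by CYCLE
    if position == 0:
        # The first run starts with level 0, the later runs with level 1.
        return 0 if week == 0 else 1
    return CYCLE[position - 1]


if __name__ == "__main__":
    # Prints the 26-week sequence as a single line of levels.
    print(" ".join(str(dump_level(week)) for week in range(26)))
```

&lt;p&gt;With dump-style levels, a level N dump saves whatever has changed since the most recent dump of a lower level, which is why the low-numbered dumps are the ones that grow.&lt;/p&gt;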

&lt;p&gt;For file systems that don't change much such as / the incremental dump runs fast and doesn't take up much space on the tape and I still capture all the little details that trip you up on a restore.&lt;/p&gt;

&lt;p&gt;I've got a tape schedule that tries to spread the wear evenly over all the tapes as much as possible. It's quite complex and I'm going to need a script that tells me which tape to get from where each week.&lt;/p&gt;

&lt;p&gt;For my online backups I have a separate disk to which I rsync my partitions.
I use a rather modified version of this script:
http://www.mikerubel.org/computers/rsync_snapshots/&lt;/p&gt;

&lt;p&gt;This uses hardlinks to preserve space for files that haven't changed and therefore you need to ensure you have lots of inodes. You get fine grained backups of rarely changing partitions for almost the same cost as frequently changing ones. i.e. the cost in terms of time and disk space scales per MB of change (plus a constant) rather than per frequency of backup.&lt;/p&gt;

&lt;p&gt;My mods amount to storing YYYYMMDDHHMM style folders rather than rotating ones and keeping the snapshots until I delete them: I've written an algorithm that will use some kind of exponential decaying reaper to decide what to delete but I haven't integrated it yet.&lt;/p&gt;
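
&lt;p&gt;A minimal sketch of the timestamped-folder variant, assuming rsync's &lt;code&gt;--link-dest&lt;/code&gt; option is used to create the hard links (the script linked above may do this differently). The paths and the function name are hypothetical.&lt;/p&gt;

```python
from datetime import datetime


def snapshot_command(source, backup_root, previous, now):
    """Build an rsync argv for a new snapshot that hard-links unchanged files."""
    # Snapshots live in YYYYMMDDHHMM folders, so they sort chronologically.
    target = backup_root + "/" + now.strftime("%Y%m%d%H%M")
    argv = ["rsync", "-a"]
    if previous is not None:
        # Files identical to the previous snapshot become hard links to it,
        # so they cost an inode entry rather than more disk space.
        argv.append("--link-dest=" + backup_root + "/" + previous)
    argv += [source, target]
    return argv


if __name__ == "__main__":
    # Hypothetical paths; prints the argv rather than running rsync.
    print(snapshot_command("/home/", "/snapshots/home", "200807110000",
                           datetime(2008, 7, 11, 16, 38)))
```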

&lt;p&gt;I run these backups when I feel like it but I should really aim to do it at least once a day, and maybe hourly for /home.&lt;/p&gt;

&lt;p&gt;If I had important work for an important client then I'd probably run it at least hourly on some partitions.&lt;/p&gt;

&lt;p&gt;I'm wondering if encrypted backups are a good idea. Obviously I'd have to back up the keys somehow. I could encrypt the content of the tapes and I'd be able to send them out to any one of my "friends" who'd take them. I could then send USB keys containing the keys to, say, my parents' house.&lt;/p&gt;

&lt;p&gt;So in summary,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to design a strategy that doesn't require any maintenance once it's set up: you just feed blank media and rest assured that it'll get all of the important stuff.&lt;/li&gt;
&lt;li&gt;It needs to be essentially free (in MB and time) to run backups on stuff that hasn't changed: if it isn't then you'll try to economise by grouping things that are backed up and then stuff will invariably fall through the cracks. This is kind of like a special case of the first point.&lt;/li&gt;
&lt;li&gt;You need to have a single action restore: just unpack and go. If you have to configure anything or remember anything when you're stressed then it will make life very unpleasant and error prone. If someone else can't do it for you then it's not simple enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I initially drew my inspiration from this site:
http://www.taobackup.com/&lt;/p&gt;

&lt;p&gt;It's an advert for a commercial product, but it does contain some good advice.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I think I disagree with you here.</p>

<p>You should always have backups of everything that you can't afford to lose and you certainly <em>can't</em> afford to lose your OS and installed apps (group 1) even if you can restore them from the original media.</p>

<p>If you adopt the above then your backup strategy dictates your application vendor strategy, i.e. you have to use pkgsrc for everything and you can't make any local modifications; otherwise you end up in the position of having to back up the third-party apps or modified sources anyway, and then you may as well be backing up everything.</p>

<p>Group 2: /tmp is wiped on every boot anyway. /var: This contains lots of important stuff like mail spools and other transient but critical things. It's also the hardest to back up as you need to ensure that, for example, you don't catch your mail server or db server in the middle of a write. So you need some kind of freezer and that can affect your HA requirements.</p>

<p>As everyone knows, most backup strategies fail at the restore stage because no one ever tests them properly.</p>

<p>If you elect to reinstall your OS and apps via a download from, say, the NetBSD project server or from a vendor CD then you have no guarantee that when you come to do the restore you'll have access to it. You also have no way of ensuring that your post-restore environment is <em>exactly</em> the same as your pre-disaster environment, and that is <em>very</em> important. And running the install programs for every little thing always takes longer than "restore /dev/sd1" and going for a coffee.</p>

<p>If you suffer a failure in the middle of something important then you want to be able to get back to exactly where you were as quickly as possible. You don't want to have to worry about whether you're going to get unexpected package upgrades when you reinstall from the pkgsrc config file and you certainly don't want to discover that a hastily tweaked vendor config file is now back in its default state.</p>

<p>So, when you install a machine, and every so often after that (maybe every year), you should do a complete dump of the OS and the apps. Every week or month you should do an incremental dump: these things don't change very often... hell... I just talked myself into doing this dump every day: most days it'll be zero.</p>

<p>Group 3: So what if it came from the server? There are <em>always</em> changes in there that you don't really want to commit yet but would really rather not lose. Don't rely on your ability to remember to save each one of them in a specially named patch file: one day you will forget.</p>

<p>Group 4: This is going to be data in /var or data in /home. It'll get covered by default by a good backup strategy so doesn't need a section of its own.</p>

<p>Group 5: Very similar data life model to group 1. Therefore same strategy is required.</p>

<p>I'm all in favour of a layered strategy and possibly having slightly different strategies for group 1/5 data than for /home and /var data. However, if you design your layers properly I think you can just apply the same thing everywhere and group 1/5 data will just produce lots of zero length backups which are essentially free.</p>

<p>My backup strategy starts when I first install a machine. Lots of people argue that everything should be in a single large partition. I'm more aligned to the old school philosophy of different partitions and file systems for different types of data. With respect to backups, it gives you lots of flexibility when it comes to restore and quota management.</p>

<p>/ and /usr: Make this a partition. Make it read-only if you like. A read-only partition means you can back it up less frequently than a read-write one.</p>

<p>/var: Make this a partition.</p>

<p>/home: Make this a network mount or a partition.</p>

<p>/usr/local: Make this a partition as well: it stops you reaching for your backups every time a badly behaved system installer stamps all over /usr/.</p>

<p>/export: I tend to make this a partition as well since that's where I store all my really big non-user-specific data. I use it as a place to send backups from other machines and as a place for OS install media and general network shares and stuff. You might call this /data or something else.</p>

<p>Layers are important and not every layer of backup needs to exist for each partition.</p>

<p>There are offsite tape backups that need to exist for everything but you can let them get quite old in some instances. These guard against complete system failures.</p>

<p>There are onsite tape backups that could be the incremental versions of your offsite set.
These guard against complete system failures and week-to-week data loss such as viruses.</p>

<p>There are onsite online backups that are made frequently. These guard against momentary brain absences such as "rm myfile" instead of "rm myfile~".</p>

<p>So, for me, the layers work like this:</p>

<p>I (should) have a set of tape backups. (I've designed how they will work but I've yet to actually make any.)
I make a "level 0" dump of all my filesystems every 6 months. That takes as much tape as I have data. I <em>think</em> I can get it onto no more than a couple of DDS3s per machine.
I then do a weekly "Towers of Hanoi" incremental backup.
The first 9 weeks go like this:</p>

<p>0 3 2 5 4 7 6 9 8</p>

<p>Then I do a level 1 backup:</p>

<p>1 3 2 5 4 7 6 9 8</p>

<p>and again:</p>

<p>1 3 2 5 4 7 6 9</p>

<p>By that time 26 weeks have passed so I take another level 0 backup, but you can keep doing it for as long as you want. When the level 1 backup gets bigger than 1 tape it starts to get messy. So, given that I use DDS3 I can generate between 12 and 24GB of new data every 6 months and not use more than 1 tape for anything other than the level 0 backups.</p>
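
<p>The 26-week cycle above can be written out as a minimal Python sketch. It assumes dump(8)-style semantics, where a level N dump captures everything changed since the most recent dump of a lower level. The names <code>BLOCK</code>, <code>weekly_levels</code> and <code>restore_chain</code> are made up for illustration and are not part of any backup tool.</p>

```python
# Incremental levels that follow each low-level (0 or 1) dump, as listed above.
BLOCK = [3, 2, 5, 4, 7, 6, 9, 8]


def weekly_levels(weeks=26):
    """Return the dump level for each week of one cycle (week 0 is the level 0 dump)."""
    levels = []
    for week in range(weeks):
        position = week % 9
        if position == 0:
            # Start of a 9-week run: level 0 once, level 1 after that.
            levels.append(0 if week == 0 else 1)
        else:
            levels.append(BLOCK[position - 1])
    return levels


def restore_chain(levels, week):
    """Return the weeks whose tapes are needed to restore the state as of `week`.

    Assumes a level N dump holds changes since the latest dump of a lower level,
    so we walk backwards collecting dumps with strictly decreasing levels.
    """
    chain = [week]
    needed = levels[week]
    for earlier in range(week - 1, -1, -1):
        if needed > levels[earlier]:
            chain.append(earlier)
            needed = levels[earlier]
    return sorted(chain)


if __name__ == "__main__":
    levels = weekly_levels()
    print(" ".join(str(level) for level in levels))
    print(restore_chain(levels, 8))
```

<p>Under those assumptions a restore never needs more than a handful of tapes: the level 0 set, at most one level 1 tape, and a short run of higher-level tapes.</p>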

<p>For file systems that don't change much such as / the incremental dump runs fast and doesn't take up much space on the tape and I still capture all the little details that trip you up on a restore.</p>

<p>I've got a tape schedule that tries to spread the wear evenly over all the tapes as much as possible. It's quite complex and I'm going to need a script that tells me which tape to get from where each week.</p>

<p>For my online backups I have a separate disk to which I rsync my partitions.
I use a rather modified version of this script:
<a href="http://www.mikerubel.org/computers/rsync_snapshots/" rel="nofollow">http://www.mikerubel.org/computers/rsync_snapshots/</a></p>

<p>This uses hardlinks to preserve space for files that haven't changed and therefore you need to ensure you have lots of inodes. You get fine grained backups of rarely changing partitions for almost the same cost as frequently changing ones. i.e. the cost in terms of time and disk space scales per MB of change (plus a constant) rather than per frequency of backup.</p>
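
<p>To illustrate the hard-link idea only (this is not the linked script, which is built around rsync), here is a rough Python sketch. The function name <code>snapshot</code> and its arguments are hypothetical, and it compares file contents where a real tool would more likely compare size and timestamps.</p>

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` is the path of the last snapshot, or None for the first run.
    Illustrative only: symlinks, ownership and deletions are not handled specially.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: share the inode with the previous snapshot (costs no data blocks).
                os.link(old, new)
            else:
                # New or changed: store a fresh copy.
                shutil.copy2(src, new)
```

<p>Each snapshot directory then looks like a full copy, but an unchanged file costs only a directory entry, which is why the space used scales with the amount of change rather than with how often you run it.</p>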

<p>My mods amount to storing YYYYMMDDHHMM style folders rather than rotating ones and keeping the snapshots until I delete them: I've written an algorithm that will use some kind of exponentially decaying reaper to decide what to delete but I haven't integrated it yet.</p>
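
<p>One possible shape for such a reaper, as a hedged Python sketch rather than the algorithm mentioned above (which isn't shown here): sort snapshots into age buckets whose width doubles each time, keep the oldest snapshot in each bucket plus the newest overall, and delete the rest. The function name <code>snapshots_to_delete</code> and the <code>base_hours</code> parameter are assumptions made for illustration.</p>

```python
from datetime import datetime


def snapshots_to_delete(names, now, base_hours=1.0):
    """Return the YYYYMMDDHHMM snapshot names that can be reaped.

    Age buckets are [0, 1h), [1h, 2h), [2h, 4h), [4h, 8h), ... (scaled by
    `base_hours`). The oldest snapshot in each bucket is kept so that it can
    age into the next bucket; the newest snapshot is always kept as well.
    """
    if not names:
        return []
    keep = {}
    for name in sorted(names):  # oldest first: this name format sorts chronologically
        taken = datetime.strptime(name, "%Y%m%d%H%M")
        age_hours = (now - taken).total_seconds() / 3600
        bucket = 0
        limit = base_hours
        while age_hours >= limit:
            bucket += 1
            limit *= 2
        keep.setdefault(bucket, name)  # first seen is the oldest in its bucket
    kept = set(keep.values())
    kept.add(max(names))  # never reap the most recent snapshot
    return sorted(set(names) - kept)
```

<p>With this policy the number of snapshots kept grows roughly with the logarithm of the age of the oldest one, so recent history stays fine grained and old history thins out.</p>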

<p>I run these backups when I feel like it but I should really aim to do it at least once a day, and maybe hourly for /home.</p>

<p>If I had important work for an important client then I'd probably run it at least hourly on some partitions.</p>

<p>I'm wondering if encrypted backups are a good idea. Obviously I'd have to back up the keys somehow. I could encrypt the content of the tapes and I'd be able to send them out to any one of my "friends" who'd take them. I could then send USB keys containing the keys to, say, my parents' house.</p>

<p>So in summary:</p>

<ul>
<li>You want to design a strategy that doesn't require any maintenance once it's set up: you just feed blank media and rest assured that it'll get all of the important stuff.</li>
<li>It needs to be essentially free (in MB and time) to run backups on stuff that hasn't changed: if it isn't then you'll try to economise by grouping things that are backed up and then stuff will invariably fall through the cracks. This is kind of like a special case of the first point.</li>
<li>You need to have a single action restore: just unpack and go. If you have to configure anything or remember anything when you're stressed then it will make life very unpleasant and error prone. If someone else can't do it for you then it's not simple enough.</li>
</ul>

<p>I initially drew my inspiration from this site:
<a href="http://www.taobackup.com/" rel="nofollow">http://www.taobackup.com/</a></p>

<p>It's an advert for a commercial product, but it does contain some good advice.</p>]]></content:encoded>
	</item>
</channel>
</rss>
