<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Venti: Append-Only Storage Management</title>
	<atom:link href="http://www.snell-pym.org.uk/archives/2006/01/14/venti-append-only-storage-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.snell-pym.org.uk/archives/2006/01/14/venti-append-only-storage-management/</link>
	<description>Sarah and Alaric Snell-Pym living in interesting times</description>
	<pubDate>Tue, 06 Jan 2009 14:46:17 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: andyjpb</title>
		<link>http://www.snell-pym.org.uk/archives/2006/01/14/venti-append-only-storage-management/comment-page-1/#comment-2746</link>
		<dc:creator>andyjpb</dc:creator>
		<pubDate>Sun, 15 Jan 2006 12:15:11 +0000</pubDate>
		<guid isPermaLink="false">http://snell-pym.org.uk/archives/2006/01/14/venti-append-only-storage-management/#comment-2746</guid>
		<description>&lt;p&gt;git (the new content manager / SCM by Linus Torvalds) uses a similar idea for its backend storage. Each committed file is stored (compressed, IIRC) in a file named with its SHA1 ID. There are then files (objects) that contain lists of SHA1s to model directories and trees, and commit objects that associate a tree with its commit message, etc. Once an object has been committed to the repository, it stays there forever, and any other files with the same contents end up referencing the same object. It also means that parts of the repository can be copied onto read-only media every now and then.&lt;/p&gt;

&lt;p&gt;As for backups, I use a modified version of this:
http://www.mikerubel.org/computers/rsync_snapshots/&lt;/p&gt;

&lt;p&gt;On the first backup the contents of your tree are copied. On subsequent backups the previous backup tree is first duplicated using hard links. Then rsync is used in such a way that, for files that have changed, it unlinks the version in the new backup tree and then copies the new version across. It's similar to the traditional backup method (take a level 0 backup and then do incremental backups); however, your latest backup is always the "full backup" and the others are the increments, traveling back in time as opposed to forward. One downside is redundancy: the whole thing only counts as one backup, as there is only one copy of each version of a file on the disk.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>git (the new content manager / SCM by Linus Torvalds) uses a similar idea for its backend storage. Each committed file is stored (compressed, IIRC) in a file named with its SHA1 ID. There are then files (objects) that contain lists of SHA1s to model directories and trees, and commit objects that associate a tree with its commit message, etc. Once an object has been committed to the repository, it stays there forever, and any other files with the same contents end up referencing the same object. It also means that parts of the repository can be copied onto read-only media every now and then.</p>
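
<p>A minimal Python sketch of that storage idea. It is a toy in-memory model, not git's actual object format: the <code>ObjectStore</code> class, the <code>put_tree</code> and <code>put_commit</code> helpers, and the text layouts used for trees and commits are all made up for illustration.</p>

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store: objects are keyed by the SHA-1 of their bytes."""

    def __init__(self):
        self.objects = {}  # hex SHA-1 -> zlib-compressed bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical contents hash to the same key, so they are stored only once.
        self.objects.setdefault(key, zlib.compress(data))
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.objects[key])


def put_tree(store, entries):
    """Store a directory as sorted 'sha1 name' lines (hypothetical layout)."""
    lines = [f"{sha} {name}" for name, sha in sorted(entries.items())]
    return store.put("\n".join(lines).encode())


def put_commit(store, tree, message):
    """Store a commit that ties a tree to its message (hypothetical layout)."""
    return store.put(f"tree {tree}\n\n{message}".encode())


store = ObjectStore()
a = store.put(b"hello\n")
b = store.put(b"hello\n")  # same contents, so the same object is reused
tree = put_tree(store, {"a.txt": a, "b.txt": b})
commit = put_commit(store, tree, "first commit")
```

<p>Storing the same bytes twice yields a single object, so two files with identical contents share storage, and nothing already in the store is ever rewritten.</p>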

<p>As for backups, I use a modified version of this:
<a href="http://www.mikerubel.org/computers/rsync_snapshots/" rel="nofollow">http://www.mikerubel.org/computers/rsync_snapshots/</a></p>

<p>On the first backup the contents of your tree are copied. On subsequent backups the previous backup tree is first duplicated using hard links. Then rsync is used in such a way that, for files that have changed, it unlinks the version in the new backup tree and then copies the new version across. It's similar to the traditional backup method (take a level 0 backup and then do incremental backups); however, your latest backup is always the "full backup" and the others are the increments, traveling back in time as opposed to forward. One downside is redundancy: the whole thing only counts as one backup, as there is only one copy of each version of a file on the disk.</p>]]></content:encoded>
	</item>
</channel>
</rss>
