Birthday Blues

I wasn't going to do anything for my birthday, but Al wanted to do me a meal, so fondue was settled upon, and then we invited Barbara. I thought it would be nice if I could have a friend there, so I texted my friend Andrew, who lives in Cheltenham, on the off chance he'd be about, as he tends to go home to Essex at the weekend.

He didn't respond, which was fair enough as it was short notice, but the thing is, I almost texted Alex. I had somehow forgotten. I've been so busy the last few months, and then I felt bad and cried.

Jean and Al spent most of the day at one of Jean's little friends' parties, so I spent the afternoon photographing boots :/

Barbara then forgot that it was my birthday and thought Jean was being silly when she said it was mummy's birthday. She then disappeared just after cake (taking the cake with her) to watch TV.

Al had got me lots of yummy things for a chocolate fondue and I thought, yay! But he doesn't eat chocolate, and I discovered that chocolate fondue isn't that much fun unless you're sharing it.

I then realised that I've been here for three birthdays and I know no one my age; Andrew is planning to move away and Alex is gone. I've made friends, but they're not the same age as me, and though it may seem mean, I need people who are the same age as me - people who have the same type of worries and the like.

I'm not sure where I've gone wrong.

Sorry, I will do a nicer post about my birthday tomorrow.

Backup system progress update

I forgot to mention before... I chose a name for it. Ugarit, after the site of the oldest human archive yet found.

Anyway, I had another few hours at it last night (I think my total's up to about a working day now; one more working day to go before I know if my two-day estimate was correct!).

I've implemented two fundamental abstractions. In a limited-block-size reference-counted content-addressed store, a large file has to be stored as lots of chunks, with the keys of the chunks stored in one or more indirect chunks full of keys. If there are too many keys to fit in one indirect chunk, then you need lots of indirect chunks, and in turn, one or more doubly-indirect chunks to store the keys of THEM in, until you finally have a single chunk at the top of a big tree, whose key you can use to refer to the file.
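To make the shape of that tree concrete, here's a tiny arithmetic sketch in Scheme. The procedure name and parameters are mine, not Ugarit's: given a chunk size and the number of keys an indirect chunk can hold, it works out how many levels of indirection a file needs.

```scheme
;; A minimal sketch (hypothetical names, not Ugarit's API): how many
;; levels of indirect chunks a file of a given size needs, assuming
;; fixed-size chunks and a fixed number of keys per indirect chunk.
(define (indirection-levels file-size chunk-size keys-per-chunk)
  (let loop ((chunks (ceiling (/ file-size chunk-size)))
             (levels 0))
    (if (<= chunks 1)
        levels
        (loop (ceiling (/ chunks keys-per-chunk))
              (+ levels 1)))))

;; e.g. a 1 GiB file with 64 KiB chunks and 1024 keys per indirect
;; chunk: 16384 data chunks -> 16 indirect chunks -> 1 doubly-indirect
;; chunk at the top of the tree.
(indirection-levels (* 1024 1024 1024) (* 64 1024) 1024) ; => 2
```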

Reference counting complicates this. As you're writing the file's chunks, some may already exist in the archive, and some might not. For those that already exist, you need to bump their reference counts, so the archive knows they're being used. For those that don't, you upload the actual content to create them.

But if you reuse all of the blocks of a file - because it's not changed - then you'll find you can reuse the indirect blocks, too, since they won't have changed. In which case you mustn't increment the refcounts of the data blocks, since they're still only referenced from the one indirect block, even though that indirect block now has two references.

So, what I've done is to implement a "key stream writer" API. You create a key stream writer and feed keys into it, but along with every key, you feed a 'reused?' flag which is true if the key is a reused one rather than a new one.

At the end, you finish the key stream writer, and it returns you a key for the entire tree, plus a reused? flag.

And it does all the magic internally, incrementing the refcounts on reused keys passed to it if and only if it creates a brand new indirect block referring to that key, and correctly handling refcounts on any indirect blocks it uses. It does this with a lovely recursion; the key stream writer accepts keys into a buffer and flushes the buffer out to a key stream indirect block when it gets full, at which point it creates a parent key stream writer to manage the next "level" up in the tree, writing the key of the newly created indirect block to that writer. I'm quite taken aback by how simple and elegant the code is, for what it does 🙂
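For the curious, here's roughly how I picture that recursion, as a simplified Scheme sketch rather than the actual Ugarit code. store-chunk! and inc-refcount! are stand-ins for whatever the real archive operations are: store-chunk! writes a list of keys out as one indirect block and returns (values key reused?), and inc-refcount! bumps an existing block's reference count.

```scheme
(define max-keys-per-block 1024)

(define (make-key-stream-writer store-chunk! inc-refcount!)
  (let ((buffer '())   ; (key . reused?) pairs, newest first
        (parent #f))   ; writer for the next level up, created lazily
    (define (flush!)
      (let ((entries (reverse buffer)))
        (set! buffer '())
        (call-with-values
         (lambda () (store-chunk! (map car entries)))
         (lambda (block-key block-reused?)
           ;; Bump refcounts of reused child keys only when this
           ;; indirect block is brand new; if the block was itself
           ;; reused, its children's counts are already correct.
           (unless block-reused?
             (for-each (lambda (e) (when (cdr e) (inc-refcount! (car e))))
                       entries))
           ;; Recursion: the new block's key goes to a parent writer
           ;; that manages the next level up in the tree.
           (unless parent
             (set! parent (make-key-stream-writer store-chunk! inc-refcount!)))
           (parent 'put! block-key block-reused?)))))
    (lambda (msg . args)
      (case msg
        ((put!)   ; args: key reused?
         (set! buffer (cons (cons (car args) (cadr args)) buffer))
         (when (>= (length buffer) max-keys-per-block)
           (flush!)))
        ((finish!)
         (if (and (not parent) (= (length buffer) 1))
             ;; a one-key stream needs no indirect block at all
             (values (caar buffer) (cdar buffer))
             (begin
               (unless (null? buffer) (flush!))
               (parent 'finish!))))))))
```

You feed it (writer 'put! key reused?) for every key, and (writer 'finish!) hands back the key at the top of the tree along with its reused? flag, just as described above.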

Then, on top of that, I implemented an sexpr-stream writer, which accepts sexprs (Lisp data records, in effect) and buffers them, writing them to the archive when there's too much for a single block, and using a key stream writer to track the blocks written. This is to be used to implement directories, which will be a list of records giving details of all the files in the directory. The sexpr stream writer needs to know about keys mentioned in the sexprs, so it can do the same refcount-tracking magic to handle sharing correctly. Luckily, in spite of those requirements, it also works beautifully and is also elegant code!
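To give a feel for it, directory entries might look something like the records below; this layout is my invention for illustration, not the actual Ugarit directory format.

```scheme
;; Hypothetical directory records as they might be fed to the
;; sexpr-stream writer; the real Ugarit record format may differ.
'((file "notes.txt"
        (contents "key-of-chunk-tree")   ; a key the writer must track
        (size 10493)
        (mtime 1199145600))
  (directory "photos"
             (contents "key-of-child-sexpr-stream"))
  (symlink "latest" (target "photos")))
```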

Then I just need to implement storing files (which should be quite easy; split the file into chunks, check to see if each chunk is already in the archive and upload if not, write keys to key stream), then I can implement storing directories (recursively store files and directories, use an sexpr stream writer to log each file created). Directory storage will need a few cases handled, since I plan to store funny filesystem objects like FIFOs, device specials, symlinks, and so on.
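A rough sketch of that file-storing step, under the same caveats as before (hash, chunk-exists?, upload-chunk! and read-chunks are hypothetical stand-ins for the real archive operations):

```scheme
;; Split a file into chunks, upload only the chunks the archive
;; doesn't already have, and feed every key to a key-stream writer.
(define (store-file! path writer hash chunk-exists? upload-chunk! read-chunks)
  (for-each
   (lambda (chunk)
     (let* ((key (hash chunk))
            (reused? (chunk-exists? key)))
       (unless reused?
         (upload-chunk! key chunk))
       (writer 'put! key reused?)))
   (read-chunks path))
  (writer 'finish!))   ; => the file's tree key, plus a reused? flag
```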

Then I'll need the main driver that does a snapshot of a directory tree, then creates a snapshot object with metadata in and links it into the snapshot chain for a chosen tag.
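My guess at the shape of such a snapshot object, purely for illustration and not the real format: each snapshot records the root directory's key and the key of the previous snapshot on the same tag, which is what forms the chain.

```scheme
'(snapshot
  (root "key-of-root-directory")    ; key of the snapshotted tree
  (date "2008-01-15T00:00:00Z")
  (previous "key-of-prior-snapshot"))  ; or #f for a tag's first snapshot
```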

I reckon that lot will take me half of the next day - then I'll implement deletion (I've been implementing primitive deletion operators for key streams and sexpr streams as I go along, so this won't take long) and extraction (again, I've written fold operators for key and sexpr streams, so this won't take long), then I'll have the core done!

What will remain then? A command-line wrapper with decent option parsing and command line ops to do a snapshot to a tag, retrieve all or part of a snapshot, list snapshots on a tag, delete a snapshot, and delete a tag, which should just be a matter of plumbing. Then I'll be done!

I still think I'm on for two days... but mainly because I'm punting on writing an actual S3 backend for now. That'll take a while, dealing with fiddly Web Services authentication stuff!

Other things I ought to get around to doing at some point are:

  1. LZMA compression. For now, I'm using a deflate implementation for Chicken Scheme that I found, but I would like to wrap liblzma as a Chicken egg, since it's thought to have tighter compression.
  2. Encryption. Right now I have a stub in the code for encrypt/decrypt operations, applied to the compressed blocks, but for now it just returns the block unchanged. I need to either use Chicken's libcrypt egg (which looks complicated) or find a lightweight implementation of a good cipher that I can wrap as an egg.
  3. More storage backends.
    1. A replication backend that wraps a list of backends, doing writes and deletes to all of them, and servicing reads from the first backend that can provide the block (sketched after this list). That way I could back up to a local disk and to S3, and restores would come from the local disk if possible, or from S3 if not. Even if I lost part of the local disk in a filesystem corruption incident, anything I lost would still be available from S3.
    2. A generational backend, that wraps a list of backends. Writes are only done to the first, but reads (and deletes) are serviced from the first one that responds. That way, I can fill up one hard disk with archives, then start filling another, and so on, building up a chain of archive partitions.
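Here's an illustrative sketch of the replication wrapper from item 1, assuming each wrapped backend is a procedure answering get, put! and delete! messages, with get returning #f for a miss (that interface is my simplification, not Ugarit's actual backend API):

```scheme
(define (make-replicated-backend backends)
  (lambda (msg . args)
    (case msg
      ((put! delete!)
       ;; writes and deletes go to every backend in the group
       (for-each (lambda (b) (apply b msg args)) backends))
      ((get)
       ;; reads are served by the first backend that has the block
       (let loop ((bs backends))
         (cond ((null? bs) #f)
               ((apply (car bs) msg args))  ; any non-#f result wins
               (else (loop (cdr bs)))))))))
```

The generational backend would look much the same, except put! would go only to the first backend in the list.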

Care must be taken when doing complex things with backends, though, in order to preserve the correctness of reference counts. A group of archives that are used together by replication or generations will have refcounts that are correct across the whole group, not necessarily individually within each component archive.

More forebodingly, my existing cache backend wrapper won't mix well with deletion of snapshots when the same backend is shared between multiple machines, as it basically caches the existence of blocks; if a block is deleted by another system, the local cache won't know, and will incorrectly report that the block is already known. That means it won't get uploaded to the archive again, which means data could be lost.

For now I'll recommend that people who want to delete from shared repositories don't use the cache, or flush the caches after a deletion, but in the long run I'd like to think about how to improve the cache. Perhaps by logging deleted keys to a special area in the backend, and when a cache starts up, it can fetch the log and look for any deletions it doesn't already have. Or just compress the entire cache file and upload it when closing the connection, and download it when starting a connection, as long as no two cache-using machines are snapshotting at the same time...
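If the deletion-log idea pans out, the cache's startup synchronisation might look something like this; every name here is speculative:

```scheme
;; Speculative sketch of the deletion-log idea above: on connecting,
;; replay any deletions other machines have logged since this cache
;; last synchronised, dropping stale "block exists" entries.
(define (sync-cache! cache fetch-deletions-since last-sync)
  (for-each
   (lambda (deleted-key)
     (cache-forget! cache deleted-key))
   (fetch-deletions-since last-sync)))
```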

Sledging and the end of Christmas

I meant to put these up sooner but forgot - Jean was finally coaxed onto the sledge. Here are the photos of her being dragged around by Barbara.

(Photo captions: "Going fast!", "Pull Barbara!", "This is fun", "Hello mummy")

After that it was time for the decorations to come down, Jean helped me until she decided there was enough tinsel to make a nest out of and went to sleep!

(Photo: "Asleep in the tinsel")

I think Jean enjoyed this Christmas. This was originally supposed to go up on the sixth, so no panicking out there!

Web-Empire

As some of you may know, I have been doing my MRes at college part time, but I have also been attending business courses and mentor sessions set up for me by the Prince's Trust, and as a result have steamed ahead with my business idea.

I am now officially a web consultant and designer, and thanks to skills I learnt, or at least found out about, during my undergraduate degree and my Masters, I have discovered I am better (to my horror) than a lot of the competition. If one more person asks me if I use Dreamweaver, I may just scream!

OK, so I know I am lucky in that I have the techno guru at my disposal if I ever get stuck, but to be honest it is all a lot simpler than I thought it would be.

So here is my website; the business is called Web-Empire and is a unique blend of art and technology, just about right for my skill set, I would say. You see, I've had a eureka moment: I feel a lot of technical and scientific stuff is inaccessible to people, mainly due to artificial barriers of what people expect things to be like. I will no doubt do a longer post on this at some point (if I get the time, with all those nice new customers I'm hoping to get).

I want to make things accessible, and I want people to lose this concept of dichotomy, which allows dangerous pseudoscience to slip in.

I'm hoping that web-empire will be a nice approachable business which will help its customers 🙂

I hope you all like the website; feedback is always appreciated. It also has a blog, where nice little articles on the world of the web will go, along with book reviews on web, business and art stuff. Not to mention I am constructing a hopefully useful glossary.

Information Request II

A little while ago I requested information on cooking and recipes; this time it is of a more technical nature. In fact, what I would love very much is information on the history of computing, electronics, or anything loosely related to them 🙂

I want to make Web-Empire's blog and glossary a properly useful resource for people to use; plus, it is sort of related to a novel I am writing!

I hope this isn't me being too cheeky!

If anyone can recommend some books I could read as well, that would be much appreciated 🙂
