Snell-Pym

Jean’s Introduction to Economics (by sarah)

Jean keeps telling me not to work 🙁 She wants me to play with her instead but I have to do some work when she's about, though I really don't like doing it and it really depends on weather she's in the sort of mood were she's happy to say just sit and paint.

Our conversation went like this:

'Mummy don't work - play with me!'

'I will play with you in 20 minutes Jean I have to finish this first.'

'No'

'Yes'

'Why?'

'So that we can get money'

'For my house?' (her money box)

'No so that we can buy food.'

Jean hits the heater in annoyance, 'DONT DO THAT IF YOU BREAK IT WE CAN"T REPLACE IT AND THEN WE WILL BE COLD!!!'

'We can just go out all day.'

'No we can't Jean you need money for that.'

'Silly mummy we go shopping!'

'No Jean you need money for that.'

Puzzeled look, 'I'll get my house and horse.' (again her two money boxes.) I had explain it wasn't enough money and thats why I was working besides it's hers and would she like to pay it into the bank? Things then got a bit abstract and I was engander of doing a Mr Brown in Padington Bear.

'Oh ok, I'll make dinner mummy, to help.'

'Good girl Jean.'

A blue toy pot appears with a plastic chicken leg and some construction blocks from something that isn't lego. 'Soup mummy to keep you warm.'

'Thankyou Jean.'

Jean, Sarah | sarah | Fri 23rd Jan 2009 10:58 am | Comments (2)

Monster Blogs Sorted (by sarah)

I have spent the last few hours sorting out an issue on the Monster blogs(Blue, Red and Yellow). As some of you might have noticed the links were broken and appearing as non-clickable ugly text with all the markdown visable 🙁

We had been trying to chase down the problem for weeks when another blog I set up was showing the same issue - I was in the visual editor.

It turns out there is now a visual as well as HTML editor on WordPress blogs. This has cuased me lots of unnessascery work involving manually going through and removing unwanted tags :'(

I have gone into more detail on the Web-Empire blog as I felt it was information that needed sharing.

Sarah, Work | sarah | Thu 22nd Jan 2009 11:37 pm | Comments (0)

Ugarit interactive restore (by alaric)

Ugarit is coming along nicely. I've written the interactive archive exploration/extraction shell, although it's still a bit ugly (mtimes are still just displayed as a number rather than in a human-readable format, and the fields in ls -l outputs aren't padded to fixed widths, you can only cd up or down one level at a time rather than using a path, and little things like that).

Here it is in action, starting from the top of an archive with a single tag called Test that has two snapshots at different times. current just refers to the most recent snapshot of the two. I extract LICENCE.txt then take a look to see how it came out.

> ls
Test <tag>
> cd Test
/Test> ls
time<1232405984.074> <snapshot>
time<1232405984.162> <snapshot>
current <snapshot>
/Test> cd current
/Test/current> ls -l
-rw-r--r-- 1000 100 time<1231987453.0> README.txt
lrwxr-xr-x 1000 100 time<1231988569.0> LICENCE.txt -> subdir/LICENCE.txt
drwxr-xr-x 1000 100 time<1232013672.0> subdir
drwxr-xr-x 1000 100 time<1232155290.0> .svn
prw-r--r-- 1000 100 time<1232052740.0> FIFO
crw-r--r-- 0 100 time<1232154570.0> chardev
brw-r--r-- 0 100 time<1232154578.0> blockdev
/Test/current> cd subdir
/Test/current/subdir> ls -l
-rw-r--r-- 1000 100 time<1231987453.0> LICENCE.txt
drwxr-xr-x 1000 100 time<1232155290.0> .svn
/Test/current/subdir> get LICENCE.txt
Extracted LICENCE.txt
/Test/current/subdir> bye
-bash-3.2$ cat LICENCE.txt 
Copyright (c) 2008-2009, Warhead.org.uk Ltd

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the names of Warhead.org.uk Ltd, Snell Systems, nor Kitten Technologies, nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

It's getting there! The only thing that's really holding me back now is that I have limited Internet access in the evening to read the manuals for the Chicken eggs I want to use.

Mainly, I need a command-line argument wrapper, and an encryption engine.

I want to offer the choice of no compression or deflate compression when writing into the archive, but that each block should be marked with a prefix byte stating its compression algorithm so that a block can be decompressed no matter how it was compressed, as long as your copy of Ugarit knows the algorithm - I'll add lzma, and make it the default, as soon as I've written a wrapper to liblzma.

Then I want a choice of encryption algorithms, which I plan to do by writing a wrapper for libmcrypt, rather than using the cryptlib interface for Chicken I've already found, as I don't seem to have cryptlib in pkgsrc on NetBSD, and libmcrypt looks nice and simple.

There's a standard Scheme library for command line parsing, called args-fold (yes, another form of fold...), which does a similar job to getopt libraries. But I don't have it installed yet. And I need to check out a Scheme library I saw for indentation-delimited syntax, that might make for a configuration file format more appealing to parenthophobes...

Computing, Scheme | alaric | Wed 21st Jan 2009 6:00 pm | Comments (1)

fold>cons (by alaric)

Or should I say, (> fold cons)

In a way, I'm sad that I started my life programming imperatively rather than functionally. While I feel that imperative programming is still the most appropriate for some problems (particularly ones that really do look like state machines), and logical programming is great for dealing with big blobs of knowledge (such as databases), most programming tasks, I think, are best handled functionally.

But imperative programming is in fashion; it's what people are taught, since it's what people use, and it's what people use, because it's what they were taught, and lots of feedback loops maintain that status quo; there are more tools for imperative programming, and more books, and so on. Lots of people think functional programming is somehow mind-bending and difficult, but that's just because they don't know it well, and the first taste of something unusual often seems bizarre and worrying.

So, although I find functional programming more efficient and pleasant than imperative, I don't get to do very much of it. And, therefore, I'm really still learning it.

While working on Ugarit, in particular, I learnt a new trick. And that trick was the true usefulness of fold.

fold is a function in Scheme and similar languages that is used to apply a supplied function to every element in a list, but rather than making a new list of the results like map, the result of each function call is just passed into the next one. In order to get the process started, you have to supply an initial value to pass into the first call.

Here's what it'd look like in C, assuming you were dealing with an array of ints:

int fold(int *int_array, size_t array_len, int (*kons)(int element, int acc), int knil) {
  int acc = knil;
  while (array_len--) {
    acc = kons(*int_array++, acc);
  }
  return acc;
}

The names kons and knil are hints as to what they do; hints that I should have noticed, but didn't. But more on that later.

Now, you might use that to add up the elements of an array, by making a function that adds its two integer arguments and returns the result, and passing it to fold with 0 as knil. Or multiply them all together. Or something.

Now, I rarely have to add up the elements of lists, so when I read about fold in the Scheme specification, I thought "Meh, I'll remember that when I next need to add up the elements of a list", and mentally discarded it into the pile of barely-useful tools.

My mistake was that I was thinking of fold as a function to apply a function to every element of a list. Which is rather the wrong way round. fold is really a mental model of a sequence, which the function fold is just an implementation of for lists. It should be called list-fold.

What made me realise this was reading the interface to Alex Shinn's gdbm egg. It's a wrapper to the gdbm library, which provides file-based persistent key-value mappings.

One of the operations the gdbm library gives you is to iterate over the whole database. There's a gdbm_first function to get the first record, and a gdbm_next function to get the next one. Call gdbm_first, then keep calling gdbm_next until you stop getting records. That's an imperative interface, as it involves having a "cursor" that each call to gdbm_first or gdbm_next moves around; it's all about altering state.

But in the gdbm egg, Alex provided a more functional way of doing that. He provided gdbm-fold, which accepts a gdbm database handle, a kons function to call on each record in turn (along with the return value of the last call to kons), and a knil value to get the process started. It returns the result of the final kons call.

Now, the interesting thing is that if you just want a list of records, you can pass in an empty list for knil, and for kons, a function that takes a record and a list and returns a new list starting with that record and then continuing with the list passed to it (eg, taking that list and sticking the new record on the head of it). Indeed, where it not for that fact that gdbm-fold calls kons with three arguments - the key of the record, the contents of the record, and the result of the previous kons - you could pass in the standard Scheme function cons as kons, and the empty list '() as knil. Which is where the names kons and knil come from - '() is pronounced "nil". cons is an acceptable kons function.

Indeed, you can use traditional fold to copy a list, by passing in cons and '(). It'll come out in reverse order, mind, since fold will start at the beginning of the list, and the first cons will join the first element of the list onto '() to get a one-element list, and the second cons will join the second element of the list on top to end up with a list that has the second element then the first element, and so on, until you get the entire list reverse.

But this is the interesting thing about fold. Folding lists isn't actually all that interesting. What's interesting is that anything list-like can have a fold operation.

See, when working on Ugarit, I came across various sequential structures that live within archives. When a file is chopped into blocks and stored, the entire file is referred to by a reference to a list of references to all the blocks, that lives in the archive. And a directory is a list of directory entries. And a tag refers to a chain of snapshots going back through time.

When I wrote my first of these structures - the list of references to blocks, which I called a key stream - I needed a function to get a key stream back out of the archive so the blocks of a file could be brought out in order and written back to disk in an extraction operation.

My first instinct, of course, was to write a function that would recurse over the tree structure that's used to store a key stream within the archive (because if a key stream is too large to fit in a block, it needs to be split into blocks, and then another higher-level smaller key stream used to collect that list of blocks...), building up a list in memory which it would return. The problem is, each of the blocks that represent a file represent a megabyte of data; a file that's a terabyte long would involve a million keys, and building up a list in memory to hold them all struck me as fundamentally wrong. gdbm-fold let you iterate over the entire contents of a database without loading all of it into memory.

So I wrote fold-key-stream, a function that takes a reference to a key stream, a kons function that is called on every key in turn, and a knil value to get the ball rolling. Having seen gdbm-fold, I had a hunch this was the right approach. I expected to have to perform a mind-bending turning inside out of my simple tree walk, but when I sat down and did it, the right answer fell out easily:

;; kons is called on (key type accumulator) for every key in the stream, in order
(define (fold-key-stream archive key ks-type kons knil)
   (define type (archive-exists? archive key))
   (if (eq? ks-type type)
      ; Recurse
      (begin
         (define subkeys (deserialise-key-stream (archive-get archive key)))
         (fold 
            (lambda (subkey acc) (fold-key-stream archive subkey ks-type kons acc))
            knil
            subkeys))
      ; Leaf node
      (kons key type knil)))

The function asks the archive for the type of a block, given that block's key, to see if it's followed the tree all the way down to the leaves or not, since the key stream blocks will all have a particular type (ks-type) while the leaves - the actual blocks the key stream points to, such as file data - will have som different type. So you could call fold-key-stream on a block that's not even a key stream block, and it'll treat it like a key stream with just one element in, that block.

So the code starts by checking the type of the block. If it's not a key-stream block, it's a leaf block, so we just call kons on it, passing in knil since this is the only call to kons, and return the result. Job done.

But if it's not a leaf node, then we go into the (begin ... bit. The first thing we do is to read the key-stream block and convert it into a list of keys, which we call subkeys. And then we get funny, by calling that useless fold function for summing up lists on it. Now, we have a list of keys, which may be actual keys to call kons on, or keys of further subtrees. But we already know how to handle both those cases; if it's a leaf node, fold-key-stream would just call kons on it, and if it's a list of further subtrees, then each of them is actually a smaller key-stream in itself, and fold-key-stream can deal with them... so we just wrap fold-key-stream in a little wrapper function that adds in the extra parameters, such as the reference to the archive, and use fold to call it on every subkey in the block. fold provides the basic plumbing for us, of actually calling the function and passing in the accumulator parameter. The clever part is that each call to fold-key-stream is passed the knil from the previous call; we're recursing folds within folds to walk down a multi-way tree, but the chain of passing knil into kons is preserved.

So really fold is more like a functional foreach operation: the fun isn't in calling it on lists, but in providing fold operations for all sorts of sequential data structures. Since then I've written several fold operations for various data structures within the archive. Indeed, one of them is a high-level ls operation on nodes within the archive, from the root that lists all the tags in the archive, to each tag that has a list of snapshots, to the directories and files within the snapshots. I will use this with a kons function that prints out each entry in turn, pausing for a keypress after every twenty or so, and return to a store continuation (sort of like a longjmp in C, or throwing an exception) to break out of the fold if the user presses 'q'.

In practice, any function that builds up a list and returns it probably ought to be recast as a folder; this can be done simply by letting the user provide their own cons and '() (let's call the kons and knil...) rather than using cons and '() to build a list up yourself. Then the user can get a normal list if they want, or can directly process the elements in turn, rather than getting a list from your function that they immediately strip down.

So, a fold function is, in effect, a way of representing a list. Or, at least, a potential list, that doesn't exist until you invoke it. Which leads one to think of defining list operations like map over them...

Scheme | alaric | Tue 20th Jan 2009 6:00 pm | Comments (0)

Butterfly Cake (by sarah)

As promised the butterfly cakes construction is now on our cooking blog.

Salaric-Cooking

I'm hoping to have some of the other cakes and things on there soon!

Recipies | sarah | 4:22 pm | Comments (0)

Snell-Pym

Sarah and Alaric Snell-Pym living in interesting times

Jean’s Introduction to Economics (by sarah)

Monster Blogs Sorted (by sarah)

Ugarit interactive restore (by alaric)

fold>cons (by alaric)

Butterfly Cake (by sarah)

Search

Categories

About Us

Ada Lovelace Day

Business

Family

Fictional Friends

Friends

Mind candy

Projects

The Salaric Blogs

Archives

Meta

Snell-Pym

Sarah and Alaric Snell-Pym living in interesting times

Jean’s Introduction to Economics (by sarah)

Monster Blogs Sorted (by sarah)

Ugarit interactive restore (by alaric)

fold>cons (by alaric)

Butterfly Cake (by sarah)

Subscribe

Search

Categories

About Us

Ada Lovelace Day

Business

Family

Fictional Friends

Friends

Mind candy

Projects

The Salaric Blogs

Archives

Meta