Motivation

When I was a child, I had a lot of technical books lying around. My grandmother was a science teacher and my grandfather was an engineer, and lots of their old books lined my shelves. And alongside them, I had a lot of science fiction, too. And from all this, I learnt that technology could be used to extend the abilities of our frail bodies, to do amazing things...

I was an avid reader. I read all the books I could lay my hands on, and used to raid the libraries for more whenever I could. The fact that correctly arranged bits of various materials could, given the right inputs, enable you to fly, or communicate over long distances, entranced me... I designed everything from computers to deep-space colonisation programmes. But all I did was design them; I had a few tools, but not enough to build anything interesting. Most of my building efforts went into building further tools to try and bootstrap myself to greater things. I made a simple CMOS bus analyzer, and some power supplies and things like that, but even the cost of components was prohibitive. How could I build a robot if I couldn't afford motors, large enough batteries, and all the chips required to get a working computer?

And so, my initial enthusiasm with the wonders of technology was dulled, and replaced by a kind of cynical weariness. I could design something awesome, and imagine all the steps required to build it in arbitrary detail, but I couldn't see a way to make it anything more than a dream. My room filled with books, and notepads packed with diagrams, but no awesome pinnacles of technology.

The one area I could press on with, of course, was software. Once you've got a computer, you don't really need to spend more on it to make as much software as you have time for. I found that writing applications was rather tedious, though. Software engineering is a new, immature, field; it is still at the stage where extensive manual effort is required to make the simplest thing - and the situation was even worse programming in Pascal for MS-DOS in the early 1990s. To write an application you needed to write your own user interface and data storage and memory management and all that stuff. I set off to write a spreadsheet in the hope of making money, but quickly found that redesigning the operating system was much more fun - applications, to be honest, weren't all that interesting when I realised that the very foundations I had to build them on (programming languages, standard libraries, operating systems, the lot) were unbelievably shoddy; perhaps once I'd fixed them, application development might be less tedious...

So I disappeared off down a rabbit hole of researching computer science. I still wrote software, but it was mainly prototypes and experiments, never anything that would be useful in its own right. I wrote applications as coursework for my A-level in computing, of course; and I ended up writing a library of low-level hardware drivers for DJGPP that I released into the public domain in the mid 1990s, as a spin-off from some other project, but my main output, again, wasn't useful finished software, but designs.

I was worried that this meant I was lazy. The effort required to do the "boring bits" of finishing something just didn't appeal to me; when presented with a problem, I could design an awesome, exciting, solution in half an hour, but implementing that design would take weeks, during which several interesting new problems to solve would turn up. Was this some personal failing, that I lacked the stamina to finish anything? I loved designing things, but it also weighed heavily on me that I was all talk and no action.

When I first got a job writing software, I was worried that I would be afflicted by the same laziness, and have trouble motivating myself. As it turned out, I was OK; I found that writing software for other people was easy, as I'd get to see them looking all happy when I presented them with a finished solution to their problem. This provided all the motivation I needed to keep my nose to the grindstone.

But when I was spending my time writing lots of pointless code to work around the deficiencies of POSIX or SQL or something, I couldn't help myself from thinking about how I'd build these underlying bits of infrastructure if I could. I'd already accumulated a fair amount of ideas in the past; I'd been collecting them together under the umbrella of ARGON, an integrated design for an operating system, clustered fault-tolerant virtual machine, programming language, distributed database, and other related bits of infrastructure that I feel would make a much better basis on which to build applications than the current big ball of mud.

And yet I wonder why I waste my time designing something that would cost millions to implement, and which would be doomed to fail in an operating system market dominated by well-entrenched existing players. I feel I'm doomed to design things that can never be built, going all the way back to designing spacecraft as a child. And yet, it's the one thing I'm really passionate about. Whenever I have a spare moment to think, I'm usually designing something.

I've tried to make a career of this by focussing on "software architecture", as the activity of designing software is currently known. And I've managed to move away from being paid to design and build apps, towards the infrastructure projects I crave (such as databases). That's not always been a great thing; at GenieDB, I had to avoid thinking about the one big vague area in the ARGON design, the distributed database TUNGSTEN, in case of conflicts of interest with my employer. Now I work for an analytical/retention database company, and analytics is an area that I don't feel compelled to make part of ARGON (but I need to be a bit careful about the retention side, as I think that support for archival storage is woefully inadequate in modern software systems).

Not that I've been entirely unproductive in terms of working software, mind - there's a few tools I've built to solve my own problems; Tangle is a tool for documenting cabling and networks, that I wrote to help myself with some contract work I had looking after a moderately complicated hosting setup. The Eye of Horus is a monitoring system I built for keeping an eye on my own servers. I wrote banterpixra to help myself learn Lojban. I'm working on Ugarit, a backup/archival system based on content-addressed storage, to improve that woeful support for archival storage on my own servers. But none of them are anything like the kinds of grand ideas I conjure up on a daily basis, despite being the result of many days' work.

As the title of this blog post suggests, though, I struggle with motivation. My dreams are so far beyond my own ability to execute them that even building things like Ugarit can depress me as the slow pace at which software is built shows just how unreachable my goals are. It's even worse these days, as I have to fit looking after two children and a disabled wife around being the sole wage earner; there are sufficient tasks that only I can do that my "free time" boils down to Thursday evenings (except lately, as I've been having to spend those house and mortgage hunting) and the odd hour or so in the evenings once the kids are all in bed (as I am writing this now, an hour and a half after I'd have ideally liked to be in bed). I really wish I had more time to myself, but when I do get time, do I spend it designing wonderful things that nobody will ever build, or actually making trivial things that those with more time and energy could do much better?

Pass The Conch

In Lord of the Flies, the children (marooned on an island and working out how to organise themselves to survive) develop a technique for managing debate: they use a conch shell as a token to represent who currently holds the floor. Without the conch, you can't talk; you have to wait your turn.

Cut to the Real World of Commerce and Industry: in various places I've worked, there's been a number of shared resources which can only be used by one person or agent at a time. Mainly, these have been testing servers - if you are doing performance analyses, or looking for timing-related bugs, you can't have anyone else running jobs on the same server as you, or they'll compete for resources and interfere with your results. Or perhaps there's only one "data area" of some kind, and two attempts to use it at once will lead to catastrophe.

This is usually handled by asking around the office or in IRC: "Is anyone using X?", hoping that anybody who is, is still around (as opposed to too busy doing something to notice the request, or leaving a job going while out to lunch). Because of the unreliability of this system, and the inability to integrate it with automatic systems that need to claim resources (such as automatic test systems), I have often wished for a software tool to manage it. Which would, naturally, be called "conch".

Here's my feature wishlist:

  • Network-based. A central conch server tracks a set of conches, accessed via a Web interface or a direct protocol. The direct protocol should have a command-line client for scripting, and be trivial to write native client libraries for in programming languages.

  • Authenticated. No need for super security, but we want to keep out casual mischief-makers, so require authentication to use the server; to enable easy integration with other workflow apps, support htaccess files, "trust the upstream proxy" (eg, accept HTTP auth usernames and ignore any passwords sent), or running an arbitrary shell command to validate a username/password pair. It might be used across the public Internet, so allow for SSL wrapping the connection. The command line client should, by default, use the username and password from ~/.conch or prompt for them (and save them in ~/.conch) if not specified.

  • The ability to create or delete resources, to claim a currently-free resource, to release a resource you hold, or to "force" the release of a resource that somebody else holds (if they forget and go home, etc).

  • The ability to list the status of a resource, to list the resources held by a specific user, to list the resources held by yourself, to list all users with resources held, and to list all resources.

  • Fine-grained access control (per-user rights limitation) might be handy, but probably not useful for the first draft.

  • An IRC bot might be cool - at least for logging resource claims/releases and commands to list current state; maybe for resource claims/releases as well, if users are either trusted by nick or authenticate via a private message.

If this doesn't already exist, it should be easy to build (something could be knocked up with awful and sql-de-lite in a day or two, I bet!)...
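To make the claim/release semantics in the wishlist concrete, here is a minimal in-memory sketch in Python. It is only an illustration under stated assumptions: the names (ConchRegistry, ConchError, claim, release and so on) are invented for this example, and it ignores the networking, authentication, persistence and IRC parts of the wishlist entirely.

```python
class ConchError(Exception):
    """Raised when a conch operation is not allowed."""


class ConchRegistry:
    """Hypothetical in-memory tracker: resource name -> current holder (or None)."""

    def __init__(self):
        self._holders = {}

    def _holder_of(self, resource):
        if resource not in self._holders:
            raise ConchError(f"no such resource: {resource}")
        return self._holders[resource]

    def create(self, resource):
        if resource in self._holders:
            raise ConchError(f"resource already exists: {resource}")
        self._holders[resource] = None

    def delete(self, resource):
        self._holder_of(resource)  # raises if the resource is unknown
        del self._holders[resource]

    def claim(self, resource, user):
        holder = self._holder_of(resource)
        if holder is not None:
            raise ConchError(f"{resource} is held by {holder}")
        self._holders[resource] = user

    def release(self, resource, user, force=False):
        holder = self._holder_of(resource)
        if holder is None:
            raise ConchError(f"{resource} is not held by anyone")
        if holder != user and not force:
            raise ConchError(f"{resource} is held by {holder}, not {user}")
        self._holders[resource] = None

    def status(self, resource):
        """Return the current holder, or None if the resource is free."""
        return self._holder_of(resource)

    def held_by(self, user):
        """List the resources a given user currently holds."""
        return sorted(r for r, h in self._holders.items() if h == user)

    def holders(self):
        """List all users who currently hold at least one resource."""
        return sorted({h for h in self._holders.values() if h is not None})

    def resources(self):
        """Return a copy of the full resource -> holder table."""
        return dict(self._holders)
```

A forced release is just a release with force=True issued by somebody other than the holder; a real server would presumably want to log who forced it.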

What if my child is gay?

It's widely held that it's a scary experience for somebody to tell their parents that they are gay. As a parent, therefore, I began to wonder how I could arrange it so that, if any of my children turn out to be gay, they could be spared any distress in telling us about it.

I surmised that the distress arose because of this pattern:

  1. Child is raised by parents with the assumption that they will be straight. This might be a stated assumption - the parents actually talking about "when you start to bring [girl/boy]friends home" or "when you get married and have kids", or simply be signs of homophobia in the parent. Perhaps it could even be that the parents show no signs of expecting their child to be heterosexual, but the child (through other social conditioning) nonetheless assumes (correctly or not) that's what their parents expect.

  2. Child, at some point, realises they have desires they feel their parents would disapprove of or be shocked by, as well as or instead of "normal" heterosexual desires.

  3. Child eventually announces this to the parents.

  4. Parents' reaction ranges from "Oh, that's nice dear" to "Oh my god! What a shock... but now I think about it, it's no big deal" to "YOU ARE NO CHILD OF MINE".

I presume it's either the fear of not knowing how the parent will respond, or suspecting they'll respond negatively, that makes it stressful for people to tell their parents that they're gay, bisexual, transgender, or whatever.

So I started wondering if it'd be best to, at some point, outright say "You know, your mother and I are totally fine with whatever sexual orientation you choose". Maybe that'd be a bit awkward; perhaps it'd be better just to leave it implicit-but-hinted-at by openly introducing our gay/poly/etc friends to the children as such, and other such ways of showing that we're OK with it all.

But I began to realise that it would be much better if our children never actually had to "come out" to us about anything. Rather than trying to make step 4 of the above list less traumatic, how about if we just make it unnecessary by stopping the process at step 1?

I mean, ideally, our children should be able to bring home same-sex partners or whatever without feeling they have to gain our permission and acceptance first.

For a start, I think people are too enthusiastic about putting themselves (and, worse, each other) into boxes. I mean, I am attracted to women, and have never fancied a man, so I guess I count as straight, but I can find no reason to assume I might never fall in love with a man (I might just be really really picky and have not met Mr Right yet). And what about a bisexual person who has the occasional gay crush, but never really acts on it, and (quite happily) only ever goes out with members of the opposite sex, eventually marries one, and lives happily ever after? There's no problem with that, and their actual sexual label becomes a matter of perspective.

So, sod that. As my children are human beings, I am aware that they might acquire any combination of sexual tastes that humans are capable of; and those tastes are their own affair - which they may or may not choose to discuss with their parents, as they see fit. And what kinds (and numbers; don't forget polyamory) of people they actually bring home to meet us is their choice. And I don't require them to declare a classification up front. I want my children to feel free to bring home whatever partners take their fancy.

Of course, I don't want to deny them the right to stand up and say "Father! I wish to declare that I BAT FOR THE OTHER TEAM!" if they want to. I think that labeling yourself can be an important thing for a young person, learning to establish their own identity. If they want to do that, that's fine, and I'll support them in doing so and treat the event with the gravity they seem to want from it; if they come to me looking like they're after a rite of passage, I'll try to provide one. But I don't want them to think they have to.

But what I really want in the end, I guess, is for my children to feel free to be themselves (at least at home; I can't be responsible for the reactions of the rest of society, sadly), and for them to know that they have my support in whatever they do, as long as they do it ethically.

Scientists

I've been reading a book lately called "The Brain that Changes Itself", which discusses a once-controversial theory that the adult brain can rewire itself in the same manner as the developing brain, with the main difference being that it just requires more effort to focus the attention. It used to be believed that once the brain had finished developing, its structure and function were fixed. However, it turns out that, with the right approach, the victims of strokes and the like can retrain their brain to perform the lost functions with different bits of neural tissue.

But that's not what I'm writing about today.

One of the things that has struck me in the book's account of how the neurobiologists rejected this controversial idea, along with previous impressions I had obtained from other sources, is that the academic community is riddled with idiots who reject evidence that contradicts their beliefs about their field.

To see why this is crazy, look at it like this. Science is, largely, about finding the underlying truths of the Universe. The problem is that these underlying truths can rarely be directly observed (and we've figured most of the ones that can out by now). One cannot directly perceive an electron, but one can deduce its existence by perceiving the effect of an electron gun in an evacuated chamber pointed at a phosphorescent screen. But there are multiple interpretations of that experiment - perhaps there are tiny charged particles being released which stimulate the screen into producing light... or perhaps the electron gun actually causes the metal of the negative electrode to ablate and the resulting ion cloud then condenses into an invisibly thin thread which coils out across the vacuum until it touches the screen, whereupon electricity flows directly down the wire and causes the spot of light. Perhaps the vacuum is required, not because air inhibits the free motion of electrons, but because the air disrupts the formation of the thread.

Yet we can rule out the thread theory in a number of ways, and there are other experiments that show that electrons are discrete charged particles. It's the weight of a whole heap of evidence that all reinforce the correct theory and disprove all the alternative theories. However, one can never be entirely sure that another theory has yet to be discovered, which all the existing experiments fail to disprove - but which leads to the development of an experiment which disproves the electron theory, and reinforces the new theory. Perhaps there are no electrons; but the "electron theory" has provided us with useful predictions, and nobody has yet found fault with it. So we stick with it. Even if it's wrong, it's useful - and if we ever find it's wrong, that will give us the clues required to find a better theory.

But there are levels of deduction involved here. We directly observe the construction of electron guns and the appearance of spots of light with our eyes. We apply previously reinforced beliefs that the electrical power supply we connect the electron gun to will provide a voltage, and that the electron gun will therefore emit electrons. We observe the appearance of a spot of light, and therefore conclude that the electrons flew through the chamber and caused the spot of light. And from that, combined with existing knowledge about the nature of light and matter, we construct a theory that electrons can travel through a vacuum then cause phosphorescent screens to glow. Each level of further deduction is less certain than those that it builds upon, since its truth depends on their truth, plus a further step of deduction - which might be wrong in itself.

So what do we do when new evidence comes and appears to disprove our theory? Say somebody publishes the results of an experiment that show that, if a kitten is within one metre of the chamber, the spot of light on the screen grows into the kanji for "potato". The electron theory does not predict this. Have we disproved the theory of electrons? Or have we merely discovered that kittens emit complicated high-frequency magnetic fields that disturb the paths of nearby electron beams? Well, I'm sure further experiments would be performed, surrounding kittens with Hall sensors and SQUIDs and the like, but for now, let's imagine we only have that one data point to look at.

Electron theorists would probably question the validity of the experiment at all. For a start, it is a leap of faith that the experiment was set up correctly. Perhaps the electron gun itself is defective and projects the kanji symbol directly, and the kitten has nothing to do with it. Perhaps there are coils under the bench generating magnetic fields that steer the beam to draw the symbol, either accidentally or as part of a deliberate academic prank. In this case, with seemingly unrelated objects (kittens) having suspiciously unexpected consequences (kanji characters), that is a distinct possibility, so the kitten theorists would be under an additional burden of proof to recreate the experiment - and to ask electron theorists to defend their theory by recreating the experiment themselves to show that it does not occur with "trusted" equipment. For sure, the academic community does need some level of protection from a "denial of service" attack from charlatans assaulting it with fraudulent claims that have to be tediously experimentally dismissed. There is scope to accidentally perform flawed experiments due to overlooking some factor or failing to test all the equipment used for defects, leading to honest results that turn out to be misleading. This gives some credibility to the concept that some data can be rejected out-of-hand for contradicting widely-held theories, but it is all too easy to take conformist censorship at this level too far and reject evidence that actually shows flaws in currently-sacred theories.

But what if the conflicting evidence is less silly, or it is independently and widely confirmed in other experiments, showing there is definitely some effect at work? Perhaps kittens do emit mysterious high-frequency magnetic fields - in which case, our theory of electrons is still valid; it's just our theory of kittens which was wrong. As physicists are often more familiar with electrons than kittens, it's easy for them to defend their electron theory and question the researcher's grasp of kitten theory, thereby making it somebody else's problem. Meanwhile, biologists asked to defend the theory that mammal tissue can't generate intense, high frequency, magnetic fields might point to excellent arguments about the maximum rates of charge movements in various tissues, and tell the physicists that their electron theory must be all wrong. At least we now have some kind of debate, rather than outright censorship, but - particularly in cross-specialisation problems like this one - it's all too easy for both sides to just ignore the evidence and blame it on the other.

But what makes scientists so defensive? Good scientists realise that the data is all we can be sure about (and, even then, we must be careful of experimental errors, or failing to control for unknown influences). They treat theories as temporary affairs, which suffice until they are found wanting, or something better is found. Where does this academic Nazism emerge, where academics will often jump immediately to questioning the motives and competence of people who hold views that contradict the mainstream, leading to the mainstream remaining mainstream long after the weight of contradicting evidence becomes overpowering?

I think a part of the problem is the fact that scientists with new ideas have to fight so hard to get them heard over the mainstream in the first place - they find it hard to give up the fighting mentality once they've been accepted.

Another part of the problem might be human nature - scientists are taught the existing lore of their field in lectures where they soak it all up, and probably record it in their minds as unassailable truth. I suspect they are much more open to reconsider theories they encounter as "new" after having lived, for a while, in an academic world in which no theory explaining the behaviour in question had yet emerged. Theories considered "complete" when they were learnt are probably rarely questioned.

Cloud Storage

Currently, you can go to various providers and buy online storage capacity (IMHO, rsync.net is best, after research I did to find an offsite backup host for work). It's more expensive than a hard disk in your computer, and miles slower, but it has one brilliant advantage: it's remote. So it's perfect for backups.

And that's the heart of a free market - storage is cheap to the cloud providers (they just buy disks, and in bulk at that), but their storage has more value to you than your own storage because of its remoteness. So they can rent it to you at a markup, and you get a benefit, and everyone is happy. Money flows, the economy grows, and one day we'll get to have affordable space tourism et cetera.

But large, centralised, cloud storage providers are attractive targets for people who want to steal data. They become centralised points of failure; if they go bankrupt, lots of people lose their backups. Therefore, it's smart to do your backups to more than one of them, just in case. But that means setting up your systems to talk to each one's interfaces, arranging payment and agreeing to terms and conditions with them all individually, and so on.

Surely this state of affairs can be improved? With ADVANCED TECHNOLOGY?

Well, I think it can, and here's how.

Imagine a marketplace for cloud storage. This might be a centralised trading server, or it might be a peer-to-peer protocol... greater minds than I are working on decentralised P2P marketplaces, I hope. But however it's implemented, imagine that I can run a daemon on my server that measures my free disk space, subtracts some amount (10GiB?) for my short-term growth, and rents the rest out on the marketplace. By looking at the depth of market (how many unfulfilled bids for how much storage are out there, ordered by bidding price, highest first), it can choose the best price it can rent my storage for that will use up my available storage. My offer will include a price to upload a block (base price + price per byte), the price to keep a block (base price + price per byte, and the billing period) and the price to download a block (base price + price per byte).
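As a rough sketch of how such a daemon might read the depth of market, the Python below walks the open bids from the highest price down and stops once the cumulative demand covers the space on offer; the price of the last bid needed is the highest price that still fills the disk. Everything here is an assumption made for illustration: the Bid record, the units (an integer price per GiB) and the function names are hypothetical, and a real marketplace would need to handle partial fills, ties and changing bids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical unfulfilled bid: a price offered per GiB, for some number of GiB."""
    price_per_gib: int
    gib: int


def clearing_price(bids, available_gib):
    """Highest price at which cumulative demand covers available_gib.

    Returns None if there are no bids or nothing to rent. If total demand is
    smaller than the space on offer, this returns the lowest bid price, and
    some space will simply stay unrented.
    """
    filled = 0
    price = None
    for bid in sorted(bids, key=lambda b: b.price_per_gib, reverse=True):
        if filled >= available_gib:
            break
        filled += bid.gib
        price = bid.price_per_gib
    return price


def operation_cost(base_price, price_per_byte, size_bytes):
    """Cost of one upload, keep or download: base price plus price per byte."""
    return base_price + price_per_byte * size_bytes


bids = [Bid(5, 10), Bid(3, 20), Bid(1, 100)]
price = clearing_price(bids, 25)  # the 5 and 3 bids together cover 25 GiB
```

With 25 GiB on offer, the two best bids are enough, so the offer price would be that of the second bid; with only 10 GiB on offer, the top bid alone fills the space.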

It's an interesting question whether periodic storage fees, or just having a "successful download bounty", will win out. Charging storage fees encourages the buyers to notify you if they don't want a block any more, but just charging for successful downloads (and just deleting blocks that aren't referenced on an LRU basis to free up space) is beautifully simple.

The trust model is rather different to normal cloud providers. If a provider loses their data, I can't sue them; I just don't get to pay them the download bounty for getting my block back. So I'll have to store my data widely across several providers, and prices will lower to take account of that, and I'll need to do trial downloads to check my blocks are still available from time to time, and if not, hire a new storage provider to take a new copy of that block from a surviving copy.

But all of this can be done in software. A storage manager app would present a simple get/store block interface to, eg, Ugarit or Tahoe-LAFS, but behind the scenes, it would manage relationships with providers, checking blocks are available, ensuring there's a sufficient number of copies of each, shifting between providers when rates go up or if a provider's reliability score drops too low, etc.
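The re-replication part of such a storage manager could look something like the sketch below. It assumes a placement table mapping each block to the providers believed to hold it, an is_alive callback standing in for the trial download, and a preference-ordered list of candidate providers (cheapest or most reliable first); all of these names are hypothetical, and actual copying, payment and reliability scoring are left out.

```python
def plan_repairs(placements, is_alive, candidates, target_copies):
    """Work out which blocks need new copies, and where to put them.

    placements:    dict of block id -> set of provider names holding it
    is_alive:      callable (provider, block_id) -> bool, e.g. a trial download
    candidates:    provider names in order of preference
    target_copies: how many live copies of each block we want

    Returns (plan, lost). plan maps a block id to (source, new_providers):
    copy the block from the surviving provider `source` to each of
    `new_providers`. lost lists blocks with no surviving copy at all.
    """
    plan = {}
    lost = []
    for block_id in sorted(placements):
        holders = placements[block_id]
        surviving = sorted(p for p in holders if is_alive(p, block_id))
        if not surviving:
            lost.append(block_id)
            continue
        missing = target_copies - len(surviving)
        if missing > 0:
            # Skip providers already used for this block, including ones
            # that turned out to have lost it.
            fresh = [p for p in candidates if p not in holders][:missing]
            plan[block_id] = (surviving[0], fresh)
    return plan, lost
```

Run periodically, this gives the manager a list of copies to commission; blocks that end up in the lost list are the reason to keep the target number of copies comfortably above one.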

But all of this depends on it being easy for computers to send money between themselves, which is where Bitcoin comes in. Storage providers and consumers can just run Bitcoin wallets and arrange transfers between themselves.

The end result? I can run a daemon to rent out spare storage space on my system, and money would slowly accrue in a Bitcoin wallet. The daemon would rent out all but a safety margin of my space, and as I used up my safety margin, it would shed blocks (notifying the owner) to make more room, and increase its offer price in the market to reduce demand so that the lower-paying blocks move willingly and can be replaced with higher-paying blocks.

And I can run another daemon as part of my backup system, that would spend from the same bitcoin wallet to get backup space on other machines. When I have mostly empty filesystems, I will be spending little on backups, and earning lots on renting that space out, so money will accumulate... when I start to fill the filesystems up, the trickle will slowly reverse, and then perhaps I should spend my profits on a new hard disk before they all go and I have to top it up from my own Bitcoin wallet!

Details

The devil's in the details, as always. The marketplace will depend on being able to place bids in a standard format. Potential buyers will need to be able to introduce themselves, perhaps via an HTTP-based protocol served by the storage-for-hire daemon on my server; sign up for an account by registering a public key, and then access upload/download/delete block interfaces. The daemon would quote a price in the market, but each block upload would have to be annotated with the rates the buyer is offering, to avoid race conditions when rates change during a transaction. Blocks with unattractive rates can be rejected by the server. There would need to be a back channel for the server to asynchronously notify buyers that it needs to get rid of a block - I'd hate to force buyers to have public IPs (many will be behind NAT) by giving them an HTTP endpoint, but perhaps a choice of that or polling the server to ask for blocks that need to be shifted within a time limit would suffice. It would also be polite for the server to inform the buyer of any blocks it had to delete without notice, rather than waiting for them to check them.

But how to address blocks? On the one hand, I want content-addressed storage, as it prevents cheating. There's no way a bad server can claim to have blocks it's deleted by sending back random junk and saying "But that's what you gave me! PROVE I'M LYING!" if they are identified by hashes. But on the other hand, existing systems have their own addressing schemes. Ugarit, for example, identifies blocks by a keyed hash of their uncompressed plaintext contents: because it's keyed, the hash doesn't give away the content, and because it's computed over the plaintext, it remains unchanged if the compression or encryption algorithms are upgraded - old blocks can still be read while new blocks are written with the new algorithms, and old blocks can be re-compressed and re-encrypted without breaking the references to them. So enforcing that blocks are identified by the SHA256 of their ciphertext would exclude various uses.

The best scheme I can think of is this: each block is identified by a client-supplied ID string combined with a hash based on an agreed algorithm. So the server would say "I support SHA1, SHA256, and Tiger", and the client would say "Ok, here's a block I want to call Boris, and I like SHA256", and the server would reply with "Ok, that block's called Boris:<256-bit hash>". The client should check the returned hash matches the hash it computed itself. A client that's happy with server-assigned IDs would give all their blocks the same name (the empty string), as the hash in the resulting identifier keeps it unique. The server will store the block by hash (deduplicating blocks with the same hash), but keep a per-customer table mapping names to hashes. If the client hasn't provided distinct names, then the LAST mapping for the name provided is kept.

Meanwhile, on retrieval, a block can be requested by name, or by hash. The client should remember the hashes, even if it uses names, so that it can check that the server isn't sending it a garbage block.
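Here is a small Python sketch of that naming scheme, to show how the pieces fit together. It is an in-memory toy under stated assumptions: the BlockServer class and its method names are hypothetical, only SHA1 and SHA256 are offered (Tiger is not in Python's hashlib), and there is no authentication, pricing or network protocol.

```python
import hashlib

# Algorithms this toy server advertises.
SUPPORTED = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


class BlockServer:
    """Hypothetical server: blocks stored by hash, plus per-customer name tables."""

    def __init__(self):
        self._blocks = {}  # hex digest -> block bytes (deduplicated)
        self._names = {}   # (customer, name) -> hex digest

    def store(self, customer, name, data, algorithm="sha256"):
        digest = SUPPORTED[algorithm](data).hexdigest()
        self._blocks.setdefault(digest, data)
        # Re-using a name overwrites the mapping: the LAST upload wins.
        self._names[(customer, name)] = digest
        return f"{name}:{digest}"

    def fetch_by_name(self, customer, name):
        return self._blocks[self._names[(customer, name)]]

    def fetch_by_hash(self, digest):
        return self._blocks[digest]


def verify(data, expected_digest, algorithm="sha256"):
    """Client-side check that a returned block matches the remembered hash."""
    return SUPPORTED[algorithm](data).hexdigest() == expected_digest


server = BlockServer()
identifier = server.store("me", "Boris", b"some encrypted block")
name, digest = identifier.rsplit(":", 1)
ok = verify(server.fetch_by_name("me", "Boris"), digest)
```

The client keeps the digest from the identifier, so a server that returns junk for "Boris" is caught by verify, whichever way the block was requested.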

As a Ugarit backend, this would work fine; the Ugarit keyed hash can be used as the name, and the server's hash stored for cross-checking on retrieval. If the local store is lost due to disaster, it could either be restored from another backup somehow, or it could just be skipped and we hope that the servers don't lie to us (the latter would be better than refusing to try to restore at all!). Ugarit tags (which are the roots of the hash tree) can be stored by using the tag name as a block name, and using the fact that multiple uploads with the same block name just overwrite the name->hash mapping.

Needless to say, clients should encrypt ALL their data! You can't trust random providers.

Have I missed any other scams? Servers might try to accept lots of blocks, keep the upload fees, and never actually store the blocks. That provides an incentive to servers to not charge upload fees at all, and just hope to make money on download fees and/or storage. It'll be interesting to see how the market ends up structuring itself! Also, as it's a low risk to accept data from somebody but a high risk to send money, I think the protocol should be based around periodic billing at the end of the period, rather than per-operation micropayments (that makes more efficient use of Bitcoin's transaction charge and hour-long transaction confirmation latency, too). Billing periods could be anything from a day upwards.

But this is a real cloud, in a sense far beyond the current definition of cloud computing. Millions of tiny providers, all competing in a marketplace, with the clients automatically spreading their risk across them in a fine-grained way. I think that'd work for storage, as it's easy to define and commoditise; doing it for computation might be possible, but it'd require much more standardisation of execution models and sandboxes and the like...

(Thanks to the folks in #bitcoin on Freenode IRC for inspiration for all this!)

UPDATE: A friend suggests an improvement over periodic downloads to check the data is still there. Have a "check" operation where the client supplies a random key and a block name or hash, and the server has to hash the block along with the key and return the result. That allows the client to check the block is still there if it has a way to get a local copy of the block. Otherwise, it would still have to rely on downloading the block and checking the hash matches.
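A sketch of that "check" operation in Python follows. The post only says the server hashes the block along with the key; this example uses HMAC-SHA256 as one reasonable way of doing that, which is a substitution on my part rather than anything specified above. The function names are hypothetical, and ask_server stands in for whatever network call would carry the key to the server.

```python
import hashlib
import hmac
import secrets


def server_check(block, key):
    """Server side: prove possession by hashing the block with the client's key."""
    return hmac.new(key, block, hashlib.sha256).digest()


def client_verify(local_copy, ask_server):
    """Client side: challenge the server with a fresh random key.

    ask_server is a callable taking the key and returning the server's answer.
    Because the key is new each time, the server cannot cache an old answer;
    it needs the actual block contents to respond correctly.
    """
    key = secrets.token_bytes(32)
    expected = hmac.new(key, local_copy, hashlib.sha256).digest()
    return hmac.compare_digest(expected, ask_server(key))
```

The answer is a few dozen bytes regardless of block size, so the check costs almost no bandwidth, but as noted it only works when the client can get at its own copy of the block.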


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales