Category: Computing

Alaric’s projects for this year

This year's going to be pretty busy with settling into the new home, but I have a few projects.

  1. Finish the ring casting I nearly finished before the move. That's a priority.
  2. Resurrect my aluminium foundry. In particular, it's our bronze wedding anniversary, so Sarah's going to design a pattern for a sundial, which I will cast in aluminium bronze, a nice alloy that I can make myself from my scrap aluminium and bits of old plumbing...
  3. Continue with minor stuff on Ugarit, but as a milestone, build the distributed storage backend, which will rock.
  4. Work on my wearable computer project. No specific milestone for this, as it's currently a long drawn out research/prototyping phase as I sort out many details.

Wish me luck... I usually suffer from "all my weekends getting eaten up", but as my New Year's Resolution has been to spend at least one day every two weeks doing something fun with my children, I'm going to be booking weekend days in my calendar in advance through the year for that and my own projects. Before they get filled up!

Motivation

When I was a child, I had a lot of technical books lying around. My grandmother was a science teacher and my grandfather was an engineer, and lots of their old books lined my shelves. And alongside them, I had a lot of science fiction, too. And from all this, I learnt that technology could be used to extend the abilities of our frail bodies, to do amazing things...

I was an avid reader. I read all the books I could lay my hands on, and used to raid the libraries for more whenever I could. The fact that correctly arranged bits of various materials could, given the right inputs, enable you to fly, or communicate over long distances, entranced me... I designed everything from computers to deep-space colonisation programmes. But all I did was design them; I had a few tools, but not enough to build anything interesting. Most of my building efforts went into building further tools to try and bootstrap myself to greater things. I made a simple CMOS bus analyzer, and some power supplies and things like that, but even the cost of components was prohibitive. How could I build a robot if I couldn't afford motors, large enough batteries, and all the chips required to get a working computer?

And so, my initial enthusiasm with the wonders of technology was dulled, and replaced by a kind of cynical weariness. I could design something awesome, and imagine all the steps required to build it in arbitrary detail, but I couldn't see a way to make it anything more than a dream. My room filled with books, and notepads packed with diagrams, but no awesome pinnacles of technology.

The one area I could press on with, of course, was software. Once you've got a computer, you don't really need to spend more on it to make as much software as you have time for. I found that writing applications was rather tedious, though. Software engineering is a new, immature, field; it is still at the stage where extensive manual effort is required to make the simplest thing - and the situation was even worse programming in Pascal for MS-DOS in the early 1990s. To write an application you needed to write your own user interface and data storage and memory management and all that stuff. I set off to write a spreadsheet in the hope of making money, but quickly found that redesigning the operating system was much more fun - applications, to be honest, weren't all that interesting when I realised that the very foundations I had to build them on (programming languages, standard libraries, operating systems, the lot) were unbelievably shoddy; perhaps once I'd fixed them, application development might be less tedious...

So I disappeared off down a rabbit hole of researching computer science. I still wrote software, but it was mainly prototypes and experiments, never anything that would be useful in its own right. I wrote applications as coursework for my A-level in computing, of course; and I ended up writing a library of low-level hardware drivers for DJGPP that I released into the public domain in the mid 1990s, as a spin-off from some other project, but my main output, again, wasn't useful finished software, but designs.

I was worried that this meant I was lazy. The effort required to do the "boring bits" of finishing something just didn't appeal to me; when presented with a problem, I could design an awesome, exciting, solution in half an hour, but implementing that design would take weeks, during which several interesting new problems to solve would turn up. Was this some personal failing, that I lacked the stamina to finish anything? I loved designing things, but it also weighed heavily on me that I was all talk and no action.

When I first got a job writing software, I was worried that I would be afflicted by the same laziness, and have trouble motivating myself. As it turned out, I was OK; I found that writing software for other people was easy, as I'd get to see them looking all happy when I presented them with a finished solution to their problem. This provided all the motivation I needed to keep my nose to the grindstone.

But when I was spending my time writing lots of pointless code to work around the deficiencies of POSIX or SQL or something, I couldn't help myself from thinking about how I'd build these underlying bits of infrastructure if I could. I'd already accumulated a fair amount of ideas in the past; I'd been collecting them together under the umbrella of ARGON, an integrated design for an operating system, clustered fault-tolerant virtual machine, programming language, distributed database, and other related bits of infrastructure that I feel would make a much better basis on which to build applications than the current big ball of mud.

And yet I wonder why I waste my time designing something that would cost millions to implement, and which would be doomed to fail in an operating system market dominated by well-entrenched existing players. I feel I'm doomed to design things that can never be built, going all the way back to designing spacecraft as a child. And yet, it's the one thing I'm really passionate about. Whenever I have a spare moment to think, I'm usually designing something.

I've tried to make a career of this by focussing on "software architecture", as the activity of designing software is currently known. And I've managed to move away from being paid to design and build apps, towards the infrastructure projects I crave (such as databases). That's not always been a great thing; at GenieDB, I had to avoid thinking about the one big vague area in the ARGON design, the distributed database TUNGSTEN, in case of conflicts of interest with my employer. Now I work for an analytical/retention database company, and analytics is an area that I don't feel compelled to make part of ARGON (but I need to be a bit careful about the retention side, as I think that support for archival storage is woefully inadequate in modern software systems).

Not that I've been entirely unproductive in terms of working software, mind - there's a few tools I've built to solve my own problems; Tangle is a tool for documenting cabling and networks, that I wrote to help myself with some contract work I had looking after a moderately complicated hosting setup. The Eye of Horus is a monitoring system I built for keeping an eye on my own servers. I wrote banterpixra to help myself learn Lojban. I'm working on Ugarit, a backup/archival system based on content-addressed storage, to improve that woeful support for archival storage on my own servers. But none of them are anything like the kinds of grand ideas I conjure up on a daily basis, despite being the result of many days' work.

As the title of this blog post suggests, though, I struggle with motivation. My dreams are so far beyond my own ability to execute them that even building things like Ugarit can depress me as the slow pace at which software is built shows just how unreachable my goals are. It's even worse these days, as I have to fit looking after two children and a disabled wife around being the sole wage earner; there are sufficient tasks that only I can do that my "free time" boils down to Thursday evenings (except lately, as I've been having to spend those house and mortgage hunting) and the odd hour or so in the evenings once the kids are all in bed (as I am writing this now, an hour and a half after I'd have ideally liked to be in bed). I really wish I had more time to myself, but when I do get time, do I spend it designing wonderful things that nobody will ever build, or actually making trivial things that those with more time and energy could do much better?

Pass The Conch

In Lord of the Flies, the children (marooned on an island and working out how to organise themselves to survive) develop a technique for managing debate: they use a conch shell as a token to represent who currently holds the floor. Without the conch, you can't talk; you have to wait your turn.

Cut to the Real World of Commerce and Industry: in various places I've worked, there's been a number of shared resources which can only be used by one person or agent at a time. Mainly, these have been testing servers - if you are doing performance analyses, or looking for timing-related bugs, you can't have anyone else running jobs on the same server as you, or they'll compete for resources and interfere with your results. Or perhaps there's only one "data area" of some kind, and two attempts to use it at once will lead to catastrophe.

This is usually handled by asking around the office or in IRC: "Is anyone using X?", hoping that anybody who is using it is still around (as opposed to too busy doing something to notice the request, or leaving a job going while out to lunch). Because of the unreliability of this system, and the inability to integrate it with automatic systems that need to claim resources (such as automatic test systems), I have often wished for a software tool to manage it. Which would, naturally, be called "conch".

Here's my feature wishlist:

  • Network-based. A central conch server tracks a set of conches, accessed via a Web interface or a direct protocol. The direct protocol should have a command-line client for scripting, and be trivial to write native client libraries for in programming languages.

  • Authenticated. No need for super security, but we want to keep out casual mischief-makers, so require authentication to use the server; to enable easy integration with other workflow apps, support htaccess files, "trust the upstream proxy" (eg, accept HTTP auth usernames and ignore any passwords sent), or running an arbitrary shell command to validate a username/password pair. It might be used across the public Internet, so allow for SSL wrapping the connection. The command line client should, by default, use the username and password from ~/.conch or prompt for them (and save them in ~/.conch) if not specified.

  • The ability to create or delete resources, to claim a currently-free resource, to release a resource you hold, or to "force" the release of a resource that somebody else holds (if they forget and go home, etc).

  • The ability to list the status of a resource, to list the resources held by a specific user, to list the resources held by yourself, to list all users with resources held, and to list all resources.

  • Fine-grained access control (per-user rights limitation) might be handy, but probably not useful for the first draft.

  • An IRC bot might be cool - at least for logging resource claims/releases and commands to list current state; maybe for resource claims/releases as well, if users are either trusted by nick or authenticate via a private message.

If this doesn't already exist, it should be easy to build (something could be knocked up with awful and sql-de-lite in a day or two, I bet!)...
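To make the wishlist concrete, here is a minimal in-memory sketch of the core claim/release/force logic, in Python rather than the Scheme stack mentioned above. The class and method names (ConchServer, claim, release and so on) are my own invention for illustration; networking, authentication and persistence are deliberately left out.

```python
class ConchError(Exception):
    """Raised when a conch operation is not allowed."""


class ConchServer:
    """In-memory registry of conches (hypothetical API, no network or auth)."""

    def __init__(self):
        # resource name -> username of the holder, or None when free
        self._holders = {}

    def _require(self, name):
        if name not in self._holders:
            raise ConchError(f"no such resource: {name}")

    def create(self, name):
        if name in self._holders:
            raise ConchError(f"{name} already exists")
        self._holders[name] = None

    def delete(self, name):
        self._require(name)
        del self._holders[name]

    def claim(self, name, user):
        """Claim a currently-free resource; fails if somebody holds it."""
        self._require(name)
        holder = self._holders[name]
        if holder is not None:
            raise ConchError(f"{name} is held by {holder}")
        self._holders[name] = user

    def release(self, name, user, force=False):
        """Release a resource you hold, or anybody's if force is set."""
        self._require(name)
        holder = self._holders[name]
        if holder is None:
            raise ConchError(f"{name} is not held")
        if holder != user and not force:
            raise ConchError(f"{name} is held by {holder}, not {user}")
        self._holders[name] = None

    def status(self, name):
        """Return the current holder, or None if the resource is free."""
        self._require(name)
        return self._holders[name]

    def held_by(self, user):
        return sorted(n for n, h in self._holders.items() if h == user)

    def users_with_resources(self):
        return sorted({h for h in self._holders.values() if h is not None})

    def all_resources(self):
        return dict(self._holders)
```

A command-line client or IRC bot would then just be a thin wrapper that authenticates the user and calls these methods.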

Cloud Storage

Currently, you can go to various providers and buy online storage capacity (IMHO, rsync.net is best, after research I did to find an offsite backup host for work). It's more expensive than a hard disk in your computer, and miles slower, but it has one brilliant advantage: it's remote. So it's perfect for backups.

And that's the heart of a free market - storage is cheap to the cloud providers (they just buy disks, and in bulk at that), but their storage has more value to you than your own storage because of its remoteness. So they can rent it to you at a markup, and you get a benefit, and everyone is happy. Money flows, the economy grows, and one day we'll get to have affordable space tourism et cetera.

But large, centralised, cloud storage providers are attractive targets for people who want to steal data. They become centralised points of failure; if they go bankrupt, lots of people lose their backups. Therefore, it's smart to do your backups to more than one of them, just in case. But that means setting up your systems to talk to each one's interfaces, arranging payment and agreeing to terms and conditions with them all individually, and so on.

Surely this state of affairs can be improved? With ADVANCED TECHNOLOGY?

Well, I think it can, and here's how.

Imagine a marketplace for cloud storage. This might be a centralised trading server, or it might be a peer-to-peer protocol... greater minds than I are working on decentralised P2P marketplaces, I hope. But however it's implemented, imagine that I can run a daemon on my server that measures my free disk space, subtracts some amount (10GiB?) for my short-term growth, and rents the rest out on the marketplace. By looking at the depth of market (how many unfulfilled bids for how much storage are out there, ordered by bidding price, highest first), it can choose the best price it can rent my storage for that will use up my available storage. My offer will include a price to upload a block (base price + price per byte), the price to keep a block (base price + price per byte, and the billing period) and the price to download a block (base price + price per byte).
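As a rough sketch of the pricing step, the daemon could walk the bids from the highest price downwards until its rentable space is used up; the price of the last bid it needed is the highest price at which all the space gets rented. The function names, the (price, bytes) bid format and the integer price units are assumptions made for this example, not part of any real marketplace protocol.

```python
GIB = 2**30


def rentable_bytes(free_bytes, reserve_bytes=10 * GIB):
    """Free space minus the safety margin kept back for short-term growth."""
    return max(0, free_bytes - reserve_bytes)


def clearing_price(bids, capacity_bytes):
    """Pick the highest asking price that still fills the available capacity.

    bids is a list of (price, bytes_wanted) pairs, with price as an integer
    in whatever unit the marketplace quotes (hypothetical here).
    Returns None when there are no bids or no capacity; if total demand is
    smaller than the capacity, the lowest bid price is returned.
    """
    remaining = capacity_bytes
    price = None
    # Depth of market: highest-paying bids first.
    for bid_price, wanted in sorted(bids, key=lambda b: b[0], reverse=True):
        if remaining <= 0:
            break
        price = bid_price
        remaining -= wanted
    return price


bids = [(5, 40 * GIB), (9, 30 * GIB), (7, 50 * GIB)]
print(clearing_price(bids, rentable_bytes(90 * GIB)))  # 80 GiB rentable
```

With 80 GiB to rent, the bids at 9 and 7 fill it exactly, so the daemon would ask 7; with more space than that it has to go down to 5 to find enough takers.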

It's an interesting question whether periodic storage fees, or just having a "successful download bounty", will win out. Charging storage fees encourages the buyers to notify you if they don't want a block any more, but just charging for successful downloads (and just deleting blocks that aren't referenced on an LRU basis to free up space) is beautifully simple.

The trust model is rather different to normal cloud providers. If a provider loses their data, I can't sue them; I just don't get to pay them the download bounty for getting my block back. So I'll have to store my data widely across several providers, and prices will lower to take account of that, and I'll need to do trial downloads to check my blocks are still available from time to time, and if not, hire a new storage provider to take a new copy of that block from a surviving copy.

But all of this can be done in software. A storage manager app would present a simple get/store block interface to, eg, Ugarit or Tahoe-LAFS, but behind the scenes, it would manage relationships with providers, checking blocks are available, ensuring there's a sufficient number of copies of each, shifting between providers when rates go up or if a provider's reliability score drops too low, etc.
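A toy version of that storage manager might look like the sketch below. MemoryProvider is a stand-in for a remote provider, and its store/fetch interface is hypothetical; a real one would speak the marketplace protocol, track reliability scores and pay bounties, none of which is modelled here.

```python
class MemoryProvider:
    """Stand-in for a remote storage provider (hypothetical interface)."""

    def __init__(self, name, price):
        self.name = name
        self.price = price
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        # Returns None if the provider has lost or dropped the block.
        return self.blocks.get(block_id)


class StorageManager:
    """Simple get/store interface that keeps several copies of each block."""

    def __init__(self, providers, copies=3):
        # Prefer cheaper providers first.
        self.providers = sorted(providers, key=lambda p: p.price)
        self.copies = copies
        self.placements = {}  # block_id -> providers believed to hold it

    def store(self, block_id, data):
        chosen = self.providers[: self.copies]
        for provider in chosen:
            provider.store(block_id, data)
        self.placements[block_id] = list(chosen)

    def get(self, block_id):
        for provider in self.placements.get(block_id, []):
            data = provider.fetch(block_id)
            if data is not None:
                return data
        return None

    def repair(self, block_id):
        """Trial-download the block everywhere and replace lost copies."""
        holders = self.placements.get(block_id, [])
        alive = [p for p in holders if p.fetch(block_id) is not None]
        if not alive:
            raise LookupError(f"all copies of {block_id} are gone")
        failed = [p for p in holders if p not in alive]
        data = alive[0].fetch(block_id)
        for provider in self.providers:
            if len(alive) >= self.copies:
                break
            # Don't re-hire a provider that just lost this block.
            if provider in alive or provider in failed:
                continue
            provider.store(block_id, data)
            alive.append(provider)
        self.placements[block_id] = alive
        return len(alive)
```

Running repair periodically for every block is the "trial downloads" idea from above; a real implementation would also verify the fetched data against a remembered hash before trusting it.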

But all of this depends on it being easy for computers to send money between themselves, which is where Bitcoin comes in. Storage providers and consumers can just run Bitcoin wallets and arrange transfers between themselves.

The end result? I can run a daemon to rent out spare storage space on my system, and money would slowly accrue in a Bitcoin wallet. The daemon would rent out all but a safety margin of my space, and as I used up my safety margin, it would shed blocks (notifying the owner) to make more room, and increase its offer price in the market to reduce demand so that the lower-paying blocks move willingly and can be replaced with higher-paying blocks.

And I can run another daemon as part of my backup system, that would spend from the same bitcoin wallet to get backup space on other machines. When I have mostly empty filesystems, I will be spending little on backups, and earning lots on renting that space out, so money will accumulate... when I start to fill the filesystems up, the trickle will slowly reverse, and then perhaps I should spend my profits on a new hard disk before they all go and I have to top it up from my own Bitcoin wallet!

Details

The devil's in the details, as always. The marketplace will depend on being able to place bids in a standard format. Potential buyers will need to be able to introduce themselves, perhaps via an HTTP-based protocol served by the storage-for-hire daemon on my server; sign up for an account by registering a public key, and then access upload/download/delete block interfaces. The daemon would quote a price in the market, but each block upload would have to be annotated with the rates the buyer is offering, to avoid race conditions when rates change during a transaction. Blocks with unattractive rates can be rejected by the server. There would need to be a back channel for the server to asynchronously notify buyers that it needs to get rid of a block - I'd hate to force buyers to have public IPs (many will be behind NAT) by giving them an HTTP endpoint, but perhaps a choice of that or polling the server to ask for blocks that need to be shifted within a time limit would suffice. It would also be polite for the server to inform the buyer of any blocks it had to delete without notice, rather than waiting for them to check them.

But how to address blocks? On the one hand, I want content-addressed storage, as it prevents cheating. There's no way a bad server can claim to have blocks it's deleted by sending back random junk and saying "But that's what you gave me! PROVE I'M LYING!" if they are identified by hashes. But on the other hand, existing systems have their own addressing schemes (Ugarit identifies blocks by a keyed hash of their uncompressed plaintext contents, so that the hash doesn't give away the content (it's a keyed hash), but it will also remain unchanged if the compression or encryption algorithms are upgraded - old blocks can still be read while new blocks are written with the new algorithms, and old blocks can be re-compressed and re-encrypted without breaking the references to them). So enforcing that blocks are identified by the SHA256 of their ciphertext would exclude various uses.

The best scheme I can think of is this: each block is identified by a client-supplied ID string combined with a hash based on an agreed algorithm. So the server would say "I support SHA1, SHA256, and Tiger", and the client would say "Ok, here's a block I want to call Boris, and I like SHA256", and the server would reply with "Ok, that block's called Boris:<256-bit hash>". The client should check the returned hash matches the hash it computed itself. A client that's happy with server-assigned IDs would give all their blocks the same name (the empty string), as the hash in the resulting identifier keeps it unique. The server will store the block by hash (deduplicating blocks with the same hash), but keep a per-customer table mapping names to hashes. If the client hasn't provided distinct names, then the LAST mapping for the name provided is kept.
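Here is a small sketch of that naming scheme using Python's hashlib. The BlockStore class and upload_checked helper are made up for illustration, and only SHA1 and SHA256 are offered because Tiger is not guaranteed to be available in hashlib. It also glosses over what happens if a client-supplied name itself contains a colon.

```python
import hashlib

# Algorithms this toy server is willing to use (Tiger omitted, see above).
SUPPORTED = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


class BlockStore:
    """Server side: blocks stored by hash, plus per-customer name tables."""

    def __init__(self):
        self.blocks = {}  # (algorithm, hex digest) -> data, deduplicated
        self.names = {}   # customer -> {name: (algorithm, hex digest)}

    def upload(self, customer, name, algorithm, data):
        if algorithm not in SUPPORTED:
            raise ValueError(f"unsupported hash: {algorithm}")
        key = (algorithm, SUPPORTED[algorithm](data).hexdigest())
        self.blocks.setdefault(key, data)
        # Re-using a name just overwrites the mapping: the LAST upload wins.
        self.names.setdefault(customer, {})[name] = key
        return f"{name}:{key[1]}"

    def by_name(self, customer, name):
        return self.blocks[self.names[customer][name]]

    def by_hash(self, algorithm, digest):
        return self.blocks[(algorithm, digest)]


def upload_checked(store, customer, name, data, algorithm="sha256"):
    """Client side: upload, then check the server's hash against our own."""
    block_id = store.upload(customer, name, algorithm, data)
    expected = f"{name}:{hashlib.new(algorithm, data).hexdigest()}"
    if block_id != expected:
        raise ValueError("server returned an unexpected block identifier")
    return block_id
```

A client happy with server-assigned IDs would pass the empty string as the name every time and keep only the returned hashes.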

Meanwhile, on retrieval, a block can be requested by name, or by hash. The client should remember the hashes, even if it uses names, so that it can check that the server isn't sending it a garbage block.

As a Ugarit backend, this would work fine; the Ugarit keyed hash can be used as the name, and the server's hash stored for cross-checking on retrieval. If the local store is lost due to disaster, it could either be restored from another backup somehow, or it could just be skipped and we hope that the servers don't lie to us (the latter would be better than refusing to try to restore at all!). Ugarit tags (which are the roots of the hash tree) can be stored by using the tag name as a block name, and using the fact that multiple uploads with the same block name just overwrite the name->hash mapping.

Needless to say, clients should encrypt ALL their data! You can't trust random providers.

Have I missed any other scams? Servers might try to accept lots of blocks, keep the upload fees, and never actually store the blocks. That provides an incentive to servers to not charge upload fees at all, and just hope to make money on download fees and/or storage. It'll be interesting to see how the market ends up structuring itself! Also, as it's a low risk to accept data from somebody but a high risk to send money, I think the protocol should be based around periodic billing at the end of the period, rather than per-operation micropayments (that makes more efficient use of Bitcoin's transaction charge and hour-long transaction confirmation latency, too). Billing periods could be anything from a day upwards.

But this is a real cloud, in a sense far beyond the current definition of cloud computing. Millions of tiny providers, all competing in a marketplace, with the clients automatically spreading their risk across them in a fine-grained way. I think that'd work for storage, as it's easy to define and commoditise; doing it for computation might be possible, but it'd require much more standardisation of execution models and sandboxes and the like...

(Thanks to the folks in #bitcoin on Freenode IRC for inspiration for all this!)

UPDATE: A friend suggests an improvement over periodic downloads to check the data is still there. Have a "check" operation where the client supplies a random key and a block name or hash, and the server has to hash the block along with the key and return the result. That allows the client to check the block is still there if it has a way to get a local copy of the block. Otherwise, it would still have to rely on downloading the block and checking the hash matches.
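A minimal sketch of that check operation, assuming HMAC-SHA256 as the keyed hash (the suggestion doesn't specify one) and with function names of my own choosing:

```python
import hashlib
import hmac
import secrets


def prove_possession(block, key):
    """Server side: hash the stored block together with the client's key."""
    return hmac.new(key, block, hashlib.sha256).digest()


def verify_possession(local_block, ask_server):
    """Client side: challenge the server with a fresh random key.

    ask_server is any callable that takes the key and returns the server's
    answer; in real life it would be a network request naming the block.
    """
    key = secrets.token_bytes(32)
    expected = hmac.new(key, local_block, hashlib.sha256).digest()
    return hmac.compare_digest(expected, ask_server(key))
```

Because the key is fresh each time, the server can't precompute the answer and throw the block away; it has to hold the actual data to respond correctly. Only a few bytes cross the network rather than the whole block.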

Lords of a new economy

Pondering Bitcoin, I recently opined:

Who sets the difficulty of the puzzle and all that? The computers in the network do - when the system was created, rules were agreed, and written into the software. As everyone runs software following those rules, anybody solving easier puzzles or trying to award themselves more bounty for doing so will have their bounty-claiming transaction rejected as invalid. To loosen the rules, a majority of the computers in the system will all need to accept the new rules - so it will require consensus from the community.

I've been thinking more about this.

What does it really mean to say a transaction is "accepted" in Bitcoin? I want to pay somebody 10BTC for a piece of computer hardware. So my bitcoin client software looks over my bitcoin addresses, and assembles a transaction stating that I take money from a bunch of previous transactions that sent money to my addresses and send 10BTC to one of the recipient's addresses, and the change to a newly-minted address of my own. That's signed by my private keys, to prove I own the addresses the money is coming from, and fired off into the Bitcoin network. Which, in practice, means it's broadcast so the whole world sees it.
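The assembly step can be sketched roughly as follows. This is a simplification for illustration, not how any real Bitcoin client is structured: the Unspent record, the largest-first selection and the dictionary layout are all assumptions, amounts are plain integers in some smallest unit, and signing and transaction fees are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unspent:
    """A previous transaction output that paid one of my addresses."""
    txid: str
    index: int
    amount: int  # integer count of the smallest currency unit


def assemble(unspent, pay_to, amount, change_address):
    """Pick enough previous outputs to cover amount; send the rest back."""
    selected, total = [], 0
    # Largest-first selection: simple, though not what every client does.
    for u in sorted(unspent, key=lambda u: u.amount, reverse=True):
        if total >= amount:
            break
        selected.append(u)
        total += u.amount
    if total < amount:
        raise ValueError("insufficient funds")
    outputs = [(pay_to, amount)]
    if total > amount:
        # The change goes to a newly-minted address of my own.
        outputs.append((change_address, total - amount))
    return {
        "inputs": [(u.txid, u.index) for u in selected],
        "outputs": outputs,
    }
```

The inputs must be spent whole, which is why the change output exists: the sum of the inputs rarely matches the payment exactly.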

Now, anybody seeing that can check it's valid, by looking at the global transaction history to see if the source transactions were valid, and that the transaction is signed to prove that I was the owner of the addresses those transactions paid into, and so on. So the recipient knows I'm sending them the money within a few seconds. However, there's a number of frauds I could be committing, including taking advantage of network delays to spend the same money twice - which won't be detected until the second transaction also arrives at the recipient and they realise they've been duped. So there's more to it than just that.

You can consider money to have "cleared" into your address if, and only if, other recipients will accept you transferring that money to them as valid. So if some of your balance is from a dodgy transaction, and you decide to try and spend it anyway, then the new recipient should reject that.

So to build a global standard of "accepted transaction", we have the bitcoin miners. They all run software that checks the validity of transactions, and assembles groups of accepted transactions, and then invests significant effort into demonstrating that they agree with it. This means that an onlooker can tell that the majority of the miners agree on the validity of a transaction if lots of proof of it accumulates. Currently, the standard is that this involves about an hour's total computation from the majority of the miners in the system. If your transaction has withstood that much scrutiny, then it's considered "baked in", and transactions spending the money you received in that transaction will now be considered valid in turn - in other words, you can now spend the money. The miners' reward for providing this service is that they are allowed, subject to certain constraints, to sneak in their own transactions that let them create money from nothing and give it to themselves; and they also get any transaction fees that were provided with the transactions they accept.

So "validity" all boils down to what transactions more than 50% of the miners will accept. If there was a bug in the software that let a transaction which created money from nothing count as valid, then the miners would accept it as valid and it would be baked into global history. And therefore the senders of those transactions could magic money into existence. Not cool.

But if that bug was found, and a new release of the software rushed out, then that loophole would be closed as soon as more than 50% of the miners ran that software. Indeed, if such a bug were found, many miners would probably stop mining - as contributing to devaluing the economy would reduce the value of the bitcoins they can award themselves for doing the mining - until the patch was in place; bitcoin transactions would just hang in limbo until it was ready.

Nonetheless, it seems that a lot of scrutiny needs to be applied to new versions of the rules miners use to validate transactions in case they have loopholes. As they are the rules of the Bitcoin economy. And a LOT of scrutiny needs to be given to the implementation of those rules, which is where it's really easy to let bugs in.

Changing the Rules

For instance, as there's a cap of about 21 million bitcoin in existence, and each bitcoin can be divided into at most a hundred million little pieces, there's a constraint that you can never spend an amount smaller than a 21*10^14th of the total value of the economy. When we're a universe-spanning colony of post-singularity time-bending superbeings, that might be an issue.

But it could be fixed. Just create a new currency, within the same bitcoin infrastructure (but with its own transaction types for transacting it). And then create a new kind of transaction which splits N bitcoins into N*2^128 "minicoins", and another which does the reverse. You'd extend the "is this transaction valid?" routine to now accept four kinds of transactions: traditional bitcoin ones, new minicoin ones, and the two directions of exchange transactions. If everyone agreed that was a good idea, and all the implementations implemented it, and everyone installed the new implementations, then after a while, people would start to find their test transactions attempting to split a bitcoin up would be accepted by the global mining community. At which point, the new change would be "in". The changeover to the new implementation should be done at the same time across all the miners if possible (perhaps at an agreed block number), as otherwise, miners running the new rules before a majority of the mining capacity does would sometimes win blocks that don't get accepted by the majority, losing their bounty (and, therefore, having wasted their time).

A similar process could be followed to change the rules for generation transactions; perhaps the bounty rules might be changed so that the system will no longer cap at 21 million BTC, but grow with the size of the economy or something.

Which is very interesting.

Making the change would require consensus amongst the implementation teams (currently, there's only really one, with Freecoin hopefully soon being another) followed by consensus amongst the miners to install the new implementation.

Compare that to the analogous process for national currencies: governments can choose to print more or less money as they see fit, while a tangled combination of laws and banks generally define the rules for transferring funds. The Bitcoin economy is, largely, defined by the miners and what transactions they choose to accept. They know that if they change their rules to unfairly benefit themselves over non-miners, then they can't stop more people joining up and becoming miners too, dissolving their advantage with more competition; and if they ruin the Bitcoin economy, everyone will start a new one with saner miners, and their mined bitcoins will become worthless. There's little incentive for an individual miner to accept transactions that other miners wouldn't; because if they win a block with such a transaction in and try to claim the bounty, the others would reject their block due to the bad transaction, and they wouldn't get their bounty accepted. It's in nobody's best interests to vary the rules without gaining a consensus between the implementers first, so any change to the rules will necessarily be rather conservative and careful.

But, the bitcoin economy needs to be careful. Don't let any one miner (or mining pool) get too close to 50% of the hashing capacity. And get more competing implementations of the rules in place. Bugs in the system would hit confidence in the economy hard, even if they were fixed rapidly. (Also, rolling out an emergency bug fix would probably be the easiest way for an attacker to try and slip a new bug in with insufficient review).

And... back to politics

It's not often that you get to see anarcho-capitalism and enlightened self interest having such free rein of expression; and it will be interesting to see how it pans out.

Currently, powerful vested interests (largely, big business) have found ways to lobby governments to do things in ways that benefit them. What will happen if bitcoin becomes a significant proportion of the world economy? There will be cries that we can't let the world be run by a bunch of nerds. Perhaps countries will enact rules that bitcoin miners on their soil have to run approved software, and those countries will form an international committee to decide what rules that approved software should run. Or perhaps private bitcoin mining will be made illegal, and nations will set up their own supercomputers to dominate the mining capacity, with their own rules controlling the money supply. I can see that happening; and then anti-money-laundering rules (transactions above a certain amount need to be signed with an X.509 identity or similar?) will be introduced. But that will, at worst, just cause a fork of the chain, as people who want an unregulated economy will just go off on their own separate way with the old rules.

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales