
Cloud Storage

Currently, you can go to various providers and buy online storage capacity (IMHO, rsync.net is the best; I researched the options when looking for an offsite backup host for work). It's more expensive than a hard disk in your computer, and miles slower, but it has one brilliant advantage: it's remote. So it's perfect for backups.

And that's the heart of a free market - storage is cheap for the cloud providers (they just buy disks, and in bulk at that), but their storage is worth more to you than your own because of its remoteness. So they can rent it to you at a markup, you get a benefit, and everyone is happy. Money flows, the economy grows, and one day we'll get to have affordable space tourism et cetera.

But large, centralised cloud storage providers are attractive targets for people who want to steal data. They are also single points of failure; if one goes bankrupt, lots of people lose their backups. Therefore, it's smart to back up to more than one of them, just in case. But that means setting up your systems to talk to each one's interfaces, arranging payment and agreeing to terms and conditions with each of them individually, and so on.

Surely this state of affairs can be improved? With ADVANCED TECHNOLOGY?

Well, I think it can, and here's how.

Imagine a marketplace for cloud storage. This might be a centralised trading server, or it might be a peer-to-peer protocol... greater minds than I are working on decentralised P2P marketplaces, I hope. But however it's implemented, imagine that I can run a daemon on my server that measures my free disk space, subtracts some amount (10GiB?) for my short-term growth, and rents the rest out on the marketplace. By looking at the depth of market (how many unfulfilled bids for how much storage are out there, ordered by bidding price, highest first), it can choose the highest price at which it can still rent out all of my available storage. My offer will include a price to upload a block (base price + price per byte), a price to keep a block (base price + price per byte, per stated billing period) and a price to download a block (base price + price per byte).
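A minimal sketch of how such a daemon might pick its asking price. Everything here is an assumption for illustration: the class and function names are hypothetical, bids are simplified to a single price per GiB per billing period plus a quantity, and any bid is assumed to be fillable in full. The 10GiB margin is the figure floated above.

```python
from __future__ import annotations

from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Rate:
    """Two-part tariff: a fixed charge per block plus a charge per byte."""
    base: float
    per_byte: float

    def cost(self, nbytes: int) -> float:
        return self.base + self.per_byte * nbytes


@dataclass(frozen=True)
class Offer:
    """The three prices a (hypothetical) offer would quote."""
    upload: Rate
    keep: Rate           # charged once per billing period
    keep_period_s: int   # length of that billing period
    download: Rate


@dataclass(frozen=True)
class Bid:
    price_per_gib: float  # what a buyer pays to keep 1 GiB for one period
    gib: float            # how much storage that buyer wants


def rentable_gib(free_bytes: int, margin_gib: float = 10) -> float:
    """Free space minus a safety margin for my own short-term growth."""
    return max(0.0, free_bytes / GIB - margin_gib)


def choose_ask(bids: list[Bid], available_gib: float) -> float | None:
    """Highest price at which the bids at or above it absorb all my space.

    Walk the book from the highest bid down, accumulating demand; the bid at
    which cumulative demand first covers my capacity sets the price. If the
    whole book can't absorb my space, fall back to the lowest bid price.
    """
    if not bids or available_gib <= 0:
        return None
    demand = 0.0
    for bid in sorted(bids, key=lambda b: b.price_per_gib, reverse=True):
        demand += bid.gib
        if demand >= available_gib:
            return bid.price_per_gib
    return min(b.price_per_gib for b in bids)
```

With bids of 100GiB at 0.05, 200GiB at 0.04 and 500GiB at 0.03, a server with 250GiB to spare would ask 0.04: asking 0.05 would leave 150GiB unrented.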

It's an interesting question whether periodic storage fees or just a "successful download bounty" will win out. Charging storage fees encourages buyers to notify you when they no longer want a block, but charging only for successful downloads (and, when space runs short, deleting whichever blocks have gone longest without being used, on an LRU basis) is beautifully simple.
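To show how little machinery the download-bounty model needs, here is a hypothetical LRU store (the class name and byte-capacity accounting are assumptions, not part of any real daemon). Uploads and downloads both count as "use"; when a new block doesn't fit, the least recently used blocks are dropped until it does, and their IDs are returned so the server could tell their owners.

```python
from __future__ import annotations

from collections import OrderedDict


class BountyOnlyStore:
    """Keeps blocks until space runs out, then drops the least recently used.

    Nobody pays for idle blocks in this model, so the server simply evicts
    whatever has gone longest without being uploaded or downloaded.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks: OrderedDict[str, bytes] = OrderedDict()  # oldest first

    def store(self, block_id: str, data: bytes) -> list[str]:
        """Store a block; return the IDs of blocks evicted to make room."""
        if len(data) > self.capacity:
            raise ValueError("block is larger than the whole store")
        if block_id in self.blocks:  # a re-upload replaces the old copy
            self.used -= len(self.blocks.pop(block_id))
        evicted = []
        while self.used + len(data) > self.capacity:
            old_id, old_data = self.blocks.popitem(last=False)
            self.used -= len(old_data)
            evicted.append(old_id)
        self.blocks[block_id] = data
        self.used += len(data)
        return evicted

    def download(self, block_id: str) -> bytes | None:
        """Serve a block (earning the bounty) and mark it recently used."""
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)
        return data
```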

The trust model is rather different from that of normal cloud providers. If a provider loses my data, I can't sue them; I just don't pay them the download bounty for getting my block back. So I'll have to store my data widely across several providers (and prices will fall to take account of that), do trial downloads from time to time to check my blocks are still available, and, if one isn't, hire a new storage provider to take a new copy of that block from a surviving copy.

But all of this can be done in software. A storage manager app would present a simple get/store block interface to, e.g., Ugarit or Tahoe-LAFS, but behind the scenes it would manage relationships with providers: checking blocks are available, ensuring there's a sufficient number of copies of each, shifting between providers when rates go up or if a provider's reliability score drops too low, and so on.
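A sketch of the storage manager's repair loop, assuming blocks are keyed by the SHA-256 of their contents and that providers expose simple put/get calls. The `Provider` class is a hypothetical in-memory stand-in for a remote host. The loop does a trial download from each holder, discards holders that lost or corrupted the block, and copies a surviving replica to spare providers until the target count is met. Reliability scores and price-driven migration are left out.

```python
from __future__ import annotations

import hashlib


class Provider:
    """In-memory stand-in for a remote storage provider."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.blocks[key] = data

    def get(self, key: str) -> bytes | None:
        return self.blocks.get(key)


def repair(key: str, holders: list[Provider], spares: list[Provider],
           copies: int) -> list[Provider]:
    """Trial-download `key` from each holder and restore the replica count.

    `key` is assumed to be the hex SHA-256 of the block, so a holder that
    returns nothing, or data with the wrong hash, is dropped. A good copy is
    then uploaded to spare providers until `copies` replicas exist.
    Returns the providers now believed to hold the block.
    """
    survivors: list[Provider] = []
    good: bytes | None = None
    for p in holders:
        data = p.get(key)
        if data is not None and hashlib.sha256(data).hexdigest() == key:
            survivors.append(p)
            good = data
    if good is None:
        raise RuntimeError(f"block {key} has been lost at every provider")
    for p in spares:
        if len(survivors) >= copies:
            break
        if p not in survivors:
            p.put(key, good)
            survivors.append(p)
    return survivors
```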

But all of this depends on it being easy for computers to send money between themselves, which is where Bitcoin comes in. Storage providers and consumers can just run bitcoin wallets and arrange transfers between themselves.

The end result? I can run a daemon to rent out spare storage space on my system, and money would slowly accrue in a Bitcoin wallet. The daemon would rent out all but a safety margin of my space, and as I used up my safety margin, it would shed blocks (notifying the owner) to make more room, and increase its offer price in the market to reduce demand so that the lower-paying blocks move willingly and can be replaced with higher-paying blocks.

And I can run another daemon as part of my backup system, that would spend from the same bitcoin wallet to get backup space on other machines. When I have mostly empty filesystems, I will be spending little on backups, and earning lots on renting that space out, so money will accumulate... when I start to fill the filesystems up, the trickle will slowly reverse, and then perhaps I should spend my profits on a new hard disk before they all go and I have to top it up from my own Bitcoin wallet!

Details

The devil's in the details, as always. The marketplace will depend on being able to place bids in a standard format. Potential buyers will need to be able to introduce themselves, perhaps via an HTTP-based protocol served by the storage-for-hire daemon on my server: sign up for an account by registering a public key, and then access upload/download/delete block interfaces. The daemon would quote a price in the market, but each block upload would have to be annotated with the rates the buyer is offering, to avoid race conditions when rates change during a transaction; blocks with unattractive rates can be rejected by the server. There would also need to be a back channel for the server to asynchronously notify buyers that it needs to get rid of a block. I'd hate to force buyers to have public IPs (many will be behind NAT) by requiring them to expose an HTTP endpoint, but perhaps a choice between that and polling the server for blocks that need to be moved within a time limit would suffice. It would also be polite for the server to inform the buyer of any blocks it had to delete without notice, rather than waiting for the buyer to discover the loss.
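The server side of that exchange might look roughly like this. It's a hypothetical sketch with no networking or signature checking: an upload carries the rate the buyer is offering, the server accepts it only if that meets the current ask, and the accepted rate is pinned to the block so a later change in the ask doesn't affect it. Eviction notices go into a per-buyer queue that buyers behind NAT can poll.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class HireServer:
    """Hypothetical storage-for-hire server (no networking, no signatures)."""
    ask_keep_per_gib: float  # price currently quoted in the market
    accounts: dict[str, str] = field(default_factory=dict)  # id -> public key
    pinned: dict[tuple[str, str], float] = field(default_factory=dict)
    notices: dict[str, list[tuple[str, float]]] = field(default_factory=dict)

    def register(self, account: str, public_key: str) -> None:
        self.accounts[account] = public_key
        self.notices.setdefault(account, [])

    def upload(self, account: str, block_id: str, offered_per_gib: float) -> bool:
        """Accept only if the rate annotated on the upload meets the ask.

        The accepted rate is pinned to the block, so a change in the ask
        halfway through a transaction can't alter what was agreed.
        """
        if account not in self.accounts:
            raise PermissionError("unknown account")
        if offered_per_gib < self.ask_keep_per_gib:
            return False
        self.pinned[(account, block_id)] = offered_per_gib
        return True

    def request_move(self, account: str, block_id: str, grace_s: float) -> None:
        """Queue a notice asking the buyer to move a block before a deadline."""
        self.notices[account].append((block_id, time.time() + grace_s))

    def poll(self, account: str) -> list[tuple[str, float]]:
        """Buyers fetch (and clear) their pending notices."""
        pending, self.notices[account] = self.notices[account], []
        return pending
```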

But how to address blocks? On the one hand, I want content-addressed storage, as it prevents cheating: if blocks are identified by their hashes, there's no way a bad server can claim to still have a block it's deleted by sending back random junk and saying "But that's what you gave me! PROVE I'M LYING!". On the other hand, existing systems have their own addressing schemes. Ugarit identifies blocks by a keyed hash of their uncompressed plaintext contents. Because the hash is keyed, it doesn't give away the content; and because it covers the plaintext, it remains unchanged if the compression or encryption algorithms are upgraded, so old blocks can still be read while new blocks are written with the new algorithms, and old blocks can be re-compressed and re-encrypted without breaking the references to them. So insisting that blocks are identified by the SHA256 of their ciphertext would exclude various uses.

The best scheme I can think of is this: each block is identified by a client-supplied ID string combined with a hash based on an agreed algorithm. So the server would say "I support SHA1, SHA256, and Tiger", and the client would say "Ok, here's a block I want to call Boris, and I like SHA256", and the server would reply with "Ok, that block's called Boris:<256-bit hash>". The client should check the returned hash matches the hash it computed itself. A client that's happy with server-assigned IDs would give all their blocks the same name (the empty string), as the hash in the resulting identifier keeps it unique. The server will store the block by hash (deduplicating blocks with the same hash), but keep a per-customer table mapping names to hashes. If the client hasn't provided distinct names, then the LAST mapping for the name provided is kept.

Meanwhile, on retrieval, a block can be requested by name, or by hash. The client should remember the hashes, even if it uses names, so that it can check that the server isn't sending it a garbage block.
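The naming scheme and retrieval rules above can be sketched as follows. The class, the method names and the choice of SHA-1/SHA-256 as the supported algorithms are illustrative assumptions. Blocks are stored once per hash, each customer gets a name-to-hash table where the last upload under a name wins, and the client re-hashes whatever comes back.

```python
from __future__ import annotations

import hashlib

SUPPORTED = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


class BlockServer:
    """Stores each distinct block once, keyed by hash, plus a per-customer
    table mapping client-chosen names to hashes."""

    def __init__(self) -> None:
        self.by_hash: dict[str, bytes] = {}
        self.names: dict[str, dict[str, str]] = {}

    def put(self, customer: str, name: str, algo: str, data: bytes) -> str:
        if algo not in SUPPORTED:
            raise ValueError(f"unsupported hash algorithm: {algo}")
        digest = SUPPORTED[algo](data).hexdigest()
        self.by_hash[digest] = data                         # deduplicated
        self.names.setdefault(customer, {})[name] = digest  # last mapping wins
        return f"{name}:{digest}"

    def get(self, customer: str, name: str | None = None,
            digest: str | None = None) -> bytes | None:
        """Fetch by hash if given, otherwise by this customer's name.
        (No per-customer ownership check on hash lookups in this sketch.)"""
        if digest is None:
            digest = self.names.get(customer, {}).get(name)
        return self.by_hash.get(digest) if digest else None


def fetch_verified(server: BlockServer, customer: str, name: str,
                   algo: str, expected: str) -> bytes:
    """Client side: fetch by name, but check against the remembered hash."""
    data = server.get(customer, name=name)
    if data is None or SUPPORTED[algo](data).hexdigest() != expected:
        raise ValueError("block missing or not what we uploaded")
    return data
```

Tag-style updates fall out of the "last mapping wins" rule: uploading a new root under the same name simply repoints the name.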

As a Ugarit backend, this would work fine; the Ugarit keyed hash can be used as the name, and the server's hash stored for cross-checking on retrieval. If the local store is lost due to disaster, it could either be restored from another backup somehow, or it could just be skipped and we hope that the servers don't lie to us (the latter would be better than refusing to try to restore at all!). Ugarit tags (which are the roots of the hash tree) can be stored by using the tag name as a block name, and using the fact that multiple uploads with the same block name just overwrite the name->hash mapping.

Needless to say, clients should encrypt ALL their data! You can't trust random providers.

Have I missed any other scams? Servers might accept lots of blocks, keep the upload fees, and then not bother storing the blocks. That gives honest servers an incentive not to charge upload fees at all, and just hope to make money on download fees and/or storage. It'll be interesting to see how the market ends up structuring itself! Also, as it's low-risk to accept data from somebody but high-risk to send money, I think the protocol should be based around billing at the end of each period, rather than per-operation micropayments (that also makes more efficient use of Bitcoin, given its transaction fees and hour-long confirmation latency). Billing periods could be anything from a day upwards.
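A sketch of end-of-period billing, assuming the two-part rates (base + per byte) described earlier. The `Ledger` class is hypothetical, and amounts are plain floats for brevity where real code would use integer units. Operations are metered as they happen, and each customer receives one invoice, settled by one payment, when the period closes.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    base: float      # fixed charge per operation
    per_byte: float  # plus this much per byte

    def cost(self, nbytes: int) -> float:
        return self.base + self.per_byte * nbytes


class Ledger:
    """Meters customers' operations during a billing period and settles
    once at the end, instead of paying per operation."""

    def __init__(self, upload: Rate, download: Rate) -> None:
        self.rates = {"upload": upload, "download": download}
        self.owed: defaultdict[str, float] = defaultdict(float)

    def record(self, customer: str, op: str, nbytes: int) -> None:
        # The server has already done the work, so it carries the credit
        # risk until the period closes.
        self.owed[customer] += self.rates[op].cost(nbytes)

    def close_period(self) -> dict[str, float]:
        """One invoice (one on-chain payment) per customer; start afresh."""
        invoices, self.owed = dict(self.owed), defaultdict(float)
        return invoices
```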

But this is a real cloud, in a sense far beyond the current definition of cloud computing. Millions of tiny providers, all competing in a marketplace, with the clients automatically spreading their risk across them in a fine-grained way. I think that'd work for storage, as it's easy to define and commoditise; doing it for computation might be possible, but it'd require much more standardisation of execution models and sandboxes and the like...

(Thanks to the folks in #bitcoin on Freenode IRC for inspiration for all this!)

UPDATE: A friend suggests an improvement over periodic downloads to check the data is still there. Have a "check" operation where the client supplies a random key and a block name or hash, and the server has to hash the block along with the key and return the result. That allows the client to check the block is still there if it has a way to get a local copy of the block. Otherwise, it would still have to rely on downloading the block and checking the hash matches.
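The check operation is a simple challenge-response. In this sketch, HMAC-SHA256 stands in for "hash the block along with the key" (my choice of construction; the suggestion doesn't specify one), and the function names are hypothetical. Because the client picks a fresh random key each time, the server can't answer from a stored digest; it needs the block itself.

```python
import hashlib
import hmac
import os


def server_answer(stored_block: bytes, key: bytes) -> str:
    """Server side: keyed hash of the block it claims to hold."""
    return hmac.new(key, stored_block, hashlib.sha256).hexdigest()


def client_check(local_copy: bytes, key: bytes, answer: str) -> bool:
    """Client side: recompute from its own copy and compare in constant time."""
    expected = hmac.new(key, local_copy, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, answer)


def new_challenge() -> bytes:
    """A fresh random key per check, so old answers can't be replayed."""
    return os.urandom(32)
```

A client would generate a new challenge for each check, send it with the block's name or hash, and compare the server's answer against one computed from its local copy.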

Lords of a new economy

Pondering Bitcoin, I recently opined:

Who sets the difficulty of the puzzle and all that? The computers in the network do - when the system was created, rules were agreed, and written into the software. As everyone runs software following those rules, anybody solving easier puzzles or trying to award themselves more bounty for doing so will have their bounty-claiming transaction rejected as invalid. To loosen the rules, a majority of the computers in the system will all need to accept the new rules - so it will require consensus from the community.

I've been thinking more about this.

What does it really mean to say a transaction is "accepted" in Bitcoin? Say I want to pay somebody 10BTC for a piece of computer hardware. My bitcoin client software looks over my bitcoin addresses and assembles a transaction stating that I take money from a bunch of previous transactions that sent money to my addresses, and send 10BTC to the seller's address and the change to a newly-minted address of my own. That's signed with my private keys, to prove I own the addresses the money is coming from, and fired off into the Bitcoin network. Which, in practice, means it's broadcast so the whole world sees it.
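Assembling that transaction is mostly a matter of picking which earlier outputs to spend. Here's a simplified sketch: largest-coin-first selection (one strategy among many), amounts in satoshis, and no signing or script handling. The `Coin` type and `build_payment` name are hypothetical. As in Bitcoin, any fee is implicit: it's whatever the inputs cover beyond the outputs.

```python
from __future__ import annotations

from dataclasses import dataclass

BTC = 100_000_000  # satoshis per bitcoin


@dataclass(frozen=True)
class Coin:
    """An unspent output of an earlier transaction, paid to one of my addresses."""
    txid: str
    index: int
    address: str
    amount: int  # satoshis


def build_payment(coins: list[Coin], payee: str, amount: int,
                  change_address: str, fee: int = 0
                  ) -> tuple[list[Coin], list[tuple[str, int]]]:
    """Choose inputs largest-first until they cover amount + fee.

    Returns (inputs, outputs). Whatever the inputs exceed the outputs by
    goes to the miner, so change must be sent back explicitly.
    """
    needed = amount + fee
    chosen: list[Coin] = []
    total = 0
    for coin in sorted(coins, key=lambda c: c.amount, reverse=True):
        if total >= needed:
            break
        chosen.append(coin)
        total += coin.amount
    if total < needed:
        raise ValueError("insufficient funds")
    outputs = [(payee, amount)]
    if total > needed:
        outputs.append((change_address, total - needed))
    return chosen, outputs
```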

Now, anybody seeing that can check it's valid, by looking at the global transaction history to see if the source transactions were valid, and that the transaction is signed to prove that I was the owner of the addresses those transactions paid into, and so on. So the recipient knows I'm sending them the money within a few seconds. However, there are a number of frauds I could be committing, including taking advantage of network delays to spend the same money twice - which won't be detected until the second transaction also arrives at the recipient and they realise they've been duped. So there's more to it than just that.

You can consider money to have "cleared" into your address if, and only if, other recipients will accept you transferring that money to them as valid. So if some of your balance is from a dodgy transaction, and you decide to try and spend it anyway, then the new recipient should reject that.

So to build a global standard of "accepted transaction", we have the bitcoin miners. They all run software that checks the validity of transactions, assembles groups of accepted transactions, and then invests significant effort in demonstrating that they agree with them. This means that an onlooker can tell that the majority of the miners agree on the validity of a transaction if lots of proof of it accumulates. Currently, the standard is that this involves about an hour's total computation from the majority of the miners in the system. If your transaction has withstood that much scrutiny, then it's considered "baked in", and transactions spending the money you received in that transaction will now be considered valid in turn - in other words, you can now spend the money. The miners' reward for providing this service is that they are allowed, subject to certain constraints, to sneak in their own transactions that let them create money from nothing and give it to themselves; and they also get any transaction fees that were provided with the transactions they accept.
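A toy model of the two ideas above: a node that tracks unspent outputs rejects a second spend of the same output, and a payment is treated as cleared once it's buried under enough blocks. This sketch skips signatures, scripts and forks; the class and function names are hypothetical, and the six-block threshold is the usual rule of thumb matching "about an hour" at roughly ten minutes per block.

```python
from __future__ import annotations

Outpoint = tuple[str, int]  # (txid, output index)


class Validator:
    """Tracks unspent outputs; a second spend of the same output is rejected."""

    def __init__(self, utxos: dict[Outpoint, int]) -> None:
        self.utxos = dict(utxos)  # outpoint -> amount in satoshis

    def accept(self, txid: str, inputs: list[Outpoint], outputs: list[int]) -> bool:
        if len(set(inputs)) != len(inputs):
            return False  # same output spent twice within one transaction
        if any(op not in self.utxos for op in inputs):
            return False  # unknown, or already spent
        if sum(outputs) > sum(self.utxos[op] for op in inputs):
            return False  # trying to create money from nothing
        for op in inputs:
            del self.utxos[op]
        for i, amount in enumerate(outputs):
            self.utxos[(txid, i)] = amount
        return True


def confirmations(tx_height: int, tip_height: int) -> int:
    """The block containing the transaction counts as the first confirmation."""
    return tip_height - tx_height + 1


def cleared(tx_height: int, tip_height: int, required: int = 6) -> bool:
    return confirmations(tx_height, tip_height) >= required
```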

So "validity" all boils down to what transactions more than 50% of the miners will accept. If there was a bug in the software that let a transaction which created money from nothing count as valid, then the miners would accept it as valid and it would be baked into global history. And therefore the senders of those transactions could magic money into existence. Not cool.

But if that bug was found, and a new release of the software rushed out, then that loophole would be closed as soon as more than 50% of the miners ran that software. Indeed, if such a bug were found, many miners would probably stop mining - as contributing to devaluing the economy would reduce the value of the bitcoins they can award themselves for doing the mining - until the patch was in place; bitcoin transactions would just hang in limbo until it was ready.

Nonetheless, it seems that a lot of scrutiny needs to be applied to new versions of the rules miners use to validate transactions in case they have loopholes. As they are the rules of the Bitcoin economy. And a LOT of scrutiny needs to be given to the implementation of those rules, which is where it's really easy to let bugs in.

Changing the Rules

For instance, as there's a cap of about 21 million bitcoins in existence, and each bitcoin can be divided into at most a hundred million little pieces (the smallest unit is 0.00000001 BTC), there's a constraint that you can never spend an amount smaller than one 2.1*10^15th of the total value of the economy. When we're a universe-spanning colony of post-singularity time-bending superbeings, that might be an issue.

But it could be fixed. Just create a new currency within the same bitcoin infrastructure (but with its own transaction types for transacting it), and then create a new kind of transaction which splits N bitcoins into N*2^128 "minicoins", and another which does the reverse. You'd extend the "is this transaction valid?" routine to accept four kinds of transaction: traditional bitcoin ones, new minicoin ones, and the two directions of exchange transaction. If everyone agreed that was a good idea, all the implementations implemented it, and everyone installed the new implementations, then after a while people would start to find that their test transactions attempting to split a bitcoin up were accepted by the global mining community. At that point, the new change would be "in". The changeover to the new implementation should be done at the same time across all the miners if possible (perhaps at an agreed block number), as otherwise miners running the new rules before a majority of the mining capacity does would sometimes win blocks that don't get accepted by the majority, losing their bounty (and therefore having wasted their time).
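The extended validity routine could be sketched like this. It's purely illustrative: the activation height is a made-up number, each transaction is reduced to amounts in and out, and the 2^128 ratio is applied per smallest bitcoin unit (an assumption to keep the arithmetic in integers). Before the agreed block, only ordinary bitcoin transactions are valid, so old and new nodes agree until the switch-over.

```python
from __future__ import annotations

from dataclasses import dataclass

ACTIVATION_HEIGHT = 200_000  # hypothetical agreed switch-over block number
MINI_PER_UNIT = 2 ** 128     # minicoins per smallest bitcoin unit (assumed)


@dataclass(frozen=True)
class Tx:
    kind: str        # "btc", "mini", "split" (bitcoin -> minicoin) or "join"
    amount_in: int   # consumed, in the input currency's smallest unit
    amount_out: int  # created, in the output currency's smallest unit


def is_valid(tx: Tx, height: int) -> bool:
    """The extended "is this transaction valid?" routine (amounts only)."""
    if tx.kind == "btc":
        return tx.amount_out <= tx.amount_in
    if height < ACTIVATION_HEIGHT:
        return False  # under the old rules the new kinds simply don't exist
    if tx.kind == "mini":
        return tx.amount_out <= tx.amount_in
    if tx.kind == "split":
        return tx.amount_out == tx.amount_in * MINI_PER_UNIT
    if tx.kind == "join":
        return tx.amount_in == tx.amount_out * MINI_PER_UNIT
    return False
```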

A similar process could be followed to change the rules for generation transactions; perhaps the bounty rules might be changed so that the system will no longer cap at 21 million BTC, but grow with the size of the economy or something.

Which is very interesting.

Making the change would require consensus amongst the implementation teams (currently, there's only really one, with Freecoin hopefully soon being another) followed by consensus amongst the miners to install the new implementation.

Compare that to the analogous process for national currencies: governments can choose to print more or less money as they see fit, while a tangled combination of laws and banks generally defines the rules for transferring funds. The Bitcoin economy is, largely, defined by the miners and what transactions they choose to accept. They know that if they change their rules to unfairly benefit themselves over non-miners, then they can't stop more people joining up and becoming miners too, dissolving their advantage with more competition; and if they ruin the Bitcoin economy, everyone will start a new one with saner miners, and their mined bitcoins will become worthless. There's little incentive for an individual miner to accept transactions that other miners wouldn't, because if they win a block with such a transaction in it and try to claim the bounty, the others would reject their block due to the bad transaction, and they wouldn't get their bounty accepted. It's in nobody's best interests to vary the rules without gaining a consensus between the implementers first, so any change to the rules will necessarily be rather conservative and careful.

But the bitcoin economy needs to be careful. Don't let any one miner (or mining pool) get too close to 50% of the hashing capacity. And get more competing implementations of the rules in place. Bugs in the system would hit confidence in the economy hard, even if they were fixed rapidly. (Also, rolling out an emergency bug fix would probably be the easiest way for an attacker to try to slip a new bug in with insufficient review.)

And... back to politics

It's not often that you get to see anarcho-capitalism and enlightened self-interest given such free rein; it will be interesting to see how it pans out.

Currently, powerful vested interests (largely, big business) have found ways to lobby governments to do things in ways that benefit them. What will happen if bitcoin becomes a significant proportion of the world economy? There will be cries that we can't let the world be run by a bunch of nerds. Perhaps countries will enact rules that bitcoin miners on their soil have to run approved software, and those countries will form an international committee to decide what rules that approved software should run. Or perhaps private bitcoin mining will be made illegal, and nations will set up their own supercomputers to dominate the mining capacity, with their own rules controlling the money supply. I can see that happening; and then anti-money-laundering rules (transactions above a certain amount need to be signed with an X.509 identity or similar?) will be introduced. But that will, at worst, just cause a fork of the chain, as people who want an unregulated economy will just go off on their own separate way with the old rules.

Bitcoin security

I've been learning about Bitcoin lately.

It's an electronic currency. I've seen electronic currency before - in the late 90s there were efforts to create them based on virtual banks issuing coins. The coins were basically long random serial numbers which, along with a statement of the value of the coin, were then signed by the bank. The public key of the bank is published, so people can check they're valid coins issued by the bank. The idea was that rather than withdrawing a bunch of notes from the bank, you can ask the bank to mint you a bunch of these signed numbers instead; and anyone who sees them can check their value, and eventually, return them to the bank (which can also check their value in the same way) to get their account credited.
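The basic scheme is easy to sketch. The following uses textbook RSA with tiny, insecure toy primes so the arithmetic is visible; a real system would use a proper signature scheme and key size, and the `Bank`, `mint`, `deposit` and `verify` names are hypothetical. Anyone with the public key can check a coin, but only the bank, at deposit time, can notice a serial number coming back twice.

```python
from __future__ import annotations

import hashlib
import secrets

# Textbook RSA with tiny, insecure toy primes, just to show the shape of
# the scheme: N = 61 * 53, and E * D == 1 (mod 60 * 52).
N, E, D = 3233, 17, 2753


def digest(serial: int, value: int) -> int:
    """Hash the coin's serial number and stated value down to a number mod N."""
    data = f"{serial}:{value}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def verify(coin: tuple[int, int, int]) -> bool:
    """Anyone holding the bank's public key (N, E) can check a coin."""
    serial, value, sig = coin
    return pow(sig, E, N) == digest(serial, value)


class Bank:
    def __init__(self) -> None:
        self.spent: set[int] = set()

    def mint(self, value: int) -> tuple[int, int, int]:
        """Pick a long random serial number and sign it along with the value."""
        serial = secrets.randbits(128)
        return serial, value, pow(digest(serial, value), D, N)

    def deposit(self, coin: tuple[int, int, int]) -> bool:
        """Credit a coin at most once. The bank sees every serial it mints and
        redeems, which is exactly what makes these coins traceable."""
        serial, _value, _sig = coin
        if not verify(coin) or serial in self.spent:
            return False
        self.spent.add(serial)
        return True
```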

This simple approach has two problems: the coins can be traced by their unique serial numbers (even more conveniently than the serial numbers on notes, and about as conveniently as card transactions and inter-bank transfers already can), and it's hard to detect somebody spending the same coin twice - as it's just a number, you can make as many copies as you like. Various elaborate cryptographic techniques were proposed to avoid this: the person withdrawing from the bank choosing the random numbers and letting the bank "blind-sign" them without seeing them; people spending the coins having to hand over a recipient-chosen random set of bits from a secret number, such that if the same coin is spent with two different recipients, enough bits are revealed to identify the double-spender; and so on...
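The blind-signing step can be shown with a toy textbook-RSA key (p = 61, q = 53, far too small to be secure; the function names are hypothetical). The customer multiplies the message by r^E for a random r, the bank signs the product without learning the message, and the customer divides r back out to get an ordinary signature on the original message. This shows only the blinding; the double-spender-identification trick is not modelled.

```python
from __future__ import annotations

import math
import secrets

# Toy textbook-RSA key: N = 61 * 53, and E * D == 1 (mod 60 * 52).
N, E, D = 3233, 17, 2753


def blind(m: int) -> tuple[int, int]:
    """Customer hides m by multiplying in r^E for a random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def bank_sign(blinded: int) -> int:
    """The bank signs without learning m: it only ever sees m * r^E mod N."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """(m * r^E)^D = m^D * r (mod N), so dividing out r leaves m^D."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(m: int, sig: int) -> bool:
    return pow(sig, E, N) == m % N
```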

These things just complicate the process of transferring funds, in ways that make it harder and harder to trust the security. And it leaves a currency that relies on central banks to issue it (which can be exploited by determined and/or powerful attackers).

So, enter bitcoin. I won't bore you with the deep technical details (see the paper for that), but the basic idea is this: I have a pool of bitcoin addresses, which are just public+private key pairs - the well-studied basis of cryptographic digital identity. Other people can send money to those identities by issuing transactions, signed by an identity that has enough money, specifying the hash of my public key (my address, that I publish) and an amount to transfer. For this transaction to be valid, there has to be enough money in the source address - so trying to spend the same money twice means the transaction is not valid. The money assigned to any one address can be traced back through the transactions to an event that first created some money (more on that in a moment).

Now, how do people know if a transaction is valid? Because when I issue a transaction, it gets broadcast into the network. And all the other nodes in the network check their copy of all the transactions that have ever happened to see if it matches the rules. If so, they accept it - and demonstrate this fact by competing with each other to solve a Hard Maths Puzzle based on a block of recent transactions that includes mine. The computer that does this first then receives a bounty, which creates new money; and it also receives any optional "transaction fee" I put in my transaction, encouraging computers to pay attention to my transaction first.
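The "Hard Maths Puzzle" is a hash search. A simplified sketch: find a nonce such that SHA-256 of the block data plus the nonce, read as a number, falls below a target, i.e. starts with a given number of zero bits. Real Bitcoin double-hashes an 80-byte block header and expresses the target in a compact encoding; those details are left out here, and the function names are hypothetical. The asymmetry is the point: solving takes many attempts, checking takes one hash.

```python
import hashlib


def _value(block_data: bytes, nonce: int) -> int:
    """SHA-256 of the block data with the nonce appended, read as an integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "big")


def solve(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces in turn until the hash starts with difficulty_bits zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _value(block_data, nonce) >= target:
        nonce += 1
    return nonce


def check(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verifying a claimed solution costs a single hash."""
    return _value(block_data, nonce) < 1 << (256 - difficulty_bits)
```

Each extra difficulty bit doubles the expected number of attempts, which is the knob the network turns to keep solve times steady.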

That's really clever. My transaction is vouched for by other computers - ones I do not control - vouching that it meets the rules by spending their time competing to solve the puzzle and get the bounty. Claiming the bounty is a transaction much like any other, creating money from nothing and sending it to an address; other computers won't accept it unless the rules are kept (meaning there's no incentive for a computer to try and solve the puzzle for an invalid transaction, as other computers won't accept it and give them the bounty).

And the difficulty of the puzzle that needs to be solved, and the maximum bounty that can be claimed for solving it, change with time. The difficulty is adjusted based on how quickly previous puzzles were solved, and the bounty shrinks as more money enters circulation, so even as more and more computers join the system, the average time before a transaction is wholly accepted by the system remains about the same (about one hour), and the total amount of money in circulation will slowly rise for a while, then remain roughly constant (the bounties will get smaller and smaller until, eventually, transaction fees are the main motivation for trying to solve the puzzle).
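Both adjustments are simple rules. This sketch follows Bitcoin's main constants (retarget every 2016 blocks aiming at ten minutes per block, with each adjustment clamped to a factor of four; a 50 BTC bounty halving every 210,000 blocks) but leaves out details such as the compact target encoding and the maximum-target cap. The function names are hypothetical.

```python
RETARGET_BLOCKS = 2016              # blocks between difficulty adjustments
TARGET_SPACING_S = 600              # aim for one block every ten minutes
HALVING_BLOCKS = 210_000            # the bounty halves this often
INITIAL_SUBSIDY = 50 * 100_000_000  # 50 BTC, in satoshis


def retarget(old_target: int, actual_timespan_s: int) -> int:
    """New target after a retarget period that took actual_timespan_s.

    The hash must fall below the target, so a bigger target is an easier
    puzzle: blocks came too fast -> target shrinks, too slow -> it grows.
    Each adjustment is clamped to a factor of four.
    """
    expected = RETARGET_BLOCKS * TARGET_SPACING_S
    actual = min(max(actual_timespan_s, expected // 4), expected * 4)
    return old_target * actual // expected


def subsidy(height: int) -> int:
    """Bounty (in satoshis) a miner may claim for the block at this height."""
    halvings = height // HALVING_BLOCKS
    return INITIAL_SUBSIDY >> halvings if halvings < 64 else 0
```

Summing the bounty over every halving era gives 2,099,999,997,690,000 satoshis: just under 21 million BTC, which is where the cap comes from.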

Who sets the difficulty of the puzzle and all that? The computers in the network do - when the system was created, rules were agreed, and written into the software. As everyone runs software following those rules, anybody solving easier puzzles or trying to award themselves more bounty for doing so will have their bounty-claiming transaction rejected as invalid. To loosen the rules, a majority of the computers in the system will all need to accept the new rules - so it will require consensus from the community.

Bitcoins started off being worthless (so the original "miners" setting their computers to solve the puzzles made lots of them and hoarded them), but over the past months, they've started gaining real cash value. As I write this, they're about $5 each, and people are racing to build supercomputers to solve the puzzles faster and faster so they get a bigger share of the approximately 300 an hour that currently get generated as bounties. The recent meteoric rise suggests a speculative bubble, which will burst some day - the ten I bought for £2.20 each yesterday are worth about £2.80 each today.

But the recent public attention (Forbes article, This Week in Startups interview) has caused people to start raising questions. Is this going to encourage money laundering, tax evasion, buying and selling illegal goods and services? Will it be stomped down on by governments?

I have a few thoughts on the matter.


The UK MoD Manual of Security has been leaked

The UK MoD Manual of Security has appeared on WikiLeaks.

I'm not certain this is a good thing, to be honest... the intelligence services are renowned for overstepping the mark, and I'm sure the sections on dealing with investigative journalists and the like will be useful to those who fight against that kind of thing, but I suspect the bits about dealing with foreign intelligence agencies would have been best kept secret. Still, the cat is out of the bag, so perhaps it's no bad thing if the MoD are forced to do a total security audit and overhaul their manual :-)

I've not managed to download it - WikiLeaks servers seem to be rather busy - but the front page does have some interesting snippets from the sections about visitors to China and Russia, discussing the kinds of things the local intelligence agencies do to try and extract Western commercial and military secrets.

This has some interesting bearing on the growing tendency to outsource software development to developing countries. I know a lot of this work goes to China, so we can probably assume that any intellectual property made available to developers there is scrutinised by their security services and passed on to Chinese companies that may be able to benefit from it.

In the depths of my career history, I once worked on a software system that was to be used in a Government project to protect the nation's "critical national infrastructure"; and I gather that another part of the system was outsourced to an Indian development team. I'm not sure if the client was actually made aware of this, but at the time, I felt concerned that national security might be threatened by this.

n2n revisited

I have spoken before about n2n, the peer-to-peer VPN tool that makes it easy to create efficient virtual networks.

Normal VPN products are really more of a "virtual private cable" than a "virtual private network" - they just establish a point-to-point link over the Internet, requiring a login to set it up and encrypting the traffic. This means you can have a virtual connection to a real private network somewhere; and if a few people connect into that network via VPN links, then there really is a virtual private network between you all, but all going through a central point where all your links meet.

With n2n, by contrast, everyone connects to a shared "supernode" that keeps a list of who is connected to the VPN, and from where; when you want to connect to somebody else, you use the list from the supernode to establish a direct encrypted connection between yourself and them, rather than going through any central point. So it's an actual virtual network out of the box. You can even run more than one supernode, so that any one of them can fail; all the supernode does is provide the directory service.

Also, you don't need to maintain a database of user logins; a supernode can carry any number of virtual networks. When you connect to the supernode, you just tell it the name of the community you want to join, and it will share your connection details with anybody else in the same community - you can make communities up on the fly rather than needing to maintain a central list. Access control is handled by the simple fact that you need to know the correct encryption key for the community you want to join, or your messages will be received garbled by everyone else, and ignored.
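The supernode's job, as described, is just a directory. A minimal sketch with hypothetical names: edges register under a community name (communities spring into existence on first use) with an identifier (n2n works at layer 2, so this sketch uses the edge's virtual MAC address) and their public address, and other edges look peers up there before talking to them directly. The directory holds no keys; access control happens at the edges, which drop traffic they can't decrypt. Relaying and expiry of stale entries are left out.

```python
from __future__ import annotations

from collections import defaultdict

Address = tuple[str, int]  # (public IP, UDP port) as seen by the supernode


class Supernode:
    """Directory only: which edges are in which community, and where they are.

    It holds no keys and never decrypts anything; an edge with the wrong
    community key can still register, but its traffic is garbage to the
    others and gets dropped at their end.
    """

    def __init__(self) -> None:
        self.communities: defaultdict[str, dict[str, Address]] = defaultdict(dict)

    def register(self, community: str, mac: str, addr: Address) -> None:
        # Communities need no setup: the first registration creates one.
        self.communities[community][mac] = addr

    def lookup(self, community: str, mac: str) -> Address | None:
        """Where to send frames for `mac`; the edge then talks to it directly."""
        return self.communities.get(community, {}).get(mac)

    def peers(self, community: str) -> list[str]:
        return sorted(self.communities.get(community, {}))
```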

Anyway, for a long time I had wanted to get into n2n, but couldn't, as it didn't compile out of the box on NetBSD; but a desire for a better VPN solution at work has led to me getting it working. It wasn't that much work in the end, as the existing FreeBSD support already took a BSD approach to things.

n2n is distributed via Subversion, so they don't have version tarballs - this is a problem for my NetBSD port. So I decided to mirror it into git with git svn, then forked it as "Kitten n2n", made my NetBSD port, tagged a release, pushed it to github, uploaded a tarball from that tag, and then made a NetBSD package of net/kitten-n2n.

I'll tinker with it for a few more days, then I'll submit it to the NetBSD folks for consideration.

I'll keep pulling in from the official n2n Subversion repo to get new patches, and I'll see if they'd like my patches pushed upstream. As well as NetBSD support, there are a few things I'd like to fix (I've spotted an integer being passed through a void* by casting, which is slightly dodgy practice and produces warnings on my 64-bit machine, but is easily fixed by passing a pointer to a heap-allocated copy of the integer!)


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales