Bitcoin pseudonymity

As I write, there is still uncertainty about just how private the pseudonymity of Bitcoin really is.

I can't say I have an answer myself, but I can explain the complex issues involved and make a few predictions!

The pseudonymity of Bitcoin

One of the interesting things about Bitcoin is that all transactions are recorded in a big, shared, publicly-readable ledger called the blockchain. Every single transaction is open to public scrutiny.

However, the information about a transaction that goes into the blockchain isn't very illuminating on its own. It typically looks something like this - the important part of which I will quote here:

  • Input: 19HeKHeseFXwhHkiYxnEUNyFESmZGrf8hp - 0.01178727 BTC
  • Output: 1DGGoYKsCLt3axwP2iJehhLpJFFYNzgqNo - 0.00028727 BTC
  • Output: 1KLEu7t8JN3r4jMPs4p3xZFz7JRDrP94zD - 0.011 BTC

What happened here? This shows that 0.01178727 BTC came from the address 19HeKHeseFXwhHkiYxnEUNyFESmZGrf8hp; of that, 0.00028727 BTC went to 1DGGoYKsCLt3axwP2iJehhLpJFFYNzgqNo and 0.011 BTC went to 1KLEu7t8JN3r4jMPs4p3xZFz7JRDrP94zD (and the remainder is left for the miner who validates the transaction to keep). Those things are Bitcoin addresses; if you have a Bitcoin address, then you can send bitcoins to it. Any given Bitcoin wallet will have a number of addresses, and each address will hold whatever bitcoins have been sent to it in the past and not yet spent.

What that tells you is that at some point in the past, somebody sent 0.01178727 BTC to 19HeKHeseFXwhHkiYxnEUNyFESmZGrf8hp, and that money sat "in the account" as an "unspent transaction". When the owner of that address (which we shall abbreviate as "19" from now on, and the other two addresses "1K" and "1D" similarly) wanted to spend some money (say, 0.011BTC) to "1K", their Bitcoin client looked at the available unspent transactions to the addresses they control, and decided to use that particular 0.01178727 BTC unspent transaction. As only 0.011BTC needs to be sent, the software creates a transaction with the original 0.01178727 transaction as the "input", and two outputs - one sending 0.011BTC to "1K" as requested, and then another sending the "change" back to another address in the wallet: "1D". The Bitcoin protocol only lets you spend an amount that was previously sent to you in an "unspent transaction", so these change outputs in transactions are necessary to make it possible to spend arbitrary amounts.
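To make the arithmetic concrete, here is a minimal sketch that works out the miner's share of the transaction quoted above. The function name implied_fee is my own invention for illustration; it is not part of any Bitcoin library.

```python
from decimal import Decimal

def implied_fee(inputs, outputs):
    """Return what is left for the miner: total inputs minus total outputs."""
    return sum(inputs, Decimal(0)) - sum(outputs, Decimal(0))

# Amounts taken from the transaction quoted above, in BTC.
inputs = [Decimal("0.01178727")]
outputs = [Decimal("0.00028727"), Decimal("0.011")]

fee = implied_fee(inputs, outputs)
print(fee)
```

Decimal is used rather than floating point so that the eight decimal places of a bitcoin amount are handled exactly. For this transaction the fee works out to 0.0005 BTC.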

If the amount you wanted to send was larger than any one unspent transaction in your wallet, it could have combined many (from different addresses, if required) inputs to make up an amount larger than the desired output; then sent some to the destination address and some back to the owner as change.
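As a rough sketch of how a wallet might combine inputs, the following picks unspent amounts largest-first until the payment and fee are covered, and reports the change. This greedy rule is an assumption of mine for illustration; real wallets use their own selection strategies, and select_inputs is a hypothetical name.

```python
from decimal import Decimal

def select_inputs(unspent, amount, fee):
    """Greedily pick unspent outputs until they cover amount + fee.

    `unspent` is a list of (address, value) pairs. Returns the chosen
    pairs and the change left over; raises ValueError if funds are short.
    """
    needed = amount + fee
    chosen, total = [], Decimal(0)
    # Largest first, so that as few addresses as possible are combined.
    for address, value in sorted(unspent, key=lambda u: u[1], reverse=True):
        if total >= needed:
            break
        chosen.append((address, value))
        total += value
    if total < needed:
        raise ValueError("insufficient funds")
    return chosen, total - needed

# Hypothetical wallet contents.
wallet = [("a", Decimal("0.5")), ("b", Decimal("2")), ("c", Decimal("1"))]
chosen, change = select_inputs(wallet, Decimal("2.5"), Decimal("0.01"))
```

Note that every address that appears together as an input is thereby shown to be controlled by the same wallet, which matters for the tracing discussion later on.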

Now, there's nothing in the transaction saying which outputs are spending to a third party and which are change back to yourself; I have guessed in the above example, based on the fact that one output is 0.011BTC, which seems an amount one might want to send to somebody, while the other output looks like it's just the difference between that and the input amount (minus a transaction fee for the miner). Based on a statistical model in my head of the amounts humans are likely to choose to spend, I have been able to make a prediction.
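The guess I made in my head can be written down as a crude heuristic: the output whose amount needs the most digits to write out is probably the change. This is only a sketch of that one rule of thumb, not an established analysis method, and the names significant_digits and guess_change are hypothetical.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100_000_000

def significant_digits(amount_btc):
    """Count the digits left after stripping trailing zeros from the satoshi value."""
    satoshis = int(Decimal(amount_btc) * SATOSHIS_PER_BTC)
    return len(str(satoshis).rstrip("0"))

def guess_change(outputs):
    """Given {address: amount in BTC as a string}, return the likeliest change address."""
    return max(outputs, key=lambda addr: significant_digits(outputs[addr]))

outputs = {
    "1DGGoYKsCLt3axwP2iJehhLpJFFYNzgqNo": "0.00028727",
    "1KLEu7t8JN3r4jMPs4p3xZFz7JRDrP94zD": "0.011",
}
print(guess_change(outputs))
```

For the example transaction this picks "1D", matching my guess: 0.011 BTC needs only two significant digits, while 0.00028727 BTC needs five. It can of course be fooled by anybody who pays an awkward amount.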

So who owns which addresses?

The recommended practice for making a Bitcoin transaction is that the recipient should make a fresh address just for that transaction, and send it to the payer; once they have received the funds, they never use that address again.

Therefore, the standard transaction pattern for an address is that some transaction sends money into it, then that unspent transaction hangs around for some time, and then a transaction sends the money out to two or more other addresses; one will be an address created for change and owned by the owner of the original address, while the other will be owned by some third party.

There are plenty of other patterns, though. Somebody who wants to pay a lot of people (perhaps a company's monthly payroll run) can pay them all in one transaction rather than issuing lots of smaller ones, by having lots of outputs in a single transaction; spotting the change address might be harder if it's one out of a hundred rather than one out of two. Somebody wanting to "empty" an address into another, or spending an amount that happens to be identical to an amount that came in, might issue a transaction with only one output as no change is required. A public donation address may have lots of input transactions, as may one that is re-used - I might give my employer an address for my monthly pay to go into each pay day, rather than making a new one every time.

Now, imagine a case in point. Let's say that I'm attempting to fund a terrorist organisation. The police, suspecting I may be involved in sending money to them, are trying to trace my Bitcoin cash flows; by leaning on my employer, they find out my monthly pay address, and start looking in the blockchain.

They see one of my monthly pay day transactions is spent by a transaction that transfers 1BTC to another address, which we shall call "A" (and the remainder, presumably change, back to another: "B"). "A" is a single-use address, as the entire balance is spent by a single transaction; 0.9BTC goes to "X", a known terrorist group donation address published on their web site, and the remaining 0.09 BTC (there's a 0.01BTC transaction fee) goes to another address, "C".

When they tie me to a chair and threaten me with lasers, I tell them that I don't know anything about the organisation or their address "X"; I say I think I spent 1BTC buying a second-hand laptop on Bitmit. I proclaim my innocence, and suggest that the person I bought the laptop from, the holder of address "A" must have made the donation.

But am I telling the truth?

Laundering Bitcoins

Clearly, if the payment to "X" had come directly from a transaction spending money that was given to me, that would be a smoking gun; unless I could claim that my wallet file (full of the private keys required to spend money from my addresses) had been copied by a thief, it must have been me spending it.

Bitcoins that have been sent to an address that can be traced to me (because I've published it, or given it to an organisation that knows some of my personal details and said "Use this to send me money") can be considered, in the blockchain, to be publicly traceable to me. For instance, Wikileaks gives a donation address on their Web site, and the blockchain shows a history of payments to and from that address. I publish a "vanity address" of 1ALAricQjEj5ErpDCLDZWYHGv15jMq1gEM on my web site in the hope of attracting tips for my open-source and voluntary work, so money sent there is clearly "sent to me" in somebody's eyes, and definitely under my control.

But when I spend money to a single-use address, unless the communications between me and the recipient where they send me the address are intercepted, or the recipient comes forward, there's nothing in the blockchain saying who owns that address. If I want to spend some of my "tainted" money on something I want to be able to deny, all I need to do is to transfer some of my money to different addresses that aren't linked to me (but are, nonetheless, controlled by me), then spend it onwards. Of course, I can make that chain arbitrarily long, bouncing the money around for some time before spending it; it costs me in transaction fees, and the time spent organising it all, but that's the cost of "laundering" my money so that it's in a state where I still control it (by owning the addresses) but it's not easily linked to me.

Tracing bitcoins

So what are law enforcement folks to do? I think the answer lies in forming statistical models.

Say we can trace some money - 10BTC - to our suspect through a known transaction, through any of the means above, and we want to see if any of it goes to our terrorist donation address, "X". If the money goes straight out to "X" from the original transaction, then we have a smoking gun.

Now, if our original 10BTC is spent by sending 9BTC to some address "A" and 0.99BTC to some address "B" with a 0.01BTC fee, and then we see money going from "B" to "X", what does that tell us? It's quite likely that the 9BTC to address "A" was a transaction made by our suspect, and the 0.99BTC to "B" was their change from that transaction, just by looking at the "neatness" of the numbers. In which case, "B" is probably still owned by our suspect, so the spending of money from "B" to "X" is suspicious.

But we still can't rule out that the suspect spent 0.99BTC to a third party for some service, in which case "B" is that third party and "A" is the suspect's change address.

And knowing this, a good money laundering system might build a model of likely spending amounts, and be sure to create a chain of transactions containing a likely-seeming amount to one address and the remainder to another; or, if it isn't too bothered about looking like a money laundry and just wants to make it hard to follow the money, each incoming transaction to be laundered can be split into ten transactions, each of completely evenly distributed random amounts, so that none of them look promisingly like change addresses and all must be followed. Alternating those "obvious obfuscators" with legitimate-seeming transactions would create a complex web of transactions to follow with no easy way to tell when the money actually changes hands.
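The "obvious obfuscator" step, splitting an amount into several random pieces that add up exactly to the original, might be sketched like this. It works in whole satoshis to avoid rounding trouble; split_amount is a hypothetical name, and a real tool would also have to budget for transaction fees, which I ignore here.

```python
import random

def split_amount(total_satoshis, parts, rng=random):
    """Split an integer satoshi amount into `parts` positive random pieces.

    Cut points are drawn uniformly without replacement, so the pieces
    always sum to exactly the original amount.
    """
    if parts < 1 or total_satoshis < parts:
        raise ValueError("cannot split into that many positive pieces")
    cuts = sorted(rng.sample(range(1, total_satoshis), parts - 1))
    bounds = [0] + cuts + [total_satoshis]
    return [high - low for low, high in zip(bounds, bounds[1:])]

# Split 1 BTC (100,000,000 satoshis) ten ways.
pieces = split_amount(100_000_000, 10)
```

None of the resulting amounts will look "neat", which is exactly the point: no output stands out as the payment or as the change.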

But if there's a source transaction we are starting from, that gets spent to a vast branching tree of never-before-seen addresses - and then all that money is gathered together again and sent to "X", then the suspect will have to explain how every satoshi of the money we traced to them went to a mixture of transaction fees and "X". It doesn't matter how many different addresses you split your money between if they all end up at the same destination.
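Following such a tree is a plain graph walk. The sketch below assumes a simplified picture of the blockchain that I have made up for illustration: a dictionary mapping each address to the (destination, amount) outputs of the transactions spending from it, with amounts in satoshis. The name traced_to_target is hypothetical. If money from other sources joins the tree along the way, the result is only an upper bound on how much of the suspect's money arrived.

```python
from collections import deque

def traced_to_target(spends, start, target):
    """Total satoshis sent to `target` from addresses reachable from `start`."""
    seen, queue, total = {start}, deque([start]), 0
    while queue:
        address = queue.popleft()
        for destination, amount in spends.get(address, []):
            if destination == target:
                total += amount          # money arriving at the target
            elif destination not in seen:
                seen.add(destination)    # keep following the money
                queue.append(destination)
    return total

# Hypothetical example: 1000 satoshis at "S" are split, regathered at "M",
# and sent on to "X", with a 10 satoshi fee on each of the four transactions.
spends = {
    "S": [("A", 600), ("B", 390)],
    "A": [("M", 590)],
    "B": [("M", 380)],
    "M": [("X", 960)],
}
print(traced_to_target(spends, "S", "X"))
```

In this made-up example 960 of the original 1000 satoshis reach "X" and the other 40 went on fees, so nothing is left to have been spent on anything else.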

So a decent money laundering application will have to work at a slightly higher level than "send this money from me to X, untraceably"; it will need to work a bit more like a bank account. You put money in from time to time (to addresses it gives you), giving it capital to churn around between its pool of addresses; and when you ask it to send a given amount to a given target address, it can then slowly send it in little amounts here and there from the network of addresses it's managing. And to prevent the entire pool of money from being easily identified because its only interactions with the "known bitcoin economy" are you putting money in and the incriminating payments to "X", you'll need to also use it to make occasional donations to known legitimate public bitcoin addresses, to make small purchases that aren't particularly traceable to you, and so on.

Transaction timings are crucial, too. If some money is sent to our suspect, and then a chain of a hundred transactions later, that money ends up sent to "X" - but that chain of a hundred transactions is issued in a single ten-minute burst, then that's pretty suspicious; it's much more likely that the suspect has tried to cover their tracks with a series of transactions than that a long chain of merchants that accept zero-confirmation transactions have very rapidly moved the money around the economy to "X". The legitimate Bitcoin economy has a certain statistical distribution of the time taken for transactions to be spent, and if the laundering software doesn't mimic that distribution (and the combined delay+amount distribution, as well) perfectly, then its laundering traffic can be distinguished from legitimate traffic in the long run.
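A very crude version of the timing test is to count how many transactions in a chain fall within any ten-minute window. The sketch below does just that with a sliding window over Unix timestamps; longest_burst is a hypothetical name, and a serious model would compare the whole delay distribution against the wider economy rather than apply a single threshold.

```python
def longest_burst(timestamps, window=600):
    """Largest number of chain transactions inside any `window` seconds."""
    times = sorted(timestamps)
    best, start = 0, 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most `window` seconds.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best

# Hypothetical chain: three hops within two minutes, then two much later ones.
chain = [0, 60, 120, 5000, 9000]
print(longest_burst(chain))
```

A chain of a hundred hops for which this returns something close to a hundred is the "single ten-minute burst" described above.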

So what's the normal history of a bitcoin transaction? If somebody sends me 1BTC and I spend it in a number of transactions, some of them will be to known public addresses and some to single-use addresses; and that money will spread out, combining with other money from other transactions. Some of it will end up going to known public addresses, some to known suspect addresses such as our fictional "X", and some will languish in the addresses of long-term savers or people who have lost their private keys.

So anyone wanting to write a completely undetectable laundry system will need to make sure the transactions it generates look exactly like that; the original transaction it is trying to protect from being traced to "X" must not have any statistical properties in the blockchain that distinguish it from other, randomly-chosen, transactions. That will be pretty tough.

So is Bitcoin's privacy weak or not?

I would say "it depends". Large criminal syndicates wanting to launder thousands of bitcoins will have a harder time doing so, as that amount will be harder to hide; it's a strong signal amongst the noise. To evade detection, they will need to have longer chains of transactions (increasing the time between their money arriving and them being able to spend it as they wish), and will need to transfer a significant fraction of it to known public addresses. Also, they will need to be careful where they run their money laundry software - if all the transactions originate from an Internet connection that is known to be controlled by them and is wiretapped, the game will be up. They'll need a sophisticated piece of laundry software running on a distributed network of computers, with careful choices as to which computer to issue transactions from in order to avoid any statistical bias there, as well.

Perhaps they could use botnets and other stolen computer time, but they'll need to be careful - each computer forming part of the laundry network needs to at some point know the private keys to some of the addresses involved in order to issue transactions from those addresses, so if they are discovered, the discoverer can "steal" the balance of those addresses. Botnets will probably form a part of the laundering network, but with a cap on how much money can be in addresses whose private keys have been sent to untrusted nodes at any one time. One way around it would be to never give private keys to the botnet nodes, and merely ask them to relay pre-prepared transactions made on trusted hardware, but that makes the botnet much more centralised - and means that once the botnet is found and the control traffic from the trusted hardware is monitored, any transactions it makes will instantly label the money being transferred as part of the laundry.

Meanwhile, small-scale legitimate money laundering is quite easy. If you are a closeted gay teen in Iran and want to donate money to a charity supporting people in your situation, but are afraid that your government is tracing money sent to their addresses, you have a number of options. As your payment is small compared to the amount you are spending without fear (small enough that you're not having to explain what's happened to such a large proportion of your money should the police come to your door, by definition), it is easy to "hide" it amongst a sea of change-address-scale transactions, and you can afford to entrust it to online anonymous spending services that operate a shared laundry, accepting money from a load of people and spending money to their chosen outputs without any clear link between them in the blockchain; money can only be hidden in such a system if it's a small fraction of the total flow, which is practical for a few bitcoins here and there but not for thousands per day. Shared laundries have many advantages - you put some money into an address, which spends it on something; meanwhile, at some point in the future, a totally unrelated transaction is spent to your target. There need not be any link in the blockchain between you putting money in and that money coming out on your behalf. You run some risk that the money you put in to donate to an LGBT rights charity might end up being used to make somebody else's donation to the same or another incriminating address, so the laundry will still need to do a certain amount of money mixing internally, but the laundry has much more flexibility; especially if it is used for all sorts of innocent transactions as well, by well-meaning people who want to make life easier for LGBT people in dangerous situations.

Shared laundries have their own risks, though - as they are run by a third party, rather than your own software on your own hardware, you run the risk that they are monitored or even fronts for law enforcement agencies; this can be mitigated, if it seems a danger, by combining them with other laundry techniques; third-party shared laundry systems can just be another type of transaction used to move money within a laundry network. An individual wanting to hide their money from mass surveillance can split it into a few shares and send each through a different third-party laundry to a separate wallet (using different addresses, so it's not clear they are all in one wallet), which they use for a mixture of non-worrying (but anonymous) and worrying transactions, with a few random transfers-to-self to mix it all up a bit. I think we'll start to see wallets with improved privacy features, too; a wallet that combines money received from multiple addresses to make up an output amount correlates those input addresses together, and it would be possible to avoid that by juggling the money around between addresses internally somewhat, as a kind of mini-laundry in its own right. Similarly, rather than making it so obvious which is the change address, a wallet told to spend 3BTC from an input transaction with value 11.5BTC could send out the 3BTC but generate two or three change addresses with values such as 6BTC, 2BTC and 0.49BTC (leaving a 0.01BTC tip for the miner) rather than a single one, making it less clear what is change and what is not.

Conclusions

And so I think we'll see a bit of an arms race. Laundry systems, be they ones used by organised crime or public shared ones, will be attempting to develop random transaction streams that are harder and harder to reliably distinguish from the economy as a whole. They'll need to control the structure of the transactions themselves, their timings, and the locations through which they enter the Bitcoin network. Meanwhile, law enforcement will be trying to fine-tune their own statistical models to try and remain one step ahead, and deploying mass surveillance to try and trace down the origin nodes of bitcoin transactions.

Tracing the proceeds of a well-organised money laundry will be hard work, especially if the laundry also uses traditional techniques such as actually transferring the money through other commodities and currencies in addition to using bitcoins. The laundries can be made harder and harder to trace, by spending more and more of the money that goes through them; and their owners will need to decide how much of a laundry cost they can take, compared to the risk of being caught. It's classic game theory. How much is it worth the law enforcement agencies spending on statisticians, powerful computers, and bitcoin network monitoring equipment and bandwidth, as opposed to using other means to catch criminals? How much is it worth the criminals going to great lengths to hide their bitcoins, as opposed to using traditional techniques such as suitcases full of cash and gold? I think that, by adding another option to the launderers' arsenal, Bitcoin will probably slightly reduce the cost of money laundering, while also adding new risks of being caught in novel ways. There's something very definite about the risks involved in carrying gold bars across borders in a speedboat then hiding them in a hole in the ground, compared to the danger that a single transaction from a known IP address will nail your carefully-laundered bitcoins back to you.

Meanwhile, I think the prospects are pretty rosy for individuals wanting to sneak a handful of bitcoins around! But I'd encourage people to help the cause of people who need small-scale anonymity by putting some of their innocent, legal, and shameless BTC spends through shared laundries, in order to create more traffic volume to hide the ones that need to be hidden!


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales