About blockchians' privacy

in #bitcoin3 years ago

In last weeks, among the community resonates one interesting event - FBI seizing ransom bitcoins paid by the Colonial Pipeline Co. to DarkSide. The known facts are: the funds have been moved over at least six addresses, FBI seized DarkSide’s servers and infrastructure, and FBI claims to have the private key(s) to the final wallets. It remains unclear, how FBI traced the location of the servers, and to what extend the DarkSide itself has been compromised. However, there is bunch of speculations, from backdoors in the Bitcoin protocol, through cooperation with crypto exchange to having an inside man.

On the other hand, there have been also disturbing statements about privacy of the Bitcoin protocol, e.g. “We’ve effectively developed a map of hundreds of millions of Bitcoin addresses associated with illicit actors all around the world,” by Elliptic, and “FBI officials say the techniques they used to recover some of Colonial’s funds can be used in future cases, including when hackers attempt to transfer cryptocurrency through unfriendly overseas jurisdictions. ‘Overseas is not an issue for this technique,’ said Mr. Chan of the FBI’s San Francisco field office.” (How the FBI Got Colonial Pipeline’s Ransom Money Back, The Wall Street Journal, 11. June 2021).

So one can really ask a question: Is Bitcoin protocol private?
In this article, I try to answer this question by discussing a potential attack vector on the broadcasting protocols used in most blockchain protocols.

TLDR

Whisper protocol

Most of you probably know the theory about blockchains - the transactions are grouped into the blocks, which are created by the miners (stakers, witnesses, ... - what ever terminology your favorite blockchain use). The miners create the blocks by arranging a subset of transactions from a set of known, yet unconfirmed transactions, called “mempool”. The blocks are ordered in a partial ordering manner, where each blocks has a know predecessor, all secured with strong cryptography, making it almost impossible to forge an already mined block and e.g. change a committed transaction. So far so good.

However, did you think about, how the miners get known about your newly created transaction? That is the task of another protocol, sometimes called “whisper” protocol - and for the shake of simplicity, I will use this name further. The whisper protocol is, among other things, responsible for broadcasting the new transactions among the nodes in the blockchain network.

In the blockchain network, every node is connected to a bunch of other nodes. This number is usually relatively low compared to the numbers of all nodes in the network. One node is usually connected to maximum up to tens of other nodes, while the overall network size can be hundreds, if not thousands, of nodes. As soon as a node learns about a new transaction (which might be generated for example by a connected wallet, or another peer), it let it’s connected peers to know about them as well. Assuming the network is a connected graph, in a finite time, every other node in the network will learn about this new transaction. For those familiar with the graph theory - yes, it is a practical, decentralized implementation of the breadth-first search algorithm.

Another feature of most whisper protocols is that they allows connected peer nodes discovery. When a new node connects to another node, the later can be asked for its known peers. This, at least in theory, allows full network traversal. Of course, some nodes are behind firewalls or NATs, and one cannot connect to them directly.

Potential privacy attack

The not-so-obvious disadvantage of this protocol is the fact, that various peers learn about the new transaction in different times, depending on its distance from the originating node. This opens a potential attack based on timing of the distribution of the transactions.

Assume, imagine an attacker, interested in finding the originator of a particular transaction(s). All they need is to deploy a bunch of nodes, randomly connected to other network nodes in a way, that every other node in the network is not too far from at least one of the attacker nodes. More formally, the condition shall hold:
min⁡({d(N_a, N) | N_a ∈ A}< x
for every node N, where x is a sufficiently small number (like 2 or 3), d(N_1, N_2) is a distance between two nodes, and A is a set of attackers nodes. The math calculating size of A to probabilistically reach the required property is omitted.

The attacker’s nodes are then modified in a way, that they record for every transaction received over the whisper protocol the IP address of the peer and exact timestamps. Of course, the nodes needs to be synced against very precise time source, as timestamps are crucial.

Anytime a transaction that is interesting for the attacker appears, the attacker can analyze logs from their nodes, to figure out where it appeared for the very first time, and which node(s) sent it. This way, the attacker can trace the transaction to the source. It might be lucky and the originator’s node is attached directly to one of the attacker’s nodes. It might also be connected indirectly through one hop (intermediary peers). In such a case, the attacker will recognize the transaction coming from two or more intermediary nodes. The likely originator is among the intersection of the sets of peers connected to the intermediary peers. Similar analysis can be done also in the case the distance between originator and attackers nodes is at least 2, loosing accuracy with every increment in x.

This way, the origin of a transaction can be discovered in a very short time (almost instantly). Would even more transactions be generated at the same origin, it gives the attacker a clear signal of a connected hot wallet. Knowing the IP address, it is just a question of time till the law enforcements will ring on the door bell.

Conclusion

We can discuss about the costs of performing such attacks - especially, when the attacker has to run all the nodes 24x7. One can easily see that it would require hundreds of nodes in case of bitcoin. With current Bitcoin blockchain size the costs of corresponding VMs can be costly, however, that the attacker does not need to run full node. Certain possible optimizations are: sharing old blockchain data between nodes, overwriting limits to connect to higher number of peers as usual, skipping full validation, etc. Under such assumption, the attack became feasible not only for law enforcements agencies with deep pockets, but also to medium-sized companies.

Considering the statements from Elliptic, it is likely that such attackers are already active on the network, and the past transactions are probably already assigned to an IP address. That also means, that the attackers are capable to link seemingly unassociated addresses to one wallet.

Using public services like infura or web wallets won’t really help - the service provider is most likely logging the access to its services.

The conclusion is clear: until you really understand how every single piece of the protocol works, don’t assume privacy for granted - it most likely is not. When privacy is an issue, think about additional counter-measures.