Quick tor analysis
The other month I was thinking about how secure tor really is. It is already common knowledge that tor doesn’t provide encryption from your machine to the destination machine, so if you don’t use a secure protocol (https, ssh, etc.) then a rouge exit node could sniff your traffic. Rather than repeat a variation on the whole “how can a rouge exit node sniff traffic” I started thinking about “what could a big organisation do to remove the anonymity provided by tor?”
Simulation 1 - A start of the investigation
If an organisation controls both the entrance node and the exit node then it can effectively remove any anonymity from the circuit. First step was to put a Perl script together that would extract the current tor directory (effectively a list of known nodes in the tor network and their statuses) from a locally running tor node. I then used this extracted directory as the base data for my crude simulator, also written in Perl. The simulator would simulate 10,000 circuits per generation and would run for 300 generations.
For the first 100 generations the number of entrance and exit nodes controlled by organisation A would increase by 1 per generation. For the next 100 generations entrance and exit nodes controlled by organisation B would increase by 1 and for the final 100 generations the number of entrance and exit nodes controlled by organisation C would increase by 1.
I repeated the simulation using tor directories exported over a 13 day period. At the end I produced an average set of results from the 13 different simulations and produced the graph shown in figure 1. As can easily be seen the number of uncontrolled circuits quickly decline as organisation A starts to add it’s own nodes to the network. After they have added 100 nodes to the network they would be controlling over 70% of the circuits.
Organisation A could only maintain their level of control while they were the only organisation. As soon as organisation B started trying to control circuits there was a quick decline of circuits controlled by organisation A. When organisation C started trying to control circuits both organisation A and B’s level of control declined. As each organisation’s level of control declined the number of uncontrolled circuits (i.e those that users would remain anonymous using) increased.
The only thing this simulation hadn’t really covered was the concept of guard nodes (where a client will select its entrance nodes from a small subset of available nodes). So I decided to add guard nodes into the simulation.
Simulation 2 - With guard nodes
The changes over the first simulation made was that the the 10,000 runs would be split between 100 clients. Each client would pick 3 guard nodes and only use them as their entrance nodes. In the previous simulation it made no difference if there was 1 client or 10,000 clients as the circuits picked wouldn’t be influenced by the clients in any way.
After running the simulation again using the same extracted directories as the first simulation another graph was produced, shown in figure 2. Comparing this to the first graph shows the same over all trends, the averages are slightly more erratic but nothing significant.
Again we see organisation A getting a lot of control when it is the only one trying to control the network but as soon as other organisation try to control circuits as well the total number of circuits controlled by organisations decrease.
Simulation 3 - What if there wasn’t any uncontrolled nodes?
The final simulation was to find out if what had been suggested by the previous 2 simulations was correct. That a tor network could provide anonymity to a large percentage of users even if every node was controlled by an organisation looking to remove the anonymity of the network.
This time I simulated 4 organisations. Organisation A started with control of the whole network (100 nodes), over the first 100 generations organisation B increased the number of entrance and exit nodes it controlled by 1 per generation. From 100 to 200 generations organisation C’s controlled entrance and exit nodes increased by 1 per generation. For the final 100 generations entrance and exit nodes controlled by organisation D’s were increased by 1 per generation.
The simulation was run 50 times and the results averaged out. The graph of the averaged results are shown in figure 3. As organisation B increases the number of nodes it controls Organisation A’s number of controlled circuits quickly declines. When organisation C starts trying to control circuits as well Organisation A and B’s number of controlled circuits declines even more. By the time that Organisation D has reached the same level of control as that of A, B and C, over 70% of the circuits are uncontrolled.
A number of interesting results have appeared from these simulations. The first was that if only 1 organisation tried to take over the tor network by adding its own nodes then it would only need to add 100 nodes to control over 70% of the traffic. Just adding 50 entrance/exit nodes would be enough to take control of over 40% of the traffic over the network, more than enough to do some serious traffic analysis. Even taking into account that each node would need to be in a different class B network to get full advantage, this would be physically possible for an organisation to do (For more information about this see http://www.cs.uml.edu/~xinwenfu/paper/SPCC10_Fu.pdf )
Also the use of guard nodes may add some protection from an individual clients point of view but from the point of view of those organisations not targeting a single individual then they add no real security as the results were about the same as the non guard node simulation.
Finally the really interesting thing for me was that it is entirely possible for a tor type network to be established by an organisation, but the moment other organisations take an interest in the network and try to take control of the circuits then it is more likely that a connection will be over an uncontrolled circuit (i.e. anonymous from the point of view we have been looking at the network from.) This is because while it is highly likely that the entrance node selected will be controlled by an organisation it is less likely that the exit node selected will be controlled by the same organisation.