Articles

Quick tor analysis

The other month I was thinking about how secure tor really is. It is already common knowledge that tor doesn’t provide encryption from your machine to the destination machine , so if you don’t use a secure protocol (https, ssh, etc.) then a rouge exit node could sniff your traffic. Rather than repeat a variation on the whole “how can a rouge exit node sniff traffic” I started thinking about “what could a big organisation do to remove the anonymity provided by tor?”

Simulation 1 - A start of the investigation

If an organisation controls both the entrance node and the exit node then it can effectively remove any anonymity from the circuit. First step was to put a Perl script together that would extract the current tor directory (effectively a list of known nodes in the tor network and their statuses) from a locally running tor node. I then used this extracted directory as the base data for my crude simulator, also written in Perl. The simulator would simulate 10,000 circuits per generation and would run for 300 generations.

For the first 100 generations the number of entrance and exit nodes controlled by organisation A would increase by 1 per generation. For the next 100 generations entrance and exit nodes controlled by organisation B would increase by 1 and for the final 100 generations the number of entrance and exit nodes controlled by organisation C would increase by 1.

I repeated the simulation using tor directories exported over a 13 day period. At the end I produced an average set of results from the 13 different simulations and produced the graph shown in figure 1. As can easily be seen the number of uncontrolled circuits quickly decline as organisation A starts to add it’s own nodes to the network. After they have added 100 nodes to the network they would be controlling over 70% of the circuits.

Organisation A could only maintain their level of control while they were the only organisation. As soon as organisation B started trying to control circuits there was a quick decline of circuits controlled by organisation A. When organisation C started trying to control circuits both organisation A and B’s level of control declined. As each organisation’s level of control declined the number of uncontrolled circuits (i.e those that users would remain anonymous using) increased.

The only thing this simulation hadn’t really covered was the concept of guard nodes (where a client will select its entrance nodes from a small subset of available nodes). So I decided to add guard nodes into the simulation.

Simulation 2 - With guard nodes

The changes over the first simulation made was that the the 10,000 runs would be split between 100 clients. Each client would pick 3 guard nodes and only use them as their entrance nodes. In the previous simulation it made no difference if there was 1 client or 10,000 clients as the circuits picked wouldn’t be influenced by the clients in any way.

After running the simulation again using the same extracted directories as the first simulation another graph was produced, shown in figure 2. Comparing this to the first graph shows the same over all trends, the averages are slightly more erratic but nothing significant.

Again we see organisation A getting a lot of control when it is the only one trying to control the network but as soon as other organisation try to control circuits as well the total number of circuits controlled by organisations decrease.

Simulation 3 - What if there wasn’t any uncontrolled nodes?

The final simulation was to find out if what had been suggested by the previous 2 simulations was correct. That a tor network could provide anonymity to a large percentage of users even if every node was controlled by an organisation looking to remove the anonymity of the network.

This time I simulated 4 organisations. Organisation A started with control of the whole network (100 nodes), over the first 100 generations organisation B increased the number of entrance and exit nodes it controlled by 1 per generation. From 100 to 200 generations organisation C’s controlled entrance and exit nodes increased by 1 per generation. For the final 100 generations entrance and exit nodes controlled by organisation D’s were increased by 1 per generation.

The simulation was run 50 times and the results averaged out. The graph of the averaged results are shown in figure 3. As organisation B increases the number of nodes it controls Organisation A’s number of controlled circuits quickly declines. When organisation C starts trying to control circuits as well Organisation A and B’s number of controlled circuits declines even more. By the time that Organisation D has reached the same level of control as that of A, B and C, over 70% of the circuits are uncontrolled.

A number of interesting results have appeared from these simulations. The first was that if only 1 organisation tried to take over the tor network by adding its own nodes then it would only need to add 100 nodes to control over 70% of the traffic. Just adding 50 entrance/exit nodes would be enough to take control of over 40% of the traffic over the network, more than enough to do some serious traffic analysis. Even taking into account that each node would need to be in a different class B network to get full advantage, this would be physically possible for an organisation to do (For more information about this see http://www.cs.uml.edu/~xinwenfu/paper/SPCC10_Fu.pdf )

Also the use of guard nodes may add some protection from an individual clients point of view but from the point of view of those organisations not targeting a single individual then they add no real security as the results were about the same as the non guard node simulation.

Finally the really interesting thing for me was that it is entirely possible for a tor type network to be established by an organisation, but the moment other organisations take an interest in the network and try to take control of the circuits then it is more likely that a connection will be over an uncontrolled circuit (i.e. anonymous from the point of view we have been looking at the network from.) This is because while it is highly likely that the entrance node selected will be controlled by an organisation it is less likely that the exit node selected will be controlled by the same organisation.

Jason — 2010-07-12

An Introduction To Password Hashes

What is a password hash?

In the early days of computing there was required a method of storing passwords so that users logging in could be autheticated. While it would be possible to store the passwords in plain text in a file anyone who could access that file would have the login details for everyone on the machine.

The solution to this problem was to hash the password. A hash function is a one way function. What is meant by one way is that it is easy to generate the hash from the password but very difficult / time consuming to generate the password from the hash.

Brute Force Attacks and Dictionary Attacks

One method of converting a hash back to the password that generated it is to imply try hashing every possible combination of characters are valid for the password and then comparing that hash to the one being cracked. This is called a brute force attack.

Another similar method is a dictionary attack where every word in a dictionary of probably passwords is hashed and compared to hash that is being cracked. This is quicker than a brute force attack but does not guarentee a result.

Rainbow Tables

Some hashing schemes (e.g. those used by windows) can be easily defeated if the attacker has spent the time to generate a set of rainbow tables for their target hash method. Rainbow tables are way of reducing the amount of time it takes to crack a hash by precomputing most of the possible hashes before hand.

It works by having another function that maps a hash back to a possible password. Then the system builds up chains of a set length by starting with a password and generating the relating hash and then using it’s function it maps that hash back to a possible password and then it generates the hash for that password and then uses it’s function to map that that hash to a new password.

The system then stores the original password in the chain and the final hash. The system generates enough chains to cover the majority of possible passwords. The collection of these chains are refered to as rainbow tables.

When breaking a hash it checks the list of hashes to see if it has a match. If not then it uses it’s function to generate a password from the hash which it hashes and compares it to the list of hashes in the rainbow table. The system keeps generating passwords based on the previous hash and then hashing that password and comparing till it finds a match. Then all it has to do is generate the chain that it finds a match on and then it should find in that chain the hash and a password that generates it.

Salts

A common method used to defeat rainbow tables is to use a salt in a password. This is a little bit of information that is stored with the hash that is used to alter the way that the hash function hashes the password. It could be a simple as a bit of text that is concatenated onto the password or it may actually change how the hash function operates in some way. With a large enough variation possible in the salt then the time taken to generate the rainbow tables and the space required to store them are drastically increased (i.e. into the realms only available to government bodies).

More Information

For more information check out the following links.

Jason — 2009-12-04

Explaining the PATH environment variable in Windows

Recently I have seen a large number of posts on forums asking how they run a windows console application that they have downloaded. The problem most of them encounter is that they can only run the application from the command line if they are in the same directory as the application they have just installed.

What is more worrying to me is the replies that people give to these posts suggesting some very silly things (e.g. moving their application into the c:\windows\system32\ directory). What a lot of the replies show is that there are a lot of people who don’t understand the what the PATH environment variable is.

So what is an environment variable then? An envrionment variable is a method of storing variable information so that processes can look it up. A good way to get an understanding of this is to actually use an environment variable. Open up a command line and enter echo %windir%. Notice the output. It is where your windows directory is located.

Now I hear you saying, “I know where my windows directory is, so whats the point.”, but lets look at it from the point of view of an installer. The installer wants to write a file into your windows directory, if we hard code the installer to always write it to c:\windows then the installer will fail when someone has a windows directory called c:\windowsxp or if they have their windows directory on a different drive (e.g. d:\windows). But if the installer uses the windir environment variable instead of hard coding it then it will work on those machines as well.

Now that we have an understanding of what an environment variable is lets look at the PATH environment variable. In your command line enter echo %PATH% and look at the long result it returns. At first this looks confusing but after looking at it for a little while we can see that it really is just a list of directories seperated by semicolons. So what is this environment variable used for?

When you enter a command on the command line the system uses the PATH environment variable as an ordered list of directories that it has to search for the command. So if your path environment variable is c:\windows\system32;c:\windowd;c:\ and you enter the command edit it will first look in c:\windows\system32\ for an executable called edit, it finds edit.com and runs it.

Now lets enter notepad on the command line. The system again looks in the first entry in the path environment variable (c:\windows\system32\ in our example) but it doesn’t find a match. It then looks in c:\windows where it finds notepad.exe and runs it.

Finaly lets enter asdf on the command line. The system again looks in the first entry in the path environment variable (c:\windows\system32) but fails to find a match. It then looks in c:\windows where it also fails to find a match. Finally it looks in c:\ and failing to find a match displays an error to the user.

Adding extra directories into your PATH environment variable is easy. Simply goto your system properties and under the advance tab click on Environment Variables. In the Environment Variables window that appears it is split into two sections User Variables and System Variables. User Variables are only for the current user logged in while System Variables are for all users.

It is important to notice that the PATH environment variable appears in both User Variables and System Variables. Windows uses both of these to create the PATH environment variable. The user’s PATH entry is concatenated onto the end of the system entry to produce a single environment variable. This lets the administrator define a default set of directories to be searched for all users and then individual users can add their own directories to be searched afterwards. The general rule I use is to add to the system PATH entry only if I need multipe users of the machine to have it in thier PATH otherwise I add directories to the user PATH entry.

Jason — 2009-12-04

RC4 Tutorial

The RC4 algorithm is a stream cipher. It was created by Ron Rivest of “RSA Data Security” in 1987. The algorithm produces a key stream that is Xor’d with the plain text. At the core of the algorithm is an 16×16 SBox (Substitution Box) which is initialised by the key. Every time a byte is produced the SBox is permuted.

The algorithm needs to share 3 variables between the two functions (i, j and SBox[]);

To initialise the SBox the following procedure is followed

Each element in the SBox is filled with its element number. (I.E SBox[0]=0, SBox[1]=1, …..SBox[255]=255)
Another array of equal size to the SBox (K) is filled with the key. Repeating the key as needed until the whole of the array has been filled.
The index j is set to 0 4 : for(i=0 to 255) j=(j+SBox[i]+K[j]) Mod 256 Swap SBox[i] with SBox[j] done

At the end of this process the key array (K) is no longer needed. To the next value in the key stream the following procedure is used

i = (i + 1) Mod 256
j = (j + SBox[i]) Mod 256
Swap SBox[i] with SBox[j]
t = (SBox[i] + SBox[j]) Mod 256
return SBox[t]

Because RC4 uses Xor to encipher the plain text then the decipher process is exactly the same as the encipher process.

The following three files provide the basis for an rc4 encryption. To use it just compile rc4.c as an object file and link it to you program file (remembering to add rc4.h to the list of include files in your program).

e.g. gcc -C -o rc4.o rc4.c gcc -o rc4 test.c rc4.o

The generateKey function will produce the SBox based upon the key passed to it. The getByte function will return the next value in the key stream.

Note that this only provides a key stream and your program will need to Xor this with the plain text it is wishing to encipher.

The test.c program provided gives an example of how to use rc4 in your own programs.

----------------- Begin rc4.h ----------------
/*
* rc4 ecryption scheme header
*
*/

#ifndef RC4HEADER
#define RC4HEADER

void generateKey(unsigned char Key[]);
unsigned char getByte(void);

#endif
----------------- End rc4.h ----------------

----------------- Begin rc4.c ----------------
/*
* rc4 ecryption scheme
*
*/

#include <stdio.h>
#include <string.h>

#define SIZE 256

unsigned char SBox[SIZE];
int i;
int j;

void generateKey(unsigned char Key[]);
unsigned char getByte(void);

void generateKey(unsigned char Key[]) {
    unsigned char tmp;
    unsigned char KBox[SIZE];

    for(i = 0; i < SIZE; i++)
        SBox[i] = i;

    for (i = 0; i < SIZE; i++)
        KBox[i] = Key[i % strnlen(Key, SIZE)];

    for (j = 0, i = 0; i < SIZE; i++) {
        j = (j + SBox[i] + KBox[i]) % SIZE;
        tmp = SBox[i];
        SBox[i] = SBox[j];
        SBox[j] = tmp;
    }
}

unsigned char getByte(void) {
    unsigned char tmp;

    i = (i + 1) % SIZE;
    j = (j + SBox[i]) % SIZE;
    tmp = SBox[i];
    SBox[i] = SBox[j];
    SBox[j] = tmp;

    return SBox[(SBox[i] + SBox[j]) % SIZE];
}
----------------- End rc4.c ----------------

----------------- Begin test.c ----------------
/*
* rc4 ecryption scheme
*
*/

#include "rc4.h"
#include <stdio.h>

int main(int argc, char **argv) {
    int in;
    if (argc > 1) {
        generateKey(argv[1]);
        while((in = getchar()) != EOF) {
            printf("%c", (unsigned char) in ^ getByte());
        }
    }
    return 0;
}
----------------- End test.c ------------------

Jason — 2009-12-04