Heckroth Industries

twitterMap on Hak5

I had a pleasant surprise this morning when I discovered that twitterMap has appeared on the latest episode (season 9, episode 19) of Hak5.

In other twitterMap news, I have almost finished version 2. The new version will be able to produce wordlists and will also let you map the links between followers. The follower mapping won't be fast, as that part of the Twitter API is rate limited. My tests so far have shown it to be a "leave running in the background for the rest of the day" tool rather than a "go and have a cup of tea while it finishes running" tool.

To reduce the number of calls I have also added the option to cache the mapping between Twitter account numbers (returned by some of their APIs) and users' screen names. This has really helped with testing and should reduce the number of rate-limited API calls if you are running multiple maps of the same person or group of people.
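
I won't reproduce twitterMap's actual code here, but the general idea of the cache is nothing more than a hash keyed on the account number that gets saved between runs. A rough sketch in Perl (the cache file name and the lookup_via_twitter_api function are placeholders for illustration, not twitterMap's real internals):

use strict;
use warnings;
use Storable qw(retrieve store);

my $cache_file = 'id_to_screen_name.cache';          # placeholder file name
my $cache = -e $cache_file ? retrieve($cache_file) : {};

sub screen_name_for {
    my ($id) = @_;
    return $cache->{$id} if exists $cache->{$id};    # cache hit, no API call needed
    $cache->{$id} = lookup_via_twitter_api($id);     # placeholder for the rate-limited call
    return $cache->{$id};
}

END { store($cache, $cache_file); }                  # keep the cache for the next run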

twitterMap Hak5
Jason — 2011-06-30

comm the opposite of diff

Today I needed to find matching lines in a number of text files. My first thought was: what is the opposite of diff? The answer is comm. To compare two text files and output the lines that appear in both, use

comm -1 -2 <file 1> <file 2>

To get matching lines between 4 files I redirected the output to temporary files and then comm'd them.

comm -1 -2 <file 1> <file 2> > tmp1
comm -1 -2 <file 3> <file 4> > tmp2
comm -1 -2 tmp1 tmp2

You can pipe into comm by using '-' instead of a filename, so you could also compare 4 files with

comm -1 -2 <file 1> <file 2> | comm -1 -2 - <file 3> | comm -1 -2 - <file 4>
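
One thing to remember is that comm expects its input files to be sorted. If they aren't already sorted you can sort them on the fly; with a shell that supports process substitution (bash, for example, with file1 and file2 being your two files) that looks like

comm -1 -2 <(sort file1) <(sort file2)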
Linux
Jason — 2011-06-21

Getting started with PowerShell

I have just started using PowerShell instead of CMD and I can say that it is a big improvement. The first thing I wanted to do, though, was to edit my profile so that I could tailor it to my needs.

First I listed my initial requirements: a simpler prompt, aliases for vi, vim, gvim and gimp, and the ability to exit with <CTRL>+D.

Before any of that could go into the profile I discovered that I would have to change the execution policy for PowerShell, otherwise it would refuse to run my profile script. This was a simple case of launching PowerShell as an administrator and entering

Set-ExecutionPolicy RemoteSigned

This lets PowerShell run local scripts but requires remote scripts to have been signed. After doing that I closed down the Administrator PowerShell and opened one as my standard user.
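
If you want to double check the policy change, and make sure there is actually a profile file to edit, something like the following works (a quick sketch using standard cmdlets; $profile is the built-in variable holding the profile path):

# Show the current execution policy
Get-ExecutionPolicy

# Create the profile file if it does not exist yet
if (-not (Test-Path $profile)) {
    New-Item -ItemType File -Path $profile -Force
}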

To edit my profile I initially used the command:

notepad $profile

and entered the following

# Functions

# prompt function, redefines what prompt is displayed
function prompt
{
"> "
}

After closing PowerShell and reopening it I now had a simple '>' as the prompt.

The next step was to put in the aliases for vi, vim, gvim and gimp. So I edited the profile again and entered

# Assign vi, vim, gvim to gvim
Set-Alias vim 'C:\Program Files (x86)\Vim\vim73\gvim.exe'
Set-Alias gvim vim
Set-Alias vi vim

# Assign gimp to gimp
Set-Alias gimp 'C:\Program Files (x86)\GIMP-2.0\bin\gimp-2.6.exe'

After restarting PowerShell again I could now edit my profile by using

vim $profile

Now that I was using vim to edit my profile I could use <CTRL>+V to insert special characters, which let me put a <CTRL>+D (^D) into my profile, specifically as an alias, though I would still have to press <CTRL>+D followed by <RETURN>. Initially I tried

Set-Alias ^D exit

But that didn't work. Ironically it wasn't the <CTRL>+D character causing problems; instead it was the use of exit in an alias. The solution is to wrap exit in a function and call the function from the alias.

# ex function, required to use exit in aliases
function ex
{
    exit
}

# Aliases

# Assign CTRL+D to exit
Set-Alias ^D ex

That is my standard PowerShell profile so far. Here it is in one chunk in case you want to cut and paste it (remembering to replace ^D with a proper <CTRL>+D character).

# Functions

# prompt function, redefines what prompt is displayed
function prompt
{
"> "
}

# ex function, required to use exit in aliases
function ex
{
    exit
}

# Aliases

# Assign CTRL+D to exit
Set-Alias ^D ex

# Assign vi, vim, gvim to gvim
Set-Alias vim 'C:\Program Files (x86)\Vim\vim73\gvim.exe'
Set-Alias gvim vim
Set-Alias vi vim

# Assign gimp to gimp
Set-Alias gimp 'C:\Program Files (x86)\GIMP-2.0\bin\gimp-2.6.exe'
Windows
Jason — 2011-06-03

Getting DSpace's handle server to run over IPv4 and IPv6

I have spent this morning figuring out how to get a DSpace Handle Server to accept connections over both IPv4 and IPv6. Previously it was working over IPv4 only. The usual Google hunt didn't turn up any advice, though it did turn up the exact opposite of what I wanted (how to get it to only run on IPv4).

In the end the solution turned out to be very simple, just not documented. In the handle server's configuration file, change the bind_address entries from the IPv4 address to :: and it will start listening on the relevant ports on all interfaces (IPv4 and IPv6).
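
For reference, those entries live in the handle server's config.dct. From memory the relevant sections look roughly like this (your file will have more settings and the exact layout may differ, so treat this as a sketch rather than a copy-and-paste config, and the same change applies to hdl_udp_config):

"hdl_tcp_config" = {
    "bind_address" = "::"
    "bind_port" = "2641"
}

"hdl_http_config" = {
    "bind_address" = "::"
    "bind_port" = "8000"
}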

IPv6 IPv4
Jason — 2011-06-02

How tough is a flash drive?

Back when I first started using computers (an Oric Atmos and a ZX Spectrum) everything was stored on cassette tapes (C15, C60, C90, etc.). They weren't the most reliable way to store data and there was a definite black art involved in retrieving data stored on them, though a volume setting of 7 was a good place to start.

When I started using computers at school (BBC Micros) the data was stored on 5.25 inch floppy disks and later on a network drive (50 machines all sharing a 10MB hard disk). These were easier to retrieve data from, though you couldn't leave the 5.25 inch floppy disks in the sun for too long or they wouldn't stay flat.

Later on in life I started using PCs (286, 386, 486 and Pentiums) at school and college, which used 3.5 inch floppy disks storing 720KB, later increased to 1.44MB. At home I moved onto an Amiga 500 and later an Amiga 1200 (A1200), both of which used 3.5 inch floppy disks (880KB), plus a 120MB hard disk in the case of the A1200. These were again more reliable and I still have a large collection of 3.5 inch disks to use with my A1200.

From then on I used 3.5 inch floppy disks until USB flash drives came along. OK, sometimes I would use CDs but not for storing files I was working on, just for archiving data onto.

So why am I talking about the history of my data storage? Well, no storage medium I have used in the past would have survived what I put one of my flash drives through the other day. It was an old one that I use for writing ISOs to boot from rather than burning a CD, just 1GB of storage. I had put MemTest86+ on it to test the memory of my new media center PC. Needless to say, afterwards it ended up in my trouser pocket, and the next night I put those trousers into the washing machine and gave them a hot wash. When I was taking the washing out, there was my flash drive sat in the drum, having been washed and spun.

At that point I had personally written the drive off; I know they are tough, but I didn't expect it to survive that. I let it dry, then plugged it into my computer and rebooted. The boot menu had the flash drive as an option, so I selected it. MemTest86+ started up and ran. It looks like that flash drive is a lot tougher than I thought.

Retro Hardware
Jason — 2011-05-11

UTF-8 and CSVs

Recently I have had to produce some CSVs using UTF-8 character encoding. The UTF-8 encoding itself is easy; you just need to remember to set the charset to utf-8 when printing the CGI header.

use CGI;
my $CGI = CGI->new;
binmode(STDOUT, ':encoding(UTF-8)');   # make sure anything printed is encoded as UTF-8
print $CGI->header(-type=>'text/csv', -charset=>'utf-8', -attachment=>$filename);

Then you have to print the Byte Order Mark (BOM), the character U+FEFF, as the very first thing so that Excel will recognise the CSV as being UTF-8 rather than assuming its default character set.

print "\x{FEFF}";

Interestingly, the raw bytes 0xFEFF are usually described as the UTF-16 BOM, with the UTF-8 BOM being the byte sequence 0xEF 0xBB 0xBF; printing those bytes directly didn't seem to work with Excel for me, but printing the character \x{FEFF} through a UTF-8 encoded output stream ends up writing exactly those bytes anyway.

Note: Usually a BOM is not recommended for UTF-8 as it can cause problems, but in the case of CSVs that you want to open in Excel it is required.
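
Putting it together, here is a minimal sketch of writing a couple of rows (this assumes the Text::CSV module is installed, and the column names and data are made up for illustration; it also assumes the header, binmode and BOM lines from above have already been printed):

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => "\r\n" });

$csv->print(\*STDOUT, [ 'Name', 'Comment' ]);        # header row (made-up columns)
$csv->print(\*STDOUT, [ 'Ünïcödé', 'first row' ]);   # data rows, quoting handled by Text::CSV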

UTF8 CSV Perl CGI
Jason — 2011-04-13

Authenticating Apache against an Active Directory with multiple top level OUs containing users

Wow, that was a long title, but it is exactly what I have been dealing with recently. The first solution, which is easy, is to use Kerberos; this works great unless you also want authentication to fall back to a standard .htpasswd file. In that case you need to use LDAP. Why? Because LDAP and File use the same AuthType of Basic, whereas Kerberos uses an AuthType of Kerberos. Using LDAP and File authentication you can use a config like this

<Directory /var/www/html/private>
    SSLRequireSSL
    AuthName "Private"
    AuthType Basic
    AuthBasicProvider ldap file

    # File Auth
    AuthUserFile /var/www/.htpasswd

    AuthLDAPURL "ldap://ADServer.domain.co.uk/ou=Users,dc=domain,dc=co,dc=uk?sAMAccountName?sub?(objectClass=*)"
    AuthLDAPBindDN User@Domain.co.uk
    AuthLDAPBindPassword XXXXXXX

    AuthzLDAPAuthoritative off
    Require valid-user
    Satisfy any
</Directory>

This works unless you have your users split over a number of OUs in the Active Directory. If that is the case, here is how I got around it, using <AuthnProviderAlias> blocks (provided by the mod_authn_alias module on Apache 2.2) to define one LDAP provider per OU.

<AuthnProviderAlias ldap ldap-group1>
    AuthLDAPURL "ldap://ADServer.domain.co.uk/ou=Group-OU1,dc=domain,dc=co,dc=uk?sAMAccountName?sub?(objectClass=*)"
    AuthLDAPBindDN User@Domain.co.uk
    AuthLDAPBindPassword XXXXXXX
</AuthnProviderAlias>

<AuthnProviderAlias ldap ldap-group2>
    AuthLDAPURL "ldap://ADServer.domain.co.uk/ou=Group-OU2,dc=domain,dc=co,dc=uk?sAMAccountName?sub?(objectClass=*)"
    AuthLDAPBindDN User@Domain.co.uk
    AuthLDAPBindPassword XXXXXXX
</AuthnProviderAlias>

<Directory /var/www/html/private>
    SSLRequireSSL
    AuthName "Private"
    AuthType Basic
    AuthBasicProvider ldap-group1 ldap-group2 file

    # File Auth
    AuthUserFile /var/www/.htpasswd

    AuthzLDAPAuthoritative off
    Require valid-user
    Satisfy any
</Directory>
Apache Security
Jason — 2011-02-28

Comparison of ext2, ext3, ext4 and ntfs on a USB flash drive

There is a topic over on the Hak5 forums asking which filesystem format is best for use on a USB flash drive. I figured that I would run some basic tests and see if my results match up with other tests on the internet.

I decided to test 3 different types of operation: reading, writing and deleting. Each test on each file system was performed 10 times, with the results averaged and then plotted onto graphs.

Each test covered 11 different file sizes (1KB, 512KB, 1MB, 2MB, 4MB, 8MB, 16MB, 32MB, 64MB, 128MB, 256MB).

The filesystems were all tested on a 4GB Dane-elec flash drive over USB2.0 on my eeePC 900 running Linux.

Reading

Before performing the reading tests the disks were synced and the file cache dropped, to try to get a more accurate measure of the filesystem rather than the file cache.
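
The cache dropping is the standard Linux trick of writing to /proc/sys/vm/drop_caches. Roughly speaking, each read test looked something like this (a simplified sketch rather than the exact test script; the mount point and file name are placeholders):

sync                                              # flush dirty pages to the drive
echo 3 > /proc/sys/vm/drop_caches                 # as root: drop the page cache, dentries and inodes
time dd if=/mnt/usb/testfile of=/dev/null bs=1M   # time reading the file back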

The results from the read performance tests showed ext2 and ext4 performed the best overall, with ext2 having a slightly better performance than ext4 on the larger files.

An interesting result is the way that ext3 really seemed to struggle with reading the large files. The performance of ntfs was slower than ext2 and ext4.

Writing

The writing performance tests showed again that ext3 seemed to struggle as the file size increased. A bigger gap is also shown between ext2 and ext4 with ext4 standing out as better with the larger file sizes.

Interestingly once the file sizes get beyond 32MB ntfs stands out as the best performer for writing.

Deleting

ext3 and ext4 perform the worst at deleting the larger files while ext2 performs the best. Again ntfs is worse than the other file systems on smaller files but performs better than ext3 and ext4 on the larger files.

Conclusion

Based upon these results I would recommend ext4, as it does a good job with reading and writing, and while it is slower than the others at deleting larger files it is still capable of deleting a 256MB file in less than an eighth of a second.

It would be interesting to run these tests on existing file systems which have been used a lot to see if there is a difference after files have been added and removed repeatedly (which is quite common with USB flash drives).

Of course, if you are going to be using the drive on a Windows machine then ntfs makes much more sense.

Filesystems
Jason — 2011-01-12

Why I use cat even though there are more efficient methods

When using a series of commands tied together with pipes I usually start with the cat command. A lot of the time when I post a one-liner solution on a forum someone will reply that there was no point in starting with cat as it is inefficient. So I decided to write a quick post about why I use cat rather than one of the other methods.

The main reason that I use cat at the start of most strings of pipes is that it is easier to maintain. The logical flow of the data goes from left to right and the file that goes into the pipe is easy to spot, e.g.

cat /etc/passwd | grep bash | grep -v :x:

We can see here that /etc/passwd gets pushed through grep first to find the lines containing bash. Then those lines are pushed through grep again, looking for lines that don't contain :x: (i.e. non-shadowed passwords). This could have been written in a number of different ways.

grep bash /etc/passwd | grep -v :x:
</etc/passwd | grep bash | grep -v :x:

In these examples the first way would be reasonable, but the original file at the start of the pipe is a little hidden, tucked away in the first grep command. The second way puts the original file at the start and is very clear, but a typo of > instead of < will destroy the file I actually want to read from.

So yes, there are more efficient ways to start off a string of pipes, but I like to use cat as it makes things a bit more obvious than some and less prone to destroying data with a simple typo than others.

Linux RedHat Security yum
Jason — 2011-01-10

gzip v bzip2

I have recently been looking at revamping our backup setup and I had to make a decision on the compression method to be used. Should I be using tar with gzip, bzip2, or a combination of the two? They were the only two real contenders, mainly due to being stable and supported as standard by tar. The last thing I want to do with backups is use an exotic compression method, as I want to be sure I will be able to restore the backups.

So the first thing I did, on a mixture of servers, was to time how long it took tar to create the compressed tarball with each tool. The results showed that for our data bzip2 was considerably slower at compressing than gzip, while bzip2 produced noticeably smaller tarballs. So do I want faster generation of the tarballs or smaller resulting tarballs?
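
The comparison is easy to reproduce with time and tar's built-in compression flags, along these lines (/path/to/data being a placeholder for whatever is backed up):

time tar -czf backup.tar.gz /path/to/data     # gzip
time tar -cjf backup.tar.bz2 /path/to/data    # bzip2
ls -lh backup.tar.gz backup.tar.bz2           # compare the resulting sizes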

The final solution I decided on is to use a mixture of both gzip and bzip2. If it is a small quantity of data then bzip2 is used, as the time difference when producing a small tarball is negligible. For backups of large sets of data gzip is used, as bzip2 takes much longer to compress the data than the time that would be saved pushing the smaller tarball across the network to the server responsible for writing the backups to tape.

Backup Compression
Jason — 2011-01-07