13 February 2010

links -- zfs, dtrace, benchmark work...

## amd; learning about AMD architecture
http://en.wikipedia.org/wiki/Comparison_of_AMD_processors
http://en.wikipedia.org/wiki/Socket_AM2%2B
http://www.tomshardware.com/reviews/socket-am3-phenom,2148-4.html

## benchmarking
http://en.wikipedia.org/wiki/Benchmark_(computing)

## love dtrace; want more
http://www.brendangregg.com/dtrace.html

## from scott lowe twitter post
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007909

## from stupid HTML5 twitter post
http://en.wikipedia.org/wiki/Tragedy_of_the_commons

07 June 2009

security visualization

This weekend I decided to work through this tutorial on visualizing firewall logs, and wanted to capture my notes on how I got it done. Turns out that the example is pretty simplistic, and the Treemap tool starts to need some tickling to get it to ingest larger files.

I've been wanting to work on this for a long time, as my introduction to computing in a professional context was as a fledgling graphic designer, and I'm interested in the many possibilities associated with visualizing this kind of data. I've had this guy's copy of Applied Security Visualization for some time (I'll get it back to you soon, Trey) but had not yet had the time to crack it open. Mr. Marty's easy-to-use tutorial changed that though.

So, in my case I am using an old Netscreen 25 firewall that sits in a somewhat sequestered portion of my network. Despite that, it generates around two hundred to three hundred thousand lines of log entries a day. A great thing about this project is that it's a chance to tune the firewall: to ensure that the rules are blocking the traffic they should be, and that it's not capturing unnecessary log info. Going through this process, I quickly identified some misconfiguration of the firewall, as well as a couple of situations where I could turn off logging for a particular policy.

Step one is to run the log files through a process that will extract only the information necessary for viewing in Treemap. I started with some simple bash/awk/sed work, but quickly found that because some of the named network address ranges in the Netscreen include spaces, I would need more advanced regular expressions and so, to my mind, perl/python/etc.

N.B. I did my best to "randomize" the IP addresses that follow, and I hope I did it in a way that doesn't mess up the flow of things. Ultimately, the addresses and port numbers are not important.

## define a variable for the log file I want to work with
logfile=/var/log/firewall/20090601.log

## after I looked at a log file line, I found that the fields I wanted were
## the 22nd, 23rd, 24th, 25th, and 19th, so I grab those with awk, and write
## it out to a file; these fields correspond to source IP, destination IP,
## source port, destination port, and action (permit or deny)
awk '{ print $22, $23, $24, $25, $19 }' $logfile > /tmp/viz_output

## what'd we get?
head -n2 /tmp/viz_output
src=172.30.200.14 dst=10.4.39.8 src_port=2241 dst_port=445 action=Permit
src=172.30.200.14 dst=10.4.39.8 src_port=2211 dst_port=445 action=Permit

## then, just to stick with Rafael's process of using CSV files, I remove the
## labels from the fields ("src=", "dst=", etc.) and replace those with commas
##
## btw, yes, there are different and better ways to use sed, I know
cat /tmp/viz_output | \
sed 's/src=//' | \
sed 's/ dst=/,/' | \
sed 's/ src_port=/,/' | \
sed 's/ dst_port=/,/' | \
sed 's/ action=/,/' > /tmp/viz_output.csv

## look at what we've got so far
head -n2 /tmp/viz_output.csv
172.30.200.14,10.4.39.8,2241,445,Permit
172.30.200.14,10.4.39.8,2211,445,Permit
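For what it's worth, the same label stripping can be done in a single sed invocation with several `-e` expressions. This is only a sketch: the `to_csv` function name is made up for illustration, and the sample line is copied from the output above.

```shell
## hypothetical helper: strip the "key=" labels and join the values with commas
to_csv() {
  sed -e 's/^src=//' \
      -e 's/ dst=/,/' \
      -e 's/ src_port=/,/' \
      -e 's/ dst_port=/,/' \
      -e 's/ action=/,/'
}

## try it on one line shaped like /tmp/viz_output
line='src=172.30.200.14 dst=10.4.39.8 src_port=2241 dst_port=445 action=Permit'
printf '%s\n' "$line" | to_csv
## prints: 172.30.200.14,10.4.39.8,2241,445,Permit
```

In the real pipeline this would be `to_csv < /tmp/viz_output > /tmp/viz_output.csv`, which also avoids the extra cat.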

This looked fine to me when I started, but I quickly found that among the two hundred and fifty thousand lines I got from this process there were, not surprisingly, some where a field contained a space and threw off the simplistic positional use of awk. I needed some regular expressions to add enough intelligence to keep the fields aligned.
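One way around the shifting field positions, short of a full parser, is to pick fields by their label instead of by column number. The sketch below is not what I ended up using; it assumes the `key=value` tokens look like the ones in the output above, the `extract_fields` name is made up, and the sample line (including the `service=` value with spaces in it) is invented for illustration.

```shell
## hypothetical helper: find each field by its label, wherever it sits on the line
extract_fields() {
  awk '{
    src = dst = sport = dport = action = ""
    for (i = 1; i <= NF; i++) {
      if      ($i ~ /^src=/)      src    = substr($i, 5)
      else if ($i ~ /^dst=/)      dst    = substr($i, 5)
      else if ($i ~ /^src_port=/) sport  = substr($i, 10)
      else if ($i ~ /^dst_port=/) dport  = substr($i, 10)
      else if ($i ~ /^action=/)   action = substr($i, 8)
    }
    print src "," dst "," sport "," dport "," action
  }'
}

## invented sample line: the extra words shift every positional field
sample='service=Some Named Range action=Permit src=172.30.200.14 dst=10.4.39.8 src_port=2241 dst_port=445'
printf '%s\n' "$sample" | extract_fields
## prints: 172.30.200.14,10.4.39.8,2241,445,Permit
```

This still breaks if a value with spaces happens to contain something that looks like one of the labels, so a real parser is the safer bet.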

I have some perl programming experience, but I do not code every day right now so I wasn't entirely comfortable coming up with a perl script to try to solve this problem. I decided to search for a parser for the Netscreen log files. There wasn't one on the sparsely populated secviz.org "parser-exchange", but I did find this one from Optek consulting. After downloading and reading through the code, I attempted a quick run using my sample log file. It worked fantastically.

I ended up using the following command line to get the output as close as possible to what Rafael specifies in the article:
## cleaned up for blog post
./nstf.pl --noDevice \
--novSys \
--nosTime \
--noPolId \
--nosZone \
--nodZone \
--noProto \
--noElap < $logfile > /tmp/viz_nstf_output

## and the output
head -n3 /tmp/viz_nstf_output
sAddr dAddr sPort dPort Action
172.30.200.14 10.4.39.8 2241 445 Permit
172.30.200.14 10.4.39.8 2211 445 Permit

## and, again to stick with the example, make it CSV
awk '{print $1","$2","$3","$4","$5}' < /tmp/viz_nstf_output > /tmp/viz_nstf_output.csv

## and the output
head -n3 /tmp/viz_nstf_output.csv
sAddr,dAddr,sPort,dPort,Action
172.30.200.14,10.4.39.8,2241,445,Permit
172.30.200.14,10.4.39.8,2211,445,Permit

## awesome, now to get it into shape for the article, remove the current header
## (sAddr,dAddr,etc) and replace commas with tabs, as well as sort and use the
## command uniq to pare the file down and provide a count of unique lines
##
## this is where things seem to fall apart, in terms of needing to massage the
## file to get it to work with Treemap; the command in the article did not work
## for me, but this one did
perl -pe 's/,/\t/g' < /tmp/viz_nstf_output.csv | \
sort | uniq -c | perl -pe 's/^\s*//, s/ /\t/' \
> /tmp/viz_nstf_output.tm3

## and the output
head -n2 /tmp/viz_nstf_output.tm3
2 172.30.200.14 10.4.39.8 49755 4196 Permit
170 10.39.208.12 63.240.161.99 123 123 Permit

At this point, I need to add a header to the top of the file to make sure that it's as specified in the tutorial.
cat header
count sip dip sport dport action
INTEGER STRING STRING STRING STRING STRING
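Putting the last few steps together, here is a rough sketch that builds the whole .tm3 file in one go, including the two header rows. The `make_tm3` name is made up, and I'm assuming Treemap wants everything tab-separated, header rows included. Unlike the command above, this one really does drop the `sAddr,dAddr,...` line before counting.

```shell
## hypothetical helper: make_tm3 <input.csv> <output.tm3>
make_tm3() {
  csv=$1
  out=$2
  {
    ## two header rows: column names, then column types (assumed tab-separated)
    printf 'count\tsip\tdip\tsport\tdport\taction\n'
    printf 'INTEGER\tSTRING\tSTRING\tSTRING\tSTRING\tSTRING\n'
    ## skip the CSV header, count duplicate rows, then emit count + fields with tabs
    tail -n +2 "$csv" | sort | uniq -c |
      awk '{ n = $1; sub(/^ *[0-9]+ /, ""); gsub(/,/, "\t"); print n "\t" $0 }'
  } > "$out"
}
```

Usage would be something like `make_tm3 /tmp/viz_nstf_output.csv /tmp/viz_nstf_output.tm3`.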

Once I added that to the top of my .tm3 file, I was ready to bring it into Treemap. To do that, I downloaded the DAVIX ISO and loaded it up in VMware Workstation. I created some disk space to use with DAVIX, and once it was booted up I formatted the partition and got it ready for use (I believe this is explained in the documentation for DAVIX):
root@slax:~# fdisk /dev/sda
##
## hit n for new partition, and w to write out the changes
##
## then format the partition, and mount it in the mount point provided
root@slax:~# mkfs.ext3 /dev/sda1
root@slax:~# mount /dev/sda1 /mnt/hdc
root@slax:~# mount
aufs on / type aufs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sda1 on /mnt/hdc type ext3 (rw)

## now I can scp the formatted log file from the syslog box, and work on it
root@slax:~# cd /mnt/hdc
root@slax:~# scp ant@logger:/tmp/viz_nstf_output.tm3 .
I'll say here that while I very much appreciate the work that's clearly gone into Treemap, and the fact that it's available for free, the first bunch of times I tried to get a file loaded into it, it barfed. In writing up this post, I went back through the process and tested each command to ensure that it would work, and even after going through the process many times, Treemap was still bitching about the file not being formatted properly for one reason or another. I tend to agree with criticism I've seen Rafael give elsewhere: just make the tool accept CSV, it's easy to create and work with. If you do this type of thing all the time it probably won't be a big deal.
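
Since the complaints were mostly about formatting, a quick sanity check before loading can save some round trips. This is only a sketch, and I don't know everything Treemap actually checks: the made-up `check_tm3` helper just reports rows whose tab-separated field count differs from the first header row. The demo file is throwaway sample data.

```shell
## hypothetical helper (not part of Treemap): flag rows with the wrong field count
check_tm3() {
  awk -F'\t' '
    NR == 1    { want = NF; next }
    NF != want { printf "line %d: %d fields, expected %d\n", NR, NF, want; bad = 1 }
    END        { exit bad }
  ' "$1"
}

## demo on a throwaway file: the last row is missing its action column
demo=$(mktemp)
printf 'count\tsip\tdip\tsport\tdport\taction\n' > "$demo"
printf 'INTEGER\tSTRING\tSTRING\tSTRING\tSTRING\tSTRING\n' >> "$demo"
printf '2\t172.30.200.14\t10.4.39.8\t2241\t445\tPermit\n' >> "$demo"
printf '170\t10.39.208.12\t63.240.161.99\t123\t123\n' >> "$demo"
check_tm3 "$demo" || echo "file needs fixing"
rm -f "$demo"
```

The function exits non-zero when it finds a bad row, so it can sit in front of the scp step.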

Anyway, loading the file into Treemap worked successfully with the first file I tried, which had around fifty thousand lines. After spending some time with the first map, I was excited to see the results from different days, so I went back through the process for two additional days. That's where I ran into issues with Treemap not having enough memory to open larger files. The second file I tried to load had around eighty thousand lines (which to me is not that much more, but clearly just enough) and it would not load.

Unfortunately the Treemap GUI doesn't produce any error saying that it's out of memory; it just doesn't load the file. I had Treemap installed on another machine and was running it from the command line with the same log files (one a winders machine, the other Linux/DAVIX, just for comparison). Running it from the command line and attempting to load the larger log file produced the following error output:

java -jar treemap.jar
readTM3: java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
at java.io.BufferedInputStream.<init>(BufferedInputStream.java:158)
at sun.swing.SwingUtilities2$2$1.run(SwingUtilities2.java:1319)
at java.security.AccessController.doPrivileged(Native Method)
...
more java spew...
The line beginning with readTM3 makes it clear what's happening. To remedy, I gave Treemap more memory to work with, thus:

root@slax:/usr/local/lib/treemap# java -Xms512m -Xmx512m -jar treemap.jar

And everything went according to plan:



Using this view, it was very clear, very quickly, that there was a misconfiguration that was blocking DNS traffic (the top right red block). Easily fixed. I also found that there are two machines on this subnet that apparently either have a virus or are very persistently trying to contact machines all over the world every few seconds for a good cause. I choose the former.

So from this point it's time to learn more about Treemap, as there are clearly many things it will do that I'm not aware of yet. The issue I had with larger files not loading, together with the likelihood of wanting to look at a much larger amount of information (say, from a production firewall with many more events in the log), also got me thinking about how to filter in the first step so that the data set is smaller. The nstf script from Optek has some interesting options that I look forward to trying.
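
As a first stab at shrinking the data set, one option is to keep only one kind of action and drop the source port, so that many more lines collapse together in uniq. This is a sketch I haven't run against real logs: `reduce_csv` is a made-up name, the rows are sample data, and I'm assuming denied traffic shows up with the literal value `Deny` in the action column.

```shell
## hypothetical filter: reduce_csv <action> reads the headered CSV on stdin,
## keeps rows with that action, and drops the source port column
reduce_csv() {
  awk -F',' -v want="$1" 'NR > 1 && $5 == want { print $1 "," $2 "," $4 "," $5 }'
}

## sample data: two denied DNS lookups that differ only in source port
printf '%s\n' \
  'sAddr,dAddr,sPort,dPort,Action' \
  '172.30.200.14,10.4.39.8,2241,445,Permit' \
  '10.39.208.12,63.240.161.99,1031,53,Deny' \
  '10.39.208.12,63.240.161.99,1032,53,Deny' |
  reduce_csv Deny | sort | uniq -c
```

The two denied rows come out as a single line with a count of 2. The .tm3 header would need to lose its sport column to match.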

Thanks Rafael for the tutorial and all your work promoting visualization. Thanks too for DAVIX, what an awesome tool. I look forward to learning more and investigating further.