
Simple Sysadmin Trick: Using tcpdump To Sniff Web Server Traffic

Sometimes, you just have to look at the raw data to see what your web server is doing. The logs might not show enough detail, or you suspect something is going on that just isn't captured in the log files. Or, as in my case, logging is turned off because of too much activity.

The excellent tcpdump utility comes to the rescue here. I recommend you get more familiar with the tcpdump man page. Here is the command you can use, in a nutshell:

tcpdump -nl -w - -i eth1 -c 500 port 80 | strings

or alternatively with just tcpdump (Thanks Chris!):

tcpdump -nl -s 0 -A -i eth1 -c 500 port 80

This command will print out all traffic to and from port 80 on your server, headers and all. Let's look at the options in more detail.

  • -n: Don’t convert addresses (i.e., host addresses, port numbers, etc.) to names.
  • -l: Make stdout line buffered. Useful if you want to see the data while capturing it.
  • -w: Write the raw packets to a file rather than parsing and printing them out; the - here sends them to stdout so they can be piped into strings.
  • -i: Interface you want to sniff on, usually eth0 or eth1, but depends on your system.
  • -c: Number of packets to capture
  • port: port 80, duh :)
  • -A: Print each packet (minus its link level header) in ASCII.
  • -s: Snap length, i.e. how many bytes to capture from each packet; 0 grabs whole packets instead of truncating them.
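The port 80 expression at the end is an ordinary BPF filter, so you can combine it with other primitives to narrow the capture further, for example to a single client (the address below is just a placeholder):

tcpdump -nl -s 0 -A -i eth1 -c 500 port 80 and host 203.0.113.50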

Now, depending on your web server configuration, you will probably have gzipped content which comes out as garbled characters. To strip all that out, just pipe it through strings.

The output will look something like this:

GET /about/comment-page-1 HTTP/1.0
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
User-Agent: Mozilla/2.0 (compatible; MSIE 3.02; Windows CE; 240x320)
Referer: http://quittingsoda.com/about/comment-page-1#comment-6637
Host: quittingsoda.com
Cookie: comment_author_xxx=sakyjartory; comment_author_email_26e707905b5fd6e7139333eb1dab208f=olfaexxxx; comment_author_url_26e707905b5fd6e7139333eb1dab208xxx
HTTP/1.1 200 OK
Date: Wed, 03 Oct 2012 01:49:19 GMT
Server: Apache/2
X-Powered-By: PHP/5.2.17
X-Pingback: http://quittingsoda.com/xmlrpc.php
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/html; charset=UTF-8




About This Site | Quitting Soda
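If you need to dig through a capture more carefully later (for example in Wireshark), you can also write the raw packets to a file with -w and replay them with -r; the filename here is just an example:

tcpdump -n -s 0 -i eth1 -c 500 -w web.pcap port 80
tcpdump -n -A -r web.pcap | less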

Let me know if you have any suggestions for improving this simple command-line trick for sniffing your web server traffic! Netcat is also an essential tool that every Linux administrator should get to know better.

How We Defeated a Proxy Jacker (Google Web Spam Syndrome)

A few months ago, we had an interesting issue with another website stealing our content in an unusual way. Essentially they ran a proxy service on a similar domain (a .com.ar instead of .com) and then replicated our site, replacing our ads with their own. They also ran our content through a content replacement algorithm, removing certain pieces of text and also replacing our own domain with theirs. It is fairly easy to do.

We had a few concerns about this. Firstly, we were worried that unsuspecting users would enter their login information on this site. Also, they were actually showing up in Google search results, which hurts our brand, our rankings, and users' perception of us (their site loaded much more slowly than ours, obviously).

First Attempt To Stop The Proxy Spammers: IP Blocking

Determined to put a stop to it, the first method we tried was blocking their crawler IPs. However, they owned enough IPs in enough ranges that it was simply a cat and mouse game. We would block them, and then a day later they would be back up. After a few weeks of this it didn’t seem to be a viable long-term solution for blocking these spammers.

Next, We Used Their Mirroring Of Our Content Against Them

Next I thought I would use the fact that they were modifying our page content and disable their site that way.

I created a box in HTML/CSS which would be triggered via JavaScript only when the page was loaded from their domain. But you wouldn't think it would be that easy, would you? Since they rewrite every mention of our domain into theirs, a simple string comparison against our own domain name would get rewritten along with the rest of the page. So I used a quick hash function to create a unique identifier from the loaded domain and matched against that instead.

Figuring they would also just strip out the HTML box I created, I found it useful to serve a hex-encoded version of it as well. You can encode text to the hex-escaped form for JavaScript with the following command line:

[cc lang="bash"]echo -n "some text" | od -A n -t x1 | sed 's/ /\\x/g'[/cc]
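A quick sanity check of that pipeline (output shown from GNU coreutils; other od implementations may pad the output slightly differently):

[cc lang="bash"]$ echo -n "warning" | od -A n -t x1 | sed 's/ /\\x/g'
\x77\x61\x72\x6e\x69\x6e\x67[/cc]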

The final Javascript I started loading on our site (and therefore their site) is below:

[cc lang="javascript"][/cc]

After thoroughly testing and then putting up the code, we launched it live. It was very satisfying knowing that end users were getting the warning message when visiting their site, and they actually reported their experience back to us. After about 12 hours, the scammers figured out our play and turned off their proxy. Success!

Not quite. The next morning I woke up to see their site back up. Looking into their code, they had removed the JS completely. I changed our code, embedded it in a JS file, and used other creative means to get it back up – but in the end they were just able to disable the Javascript entirely and defeat this attack. It was just another cat-and-mouse game.

We also tried DMCA/Abuse Contacts

We also tried to send notices to their hosting company to get them to take down the site. They were hosted in Argentina, so DMCA is not applicable. Their abuse contacts were non-responsive.

Originally we thought we could go after their domain registration since it violated our trademark – but this is a long and involved process involving lots of paperwork and time, and for another $10 they could just register another domain name. We didn’t think this was a viable option.

Going to the Source: Google

Google is involved in this scam in a number of ways. First, they were indexing and serving the scammers' site in their search results. Second, the scammers were replacing our ads with their own Google AdSense ads, which I am sure is a ToS violation.

While we were attempting to defeat their site from a technical perspective, we also began looking at Google to see what we could do from there.

Google has a process for submitting DMCA requests. The issue in this case is that they make you submit one batch at a time, and with millions of pages on our site indexed in Google, it just doesn't make sense to list out URLs line by line. It worked to remove the submitted results from the search results, but it was like cutting grass with scissors. Finally, I attempted to contact Matt Cutts via Twitter:


Thanks. So much for that avenue. I know @mattcutts is the public face of Web Spam at Google, so I'm sure he gets lots of @'s with dumb questions, but we were already way beyond this.

Finally, What Worked

Members of our site had a few personal contacts at Google, including an AdSense representative. One of our contacts was able to bring this issue up with the right people, and they finally took the offending domain out of the SERPs permanently. We also reported their AdSense account, but we don't know what happened with that. Without them showing up in the SERPs it was a moot point anyway, because they wouldn't be getting many visitors any more.

The Problem With Google

Google has become so large that it is almost impossible to get a situation like this taken care of without knowing someone who works for them. They have many automated systems in place, but scammers will continue to utilize loopholes for their own profit. Google enables this type of scam, yet they also profit from it.

I wish Google would have some sort of Ombudsman or review system set up so that someone like us, who is having our content ripped off by others using Google’s own tools (and with Google taking a percentage of profit from these people), has a way to efficiently deal with them without resorting to personal contacts. We spent much time on this, time that could have been put to better use.

Or maybe personal contacts are the only real way to deal with a situation like this?

Anyway – I welcome comments and any other ideas for dealing with these Proxy Hijackers and keeping them offline. I'm curious how widespread this type of incident is; we know of only one other site that was having the same issue with the same scammer.

After all, they can always get another domain for $10.

Simple Disk Benchmarking in Linux Using ‘dd’

A great way to do a real-world disk test on your linux system is with a program called dd.

dd takes its name from the "Data Definition" statement of IBM's JCL and is used for copying and converting data from one source to another.

A simple command to do a real-world disk write test in Linux is:

dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync

This creates a 512 MB file named 'test' filled with zeroes. The flag conv=fdatasync tells dd to sync the write to disk before it exits. Without this flag, dd would perform the write but some of it would remain in memory, not giving you an accurate picture of the true write performance of the disk.

A sample of the run is below, with a simple SATA disk:

[14:11][root@server:~]$ dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 5.19611 s, 103 MB/s
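If you also want a rough sequential read number from the same file, drop the Linux page cache first (as root) so you aren't just reading it back from RAM:

sync; echo 3 > /proc/sys/vm/drop_caches
dd if=test of=/dev/null bs=1M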

Now, there are a couple of caveats to using dd for disk benchmarking. The first is that it only tests sequential filesystem access. Also, depending on your filesystem (I'm looking at you, ZFS), the write may itself just land in memory to be flushed out later down the road. The same goes for a system with a caching RAID controller.

A much more accurate way of performing a disk benchmark is to use a tool specifically geared towards this task, which will write much more data over a longer period of time. Bonnie++ is a particularly useful tool for this purpose.

Now don’t forget to remove that test file.

Apache 2.4 Upgrade and the “Invalid Command ‘Order'” Error

Apache 2.4 was released a few weeks ago, and I decided to use this version while installing a new server (I compiled it from source rather than using an rpm or deb).

After using one of my tried and true Apache configuration files, I received this error on start:

Starting httpd: AH00526: Syntax error on line 104 of /usr/local/apache2/conf/httpd.conf:
Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration

Common wisdom would imply that I should make sure the authz_host module is installed (LoadModule authz_host_module modules/mod_authz_host.so); however, this just was not working.
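You can verify which authorization modules your build actually loaded with httpd's -M flag (the path below assumes the from-source install prefix shown in the error above):

/usr/local/apache2/bin/httpd -M | grep authz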

Finally, I discovered that the Order directive has effectively been removed from Apache 2.4! According to the upgrade notes for Apache 2.4:

In 2.2, access control based on client hostname, IP address, and other characteristics of client requests was done using the directives Order, Allow, Deny, and Satisfy.

In 2.4, such access control is done in the same way as other authorization checks, using the new module mod_authz_host. The old access control idioms should be replaced by the new authentication mechanisms, although for compatibility with old configurations, the new module mod_access_compat is provided.

Basically, the Order directive is deprecated, and it only keeps working if you load the compatibility module mod_access_compat.

In my case, I replaced the lines:

Order deny,allow
Deny from all

with:

Require all denied

Also make sure both of these modules are loaded:

LoadModule authz_core_module modules/mod_authz_core.so
LoadModule authz_host_module modules/mod_authz_host.so
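Similarly, if your old configuration allowed access from specific hosts or networks, the old Allow lines map onto Require directives. For example, the 2.2-style lines (the network below is just a placeholder):

Order allow,deny
Allow from 192.0.2.0/24

become, in 2.4:

Require ip 192.0.2.0/24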

Easy enough, but just be aware that there are several configuration changes between 2.2 and 2.4 which can render your old Apache configuration files unusable.

The Easy CIDR Cheatsheet

Even though I’ve been working with Classless Inter-Domain Routing (henceforth known as CIDR) for years now, I always need a bit of help remembering how many addresses are in each block and how many sub-blocks fit into larger blocks. I have the following printed out for easy reference, and here it is for your geeky enjoyment:

CIDR        Total number    Network             Description:
Notation:   of addresses:   Mask:
--------------------------------------------------------------
/0          4,294,967,296   0.0.0.0             Every Address
/1          2,147,483,648   128.0.0.0           128 /8 nets
/2          1,073,741,824   192.0.0.0           64 /8 nets
/3          536,870,912     224.0.0.0           32 /8 nets
/4          268,435,456     240.0.0.0           16 /8 nets
/5          134,217,728     248.0.0.0           8 /8 nets
/6          67,108,864      252.0.0.0           4 /8 nets
/7          33,554,432      254.0.0.0           2 /8 nets
/8          16,777,216      255.0.0.0           1 /8 net (Class A)
--------------------------------------------------------------
/9          8,388,608       255.128.0.0         128 /16 nets
/10         4,194,304       255.192.0.0         64 /16 nets
/11         2,097,152       255.224.0.0         32 /16 nets
/12         1,048,576       255.240.0.0         16 /16 nets
/13         524,288         255.248.0.0         8 /16 nets
/14         262,144         255.252.0.0         4 /16 nets
/15         131,072         255.254.0.0         2 /16 nets
/16         65,536          255.255.0.0         1 /16 (Class B)
--------------------------------------------------------------
/17         32,768          255.255.128.0       128 /24 nets
/18         16,384          255.255.192.0       64 /24 nets
/19         8,192           255.255.224.0       32 /24 nets
/20         4,096           255.255.240.0       16 /24 nets
/21         2,048           255.255.248.0       8 /24 nets
/22         1,024           255.255.252.0       4 /24 nets
/23         512             255.255.254.0       2 /24 nets
/24         256             255.255.255.0       1 /24 (Class C)
--------------------------------------------------------------
/25         128             255.255.255.128     Half of a /24
/26         64              255.255.255.192     Fourth of a /24
/27         32              255.255.255.224     Eighth of a /24
/28         16              255.255.255.240     1/16th of a /24
/29         8               255.255.255.248     6 Usable addresses
/30         4               255.255.255.252     2 Usable addresses
/31         2               255.255.255.254     Unusable
/32         1               255.255.255.255     Single host
--------------------------------------------------------------
Reserved Space:
	0.0.0.0/8	
	127.0.0.0/8
	192.0.2.0/24
	10.0.0.0/8
	172.16.0.0/12
	192.168.0.0/16
	169.254.0.0/16

Of course I’m not the first one to come up with this; it is modified from the cheat sheet by Samat Jain.
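If you would rather have a tool do the math for a specific block, the ipcalc utility (packaged in most distributions; the Debian and Red Hat versions take slightly different options) will expand any CIDR prefix for you:

ipcalc 192.168.1.0/27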

Let me know if you have any improvements or suggestions.

Exporting Announcements from WHMCS

Doing some integration work with WHMCS, I found the need to export some of the announcements into WordPress. Since there isn't any native implementation of this, I found the best way is to pull them directly from the database. The PHP code to do this is fairly easy:

[cc lang="php"]// Pull the database credentials from WHMCS's own config file
include("/path/to/whmcs/configuration.php");

$link = mysql_connect($db_host, $db_username, $db_password);
mysql_select_db($db_name);

// Grab the three most recent published announcements
$query = "SELECT * FROM tblannouncements WHERE published='on' ORDER BY date DESC LIMIT 0,3";
$result = mysql_query($query);

while ($data = mysql_fetch_array($result)) {
    $id           = $data["id"];
    $date         = $data["date"];
    $title        = $data["title"];
    $announcement = $data["announcement"];
    echo("$title");
}[/cc]

If you want more than 3 posts, just change the LIMIT to 5 or 10 or whatever you wish. You can also change the ordering and add additional filters with more SQL. If you want a list, encapsulate the output in a <ul> and make each entry an <li>.
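If you want to experiment with the ordering, LIMIT, or extra WHERE clauses before touching the PHP, it can be handy to test the query from the mysql command-line client first (the host, user, and database names below are placeholders; use the values from configuration.php):

[cc lang="bash"]mysql -u whmcs_user -p whmcs_db \
  -e "SELECT id, date, title FROM tblannouncements WHERE published='on' ORDER BY date DESC LIMIT 5"[/cc]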

I’m using this code in a WordPress template, but it would work equally well in any other PHP-based application.

What a Resilver Looks Like in ZFS (and a Bug and/or Feature)

At home I have an (admittedly small) ZFS array set up to experiment with this awesome newish RAID technology. I think it has been around long enough that it can now be used in production, but I’m still getting used to the little bugs/features, and here is one that I just found.

After discovering that 2 of the 3 1TB Seagate Barracuda hard drives in the array had failed, I had to write the entire array off as a loss and test out my backup strategy. Fortunately it worked and there was no data loss. After receiving the replacement drives from Seagate, I rebuilt the ZFS array (using raidz again) and went along my merry way. After another 6 months or so, I started getting some funky results from the remaining original drive. Thinking it might have the same issue as the others, I removed the drive and ran SeaTools on it (by the way, SeaTools doesn't offer a 64-bit Windows version – what year is this?).

The drive didn’t show any signs of failure, so I decided to wipe it and add it back into the array to see what happens. That, of course, is easier said than done.

One of the problems I ran into is that I am running ZFS on Ubuntu via FUSE. Ubuntu has this nasty habit of changing drive identifiers around when USB devices are plugged in. So now when this drive is plugged in, it shows up as /dev/sde instead of /dev/sdd, which is now occupied by a USB-attached drive.

No problem, I figure, I’ll offline the bad drive in the zpool and replace it with the new drive location. No such luck.

First I offlined the drive using [cci]zpool offline media /dev/sdd[/cci]:

[cc]dave@cerberus:~$ sudo zpool status
  pool: media
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        media         DEGRADED     0     0     0
          raidz1-0    DEGRADED     0     0     0
            sdd       OFFLINE      0     0     0
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0[/cc]

Now that it’s offline, I thought you should be able to detach it. No such luck – since it is a ‘primary’ device of the zpool it does not allow you to remove it.

[cc]dave@cerberus:~$ sudo zpool detach media /dev/sdd
cannot detach /dev/sdd: only applicable to mirror and replacing vdevs[/cc]

What they want you to do is replace the drive with another drive. This drive (the same drive, with all info wiped from it) is now on /dev/sde. I try to replace it:

[cc]dave@cerberus:~$ sudo zpool replace media /dev/sdd /dev/sde
invalid vdev specification
use '-f' to override the following errors:
/dev/sde is part of active pool 'media'
dave@cerberus:~$ sudo zpool replace -f media /dev/sdd /dev/sde
invalid vdev specification
the following errors must be manually repaired:
/dev/sde is part of active pool 'media'[/cc]

Even with -f it doesn’t allow the replacement, because the system thinks that the drive is part of another pool.

So basically you are stuck if you are trying to test a replacement using a drive that has already been a member of the pool. I'm sure I could replace it with another 1TB disk, but what is the point of that?

I ended up resolving the problem by removing the external USB drive, thereby putting the drive back into its original /dev/sdd slot. Without my issuing any commands, the system now sees the drive as the old one and starts resilvering it.

[cc]root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 4.62% done, 3h18m to go
config:

        NAME          STATE     READ WRITE CKSUM
        media         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            sdd       ONLINE       0     0    13  30.2G resilvered
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0[/cc]

It is interesting to see what it looks like from an i/o perspective. The system reads from the two good drives and writes to the new (bad) one. Using [cci]iostat -x[/cci]:

[cc]avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.77    0.00   13.81   32.81    0.00   23.60

Device:  rrqm/s  wrqm/s      r/s      w/s     rsec/s     wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00     0.80     0.00      33.60       0.00    42.00     0.01  15.00  15.00   1.20
sdb        0.00    0.00   625.00     0.00  108033.20       0.00   172.85     0.56   0.90   0.49  30.80
sdc        0.00    0.00   624.20     0.00  107828.40       0.00   172.75     0.50   0.81   0.47  29.60
sdd        0.00    1.20     0.00   504.40       0.00  107729.60   213.58     9.52  18.85   1.98 100.00[/cc]

It seems that ZFS is able to identify a hard drive by its GUID somehow, but doesn't automatically use it in the pool. This makes it so that you can't test a drive by removing it, formatting it, and putting it back in at a new location. Basically, ZFS assumes that your drives are always going to be at the same /dev location, which isn't always true. As soon as you attach a USB drive in Ubuntu, things are going to shift around.
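One way to sidestep the device-name shuffling entirely (a standard ZFS practice, though I haven't re-tested it under zfs-fuse specifically) is to reference drives by their persistent names under /dev/disk/by-id rather than /dev/sdX, for example by re-importing the pool that way:

[cc lang="bash"]# See the persistent identifiers for the disks in the system
ls -l /dev/disk/by-id/

# Export the pool, then re-import it using the by-id device names
sudo zpool export media
sudo zpool import -d /dev/disk/by-id media[/cc]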

After the resilver is complete, the zpool status is:

[cc]root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h16m with 0 errors on Sun May 15 07:35:46 2011
config:

        NAME          STATE     READ WRITE CKSUM
        media         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            sdd       ONLINE       0     0    13  50.0G resilvered
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0

errors: No known data errors[/cc]

You can now clear the error with:

[cc]root@cerberus:/home/dave# zpool clear media
root@cerberus:/home/dave#[/cc]

Zpool status now shows no errors:

[cc]root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
 scrub: resilver completed after 0h16m with 0 errors on Sun May 15 07:35:46 2011
config:

        NAME          STATE     READ WRITE CKSUM
        media         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            sdd       ONLINE       0     0     0  50.0G resilvered
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0

errors: No known data errors[/cc]

So now the question I have is this: are you able to manually update or remove the drive status somewhere on your system? How did ZFS know that this drive already had a pool on it? I zeroed the drive and verified with fdisk that there were no partitions on it. Is there a file somewhere on the system that stores this information, or is it written somewhere on the drive itself?
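My best guess (not something I have verified on this particular zfs-fuse setup) is that ZFS keeps copies of its vdev label at both the beginning and the end of each member device, so quickly zeroing the front of the disk leaves the trailing labels intact. Newer ZFS implementations ship a subcommand for exactly this situation; the device name below is just a placeholder:

[cc lang="bash"]# Wipe the ZFS vdev labels from a disk that is no longer part of any pool
sudo zpool labelclear -f /dev/sde[/cc]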

ZFS is great, but it still has some little issues like this that give me pause before using it in a production system. Then again, I suppose all massive disk array systems have their little quirks!