Tips & Tricks for Technologists & System Administrators

Simple Sysadmin Trick: Using tcpdump To Sniff Web Server Traffic

Posted 2nd October in Linux, Shell, System Administration. 1 Comment

Sometimes, you just have to look into the raw data to see what your web server is doing. The logs might not show you enough detail or you suspect something is going on which is just not shown in the log files. Or, as in my case, logging is turned off because of too much activity.

The excellent tcpdump utility comes to the rescue here. I recommend getting familiar with the tcpdump man page. In a nutshell, here is the command you can use:

tcpdump -nl -w - -i eth1 -c 500 port 80|strings

or alternatively with just tcpdump (Thanks Chris!):

tcpdump -nl -s 0 -A -i eth1 -c 500 port 80

This will print out all traffic to and from port 80 on the interface (up to 500 packets), headers and all. Let's look at the options in more detail.

  • -n: Don’t convert addresses (i.e., host addresses, port numbers, etc.) to names.
  • -l: Make stdout line buffered. Useful if you want to see the data while capturing it.
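  • -w: Write the raw packets to a file rather than parsing and printing them out. A file name of - sends them to stdout, which is what lets you pipe them into strings.
  • -i: Interface you want to sniff on, usually eth0 or eth1, but it depends on your system.
  • -c: Number of packets to capture before exiting.
  • port 80: The capture filter; only packets with a source or destination port of 80 are captured. Duh :)
  • -A: Print each packet (minus its link level header) in ASCII.
  • -s: Snapshot length, i.e. how many bytes of each packet to capture. -s 0 captures whole packets instead of truncating them.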

Now, depending on your web server configuration, you will probably have gzipped content which comes out as garbled characters. To strip all that out, just pipe it through strings.
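To see why piping through strings cleans things up, here is a simplified JavaScript sketch (for Node.js) of the idea: keep only runs of printable ASCII bytes that are at least four bytes long, four being GNU strings' default minimum length. The function name extractStrings and the exact set of bytes treated as "printable" are assumptions of this sketch, not a description of how strings itself is implemented.

```javascript
// Simplified sketch of what `strings` does: keep only runs of at least
// `minLen` printable bytes. Binary noise such as gzipped response bodies
// rarely forms long printable runs, so it is dropped.
function extractStrings(buf, minLen = 4) {
  const found = [];
  let run = [];
  for (const byte of buf) {
    // Assumption of this sketch: tab and 0x20-0x7E count as printable.
    const printable = byte === 0x09 || (byte >= 0x20 && byte <= 0x7e);
    if (printable) {
      run.push(byte);
      continue;
    }
    // A non-printable byte ends the current run; keep it if long enough.
    if (run.length >= minLen) found.push(Buffer.from(run).toString("ascii"));
    run = [];
  }
  // Flush a run that reaches the end of the buffer.
  if (run.length >= minLen) found.push(Buffer.from(run).toString("ascii"));
  return found;
}

// Usage: a header line, some gzip-looking bytes, then a too-short run.
const sample = Buffer.concat([
  Buffer.from("HTTP/1.1 200 OK\r\n"),
  Buffer.from([0x1f, 0x8b, 0x08, 0x00]),
  Buffer.from("ab"),
]);
console.log(extractStrings(sample)); // [ 'HTTP/1.1 200 OK' ]
```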

The output will look something like this:

GET /about/comment-page-1 HTTP/1.0
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/, application/xaml+xml, application/x-ms-xbap, application/, application/, application/msword, application/x-shockwave-flash, */*
User-Agent: Mozilla/2.0 (compatible; MSIE 3.02; Windows CE; 240x320)
Cookie: comment_author_xxx=sakyjartory; comment_author_email_26e707905b5fd6e7139333eb1dab208f=olfaexxxx; comment_author_url_26e707905b5fd6e7139333eb1dab208xxx
HTTP/1.1 200 OK
Date: Wed, 03 Oct 2012 01:49:19 GMT
Server: Apache/2
X-Powered-By: PHP/5.2.17
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/html; charset=UTF-8
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "">
<html xmlns="" >
<head profile="">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>About This Site | Quitting Soda</title>

Let me know if you have any suggestions for improving this simple command line for sniffing your web server traffic! tcpdump is an essential tool that every Linux administrator should get to know better.

How We Defeated a Proxy Jacker (Google Web Spam Syndrome)

Posted 31st August in Google, JavaScript, Other Technology. 2 Comments

A few months ago, we had an interesting issue with another website stealing our content in an unusual way. Essentially they ran a proxy service on a similar domain (a instead of .com) and then replicated our site, replacing our ads with their own. They also ran our content through a content replacement algorithm, removing certain pieces of text and also replacing our own domain with theirs. It is fairly easy to do.

We had a few concerns about this. First, we were worried that unsuspecting users would enter their login information on this site. Also, the site was actually showing up in Google search results, which hurt our brand, our ranking and users' perception of us (the site obviously loaded much slower than ours).

First Attempt To Stop The Proxy Spammers: IP Blocking

Determined to put a stop to it, the first method we tried was blocking their crawler IPs. However, they owned enough IPs in enough ranges that it was simply a cat and mouse game. We would block them, and then a day later they would be back up. After a few weeks of this it didn’t seem to be a viable long-term solution for blocking these spammers.

Next, We Used Their Mirroring Of Our Content Against Them

Next I thought I would use the fact that they were modifying our page content and disable their site that way.

I created a box in HTML/CSS that JavaScript would display only when the page was loaded from their domain. But you wouldn't think it would be that easy, would you? Because they replaced every mention of our domain with theirs, any domain name written literally into the script would be rewritten as well. So I used a quick hash function to create a unique identifier from the loaded domain and matched against that instead.

Also, figuring they would just strip out the HTML box I created, I found it useful to build the box from a hex-encoded version of it. You can encode text into JavaScript hex escapes with the following command line:

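echo -n "some text" | od -A n -v -t x1 | tr -d '\n' | sed 's/ /\\x/g'

(The tr step joins od's output lines, which otherwise break every 16 bytes, and -v stops od from collapsing repeated lines into a single *.)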
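For comparison, here is a small JavaScript sketch (for Node.js) of the same transformation. The function name toHexEscapes is hypothetical. For ASCII text it produces the same \xNN escapes as the od/sed pipeline; for characters above U+00FF, which cannot be written as \xNN, it falls back to \uNNNN escapes, whereas od would emit the raw UTF-8 bytes.

```javascript
// Sketch: turn a string into "\xNN" escapes that can be pasted into a
// JavaScript string literal. Code units above 0xFF use "\uNNNN" instead.
function toHexEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i); // UTF-16 code unit
    out += code <= 0xff
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  }
  return out;
}

console.log(toHexEscapes("some text"));
// \x73\x6f\x6d\x65\x20\x74\x65\x78\x74
```

Because every character, including quotes and angle brackets, becomes an escape sequence, a plain find-and-replace on the page source will not match the encoded text.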

The final Javascript I started loading on our site (and therefore their site) is below:

<script type="text/javascript">
jQuery(document).ready(function() {
  // From
  String.prototype.hashCode = function() {
    var hash = 0;
    if (this.length == 0) return hash;
    for (var i = 0; i < this.length; i++) {
      var chr = this.charCodeAt(i);
      hash = ((hash << 5) - hash) + chr;
      hash = hash & hash; // Convert to 32bit integer
    }
    return hash;
  };
  // sets hash to current domain
  var domainhash = document.domain.hashCode();
  // lists domains they are loading the site from (calculate hash of the attacker domain first)
  if (domainhash === -1289333690 || domainhash === 208666227) {
    // Readable version of the overlay (kept for reference; not used below)
    var overlay_orig = jQuery('<div style="position: fixed;top: 0;left: 0;width: 100%;height: 100%;background-color: #000;filter:alpha(opacity=50);-moz-opacity:0.5;-khtml-opacity: 0.5;opacity: 0.5;z-index: 999;text-align:middle;"></div><div style="position: fixed; top: 0px; width: 100%; z-index: 10000;"><div style="z-index: 10000; width: 400px; padding: 30px; margin: 200px auto; background-color: white; border: 1px solid black;color: black;"><h1 style="color:red">Warning: This Is A Scam Site</h1><p>Sorry for the interruption, but the site you are currently visiting is not the real one. This site scrapes our content and injects their own ads to make money.</p></div></div>');
    // Hex encoded version of above, used to defeat content replacement
    var overlay = jQuery("\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x70\x6f\x73\x69\x74\x69\x6f\x6e\x3a\x20\x66\x69\x78\x65\x64\x3b\x74\x6f\x70\x3a\x20\x30\x3b\x6c\x65\x66\x74\x3a\x20\x30\x3b\x77\x69\x64\x74\x68\x3a\x20\x31\x30\x30\x25\x3b\x68\x65\x69\x67\x68\x74\x3a\x20\x31\x30\x30\x25\x3b\x62\x61\x63\x6b\x67\x72\x6f\x75\x6e\x64\x2d\x63\x6f\x6c\x6f\x72\x3a\x20\x23\x30\x30\x30\x3b\x66\x69\x6c\x74\x65\x72\x3a\x61\x6c\x70\x68\x61\x28\x6f\x70\x61\x63\x69\x74\x79\x3d\x35\x30\x29\x3b\x2d\x6d\x6f\x7a\x2d\x6f\x70\x61\x63\x69\x74\x79\x3a\x30\x2e\x35\x3b\x2d\x6b\x68\x74\x6d\x6c\x2d\x6f\x70\x61\x63\x69\x74\x79\x3a\x20\x30\x2e\x35\x3b\x6f\x70\x61\x63\x69\x74\x79\x3a\x20\x30\x2e\x35\x3b\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x39\x39\x39\x3b\x74\x65\x78\x74\x2d\x61\x6c\x69\x67\x6e\x3a\x6d\x69\x64\x64\x6c\x65\x3b\x22\x3e\x3c\x2f\x64\x69\x76\x3e\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x70\x6f\x73\x69\x74\x69\x6f\x6e\x3a\x20\x66\x69\x78\x65\x64\x3b\x20\x74\x6f\x70\x3a\x20\x30\x70\x78\x3b\x20\x77\x69\x64\x74\x68\x3a\x20\x31\x30\x30\x25\x3b\x20\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x31\x30\x30\x30\x30\x3b\x22\x3e\x3c\x64\x69\x76\x20\x73\x74\x79\x6c\x65\x3d\x22\x7a\x2d\x69\x6e\x64\x65\x78\x3a\x20\x31\x30\x30\x30\x30\x3b\x20\x77\x69\x64\x74\x68\x3a\x20\x34\x30\x30\x70\x78\x3b\x20\x70\x61\x64\x64\x69\x6e\x67\x3a\x20\x33\x30\x70\x78\x3b\x20\x6d\x61\x72\x67\x69\x6e\x3a\x20\x32\x30\x30\x70\x78\x20\x61\x75\x74\x6f\x3b\x20\x62\x61\x63\x6b\x67\x72\x6f\x75\x6e\x64\x2d\x63\x6f\x6c\x6f\x72\x3a\x20\x77\x68\x69\x74\x65\x3b\x20\x62\x6f\x72\x64\x65\x72\x3a\x20\x31\x70\x78\x20\x73\x6f\x6c\x69\x64\x20\x62\x6c\x61\x63\x6b\x3b\x63\x6f\x6c\x6f\x72\x3a\x20\x62\x6c\x61\x63\x6b\x3b\x22\x3e\x3c\x68\x31\x20\x73\x74\x79\x6c\x65\x3d\x22\x63\x6f\x6c\x6f\x72\x3a\x72\x65\x64\x22\x3e\x57\x61\x72\x6e\x69\x6e\x67\x3a\x20\x54\x68\x69\x73\x20\x49\x73\x20\x41\x20\x53\x63\x61\x6d\x20\x53\x69\x74\x65\x3c\x2f\x68\x31\x3e\x3c\x70\x3e\x53\x6f\x72\x72\x79\x20\x66\x6f\x72\x20\x74\x68\x65\x20\x69\x6e\x74\x65\x72\x72\x75\x70\x74\x69\x6f\x6e\x2c\x20\x62\x75\x74\x20\x74\x68\x65\x20\x73\x69\x74\x65\x20\x79\x6f\x75\x20\x61\x72\x65\x20\x63\x75\x72\x72\x65\x6e\x74\x6c\x79\x20\x76\x69\x73\x69\x74\x69\x6e\x67\x20\x69\x73\x20\x6e\x6f\x74\x20\x74\x68\x65\x20\x72\x65\x61\x6c\x20\x6f\x6e\x65\x2e\x20\x54\x68\x69\x73\x20\x73\x69\x74\x65\x20\x73\x63\x72\x61\x70\x65\x73\x20\x6f\x75\x72\x20\x63\x6f\x6e\x74\x65\x6e\x74\x20\x61\x6e\x64\x20\x69\x6e\x6a\x65\x63\x74\x73\x20\x74\x68\x65\x69\x72\x20\x6f\x77\x6e\x20\x61\x64\x73\x20\x74\x6f\x20\x6d\x61\x6b\x65\x20\x6d\x6f\x6e\x65\x79\x2e\x3c\x2f\x70\x3e\x3c\x2f\x64\x69\x76\x3e\x3c\x2f\x64\x69\x76\x3e");
    // Show the warning overlay on top of the proxied page
    jQuery('body').append(overlay);
  }
});
</script>

After thoroughly testing the code, we launched it live. It was very satisfying knowing that end users were getting the warning message when visiting their site, and they actually reported their experience back to us. After about 12 hours, they figured out our play and turned off their proxy. Success!

Not quite. The next morning I woke up to see their site back up. Looking into their code, they had removed the JS completely. I changed our code, embedded it in a JS file, and used other creative means to get it back up – but in the end they were just able to disable the Javascript entirely and defeat this attack. It was just another cat-and-mouse game.

We also tried DMCA/Abuse Contacts

We also tried to send notices to their hosting company to get them to take down the site. They were hosted in Argentina, so DMCA is not applicable. Their abuse contacts were non-responsive.

Originally we thought we could go after their domain registration since it violated our trademark – but this is a long and involved process involving lots of paperwork and time, and for another $10 they could just register another domain name. We didn’t think this was a viable option.

Going to the Source: Google

Google is involved in this scam in a number of ways. First, it was indexing and serving their site in its search results. Second, the scammers were replacing our ads with their own Google AdSense ads, which I am sure is a ToS violation.

While we were attempting to defeat their site from a technical perspective, we also began looking at Google to see what we could do from there.

Google has a process for submitting DMCA requests. The issue in this case is that you have to submit one batch at a time, and with millions of pages on our site indexed in Google, it just doesn't make sense to list out URLs line by line. It did remove the submitted URLs from search results, but it was like cutting grass with scissors. Finally, I attempted to contact Matt Cutts via Twitter:

Thanks. So much for that venue. I know @mattcutts is the public face of Web Spam at Google so I’m sure that he gets lots of @’s with dumb questions, but we were already way beyond this.

Finally, What Worked

Members of our site had a few personal contacts at Google, including an AdSense representative. One of these contacts was able to bring this issue up with the right people, and Google finally took the offending domain out of the SERPs permanently. We also reported their AdSense account, but we don't know what happened with that. Without the site showing up in SERPs, it was a moot point, because it won't be getting many visitors any more.

The Problem With Google

Google has become so large that it is almost impossible to get a situation like this taken care of without knowing someone who works for them. They have many automated systems in place, but scammers will continue to utilize loopholes for their own profit. Google enables this type of scam, yet they also profit from it.

I wish Google would have some sort of Ombudsman or review system set up so that someone like us, who is having our content ripped off by others using Google’s own tools (and with Google taking a percentage of profit from these people), has a way to efficiently deal with them without resorting to personal contacts. We spent much time on this, time that could have been put to better use.

Or maybe personal contacts are the only real way to deal with a situation like this?

Anyway, I welcome comments and any other ideas for dealing with these proxy hijackers and keeping them offline. I'm curious how widespread this type of incident is; we know of only one other site that was having the same issue with the same scammer.

After all, they can always get another domain for $10.

Evaluating FTP Servers: ProFTPd vs PureFTPd vs vsftpd

Usually, I try to push clients towards using SCP (via a client such as WinSCP). However, there are inevitably clients who do not understand this method of accessing their files securely, and who for one reason or another insist on using FTP for their online file access. As they say, the customer is always right?

Anyway, there are currently three mainstream FTP servers available via yum on CentOS 6.x: PureFTPd, ProFTPd and vsftpd. So which FTP server is the best? I summarize each server below, or you can skip to the summary.


ProFTPd is a modular FTP server which has been around for a long time. The large control panels (cPanel, DirectAdmin) all support ProFTPd and have for years.

ProFTPd is certainly the most feature-rich of the bunch. A ton of plugins are available for it, its creator modeled its configuration architecture on Apache's, and it is licensed under the GPL.

Configuration of ProFTPd is fairly straightforward, and a quick Google search turns up plenty of example configuration files.

ProFTPd is available on a wide variety of system architectures and operating systems.

ProFTPd Security

Of the bunch, ProFTPd has the most CVE vulnerabilities listed. The high number is most likely an indicator of ProFTPd’s widespread use which makes it a target of hackers.

ProFTPd CVE Entries: 40
Shodan ProFTPd entries: 127


PureFTPd's mantra is 'Security First.' This is reflected in its low number of CVE entries (see below).

Licensed under the BSD license, PureFTPd is also available on a wide range of operating systems (but not Windows).

Configuration of PureFTPd is simple, and it can even run without a configuration file. Although not as widely used as ProFTPd, PureFTPd has many configuration examples online.

PureFTPd Security

PureFTPd’s “Security First” mantra puts it at the lead in the security department with the fewest security vulnerabilities.

PureFTPd CVE Entries: 4
Shodan Pure-FTPd Entries: 12


vsftpd, which stands for "Very Secure FTP Daemon," is another GPL-licensed FTP server. It is a lightweight FTP server built with security in mind.

Its lightweight nature allows it to scale very efficiently, and many large sites currently use vsftpd as their FTP server of choice.

vsftpd Security

vsftpd has fewer vulnerabilities listed in CVE than ProFTPd but more than PureFTPd. This may be because it draws more scrutiny than the others, either because its name advertises it as a secure FTP server or because it is so widely used on large sites.

vsftpd CVE Entries: 12
Shodan vsftpd entries: 41

Summary & FTP Server Recommendations


Considering the evaluations above, any of these servers would work in most situations. Generally speaking, though:

  • If you want a server with the most flexible configuration options and external modules: ProFTPd
  • If you have just a few users and want a simple, secure FTP server: PureFTPd
  • If you want to run an FTP server at scale with many users: vsftpd

Of course, everyone’s requirements are different so make sure you evaluate the options according to your own needs.

Disagree with my assessment? Let me know why!

Simple Disk Benchmarking in Linux Using ‘dd’

Posted 21st March in Linux, Shell, System Administration. 3 Comments

A great way to do a real-world disk test on your linux system is with a program called dd.

The name dd is generally traced to the DD ("Data Definition") statement in IBM's JCL; the tool copies data from one file or device to another.

A simple command to do a real-world disk write test in Linux is:

dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync

This creates a file named ‘test’ with all zeroes in it. The flag conv=fdatasync tells dd to sync the write to disk before it exits. Without this flag, dd will perform the write but some of it will remain in memory, not giving you an accurate picture of the true write performance of the disk.

A sample of the run is below, with a simple SATA disk:

[14:11][root@server:~]$ dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 5.19611 s, 103 MB/s
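As a sanity check on that output, the sketch below recomputes the numbers in JavaScript (Node.js). GNU dd reads bs=1M as 1,048,576 bytes but reports sizes and rates in decimal megabytes (10^6 bytes), which is why 512 MiB shows up as "537 MB"; the variable names are just for illustration.

```javascript
// Re-deriving the dd summary line from the sample run above.
const bytes = 512 * 1024 * 1024; // bs=1M (1,048,576 bytes) x count=512
const seconds = 5.19611;         // elapsed time reported by dd

const mb = bytes / 1e6;                      // decimal megabytes, as dd prints
const mbPerSec = mb / seconds;               // dd's "MB/s"
const mibPerSec = bytes / 2 ** 20 / seconds; // binary MiB/s, for comparison

console.log(bytes);               // 536870912
console.log(Math.round(mb));      // 537
console.log(mbPerSec.toFixed(1));  // 103.3
console.log(mibPerSec.toFixed(1)); // 98.5
```

Keep the unit in mind when comparing dd's figure with tools that report throughput in MiB/s.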

Now, there are major caveats to using dd for disk benchmarking. The first is that it only tests access through the filesystem. Depending on your filesystem (I'm looking at you, ZFS), the write may simply be cached in memory and flushed to disk later. The same goes for a RAID controller with a write cache.

A much more accurate way of benchmarking a disk is to use tools built specifically for the task, which write much more data over a longer period of time. Bonnie++ is a particularly useful tool for this purpose.

Now don’t forget to remove that test file.

Apache 2.4 Upgrade and the “Invalid Command ‘Order'” Error

Posted 6th March in Configurations. 2 Comments

Apache 2.4 was released a few weeks ago, and I decided to use this version while installing a new server (I compiled it from scratch rather than using an rpm or deb).

After using one of my tried and true Apache configuration files, I received this error on start:

Starting httpd: AH00526: Syntax error on line 104 of /usr/local/apache2/conf/httpd.conf:
Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration

Common wisdom suggests making sure the authz_host module is loaded (LoadModule authz_host_module modules/), but that just was not working.

Finally, I discovered that the Order directive is no longer provided by the standard authorization modules in Apache 2.4! According to the upgrade notes for Apache 2.4:

In 2.2, access control based on client hostname, IP address, and other characteristics of client requests was done using the directives Order, Allow, Deny, and Satisfy.

In 2.4, such access control is done in the same way as other authorization checks, using the new module mod_authz_host. The old access control idioms should be replaced by the new authentication mechanisms, although for compatibility with old configurations, the new module mod_access_compat is provided.

Basically, the Order directive is deprecated, and it is only available when mod_access_compat is loaded.

In my case, I replaced the lines:

Order deny,allow
Deny from all

with:

Require all denied

Also make sure both of these modules are loaded:

LoadModule authz_core_module modules/
LoadModule authz_host_module modules/

Easy enough, but just be aware that there are several configuration changes between 2.2 and 2.4 which render your old Apache configuration files unusable.

Big Cable Wants to Encrypt Your Basic Channels – How To Fight For Your Rights

Posted 23rd February in Other Technology, Television. 1 Comment

For those not familiar with the current state of digital television, cable providers send signals to your house in a format called QAM. This comes in two flavors: encrypted and unencrypted. Encryption is used to protect channel content from general viewership so that cable operators can sell these packages and/or individual channels based on a decryption device at the home.

The FCC currently has a ban in place on the encryption of the "Basic" level of cable service. This includes such channels as ABC, NBC, Fox, PBS, etc. It is a good way for low-income or budget-conscious consumers to buy the basic level of service if they cannot receive these channels over the air, possibly because of interference or distance from broadcasting towers. You do not currently need to rent a cable box (aka decryption device) to see these channels, ensuring broad access to channels that often perform public service functions such as emergency notifications, carrying signals from the emergency broadcast system, or hosting community television stations.

This is a good thing. It allows citizens to purchase a very cheap (I paid $12.99/mo) cable plan to receive these essential stations without the purchase of any decryption boxes from the cable company (which they force you to rent from them, I might add).

The Cable Companies Want Your Money


The cable companies are, and have been, lobbying the FCC to remove this encryption ban, and the FCC has proposed a rule change that would do so. The big cable providers are lobbying for the ban's removal on a few major points:

  • Encrypting all channels will allow them to remain “hot” at the consumer end,
  • It will reduce service technician calls because of the above,
  • It will reduce or eliminate cable theft.

What they do not mention is that:

  • You will need to buy a cable adapter (either box-top or Cablecard) for each TV in your home.

The cable companies are profit driven (this is OK) and, while I admit this may reduce service calls to your home, they also have other methods of being able to deliver this service “hot” including IP-based television distribution. This is simply a cheaper way for them to use their existing technologies to maximize their profits, while maintaining a facade that they are doing this for consumers’ benefit.

Based on my own television bill and the number of televisions that an average consumer has in their home (3), I estimate that an existing basic television consumer would receive a 92% increase to their cable bill for the same exact content. (Source)

Fighting The Good Fight

Several "new" television technology companies, such as Boxee and Hauppauge Computer Works (makers of unencrypted QAM tuner cards), are fighting this rule change, along with the EFF, to allow consumers to continue to receive these basic television channels in the unencrypted format. While Boxee and Hauppauge have their own profit motives, they are actively working to promote new technologies that bring a variety of TV content into your home.

Boxee and the National Cable & Telecommunications Association have been going tête-à-tête over the issue. Boxee has an obvious stake, since its Boxee Box uses unencrypted QAM to receive television stations, while the NCTA represents the cable companies who want you to keep paying your cable provider.

How to Weigh In

I urge all consumers of cable TV to weigh in on the issue, especially if you use the unencrypted QAM format to watch broadcast TV. The cable industry says that the number of consumers who use unencrypted QAM is negligible, so we need to show the FCC that we are, in fact, not a negligible party.

The FCC is accepting comments on this proposed change to the rules.

Here is a listing of all comments on this proposed elimination of the ban on the encryption of the basic tier of service from the FCC.

The best way is to write a short letter to the FCC. Here is the letter that I wrote.

Then proceed to the FCC’s page for submitting a filing. For the proceeding number, use 11-169. Type in your information and attach the letter you wrote (I recommend sending as a PDF). After submitting, the FCC reviews the submission and places it on their website.

Together, we can fight the cable companies

The only way they know that we are not happy with this proposed rule change is by commenting on this to the organization who makes the rules, the FCC.

Please send them your thoughts! It should only take about 15 minutes of your time and you will feel great about participating in the rule making process.

The Easy CIDR Cheatsheet

Posted 9th February in Configurations, Other Code, System Administration. Comments Off

Even though I've been working with Classless Inter-Domain Routing (henceforth known as CIDR) for years now, I always need a bit of help remembering how many addresses are in each block and how many sub-blocks fit into larger blocks. I keep the following printed out for easy reference, and here it is for your geeky enjoyment:

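CIDR        Total number    Network             Description:
Notation:   of addresses:   Mask:
/0          4,294,967,296   0.0.0.0             Every address
/1          2,147,483,648   128.0.0.0           128 /8 nets
/2          1,073,741,824   192.0.0.0           64 /8 nets
/3          536,870,912     224.0.0.0           32 /8 nets
/4          268,435,456     240.0.0.0           16 /8 nets
/5          134,217,728     248.0.0.0           8 /8 nets
/6          67,108,864      252.0.0.0           4 /8 nets
/7          33,554,432      254.0.0.0           2 /8 nets
/8          16,777,216      255.0.0.0           1 /8 net (Class A)
/9          8,388,608       255.128.0.0         128 /16 nets
/10         4,194,304       255.192.0.0         64 /16 nets
/11         2,097,152       255.224.0.0         32 /16 nets
/12         1,048,576       255.240.0.0         16 /16 nets
/13         524,288         255.248.0.0         8 /16 nets
/14         262,144         255.252.0.0         4 /16 nets
/15         131,072         255.254.0.0         2 /16 nets
/16         65,536          255.255.0.0         1 /16 net (Class B)
/17         32,768          255.255.128.0       128 /24 nets
/18         16,384          255.255.192.0       64 /24 nets
/19         8,192           255.255.224.0       32 /24 nets
/20         4,096           255.255.240.0       16 /24 nets
/21         2,048           255.255.248.0       8 /24 nets
/22         1,024           255.255.252.0       4 /24 nets
/23         512             255.255.254.0       2 /24 nets
/24         256             255.255.255.0       1 /24 net (Class C)
/25         128             255.255.255.128     Half of a /24
/26         64              255.255.255.192     Fourth of a /24
/27         32              255.255.255.224     Eighth of a /24
/28         16              255.255.255.240     1/16th of a /24
/29         8               255.255.255.248     6 usable hosts (5 after a gateway)
/30         4               255.255.255.252     2 usable hosts (1 after a gateway)
/31         2               255.255.255.254     Unusable (except on point-to-point links, RFC 3021)
/32         1               255.255.255.255     Single host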
Reserved Space:

Of course I’m not the first one to come up with this. Modified based on the cheat sheet from Samat Jain.
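The numbers in the table all follow from a little powers-of-two arithmetic. Here is a minimal JavaScript sketch (for Node.js; the function names are hypothetical) that computes the address count and dotted-quad netmask for an IPv4 prefix, plus how many smaller blocks fit into a larger one.

```javascript
// Number of addresses in a /prefix block: 2^(32 - prefix).
function addressCount(prefix) {
  return 2 ** (32 - prefix);
}

// Dotted-quad netmask: `prefix` leading 1 bits, the rest 0.
function netmask(prefix) {
  // JS shift counts are taken mod 32, so `<< 32` would be a no-op; special-case /0.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return [24, 16, 8, 0].map((shift) => (mask >>> shift) & 0xff).join(".");
}

// How many /inner blocks fit inside one /outer block (inner >= outer).
function blocksInside(outer, inner) {
  return 2 ** (inner - outer);
}

for (const p of [8, 20, 29]) {
  console.log(`/${p}  ${addressCount(p)}  ${netmask(p)}`);
}
console.log(blocksInside(17, 24)); // 128
```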

Let me know if you have any improvements or suggestions.