Linux System Administration

Simple Disk Benchmarking in Linux Using ‘dd’

A great way to do a real-world disk test on your Linux system is with a program called dd.

The name dd is usually traced back to the 'data definition' statement in IBM's JCL; the tool itself copies and converts data from one source to another.

A simple command to do a real-world disk write test in Linux is:

dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync

This creates a file named 'test' filled with zeroes. The conv=fdatasync flag tells dd to sync the data to disk before it exits. Without this flag, dd returns while some of the data is still sitting in memory, which does not give you an accurate picture of the disk's true write performance.
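
If your dd comes from GNU coreutils, a complementary test worth knowing about uses oflag=direct, which bypasses the page cache entirely instead of flushing it at the end; the numbers will differ somewhat from the fdatasync run, so treat it as a second data point rather than a replacement:

dd bs=1M count=512 if=/dev/zero of=test oflag=direct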

A sample of the run is below, with a simple SATA disk:

[14:11][root@server:~]$ dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 5.19611 s, 103 MB/s

Now, there is a major caveat to using dd for disk benchmarking: it only tests writes through the filesystem. Depending on your filesystem (I'm looking at you, ZFS), the write may simply land in memory to be flushed to disk later. The same goes for a system with a caching RAID controller in front of the disks.

A much more accurate way of performing a disk benchmark is to use a tool specifically geared towards the task; such tools write much more data over a longer period of time. Bonnie++ is particularly useful for this purpose.
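
As a rough sketch of a Bonnie++ run (assuming the bonnie++ package is installed and /mnt/data is a hypothetical mount point on the disk you want to measure; the -u flag is required when running as root):

bonnie++ -d /mnt/data -u nobody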

Now don’t forget to remove that test file.
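
Assuming you ran dd from your current directory, cleanup is just:

rm test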

What a Resilver Looks Like in ZFS (and a Bug and/or Feature)

At home I have an (admittedly small) ZFS array set up to experiment with this awesome newish RAID technology. I think it has been around long enough that it can now be used in production, but I’m still getting used to the little bugs/features, and here is one that I just found.

After 2 of the 3 1TB Seagate Barracuda drives in the array failed, I had to write the whole array off as a loss and test out my backup strategy. Fortunately it worked and there was no data loss. After receiving the replacement drives from Seagate, I rebuilt the ZFS array (using raidz again) and went along my merry way. Another 6 months or so later, I started getting some funky results from the remaining original drive. Thinking it might have the same issue as the others, I removed it and ran SeaTools on it (by the way, SeaTools doesn't offer a 64-bit Windows version – what year is this?).

The drive didn’t show any signs of failure, so I decided to wipe it and add it back into the array to see what happens. That, of course, is easier said than done.

One of the problems I ran into is that I am running ZFS on Ubuntu through FUSE (zfs-fuse). Ubuntu has a nasty habit of shuffling drive identifiers around when USB devices are plugged in, so this drive now shows up as /dev/sde instead of /dev/sdd, which is currently occupied by a USB-attached drive.

No problem, I figure, I’ll offline the bad drive in the zpool and replace it with the new drive location. No such luck.

First I offlined the drive using zpool offline media /dev/sdd:

dave@cerberus:~$ sudo zpool status
  pool: media
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sdd     OFFLINE      0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

Now that it's offline, I figured I should be able to detach it. No such luck – since it belongs to a raidz vdev rather than a mirror, ZFS will not let you detach it.

dave@cerberus:~$ sudo zpool detach media /dev/sdd
cannot detach /dev/sdd: only applicable to mirror and replacing vdevs

What they want you to do is replace the drive with another drive. This drive (the same drive, with all info wiped from it) is now on /dev/sde. I try to replace it:

dave@cerberus:~$ sudo zpool replace media /dev/sdd /dev/sde
invalid vdev specification
use '-f' to override the following errors:
/dev/sde is part of active pool 'media'
dave@cerberus:~$ sudo zpool replace -f media /dev/sdd /dev/sde
invalid vdev specification
the following errors must be manually repaired:
/dev/sde is part of active pool 'media'

Even with -f it refuses the replacement, because the system still sees the drive as part of the active pool.

So basically you are stuck if you are trying to test a replacement with a drive that has already been used in the pool. I'm sure I could replace it with another 1TB disk, but what would be the point of that?

I ended up resolving the problem by removing the external USB drive, thereby putting the drive back at its original /dev/sdd location. Without my issuing any commands, the system recognized the drive as the old one and started resilvering it.

root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 4.62% done, 3h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0    13  30.2G resilvered
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

It is interesting to see what this looks like from an I/O perspective: the system reads from the two good drives and writes to the returning one. Using iostat -x:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.77    0.00   13.81   32.81    0.00   23.60

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.80    0.00    33.60     0.00    42.00     0.01   15.00  15.00   1.20
sdb               0.00     0.00  625.00    0.00 108033.20     0.00   172.85     0.56    0.90   0.49  30.80
sdc               0.00     0.00  624.20    0.00 107828.40     0.00   172.75     0.50    0.81   0.47  29.60
sdd               0.00     1.20    0.00  504.40     0.00 107729.60   213.58     9.52   18.85   1.98 100.00

It seems that ZFS can identify a drive by its GUID but doesn't automatically pick it up at a new device path in the pool. That means you can't test a drive by removing it, formatting it, and putting it back in a different location. Basically, ZFS assumes your drives will always sit at the same /dev location, which isn't always true; as soon as you attach a USB drive in Ubuntu, things shift around.
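
One workaround I have seen suggested (assuming your ZFS implementation supports the standard zpool import -d option) is to export the pool and re-import it using the persistent names under /dev/disk/by-id, so the pool no longer cares which sdX letter a disk lands on:

sudo zpool export media
sudo zpool import -d /dev/disk/by-id media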

After the resilver is complete, the zpool status is:

root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h16m with 0 errors on Sun May 15 07:35:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0    13  50.0G resilvered
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

You can now clear the error with:

root@cerberus:/home/dave# zpool clear media
root@cerberus:/home/dave#

Zpool status now shows no errors:

root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
 scrub: resilver completed after 0h16m with 0 errors on Sun May 15 07:35:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0     0  50.0G resilvered
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

So now the question I have is this: are you able to manually update or remove the drive status somewhere on the system? How did ZFS know that this drive already had a pool on it? I zeroed the drive and verified with fdisk that there were no partitions on it. Is there a file somewhere on the system that stores this information, or is it written somewhere on the drive itself?
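
For what it's worth, ZFS stores redundant vdev labels at both the beginning and the end of each member device, which may be why a quick zero of the start of a drive is not enough to make the pool membership disappear. A quick way to check, assuming the zdb utility bundled with your ZFS implementation behaves like the standard one, is to dump the labels directly:

zdb -l /dev/sde

Newer ZFS releases also provide zpool labelclear for wiping those labels, though I have not verified that under zfs-fuse.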

ZFS is great, but it still has some little issues like this that give me pause before using it in a production system. Then again, I suppose all massive disk array systems have their little quirks!

Disabling The hald-addon-storage Service On CentOS/RedHat

hald – the Hardware Abstraction Layer daemon – runs several processes in order to keep track of what hardware is installed on your system. This includes polling USB drives and 'hot-swap' devices to check for changes, along with a host of other tasks.

You might see it running on your system as follows:

2474 ? S 0:00 \_ hald-runner
2481 ? S 0:00 \_ hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
2487 ? S 0:00 \_ hald-addon-keyboard: listening on /dev/input/event0
2495 ? S 41:47 \_ hald-addon-storage: polling /dev/hdc

If your system is static and the devices do not change, you can actually disable this service using a policy entry.

Create a file in your policy directory, for example /etc/hal/fdi/policy/99-custom.fdi, with contents along these lines (adjust the match key and device to whatever hald is polling on your system):

<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="block.device" string="/dev/hdc">
      <remove key="info.addons" type="strlist">hald-addon-storage</remove>
    </match>
  </device>
</deviceinfo>


Save the file and reload hald using /etc/init.d/haldaemon restart.

You will find that the service is no longer polling your hardware.

Of course, to turn it back on, remove the policy entry and restart haldaemon again; the addon will be back in service.

Solution Credit: Linuxforums User cn77

Adding Random Quotes to the Bash Login Screen

According to "official" system administrator rules and guidelines, you shouldn't add so-called vanity scripts to the login prompt – only utilities that add something useful to the system (for example, current system load, memory and disk usage, and so on). However, I have some systems that I frequently connect to and thought it would be neat to add a random quote script to my bash login. That said, this should only be done on non-production systems, and it adds a potential attack vector, so please be careful where you use it.

The goal of this is to add a little quote, at random, every time you log into your system. My thoughts were to do it not only as a little source of inspiration but also to add perspective to what I’m doing sitting in front of the computer all of the time.

Originally I was going to write the script solely in bash, since it is so flexible (and just as a proof of concept), but dealing with RSS in bash isn't exactly pretty and I wanted to get this together as quickly as possible. PHP makes parsing XML easy, and there are a number of ways to accomplish it. I chose to use the ready-made script from rssphp.net; if you are curious about handling it yourself with SimpleXML, check out this tutorial over at Pixel2Life. The end result of my solution is a bash script calling a PHP script to grab the quote.

The Code

First create a file named /etc/update-motd.d/10-quote. The name does not matter much – the number decides the order in which the scripts in /etc/update-motd.d are run. Do an ls on that directory to see everything that gets called when you log in. Add the following lines to the file, assuming you are placing your scripts in /etc/scripts/:

#!/bin/sh
echo ""
/usr/bin/php /etc/scripts/getquote.php
echo ""
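
Scripts in /etc/update-motd.d are run via run-parts, which silently skips anything that is not executable, so remember to set the executable bit:

chmod +x /etc/update-motd.d/10-quote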

Download v1 of rssphp and extract it to the /etc/scripts/ directory. We will require that file in our php code.

Create the file /etc/scripts/getquote.php and add the following:

<?php
// pull in the rss_php class downloaded from rssphp.net
// (adjust the path/filename to match your extracted copy)
require_once('/etc/scripts/rss_php.php');

$rss = new rss_php;
$rss->load('http://www.quotedb.com/quote/quote.php?action=random_quote_rss');
$rssitems = $rss->getItems();

if ($rssitems) {
    // print_r($rssitems);
    echo $rssitems[0]['description'].' :: '.$rssitems[0]['title']."\n";
}

?>

I am using the RSS feed from QuoteDB as the source of my quotes. Of all the places I checked (and I checked a lot), it seemed to have the most appropriate ones for this use. Feel free to use any source you wish – as long as the XML title/description fields hold the quote, you will be able to use it. The RSS URL was not obvious from the site and I had to do some digging to find it; in the end I am using http://www.quotedb.com/quote/quote.php?action=random_quote_rss.

The if statement is there so the script degrades gracefully when the server has no network connectivity: after a short wait – a second or two – the request times out and the login proceeds.

The end result is a pretty quote in our motd:

Linux vps01.[redacted].com 2.6.18-2-pve #1 SMP Mon Feb 1 10:45:26 CET 2010 x86_64 GNU/Linux
Ubuntu 10.04.1 LTS

"The absence of alternatives clears the mind marvelously." :: Henry Kissinger

root@vps01:~#

It should be pretty straightforward; let me know if you run into any problems!

Fixing ip_conntrack Bottlenecks: The Tale Of The DNS Server With Many Tiny Connections

Server management is a funny thing. No matter how long you have been doing it, new interesting and unique challenges continue to pop up keeping you on your toes. This is a story about one of those challenges.

I manage a server which has a sole purpose: serving DNS requests. We use PowerDNS, which has been great. It is a DNS server whose backend is SQL, making administration of large numbers of records very easy. It is also fast, easy to use, open source and did I mention it is free?

The server has been humming along for years now. The traffic graphs don’t show a lot of data moving through it because it only serves DNS requests (plus MySQL replication) in the form of tiny UDP packets.

We started seeing spikes in traffic, but everything on the server seemed to be working properly. Test queries with dig confirmed that the server was responding to requests correctly, yet external monitoring showed the server going up and down.

The First Clue

I started going through logs to see if we were being DoSed or if it was some sort of configuration problem. Everything seemed to be running properly and the requests, while voluminous, seemed to be legit. Within the flood of messages I spied error messages such as this:

printk: 2758 messages suppressed.
ip_conntrack: table full, dropping packet.

Ah ha! A clue! Let's check the current numbers for ip_conntrack, the kernel's connection-tracking table, which the netfilter firewall uses to keep tabs on connections heading into the system.

[root@ns1 log]# head /proc/slabinfo
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_conntrack_expect 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
ip_conntrack 34543 34576 384 10 1 : tunables 54 27 8 : slabdata 1612 1612 108
fib6_nodes 5 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
ip6_dst_cache 4 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
ndisc_cache 1 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
rawv6_sock 4 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0
udpv6_sock 0 0 704 11 2 : tunables 54 27 8 : slabdata 0 0 0
tcpv6_sock 8 12 1216 3 1 : tunables 24 12 8 : slabdata 4 4 0

Continuing this line of logic, let's check our current value for this setting:

[root@ns1 log]# sysctl net.ipv4.netfilter.ip_conntrack_max
net.ipv4.netfilter.ip_conntrack_max = 34576

So it looks like we are hitting that limit. Once the number of tracked connections reaches ip_conntrack_max, the kernel simply drops new packets. It does this so that it does not overload and freeze up from too many connections arriving at once.
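
To confirm the diagnosis, you can compare the live connection count against the limit; on kernels of this vintage the counter should sit right next to the max setting (the exact path can vary between kernel versions):

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max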

This system is running CentOS 4.8; newer RHEL5 releases ship with the default set at 65536. By convention the value is kept at a power of two. The practical upper limit depends on your memory (each tracked connection consumes kernel memory), so be careful not to set it so high that you run out.

Fixing The ip_conntrack Bottleneck

In my case I decided to go up two steps to 131072. To set it temporarily, use sysctl:

[root@ns1 log]# sysctl -w net.ipv4.netfilter.ip_conntrack_max=131072
net.ipv4.netfilter.ip_conntrack_max = 131072

Test everything out; if you run into problems with your network or the system crashing, a reboot will set the value back to the default. To make the setting permanent across reboots, add the following line to your /etc/sysctl.conf file:

# need to increase this due to volume of connections to the server
net.ipv4.netfilter.ip_conntrack_max=131072
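
To have that value take effect without a reboot, reload the file:

sysctl -p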

My theory is that because the server was dropping packets, remote hosts were re-sending their DNS requests, causing a 'flood' of traffic to the server; those are the spikes in the traffic graph above whenever traffic was even mildly elevated. The bandwidth spikes were amplification caused by the resent requests. After increasing ip_conntrack_max, I immediately saw bandwidth return to normal levels.

Your server should now be prepared for an onslaught of tiny packets, legitimate or not. If you have more connections than ip_conntrack can safely track, you may need to move to the next level, which involves hardware firewalls and packet inspection off-server on dedicated hardware.

Some resources used in my investigation of this problem:
[1] http://wiki.khnet.info/index.php/Conntrack_tuning
[2] http://serverfault.com/questions/111034/increasing-ip-conntrack-max-safely
[3] http://www.linuxquestions.org/questions/red-hat-31/ip_conntrack-table-full-dropping-packet-615436/

The image of the kittens used for the featured image has nothing to do with this post. There are no known good photos of a “UDP Packet”, and I thought that everyone likes kittens, so there it is. Credit flickr user mathias-erhart.

Another Bash One Liner To Delete Old Directories

We received tips from blog readers Christian and Michael with alternatives to the command for deleting all directories older than a certain period of time. Both work in bash and can be used in scripts to clean up old backup directories, or in any situation where you need to delete old directories from the command line.

From Christian:

find /home/backup/ -maxdepth 1 -type d -mtime +7 -exec rm -r {} \;

From Michael:

find /home/backup/ -maxdepth 1 -type d -mtime +7 -exec echo "Removing Directory => {}" \; -exec rm -rf "{}" \;

The first one works quietly, while the second one will display what is being deleted. These are probably faster than putting it into a for loop, so feel free to use whatever works best in your particular situation!
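
One small caution that applies to both versions: the starting directory /home/backup/ itself also matches -maxdepth 1 -type d, so if its own modification time is more than a week old it ends up in the delete list too. Adding -mindepth 1 is a cheap way to rule that out:

find /home/backup/ -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -r {} \;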

One Line Batch Rename Files Using CSV Input File and awk

The Bash command environment, which is the namesake of this blog, is very flexible and allows you to manipulate the filesystem in many ways. awk and sed are powerful tools that let you perform such renames with a simple one-line command. This post will walk you through doing this with a Comma-Separated Values (CSV) input file, and also using a simple regular expression to rename many files.
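
As a minimal sketch of the CSV-driven approach (assuming a hypothetical file renames.csv containing old,new filename pairs with no commas inside the names), awk can generate the mv commands and hand them to bash:

awk -F',' '{printf "mv \"%s\" \"%s\"\n", $1, $2}' renames.csv | bash

Dropping the trailing | bash lets you review the generated commands before actually running them.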