What a Resilver Looks Like in ZFS (and a Bug and/or Feature)

At home I have an (admittedly small) ZFS array set up to experiment with this awesome newish RAID technology. I think it has been around long enough that it can now be used in production, but I’m still getting used to the little bugs/features, and here is one that I just found.

After 2 of the 3 1TB Seagate Barracuda hard drives in the array failed, I had to give the entire array up for a loss and test out my backup strategy. Fortunately it worked and there was no data loss. After receiving the replacement drives from Seagate, I rebuilt the ZFS array (using raidz again) and went along my merry way. After another 6 months or so, I started getting some funky results from the remaining original drive. Thinking it might have the same issue as the others, I removed the drive and ran SeaTools on it (by the way, SeaTools doesn’t offer a 64-bit Windows version – what year is this?).

The drive didn’t show any signs of failure, so I decided to wipe it and add it back into the array to see what would happen. That, of course, is easier said than done.

One of the problems I ran into is that I am running ZFS on Ubuntu via zfs-fuse. Ubuntu has this nasty habit of shuffling drive identifiers around when USB devices are plugged in. So now when this drive is plugged back in, it shows up at /dev/sde instead of /dev/sdd, and /dev/sdd now points at a USB-attached drive.
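A quick way to see how things have shifted is to look under /dev/disk/by-id, where udev keeps persistent names that follow the physical disk no matter which /dev/sdX letter it lands on. Something like this (the serial-number style name in the comment is made up for illustration):

ls -l /dev/disk/by-id/ | grep -v part    # list stable IDs and the sdX node each one points to
# a line looks roughly like: ata-ST31000528AS_XXXXXXXX -> ../../sdd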

No problem, I figure, I’ll offline the bad drive in the zpool and replace it with the new drive location. No such luck.

First I offlined the drive using zpool offline media /dev/sdd:

dave@cerberus:~$ sudo zpool status
  pool: media
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sdd     OFFLINE      0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

Now that it’s offline, I figured I should be able to detach it. No such luck – since it is a ‘primary’ device of the zpool (a raidz member rather than half of a mirror), ZFS does not allow you to remove it.

dave@cerberus:~$ sudo zpool detach media /dev/sdd
cannot detach /dev/sdd: only applicable to mirror and replacing vdevs
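
For what it’s worth, the error message is telling the truth: zpool detach only applies to one side of a mirror (or to a disk that is in the middle of a replace), so it will never work on a raidz member. A hypothetical mirror pool is the only situation where it would have done what I wanted (device names made up, for illustration only):

sudo zpool create testpool mirror /dev/sdx /dev/sdy   # illustration only: made-up devices
sudo zpool detach testpool /dev/sdy                   # allowed, because sdy is half of a mirror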

What ZFS wants you to do is replace the failed drive with another drive. My drive (the same physical drive, with all info wiped from it) is now at /dev/sde, so I try to use it as the replacement:

dave@cerberus:~$ sudo zpool replace media /dev/sdd /dev/sde
invalid vdev specification
use '-f' to override the following errors:
/dev/sde is part of active pool 'media'
dave@cerberus:~$ sudo zpool replace -f media /dev/sdd /dev/sde
invalid vdev specification
the following errors must be manually repaired:
/dev/sde is part of active pool 'media'

Even with -f it doesn’t allow the replacement, because ZFS still sees the drive as part of the active pool ‘media’, the very pool I’m trying to put it back into.

So basically you are stuck if you are trying to test a replacement with a drive that has already been used in the pool. I’m sure I could replace it with another 1TB disk, but what would be the point of that?
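
My guess, and it is only a guess, is that the “is part of active pool” complaint comes from ZFS metadata that survived my quick wipe: ZFS keeps copies of its vdev label at both the very beginning and the very end of each disk, so clearing the partition table and the front of the drive leaves the tail copies intact. A rough (and destructive) sketch of clobbering both ends before re-adding the disk, using this post’s /dev/sde as the example device:

# DESTRUCTIVE: erases the ZFS labels from both ends of /dev/sde
SIZE=$(sudo blockdev --getsize64 /dev/sde)          # device size in bytes
sudo dd if=/dev/zero of=/dev/sde bs=512 count=1024  # first 512 KiB (label copies 0 and 1)
sudo dd if=/dev/zero of=/dev/sde bs=512 count=1024 seek=$(( SIZE / 512 - 1024 ))  # last 512 KiB (copies 2 and 3)

Newer ZFS implementations also have a zpool labelclear command for exactly this; I don’t know whether the zfs-fuse build I’m running includes it.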

I ended up resolving the problem by removing the external USB drive, which put the wiped drive back into its original /dev/sdd slot. Without my issuing any commands, the system recognized the drive as the old pool member and started resilvering it.

root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 4.62% done, 3h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0    13  30.2G resilvered
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
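
For the record, I had expected to have to bring the disk back into the pool by hand. If ZFS had not picked it up on its own, my understanding is that explicitly onlining the offlined device would have kicked off the same resilver:

sudo zpool online media sdd     # bring the previously offlined disk back into service
sudo zpool status media         # the resilver should show up here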

It is interesting to see what the resilver looks like from an I/O perspective: the system reads from the two good drives and writes to the replaced one. Using iostat -x:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.77    0.00   13.81   32.81    0.00   23.60

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.80    0.00    33.60     0.00    42.00     0.01   15.00  15.00   1.20
sdb               0.00     0.00  625.00    0.00 108033.20     0.00   172.85     0.56    0.90   0.49  30.80
sdc               0.00     0.00  624.20    0.00 107828.40     0.00   172.75     0.50    0.81   0.47  29.60
sdd               0.00     1.20    0.00  504.40     0.00 107729.60   213.58     9.52   18.85   1.98 100.00
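
ZFS can show the same picture from its own side as well; zpool iostat with an interval keeps printing per-device numbers while the resilver runs (the 5 is just an arbitrary refresh interval in seconds):

sudo zpool iostat -v media 5    # per-vdev read/write ops and bandwidth, refreshed every 5 seconds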

It seems that ZFS is able to identify a hard drive by its GUID somehow, but it doesn’t automatically use that drive in the pool from its new location. This makes it so that you can’t test a drive by removing it, formatting it, and putting it back at a different device node. Basically, ZFS (at least as set up here) assumes that your drives are always going to be at the same /dev location, which isn’t always true: as soon as you attach a USB drive in Ubuntu, things are going to shift around.
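
One workaround I have seen suggested (I have not tried it on this pool yet) is to export the pool and re-import it by way of the persistent names in /dev/disk/by-id, so the pool configuration records disk identifiers instead of sdX letters:

sudo zpool export media                      # pool has to be exported (unmounted) first
sudo zpool import -d /dev/disk/by-id media   # re-import, scanning the by-id directory for members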

After the resilver is complete, the zpool status is:

root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h16m with 0 errors on Sun May 15 07:35:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0    13  50.0G resilvered
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

You can now clear the error with:

root@cerberus:/home/dave# zpool clear media
root@cerberus:/home/dave#

Running zpool status again now shows no errors:

root@cerberus:/home/dave# zpool status
  pool: media
 state: ONLINE
 scrub: resilver completed after 0h16m with 0 errors on Sun May 15 07:35:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0     0  50.0G resilvered
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors
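
Since the resilver did log 13 checksum errors on sdd along the way, my inclination (a habit of mine, not something ZFS requires) is to follow up with a scrub and make sure the whole pool reads back clean:

sudo zpool scrub media          # read and verify every block in the pool
sudo zpool status media         # check scrub progress and results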

So now the question I have is this: Are you able to manually update or remove the drive status somewhere in your system? How did ZFS know that this drive already had a pool on it? I zeroed the drive and verified with fdisk that there were no partitions on it. Is there a file somewhere on the system that stores this information, or is it written somewhere on the drive?
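
As far as I can tell, the answer is a bit of both. On the host side there is a cached pool configuration (typically /etc/zfs/zpool.cache on Linux), and on the disk itself there are the vdev labels mentioned above, four copies per disk, two at the start and two at the end, carrying the pool GUID and the disk’s own GUID. That would explain how ZFS recognized the drive even though fdisk showed no partitions. Assuming the zfs-fuse package ships zdb, the label can be dumped directly:

sudo zdb -l /dev/sdd            # print the vdev labels, including pool_guid and the disk guid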

ZFS is great, but it still has some little issues like this that give me pause before using it in a production system. Then again, I suppose all massive disk array systems have their little quirks!

Dave Drager
