Hard-drive failure

After a couple of weeks of clacking noises from my media machine, during which I had more pressing things to do than narrow down which of its three mass-storage drives had the problem, I finally got around to removing the failing drive (a 1.5 TB Seagate Barracuda) in the hope of rescuing its data. Backups did exist, but weren’t quite up to date. I’m pretty sure this is just a case of old age: it’s at least five years since I bought it.

The drive is now in a state of relatively advanced failure where any significant seeking activity will cause it to go into a clacking seizure; because of this, my attempts so far to read off its filesystem contents have been so dogged by appalling throughput that I had to abort for fear of killing the drive.

Mechanical hard drives (“spinning rust”) seem to follow this failure pattern quite often: the first thing to go is the ability to seek long distances over the platters, but track-to-track seeking and data readout can last much longer.

TL;DR? If in doubt, just linearly scan the drive into an image and worry about getting the files back later; it’s usually the best shot you’ve got at getting the contents off before it goes phut. As I write this, I’m 420 GB into streaming the drive’s single partition directly to a pair of LTO-4 tapes. Although there are some clacks, I’m consistently getting north of 90 MB/s.
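For anyone imaging to an ordinary file rather than to tape, a linear scan might look something like the sketch below. The function name image_linear and the paths in the usage comment are made up for illustration; the only real requirement is that the destination has room for the whole partition.

```shell
# image_linear SRC DEST
# Read SRC front to back in large sequential chunks and write it to DEST.
# Reading the raw partition avoids the filesystem-driven seeking that a
# file-by-file copy would trigger.
image_linear() {
    dd if="$1" of="$2" bs=1M
}

# Hypothetical usage (device and image path are placeholders):
#   image_linear /dev/sdb1 /mnt/rescue/barracuda.img
```

Plain dd stops at the first unreadable sector, so this only suits a drive that still reads cleanly when accessed sequentially, which is the case described above.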

If the drive lasts a few more hours, I’ll end up with a disk image that I can spool out to a new drive; if the drive still works when I next power it up, I’ll just image it directly onto the replacement drive and the tapes won’t be needed at all. In any case, one PartedMagic resize and a physical reinstallation will see the problem solved.
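Because the image spans more than one tape, spooling it back out means writing each tape's contents at the right offset on the new drive. A rough sketch of one way to do that follows; restore_segment, /dev/sdX1 and OFFSET_MIB are hypothetical names, and the 512k input block size is an assumption that has to match whatever block size the tape was written with.

```shell
# restore_segment SRC DEST OFFSET_MIB
# Write one segment of the image back to DEST, starting OFFSET_MIB binary
# megabytes in.
restore_segment() {
    # seek= counts in obs-sized blocks, so with obs=1M the offset is in MiB.
    # conv=notrunc stops dd truncating DEST when it is a regular file.
    dd if="$1" of="$2" ibs=512k obs=1M seek="$3" conv=notrunc
}

# Hypothetical usage, one call per tape:
#   restore_segment "$TAPE" /dev/sdX1 0
#   restore_segment "$TAPE" /dev/sdX1 OFFSET_MIB   # where the next tape starts
```

If consecutive segments overlap slightly, the later one simply rewrites the overlapping region with the same data.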

Update

The data rate was a bit painful through the last few gigabytes, but it’s done! Here are the command lines I used:

mt setblk 524288
dd if=/dev/sdb1 bs=1M | bf | dd of=$TAPE bs=1M
dd if=/dev/sdb1 bs=1M skip=650000 | bf | dd of=$TAPE bs=512k
dd if=/dev/sdb1 bs=1M skip=1200000 | bf | dd of=$TAPE bs=512k

Note: bf is an alias for the threaded buffering program I wrote to overcome the throughput penalties caused by unbuffered, blocking writes to tape drives; my alias specifies a 3.5 GB buffer.

There are a couple of mistakes here: I should have used obs= instead of bs= for the tape-writing dd instance (with bs=, dd passes any short read from the pipe straight through as a short write, whereas obs= makes it collect full output blocks first), and I used the wrong size on the first tape too (1M rather than the 512k set by mt setblk). The beauty of using mt setblk with a nonzero parameter, though, is that it fails the job hard with an I/O error the first time a size-mismatched write occurs; I’m therefore pretty sure that all three tapes came out fine despite the mistakes.
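With hindsight, the tape-writing side could be sketched as below. This is not what I ran: reblock_to is a made-up helper, iflag=fullblock is a GNU dd extension, and the 512k default assumes the block size set by mt setblk 524288.

```shell
# reblock_to DEST [BLOCK]
# Collect a piped stream into fixed-size writes. BLOCK defaults to 512k.
reblock_to() {
    reblock_dest=$1
    reblock_size=${2:-512k}
    # Giving ibs= and obs= separately (rather than bs=) makes dd gather
    # input into full obs-sized output blocks. iflag=fullblock retries
    # short reads from the pipe, so conv=sync only ever pads the final,
    # genuinely short block with NULs.
    dd of="$reblock_dest" ibs="$reblock_size" obs="$reblock_size" \
       iflag=fullblock conv=sync
}

# Hypothetical usage, mirroring the commands above:
#   dd if=/dev/sdb1 bs=1M skip=650000 | bf | reblock_to "$TAPE"
```

The trade-off is that the last block on the final tape may carry trailing NUL padding, which has to be ignored or trimmed when the image is written back.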

In case you’re wondering, I determined the skip= numbers for tapes 2 and 3 by looking at where the previous write ran out of space and rounding down by about 10 GB. I had to use an extra LTO-3 tape, I think because the heads on my LTO-4 drive are getting a bit marginal. The 650,000 binary MB covered by the first tape corresponds to about 680,000 decimal MB (nearer 700,000 with the rounding margin added back), which is only about ⅞ of the tape’s rated 800 GB capacity. Once it gets down to ¾ or so, I’ll have to stick to LTO-3 again.
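The binary-to-decimal conversion can be checked with shell arithmetic. In this sketch mib_to_mb is a made-up helper, and the 10,000 MiB margin is only an assumed stand-in for the "about 10 GB" of rounding; the 800,000 MB figure is the rated native capacity of an LTO-4 tape.

```shell
# Convert a size in binary megabytes (MiB) to decimal megabytes (MB),
# rounding down.
mib_to_mb() {
    echo $(( $1 * 1048576 / 1000000 ))
}

first_tape_mib=650000    # skip= value used for tape 2
margin_mib=10000         # assumed stand-in for the ~10 GB rounded off
lto4_native_mb=800000    # LTO-4 rated native capacity (800 GB)

mib_to_mb "$first_tape_mib"                    # prints 681574
written_mb=$(mib_to_mb $(( first_tape_mib + margin_mib )))
echo "$written_mb"                             # prints 692060
echo $(( written_mb * 100 / lto4_native_mb ))  # prints 86 (percent of rated)
```

So the skip= figure alone is about 681,600 decimal MB, and adding the margin back brings the amount actually written close to the 700,000 MB mentioned above.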
