This weekend I was investigating ways to make the Sun a bit quieter. In an act of unimaginable stupidity I accidentally pulled the wrong drive out of the RAID. Since I was running RAID5, I assumed I would spend the day waiting for a parity check to finish before the system was back online, but I was terribly wrong. I’m using software RAID in OpenBSD, as my SCSI controller apparently does not come with RAID support. Software RAID is a feature which is not enabled in the supported kernel, meaning I only have myself to blame.
Since I don’t own a VGA monitor, I drove to work on Saturday evening to borrow one so that I could investigate why the system didn’t come up. Instead of recalculating the parity, the system refused to boot and waited for me to manually run fsck_ffs on the partitions residing on the RAID. I manually started a parity recalculation and then ran fsck, which found a ton of errors on both partitions. To top it all off, this was the day before my weekly tape backup was due to run. When I was finally able to boot the system, it turned out that so many files on /var had been destroyed that it was pretty much useless. I decided to restore the entire tape from last week, as I had no idea how far the problems had spread. The posts from last week were recreated by copying and pasting from the planets where I’m aggregated, but I lost the comments, drafts and a couple of other things.
My plan now is to switch to the concatenated disk driver (ccd) and rely solely on tape for backups. ccd is officially supported by OpenBSD, so I assume it doesn’t have as many hidden problems.
The lesson I’ve learned this weekend is to always run the tape backup before messing with the drives. Stupid me!