This weekend I was looking into ways to make the Sun a bit quieter. In an act of unimaginable stupidity, I accidentally pulled the wrong drive out of the RAID. Since I was running RAID5, I assumed I would spend the day waiting for a parity check to finish before the system came back online, but I was terribly wrong. I’m using software RAID in OpenBSD because my SCSI controller apparently doesn’t support RAID, and software RAID is not enabled in the officially supported kernel, so I only have myself to blame.
Since I don’t own a VGA monitor, I drove to work on Saturday evening to borrow one so I could investigate why the system didn’t come up. Instead of recalculating the parity, the system refused to boot and waited for me to run fsck_ffs manually on the partitions residing on the RAID. I started a parity recalculation by hand and then ran fsck, which found a ton of errors on both partitions. To top it all off, this happened the day before my weekly tape backup was scheduled to run. When I was finally able to boot the system, it turned out that so many files on /var had been destroyed that the system was pretty much useless. I decided to restore the entire tape from last week, as I had no idea how far the damage had spread. I recreated last week’s posts by copying and pasting them from the planets where my blog is aggregated, but I lost the comments, drafts, and a couple of other things.
My plan now is to switch to the concatenated disk driver (ccd) and rely solely on tape for backups. ccd is officially supported by OpenBSD, so I assume it doesn’t have as many hidden problems.
The lesson I’ve learned this weekend is to always run the tape backup before messing with the drives. Stupid me!