Commit 4925a276 authored by Mike Hibler

Fix obscure error with (not) invalidating disks during reload.

Our MBR/superblock/LVM/ZFS smashing code in rc.frisbee relied on dmesg
output to determine the local disks to call zapdisk on. However, the RE we
used assumed well ordered output like:

  da0 at mpt0 bus 0 scbus0 target 0 lun 0
  da0: <ATA WDC WD5003ABYZ-0 1S03> Fixed Direct Access SPC-3 SCSI device
  da0: Serial Number      WD-WMAYP0DPNFLM
  da0: 300.000MB/s transfers
  da0: Command Queueing enabled
  da0: 476940MB (976773168 512 byte sectors)

where we matched that last line. But due to the asynchronous nature of disk
initialization, probably due to some soon-to-be-failing disks on the d710s,
the last line was delayed and came out mashed-up with the da1 output:

  da1: <ATA WDC WD5003ABYX-1 1S02> Fixed Direct Access SPC-3 SCSI device
  da1: Serial Number      WD-WMAYP4939538
  da1: 300.000MB/s transfersda0: 476940MB (976773168 512 byte sectors)

so we didn't see da0 and didn't call zapdisk on it. This led to some LVM
metadata on /dev/sda4 leaking through to a new experiment, and if that
experiment tried to set up LVM (e.g., a vnode host), it would blow up.
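
The miss can be reproduced outside of rc.frisbee. The sketch below feeds two of the dmesg lines quoted above through the sed expression used in the dmesg.boot fallback; the variable names (RE, ordered, interleaved) are illustrative and do not come from the script.

```shell
#!/bin/sh
# sed expression as it appears in the dmesg.boot fallback of find_disks.
RE='s/^\([a-z]*[0-9][0-9]*\): [0-9][0-9]*MB.*/\1/p'

# Well-ordered case: the size line starts with the disk name.
ordered=$(printf '%s\n' \
    'da0: 476940MB (976773168 512 byte sectors)' | sed -n "$RE")

# Interleaved case: da0's size line is glued onto the end of a da1 line,
# so the line starts with "da1: 300.000MB/s" and the RE does not match.
interleaved=$(printf '%s\n' \
    'da1: 300.000MB/s transfersda0: 476940MB (976773168 512 byte sectors)' \
    | sed -n "$RE")

echo "ordered: '$ordered'"
echo "interleaved: '$interleaved'"
```

The first case yields da0; the second yields nothing, because the expression is anchored at the start of the line and "300.000MB" is not a bare run of digits followed by "MB".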

Now we use a sysctl call (kern.disks) to get the disk names.
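
As a rough illustration of consuming that list: kern.disks returns a space-separated list of names on FreeBSD, and it may include devices that are not reload targets (an optical drive such as cd0, for example). The sketch below assumes the caller wants to keep only the driver prefixes that the dmesg fallback already accepts. filter_disks is a hypothetical helper and is not part of rc.frisbee; the commit itself does not say whether the sysctl result is filtered.

```shell
#!/bin/sh
# filter_disks: HYPOTHETICAL helper, not part of rc.frisbee.
# Keeps only names matching the driver prefixes from the existing case statement.
filter_disks() {
    _out=""
    for d in "$@"; do
        case $d in
        ad*|da*|ar*|aacd*|amrd*|mfid*|mfisyspd*|nvd*) _out="$_out $d" ;;
        esac
    done
    # Unquoted on purpose: word splitting drops the leading space.
    echo $_out
}

# On a FreeBSD node this would be:
#   filter_disks `sysctl -n kern.disks 2>/dev/null`
# Here a made-up list stands in for the sysctl output.
disks=$(filter_disks da1 da0 cd0)
echo "$disks"
```

With the made-up input, da1 and da0 are kept and cd0 is dropped.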
parent 81ea018d
@@ -138,13 +138,22 @@ tweakmbr() {
 dd if=/etc/emulab/mbr${_NEW}.dd of=/dev/$_DSK bs=512 count=1
+# XXX Use sysctl info if available.
+# Due to the async nature of activity in booting, dmesg.boot info may be
+# interleaved. E.g., we often see:
+# da1: 300.000MB/s transfersda0: 476940MB (976773168 512 byte sectors)
+# which means we won't see da0 as a disk using the sed RE.
 find_disks() {
-for d in `sed -n 's/^\([a-z]*[0-9][0-9]*\): [0-9][0-9]*MB/\1/p' /var/run/dmesg.boot`; do
-case $d in
-ad*|da*|ar*|aacd*|amrd*|mfid*|mfisyspd*|nvd*) _DISKS="$_DISKS $d"
+_DISKS=`sysctl -n kern.disks 2>/dev/null`
+if [ -z "$_DISKS" ]; then
+for d in `sed -n 's/^\([a-z]*[0-9][0-9]*\): [0-9][0-9]*MB.*/\1/p' /var/run/dmesg.boot`; do
+case $d in
+ad*|da*|ar*|aacd*|amrd*|mfid*|mfisyspd*|nvd*) _DISKS="$_DISKS $d"
 echo $_DISKS
@@ -692,8 +701,32 @@ for dev in $devs; do
 echo "`date`: slicefix run(s) done"
+# Note that if growdisk succeeds, then the newly defined partition might
+# contain metadata from a previous use. We would not have picked up on this
+# earlier because the partition was not defined yet. So in our usual paranoid
+# fashion, zap that partition if and only if we are in the reloading experiment.
 echo "`date`: Resizing final disk partition"
-$BINDIR/growdisk -vW /dev/$DISK
+out=`$BINDIR/growdisk -vW /dev/$DISK`
+echo $out
+if [ $INRELOADING -eq 1 -a $stat -eq 0 ]; then
+xpart=`echo $out | sed -n -e 's/.*defining partition \([0-9]\) .*/\1/p'`
+if [ -n "$xpart" ]; then
+echo "Zapping newly created extra partition $xpart"
+if [ -x "$BINDIR/zapdisk" ]; then
+echo "Invalidating superblocks and MBR/GPT on $DISK partition $xpart"
+$BINDIR/zapdisk -v -p $xpart -SZ /dev/$DISK
+off=`echo $out | sed -n -e 's/.*start=\([0-9][0-9]*\),.*/\1/p'`
+dd if=/dev/zero of=/dev/$DISK oseek=$off count=16 >/dev/null 2>&1
+if [ $? -ne 0 ]; then
+echo "WARNING: failed to invalidate extra partition $xpart at $off"
 # If requested to reboot, do so.
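
The two sed expressions in the hunk above pull a partition number and a start offset out of the captured growdisk output. The sketch below runs them against a sample line. The sample text is HYPOTHETICAL: it was written only to satisfy both expressions, and the real growdisk output may be worded differently.

```shell
#!/bin/sh
# HYPOTHETICAL growdisk output; the wording is invented for illustration.
out='growdisk: defining partition 4 as start=625137345, size=351635823'

# Same expressions as in the hunk: a single-digit partition number that follows
# "defining partition", and the digits between "start=" and the next comma.
xpart=$(echo $out | sed -n -e 's/.*defining partition \([0-9]\) .*/\1/p')
off=$(echo $out | sed -n -e 's/.*start=\([0-9][0-9]*\),.*/\1/p')

echo "partition=$xpart offset=$off"
```

Because `$out` is echoed unquoted, multi-line output is collapsed onto one line before sed sees it, and the leading `.*` is greedy, so with several `start=` fields the last one would be captured. With dd's default 512-byte block size, `oseek=$off count=16` zeroes 16 sectors starting at sector `$off`.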