    block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash · 561ec68e
    Zhang, Yanmin authored
    We ran into a system boot failure with kernel 2.6.28-rc. We found it on
    several machines, including a T61 notebook, a Nehalem machine, and an
    HPC NX6325 notebook. All the machines run Fedora Core 8 or Fedora Core 9.
    With kernels prior to 2.6.28-rc, system boot doesn't fail.
    
    I debugged it and located the root cause. Please see:
    http://bugzilla.kernel.org/show_bug.cgi?id=11899
    https://bugzilla.redhat.com/show_bug.cgi?id=471517
    
    As a matter of fact, there are 2 bugs.
    
    1) With root=/dev/sda1, system boot fails randomly, roughly once in
    every 5 boots. nash has a bug: some of its functions misuse the return
    value 0. Sometimes 0 means a timeout with no uevent available;
    sometimes 0 means nash got a uevent that isn't block-related (for
    example, usb). If the kernel happens to tell nash that uevents are
    available but also signals a timeout, nash might stop collecting the
    other uevents in the queue when the current uevent isn't block-related.
    I worked out a patch for nash to fix it:
    http://bugzilla.kernel.org/attachment.cgi?id=18858
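
    The ambiguity in bug 1) can be illustrated with a simplified model.
    This is only a sketch and not nash's actual code: the names Poll,
    classify, collect_block_events and collect_block_events_buggy are
    hypothetical, and uevents are modeled as plain dicts, with None
    standing for a timeout.

```python
from enum import Enum, auto


class Poll(Enum):
    TIMEOUT = auto()    # no uevent arrived before the timeout
    NOT_BLOCK = auto()  # a uevent arrived but is not block-related
    BLOCK = auto()      # a block-related uevent arrived


def classify(event):
    """Classify one poll result; `event` is None on timeout, else a dict."""
    if event is None:
        return Poll.TIMEOUT
    if event.get("SUBSYSTEM") != "block":
        return Poll.NOT_BLOCK
    return Poll.BLOCK


def collect_block_events(queue):
    """Drain `queue` until a real timeout, skipping non-block uevents."""
    found = []
    for event in queue:
        result = classify(event)
        if result is Poll.TIMEOUT:
            break  # only a genuine timeout ends collection
        if result is Poll.BLOCK:
            found.append(event["DEVPATH"])
        # NOT_BLOCK: keep reading, more uevents may be queued
    return found


def collect_block_events_buggy(queue):
    """Same loop, but 0 stands for both 'timeout' and 'not block-related'."""
    found = []
    for event in queue:
        is_block = event is not None and event.get("SUBSYSTEM") == "block"
        code = 1 if is_block else 0
        if code == 0:
            break  # a usb uevent is mistaken for a timeout
        found.append(event["DEVPATH"])
    return found
```

    With a usb uevent queued ahead of a block uevent, the buggy loop stops
    early and never sees the disk, while the loop with distinct result
    values keeps reading until a real timeout.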
    
    2) With root=LABEL=/, the system never boots. The initrd init reports
    that switchroot fails. Here is the execution path of nash when booting:
        (1) nash reads /sys/block/sda/dev; assume the major is 8 (as on my
            desktop);
        (2) nash queries /proc/devices with the major number and finds the
            line "8 sd";
        (3) nash uses 'sd' to search its own probe table for the device
            (DISK) type and adds the device to its own list;
        (4) later on, it probes all devices in its list to get filesystem
            labels. SCSI always registers "8 sd".
    
    When the major is 259, nash fails to find the device (DISK) type. I
    enabled CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling the kernel, so
    major 259 is picked for device /dev/sda1, and nash cannot find the
    device (DISK) type for it.
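
    The lookup chain in steps (1) to (3), and the way it breaks for major
    259, can be sketched as follows. This is a rough model and not nash's
    implementation: the function names, the PROBE_TABLE contents and the
    sample /proc/devices text are assumptions made for illustration.

```python
def major_from_dev(dev_text):
    """Step 1: parse the 'major:minor' text read from /sys/block/<disk>/dev."""
    major, _minor = dev_text.strip().split(":")
    return int(major)


def block_driver_name(proc_devices_text, major):
    """Step 2: find the name registered for `major` under 'Block devices:'."""
    in_block_section = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if line == "Block devices:":
            in_block_section = True
        elif line == "Character devices:":
            in_block_section = False
        elif in_block_section and line:
            number, name = line.split(None, 1)
            if int(number) == major:
                return name
    return None  # no block driver registered under this major


# Hypothetical probe table: registered driver name -> device type.
PROBE_TABLE = {"sd": "DISK"}


def device_type(dev_text, proc_devices_text, probe_table=PROBE_TABLE):
    """Step 3: map the registered name to a device type, or None if unknown."""
    name = block_driver_name(proc_devices_text, major_from_dev(dev_text))
    return probe_table.get(name)


# Sample /proc/devices content (assumed, abbreviated).
PROC_DEVICES = """Character devices:
  1 mem
  4 tty

Block devices:
  8 sd
 65 sd
"""
```

    Here device_type("8:0", PROC_DEVICES) resolves to the DISK type, while
    device_type("259:0", PROC_DEVICES) finds no matching line and returns
    None, which corresponds to nash failing to find the device type.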
    
    To fix issue 2), I created a patch for nash and another patch for the
    kernel.
    
    http://bugzilla.kernel.org/attachment.cgi?id=18859
    http://bugzilla.kernel.org/attachment.cgi?id=18837
    
    
    
    Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
    block device, in /proc/devices.
    
    With the 2 nash patches and the 1 kernel patch applied, I booted my
    machines dozens of times without failure.
    
    Signed-off-by: Zhang Yanmin <yanmin.zhang@linux.intel.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>