Skip to content
  • Tejun Heo's avatar
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode · 17497acb
    Tejun Heo authored
    blk-mq uses percpu_ref for its usage counter which tracks the number
    of in-flight commands and used to synchronously drain the queue on
    freeze.  percpu_ref shutdown takes measureable wallclock time as it
    involves a sched RCU grace period.  This means that draining a blk-mq
    takes measureable wallclock time.  One would think that this shouldn't
    matter as queue shutdown should be a rare event which takes place
    asynchronously w.r.t. userland.
    
    Unfortunately, SCSI probing involves synchronously setting up and then
    tearing down a lot of request_queues back-to-back for non-existent
    LUNs.  This means that SCSI probing may take above ten seconds when
    scsi-mq is used.
    
      [    0.949892] scsi host0: Virtio SCSI HBA
      [    1.007864] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.1. PQ: 0 ANSI: 5
      [    1.021299] scsi 0:0:1:0: Direct-Access     QEMU     QEMU HARDDISK    1.1. PQ: 0 ANSI: 5
      [    1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz
    
      <stall>
    
      [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
      [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
      [   16.194099] osd: LOADED open-osd 0.2.1
      [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
      [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
      [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
      [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
      [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    
    This is also the reason why request_queues start in bypass mode which
    is ended on blk_register_queue() as shutting down a fully functional
    queue also involves a RCU grace period and the queues for non-existent
    SCSI devices never reach registration.
    
    blk-mq basically needs to do the same thing - start the mq in a
    degraded mode which is faster to shut down and then make it fully
    functional only after the queue reaches registration.  percpu_ref
    recently grew facilities to force atomic operation until explicitly
    switched to percpu mode, which can be used for this purpose.  This
    patch makes blk-mq initialize q->mq_usage_counter in atomic mode and
    switch it to percpu mode only once blk_register_queue() is reached.
    
    Note that this issue was previously worked around by 0a30288d
    ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during
    probe") for v3.17.  The temp fix was reverted in preparation of adding
    persistent atomic mode to percpu_ref by 9eca8046
    
     ("Revert "blk-mq,
    percpu_ref: implement a kludge for SCSI blk-mq stall during probe"").
    This patch and the prerequisite percpu_ref changes will be merged
    during v3.18 devel cycle.
    
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    Reported-by: default avatarChristoph Hellwig <hch@infradead.org>
    Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
    Fixes: add703fd
    
     ("blk-mq: use percpu_ref for mq usage count")
    Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    17497acb