    dm thin: fix queue limits stacking · 0f640dca
    Mike Snitzer authored
    thin_io_hints() is blindly copying the queue limits from the thin-pool
    which can lead to incorrect limits being set.  The fix here simply
    deletes the thin_io_hints() hook which leaves the existing stacking
    infrastructure to set the limits correctly.
    
    When a thin-pool uses an MD device for the data device, a thin device
    from the thin-pool must respect MD's constraint that a bio must not
    span multiple chunks.  Otherwise MD fails the bio with an I/O error.
    With a raid0 chunksize of 1152K and a thin-pool chunksize of 256K, I
    see the following md/raid0 error (with extra debug tracing added to
    thin_endio) when mkfs.xfs is executed against the thin device:
    
    md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
    device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0
    
    This extra DM debugging shows that the failing bio is spanning across
    the first and second logical 1152K chunk (sector 2080 + 255 takes the
    bio beyond the first chunk's boundary of sector 2304).  So the bio
    splitting that DM is doing clearly isn't respecting the MD limits.
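
The boundary arithmetic above can be checked with a short sketch. This is
an illustration only, not kernel code; the helper names are hypothetical,
and 512-byte sectors are assumed.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def chunk_index(sector, chunk_sectors):
    """Chunk that a sector falls in. Uses division because a 1152K
    chunk is not a power of two, so a bit mask would not work."""
    return sector // chunk_sectors


def spans_chunks(start_sector, nr_sectors, chunk_sectors):
    """True if the range [start, start + nr) touches more than one chunk."""
    last_sector = start_sector + nr_sectors - 1
    return chunk_index(start_sector, chunk_sectors) != chunk_index(
        last_sector, chunk_sectors
    )


chunk_sectors = 1152 * 1024 // SECTOR_SIZE  # raid0 chunksize in sectors
nr_sectors = 130560 // SECTOR_SIZE          # bi_size from the debug trace

print(chunk_sectors, nr_sectors, spans_chunks(2080, nr_sectors, chunk_sectors))
```

The failing bio starts in the first chunk and ends in the second, which is
what md/raid0 refuses to handle.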
    
    max_hw_sectors_kb is 127 for both the thin-pool and thin device
    (queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
    precision).  So this explains why bi_size is 130560.
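
As a rough sketch of the numbers in the paragraph above (assuming 512-byte
sectors and that sysfs reports whole kilobytes by halving the sector count;
the function name is hypothetical):

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def sectors_to_sysfs_kb(sectors):
    """Whole kilobytes for a sector count; the half kilobyte is dropped."""
    return sectors >> 1


max_hw_sectors = 255  # value returned by queue_max_hw_sectors

print(sectors_to_sysfs_kb(max_hw_sectors))  # value seen in max_hw_sectors_kb
print(max_hw_sectors * SECTOR_SIZE)         # largest bio size in bytes
```

255 sectors is 127.5K, which sysfs shows as 127, and 255 sectors of 512
bytes is the 130560 seen in bi_size.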
    
    But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
    that it doesn't have a .merge function (for bio_add_page to consult
    indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
    device that has a compulsory merge_bvec_fn.  This scenario is exactly
    why DM must resort to sending single PAGE_SIZE bios to the underlying
    layer.  Some additional context for this is available in the header for
    commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").
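
The PAGE_SIZE rule described above can be modelled as follows.  This is a
simplified sketch, not the DM implementation: the function and parameter
names are hypothetical, and a 4K page with 512-byte sectors is assumed.

```python
PAGE_SIZE = 4096   # assumed 4K page
SECTOR_SHIFT = 9   # 512-byte sectors


def target_max_hw_sectors(lower_max_hw_sectors,
                          lower_merge_is_compulsory,
                          target_has_merge):
    """Limit for a target stacked on a lower device.

    If the lower device has a compulsory merge function and the target
    cannot consult it (no .merge hook), fall back to single-page bios.
    """
    if lower_merge_is_compulsory and not target_has_merge:
        return min(lower_max_hw_sectors, PAGE_SIZE >> SECTOR_SHIFT)
    return lower_max_hw_sectors


# thin device: no .merge, sitting above a stack that ends in MD
thin_sectors = target_max_hw_sectors(255, True, False)
print(thin_sectors, thin_sectors >> 1)  # sectors, then kilobytes
```

Under these assumptions the thin device ends up with 8 sectors, which is
the max_hw_sectors_kb of 4 that the text expects.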
    
    Long story short, the reason a thin device doesn't properly get
    configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
    thin_io_hints() is blindly copying the queue limits from the thin-pool
    device directly to the thin device's queue limits.
    
    Fix this by eliminating thin_io_hints.  Doing so is safe because the
    block layer's queue limits stacking already enables the upper level thin
    device to inherit the thin-pool device's discard and minimum_io_size and
    optimal_io_size limits that get set in pool_io_hints.  But avoiding the
    queue limits copy allows the thin and thin-pool limits to be different
    where it is important, namely max_hw_sectors_kb.
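
The difference between copying and stacking can be sketched with a toy
model.  The Limits class, the helper names, and the sample values are all
hypothetical; the combining rules (minimum for max_hw_sectors, maximum for
io_min and discard granularity, least common multiple for io_opt) are a
simplification of what the block layer does when stacking limits.

```python
from dataclasses import dataclass, replace
from math import gcd


@dataclass
class Limits:
    max_hw_sectors: int
    io_min: int
    io_opt: int
    discard_granularity: int


def lcm(a, b):
    """Least common multiple, treating 0 as 'not set'."""
    return a * b // gcd(a, b) if a and b else a or b


def copy_limits(pool):
    """What a blind copy does: the thin device gets the pool's limits."""
    return replace(pool)


def stack_limits(top, bottom):
    """Simplified stacking: combine each field instead of overwriting."""
    return Limits(
        max_hw_sectors=min(top.max_hw_sectors, bottom.max_hw_sectors),
        io_min=max(top.io_min, bottom.io_min),
        io_opt=lcm(top.io_opt, bottom.io_opt),
        discard_granularity=max(top.discard_granularity,
                                bottom.discard_granularity),
    )


# Illustrative values only: a pool with a 256K chunksize.
pool = Limits(max_hw_sectors=255, io_min=262144, io_opt=262144,
              discard_granularity=262144)
# Thin device already restricted to one 4K page (8 sectors).
thin = Limits(max_hw_sectors=8, io_min=512, io_opt=0, discard_granularity=0)

copied = copy_limits(pool)
stacked = stack_limits(thin, pool)
print(copied.max_hw_sectors, stacked.max_hw_sectors)
print(stacked.io_min, stacked.io_opt, stacked.discard_granularity)
```

In this model the copy overwrites the single-page restriction with the
pool's 255 sectors, while stacking keeps the restriction and still picks
up the pool's I/O size and discard hints.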
    
    Reported-by: Daniel Browning <db@kavod.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Alasdair G Kergon <agk@redhat.com>