    squeeze max-pause area and drop pass-good area · bb082295
    Wu Fengguang authored
    Revert the pass-good area introduced in ffd1f609 ("writeback:
    introduce max-pause and pass-good dirty limits") and make the max-pause
    area smaller and safe.
    
    This fixes a ~30% performance regression in the ext3 data=writeback
    fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, which run on
    12 JBOD disks with 8 concurrent tasks per disk doing reads+writes.
    
    The deadline scheduler also shows a regression, though not as large as
    with CFQ, which suggests we have some write starvation.
    
    The test logs show that
    
    - the disks are sometimes underutilized
    
    - global dirty pages sometimes rush high into the pass-good area for
      several hundred seconds, while in the meantime some bdi dirty pages
      drop to a very low value (bdi_dirty << bdi_thresh).  Then the global
      dirty pages suddenly drop under the global dirty threshold and
      bdi_dirty rushes very high (for example, 2 times higher than
      bdi_thresh), during which time balance_dirty_pages() is not called
      at all.
    
    So the problems are
    
    1) The random writes progress so slowly that they break the assumption
       of the max-pause logic that "8 pages per 200ms is typically more than
       enough to curb heavy dirtiers".
    
    2) The max-pause logic ignores task_bdi_thresh and thus opens the
       possibility for some bdi's to over-dirty pages, leading to
       (bdi_dirty >> bdi_thresh) for them and then (bdi_thresh >> bdi_dirty)
       for others.
    
    3) The higher max-pause/pass-good thresholds somehow lead to the bad
       swing of dirty pages.
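
To put the "8 pages per 200ms" assumption from problem 1 in numbers, the following is a back-of-the-envelope sketch rather than kernel code. The helper name `capped_dirty_rate_kib` is hypothetical, and the 4 KiB page size and the reading that the cap applies per task are assumptions; only the 8 pages, the 200ms and the 8 tasks per disk come from the text above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the kernel): dirtying rate in KiB/s that
 * remains possible when each task may dirty `pages_per_pause` pages and
 * then sleeps for at most `pause_ms` milliseconds.
 * Assumptions: `page_kib` is the page size in KiB, and the cap is applied
 * independently to each of the `ntasks` tasks.
 */
static unsigned long capped_dirty_rate_kib(unsigned long pages_per_pause,
                                           unsigned long pause_ms,
                                           unsigned long page_kib,
                                           unsigned long ntasks)
{
    /* KiB dirtied per pause interval, scaled to one second, times tasks. */
    return pages_per_pause * page_kib * 1000 / pause_ms * ntasks;
}
```

Under these assumptions, `capped_dirty_rate_kib(8, 200, 4, 1)` gives 160 KiB/s for one task and `capped_dirty_rate_kib(8, 200, 4, 8)` gives 1280 KiB/s for the 8 tasks on one disk. If random writeback on a disk progresses more slowly than that, the capped pause no longer curbs the dirtiers, which is the broken assumption described in problem 1.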
    
    The fix is to allow the task to dirty slightly over task_bdi_thresh, but
    never to exceed bdi_dirty and/or the global dirty_thresh.
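
One way to picture the squeezed max-pause area is the check below. It is a minimal sketch under stated assumptions, not the patch itself: the function name `in_max_pause_area` and the boolean `max_pause_elapsed` are hypothetical, the per-bdi ceiling is read as bdi_thresh, and the midpoint between task_bdi_thresh and bdi_thresh is just one possible meaning of "slightly over".

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: may a throttled task stop pausing once the
 * maximum pause time has elapsed?
 *
 * Assumptions (not confirmed by the commit message):
 *  - task_bdi_thresh <= bdi_thresh;
 *  - "slightly over task_bdi_thresh" is modeled as the midpoint between
 *    task_bdi_thresh and bdi_thresh;
 *  - all counters are in pages.
 */
static bool in_max_pause_area(unsigned long nr_dirty,
                              unsigned long dirty_thresh,
                              unsigned long bdi_dirty,
                              unsigned long bdi_thresh,
                              unsigned long task_bdi_thresh,
                              bool max_pause_elapsed)
{
    /* Upper edge of the area: above the task limit, below the bdi limit. */
    unsigned long bdi_limit = (task_bdi_thresh + bdi_thresh) / 2;

    return nr_dirty < dirty_thresh &&   /* never above the global limit */
           bdi_dirty < bdi_limit &&     /* only slightly above the task limit */
           max_pause_elapsed;           /* and only after the pause cap */
}
```

With task_bdi_thresh = 50 and bdi_thresh = 60, the area ends at 55 pages: a task on a bdi with 54 dirty pages may leave the loop after the pause cap, while one with 55 keeps waiting, and no task leaves once the global count reaches dirty_thresh.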
    
    Tests show that this fixes the JBOD regression completely (in both
    behavior and performance), while still cutting down large pause times
    in balance_dirty_pages() for single-disk cases.
    
    Reported-by: Li Shaohua <shaohua.li@intel.com>
    Tested-by: Li Shaohua <shaohua.li@intel.com>
    Acked-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>