    writeback: dirty position control · 6c14ae1e
    Wu Fengguang authored
    
    
    bdi_position_ratio() provides a scale factor for bdi->dirty_ratelimit, so
    that the resulting task rate limit can drive the dirty pages back to the
    global/bdi setpoints.
    
    The old scheme is:
                                              |
                               free run area  |  throttle area
      ----------------------------------------+---------------------------->
                                        thresh^                  dirty pages
    
    The new scheme is:
    
      ^ task rate limit
      |
      |            *
      |             *
      |              *
      |[free run]      *      [smooth throttled]
      |                  *
      |                     *
      |                         *
      ..bdi->dirty_ratelimit..........*
      |                               .     *
      |                               .          *
      |                               .              *
      |                               .                 *
      |                               .                    *
      +-------------------------------.-----------------------*------------>
                              setpoint^                  limit^  dirty pages
    
    The slope of the bdi control line should be
    
    1) large enough to pull the dirty pages to setpoint reasonably fast
    
    2) small enough to avoid big fluctuations in the resulting pos_ratio and
       hence task ratelimit
    
    Since the fluctuation range of the bdi dirty pages is typically observed
    to be within one second's worth of data, the bdi control line's slope is
    chosen to be a linear function of the bdi write bandwidth, so that it
    adapts well to both slow and fast storage devices.
    
    Assume the bdi control line
    
    	pos_ratio = 1.0 + k * (dirty - bdi_setpoint)
    
    where k is the negative slope.
    
    If we target a 12.5% fluctuation range in pos_ratio while the dirty pages
    fluctuate in the range
    
    	[bdi_setpoint - write_bw/2, bdi_setpoint + write_bw/2],
    
    we get the slope
    
    	k = - 1 / (8 * write_bw)
    
    Setting pos_ratio(x_intercept) = 0 gives the parameter used in the code:
    
    	x_intercept = bdi_setpoint + 8 * write_bw
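
    The derivation above can be sketched in C as follows.  This is only an
    illustrative floating-point model, not the code added by this patch: the
    function names are hypothetical, and write_bw is assumed to be the number
    of pages written out in one second.

```c
/*
 * Illustrative sketch only: a floating-point model of the bdi control line.
 * The function names are hypothetical and chosen for this example.
 */

/* x_intercept = bdi_setpoint + 8 * write_bw: the point where pos_ratio is 0 */
static double bdi_x_intercept(double bdi_setpoint, double write_bw)
{
	return bdi_setpoint + 8.0 * write_bw;
}

/* pos_ratio = 1.0 + k * (dirty - bdi_setpoint), with k = -1 / (8 * write_bw) */
static double bdi_line_pos_ratio(double dirty, double bdi_setpoint,
				 double write_bw)
{
	double k = -1.0 / (8.0 * write_bw);	/* negative slope */

	return 1.0 + k * (dirty - bdi_setpoint);
}
```

    With bdi_setpoint = 10000 and write_bw = 1024, moving dirty from
    bdi_setpoint - 512 to bdi_setpoint + 512 moves pos_ratio from 1.0625 to
    0.9375, which is the targeted 12.5% range.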
    
    The global/bdi slopes complement each other nicely when the
    system has only one major bdi (indicated by bdi_thresh ~= thresh):
    
    1) slope of global control line    => scaling to the control scope size
    2) slope of main bdi control line  => scaling to the writeout bandwidth
    
    so that
    
    - in memory-tight systems, (1) becomes strong enough to squeeze dirty
      pages inside the control scope
    
    - in large-memory systems where the "gravity" of (1) for pulling the
      dirty pages to setpoint is too weak, (2) can back (1) up and drive
      dirty pages to bdi_setpoint ~= setpoint reasonably fast.
    
    Unfortunately, in JBOD setups the fluctuation range of the bdi threshold
    is related to memory size due to interference between disks.  In this
    case, the bdi slope will be a weighted sum of write_bw and bdi_thresh.
    
    Given the equations
    
            span = x_intercept - bdi_setpoint
            k = df/dx = - 1 / span
    
    and the extremum values
    
            span = bdi_thresh
            dx = bdi_thresh
    
    we get
    
            df = - dx / span = - 1.0
    
    That means that when bdi_dirty deviates upward by bdi_thresh, pos_ratio
    and hence the task ratelimit will fluctuate by -100%.
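
    The extremum above can be checked numerically with a small sketch.  The
    helper name is hypothetical, and the span is passed in directly rather
    than computed as the weighted sum, whose weights are not spelled out
    here.

```c
/*
 * Illustrative sketch only: change in pos_ratio for a deviation dx of the
 * dirty pages, given the span of the bdi control line.
 * The function name is hypothetical.
 */
static double pos_ratio_change(double dx, double span)
{
	double k = -1.0 / span;	/* k = df/dx = -1 / span */

	return k * dx;		/* df = -dx / span */
}
```

    For example, pos_ratio_change(bdi_thresh, bdi_thresh) is -1.0 for any
    positive bdi_thresh, while a span of 8 * write_bw with dx = write_bw
    gives -0.125.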
    
    peter: use 3rd order polynomial for the global control line
    
    CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Acked-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>