Skip to content
  • NeilBrown's avatar
    MM: increase safety margin provided by PF_LESS_THROTTLE · a53eaff8
    NeilBrown authored
    When nfsd is exporting a filesystem over NFS which is then NFS-mounted
    on the local machine there is a risk of deadlock.  This happens when
    there are lots of dirty pages in the NFS filesystem and they cause NFSD
    to be throttled, either in throttle_vm_writeout() or in
    balance_dirty_pages().
    
    To avoid this problem the PF_LESS_THROTTLE flag is set for NFSD threads
    and it provides a 25% increase to the limits that affect NFSD.  Any
    process writing to an NFS filesystem will be throttled well before the
    number of dirty NFS pages reaches the limit imposed on NFSD, so NFSD
    will not deadlock on pages that it needs to write out.  At least it
    shouldn't.
    
    All processes are allowed a small excess margin to avoid performing too
    many calculations: ratelimit_pages.
    
    ratelimit_pages is set so that if a thread on every CPU uses the entire
    margin, the total will only go 3% over the limit, and this is much less
    than the 25% bonus that PF_LESS_THROTTLE provides, so this margin
    shouldn't be a problem.  But it is.
    
    The "total memory" that these 3% and 25% are calculated against are not
    really total memory but are "global_dirtyable_memory()" which doesn't
    include anonymous memory, just free memory and page-cache memory.
    
    The "ratelimit_pages" number is based on whatever the
    global_dirtyable_memory was on the last CPU hot-plug, which might not be
    what you expect, but is probably close to the total freeable memory.
    
    The throttle threshold uses the global_dirtable_memory at the moment
    when the throttling happens, which could be much less than at the last
    CPU hotplug.  So if lots of anonymous memory has been allocated, thus
    pushing out lots of page-cache pages, then NFSD might end up being
    throttled due to dirty NFS pages because the "25%" bonus it gets is
    calculated against a rather small amount of dirtyable memory, while the
    "3%" margin that other processes are allowed to dirty without penalty is
    calculated against a much larger number.
    
    To remove this possibility of deadlock we need to make sure that the
    margin granted to PF_LESS_THROTTLE exceeds that rate-limit margin.
    Simply adding ratelimit_pages isn't enough as that should be multiplied
    by the number of cpus.
    
    So add "global_wb_domain.dirty_limit / 32" as that more accurately
    reflects the current total over-shoot margin.  This ensures that the
    number of dirty NFS pages never gets so high that nfsd will be throttled
    waiting for them to be written.
    
    Link: http://lkml.kernel.org/r/87futgowwv.fsf@notabene.neil.brown.name
    
    
    Signed-off-by: default avatarNeilBrown <neilb@suse.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    a53eaff8