    mm, compaction: defer each zone individually instead of preferred zone · 53853e2d
    Vlastimil Babka authored
    When direct sync compaction is often unsuccessful, it may become deferred
    for some time to avoid further useless attempts, both sync and async.
    Successful high-order allocations un-defer compaction, while further
    unsuccessful compaction attempts prolong the compaction deferred period.
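
    The deferral mechanism described above can be sketched as a small
    stand-alone model.  This is a simplified illustration, not the kernel
    code: struct zone_model and the model_*() helpers are hypothetical
    names, the cap of 6 is assumed to match the kernel's
    COMPACT_MAX_DEFER_SHIFT, and allocation-order tracking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel's cap on the deferral shift. */
#define MAX_DEFER_SHIFT 6

/* Hypothetical stand-in for the per-zone deferral counters. */
struct zone_model {
	unsigned int considered;  /* attempts seen since the last failure */
	unsigned int defer_shift; /* window is 1 << defer_shift attempts */
};

/* A failed compaction restarts the count and lengthens the window. */
static void model_defer(struct zone_model *z)
{
	z->considered = 0;
	if (z->defer_shift < MAX_DEFER_SHIFT)
		z->defer_shift++;
	assert(z->defer_shift <= MAX_DEFER_SHIFT);
}

/* Returns true while compaction should still be skipped for the zone. */
static bool model_deferred(struct zone_model *z)
{
	unsigned int limit = 1U << z->defer_shift;

	/* Count this attempt, saturating at the window size. */
	if (++z->considered > limit)
		z->considered = limit;
	return z->considered < limit;
}

/* A successful high-order allocation clears the deferral state. */
static void model_reset(struct zone_model *z)
{
	z->considered = 0;
	z->defer_shift = 0;
}

/* Counts consecutive skipped attempts before one is allowed again. */
static unsigned int skipped_attempts(struct zone_model *z)
{
	unsigned int skipped = 0;

	while (model_deferred(z))
		skipped++;
	return skipped;
}
```

    In this model each further failure roughly doubles the number of
    skipped attempts until the cap is reached, and a reset makes the next
    attempt run immediately.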
    
    Currently, the deferred status is checked and set only on the
    preferred zone of the allocation that invoked direct compaction.  But
    compaction itself is attempted on all eligible zones in the zonelist, so
    the behavior is suboptimal and may lead to scenarios where 1)
    compaction is attempted uselessly, or 2) it is not attempted despite
    good chances of succeeding, as shown in the examples below:
    
    1) A direct compaction with Normal preferred zone failed and set
       deferred compaction for the Normal zone.  Another unrelated direct
       compaction with DMA32 as preferred zone will attempt to compact DMA32
       zone even though the first compaction attempt also included DMA32 zone.
    
       In another scenario, compaction with Normal preferred zone failed to
       compact Normal zone, but succeeded in the DMA32 zone, so it will not
       defer compaction.  In the next attempt, it will try Normal zone which
       will fail again, instead of skipping Normal zone and trying DMA32
       directly.
    
    2) Kswapd will balance DMA32 zone and reset defer status based on
       watermarks looking good.  A direct compaction with preferred Normal
       zone will skip compaction of all zones including DMA32 because Normal
       was still deferred.  The allocation might have succeeded in DMA32, but
       won't.
    
    This patch makes compaction deferring work on an individual zone basis
    instead of preferred zone.  For each zone, it checks compaction_deferred()
    to decide if the zone should be skipped.  If watermarks fail after
    compacting the zone, defer_compaction() is called.  The zone where
    watermarks passed can still be deferred when the allocation attempt is
    unsuccessful.  When allocation is successful, compaction_defer_reset() is
    called for the zone containing the allocated page.  This approach should
    approximate calling defer_compaction() only on zones where compaction was
    attempted and did not yield an allocated page.  There might be corner
    cases, but that is inevitable as long as the decision to stop compacting
    does not guarantee that a page will be allocated.
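
    The per-zone logic described above can be sketched as a user-space
    simulation.  This is an illustration under stated assumptions, not the
    patched kernel code: struct zone_sim, enum sim_result and
    sim_direct_compact() are hypothetical names, and the deferral state is
    collapsed to a single flag per zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone state for the simulation. */
struct zone_sim {
	const char *name;
	bool deferred;            /* stands in for compaction_deferred() */
	bool watermarks_ok_after; /* do watermarks pass once compacted? */
	bool alloc_succeeds;      /* does the following allocation succeed? */
	unsigned int attempts;    /* how often compaction ran on this zone */
};

/* SIM_DEFERRED plays the role of the new COMPACT_DEFERRED result. */
enum sim_result { SIM_DEFERRED, SIM_FAILED, SIM_ALLOCATED };

/* Walks the zonelist and makes the deferral decision zone by zone. */
static enum sim_result sim_direct_compact(struct zone_sim *zones, size_t n)
{
	bool attempted = false;

	assert(zones != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct zone_sim *z = &zones[i];

		/* Skip only this zone, not the whole zonelist. */
		if (z->deferred)
			continue;

		attempted = true;
		z->attempts++;

		/* Watermarks still fail after compacting: defer this zone. */
		if (!z->watermarks_ok_after) {
			z->deferred = true;
			continue;
		}

		/* Watermarks pass: stop compacting and try to allocate. */
		if (z->alloc_succeeds) {
			z->deferred = false; /* success resets the zone */
			return SIM_ALLOCATED;
		}

		/* A zone that passed watermarks can still be deferred. */
		z->deferred = true;
		return SIM_FAILED;
	}

	return attempted ? SIM_FAILED : SIM_DEFERRED;
}
```

    With a zonelist of a failing Normal zone followed by a succeeding DMA32
    zone, the first call defers only Normal and allocates from DMA32; a
    second call skips Normal and compacts DMA32 directly, which is the
    behavior the second scenario of example 1 asks for.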
    
    Due to a new COMPACT_DEFERRED return value, some functions relying
    implicitly on COMPACT_SKIPPED = 0 had to be updated, with comments made
    more accurate.  The did_some_progress output parameter of
    __alloc_pages_direct_compact() is removed completely, as the caller
    actually does not use it after compaction sets it - it is only considered
    when direct reclaim sets it.
    
    During testing on a two-node machine with a single very small Normal zone
    on node 1, this patch has improved success rates in the stress-highalloc
    mmtests benchmark.  The success rates here were previously made worse by
    commit 3a025760 ("mm: page_alloc: spill to remote nodes before waking
    kswapd"), as kswapd no longer reset the deferred compaction for the
    Normal zone often enough, and DMA32 zones on both nodes were thus not
    considered for compaction.  On a different machine, success rates were
    improved with __GFP_NO_KSWAPD allocations.
    
    [akpm@linux-foundation.org: fix CONFIG_COMPACTION=n build]
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Nazarewicz <mina86@mina86.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>