Skip to content
  • Vlastimil Babka's avatar
    mm: compaction: detect when scanners meet in isolate_freepages · 7ed695e0
    Vlastimil Babka authored
    Compaction of a zone is finished when the migrate scanner (which begins
    at the zone's lowest pfn) meets the free page scanner (which begins at
    the zone's highest pfn).  This is detected in compact_zone() and in the
    case of direct compaction, the compact_blockskip_flush flag is set so
    that kswapd later resets the cached scanner pfn's, and a new compaction
    may again start at the zone's borders.
    
    The meeting of the scanners can happen during either scanner's activity.
    However, it may currently fail to be detected when it occurs in the free
    page scanner, due to two problems.  First, isolate_freepages() keeps
    free_pfn at the highest block where it isolated pages from, for the
    purposes of not missing the pages that are returned back to allocator
    when migration fails.  Second, failing to isolate enough free pages due
    to scanners meeting results in -ENOMEM being returned by
    migrate_pages(), which makes compact_zone() bail out immediately without
    calling compact_finished() that would detect scanners meeting.
    
    This failure to detect scanners meeting might result in repeated
    attempts at compaction of a zone that keep starting from the cached
    pfn's close to the meeting point, and quickly failing through the
    -ENOMEM path, without the cached pfns being reset, over and over.  This
    has been observed (through additional tracepoints) in the third phase of
    the mmtests stress-highalloc benchmark, where the allocator runs on an
    otherwise idle system.  The problem was observed in the DMA32 zone,
    which was used as a fallback to the preferred Normal zone, but on the
    4GB system it was actually the largest zone.  The problem is even
    amplified for such fallback zone - the deferred compaction logic, which
    could (after being fixed by a previous patch) reset the cached scanner
    pfn's, is only applied to the preferred zone and not for the fallbacks.
    
    The problem in the third phase of the benchmark was further amplified by
    commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which
    resulted in a non-deterministic regression of the allocation success
    rate from ~85% to ~65%.  This occurs in about half of benchmark runs,
    making bisection problematic.  It is unlikely that the commit itself is
    buggy, but it should put more pressure on the DMA32 zone during phases 1
    and 2, which may leave it more fragmented in phase 3 and expose the bugs
    that this patch fixes.
    
    The fix is to make scanners meeting in isolate_freepage() stay that way,
    and to check in compact_zone() for scanners meeting when migrate_pages()
    returns -ENOMEM.  The result is that compact_finished() also detects
    scanners meeting and sets the compact_blockskip_flush flag to make
    kswapd reset the scanner pfn's.
    
    The results in stress-highalloc benchmark show that the "regression" by
    commit 81c0a2bb
    
     in phase 3 no longer occurs, and phase 1 and 2
    allocation success rates are also significantly improved.
    
    Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    7ed695e0