    page-allocator: limit the number of MIGRATE_RESERVE pageblocks per zone · 78986a67
    Mel Gorman authored
    
    
    After anti-fragmentation was merged, a bug was reported whereby devices
    that depended on high-order atomic allocations were failing.  The solution
    was to preserve a property in the buddy allocator which tended to keep the
    minimum number of free pages in the zone at the lower physical addresses
    and contiguous.  To preserve this property, MIGRATE_RESERVE was introduced
    and a number of pageblocks at the start of a zone would be marked
    "reserve", the number of which depended on min_free_kbytes.
    
    Anti-fragmentation works by avoiding the mixing of page migratetypes
    within the same pageblock.  One way of helping this is to increase
    min_free_kbytes because it becomes less likely that it will be necessary
    to place pages of different migratetypes within the same pageblock.
    However, because the number of MIGRATE_RESERVE pageblocks is unbounded,
    the free memory is kept there in large contiguous blocks instead of
    helping anti-fragmentation as much as it should.  With the page-allocator
    tracepoint patches applied, it was found during anti-fragmentation tests
    that the number of fragmentation-related events was far higher than
    expected even with min_free_kbytes at higher values.
    
    This patch limits the number of MIGRATE_RESERVE blocks that exist per zone
    to two.  For example, with a sufficient min_free_kbytes, 4MB of memory
    will be kept aside on an x86-64 and remain more or less free and
    contiguous for the system's uptime.  This should be sufficient for devices
    depending on high-order atomic allocations while helping fragmentation
    control when min_free_kbytes is tuned appropriately.  A side effect of
    this patch is that the reserve variable is converted to int, as unsigned
    long was the wrong type to use when ensuring that only the required number
    of reserve blocks is created.
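
    The cap described above can be illustrated with a minimal userspace
    sketch.  It is not the kernel code from this patch: the helper names
    reserve_blocks() and reserve_bytes() are hypothetical, and the constants
    assume x86-64 defaults of 4KB base pages and 2MB pageblocks (a pageblock
    order of 9).  The input is assumed to be the zone's minimum watermark in
    pages, which is derived from min_free_kbytes.

```c
/*
 * Illustrative sketch only; names and constants are assumptions,
 * not the kernel's own definitions.
 */
#define SKETCH_PAGE_SHIFT       12   /* assumed 4KB base pages */
#define SKETCH_PAGEBLOCK_ORDER  9    /* assumed 2MB pageblocks */
#define SKETCH_PAGEBLOCK_PAGES  (1UL << SKETCH_PAGEBLOCK_ORDER)
#define SKETCH_MAX_RESERVE      2    /* the per-zone limit from the text */

/*
 * Hypothetical helper: number of pageblocks needed to hold min_pages,
 * rounded up to whole pageblocks and then capped at the limit.
 * The result is a small signed count, matching the move to int.
 */
static int reserve_blocks(unsigned long min_pages)
{
    unsigned long blocks =
        (min_pages + SKETCH_PAGEBLOCK_PAGES - 1) >> SKETCH_PAGEBLOCK_ORDER;

    if (blocks > SKETCH_MAX_RESERVE)
        return SKETCH_MAX_RESERVE;
    return (int)blocks;
}

/* Hypothetical helper: bytes covered by the given number of pageblocks. */
static unsigned long reserve_bytes(int blocks)
{
    return (unsigned long)blocks
        << (SKETCH_PAGEBLOCK_ORDER + SKETCH_PAGE_SHIFT);
}
```

    Under these assumptions, any watermark above 512 pages yields two reserve
    blocks, and reserve_bytes(2) is 4194304 bytes, which is the 4MB figure
    quoted for x86-64.  A watermark of 512 pages or fewer still yields at
    most one block, so the cap only takes effect for larger min_free_kbytes
    values.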
    
    With the patches applied, fragmentation-related events as measured by the
    page allocator tracepoints were significantly reduced when running some
    fragmentation stress-tests on systems with min_free_kbytes tuned to a
    value appropriate for hugepage allocations at runtime.  On x86, the events
    recorded were reduced by 99.8%, on x86-64 by 99.72% and on ppc64 by
    99.83%.
    
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>