    mm: page_alloc: use word-based accesses for get/set pageblock bitmaps · e58469ba
    Mel Gorman authored
    
    
    The test_bit operations in get/set pageblock flags are expensive.  This
    patch reads the bitmap on a word basis and uses shifts and masks to isolate
    the bits of interest.  Similarly, masks are used to set the bits in a local
    copy of the word, and cmpxchg then updates the bitmap if no other changes
    have been made in parallel.
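
    As a rough userspace illustration of the approach described above (not
    the code in this patch), the sketch below reads a block's flags with one
    word-sized load plus shift and mask, and writes them by modifying a local
    copy and publishing it with a compare-and-exchange.  It uses C11 atomics
    in place of the kernel's cmpxchg, and NR_BLOCK_BITS, get_block_flags()
    and set_block_flags() are hypothetical names with a simplified bit layout.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
/* Hypothetical: flag bits per block. It divides the word size, so one
 * block's bits never straddle two words. */
#define NR_BLOCK_BITS 4UL

/* Read one block's flags with a single word load, then shift and mask. */
unsigned long get_block_flags(_Atomic unsigned long *bitmap,
                              unsigned long block, unsigned long mask)
{
    unsigned long bitidx = block * NR_BLOCK_BITS;
    unsigned long word = atomic_load(&bitmap[bitidx / BITS_PER_WORD]);

    return (word >> (bitidx % BITS_PER_WORD)) & mask;
}

/* Update a local copy of the word, then publish it only if the word in
 * memory is still the one we read; otherwise retry with the fresh value. */
void set_block_flags(_Atomic unsigned long *bitmap, unsigned long block,
                     unsigned long flags, unsigned long mask)
{
    unsigned long bitidx = block * NR_BLOCK_BITS;
    unsigned long shift = bitidx % BITS_PER_WORD;
    _Atomic unsigned long *wordp = &bitmap[bitidx / BITS_PER_WORD];
    unsigned long old = atomic_load(wordp);
    unsigned long new;

    assert((flags & ~mask) == 0);   /* flags must fit inside the mask */
    do {
        new = (old & ~(mask << shift)) | (flags << shift);
        /* On failure, 'old' is refreshed with the current word. */
    } while (!atomic_compare_exchange_weak(wordp, &old, new));
}
```

    Because every reader sees a whole word and every writer replaces a whole
    word, a reader cannot observe a half-updated group of bits, and a writer
    cannot silently overwrite bits that changed after it loaded the word.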
    
    In a test running dd onto tmpfs the overhead of the pageblock-related
    functions went from 1.27% in profiles to 0.5%.
    
    In addition to the performance benefits, this patch closes races that are
    possible between:
    
    a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype()
       reads some of the bits before and the rest of the bits after
       set_pageblock_migratetype() has updated them.

    b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic
       read-modify-write set bit operation in set_pageblock_skip() will cause
       lost updates to some bits changed by set_pageblock_migratetype().
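
    Case b) can be replayed deterministically in a single thread.  The
    sketch below is only an illustration of the interleaving, not kernel
    code: MT_MASK, SKIP_BIT and both function names are hypothetical, and
    the "CPUs" are just ordered statements.

```c
#include <assert.h>

#define MT_MASK  0x7UL        /* hypothetical: three migratetype bits */
#define SKIP_BIT (1UL << 3)   /* hypothetical: position of the skip bit */

/* Non-atomic set bit: CPU A stores back a stale copy of the word and
 * wipes out the migratetype that CPU B wrote in between. */
unsigned long replay_lost_update(unsigned long word, unsigned long new_mt)
{
    unsigned long stale;

    assert((new_mt & ~MT_MASK) == 0);
    stale = word;                         /* CPU A: load */
    word = (word & ~MT_MASK) | new_mt;    /* CPU B: migratetype update */
    word = stale | SKIP_BIT;              /* CPU A: store from stale copy */
    return word;
}

/* Compare-and-exchange: CPU A notices that the word changed, reloads
 * it, and only then sets its bit, so CPU B's update survives. */
unsigned long replay_cmpxchg(unsigned long word, unsigned long new_mt)
{
    unsigned long expected;

    assert((new_mt & ~MT_MASK) == 0);
    expected = word;                      /* CPU A: load */
    word = (word & ~MT_MASK) | new_mt;    /* CPU B: migratetype update */
    if (word != expected)                 /* CPU A: exchange would fail */
        expected = word;                  /* CPU A: retry with fresh word */
    word = expected | SKIP_BIT;           /* CPU A: exchange succeeds */
    return word;
}
```

    Starting from a word holding migratetype 2 and updating it to 1, the
    first replay ends with migratetype 2 again (the update is lost), while
    the second ends with migratetype 1 and the skip bit both set.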
    
    Joonsoo Kim first reported case a) via code inspection.  Vlastimil
    Babka's testing with a debug patch showed that either a) or b) occurs
    roughly once per run of mmtests' stress-highalloc benchmark (although not
    necessarily in the same pageblock).  Furthermore, during development of
    unrelated compaction patches, it was observed that with frequent calls to
    {start,undo}_isolate_page_range() the race occurs several thousand
    times and has resulted in NULL pointer dereferences in move_freepages()
    and free_one_page() in places where free_list[migratetype] is
    manipulated by e.g. list_move().  Further debugging confirmed that
    migratetype had an invalid value of 6, causing out-of-bounds access to the
    free_list array.
    
    That confirmed that the race exists, although it may be extremely rare,
    and is currently only fatal where page isolation is performed due to
    memory hot remove.  Races on pageblocks being updated by
    set_pageblock_migratetype(), where both the old and new migratetype are
    lower than MIGRATE_RESERVE, currently cannot result in an invalid value
    being observed, although theoretically they may still lead to
    unexpected creation or destruction of MIGRATE_RESERVE pageblocks.
    Furthermore, things could suddenly get worse when memory isolation is
    used more, or when new migratetypes are added.
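
    The arithmetic behind these observations can be sketched as follows.
    A bit-by-bit reader that races with a writer may return any mixture of
    old and new bits.  The helper names are hypothetical, and the numeric
    encodings used in the note after the code (2 and 4 as old and new
    values, 3 for MIGRATE_RESERVE) are assumptions for illustration; the
    real numbering depends on the kernel configuration.

```c
#include <assert.h>

#define MT_BITS 3   /* hypothetical: number of migratetype bits */

/* Value seen by a reader that takes the bits in 'from_new' after the
 * update and all remaining bits before it. */
unsigned long torn_read(unsigned long old, unsigned long new,
                        unsigned long from_new)
{
    return (old & ~from_new) | (new & from_new);
}

/* Return 1 if some mixture of old and new bits yields 'target'. */
int can_tear_to(unsigned long old, unsigned long new, unsigned long target)
{
    unsigned long m;

    assert(MT_BITS < 8);
    for (m = 0; m < (1UL << MT_BITS); m++)
        if (torn_read(old, new, m) == target)
            return 1;
    return 0;
}
```

    Under these assumptions, torn_read(2, 4, 0x4) yields 6 (binary 010
    mixed with 100), an index beyond the valid types.  When both values
    are below 3 they occupy only the two low bits, so no mixture exceeds
    3; a mixture of 1 and 2 can still produce 3, which is how a
    MIGRATE_RESERVE block could appear unexpectedly.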
    
    After this patch, the race has no longer been observed in testing.

    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>