    mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype · ad53f92e
    Joonsoo Kim authored
    
    
    Before describing the bugs themselves, I first explain the definition of a freepage.
    
     1. Pages on a buddy list are counted as freepages.
     2. Pages on the isolate migratetype buddy list are *not* counted as freepages.
     3. Pages on the CMA buddy list are also counted as CMA freepages.
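    The three rules above can be sketched as a small accounting helper. This
    is a simplified model that assumes just three hypothetical migratetype
    values. In the kernel of this era, roughly equivalent bookkeeping is done
    by __mod_zone_freepage_state(), which adjusts NR_FREE_PAGES and, for CMA
    pageblocks, NR_FREE_CMA_PAGES. Callers skip that bookkeeping for isolated
    pageblocks. The names toy_mt, toy_counters and toy_account_free are
    illustrative only.

```c
#include <assert.h>

/* Hypothetical, reduced set of migratetypes for illustration only. */
enum toy_mt { TOY_MOVABLE, TOY_CMA, TOY_ISOLATE };

struct toy_counters {
    long nr_free;     /* models NR_FREE_PAGES */
    long nr_free_cma; /* models NR_FREE_CMA_PAGES */
};

/*
 * Apply the freepage rules when nr_pages are added to (nr_pages > 0)
 * or removed from (nr_pages < 0) the buddy list of type mt.
 */
static void toy_account_free(struct toy_counters *c, enum toy_mt mt,
                             long nr_pages)
{
    if (mt == TOY_ISOLATE)
        return;                     /* rule 2: not a freepage */
    c->nr_free += nr_pages;         /* rule 1: counted as a freepage */
    if (mt == TOY_CMA)
        c->nr_free_cma += nr_pages; /* rule 3: also a CMA freepage */
    /* CMA freepages are a subset of all freepages. */
    assert(c->nr_free >= c->nr_free_cma);
}
```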
    
    Now, I describe the problems and the related patches.
    
    Patch 1: There are race conditions in getting the pageblock migratetype
    that result in misplacement of freepages on the buddy list, an incorrect
    freepage count and unavailability of freepages.
    
    Patch 2: Freepages on the pcp list could have stale cached information
    that is used to determine which migratetype buddy list they go to.  This
    causes misplacement of freepages on the buddy list and an incorrect
    freepage count.
    
    Patch 4: Merging between freepages on pageblocks of different
    migratetypes causes a freepage accounting problem.  This patch fixes it.
    
    Without patchset [3], the above problems don't happen in my CMA
    allocation test, because CMA reserved pages aren't used at all, so
    there is no chance for the above races.
    
    With patchset [3], I ran a simple CMA allocation test and got the
    following result:
    
     - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
     - run a kernel build (make -j16) in the background
     - 30 CMA allocation attempts (8MB * 30 = 240MB) at 5 sec intervals
     - Result: more than 5000 freepages are missing from the freepage count
    
    With patchset [3] and this patchset, I found that no freepages are
    missing from the count, so I conclude that the problems are solved.
    
    These problems also occur in my simple memory offlining test.
    
    This patch (of 4):
    
    There are two paths to reach the core free function of the buddy
    allocator, __free_one_page(): one is free_one_page()->__free_one_page()
    and the other is
    free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
    Each path has a race condition that causes serious problems.  This patch
    focuses on the first type of freepath, and the following patch will
    solve the problem in the second type of freepath.
    
    In the first type of freepath, we get the migratetype of the freeing
    page without holding the zone lock, so it could be racy.  There are two
    cases of this race.
    
     1. Pages are added to the isolate buddy list after the original
        migratetype is restored
    
        CPU1                                   CPU2
    
        get migratetype => return MIGRATE_ISOLATE
        call free_one_page() with MIGRATE_ISOLATE
    
                                    grab the zone lock
                                    unisolate pageblock
                                    release the zone lock
    
        grab the zone lock
        call __free_one_page() with MIGRATE_ISOLATE
    freepage goes into the isolate buddy list,
    although the pageblock is already unisolated
    
    This may cause two problems.  One is that we can't use this page
    anymore until the next isolation attempt on this pageblock, because the
    freepage is on the isolate buddy list.  The other is that freepage
    accounting could be wrong due to merging between different buddy lists.
    Freepages on the isolate buddy list aren't counted as freepages, but
    ones on a normal buddy list are.  If a merge happens, the buddy freepage
    on the normal buddy list is inevitably moved to the isolate buddy list
    without any adjustment of the freepage accounting, so the count can
    become incorrect.
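    The accounting drift from such a merge can be reproduced with a toy
    model that tracks how many pages sit on each list alongside the
    freepage counter. This is an illustrative sketch and not the kernel's
    __free_one_page(). The names toy_zone, toy_counter_ok() and
    toy_free_isolate_merge() are hypothetical. The merge is reduced to
    moving the buddy's pages between lists without touching the counter,
    which is the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_zone {
    long on_normal;  /* pages actually on a normal buddy list */
    long on_isolate; /* pages actually on the isolate buddy list */
    long nr_free;    /* freepage counter; should equal on_normal */
};

/* The counter is correct only if it matches the pages on normal lists. */
static bool toy_counter_ok(const struct toy_zone *z)
{
    return z->nr_free == z->on_normal;
}

/*
 * Free nr_pages onto the isolate list.  These pages are not counted, so
 * nr_free is left alone.  If a free buddy of buddy_pages sits on the
 * normal list, merge it: its pages move to the isolate list, but nothing
 * subtracts them from nr_free.  This models the accounting bug.
 */
static void toy_free_isolate_merge(struct toy_zone *z, long nr_pages,
                                   long buddy_pages)
{
    assert(buddy_pages <= z->on_normal);
    z->on_isolate += nr_pages;
    z->on_normal -= buddy_pages;
    z->on_isolate += buddy_pages;
    /* missing on purpose: z->nr_free -= buddy_pages; */
}
```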
    
     2. Pages are added to a normal buddy list while the pageblock is
        isolated.  This is similar to the case above.
    
    This may also cause two problems.  One is that we can't keep these
    freepages from being allocated.  Although this pageblock is isolated,
    the freepage would be added to a normal buddy list, so it could be
    allocated without any restriction.  The other problem is the same as in
    case 1, that is, incorrect freepage accounting.
    
    This race condition can be prevented by checking the migratetype again
    while holding the zone lock.  Because this is a somewhat heavy operation
    and isn't needed in the common case, we want to avoid rechecking as much
    as possible.  So this patch introduces a new variable,
    nr_isolate_pageblock, in struct zone to check whether there is an
    isolated pageblock.  With this, we can avoid rechecking the migratetype
    in the common case and recheck only if there is an isolated pageblock or
    the migratetype is MIGRATE_ISOLATE.  This solves the problems mentioned
    above.
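    The recheck can be modeled in user space as a minimal sketch. In this
    sketch a pthread mutex stands in for zone->lock, and a two-value enum
    stands in for the real migratetypes. The names toy_zone,
    toy_isolate(), toy_unisolate() and toy_free_one_page() are
    hypothetical and are not kernel APIs. The recheck condition mirrors the
    one described above: nr_isolate_pageblock is nonzero, or the cached
    migratetype is MIGRATE_ISOLATE.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical two-value stand-in for the kernel's migratetypes. */
enum toy_mt { TOY_MOVABLE, TOY_ISOLATE, TOY_NR_MT };

struct toy_zone {
    pthread_mutex_t lock;               /* stands in for zone->lock */
    enum toy_mt block_mt;               /* current pageblock migratetype */
    unsigned long nr_isolate_pageblock; /* isolated pageblocks in zone */
    long on_list[TOY_NR_MT];            /* pages on each buddy list */
};

/* Isolate the (single) pageblock under the lock. */
static void toy_isolate(struct toy_zone *z)
{
    pthread_mutex_lock(&z->lock);
    z->block_mt = TOY_ISOLATE;
    z->nr_isolate_pageblock++;
    pthread_mutex_unlock(&z->lock);
}

/* Restore the original migratetype under the lock. */
static void toy_unisolate(struct toy_zone *z, enum toy_mt restore)
{
    pthread_mutex_lock(&z->lock);
    assert(z->nr_isolate_pageblock > 0);
    z->block_mt = restore;
    z->nr_isolate_pageblock--;
    pthread_mutex_unlock(&z->lock);
}

/*
 * cached_mt was read without the lock, so it may be stale.  Recheck it
 * under the lock only when isolation could be involved.  In the common
 * case (no isolated pageblock, cached type not ISOLATE) skip the recheck.
 */
static void toy_free_one_page(struct toy_zone *z, long nr_pages,
                              enum toy_mt cached_mt)
{
    enum toy_mt mt = cached_mt;

    pthread_mutex_lock(&z->lock);
    if (z->nr_isolate_pageblock || mt == TOY_ISOLATE)
        mt = z->block_mt;
    z->on_list[mt] += nr_pages;
    pthread_mutex_unlock(&z->lock);
}
```

    Case 1 can be replayed sequentially: read the cached type, call
    toy_unisolate(), then call toy_free_one_page(). The page then lands on
    the movable list rather than the isolate list.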
    
    Changes from v3:
    Add one more check in free_one_page() that checks whether the
    migratetype is MIGRATE_ISOLATE.  Without this, the above-mentioned
    case 1 could happen.
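    Two predicates can show the difference between the v3 and v4 recheck
    conditions. They are hypothetical helpers written for illustration. In
    case 1 the pageblock has already been unisolated, so
    nr_isolate_pageblock may have dropped back to zero while the cached
    migratetype still says MIGRATE_ISOLATE.

```c
#include <stdbool.h>

/* v3: recheck only when the zone has an isolated pageblock. */
static bool toy_needs_recheck_v3(unsigned long nr_isolate_pageblock,
                                 bool cached_is_isolate)
{
    (void)cached_is_isolate; /* v3 ignores the cached migratetype */
    return nr_isolate_pageblock != 0;
}

/* v4: also recheck when the cached migratetype is MIGRATE_ISOLATE. */
static bool toy_needs_recheck_v4(unsigned long nr_isolate_pageblock,
                                 bool cached_is_isolate)
{
    return nr_isolate_pageblock != 0 || cached_is_isolate;
}
```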
    
    Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Michal Nazarewicz <mina86@mina86.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Cc: Tang Chen <tangchen@cn.fujitsu.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
    Cc: Wen Congyang <wency@cn.fujitsu.com>
    Cc: Marek Szyprowski <m.szyprowski@samsung.com>
    Cc: Laura Abbott <lauraa@codeaurora.org>
    Cc: Heesub Shin <heesub.shin@samsung.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Ritesh Harjani <ritesh.list@gmail.com>
    Cc: Gioh Kim <gioh.kim@lge.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>