Skip to content
  • Minchan Kim's avatar
    mm: use up free swap space before reaching OOM kill · 0e50ce3b
    Minchan Kim authored
    Recently, Luigi reported there are lots of free swap space when OOM
    happens.  It's easily reproduced on zram-over-swap, where many instance
    of memory hogs are running and laptop_mode is enabled.  He said there
    was no problem when he disabled laptop_mode.  The problem when I
    investigate problem is following as.
    
    Assumption for easy explanation: There are no page cache page in system
    because they all are already reclaimed.
    
    1. try_to_free_pages disable may_writepage when laptop_mode is enabled.
    2. shrink_inactive_list isolates victim pages from inactive anon lru list.
    3. shrink_page_list adds them to swapcache via add_to_swap but it doesn't
       pageout because sc->may_writepage is 0 so the page is rotated back into
       inactive anon lru list. The add_to_swap made the page Dirty by SetPageDirty.
    4. 3 couldn't reclaim any pages so do_try_to_free_pages increase priority and
       retry reclaim with higher priority.
    5. shrink_inactlive_list try to isolate victim pages from inactive anon lru list
       but got failed because it try to isolate pages with ISOLATE_CLEAN mode but
       inactive anon lru list is full of dirty pages by 3 so it just returns
       without  any reclaim progress.
    6. do_try_to_free_pages doesn't set may_writepage due to zero total_scanned.
       Because sc->nr_scanned is increased by shrink_page_list but we don't call
       shrink_page_list in 5 due to short of isolated pages.
    
    Above loop is continued until OOM happens.
    
    The problem didn't happen before [1] was merged because old logic's
    isolatation in shrink_inactive_list was successful and tried to call
    shrink_page_list to pageout them but it still ends up failed to page out
    by may_writepage.  But important point is that sc->nr_scanned was
    increased although we couldn't swap out them so do_try_to_free_pages
    could set may_writepages.
    
    Since commit f80c0673
    
     ("mm: zone_reclaim: make isolate_lru_page()
    filter-aware") was introduced, it's not a good idea any more to depends
    on only the number of scanned pages for setting may_writepage.  So this
    patch adds new trigger point of setting may_writepage as below
    DEF_PRIOIRTY - 2 which is used to show the significant memory pressure
    in VM so it's good fit for our purpose which would be better to lose
    power saving or clickety rather than OOM killing.
    
    Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
    Reported-by: default avatarLuigi Semenzato <semenzato@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    0e50ce3b