1. 17 Jul, 2015 1 commit
  2. 30 Jun, 2015 12 commits
  3. 24 Jun, 2015 5 commits
  4. 12 May, 2015 1 commit
    • Alexander Duyck's avatar
      mm/net: Rename and move page fragment handling from net/ to mm/ · b63ae8ca
      Alexander Duyck authored
      
      
      This change moves the __alloc_page_frag functionality out of the networking
      stack and into the page allocation portion of mm.  The idea it so help make
      this maintainable by placing it with other page allocation functions.
      
      Since we are moving it from skbuff.c to page_alloc.c I have also renamed
      the basic defines and structure from netdev_alloc_cache to page_frag_cache
      to reflect that this is now part of a different kernel subsystem.
      
      I have also added a simple __free_page_frag function which can handle
      freeing the frags based on the skb->head pointer.  The model for this is
      based off of __free_pages since we don't actually need to deal with all of
      the cases that put_page handles.  I incorporated the virt_to_head_page call
      and compound_order into the function as it actually allows for a signficant
      size reduction by reducing code duplication.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b63ae8ca
  5. 15 Apr, 2015 1 commit
  6. 14 Apr, 2015 7 commits
    • Yaowei Bai's avatar
      42ff2703
    • David Rientjes's avatar
      mm: remove GFP_THISNODE · 4167e9b2
      David Rientjes authored
      NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.
      
      GFP_THISNODE is a secret combination of gfp bits that have different
      behavior than expected.  It is a combination of __GFP_THISNODE,
      __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
      allocator slowpath to fail without trying reclaim even though it may be
      used in combination with __GFP_WAIT.
      
      An example of the problem this creates: commit e97ca8e5
      
       ("mm: fix
      GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
      that really just wanted __GFP_THISNODE.  The problem doesn't end there,
      however, because even it was a no-op for alloc_misplaced_dst_page(),
      which also sets __GFP_NORETRY and __GFP_NOWARN, and
      migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
      is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
      no-op in these cases since the page allocator special-cases
      __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.
      
      It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
      to restrict an allocation to a local node, but remove GFP_THISNODE and
      its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
      wants to avoid reclaim.
      
      This allows the aforementioned functions to actually reclaim as they
      should.  It also enables any future callers that want to do
      __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
      rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.
      
      Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
      it is unchanged.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4167e9b2
    • Konstantin Khlebnikov's avatar
      mm: completely remove dumping per-cpu lists from show_mem() · 761b0677
      Konstantin Khlebnikov authored
      
      
      It seems nobody needs this.
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      761b0677
    • Konstantin Khlebnikov's avatar
      mm: hide per-cpu lists in output of show_mem() · d1bfcdb8
      Konstantin Khlebnikov authored
      
      
      This makes show_mem() much less verbose on huge machines.  Instead of huge
      and almost useless dump of counters for each per-zone per-cpu lists this
      patch prints the sum of these counters for each zone (free_pcp) and size
      of per-cpu list for current cpu (local_pcp).
      
      The filter flag SHOW_MEM_PERCPU_LISTS reverts to the old verbose mode.
      
      [akpm@linux-foundation.org: update show_free_areas comment]
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d1bfcdb8
    • Joonsoo Kim's avatar
      mm/compaction: enhance compaction finish condition · 2149cdae
      Joonsoo Kim authored
      
      
      Compaction has anti fragmentation algorithm.  It is that freepage should
      be more than pageblock order to finish the compaction if we don't find any
      freepage in requested migratetype buddy list.  This is for mitigating
      fragmentation, but, there is a lack of migratetype consideration and it is
      too excessive compared to page allocator's anti fragmentation algorithm.
      
      Not considering migratetype would cause premature finish of compaction.
      For example, if allocation request is for unmovable migratetype, freepage
      with CMA migratetype doesn't help that allocation and compaction should
      not be stopped.  But, current logic regards this situation as compaction
      is no longer needed, so finish the compaction.
      
      Secondly, condition is too excessive compared to page allocator's logic.
      We can steal freepage from other migratetype and change pageblock
      migratetype on more relaxed conditions in page allocator.  This is
      designed to prevent fragmentation and we can use it here.  Imposing hard
      constraint only to the compaction doesn't help much in this case since
      page allocator would cause fragmentation again.
      
      To solve these problems, this patch borrows anti fragmentation logic from
      page allocator.  It will reduce premature compaction finish in some cases
      and reduce excessive compaction work.
      
      stress-highalloc test in mmtests with non movable order 7 allocation shows
      considerable increase of compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      31.82 : 42.20
      
      I tested it on non-reboot 5 runs stress-highalloc benchmark and found that
      there is no more degradation on allocation success rate than before.  That
      roughly means that this patch doesn't result in more fragmentations.
      
      Vlastimil suggests additional idea that we only test for fallbacks when
      migration scanner has scanned a whole pageblock.  It looked good for
      fragmentation because chance of stealing increase due to making more free
      pages in certain pageblock.  So, I tested it, but, it results in decreased
      compaction success rate, roughly 38.00.  I guess the reason that if system
      is low memory condition, watermark check could be failed due to not enough
      order 0 free page and so, sometimes, we can't reach a fallback check
      although migrate_pfn is aligned to pageblock_nr_pages.  I can insert code
      to cope with this situation but it makes code more complicated so I don't
      include his idea at this patch.
      
      [akpm@linux-foundation.org: fix CONFIG_CMA=n build]
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2149cdae
    • Joonsoo Kim's avatar
      mm/page_alloc: factor out fallback freepage checking · 4eb7dce6
      Joonsoo Kim authored
      
      
      This is preparation step to use page allocator's anti fragmentation logic
      in compaction.  This patch just separates fallback freepage checking part
      from fallback freepage management part.  Therefore, there is no functional
      change.
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4eb7dce6
    • Joonsoo Kim's avatar
      mm/cma: change fallback behaviour for CMA freepage · dc67647b
      Joonsoo Kim authored
      
      
      Freepage with MIGRATE_CMA can be used only for MIGRATE_MOVABLE and they
      should not be expanded to other migratetype buddy list to protect them
      from unmovable/reclaimable allocation.  Implementing these requirements in
      __rmqueue_fallback(), that is, finding largest possible block of freepage
      has bad effect that high order freepage with MIGRATE_CMA are broken
      continually although there are suitable order CMA freepage.  Reason is
      that they are not be expanded to other migratetype buddy list and next
      __rmqueue_fallback() invocation try to finds another largest block of
      freepage and break it again.  So, MIGRATE_CMA fallback should be handled
      separately.  This patch introduces __rmqueue_cma_fallback(), that just
      wrapper of __rmqueue_smallest() and call it before __rmqueue_fallback() if
      migratetype == MIGRATE_MOVABLE.
      
      This results in unintended behaviour change that MIGRATE_CMA freepage is
      always used first rather than other migratetype as movable allocation's
      fallback.  But, as already mentioned above, MIGRATE_CMA can be used only
      for MIGRATE_MOVABLE, so it is better to use MIGRATE_CMA freepage first as
      much as possible.  Otherwise, we needlessly take up precious freepages
      with other migratetype and increase chance of fragmentation.
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dc67647b
  7. 12 Mar, 2015 1 commit
    • Michal Hocko's avatar
      mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled · e009d5dc
      Michal Hocko authored
      Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
      after OOM killer is disabled if the allocation is performed by a kernel
      thread.  This behavior was introduced from the very beginning by
      7f33d49a
      
       ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
       This means that the basic contract for the allocation request is broken
      and the context requesting such an allocation might blow up unexpectedly.
      
      There are basically two ways forward.
      
      1) move oom_killer_disable after kernel threads are frozen.  This has a
         risk that the OOM victim wouldn't be able to finish because it would
         depend on an already frozen kernel thread.  This would be really tricky
         to debug.
      
      2) do not fail GFP_NOFAIL allocation no matter what and risk a
         potential Freezable kernel threads will loop and fail the suspend.
         Incidental allocations after kernel threads are frozen will at least
         dump a warning - if we are lucky and the serial console is still active
         of course...
      
      This patch implements the later option because it is safer.  We would see
      warning rather than allocation failures for the kernel threads which would
      blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
      from deeper pm code.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDavid Rientjes <rientjes@gooogle.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e009d5dc
  8. 28 Feb, 2015 1 commit
  9. 13 Feb, 2015 1 commit
    • Andrey Ryabinin's avatar
      mm: page_alloc: add kasan hooks on alloc and free paths · b8c73fc2
      Andrey Ryabinin authored
      
      
      Add kernel address sanitizer hooks to mark allocated page's addresses as
      accessible in corresponding shadow region.  Mark freed pages as
      inaccessible.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b8c73fc2
  10. 12 Feb, 2015 2 commits
  11. 11 Feb, 2015 8 commits
    • Vlastimil Babka's avatar
      mm: more aggressive page stealing for UNMOVABLE allocations · 9c0415eb
      Vlastimil Babka authored
      
      
      When allocation falls back to stealing free pages of another migratetype,
      it can decide to steal extra pages, or even the whole pageblock in order
      to reduce fragmentation, which could happen if further allocation
      fallbacks pick a different pageblock.  In try_to_steal_freepages(), one of
      the situations where extra pages are stolen happens when we are trying to
      allocate a MIGRATE_RECLAIMABLE page.
      
      However, MIGRATE_UNMOVABLE allocations are not treated the same way,
      although spreading such allocation over multiple fallback pageblocks is
      arguably even worse than it is for RECLAIMABLE allocations.  To minimize
      fragmentation, we should minimize the number of such fallbacks, and thus
      steal as much as is possible from each fallback pageblock.
      
      Note that in theory this might put more pressure on movable pageblocks and
      cause movable allocations to steal back from unmovable pageblocks.
      However, movable allocations are not as aggressive with stealing, and do
      not cause permanent fragmentation, so the tradeoff is reasonable, and
      evaluation seems to support the change.
      
      This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to
      steal extra free pages.  When evaluating with stress-highalloc from
      mmtests, this has reduced the number of MIGRATE_UNMOVABLE fallbacks to
      roughly 1/6.  The number of these fallbacks stealing from MIGRATE_MOVABLE
      block is reduced to 1/3.  There was no observation of growing number of
      unmovable pageblocks over time, and also not of increased movable
      allocation fallbacks.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9c0415eb
    • Vlastimil Babka's avatar
      mm: always steal split buddies in fallback allocations · 3a1086fb
      Vlastimil Babka authored
      When allocation falls back to another migratetype, it will steal a page
      with highest available order, and (depending on this order and desired
      migratetype), it might also steal the rest of free pages from the same
      pageblock.
      
      Given the preference of highest available order, it is likely that it will
      be higher than the desired order, and result in the stolen buddy page
      being split.  The remaining pages after split are currently stolen only
      when the rest of the free pages are stolen.  This can however lead to
      situations where for MOVABLE allocations we split e.g.  order-4 fallback
      UNMOVABLE page, but steal only order-0 page.  Then on the next MOVABLE
      allocation (which may be batched to fill the pcplists) we split another
      order-3 or higher page, etc.  By stealing all pages that we have split, we
      can avoid further stealing.
      
      This patch therefore adjusts the page stealing so that buddy pages created
      by split are always stolen.  This has effect only on MOVABLE allocations,
      as RECLAIMABLE and UNMOVABLE allocations already always do that in
      addition to stealing the rest of free pages from the pageblock.  The
      change also allows to simplify try_to_steal_freepages() and factor out CMA
      handling.
      
      According to Mel, it has been intended since the beginning that buddy
      pages after split would be stolen always, but it doesn't seem like it was
      ever the case until commit 47118af0 ("mm: mmzone: MIGRATE_CMA
      migration type added").  The commit has unintentionally introduced this
      behavior, but was reverted by commit 0cbef29a
      
       ("mm:
      __rmqueue_fallback() should respect pageblock type").  Neither included
      evaluation.
      
      My evaluation with stress-highalloc from mmtests shows about 2.5x
      reduction of page stealing events for MOVABLE allocations, without
      affecting the page stealing events for other allocation migratetypes.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3a1086fb
    • Vlastimil Babka's avatar
      mm: when stealing freepages, also take pages created by splitting buddy page · 99592d59
      Vlastimil Babka authored
      When studying page stealing, I noticed some weird looking decisions in
      try_to_steal_freepages().  The first I assume is a bug (Patch 1), the
      following two patches were driven by evaluation.
      
      Testing was done with stress-highalloc of mmtests, using the
      mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how
      often page stealing occurs for individual migratetypes, and what
      migratetypes are used for fallbacks.  Arguably, the worst case of page
      stealing is when UNMOVABLE allocation steals from MOVABLE pageblock.
      RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal,
      so the goal is to minimize these two cases.
      
      The evaluation of v2 wasn't always clear win and Joonsoo questioned the
      results.  Here I used different baseline which includes RFC compaction
      improvements from [1].  I found that the compaction improvements reduce
      variability of stress-highalloc, so there's less noise in the data.
      
      First, let's look at stress-highalloc configured to do sync compaction,
      and how these patches reduce page stealing events during the test.  First
      column is after fresh reboot, other two are reiterations of test without
      reboot.  That was all accumulater over 5 re-iterations (so the benchmark
      was run 5x3 times with 5 fresh restarts).
      
      Baseline:
      
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        5-nothp-1       5-nothp-2       5-nothp-3
      Page alloc extfrag event                               10264225     8702233    10244125
      Extfrag fragmenting                                    10263271     8701552    10243473
      Extfrag fragmenting for unmovable                         13595       17616       15960
      Extfrag fragmenting unmovable placed with movable          7989       12193        8447
      Extfrag fragmenting for reclaimable                         658        1840        1817
      Extfrag fragmenting reclaimable placed with movable         558        1677        1679
      Extfrag fragmenting for movable                        10249018     8682096    10225696
      
      With Patch 1:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        6-nothp-1       6-nothp-2       6-nothp-3
      Page alloc extfrag event                               11834954     9877523     9774860
      Extfrag fragmenting                                    11833993     9876880     9774245
      Extfrag fragmenting for unmovable                          7342       16129       11712
      Extfrag fragmenting unmovable placed with movable          4191       10547        6270
      Extfrag fragmenting for reclaimable                         373        1130         923
      Extfrag fragmenting reclaimable placed with movable         302         906         738
      Extfrag fragmenting for movable                        11826278     9859621     9761610
      
      With Patch 2:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        7-nothp-1       7-nothp-2       7-nothp-3
      Page alloc extfrag event                                4725990     3668793     3807436
      Extfrag fragmenting                                     4725104     3668252     3806898
      Extfrag fragmenting for unmovable                          6678        7974        7281
      Extfrag fragmenting unmovable placed with movable          2051        3829        4017
      Extfrag fragmenting for reclaimable                         429        1208        1278
      Extfrag fragmenting reclaimable placed with movable         369         976        1034
      Extfrag fragmenting for movable                         4717997     3659070     3798339
      
      With Patch 3:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        8-nothp-1       8-nothp-2       8-nothp-3
      Page alloc extfrag event                                5016183     4700142     3850633
      Extfrag fragmenting                                     5015325     4699613     3850072
      Extfrag fragmenting for unmovable                          1312        3154        3088
      Extfrag fragmenting unmovable placed with movable          1115        2777        2714
      Extfrag fragmenting for reclaimable                         437        1193        1097
      Extfrag fragmenting reclaimable placed with movable         330         969         879
      Extfrag fragmenting for movable                         5013576     4695266     3845887
      
      In v2 we've seen apparent regression with Patch 1 for unmovable events,
      this is now gone, suggesting it was indeed noise.  Here, each patch
      improves the situation for unmovable events.  Reclaimable is improved by
      patch 1 and then either the same modulo noise, or perhaps sligtly worse -
      a small price for unmovable improvements, IMHO.  The number of movable
      allocations falling back to other migratetypes is most noisy, but it's
      reduced to half at Patch 2 nevertheless.  These are least critical as
      compaction can move them around.
      
      If we look at success rates, the patches don't affect them, that didn't change.
      
      Baseline:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  5-nothp-1             5-nothp-2             5-nothp-3
      Success 1 Min         49.00 (  0.00%)       42.00 ( 14.29%)       41.00 ( 16.33%)
      Success 1 Mean        51.00 (  0.00%)       45.00 ( 11.76%)       42.60 ( 16.47%)
      Success 1 Max         55.00 (  0.00%)       51.00 (  7.27%)       46.00 ( 16.36%)
      Success 2 Min         53.00 (  0.00%)       47.00 ( 11.32%)       44.00 ( 16.98%)
      Success 2 Mean        59.60 (  0.00%)       50.80 ( 14.77%)       48.20 ( 19.13%)
      Success 2 Max         64.00 (  0.00%)       56.00 ( 12.50%)       52.00 ( 18.75%)
      Success 3 Min         84.00 (  0.00%)       82.00 (  2.38%)       78.00 (  7.14%)
      Success 3 Mean        85.60 (  0.00%)       82.80 (  3.27%)       79.40 (  7.24%)
      Success 3 Max         86.00 (  0.00%)       83.00 (  3.49%)       80.00 (  6.98%)
      
      Patch 1:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  6-nothp-1             6-nothp-2             6-nothp-3
      Success 1 Min         49.00 (  0.00%)       44.00 ( 10.20%)       44.00 ( 10.20%)
      Success 1 Mean        51.80 (  0.00%)       46.00 ( 11.20%)       45.80 ( 11.58%)
      Success 1 Max         54.00 (  0.00%)       49.00 (  9.26%)       49.00 (  9.26%)
      Success 2 Min         58.00 (  0.00%)       49.00 ( 15.52%)       48.00 ( 17.24%)
      Success 2 Mean        60.40 (  0.00%)       51.80 ( 14.24%)       50.80 ( 15.89%)
      Success 2 Max         63.00 (  0.00%)       54.00 ( 14.29%)       55.00 ( 12.70%)
      Success 3 Min         84.00 (  0.00%)       81.00 (  3.57%)       79.00 (  5.95%)
      Success 3 Mean        85.00 (  0.00%)       81.60 (  4.00%)       79.80 (  6.12%)
      Success 3 Max         86.00 (  0.00%)       82.00 (  4.65%)       82.00 (  4.65%)
      
      Patch 2:
      
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  7-nothp-1             7-nothp-2             7-nothp-3
      Success 1 Min         50.00 (  0.00%)       44.00 ( 12.00%)       39.00 ( 22.00%)
      Success 1 Mean        52.80 (  0.00%)       45.60 ( 13.64%)       42.40 ( 19.70%)
      Success 1 Max         55.00 (  0.00%)       46.00 ( 16.36%)       47.00 ( 14.55%)
      Success 2 Min         52.00 (  0.00%)       48.00 (  7.69%)       45.00 ( 13.46%)
      Success 2 Mean        53.40 (  0.00%)       49.80 (  6.74%)       48.80 (  8.61%)
      Success 2 Max         57.00 (  0.00%)       52.00 (  8.77%)       52.00 (  8.77%)
      Success 3 Min         84.00 (  0.00%)       81.00 (  3.57%)       79.00 (  5.95%)
      Success 3 Mean        85.00 (  0.00%)       82.40 (  3.06%)       79.60 (  6.35%)
      Success 3 Max         86.00 (  0.00%)       83.00 (  3.49%)       80.00 (  6.98%)
      
      Patch 3:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  8-nothp-1             8-nothp-2             8-nothp-3
      Success 1 Min         46.00 (  0.00%)       44.00 (  4.35%)       42.00 (  8.70%)
      Success 1 Mean        50.20 (  0.00%)       45.60 (  9.16%)       44.00 ( 12.35%)
      Success 1 Max         52.00 (  0.00%)       47.00 (  9.62%)       47.00 (  9.62%)
      Success 2 Min         53.00 (  0.00%)       49.00 (  7.55%)       48.00 (  9.43%)
      Success 2 Mean        55.80 (  0.00%)       50.60 (  9.32%)       49.00 ( 12.19%)
      Success 2 Max         59.00 (  0.00%)       52.00 ( 11.86%)       51.00 ( 13.56%)
      Success 3 Min         84.00 (  0.00%)       80.00 (  4.76%)       79.00 (  5.95%)
      Success 3 Mean        85.40 (  0.00%)       81.60 (  4.45%)       80.40 (  5.85%)
      Success 3 Max         87.00 (  0.00%)       83.00 (  4.60%)       82.00 (  5.75%)
      
      While there's no improvement here, I consider reduced fragmentation events
      to be worth on its own.  Patch 2 also seems to reduce scanning for free
      pages, and migrations in compaction, suggesting it has somewhat less work
      to do:
      
      Patch 1:
      
      Compaction stalls                 4153        3959        3978
      Compaction success                1523        1441        1446
      Compaction failures               2630        2517        2531
      Page migrate success           4600827     4943120     5104348
      Page migrate failure             19763       16656       17806
      Compaction pages isolated      9597640    10305617    10653541
      Compaction migrate scanned    77828948    86533283    87137064
      Compaction free scanned      517758295   521312840   521462251
      Compaction cost                   5503        5932        6110
      
      Patch 2:
      
      Compaction stalls                 3800        3450        3518
      Compaction success                1421        1316        1317
      Compaction failures               2379        2134        2201
      Page migrate success           4160421     4502708     4752148
      Page migrate failure             19705       14340       14911
      Compaction pages isolated      8731983     9382374     9910043
      Compaction migrate scanned    98362797    96349194    98609686
      Compaction free scanned      496512560   469502017   480442545
      Compaction cost                   5173        5526        5811
      
      As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers
      of unmovable and reclaimable pageblocks.
      
      Configuring the benchmark to allocate like THP page fault (i.e.  no sync
      compaction) gives much noisier results for iterations 2 and 3 after
      reboot.  This is not so surprising given how [1] offers lower improvements
      in this scenario due to less restarts after deferred compaction which
      would change compaction pivot.
      
      Baseline:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          5-thp-1         5-thp-2         5-thp-3
      Page alloc extfrag event                                8148965     6227815     6646741
      Extfrag fragmenting                                     8147872     6227130     6646117
      Extfrag fragmenting for unmovable                         10324       12942       15975
      Extfrag fragmenting unmovable placed with movable          5972        8495       10907
      Extfrag fragmenting for reclaimable                         601        1707        2210
      Extfrag fragmenting reclaimable placed with movable         520        1570        2000
      Extfrag fragmenting for movable                         8136947     6212481     6627932
      
      Patch 1:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          6-thp-1         6-thp-2         6-thp-3
      Page alloc extfrag event                                8345457     7574471     7020419
      Extfrag fragmenting                                     8343546     7573777     7019718
      Extfrag fragmenting for unmovable                         10256       18535       30716
      Extfrag fragmenting unmovable placed with movable          6893       11726       22181
      Extfrag fragmenting for reclaimable                         465        1208        1023
      Extfrag fragmenting reclaimable placed with movable         353         996         843
      Extfrag fragmenting for movable                         8332825     7554034     6987979
      
      Patch 2:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          7-thp-1         7-thp-2         7-thp-3
      Page alloc extfrag event                                3512847     3020756     2891625
      Extfrag fragmenting                                     3511940     3020185     2891059
      Extfrag fragmenting for unmovable                          9017        6892        6191
      Extfrag fragmenting unmovable placed with movable          1524        3053        2435
      Extfrag fragmenting for reclaimable                         445        1081        1160
      Extfrag fragmenting reclaimable placed with movable         375         918         986
      Extfrag fragmenting for movable                         3502478     3012212     2883708
      
      Patch 3:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          8-thp-1         8-thp-2         8-thp-3
      Page alloc extfrag event                                3181699     3082881     2674164
      Extfrag fragmenting                                     3180812     3082303     2673611
      Extfrag fragmenting for unmovable                          1201        4031        4040
      Extfrag fragmenting unmovable placed with movable           974        3611        3645
      Extfrag fragmenting for reclaimable                         478        1165        1294
      Extfrag fragmenting reclaimable placed with movable         387         985        1030
      Extfrag fragmenting for movable                         3179133     3077107     2668277
      
      The improvements for first iteration are clear, the rest is much noisier
      and can appear like regression for Patch 1.  Anyway, patch 2 rectifies it.
      
      Allocation success rates are again unaffected so there's no point in
      making this e-mail any longer.
      
      [1] http://marc.info/?l=linux-mm&m=142166196321125&w=2
      
      This patch (of 3):
      
      When __rmqueue_fallback() is called to allocate a page of order X, it will
      find a page of order Y >= X of a fallback migratetype, which is different
      from the desired migratetype.  With the help of try_to_steal_freepages(),
      it may change the migratetype (to the desired one) also of:
      
      1) all currently free pages in the pageblock containing the fallback page
      2) the fallback pageblock itself
      3) buddy pages created by splitting the fallback page (when Y > X)
      
      These decisions take the order Y into account, as well as the desired
      migratetype, with the goal of preventing multiple fallback allocations
      that could e.g.  distribute UNMOVABLE allocations among multiple
      pageblocks.
      
      Originally, decision for 1) has implied the decision for 3).  Commit
      47118af0 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
      (probably unintentionally) so that the buddy pages in case 3) are always
      changed to the desired migratetype, except for CMA pageblocks.
      
      Commit fef903ef ("mm/page_allo.c: restructure free-page stealing code
      and fix a bug") did some refactoring and added a comment that the case of
      3) is intended.  Commit 0cbef29a ("mm: __rmqueue_fallback() should
      respect pageblock type") removed the comment and tried to restore the
      original behavior where 1) implies 3), but due to the previous
      refactoring, the result is instead that only 2) implies 3) - and the
      conditions for 2) are less frequently met than conditions for 1).  This
      may increase fragmentation in situations where the code decides to steal
      all free pages from the pageblock (case 1)), but then gives back the buddy
      pages produced by splitting.
      
      This patch restores the original intended logic where 1) implies 3).
      During testing with stress-highalloc from mmtests, this has shown to
      decrease the number of events where UNMOVABLE and RECLAIMABLE allocations
      steal from MOVABLE pageblocks, which can lead to permanent fragmentation.
      In some cases it has increased the number of events when MOVABLE
      allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are
      fixable by sync compaction and thus less harmful.
      
      Note that evaluation has shown that the behavior introduced by
      47118af0 for buddy pages in case 3) is actually even better than the
      original logic, so the following patch will introduce it properly once
      again.  For stable backports of this patch it makes thus sense to only fix
      versions containing 0cbef29a
      
      .
      
      [iamjoonsoo.kim@lge.com: tracepoint fix]
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[3.13+ containing 0cbef29a
      
      ]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      99592d59
    • Michal Hocko's avatar
      oom, PM: make OOM detection in the freezer path raceless · c32b3cbe
      Michal Hocko authored
      Commit 5695be14
      
       ("OOM, PM: OOM killed task shouldn't escape PM
      suspend") has left a race window when OOM killer manages to
      note_oom_kill after freeze_processes checks the counter.  The race
      window is quite small and really unlikely and partial solution deemed
      sufficient at the time of submission.
      
      Tejun wasn't happy about this partial solution though and insisted on a
      full solution.  That requires the full OOM and freezer's task freezing
      exclusion, though.  This is done by this patch which introduces oom_sem
      RW lock and turns oom_killer_disable() into a full OOM barrier.
      
      oom_killer_disabled check is moved from the allocation path to the OOM
      level and we take oom_sem for reading for both the check and the whole
      OOM invocation.
      
      oom_killer_disable() takes oom_sem for writing so it waits for all
      currently running OOM killer invocations.  Then it disable all the further
      OOMs by setting oom_killer_disabled and checks for any oom victims.
      Victims are counted via mark_tsk_oom_victim resp.  unmark_oom_victim.  The
      last victim wakes up all waiters enqueued by oom_killer_disable().
      Therefore this function acts as the full OOM barrier.
      
      The page fault path is covered now as well although it was assumed to be
      safe before.  As per Tejun, "We used to have freezing points deep in file
      system code which may be reacheable from page fault." so it would be
      better and more robust to not rely on freezing points here.  Same applies
      to the memcg OOM killer.
      
      out_of_memory tells the caller whether the OOM was allowed to trigger and
      the callers are supposed to handle the situation.  The page allocation
      path simply fails the allocation same as before.  The page fault path will
      retry the fault (more on that later) and Sysrq OOM trigger will simply
      complain to the log.
      
      Normally there wouldn't be any unfrozen user tasks after
      try_to_freeze_tasks so the function will not block. But if there was an
      OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
      finish yet then we have to wait for it. This should complete in a finite
      time, though, because
      
      	- the victim cannot loop in the page fault handler (it would die
      	  on the way out from the exception)
      	- it cannot loop in the page allocator because all the further
      	  allocation would fail and __GFP_NOFAIL allocations are not
      	  acceptable at this stage
      	- it shouldn't be blocked on any locks held by frozen tasks
      	  (try_to_freeze expects lockless context) and kernel threads and
      	  work queues are not frozen yet
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Suggested-by: default avatarTejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c32b3cbe
    • Juergen Gross's avatar
      mm: use correct format specifiers when printing address ranges · 8d29e18a
      Juergen Gross authored
      
      
      Especially on 32 bit kernels memory node ranges are printed with 32 bit
      wide addresses only.  Use u64 types and %llx specifiers to print full
      width of addresses.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8d29e18a
    • Kirill A. Shutemov's avatar
      mm: more checks on free_pages_prepare() for tail pages · 81422f29
      Kirill A. Shutemov authored
      
      
      Although it was not called, destroy_compound_page() did some potentially
      useful checks.  Let's re-introduce them in free_pages_prepare(), where
      they can be actually triggered when CONFIG_DEBUG_VM=y.
      
      compound_order() assert is already in free_pages_prepare().  We have few
      checks for tail pages left.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      81422f29
    • Kirill A. Shutemov's avatar
      mm/page_alloc.c: drop dead destroy_compound_page() · 6e9f0d58
      Kirill A. Shutemov authored
      
      
      The only caller is __free_one_page(). By the time we should have
      page->flags to be cleared already:
      
       - for 0-order pages though PCP list:
      	free_hot_cold_page()
      		free_pages_prepare()
      			free_pages_check()
      				page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
      		<put the page to PCP list>
      
      	free_pcppages_bulk()
      		page = <withdraw pages from PCP list>
      		__free_one_page(page)
      
       - for non-0-order pages:
      	__free_pages_ok()
      		free_pages_prepare()
      			free_pages_check()
      				page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
      		free_one_page()
      			__free_one_page()
      
      So there's no way PageCompound() will return true in __free_one_page().
      Let's remove dead destroy_compound_page() and put assert for page->flags
      there instead.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e9f0d58
    • Vlastimil Babka's avatar
      mm: reduce try_to_compact_pages parameters · 1a6d53a1
      Vlastimil Babka authored
      
      
      Expand the usage of the struct alloc_context introduced in the previous
      patch also for calling try_to_compact_pages(), to reduce the number of its
      parameters.  Since the function is in different compilation unit, we need
      to move alloc_context definition in the shared mm/internal.h header.
      
      With this change we get simpler code and small savings of code size and stack
      usage:
      
      add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
      function                                     old     new   delta
      __alloc_pages_direct_compact                 283     256     -27
      add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13)
      function                                     old     new   delta
      try_to_compact_pages                         582     569     -13
      
      Stack usage of __alloc_pages_direct_compact goes from 24 to none (per
      scripts/checkstack.pl).
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1a6d53a1