1. 20 May, 2016 1 commit
    • Alexander Potapenko's avatar
      mm: kasan: initial memory quarantine implementation · 55834c59
      Alexander Potapenko authored
      Quarantine isolates freed objects in a separate queue.  The objects are
      returned to the allocator later, which helps to detect use-after-free
      errors.
      
      When the object is freed, its state changes from KASAN_STATE_ALLOC to
      KASAN_STATE_QUARANTINE.  The object is poisoned and put into quarantine
      instead of being returned to the allocator, therefore every subsequent
      access to that object triggers a KASAN error, and the error handler is
      able to say where the object has been allocated and deallocated.
      
      When it's time for the object to leave quarantine, its state becomes
      KASAN_STATE_FREE and it's returned to the allocator.  From now on the
      allocator may reuse it for another allocation.  Before that happens,
      it's still possible to detect a use-after free on that object (it
      retains the allocation/deallocation stacks).
      
      When the allocator reuses this object, the shadow is unpoisoned and old
      allocation/deallocation stacks are wiped.  Therefore a use of this
      object, even an incorrect one, won't trigger ASan warning.
      
      Without the quarantine, it's not guaranteed that the objects aren't
      reused immediately, that's why the probability of catching a
      use-after-free is lower than with quarantine in place.
      
      Quarantine isolates freed objects in a separate queue.  The objects are
      returned to the allocator later, which helps to detect use-after-free
      errors.
      
      Freed objects are first added to per-cpu quarantine queues.  When a
      cache is destroyed or memory shrinking is requested, the objects are
      moved into the global quarantine queue.  Whenever a kmalloc call allows
      memory reclaiming, the oldest objects are popped out of the global queue
      until the total size of objects in quarantine is less than 3/4 of the
      maximum quarantine size (which is a fraction of installed physical
      memory).
      
      As long as an object remains in the quarantine, KASAN is able to report
      accesses to it, so the chance of reporting a use-after-free is
      increased.  Once the object leaves quarantine, the allocator may reuse
      it, in which case the object is unpoisoned and KASAN can't detect
      incorrect accesses to it.
      
      Right now quarantine support is only enabled in SLAB allocator.
      Unification of KASAN features in SLAB and SLUB will be done later.
      
      This patch is based on the "mm: kasan: quarantine" patch originally
      prepared by Dmitry Chernenkov.  A number of improvements have been
      suggested by Andrey Ryabinin.
      
      [glider@google.com: v9]
        Link: http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-glider@google.comSigned-off-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      55834c59
  2. 25 Mar, 2016 1 commit
    • Alexander Potapenko's avatar
      mm, kasan: add GFP flags to KASAN API · 505f5dcb
      Alexander Potapenko authored
      Add GFP flags to KASAN hooks for future patches to use.
      
      This patch is based on the "mm: kasan: unified support for SLUB and SLAB
      allocators" patch originally prepared by Dmitry Chernenkov.
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      505f5dcb
  3. 17 Mar, 2016 1 commit
  4. 15 Mar, 2016 4 commits
  5. 18 Feb, 2016 1 commit
    • Dmitry Safonov's avatar
      mm: slab: free kmem_cache_node after destroy sysfs file · 52b4b950
      Dmitry Safonov authored
      When slub_debug alloc_calls_show is enabled we will try to track
      location and user of slab object on each online node, kmem_cache_node
      structure and cpu_cache/cpu_slub shouldn't be freed till there is the
      last reference to sysfs file.
      
      This fixes the following panic:
      
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
         IP:  list_locations+0x169/0x4e0
         PGD 257304067 PUD 438456067 PMD 0
         Oops: 0000 [#1] SMP
         CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30
         Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011
         task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000
         RIP: list_locations+0x169/0x4e0
         Call Trace:
           alloc_calls_show+0x1d/0x30
           slab_attr_show+0x1b/0x30
           sysfs_read_file+0x9a/0x1a0
           vfs_read+0x9c/0x170
           SyS_read+0x58/0xb0
           system_call_fastpath+0x16/0x1b
         Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10
         CR2: 0000000000000020
      
      Separated __kmem_cache_release from __kmem_cache_shutdown which now
      called on slab_kmem_cache_release (after the last reference to sysfs
      file object has dropped).
      
      Reintroduced locking in free_partial as sysfs file might access cache's
      partial list after shutdowning - partial revert of the commit
      69cb8e6b ("slub: free slabs without holding locks").  Zap
      __remove_partial and use remove_partial (w/o underscores) as
      free_partial now takes list_lock which s partial revert for commit
      1e4dd946 ("slub: do not assert not having lock in removing freed
      partial")
      Signed-off-by: default avatarDmitry Safonov <dsafonov@virtuozzo.com>
      Suggested-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      52b4b950
  6. 20 Jan, 2016 1 commit
  7. 14 Jan, 2016 1 commit
    • Vladimir Davydov's avatar
      slab: add SLAB_ACCOUNT flag · 230e9fc2
      Vladimir Davydov authored
      Currently, if we want to account all objects of a particular kmem cache,
      we have to pass __GFP_ACCOUNT to each kmem_cache_alloc call, which is
      inconvenient.  This patch introduces SLAB_ACCOUNT flag which if passed
      to kmem_cache_create will force accounting for every allocation from
      this cache even if __GFP_ACCOUNT is not passed.
      
      This patch does not make any of the existing caches use this flag - it
      will be done later in the series.
      
      Note, a cache with SLAB_ACCOUNT cannot be merged with a cache w/o
      SLAB_ACCOUNT, because merged caches share the same kmem_cache struct and
      hence cannot have different sets of SLAB_* flags.  Thus using this flag
      will probably reduce the number of merged slabs even if kmem accounting
      is not used (only compiled in).
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Suggested-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      230e9fc2
  8. 22 Nov, 2015 1 commit
    • Jesper Dangaard Brouer's avatar
      slab/slub: adjust kmem_cache_alloc_bulk API · 865762a8
      Jesper Dangaard Brouer authored
      Adjust kmem_cache_alloc_bulk API before we have any real users.
      
      Adjust API to return type 'int' instead of previously type 'bool'.  This
      is done to allow future extension of the bulk alloc API.
      
      A future extension could be to allow SLUB to stop at a page boundary, when
      specified by a flag, and then return the number of objects.
      
      The advantage of this approach, would make it easier to make bulk alloc
      run without local IRQs disabled.  With an approach of cmpxchg "stealing"
      the entire c->freelist or page->freelist.  To avoid overshooting we would
      stop processing at a slab-page boundary.  Else we always end up returning
      some objects at the cost of another cmpxchg.
      
      To keep compatible with future users of this API linking against an older
      kernel when using the new flag, we need to return the number of allocated
      objects with this API change.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      865762a8
  9. 05 Nov, 2015 2 commits
    • Vladimir Davydov's avatar
      memcg: unify slab and other kmem pages charging · f3ccb2c4
      Vladimir Davydov authored
      We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and
      uncharging kmem pages to memcg, but currently they are not used for
      charging slab pages (i.e.  they are only used for charging pages allocated
      with alloc_kmem_pages).  The only reason why the slab subsystem uses
      special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it
      needs to charge to the memcg of kmem cache while memcg_charge_kmem charges
      to the memcg that the current task belongs to.
      
      To remove this diversity, this patch adds an extra argument to
      __memcg_kmem_charge that can be a pointer to a memcg or NULL.  If it is
      not NULL, the function tries to charge to the memcg it points to,
      otherwise it charge to the current context.  Next, it makes the slab
      subsystem use this function to charge slab pages.
      
      Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only
      in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined.  Since
      __memcg_kmem_charge stores a pointer to the memcg in the page struct, we
      don't need memcg_uncharge_slab anymore and can use free_kmem_pages.
      Besides, one can now detect which memcg a slab page belongs to by reading
      /proc/kpagecgroup.
      
      Note, this patch switches slab to charge-after-alloc design.  Since this
      design is already used for all other memcg charges, it should not make any
      difference.
      
      [hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f3ccb2c4
    • Vladimir Davydov's avatar
      mm/slab_common.c: clear pointers to per memcg caches on destroy · d60fdcc9
      Vladimir Davydov authored
      Currently, we do not clear pointers to per memcg caches in the
      memcg_params.memcg_caches array when a global cache is destroyed with
      kmem_cache_destroy.
      
      This is fine if the global cache does get destroyed.  However, a cache can
      be left on the list if it still has active objects when kmem_cache_destroy
      is called (due to a memory leak).  If this happens, the entries in the
      array will point to already freed areas, which is likely to result in data
      corruption when the cache is reused (via slab merging).
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d60fdcc9
  10. 04 Sep, 2015 2 commits
  11. 24 Jun, 2015 1 commit
    • Daniel Sanders's avatar
      slab: correct size_index table before replacing the bootstrap kmem_cache_node · 34cc6990
      Daniel Sanders authored
      This patch moves the initialization of the size_index table slightly
      earlier so that the first few kmem_cache_node's can be safely allocated
      when KMALLOC_MIN_SIZE is large.
      
      There are currently two ways to generate indices into kmalloc_caches (via
      kmalloc_index() and via the size_index table in slab_common.c) and on some
      arches (possibly only MIPS) they potentially disagree with each other
      until create_kmalloc_caches() has been called.  It seems that the
      intention is that the size_index table is a fast equivalent to
      kmalloc_index() and that create_kmalloc_caches() patches the table to
      return the correct value for the cases where kmalloc_index()'s
      if-statements apply.
      
      The failing sequence was:
      * kmalloc_caches contains NULL elements
      * kmem_cache_init initialises the element that 'struct
        kmem_cache_node' will be allocated to. For 32-bit Mips, this is a
        56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7).
      * init_list is called which calls kmalloc_node to allocate a 'struct
        kmem_cache_node'.
      * kmalloc_slab selects the kmem_caches element using
        size_index[size_index_elem(size)]. For MIPS, size is 56, and the
        expression returns 6.
      * This element of kmalloc_caches is NULL and allocation fails.
      * If it had not already failed, it would have called
        create_kmalloc_caches() at this point which would have changed
        size_index[size_index_elem(size)] to 7.
      
      I don't believe the bug to be LLVM specific but GCC doesn't normally
      encounter the problem.  I haven't been able to identify exactly what GCC
      is doing better (probably inlining) but it seems that GCC is managing to
      optimize to the point that it eliminates the problematic allocations.
      This theory is supported by the fact that GCC can be made to fail in the
      same way by changing inline, __inline, __inline__, and __always_inline in
      include/linux/compiler-gcc.h such that they don't actually inline things.
      Signed-off-by: default avatarDaniel Sanders <daniel.sanders@imgtec.com>
      Acked-by: default avatarPekka Enberg <penberg@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      34cc6990
  12. 12 Feb, 2015 3 commits
    • Vladimir Davydov's avatar
      slub: make dead caches discard free slabs immediately · d6e0b7fa
      Vladimir Davydov authored
      To speed up further allocations SLUB may store empty slabs in per cpu/node
      partial lists instead of freeing them immediately.  This prevents per
      memcg caches destruction, because kmem caches created for a memory cgroup
      are only destroyed after the last page charged to the cgroup is freed.
      
      To fix this issue, this patch resurrects approach first proposed in [1].
      It forbids SLUB to cache empty slabs after the memory cgroup that the
      cache belongs to was destroyed.  It is achieved by setting kmem_cache's
      cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so
      that it would drop frozen empty slabs immediately if cpu_partial = 0.
      
      The runtime overhead is minimal.  From all the hot functions, we only
      touch relatively cold put_cpu_partial(): we make it call
      unfreeze_partials() after freezing a slab that belongs to an offline
      memory cgroup.  Since slab freezing exists to avoid moving slabs from/to a
      partial list on free/alloc, and there can't be allocations from dead
      caches, it shouldn't cause any overhead.  We do have to disable preemption
      for put_cpu_partial() to achieve that though.
      
      The original patch was accepted well and even merged to the mm tree.
      However, I decided to withdraw it due to changes happening to the memcg
      core at that time.  I had an idea of introducing per-memcg shrinkers for
      kmem caches, but now, as memcg has finally settled down, I do not see it
      as an option, because SLUB shrinker would be too costly to call since SLUB
      does not keep free slabs on a separate list.  Besides, we currently do not
      even call per-memcg shrinkers for offline memcgs.  Overall, it would
      introduce much more complexity to both SLUB and memcg than this small
      patch.
      
      Regarding to SLAB, there's no problem with it, because it shrinks
      per-cpu/node caches periodically.  Thanks to list_lru reparenting, we no
      longer keep entries for offline cgroups in per-memcg arrays (such as
      memcg_cache_params->memcg_caches), so we do not have to bother if a
      per-memcg cache will be shrunk a bit later than it could be.
      
      [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d6e0b7fa
    • Vladimir Davydov's avatar
      slab: link memcg caches of the same kind into a list · 426589f5
      Vladimir Davydov authored
      Sometimes, we need to iterate over all memcg copies of a particular root
      kmem cache.  Currently, we use memcg_cache_params->memcg_caches array for
      that, because it contains all existing memcg caches.
      
      However, it's a bad practice to keep all caches, including those that
      belong to offline cgroups, in this array, because it will be growing
      beyond any bounds then.  I'm going to wipe away dead caches from it to
      save space.  To still be able to perform iterations over all memcg caches
      of the same kind, let us link them into a list.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      426589f5
    • Vladimir Davydov's avatar
      slab: embed memcg_cache_params to kmem_cache · f7ce3190
      Vladimir Davydov authored
      Currently, kmem_cache stores a pointer to struct memcg_cache_params
      instead of embedding it.  The rationale is to save memory when kmem
      accounting is disabled.  However, the memcg_cache_params has shrivelled
      drastically since it was first introduced:
      
      * Initially:
      
      struct memcg_cache_params {
      	bool is_root_cache;
      	union {
      		struct kmem_cache *memcg_caches[0];
      		struct {
      			struct mem_cgroup *memcg;
      			struct list_head list;
      			struct kmem_cache *root_cache;
      			bool dead;
      			atomic_t nr_pages;
      			struct work_struct destroy;
      		};
      	};
      };
      
      * Now:
      
      struct memcg_cache_params {
      	bool is_root_cache;
      	union {
      		struct {
      			struct rcu_head rcu_head;
      			struct kmem_cache *memcg_caches[0];
      		};
      		struct {
      			struct mem_cgroup *memcg;
      			struct kmem_cache *root_cache;
      		};
      	};
      };
      
      So the memory saving does not seem to be a clear win anymore.
      
      OTOH, keeping a pointer to memcg_cache_params struct instead of embedding
      it results in touching one more cache line on kmem alloc/free hot paths.
      Besides, it makes linking kmem caches in a list chained by a field of
      struct memcg_cache_params really painful due to a level of indirection,
      while I want to make them linked in the following patch.  That said, let
      us embed it.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f7ce3190
  13. 10 Feb, 2015 1 commit
  14. 10 Dec, 2014 3 commits
    • Vladimir Davydov's avatar
      memcg: use generic slab iterators for showing slabinfo · b047501c
      Vladimir Davydov authored
      Let's use generic slab_start/next/stop for showing memcg caches info.  In
      contrast to the current implementation, this will work even if all memcg
      caches' info doesn't fit into a seq buffer (a page), plus it simply looks
      neater.
      
      Actually, the main reason I do this isn't mere cleanup.  I'm going to zap
      the memcg_slab_caches list, because I find it useless provided we have the
      slab_caches list, and this patch is a step in this direction.
      
      It should be noted that before this patch an attempt to read
      memory.kmem.slabinfo of a cgroup that doesn't have kmem limit set resulted
      in -EIO, while after this patch it will silently show nothing except the
      header, but I don't think it will frustrate anyone.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b047501c
    • Pranith Kumar's avatar
      slab: replace smp_read_barrier_depends() with lockless_dereference() · 8df0c2dc
      Pranith Kumar authored
      Recently lockless_dereference() was added which can be used in place of
      hard-coding smp_read_barrier_depends().  The following PATCH makes the
      change.
      Signed-off-by: default avatarPranith Kumar <bobby.prani@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8df0c2dc
    • Vladimir Davydov's avatar
      slab: print slabinfo header in seq show · 1df3b26f
      Vladimir Davydov authored
      Currently we print the slabinfo header in the seq start method, which
      makes it unusable for showing leaks, so we have leaks_show, which does
      practically the same as s_show except it doesn't show the header.
      
      However, we can print the header in the seq show method - we only need
      to check if the current element is the first on the list.  This will
      allow us to use the same set of seq iterators for both leaks and
      slabinfo reporting, which is nice.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1df3b26f
  15. 09 Oct, 2014 5 commits
    • Joonsoo Kim's avatar
      mm/slab: use percpu allocator for cpu cache · bf0dea23
      Joonsoo Kim authored
      Because of chicken and egg problem, initialization of SLAB is really
      complicated.  We need to allocate cpu cache through SLAB to make the
      kmem_cache work, but before initialization of kmem_cache, allocation
      through SLAB is impossible.
      
      On the other hand, SLUB does initialization in a more simple way.  It uses
      percpu allocator to allocate cpu cache so there is no chicken and egg
      problem.
      
      So, this patch try to use percpu allocator in SLAB.  This simplifies the
      initialization step in SLAB so that we could maintain SLAB code more
      easily.
      
      In my testing there is no performance difference.
      
      This implementation relies on percpu allocator.  Because percpu allocator
      uses vmalloc address space, vmalloc address space could be exhausted by
      this change on many cpu system with *32 bit* kernel.  This implementation
      can cover 1024 cpus in worst case by following calculation.
      
      Worst: 1024 cpus * 4 bytes for pointer * 300 kmem_caches *
      	120 objects per cpu_cache = 140 MB
      Normal: 1024 cpus * 4 bytes for pointer * 150 kmem_caches(slab merge) *
      	80 objects per cpu_cache = 46 MB
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bf0dea23
    • Joonsoo Kim's avatar
      mm/slab: support slab merge · 12220dea
      Joonsoo Kim authored
      Slab merge is good feature to reduce fragmentation.  If new creating slab
      have similar size and property with exsitent slab, this feature reuse it
      rather than creating new one.  As a result, objects are packed into fewer
      slabs so that fragmentation is reduced.
      
      Below is result of my testing.
      
      * After boot, sleep 20; cat /proc/meminfo | grep Slab
      
      <Before>
      Slab: 25136 kB
      
      <After>
      Slab: 24364 kB
      
      We can save 3% memory used by slab.
      
      For supporting this feature in SLAB, we need to implement SLAB specific
      kmem_cache_flag() and __kmem_cache_alias(), because SLUB implements some
      SLUB specific processing related to debug flag and object size change on
      these functions.
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      12220dea
    • Joonsoo Kim's avatar
      mm/slab_common: commonize slab merge logic · 423c929c
      Joonsoo Kim authored
      Slab merge is good feature to reduce fragmentation.  Now, it is only
      applied to SLUB, but, it would be good to apply it to SLAB.  This patch is
      preparation step to apply slab merge to SLAB by commonizing slab merge
      logic.
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      423c929c
    • Mikulas Patocka's avatar
      slab: fix for_each_kmem_cache_node() · 9163582c
      Mikulas Patocka authored
      Fix a bug (discovered with kmemcheck) in for_each_kmem_cache_node().  The
      for loop reads the array "node" before verifying that the index is within
      the range.  This results in kmemcheck warning.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9163582c
    • Joonsoo Kim's avatar
      mm/slab_common: move kmem_cache definition to internal header · 07f361b2
      Joonsoo Kim authored
      We don't need to keep kmem_cache definition in include/linux/slab.h if we
      don't need to inline kmem_cache_size().  According to my code inspection,
      this function is only called at lc_create() in lib/lru_cache.c which may
      be called at initialization phase of something, so we don't need to inline
      it.  Therfore, move it to slab_common.c and move kmem_cache definition to
      internal header.
      
      After this change, we can change kmem_cache definition easily without full
      kernel build.  For instance, we can turn on/off CONFIG_SLUB_STATS without
      full kernel build.
      
      [akpm@linux-foundation.org: export kmem_cache_size() to modules]
      [rdunlap@infradead.org: add header files to fix kmemcheck.c build errors]
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      07f361b2
  16. 06 Aug, 2014 4 commits
  17. 04 Jun, 2014 4 commits
    • Vladimir Davydov's avatar
      memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab · c67a8a68
      Vladimir Davydov authored
      Currently we have two pairs of kmemcg-related functions that are called on
      slab alloc/free.  The first is memcg_{bind,release}_pages that count the
      total number of pages allocated on a kmem cache.  The second is
      memcg_{un}charge_slab that {un}charge slab pages to kmemcg resource
      counter.  Let's just merge them to keep the code clean.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c67a8a68
    • Vladimir Davydov's avatar
      memcg, slab: do not schedule cache destruction when last page goes away · 1e32e77f
      Vladimir Davydov authored
      This patchset is a part of preparations for kmemcg re-parenting.  It
      targets at simplifying kmemcg work-flows and synchronization.
      
      First, it removes async per memcg cache destruction (see patches 1, 2).
      Now caches are only destroyed on memcg offline.  That means the caches
      that are not empty on memcg offline will be leaked.  However, they are
      already leaked, because memcg_cache_params::nr_pages normally never drops
      to 0 so the destruction work is never scheduled except kmem_cache_shrink
      is called explicitly.  In the future I'm planning reaping such dead caches
      on vmpressure or periodically.
      
      Second, it substitutes per memcg slab_caches_mutex's with the global
      memcg_slab_mutex, which should be taken during the whole per memcg cache
      creation/destruction path before the slab_mutex (see patch 3).  This
      greatly simplifies synchronization among various per memcg cache
      creation/destruction paths.
      
      I'm still not quite sure about the end picture, in particular I don't know
      whether we should reap dead memcgs' kmem caches periodically or try to
      merge them with their parents (see https://lkml.org/lkml/2014/4/20/38 for
      more details), but whichever way we choose, this set looks like a
      reasonable change to me, because it greatly simplifies kmemcg work-flows
      and eases further development.
      
      This patch (of 3):
      
      After a memcg is offlined, we mark its kmem caches that cannot be deleted
      right now due to pending objects as dead by setting the
      memcg_cache_params::dead flag, so that memcg_release_pages will schedule
      cache destruction (memcg_cache_params::destroy) as soon as the last slab
      of the cache is freed (memcg_cache_params::nr_pages drops to zero).
      
      I guess the idea was to destroy the caches as soon as possible, i.e.
      immediately after freeing the last object.  However, it just doesn't work
      that way, because kmem caches always preserve some pages for the sake of
      performance, so that nr_pages never gets to zero unless the cache is
      shrunk explicitly using kmem_cache_shrink.  Of course, we could account
      the total number of objects on the cache or check if all the slabs
      allocated for the cache are empty on kmem_cache_free and schedule
      destruction if so, but that would be too costly.
      
      Thus we have a piece of code that works only when we explicitly call
      kmem_cache_shrink, but complicates the whole picture a lot.  Moreover,
      it's racy in fact.  For instance, kmem_cache_shrink may free the last slab
      and thus schedule cache destruction before it finishes checking that the
      cache is empty, which can lead to use-after-free.
      
      So I propose to remove this async cache destruction from
      memcg_release_pages, and check if the cache is empty explicitly after
      calling kmem_cache_shrink instead.  This will simplify things a lot w/o
      introducing any functional changes.
      
      And regarding dead memcg caches (i.e.  those that are left hanging around
      after memcg offline for they have objects), I suppose we should reap them
      either periodically or on vmpressure as Glauber suggested initially.  I'm
      going to implement this later.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e32e77f
    • Vladimir Davydov's avatar
      slab: get_online_mems for kmem_cache_{create,destroy,shrink} · 03afc0e2
      Vladimir Davydov authored
      When we create a sl[au]b cache, we allocate kmem_cache_node structures
      for each online NUMA node.  To handle nodes taken online/offline, we
      register memory hotplug notifier and allocate/free kmem_cache_node
      corresponding to the node that changes its state for each kmem cache.
      
      To synchronize between the two paths we hold the slab_mutex during both
      the cache creationg/destruction path and while tuning per-node parts of
      kmem caches in memory hotplug handler, but that's not quite right,
      because it does not guarantee that a newly created cache will have all
      kmem_cache_nodes initialized in case it races with memory hotplug.  For
      instance, in case of slub:
      
          CPU0                            CPU1
          ----                            ----
          kmem_cache_create:              online_pages:
           __kmem_cache_create:            slab_memory_callback:
                                            slab_mem_going_online_callback:
                                             lock slab_mutex
                                             for each slab_caches list entry
                                                 allocate kmem_cache node
                                             unlock slab_mutex
            lock slab_mutex
            init_kmem_cache_nodes:
             for_each_node_state(node, N_NORMAL_MEMORY)
                 allocate kmem_cache node
            add kmem_cache to slab_caches list
            unlock slab_mutex
                                          online_pages (continued):
                                           node_states_set_node
      
      As a result we'll get a kmem cache with not all kmem_cache_nodes
      allocated.
      
      To avoid issues like that we should hold get/put_online_mems() during
      the whole kmem cache creation/destruction/shrink paths, just like we
      deal with cpu hotplug.  This patch does the trick.
      
      Note, that after it's applied, there is no need in taking the slab_mutex
      for kmem_cache_shrink any more, so it is removed from there.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      03afc0e2
    • Vladimir Davydov's avatar
      sl[au]b: charge slabs to kmemcg explicitly · 5dfb4175
      Vladimir Davydov authored
      We have only a few places where we actually want to charge kmem so
      instead of intruding into the general page allocation path with
      __GFP_KMEMCG it's better to explictly charge kmem there.  All kmem
      charges will be easier to follow that way.
      
      This is a step towards removing __GFP_KMEMCG.  It removes __GFP_KMEMCG
      from memcg caches' allocflags.  Instead it makes slab allocation path
      call memcg_charge_kmem directly getting memcg to charge from the cache's
      memcg params.
      
      This also eliminates any possibility of misaccounting an allocation
      going from one memcg's cache to another memcg, because now we always
      charge slabs against the memcg the cache belongs to.  That's why this
      patch removes the big comment to memcg_kmem_get_cache.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarGreg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5dfb4175
  18. 06 May, 2014 1 commit
    • Christoph Lameter's avatar
      slub: use sysfs'es release mechanism for kmem_cache · 41a21285
      Christoph Lameter authored
      debugobjects warning during netfilter exit:
      
          ------------[ cut here ]------------
          WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
          Workqueue: netns cleanup_net
          Call Trace:
            dump_stack+0x52/0x87
            warn_slowpath_common+0x8c/0xc0
            warn_slowpath_fmt+0x46/0x50
            debug_print_object+0x8d/0xb0
            __debug_check_no_obj_freed+0xa5/0x220
            debug_check_no_obj_freed+0x15/0x20
            kmem_cache_free+0x197/0x340
            kmem_cache_destroy+0x86/0xe0
            nf_conntrack_cleanup_net_list+0x131/0x170
            nf_conntrack_pernet_exit+0x5d/0x70
            ops_exit_list+0x5e/0x70
            cleanup_net+0xfb/0x1c0
            process_one_work+0x338/0x550
            worker_thread+0x215/0x350
            kthread+0xe7/0xf0
            ret_from_fork+0x7c/0xb0
      
      Also during dcookie cleanup:
      
          WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
          Call Trace:
            dump_stack (lib/dump_stack.c:52)
            warn_slowpath_common (kernel/panic.c:430)
            warn_slowpath_fmt (kernel/panic.c:445)
            debug_print_object (lib/debugobjects.c:262)
            __debug_check_no_obj_freed (lib/debugobjects.c:697)
            debug_check_no_obj_freed (lib/debugobjects.c:726)
            kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
            kmem_cache_destroy (mm/slab_common.c:363)
            dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
            event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
            __fput (fs/file_table.c:217)
            ____fput (fs/file_table.c:253)
            task_work_run (kernel/task_work.c:125 (discriminator 1))
            do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
            int_signal (arch/x86/kernel/entry_64.S:807)
      
      Sysfs has a release mechanism.  Use that to release the kmem_cache
      structure if CONFIG_SYSFS is enabled.
      
      Only slub is changed - slab currently only supports /proc/slabinfo and
      not /sys/kernel/slab/*.  We talked about adding that and someone was
      working on it.
      
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarGreg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      41a21285
  19. 07 Apr, 2014 1 commit
    • Vladimir Davydov's avatar
      memcg, slab: never try to merge memcg caches · a44cb944
      Vladimir Davydov authored
      When a kmem cache is created (kmem_cache_create_memcg()), we first try to
      find a compatible cache that already exists and can handle requests from
      the new cache, i.e.  has the same object size, alignment, ctor, etc.  If
      there is such a cache, we do not create any new caches, instead we simply
      increment the refcount of the cache found and return it.
      
      Currently we do this procedure not only when creating root caches, but
      also for memcg caches.  However, there is no point in that, because, as
      every memcg cache has exactly the same parameters as its parent and cache
      merging cannot be turned off in runtime (only on boot by passing
      "slub_nomerge"), the root caches of any two potentially mergeable memcg
      caches should be merged already, i.e.  it must be the same root cache, and
      therefore we couldn't even get to the memcg cache creation, because it
      already exists.
      
      The only exception is boot caches - they are explicitly forbidden to be
      merged by setting their refcount to -1.  There are currently only two of
      them - kmem_cache and kmem_cache_node, which are used in slab internals (I
      do not count kmalloc caches as their refcount is set to 1 immediately
      after creation).  Since they are prevented from merging preliminary I
      guess we should avoid to merge their children too.
      
      So let's remove the useless code responsible for merging memcg caches.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a44cb944
  20. 23 Jan, 2014 2 commits
    • Vladimir Davydov's avatar
      memcg, slab: RCU protect memcg_params for root caches · f8570263
      Vladimir Davydov authored
      We relocate root cache's memcg_params whenever we need to grow the
      memcg_caches array to accommodate all kmem-active memory cgroups.
      Currently on relocation we free the old version immediately, which can
      lead to use-after-free, because the memcg_caches array is accessed
      lock-free (see cache_from_memcg_idx()).  This patch fixes this by making
      memcg_params RCU-protected for root caches.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f8570263
    • Vladimir Davydov's avatar
      memcg, slab: fix barrier usage when accessing memcg_caches · 959c8963
      Vladimir Davydov authored
      Each root kmem_cache has pointers to per-memcg caches stored in its
      memcg_params::memcg_caches array.  Whenever we want to allocate a slab
      for a memcg, we access this array to get per-memcg cache to allocate
      from (see memcg_kmem_get_cache()).  The access must be lock-free for
      performance reasons, so we should use barriers to assert the kmem_cache
      is up-to-date.
      
      First, we should place a write barrier immediately before setting the
      pointer to it in the memcg_caches array in order to make sure nobody
      will see a partially initialized object.  Second, we should issue a read
      barrier before dereferencing the pointer to conform to the write
      barrier.
      
      However, currently the barrier usage looks rather strange.  We have a
      write barrier *after* setting the pointer and a read barrier *before*
      reading the pointer, which is incorrect.  This patch fixes this.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      959c8963