    Revert "kernfs: do not account ino_ida allocations to memcg" · b2a209ff
    Vladimir Davydov authored
    Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
    alloc_kmem_pages call) are accounted to memory cgroup automatically.
    Callers have to explicitly opt out if they don't want/need accounting
    for some reason.  Such a design decision leads to several problems:
    
     - kmalloc users are highly sensitive to failures; many of them
       implicitly rely on the fact that kmalloc never fails, while memcg
       makes failures quite plausible.
    
     - A lot of objects are shared among different containers by design.
       Accounting such objects to one of the containers is just unfair.
       Moreover, it might pin a dead memcg along with its kmem caches,
       which aren't tiny, and in the long run that might noticeably
       increase memory consumption for no apparent reason.
    
     - There are tons of short-lived objects. Accounting them to memcg will
       only result in slight noise and won't change the overall picture, but
       we still have to pay accounting overhead.
    
    For more info, see
    
     - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
     - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza
    
    Therefore this patchset switches to the white list policy.  Now kmalloc
    users have to explicitly opt in by passing __GFP_ACCOUNT flag.
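
The opt-in idea can be illustrated with a small userspace sketch. This is not kernel code: `sim_kmalloc`, `sim_kfree`, `struct sim_memcg`, `SIM_GFP_KERNEL` and `SIM_GFP_ACCOUNT` are hypothetical stand-ins invented for this example, and the flag values are arbitrary. It only models the policy described above: an allocation is charged to a cgroup, and can therefore fail against its limit, only when the caller passes the accounting flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
#define SIM_GFP_KERNEL  0x1u
#define SIM_GFP_ACCOUNT 0x2u   /* models the opt-in role of __GFP_ACCOUNT */

/* Toy model of a memory cgroup: a byte counter with a hard limit. */
struct sim_memcg {
    size_t charged;   /* bytes currently charged to this group */
    size_t limit;     /* maximum bytes that may be charged */
};

/* Allocate size bytes; charge the group only if the caller opted in. */
static void *sim_kmalloc(struct sim_memcg *cg, size_t size, unsigned int flags)
{
    void *p;

    if (flags & SIM_GFP_ACCOUNT) {
        /* charged never exceeds limit, so the subtraction cannot wrap. */
        if (size > cg->limit - cg->charged)
            return NULL;          /* accounted allocation hits the limit */
        cg->charged += size;
    }

    p = malloc(size);
    if (p == NULL && (flags & SIM_GFP_ACCOUNT))
        cg->charged -= size;      /* roll back the charge on failure */
    return p;
}

/* Free an allocation and drop its charge if it was accounted. */
static void sim_kfree(struct sim_memcg *cg, void *p, size_t size,
                      unsigned int flags)
{
    if (p != NULL && (flags & SIM_GFP_ACCOUNT))
        cg->charged -= size;
    free(p);
}

/* Small walk-through; returns 0 if the model behaves as described. */
static int sim_demo(void)
{
    struct sim_memcg cg = { 0, 100 };
    void *plain, *accounted;

    /* Not opted in: never charged, so the group limit cannot fail it. */
    plain = sim_kmalloc(&cg, 4096, SIM_GFP_KERNEL);
    if (plain == NULL || cg.charged != 0)
        return 1;

    /* Opted in: charged, and subject to the group limit. */
    accounted = sim_kmalloc(&cg, 64, SIM_GFP_KERNEL | SIM_GFP_ACCOUNT);
    if (accounted == NULL || cg.charged != 64)
        return 1;

    sim_kfree(&cg, accounted, 64, SIM_GFP_KERNEL | SIM_GFP_ACCOUNT);
    sim_kfree(&cg, plain, 4096, SIM_GFP_KERNEL);
    return cg.charged == 0 ? 0 : 1;
}
```

In this model a caller that cannot handle failure simply leaves the accounting flag out, which mirrors the motivation for making accounting opt-in rather than opt-out.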
    
    Currently, the list of accounted objects is quite limited and only
    includes those allocations that (1) are known to be easily triggered
    from userspace and (2) can fail gracefully (for the full list see patch
    no.  6) and it still misses many object types.  However, accounting only
    those objects should be a satisfactory approximation of the behavior we
    used to have for most sane workloads.
    
    This patch (of 6):
    
    Revert 499611ed ("kernfs: do not account ino_ida allocations to
    memcg").
    
    Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
    fragile and difficult to maintain, because there seem to be many more
    allocations that should not be accounted than those that should be.
    Besides, falsely accounting an allocation might result in much worse
    consequences than not accounting at all, namely increased memory
    consumption due to pinned dead kmem caches.
    
    So it was decided to switch to the white-list policy.  This patch reverts
    bits introducing the black-list policy.  The white-list policy will be
    introduced later in the series.
    
    Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>