    slub: add min_partial sysfs tunable · 73d342b1
    David Rientjes authored
    
    
    Now that a cache's min_partial has been moved to struct kmem_cache, it's
    possible to easily tune it from userspace by adding a sysfs attribute.
    
    It may not be desirable to keep a large number of partial slabs around
    if a cache is used infrequently and memory, especially when constrained
    by a cgroup, is scarce.  It's better to allow userspace to set the
    minimum policy per cache instead of relying explicitly on
    kmem_cache_shrink().
    
    The memory savings from simply moving min_partial from struct
    kmem_cache_node to struct kmem_cache are obviously not significant
    (unless maybe you're from SGI or something); at most they amount to
    
    	# allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
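
To put a number on that bound, here is a minimal sketch in C. The helper name and all inputs are hypothetical: nodes_shift stands in for CONFIG_NODES_SHIFT (so MAX_NUMNODES is 1 << nodes_shift), and the cache count and word size are illustrative guesses, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on the bytes saved by storing min_partial once per cache
 * instead of once per node:
 *
 *     nr_caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
 *
 * Hypothetical helper for illustration; not part of the kernel.
 */
static size_t min_partial_field_savings(size_t nr_caches,
                                        unsigned int nodes_shift,
                                        size_t word_size)
{
    /* Keep the shift small enough that the product stays meaningful. */
    assert(nodes_shift < 16);

    size_t max_numnodes = (size_t)1 << nodes_shift;

    return nr_caches * (max_numnodes - 1) * word_size;
}
```

Assuming 150 allocated caches, a nodes shift of 10 (1024 possible nodes) and an 8-byte unsigned long, min_partial_field_savings(150, 10, 8) gives 1227600 bytes, or roughly 1.2 MB, even in that large configuration.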
    
    The true savings occur when userspace reduces the number of partial
    slabs that would otherwise be wasted, especially on machines with a
    large number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 for default?).
    However well the kernel estimates ideal values for n->min_partial and
    keeps them within a sane range, userspace has no input other than
    writing to /sys/kernel/slab/cache/shrink.
    
    There simply isn't any better heuristic to add when calculating the
    partial values that would give a better estimate for all possible
    caches.  And since it's currently a static value, the user really has
    no way of reclaiming that wasted space short of shrinking the cache
    entirely.  That space can be significant when constrained by a cgroup
    (either cpusets or, later, memory controller slab limits).
    
    This also allows the user to specify that increased fragmentation and
    more partial slabs are actually desired to avoid the cost of allocating
    new slabs at runtime for specific caches.
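
As a rough sketch of how userspace might use the tunable from C, the helpers below read and write a per-cache min_partial file. The file name is assumed from the attribute this patch adds and the /sys/kernel/slab/cache/ layout mentioned above. The helper names and the slab_root parameter are hypothetical; slab_root exists only so the code can be tried against a scratch directory.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<slab_root>/<cache>/min_partial"; returns 0 on success. */
static int min_partial_path(char *buf, size_t len,
                            const char *slab_root, const char *cache)
{
    int n = snprintf(buf, len, "%s/%s/min_partial", slab_root, cache);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a new min_partial for one cache; 0 on success, -1 on failure. */
static int set_min_partial(const char *slab_root, const char *cache,
                           unsigned long min_partial)
{
    char path[4096];

    assert(slab_root && cache);
    if (min_partial_path(path, sizeof(path), slab_root, cache) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%lu\n", min_partial) > 0;

    /* The buffered write is flushed here, so check fclose() as well. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Read the current min_partial of one cache; 0 on success, -1 on failure. */
static int get_min_partial(const char *slab_root, const char *cache,
                           unsigned long *out)
{
    char path[4096];

    assert(slab_root && cache && out);
    if (min_partial_path(path, sizeof(path), slab_root, cache) != 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int ok = fscanf(f, "%lu", out) == 1;

    fclose(f);
    return ok ? 0 : -1;
}
```

On a real system the call would look like set_min_partial("/sys/kernel/slab", "dentry", 2), where "dentry" is only an example cache name and the write normally needs root. A lower value lets rarely used caches give memory back, while a higher one keeps more partial slabs around for caches where allocating new slabs is costly.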
    
    There's also no reason why this should be a per-struct kmem_cache_node
    value in the first place.  You could argue that a machine would have
    such node size asymmetries that it should be specified on a per-node
    basis, but we know nobody is doing that right now since it's a purely
    static value at the moment and there's no convenient way to tune that
    via slub's sysfs interface.
    
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>