Skip to content
  • Jesper Dangaard Brouer's avatar
    slub bulk alloc: extract objects from the per cpu slab · 994eb764
    Jesper Dangaard Brouer authored
    
    
    First piece: acceleration of retrieval of per cpu objects
    
    If we are allocating lots of objects then it is advantageous to disable
    interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
    faster.
    
    Note that we cannot do the fast operation if debugging is enabled, because
    we would have to add extra code to do all the debugging checks.  And it
    would not be fast anyway.
    
    Note also that the requirement of having interrupts disabled avoids having
    to do processor flag operations.
    
    Allocate as many objects as possible in the fast way and then fall back to
    the generic implementation for the rest of the objects.
    
    Measurements on CPU CPU i7-4790K @ 4.00GHz
    Baseline normal fastpath (alloc+free cost): 42 cycles(tsc) 10.554 ns
    
    Bulk- fallback                   - this-patch
      1 -  57 cycles(tsc) 14.432 ns  -  48 cycles(tsc) 12.155 ns  improved 15.8%
      2 -  50 cycles(tsc) 12.746 ns  -  37 cycles(tsc)  9.390 ns  improved 26.0%
      3 -  48 cycles(tsc) 12.180 ns  -  33 cycles(tsc)  8.417 ns  improved 31.2%
      4 -  48 cycles(tsc) 12.015 ns  -  32 cycles(tsc)  8.045 ns  improved 33.3%
      8 -  46 cycles(tsc) 11.526 ns  -  30 cycles(tsc)  7.699 ns  improved 34.8%
     16 -  45 cycles(tsc) 11.418 ns  -  32 cycles(tsc)  8.205 ns  improved 28.9%
     30 -  80 cycles(tsc) 20.246 ns  -  73 cycles(tsc) 18.328 ns  improved  8.8%
     32 -  79 cycles(tsc) 19.946 ns  -  72 cycles(tsc) 18.208 ns  improved  8.9%
     34 -  78 cycles(tsc) 19.659 ns  -  71 cycles(tsc) 17.987 ns  improved  9.0%
     48 -  86 cycles(tsc) 21.516 ns  -  82 cycles(tsc) 20.566 ns  improved  4.7%
     64 -  93 cycles(tsc) 23.423 ns  -  89 cycles(tsc) 22.480 ns  improved  4.3%
    128 - 100 cycles(tsc) 25.170 ns  -  99 cycles(tsc) 24.871 ns  improved  1.0%
    158 - 102 cycles(tsc) 25.549 ns  - 101 cycles(tsc) 25.375 ns  improved  1.0%
    250 - 101 cycles(tsc) 25.344 ns  - 100 cycles(tsc) 25.182 ns  improved  1.0%
    
    Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
    Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    994eb764