1. 18 Jun, 2009 1 commit
  2. 16 Jun, 2009 1 commit
  3. 15 Jun, 2009 3 commits
  4. 13 Jun, 2009 2 commits
  5. 12 Jun, 2009 1 commit
    • Pekka Enberg's avatar
      slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Pekka Enberg authored
      
      
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNEL's in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set form
        mem_init() (not perfect but works in practice). And we still have code
        using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
      in early boot code to avoid enabling interrupts.
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      7e85ee0c
  6. 11 Jun, 2009 3 commits
    • Pekka Enberg's avatar
      slab: setup allocators earlier in the boot sequence · 83b519e8
      Pekka Enberg authored
      
      
      This patch makes kmalloc() available earlier in the boot sequence so we can get
      rid of some bootmem allocations. The bulk of the changes are due to
      kmem_cache_init() being called with interrupts disabled which requires some
      changes to allocator boostrap code.
      
      Note: 32-bit x86 does WP protect test in mem_init() so we must setup traps
      before we call mem_init() during boot as reported by Ingo Molnar:
      
        We have a hard crash in the WP-protect code:
      
        [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
        [    0.000000]      EDI 00000188  ESI 00000ac7  EBP c17eaf9c  ESP c17eaf8c
        [    0.000000]      EBX 000014e0  EDX 0000000e  ECX 01856067  EAX 00000001
        [    0.000000]      err 00000003  EIP c10135b1   CS 00000060  flg 00010002
        [    0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
        [    0.000000]        00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
        [    0.000000]        c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
        [    0.000000] Call Trace:
        [    0.000000]  [<c15357c2>] ? printk+0x14/0x16
        [    0.000000]  [<c10135b1>] ? do_test_wp_bit+0x19/0x23
        [    0.000000]  [<c17fd410>] ? test_wp_bit+0x26/0x64
        [    0.000000]  [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
        [    0.000000]  [<c17f1668>] ? start_kernel+0x164/0x2f7
        [    0.000000]  [<c17f1322>] ? unknown_bootoption+0x0/0x19c
        [    0.000000]  [<c17f106a>] ? __init_begin+0x6a/0x6f
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      83b519e8
    • Catalin Marinas's avatar
      kmemleak: Add the slub memory allocation/freeing hooks · 06f22f13
      Catalin Marinas authored
      
      
      This patch adds the callbacks to kmemleak_(alloc|free) functions from the
      slub allocator.
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      06f22f13
    • Pekka Enberg's avatar
      SLUB: Out-of-memory diagnostics · 781b2ba6
      Pekka Enberg authored
      
      
      As suggested by Mel Gorman, add out-of-memory diagnostics to the SLUB allocator
      to make debugging OOM conditions easier. This patch helped hunt down a nasty
      OOM issue that popped up every now that was caused by SLUB debugging code which
      forced 4096 byte allocations to use order 1 pages even in the fallback case.
      
      An example print out looks like this:
      
        <snip page allocator out-of-memory message>
        SLUB: Unable to allocate memory on node -1 (gfp=20)
          cache: kmalloc-4096, object size: 4096, buffer size: 4168, default order: 3, min order: 1
          node 0: slabs: 95, objs: 665, free: 0
      Acked-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Acked-by: default avatarMel Gorman <mel@csn.ul.ie>
      Tested-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      781b2ba6
  7. 06 May, 2009 1 commit
  8. 23 Apr, 2009 1 commit
  9. 12 Apr, 2009 1 commit
  10. 03 Apr, 2009 2 commits
  11. 23 Mar, 2009 1 commit
  12. 25 Feb, 2009 1 commit
  13. 23 Feb, 2009 2 commits
    • David Rientjes's avatar
      slub: add min_partial sysfs tunable · 73d342b1
      David Rientjes authored
      
      
      Now that a cache's min_partial has been moved to struct kmem_cache, it's
      possible to easily tune it from userspace by adding a sysfs attribute.
      
      It may not be desirable to keep a large number of partial slabs around
      if a cache is used infrequently and memory, especially when constrained
      by a cgroup, is scarce.  It's better to allow userspace to set the
      minimum policy per cache instead of relying explicitly on
      kmem_cache_shrink().
      
      The memory savings from simply moving min_partial from struct
      kmem_cache_node to struct kmem_cache is obviously not significant
      (unless maybe you're from SGI or something), at the largest it's
      
      	# allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
      
      The true savings occurs when userspace reduces the number of partial
      slabs that would otherwise be wasted, especially on machines with a
      large number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 for default?).
      As well as the kernel estimates ideal values for n->min_partial and
      ensures it's within a sane range, userspace has no other input other
      than writing to /sys/kernel/slab/cache/shrink.
      
      There simply isn't any better heuristic to add when calculating the
      partial values for a better estimate that works for all possible caches.
      And since it's currently a static value, the user really has no way of
      reclaiming that wasted space, which can be significant when constrained
      by a cgroup (either cpusets or, later, memory controller slab limits)
      without shrinking it entirely.
      
      This also allows the user to specify that increased fragmentation and
      more partial slabs are actually desired to avoid the cost of allocating
      new slabs at runtime for specific caches.
      
      There's also no reason why this should be a per-struct kmem_cache_node
      value in the first place.  You could argue that a machine would have
      such node size asymmetries that it should be specified on a per-node
      basis, but we know nobody is doing that right now since it's a purely
      static value at the moment and there's no convenient way to tune that
      via slub's sysfs interface.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      73d342b1
    • David Rientjes's avatar
      slub: move min_partial to struct kmem_cache · 3b89d7d8
      David Rientjes authored
      
      
      Although it allows for better cacheline use, it is unnecessary to save a
      copy of the cache's min_partial value in each kmem_cache_node.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      3b89d7d8
  14. 20 Feb, 2009 3 commits
  15. 14 Feb, 2009 1 commit
    • Nick Piggin's avatar
      lockdep: annotate reclaim context (__GFP_NOFS) · cf40bd16
      Nick Piggin authored
      Here is another version, with the incremental patch rolled up, and
      added reclaim context annotation to kswapd, and allocation tracing
      to slab allocators (which may only ever reach the page allocator
      in rare cases, so it is good to put annotations here too).
      
      Haven't tested this version as such, but it should be getting closer
      to merge worthy ;)
      
      --
      After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
      allocation when it should not have been, I thought it might be a good idea to
      try to catch this kind of thing with lockdep.
      
      I coded up a little idea that seems to work. Unfortunately the system has to
      actually be in __GFP_FS page reclaim, then take the lock, before it will mark
      it. But at least that might still be some orders of magnitude more common
      (and more debuggable) than an actual deadlock condition, so we have some
      improvement I hope (the concept is no less complete than discovery of a lock's
      interrupt contexts).
      
      I guess we could even do the same thing with __GFP_IO (normal reclaim), and
      even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
      code paths, so let's start there and see how it goes.
      
      It *seems* to work. I did a quick test.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.28-rc6-00007-ged313489-dirty #26
      ---------------------------------
      inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
      modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      {in-reclaim-W} state was registered at:
        [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
        [<ffffffff80268f71>] lock_acquire+0x91/0xc0
        [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
        [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
        [<ffffffff8020903b>] _stext+0x3b/0x170
        [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
        [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 3929
      hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
      hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
      softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
      softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
      
      other info that might help us debug this:
      1 lock held by modprobe/8526:
       #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      
      stack backtrace:
      Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged313489
      
      -dirty #26
      Call Trace:
       [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
       [<ffffffff80266530>] mark_lock+0xaf0/0xca0
       [<ffffffff80266735>] mark_held_locks+0x55/0xc0
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
       [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
       [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff8020903b>] _stext+0x3b/0x170
       [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
       [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
       [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
       [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
       [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cf40bd16
  16. 12 Feb, 2009 1 commit
  17. 28 Jan, 2009 1 commit
  18. 14 Jan, 2009 1 commit
  19. 06 Jan, 2009 1 commit
  20. 31 Dec, 2008 1 commit
  21. 30 Dec, 2008 1 commit
    • Frederic Weisbecker's avatar
      tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Frederic Weisbecker authored
      
      
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw events tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was to stay backward compatible with the format output produced in
      inux/tracepoint.h.
      
      This is the default ouput, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
      
      Whatever, I find it more readable but this a personal opinion of course.
      We can drop it if you want.
      
      On the ALLOC/FREE column, + means an allocation and - a free.
      
      On the type column, you have K = kmalloc, C = cache, P = page
      
      I would like the flags to be GFP_* strings but that would not be easy to not
      break the column with strings....
      
      About the node...it seems to always be -1. I don't know why but that shouldn't
      be difficult to find.
      
      I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
      be more easy to find the tracer headers if they are all in their common
      directory.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      36994e58
  22. 29 Dec, 2008 8 commits
  23. 13 Dec, 2008 1 commit
    • Rusty Russell's avatar
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and... · 29c0177e
      Rusty Russell authored
      
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarMike Travis <travis@sgi.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
      29c0177e
  24. 10 Dec, 2008 1 commit