1. 02 Oct, 2006 4 commits
  2. 01 Oct, 2006 1 commit
    • [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern · d025c9db
      Andi Kleen authored
      Using the infrastructure created in previous patches, implement support to
      pipe core dumps into programs.
      This is done by overloading the existing core_pattern sysctl
      with a new syntax:
      When the first character of the pattern is a '|' the kernel will instead
      treat the rest of the pattern as a command to run.  The core dump will be
      written to the standard input of that program instead of to a file.
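      The rule above can be sketched in a few lines.  This is an illustration in
      Python, not the kernel's C code; the function name and the example handler
      path are hypothetical.

```python
def parse_core_pattern(pattern):
    """Classify a core_pattern value as a pipe command or a file name pattern."""
    # A leading '|' selects pipe mode; the rest of the string is the command.
    if pattern.startswith("|"):
        return ("pipe", pattern[1:])
    # Anything else keeps the old meaning: a pattern for the core file name.
    return ("file", pattern)


# Hypothetical handler path, used only for illustration.
print(parse_core_pattern("|/usr/local/bin/core-summary"))
print(parse_core_pattern("core"))
```

      Only the first character is inspected, which is why an existing pattern that
      happens to start with '|' changes meaning.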
      This is useful for having automatic core dump analysis without filling up
      disks.  The program can do some simple analysis and save only a summary of
      the core dump.
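      A handler of this kind might look roughly like the sketch below.  It is not
      the demo program mentioned in this message; the function name and the choice
      of summary fields are assumptions.  It reads the dump sequentially, since a
      pipe cannot seek, and keeps only a few facts about it.

```python
import hashlib

ELF_MAGIC = b"\x7fELF"


def summarize_core(stream, chunk_size=1 << 16):
    """Consume a core dump from a binary stream and return a small summary."""
    digest = hashlib.sha256()
    total = 0
    head = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Remember the first four bytes so the ELF magic can be checked.
        if len(head) < 4:
            head += chunk[: 4 - len(head)]
        digest.update(chunk)
        total += len(chunk)
    return {
        "bytes": total,
        "is_elf": head == ELF_MAGIC,
        "sha256": digest.hexdigest(),
    }
```

      A script installed as the pipe target would call
      summarize_core(sys.stdin.buffer) and store the returned dictionary instead
      of the dump itself.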
      The core dump process will run with the privileges and in the name space of
      the process that caused the core dump.
      I also increased the core pattern size to 128 bytes so that longer command
      lines fit.
      Most of the changes come from allowing core dumps without seeks.  They are
      fairly straightforward though.
      One small incompatibility is that if someone previously had a core pattern
      that started with '|' they will suddenly get new behaviour.  I think that's
      unlikely to be a real problem though.
      Additional background:
      > Very nice, do you happen to have a program that can accept this kind of
      > input for crash dumps?  I'm guessing that the embedded people will
      > really want this functionality.
      I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
      ran gdb on it to get a backtrace and wrote the summary to a shared directory.
      Then there was a simple CGI script to generate a "top 10" crashes HTML
      Unfortunately this still had the disadvantage of needing full disk space for a
      dump, even though the dump was deleted afterwards (in fact it was worse because
      holes don't work over the pipe, so a holey address map would require
      more space).
      Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
      cores (at least it worked with zsh's =(cat core) syntax), so it would
      likely be possible to do it without temporary space with a simple wrapper that
      calls it in the right way.  I ran out of time before doing that though.
      The demo prototype scripts weren't very good.  If there is really interest I
      can dig them out (they are currently on a laptop disk on the desk with the
      laptop itself being in service), but I would recommend rewriting them for any
      serious application of this and fixing the disk space problem.
      Also to be really useful it should probably find a way to automatically fetch
      the debuginfos (I cheated and just installed them in advance).  If nobody else
      does it I can probably do the rewrite myself again at some point.
      My hope at some point was that desktops would support it in their builtin
      crash reporters, but at least the KDE people I talked to seemed to be happy
      with their user space only solution.
      Alan sayeth:
        I don't believe that piping as such is necessarily the right model, but
        the ability to intercept and process core dumps from user space is asked
        for by many enterprise users as well.  They want to know about, capture,
        analyse and process core dumps, often centrally and in automated form.
      [akpm@osdl.org: loff_t != unsigned long]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 29 Sep, 2006 2 commits
  4. 27 Sep, 2006 1 commit
  5. 26 Sep, 2006 4 commits
    • [PATCH] zone_reclaim: dynamic slab reclaim · 0ff38490
      Christoph Lameter authored
      Currently one can enable slab reclaim by setting an explicit option in
      /proc/sys/vm/zone_reclaim_mode.  Slab reclaim is then used as a final
      option if the freeing of unmapped file backed pages is not enough to free
      enough pages to allow a local allocation.
      However, that means that the slab can grow excessively and that most memory
      of a node may be used by slabs.  We have had a case where a machine with
      46GB of memory was using 40-42GB for slab.  Zone reclaim was effective in
      dealing with pagecache pages.  However, slab reclaim was only done during
      global reclaim (which is a bit rare on NUMA systems).
      This patch implements slab reclaim during zone reclaim.  Zone reclaim
      occurs if there is a danger of an off node allocation.  At that point we
      1. Shrink the per node page cache if the number of pagecache
         pages is more than min_unmapped_ratio percent of pages in a zone.
      2. Shrink the slab cache if the number of the node's reclaimable slab pages
         (patch depends on earlier one that implements that counter)
         is more than min_slab_ratio (a new /proc/sys/vm tunable).
      The shrinking of the slab cache is a bit problematic since it is not node
      specific.  So we simply calculate what point in the slab we want to reach
      (current per node slab use minus the number of pages that need to be
      allocated) and then repeatedly run the global reclaim until that is
      unsuccessful or we have reached the limit.  I hope we will have zone based
      slab reclaim at some point which will make that easier.
      The default for min_slab_ratio is 5%.
      Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
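      The two checks and the slab target loop described above can be sketched as
      follows.  This is a Python illustration, not the kernel code: the function
      names are hypothetical, both ratios are read as percentages of the zone's
      pages, the min_unmapped_ratio default of 1 is an assumption, and
      global_reclaim stands in for a global slab shrink call that returns the
      number of pages it freed on this node.

```python
def zone_reclaim_plan(zone_pages, unmapped_pagecache_pages,
                      reclaimable_slab_pages,
                      min_unmapped_ratio=1, min_slab_ratio=5):
    """Return which caches a zone reclaim pass would try to shrink."""
    min_unmapped_pages = zone_pages * min_unmapped_ratio // 100
    min_slab_pages = zone_pages * min_slab_ratio // 100
    return {
        "shrink_pagecache": unmapped_pagecache_pages > min_unmapped_pages,
        "shrink_slab": reclaimable_slab_pages > min_slab_pages,
    }


def shrink_slab_to_target(slab_pages, pages_needed, global_reclaim):
    """Run global reclaim until the node's slab use reaches the target."""
    # Target: current per node slab use minus the pages we need to allocate.
    target = slab_pages - pages_needed
    while slab_pages > target:
        freed = global_reclaim()
        if freed == 0:
            # Global reclaim was unsuccessful; give up.
            break
        slab_pages -= freed
    return slab_pages
```

      Because the global shrinker is not node specific, the loop can free pages on
      other nodes too; the sketch only tracks what came back on the local node.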
      [akpm@osdl.org: cleanups]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86: Allow users to force a panic on NMI · 8da5adda
      Don Zickus authored
      To quote Alan Cox:
      The default Linux behaviour on an NMI of either memory or unknown is to
      continue operation. For many environments such as scientific computing
      it is preferable that the box is taken out and the error dealt with than
      an uncorrected parity/ECC error get propagated.
      A small number of systems do generate NMI's for bizarre random reasons
      such as power management so the default is unchanged. In other respects
      the new proc/sys entry works like the existing panic controls already in
      that directory.
      This is separate to the edac support - EDAC allows supported chipsets to
      handle ECC errors well, this change allows unsupported cases to at least
      panic rather than cause problems further down the line.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Add abilty to enable/disable nmi watchdog with sysctl · 407984f1
      Don Zickus authored
      Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
      watchdog.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386/x86-64: Remove un/set_nmi_callback and reserve/release_lapic_nmi functions · 2fbe7b25
      Don Zickus authored
      Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
      they are no longer needed.  The various subsystems are modified to register
      with the die_notifier instead.
      Also includes compile fixes by Andrew Morton.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
  6. 03 Jul, 2006 1 commit
    • [PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O · 9614634f
      Christoph Lameter authored
      It turns out that it is advantageous to leave a small portion of unmapped file
      backed pages if all of a zone's pages (or almost all pages) are allocated and
      so the page allocator has to go off-node.
      This allows recently used file I/O buffers to stay on the node and
      reduces the times that zone reclaim is invoked if file I/O occurs
      when we run out of memory in a zone.
      The problem is that zone reclaim runs too frequently when the page cache is
      used for file I/O (read write and therefore unmapped pages!) alone and we have
      almost all pages of the zone allocated.  Zone reclaim may remove 32 unmapped
      pages.  File I/O will use these pages for the next read/write requests and the
      unmapped pages increase.  After the zone has filled up again zone reclaim will
      remove it again after only 32 pages.  This cycle is too inefficient and there
      are potentially too many zone reclaim cycles.
      With the 1% boundary we may still remove all unmapped pages for file I/O in
      a zone reclaim pass.  However, it will take a large number of reads and writes
      to get back to 1% again where we trigger zone reclaim again.
      Zone reclaim in 2.6.16/17 does not show this behavior because it has a 30
      second timeout.
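      A rough back-of-the-envelope model of the difference, in Python.  The zone
      size and I/O volume are made-up numbers, and the model assumes the zone is
      otherwise full, that the old scheme frees 32 pages per pass and that the new
      scheme frees all unmapped pages and waits until they grow back to 1% of the
      zone.

```python
def reclaim_cycles(io_pages, pages_between_reclaims):
    """Count zone reclaim passes while io_pages of file I/O hit a full zone."""
    return io_pages // pages_between_reclaims


zone_pages = 1_000_000   # hypothetical zone size in pages
io_pages = 100_000       # hypothetical amount of file I/O in pages

old_cycles = reclaim_cycles(io_pages, 32)                 # reclaim every 32 pages
new_cycles = reclaim_cycles(io_pages, zone_pages // 100)  # reclaim every 1% of zone
print(old_cycles, new_cycles)  # prints "3125 10"
```

      The absolute numbers mean little; the point is that the number of passes
      scales with the zone size rather than with a fixed 32 page step.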
      [akpm@osdl.org: rename the /proc file and the variable]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 30 Jun, 2006 2 commits
  8. 27 Jun, 2006 2 commits
    • [PATCH] pi-futex: rt mutex core · 23f78d4a
      Ingo Molnar authored
      Core functions for the rt-mutex subsystem.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vdso: randomize the i386 vDSO by moving it into a vma · e6e5494c
      Ingo Molnar authored
      Move the i386 VDSO down into a vma and thus randomize it.
      Besides the security implications, this feature also helps debuggers, which
      can COW a vma-backed VDSO just like a normal DSO and can thus do
      single-stepping and other debugging features.
      It's good for hypervisors (Xen, VMWare) too, which typically live in the same
      high-mapped address space as the VDSO, hence whenever the VDSO is used, they
      get lots of guest pagefaults and have to fix such guest accesses up - which
      slows things down instead of speeding things up (the primary purpose of the
      VDSO).
      There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
      for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
      distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
      it off is also recommended for security reasons: attackers cannot use the
      predictable high-mapped VDSO page as syscall trampoline anymore.
      There is a new vdso=[0|1] boot option as well, and a runtime
      /proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
      on/off.
      (This version of the VDSO-randomization patch also has working ELF
      coredumping, the previous patch crashed in the coredumping code.)
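      One way to observe where the VDSO ends up from user space is to look at the
      process's memory map.  The sketch below is in Python and assumes a kernel
      that labels the mapping "[vdso]" in /proc/<pid>/maps; the function name and
      the sample line are made up for illustration.

```python
def find_vdso(maps_text):
    """Return (start, end) of the [vdso] mapping in maps text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vdso]":
            # The first field is the address range, "start-end" in hex.
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


# Hypothetical sample line in /proc/<pid>/maps format.
sample = "b7f5e000-b7f5f000 r-xp 00000000 00:00 0          [vdso]\n"
print(find_vdso(sample))
```

      Feeding it the contents of /proc/self/maps from two separate runs of a
      program should show different addresses when the VDSO is randomized.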
      This code is a combined work of the exec-shield VDSO randomization
      code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
      started this patch and i completed it.
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: compile fix]
      [akpm@osdl.org: compile fix 2]
      [akpm@osdl.org: compile fix 3]
      [akpm@osdl.org: revert MAXMEM change]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 26 Jun, 2006 1 commit
  10. 25 Jun, 2006 1 commit
  11. 23 Jun, 2006 2 commits
  12. 20 Jun, 2006 1 commit
  13. 24 Mar, 2006 3 commits
  14. 08 Mar, 2006 1 commit
    • [PATCH] fix file counting · 529bf6be
      Dipankar Sarma authored
      I have benchmarked this on an x86_64 NUMA system and see no significant
      performance difference on kernbench.  Tested on both x86_64 and powerpc.
      The way we do file struct accounting is not very suitable for batched
      freeing.  For scalability reasons, file accounting was
      constructor/destructor based.  This meant that nr_files was decremented
      only when the object was removed from the slab cache.  This is susceptible
      to slab fragmentation.  With RCU based file structure, consequent batched
      freeing and a test program like Serge's, we just speed this up and end up
      with a very fragmented slab -
      llm22:~ # cat /proc/sys/fs/file-nr
      587730  0       758844
      At the same time, I see only 2000+ objects in the filp cache.  The following
      patch fixes this problem.
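      For reference, the three numbers in file-nr can be pulled apart as in the
      Python sketch below.  The field meanings (allocated handles, allocated but
      unused handles, maximum) follow the kernel's sysctl documentation rather
      than this message; the function name is hypothetical.

```python
def parse_file_nr(text):
    """Split the contents of /proc/sys/fs/file-nr into named fields."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


# The output quoted above.
print(parse_file_nr("587730  0       758844\n"))
```

      Here the allocated count is far above the 2000+ live filp objects, which is
      the miscount being described.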
      This patch changes the file counting by removing the filp_count_lock.
      Instead we use a separate percpu counter, nr_files, for now and all
      accesses to it are through get_nr_files() api.  In the sysctl handler for
      nr_files, we populate files_stat.nr_files before returning to user.
      Counting files as and when they are created and destroyed (as opposed to
      inside slab) allows us to correctly count open files with RCU.
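      The counting scheme can be illustrated with a simplified per-CPU counter.
      This Python sketch is not the kernel's percpu counter implementation; the
      class, the fixed CPU count and the file_created/file_destroyed helpers are
      assumptions, and only get_nr_files() is a name taken from this patch.

```python
class PerCpuCounter:
    """One slot per CPU; writers touch only their own slot."""

    def __init__(self, num_cpus):
        self.slots = [0] * num_cpus

    def add(self, cpu, delta):
        self.slots[cpu] += delta

    def total(self):
        # Readers sum all slots to get the global value.
        return sum(self.slots)


nr_files = PerCpuCounter(num_cpus=4)  # hypothetical 4 CPU machine


def file_created(cpu):
    nr_files.add(cpu, 1)


def file_destroyed(cpu):
    nr_files.add(cpu, -1)


def get_nr_files():
    return nr_files.total()
```

      A file may be created on one CPU and destroyed on another, so individual
      slots can go negative; only the sum is meaningful.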
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  15. 02 Mar, 2006 1 commit
  16. 28 Feb, 2006 1 commit
  17. 20 Feb, 2006 2 commits
  18. 17 Feb, 2006 1 commit
  19. 01 Feb, 2006 2 commits
  20. 18 Jan, 2006 1 commit
  21. 14 Jan, 2006 1 commit
  22. 11 Jan, 2006 1 commit
  23. 08 Jan, 2006 2 commits
    • [PATCH] Make high and batch sizes of per_cpu_pagelists configurable · 8ad4b1fb
      Rohit Seth authored
      There has recently been a lot of traffic on the right values for the batch and
      high water marks for per_cpu_pagelists.  This patch makes these two
      variables configurable through the /proc interface.
      A new tunable /proc/sys/vm/percpu_pagelist_fraction is added.  This entry
      controls the maximum fraction of pages in each zone that are allocated for
      each per cpu page list.  The min value for this is 8.  It means that we
      don't allow more than 1/8th of pages in each zone to be allocated in any
      single per_cpu_pagelist.
      The batch value of each per cpu pagelist is also updated as a result.  It
      is set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8).
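      The arithmetic can be sketched as below.  This is a Python illustration, not
      the kernel code: the function name is hypothetical, high is assumed to be
      the zone's page count divided by the fraction, the lower bound of 1 on batch
      is an assumption, and PAGE_SHIFT is taken as 12 (4 KiB pages).

```python
def pcp_limits(zone_pages, fraction, page_shift=12):
    """Return (high, batch) for a per cpu page list under the stated rules."""
    if fraction < 8:
        raise ValueError("percpu_pagelist_fraction must be at least 8")
    # At most 1/fraction of the zone's pages may sit in one per cpu list.
    high = zone_pages // fraction
    # batch follows high/4, kept at 1 or more (assumed lower bound) ...
    batch = max(1, high // 4)
    # ... and capped at PAGE_SHIFT * 8.
    batch = min(batch, page_shift * 8)
    return high, batch


print(pcp_limits(1600, 8))     # small zone: batch is high/4
print(pcp_limits(262144, 8))   # large zone: batch hits the cap
```

      With 4 KiB pages the cap works out to 96 pages, so on any reasonably large
      zone the batch size is pinned at the cap and only high varies.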
      Signed-off-by: Rohit Seth <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] drop-pagecache · 9d0243bc
      Andrew Morton authored
      Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
      discard as much pagecache and/or reclaimable slab objects as it can.  This
      operation requires root permissions.
      It won't drop dirty data, so the user should run `sync' first.
      a) Holds inode_lock for exorbitant amounts of time.
      b) Needs to be taught about NUMA nodes: propagate these all the way through
         so the discarding can be controlled on a per-node basis.
      This is a debugging feature: useful for getting consistent results between
      filesystem benchmarks.  We could possibly put it under a config option, but
      it's less than 300 bytes.
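      A small helper for benchmark scripts might look like the Python sketch
      below.  The values 1 (pagecache), 2 (reclaimable slab objects such as
      dentries and inodes) and 3 (both) come from the kernel's sysctl
      documentation, not from this message; the function name and its path and
      do_sync parameters are hypothetical.

```python
import os

DROP_PAGECACHE = 1  # free pagecache
DROP_SLAB = 2       # free reclaimable slab objects (dentries and inodes)


def drop_caches(mode=DROP_PAGECACHE | DROP_SLAB,
                path="/proc/sys/vm/drop_caches", do_sync=True):
    """Sync dirty data, then ask the kernel to drop clean caches."""
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    if do_sync:
        # Dirty data is not dropped, so write it back first.
        os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

      Calling drop_caches() between benchmark runs needs root; the path parameter
      exists only so the helper can be exercised against an ordinary file.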
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  24. 06 Jan, 2006 1 commit
  25. 04 Jan, 2006 1 commit