1. 11 Feb, 2007 4 commits
    • [PATCH] sysctl warning fix · cb799b89
      Andrew Morton authored
      kernel/sysctl.c:2816: warning: 'sysctl_ipc_data' defined but not used
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Add TAINT_USER and ability to set taint flags from userspace · 34f5a398
      Theodore Ts'o authored
      Allow taint flags to be set from userspace by writing to
      /proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used
      when userspace has potentially done something dangerous that might
      compromise the kernel.  This will allow support personnel to ask further
      questions about what may have caused the user taint flag to have been set.
      
      For example, they might examine the logs of the realtime JVM to see if the
      Java program has used the really silly, stupid, dangerous, and
      completely-non-portable direct access to physical memory feature which MUST
      be implemented according to the Real-Time Specification for Java (RTSJ).
      Sigh.  What were those silly people at Sun thinking?
      
      [akpm@osdl.org: build fix]
      [bunk@stusta.de: cleanup]
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
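A minimal userspace sketch of the taint-mask behaviour described in this commit, not kernel code. It assumes that a write can only add flags (bitwise OR) and never clear them, and the bit position used for `TAINT_USER` is an assumption for illustration; `write_tainted` is a hypothetical helper name.

```python
# Hypothetical model of the taint mask; the bit position is assumed.
TAINT_USER = 1 << 6


def write_tainted(current_mask, written_value):
    """Model a write to /proc/sys/kernel/tainted.

    Assumption: written flags are OR-ed into the mask, so existing
    flags cannot be cleared from userspace.
    """
    return current_mask | written_value


mask = 0
mask = write_tainted(mask, TAINT_USER)  # userspace marks the kernel as tainted
mask = write_tainted(mask, 0)           # writing 0 leaves the flag in place
print("user taint set:", bool(mask & TAINT_USER))
```

Support personnel would then test the user bit in the reported mask before asking what userspace did to set it.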
    • [PATCH] sysctl_{,ms_}jiffies: fix oldlen semantics · 3ee75ac3
      Alexey Dobriyan authored
      currently it's
      1) if *oldlenp == 0,
      	don't writeback anything
      
      2) if *oldlenp >= table->maxlen,
      	don't writeback more than table->maxlen bytes and rewrite *oldlenp
      	don't look at underlying type granularity
      
      3) if 0 < *oldlenp < table->maxlen,
      		*cough*
      	string sysctls don't writeback more than *oldlenp bytes.
      	OK, that's because sizeof(char) == 1
      
      	int sysctls writeback anything in (0, table->maxlen] range
      	Though accept integers divisible by sizeof(int) for writing.
      
      sysctl_jiffies and sysctl_ms_jiffies don't writeback anything but
      sizeof(int), which violates 1) and 2).
      
      So, make sysctl_jiffies and sysctl_ms_jiffies accept
      a) *oldlenp == 0, not doing writeback
      b) *oldlenp >= sizeof(int), writing one integer.
      
      -EINVAL still returned for *oldlenp == 1, 2, 3.
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
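The new rules a) and b) for sysctl_jiffies and sysctl_ms_jiffies can be sketched as a small decision function. This is an illustrative model only: `jiffies_oldlen_check` is a hypothetical name, and `SIZEOF_INT = 4` is an assumption about the platform.

```python
import errno

SIZEOF_INT = 4  # assumed sizeof(int), for illustration only


def jiffies_oldlen_check(oldlen):
    """Return the number of bytes written back, or a negative errno."""
    if oldlen == 0:
        return 0                # a) nothing is written back
    if oldlen >= SIZEOF_INT:
        return SIZEOF_INT       # b) exactly one integer is written
    return -errno.EINVAL        # *oldlenp of 1, 2 or 3 is still rejected


for oldlen in (0, 3, 4, 16):
    print(oldlen, jiffies_oldlen_check(oldlen))
```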
    • [PATCH] make reading /proc/sys/kernel/cap-bound not require CAP_SYS_MODULE · 6ff1b442
      Eric Paris authored
      Reading /proc/sys/kernel/cap-bound requires CAP_SYS_MODULE.  (see
      proc_dointvec_bset in kernel/sysctl.c)
      
      sysctl appears to drive all over proc reading everything it can get its
      hands on and is complaining when it is being denied access to read
      cap-bound.  Clearly writing to cap-bound should be a sensitive operation
      but requiring CAP_SYS_MODULE to read cap-bound seems a bit too strong.  I
      believe the information could with reasonable certainty be obtained by
      looking at a bunch of the output of /proc/pid/status which has very low
      security protection, so at best we are just getting a little obfuscation of
      information.
      
      Currently SELinux policy has to 'dontaudit' capability checks for
      CAP_SYS_MODULE for things like sysctl which just want to read cap-bound.
      In doing so we also as a byproduct have to hide warnings of potential
      exploits such as if at some time that sysctl actually tried to load a
      module.  I wondered if anyone would have a problem opening cap-bound up to
      read from anyone?
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 Dec, 2006 1 commit
  3. 10 Dec, 2006 3 commits
  4. 08 Dec, 2006 5 commits
  5. 07 Dec, 2006 3 commits
    • [PATCH] struct seq_operations and struct file_operations constification · 15ad7cdc
      Helge Deller authored
       - move some file_operations structs into the .rodata section
      
       - move static strings from policy_types[] array into the .rodata section
      
       - fix generic seq_operations usages, so that those structs may be defined
         as "const" as well
      
      [akpm@osdl.org: couple of fixes]
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sysctl: string length calculated is wrong if it contains negative numbers · bd9b0bac
      BP, Praveen authored
      In the functions do_proc_dointvec() and do_proc_doulongvec_minmax(),
      there seems to be a bug in the string length calculation if the string
      contains a negative integer.
      
      The console log given below explains the bug. Setting negative values
      may not be a right thing to do for "console log level" but then the test
      (given below) can be used to demonstrate the bug in the code.
      
      # echo "-1 -1 -1 -123456" > /proc/sys/kernel/printk
      # cat /proc/sys/kernel/printk
      -1      -1      -1      -1234
      #
      # echo "-1 -1 -1 123456" > /proc/sys/kernel/printk
      # cat /proc/sys/kernel/printk
      -1      -1      -1      1234
      #
      
      (akpm: the bug is that 123456 gets truncated)
      
      It works as expected if string contains all +ve integers
      
      # echo "1 2 3 4" > /proc/sys/kernel/printk
      # cat /proc/sys/kernel/printk
      1       2       3       4
      #
      
      The patch given below fixes the issue.
      Signed-off-by: Praveen BP <praveenbp@ti.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
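The truncation in the console log above can be reproduced with a simplified userspace model. This is not the kernel parser: `parse_ints` is a hypothetical function, and the failure mode it models (the '-' sign being charged against the remaining length twice) is an assumption about the bug rather than a quote from the patch.

```python
def parse_ints(data, double_count_sign):
    """Parse whitespace-separated integers while tracking `left`,
    the number of input bytes the parser believes are still unread."""
    values = []
    pos, left = 0, len(data)
    while left > 0:
        if data[pos].isspace():           # skip whitespace one byte at a time
            pos += 1
            left -= 1
            continue
        chunk = data[pos:pos + left]      # the parser only sees `left` bytes
        i = 0
        neg = chunk[0] == "-" and left > 1
        if neg:
            i = 1
            if double_count_sign:
                left -= 1                 # modelled bug: sign charged here ...
        start = i
        while i < len(chunk) and chunk[i].isdigit():
            i += 1
        if i == start:
            break                         # no digits found, stop parsing
        value = int(chunk[start:i])
        values.append(-value if neg else value)
        pos += i
        left -= i                         # ... and again here, as i includes the sign
    return values


print(parse_ints("-1 -1 -1 123456\n", double_count_sign=True))
print(parse_ints("-1 -1 -1 123456\n", double_count_sign=False))
```

In this model each negative number shortens the visible input by one extra byte, so three negative values hide the last three bytes ("56" and the newline) of the final field.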
    • [PATCH] new scheme to preempt swap token · 7602bdf2
      Ashwin Chaugule authored
      The new swap token patches replace the current token traversal algo.  The old
      algo had a crude timeout parameter that was used to hand over the token from
      one task to another.  This algo transfers the token to the tasks that are in
      need of the token.  The urgency for the token is based on the number of times
      a task is required to swap-in pages.  Accordingly, the priority of a task is
      incremented if it has been badly affected due to swap-outs.  To ensure that
      the token doesn't bounce around rapidly, the token holders are given a priority
      boost.  The priority of tasks is also decremented if their rate of swap-ins
      keeps reducing.  This way, the condition to check whether to pre-empt the swap
      token is a matter of comparing two tasks' priority fields.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 06 Dec, 2006 1 commit
  7. 06 Nov, 2006 2 commits
    • [PATCH] sysctl: allow a zero ctl_name in the middle of a sysctl table · d99f160a
      Eric W. Biederman authored
      Since it is becoming clear that there are just enough users of the binary
      sysctl interface that completely removing the binary interface from the kernel
      will not be an option for the foreseeable future, we need to find a way to
      address the sysctl maintenance issues.
      
      The basic problem is that sysctl requires one central authority to allocate
      sysctl numbers, or else conflicts and ABI breakage occur.  The proc interface
      to sysctl does not have that problem, as names are not densely allocated.
      
      By not terminating a sysctl table until I have neither a ctl_name nor a
      procname, it becomes simple to add sysctl entries that don't show up in the
      binary sysctl interface.  Which allows people to avoid allocating a binary
      sysctl value when not needed.
      
      I have audited the kernel code and in my reading I have not found a single
      sysctl table that wasn't terminated by a completely zero filled entry.  So
      this change in behavior should not affect anything.
      
      I think this mechanism eases the pain enough that, combined with a little
      discipline, we can solve the recurring sysctl ABI breakage.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
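The termination rule described in this commit can be sketched in userspace as follows. `CtlEntry`, `walk` and `binary_visible` are hypothetical names for illustration; only the rule itself (stop at the first entry that has neither a ctl_name nor a procname) comes from the commit message.

```python
from typing import NamedTuple, Optional


class CtlEntry(NamedTuple):
    ctl_name: int             # binary sysctl number, 0 if none is allocated
    procname: Optional[str]   # name under /proc/sys, None if absent


def walk(table):
    """Yield entries until one has neither a ctl_name nor a procname."""
    for entry in table:
        if not entry.ctl_name and not entry.procname:
            return
        yield entry


def binary_visible(table):
    """Entries reachable through the binary interface need a ctl_name."""
    return [e for e in walk(table) if e.ctl_name]


table = [
    CtlEntry(1, "first"),
    CtlEntry(0, "proc_only"),   # zero ctl_name in the middle of the table
    CtlEntry(2, "second"),
    CtlEntry(0, None),          # completely zero-filled terminator
]
print([e.procname for e in walk(table)])
print([e.procname for e in binary_visible(table)])
```

The proc-only entry is still walked but never needs a centrally allocated binary number.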
    • [PATCH] Improve the removed sysctl warnings · 0e009be8
      Eric W. Biederman authored
      Don't warn about libpthread's access to kernel.version.  When it receives
      -ENOSYS it will read /proc/sys/kernel/version.
      
      If anything else shows up print the sysctl number string.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cal Peake <cp@absolutedigital.net>
      Cc: Alan Cox <alan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 20 Oct, 2006 1 commit
  9. 02 Oct, 2006 5 commits
  10. 01 Oct, 2006 1 commit
    • [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern · d025c9db
      Andi Kleen authored
      Using the infrastructure created in previous patches implement support to
      pipe core dumps into programs.
      
      This is done by overloading the existing core_pattern sysctl
      with a new syntax:
      
      |program
      
      When the first character of the pattern is a '|' the kernel will instead
      treat the rest of the pattern as a command to run.  The core dump will be
      written to the standard input of that program instead of to a file.
      
      This is useful for having automatic core dump analysis without filling up
      disks.  The program can do some simple analysis and save only a summary of
      the core dump.
      
      The core dump process will run with the privileges and in the name space of
      the process that caused the core dump.
      
      I also increased the core pattern size to 128 bytes so that longer command
      lines fit.
      
      Most of the changes come from allowing core dumps without seeks.  They are
      fairly straightforward though.
      
      One small incompatibility is that if someone previously had a core pattern
      that started with '|' they will suddenly get new behaviour.  I think that's
      unlikely to be a real problem though.
      
      Additional background:
      
      > Very nice, do you happen to have a program that can accept this kind of
      > input for crash dumps?  I'm guessing that the embedded people will
      > really want this functionality.
      
      I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
      ran gdb on it to get a backtrace and wrote the summary to a shared directory.
      Then there was a simple CGI script to generate a "top 10" crashes HTML
      listing.
      
      Unfortunately this still had the disadvantage of needing full disk space for a
      dump, apart from deleting it afterwards (in fact it was worse because holes
      didn't work over the pipe, so if you have a holey address map it would require
      more space).
      
      Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
      cores (at least it worked with zsh's =(cat core) syntax), so it would be
      likely possible to do it without temporary space with a simple wrapper that
      calls it in the right way.  I ran out of time before doing that though.
      
      The demo prototype scripts weren't very good.  If there is really interest I
      can dig them out (they are currently on a laptop disk on the desk with the
      laptop itself being in service), but I would recommend to rewrite them for any
      serious application of this and fix the disk space problem.
      
      Also to be really useful it should probably find a way to automatically fetch
      the debuginfos (I cheated and just installed them in advance).  If nobody else
      does it I can probably do the rewrite myself again at some point.
      
      My hope at some point was that desktops would support it in their builtin
      crash reporters, but at least the KDE people I talked to seemed to be happy
      with their user space only solution.
      
      Alan sayeth:
      
        I don't believe that piping as such is necessarily the right model, but
        the ability to intercept and process core dumps from user space is asked
        for by many enterprise users as well.  They want to know about, capture,
        analyse and process core dumps, often centrally and in automated form.
      
      [akpm@osdl.org: loff_t != unsigned long]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
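The overloaded core_pattern syntax described in this commit can be sketched as a small classifier. `interpret_core_pattern` and the example command path are hypothetical; only the leading '|' rule is taken from the commit message.

```python
def interpret_core_pattern(pattern):
    """Classify a core_pattern value.

    Returns ("pipe", command) when the first character is '|',
    otherwise ("file", pattern).
    """
    if pattern.startswith("|"):
        return ("pipe", pattern[1:])
    return ("file", pattern)


# The helper path below is made up for illustration.
print(interpret_core_pattern("|/usr/local/bin/summarize-core"))
print(interpret_core_pattern("core"))
```

A pattern that merely contains '|' somewhere after the first character is still treated as a file name in this sketch, which matches the "first character" wording above.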
  11. 29 Sep, 2006 2 commits
  12. 27 Sep, 2006 1 commit
  13. 26 Sep, 2006 4 commits
    • [PATCH] zone_reclaim: dynamic slab reclaim · 0ff38490
      Christoph Lameter authored
      Currently one can enable slab reclaim by setting an explicit option in
      /proc/sys/vm/zone_reclaim_mode.  Slab reclaim is then used as a final
      option if the freeing of unmapped file backed pages is not enough to free
      enough pages to allow a local allocation.
      
      However, that means that the slab can grow excessively and that most memory
      of a node may be used by slabs.  We have had a case where a machine with
      46GB of memory was using 40-42GB for slab.  Zone reclaim was effective in
      dealing with pagecache pages.  However, slab reclaim was only done during
      global reclaim (which is a bit rare on NUMA systems).
      
      This patch implements slab reclaim during zone reclaim.  Zone reclaim
      occurs if there is a danger of an off node allocation.  At that point we
      
      1. Shrink the per node page cache if the number of pagecache
         pages is more than min_unmapped_ratio percent of pages in a zone.
      
      2. Shrink the slab cache if the number of the node's reclaimable slab pages
         (patch depends on earlier one that implements that counter)
         is more than min_slab_ratio (a new /proc/sys/vm tunable).
      
      The shrinking of the slab cache is a bit problematic since it is not node
      specific.  So we simply calculate what point in the slab we want to reach
      (current per node slab use minus the number of pages that need to be
      allocated) and then repeatedly run the global reclaim until that is
      unsuccessful or we have reached the limit.  I hope we will have zone based
      slab reclaim at some point which will make that easier.
      
      The default for the min_slab_ratio is 5%
      
      Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
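The two threshold checks made during zone reclaim can be sketched as plain arithmetic. This is a hedged illustration, not the kernel implementation: `zone_reclaim_plan` is a hypothetical name, both ratios are assumed to be percentages of the pages in a zone, and the default of 1 for min_unmapped_ratio is an assumption (only the 5% default for min_slab_ratio is stated above).

```python
def zone_reclaim_plan(zone_pages, unmapped_pagecache_pages,
                      reclaimable_slab_pages,
                      min_unmapped_ratio=1, min_slab_ratio=5):
    """Decide which caches to shrink when an off-node allocation looms."""
    # Convert the percentage tunables into page counts for this zone.
    unmapped_limit = zone_pages * min_unmapped_ratio // 100
    slab_limit = zone_pages * min_slab_ratio // 100
    return {
        "shrink_pagecache": unmapped_pagecache_pages > unmapped_limit,
        "shrink_slab": reclaimable_slab_pages > slab_limit,
    }


# A zone of 1000 pages: the limits work out to 10 and 50 pages.
print(zone_reclaim_plan(1000, unmapped_pagecache_pages=5,
                        reclaimable_slab_pages=100))
```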
    • [PATCH] x86: Allow users to force a panic on NMI · 8da5adda
      Don Zickus authored
      To quote Alan Cox:
      
      The default Linux behaviour on an NMI of either memory or unknown is to
      continue operation. For many environments such as scientific computing
      it is preferable that the box is taken out and the error dealt with rather
      than an uncorrected parity/ECC error getting propagated.
      
      A small number of systems do generate NMIs for bizarre random reasons
      such as power management so the default is unchanged. In other respects
      the new proc/sys entry works like the existing panic controls already in
      that directory.
      
      This is separate to the edac support - EDAC allows supported chipsets to
      handle ECC errors well, this change allows unsupported cases to at least
      panic rather than cause problems further down the line.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Add ability to enable/disable nmi watchdog with sysctl · 407984f1
      Don Zickus authored
      Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
      watchdog.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386/x86-64: Remove un/set_nmi_callback and reserve/release_lapic_nmi functions · 2fbe7b25
      Don Zickus authored
      Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
      they are no longer needed.  The various subsystems are modified to register
      with the die_notifier instead.
      
      Also includes compile fixes by Andrew Morton.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
  14. 03 Jul, 2006 1 commit
    • [PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O · 9614634f
      Christoph Lameter authored
      It turns out that it is advantageous to leave a small portion of unmapped file
      backed pages if all of a zone's pages (or almost all pages) are allocated and
      so the page allocator has to go off-node.
      
      This allows recently used file I/O buffers to stay on the node and
      reduces the times that zone reclaim is invoked if file I/O occurs
      when we run out of memory in a zone.
      
      The problem is that zone reclaim runs too frequently when the page cache is
      used for file I/O (read write and therefore unmapped pages!) alone and we have
      almost all pages of the zone allocated.  Zone reclaim may remove 32 unmapped
      pages.  File I/O will use these pages for the next read/write requests and the
      unmapped pages increase.  After the zone has filled up again zone reclaim will
      remove it again after only 32 pages.  This cycle is too inefficient and there
      are potentially too many zone reclaim cycles.
      
      With the 1% boundary we may still remove all unmapped pages for file I/O in
      a zone reclaim pass.  However, it will take a large number of reads and writes
      to get back to 1% again where we trigger zone reclaim again.
      
      The zone reclaim 2.6.16/17 does not show this behavior because we have a 30
      second timeout.
      
      [akpm@osdl.org: rename the /proc file and the variable]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  15. 30 Jun, 2006 2 commits
  16. 27 Jun, 2006 2 commits
    • [PATCH] pi-futex: rt mutex core · 23f78d4a
      Ingo Molnar authored
      Core functions for the rt-mutex subsystem.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vdso: randomize the i386 vDSO by moving it into a vma · e6e5494c
      Ingo Molnar authored
      Move the i386 VDSO down into a vma and thus randomize it.
      
      Besides the security implications, this feature also helps debuggers, which
      can COW a vma-backed VDSO just like a normal DSO and can thus do
      single-stepping and other debugging features.
      
      It's good for hypervisors (Xen, VMWare) too, which typically live in the same
      high-mapped address space as the VDSO, hence whenever the VDSO is used, they
      get lots of guest pagefaults and have to fix such guest accesses up - which
      slows things down instead of speeding things up (the primary purpose of the
      VDSO).
      
      There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
      for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
      distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
      it off is also recommended for security reasons: attackers cannot use the
      predictable high-mapped VDSO page as syscall trampoline anymore.
      
      There is a new vdso=[0|1] boot option as well, and a runtime
      /proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
      on/off.
      
      (This version of the VDSO-randomization patch also has working ELF
      coredumping, the previous patch crashed in the coredumping code.)
      
      This code is a combined work of the exec-shield VDSO randomization
      code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
      started this patch and i completed it.
      
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: compile fix]
      [akpm@osdl.org: compile fix 2]
      [akpm@osdl.org: compile fix 3]
      [akpm@osdl.org: revert MAXMEM change]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  17. 26 Jun, 2006 1 commit
  18. 25 Jun, 2006 1 commit