1. 16 Oct, 2007 1 commit
  2. 15 Oct, 2007 4 commits
  3. 13 Oct, 2007 1 commit
    • Al Viro's avatar
      minimal build fixes for uml (fallout from x86 merge) · 2b8232ce
      Al Viro authored
       a) include/asm-um/arch can't just point to include/asm-$(SUBARCH) now
       b) arch/{i386,x86_64}/crypto are merged now
       c) subarch-obj needed changes
       d) cpufeature_64.h should pull "cpufeature_32.h", not <asm/cpufeature_32.h>
          since it can be included from asm-um/cpufeature.h
       e) in case of uml-i386 we need CONFIG_X86_32 for make and gcc, but not
          for Kconfig
       f) sysctl.c shouldn't do vdso_enabled for uml-i386 (actually, that one
          should be registered from corresponding arch/*/kernel/*, with ifdef
          going away; that's a separate patch, though).
      
      With that and with Stephen's patch ("[PATCH net-2.6] uml: hard_header fix")
      we have uml allmodconfig building both on i386 and amd64.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2b8232ce
  4. 11 Oct, 2007 1 commit
  5. 19 Sep, 2007 1 commit
    • Ingo Molnar's avatar
      sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d
      Ingo Molnar authored
      add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
      more agressive, by moving the yielding task to the last position
      in the rbtree.
      
      with sched_compat_yield=0:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
        2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop
      
      with sched_compat_yield=1:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
        2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      1799e35d
  6. 25 Aug, 2007 3 commits
  7. 19 Aug, 2007 1 commit
  8. 11 Aug, 2007 1 commit
  9. 29 Jul, 2007 1 commit
  10. 24 Jul, 2007 1 commit
  11. 22 Jul, 2007 1 commit
    • Masoud Asgharifard Sharbiani's avatar
      x86: i386-show-unhandled-signals-v3 · abd4f750
      Masoud Asgharifard Sharbiani authored
      This patch makes the i386 behave the same way that x86_64 does when a
      segfault happens.  A line gets printed to the kernel log so that tools
      that need to check for failures can behave more uniformly between
      debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 >
      /proc/sys/debug/exception-trace)
      
      Also, all of the lines being printed are now using printk_ratelimit() to
      deny the ability of DoS from a local user with a program like the
      following:
      
      main()
      {
             while (1)
                     if (!fork()) *(int *)0 = 0;
      }
      
      This new revision also includes the fix that Andrew did which got rid of
      new sysctl that was added to the system in earlier versions of this.
      Also, 'show-unhandled-signals' sysctl has been renamed back to the old
      'exception-trace' to avoid breakage of people's scripts.
      
      AK: Enabling by default for i386 will be likely controversal, but let's see what happens
      AK: Really folks, before complaining just fix your segfaults
      AK: I bet this will find a lot of silent issues
      Signed-off-by: default avatarMasoud Sharbiani <masouds@google.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      [ Personally, I've found the complaints useful on x86-64, so I'm all for
        this. That said, I wonder if we could do it more prettily..   -Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      abd4f750
  12. 19 Jul, 2007 5 commits
    • Andrew Morton's avatar
      kernel/sysctl.c: finish off the warning comments · ed2c12f3
      Andrew Morton authored
      I've been chasing these comments around this file all week.  Hopefully we're
      straight now.
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ed2c12f3
    • Peter Zijlstra's avatar
      lockstat: core infrastructure · f20786ff
      Peter Zijlstra authored
      Introduce the core lock statistics code.
      
      Lock statistics provides lock wait-time and hold-time (as well as the count
      of corresponding contention and acquisitions events). Also, the first few
      call-sites that encounter contention are tracked.
      
      Lock wait-time is the time spent waiting on the lock. This provides insight
      into the locking scheme, that is, a heavily contended lock is indicative of
      a too coarse locking scheme.
      
      Lock hold-time is the duration the lock was held, this provides a reference for
      the wait-time numbers, so they can be put into perspective.
      
        1)
          lock
        2)
          ... do stuff ..
          unlock
        3)
      
      The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
      hold-time.
      
      The lockdep held-lock tracking code is reused, because it already collects locks
      into meaningful groups (classes), and because it is an existing infrastructure
      for lock instrumentation.
      
      Currently lockdep tracks lock acquisition with two hooks:
      
        lock()
          lock_acquire()
          _lock()
      
       ... code protected by lock ...
      
        unlock()
          lock_release()
          _unlock()
      
      We need to extend this with two more hooks, in order to measure contention.
      
        lock_contended() - used to measure contention events
        lock_acquired()  - completion of the contention
      
      These are then placed the following way:
      
        lock()
          lock_acquire()
          if (!_try_lock())
            lock_contended()
            _lock()
            lock_acquired()
      
       ... do locked stuff ...
      
        unlock()
          lock_release()
          _unlock()
      
      (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
             dependent lock primitive implementations.)
      
      It is also possible to toggle the two lockdep features at runtime using:
      
        /proc/sys/kernel/prove_locking
        /proc/sys/kernel/lock_stat
      
      (esp. turning off the O(n^2) prove_locking functionaliy can help)
      
      [akpm@linux-foundation.org: build fixes]
      [akpm@linux-foundation.org: nuke unneeded ifdefs]
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarJason Baron <jbaron@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f20786ff
    • Kawai, Hidehiro's avatar
      coredump masking: bound suid_dumpable sysctl · 76fdbb25
      Kawai, Hidehiro authored
      This patch series is version 5 of the core dump masking feature, which
      controls which VMAs should be dumped based on their memory types and
      per-process flags.
      
      I adopted most of Andrew's suggestion at the previous version.  He also
      suggested using system call instead of /proc/<pid>/ interface, I decided to
      use the latter continuously because adding new system call with pid argument
      will give a big impact on the kernel.
      
      You can access the per-process flags via /proc/<pid>/coredump_filter
      interface.  coredump_filter represents a bitmask of memory types, and if a bit
      is set, VMAs of corresponding memory type are written into a core file when
      the process is dumped.  The bitmask is inherited from the parent process when
      a process is created.
      
      The original purpose is to avoid longtime system slowdown when a number of
      processes which share a huge shared memory are dumped at the same time.  To
      achieve this purpose, this patch series adds an ability to suppress dumping
      anonymous shared memory for specified processes.  In this version, three other
      memory types are also supported.
      
      Here are the coredump_filter bits:
        bit 0: anonymous private memory
        bit 1: anonymous shared memory
        bit 2: file-backed private memory
        bit 3: file-backed shared memory
      
      The default value of coredump_filter is 0x3.  This means the new core dump
      routine has the same behavior as conventional behavior by default.
      
      In this version, coredump_filter bits and mm.dumpable are merged into
      mm.flags, and it is accessed by atomic bitops.
      
      The supported core file formats are ELF and ELF-FDPIC.  ELF has been tested,
      but ELF-FDPIC has not been built and tested because I don't have the test
      environment.
      
      This patch limits a value of suid_dumpable sysctl to the range of 0 to 2.
      Signed-off-by: default avatarHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      76fdbb25
    • Peter Zijlstra's avatar
      audit: rework execve audit · bdf4c48a
      Peter Zijlstra authored
      The purpose of audit_bprm() is to log the argv array to a userspace daemon at
      the end of the execve system call.  Since user-space hasn't had time to run,
      this array is still in pristine state on the process' stack; so no need to
      copy it, we can just grab it from there.
      
      In order to minimize the damage to audit_log_*() copy each string into a
      temporary kernel buffer first.
      
      Currently the audit code requires that the full argument vector fits in a
      single packet.  So currently it does clip the argv size to a (sysctl) limit,
      but only when execve auditing is enabled.
      
      If the audit protocol gets extended to allow for multiple packets this check
      can be removed.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarOllie Wild <aaw@google.com>
      Cc: <linux-audit@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bdf4c48a
    • Pavel Machek's avatar
      PM: Integrate beeping flag with existing acpi_sleep flags · 77afcf78
      Pavel Machek authored
      Move "debug during resume from s2ram" into the variable we already use
      for real-mode flags to simplify code. It also closes nasty trap for
      the user in acpi_sleep_setup; order of parameters actually mattered there,
      acpi_sleep=s3_bios,s3_mode doing something different from
      acpi_sleep=s3_mode,s3_bios.
      Signed-off-by: default avatarPavel Machek <pavel@suse.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      77afcf78
  13. 18 Jul, 2007 1 commit
    • Jeremy Fitzhardinge's avatar
      Add common orderly_poweroff() · 10a0a8d4
      Jeremy Fitzhardinge authored
      Various pieces of code around the kernel want to be able to trigger an
      orderly poweroff.  This pulls them together into a single
      implementation.
      
      By default the poweroff command is /sbin/poweroff, but it can be set
      via sysctl: kernel/poweroff_cmd.  This is split at whitespace, so it
      can include command-line arguments.
      
      This patch replaces four other instances of invoking either "poweroff"
      or "shutdown -h now": two sbus drivers, and acpi thermal
      management.
      
      sparc64 has its own "powerd"; still need to determine whether it should
      be replaced by orderly_poweroff().
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Acked-by: default avatarLen Brown <lenb@kernel.org>
      Signed-off-by: default avatarChris Wright <chrisw@sous-sol.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David S. Miller <davem@davemloft.net>
      10a0a8d4
  14. 17 Jul, 2007 2 commits
  15. 16 Jul, 2007 4 commits
    • Linus Torvalds's avatar
      Remove duplicate comments from sysctl.c · 7144521f
      Linus Torvalds authored
      Randy Dunlap noticed that the recent comment clarifications from Andrew
      had somehow gotten duplicated.  Quoth Andrew: "hm, that could have been
      some late-night reject-fixing."
      
      Fix it up.
      
      Cc: From: Andrew Morton <akpm@linux-foundation.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7144521f
    • Andrew Morton's avatar
      sysctl.c: add text telling people to use CTL_UNNUMBERED · 2be7fe07
      Andrew Morton authored
      Hopefully this will help people to understand the new regime.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2be7fe07
    • Ingo Molnar's avatar
      vdso: print fatal signals · 45807a1d
      Ingo Molnar authored
      Add the print-fatal-signals=1 boot option and the
      /proc/sys/kernel/print-fatal-signals runtime switch.
      
      This feature prints some minimal information about userspace segfaults to
      the kernel console.  This is useful to find early bootup bugs where
      userspace debugging is very hard.
      
      Defaults to off.
      
      [akpm@linux-foundation.org: Don't add new sysctl numbers]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarArjan van de Ven <arjan@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      45807a1d
    • KAMEZAWA Hiroyuki's avatar
      change zonelist order: zonelist order selection logic · f0c0b2b8
      KAMEZAWA Hiroyuki authored
      Make zonelist creation policy selectable from sysctl/boot option v6.
      
      This patch makes NUMA's zonelist (of pgdat) order selectable.
      Available order are Default(automatic)/ Node-based / Zone-based.
      
      [Default Order]
      The kernel selects Node-based or Zone-based order automatically.
      
      [Node-based Order]
      This policy treats the locality of memory as the most important parameter.
      Zonelist order is created by each zone's locality. This means lower zones
      (ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion.
      IOW. ZONE_DMA will be in the middle of zonelist.
      current 2.6.21 kernel uses this.
      
      Pros.
       * A user can expect local memory as much as possible.
      Cons.
       * lower zone will be exhansted before higher zone. This may cause OOM_KILL.
      
      Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL
      because of ZONE_DMA exhaution and you need the best locality.
      
      (example)
      assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      [Zone-based order]
      This policy treats the zone type as the most important parameter.
      Zonelist order is created by zone-type order. This means lower zone
      never be used bofere higher zone exhaustion.
      IOW. ZONE_DMA will be always at the tail of zonelist.
      
      Pros.
       * OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted.
      Cons.
       * memory locality may not be best.
      
      (example)
      assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      bootoption "numa_zonelist_order=" and proc/sysctl is supporetd.
      
      command:
      %echo N > /proc/sys/vm/numa_zonelist_order
      
      Will rebuild zonelist in Node-based order.
      
      command:
      %echo Z > /proc/sys/vm/numa_zonelist_order
      
      Will rebuild zonelist in Zone-based order.
      
      Thanks to Lee Schermerhorn, he gives me much help and codes.
      
      [Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com>
      Signed-off-by: default avatarLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f0c0b2b8
  16. 11 Jul, 2007 1 commit
    • Eric Paris's avatar
      security: Protection for exploiting null dereference using mmap · ed032189
      Eric Paris authored
      Add a new security check on mmap operations to see if the user is attempting
      to mmap to low area of the address space.  The amount of space protected is
      indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
      0, preserving existing behavior.
      
      This patch uses a new SELinux security class "memprotect."  Policy already
      contains a number of allow rules like a_t self:process * (unconfined_t being
      one of them) which mean that putting this check in the process class (its
      best current fit) would make it useless as all user processes, which we also
      want to protect against, would be allowed. By taking the memprotect name of
      the new class it will also make it possible for us to move some of the other
      memory protect permissions out of 'process' and into the new class next time
      we bump the policy version number (which I also think is a good future idea)
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarChris Wright <chrisw@sous-sol.org>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      ed032189
  17. 09 Jul, 2007 1 commit
  18. 17 May, 2007 1 commit
  19. 09 May, 2007 1 commit
  20. 08 May, 2007 1 commit
    • Kees Cook's avatar
      proc: maps protection · 5096add8
      Kees Cook authored
      The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
      information about the memory location and usage of processes.  Issues:
      
      - maps should not be world-readable, especially if programs expect any
        kind of ASLR protection from local attackers.
      - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
        check the maps when %n is in a *printf call, and a setuid(getuid())
        process wouldn't be able to read its own maps file.  (For reference
        see http://lkml.org/lkml/2006/1/22/150)
      - a system-wide toggle is needed to allow prior behavior in the case of
        non-root applications that depend on access to the maps contents.
      
      This change implements a check using "ptrace_may_attach" before allowing
      access to read the maps contents.  To control this protection, the new knob
      /proc/sys/kernel/maps_protect has been added, with corresponding updates to
      the procfs documentation.
      
      [akpm@linux-foundation.org: build fixes]
      [akpm@linux-foundation.org: New sysctl numbers are old hat]
      Signed-off-by: default avatarKees Cook <kees@outflux.net>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5096add8
  21. 24 Apr, 2007 1 commit
  22. 04 Mar, 2007 1 commit
  23. 01 Mar, 2007 1 commit
  24. 14 Feb, 2007 4 commits