1. 14 Apr, 2010 1 commit
      Input: implement SysRq as a separate input handler · 97f5f0cd
      Dmitry Torokhov authored
      Instead of keeping SysRq support inside the legacy keyboard driver, split
      it out into a separate input handler (filter). This stops most SysRq input
      events from leaking into evdev clients (some events, such as first SysRq
      scancode - not keycode - event, are still leaked into both legacy keyboard
      and evdev).
      
      [martinez.javier@gmail.com: fix compile error when CONFIG_MAGIC_SYSRQ is
       not defined]
      Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
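The filter idea in the commit above can be modelled in user space. The following is a minimal sketch, not the kernel's implementation: the key codes, the `sysrq_filter_state` struct and the `sysrq_filter()` function are all hypothetical names invented for illustration. It only shows the contract the message describes, where a filter returns "consumed" for events it handles so that other clients never see them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical key codes for this sketch; not the kernel's KEY_* values. */
enum { SK_KEY_ALT = 1, SK_KEY_SYSRQ = 2, SK_KEY_L = 100 };

struct sysrq_filter_state {
    bool alt_down;      /* Alt is currently held */
    bool chord_active;  /* Alt+SysRq chord is in progress */
    int last_command;   /* last key taken as a SysRq command, -1 if none */
};

/*
 * Returns true when the filter consumed the event (other handlers must not
 * see it) and false when the event should propagate as usual.
 * value: 1 = key press, 0 = key release.
 */
static bool sysrq_filter(struct sysrq_filter_state *st, int code, int value)
{
    assert(st);

    if (code == SK_KEY_ALT) {
        st->alt_down = (value != 0);
        if (!st->alt_down)
            st->chord_active = false;   /* releasing Alt ends the chord */
        return false;                   /* Alt itself always propagates */
    }

    if (code == SK_KEY_SYSRQ) {
        if (value && st->alt_down)
            st->chord_active = true;
        return st->chord_active;
    }

    if (st->chord_active) {
        if (value)
            st->last_command = code;    /* a real handler would dispatch here */
        return true;                    /* swallow keys typed during the chord */
    }

    return false;
}
```

In this model a plain key press propagates, while the same key pressed during an Alt+SysRq chord is recorded as a command and swallowed.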
  2. 30 Mar, 2010 1 commit
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  3. 16 Dec, 2009 1 commit
      oom-kill: fix NUMA constraint check with nodemask · 4365a567
      KAMEZAWA Hiroyuki authored
      Fix node-oriented allocation handling in oom-kill.c. I think of this
      as a bugfix, not as an enhancement.

      These days, things have changed:
        - alloc_pages() takes a nodemask as an argument, via __alloc_pages_nodemask().
        - mempolicy doesn't maintain its own private zonelists.
        (And cpuset doesn't use nodemask for __alloc_pages_nodemask())

      So the current oom-killer's check function is wrong.
      
      This patch does
        - check nodemask, if nodemask && nodemask doesn't cover all
          node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY.
        - Scan all zonelists under the nodemask; if the scan hits the cpuset's
          wall, this failure is from the cpuset.
      And
        - modifies the caller of out_of_memory not to call oom if __GFP_THISNODE.
          This doesn't change "current" behavior. If callers use __GFP_THISNODE
          it should handle "page allocation failure" by itself.
      
        - handle __GFP_NOFAIL+__GFP_THISNODE path.
          This is something like a FIXME but this gfpmask is not used now.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
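The two checks listed in the commit above can be illustrated with plain bitmasks. This is a rough sketch under stated assumptions, not the code in oom-kill.c: nodes are modelled as bits in an `unsigned long`, `classify_oom()` is a hypothetical function, and the enum values other than `CONSTRAINT_MEMORY_POLICY` (which the message names) are assumed for illustration.

```c
#include <assert.h>

enum oom_constraint {
    CONSTRAINT_NONE,            /* assumed name: no node restriction found */
    CONSTRAINT_CPUSET,          /* assumed name: the cpuset caused the failure */
    CONSTRAINT_MEMORY_POLICY,   /* named in the commit message */
};

/*
 * nodemask:     nodes the allocation was restricted to (0 means no nodemask)
 * memory_nodes: nodes that have memory; stands in for
 *               node_states[N_HIGH_MEMORY]
 * cpuset_nodes: nodes the current cpuset allows
 */
static enum oom_constraint classify_oom(unsigned long nodemask,
                                        unsigned long memory_nodes,
                                        unsigned long cpuset_nodes)
{
    unsigned long scanned;

    assert(memory_nodes != 0);

    /* A nodemask that does not cover every memory node: mempolicy limit. */
    if (nodemask && (nodemask & memory_nodes) != memory_nodes)
        return CONSTRAINT_MEMORY_POLICY;

    /* Scan the candidate nodes; one outside the cpuset means a cpuset limit. */
    scanned = nodemask ? (nodemask & memory_nodes) : memory_nodes;
    if (scanned & ~cpuset_nodes)
        return CONSTRAINT_CPUSET;

    return CONSTRAINT_NONE;
}
```

For example, with two memory nodes (`0x3`), a nodemask of `0x1` is classified as a memory-policy constraint, while no nodemask and a cpuset of `0x1` is classified as a cpuset constraint.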
  4. 21 Sep, 2009 1 commit
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has outgrown its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks go to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
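The order of the substitutions in the first sed invocation above matters: `PERF_EVENT_` has to become `PERF_RECORD_` before `PERF_COUNTER` becomes `PERF_EVENT`, otherwise freshly renamed identifiers would be renamed a second time. The sketch below reproduces that effect in C. The `replace_all()` helper and the two `rename_*` wrappers are hypothetical, and the sample identifiers in the usage note are only illustrative inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Replace every occurrence of `from` in `buf` with `to`, in place.
 * `cap` is the capacity of `buf` including the terminating NUL.
 * Returns 0 on success, -1 if `from` is empty or the result would not fit.
 */
static int replace_all(char *buf, size_t cap, const char *from, const char *to)
{
    size_t from_len, to_len;
    char *p;

    assert(buf && from && to);
    from_len = strlen(from);
    to_len = strlen(to);
    if (from_len == 0)
        return -1;

    p = buf;
    while ((p = strstr(p, from)) != NULL) {
        size_t total = strlen(buf);
        size_t tail = strlen(p + from_len);

        if (total - from_len + to_len + 1 > cap)
            return -1;
        /* Shift the tail (with its NUL), then drop the replacement in. */
        memmove(p + to_len, p + from_len, tail + 1);
        memcpy(p, to, to_len);
        p += to_len;    /* continue after the replacement, like sed's /g */
    }
    return 0;
}

/* Same order as the first two expressions of the script. */
static int rename_in_order(char *buf, size_t cap)
{
    if (replace_all(buf, cap, "PERF_EVENT_", "PERF_RECORD_"))
        return -1;
    return replace_all(buf, cap, "PERF_COUNTER", "PERF_EVENT");
}

/* Reversed order, shown only to demonstrate the double rename. */
static int rename_reversed(char *buf, size_t cap)
{
    if (replace_all(buf, cap, "PERF_COUNTER", "PERF_EVENT"))
        return -1;
    return replace_all(buf, cap, "PERF_EVENT_", "PERF_RECORD_");
}
```

Given the input `PERF_EVENT_MMAP PERF_COUNTER_IOC_ENABLE`, the in-order variant keeps the two namespaces apart, while the reversed variant turns both identifiers into `PERF_RECORD_` names.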
  5. 03 Aug, 2009 1 commit
      debug lockups: Improve lockup detection, fix generic arch fallback · 47cab6a7
      Ingo Molnar authored
      As Andrew noted, my previous patch ("debug lockups: Improve lockup
      detection") broke/removed SysRq-L support from architectures that do
      not provide a __trigger_all_cpu_backtrace implementation.
      
      Restore a fallback path and clean up the SysRq-L machinery a bit:
      
       - Rename the arch method to arch_trigger_all_cpu_backtrace()
      
       - Simplify the define
      
       - Document the method a bit - in the hope of more architectures
         adding support for it.
      
      [ The patch touches Sparc code for the rename. ]
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      LKML-Reference: <20090802140809.7ec4bb6b.akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
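One common way to provide such a fallback is to let an architecture announce its method through a preprocessor define and have generic code test for it. The sketch below illustrates that pattern only; the `SKETCH_ARCH_HAS_BACKTRACE` switch, the counters and `sysrq_show_all_cpus()` are hypothetical, and the actual header layout in the kernel is not shown in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

static int arch_calls;      /* calls into the simulated arch method */
static int fallback_calls;  /* calls into the simulated generic path */

/*
 * An architecture with support would provide the function and a define of
 * the same name. Build with -DSKETCH_ARCH_HAS_BACKTRACE to simulate one.
 */
#ifdef SKETCH_ARCH_HAS_BACKTRACE
static void arch_trigger_all_cpu_backtrace(void)
{
    arch_calls++;
}
#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
#endif

/* Generic wrapper: reports whether an arch method was available. */
static bool trigger_all_cpu_backtrace(void)
{
#ifdef arch_trigger_all_cpu_backtrace
    arch_trigger_all_cpu_backtrace();
    return true;
#else
    return false;
#endif
}

/* Caller side: use the arch method if present, else the fallback. */
static void sysrq_show_all_cpus(void)
{
    if (!trigger_all_cpu_backtrace())
        fallback_calls++;   /* stands in for the slower generic path */
}
```

Built without the switch, every call to `sysrq_show_all_cpus()` takes the fallback path, which is the behaviour the commit restores for architectures lacking the method.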
  6. 02 Aug, 2009 1 commit
      debug lockups: Improve lockup detection · c1dc0b9c
      Ingo Molnar authored
      When debugging a recent lockup bug I found various deficiencies
      in how our current lockup detection helpers work:
      
       - SysRq-L is not very efficient as it uses a workqueue, hence
         it cannot punch through hard lockups and cannot see through
         most soft lockups either.
      
       - The SysRq-L code depends on the NMI watchdog - which is off
         by default.
      
       - We don't print backtraces from the RCU code's built-in
         'RCU state machine is stuck' debug code. This debug
         code tends to be one of the first (and only) mechanisms
         that show that a lockup has occurred.

      This patch changes the code so that we:
      
       - Trigger the NMI backtrace code from SysRq-L instead of using
         a workqueue (which cannot punch through hard lockups)
      
       - Trigger print-all-CPU-backtraces from the RCU lockup detection
         code
      
      Also decouple the backtrace printing code from the NMI watchdog:
      
       - Don't use variable-size cpumasks (they might not be initialized
         and they are a bit more fragile anyway)
      
       - Trigger an NMI immediately via an IPI, instead of waiting
         for the NMI tick to occur. This is a lot faster and can
         produce more relevant backtraces. It will also work if the
         NMI watchdog is disabled.
      
       - Don't print the 'dazed and confused' message when we print
         a backtrace from the NMI
      
       - Do a show_regs() plus a dump_stack() to get maximum info
         out of the dump. Worst-case we get two stacktraces - which
         is not a big deal. Sometimes, if register content is
         corrupted, the precise stack walker in show_regs() won't
         give us a full backtrace - in this case dump_stack() will
         do it.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 29 Jul, 2009 1 commit
      sysrq, kdump: make sysrq-c consistent · cab8bd34
      Hidetoshi Seto authored
      Commit d6580a9f ("kexec: sysrq: simplify sysrq-c handler") changed the
      behavior of sysrq-c to an unconditional dereference of a NULL pointer.
      So in cases with CONFIG_KEXEC, where crash_kexec() was directly called
      from sysrq-c before, now it can be said that a step of "real oops" was
      inserted before starting kdump.
      
      However, in contrast to an oops via SysRq-c from the keyboard, which results
      in a panic due to in_interrupt(), an oops via "echo c > /proc/sysrq-trigger"
      will not become a panic unless panic_on_oops=1.  It means that even if a dump
      is properly configured to be taken on panic, sysrq-c from the proc interface
      might not start a crashdump while sysrq-c from the keyboard can.  This
      confuses traditional users of kdump, i.e. people who expect sysrq-c to
      behave the same way from both the keyboard and the proc interface.
      
      This patch brings the keyboard and proc interface behavior of sysrq-c in
      line, by forcing panic_on_oops=1 before oops in sysrq-c handler.
      
      Some documentation updates are also included, to clarify that there is
      no longer a dependency on CONFIG_KEXEC, and that now the system can just
      crash by sysrq-c if no dump mechanism is configured.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Brayan Arraes <brayan@yack.com.br>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
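The inconsistency described above, and the fix, can be captured in a tiny model. This is only a sketch of the rule stated in the commit message (an oops becomes a panic in interrupt context or when `panic_on_oops` is set); `model_oops()`, `sysrq_c_before()` and `sysrq_c_after()` are hypothetical names, and nothing here dereferences a NULL pointer or crashes.

```c
#include <assert.h>
#include <stdbool.h>

enum crash_outcome { OUTCOME_OOPS_ONLY, OUTCOME_PANIC };

static int panic_on_oops;   /* models the kernel tunable of the same name */

/* An oops escalates to a panic in interrupt context or if the tunable is set. */
static enum crash_outcome model_oops(bool in_interrupt)
{
    return (in_interrupt || panic_on_oops) ? OUTCOME_PANIC
                                           : OUTCOME_OOPS_ONLY;
}

/* Before the patch: the outcome depends on where sysrq-c came from. */
static enum crash_outcome sysrq_c_before(bool in_interrupt)
{
    return model_oops(in_interrupt);
}

/* After the patch: force panic_on_oops=1 first, so both paths panic. */
static enum crash_outcome sysrq_c_after(bool in_interrupt)
{
    panic_on_oops = 1;
    return model_oops(in_interrupt);
}
```

Here `in_interrupt == true` stands for the keyboard path and `false` for a write to /proc/sysrq-trigger; only the "after" variant panics on both.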
  8. 18 Jun, 2009 1 commit
      kexec: sysrq: simplify sysrq-c handler · d6580a9f
      Neil Horman authored
      Currently the sysrq-c handler is a bit over-engineered.  Its behavior
      depends on a few compile-time and run-time factors, which is really
      unnecessary.

      If CONFIG_KEXEC is not configured, sysrq-c crashes the system with a NULL
      pointer dereference.  If CONFIG_KEXEC is configured, it calls crash_kexec
      directly, which implies that the kexec kernel will either be booted (if
      it's been previously loaded), or it will simply do nothing (if no kexec
      kernel has been loaded).
      
      It would be much easier to just simplify the whole thing to dereference a
      NULL pointer all the time regardless of configuration.  That way, it will
      always try to crash the system, and if a kexec kernel has been loaded into
      reserved space, it will still boot from the page fault trap handler
      (assuming panic_on_oops is set appropriately).
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Brayan Arraes <brayan@yack.com.br>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 15 May, 2009 1 commit
  10. 13 Apr, 2009 1 commit
  11. 01 Apr, 2009 1 commit
      filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystems · c2d75438
      Eric Sandeen authored
      Now that the filesystem freeze operation has been elevated to the VFS, and
      is just an ioctl away, some sort of safety net for unintentionally frozen
      root filesystems may be in order.
      
      The timeout thaw originally proposed did not get merged, but perhaps
      something like this would be useful in emergencies.
      
      For example, freeze /path/to/mountpoint may freeze your root filesystem if
      you forgot that you had that unmounted.
      
      I chose 'j' as the last remaining character other than 'h' which is sort
      of reserved for help (because help is generated on any unknown character).
      
      I've tested this on a non-root fs with multiple (nested) freezers, as well
      as on a system rendered unresponsive due to a frozen root fs.
      
      [randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Cc: Takashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
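The remark that help "is generated on any unknown character" describes a lookup table with a default. A minimal sketch of that dispatch shape follows; the table contents, the `sysrq_action` enum and `sysrq_lookup()` are hypothetical and only include keys mentioned in these commit messages ('c', 'j', 'l').

```c
#include <assert.h>
#include <stddef.h>

enum sysrq_action {
    SYSRQ_HELP,             /* default for any key without an entry */
    SYSRQ_CRASH,            /* 'c' */
    SYSRQ_THAW,             /* 'j': emergency thaw of frozen filesystems */
    SYSRQ_SHOW_ALL_CPUS,    /* 'l' */
};

struct sysrq_key_op {
    char key;
    enum sysrq_action action;
};

/* Illustrative subset of a key table; not the kernel's actual table. */
static const struct sysrq_key_op sysrq_table[] = {
    { 'c', SYSRQ_CRASH },
    { 'j', SYSRQ_THAW },
    { 'l', SYSRQ_SHOW_ALL_CPUS },
};

/* Look up a key; unknown keys (including 'h') fall through to help. */
static enum sysrq_action sysrq_lookup(char key)
{
    size_t i;

    for (i = 0; i < sizeof(sysrq_table) / sizeof(sysrq_table[0]); i++)
        if (sysrq_table[i].key == key)
            return sysrq_table[i].action;

    return SYSRQ_HELP;
}
```

Because help is the fallback, 'h' needs no entry of its own, which is why the message calls it only "sort of reserved".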
  12. 31 Mar, 2009 1 commit
  13. 26 Jan, 2009 1 commit
  14. 15 Jan, 2009 1 commit
  15. 06 Jan, 2009 1 commit
  16. 26 Dec, 2008 1 commit
  17. 08 Dec, 2008 1 commit
      performance counters: core code · 0793a61d
      Thomas Gleixner authored
      Implement the core kernel bits of Performance Counters subsystem.
      
      The Linux Performance Counter subsystem provides an abstraction of
      performance counter hardware capabilities. It provides per task and per
      CPU counters, and it provides event capabilities on top of those.
      
      Performance counters are accessed via special file descriptors.
      There's one file descriptor per virtual counter used.
      
      The special file descriptor is opened via the perf_counter_open()
      system call:
      
       int
       perf_counter_open(u32 hw_event_type,
                         u32 hw_event_period,
                         u32 record_type,
                         pid_t pid,
                         int cpu);
      
      The syscall returns the new fd. The fd can be used via the normal
      VFS system calls: read() can be used to read the counter, fcntl()
      can be used to set the blocking mode, etc.
      
      Multiple counters can be kept open at a time, and the counters
      can be poll()ed.
      
      See more details in Documentation/perf-counters.txt.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 04 Nov, 2008 1 commit
  19. 20 Oct, 2008 3 commits
  20. 16 Oct, 2008 2 commits
  21. 27 Jun, 2008 1 commit
  22. 20 May, 2008 1 commit
      sparc64: Add global register dumping facility. · 93dae5b7
      David S. Miller authored
      When a cpu really is stuck in the kernel, it can often be
      impossible to figure out which cpu is stuck where.  The
      worst case is when the stuck cpu has interrupts disabled.
      
      Therefore, implement a global cpu state capture that uses
      SMP message interrupts which are not disabled by the
      normal IRQ enable/disable APIs of the kernel.
      
      As long as we can get a sysrq 'y' to the kernel, we can
      get a dump.  Even if the console interrupt cpu is wedged,
      we can trigger it from userspace using /proc/sysrq-trigger
      
      The output is made compact so that this facility is more
      useful on high cpu count systems, which is where this
      facility will likely find itself the most useful :)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 29 Apr, 2008 1 commit
      sysrq: add show-backtrace-on-all-cpus function · 5045bcae
      Rik van Riel authored
      SysRQ-P is not always useful on SMP systems, since it usually ends up showing
      the backtrace of a CPU that is doing just fine, instead of the backtrace of
      the CPU that is having problems.
      
      This patch adds SysRQ show-all-cpus(L), which shows the backtrace of every
      active CPU in the system.  It skips idle CPUs because some SMP systems are
      just too large and we already know what the backtrace of the idle task looks
      like.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Randy Dunlap <randy.dunlap@oracle.com>
      Cc: <lwoodman@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 28 Apr, 2008 1 commit
  25. 19 Oct, 2007 1 commit
      pid namespaces: define is_global_init() and is_container_init() · b460cbc5
      Serge E. Hallyn authored
      is_init() is an ambiguous name for the pid==1 check.  Split it into
      is_global_init() and is_container_init().
      
      A cgroup init has its tsk->pid == 1.

      A global init also has its tsk->pid == 1 and its active pid namespace
      is the init_pid_ns.  But rather than check the active pid namespace,
      compare the task structure with 'init_pid_ns.child_reaper', which is
      initialized during boot to the /sbin/init process and never changes.
      
      Changelog:
      
      	2.6.22-rc4-mm2-pidns1:
      	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
      	  global init (/sbin/init) process. This would improve performance
      	  and remove dependence on the task_pid().
      
      	2.6.21-mm2-pidns2:
      
      	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
      	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
      	  This way, we kill only the cgroup if the cgroup's init has a
      	  bug rather than force a kernel panic.
      
      [akpm@linux-foundation.org: fix comment]
      [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
      [bunk@stusta.de: kernel/pid.c: remove unused exports]
      [sukadev@us.ibm.com: Fix capability.c to work with threaded init]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: Pavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
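The distinction between the two predicates can be shown with a small user-space model. This is a sketch that follows the wording of the commit message rather than the kernel source: `struct task` and `struct pid_namespace` are reduced to the single field each check needs, and `pid` here means the pid as the message uses it (`tsk->pid`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
    int pid;
};

struct pid_namespace {
    struct task *child_reaper;  /* set once at boot to the /sbin/init task */
};

static struct pid_namespace init_pid_ns = { NULL };

/* Global init: identified by pointer comparison, not by pid value. */
static bool is_global_init(const struct task *tsk)
{
    assert(tsk);
    return tsk == init_pid_ns.child_reaper;
}

/* Container (cgroup) init: any task whose pid is 1. */
static bool is_container_init(const struct task *tsk)
{
    assert(tsk);
    return tsk->pid == 1;
}
```

Two tasks can both have pid 1 in this model, but only the one recorded as `init_pid_ns.child_reaper` is the global init; comparing pointers avoids any pid namespace lookup.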
  26. 17 Oct, 2007 2 commits
  27. 08 May, 2007 1 commit
  28. 16 Feb, 2007 1 commit
  29. 13 Feb, 2007 1 commit
      [PATCH] Fix SAK_work workqueue initialization. · 7f1f86a0
      Eric W. Biederman authored
      Somewhere in the rewrite of the work queues my cleanup of SAK handling
      got broken.  Maybe I didn't retest it properly or possibly the API
      was changing so fast I missed something.  Regardless, triggering a
      SAK now generates an ugly BUG_ON and kills the kernel.
      
      Thanks to Alexey Dobriyan <adobriyan@openvz.org> for spotting this.
      
      This modifies the use of SAK_work to initialize it when the data
      structure it resides in is initialized, and to simply call
      schedule_work when we need to generate a SAK.  I update both
      data structures that have a SAK_work member for consistency.
      
      All of the old PREPARE_WORK calls are now gone.

      If we call schedule_work again before the first SAK has been
      generated, the duplicate schedule_work request is simply ignored.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
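The "initialize once, then just schedule" pattern and the dropped duplicate request can be modelled with a single pending flag. This is a simplified sketch, not the kernel workqueue implementation: all `sketch_*` names and `do_sak_sketch()` are hypothetical, and there is no real queue or concurrency here.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_work {
    bool pending;                           /* already queued, not yet run */
    void (*func)(struct sketch_work *work);
};

/* Done once, when the structure that embeds the work item is set up. */
static void sketch_init_work(struct sketch_work *w,
                             void (*func)(struct sketch_work *))
{
    assert(w && func);
    w->pending = false;
    w->func = func;
}

/* Returns true if the work was queued, false if it was already pending. */
static bool sketch_schedule_work(struct sketch_work *w)
{
    if (w->pending)
        return false;       /* duplicate request: silently ignored */
    w->pending = true;
    return true;
}

/* Stands in for the worker: clear the pending flag, then run the function. */
static void sketch_run_pending(struct sketch_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    w->func(w);
}

static int sak_count;

static void do_sak_sketch(struct sketch_work *w)
{
    (void)w;
    sak_count++;
}
```

Scheduling twice before the worker runs produces one SAK in this model, and the work item can be scheduled again afterwards without re-initialization.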
  30. 11 Feb, 2007 1 commit
      [PATCH] vt: refactor console SAK processing · 8b6312f4
      Eric W. Biederman authored
      This does several things.
      - It moves looking up of the current foreground console into process
        context where we can safely take the semaphore that protects this
        operation.
      - It uses the new flavor of work queue processing.
      - This factors __do_SAK, which runs immediately, out of do_SAK.
      - This calls __do_SAK with the console semaphore held ensuring nothing
        else happens to the console while we process the SAK operation.
      - With the console SAK processing moved into process context this
        patch removes the xchg operations that I used to attempt to atomically
        update struct pid, because of the strange locking used in the SAK processing.
        With SAK using the normal console semaphore nothing special is needed.
      
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  31. 01 Feb, 2007 1 commit
  32. 13 Dec, 2006 1 commit
  33. 07 Dec, 2006 1 commit
  34. 22 Nov, 2006 1 commit
      WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells authored
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
      Signed-Off-By: David Howells <dhowells@redhat.com>
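The container_of() step mentioned above can be shown in a few lines. This is a user-space sketch: the macro is a simplified version without the kernel's type checking, `struct work_struct` is reduced to a function pointer, and `struct demo_device` and `demo_work_fn()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct {
    void (*func)(struct work_struct *work);
};

/* A driver-private structure that embeds its work item. */
struct demo_device {
    int events_handled;
    struct work_struct work;
};

/* The work function receives only the work_struct pointer... */
static void demo_work_fn(struct work_struct *work)
{
    struct demo_device *dev;

    assert(work);
    /* ...and derives its context from it instead of a separate data arg. */
    dev = container_of(work, struct demo_device, work);
    dev->events_handled++;
}
```

Calling `dev.work.func(&dev.work)` on a `struct demo_device dev` increments `dev.events_handled`, with no context pointer stored alongside the work item.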
  35. 06 Oct, 2006 1 commit
  36. 05 Oct, 2006 1 commit
      IRQ: Maintain regs pointer globally rather than passing to IRQ handlers · 7d12e780
      David Howells authored
      Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
      of passing regs around manually through all ~1800 interrupt handlers in the
      Linux kernel.
      
      The regs pointer is used in few places, but it potentially costs both stack
      space and code to pass it around.  On the FRV arch, removing the regs parameter
      from all the genirq functions results in a 20% speed up of the IRQ exit path
      (ie: from leaving timer_interrupt() to leaving do_IRQ()).
      
      Where appropriate, an arch may override the generic storage facility and do
      something different with the variable.  On FRV, for instance, the address is
      maintained in GR28 at all times inside the kernel as part of general exception
      handling.
      
      Having looked over the code, it appears that the parameter may be handed down
      through up to twenty or so layers of functions.  Consider a USB character
      device attached to a USB hub, attached to a USB controller that posts its
      interrupts through a cascaded auxiliary interrupt controller.  A character
      device driver may want to pass regs to the sysrq handler through the input
      layer which adds another few layers of parameter passing.
      
      I've built this code with allyesconfig for x86_64 and i386.  I've run-tested the
      main part of the code on FRV and i386, though I can't test most of the drivers.
      I've also done partial conversion for powerpc and MIPS - these at least compile
      with minimal configurations.
      
      This will affect all archs.  Mostly the changes should be relatively easy.
      Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
      
      	struct pt_regs *old_regs = set_irq_regs(regs);
      
      And put the old one back at the end:
      
      	set_irq_regs(old_regs);
      
      Don't pass regs through to generic_handle_irq() or __do_IRQ().
      
      In timer_interrupt(), this sort of change will be necessary:
      
      	-	update_process_times(user_mode(regs));
      	-	profile_tick(CPU_PROFILING, regs);
      	+	update_process_times(user_mode(get_irq_regs()));
      	+	profile_tick(CPU_PROFILING);
      
      I'd like to move update_process_times()'s use of get_irq_regs() into itself,
      except that i386, alone of the archs, uses something other than user_mode().
      
      Some notes on the interrupt handling in the drivers:
      
       (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
           the input_dev struct.
      
       (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
           something different depending on whether it's been supplied with a regs
           pointer or not.
      
       (*) Various IRQ handler function pointers have been moved to type
           irq_handler_t.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
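The save/restore discipline around set_irq_regs() can be sketched in user space. This model makes several assumptions: a single global slot stands in for the per-CPU variable, `struct pt_regs` is reduced to one field, and `do_IRQ_sketch()` and `demo_handler()` are hypothetical. Only the `set_irq_regs()`/`get_irq_regs()` names and the save-then-restore sequence come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs {
    unsigned long ip;   /* one register is enough for the sketch */
};

/* Single slot; the kernel keeps one such pointer per CPU. */
static struct pt_regs *irq_regs_slot = NULL;

/* Install a new regs pointer and hand back the previous one. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old_regs = irq_regs_slot;

    irq_regs_slot = new_regs;
    return old_regs;
}

static struct pt_regs *get_irq_regs(void)
{
    return irq_regs_slot;
}

static unsigned long last_seen_ip;

/* A handler no longer takes regs; it asks for them when needed. */
static void demo_handler(void)
{
    struct pt_regs *regs = get_irq_regs();

    assert(regs);
    last_seen_ip = regs->ip;
}

/* Entry point: store regs at the beginning, put the old pointer back at the end. */
static void do_IRQ_sketch(struct pt_regs *regs)
{
    struct pt_regs *old_regs = set_irq_regs(regs);

    demo_handler();
    set_irq_regs(old_regs);
}
```

Restoring the saved pointer is what keeps nested interrupts correct: after an inner `do_IRQ_sketch()` returns, `get_irq_regs()` again yields the outer context's regs.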