1. 16 Feb, 2007 1 commit
  2. 13 Feb, 2007 4 commits
    • [PATCH] i386: add idle notifier · 2ff2d3d7
      Stephane Eranian authored

      Add a notifier mechanism to the low level idle loop.  You can register a
      callback function which gets invoked on entry and exit from the low level idle
      loop.  The low level idle loop is defined as the polling loop, low-power call,
      or the mwait instruction.  Interrupts processed by the idle thread are not
      considered part of the low level loop.
      
      The notifier can be used to measure precisely how much time is spent
      in useless execution (or low power mode).  The perfmon subsystem uses
      it to turn monitoring on and off.
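      
      A minimal sketch of how a subsystem might hook this (hedged: the
      idle_notifier_register() name and the IDLE_START/IDLE_END events are
      assumptions modeled on the existing x86-64 idle notifier, not
      necessarily this patch's exact API):
      
        static int perfmon_idle_notify(struct notifier_block *nb,
                                       unsigned long action, void *data)
        {
                if (action == IDLE_START)
                        ;       /* pause monitoring while idle */
                else if (action == IDLE_END)
                        ;       /* resume monitoring */
                return NOTIFY_OK;
        }
      
        static struct notifier_block perfmon_idle_nb = {
                .notifier_call = perfmon_idle_notify,
        };
      
        /* at init time: */
        idle_notifier_register(&perfmon_idle_nb);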
      
      Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: IOPL handling for paravirt guests · 8b151144
      Zachary Amsden authored

      I found a clever way to make the extra IOPL switching invisible to
      non-paravirt compiles: kernel_rpl is statically defined to be zero
      there, and only kernels running at non-zero RPL have a problem
      restoring IOPL, since popf does not restore the IOPL field unless it
      runs at CPL 0.
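      
      A hedged sketch of the resulting guard in __switch_to() (the
      get_kernel_rpl() and set_iopl_mask() names are assumptions; with
      kernel_rpl statically zero the whole branch compiles away):
      
        /* only kernels running at non-zero RPL need the explicit
         * IOPL restore, since their popf executes above CPL 0 */
        if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
                set_iopl_mask(next->iopl);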
      
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: paravirt CPU hypercall batching mode · 9226d125
      Zachary Amsden authored

      The VMI ROM has a mode where hypercalls can be queued and batched.  This turns
      out to be a significant win during context switch, but must be done at a
      specific point before side effects to CPU state are visible to subsequent
      instructions.  This is similar to the MMU batching hooks already provided.
      The same hooks could be used by the Xen backend to implement a context switch
      multicall.
      
      To explain a bit more about lazy modes in the paravirt patches: the
      idea is that only one of lazy CPU or lazy MMU mode can be active at
      any given time.  Lazy MMU mode is similar to this lazy CPU mode and
      allows batching of multiple PTE updates (say, inside a remap loop),
      but to avoid keeping some kind of state machine about when to flush
      CPU or MMU updates, we just allow one or the other to be active.
      Although there is no real reason a more comprehensive scheme could
      not be implemented, there is also no demonstrated need for this extra
      complexity.
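      
      A sketch of what the batched context-switch path might look like; the
      arch_enter_lazy_cpu_mode()/arch_leave_lazy_cpu_mode() names follow
      the existing MMU-batching pattern and are assumptions here:
      
        arch_enter_lazy_cpu_mode();     /* start queueing hypercalls */
        load_esp0(tss, next);           /* CPU state updates that the */
        load_TLS(next, cpu);            /* hypervisor may now batch   */
        arch_leave_lazy_cpu_mode();     /* flush the queue before the
                                           effects must be visible */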
      
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: Convert i386 PDA code to use %fs · 464d1a78
      Jeremy Fitzhardinge authored

      Convert the PDA code to use %fs rather than %gs as the segment for
      per-processor data.  This is because some processors show a small but
      measurable performance gain for reloading a NULL segment selector (as %fs
      generally is in user-space) versus a non-NULL one (as %gs generally is).
      
      On modern processors the difference is very small, perhaps
      undetectable.  Some old AMD "K6 3D+" processors are noticeably slower
      when %fs is used rather than %gs; I have no idea why this might be,
      but I think they're sufficiently rare that it doesn't matter much.
      
      This patch also fixes the math emulator, which had not been adjusted to
      match the changed struct pt_regs.
      
      [frederik.deweerdt@gmail.com: fixit with gdb]
      [mingo@elte.hu: Fix KVM too]
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Ian Campbell <Ian.Campbell@XenSource.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Zachary Amsden <zach@vmware.com>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
  3. 22 Dec, 2006 1 commit
    • [PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code · 0888f06a
      Ingo Molnar authored

      Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
      xruns starting at the 2.6.18-rt kernel, and those problems persisted all
      until current -rt kernels. The latencies were serious and unjustified by
      system load, often in the milliseconds range.
      
      After a patient and heroic multi-month effort of Fernando, where he
      tested dozens of kernels, tried various configs, boot options,
      test-patches of mine and provided latency traces of those incidents, the
      following 'smoking gun' trace was captured by him:
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
        IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (try_to_wake_up)
        IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
        IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (c01262ba 0 0)
        IRQ_19-1479  1D..1    0us : resched_task (try_to_wake_up)
        IRQ_19-1479  1D..1    0us : __spin_unlock_irqrestore (try_to_wake_up)
        ...
        <idle>-0     1...1   11us!: default_idle (cpu_idle)
        ...
        <idle>-0     0Dn.1  602us : smp_apic_timer_interrupt (c0103baf 1 0)
        ...
         <...>-5856  0D..2  618us : __switch_to (__schedule)
         <...>-5856  0D..2  618us : __schedule <<idle>-0> (20 162)
         <...>-5856  0D..2  619us : __spin_unlock_irq (__schedule)
         <...>-5856  0...1  619us : trace_stop_sched_switched (__schedule)
         <...>-5856  0D..1  619us : trace_stop_sched_switched <<...>-5856> (37 0)
      
      what is visible in this trace is that CPU#1 ran try_to_wake_up() for
      PID:5856, placed PID:5856 on CPU#0's runqueue and ran resched_task()
      for CPU#0 - but decided not to send an IPI, because CPU#0 had
      TS_POLLING set.  CPU#0 however never woke up after its NEED_RESCHED
      bit was set, and only rescheduled to PID:5856 upon the next lapic
      timer IRQ.  The result was a 600+ usecs latency and a missed wakeup!
      
      the bug turned out to be an idle-wakeup bug introduced into the mainline
      kernel this summer via an optimization in the x86_64 tree:
      
          commit 495ab9c0
          Author: Andi Kleen <ak@suse.de>
          Date:   Mon Jun 26 13:59:11 2006 +0200
      
          [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
      
          During some profiling I noticed that default_idle causes a lot of
          memory traffic. I think that is caused by the atomic operations
          to clear/set the polling flag in thread_info. There is actually
          no reason to make this atomic - only the idle thread does it
          to itself, other CPUs only read it. So I moved it into ti->status.
      
      the problem is this type of change:
      
              if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
      -               clear_thread_flag(TIF_POLLING_NRFLAG);
      +               current_thread_info()->status &= ~TS_POLLING;
                      smp_mb__after_clear_bit();
                      while (!need_resched()) {
                              local_irq_disable();
      
      this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
      clear_thread_flag() is defined as:
      
              clear_bit(flag, &ti->flags);
      
      and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:
      
        static inline void clear_bit(int nr, volatile unsigned long * addr)
        {
                __asm__ __volatile__( LOCK_PREFIX
                        "btrl %1,%0"
      
      hence smp_mb__after_clear_bit() is defined as a simple compile barrier:
      
        #define smp_mb__after_clear_bit()       barrier()
      
      but the explicit TS_POLLING clearing introduced by the patch:
      
      +               current_thread_info()->status &= ~TS_POLLING;
      
      is not an atomic op! So the clearing of the TS_POLLING bit is freely
      reorderable with the reading of the NEED_RESCHED bit - and both now
      reside in different memory addresses.
      
      CPU idle wakeup very much depends on ordered memory ops, the clearing of
      the TS_POLLING flag must always be done before we test need_resched()
      and hit the idle instruction(s). [Symmetrically, the wakeup code needs
      to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
      ordering is paramount.]
      
      Fernando's dual-core Athlon64 system has a sufficiently advanced memory
      ordering model so that it triggered this scenario very often.
      
      ( And it also turned out that the reason why these latencies never
        triggered on my test systems is that I routinely use idle=poll,
        which was the only idle variant not affected by this bug. )
      
      The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
      act as an absolute barrier between the TS_POLLING write and the
      NEED_RESCHED read. This affects almost all idling methods (default,
      ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.
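      
      The fixed sequence then reads:
      
              current_thread_info()->status &= ~TS_POLLING;
              smp_mb();       /* full barrier: order the TS_POLLING clear
                                 before the need_resched() read */
              while (!need_resched()) {
                      local_irq_disable();
                      ...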
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 06 Dec, 2006 6 commits
    • [PATCH] i386: remove IOPL check on task switch · 8c898126
      Chuck Ebbert authored

      IOPL is implicitly saved and restored on task switch, so the explicit
      check is no longer needed.
      
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Don't use nested idle loops · 72690a21
      Andi Kleen authored

      Currently the idle loop has two nested loops - one high-level loop in
      cpu_idle, and another one in some of the low level idle functions.
      
      Looping in the low level idle functions breaks the idle notifiers,
      because interrupts that wake up sleep states need to execute
      exit_idle(), which is only called in cpu_idle().
      
      So don't do that - only loop in cpu_idle(). This only removes code.
      
      In some cases, e.g. poll_idle, the idle loop is a little longer now
      because cpu_idle checks more things. I hope that isn't a problem.
      ACPI idle doesn't change behaviour because it never looped anyway.
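      
      A rough sketch of the resulting single loop (enter_idle()/__exit_idle()
      naming is an assumption based on the idle-notifier patch above):
      
        void cpu_idle(void)
        {
                for (;;) {
                        while (!need_resched()) {
                                enter_idle();
                                idle();         /* hlt/mwait/poll: returns
                                                   after one wakeup instead
                                                   of looping internally */
                                __exit_idle();
                        }
                        schedule();
                }
        }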
      
      Cc: len.brown@intel.com
      Cc: eranian@hpl.hp.com
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: Implement "current" with the PDA · ec7fcaab
      Jeremy Fitzhardinge authored

      Use the pcurrent field in the PDA to implement the "current" macro.  This ends
      up compiling down to a single instruction to get the current task.
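      
      A sketch of the mechanism, assuming the read_pda() accessor from the
      other PDA patches:
      
        static __always_inline struct task_struct *get_current(void)
        {
                /* a single segment-relative mov from the PDA */
                return read_pda(pcurrent);
        }
      
        #define current get_current()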
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: Fix places where using %gs changes the usermode ABI · 66e10a44
      Jeremy Fitzhardinge authored

      There are a few places where the change in struct pt_regs and the use of %gs
      affect the userspace ABI.  These are primarily debugging interfaces where
      thread state can be inspected or extracted.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: Use %gs as the PDA base-segment in the kernel · f95d47ca
      Jeremy Fitzhardinge authored

      This patch is the meat of the PDA change.  This patch makes several related
      changes:
      
      1: Most significantly, %gs is now used in the kernel.  This means that on
         entry, the old value of %gs is saved away, and it is reloaded with
         __KERNEL_PDA.
      
      2: entry.S constructs the stack in the shape of struct pt_regs, and this
         is passed around the kernel so that the process's saved register
         state can be accessed.
      
         Unfortunately struct pt_regs doesn't currently have space for %gs
         (or %fs). This patch extends pt_regs to add space for %gs (no space
         is allocated for %fs, since it won't be used, and it would just
         complicate the code in entry.S to work around the space); the
         resulting layout is sketched after this list.
      
      3: Because %gs is now saved on the stack like %ds, %es and the integer
         registers, there are a number of places where it no longer needs to
         be handled specially; namely context switch, and saving/restoring the
         register state in a signal context.
      
      4: And since kernel threads run in kernel space and call normal kernel
         code, they need to be created with their %gs == __KERNEL_PDA.
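      
      A sketch of the extended layout described in point 2 (field order is
      an assumption based on the pre-existing i386 struct pt_regs):
      
        struct pt_regs {
                long ebx, ecx, edx, esi, edi, ebp, eax;
                int  xds;
                int  xes;
                int  xgs;       /* new: saved %gs, alongside %ds/%es */
                long orig_eax;
                long eip;
                int  xcs;
                long eflags, esp;
                int  xss;
        };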
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: add sleazy FPU optimization · acc20761
      Chuck Ebbert authored

      i386 port of the sLeAZY-fpu feature.  Chuck reports that this gives
      him a +/- 0.4% improvement on his simple benchmark.
      
      x86_64 description follows:
      
      Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
      context switch a trap is taken for the first FPU use to restore the FPU
      context lazily.  This is of course great for applications that have very
      sporadic or no FPU use (since then you avoid doing the expensive save/restore
      all the time).  However for very frequent FPU users...  you take an extra trap
      every context switch.
      
      The patch below adds a simple heuristic to this code: after 5
      consecutive context switches of FPU use, the lazy behavior is
      disabled and the context gets restored on every context switch.  If
      the app indeed uses the FPU, the trap is avoided (the chance of the
      6th time slice using the FPU after the previous 5 did so is obviously
      quite high).
      
      After 256 switches this is reset, and lazy behavior resumes (until
      there are 5 consecutive FPU-using switches again).  This gives apps
      that only do occasional bursts of FPU use the lazy behavior back
      after some time.
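      
      A sketch of the heuristic as described (treating fpu_counter as an
      assumed per-task byte counter; a u8 wraps after 256 switches, which
      is what restores the lazy behavior):
      
        /* in the lazy-restore trap path: */
        tsk->fpu_counter++;             /* u8: wraps to 0 after 256 */
      
        /* at context-switch time: */
        if (next_p->fpu_counter > 5)    /* frequent FPU user */
                math_state_restore();   /* eager restore, no trap later */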
      
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
  5. 17 Nov, 2006 1 commit
    • [PATCH] i386/x86_64: ACPI cpu_idle_wait() fix · dc1829a4
      Ingo Molnar authored

      The scheduler on Andreas Friedrich's hyperthreading system stopped
      working properly: the scheduler would never move tasks to another CPU!
      The last known working kernel was 2.6.8.
      
      After a couple of attempts to corner the bug, the following smoking gun
      was found:
      
        BIOS reported wrong ACPI id for the processor
        CPU#1: set_cpus_allowed(), swapper:1, 3 -> 2
         [<c0103bbe>] show_trace_log_lvl+0x34/0x4a
         [<c0103ceb>] show_trace+0x2c/0x2e
         [<c01045f8>] dump_stack+0x2b/0x2d
         [<c0116a77>] set_cpus_allowed+0x52/0xec
         [<c0101d86>] cpu_idle_wait+0x2e/0x100
         [<c0259c57>] acpi_processor_power_exit+0x45/0x58
         [<c0259752>] acpi_processor_remove+0x46/0xea
         [<c025c6fb>] acpi_start_single_object+0x47/0x54
         [<c025cee5>] acpi_bus_register_driver+0xa4/0xd3
         [<c04ab2d7>] acpi_processor_init+0x57/0x77
         [<c01004d7>] init+0x146/0x2fd
         [<c0103a87>] kernel_thread_helper+0x7/0x10
      
      a quick look at cpu_idle_wait() shows how broken that code is
      on i386: it changes the init task's affinity map but never
      restores it ...
      
      and because all userspace tasks get forked by init, they all
      inherited that single-CPU affinity mask. x86_64 cloned this
      bug too.
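      
      The shape of the fix follows directly from that observation: remember
      the caller's affinity mask and put it back (a sketch, not the exact
      patch):
      
        void cpu_idle_wait(void)
        {
                cpumask_t saved = current->cpus_allowed;  /* remember */
      
                /* ... bounce across each online CPU in turn ... */
      
                set_cpus_allowed(current, saved);         /* and restore */
        }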
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Andreas Friedrich <andreas.friedrich@fujitsu-siemens.com>
      Cc: Wolfgang Erig <Wolfgang.Erig@fujitsu-siemens.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 21 Oct, 2006 1 commit
  7. 13 Oct, 2006 1 commit
  8. 05 Oct, 2006 1 commit
    • [PATCH] x86: Terminate the kernel stacks for the unwinder · 51ec28e1
      Andi Kleen authored

      Always make sure RIP/EIP is 0 in the registers stored on the top
      of the stack of a kernel thread. This makes sure the unwinder code
      won't try a fallback but knows the stack has ended.
      
      AK: this patch is a bit mysterious. In theory the stacks should be
      terminated anyway, but it seems to fix at least one crash. In any
      case, double termination probably doesn't hurt.
      
      Signed-off-by: Andi Kleen <ak@suse.de>
  9. 02 Oct, 2006 1 commit
    • [PATCH] namespaces: utsname: use init_utsname when appropriate · 96b644bd
      Serge E. Hallyn authored

      In some places, particularly drivers and __init code, the init utsns
      is the appropriate one to use.  This patch replaces those with the
      init_utsname() helper.
      
      Changes: Removed several uses of init_utsname().  Hope I picked all the
      	right ones in net/ipv4/ipconfig.c.  These are now changed to
      	utsname() (the per-process namespace utsname) in the previous
      	patch (2/7)
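      
      For reference, a sketch of the helper consistent with the description
      (the init_uts_ns.name field is assumed from the namespace patches):
      
        static inline struct new_utsname *init_utsname(void)
        {
                return &init_uts_ns.name;   /* the initial namespace */
        }
      
        /* e.g. in a driver or __init path: */
        printk(KERN_INFO "booting %s\n", init_utsname()->release);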
      
      [akpm@osdl.org: CIFS fix]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  10. 01 Oct, 2006 1 commit
  11. 26 Sep, 2006 3 commits
  12. 28 Jul, 2006 1 commit
  13. 09 Jul, 2006 1 commit
  14. 30 Jun, 2006 1 commit
  15. 26 Jun, 2006 2 commits
  16. 31 Mar, 2006 1 commit
  17. 26 Mar, 2006 1 commit
    • [PATCH] kretprobe instance recycled by parent process · c6fd91f0
      bibo mao authored

      When kretprobe probes the schedule() function, if the probed process exits
      then schedule() will never return, so some kretprobe instances will never
      be recycled.
      
      With this patch, the parent process recycles the kretprobe instances
      of the probed function, so kretprobe instances are no longer leaked.
      
      Signed-off-by: bibo mao <bibo.mao@intel.com>
      Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 23 Mar, 2006 1 commit
    • [PATCH] i386: fix uses of user_mode() vs. user_mode_vm() · db753bdf
      Jan Beulich authored
      
      >commit 76381fee
      >Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      >Date:   Thu Jun 23 00:08:46 2005 -0700
      >
      >    [PATCH] xen: x86_64: use more usermode macro
      >
      >    Make use of the user_mode macro where it's possible.  This is useful for Xen
      >    because it will need only to redefine only the macro to a hypervisor call.
      
      I am of the opinion that the above changeset is incomplete, i.e.  it
      missed converting some previous uses of user_mode to user_mode_vm.
      While most of them could be considered merely cosmetic, at least the
      one in die_nmi doesn't appear to be.
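      
      The distinction, sketched from the i386 definitions: user_mode() only
      checks the CPL bits of the saved %cs, while a vm86 task runs with an
      arbitrary real-mode %cs and is only caught by the EFLAGS.VM test:
      
        #define user_mode(regs)     (3 & (regs)->xcs)
        #define user_mode_vm(regs)  ((VM_MASK & (regs)->eflags) || \
                                     user_mode(regs))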
      
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 05 Feb, 2006 1 commit
  20. 12 Jan, 2006 3 commits
  21. 08 Jan, 2006 1 commit
    • [PATCH] Make vm86 support optional · 64ca9004
      Matt Mackall authored

      This adds an option to remove vm86 support under CONFIG_EMBEDDED.  Saves
      about 5k.
      
      This version eliminates most of the #ifdefs of the previous version and
      instead uses function stubs in vm86.h.  Also, release_vm86_irqs is moved
      from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs
      can live together.
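      
      A sketch of the stub pattern in vm86.h (the exact set of stubbed
      functions is an assumption; release_vm86_irqs is the one named
      above):
      
        #ifdef CONFIG_VM86
        void release_vm86_irqs(struct task_struct *tsk);
        #else
        /* vm86 compiled out: calls collapse to no-ops */
        static inline void release_vm86_irqs(struct task_struct *tsk) { }
        #endif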
      
      $ size vmlinux-baseline vmlinux-novm86
         text    data     bss     dec     hex filename
      2920821  523232  190652 3634705  377611 vmlinux-baseline
      2916268  523100  190492 3629860  376324 vmlinux-novm86
      
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 06 Jan, 2006 2 commits
  23. 31 Dec, 2005 1 commit
  24. 23 Nov, 2005 1 commit
    • [PATCH] kprobes: Fix return probes on sys_execve · 8bf1101b
      Jim Keniston authored

      Fix a bug in kprobes that can cause an Oops or even a crash when a return
      probe is installed on one of the following functions: sys_execve,
      do_execve, load_*_binary, flush_old_exec, or flush_thread.  The fix is to
      remove the call to kprobe_flush_task() in flush_thread().  This fix has
      been tested on all architectures for which the return-probes feature has
      been implemented (i386, x86_64, ppc64, ia64).  Please apply.
      
      BACKGROUND
      
      Up to now, we have called kprobe_flush_task() under two situations: when a
      task exits, and when it execs.  Flushing kretprobe_instances on exit is
      correct because (a) do_exit() doesn't return, and (b) one or more
      return-probed functions may be active when a task calls do_exit().  Neither
      is the case for sys_execve() and its callees.
      
      Initially, the mistaken call to kprobe_flush_task() on exec was harmless
      because we put the "real" return address of each active probed function
      back in the stack, just to be safe, when we recycled its
      kretprobe_instance.  When support for ppc64 and ia64 was added, this safety
      measure couldn't be employed, and was eventually dropped even for i386 and
      x86_64.  sys_execve() and its callees were informally blacklisted for
      return probes until this fix was developed.
      
      Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 09 Nov, 2005 2 commits
    • [PATCH] sched: resched and cpu_idle rework · 64c7c8f8
      Nick Piggin authored

      Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
      confusion, and make their semantics rigid.  Improves efficiency of
      resched_task and some cpu_idle routines.
      
      * In resched_task:
      - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
        and as we hold it during resched_task, then there is no need for an
        atomic test and set there. The only other time this should be set is
        when the task's quantum expires, in the timer interrupt - this is
        protected against because the rq lock is irq-safe.
      
      - If TIF_NEED_RESCHED is set, then we don't need to do anything. It
        won't get unset until the task gets schedule()d off.
      
      - If we are running on the same CPU as the task we resched, then set
        TIF_NEED_RESCHED and no further action is required.
      
      - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
        after TIF_NEED_RESCHED has been set, then we need to send an IPI.
      
      Using these rules, we are able to remove the test and set operation in
      resched_task, and make clear the previously vague semantics of
      POLLING_NRFLAG.
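      
      Put together, these rules yield roughly the following resched_task()
      (a sketch of the logic, not the exact patch):
      
        static void resched_task(struct task_struct *p)
        {
                int cpu;
      
                assert_spin_locked(&task_rq(p)->lock);
      
                if (unlikely(test_tsk_thread_flag(p, TIF_NEED_RESCHED)))
                        return;         /* already pending: nothing to do */
      
                set_tsk_thread_flag(p, TIF_NEED_RESCHED);
      
                cpu = task_cpu(p);
                if (cpu == smp_processor_id())
                        return;         /* local CPU: the flag suffices */
      
                /* order the NEED_RESCHED store before the polling test */
                smp_mb();
                if (!test_tsk_thread_flag(p, TIF_POLLING_NRFLAG))
                        smp_send_reschedule(cpu);
        }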
      
      * In idle routines:
      - Enter cpu_idle with preempt disabled. When the need_resched() condition
        becomes true, explicitly call schedule(). This makes things a bit clearer
        (IMO), but haven't updated all architectures yet.
      
      - Many do a test and clear of TIF_NEED_RESCHED for some reason. According
        to the resched_task rules, this isn't needed (and actually breaks the
        assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
        held). So remove that. Generally one less locked memory op when switching
        to the idle thread.
      
      - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the
        innermost polling idle loops. The above resched_task semantics allow
        the flag to remain set until just before the final need_resched()
        check preceding a halt that requires an interrupt wakeup.
      
        Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
        can be always left set, completely eliminating resched IPIs when rescheduling
        the idle task.
      
        POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
      
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Nick Piggin authored

      Run idle threads with preempt disabled.
      
      Also corrected a bug in arm26's cpu_idle (make it actually call
      schedule()).  How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, then the CPU offlined.
      
      After calling stop_machine_run, the CPU eventually returns from
      preemption into the idle thread and goes to sleep.  It then continues
      executing the previous idle loop and never gets the chance to call
      play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
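      
      A sketch of the resulting idle loop per the description above
      (preemption stays disabled except across the explicit schedule()):
      
        void cpu_idle(void)
        {
                /* entered with preemption disabled */
                for (;;) {
                        while (!need_resched())
                                idle();
                        preempt_enable_no_resched();
                        schedule();
                        preempt_disable();
                }
        }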
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>