1. 14 Mar, 2010 2 commits
  2. 13 Mar, 2010 31 commits
  3. 12 Mar, 2010 7 commits
    • Steven Rostedt's avatar
      tracing: Do not record user stack trace from NMI context · b6345879
      Steven Rostedt authored
      A bug was found with Li Zefan's ftrace_stress_test that caused applications
      to segfault during the test.
      Placing a tracing_off() in the segfault code, and examining several
      traces, I found that the following was always the case. The lock tracer
      was enabled (lockdep being required) and userstack was enabled. Testing
      this out, I just enabled the two, but that was not good enough. I needed
      to run something else that could trigger it. Running a load like hackbench
      did not work, but executing a new program would. The following would
      trigger the segfault within seconds:
        # echo 1 > /debug/tracing/options/userstacktrace
        # echo 1 > /debug/tracing/events/lock/enable
        # while :; do ls > /dev/null ; done
      Enabling the function graph tracer and looking at what was happening
      I finally noticed that all cashes happened just after an NMI.
       1)               |    copy_user_handle_tail() {
       1)               |      bad_area_nosemaphore() {
       1)               |        __bad_area_nosemaphore() {
       1)               |          no_context() {
       1)               |            fixup_exception() {
       1)   0.319 us    |              search_exception_tables();
       1)   0.873 us    |            }
       1)   0.314 us    |  __rcu_read_unlock();
       1)   0.325 us    |    native_apic_mem_write();
       1)   0.943 us    |  }
       1)   0.304 us    |  rcu_nmi_exit();
       1)   0.479 us    |  find_vma();
       1)               |  bad_area() {
       1)               |    __bad_area() {
      After capturing several traces of failures, all of them happened
      after an NMI. Curious about this, I added a trace_printk() to the NMI
      handler to read the regs->ip to see where the NMI happened. In which I
      found out it was here:
      ffffffff8135b660 <page_fault>:
      ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
      ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>
      What was happening is that the NMI would happen at the place that a page
      fault occurred. It would call rcu_read_lock() which was traced by
      the lock events, and the user_stack_trace would run. This would trigger
      a page fault inside the NMI. I do not see where the CR2 register is
      saved or restored in NMI handling. This means that it would corrupt
      the page fault handling that the NMI interrupted.
      The reason the while loop of ls helped trigger the bug, was that
      each execution of ls would cause lots of pages to be faulted in, and
      increase the chances of the race happening.
      The simple solution is to not allow user stack traces in NMI context.
      After this patch, I ran the above "ls" test for a couple of hours
      without any issues. Without this patch, the bug would trigger in less
      than a minute.
      Cc: stable@kernel.org
      Reported-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      tracing: Disable buffer switching when starting or stopping trace · a2f80714
      Steven Rostedt authored
      When the trace iterator is read, tracing_start() and tracing_stop()
      is called to stop tracing while the iterator is processing the trace
      These functions disable both the standard buffer and the max latency
      buffer. But if the wakeup tracer is running, it can switch these
      buffers between the two disables:
        buffer = global_trace.buffer;
        if (buffer)
            <<<--------- swap happens here
        buffer = max_tr.buffer;
        if (buffer)
      What happens is that we disabled the same buffer twice. On tracing_start()
      we can enable the same buffer twice. All ring_buffer_record_disable()
      must be matched with a ring_buffer_record_enable() or the buffer
      can be disable permanently, or enable prematurely, and cause a bug
      where a reset happens while a trace is commiting.
      This patch protects these two by taking the ftrace_max_lock to prevent
      a switch from occurring.
      Found with Li Zefan's ftrace_stress_test.
      Cc: stable@kernel.org
      Reported-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      tracing: Use same local variable when resetting the ring buffer · 283740c6
      Steven Rostedt authored
      In the ftrace code that resets the ring buffer it references the
      buffer with a local variable, but then uses the tr->buffer as the
      parameter to reset. If the wakeup tracer is running, which can
      switch the tr->buffer with the max saved buffer, this can break
      the requirement of disabling the buffer before the reset.
         buffer = tr->buffer;
         __tracing_reset(tr->buffer, cpu);
      If the tr->buffer is swapped, then the reset is not happening to the
      buffer that was disabled. This will cause the ring buffer to fail.
      Found with Li Zefan's ftrace_stress_test.
      Cc: stable@kernel.org
      Reported-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      function-graph: Init curr_ret_stack with ret_stack · ea14eb71
      Steven Rostedt authored
      If the graph tracer is active, and a task is forked but the allocating of
      the processes graph stack fails, it can cause crash later on.
      This is due to the temporary stack being NULL, but the curr_ret_stack
      variable is copied from the parent. If it is not -1, then in
      ftrace_graph_probe_sched_switch() the following:
      	for (index = next->curr_ret_stack; index >= 0; index--)
      		next->ret_stack[index].calltime += timestamp;
      Will cause a kernel OOPS.
      Found with Li Zefan's ftrace_stress_test.
      Cc: stable@kernel.org
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Lai Jiangshan's avatar
      ring-buffer: Move disabled check into preempt disable section · 52fbe9cd
      Lai Jiangshan authored
      The ring buffer resizing and resetting relies on a schedule RCU
      action. The buffers are disabled, a synchronize_sched() is called
      and then the resize or reset takes place.
      But this only works if the disabling of the buffers are within the
      preempt disabled section, otherwise a window exists that the buffers
      can be written to while a reset or resize takes place.
      Cc: stable@kernel.org
      Reported-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 · daf9fe2e
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
        mfd: Several MFD drivers should depend on GENERIC_HARDIRQS
        mfd: Fix sm501 requested region size
    • Linus Torvalds's avatar