1. 14 May, 2010 8 commits
    • Steven Rostedt's avatar
      tracing: Allow events to share their print functions · a9a57763
      Steven Rostedt authored
      
      
      Multiple events may use the same method to print their data.
      Instead of having all events have a pointer to their print funtions,
      the trace_event structure now points to a trace_event_functions structure
      that will hold the way to print ouf the event.
      
      The event itself is now passed to the print function to let the print
      function know what kind of event it should print.
      
      This opens the door to consolidating the way several events print
      their output.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint
      
      This change slightly increases the size but is needed for the next change.
      
      v3: Fix the branch tracer events to handle this change.
      
      v2: Fix the new function graph tracer event calls to handle this change.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      a9a57763
    • Steven Rostedt's avatar
      tracing: Move raw_init from events to class · 0405ab80
      Steven Rostedt authored
      
      
      The raw_init function pointer in the event is used to initialize
      various kinds of events. The type of initialization needed is usually
      classed to the kind of event it is.
      
      Two events with the same class will always have the same initialization
      function, so it makes sense to move this to the class structure.
      
      Perhaps even making a special system structure would work since
      the initialization is the same for all events within a system.
      But since there's no system structure (yet), this will just move it
      to the class.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      
      The text grew very slightly, but this is a constant growth that happened
      with the changing of the C files that call the init code.
      The bigger savings is the data which will be saved the more events share
      a class.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0405ab80
    • Steven Rostedt's avatar
      tracing: Move fields from event to class structure · 2e33af02
      Steven Rostedt authored
      
      
      Move the defined fields from the event to the class structure.
      Since the fields of the event are defined by the class they belong
      to, it makes sense to have the class hold the information instead
      of the individual events. The events of the same class would just
      hold duplicate information.
      
      After this change the size of the kernel dropped another 3K:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
      4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
      
      Although the text increased, this was mainly due to the C files
      having to adapt to the change. This is a constant increase, where
      new tracepoints will not increase the Text. But the big drop is
      in the data size (as well as needed allocations to hold the fields).
      This will give even more savings as more tracepoints are created.
      
      Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
      with several DEFINE_EVENT()s, then the savings will be lost. But
      we are pushing developers to consolidate events with DEFINE_EVENT()
      so this should not be an issue.
      
      The kprobes define a unique class to every new event, but are dynamic
      so it should not be a issue.
      
      The syscalls however have a single class but the fields for the individual
      events are different. The syscalls use a metadata to define the
      fields. I moved the fields list from the event to the metadata and
      added a "get_fields()" function to the class. This function is used
      to find the fields. For normal events and kprobes, get_fields() just
      returns a pointer to the fields list_head in the class. For syscall
      events, it returns the fields list_head in the metadata for the event.
      
      v2:  Fixed the syscall fields. The syscall metadata needs a list
           of fields for both enter and exit.
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2e33af02
    • Steven Rostedt's avatar
      tracing: Remove per event trace registering · 2239291a
      Steven Rostedt authored
      
      
      This patch removes the register functions of TRACE_EVENT() to enable
      and disable tracepoints. The registering of a event is now down
      directly in the trace_events.c file. The tracepoint_probe_register()
      is now called directly.
      
      The prototypes are no longer type checked, but this should not be
      an issue since the tracepoints are created automatically by the
      macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
      other macros will catch it.
      
      The trace_event_class structure now holds the probes to be called
      by the callbacks. This removes needing to have each event have
      a separate pointer for the probe.
      
      To handle kprobes and syscalls, since they register probes in a
      different manner, a "reg" field is added to the ftrace_event_class
      structure. If the "reg" field is assigned, then it will be called for
      enabling and disabling of the probe for either ftrace or perf. To let
      the reg function know what is happening, a new enum (trace_reg) is
      created that has the type of control that is needed.
      
      With this new rework, the 82 kernel events and 618 syscall events
      has their footprint dramatically lowered:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
      4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
      
      The size went from 6863829 to 6819176, that's a total of 44K
      in savings. With tracepoints being continuously added, this is
      critical that the footprint becomes minimal.
      
      v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
          specific structure in trace_events.c.
      
      v4: Fixed trace self tests to check probe because regfunc no longer
          exists.
      
      v3: Updated to handle void *data in beginning of probe parameters.
          Also added the tracepoint: check_trace_callback_type_##call().
      
      v2: Changed the callback probes to pass void * and typecast the
          value within the function.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2239291a
    • Steven Rostedt's avatar
      tracing: Let tracepoints have data passed to tracepoint callbacks · 38516ab5
      Steven Rostedt authored
      
      
      This patch adds data to be passed to tracepoint callbacks.
      
      The created functions from DECLARE_TRACE() now need a mandatory data
      parameter. For example:
      
      DECLARE_TRACE(mytracepoint, int value, value)
      
      Will create the register function:
      
      int register_trace_mytracepoint((void(*)(void *data, int value))probe,
                                      void *data);
      
      As the first argument, all callbacks (probes) must take a (void *data)
      parameter. So a callback for the above tracepoint will look like:
      
      void myprobe(void *data, int value)
      {
      }
      
      The callback may choose to ignore the data parameter.
      
      This change allows callbacks to register a private data pointer along
      with the function probe.
      
      	void mycallback(void *data, int value);
      
      	register_trace_mytracepoint(mycallback, mydata);
      
      Then the mycallback() will receive the "mydata" as the first parameter
      before the args.
      
      A more detailed example:
      
        DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));
      
        /* In the C file */
      
        DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));
      
        [...]
      
             trace_mytracepoint(status);
      
        /* In a file registering this tracepoint */
      
        int my_callback(void *data, int status)
        {
      	struct my_struct my_data = data;
      	[...]
        }
      
        [...]
      	my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
      	init_my_data(my_data);
      	register_trace_mytracepoint(my_callback, my_data);
      
      The same callback can also be registered to the same tracepoint as long
      as the data registered is different. Note, the data must also be used
      to unregister the callback:
      
      	unregister_trace_mytracepoint(my_callback, my_data);
      
      Because of the data parameter, tracepoints declared this way can not have
      no args. That is:
      
        DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());
      
      will cause an error.
      
      If no arguments are needed, a new macro can be used instead:
      
        DECLARE_TRACE_NOARGS(mytracepoint);
      
      Since there are no arguments, the proto and args fields are left out.
      
      This is part of a series to make the tracepoint footprint smaller:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
      
      Again, this patch also increases the size of the kernel, but
      lays the ground work for decreasing it.
      
       v5: Fixed net/core/drop_monitor.c to handle these updates.
      
       v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
           #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
           cases. The __DECLARE_TRACE() is what changes.
           Thanks to Frederic Weisbecker for pointing this out.
      
       v3: Made all register_* functions require data to be passed and
           all callbacks to take a void * parameter as its first argument.
           This makes the calling functions comply with C standards.
      
           Also added more comments to the modifications of DECLARE_TRACE().
      
       v2: Made the DECLARE_TRACE() have the ability to pass arguments
           and added a new DECLARE_TRACE_NOARGS() for tracepoints that
           do not need any arguments.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      38516ab5
    • Mathieu Desnoyers's avatar
      tracepoints: Add check trace callback type · 53da59aa
      Mathieu Desnoyers authored
      
      
      This check is meant to be used by tracepoint users which do a direct cast of
      callbacks to (void *) for direct registration, thus bypassing the
      register_trace_##name and unregister_trace_##name checks.
      
      This permits to ensure that the callback type matches the function type at the
      call site, but without generating any code.
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      LKML-Reference: <20100430165959.GA25605@Krystal>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      CC: Li Zefan <lizf@cn.fujitsu.com>
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      53da59aa
    • Steven Rostedt's avatar
      tracing: Create class struct for events · 8f082018
      Steven Rostedt authored
      
      
      This patch creates a ftrace_event_class struct that event structs point to.
      This class struct will be made to hold information to modify the
      events. Currently the class struct only holds the events system name.
      
      This patch slightly increases the size, but this change lays the ground work
      of other changes to make the footprint of tracepoints smaller.
      
      With 82 standard tracepoints, and 618 system call tracepoints
      (two tracepoints per syscall: enter and exit):
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      
      This patch also cleans up some stale comments in ftrace.h.
      
      v2: Fixed missing semi-colon in macro.
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8f082018
    • Steven Rostedt's avatar
      Merge branch 'sched/core' of... · 23e117fa
      Steven Rostedt authored
      Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-4
      23e117fa
  2. 11 May, 2010 1 commit
    • Changli Gao's avatar
      sched, wait: Use wrapper functions · a93d2f17
      Changli Gao authored
      
      
      epoll should not touch flags in wait_queue_t. This patch introduces a new
      function __add_wait_queue_exclusive(), for the users, who use wait queue as a
      LIFO queue.
      
      __add_wait_queue_tail_exclusive() is introduced too instead of
      add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as
      it is a duplicate of __remove_wait_queue(), disliked by users, and with less
      users.
      Signed-off-by: default avatarChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: <containers@lists.linux-foundation.org>
      LKML-Reference: <1273214006-2979-1-git-send-email-xiaosuo@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      a93d2f17
  3. 10 May, 2010 1 commit
  4. 09 May, 2010 8 commits
  5. 08 May, 2010 2 commits
  6. 07 May, 2010 2 commits
  7. 06 May, 2010 6 commits
    • Paul E. McKenney's avatar
      rcu: need barrier() in UP synchronize_sched_expedited() · fc390cde
      Paul E. McKenney authored
      
      
      If synchronize_sched_expedited() is ever to be called from within
      kernel/sched.c in a !SMP PREEMPT kernel, the !SMP implementation needs
      a barrier().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      fc390cde
    • Paul E. McKenney's avatar
      sched: correctly place paranioa memory barriers in synchronize_sched_expedited() · cc631fb7
      Paul E. McKenney authored
      
      
      The memory barriers must be in the SMP case, not in the !SMP case.
      Also add a barrier after the atomic_inc() in order to ensure that
      other CPUs see post-synchronize_sched_expedited() actions as following
      the expedited grace period.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      cc631fb7
    • Tejun Heo's avatar
      sched: kill paranoia check in synchronize_sched_expedited() · 94458d5e
      Tejun Heo authored
      
      
      The paranoid check which verifies that the cpu_stop callback is
      actually called on all online cpus is completely superflous.  It's
      guaranteed by cpu_stop facility and if it didn't work as advertised
      other things would go horribly wrong and trying to recover using
      synchronize_sched() wouldn't be very meaningful.
      
      Kill the paranoid check.  Removal of this feature is done as a
      separate step so that it can serve as a bisection point if something
      actually goes wrong.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Josh Triplett <josh@freedesktop.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      94458d5e
    • Tejun Heo's avatar
      sched: replace migration_thread with cpu_stop · 969c7921
      Tejun Heo authored
      
      
      Currently migration_thread is serving three purposes - migration
      pusher, context to execute active_load_balance() and forced context
      switcher for expedited RCU synchronize_sched.  All three roles are
      hardcoded into migration_thread() and determining which job is
      scheduled is slightly messy.
      
      This patch kills migration_thread and replaces all three uses with
      cpu_stop.  The three different roles of migration_thread() are
      splitted into three separate cpu_stop callbacks -
      migration_cpu_stop(), active_load_balance_cpu_stop() and
      synchronize_sched_expedited_cpu_stop() - and each use case now simply
      asks cpu_stop to execute the callback as necessary.
      
      synchronize_sched_expedited() was implemented with private
      preallocated resources and custom multi-cpu queueing and waiting
      logic, both of which are provided by cpu_stop.
      synchronize_sched_expedited_count is made atomic and all other shared
      resources along with the mutex are dropped.
      
      synchronize_sched_expedited() also implemented a check to detect cases
      where not all the callback got executed on their assigned cpus and
      fall back to synchronize_sched().  If called with cpu hotplug blocked,
      cpu_stop already guarantees that and the condition cannot happen;
      otherwise, stop_machine() would break.  However, this patch preserves
      the paranoid check using a cpumask to record on which cpus the stopper
      ran so that it can serve as a bisection point if something actually
      goes wrong theree.
      
      Because the internal execution state is no longer visible,
      rcu_expedited_torture_stats() is removed.
      
      This patch also renames cpu_stop threads to from "stopper/%d" to
      "migration/%d".  The names of these threads ultimately don't matter
      and there's no reason to make unnecessary userland visible changes.
      
      With this patch applied, stop_machine() and sched now share the same
      resources.  stop_machine() is faster without wasting any resources and
      sched migration users are much cleaner.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Josh Triplett <josh@freedesktop.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      969c7921
    • Tejun Heo's avatar
      stop_machine: reimplement using cpu_stop · 3fc1f1e2
      Tejun Heo authored
      
      
      Reimplement stop_machine using cpu_stop.  As cpu stoppers are
      guaranteed to be available for all online cpus,
      stop_machine_create/destroy() are no longer necessary and removed.
      
      With resource management and synchronization handled by cpu_stop, the
      new implementation is much simpler.  Asking the cpu_stop to execute
      the stop_cpu() state machine on all online cpus with cpu hotplug
      disabled is enough.
      
      stop_machine itself doesn't need to manage any global resources
      anymore, so all per-instance information is rolled into struct
      stop_machine_data and the mutex and all static data variables are
      removed.
      
      The previous implementation created and destroyed RT workqueues as
      necessary which made stop_machine() calls highly expensive on very
      large machines.  According to Dimitri Sivanich, preventing the dynamic
      creation/destruction makes booting faster more than twice on very
      large machines.  cpu_stop resources are preallocated for all online
      cpus and should have the same effect.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      3fc1f1e2
    • Tejun Heo's avatar
      cpu_stop: implement stop_cpu[s]() · 1142d810
      Tejun Heo authored
      
      
      Implement a simplistic per-cpu maximum priority cpu monopolization
      mechanism.  A non-sleeping callback can be scheduled to run on one or
      multiple cpus with maximum priority monopolozing those cpus.  This is
      primarily to replace and unify RT workqueue usage in stop_machine and
      scheduler migration_thread which currently is serving multiple
      purposes.
      
      Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
      stop_cpus() and try_stop_cpus().
      
      This is to allow clean sharing of resources among stop_cpu and all the
      migration thread users.  One stopper thread per cpu is created which
      is currently named "stopper/CPU".  This will eventually replace the
      migration thread and take on its name.
      
      * This facility was originally named cpuhog and lived in separate
        files but Peter Zijlstra nacked the name and thus got renamed to
        cpu_stop and moved into stop_machine.c.
      
      * Better reporting of preemption leak as per Peter's suggestion.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      1142d810
  8. 05 May, 2010 2 commits
    • Thiago Farina's avatar
      tracing: Fix "integer as NULL pointer" warning. · 668eb65f
      Thiago Farina authored
      
      
      kernel/trace/trace_output.c:256:24: warning: Using plain integer as NULL pointer
      Signed-off-by: default avatarThiago Farina <tfransosi@gmail.com>
      LKML-Reference: <1264349038-1766-3-git-send-email-tfransosi@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      668eb65f
    • Steven Rostedt's avatar
      tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header · 2e26ca71
      Steven Rostedt authored
      
      
      When more than one header is included under CREATE_TRACE_POINTS
      the DECLARE_TRACE() macro is not defined back to its original meaning
      and the second include will fail to initialize the TRACE_EVENT()
      and DECLARE_TRACE() correctly.
      
      To fix this the tracepoint.h file moves the define of DECLARE_TRACE()
      out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the
      define of the TRACE_EVENT()). This way the define_trace.h will undef
      the DECLARE_TRACE() at the end and allow new headers to start
      from scratch.
      
      This patch also requires fixing the include/events/napi.h
      
      It currently uses DECLARE_TRACE() and should be converted to a TRACE_EVENT()
      format. But I'll leave that change to the authors of that file.
      But since the napi.h file depends on using the CREATE_TRACE_POINTS
      and does not define its own DEFINE_TRACE() it must use the define_trace.h
      method instead.
      
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2e26ca71
  9. 04 May, 2010 3 commits
  10. 29 Apr, 2010 7 commits