  1. 17 Oct, 2006 1 commit
    • [PATCH] rt-mutex: fixup rt-mutex debug code · bea493a0
      Peter Zijlstra authored
      BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted)
       [<c04051e3>] show_trace_log_lvl+0x58/0x16a
       [<c04057f0>] show_trace+0xd/0x10
       [<c0405900>] dump_stack+0x19/0x1b
       [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a
       [<c04224c0>] free_task+0x15/0x24
       [<c042378c>] copy_process+0x12bd/0x1324
       [<c0423835>] do_fork+0x42/0x113
       [<c04021dd>] sys_fork+0x19/0x1b
       [<c0403fb7>] syscall_call+0x7/0xb
      In copy_process(), dup_task_struct() also duplicates the ->pi_lock,
      ->pi_waiters and ->pi_blocked_on members.  rt_mutex_debug_task_free()
      called from free_task() validates these members.  However free_task() can
      be invoked before these members are reset for the new task.
      Move the initialization code before the first bail that can hit free_task().
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
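The ordering problem described in the rt-mutex fix above can be illustrated with a small user-space sketch. It is not the kernel code: `toy_task`, `toy_copy_process()` and the other `toy_*` names are hypothetical stand-ins, and the check only mimics the kind of validation that rt_mutex_debug_task_free() performs on the PI members.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the PI-related members of task_struct. */
struct toy_task {
    int pi_waiters;       /* queued PI waiters; expected to be 0 at free time */
    void *pi_blocked_on;  /* lock the task blocks on; expected to be NULL at free time */
};

/* Mimics a free-time debug check: returns 1 if the PI state is clean, else 0. */
static int toy_debug_task_free(struct toy_task *t)
{
    int clean = (t->pi_waiters == 0 && t->pi_blocked_on == NULL);

    free(t);
    return clean;
}

/* Byte-for-byte copy, so the child inherits the parent's PI state. */
static struct toy_task *toy_dup_task(const struct toy_task *parent)
{
    struct toy_task *child = malloc(sizeof(*child));

    if (child)
        memcpy(child, parent, sizeof(*child));
    return child;
}

/* Reset the inherited PI state for the new task. */
static void toy_init_pi(struct toy_task *t)
{
    t->pi_waiters = 0;
    t->pi_blocked_on = NULL;
}

/*
 * fail_early simulates an error right after duplication.
 * init_first selects the fixed ordering (reset before any bail-out).
 * Returns 1 if the free-time check passed, 0 if it would have warned,
 * and -1 if the allocation failed.
 */
static int toy_copy_process(const struct toy_task *parent,
                            int fail_early, int init_first)
{
    struct toy_task *child = toy_dup_task(parent);

    if (!child)
        return -1;
    if (init_first)
        toy_init_pi(child);
    if (fail_early)
        return toy_debug_task_free(child);  /* first bail-out */
    if (!init_first)
        toy_init_pi(child);
    return toy_debug_task_free(child);
}
```

With a parent whose PI members are non-zero, the early bail-out trips the check only when the reset happens after it, which is the situation the patch avoids by moving the initialization up.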
  2. 02 Oct, 2006 6 commits
  3. 01 Oct, 2006 1 commit
    • [PATCH] csa: convert CONFIG tag for extended accounting routines · 8f0ab514
      Jay Lan authored
      A few accounting data items and macros are used in CSA but are #ifdef'ed
      inside CONFIG_BSD_PROCESS_ACCT.  This patch changes those ifdefs from
      CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
      kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
      Signed-off-by: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 29 Sep, 2006 2 commits
  5. 26 Sep, 2006 1 commit
  6. 20 Sep, 2006 1 commit
  7. 01 Sep, 2006 1 commit
    • [PATCH] task delay accounting fixes · 35df17c5
      Shailabh Nagar authored
      Cleanup allocation and freeing of tsk->delays used by delay accounting.
      This solves two problems reported for delay accounting:
      1. oops in __delayacct_blkio_ticks
      Currently tsk->delays is getting freed too early in task exit which can
      cause a NULL tsk->delays to get accessed via reading of /proc/<tgid>/stats.
      The patch fixes this problem by freeing tsk->delays closer to when
      task_struct itself is freed up.  As a result, it also eliminates the use of
      tsk->delays_lock which was only being used (inadequately) to safeguard
      access to tsk->delays while a task was exiting.
      2. Possible memory leak in kernel/delayacct.c
      The patch cleans up tsk->delays allocations after a bad fork which was
      missing earlier.
      The patch has been tested to fix the problems listed above and stress
      tested with rapid calls to delay accounting's taskstats command interface
      (which is the other path that can access the same data, besides the /proc
      interface causing the oops above).
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
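The lifetime rule in the delay accounting fix above (free tsk->delays together with the task structure, and clean it up on a failed fork) can be sketched in user space as follows. All `toy_*` names are hypothetical, the reference count is a plain int, and the sketch assumes single-threaded use; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_delays { unsigned long blkio_ticks; };

struct toy_task {
    int usage;                  /* reference count (not atomic in this sketch) */
    int exited;                 /* set once the task has exited */
    struct toy_delays *delays;  /* stays valid for as long as the task struct lives */
};

/* Allocate a task and its delay data; a failure in between leaks nothing. */
static struct toy_task *toy_task_new(void)
{
    struct toy_task *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    t->delays = calloc(1, sizeof(*t->delays));
    if (!t->delays) {
        free(t);                /* "bad fork" path: undo the partial setup */
        return NULL;
    }
    t->usage = 1;               /* the task's own reference */
    return t;
}

static void toy_task_get(struct toy_task *t)
{
    t->usage++;
}

/* Drop a reference; returns 1 if this was the last one and the task was freed. */
static int toy_task_put(struct toy_task *t)
{
    if (--t->usage > 0)
        return 0;
    free(t->delays);            /* freed together with the task itself */
    free(t);
    return 1;
}

/* Exit marks the task dead but deliberately leaves t->delays alone. */
static void toy_task_exit(struct toy_task *t)
{
    t->exited = 1;
}

/* A reader that holds a reference can still dereference t->delays safely. */
static unsigned long toy_read_blkio_ticks(const struct toy_task *t)
{
    return t->delays->blkio_ticks;
}
```

Because the delay data is released only in the final put, a reader that took a reference before the task exited never sees a NULL or dangling pointer, so no separate lock is needed for that purpose in this sketch.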
  8. 06 Aug, 2006 1 commit
  9. 14 Jul, 2006 2 commits
    • [PATCH] delay accounting taskstats interface send tgid once · ad4ecbcb
      Shailabh Nagar authored
      Send per-tgid data only once during exit of a thread group instead of once
      with each member thread exit.
      Currently, when a thread exits, besides its per-tid data, the per-tgid data
      of its thread group is also sent out, if its thread group is non-empty.
      The per-tgid data sent consists of the sum of per-tid stats for all
      *remaining* threads of the thread group.
      This patch modifies this sending in two ways:
      - the per-tgid data is sent only when the last thread of a thread group
        exits.  This cuts down heavily on the overhead of sending/receiving
        per-tgid data, especially when other exploiters of the taskstats
        interface aren't interested in per-tgid stats
      - the semantics of the per-tgid data sent are changed.  Instead of being
        the sum of per-tid data for remaining threads, the value now sent is the
        true total accumulated statistics for all threads that are/were part of
        the thread group.
      The patch also addresses a minor issue where failure of one accounting
      subsystem to fill in the taskstats structure was causing taskstats not to
      be sent at all.
      The patch has been tested for stability, including running cerberus for
      over 4 hours on an SMP system.
      [akpm@osdl.org: bugfixes]
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: Balbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
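The changed per-tgid semantics described above (accumulate on every thread exit, send once when the last thread leaves) can be sketched as below. The structures and the `toy_thread_exit()` helper are hypothetical simplifications and do not reflect the taskstats data layout.

```c
#include <assert.h>

struct toy_stats { unsigned long cpu_delay; };

struct toy_group {
    int live_threads;            /* threads that have not exited yet */
    struct toy_stats total;      /* running total, including exited threads */
    int tgid_sends;              /* how many per-tgid records were "sent" */
    struct toy_stats last_sent;  /* payload of the most recent per-tgid record */
};

/*
 * Called when one thread of the group exits. The thread's statistics are
 * folded into the group total; the per-tgid record goes out only when the
 * last thread of the group exits.
 */
static void toy_thread_exit(struct toy_group *g, const struct toy_stats *tid_stats)
{
    g->total.cpu_delay += tid_stats->cpu_delay;
    g->live_threads--;
    if (g->live_threads == 0) {
        g->last_sent = g->total;  /* true total for all threads that were in the group */
        g->tgid_sends++;
    }
}
```

For a group of three threads with delays 10, 20 and 30, nothing is sent for the first two exits, and the third exit sends a single record carrying 60.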
    • [PATCH] per-task-delay-accounting: setup · ca74e92b
      Shailabh Nagar authored
      Initialization code related to collection of per-task "delay" statistics, which
      measure how long a task had to wait for cpu, sync block io, swapping etc.  The
      collection of statistics and the interface are in other patches.  This patch
      sets up the data structures and allows the statistics collection to be
      disabled through a kernel boot parameter.
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
      Cc: Erich Focht <efocht@ess.nec.de>
      Cc: Levent Serinol <lserinol@gmail.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  10. 10 Jul, 2006 1 commit
  11. 03 Jul, 2006 5 commits
    • [PATCH] sched: cleanup, remove task_t, convert to struct task_struct · 36c8b586
      Ingo Molnar authored
      cleanup: remove task_t and convert all the uses to struct task_struct. I
      introduced it for the scheduler anno and it was a mistake.
      Conversion was mostly scripted, the result was reviewed and all
      secondary whitespace and style impact (if any) was fixed up by hand.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: annotate ->mmap_sem · ad339451
      Ingo Molnar authored
      Teach special (recursive) locking code to the lock validator.  Has no effect
      on non-lockdep kernels.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: core · fbb9ce95
      Ingo Molnar authored
      Do 'make oldconfig' and accept all the defaults for new config options -
      reboot into the kernel and if everything goes well it should boot up fine and
      you should have /proc/lockdep and /proc/lockdep_stats files.
      Typically, if the lock validator finds a problem it prints voluminous
      debug output that begins with "BUG: ...".  Kernel developers can use this
      syslog output to figure out the precise locking scenario.
      What does the lock validator do?  It "observes" and maps all locking rules as
      they occur dynamically (as triggered by the kernel's natural use of spinlocks,
      rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
      new locking scenario, it validates this new rule against the existing set of
      rules.  If this new rule is consistent with the existing set of rules then the
      new rule is added transparently and the kernel continues as normal.  If the
      new rule could create a deadlock scenario then this condition is printed out.
      When determining validity of locking, all possible "deadlock scenarios" are
      considered: assuming an arbitrary number of CPUs, arbitrary irq context and task
      context constellations, running arbitrary combinations of all the existing
      locking scenarios.  In a typical system this means millions of separate
      scenarios.  This is why we call it a "locking correctness" validator - for all
      rules that are observed the lock validator proves it with mathematical
      certainty that a deadlock could not occur (assuming that the lock validator
      implementation itself is correct and its internal data structures are not
      corrupted by some other kernel subsystem).  [see more details and conditionals
      of this statement in include/linux/lockdep.h and
      Furthermore, this "all possible scenarios" property of the validator also
      enables the finding of complex, highly unlikely multi-CPU multi-context races
      via single single-context rules, increasing the likelihood of finding bugs
      drastically.  In practical terms: the lock validator already found a bug in
      the upstream kernel that could only occur on systems with 3 or more CPUs, and
      which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
      That bug was found and reported on a single-CPU system (!).  So in essence a
      race will be found "piecemeal", triggering all the necessary components
      for the race, without having to reproduce the race scenario itself!  In its
      short existence the lock validator found and reported many bugs before they
      actually caused a real deadlock.
      To further increase the efficiency of the validator, the mapping is not per
      "lock instance", but per "lock-class".  For example, all struct inode objects
      in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
      then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
      type", and all locking activities that occur against ->inotify_mutex are
      "unified" into this single lock-class.  The advantage of the lock-class
      approach is that all historical ->inotify_mutex uses are mapped into a single
      (and as narrow as possible) set of locking rules - regardless of how many
      different tasks or inode structures it took to build this set of rules.  The
      set of rules persists during the lifetime of the kernel.
      To see the rough magnitude of checking that the lock validator does, here's a
      portion of /proc/lockdep_stats, fresh after bootup:
       lock-classes:                          694 [max: 2048]
       direct dependencies:                  1598 [max: 8192]
       indirect dependencies:               17896
       all direct dependencies:             16206
       dependency chains:                    1910 [max: 8192]
       in-hardirq chains:                      17
       in-softirq chains:                     105
       in-process chains:                    1065
       stack-trace entries:                 38761 [max: 131072]
       combined max dependencies:         2033928
       hardirq-safe locks:                     24
       hardirq-unsafe locks:                  176
       softirq-safe locks:                     53
       softirq-unsafe locks:                  137
       irq-safe locks:                         59
       irq-unsafe locks:                      176
      The lock validator has observed 1598 actual single-thread locking patterns,
      and has validated all possible 2033928 distinct locking scenarios.
      More details about the design of the lock validator can be found in
      Documentation/lockdep-design.txt, which can also be found at:
      [bunk@stusta.de: cleanups]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
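The rule-validation idea in the lockdep commit message above can be sketched as a tiny dependency graph over lock classes. This is only an illustration under simplifying assumptions: the real validator also tracks irq contexts and much more, and every `toy_*` name, the fixed class limit and the matrix representation are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_CLASSES 8

/* toy_dep[a][b] != 0 means: class b was observed being taken while a was held. */
static unsigned char toy_dep[TOY_MAX_CLASSES][TOY_MAX_CLASSES];

/* Forget all recorded rules. */
static void toy_reset(void)
{
    memset(toy_dep, 0, sizeof(toy_dep));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded dependencies? */
static int toy_reachable(int from, int to, unsigned char *seen)
{
    if (from == to)
        return 1;
    seen[from] = 1;
    for (int next = 0; next < TOY_MAX_CLASSES; next++)
        if (toy_dep[from][next] && !seen[next] &&
            toy_reachable(next, to, seen))
            return 1;
    return 0;
}

/*
 * Record the rule "acquired is taken while held is held".
 * Both arguments must be valid class indices below TOY_MAX_CLASSES.
 * Returns 0 if the rule is consistent with the existing set and was added,
 * or -1 if adding it would close a cycle (a potential deadlock).
 */
static int toy_add_dependency(int held, int acquired)
{
    unsigned char seen[TOY_MAX_CLASSES] = { 0 };

    if (toy_reachable(acquired, held, seen))
        return -1;
    toy_dep[held][acquired] = 1;
    return 0;
}
```

Each rule can be observed separately, in a single context: recording 0 -> 1 and 1 -> 2 and then attempting 2 -> 0 is rejected even though the three acquisitions never raced against each other, which is the piecewise detection the message describes.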
    • [PATCH] lockdep: irqtrace subsystem, core · de30a2b3
      Ingo Molnar authored
      Accurate hard-IRQ-flags and softirq-flags state tracing.
      This allows us to attach extra functionality to IRQ flags on/off
      events (such as trace-on/off).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: better lock debugging · 9a11b49a
      Ingo Molnar authored
      Generic lock debugging:
       - generalized lock debugging framework. For example, a bug in one lock
         subsystem turns off debugging in all lock subsystems.
       - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
         the mutex/rtmutex debugging code: it caused way too much prototype
         hackery, and lockdep will give the same information anyway.
       - ability to do silent tests
       - check lock freeing in vfree too.
       - more finegrained debugging options, to allow distributions to
         turn off more expensive debugging features.
      There's no separate 'held mutexes' list anymore - but there's a 'held locks'
      stack within lockdep, which unifies deadlock detection across all lock
      classes.  (this is independent of the lockdep validation stuff - lockdep first
      checks whether we are holding a lock already)
      Here are the current debugging options:
      which do:
       config DEBUG_MUTEXES
               bool "Mutex debugging, basic checks"
       config DEBUG_LOCK_ALLOC
               bool "Detect incorrect freeing of live mutexes"
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 30 Jun, 2006 1 commit
  13. 27 Jun, 2006 2 commits
  14. 26 Jun, 2006 2 commits
  15. 25 Jun, 2006 1 commit
    • [PATCH] pacct: add pacct_struct to fix some pacct bugs. · 0e464814
      KaiGai Kohei authored
      The pacct facility needs an i/o operation when an accounting record is
      generated, which can wake up the OOM killer.  If the OOM killer is
      activated, it kills some processes to make them release process memory
      But acct_process() is called in the killed process's context before
      exit_mm(), so those processes cannot release their own memory.  As a result,
      the processes stop at this point, which finally causes a system stall.
  16. 23 Jun, 2006 2 commits
  17. 01 May, 2006 1 commit
  18. 20 Apr, 2006 1 commit
  19. 19 Apr, 2006 1 commit
  20. 14 Apr, 2006 1 commit
  21. 31 Mar, 2006 3 commits
    • [PATCH] wrong error path in dup_fd() leading to oopses in RCU · 42862298
      Kirill Korotaev authored
      Wrong error path in dup_fd() - it should return NULL on error,
      not an address of already freed memory :/
      Triggered by OpenVZ stress test suite.
      What is interesting is that it was causing different oopses in RCU like
      Call Trace:
         [<c013492c>] rcu_do_batch+0x2c/0x80
         [<c0134bdd>] rcu_process_callbacks+0x3d/0x70
         [<c0126cf3>] tasklet_action+0x73/0xe0
         [<c01269aa>] __do_softirq+0x10a/0x130
         [<c01058ff>] do_softirq+0x4f/0x60
         [<c0113817>] smp_apic_timer_interrupt+0x77/0x110
         [<c0103b54>] apic_timer_interrupt+0x1c/0x24
        Code:  Bad EIP value.
         <0>Kernel panic - not syncing: Fatal exception in interrupt
      Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
      Signed-Off-By: Dmitry Mishin <dim@openvz.org>
      Signed-Off-By: Kirill Korotaev <dev@openvz.org>
      Signed-Off-By: Linus Torvalds <torvalds@osdl.org>
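The class of bug fixed in dup_fd() above (an error path that hands back a pointer to memory it has just freed) can be shown with a generic user-space sketch. `toy_files`, `toy_dup_files()` and the `fail_at_table` switch are hypothetical and exist only to make the error path reachable; this is not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_files {
    int *fds;   /* table of descriptor numbers */
    int nfds;   /* number of entries in fds */
};

/*
 * Duplicate a descriptor table. fail_at_table forces the inner allocation
 * to fail so that the error path can be exercised.
 * On failure the function returns NULL and stores a negative errno value
 * in *errorp; on success it stores 0 there.
 */
static struct toy_files *toy_dup_files(const struct toy_files *old,
                                       int fail_at_table, int *errorp)
{
    struct toy_files *newf = malloc(sizeof(*newf));
    size_t count = old->nfds > 0 ? (size_t)old->nfds : 1;

    *errorp = -ENOMEM;
    if (!newf)
        return NULL;

    newf->nfds = old->nfds;
    newf->fds = fail_at_table ? NULL : malloc(count * sizeof(int));
    if (!newf->fds) {
        free(newf);
        return NULL;    /* returning newf here would hand out freed memory */
    }

    for (int i = 0; i < old->nfds; i++)
        newf->fds[i] = old->fds[i];

    *errorp = 0;
    return newf;
}
```

A caller that checks only the return value against NULL is then safe; with the buggy variant it would go on to use, and later free again, an object that no longer exists.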
    • [PATCH] pidhash: Refactor the pid hash table · 92476d7f
      Eric W. Biederman authored
      Simplifies the code, reduces the need for 4 pid hash tables, and makes the
      code more capable.
      In the discussions I had with Oleg it was felt that to a large extent the
      cleanup itself justified the work.  Having struct pid dynamically
      allocated meant we could create the hash table entry when the pid was
      allocated and free the hash table entry when the pid was freed.  Instead of
      playing with the hash lists whenever a process would attach or detach to a
      For myself the fact that it gave what my previous task_ref patch gave for free
      with simpler code was a big win.  The problem is that if you hold a reference
      to struct task_struct you lock in 10K of low memory.  If you do that in a user
      controllable way like /proc does, with an unprivileged but hostile user space
      application with typical resource limits of 1000 fds and 100 processes I can
      trigger the OOM killer by consuming all of low memory with task structs, on a
      machine with 1GB of low memory.
      If I instead hold a reference to struct pid which holds a pointer to my
      task_struct, I don't suffer from that problem because struct pid is 2 orders
      of magnitude smaller.  In fact struct pid is small enough that most other
      kernel data structures dwarf it, so simply limiting the number of referring
      data structures is enough to prevent exhaustion of low memory.
      This splits the current struct pid into two structures, struct pid and struct
      pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
      struct pid_link is the per process linkage into the hash tables and lives in
      struct task_struct.  struct pid is given an independent lifetime, and holds
      pointers to each of the pid types.
      The independent life of struct pid simplifies attach_pid, and detach_pid,
      because we are always manipulating the list of pids and not the hash table.
      In addition, giving struct pid an independent life makes the concept much
      more powerful.
      Kernel data structures can now embed a struct pid * instead of a pid_t and
      not suffer from pid wrap around problems or from keeping unnecessarily
      large amounts of memory allocated.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
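The benefit of holding a reference to a small struct pid instead of the large task structure, as argued above, can be sketched like this. The `toy_*` types and helpers are hypothetical, the count is a plain int, and the sketch assumes single-threaded use; the kernel's own reference counting and lookup rules are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a large task structure (roughly the 10K mentioned above). */
struct toy_task { char big_state[10 * 1024]; };

/* Small, separately allocated handle with its own lifetime. */
struct toy_pid {
    int count;              /* reference count (not atomic in this sketch) */
    int nr;                 /* numeric pid */
    struct toy_task *task;  /* NULL once the task is gone */
};

/* Create a handle for 'task'; the initial reference belongs to the task. */
static struct toy_pid *toy_alloc_pid(int nr, struct toy_task *task)
{
    struct toy_pid *pid = malloc(sizeof(*pid));

    if (!pid)
        return NULL;
    pid->count = 1;
    pid->nr = nr;
    pid->task = task;
    return pid;
}

static struct toy_pid *toy_get_pid(struct toy_pid *pid)
{
    pid->count++;
    return pid;
}

/* Drop a reference; returns 1 if the handle was freed. */
static int toy_put_pid(struct toy_pid *pid)
{
    if (--pid->count > 0)
        return 0;
    free(pid);
    return 1;
}

/*
 * Task exit: the big structure is released immediately, the handle is
 * detached, and the task's own reference on the handle is dropped.
 * Returns 1 if that also freed the handle.
 */
static int toy_task_exit(struct toy_pid *pid)
{
    free(pid->task);
    pid->task = NULL;
    return toy_put_pid(pid);
}
```

An outside holder, for example an open file, keeps only the small handle alive after the task exits. It sees `task == NULL` rather than a stale pointer or a recycled numeric pid, and the 10K structure is not pinned by it.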
    • [PATCH] resurrect __put_task_struct · 158d9ebd
      Andrew Morton authored
      This just got nuked in mainline.  Bring it back because Eric's patches use it.
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 28 Mar, 2006 3 commits