1. 19 Jun, 2015 1 commit
    • Peter Zijlstra's avatar
      sched/stop_machine: Fix deadlock between multiple stop_two_cpus() · b17718d0
      Peter Zijlstra authored
      Jiri reported a machine stuck in multi_cpu_stop() with
      migrate_swap_stop() as function and with the following src,dst cpu
      pairs: {11,  4} {13, 11} { 4, 13}
      
                              4       11      13
      
      cpuM: queue(4 ,13)
                              *Ma
      cpuN: queue(13,11)
                                      *N      Na
                              *M              Mb
      cpuO: queue(11, 4)
                              *O      Oa
                                      *Nb
                              *Ob
      
      Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes
      the first/second queued work.
      
      You'll observe the top of the workqueue for each cpu: 4,11,13 to be work
      from cpus: M, O, N resp. IOW. deadlock.
      
      Do away with the queueing trickery and introduce lg_double_lock() to
      lock both CPUs and fully serialize the stop_two_cpus() callers instead
      of the partial (and buggy) serialization we have now.
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b17718d0
  2. 07 Apr, 2014 1 commit
  3. 08 Nov, 2013 1 commit
  4. 09 Oct, 2012 3 commits
  5. 29 May, 2012 2 commits
  6. 22 Dec, 2011 1 commit
    • Srivatsa S. Bhat's avatar
      VFS: Fix race between CPU hotplug and lglocks · e30e2fdf
      Srivatsa S. Bhat authored
      Currently, the *_global_[un]lock_online() routines are not at all synchronized
      with CPU hotplug. Soft-lockups detected as a consequence of this race was
      reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
      for finding out that the root-cause of this issue is the race condition
      between br_write_[un]lock() and CPU hotplug, which results in the lock states
      getting messed up).
      
      Fixing this race by just adding {get,put}_online_cpus() at appropriate places
      in *_global_[un]lock_online() is not a good option, because, then suddenly
      br_write_[un]lock() would become blocking, whereas they have been kept as
      non-blocking all this time, and we would want to keep them that way.
      
      So, overall, we want to ensure 3 things:
      1. br_write_lock() and br_write_unlock() must remain as non-blocking.
      2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
         for different sets of CPUs.
      3. Either prevent any new CPU online operation in between this lock-unlock, or
         ensure that the newly onlined CPU does not proceed with its corresponding
         per-cpu spinlock unlocked.
      
      To achieve all this:
      (a) We introduce a new spinlock that is taken by the *_global_lock_online()
          routine and released by the *_global_unlock_online() routine.
      (b) We register a callback for CPU hotplug notifications, and this callback
          takes the same spinlock as above.
      (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
          initialized in the lock_init() code, all future updates to it are done in
          the callback, under the above spinlock.
      (d) The above bitmap is used (instead of cpu_online_mask) while locking and
          unlocking the per-cpu locks.
      
      The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
      br_write_lock-unlock sequence is in progress, the callback keeps spinning,
      thus preventing the CPU online operation till the lock-unlock sequence is
      complete. This takes care of requirement (3).
      
      The bitmap that we maintain remains unmodified throughout the lock-unlock
      sequence, since all updates to it are managed by the callback, which takes
      the same spinlock as the one taken by the lock code and released only by the
      unlock routine. Combining this with (d) above, satisfies requirement (2).
      
      Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
      operations from racing with br_write_lock-unlock, requirement (1) is also
      taken care of.
      
      By the way, it is to be noted that a CPU offline operation can actually run
      in parallel with our lock-unlock sequence, because our callback doesn't react
      to notifications earlier than CPU_DEAD (in order to maintain our bitmap
      properly). And this means, since we use our own bitmap (which is stale, on
      purpose) during the lock-unlock sequence, we could end up unlocking the
      per-cpu lock of an offline CPU (because we had locked it earlier, when the
      CPU was online), in order to satisfy requirement (2). But this is harmless,
      though it looks a bit awkward.
      Debugged-by: default avatarCong Meng <mc@linux.vnet.ibm.com>
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      e30e2fdf
  7. 09 Sep, 2010 1 commit
  8. 18 Aug, 2010 1 commit
    • Nick Piggin's avatar
      lglock: introduce special lglock and brlock spin locks · 2dc91abe
      Nick Piggin authored
      lglock: introduce special lglock and brlock spin locks
      
      This patch introduces "local-global" locks (lglocks). These can be used to:
      
      - Provide fast exclusive access to per-CPU data, with exclusive access to
        another CPU's data allowed but possibly subject to contention, and to provide
        very slow exclusive access to all per-CPU data.
      - Or to provide very fast and scalable read serialisation, and to provide
        very slow exclusive serialisation of data (not necessarily per-CPU data).
      
      Brlocks are also implemented as a short-hand notation for the latter use
      case.
      
      Thanks to Paul for local/global naming convention.
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      2dc91abe