    CPU hotplug, smp: flush any pending IPI callbacks before CPU offline · 8d056c48
    Srivatsa S. Bhat authored
    
    
    There is a race between the CPU offline code (within stop-machine) and
    the smp-call-function code, which can lead to getting IPIs on the
    outgoing CPU, *after* it has gone offline.
    
    Specifically, this can happen when using
    smp_call_function_single_async() to send the IPI, since this API allows
    sending asynchronous IPIs from IRQ-disabled contexts.  The exact race
    condition is described below.
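
    For reference, here is roughly how such an asynchronous IPI is sent
    (a minimal in-kernel sketch; the callback and wrapper names here are
    illustrative, not from the kernel source):

    	#include <linux/smp.h>

    	/* Illustrative callback: runs on the target CPU in IPI context. */
    	static void my_remote_work(void *info)
    	{
    		/* ... act on 'info' on the remote CPU ... */
    	}

    	/* Caller must ensure this csd is not still in flight when reused. */
    	static struct call_single_data my_csd = {
    		.func = my_remote_work,
    	};

    	/*
    	 * smp_call_function_single_async() only queues the callback and
    	 * kicks the target CPU; it never sleeps or spins waiting for
    	 * completion, so it may legally be called with local interrupts
    	 * disabled - e.g. from an interrupt handler, as in the race below.
    	 */
    	static void kick_remote_cpu(int cpu, void *info)
    	{
    		my_csd.info = info;
    		smp_call_function_single_async(cpu, &my_csd);
    	}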
    
    During CPU offline, in stop-machine, we don't enforce any rule in the
    _DISABLE_IRQ stage regarding the order in which the outgoing CPU and
    the other CPUs disable their local interrupts.  Due to this, we can
    encounter a situation in which an IPI is sent by one of the other CPUs
    to the outgoing CPU (while it is *still* online), but the outgoing CPU
    ends up noticing it only *after* it has gone offline.
    
                  CPU 1                                         CPU 2
              (Online CPU)                               (CPU going offline)
    
           Enter _PREPARE stage                          Enter _PREPARE stage
    
                                                         Enter _DISABLE_IRQ stage
    
                                                       =
           Got a device interrupt, and                 | Didn't notice the IPI
           the interrupt handler sent an               | since interrupts were
           IPI to CPU 2 using                          | disabled on this CPU.
           smp_call_function_single_async()            |
                                                       =
    
           Enter _DISABLE_IRQ stage
    
           Enter _RUN stage                              Enter _RUN stage
    
                                      =
           Busy loop with interrupts  |                  Invoke take_cpu_down()
           disabled.                  |                  and take CPU 2 offline
                                      =
    
           Enter _EXIT stage                             Enter _EXIT stage
    
           Re-enable interrupts                          Re-enable interrupts
    
                                                         The pending IPI is noted
                                                         immediately, but alas,
                                                         the CPU is offline at
                                                         this point.
    
    This, of course, makes the smp-call-function IPI handler code running
    on CPU 2 unhappy, and it complains about "receiving an IPI on an
    offline CPU".
    
    One real example of the scenario on CPU 1 is the block layer's
    complete-request call-path:
    
    	__blk_complete_request() [interrupt-handler]
    	    raise_blk_irq()
    	        smp_call_function_single_async()
    
    However, if we look closely, the block layer does check that the target
    CPU is online before firing the IPI.  So in this case, it is actually
    the unfortunate ordering/timing of events in the stop-machine phase that
    leads to receiving IPIs after the target CPU has gone offline.
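
    For context, the sender side looks roughly like this (condensed from
    block/blk-softirq.c of that era, where trigger_softirq() is defined;
    a sketch, not the verbatim source):

    	static int raise_blk_irq(int cpu, struct request *rq)
    	{
    		if (cpu_online(cpu)) {
    			struct call_single_data *data = &rq->csd;

    			data->func = trigger_softirq;
    			data->info = rq;
    			data->flags = 0;

    			/*
    			 * The online check above can still race with
    			 * stop-machine: 'cpu' may disable interrupts and go
    			 * offline before it ever services this IPI.
    			 */
    			smp_call_function_single_async(cpu, data);
    			return 0;
    		}

    		return 1;
    	}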
    
    In reality, getting a late IPI on an offline CPU is not too bad by
    itself (this can happen even due to hardware latencies in IPI
    send-receive).  It is a bug only if the target CPU really went offline
    without executing all the callbacks queued on its list.  (Note that a
    CPU is free to execute its pending smp-call-function callbacks in a
    batch, without waiting for the corresponding IPIs to arrive for each one
    of those callbacks).
    
    So, fixing this issue can be broken up into two parts:
    
    1. Ensure that a CPU goes offline only after executing all the
       callbacks queued on it.
    
    2. Modify the warning condition in the smp-call-function IPI handler
       code such that it warns only if an offline CPU got an IPI *and* that
       CPU had gone offline with callbacks still pending in its queue.
    
    Achieving part 1 is straightforward - just flush (execute) all the
    queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
    including those callbacks for which the source CPU's IPIs might not have
    been received on the outgoing CPU yet.  Once we do this, an IPI that
    arrives late on the CPU going offline (either due to the race mentioned
    above, or due to hardware latencies) will be completely harmless, since
    the outgoing CPU would have executed all the queued callbacks before
    going offline.
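
    In kernel/smp.c terms, the CPU_DYING case of the hotplug notifier ends
    up doing, in essence, the following (a condensed sketch; only the
    relevant case of hotplug_cfd() is shown):

    	static int hotplug_cfd(struct notifier_block *nfb,
    			       unsigned long action, void *hcpu)
    	{
    		switch (action & ~CPU_TASKS_FROZEN) {
    		case CPU_DYING:
    			/*
    			 * Runs on the outgoing CPU, with interrupts disabled
    			 * and the CPU already marked offline.  Drain every
    			 * queued callback now, without waiting for the
    			 * senders' IPIs to arrive; 'false' suppresses the
    			 * "IPI on offline CPU" warning, which would be a
    			 * false positive at this stage.
    			 */
    			flush_smp_call_function_queue(false);
    			break;
    		}

    		return NOTIFY_OK;
    	}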
    
    Overall, this fix (parts 1 and 2 put together) additionally guarantees
    that we will see a warning only when the *IPI-sender code* is buggy -
    that is, if it queues the callback _after_ the target CPU has gone
    offline.
    
    [1].  The CPU_DYING part needs a little more explanation: by the time we
    execute the CPU_DYING notifier callbacks, the CPU would have already
    been marked offline.  But we want to flush out the pending callbacks at
    this stage, ignoring the fact that the CPU is offline.  So we
    restructure the IPI handler code so that we can bypass the
    "is-cpu-offline?" check in this particular case.  (Of course, the
    right solution here is to fix CPU hotplug to mark the CPU offline
    _after_ invoking the CPU_DYING notifiers, but that requires a lot of
    auditing to ensure the change doesn't break any existing code; hence
    let's go with the solution proposed above until that is done).
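
    Put together, the restructuring boils down to a common flush routine
    whose offline check can be switched off (a simplified sketch of the
    kernel/smp.c logic; the real function additionally prints the stranded
    callbacks when it warns):

    	static void flush_smp_call_function_queue(bool warn_cpu_offline)
    	{
    		struct llist_node *entry;
    		struct call_single_data *csd, *csd_next;

    		WARN_ON(!irqs_disabled());

    		/* Grab all queued callbacks atomically, oldest first. */
    		entry = llist_del_all(this_cpu_ptr(&call_single_queue));
    		entry = llist_reverse_order(entry);

    		/* Warn only if an offline CPU actually found work queued. */
    		WARN_ONCE(warn_cpu_offline && entry &&
    			  !cpu_online(smp_processor_id()),
    			  "IPI on offline CPU %d\n", smp_processor_id());

    		llist_for_each_entry_safe(csd, csd_next, entry, llist) {
    			csd->func(csd->info);
    			csd_unlock(csd);
    		}
    	}

    	/* IPI handler: flush with the warning enabled. */
    	void generic_smp_call_function_single_interrupt(void)
    	{
    		flush_smp_call_function_queue(true);
    	}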
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
    Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Mike Galbraith <mgalbraith@suse.de>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Sachin Kamat <sachin.kamat@samsung.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>