Skip to content
  • Doug Ledford's avatar
    IPoIB: Use dedicated workqueues per interface · 5141861c
    Doug Ledford authored
    
    
    During my recent work on the rtnl lock deadlock in the IPoIB driver, I
    saw that even once I fixed the apparent races for a single device, as
    soon as that device had any children, new races popped up.  It turns
    out that this is because no matter how well we protect against races
    on a single device, the fact that all devices use the same workqueue,
    and flush_workqueue() flushes *everything* from that workqueue, we can
    have one device in the middle of a down and holding the rtnl lock and
    another totally unrelated device needing to run mcast_restart_task,
    which wants the rtnl lock and will loop trying to take it unless is
    sees its own FLAG_ADMIN_UP flag go away.  Because the unrelated
    interface will never see its own ADMIN_UP flag drop, the interface
    going down will deadlock trying to flush the queue.  There are several
    possible solutions to this problem:
    
    Make carrier_on_task and mcast_restart_task try to take the rtnl for
    some set period of time and if they fail, then bail.  This runs the
    real risk of dropping work on the floor, which can end up being its
    own separate kind of deadlock.
    
    Set some global flag in the driver that says some device is in the
    middle of going down, letting all tasks know to bail.  Again, this can
    drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
    away, then maybe after a few tries on the rtnl lock we can queue our
    own task back up as a delayed work and return and avoid dropping work
    on the floor that way.  But I'm not 100% convinced that we won't cause
    other problems.
    
    Or the method this patch attempts to use, which is when we bring an
    interface up, create a workqueue specifically for that interface, so
    that when we take it back down, we are flushing only those tasks
    associated with our interface.  In addition, keep the global
    workqueue, but now limit it to only flush tasks.  In this way, the
    flush tasks can always flush the device specific work queues without
    having deadlock issues.
    
    Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
    Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
    5141861c