    [SCSI] FC transport: fixes for workq deadlocks · aedf3497
    James Smart authored
    
    
    As previously reported via Michael Reed, the FC transport took a hit
    in 2.6.15 (perhaps a little earlier) when we solved a recursion error.
    There are 2 deadlocks occurring:
    - With scan and the delete items sharing the same workq, a flush of the
      workq by the delete code was getting stalled behind a very
      long-running scan code path.
    - There's a deadlock where scsi_remove_target() has to sit behind
      scsi_scan_target() due to contention over the scan_lock().
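
    The first stall can be sketched with a toy userspace model. This is
    not kernel code: struct work_item, flush_wait_ms() and the run times
    are hypothetical, and the queue is assumed to be a single-threaded
    FIFO, so a flush has to wait for every item queued ahead of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a queued work item: only its run time matters. */
struct work_item {
    const char *name;
    unsigned int run_ms;    /* assumed cost of running the item */
};

/*
 * Time a flush waits on a single-threaded FIFO queue holding n items:
 * every queued item has to run to completion before the flush returns.
 */
static unsigned int flush_wait_ms(const struct work_item *q, size_t n)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        total += q[i].run_ms;
    return total;
}

/* Shared queue: the delete item sits behind a long scan (made-up times). */
static const struct work_item shared_q[] = {
    { "scan", 30000 },
    { "delete", 5 },
};

/* Dedicated queue: the delete item has no scan in front of it. */
static const struct work_item delete_q[] = {
    { "delete", 5 },
};
```

    In this model flush_wait_ms(shared_q, 2) is dominated by the scan,
    while flush_wait_ms(delete_q, 1) only pays for the delete itself.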
    
    This patch resolves the 1st deadlock and significantly reduces the
    odds of the second. So far, we have only replicated the 2nd deadlock
    on a highly-parallel SMP system. More on the 2nd deadlock in a following
    email.
    
    This patch reworks the transport to:
    - Only use the scsi host workq for scanning
    - Use 2 other workq's internally. One for deletions, the other for
      scheduled deletions. Originally, we tried this with a single workq,
      but the occasional flushes of the scheduled queues were hitting the
      second deadlock with a slightly higher frequency. In the future, we'll
      look at the LLDD's and the transport to see if we can get rid of this
      extra overhead.
    - When moving to the other workq's we tightened up some object states
      and some lock handling.
    - Properly syncs adds/deletes
    - minor code cleanups
      - directly reference fc_host_attrs, rather than through attribute
        macros
      - flush the right workq on delayed work cancel failures.
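
    The queue split in the list above can be pictured as a routing table.
    This is only an illustrative sketch: the enum names and route_work()
    are hypothetical and are not the identifiers used by the transport.

```c
#include <assert.h>

/* Kinds of work the transport hands off (labels are illustrative). */
enum work_kind { WORK_SCAN, WORK_DELETE, WORK_SCHEDULED_DELETE };

/* The three queues described in the list (identifiers are hypothetical). */
enum queue_id { Q_SCSI_HOST, Q_DELETE, Q_SCHEDULED_DELETE };

/* Pick the queue for a work item so scans never share a queue with deletes. */
static enum queue_id route_work(enum work_kind kind)
{
    switch (kind) {
    case WORK_SCAN:
        return Q_SCSI_HOST;
    case WORK_DELETE:
        return Q_DELETE;
    case WORK_SCHEDULED_DELETE:
        return Q_SCHEDULED_DELETE;
    }
    return Q_SCSI_HOST;    /* not reached for valid enum values */
}
```

    With this split, flushing either delete queue never has to wait for a
    scan queued on the scsi host workq.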
    
    Large kudos to Michael Reed who has been working this issue for the last
    month.
    
    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>