    [SCSI] libsas: fix error handling · a8e14fec
    James Bottomley authored
    
    
    The libsas error handler has two fairly fatal bugs:
    
    1. sas_scsi_task_done() calls scsi_eh_finish_cmd() too early.  This
       happens if the task completes after it has been aborted but before
       the error handler starts up.  Because scsi_eh_finish_cmd()
       decrements host_failed and moves the command to the done list
       while it is still counted in host_busy, the error handler start
       check (host_failed == host_busy) never passes and the eh never
       starts.
    
    2. The various task completion paths (sas_scsi_clear_queue_...) all
       simply delete the task from the error queue.  The command then
       disappears into the ether, because a command must be placed on the
       done queue to be finished off by the error handler.  As a result,
       the HBA hangs on pending commands.
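    A minimal user-space model of the two failure modes may help.  It is
    not libsas code: struct model_host and every model_* function are
    hypothetical, the counters only stand in for host_busy and
    host_failed, and the wakeup test is simplified to
    "failed > 0 && busy == failed".  Finishing a command before the eh
    starts drops failed below busy (bug 1), while unlinking a command
    without moving it to the done queue leaves it busy forever (bug 2).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the SCSI host bookkeeping.  The real
 * counters live in struct Scsi_Host and are updated by the midlayer. */
struct model_host {
    int busy;    /* commands issued and not yet finished */
    int failed;  /* commands handed to the error handler */
    int eh_q;    /* commands on the error-handling queue */
    int done_q;  /* commands waiting for the eh to finish them off */
};

/* The eh is only woken once every busy command has failed. */
static bool model_eh_can_start(const struct model_host *h)
{
    return h->failed > 0 && h->busy == h->failed;
}

/* A timed-out command is queued for the eh but is still busy. */
static void model_timeout(struct model_host *h)
{
    h->failed++;
    h->eh_q++;
}

/* Imitates scsi_eh_finish_cmd(): drop failed and move the command to the
 * done queue.  It stays busy until the eh flushes the done queue. */
static void model_eh_finish_cmd(struct model_host *h)
{
    h->failed--;
    h->eh_q--;
    h->done_q++;
}

/* Bug 2: the command is unlinked from the eh queue and nothing else. */
static void model_buggy_clear(struct model_host *h)
{
    h->eh_q--;
}

/* The eh finishes everything on the done queue, releasing busy. */
static void model_flush_done_q(struct model_host *h)
{
    h->busy -= h->done_q;
    h->done_q = 0;
}
```

    In the bug 2 case the command ends up on neither queue, so flushing
    the done queue never releases it and busy stays non-zero.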
    
    Fix 1 by moving the SAS_TASK_STATE_ABORTED check to an exit clause at
    the top of the routine and calling ->scsi_done() unconditionally (it
    is a no-op if the timer has fired).  This keeps the task in the error
    handling queue until the eh starts.
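    The reordered completion routine can be sketched roughly as below.
    This is a simplified model, not the actual sas_scsi_task_done():
    MODEL_TASK_STATE_ABORTED, struct model_task, struct model_cmd and the
    model_* functions are hypothetical, and model_scsi_done() only
    imitates the stated property that ->scsi_done() does nothing once the
    timer has fired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SAS_TASK_STATE_ABORTED. */
#define MODEL_TASK_STATE_ABORTED 0x1u

/* Hypothetical command; the real one is a struct scsi_cmnd. */
struct model_cmd {
    bool timer_fired;  /* the timeout already handed it to the eh */
    bool completed;    /* the normal completion path ran */
};

struct model_task {
    unsigned state;
    struct model_cmd *cmd;
};

/* Imitates ->scsi_done(): a no-op once the timer has fired. */
static void model_scsi_done(struct model_cmd *cmd)
{
    if (cmd->timer_fired)
        return;
    cmd->completed = true;
}

/* Fixed ordering: aborted tasks exit at the top and stay on the eh
 * queue; everything else goes through scsi_done unconditionally. */
static void model_task_done(struct model_task *task)
{
    if (task->state & MODEL_TASK_STATE_ABORTED)
        return;  /* leave it for the error handler */

    /* ... translate the task status into a SCSI result here ... */
    model_scsi_done(task->cmd);
}
```

    Nothing in the aborted path touches the eh bookkeeping, which is
    what lets the host_failed == host_busy check eventually pass.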
    
    Fix 2 by making sure every task goes through task completion followed
    by scsi_eh_finish_cmd(), so that it always ends up on the done queue.
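    A sketch of the fixed clear-queue behaviour, under the same caveats:
    struct model_eh and the model_* helpers are hypothetical stand-ins,
    and model_eh_finish_cmd() mimics only the two effects of
    scsi_eh_finish_cmd() mentioned in the bug description (dropping
    host_failed and moving the command to the done list).

```c
#include <assert.h>

/* Hypothetical bookkeeping for the eh queue, done queue and counters. */
struct model_eh {
    int busy;
    int failed;
    int eh_q;
    int done_q;
    int completed_tasks;  /* tasks that went through task completion */
};

/* Stand-in for the task completion step. */
static void model_task_complete(struct model_eh *e)
{
    e->completed_tasks++;
}

/* Stand-in for scsi_eh_finish_cmd(). */
static void model_eh_finish_cmd(struct model_eh *e)
{
    e->failed--;
    e->eh_q--;
    e->done_q++;
}

/* Fixed clear-queue path: complete each task, then hand it to the done
 * queue instead of silently unlinking it. */
static void model_clear_queue(struct model_eh *e)
{
    while (e->eh_q > 0) {
        model_task_complete(e);
        model_eh_finish_cmd(e);
    }
}

/* The eh finishes off everything on the done queue. */
static void model_flush_done_q(struct model_eh *e)
{
    e->busy -= e->done_q;
    e->done_q = 0;
}
```

    Because every command now reaches the done queue, flushing it brings
    the busy count back to zero instead of leaving the HBA waiting.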
    
    Tested by firing resets at a disk running a hammer test; the system
    now survives without hanging.
    
    Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>