Skip to content
  • James Smart's avatar
    [SCSI] scsi_lib: pause between error retries · 573e5913
    James Smart authored
    
    
    During cable pull tests on our 16G FC adapter, we are seeing errors,
    typically reads to close targets, which fail due to CRC or framing
    errors caused by the cable being pull (return status DID_ERROR).
    The adapter detects the error on one of the first frames received,
    marks the FC exchange as dead (further frames go to bit bucket) and
    signals the host of the error. This action is so quick, and coupled
    with fast host CPUs, creates a scenario in which the midlayer sees
    the failure and retries the io almost immediately. We've seen link
    traces with the retry on the link while the original i/o is still
    being processed by the target. We're also seeing the time window
    for the "link to pull-apart" and the physical interface to report
    disconnected to be in the few millisecond range. Which means, we're
    encountering scenarios where the full retry count is exhausted
    (all with error) by the midlayer before the link disconnect state
    is detected.
    
    We looked at 8G FC behavior and occasionally see the same behavior,
    but as the link was slower, it rarely could exhaust all retries
    before the link reported disconnect.
    
    What is needed is a slight delay between io retries due to DID_ERROR
    to cover this error.  It is inappropriate to put this delay in the
    driver, as the error is indistinguishable from other link-related errors,
    nor does the driver track whether the io is a retry or not. This is also
    easier than tracking between-io-error bursts that are seen in this
    scenario.
    
    The patch below updates the retry path so that it inserts a delay as
    if the target was busy.  The busy delay is on the order of 6ms. This
    delay is sufficient to ensure the link down condition is reported
    before the retry count is exhausted (at most 1 retry is seen).
    
    Signed-off-by: default avatarAlex Iannicelli <alex.iannicelli@emulex.com>
    Signed-off-by: default avatarJames Smart <james.smart@emulex.com>
    Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
    573e5913