  1. 13 Oct, 2008 1 commit
      [SCSI] fc class: unblock target after calling terminate callback (take 2) · fff9d40c
      Mike Christie authored
      When we block a rport and the driver implements the terminate
      callback, we will quickly fail IO that was running. However,
      IO that was in the scsi_device/block queue sits there until
      the dev_loss_tmo fires. This can make it look like IO is
      lost, because new IO will get executed while the IO stuck in
      the blocked queue sits there for some time longer.
      
      With this patch when the fast io fail tmo fires, we will
      fail the blocked IO and any new IO. This patch also allows
      all drivers to partially support the fast io fail tmo. If the
      terminate io callback is not implemented, we will still fail blocked
      IO and any new IO, so multipath can handle that.
      
      This patch also allows the fc and iscsi classes to implement the
      same behavior. The timers are just unfortunately named differently.
      
      This patch also fixes the problem where drivers were unblocking
      the target in their terminate callback, which was needed for
      rport removal, but for fast io fail timeout it would cause
      IO to bounce around the scsi/block layer and the LLD queuecommand.
      And for drivers that could have IO stuck but did not have
      a terminate callback, the unblock calls in the class will fix
      them.
      
      v2.
      - fix up bit setting style to meet JamesS's pref.
      - Broke out new host byte error changes to make it easier to read.
      - added JamesS's ack from list.
      v1
      - initial patch
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
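      The behavior described in the commit above can be sketched as a small
      userspace model. This is only an illustration under stated assumptions:
      the type and function names (rport_io_model, io_action_at) are
      hypothetical and are not part of the fc class, and the model assumes
      that all IO fails once dev_loss_tmo has expired.

```c
#include <stdbool.h>

/* Hypothetical model, NOT the fc class API: decide what happens to an IO
 * request secs_blocked seconds after the rport was blocked. */
enum io_action { IO_DISPATCH, IO_HOLD_IN_QUEUE, IO_FAIL };

struct rport_io_model {
    bool blocked;          /* rport is currently blocked */
    int  fast_io_fail_tmo; /* seconds; negative means "not set" (assumption) */
    int  dev_loss_tmo;     /* seconds */
};

enum io_action io_action_at(const struct rport_io_model *rp, int secs_blocked)
{
    if (!rp->blocked)
        return IO_DISPATCH;

    /* Assumption of this model: after dev_loss_tmo the rport is gone,
     * so every request fails. */
    if (secs_blocked >= rp->dev_loss_tmo)
        return IO_FAIL;

    /* Per the commit message: once the fast io fail tmo fires, blocked IO
     * and any new IO are failed so that multipath can handle them. */
    if (rp->fast_io_fail_tmo >= 0 && secs_blocked >= rp->fast_io_fail_tmo)
        return IO_FAIL;

    /* Otherwise the request sits in the blocked queue. */
    return IO_HOLD_IN_QUEUE;
}
```

      With fast_io_fail_tmo unset, the model holds queued IO until
      dev_loss_tmo, which corresponds to the "IO looks lost" symptom that the
      commit message describes.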
  2. 09 Oct, 2008 1 commit
  3. 03 Oct, 2008 1 commit
  4. 27 Jul, 2008 1 commit
  5. 21 Jul, 2008 1 commit
  6. 27 Apr, 2008 1 commit
  7. 19 Apr, 2008 1 commit
  8. 11 Jan, 2008 1 commit
  9. 12 Oct, 2007 3 commits
  10. 18 Jul, 2007 1 commit
  11. 17 Jun, 2007 1 commit
  12. 26 May, 2007 1 commit
  13. 16 May, 2007 1 commit
      [SCSI] FC Transport support for vports based on NPIV · a53eb5e0
      James Smart authored
      This patch provides support for FC virtual ports based on NPIV.
      For information on the interfaces and design, please read the
      Documentation/scsi/scsi_fc_transport.txt file enclosed within
      the patch.
      
      The RFC was originally posted here:
      http://marc.info/?l=linux-scsi&m=117226959918393&w=2
      
      Changes from the initial RFC:
      - Bug fix: needed a transport_class_unregister() for the vport class
      - Create a symlink to the vport in the shost device if it is not the
        parent of the vport.
      - Made symbolic name writable so it can be set after creation
      - Made the temporary fc_vport_identifiers struct private to the
        transport.
      - Deleted the vport_id field from the vport. I couldn't find any good
        use for it (and symname is a good replacement).
      - Made the vport_state and vport_last_state "private" attributes.
        Added the fc_vport_set_state() helper function to manage state
        transitions
      - Updated vport_create() to allow a vport to be created in a disabled
        state.
      - Added INITIALIZING and FAILED vport states
      - Added VPCERR_xxx defines for errors to be returned from vport_create()
      - Created a Documentation/scsi/scsi_fc_transport.txt file that describes
        the interfaces and expected LLDD behaviors.
      Signed-off-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
  14. 06 May, 2007 1 commit
      [SCSI] fc_transport: make all rports wait dev_loss_tmo before removing them · 92740b24
      James Smart authored
      Per the comment in the change - it's not always prudent to immediately
      remove the rport upon first notice of a disconnect. Make all rports
      wait dev_loss_tmo before being deleted (and each could have a separate
      dev_loss_tmo value).
      
      The original post was:
      http://marc.info/?l=linux-scsi&m=117392196006703&w=2
      
      The repost contains the following changes:
       - Bug fix in fc_starget_delete(). Dev_loss_tmo_callbk() was called prior to
         tearing down the target. The callback is to be the last thing called, as
         it tells the LLDD that the rport is completely finished and can be torn
         down.  Rework so that terminate_rport_io() is called to terminate the
         outstanding io. Isolated work so it is simply "starget" work.
       - Fix holes in original patch. There were code paths that did not expect
         the dev_loss_tmo timer to be running for the non-fcp rports.
       - Bug Fix: the transport wasn't protecting against a LLDD calling
         fc_remote_port_delete() back-to-back. Thus, the dev_loss_tmo timer
         could be restarted such that it fires after the rport had been deleted.
         Validate rport state before starting the timer.
      Signed-off-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
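      The last bug fix in the commit above (validating rport state before
      starting the timer) can be illustrated with a minimal userspace sketch.
      The names below are hypothetical stand-ins, not the transport's real
      structures. The sketch only shows the guard that makes a back-to-back
      delete call a no-op.

```c
#include <stdbool.h>

/* Hypothetical states for illustration only. */
enum rport_model_state { RPORT_M_ONLINE, RPORT_M_BLOCKED, RPORT_M_DELETED };

struct rport_timer_model {
    enum rport_model_state state;
    int timer_starts; /* how many times the dev_loss_tmo timer was started */
};

/* Stand-in for a delete request from the LLDD. Returns true only if this
 * call started the timer. */
bool model_remote_port_delete(struct rport_timer_model *rp)
{
    /* Validate state first: a second, back-to-back call must not restart
     * the timer. */
    if (rp->state != RPORT_M_ONLINE)
        return false;

    rp->state = RPORT_M_BLOCKED;
    rp->timer_starts++;
    return true;
}
```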
  15. 20 Mar, 2007 1 commit
  16. 17 Feb, 2007 1 commit
  17. 14 Feb, 2007 1 commit
      [PATCH] remove many unneeded #includes of sched.h · cd354f1a
      Tim Schmielau authored
      After Al Viro (finally) succeeded in removing the sched.h #include in module.h
      recently, it makes sense again to remove other superfluous sched.h includes.
      There are quite a lot of files which include it but don't actually need
      anything defined in there.  Presumably these includes were once needed for
      macros that used to live in sched.h, but moved to other header files in the
      course of cleaning it up.
      
      To ease the pain, this time I did not fiddle with any header files and only
      removed #includes from .c-files, which tend to cause less trouble.
      
      Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
      arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
      allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
      configs in arch/arm/configs on arm.  I also checked that no new warnings were
      introduced by the patch (actually, some warnings are removed that were emitted
      by unnecessarily included header files).
      Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 22 Nov, 2006 1 commit
  19. 23 Sep, 2006 1 commit
  20. 04 Sep, 2006 1 commit
  21. 02 Sep, 2006 2 commits
  22. 19 Aug, 2006 2 commits
  23. 27 Jun, 2006 3 commits
      [SCSI] fc transport: bug fix: correct references · 3bdad7bd
      James Smart authored
      Original post was incorrect as it didn't realize that we already had
      a self-reference due to device_initialize(), and we were really only
      missing the put on our own reference. This was hidden by the other bug
      which had the midlayer reusing stargets after they were already free,
      which was doing too many puts on our rport.
      
      Updating FC transport for:
      - Add put in fc_rport_final_delete(), to release the rport.
        Prior, we were leaving the rport with a reference, thus the shost
        with references, etc. If the driver was unloaded, shosts and rports
        remained, along with work threads, etc
      - Fix fc_rport_create failure path - too many put's on parent
      - Add commenting to easily track ref taking.
      Signed-off-by: James Smart <james.smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
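      The reference imbalance described in the commit above can be sketched
      with a toy counter. This is a simplified model with hypothetical names
      and it is not the driver-core API. It assumes only what the message
      states: initialization already leaves one self-reference, and the final
      delete has to drop it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted object; hypothetical, for illustration only. */
struct obj_model {
    int  refs;
    bool released;
};

/* Stands in for initialization that leaves one self-reference behind. */
void model_init(struct obj_model *o)
{
    o->refs = 1;
    o->released = false;
}

void model_get(struct obj_model *o)
{
    o->refs++;
}

/* Dropping the last reference releases the object. */
void model_put(struct obj_model *o)
{
    assert(o->refs > 0); /* catches "too many puts" in this model */
    if (--o->refs == 0)
        o->released = true;
}
```

      In this model, a user that takes and drops its own reference leaves
      refs at 1. Only the extra put in the final-delete step brings the count
      to zero, which mirrors the missing put that the commit adds.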
      [SCSI] update max sdev block limit · 1c9e16e4
      James Smart authored
      Updated patch to address comments from Pat Mansfield and Michael Reed:
      Bumped max to 600 (10mins). Set default dev_loss_tmo to a value other
      than the max (30s).
      Signed-off-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
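      A minimal sketch of the limits mentioned in the commit above. The
      constants use the values from the message (max 600 seconds, default 30
      seconds); the macro and function names are hypothetical, and rejecting
      out-of-range values with -EINVAL is an assumption of this sketch.

```c
#include <errno.h>

/* Values taken from the commit message; names are hypothetical. */
#define MODEL_BLOCK_MAX_TIMEOUT     600 /* seconds (10 minutes) */
#define MODEL_DEV_LOSS_TMO_DEFAULT   30 /* seconds */

/* Store a new dev_loss_tmo value. Assumption: out-of-range input is
 * rejected and the previous value is left untouched. */
int model_set_dev_loss_tmo(int *tmo, long val)
{
    if (val < 0 || val > MODEL_BLOCK_MAX_TIMEOUT)
        return -EINVAL;

    *tmo = (int)val;
    return 0;
}
```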
      [SCSI] fc transport: resolve scan vs delete deadlocks · a0785edf
      James Smart authored
      In a prior posting to linux-scsi on the fc transport and workq
      deadlocks, we noted a second error that did not have a patch:
        http://marc.theaimsgroup.com/?l=linux-scsi&m=114467847711383&w=2
        - There's a deadlock where scsi_remove_target() has to sit behind
          scsi_scan_target() due to contention over the scan_lock().
      
      Subsequently we posted a request for comments about the deadlock:
        http://marc.theaimsgroup.com/?l=linux-scsi&m=114469358829500&w=2
      
      This posting resolves the second error. Here's what we now understand,
      and are implementing:
      
        If the lldd deletes the rport while a scan is active, the sdev's queue
        is blocked which stops the issuing of commands associated with the scan.
        At this point, the scan stalls, and does so with the shost->scan_mutex held.
        If, at this point, any scan or delete request is made on the host, it
        will stall waiting for the scan_mutex.
      
        For the FC transport, we queue all delete work to a single workq.
        So, things worked fine when competing with the scan, as long as the
        target blocking the scan was the same target at the top of our delete
        workq, as the delete workq routine always unblocked just prior to
        requesting the delete.  Unfortunately, if the top of our delete workq
        was for a different target, we deadlock.  Additionally, if the target
        blocking scan returned, we were unblocking it in the scan workq routine,
        which really won't execute until the existing stalled scan workq
        completes (e.g. we're re-scheduling it while it is in the midst of its
        execution).
      
        This patch moves the unblock out of the workq routines and moves it to
        the context that is scheduling the work. This ensures that at some point,
        we will unblock the target that is blocking scan.  Please note, however,
        that the deadlock condition may still occur while it waits for the
        transport to time out an unblock on a target.  Worst case, this is bounded
        by the transport dev_loss_tmo (default: 30 seconds).
      
      Finally, Michael Reed deserves the credit for the bulk of this patch,
      analysis, and its testing. Thank you for your help.
      
      Note: The request for comments statements about the gross-ness of the
        scan_mutex still stand.
      Signed-off-by: Michael Reed <mdr@sgi.com>
      Signed-off-by: James Smart <james.smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
  24. 10 Jun, 2006 1 commit
  25. 13 Apr, 2006 1 commit
      [SCSI] FC transport: fixes for workq deadlocks · aedf3497
      James Smart authored
      As previously reported via Michael Reed, the FC transport took a hit
      in 2.6.15 (perhaps a little earlier) when we solved a recursion error.
      There are 2 deadlocks occurring:
      - With scan and the delete items sharing the same workq, flushing the
        workq for the delete code was getting it stalled behind a very long
        running scan code path.
      - There's a deadlock where scsi_remove_target() has to sit behind
        scsi_scan_target() due to contention over the scan_lock().
      
      This patch resolves the 1st deadlock and significantly reduces the
      odds of the second. So far, we have only replicated the 2nd deadlock
      on a highly-parallel SMP system. More on the 2nd deadlock in a following
      email.
      
      This patch reworks the transport to:
      - Only use the scsi host workq for scanning
      - Use 2 other workq's internally. One for deletions, the other for
        scheduled deletions. Originally, we tried this with a single workq,
        but the occasional flushes of the scheduled queues were hitting the
        second deadlock with a slightly higher frequency. In the future, we'll
        look at the LLDD's and the transport to see if we can get rid of this
        extra overhead.
      - When moving to the other workq's we tightened up some object states
        and some lock handling.
      - Properly syncs adds/deletes
      - minor code cleanups
        - directly reference fc_host_attrs, rather than through attribute
          macros
        - flush the right workq on delayed work cancel failures.
      
      Large kudos to Michael Reed who has been working this issue for the last
      month.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
  26. 13 Mar, 2006 1 commit
  27. 09 Mar, 2006 1 commit
  28. 27 Feb, 2006 2 commits
  29. 14 Jan, 2006 2 commits
      [SCSI] remove target parent limitation · e02f3f59
      Christoph Hellwig authored
      When James Smart fixed the issue of the userspace scan attributes
      crashing the system with the FC transport class he added a patch to
      let the transport class check if the parent is valid for a given
      transport class.
      
      When adding support for the integrated raid of fusion sas devices
      we ran into a problem with that, as it didn't allow adding virtual
      raid volumes without the transport class knowing about it.
      
      So this patch adds a user_scan attribute instead, that takes over from
      scsi_scan_host_selected if the transport class sets it and thus lets
      the transport class control the user-initiated scanning.  As this
      plugs the hole about user-initiated scanning the target_parent hook
      goes away and we rely on callers of the scanning routines to do
      something sensible.
      
      For SAS this meant I had to switch from a spinlock to a mutex to
      synchronize the topology linked lists, in FC they were completely
      unsynchronized which seems wrong.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      [SCSI] fc transport: add permanent_port_name fc_host attribute · 6b7281d0
      Andreas Herrmann authored
      Add fc_host attribute permanent_port_name which is
      used to show the port name of the primary port -
      the port that initially logged into the fabric.
      
      For a virtual port (registered via the primary port with
      FDISC command) it is useful to know not only its (virtual)
      port name but also the permanent port name.
      Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
  30. 15 Dec, 2005 1 commit
      [SCSI] fix for fc transport recursion problem. · 42e33148
      James.Smart@Emulex.Com authored
      In the scenario that a link was broken, the devloss timer for each
      rport would expire at roughly the same time, causing lots of "delete"
      workqueue items to be queued. Depth is dependent upon the number of
      rports that were on the link.
      
      The rport target remove calls were calling flush_scheduled_work(),
      which would interrupt the stream, and start the next workqueue item,
      which did the same thing, and so on until recursion depth was large.
      
      This fix stops the recursion in the initial delete path, and pushes it
      off to a host-level work item that reaps the dead rports.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
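      The shape of the fix in the commit above, where the delete path only
      records the dead rport and a single host-level work item reaps them, can
      be sketched in userspace as follows. All names here are hypothetical;
      the real transport uses kernel workqueues and its own lists, which this
      model does not reproduce.

```c
#include <stddef.h>

/* Hypothetical stand-ins for an rport awaiting removal and its host. */
struct dead_rport {
    int id;
    struct dead_rport *next;
};

struct host_model {
    struct dead_rport *dead; /* rports waiting to be reaped */
    int reaped;              /* total rports reaped so far */
};

/* Delete path: just link the rport onto the host's dead list.
 * No flushing happens here, so nothing can recurse. */
void model_queue_dead(struct host_model *h, struct dead_rport *rp)
{
    rp->next = h->dead;
    h->dead = rp;
}

/* Host-level work item: drain the list in one flat loop.
 * Returns the number of rports reaped by this call. */
int model_reap_dead(struct host_model *h)
{
    int n = 0;

    while (h->dead != NULL) {
        struct dead_rport *rp = h->dead;

        h->dead = rp->next;
        rp->next = NULL;
        n++;
    }
    h->reaped += n;
    return n;
}
```

      The stack depth of model_reap_dead() stays constant no matter how many
      rports were on the link, which is the property the fix is after.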
  31. 13 Dec, 2005 1 commit
  32. 30 Oct, 2005 1 commit
      [PATCH] fix missing includes · 4e57b681
      Tim Schmielau authored
      I recently picked up my older work to remove unnecessary #includes of
      sched.h, starting from a patch by Dave Jones to not include sched.h
      from module.h. This reduces the number of indirect includes of sched.h
      by ~300. Another ~400 pointless direct includes can be removed after
      this disentangling (patch to follow later).
      However, quite a few indirect includes need to be fixed up for this.
      
      In order to feed the patches through -mm with as little disturbance as
      possible, I've split out the fixes I accumulated up to now (complete for
      i386 and x86_64, more archs to follow later) and post them before the real
      patch.  This way this large part of the patch is kept simple with only
      adding #includes, and all hunks are independent of each other.  So if any
      hunk rejects or gets in the way of other patches, just drop it.  My scripts
      will pick it up again in the next round.
      Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>