1. 19 Aug, 2016 1 commit
  2. 05 Jul, 2016 2 commits
  3. 12 Jun, 2016 6 commits
  4. 02 May, 2016 1 commit
  5. 01 Dec, 2015 1 commit
  6. 09 Oct, 2015 6 commits
  7. 18 Aug, 2015 2 commits
  8. 21 Jul, 2015 1 commit
  9. 05 Jun, 2015 1 commit
    • NVMe: Automatic namespace rescan · a5768aa8
      Keith Busch authored
      Namespaces may be dynamically allocated and deleted or attached and
      detached. This has the driver rescan the device for namespace changes
      after each device reset or namespace change asynchronous event.
      
      There could potentially be many detached namespaces that we don't want
      polluting /dev/ with unusable block handles, so this deletes the disk
      if the namespace is not active, as indicated by the Identify Namespace
      response. It also skips adding the disk if no capacity is provisioned
      to the namespace in the first place.
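
The two checks described above can be sketched as a small predicate. This is a userspace illustration, not the driver's code: the struct and function names are hypothetical, and only the nsze/ncap field semantics come from the NVMe Identify Namespace data.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the Identify Namespace fields the
 * rescan cares about; names mirror the spec's nsze/ncap, but this is an
 * illustration, not the driver's nvme_id_ns. */
struct id_ns_view {
    uint64_t nsze;  /* namespace size in logical blocks; 0 => inactive */
    uint64_t ncap;  /* capacity provisioned to the namespace */
};

/* Returns true if the rescan should expose a block device for this
 * namespace: it must be active and have capacity provisioned. */
static bool ns_should_have_disk(const struct id_ns_view *id)
{
    if (id->nsze == 0)      /* inactive: delete existing disk if any */
        return false;
    if (id->ncap == 0)      /* no provisioned capacity: skip adding */
        return false;
    return true;
}
```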
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  10. 22 May, 2015 3 commits
  11. 07 Apr, 2015 1 commit
  12. 19 Feb, 2015 5 commits
    • NVMe: Fix potential corruption during shutdown · 07836e65
      Keith Busch authored
      The driver has to end unreturned commands at some point even if the
      controller has not provided a completion. The driver tried to be safe by
      deleting IO queues prior to ending all unreturned commands. That should
      cause the controller to internally abort in-flight commands, but the IO
      queue deletion request is not guaranteed to succeed, so all bets are off. We
      still have to make progress, so to be extra safe, this patch doesn't
      clear a queue to release the dma mapping for a command until after the
      pci device has been disabled.
      
      This patch removes the special handling during device initialization
      so controller recovery can be done all the time. This is possible since
      initialization is no longer done inline with pci probe.
      Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • NVMe: Asynchronous controller probe · 2e1d8448
      Keith Busch authored
      This performs the longest parts of nvme device probe in scheduled work.
      This speeds up probe significantly when multiple devices are in use.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • NVMe: Register management handle under nvme class · b3fffdef
      Keith Busch authored
      This creates a new class type for nvme devices to register their
      management character devices with. This is so we do not rely on miscdev
      to provide enough minors for as many nvme devices as some people plan to
      use. The previous limit was approximately 60 NVMe controllers, depending
      on the platform and kernel. Now the limit is 1M, which ought to be enough
      for anybody.
      
      Since we have a new device class, it makes sense to attach the block
      devices under this as well, so part of this patch moves the management
      handle initialization prior to the namespaces discovery.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • NVMe: Update SCSI Inquiry VPD 83h translation · 4f1982b4
      Keith Busch authored
      The original translation created collisions on Inquiry VPD 83 for many
      existing devices. Newer specifications provide other translations, based
      on the device's version, that can be used to create unique identifiers.
      
      Version 1.1 provides an EUI64 field that uniquely identifies each
      namespace, and 1.2 added the longer NGUID field for the same reason.
      Both follow the IEEE EUI format and readily translate to the SCSI device
      identification EUI designator type 2h. For devices implementing either,
      the translation uses this type, preferring the 8-byte EUI64 if
      implemented and falling back to NGUID's 16-byte version otherwise. If
      neither is provided, the 1.0 translation is used, updated to use the
      SCSI String format to guarantee a unique identifier.
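
The designator preference described above can be modeled as a small selection function. This is a hypothetical sketch, not the driver's actual translation code; it only captures the ordering (EUI64, then NGUID, then String fallback):

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Illustrative sketch of the designator choice: prefer the 8-byte EUI64,
 * fall back to the 16-byte NGUID, and use the SCSI String format only when
 * neither is populated (a 1.0 device). Returns the chosen EUI designator
 * length, with 0 meaning "use the String format fallback". */
static size_t choose_eui_designator(const uint8_t eui64[8],
                                    const uint8_t nguid[16])
{
    static const uint8_t zero[16];

    if (memcmp(eui64, zero, 8) != 0)
        return 8;           /* EUI64 present: EUI designator, 8 bytes */
    if (memcmp(nguid, zero, 16) != 0)
        return 16;          /* NGUID present: EUI designator, 16 bytes */
    return 0;               /* neither: fall back to SCSI String format */
}
```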
      
      Knowing when to use the new fields depends on the nvme controller's
      revision. The NVME_VS macro was not decoding this correctly, so that is
      fixed in this patch and moved to a more appropriate place.
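
The version decode the paragraph refers to follows the VS register layout: major version in the upper 16 bits, minor in the next 8. A sketch of that encoding (the macro names here are illustrative, not the kernel's exact definitions):

```c
#include <stdint.h>

/* Illustrative NVME_VS-style version encoding: major in bits 31:16,
 * minor in bits 15:8, matching the controller VS register layout. */
#define VS_ENC(major, minor) (((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8))
#define VS_MAJOR(v)  ((v) >> 16)
#define VS_MINOR(v)  (((v) >> 8) & 0xff)
```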
      
      Since the Identify Namespace structure required an update for the NGUID
      field, this patch adds the remaining new 1.2 fields to the structure.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • NVMe: Metadata format support · e1e5e564
      Keith Busch authored
      Adds support for NVMe metadata formats and exposes block devices for
      all namespaces regardless of their format. Namespace formats that are
      unusable will have disk capacity set to 0, but a handle to the block
      device is created to simplify device management. A namespace is not
      usable when its format requires the host to interleave block data and
      metadata in a single buffer, when it has no provisioned storage, or when
      its metadata format failed to register with blk-integrity.
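
The usability rule above can be sketched as a predicate over the namespace's state. The struct and field names here are hypothetical stand-ins for the driver's internal bookkeeping; only the three conditions come from the text:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical flags summarizing why a namespace may be unusable, per the
 * description above; not the driver's actual representation. */
struct ns_format_view {
    bool extended_lba;       /* metadata interleaved with block data */
    uint64_t provisioned;    /* provisioned capacity in blocks */
    bool integrity_failed;   /* blk-integrity registration failed */
};

/* Unusable namespaces still get a block device, but with capacity 0,
 * so a management handle exists even when no blocks are usable. */
static uint64_t ns_exposed_capacity(const struct ns_format_view *ns)
{
    if (ns->extended_lba || ns->provisioned == 0 || ns->integrity_failed)
        return 0;
    return ns->provisioned;
}
```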
      
      The namespace has to be scanned in two phases to support separate
      metadata formats. The first establishes the sector size and capacity
      prior to invoking add_disk. If metadata is required, the capacity will
      be temporarily set to 0 until it can be revalidated and registered with
      the integrity extensions after add_disk completes.
      
      The driver relies on the integrity extensions to provide the metadata
      buffer. NVMe requires this be a single physically contiguous region,
      so only one integrity segment is allowed per command. If the metadata
      is used for T10 PI, the driver provides mappings to save and restore
      the reftag physical block translation. The driver provides no-op
      functions for generate and verify if metadata is not used for protection
      information. This way the setup is always provided by the block layer.
      
      If a request does not supply a required metadata buffer, the command
      is failed with bad address. This could only happen if a user manually
      disables verify/generate on such a disk. The only exception is when the
      controller is capable of stripping/generating the metadata itself, which
      some formats allow.
      
      The metadata scatter-gather list now occupies the spot in the nvme_iod
      that was previously used to link retryable IODs; we don't do that
      anymore, so the field was unused.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
  13. 29 Jan, 2015 1 commit
    • NVMe: avoid kmalloc/kfree for smaller IO · ac3dd5bd
      Jens Axboe authored
      Currently we allocate an nvme_iod for each IO, which holds the
      sg list, prps, and other IO related info. Set a threshold of
      2 pages and/or 8KB of data, below which we can just embed this
      in the per-command pdu in blk-mq. For any IO at or below
      NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree.
      
      For higher IOPS, this saves up to 1% of CPU time.
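
The threshold test described above amounts to a simple size check. A sketch, with illustrative constants matching the values in the text (the driver's NVME_INT_PAGES/NVME_INT_BYTES are the authoritative names):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative thresholds from the commit text: up to 2 pages and 8KB of
 * data can use iod storage embedded in the blk-mq per-command pdu. */
#define INT_PAGES 2
#define INT_BYTES (8 * 1024)

/* Small IOs skip the kmalloc/kfree of a separate nvme_iod. */
static bool iod_fits_inline(size_t nbytes, unsigned int npages)
{
    return nbytes <= INT_BYTES && npages <= INT_PAGES;
}
```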
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
  14. 04 Nov, 2014 3 commits
    • NVMe: Convert to blk-mq · a4aea562
      Matias Bjørling authored
      This converts the NVMe driver to a blk-mq request-based driver.
      
      The NVMe driver is currently bio-based and implements queue logic within
      itself.  By using blk-mq, a lot of these responsibilities can be moved
      and simplified.
      
      The patch is divided into the following blocks:
      
       * Per-command data and cmdid have been moved into the struct request
         field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu(), and
         id maintenance is now handled by blk-mq through the rq->tag field.
      
       * The logic for splitting bios has been moved into the blk-mq layer.
         The driver instead notifies the block layer about limited gap support in
         SG lists.
      
       * blk-mq handles timeouts; this handling is reimplemented within
         nvme_timeout(), including abort handling and command cancellation.
      
       * Assignment of nvme queues to CPUs is replaced with the blk-mq
         version. The current blk-mq strategy is to assign the number of
         mapped queues and CPUs to provide synergy, while the nvme driver
         assigns as many nvme hw queues as possible. This can be implemented
         in blk-mq if needed.
      
       * NVMe queues are merged with the tags structure of blk-mq.
      
       * blk-mq takes care of setup/teardown of nvme queues and guards invalid
         accesses. Therefore, RCU-usage for nvme queues can be removed.
      
       * IO tracing and accounting are handled by blk-mq and therefore removed.
      
       * Queue suspension logic is replaced with the logic from the block
         layer.
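
The first bullet's pdu mechanism can be modeled in a few lines: blk-mq co-allocates the driver's per-command data directly behind each request, so retrieving it is pointer arithmetic. This is a userspace model; fake_request/fake_cmd are stand-ins for the kernel's struct request and the nvme per-command pdu:

```c
#include <stdlib.h>

/* Userspace model of blk-mq's per-command pdu layout; not kernel code. */
struct fake_request { int tag; };
struct fake_cmd { int opcode; };

/* Models blk_mq_rq_to_pdu(): the pdu lives immediately after the request. */
static void *rq_to_pdu(struct fake_request *rq)
{
    return rq + 1;
}

/* One allocation covers the request plus its pdu, as blk-mq does. */
static struct fake_request *alloc_rq_with_pdu(void)
{
    return malloc(sizeof(struct fake_request) + sizeof(struct fake_cmd));
}
```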
      
      Contributions in this patch from:
      
        Sam Bradshaw <sbradshaw@micron.com>
        Jens Axboe <axboe@fb.com>
        Keith Busch <keith.busch@intel.com>
        Robert Nelson <rlnelson@google.com>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Acked-by: Jens Axboe <axboe@fb.com>
      
      Updated for new ->queue_rq() prototype.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: Mismatched host/device page size support · 1d090624
      Keith Busch authored
      Adds support for devices with max page size smaller than the host's.
      When we encounter such a host/device combination, the driver will
      split a page into as many PRP entries as necessary for the device's page
      size capabilities. If the device's reported minimum page size is greater
      than the host's, the driver will not attempt to enable the device and
      return an error instead.
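
The splitting arithmetic is straightforward: each device-page-sized chunk of the transfer needs its own PRP entry. A sketch, assuming page sizes are powers of two (the function name is illustrative):

```c
#include <stddef.h>

/* Number of PRP entries needed to describe a transfer of len bytes on a
 * device with the given page size: one entry per device-sized page, so an
 * 8K host page on a 4K-page device takes two entries. */
static size_t prp_entries(size_t len, size_t dev_page_size)
{
    return (len + dev_page_size - 1) / dev_page_size;
}
```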
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: Async event request · 6fccf938
      Keith Busch authored
      Submits NVMe asynchronous event requests, up to the smaller of the
      controller maximum and the number of distinct event types (8). Events
      successfully returned by the controller are logged.
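
The request count above reduces to a min(). A sketch, assuming the controller's AERL field is a zeroes-based value as in the NVMe spec (so it permits aerl + 1 outstanding requests); the function name is illustrative:

```c
/* Outstanding async event requests: the controller allows aerl + 1
 * (AERL is zeroes-based in NVMe), capped at the 8 distinct event types. */
static unsigned int aer_request_count(unsigned int aerl)
{
    unsigned int limit = aerl + 1;
    return limit < 8 ? limit : 8;
}
```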
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  15. 13 Jun, 2014 1 commit
    • NVMe: Fix hot cpu notification dead lock · f3db22fe
      Keith Busch authored
      There is a potential deadlock if a cpu event occurs during nvme probe,
      since each device registered for hot cpu notification there. This fixes
      the race by having the module register for notification once, rather
      than having each device register.
      
      The actual work is done in a scheduled work queue instead of in the
      notifier since assigning IO queues has the potential to block if the
      driver creates additional queues.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
  16. 03 Jun, 2014 1 commit
  17. 05 May, 2014 3 commits
  18. 10 Apr, 2014 1 commit
    • NVMe: Retry failed commands with non-fatal errors · edd10d33
      Keith Busch authored
      For commands returned with failed status, queue these for resubmission
      and continue retrying them until success or for a limited amount of
      time. The final timeout was arbitrarily chosen so requests can't be
      retried indefinitely.
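
The bounded-retry policy reduces to a deadline check from first submission. A sketch; the 30-second window here is an illustrative stand-in, not the driver's actual constant:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative retry window; the commit says the final timeout was
 * arbitrarily chosen, so this value is a stand-in. */
#define RETRY_WINDOW_SECS 30

/* Requeue a failed command only while the deadline has not passed. */
static bool may_retry(uint64_t first_submit_secs, uint64_t now_secs)
{
    return (now_secs - first_submit_secs) < RETRY_WINDOW_SECS;
}
```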
      
      Since these are requeued on the nvmeq that submitted the command, the
      callbacks have to take an nvmeq instead of an nvme_dev as a parameter
      so that we can use the locked queue to append the iod to retry later.
      
      The nvme_iod conveniently can be used to track how long we've been
      trying to successfully complete an iod request. The nvme_iod also
      provides the nvme prp dma mappings, so I had to move a few things around
      so we can keep those mappings.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      [fixed checkpatch issue with long line]
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>