1. 07 Aug, 2016 1 commit
    • Jens Axboe's avatar
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      1eff9d32
  2. 20 Jul, 2016 1 commit
  3. 07 Jun, 2016 4 commits
  4. 13 May, 2016 3 commits
  5. 05 May, 2016 1 commit
  6. 11 Mar, 2016 1 commit
    • Mike Snitzer's avatar
      dm thin: consistently return -ENOSPC if pool has run out of data space · c3667cc6
      Mike Snitzer authored
      Commit 0a927c2f ("dm thin: return -ENOSPC when erroring retry list due
      to out of data space") was a step in the right direction but didn't go
      far enough.
      
      Add a new 'out_of_data_space' flag to 'struct pool' and set it if/when
      the pool runs of of data space.  This fixes cell_error() and
      error_retry_list() to not blindly return -EIO.
      
      We cannot rely on the 'error_if_no_space' feature flag since it is
      transient (in that it can be reset once space is added, plus it only
      controls whether errors are issued, it doesn't reflect whether the
      pool is actually out of space).
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      c3667cc6
  7. 22 Feb, 2016 1 commit
  8. 06 Jan, 2016 1 commit
  9. 17 Dec, 2015 1 commit
    • Nikolay Borisov's avatar
      dm thin: fix race condition when destroying thin pool workqueue · 18d03e8c
      Nikolay Borisov authored
      When a thin pool is being destroyed delayed work items are
      cancelled using cancel_delayed_work(), which doesn't guarantee that on
      return the delayed item isn't running.  This can cause the work item to
      requeue itself on an already destroyed workqueue.  Fix this by using
      cancel_delayed_work_sync() which guarantees that on return the work item
      is not running anymore.
      
      Fixes: 905e51b3 ("dm thin: commit outstanding data every second")
      Fixes: 85ad643b ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
      Signed-off-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      18d03e8c
  10. 23 Nov, 2015 1 commit
    • Mike Snitzer's avatar
      dm thin: fix regression in advertised discard limits · 0fcb04d5
      Mike Snitzer authored
      When establishing a thin device's discard limits we cannot rely on the
      underlying thin-pool device's discard capabilities (which are inherited
      from the thin-pool's underlying data device) given that DM thin devices
      must provide discard support even when the thin-pool's underlying data
      device doesn't support discards.
      
      Users were exposed to this thin device discard limits regression if
      their thin-pool's underlying data device does _not_ support discards.
      This regression caused all upper-layers that called the
      blkdev_issue_discard() interface to not be able to issue discards to
      thin devices (because discard_granularity was 0).  This regression
      wasn't caught earlier because the device-mapper-test-suite's extensive
      'thin-provisioning' discard tests are only ever performed against
      thin-pool's with data devices that support discards.
      
      Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature
      rather than inferring whether or not a thin device's discard support
      should be enabled by looking at the thin-pool's discard_granularity.
      
      Fixes: 21607670 ("dm thin: disable discard support for thin devices if pool's is disabled")
      Reported-by: default avatarMike Gerber <mike@sprachgewalt.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.1+
      0fcb04d5
  11. 16 Nov, 2015 1 commit
    • Mike Snitzer's avatar
      dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition · 172c2386
      Mike Snitzer authored
      A thin-pool that is in out-of-data-space (OODS) mode may transition back
      to write mode -- without the admin adding more space to the thin-pool --
      if/when blocks are released (either by deleting thin devices or
      discarding provisioned blocks).
      
      But as part of the thin-pool's earlier transition to out-of-data-space
      mode the thin-pool may have set the 'error_if_no_space' flag to true if
      the no_space_timeout expires without more space having been made
      available.  That implementation detail, of changing the pool's
      error_if_no_space setting, needs to be reset back to the default that
      the user specified when the thin-pool's table was loaded.
      
      Otherwise we'll drop the user requested behaviour on the floor when this
      out-of-data-space to write mode transition occurs.
      Reported-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Fixes: 2c43fd26 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released")
      Cc: stable@vger.kernel.org
      172c2386
  12. 13 Oct, 2015 1 commit
  13. 13 Sep, 2015 1 commit
  14. 18 Aug, 2015 1 commit
  15. 13 Aug, 2015 1 commit
    • Kent Overstreet's avatar
      block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet authored
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: default avatarDongsu Park <dpark@posteo.net>
      Signed-off-by: default avatarMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      8ae12666
  16. 29 Jul, 2015 1 commit
    • Christoph Hellwig's avatar
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig authored
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      4246a0b6
  17. 26 Jul, 2015 1 commit
  18. 16 Jul, 2015 2 commits
    • Mike Snitzer's avatar
      dm thin: display 'needs_check' in status if it is set · e4c78e21
      Mike Snitzer authored
      There is currently no way to see that the needs_check flag has been set
      in the metadata.  Display 'needs_check' in the thin-pool status if it is
      set in the thinp metadata.
      
      Also, update thinp documentation.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      e4c78e21
    • Mike Snitzer's avatar
      dm thin: stay in out-of-data-space mode once no_space_timeout expires · bcc696fa
      Mike Snitzer authored
      This fixes an issue where running out of data space would cause the
      thin-pool's metadata to become read-only.  There was no reason to make
      metadata read-only -- calling set_pool_mode() with PM_READ_ONLY was a
      misguided way to error all queued and future write IOs.  We can
      accomplish the same by degrading from PM_OUT_OF_DATA_SPACE to
      PM_OUT_OF_DATA_SPACE with error_if_no_space enabled.
      
      Otherwise, the use of PM_READ_ONLY could cause a race where commit() was
      started before the PM_READ_ONLY transition but dm_pool_commit_metadata()
      would go on to fail because the block manager had transitioned to
      read-only.  The return of -EPERM from dm_pool_commit_metadata(), due to
      attempting to commit while in read-only mode, caused the thin-pool to
      set 'needs_check' because a metadata_operation_failed().  This needless
      cascade of failures makes life for users more difficult than needed.
      Reported-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      bcc696fa
  19. 05 Jul, 2015 1 commit
  20. 11 Jun, 2015 2 commits
    • Mike Snitzer's avatar
      dm thin: fail messages with EOPNOTSUPP when pool cannot handle messages · fd467696
      Mike Snitzer authored
      Use EOPNOTSUPP, rather than EINVAL, error code when user attempts to
      send the pool a message.  Otherwise usespace is led to believe the
      message failed due to invalid argument.
      Reported-by: default avatarZdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      fd467696
    • Joe Thornber's avatar
      dm thin: range discard support · 34fbcf62
      Joe Thornber authored
      Previously REQ_DISCARD bios have been split into block sized chunks
      before submission to the thin target.  There are a couple of issues with
      this:
      
       - If the block size is small, a large discard request can
         get broken up into a great many bios which is both slow and causes
         a lot of memory pressure.
      
       - The thin pool block size and the discard granularity for the
         underlying data device need to be compatible if we want to passdown
         the discard.
      
      This patch relaxes the block size granularity for thin devices.  It
      makes use of the recent range locking added to the bio_prison to
      quiesce a whole range of thin blocks before unmapping them.  Once a
      thin range has been unmapped the discard can then be passed down to
      the data device for those sub ranges where the data blocks are no
      longer used (ie. they weren't shared in the first place).
      
      This patch also doesn't make any apologies about open-coding portions
      of block core as a means to supporting async discard completions in the
      near-term -- if/when late bio splitting lands it'll all get cleaned up.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      34fbcf62
  21. 29 May, 2015 2 commits
  22. 22 May, 2015 1 commit
    • Mike Snitzer's avatar
      block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb
      Mike Snitzer authored
      Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
      non-chains") regressed all existing callers that followed this pattern:
       1) saving a bio's original bi_end_io
       2) wiring up an intermediate bi_end_io
       3) restoring the original bi_end_io from intermediate bi_end_io
       4) calling bio_endio() to execute the restored original bi_end_io
      
      The regression was due to BIO_CHAIN only ever getting set if
      bio_inc_remaining() is called.  For the above pattern it isn't set until
      step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
      the first bio_endio(), in step 2 above, never decremented __bi_remaining
      before calling the intermediate bi_end_io -- leaving __bi_remaining with
      the value 1 instead of 0.  When bio_inc_remaining() occurred during step
      3 it brought it to a value of 2.  When the second bio_endio() was
      called, in step 4 above, it should've called the original bi_end_io but
      it didn't because there was an extra reference that wasn't dropped (due
      to atomic operations being optimized away since BIO_CHAIN wasn't set
      upfront).
      
      Fix this issue by removing the __bi_remaining management complexity for
      all callers that use the above pattern -- bio_chain() is the only
      interface that _needs_ to be concerned with __bi_remaining.  For the
      above pattern callers just expect the bi_end_io they set to get called!
      Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
      that aren't associated with the bio_chain() interface.
      
      Also, the bio_inc_remaining() interface has been moved local to bio.c.
      
      Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      326e1dbb
  23. 05 May, 2015 1 commit
  24. 27 Feb, 2015 1 commit
    • Joe Thornber's avatar
      dm thin: fix to consistently zero-fill reads to unprovisioned blocks · 5f027a3b
      Joe Thornber authored
      It was always intended that a read to an unprovisioned block will return
      zeroes regardless of whether the pool is in read-only or read-write
      mode.  thin_bio_map() was inconsistent with its handling of such reads
      when the pool is in read-only mode, it now properly zero-fills the bios
      it returns in response to unprovisioned block reads.
      
      Eliminate thin_bio_map()'s special read-only mode handling of -ENODATA
      and just allow the IO to be deferred to the worker which will result in
      pool->process_bio() handling the IO (which already properly zero-fills
      reads to unprovisioned blocks).
      Reported-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5f027a3b
  25. 09 Feb, 2015 1 commit
  26. 28 Jan, 2015 1 commit
  27. 17 Dec, 2014 3 commits
    • Marc Dionne's avatar
      dm thin: fix crash by initializing thin device's refcount and completion earlier · 2b94e896
      Marc Dionne authored
      Commit 80e96c54 ("dm thin: do not allow thin device activation
      while pool is suspended") delayed the initialization of a new thin
      device's refcount and completion until after this new thin was added
      to the pool's active_thins list and the pool lock is released.  This
      opens a race with a worker thread that walks the list and calls
      thin_get/put, noticing that the refcount goes to 0 and calling
      complete, freezing up the system and giving the oops below:
      
       kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
       kernel: IP: [<ffffffff810d360b>] __wake_up_common+0x2b/0x90
      
       kernel: Call Trace:
       kernel: [<ffffffff810d3683>] __wake_up_locked+0x13/0x20
       kernel: [<ffffffff810d3dc7>] complete+0x37/0x50
       kernel: [<ffffffffa0595c50>] thin_put+0x20/0x30 [dm_thin_pool]
       kernel: [<ffffffffa059aab7>] do_worker+0x667/0x870 [dm_thin_pool]
       kernel: [<ffffffff816a8a4c>] ? __schedule+0x3ac/0x9a0
       kernel: [<ffffffff810b1aef>] process_one_work+0x14f/0x400
       kernel: [<ffffffff810b206b>] worker_thread+0x6b/0x490
       kernel: [<ffffffff810b2000>] ? rescuer_thread+0x260/0x260
       kernel: [<ffffffff810b6a7b>] kthread+0xdb/0x100
       kernel: [<ffffffff810b69a0>] ? kthread_create_on_node+0x170/0x170
       kernel: [<ffffffff816ad7ec>] ret_from_fork+0x7c/0xb0
       kernel: [<ffffffff810b69a0>] ? kthread_create_on_node+0x170/0x170
      
      Set the thin device's initial refcount and initialize the completion
      before adding it to the pool's active_thins list in thin_ctr().
      Signed-off-by: default avatarMarc Dionne <marc.dionne@your-file-system.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      2b94e896
    • Joe Thornber's avatar
      dm thin: fix missing out-of-data-space to write mode transition if blocks are released · 2c43fd26
      Joe Thornber authored
      Discard bios and thin device deletion have the potential to release data
      blocks.  If the thin-pool is in out-of-data-space mode, and blocks were
      released, transition the thin-pool back to full write mode.
      
      The correct time to do this is just after the thin-pool metadata commit.
      It cannot be done before the commit because the space maps will not
      allow immediate reuse of the data blocks in case there's a rollback
      following power failure.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      2c43fd26
    • Joe Thornber's avatar
      dm thin: fix inability to discard blocks when in out-of-data-space mode · 45ec9bd0
      Joe Thornber authored
      When the pool was in PM_OUT_OF_SPACE mode its process_prepared_discard
      function pointer was incorrectly being set to
      process_prepared_discard_passdown rather than process_prepared_discard.
      
      This incorrect function pointer meant the discard was being passed down,
      but not effecting the mapping.  As such any discard that was issued, in
      an attempt to reclaim blocks, would not successfully free data space.
      Reported-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      45ec9bd0
  28. 21 Nov, 2014 1 commit
    • Mike Snitzer's avatar
      dm thin: fix pool_io_hints to avoid looking at max_hw_sectors · d200c30e
      Mike Snitzer authored
      Simplify the pool_io_hints code that works to establish a max_sectors
      value that is a power-of-2 factor of the thin-pool's blocksize.  The
      biggest associated improvement is that the DM thin-pool is no longer
      concerning itself with the data device's max_hw_sectors when adjusting
      max_sectors.
      
      This fixes the relative fragility of the original "dm thin: adjust
      max_sectors_kb based on thinp blocksize" commit that only became
      apparent when testing was performed using a DM thin-pool ontop of a
      virtio_blk device.  One proposed upstream patch detailed the problems
      inherent in virtio_blk: https://lkml.org/lkml/2014/11/20/611
      
      So even though virtio_blk incorrectly set its max_hw_sectors it actually
      helped make it clear that we need DM thinp to be tolerant of any future
      Linux driver that incorrectly sets max_hw_sectors.
      
      We only need to be concerned with modifying the thin-pool device's
      max_sectors limit if it is smaller than the thin-pool's blocksize.  In
      this case the value of max_sectors does become a limiting factor when
      upper layers (e.g. filesystems) construct their bios.  But if the
      hardware can support IOs larger than the thin-pool's blocksize the user
      is encouraged to adjust the thin-pool's data device's max_sectors
      accordingly -- doing so will enable the thin-pool to inherit the
      established user-defined max_sectors.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      d200c30e
  29. 19 Nov, 2014 2 commits