  1. Apr 21, 2011
  2. Apr 19, 2011
  3. Apr 18, 2011
    • md: fix up raid1/raid10 unplugging. · c3b328ac
      NeilBrown authored
      
      We just need to make sure that an unplug event wakes up the md
      thread, which is exactly what mddev_check_plugged does.
      
      Also remove some plug-related code that is no longer needed.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: incorporate new plugging into raid5. · 7c13edc8
      NeilBrown authored
      
      In raid5 plugging is used for 2 things:
       1/ collecting writes that require a bitmap update
       2/ collecting writes in the hope that we can create full
          stripes - or at least more-full.
      
      We now release these different sets of stripes when plug_cnt
      is zero.
      
      Also in make_request, we call mddev_check_plugged to hopefully
      increase plug_cnt, and wake up the thread at the end if plugging
      wasn't achieved for some reason.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: provide generic support for handling unplug callbacks. · 97658cdd
      NeilBrown authored
      
      When an md device adds a request to a queue, it can call
      mddev_check_plugged.
      If this succeeds then we know that the md thread will be woken up
      shortly, and ->plug_cnt will be non-zero until then, so some
      processing can be delayed.
      
      If it fails, then no unplug callback is expected and the make_request
      function needs to do whatever is required to make the request happen.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
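
      The pattern described by the two plugging commits above can be illustrated
      with a small stand-alone C analogue. This is not kernel code: plug_cnt is
      borrowed from the commit text, while check_plugged, flush_deferred and
      submit are invented names. Requests are batched while an unplug callback
      is pending and flushed in one go when the count drops to zero.

      #include <stdio.h>

      /* Illustrative stand-in for ->plug_cnt: how many unplug
       * callbacks are still outstanding. */
      static int plug_cnt;

      /* Deferred work, batched while plugged. */
      static int deferred[16];
      static int ndeferred;

      /* Analogue of mddev_check_plugged(): returns non-zero if an unplug
       * callback is pending, i.e. the caller may safely defer its work
       * because a flush is guaranteed to follow shortly. */
      static int check_plugged(void)
      {
              return plug_cnt > 0;
      }

      static void flush_deferred(void)
      {
              printf("flushing %d deferred requests\n", ndeferred);
              ndeferred = 0;
      }

      /* Analogue of a make_request function: defer if plugged, else act now. */
      static void submit(int req)
      {
              if (check_plugged() && ndeferred < 16)
                      deferred[ndeferred++] = req;
              else
                      printf("handling request %d immediately\n", req);
      }

      /* Analogue of the unplug callback waking the md thread. */
      static void unplug(void)
      {
              if (--plug_cnt == 0)
                      flush_deferred();
      }

      int main(void)
      {
              plug_cnt = 1;           /* a plug is outstanding        */
              submit(1);
              submit(2);              /* both are batched             */
              unplug();               /* plug_cnt reaches 0: flush    */
              submit(3);              /* no plug: handled immediately */
              return 0;
      }
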
    • md - remove old plugging code. · 482c0834
      NeilBrown authored
      
      md has some plugging infrastructure for RAID5 to use because the
      normal plugging infrastructure required a 'request_queue', and when
      called from dm, RAID5 doesn't have one of those available.
      
      This relied on the ->unplug_fn callback which doesn't exist any more.
      
      So remove all of that code, both in md and raid5.  Subsequent patches
      will restore the plugging functionality.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/dm - remove remains of plug_fn callback. · af1db72d
      NeilBrown authored
      
      Now that unplugging is done differently, the unplug_fn callback is
      never called, so it can be completely discarded.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: use new plugging interface for RAID IO. · e1dfa0a2
      NeilBrown authored
      
      md/raid submits a lot of IO from the various raid threads.
      So add start/finish plug calls to those threads so that some
      plugging happens.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
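
      A rough userspace analogue of the on-stack plugging pattern this commit
      applies. struct plug, start_plug, submit_io and finish_plug are invented
      stand-ins, not the block layer's real interface: a worker batches the IO
      it generates between a start and a finish call so the whole batch can be
      issued, and potentially merged, in one go.

      #include <stdio.h>

      /* Toy analogue of an on-stack plug: a stack-allocated container
       * that batches submissions until it is finished. */
      struct plug {
              int pending[32];
              int count;
      };

      static void start_plug(struct plug *p)
      {
              p->count = 0;
      }

      static void submit_io(struct plug *p, int sector)
      {
              if (p && p->count < 32)
                      p->pending[p->count++] = sector;   /* batched */
              else
                      printf("issue sector %d immediately\n", sector);
      }

      static void finish_plug(struct plug *p)
      {
              /* On unplug the whole batch goes down at once, giving the
               * lower layers a chance to merge adjacent IO. */
              for (int i = 0; i < p->count; i++)
                      printf("issue sector %d (batched)\n", p->pending[i]);
              p->count = 0;
      }

      /* Shape of a raid worker loop: plug around the burst of IO it
       * generates, as the commit above describes. */
      int main(void)
      {
              struct plug plug;

              start_plug(&plug);
              for (int s = 0; s < 4; s++)
                      submit_io(&plug, s * 8);
              finish_plug(&plug);
              return 0;
      }
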
  4. Apr 05, 2011
    • dm: improve block integrity support · a63a5cf8
      Mike Snitzer authored
      
      The current block integrity (DIF/DIX) support in DM is verifying that
      all devices' integrity profiles match during DM device resume (which
      is past the point of no return).  To some degree that is unavoidable
      (stacked DM devices force this late checking).  But for most DM
      devices (which aren't stacking on other DM devices) the ideal time to
      verify all integrity profiles match is during table load.
      
      Introduce the notion of an "initialized" integrity profile: a profile
      that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
      template.  Add blk_integrity_is_initialized() to allow checking if a
      profile was initialized.
      
      Update DM integrity support to:
      - check that all devices with _initialized_ integrity profiles match
        during table load; uninitialized profiles (e.g. for underlying DM
        device(s) of a stacked DM device) are ignored.
      - disallow a table load that would result in an integrity profile that
        conflicts with a DM device's existing (in-use) integrity profile
      - avoid clearing an existing integrity profile
      - validate all integrity profiles match during resume; but if they
        don't all we can do is report the mismatch (during resume we're past
        the point of no return)
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
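
      A minimal sketch of the table-load check described above, under the
      assumption that a profile can simply be compared by name. The struct
      integrity_profile type and profiles_compatible helper are illustrative,
      not DM's actual data structures.

      #include <stdio.h>
      #include <string.h>

      /* Minimal stand-in for an integrity profile.  A profile counts as
       * "initialized" once it was registered with a real template
       * (here: a non-empty name). */
      struct integrity_profile {
              char name[32];
              int  initialized;
      };

      /* Table-load time check in the spirit of the commit above: all
       * *initialized* profiles must agree; uninitialized ones (e.g.
       * underlying DM devices of a stacked device) are ignored. */
      static int profiles_compatible(const struct integrity_profile *devs,
                                     int ndevs)
      {
              const struct integrity_profile *ref = NULL;

              for (int i = 0; i < ndevs; i++) {
                      if (!devs[i].initialized)
                              continue;       /* decided later, at resume */
                      if (!ref)
                              ref = &devs[i];
                      else if (strcmp(ref->name, devs[i].name))
                              return 0;       /* conflicting profiles */
              }
              return 1;
      }

      int main(void)
      {
              struct integrity_profile devs[] = {
                      { "T10-DIF-TYPE1-CRC", 1 },
                      { "",                  0 },   /* stacked DM device */
                      { "T10-DIF-TYPE1-CRC", 1 },
              };

              printf("table load %s\n",
                     profiles_compatible(devs, 3) ? "allowed" : "rejected");
              return 0;
      }
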
  5. Mar 31, 2011
  6. Mar 28, 2011
  7. Mar 24, 2011
    • dm stripe: implement merge method · 29915202
      Mustafa Mesanovic authored
      
      Implement a merge function in the striped target.
      
      When the striped target's underlying devices provide a merge_bvec_fn
      (like all DM devices do via dm_merge_bvec) it is important to call down
      to them when building a biovec that doesn't span a stripe boundary.
      
      Without the merge method, a striped DM device stacked on DM devices
      causes bios with a single page to be submitted which results
      in unnecessary overhead that hurts performance.
      
      This change really helps filesystems (e.g. XFS and now ext4) which take
      care to assemble larger bios.  By implementing stripe_merge(), DM and the
      stripe target no longer undermine the filesystem's work by only allowing
      a single page per bio.  Buffered IO sees the biggest improvement
      (particularly uncached reads, buffered writes to a lesser degree).  This
      is especially so for more capable "enterprise" storage LUNs.
      
      The performance improvement has been measured to be ~12-35% -- when a
      reasonable chunk_size is used (e.g. 64K) in conjunction with a stripe
      count that is a power of 2.
      
      In contrast, the performance penalty is ~5-7% for the pathological worst
      case stripe configuration (small chunk_size with a stripe count that is
      not a power of 2).  The reason for this is that stripe_map_sector() is
      now called once for every call to dm_merge_bvec().  stripe_map_sector()
      will use slower division if stripe count isn't a power of 2.
      
      Signed-off-by: Mustafa Mesanovic <mume@linux.vnet.ibm.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
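
      A self-contained sketch of why the stripe count matters for a
      stripe_map_sector()-style mapping: with a power-of-2 stripe count the
      stripe index is a mask and the chunk number a shift, otherwise a real
      division is needed. The geometry struct and helper names are invented
      for the example; this is not the dm-stripe code itself.

      #include <stdint.h>
      #include <stdio.h>

      /* Invented geometry struct for the example. */
      struct stripe_geom {
              uint32_t stripes;        /* number of stripes     */
              uint32_t chunk_sectors;  /* chunk size in sectors */
      };

      static uint32_t shift_of(uint32_t v)     /* v must be a power of 2 */
      {
              uint32_t s = 0;

              while (v > 1) {
                      v >>= 1;
                      s++;
              }
              return s;
      }

      /* Map a logical sector to (stripe index, offset within that stripe). */
      static void map_sector(const struct stripe_geom *g, uint64_t sector,
                             uint32_t *stripe, uint64_t *offset)
      {
              uint64_t chunk = sector / g->chunk_sectors;
              uint64_t in_chunk = sector % g->chunk_sectors;

              if ((g->stripes & (g->stripes - 1)) == 0) {
                      /* power-of-2 stripe count: cheap mask and shift */
                      *stripe = chunk & (g->stripes - 1);
                      chunk >>= shift_of(g->stripes);
              } else {
                      /* otherwise a real (slower) division is required,
                       * which is the penalty the commit above measures */
                      *stripe = chunk % g->stripes;
                      chunk /= g->stripes;
              }
              *offset = chunk * g->chunk_sectors + in_chunk;
      }

      int main(void)
      {
              struct stripe_geom g = { .stripes = 4, .chunk_sectors = 128 };
              uint32_t stripe;
              uint64_t offset;

              map_sector(&g, 1000, &stripe, &offset);
              printf("sector 1000 -> stripe %u, offset %llu\n",
                     stripe, (unsigned long long)offset);
              return 0;
      }
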
    • dm mpath: allow table load with no priority groups · a490a07a
      Mike Snitzer authored
      
      This patch adjusts the multipath target to allow a table with both 0
      priority groups and 0 for the initial priority group number.
      
      If any mpath device is held open when all paths in the last priority
      group have failed, userspace multipathd will attempt to reload the
      associated DM table to reflect the fact that the device no longer has
      any priority groups.  But the reload attempt always failed because the
      multipath target did not allow 0 priority groups.
      
      All multipath target messages related to priority groups
      (enable_group, disable_group, switch_group) will handle a priority
      group of 0 by returning an error.
      
      When reloading a multipath table with 0 priority groups, userspace
      multipathd must be updated to specify an initial priority group number
      of 0 (rather than 1).
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Babu Moger <babu.moger@lsi.com>
      Acked-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: fail message ioctl if specified path is not valid · 19040c0b
      Mike Snitzer authored
      
      Fail the reinstate_path and fail_path message ioctl if the specified
      path is not valid.
      
      The message ioctl would succeed for the 'reinstate_path' and
      'fail_path' messages even if no action was taken because the
      specified device was not a valid path of the multipath device.
      
      Before, when /dev/vdb is not a path of mpathb:
      $ dmsetup message mpathb 0 reinstate_path /dev/vdb
      $ echo $?
      0
      
      After:
      $ dmsetup message mpathb 0 reinstate_path /dev/vdb
      device-mapper: message ioctl failed: Invalid argument
      Command failed
      $ echo $?
      1
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: add flag to wipe buffers for secure data · f8681205
      Milan Broz authored
      
      Add DM_SECURE_DATA_FLAG which userspace can use to ensure
      that all buffers allocated for dm-ioctl are wiped
      immediately after use.
      
      The user buffer is wiped as well (we do not want to keep
      and return sensitive data back to userspace if the flag is set).
      
      Wiping is useful for cryptsetup to ensure that the key
      is present in memory only in defined places and only
      for the time needed.
      
      (For dm-crypt, the key can be present in the table during load or
      table status, wait and message commands.)
      
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
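
      The wiping idea can be sketched in ordinary C. The kernel has its own
      helpers for this, so memset_v, wipe and handle_ioctl_secure below are
      purely illustrative; the sketch shows a zeroing call the compiler is
      discouraged from optimising away, plus the "wipe the kernel copy and
      the caller's copy" lifecycle described above.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* A wipe the compiler may not easily elide: memset called through
       * a volatile function pointer (a common userspace idiom; the
       * kernel uses its own primitives). */
      static void *(*const volatile memset_v)(void *, int, size_t) = memset;

      static void wipe(void *buf, size_t len)
      {
              memset_v(buf, 0, len);
      }

      /* Sketch of the buffer lifecycle with the secure flag set: copy in,
       * use, then wipe before freeing. */
      static void handle_ioctl_secure(const char *user_data, size_t len)
      {
              char *kbuf = malloc(len);

              if (!kbuf)
                      return;
              memcpy(kbuf, user_data, len);     /* stand-in for copy-in */
              printf("processing %zu secret bytes\n", len);
              wipe(kbuf, len);                  /* wipe the kernel copy */
              free(kbuf);
      }

      int main(void)
      {
              char key[] = "0123456789abcdef";

              handle_ioctl_secure(key, sizeof(key));
              wipe(key, sizeof(key));           /* wipe the caller's copy too */
              return 0;
      }
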
    • dm ioctl: prepare for crypt key wiping · 6bb43b5d
      Milan Broz authored
      
      Prepare code for implementing buffer wipe flag.
      No functional change in this patch.
      
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: wipe keys string immediately after key is set · de8be5ac
      Milan Broz authored
      
      Always wipe the original copy of the key after processing it
      in crypt_set_key().
      
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: add flakey target · 3407ef52
      Josef Bacik authored
      
      This target is the same as the linear target except that it returns I/O
      errors periodically.  It's been found useful in simulating failing
      devices for testing purposes.
      
      I needed a dm target to do some failure testing on btrfs's raid code, and
      Mike pointed me at this.
      
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
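
      A toy model of the behaviour described above, assuming the simple
      "pass IO for a while, then fail it for a while" shape. The parameter
      names and the error value are illustrative, not taken from the
      target's documented interface.

      #include <stdio.h>

      /* Toy model of the flakey idea: pass IO through for 'up' seconds,
       * then fail it for 'down' seconds, repeating. */
      struct flakey {
              unsigned up_seconds;
              unsigned down_seconds;
      };

      /* Returns 0 if the IO should be passed through, or an EIO-like
       * error if it falls in a "down" window. */
      static int flakey_map(const struct flakey *f, unsigned now_seconds)
      {
              unsigned period = f->up_seconds + f->down_seconds;
              unsigned phase = now_seconds % period;

              return phase < f->up_seconds ? 0 : -5;
      }

      int main(void)
      {
              struct flakey f = { .up_seconds = 3, .down_seconds = 2 };

              for (unsigned t = 0; t < 10; t++)
                      printf("t=%u -> %s\n", t,
                             flakey_map(&f, t) ? "EIO" : "ok");
              return 0;
      }
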
    • dm: fix opening log and cow devices for read only tables · 024d37e9
      Milan Broz authored
      
      If a table is read-only, also open any log and cow devices it uses read-only.
      
      Previously, even read-only devices were opened read-write internally.
      After patch 75f1dc0d
        block: check bdev_read_only() from blkdev_get()
      was applied, loading such tables began to fail.  The patch
      was reverted by e51900f7
        block: revert block_dev read-only check
      but this patch fixes this part of the code to work with the original patch.
      
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  8. Mar 23, 2011
  9. Mar 22, 2011
  10. Mar 17, 2011
  11. Mar 10, 2011
    • block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe authored
      
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: remove per-queue plugging · 7eaceacc
      Jens Axboe authored
      
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  12. Mar 03, 2011
  13. Feb 23, 2011
    • md: Fix - again - partition detection when array becomes active · f0b4f7e2
      NeilBrown authored
      
      Revert
          b821eaa5
      and
          f3b99be1
      
      When I wrote the first of these I had a wrong idea about the
      lifetime of 'struct block_device'.  It can disappear at any time that
      the block device is not open if it falls out of the inode cache.
      
      So relying on the 'size' recorded with it to detect when the
      device size has changed, and hence when we need to revalidate, is wrong.
      
      Rather, we really do need the 'changed' attribute stored directly in
      the mddev and set/tested as appropriate.
      
      Without this patch, a sequence of:
         mknod / open / close / unlink
      
      (which can cause a block_device to be created and then destroyed)
      will result in a rescan of the partition table and the consequent
      removal and addition of partitions.
      Several of these in a row can get udev racing to create and unlink,
      and other code can get confused.
      
      With the patch, the rescan is only performed when needed and so there
      are no races.
      
      This is suitable for any stable kernel from 2.6.35.
      
      Reported-by: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@kernel.org
  14. Feb 21, 2011
    • md: avoid spinlock problem in blk_throtl_exit · da9cf505
      NeilBrown authored
      
      blk_throtl_exit assumes that ->queue_lock still exists,
      so make sure that it does.
      To do this, we stop redirecting ->queue_lock to conf->device_lock
      and leave it pointing where it is initialised - __queue_lock.
      
      As the blk_plug functions check the ->queue_lock is held, we now
      take that spin_lock explicitly around the plug functions.  We don't
      need the locking, just the warning removal.
      
      This is needed for any kernel with the blk_throtl code, which is
      2.6.37 and later.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
  15. Feb 15, 2011
    • md: correctly handle probe of an 'mdp' device. · 8f5f02c4
      NeilBrown authored
      'mdp' devices are md devices with preallocated device numbers
      for partitions. As such it is possible to mknod and open a partition
      before opening the whole device.
      
      This causes md_probe() to be called with the device number of a
      partition, which in turn calls mddev_find with that number.
      
      However, mddev_find expects the number of a 'whole device' and
      does the wrong thing with partition numbers.
      
      So add code to mddev_find to remove the 'partition' part of
      a device number and just work with the 'whole device'.
      
      This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652
      
      Reported-by: <hkmaly@bigfoot.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: <stable@kernel.org>
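
      The whole-device calculation can be illustrated in userspace.
      MDP_MINOR_SHIFT and the MKDEV/MAJOR/MINOR macros below are re-created
      for the example (the kernel defines its own), and the idea is simply
      to clear the partition bits of the minor before looking the array up.

      #include <stdio.h>

      /* An 'mdp' array owns a block of 1 << MDP_MINOR_SHIFT consecutive
       * minors, one per partition, so the whole-device number is
       * recovered by clearing the low partition bits. */
      #define MDP_MINOR_SHIFT 6
      #define MKDEV(ma, mi)   (((ma) << 20) | (mi))
      #define MAJOR(dev)      ((dev) >> 20)
      #define MINOR(dev)      ((dev) & 0xfffff)

      static unsigned int whole_device(unsigned int dev)
      {
              unsigned int minor = MINOR(dev);

              /* drop the partition index, keep the array's base minor */
              minor &= ~((1U << MDP_MINOR_SHIFT) - 1);
              return MKDEV(MAJOR(dev), minor);
      }

      int main(void)
      {
              unsigned int part = MKDEV(254, (2 << MDP_MINOR_SHIFT) + 3);
              unsigned int whole = whole_device(part);

              printf("partition dev %#x -> whole device %#x (minor %u)\n",
                     part, whole, MINOR(whole));
              return 0;
      }
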
    • md: don't set_capacity before array is active. · cbe6ef1d
      NeilBrown authored
      
      If the desired size of an array is set (via sysfs) before the array is
      active (which is the normal sequence), we currently call set_capacity
      immediately.
      This means that a subsequent 'open' (as can be caused by some
      udev-triggers program) will notice the new size and try to probe for
      partitions.  However as the array isn't quite ready yet the read will
      fail.  Then when the array is ready, as the size doesn't change again,
      we don't try to re-probe.
      
      So when setting array size via sysfs, only call set_capacity if the
      array is already active.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
  16. Feb 13, 2011
  17. Feb 12, 2011
  18. Feb 07, 2011
    • FIX: md: process hangs at wait_barrier after 0->10 takeover · 02214dc5
      Krzysztof Wojcik authored
      
      The following symptoms were observed:
      1. After a raid0->raid10 takeover operation we have an array with 2
      missing disks.
      When we add a disk for rebuild, the recovery process starts as expected
      but it does not finish - it stops at about 90% and the md126_resync
      process hangs in "D" state.
      2. Similar behaviour occurs when we have a mounted raid0 array and
      execute a takeover to raid10.  Afterwards, trying to unmount the
      array causes the umount process to hang in "D" state.
      
      In both scenarios the processes hang in the same function, wait_barrier
      in raid10.c.
      The process waits in the "wait_event_lock_irq" macro until the
      "!conf->barrier" condition becomes true, which in these scenarios
      never happens.
      
      The reason is that at the end of level_store, after calling pers->run,
      we call mddev_resume.  With RAID10 this calls pers->quiesce(mddev, 0),
      which calls lower_barrier.
      However raise_barrier hadn't been called on that 'conf' yet,
      so conf->barrier becomes negative, which is bad.
      
      This patch sets conf->barrier=1 after the takeover operation, which
      prevents the barrier from becoming negative after the call to
      lower_barrier().
      
      Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
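
      The arithmetic behind the hang can be shown with a few lines of
      stand-alone C. The names mirror the commit text, but the code is only
      a model, not raid10.c.

      #include <stdio.h>

      /* Model of raid10's conf->barrier: writers wait in wait_barrier()
       * until it is zero; pers->quiesce(mddev, 0) ends up in
       * lower_barrier(), which decrements it. */
      static int barrier;

      static void lower_barrier(void)
      {
              barrier--;
      }

      /* wait_event_lock_irq(... !conf->barrier ...) never completes
       * while barrier is non-zero. */
      static int wait_barrier_blocks(void)
      {
              return barrier != 0;
      }

      int main(void)
      {
              /* Without the fix: the takeover leaves barrier at 0, so the
               * unmatched lower_barrier() from mddev_resume drives it to
               * -1 and every writer blocks forever. */
              barrier = 0;
              lower_barrier();
              printf("without fix: barrier=%d, writers block=%d\n",
                     barrier, wait_barrier_blocks());

              /* With the fix: barrier is set to 1 after the takeover, so
               * the same unmatched lower_barrier() just returns it to 0. */
              barrier = 1;
              lower_barrier();
              printf("with fix:    barrier=%d, writers block=%d\n",
                     barrier, wait_barrier_blocks());
              return 0;
      }
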
    • md_make_request: don't touch the bio after calling make_request · e91ece55
      Chris Mason authored
      
      md_make_request was calling bio_sectors() for part_stat_add
      after it was calling the make_request function.  This is
      bad because the make_request function can free the bio and
      because the bi_size field can change around.
      
      The fix here was suggested by Jens Axboe.  It saves the
      sector count before the make_request call.  I hit this
      with CONFIG_DEBUG_PAGEALLOC turned on while trying to break
      his pretty fusionio card.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
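
      The shape of the fix is easy to show in isolation. toy_bio, make_request
      and submit below are stand-ins, not the md code; the point is simply
      that anything needed for accounting is read from the bio before it is
      handed to a function that may free it.

      #include <stdio.h>
      #include <stdlib.h>

      /* Minimal stand-in for a bio: just enough to show the bug shape. */
      struct toy_bio {
              unsigned int size_sectors;
      };

      /* The callee may complete and free the bio, just as a make_request
       * function can. */
      static void make_request(struct toy_bio *bio)
      {
              free(bio);
      }

      static unsigned long io_sectors;        /* stand-in for the stats counter */

      static void submit(struct toy_bio *bio)
      {
              /* The fix: capture the sector count *before* handing the bio
               * away.  Reading bio->size_sectors afterwards would be a
               * use-after-free, which is the bug the commit describes. */
              unsigned int sectors = bio->size_sectors;

              make_request(bio);

              io_sectors += sectors;          /* safe: uses the saved copy */
      }

      int main(void)
      {
              struct toy_bio *bio = malloc(sizeof(*bio));

              if (!bio)
                      return 1;
              bio->size_sectors = 8;
              submit(bio);
              printf("accounted %lu sectors\n", io_sectors);
              return 0;
      }
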
  19. Feb 01, 2011
  20. Jan 30, 2011