1. 09 Sep, 2016 1 commit
  2. 08 Sep, 2016 1 commit
  3. 31 Aug, 2016 2 commits
• Shaohua Li
      raid5: guarantee enough stripes to avoid reshape hang · ad5b0f76
      Shaohua Li authored
If there aren't enough stripes, reshape will hang. We check for this when
starting a new reshape, but miss it on reshape resume, so a resumed
reshape could hang. This patch ensures enough stripes exist when a
reshape resumes.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
• Shaohua Li
      raid5-cache: fix a deadlock in superblock write · 8e018c21
      Shaohua Li authored
There is a potential deadlock in superblock write. Discard could zero data, so
before a discard we must make sure the superblock is updated to the new log
tail. Updating the superblock (either by calling md_update_sb() directly or by
depending on the md thread) requires holding the reconfig mutex. On the other
hand, raid5_quiesce() is called with the reconfig mutex held. The first step of
raid5_quiesce() is waiting for all IO to finish, hence waiting for the reclaim
thread, while the reclaim thread is calling this function and waiting for the
reconfig mutex. So there is a deadlock. We work around the issue with a
trylock. The downside is that we could miss a discard if we can't take the
reconfig mutex, but that should happen rarely (mainly on raid array stop), so
a missed discard shouldn't be a big problem.
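The trylock workaround can be sketched in plain C. This is an illustrative model only, assuming invented names (reconfig_locked, reconfig_trylock, reclaim_discard); it is not the kernel's actual code:

```c
#include <assert.h>
#include <stdbool.h>

static bool reconfig_locked;          /* models the reconfig mutex */

static bool reconfig_trylock(void)
{
    if (reconfig_locked)
        return false;
    reconfig_locked = true;
    return true;
}

static void reconfig_unlock(void) { reconfig_locked = false; }

/* Reclaim path: rather than block on the mutex (whose holder may be
 * waiting for this very thread to finish IO), skip the discard when
 * the lock is busy. */
static bool reclaim_discard(void)
{
    if (!reconfig_trylock())
        return false;             /* lock busy: miss this discard */
    /* update superblock to the new log tail, then issue the discard */
    reconfig_unlock();
    return true;
}
```

The caller treats a false return as "no discard this round" rather than an error, which is exactly the trade-off the commit accepts.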
      
      Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
  4. 30 Aug, 2016 6 commits
  5. 24 Aug, 2016 7 commits
  6. 18 Aug, 2016 3 commits
• Eric Wheeler
      bcache: pr_err: more meaningful error message when nr_stripes is invalid · 90706094
      Eric Wheeler authored
The original error was thought to be corruption, but was actually caused by:
	make-bcache --data-offset N
where N was given in bytes but should have been in sectors. While the
userspace tools should be updated to reject a --data-offset beyond the end
of the volume, hopefully a clearer error message will help others who might
not have noticed the units.
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
• Kent Overstreet
      bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two. · acc9cf8c
      Kent Overstreet authored
This patch fixes a cache-device registration-time allocation deadlock.
This can deadlock on boot if your initrd auto-registers bcache devices:
      
      Allocator thread:
      [  720.727614] INFO: task bcache_allocato:3833 blocked for more than 120 seconds.
      [  720.732361]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.732963]  [<ffffffffa05192b8>] bch_bucket_alloc+0x188/0x360 [bcache]
      [  720.733538]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.734137]  [<ffffffffa05302bd>] bch_prio_write+0x19d/0x340 [bcache]
      [  720.734715]  [<ffffffffa05190bf>] bch_allocator_thread+0x3ff/0x470 [bcache]
      [  720.735311]  [<ffffffff816ee41c>] ? __schedule+0x2dc/0x950
      [  720.735884]  [<ffffffffa0518cc0>] ? invalidate_buckets+0x980/0x980 [bcache]
      
      Registration thread:
      [  720.710403] INFO: task bash:3531 blocked for more than 120 seconds.
      [  720.715226]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.715805]  [<ffffffffa05235cd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
      [  720.716409]  [<ffffffffa0522d30>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
      [  720.717008]  [<ffffffffa05236e4>] bch_btree_insert+0xf4/0x170 [bcache]
      [  720.717586]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.718191]  [<ffffffffa0527d9a>] bch_journal_replay+0x14a/0x290 [bcache]
      [  720.718766]  [<ffffffff810cc90d>] ? ttwu_do_activate.constprop.94+0x5d/0x70
      [  720.719369]  [<ffffffff810cf684>] ? try_to_wake_up+0x1d4/0x350
      [  720.719968]  [<ffffffffa05317d0>] run_cache_set+0x580/0x8e0 [bcache]
      [  720.720553]  [<ffffffffa053302e>] register_bcache+0xe2e/0x13b0 [bcache]
      [  720.721153]  [<ffffffff81354cef>] kobj_attr_store+0xf/0x20
      [  720.721730]  [<ffffffff812a2dad>] sysfs_kf_write+0x3d/0x50
      [  720.722327]  [<ffffffff812a225a>] kernfs_fop_write+0x12a/0x180
      [  720.722904]  [<ffffffff81225177>] __vfs_write+0x37/0x110
      [  720.723503]  [<ffffffff81228048>] ? __sb_start_write+0x58/0x110
      [  720.724100]  [<ffffffff812cedb3>] ? security_file_permission+0x23/0xa0
      [  720.724675]  [<ffffffff812258a9>] vfs_write+0xa9/0x1b0
      [  720.725275]  [<ffffffff8102479c>] ? do_audit_syscall_entry+0x6c/0x70
      [  720.725849]  [<ffffffff81226755>] SyS_write+0x55/0xd0
      [  720.726451]  [<ffffffff8106a390>] ? do_page_fault+0x30/0x80
      [  720.727045]  [<ffffffff816f2cae>] system_call_fastpath+0x12/0x71
      
The fifo code in upstream bcache can't use the last element in the buffer,
which was the cause of the bug: if you asked for a power-of-two size, it
gave you a fifo that could hold one less element than you asked for,
rather than allocating a buffer twice as big.
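The one-slot-empty fifo behavior described above can be modeled in a few lines of C. This is a generic sketch of the pattern, not bcache's actual fifo code:

```c
#include <assert.h>
#include <stddef.h>

/* A classic ring buffer that distinguishes full from empty by leaving
 * one slot unused: a fifo allocated with `size` slots (size must be a
 * power of two here) holds at most size - 1 elements. */
struct fifo { size_t front, back, size; int data[16]; };

static size_t fifo_used(const struct fifo *f)
{
    return (f->back - f->front) & (f->size - 1);
}

static int fifo_push(struct fifo *f, int v)
{
    if (fifo_used(f) == f->size - 1)   /* last slot stays empty */
        return 0;
    f->data[f->back] = v;
    f->back = (f->back + 1) & (f->size - 1);
    return 1;
}
```

So a caller that needs a power-of-two number of usable entries (like RESERVE_PRIO when prio_buckets() is a power of two) must round the allocation up to the next power of two, or the fifo silently comes up one short.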
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: stable@vger.kernel.org
• Eric Wheeler
      bcache: register_bcache(): call blkdev_put() when cache_alloc() fails · d9dc1702
      Eric Wheeler authored
register_cache() is supposed to return an error string on error so that
register_bcache() will call blkdev_put() and clean up other use counters,
but it does not set 'char *err' when cache_alloc() fails (e.g. due to
memory pressure), so register_bcache() performs no cleanup.
      
      register_bcache() <----------\  <- no jump to err_close, no blkdev_put()
         |                         |
         +->register_cache()       |  <- fails to set char *err
               |                   |
               +->cache_alloc() ---/  <- returns error
      
This patch sets `char *err` for this failure case so that register_cache()
causes register_bcache() to correctly jump to err_close and clean up.
This was tested under OOM conditions that triggered the bug.
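The error-string convention in the diagram above can be sketched as follows. All names here (cache_alloc_fails, the message text, the simplified return values) are illustrative stand-ins for the real bcache functions:

```c
#include <assert.h>
#include <stdlib.h>

static int cache_alloc_fails;  /* set to force the OOM path */

static void *cache_alloc(void)
{
    return cache_alloc_fails ? NULL : malloc(32);
}

/* Returns NULL on success, an error string on failure.
 * The fix: the allocation-failure path must set the error string too. */
static const char *register_cache(void)
{
    void *c = cache_alloc();
    if (!c)
        return "cannot allocate memory";
    free(c);
    return NULL;
}

/* The caller only runs its cleanup (err_close / blkdev_put) when it
 * sees a non-NULL error string. */
static int register_bcache(void)
{
    const char *err = register_cache();
    if (err) {
        /* err_close: blkdev_put() and drop use counters here */
        return -1;
    }
    return 0;
}
```

With the original bug, the failure path returned with err still unset, so the caller took the success path and leaked the blkdev reference.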
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
  7. 17 Aug, 2016 3 commits
  8. 16 Aug, 2016 4 commits
  9. 15 Aug, 2016 2 commits
  10. 07 Aug, 2016 1 commit
• Jens Axboe
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
Since commit 63a4cc24, bio->bi_rw contains the flags in the lower
portion and the op code in the higher portion. This means that
old code that relies on manually setting bi_rw is most likely
broken. Instead of letting that brokenness linger, rename the
member to force old and out-of-tree code to break at compile
time instead of at runtime.
      
      No intended functional changes in this commit.
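The split layout the commit describes can be modeled in userspace C. The shift value and op/flag constants below are illustrative, not the kernel's actual bit positions:

```c
#include <assert.h>
#include <stdint.h>

/* Model: request flags live in the low bits, the op code in the high
 * bits of a single 32-bit field (the bi_opf layout described above). */
#define OP_SHIFT   24u
#define OP_READ    0u
#define OP_WRITE   1u
#define FLAG_SYNC  (1u << 0)

static inline uint32_t make_opf(uint32_t op, uint32_t flags)
{
    return (op << OP_SHIFT) | flags;
}

static inline uint32_t opf_op(uint32_t opf)
{
    return opf >> OP_SHIFT;
}

static inline uint32_t opf_flags(uint32_t opf)
{
    return opf & ((1u << OP_SHIFT) - 1);
}
```

Old code that assigned raw flag values directly to the field would silently clobber the op in the high bits, which is exactly why renaming the member (so such assignments no longer compile) is safer than leaving it to fail at runtime.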
Signed-off-by: Jens Axboe <axboe@fb.com>
  11. 05 Aug, 2016 2 commits
• Alexey Obitotskiy
md: Prevent IO hold while accessing a faulty raid5 array · 11367799
      Alexey Obitotskiy authored
After an array enters the faulty state (e.g. the number of failed drives
exceeds what the raid5 level accepts), it sets error flags, one of which
is MD_CHANGE_PENDING. For arrays with internal metadata,
MD_CHANGE_PENDING is cleared in md_update_sb(), but not for arrays with
external metadata. While MD_CHANGE_PENDING is set, all new or unfinished
IO to the array is held in a pending state, which in some cases can lead
to deadlock.
      
For example: we have a faulty array (2 of 4 drives failed), udev handles
the array state change, and blkid is started (or any other userspace
application that reads/writes the array) but is unable to finish its
reads because of the IO hold. At the same time, we are unable to get
exclusive access to the array (to stop it, in our case) because the
external application is still using it.
      
The fix makes it possible to return IO with errors immediately, so the
external application can finish working with the array and give
exclusive access to other applications to perform the required
management actions.
Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
• Shaohua Li
      MD: hold mddev lock to change bitmap location · d9dd26b2
      Shaohua Li authored
Changing the bitmap location changes a lot of things, so hold the mddev
lock to avoid races. This also means .quiesce is now called with the
mddev lock held.
Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
  12. 04 Aug, 2016 1 commit
  13. 03 Aug, 2016 2 commits
  14. 02 Aug, 2016 4 commits
  15. 01 Aug, 2016 1 commit
• ZhengYuan Liu
raid5: fix incorrect counting of conf->empty_inactive_list_nr · ff00d3b4
      ZhengYuan Liu authored
The counter conf->empty_inactive_list_nr is only used to determine
whether the raid5 array is congested, which is handled in
raid5_congested(). It is increased in get_free_stripe() when
conf->inactive_list becomes empty and decreased in
release_inactive_stripe_list() when temp_inactive_list is spliced back
into conf->inactive_list. However, there is a problem when
raid5_get_active_stripe() or stripe_add_to_batch_list() is called: both
may call list_del_init(&sh->lru) to delete sh from
"conf->inactive_list + hash", which can leave
"conf->inactive_list + hash" empty when atomic_inc_not_zero(&sh->count)
returns false. So a check should be done at these two points, increasing
empty_inactive_list_nr accordingly; otherwise the counter can go
negative, which would influence async readahead from the VFS.
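The accounting rule at stake can be reduced to a small model: a single counter tracks how many per-hash lists are currently empty, and every path that removes the last entry must bump it. The names and structure below are illustrative, not the raid5 code itself:

```c
#include <assert.h>

#define NR_HASH 2

static int list_len[NR_HASH];          /* models inactive_list + hash */
static int empty_inactive_list_nr;     /* how many lists are empty */

/* The bug was a removal path that skipped this bookkeeping: if any
 * caller deletes the last entry without the increment, later refills
 * still decrement, and the counter drifts negative. */
static void remove_stripe(int hash)
{
    assert(list_len[hash] > 0);
    if (--list_len[hash] == 0)
        empty_inactive_list_nr++;      /* list just became empty */
}

static void add_stripe(int hash)
{
    if (list_len[hash]++ == 0)
        empty_inactive_list_nr--;      /* list just became non-empty */
}
```

As long as every remove and add path goes through this pair, the counter stays an exact count of empty lists and can never go negative.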
Signed-off-by: ZhengYuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Shaohua Li <shli@fb.com>