1. 18 Aug, 2016 3 commits
    • Eric Wheeler's avatar
      bcache: pr_err: more meaningful error message when nr_stripes is invalid · 90706094
      Eric Wheeler authored
      The original error was thought to be corruption, but was actually caused by:
      	make-bcache --data-offset N
      where N was in bytes and should have been in sectors.  While userspace
      tools should be updated to check --data-offset beyond end of volume,
      hopefully this will help others that might not have noticed the units.
      Signed-off-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      90706094
    • Kent Overstreet's avatar
      bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two. · acc9cf8c
      Kent Overstreet authored
      This patch fixes a cachedev registration-time allocation deadlock.
      This can deadlock on boot if your initrd auto-registeres bcache devices:
      
      Allocator thread:
      [  720.727614] INFO: task bcache_allocato:3833 blocked for more than 120 seconds.
      [  720.732361]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.732963]  [<ffffffffa05192b8>] bch_bucket_alloc+0x188/0x360 [bcache]
      [  720.733538]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.734137]  [<ffffffffa05302bd>] bch_prio_write+0x19d/0x340 [bcache]
      [  720.734715]  [<ffffffffa05190bf>] bch_allocator_thread+0x3ff/0x470 [bcache]
      [  720.735311]  [<ffffffff816ee41c>] ? __schedule+0x2dc/0x950
      [  720.735884]  [<ffffffffa0518cc0>] ? invalidate_buckets+0x980/0x980 [bcache]
      
      Registration thread:
      [  720.710403] INFO: task bash:3531 blocked for more than 120 seconds.
      [  720.715226]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.715805]  [<ffffffffa05235cd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
      [  720.716409]  [<ffffffffa0522d30>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
      [  720.717008]  [<ffffffffa05236e4>] bch_btree_insert+0xf4/0x170 [bcache]
      [  720.717586]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.718191]  [<ffffffffa0527d9a>] bch_journal_replay+0x14a/0x290 [bcache]
      [  720.718766]  [<ffffffff810cc90d>] ? ttwu_do_activate.constprop.94+0x5d/0x70
      [  720.719369]  [<ffffffff810cf684>] ? try_to_wake_up+0x1d4/0x350
      [  720.719968]  [<ffffffffa05317d0>] run_cache_set+0x580/0x8e0 [bcache]
      [  720.720553]  [<ffffffffa053302e>] register_bcache+0xe2e/0x13b0 [bcache]
      [  720.721153]  [<ffffffff81354cef>] kobj_attr_store+0xf/0x20
      [  720.721730]  [<ffffffff812a2dad>] sysfs_kf_write+0x3d/0x50
      [  720.722327]  [<ffffffff812a225a>] kernfs_fop_write+0x12a/0x180
      [  720.722904]  [<ffffffff81225177>] __vfs_write+0x37/0x110
      [  720.723503]  [<ffffffff81228048>] ? __sb_start_write+0x58/0x110
      [  720.724100]  [<ffffffff812cedb3>] ? security_file_permission+0x23/0xa0
      [  720.724675]  [<ffffffff812258a9>] vfs_write+0xa9/0x1b0
      [  720.725275]  [<ffffffff8102479c>] ? do_audit_syscall_entry+0x6c/0x70
      [  720.725849]  [<ffffffff81226755>] SyS_write+0x55/0xd0
      [  720.726451]  [<ffffffff8106a390>] ? do_page_fault+0x30/0x80
      [  720.727045]  [<ffffffff816f2cae>] system_call_fastpath+0x12/0x71
      
      The fifo code in upstream bcache can't use the last element in the buffer,
      which was the cause of the bug: if you asked for a power of two size,
      it'd give you a fifo that could hold one less than what you asked for
      rather than allocating a buffer twice as big.
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@gmail.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: stable@vger.kernel.org
      acc9cf8c
    • Eric Wheeler's avatar
      bcache: register_bcache(): call blkdev_put() when cache_alloc() fails · d9dc1702
      Eric Wheeler authored
      register_cache() is supposed to return an error string on error so that
      register_bcache() will will blkdev_put and cleanup other user counters,
      but it does not set 'char *err' when cache_alloc() fails (eg, due to
      memory pressure) and thus register_bcache() performs no cleanup.
      
      register_bcache() <----------\  <- no jump to err_close, no blkdev_put()
         |                         |
         +->register_cache()       |  <- fails to set char *err
               |                   |
               +->cache_alloc() ---/  <- returns error
      
      This patch sets `char *err` for this failure case so that register_cache()
      will cause register_bcache() to correctly jump to err_close and do
      cleanup.  This was tested under OOM conditions that triggered the bug.
      Signed-off-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
      d9dc1702
  2. 07 Aug, 2016 1 commit
    • Jens Axboe's avatar
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      1eff9d32
  3. 20 Jul, 2016 1 commit
  4. 05 Jul, 2016 3 commits
  5. 11 Jun, 2016 1 commit
  6. 07 Jun, 2016 4 commits
  7. 24 May, 2016 3 commits
  8. 12 Apr, 2016 1 commit
  9. 08 Mar, 2016 3 commits
  10. 04 Jan, 2016 1 commit
  11. 30 Dec, 2015 8 commits
    • Kent Overstreet's avatar
      bcache: Change refill_dirty() to always scan entire disk if necessary · 627ccd20
      Kent Overstreet authored
      Previously, it would only scan the entire disk if it was starting from
      the very start of the disk - i.e. if the previous scan got to the end.
      
      This was broken by refill_full_stripes(), which updates last_scanned so
      that refill_dirty was never triggering the searched_from_start path.
      
      But if we change refill_dirty() to always scan the entire disk if
      necessary, regardless of what last_scanned was, the code gets cleaner
      and we fix that bug too.
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      627ccd20
    • Stefan Bader's avatar
      bcache: prevent crash on changing writeback_running · 8d16ce54
      Stefan Bader authored
      Added a safeguard in the shutdown case. At least while not being
      attached it is also possible to trigger a kernel bug by writing into
      writeback_running. This change  adds the same check before trying to
      wake up the thread for that case.
      Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      8d16ce54
    • Gabriel de Perthuis's avatar
      bcache: allows use of register in udev to avoid "device_busy" error. · d7076f21
      Gabriel de Perthuis authored
      Allows to use register, not register_quiet in udev to avoid "device_busy" error.
      The initial patch proposed at https://lkml.org/lkml/2013/8/26/549 by Gabriel de Perthuis
      <g2p.code@gmail.com> does not unlock the mutex and hangs the kernel.
      
      See http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594 for the discussion.
      
      Cc: Denis Bychkov <manover@gmail.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Gabriel de Perthuis <g2p.code@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      d7076f21
    • Zheng Liu's avatar
      bcache: unregister reboot notifier if bcache fails to unregister device · 2ecf0cdb
      Zheng Liu authored
      In bcache_init() function it forgot to unregister reboot notifier if
      bcache fails to unregister a block device.  This commit fixes this.
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      2ecf0cdb
    • Al Viro's avatar
      bcache: fix a leak in bch_cached_dev_run() · 4d4d8573
      Al Viro authored
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      4d4d8573
    • Zheng Liu's avatar
      bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device · fecaee6f
      Zheng Liu authored
      This bug can be reproduced by the following script:
      
        #!/bin/bash
      
        bcache_sysfs="/sys/fs/bcache"
      
        function clear_cache()
        {
        	if [ ! -e $bcache_sysfs ]; then
        		echo "no bcache sysfs"
        		exit
        	fi
      
        	cset_uuid=$(ls -l $bcache_sysfs|head -n 2|tail -n 1|awk '{print $9}')
        	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/detach"
        	sleep 5
        	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/attach"
        }
      
        for ((i=0;i<10;i++)); do
        	clear_cache
        done
      
      The warning messages look like below:
      [  275.948611] ------------[ cut here ]------------
      [  275.963840] WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xb8/0xd0() (Tainted: P        W
      ---------------   )
      [  275.979253] Hardware name: Tecal RH2285
      [  275.994106] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:09.0/0000:08:00.0/host4/target4:2:1/4:2:1:0/block/sdb/sdb1/bcache/cache'
      [  276.024105] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
      bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
      i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
      pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
      [  276.072643] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
      [  276.089315] Call Trace:
      [  276.105801]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
      [  276.122650]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
      [  276.139361]  [<ffffffff81205c08>] ? sysfs_add_one+0xb8/0xd0
      [  276.156012]  [<ffffffff8120609b>] ? sysfs_do_create_link+0x12b/0x170
      [  276.172682]  [<ffffffff81206113>] ? sysfs_create_link+0x13/0x20
      [  276.189282]  [<ffffffffa03bda21>] ? bcache_device_link+0xc1/0x110 [bcache]
      [  276.205993]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
      [  276.222794]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
      [  276.239680]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
      [  276.256594]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
      [  276.273364]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
      [  276.290133]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
      [  276.306368]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
      [  276.322301] ---[ end trace 9f5d4fcdd0c3edfb ]---
      [  276.338241] ------------[ cut here ]------------
      [  276.354109] WARNING: at /home/wenqing.lz/bcache/bcache/super.c:720
      bcache_device_link+0xdf/0x110 [bcache]() (Tainted: P        W  ---------------   )
      [  276.386017] Hardware name: Tecal RH2285
      [  276.401430] Couldn't create device <-> cache set symlinks
      [  276.401759] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
      bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
      i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
      pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
      [  276.465477] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
      [  276.482169] Call Trace:
      [  276.498610]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
      [  276.515405]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
      [  276.532059]  [<ffffffffa03bda3f>] ? bcache_device_link+0xdf/0x110 [bcache]
      [  276.548808]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
      [  276.565569]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
      [  276.582418]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
      [  276.599341]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
      [  276.616142]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
      [  276.632607]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
      [  276.648671]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
      [  276.664756] ---[ end trace 9f5d4fcdd0c3edfc ]---
      
      We forget to clear BCACHE_DEV_UNLINK_DONE flag in bcache_device_attach()
      function when we attach a backing device first time.  After detaching this
      backing device, this flag will be true and sysfs_remove_link() isn't called in
      bcache_device_unlink().  Then when we attach this backing device again,
      sysfs_create_link() will return EEXIST error in bcache_device_link().
      
      So the fix is trival and we clear this flag in bcache_device_link().
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      fecaee6f
    • Kent Overstreet's avatar
      bcache: Add a cond_resched() call to gc · c5f1e5ad
      Kent Overstreet authored
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      c5f1e5ad
    • Zheng Liu's avatar
      bcache: fix a livelock when we cause a huge number of cache misses · 2ef9ccbf
      Zheng Liu authored
      Subject :	[PATCH v2] bcache: fix a livelock in btree lock
      Date :	Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)
      
      This commit tries to fix a livelock in bcache.  This livelock might
      happen when we causes a huge number of cache misses simultaneously.
      
      When we get a cache miss, bcache will execute the following path.
      
      ->cached_dev_make_request()
        ->cached_dev_read()
          ->cached_lookup()
            ->bch->btree_map_keys()
              ->btree_root()  <------------------------
                ->bch_btree_map_keys_recurse()        |
                  ->cache_lookup_fn()                 |
                    ->cached_dev_cache_miss()         |
                      ->bch_btree_insert_check_key() -|
                        [If btree->seq is not equal to seq + 1, we should return
                         EINTR and traverse btree again.]
      
      In bch_btree_insert_check_key() function we first need to check upgrade
      flag (op->lock == -1), and when this flag is true we need to release
      read btree->lock and try to take write btree->lock.  During taking and
      releasing this write lock, btree->seq will be monotone increased in
      order to prevent other threads modify this in cache miss (see btree.h:74).
      But if there are some cache misses caused by some requested, we could
      meet a livelock because btree->seq is always changed by others.  Thus no
      one can make progress.
      
      This commit will try to take write btree->lock if it encounters a race
      when we traverse btree.  Although it sacrifice the scalability but we
      can ensure that only one can modify the btree.
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Joshua Schmid <jschmid@suse.com>
      Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      2ef9ccbf
  12. 07 Nov, 2015 1 commit
  13. 06 Nov, 2015 1 commit
  14. 13 Aug, 2015 1 commit
  15. 29 Jul, 2015 1 commit
    • Christoph Hellwig's avatar
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig authored
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      4246a0b6
  16. 17 Jul, 2015 1 commit
  17. 11 Jul, 2015 1 commit
  18. 30 Jun, 2015 2 commits
  19. 02 Jun, 2015 1 commit
    • Tejun Heo's avatar
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo authored
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      66114cad
  20. 22 May, 2015 1 commit
    • Mike Snitzer's avatar
      block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb
      Mike Snitzer authored
      Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
      non-chains") regressed all existing callers that followed this pattern:
       1) saving a bio's original bi_end_io
       2) wiring up an intermediate bi_end_io
       3) restoring the original bi_end_io from intermediate bi_end_io
       4) calling bio_endio() to execute the restored original bi_end_io
      
      The regression was due to BIO_CHAIN only ever getting set if
      bio_inc_remaining() is called.  For the above pattern it isn't set until
      step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
      the first bio_endio(), in step 2 above, never decremented __bi_remaining
      before calling the intermediate bi_end_io -- leaving __bi_remaining with
      the value 1 instead of 0.  When bio_inc_remaining() occurred during step
      3 it brought it to a value of 2.  When the second bio_endio() was
      called, in step 4 above, it should've called the original bi_end_io but
      it didn't because there was an extra reference that wasn't dropped (due
      to atomic operations being optimized away since BIO_CHAIN wasn't set
      upfront).
      
      Fix this issue by removing the __bi_remaining management complexity for
      all callers that use the above pattern -- bio_chain() is the only
      interface that _needs_ to be concerned with __bi_remaining.  For the
      above pattern callers just expect the bi_end_io they set to get called!
      Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
      that aren't associated with the bio_chain() interface.
      
      Also, the bio_inc_remaining() interface has been moved local to bio.c.
      
      Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      326e1dbb
  21. 05 May, 2015 1 commit
    • Jens Axboe's avatar
      bio: skip atomic inc/dec of ->bi_cnt for most use cases · dac56212
      Jens Axboe authored
      Struct bio has a reference count that controls when it can be freed.
      Most uses cases is allocating the bio, which then returns with a
      single reference to it, doing IO, and then dropping that single
      reference. We can remove this atomic_dec_and_test() in the completion
      path, if nobody else is holding a reference to the bio.
      
      If someone does call bio_get() on the bio, then we flag the bio as
      now having valid count and that we must properly honor the reference
      count when it's being put.
      Tested-by: default avatarRobert Elliott <elliott@hp.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      dac56212