1. 07 Jan, 2016 18 commits
    • David Sterba's avatar
      btrfs: allocate root item at snapshot ioctl time · b0c0ea63
      David Sterba authored
      The actual snapshot creation is delayed until transaction commit. If we
      cannot get enough memory for the root item there, we have to fail the
      whole transaction commit which is bad. So we'll allocate the memory at
      the ioctl call and pass it along with the pending_snapshot struct. The
      potential ENOMEM will be returned to the caller of snapshot ioctl.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: do an allocation earlier during snapshot creation · a1ee7362
      David Sterba authored
      We can allocate pending_snapshot earlier and do not have to do cleanup
      in case of failure.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: use smaller type for btrfs_path locks · 4fb72bf2
      David Sterba authored
      The values of btrfs_path::locks are 0 to 4, fit into a u8. Let's see:
      * overall size of btrfs_path drops down from 136 to 112 (-24 bytes),
      * better packing in a slab page +6 objects
      * the whole structure now fits to 2 cachelines
      * slight decrease in code size:
         text    data     bss     dec     hex filename
       938731   43670   23144 1005545   f57e9 fs/btrfs/btrfs.ko.before
       938203   43670   23144 1005017   f55d9 fs/btrfs/btrfs.ko.after
      (and the generated assembly does not change much)
      The main purpose is to decrease the size of the structure without
      affecting performance. The byte access is usually well behaving accross
      arches, the locks are not accessed frequently and sometimes just
      compared to zero.
      Note for further size reduction attempts: the slots could be made u16
      but this might generate worse code on some arches (non-byte and non-int
      access). Also the range of operations on slots is wider compared to
      locks and the potential performance drop should be evaluated first.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: use smaller type for btrfs_path lowest_level · 7853f15b
      David Sterba authored
      The level is 0..7, we can use smaller type. The size of btrfs_path is now
      136 bytes from 144, which is +2 objects that fit into a 4k slab.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: use smaller type for btrfs_path reada · dccabfad
      David Sterba authored
      The possible values for reada are all positive and bounded, we can later
      save some bytes by storing it in u8.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: cleanup, use enum values for btrfs_path reada · e4058b54
      David Sterba authored
      Replace the integers by enums for better readability. The value 2 does
      not have any meaning since a7175319
      "Btrfs: do less aggressive btree readahead" (2009-01-22).
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: constify static arrays · 4d4ab6d6
      David Sterba authored
      There are a few statically initialized arrays that can be made const.
      The remaining (like file_system_type, sysfs attributes or prop handlers)
      do not allow that due to type mismatch when passed to the APIs or
      because the structures are modified through other members.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: constify remaining structs with function pointers · 20e5506b
      David Sterba authored
      * struct extent_io_ops
      * struct btrfs_free_space_op
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs tests: replace whole ops structure for free space tests · 28f0779a
      David Sterba authored
      Preparatory work for making btrfs_free_space_op constant. In
      test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with
      own version thus preventing constification. We can rework it so we
      replace the whole structure with the correct function pointers.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: don't use slab cache for struct btrfs_delalloc_work · 100d5702
      David Sterba authored
      Although we prefer to use separate caches for various structs, it seems
      better not to do that for struct btrfs_delalloc_work. Objects of this
      type are allocated rarely, when transaction commit calls
      btrfs_start_delalloc_roots, requesting delayed iputs.
      The objects are temporary (with some IO involved) but still allocated
      and freed within __start_delalloc_inodes. Memory allocation failure is
      The slab cache is empty most of the time (observed on several systems),
      so if we need to allocate a new slab object, the first one has to
      allocate a full page. In a potential case of low memory conditions this
      might fail with higher probability compared to using the generic slab
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: drop duplicate prefix from scrub workqueues · 0de270fa
      David Sterba authored
      The helper btrfs_alloc_workqueue will add the "btrfs-" prefix.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
    • David Sterba's avatar
      btrfs: handle invalid num_stripes in sys_array · f5cdedd7
      David Sterba authored
      We can handle the special case of num_stripes == 0 directly inside
      btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to
      catch other unhandled cases where we fail to validate external data.
      A crafted or corrupted image crashes at mount time:
      BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0
      BTRFS info (device loop0): disk space caching is enabled
      BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
      Kernel panic - not syncing: BUG!
      CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
       637af890 60062489 602aeb2e 604192ba
       60387961 00000011 637af8a0 6038a835
       637af9c0 6038776b 634ef32b 00000000
      Call Trace:
       [<6001c86d>] show_stack+0xfe/0x15b
       [<6038a835>] dump_stack+0x2a/0x2c
       [<6038776b>] panic+0x13e/0x2b3
       [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
       [<601cfbbe>] open_ctree+0x192d/0x27af
       [<6019c2c1>] btrfs_mount+0x8f5/0xb9a
       [<600bc9a7>] mount_fs+0x11/0xf3
       [<600d5167>] vfs_kern_mount+0x75/0x11a
       [<6019bcb0>] btrfs_mount+0x2e4/0xb9a
       [<600bc9a7>] mount_fs+0x11/0xf3
       [<600d5167>] vfs_kern_mount+0x75/0x11a
       [<600d710b>] do_mount+0xa35/0xbc9
       [<600d7557>] SyS_mount+0x95/0xc8
       [<6001e884>] handle_syscall+0x6b/0x8e
      Reported-by: default avatarJiri Slaby <jslaby@suse.com>
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      CC: stable@vger.kernel.org	# 3.19+
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: better packing of btrfs_delayed_extent_op · 35b3ad50
      David Sterba authored
      btrfs_delayed_extent_op can be packed in a better way, it's 40 bytes now
      and has 8 unused bytes. Reducing the level type to u8 makes it possible
      to squeeze it to the padding byte after key. The bitfields were switched
      to bool as there's space to store the full byte without increasing the
      whole structure, besides that the generated assembly is smaller.
      struct btrfs_delayed_extent_op {
      	struct btrfs_disk_key      key;                  /*     0    17 */
      	u8                         level;                /*    17     1 */
      	bool                       update_key;           /*    18     1 */
      	bool                       update_flags;         /*    19     1 */
      	bool                       is_data;              /*    20     1 */
      	/* XXX 3 bytes hole, try to pack */
      	u64                        flags_to_set;         /*    24     8 */
      	/* size: 32, cachelines: 1, members: 6 */
      	/* sum members: 29, holes: 1, sum holes: 3 */
      	/* last cacheline: 32 bytes */
      The final size is 32 bytes which gives +26 object per slab page.
         text	   data	    bss	    dec	    hex	filename
       938811	  43670	  23144	1005625	  f5839	fs/btrfs/btrfs.ko.before
       938747	  43670	  23144	1005561	  f57f9	fs/btrfs/btrfs.ko.after
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • David Sterba's avatar
      btrfs: put delayed item hook into inode · 8089fe62
      David Sterba authored
      Inodes for delayed iput allocate a trivial helper structure, let's place
      the list hook directly into the inode and save a kmalloc (killing a
      __GFP_NOFAIL as a bonus) at the cost of increasing size of btrfs_inode.
      The inode can be put into the delayed_iputs list more than once and we
      have to keep the count. This means we can't use the list_splice to
      process a bunch of inodes because we'd lost track of the count if the
      inode is put into the delayed iputs again while it's processed.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • Zhao Lei's avatar
      btrfs: Support convert to -d dup for btrfs-convert · c5ca8781
      Zhao Lei authored
      Since we will add support for -d dup for non-mixed filesystem,
      kernel need to support converting to this raid-type.
      This patch remove limitation of above case.
      Tested by following script:
      (combination of dup conversion with fsck):
      export TEST_DEV='/dev/vdc'
      export TEST_DIR='/var/ltf/tester/mnt'
          local m_from="$1"
          local d_from="$2"
          local m_to="$3"
          local d_to="$4"
          echo "Convert from -m $m_from -d $d_from to -m $m_to -d $d_to"
          umount "$TEST_DIR" &>/dev/null
          ./mkfs.btrfs -f -m "$m_from" -d "$d_from" "$TEST_DEV" >/dev/null || return 1
          mount "$TEST_DEV" "$TEST_DIR" || return 1
          cp -a /sbin/* "$TEST_DIR"
          [[ "$m_from" != "$m_to" ]] && {
              ./btrfs balance start -f -mconvert="$m_to" "$TEST_DIR" || return 1
          [[ "$d_from" != "$d_to" ]] && {
      	local opt=()
      	[[ "$d_to" == single ]] && opt+=("-f")
              ./btrfs balance start "${opt[@]}" -dconvert="$d_to" "$TEST_DIR" || return 1
          umount "$TEST_DIR" || return 1
          ./btrfsck "$TEST_DEV" || return 1
          return 0
          for m_from in single dup; do
          for d_from in single dup; do
          for m_to in single dup; do
          for d_to in single dup; do
          do_dup_test "$m_from" "$d_from" "$m_to" "$d_to" || return 1
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • Josef Bacik's avatar
      Btrfs: igrab inode in writepage · be7bd730
      Josef Bacik authored
      We hit this panic on a few of our boxes this week where we have an
      ordered_extent with an NULL inode.  We do an igrab() of the inode in writepages,
      but weren't doing it in writepage which can be called directly from the VM on
      dirty pages.  If the inode has been unlinked then we could have I_FREEING set
      which means igrab() would return NULL and we get this panic.  Fix this by trying
      to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
      and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    • Anand Jain's avatar
      Btrfs: add missing brelse when superblock checksum fails · b2acdddf
      Anand Jain authored
      Looks like oversight, call brelse() when checksum fails. Further down the
      code, in the non error path, we do call brelse() and so we don't see
      brelse() in the goto error paths.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
  2. 20 Dec, 2015 4 commits
    • Linus Torvalds's avatar
      Linux 4.4-rc6 · 4ef76753
      Linus Torvalds authored
    • Linus Torvalds's avatar
      Merge tag 'rtc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 9f7e4327
      Linus Torvalds authored
      Pull RTC fixes from Alexandre Belloni:
       "Late fixes for the RTC subsystem for 4.4:
        A fix for a nasty hardware bug in rk808 and an initialization
        reordering in da9063 to fix a possible crash"
      * tag 'rtc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: da9063: fix access ordering error during RTC interrupt at system power on
        rtc: rk808: Compensate for Rockchip calendar deviation on November 31st
    • Steve Twiss's avatar
      rtc: da9063: fix access ordering error during RTC interrupt at system power on · 77535ace
      Steve Twiss authored
      This fix alters the ordering of the IRQ and device registrations in the RTC
      driver probe function. This change will apply to the RTC driver that supports
      both DA9063 and DA9062 PMICs.
      A problem could occur with the existing RTC driver if:
      A system is started from a cold boot using the PMIC RTC IRQ to initiate a
      power on operation. For instance, if an RTC alarm is used to start a
      platform from power off.
      The existing driver IRQ is requested before the device has been properly
          ret = devm_request_threaded_irq()
      comes before
          rtc->rtc_dev = devm_rtc_device_register();
      In this case, the interrupt can be called before the device has been
      registered and the handler can be called immediately. The IRQ handler
      da9063_alarm_event() contains the function call
          rtc_update_irq(rtc->rtc_dev, 1, RTC_IRQF | RTC_AF);
      which in turn tries to access the unavailable rtc->rtc_dev.
      The fix is to reorder the functions inside the RTC probe. The IRQ is
      requested after the RTC device resource has been registered so that
      get_irq_byname is the last thing to happen.
      Signed-off-by: default avatarSteve Twiss <stwiss.opensource@diasemi.com>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
    • Julius Werner's avatar
      rtc: rk808: Compensate for Rockchip calendar deviation on November 31st · f076ef44
      Julius Werner authored
      In A.D. 1582 Pope Gregory XIII found that the existing Julian calendar
      insufficiently represented reality, and changed the rules about
      calculating leap years to account for this. Similarly, in A.D. 2013
      Rockchip hardware engineers found that the new Gregorian calendar still
      contained flaws, and that the month of November should be counted up to
      31 days instead. Unfortunately it takes a long time for calendar changes
      to gain widespread adoption, and just like more than 300 years went by
      before the last Protestant nation implemented Greg's proposal, we will
      have to wait a while until all religions and operating system kernels
      acknowledge the inherent advantages of the Rockchip system. Until then
      we need to translate dates read from (and written to) Rockchip hardware
      back to the Gregorian format.
      This patch works by defining Jan 1st, 2016 as the arbitrary anchor date
      on which Rockchip and Gregorian calendars are in sync. From that we can
      translate arbitrary later dates back and forth by counting the number
      of November/December transitons since the anchor date to determine the
      offset between the calendars. We choose this method (rather than trying
      to regularly "correct" the date stored in hardware) since it's the only
      way to ensure perfect time-keeping even if the system may be shut down
      for an unknown number of years. The drawback is that other software
      reading the same hardware (e.g. mainboard firmware) must use the same
      translation convention (including the same anchor date) to be able to
      read and write correct timestamps from/to the RTC.
      Signed-off-by: default avatarJulius Werner <jwerner@chromium.org>
      Reviewed-by: default avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
  3. 19 Dec, 2015 10 commits
  4. 18 Dec, 2015 8 commits