1. 15 Mar, 2010 3 commits
    • btrfs: fix btrfs_mkdir goto for no free objectids · 0be2e981
      Miao Xie authored
      When btrfs_find_free_objectid() fails, btrfs_mkdir() must jump to the
      label that ends the transaction. Otherwise the transaction never ends.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
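The error-path pattern this fix relies on can be sketched in a few lines. This is only a toy model, not the btrfs code: `toy_trans`, `toy_find_free_objectid()` and `toy_mkdir()` are hypothetical stand-ins, and the -ENOSPC return value is an assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for a transaction handle. */
struct toy_trans {
	int open;	/* 1 while the transaction is running */
};

static void toy_start_transaction(struct toy_trans *t)
{
	t->open = 1;
}

static void toy_end_transaction(struct toy_trans *t)
{
	t->open = 0;
}

/* Pretend objectid allocator: fails when no ids are left (assumed errno). */
static int toy_find_free_objectid(int ids_left, unsigned long *objectid)
{
	if (ids_left <= 0)
		return -ENOSPC;
	*objectid = 256;
	return 0;
}

/*
 * Every failure after toy_start_transaction() jumps to out_fail, so the
 * transaction is ended on the error path as well as on success.
 */
int toy_mkdir(struct toy_trans *t, int ids_left)
{
	unsigned long objectid = 0;
	int err;

	toy_start_transaction(t);

	err = toy_find_free_objectid(ids_left, &objectid);
	if (err)
		goto out_fail;

	/* ... the inode and directory entry would be created here ... */
	(void)objectid;

out_fail:
	toy_end_transaction(t);
	return err;
}
```

Calling `toy_mkdir(&t, 0)` returns the allocator's error and still leaves `t.open` at 0, which is the property the commit restores.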
    • Btrfs: add new defrag-range ioctl. · 1e701a32
      Chris Mason authored
      The btrfs defrag ioctl was limited to doing the entire file.  This
      commit adds a new interface that can defrag a specific range inside
      the file.
      It can also force compression on the file, allowing you to selectively
      compress individual files after they were created, even when mount -o
      compress isn't turned on.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
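A minimal sketch of how user space might drive a range defrag. It assumes the names exported by today's `<linux/btrfs.h>` (`BTRFS_IOC_DEFRAG_RANGE`, `struct btrfs_ioctl_defrag_range_args`, `BTRFS_DEFRAG_RANGE_COMPRESS`), which may differ in detail from the interface as first merged. The helpers `fill_defrag_range()` and `defrag_range()` are hypothetical.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Fill the argument block for a defrag of the byte range [start, start + len). */
void fill_defrag_range(struct btrfs_ioctl_defrag_range_args *args,
		       unsigned long long start, unsigned long long len,
		       int compress)
{
	memset(args, 0, sizeof(*args));
	args->start = start;
	args->len = len;
	/* Ask for compression of the range even without mount -o compress. */
	if (compress)
		args->flags |= BTRFS_DEFRAG_RANGE_COMPRESS;
}

/* Returns the ioctl result: 0 on success, -1 with errno set on failure. */
int defrag_range(int fd, unsigned long long start, unsigned long long len,
		 int compress)
{
	struct btrfs_ioctl_defrag_range_args args;

	fill_defrag_range(&args, start, len, compress);
	return ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &args);
}
```

`defrag_range()` only succeeds on a file descriptor that lives on a btrfs filesystem. Elsewhere the ioctl fails and errno says why.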
    • Btrfs: change how we mount subvolumes · 73f73415
      Josef Bacik authored
      This work is in preparation for being able to set a different root as the
      default mounting root.
      There is currently a problem with how we mount subvolumes.  We cannot currently
      mount a subvolume of a subvolume; you can only mount subvolumes/snapshots of the
      default subvolume.  So say you take a snapshot of the default subvolume and call
      it snap1, and then take a snapshot of snap1 and call it snap2, so now you have
      /, /snap1 and /snap1/snap2 as your available volumes.  Currently you can only mount / and /snap1,
      you cannot mount /snap1/snap2.  To fix this problem instead of passing
      subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
      the tree id that gets spit out via the subvolume listing you get from
      the subvolume listing patches (btrfs filesystem list).  This allows us
      to mount /, /snap1 and /snap1/snap2 as the root volume.
      In addition to the above, we also now read the default dir item in the
      tree root to get the root key that it points to.  For now this just
      points at what has always been the default subvolume, but later on I plan
      to change it to point at whatever root you want to be the new default
      root, so you can just set the default mount and not have to mount with
      -o subvolid=<treeid>.  I tested this out with the above scenario and it
      worked perfectly.  Thanks,
      mount -o subvol operates inside the selected subvolid.  For example:
      mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt
      /mnt will have the snap1 directory for the subvolume with id 256.
      mount -o subvol=snap /dev/xxx /mnt
      /mnt will be the snap directory of whatever the default subvolume is.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  2. 04 Feb, 2010 2 commits
  3. 28 Jan, 2010 3 commits
    • Btrfs: run orphan cleanup on default fs root · e3acc2a6
      Josef Bacik authored
      This patch reverts an earlier commit,
      since it introduces a problem where we can run orphan cleanup on a
      volume that can have orphan entries re-added.  Instead of my original
      fix, Yan Zheng pointed out that we can just revert my original fix and
      then run the orphan cleanup in open_ctree after we look up the fs_root.
      I have tested this with all the tests that gave me problems and this
      patch fixes both problems.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Use correct values when updating inode i_size on fallocate · d1ea6a61
      Aneesh Kumar K.V authored
      commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825
      Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Date:   Wed Jan 20 12:57:53 2010 +0530
          Btrfs: Use correct values when updating inode i_size on fallocate
          Even though we allocate more, we should be updating inode i_size
          as per the arguments passed
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
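The distinction between what gets allocated and what i_size should become can be shown with a small arithmetic sketch. It assumes a 4096-byte block size and ignores any keep-size mode. `toy_alloc_end()` and `toy_new_i_size()` are hypothetical helpers, not functions from the patch.

```c
#include <stdint.h>

#define TOY_BLOCK_SIZE 4096ULL	/* assumed block size */

/* End of the allocated range, rounded up to a block boundary. */
uint64_t toy_alloc_end(uint64_t offset, uint64_t len)
{
	uint64_t end = offset + len;

	return (end + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE * TOY_BLOCK_SIZE;
}

/* New i_size: grows to the requested end, never to the rounded-up one. */
uint64_t toy_new_i_size(uint64_t old_size, uint64_t offset, uint64_t len)
{
	uint64_t requested_end = offset + len;

	return requested_end > old_size ? requested_end : old_size;
}
```

For a request of 5000 bytes at offset 0 on an empty file, the allocation covers 8192 bytes but i_size should become 5000.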
    • Btrfs: Add mount -o compress-force · a555f810
      Chris Mason authored
      The default btrfs mount -o compress mode will quickly back off
      compressing a file if it notices that compression does not reduce the
      size of the data being written.  This can save considerable CPU because
      all future writes to the file go through uncompressed.
      But some files are both very large and have mixed data stored in
      them.  In that case, we want to add the ability to always try
      compressing data before writing it.
      This commit adds mount -o compress-force.  A later commit will add
      a new inode flag that does the same thing.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
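The difference between the two modes can be modelled roughly as follows. This is a simplified sketch under the assumption that the back-off is a per-file flag. `struct toy_file` and `toy_write_chunk()` are hypothetical and do not mirror the kernel's data structures.

```c
#include <stddef.h>

struct toy_file {
	int nocompress;	/* set once compression stopped paying off */
};

/*
 * Decide whether a chunk is stored compressed.
 * raw_len is the size before compression, packed_len the size after.
 * force models mount -o compress-force.
 * Returns 1 if the compressed form is kept, 0 otherwise.
 */
int toy_write_chunk(struct toy_file *f, size_t raw_len, size_t packed_len,
		    int force)
{
	/* Default mode: once backed off, skip compression entirely. */
	if (f->nocompress && !force)
		return 0;

	if (packed_len < raw_len)
		return 1;

	/* Compression did not help; only the default mode gives up. */
	if (!force)
		f->nocompress = 1;
	return 0;
}
```

In the default mode a single incompressible chunk turns compression off for the rest of the file. With `force` set, every chunk is tried again, which is what helps large files with mixed data.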
  4. 17 Jan, 2010 2 commits
    • Btrfs: fix regression in orphan cleanup · 6c090a11
      Josef Bacik authored
      Currently orphan cleanup only ever gets triggered if we cross subvolumes during
      a lookup, which means that if we just mount a plain jane fs that has orphans in
      it, they will never get cleaned up.  This results in panics
      where adding an orphan entry results in -EEXIST being returned.  In
      order to fix this, we check to see on lookup if our root has had the orphan
      cleanup done, and if not go ahead and do it.  This is easily reproducible by
      running this testcase
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <unistd.h>
      #include <stdio.h>
      int main(int argc, char **argv)
      {
      	char data[4096];
      	char newdata[4096];
      	int fd1, fd2;
      	memset(data, 'a', 4096);
      	memset(newdata, 'b', 4096);
      	/* Loop until a creat() fails or the power is pulled. */
      	while (1) {
      		int i;
      		fd1 = creat("file1", 0666);
      		if (fd1 < 0)
      			break;
      		for (i = 0; i < 512; i++)
      			write(fd1, data, 4096);
      		close(fd1);
      		fd2 = creat("file2", 0666);
      		if (fd2 < 0)
      			break;
      		ftruncate(fd2, 4096 * 512);
      		for (i = 0; i < 512; i++)
      			write(fd2, newdata, 4096);
      		close(fd2);
      		/* Rename over the existing file1. */
      		i = rename("file2", "file1");
      	}
      	return 0;
      }
      and then pulling the power on the box, and then trying to run that test again
      when the box comes back up.  I've tested this locally and it fixes the problem.
      Thanks to Tomas Carnecky for helping me track this down initially.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix missing last-entry in readdir(3) · 406266ab
      Jan Engelhardt authored
      parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd)
      commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d
      Author: Jan Engelhardt <jengelh@medozas.de>
      Date:   Wed Dec 9 22:57:36 2009 +0100
      Btrfs: fix missing last-entry in readdir(3)
      When one does a 32-bit readdir(3), the last entry of a directory is
      missing. This is however not due to passing a large value to filldir,
      but it seems to have to do with glibc doing telldir or something
      quirky. In any case, this patch fixes it in practice.
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 17 Dec, 2009 8 commits
  6. 15 Dec, 2009 1 commit
  7. 11 Nov, 2009 4 commits
  8. 14 Oct, 2009 1 commit
  9. 13 Oct, 2009 1 commit
    • Btrfs: avoid tree log commit when there are no changes · 257c62e1
      Chris Mason authored
      rpm has a habit of running fdatasync when the file hasn't
      changed.  We already detect if a file hasn't been changed
      in the current transaction but it might have been sent to
      the tree-log in this transaction and not changed since
      the last call to fsync.
      In this case, we want to avoid a tree log sync, which includes
      a number of synchronous writes and barriers.  This commit
      extends the existing tracking of the last transaction to change
      a file to also track the last sub-transaction.
      The end result is that rpm -ivh and -Uvh are roughly twice as fast,
      and on par with ext3.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
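The sub-transaction tracking described above can be modelled with a few counters. This is a toy sketch, not the kernel implementation: the structures, field names and helper functions are all hypothetical and chosen only to make the idea concrete.

```c
#include <stdint.h>

struct toy_root {
	uint64_t log_transid;		/* current sub-transaction of the log */
	uint64_t last_log_commit;	/* last sub-transaction synced to disk */
};

struct toy_inode {
	uint64_t last_trans;		/* transaction that last changed the file */
	uint64_t logged_trans;		/* transaction in which it was last logged */
	uint64_t last_sub_trans;	/* log sub-transaction of the last change */
};

/* Record a modification of the inode in transaction transid. */
void toy_mark_changed(struct toy_inode *in, const struct toy_root *root,
		      uint64_t transid)
{
	in->last_trans = transid;
	in->last_sub_trans = root->log_transid;
}

/* Returns 1 if fsync can return without syncing the log again. */
int toy_can_skip_log_sync(const struct toy_inode *in,
			  const struct toy_root *root, uint64_t cur_transid)
{
	return in->logged_trans == cur_transid &&
	       in->last_sub_trans <= root->last_log_commit;
}

/* Log the inode, sync the log, and open the next sub-transaction. */
void toy_log_sync(struct toy_inode *in, struct toy_root *root,
		  uint64_t cur_transid)
{
	in->logged_trans = cur_transid;
	root->last_log_commit = root->log_transid;
	root->log_transid++;
}
```

A second fdatasync on an unchanged file hits the `toy_can_skip_log_sync()` fast path. Any write in between bumps `last_sub_trans` past `last_log_commit` and forces a real log sync.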
  10. 09 Oct, 2009 3 commits
  11. 08 Oct, 2009 3 commits
    • Btrfs: release delalloc reservations on extent item insertion · 32c00aff
      Josef Bacik authored
      This patch fixes an issue with the delalloc metadata space reservation
      code.  The problem is we used to free the reservation as soon as we
      allocated the delalloc region.  The problem with this is if we are not
      inserting an inline extent, we don't actually insert the extent item until
      after the ordered extent is written out.  This patch does 3 things,
      1) It moves the reservation clearing stuff into the ordered code, so when
      we remove the ordered extent we remove the reservation.
      2) It adds an EXTENT_DO_ACCOUNTING flag that gets passed when we clear
      delalloc bits in the cases where we want to clear the metadata reservation
      when we clear the delalloc extent, in the case that we do an inline extent
      or we invalidate the page.
      3) It adds another waitqueue to the space info so that when we start a fs
      wide delalloc flush, anybody else who also hits that area will simply wait
      for the flush to finish and then try to make their allocation.
      This has been tested thoroughly to make sure we did not regress on
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: delay clearing EXTENT_DELALLOC for compressed extents · a3429ab7
      Chris Mason authored
      When compression is on, the cow_file_range code is farmed off to
      worker threads.  This allows us to do significant CPU work in parallel
      on SMP machines.
      But it is a delicate balance around when we clear flags and how.  In
      the past we cleared the delalloc flag immediately, which was safe
      because the pages stayed locked.
      But this is causing problems with the newest ENOSPC code, and with the
      recent extent state cleanups we can now clear the delalloc bit at the
      same time the uncompressed code does.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: cleanup extent_clear_unlock_delalloc flags · a791e35e
      Chris Mason authored
      extent_clear_unlock_delalloc has a growing set of ugly parameters
      that is very difficult to read and maintain.
      This switches to a flag field and well named flag defines.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
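The general shape of such a cleanup, replacing a row of int parameters with one flag word, might look like the sketch below. The flag names and the `toy_clear_unlock()` function are hypothetical. They illustrate the technique and are not the defines the commit introduced.

```c
/* Hypothetical flag defines; one bit per former boolean parameter. */
#define TOY_CLEAR_UNLOCK_PAGE	(1UL << 0)
#define TOY_CLEAR_DELALLOC	(1UL << 1)
#define TOY_SET_WRITEBACK	(1UL << 2)

struct toy_result {
	int unlocked_pages;
	int cleared_delalloc;
	int set_writeback;
};

/*
 * Before: toy_clear_unlock(1, 0, 1), where the reader has to look up
 * what each position means. After: the call site names every action.
 */
struct toy_result toy_clear_unlock(unsigned long op)
{
	struct toy_result r = { 0, 0, 0 };

	if (op & TOY_CLEAR_UNLOCK_PAGE)
		r.unlocked_pages = 1;
	if (op & TOY_CLEAR_DELALLOC)
		r.cleared_delalloc = 1;
	if (op & TOY_SET_WRITEBACK)
		r.set_writeback = 1;
	return r;
}
```

A call such as `toy_clear_unlock(TOY_CLEAR_UNLOCK_PAGE | TOY_SET_WRITEBACK)` documents itself, and adding a new action later does not change the function signature.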
  12. 01 Oct, 2009 2 commits
  13. 28 Sep, 2009 1 commit
    • Btrfs: proper -ENOSPC handling · 9ed74f2d
      Josef Bacik authored
      At the start of a transaction we do a btrfs_reserve_metadata_space() and
      specify how many items we plan on modifying.  Then once we've done our
      modifications and such, just call btrfs_unreserve_metadata_space() for
      the same number of items we reserved.
      For keeping track of metadata needed for data I've had to add an extent_io op
      for when we merge extents.  This lets us track space properly when we are doing
      sequential writes, so we don't end up reserving way more metadata space than
      what we need.
      The only place where the metadata space accounting is not done is in the
      relocation code.  This is because Yan is going to be reworking that code in the
      near future, so running btrfs-vol -b could still possibly result in an ENOSPC
      related panic.  This patch also turns off the metadata_ratio stuff in order to
      allow users to more efficiently use their disk space.
      This patch makes it so we track how much metadata we need for an inode's
      delayed allocation extents by tracking how many extents are currently
      waiting for allocation.  It introduces two new callbacks for the
      extent_io trees, merge_extent_hook and split_extent_hook.  These help
      us keep track of when we merge delalloc extents together and split them
      up.  Reservations are handled before any actual dirtying occurs,
      and then we unreserve after we dirty.
      btrfs_unreserve_metadata_for_delalloc() will make the appropriate
      unreservations as needed based on the number of reservations we
      currently have and the number of extents we currently have.  Doing the
      reservation outside of doing any of the actual dirtying lets us do
      things like filemap_flush() the inode to try and force delalloc to
      happen, or as a last resort actually start allocation on all delalloc
      inodes in the fs.  This has survived dbench, fs_mark and an fsx torture
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
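The per-inode accounting described in this commit can be approximated with two counters. The sketch below is a simplified model, assuming one metadata reservation per outstanding delalloc extent. The struct and all `toy_` functions are hypothetical, and only the hook names echo the commit message.

```c
struct toy_inode_acct {
	int delalloc_extents;	/* extents currently waiting for allocation */
	int reserved_extents;	/* metadata reservations currently held */
};

/* Take one reservation before dirtying anything. */
void toy_reserve(struct toy_inode_acct *a)
{
	a->reserved_extents++;
}

/* A write created a new delalloc extent. */
void toy_new_delalloc(struct toy_inode_acct *a)
{
	a->delalloc_extents++;
}

/* Two adjacent delalloc extents became one. */
void toy_merge_extent_hook(struct toy_inode_acct *a)
{
	a->delalloc_extents--;
}

/* One delalloc extent was split in two. */
void toy_split_extent_hook(struct toy_inode_acct *a)
{
	a->delalloc_extents++;
}

/* After dirtying: release reservations no extent needs. Returns the count. */
int toy_unreserve(struct toy_inode_acct *a)
{
	int excess = a->reserved_extents - a->delalloc_extents;

	if (excess <= 0)
		return 0;
	a->reserved_extents -= excess;
	return excess;
}
```

Two sequential writes that merge into one extent end with a single reservation held, which is how the merge hook keeps sequential writers from over-reserving.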
  14. 24 Sep, 2009 2 commits
    • Btrfs: don't rename file into dummy directory · f679a840
      Yan, Zheng authored
      A recent change enforces only one access point to each subvolume. The first
      directory entry (the one added when the subvolume/snapshot was created) is
      treated as the valid access point; all other subvolume links are linked to dummy
      empty directories. The dummy directories are temporary inodes that exist only in
      memory, so we cannot rename files into them.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: check size of inode backref before adding hardlink · a5719521
      Yan, Zheng authored
      For every hardlink in btrfs, there is a corresponding inode back
      reference. All inode back references for hardlinks in a given
      directory are stored in a single b-tree item. The size of a b-tree item
      is limited by the size of a b-tree leaf, so we can only create a limited
      number of hardlinks to a given file in a directory.
      The original code lacks this check and oopses if the number of
      hardlinks goes over the limit. This patch fixes the issue by adding
      the check to btrfs_link and btrfs_rename.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
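The kind of check this commit adds can be sketched as a size test done before the new back reference is appended. The constants below are assumptions picked for illustration, not the on-disk values, and `toy_check_link()` and its -EMLINK return are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_REF_HEADER	10	/* assumed per-reference header size in bytes */
#define TOY_MAX_ITEM	3900	/* assumed upper bound for one b-tree item */

/*
 * item_size is the current size of the item holding the back references
 * for this file in this directory; name_len is the new link's name length.
 * Returns 0 if the new reference still fits, -EMLINK if it does not.
 */
int toy_check_link(size_t item_size, size_t name_len)
{
	if (item_size + TOY_REF_HEADER + name_len > TOY_MAX_ITEM)
		return -EMLINK;
	return 0;
}
```

Because each reference stores the link name, the number of links that fit depends on name length: short names allow many more hardlinks per directory than long ones.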
  15. 22 Sep, 2009 2 commits
  16. 21 Sep, 2009 2 commits
    • Btrfs: add snapshot/subvolume destroy ioctl · 76dda93c
      Yan, Zheng authored
      This patch adds a snapshot/subvolume destroy ioctl.  A subvolume that isn't being
      used and doesn't contain links to other subvolumes can be destroyed.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: change how subvolumes are organized · 4df27c4d
      Yan, Zheng authored
      btrfs allows subvolumes and snapshots anywhere in the directory tree.
      If we snapshot a subvolume that contains a link to another subvolume
      called subvolA, subvolA can be accessed through both the original
      subvolume and the snapshot. This is similar to creating a hard link to
      a directory, and has very similar problems.
      The aim of this patch is to enforce that there is only one access point to
      each subvolume. Only the first directory entry (the one added when
      the subvolume/snapshot was created) is treated as a valid access point.
      The first directory entry is distinguished by checking the root forward
      reference. If the corresponding root forward reference is missing,
      we know the entry is not the first one.
      This patch also adds snapshot/subvolume rename support; the code
      allows renaming a subvolume link across subvolumes.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>