1. 11 Nov, 2008 1 commit
    • Btrfs: Fix starting search offset inside btrfs_drop_extents · 8247b41a
      Yan Zheng authored
      
      
      btrfs_drop_extents will drop paths and search again when it needs to
      force COW of higher nodes.  It was using the key it found during the last
      search as the offset for the next search.
      
      But, this wasn't always correct.  The key could be from before our desired
      range, and because we're dropping the path, it is possible for the file's
      items to change while we do the search again.
      
      The fix here is to make sure we don't search for something smaller than
      the offset btrfs_drop_extents was called with.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
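The rule described in this commit can be pictured as a simple clamp. The following is a minimal sketch, not the actual patch: `next_search_offset` and its parameter names are hypothetical, and the code only models the idea that the next search never starts below the offset btrfs_drop_extents was called with.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): choose where the next
 * search begins after the path has been dropped.
 *
 * found_key_offset: offset of the key found by the previous search
 * start:            offset the caller originally asked to drop from
 */
static inline uint64_t next_search_offset(uint64_t found_key_offset,
                                          uint64_t start)
{
    /* Never search for something smaller than the requested start. */
    return found_key_offset > start ? found_key_offset : start;
}
```

If the previous search landed on a key before the requested range, the sketch restarts from the requested offset; otherwise it continues from the key that was found.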
  2. 10 Nov, 2008 2 commits
  3. 06 Nov, 2008 1 commit
    • Btrfs: Optimize compressed writeback and reads · 771ed689
      Chris Mason authored
      
      
      When reading compressed extents, try to put pages into the page cache
      for any pages covered by the compressed extent that readpages didn't already
      preload.
      
      Add an async work queue to handle transformations at delayed allocation processing
      time.  Right now this is just compression.  The workflow is:
      
      1) Find offsets in the file marked for delayed allocation
      2) Lock the pages
      3) Lock the state bits
      4) Call the async delalloc code
      
      The async delalloc code clears the state lock bits and delalloc bits.  It is
      important this happens before the range goes into the work queue because
      otherwise it might deadlock with other work queue items that try to lock
      those extent bits.
      
      The file pages are compressed, and if the compression doesn't work the
      pages are written back directly.
      
      An ordered work queue is used to make sure the inodes are written in the same
      order that pdflush or writepages sent them down.
      
      This changes extent_write_cache_pages to let the writepage function
      update the wbc nr_written count.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
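The ordering guarantee mentioned in this commit (items are written in the order they were queued, even if their compression finishes out of order) can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: `struct ordered_queue`, `oq_add`, and `oq_complete` are hypothetical names, and the real work queue code handles threads and locking that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OQ_MAX_ITEMS 16

/* Hypothetical model of an ordered work queue; not the kernel structure. */
struct ordered_queue {
    bool done[OQ_MAX_ITEMS]; /* item i has finished its transformation */
    size_t nr_items;         /* items queued so far */
    size_t nr_submitted;     /* items already handed on, in queue order */
};

/* Queue a new item and return its index, which is its position in order. */
static inline size_t oq_add(struct ordered_queue *q)
{
    assert(q->nr_items < OQ_MAX_ITEMS);
    return q->nr_items++;
}

/*
 * Mark one item as finished, then hand on every finished item at the
 * head of the queue. An item that finishes early waits until all
 * earlier items have finished too.
 */
static inline void oq_complete(struct ordered_queue *q, size_t idx)
{
    assert(idx < q->nr_items);
    q->done[idx] = true;
    while (q->nr_submitted < q->nr_items && q->done[q->nr_submitted])
        q->nr_submitted++;
}
```

With three queued items, completing the third first submits nothing; completing the first submits one item; completing the second then releases the remaining two in order.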
  4. 31 Oct, 2008 1 commit
    • Btrfs: Compression corner fixes · 70b99e69
      Chris Mason authored
      
      
      Make sure we keep page->mapping NULL on the pages we're getting
      via alloc_page.  It gets set so a few of the callbacks can do the right
      thing, but in general these pages don't have a mapping.
      
      Don't try to truncate compressed inline items in btrfs_drop_extents.
      The whole compressed item must be preserved.
      
      Don't try to create multipage inline compressed items.  When we try to
      overwrite just the first page of the file, we would have to read in and recow
      all the pages after it in the same compressed inline items.  For now, only
      create single page inline items.
      
      Make sure we lock pages in the correct order during delalloc.  The
      search into the state tree for delalloc bytes can return bytes before
      the page we already have locked.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 30 Oct, 2008 3 commits
    • Btrfs: Add fallocate support v2 · d899e052
      Yan Zheng authored
      
      This patch updates btrfs-progs for fallocate support.
      
      fallocate is a little different in Btrfs because we need to tell the
      COW system that a given preallocated extent doesn't need to be
      cow'd as long as there are no snapshots of it.  This leverages the
      -o nodatacow checks.
       
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
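The COW decision this commit describes can be written as a small predicate. This is only a sketch of the stated rule (a preallocated extent does not need to be COW'd as long as no snapshot references it); `struct extent_info`, its fields, and `write_needs_cow` are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-extent state; field names are illustrative only. */
struct extent_info {
    bool preallocated; /* extent was reserved by fallocate */
    bool nodatacow;    /* file is covered by -o nodatacow */
    bool snapshotted;  /* a snapshot also references the extent */
};

/* Return true when a write into the extent must be copied-on-write. */
static inline bool write_needs_cow(const struct extent_info *e)
{
    if (e->snapshotted)
        return true; /* shared data must not be overwritten in place */
    return !(e->preallocated || e->nodatacow);
}
```

In this model a preallocated extent is written in place until a snapshot references it, at which point writes go back to COW.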
    • Btrfs: Fix bookend extent race v2 · 6643558d
      Yan Zheng authored
      
      
      When dropping the middle part of an extent, btrfs_drop_extents first
      truncates the extent, then inserts a bookend extent.
      
      Since truncation and insertion can't be done atomically, there is a short
      period during which the bookend extent isn't in the tree. This causes
      problems for functions that search the tree for file extent items. The fix
      is to lock the range of the bookend extent before truncation.
      
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
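To make the two-step split concrete, the sketch below computes the pieces left behind when a range is dropped from the middle of an extent. It is an illustration only: `struct byte_range` and `split_for_drop` are hypothetical, ranges are half-open, and no locking is modelled. The `bookend` range it returns is the range that, per this commit, gets locked before the truncation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) within a file. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: compute the two pieces left behind when
 * [drop.start, drop.end) is removed from the middle of extent.
 * Returns false if the drop range is not strictly inside the extent.
 */
static inline bool split_for_drop(struct byte_range extent,
                                  struct byte_range drop,
                                  struct byte_range *front,
                                  struct byte_range *bookend)
{
    if (!(extent.start < drop.start && drop.start < drop.end &&
          drop.end < extent.end))
        return false;

    /* Step 1: the existing item is truncated to the front piece. */
    front->start = extent.start;
    front->end = drop.start;

    /* Step 2: a new item, the bookend, covers the tail. */
    bookend->start = drop.end;
    bookend->end = extent.end;
    return true;
}
```

Between step 1 and step 2 the tail range has no item in the tree, which is the window the commit closes by locking that range first.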
    • Btrfs: update hole handling v2 · 9036c102
      Yan Zheng authored
      
      
      This patch splits the hole insertion code out of btrfs_setattr
      into btrfs_cont_expand and updates btrfs_get_extent to properly
      handle the case where file extent items are not contiguous.
      
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
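A lookup that tolerates gaps between extent items might look like the sketch below. It is not btrfs_get_extent: `struct file_extent`, `struct mapping`, and `map_offset` are hypothetical, and the code assumes the extents are sorted by start, do not overlap, and lie within the file size. Any offset not covered by an extent is reported as a hole spanning the whole gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct file_extent { uint64_t start; uint64_t len; };
struct mapping { uint64_t start; uint64_t len; bool is_hole; };

/*
 * Hypothetical lookup: return the mapping that covers offset.
 * ext[] must be sorted by start and non-overlapping.
 */
static inline struct mapping map_offset(const struct file_extent *ext,
                                        size_t n, uint64_t offset,
                                        uint64_t file_size)
{
    uint64_t hole_start = 0;       /* end of the last extent before offset */
    uint64_t hole_end = file_size; /* start of the first extent after it */
    size_t i;

    assert(offset < file_size);
    for (i = 0; i < n; i++) {
        uint64_t end = ext[i].start + ext[i].len;

        if (offset >= ext[i].start && offset < end)
            return (struct mapping){ ext[i].start, ext[i].len, false };
        if (end <= offset) {
            hole_start = end;
        } else {
            /* This extent begins after offset, so the gap ends here. */
            hole_end = ext[i].start;
            break;
        }
    }
    return (struct mapping){ hole_start, hole_end - hole_start, true };
}
```

With extents at [0, 4096) and [12288, 16384) in a 20480-byte file, offset 8192 maps to a hole covering [4096, 12288), and offset 17000 maps to a trailing hole covering [16384, 20480).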
  6. 29 Oct, 2008 1 commit
    • Btrfs: Add zlib compression support · c8b97818
      Chris Mason authored
      
      
      This is a large change for adding compression on reading and writing,
      both for inline and regular extents.  It does some fairly large
      surgery to the writeback paths.
      
      Compression is off by default and enabled by mount -o compress.  Even
      when the -o compress mount option is not used, it is possible to read
      compressed extents off the disk.
      
      If compression for a given set of pages fails to make them smaller, the
      file is flagged to avoid future compression attempts.
      
      * While finding delalloc extents, the pages are locked before being sent down
      to the delalloc handler.  This allows the delalloc handler to do complex things
      such as cleaning the pages, marking them writeback and starting IO on their
      behalf.
      
      * Inline extents are inserted at delalloc time now.  This allows us to compress
      the data before inserting the inline extent, and it allows us to insert
      an inline extent that spans multiple pages.
      
      * All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
      are changed to record both an in-memory size and an on disk size, as well
      as a flag for compression.
      
      From a disk format point of view, the extent pointers in the file are changed
      to record the on disk size of a given extent and some encoding flags.
      Space in the disk format is allocated for compression encoding, as well
      as encryption and a generic 'other' field.  Neither the encryption nor the
      'other' field is currently used.
      
      In order to limit the amount of data read for a single random read in the
      file, the size of a compressed extent is limited to 128k.  This is a
      software-only limit; the disk format supports u64 sized compressed extents.
      
      In order to limit the ram consumed while processing extents, the uncompressed
      size of a compressed extent is limited to 256k.  This is a software-only limit
      and will be subject to tuning later.
      
      Checksumming is still done on compressed extents, and it is done on the
      uncompressed version of the data.  This way additional encodings can be
      layered on without having to figure out which encoding to checksum.
      
      Compression happens at delalloc time, which is basically single threaded because
      it is usually done by a single pdflush thread.  This makes it tricky to
      spread the compression load across all the cpus on the box.  We'll have to
      look at parallel pdflush walks of dirty inodes at a later time.
      
      Decompression is hooked into readpages and it does spread across CPUs nicely.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
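The size limits and the "stop trying" flag from this commit can be combined into a short sketch. Everything named here is hypothetical (`struct file_state`, `nr_compress_chunks`, `keep_compressed`); only the 128k and 256k figures come from the commit message. The message does not say what happens when a compressed result exceeds 128k, so the sketch assumes it simply falls back to storing the data uncompressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits quoted in the commit message; both are software-only. */
#define MAX_COMPRESSED_BYTES   (128u * 1024u)
#define MAX_UNCOMPRESSED_BYTES (256u * 1024u)

/* Hypothetical per-file state; not the kernel's inode structure. */
struct file_state {
    bool skip_compression; /* set after compression failed to shrink data */
};

/* Number of pieces a delalloc range is split into before compressing. */
static inline uint64_t nr_compress_chunks(uint64_t range_bytes)
{
    return (range_bytes + MAX_UNCOMPRESSED_BYTES - 1) /
           MAX_UNCOMPRESSED_BYTES;
}

/*
 * Decide whether to store the compressed form of one chunk. If the
 * result is not smaller than the input, the file is flagged so later
 * writes can skip compression.
 */
static inline bool keep_compressed(struct file_state *f,
                                   uint64_t uncompressed_bytes,
                                   uint64_t compressed_bytes)
{
    assert(uncompressed_bytes <= MAX_UNCOMPRESSED_BYTES);
    if (compressed_bytes >= uncompressed_bytes) {
        f->skip_compression = true;
        return false;
    }
    /* Assumption: an oversized result is stored uncompressed instead. */
    return compressed_bytes <= MAX_COMPRESSED_BYTES;
}
```

A caller would check `skip_compression` before attempting compression at all, which is how a single bad result avoids repeated wasted work on incompressible files.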
  7. 09 Oct, 2008 2 commits
    • Btrfs: Remove offset field from struct btrfs_extent_ref · 3bb1a1bc
      Yan Zheng authored
      
      
      The offset field in struct btrfs_extent_ref records the position
      inside the file at which the file extent is referenced. In the new back
      reference system, tree leaves holding references to file extents
      are recorded explicitly. We can scan these tree leaves very quickly, so the
      offset field is not required.
      
      This patch also makes the back reference system check the objectid
      when extents are being deleted.
      
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
    • Btrfs: Count space allocated to file in bytes · a76a3cd4
      Yan Zheng authored
      
      
      This patch makes btrfs count space allocated to a file in bytes instead
      of 512 byte sectors.
      
      Everything else in btrfs uses a byte count instead of sector sizes or
      block sizes, so this fits better.
      
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
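When the count is kept in bytes, interfaces that still report 512-byte units (for example the `st_blocks` field returned by stat) need a conversion at the boundary. The helper below is a sketch of that conversion; `bytes_to_sectors` is a hypothetical name, and rounding up is an assumption rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 2^9 = 512 bytes per sector */

/* Hypothetical conversion from a byte count to 512-byte sectors. */
static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
    /* Round up so a partially used sector is still counted. */
    return (bytes + (1u << SECTOR_SHIFT) - 1) >> SECTOR_SHIFT;
}
```

For a 4096-byte allocation this yields 8 sectors; internally everything else can keep working with the byte count.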
  8. 03 Oct, 2008 1 commit
    • Btrfs: O_DIRECT writes via buffered writes + invalidate · cb843a6f
      Chris Mason authored
      
      
      This reworks the btrfs O_DIRECT write code a bit.  It had always fallen
      back to buffered IO and done an invalidate, but needed to be updated
      for the data=ordered code.  The invalidate wasn't actually removing pages
      because they were still inside an ordered extent.
      
      This also combines the O_DIRECT/O_SYNC paths where possible, and kicks
      off IO in the main btrfs_file_write loop to keep the pipe down to the
      disk full as we process long writes.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  9. 29 Sep, 2008 1 commit
    • Btrfs: add and improve comments · d352ac68
      Chris Mason authored
      
      
      This improves the comments at the top of many functions.  It didn't
      dive into the guts of functions because I was trying to
      avoid merging problems with the new allocator and back reference work.
      
      extent-tree.c and volumes.c were both skipped, and there is definitely
      more work to do in cleaning and commenting the code.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  10. 26 Sep, 2008 1 commit
    • Btrfs: extent_map and data=ordered fixes for space balancing · 5b21f2ed
      Zheng Yan authored
      
      
      * Add an EXTENT_BOUNDARY state bit to keep the writepage code
      from merging data extents that are in the process of being
      relocated.  This allows us to do accounting for them properly.
      
      * The balancing code relocates data extents independent of the underlying
      inode.  The extent_map code was modified to properly account for
      things moving around (invalidating extent_map caches in the inode).
      
      * Don't take the drop_mutex in the create_subvol ioctl.  It isn't
      required.
      
      * Fix walking of the ordered extent list to avoid races with sys_unlink.
      
      * Change the lock ordering rules.  Transaction start goes outside
      the drop_mutex.  This allows btrfs_commit_transaction to directly
      drop the relocation trees.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
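The effect of the EXTENT_BOUNDARY bit from the first bullet of this commit can be sketched as a merge check. The names and the bit value below are illustrative assumptions (`SKETCH_EXTENT_BOUNDARY`, `struct extent_state_sketch`, `can_merge`); the commit message does not give the real bit value or the exact place where the check is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit value; the real EXTENT_BOUNDARY value is not given. */
#define SKETCH_EXTENT_BOUNDARY (1u << 0)

/* Hypothetical stand-in for one range in the extent state tree. */
struct extent_state_sketch {
    uint64_t start;    /* first byte of the range */
    uint64_t end;      /* last byte of the range, inclusive */
    unsigned int bits; /* state bits set on the range */
};

/* Return true if next may be merged onto the end of prev. */
static inline bool can_merge(const struct extent_state_sketch *prev,
                             const struct extent_state_sketch *next)
{
    if (prev->end + 1 != next->start)
        return false; /* ranges are not adjacent */
    if (next->bits & SKETCH_EXTENT_BOUNDARY)
        return false; /* next starts a range that must stay separate */
    return true;
}
```

Keeping a relocated data extent as its own range is what lets the accounting for it stay exact, as the bullet describes.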
  11. 25 Sep, 2008 26 commits