• Josef Bacik's avatar
    Btrfs: use hybrid extents+bitmap rb tree for free space · 96303081
    Josef Bacik authored
    
    
    Currently btrfs has a problem where it can use a ridiculous amount of RAM simply
    tracking free space.  As free space gets fragmented, we end up with thousands of
    entries on an rb-tree per block group, which usually spans 1 gig of area.  Since
    we currently don't ever flush free space cache back to disk this gets to be a
    bit unweildly on large fs's with lots of fragmentation.
    
    This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free
    space cache.  Initially we calculate a threshold of extent entries we can
    handle, which is however many extent entries we can cram into 16k of ram.  The
    maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace
    will be 32k of RAM, which scales much better than we did before.
    
    Once we pass the extent threshold, we start adding bitmaps and using those
    instead for tracking the free space.  This patch also makes it so that any free
    space thats less than 4 * sectorsize we go ahead and put into a bitmap.  This is
    nice since we try and allocate out of the front of a block group, so if the
    front of a block group is heavily fragmented and then has a huge chunk of free
    space at the end, we go ahead and add the fragmented areas to bitmaps and use a
    normal extent entry to track the big chunk at the back of the block group.
    
    I've also taken the opportunity to revamp how we search for free space.
    Previously we indexed free space via an offset indexed rb tree and a bytes
    indexed rb tree.  I've dropped the bytes indexed rb tree and use only the offset
    indexed rb tree.  This cuts the number of tree operations we were doing
    previously down by half, and gives us a little bit of a better allocation
    pattern since we will always start from a specific offset and search forward
    from there, instead of searching for the size we need and try and get it as
    close as possible to the offset we want.
    
    I've given this a healthy amount of testing pre-new format stuff, as well as
    post-new format stuff.  I've booted up my fedora box which is installed on btrfs
    with this patch and ran with it for a few days without issues.  I've not seen
    any performance regressions in any of my tests.
    
    Since the last patch Yan Zheng fixed a problem where we could have overlapping
    entries, so updating their offset inline would cause problems.  Thanks,
    Signed-off-by: default avatarJosef Bacik <jbacik@redhat.com>
    Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    96303081
ctree.h 73.6 KB