    xfs: dynamic speculative EOF preallocation · 055388a3
    Dave Chinner authored
    
    
    Currently the size of the speculative preallocation during delayed
    allocation is fixed by either the allocsize mount option or a
    default size. We are seeing a lot of cases where we need to
    recommend using the allocsize mount option to prevent fragmentation
    when buffered writes land in the same AG.
    
    Rather than using a fixed preallocation size by default (up to 64k),
    make it dynamic by basing it on the current inode size. That way the
    EOF preallocation will increase as the file size increases.  Hence
    for streaming writes we are much more likely to get large
    preallocations exactly when we need them to reduce fragmentation.
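
    As a rough illustration of size-based preallocation, here is a minimal
    sketch and not the kernel code. It assumes 4k blocks, that the size is
    rounded down to a power of two, that the old 64k default acts as a
    floor, and that the cap is the largest single extent (2^21 - 1 blocks,
    just under 8GB). All names are hypothetical.

```python
BLOCK_SIZE = 4096                    # assumed filesystem block size
MAX_EXTENT_BLOCKS = (1 << 21) - 1    # assumed cap: just under 8GB at 4k blocks
DEFAULT_PREALLOC_BLOCKS = 16         # assumed floor: the old 64k default


def rounddown_pow_of_two(n: int) -> int:
    """Return the largest power of two that is <= n (n must be positive)."""
    return 1 << (n.bit_length() - 1)


def dynamic_prealloc_blocks(inode_size_bytes: int) -> int:
    """Hypothetical helper: pick an EOF prealloc size from the file size."""
    # Round the file size up to whole blocks.
    size_blocks = -(-inode_size_bytes // BLOCK_SIZE)
    if size_blocks == 0:
        return DEFAULT_PREALLOC_BLOCKS
    # Grow with the file, but never beyond one maximal extent.
    blocks = min(MAX_EXTENT_BLOCKS, rounddown_pow_of_two(size_blocks))
    # Never drop below the old fixed default.
    return max(blocks, DEFAULT_PREALLOC_BLOCKS)
```

    Under these assumptions a 1MB file gets a 256-block (1MB) preallocation,
    while a 64GB file is capped at one maximal extent.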
    
    For default settings, the size of the initial extents is determined
    by the number of parallel writers and the amount of memory in the
    machine. For 4GB RAM and 4 concurrent 32GB file writes:
    
     EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
       1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
       2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
       3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
       4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
       5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
       6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
       7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
    
    and for 16 concurrent 16GB file writes:
    
     EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
       1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
       2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
       3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
       4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
       5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
       6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
       7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
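
    To read the TOTAL column in bytes, the sketch below assumes the
    listings are xfs_bmap -v output, which reports in 512-byte blocks. The
    helper name is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: xfs_bmap -v reports 512-byte blocks


def extent_bytes(total_blocks: int) -> int:
    """Convert a TOTAL value from the listings above into bytes."""
    return total_blocks * SECTOR_BYTES


# TOTAL values of extents 0-5 from the 4-writer listing.
four_writer_totals = [1048576, 1048576, 2097152, 4194304, 8388608, 16777208]

for total in four_writer_totals:
    mib = extent_bytes(total) / 2**20
    print(f"{total:>9} blocks = {mib:8.1f} MiB")
```

    On that assumption the first extents are 512MB with 4 writers and
    128MB with 16 writers, doubling until they reach one 4k block short of
    8GB.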
    
    Because it is hard to take back speculative preallocation, cases
    where there are large, slow-growing log files on a nearly full
    filesystem may cause premature ENOSPC. Hence as the filesystem nears
    full, the maximum dynamic prealloc size is reduced according to this
    table (based on 4k block size):
    
    freespace       max prealloc size
      >5%             full extent (8GB)
      4-5%             2GB (8GB >> 2)
      3-4%             1GB (8GB >> 3)
      2-3%           512MB (8GB >> 4)
      1-2%           256MB (8GB >> 5)
      <1%            128MB (8GB >> 6)
    
    This should reduce the amount of space held in speculative
    preallocation for such cases.
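
    The table can be read as a shift applied to the full 8GB extent. A
    minimal sketch follows, assuming 4k blocks and that each band includes
    its lower bound (the table does not say which band exactly 5% falls
    in). The function name is hypothetical.

```python
FULL_EXTENT_BYTES = 8 << 30  # "full extent (8GB)" from the table, 4k blocks


def max_prealloc_bytes(free_pct: float) -> int:
    """Hypothetical helper: cap on dynamic prealloc for a given free space %."""
    # Assumption: a band includes its lower bound, so exactly 5% is unthrottled.
    if free_pct >= 5:
        shift = 0
    elif free_pct >= 4:
        shift = 2
    elif free_pct >= 3:
        shift = 3
    elif free_pct >= 2:
        shift = 4
    elif free_pct >= 1:
        shift = 5
    else:
        shift = 6
    return FULL_EXTENT_BYTES >> shift
```

    For example, at 2.5% free space the cap is 8GB >> 4, i.e. 512MB.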
    
    The allocsize mount option turns off the dynamic behaviour and fixes
    the prealloc size to whatever the mount option specifies, i.e. the
    behaviour with allocsize set is unchanged.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>