1. 15 Jan, 2010 9 commits
    • xfs: clean up inconsistent variable naming in xfs_swap_extent · 6bded0f3
      Dave Chinner authored
      
      
      The swap extent ioctl passes in a target inode and a temporary inode
      which are clearly named in the ioctl structure. The code then
      assigns temp to target and vice versa, making it extremely difficult
      to work out which inode is which later in the code.  Make this
      consistent throughout the code.
      
      Also make xfs_swap_extent static as there are no external users of
      the function.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: add tracing to xfs_swap_extents · 3a85cd96
      Dave Chinner authored
      
      
      To be able to diagnose whether the swap extents function is
      detecting compatible inode data fork configurations for swapping
      extents, add tracing points to the code to allow us to see the
      format of the inode forks before and after the swap.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: xfs_swap_extents needs to handle dynamic fork offsets · e09f9860
      Dave Chinner authored
      
      
      When swapping extents, we can corrupt inodes by swapping data forks
      that are in incompatible formats.  This is caused by the two inodes
      having different fork offsets due to the presence of an attribute
      fork on an attr2 filesystem.  xfs_fsr tries to be smart about
      setting the fork offset, but the trick it plays only works on attr1
      (old fixed format attribute fork) filesystems.
      
      Changing the way xfs_fsr sets up the attribute fork will prevent
      this situation from ever occurring, so in the kernel code we can get
      by with a preventative fix - check that the data fork in the
      defragmented inode is in a format valid for the inode it is being
      swapped into.  This will lead to files that will silently and
      potentially repeatedly fail defragmentation, so issue a warning to
      the log when this particular failure occurs to let us know that
      xfs_fsr needs updating/fixing.
      
      To help identify how to improve xfs_fsr to avoid this issue, add
      trace points for the inodes being swapped so that we can determine
      why the swap was rejected and to confirm that the code is making the
      right decisions and modifications when swapping forks.
      
      A further complication is that even when the swap is allowed to
      proceed, if the fork offset differs between the two inodes then the
      value for the maximum number of extents the data fork can hold can
      be wrong. Make sure these are also set correctly after the swap occurs.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
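      The fork-offset problem described in this commit can be sketched
      with a toy model. This is not the kernel code: the toy_inode
      structure, the 156-byte literal area, the 16-byte extent record
      and the 8-byte fork offset unit are all assumptions chosen for
      illustration, and the check is simplified to extent counts only.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Illustrative constants, assumed for this sketch only. */
      #define LITERAL_AREA_BYTES 156u  /* space shared by data and attr forks */
      #define EXTENT_REC_BYTES    16u  /* size of one in-inode extent record */

      /* Hypothetical, heavily simplified inode. */
      struct toy_inode {
          unsigned forkoff;   /* attr fork offset in 8-byte units, 0 = no attr fork */
          unsigned nextents;  /* extents currently held in the data fork */
      };

      /* Bytes available to the data fork: everything before the attr fork. */
      static unsigned data_fork_bytes(const struct toy_inode *ip)
      {
          return ip->forkoff ? ip->forkoff * 8u : LITERAL_AREA_BYTES;
      }

      /* Maximum number of extent records the data fork can hold inline. */
      static unsigned max_extents(const struct toy_inode *ip)
      {
          return data_fork_bytes(ip) / EXTENT_REC_BYTES;
      }

      /*
       * A swap is only safe if each data fork fits in the inode it is
       * being swapped into. Different fork offsets mean different limits.
       */
      static bool swap_allowed(const struct toy_inode *target,
                               const struct toy_inode *temp)
      {
          assert(target && temp);
          return temp->nextents <= max_extents(target) &&
                 target->nextents <= max_extents(temp);
      }
      ```

      In this model a target inode with an attribute fork has less room
      for extents than a temporary inode without one, so a data fork
      that is valid in the temporary inode may have to be rejected.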
    • xfs: fix missing error check in xfs_rtfree_range · 3daeb42c
      Dave Chinner authored
      
      
      When xfs_rtfind_forw() returns an error, the block is returned
      uninitialised.  xfs_rtfree_range() is not checking the error return,
      so could be using an uninitialised block number for modifying bitmap
      summary info.
      
      The problem was found by gcc when compiling the *userspace* libxfs
      code - it is a copy of the kernel code with the exact same bug.
      gcc gives an uninitialised variable warning on the userspace code
      but not on the kernel code. You gotta love the consistency (Mmmm,
      slightly chewy today!).
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
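      The class of bug fixed here can be shown in a minimal sketch. The
      names find_forward and free_range are hypothetical stand-ins, not
      the XFS functions; the point is only that an out-parameter left
      unwritten on failure must not be used until the error return has
      been checked.

      ```c
      #include <assert.h>
      #include <errno.h>

      /*
       * Hypothetical search helper: finds the last index of the run of
       * equal values starting at 'start'. Writes *out only on success.
       */
      static int find_forward(const int *bitmap, int len, int start, int *out)
      {
          if (start < 0 || start >= len)
              return EINVAL;          /* *out is deliberately left untouched */

          int i = start;
          while (i + 1 < len && bitmap[i + 1] == bitmap[start])
              i++;
          *out = i;
          return 0;
      }

      /* Caller: check the error before touching the returned block. */
      static int free_range(const int *bitmap, int len, int start, int *last)
      {
          int end;                    /* uninitialised until find_forward succeeds */
          int error = find_forward(bitmap, len, start, &end);

          if (error)
              return error;           /* without this check, 'end' would be garbage */

          assert(end >= start);
          *last = end;
          return 0;
      }
      ```

      Dropping the "if (error)" test reproduces the shape of the
      original problem: the caller carries on with whatever happened to
      be in the variable.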
    • xfs: fix stale inode flush avoidance · 4b6a4688
      Dave Chinner authored
      
      
      When reclaiming stale inodes, we need to guarantee that inodes are
      unpinned before returning with a "clean" status. If we don't, we can
      reclaim inodes that are pinned, leading to use after free in the
      transaction subsystem as transactions complete.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: Remove inode iolock held check during allocation · 126976c7
      Dave Chinner authored
      
      
      lockdep complains about the lock not being initialised as we do an
      ASSERT based check that the lock is not held before we initialise it
      to catch inodes freed with the lock held.
      
      lockdep does this check for us in the lock initialisation code, so
      remove the ASSERT to stop the lockdep warning.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: reclaim all inodes by background tree walks · 57817c68
      Dave Chinner authored
      
      
      We cannot do direct inode reclaim without taking the flush lock to
      ensure that we do not reclaim an inode under IO. We check the inode
      is clean before doing direct reclaim, but this is not good enough
      because the inode flush code marks the inode clean once it has
      copied the in-core dirty state to the backing buffer.
      
      It is the flush lock that determines whether the inode is still
      under IO, even though it is marked clean, and the inode is still
      required at IO completion so we can't reclaim it even though it is
      clean in core. Hence the requirement that we need to take the flush
      lock even on clean inodes because this guarantees that the inode
      writeback IO has completed and it is safe to reclaim the inode.
      
      With delayed write inode flushing, we could end up waiting a long
      time on the flush lock even for a clean inode. The background
      reclaim already handles this efficiently, so avoid all the problems
      by killing the direct reclaim path altogether.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
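      The reasoning in this commit, that "clean" does not mean "IO
      complete", can be sketched with a toy state machine. The
      toy_inode structure and every function below are hypothetical
      and single-threaded; a plain flag stands in for the flush lock,
      and the sketch only models when reclaim is safe.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical inode state, reduced to the two facts that matter. */
      struct toy_inode {
          bool dirty;         /* in-core state differs from the backing buffer */
          bool flush_locked;  /* writeback IO for this inode is still in flight */
      };

      /* Start a flush: copy state to the buffer and mark the inode clean. */
      static void flush_start(struct toy_inode *ip)
      {
          assert(!ip->flush_locked);
          ip->flush_locked = true;
          ip->dirty = false;  /* clean already, although IO has not completed */
      }

      /* IO completion: the inode is needed until here, then the lock drops. */
      static void flush_io_done(struct toy_inode *ip)
      {
          ip->flush_locked = false;
      }

      /* Insufficient check: looks only at the clean/dirty state. */
      static bool reclaim_clean_check_only(const struct toy_inode *ip)
      {
          return !ip->dirty;
      }

      /* Safer check: must also be able to take the flush lock. */
      static bool reclaim_with_flush_lock(struct toy_inode *ip)
      {
          if (ip->flush_locked)
              return false;   /* still under IO, not safe to reclaim */
          ip->flush_locked = true;
          bool ok = !ip->dirty;
          ip->flush_locked = false;
          return ok;
      }
      ```

      Between flush_start and flush_io_done the first check says the
      inode can be reclaimed while the second refuses, which is the
      window the commit message describes.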
    • xfs: Avoid inodes in reclaim when flushing from inode cache · 018027be
      Dave Chinner authored
      
      
      The reclaim code will handle flushing of dirty inodes before reclaim
      occurs, so avoid them when determining whether an inode is a
      candidate for flushing to disk when walking the radix trees.  This
      is based on a test patch from Christoph Hellwig.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: reclaim inodes under a write lock · c8e20be0
      Dave Chinner authored
      
      
      Make the inode tree reclaim walk exclusive to avoid races with
      concurrent sync walkers and lookups. This is a version of a patch
      posted by Christoph Hellwig that avoids all the code duplication.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  2. 12 Jan, 2010 31 commits