1. 05 Dec, 2010 1 commit
    • Eric W. Biederman's avatar
      Revert "vfs: show unreachable paths in getcwd and proc" · 7b2a69ba
      Eric W. Biederman authored
      Because it caused a chroot ttyname regression in 2.6.36.
      As of 2.6.36 ttyname does not work in a chroot.  It has already been
      reported that screen breaks, and for me this breaks an automated
      distribution testsuite, that I need to preserve the ability to run the
      existing binaries on for several more years.  glibc 2.11.3 which has a
      fix for this is not an option.
      The root cause of this breakage is:
          commit 8df9d1a4
          Author: Miklos Szeredi <mszeredi@suse.cz>
          Date:   Tue Aug 10 11:41:41 2010 +0200
          vfs: show unreachable paths in getcwd and proc
          Prepend "(unreachable)" to path strings if the path is not reachable
          from the current root.
          Two places updated are
           - the return string from getcwd()
           - and symlinks under /proc/$PID.
          Other uses of d_path() are left unchanged (we know that some old
          software crashes if /proc/mounts is changed).
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      So remove the nice sounding, but ultimately ill advised change to how
      /proc/fd symlinks work.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  2. 02 Dec, 2010 1 commit
  3. 01 Dec, 2010 5 commits
    • Dave Chinner's avatar
      xfs: only run xfs_error_test if error injection is active · c76febef
      Dave Chinner authored
      Recent tests writing lots of small files showed the flusher thread
      being CPU bound and taking a long time to do allocations on a debug
      kernel. perf showed this as the prime reason:
                   samples  pcnt function                    DSO
                   _______ _____ ___________________________ _________________
                 224648.00 36.8% xfs_error_test              [kernel.kallsyms]
                  86045.00 14.1% xfs_btree_check_sblock      [kernel.kallsyms]
                  39778.00  6.5% prandom32                   [kernel.kallsyms]
                  37436.00  6.1% xfs_btree_increment         [kernel.kallsyms]
                  29278.00  4.8% xfs_btree_get_rec           [kernel.kallsyms]
                  27717.00  4.5% random32                    [kernel.kallsyms]
      Walking btree blocks during allocation checking them requires each
      block (a cache hit, so no I/O) call xfs_error_test(), which then
      does a random32() call as the first operation.  IOWs, ~50% of the
      CPU is being consumed just testing whether we need to inject an
      error, even though error injection is not active.
      Kill this overhead when error injection is not active by adding a
      global counter of active error traps and only calling into
      xfs_error_test when fault injection is active.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    • Dave Chinner's avatar
      xfs: avoid moving stale inodes in the AIL · de25c181
      Dave Chinner authored
      When an inode has been marked stale because the cluster is being
      freed, we don't want to (re-)insert this inode into the AIL. There
      is a race condition where the cluster buffer may be unpinned before
      the inode is inserted into the AIL during transaction committed
      processing. If the buffer is unpinned before the inode item has been
      committed and inserted, then it is possible for the buffer to be
      released and hence processthe stale inode callbacks before the inode
      is inserted into the AIL.
      In this case, we then insert a clean, stale inode into the AIL which
      will never get removed by an IO completion. It will, however, get
      reclaimed and that triggers an assert in xfs_inode_free()
      complaining about freeing an inode still in the AIL.
      This race can be avoided by not moving stale inodes forward in the AIL
      during transaction commit completion processing. This closes the
      race condition by ensuring we never insert clean stale inodes into
      the AIL. It is safe to do this because a dirty stale inode, by
      definition, must already be in the AIL.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    • Dave Chinner's avatar
      xfs: delayed alloc blocks beyond EOF are valid after writeback · 309c8480
      Dave Chinner authored
      There is an assumption in the parts of XFS that flushing a dirty
      file will make all the delayed allocation blocks disappear from an
      inode. That is, that after calling xfs_flush_pages() then
      ip->i_delayed_blks will be zero.
      This is an invalid assumption as we may have specualtive
      preallocation beyond EOF and they are recorded in
      ip->i_delayed_blks. A flush of the dirty pages of an inode will not
      change the state of these blocks beyond EOF, so a non-zero
      deeelalloc block count after a flush is valid.
      The bmap code has an invalid ASSERT() that needs to be removed, and
      the swapext code has a bug in that while it swaps the data forks
      around, it fails to swap the i_delayed_blks counter associated with
      the fork and hence can get the block accounting wrong.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    • Dave Chinner's avatar
      xfs: push stale, pinned buffers on trylock failures · 90810b9e
      Dave Chinner authored
      As reported by Nick Piggin, XFS is suffering from long pauses under
      highly concurrent workloads when hosted on ramdisks. The problem is
      that an inode buffer is stuck in the pinned state in memory and as a
      result either the inode buffer or one of the inodes within the
      buffer is stopping the tail of the log from being moved forward.
      The system remains in this state until a periodic log force issued
      by xfssyncd causes the buffer to be unpinned. The main problem is
      that these are stale buffers, and are hence held locked until the
      transaction/checkpoint that marked them state has been committed to
      disk. When the filesystem gets into this state, only the xfssyncd
      can cause the async transactions to be committed to disk and hence
      unpin the inode buffer.
      This problem was encountered when scaling the busy extent list, but
      only the blocking lock interface was fixed to solve the problem.
      Extend the same fix to the buffer trylock operations - if we fail to
      lock a pinned, stale buffer, then force the log immediately so that
      when the next attempt to lock it comes around, it will have been
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    • Dave Chinner's avatar
      xfs: fix failed write truncation handling. · c726de44
      Dave Chinner authored
      Since the move to the new truncate sequence we call xfs_setattr to
      truncate down excessively instanciated blocks.  As shown by the testcase
      in kernel.org BZ #22452 that doesn't work too well.  Due to the confusion
      of the internal inode size, and the VFS inode i_size it zeroes data that
      it shouldn't.
      But full blown truncate seems like overkill here.  We only instanciate
      delayed allocations in the write path, and given that we never released
      the iolock we can't have converted them to real allocations yet either.
      The only nasty case is pre-existing preallocation which we need to skip.
      We already do this for page discard during writeback, so make the delayed
      allocation block punching a generic function and call it from the failed
      write path as well as xfs_aops_discard_page. The callers are
      responsible for ensuring that partial blocks are not truncated away,
      and that they hold the ilock.
      Based on a fix originally from Christoph Hellwig. This version used
      filesystem blocks as the range unit.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
  4. 30 Nov, 2010 4 commits
  5. 29 Nov, 2010 7 commits
  6. 28 Nov, 2010 4 commits
    • Chris Mason's avatar
      Btrfs: deal with DIO bios that span more than one ordered extent · 163cf09c
      Chris Mason authored
      The new DIO bio splitting code has problems when the bio
      spans more than one ordered extent.  This will happen as the
      generic DIO code merges our get_blocks calls together into
      a bigger single bio.
      This fixes things by walking forward in the ordered extent
      code finding all the overlapping ordered extents and completing them
      all at once.
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Linus Torvalds's avatar
      Un-inline get_pipe_info() helper function · 72083646
      Linus Torvalds authored
      This avoids some include-file hell, and the function isn't really
      important enough to be inlined anyway.
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Linus Torvalds's avatar
      Export 'get_pipe_info()' to other users · c66fb347
      Linus Torvalds authored
      And in particular, use it in 'pipe_fcntl()'.
      The other pipe functions do not need to use the 'careful' version, since
      they are only ever called for things that are already known to be pipes.
      The normal read/write/ioctl functions are called through the file
      operations structures, so if a file isn't a pipe, they'd never get
      called.  But pipe_fcntl() is special, and called directly from the
      generic fcntl code, and needs to use the same careful function that the
      splice code is using.
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Linus Torvalds's avatar
      Rename 'pipe_info()' to 'get_pipe_info()' · 71993e62
      Linus Torvalds authored
      .. and change it to take the 'file' pointer instead of an inode, since
      that's what all users want anyway.
      The renaming is preparatory to exporting it to other users.  The old
      'pipe_info()' name was too generic and is already used elsewhere, so
      before making the function public we need to use a more specific name.
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  7. 27 Nov, 2010 6 commits
    • Josef Bacik's avatar
      Btrfs: setup blank root and fs_info for mount time · 450ba0ea
      Josef Bacik authored
      There is a problem with how we use sget, it searches through the list of supers
      attached to the fs_type looking for a super with the same fs_devices as what
      we're trying to mount.  This depends on sb->s_fs_info being filled, but we don't
      fill that in until we get to btrfs_fill_super, so we could hit supers on the
      fs_type super list that have a null s_fs_info.  In order to fix that we need to
      go ahead and setup a blank root with a blank fs_info to hold fs_devices, that
      way our test will work out right and then we can set s_fs_info in
      btrfs_set_super, and then open_ctree will simply use our pre-allocated root and
      fs_info when setting everything up.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: fix fiemap · 975f84fe
      Josef Bacik authored
      There are two big problems currently with FIEMAP
      1) We return extents for holes.  This isn't supposed to happen, we just don't
      return extents for holes and then userspace interprets the lack of an extent as
      a hole.
      2) We sometimes don't set FIEMAP_EXTENT_LAST properly.  This is because we wait
      to see a EXTENT_FLAG_VACANCY flag on the em, but this won't happen if say we ask
      fiemap to map up to the last extent in a file, and there is nothing but holes up
      to the i_size.  To fix this we need to lookup the last extent in this file and
      save the logical offset, so if we happen to try and map that extent we can be
      sure to set FIEMAP_EXTENT_LAST.
      With this patch we now pass xfstest 225, which we never have before.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Ian Kent's avatar
      Btrfs - fix race between btrfs_get_sb() and umount · 619c8c76
      Ian Kent authored
      When mounting a btrfs file system btrfs_test_super() may attempt to
      use sb->s_fs_info, the btrfs root, of a super block that is going away
      and that has had the btrfs root set to NULL in its ->put_super(). But
      if the super block is going away it cannot be an existing super block
      so we can return false in this case.
      Signed-off-by: default avatarIan Kent <raven@themaw.net>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: update inode ctime when using links · bc1cbf1f
      Josef Bacik authored
      Currently we fail xfstest 236 because we're not updating the inode ctime on
      link.  This is a simple fix, and makes it so we pass 236 now.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: make sure new inode size is ok in fallocate · 0ed42a63
      Josef Bacik authored
      We have been failing xfstest 228 forever, because we don't check to make sure
      the new inode size is acceptable as far as RLIMIT is concerned.  Just check to
      make sure it's ok to create a inode with this new size and error out if not.
      With this patch we now pass 228.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: fix typo in fallocate to make it honor actual size · 55a61d1d
      Josef Bacik authored
      There is a typo in __btrfs_prealloc_file_range() where we set the i_size to
      actual_len/cur_offset, and then just set it to cur_offset again, and do the same
      with btrfs_ordered_update_i_size().  This fixes it back to keeping i_size in a
      local variable and then updating i_size properly.  Tested this with
      xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo
      stat'ing foo gives us a size of 1 instead of 4096 like it was.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
  8. 24 Nov, 2010 3 commits
  9. 23 Nov, 2010 2 commits
  10. 22 Nov, 2010 7 commits