1. 19 Nov, 2009 3 commits
    • David Howells's avatar
      FS-Cache: Allow the current state of all objects to be dumped · 4fbf4291
      David Howells authored
      Allow the current state of all fscache objects to be dumped by doing:
      	cat /proc/fs/fscache/objects
      By default, all objects and all fields will be shown.  This can be restricted
      by adding a suitable key to one of the caller's keyrings (such as the session
      	keyctl add user fscache:objlist "<restrictions>" @s
      The <restrictions> are:
      	K	Show hexdump of object key (don't show if not given)
      	A	Show hexdump of object aux data (don't show if not given)
      And paired restrictions:
      	C	Show objects that have a cookie
      	c	Show objects that don't have a cookie
      	B	Show objects that are busy
      	b	Show objects that aren't busy
      	W	Show objects that have pending writes
      	w	Show objects that don't have pending writes
      	R	Show objects that have outstanding reads
      	r	Show objects that don't have outstanding reads
      	S	Show objects that have slow work queued
      	s	Show objects that don't have slow work queued
      If neither side of a restriction pair is given, then both are implied.  For
      	keyctl add user fscache:objlist KB @s
      shows objects that are busy, and lists their object keys, but does not dump
      their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
      not implied.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • David Howells's avatar
      FS-Cache: Annotate slow-work runqueue proc lines for FS-Cache work items · 440f0aff
      David Howells authored
      Annotate slow-work runqueue proc lines for FS-Cache work items.  Objects
      include the object ID and the state.  Operations include the object ID, the
      operation ID and the operation type and state.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    • David Howells's avatar
      SLOW_WORK: Wait for outstanding work items belonging to a module to clear · 3d7a641e
      David Howells authored
      Wait for outstanding slow work items belonging to a module to clear when
      unregistering that module as a user of the facility.  This prevents the put_ref
      code of a work item from being taken away before it returns.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
  2. 17 Nov, 2009 4 commits
  3. 16 Nov, 2009 1 commit
  4. 15 Nov, 2009 1 commit
  5. 14 Nov, 2009 1 commit
  6. 12 Nov, 2009 6 commits
  7. 11 Nov, 2009 13 commits
    • Josef Bacik's avatar
      Btrfs: fix panic when trying to destroy a newly allocated · a6dbd429
      Josef Bacik authored
      There is a problem where iget5_locked will look for an inode, not find it, and
      then subsequently try to allocate it.  Another CPU will have raced in and
      allocated the inode instead, so when iget5_locked gets the inode spin lock again
      and does a search, it finds the new inode.  So it goes ahead and calls
      destroy_inode on the inode it just allocated.  The problem is we don't set
      BTRFS_I(inode)->root until the new inode is completely initialized.  This patch
      makes us set root to NULL when alloc'ing a new inode, so when we get to
      btrfs_destroy_inode and we see that root is NULL we can just free up the memory
      and continue on.  This fixes the panic
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Chris Mason's avatar
      Btrfs: allow more metadata chunk preallocation · 33b25808
      Chris Mason authored
      On an FS where all of the space has not been allocated into chunks yet,
      the enospc can return enospc just because the existing metadata chunks
      are full.
      We get around this by allowing more metadata chunks to be allocated up
      to a certain limit, and finding the right limit is a little fuzzy.  The
      problem is the reservations for delalloc would preallocate way too much
      of the FS as metadata.  We need to start saying no and just force some
      IO to happen.
      But we also need to let a reasonable amount of the FS become metadata.
      This bumps the hard limit up, later releases will have a better system.
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: fallback on uncompressed io if compressed io fails · f5a84ee3
      Josef Bacik authored
      Currently compressed IO does not deal with not having its entire extent able to
      be allocated.  So if we have enough free space to allocate for the extent, but
      its not contiguous, it will fail spectacularly.  This patch fixes this by
      falling back on uncompressed IO which lets us spread the delalloc extent across
      multiple extents.  I tested this by making us randomly think the reservation had
      failed to make it fallback on the uncompressed io way and it seemed to work
      fine.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: find ideal block group for caching · ccf0e725
      Josef Bacik authored
      This patch changes a few things.  Hopefully the comments are helpfull, but
      I'll try and be as verbose here.
      My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root.
      Part of this problem was we pick the first block group we can find and start
      caching it, even if it may not have enough free space.  The other problem is
      we only search for cached block groups the first time around, which we won't
      find any cached block groups because this is a newly mounted fs, so we end up
      caching several block groups during bootup, which with alot of fragmentation
      takes around 30-45 seconds to complete, which bogs down the system.  So
      1) Don't cache block groups willy-nilly at first.  Instead try and figure out
      which block group has the most free, and therefore will take the least amount
      of time to cache.
      2) Don't be so picky about cached block groups.  The other problem is once
      we've filled up a cluster, if the block group isn't finished caching the next
      time we try and do the allocation we'll completely ignore the cluster and
      start searching from the beginning of the space, which makes us cache more
      block groups, which slows us down even more.  So instead of skipping block
      groups that are not finished caching when we have a hint, only skip the block
      group if it hasn't started caching yet.
      There is one other tweak in here.  Before if we allocated a chunk and still
      couldn't find new space, we'd end up switching the space info to force another
      chunk allocation.  This could make us end up with way too many chunks, so keep
      track of this particular case.
      With this patch and my previous cluster fixes my fedora box now boots in 43
      seconds, and according to the bootchart is not held up by our block group
      caching at all.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Dan Carpenter's avatar
      Btrfs: avoid null deref in unpin_extent_cache() · 4eb3991c
      Dan Carpenter authored
      I re-orderred the checks to avoid dereferencing "em" if it was null.
      Found by smatch static checker.
      Signed-off-by: default avatarDan Carpenter <error27@gmail.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Li Dongyang's avatar
      Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root · df66916e
      Li Dongyang authored
      We don't need to call btrfs_release_path because btrfs_free_path will do
      that for us.
      Signed-off-by: default avatarLi Dongyang <Jerry87905@gmail.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: fix some metadata enospc issues · 5df6a9f6
      Josef Bacik authored
      We weren't reserving metadata space for rename, rmdir and unlink, which could
      cause problems.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: fix how we set max_size for free space clusters · 01dea1ef
      Josef Bacik authored
      This patch fixes a problem where max_size can be set to 0 even though we
      filled the cluster properly.  We set max_size to 0 if we restart the cluster
      window, but if the new start entry is big enough to be our new cluster then we
      could return with a max_size set to 0, which will mean the next time we try to
      allocate from this cluster it will fail.  So set max_extent to the entry's
      size.  Tested this on my box and now we actually allocate from the cluster
      after we fill it.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: cleanup transaction starting and fix journal_info usage · 249ac1e5
      Josef Bacik authored
      We use journal_info to tell if we're in a nested transaction to make sure we
      don't commit the transaction within a nested transaction.  We use another
      method to see if there are any outstanding ioctl trans handles, so if we're
      starting one do not set current->journal_info, since it will screw with other
      filesystems.  This patch also cleans up the starting stuff so there aren't any
      magic numbers.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Josef Bacik's avatar
      Btrfs: fix data allocation hint start · 6346c939
      Josef Bacik authored
      Sometimes our start allocation hint when we cow a file can be either
      EXTENT_HOLE or some other such place holder, which is not optimal.  So if we
      find that our em->block_start is one of these special values, check to see
      where the first block of the inode is stored, and use that as a hint.  If that
      block is also a special value, just fallback on a hint of 0 and let the
      allocator figure out a good place to put the data.
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
    • Tao Ma's avatar
      JBD/JBD2: free j_wbuf if journal init fails. · 7b02bec0
      Tao Ma authored
      If journal init fails, we need to free j_wbuf.
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarTao Ma <tao.ma@oracle.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
    • Jan Kara's avatar
      ext3: Wait for proper transaction commit on fsync · fe8bc91c
      Jan Kara authored
      We cannot rely on buffer dirty bits during fsync because pdflush can come
      before fsync is called and clear dirty bits without forcing a transaction
      commit. What we do is that we track which transaction has last changed
      the inode and which transaction last changed allocation and force it to
      disk on fsync.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • Eric Sandeen's avatar
      ext3: retry failed direct IO allocations · ea0174a7
      Eric Sandeen authored
      On a 256M 4k block filesystem, doing this in a loop:
          dd if=/dev/zero of=test oflag=direct bs=1M count=64
          rm -f test
      eventually leads to spurious ENOSPC:
          dd: writing `test': No space left on device
      As with other block allocation callers, it looks like we need to
      potentially retry the allocations on the initial ENOSPC.
      A similar patch went into ext4 (commit
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
  8. 08 Nov, 2009 3 commits
    • Theodore Ts'o's avatar
      ext4: partial revert to fix double brelse WARNING() · 1e424a34
      Theodore Ts'o authored
      This is a partial revert of commit 6487a9d3 (only the changes made to
      fs/ext4/namei.c), since it is causing the following brelse()
      double-free warning when running fsstress on a file system with 1k
      blocksize and we run into a block allocation failure while converting
      a single-block directory to a multi-block hash-tree indexed directory.
      WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33()
      Hardware name: 
      VFS: brelse: Trying to free free buffer
      Modules linked in:
      Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101
      Call Trace:
       [<c01587fb>] warn_slowpath_common+0x65/0x95
       [<c0158869>] warn_slowpath_fmt+0x29/0x2c
       [<c021168e>] __brelse+0x2e/0x33
       [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c
       [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8
       [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
       [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e
       [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd
       [<c0175c1f>] ? cpu_clock+0x3f/0x5b
       [<c017f6ec>] ? lock_release_holdtime+0x36/0x137
       [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51
       [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124
       [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd
       [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
       [<c0290d1c>] kjournald2+0x11a/0x310
       [<c017118e>] ? autoremove_wake_function+0x0/0x38
       [<c0290c02>] ? kjournald2+0x0/0x310
       [<c0170ee6>] kthread+0x66/0x6b
       [<c0170e80>] ? kthread+0x0/0x6b
       [<c01251b3>] kernel_thread_helper+0x7/0x10
      ---[ end trace 5579351b86af61e3 ]---
      Commit 6487a9d3
       was an attempt some buffer head leaks in an ENOSPC
      error path, but in some cases it actually results in an excess ENOSPC,
      as shown above.  Fixing this means cleaning up who is responsible for
      releasing the buffer heads from the callee to the caller of
      Since that's a relatively complex change, and we're late in the rcX
      development cycle, I'm reverting this now, and holding back a more
      complete fix until after 2.6.32 ships.  We've lived with this
      buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a
      few more months won't kill us.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: Curt Wohlgemuth <curtw@google.com>
    • Ryusuke Konishi's avatar
      nilfs2: fix missing cleanup of gc cache on error cases · c083234f
      Ryusuke Konishi authored
      This fixes an -rc1 regression brought by the commit:
       ("nilfs2: shorten freeze
      period due to GC in write operation v3").
      Although the patch moved out a function call of
      nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from
      nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding
      cleanup job needed for the error case.
      This will move the missing cleanup job to the destination function.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: default avatarJiro SEKIBA <jir@unicus.jp>
    • Ryusuke Konishi's avatar
      nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks · 5399dd1f
      Ryusuke Konishi authored
      This fixes a kernel oops reported by Markus Trippelsdorf in the email
      titled "[NILFS users] kernel Oops while running nilfs_cleanerd".
      The oops was caused by a bug of error path in
      nilfs_ioctl_move_blocks() function, which was inlined in
      nilfs_ioctl_move_blocks checks duplication of blocks which will be
      moved in garbage collection.  But, the check should have be done
      within nilfs_ioctl_move_inode_block() to prevent list corruption among
      buffers storing the target blocks.
      To fix the kernel oops, this moves forward the duplication check
      before the list insertion.
      I also tested this for stable trees [2.6.30, 2.6.31].
      Reported-by: default avatarMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: stable <stable@kernel.org>
  9. 06 Nov, 2009 3 commits
    • Jeff Layton's avatar
      cifs: don't use CIFSGetSrvInodeNumber in is_path_accessible · f475f677
      Jeff Layton authored
      Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to
      verify the accessibility of the root inode and then falls back to doing a
      full QPathInfo if that fails with -EOPNOTSUPP. I have at least a report
      of a server that returns NT_STATUS_INTERNAL_ERROR rather than something
      that translates to EOPNOTSUPP.
      Rather than trying to be clever with that call, just have
      is_path_accessible do a normal QPathInfo. That call is widely
      supported and it shouldn't increase the overhead significantly.
      Cc: Stable <stable@kernel.org>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
    • Jeff Layton's avatar
      cifs: clean up handling when server doesn't consistently support inode numbers · ec06aedd
      Jeff Layton authored
      It's possible that a server will return a valid FileID when we query the
      FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers
      when we do a FindFile with an infolevel of
      In this situation turn off querying for server inode numbers, generate a
      warning for the user and just generate an inode number using iunique.
      Once we generate any inode number with iunique we can no longer use any
      server inode numbers or we risk collisions, so ensure that we don't do
      that in cifs_get_inode_info either.
      Cc: Stable <stable@kernel.org>
      Reported-by: default avatarTimothy Normand Miller <theosib@gmail.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
    • Mingming's avatar
      ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O · ba230c3f
      Mingming authored
      To prepare for a direct I/O write, we need to split the unwritten
      extents before submitting the I/O.  When no extents needed to be
      split, ext4_split_unwritten_extents() was incorrectly returning 0
      instead of the size of uninitialized extents. This bug caused the
      wrong return value sent back to VFS code when it gets called from
      async IO path, leading to an unnecessary fall back to buffered IO.
      This bug also hid the fact that the check to see whether or not a
      split would be necessary was incorrect; we can only skip splitting the
      extent if the write completely covers the uninitialized extent.
      Signed-off-by: default avatarMingming Cao <cmm@us.ibm.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
  10. 04 Nov, 2009 5 commits