1. 14 Jul, 2012 7 commits
    • affs: get rid of affs_sync_super · 3dd84782
      Artem Bityutskiy authored
      This patch makes affs stop using the VFS '->write_super()' method along with
      the 's_dirt' superblock flag, because they are on their way out.
      The whole "superblock write-out" VFS infrastructure is served by the
      'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
      writes out all dirty superblocks using the '->write_super()' call-back.  But the
      problem with this thread is that it wastes power by waking up the system every
      5 seconds, even if there are no dirty superblocks, or there are no client
      file-systems which would need this (e.g., btrfs does not use
      '->write_super()'). So we want to kill it completely, and thus we need to make
      file-systems stop using the '->write_super()' VFS service, and then remove
      it together with the kernel thread.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: introduce VFS superblock object back-reference · a215fef7
      Artem Bityutskiy authored
      Add an 'sb' VFS superblock back-reference to the 'struct affs_sb_info' data
      structure - we will need to find the VFS superblock from a 'struct
      affs_sb_info' object in the next patch, so this change is just a preparation.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: stop using lock_super · a8371074
      Artem Bityutskiy authored
      The VFS's 'lock_super()' and 'unlock_super()' calls are deprecated and unwanted
      and just wait for a brave knight who'd kill them. This patch makes AFFS stop
      using them and use the buffer-head's own lock instead.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: re-structure superblock locking a bit · e0471c8d
      Artem Bityutskiy authored
      AFFS wants to serialize the superblock (the root block in AFFS terms) updates
      and uses 'lock_super()/unlock_super()' for these purposes. This patch pushes the
      locking down to the 'affs_commit_super()' from the callers.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: remove useless superblock writeout on remount · 0164b1a3
      Artem Bityutskiy authored
      We do not need to write out the superblock from '->remount_fs()' because
      VFS has already called '->sync_fs()' by this time and the superblock has
      already been written out. Thus, remove the 'affs_write_super()'
      invocation from 'affs_remount()'.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: remove useless superblock writeout on unmount · c9753b1d
      Artem Bityutskiy authored
      We do not need to write out the superblock from '->put_super()' because VFS has
      already called '->sync_fs()' by this time and the superblock has already been
      written out. Thus, remove the 'affs_commit_super()' invocation from
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: stop setting bm_flags · bc86256d
      Artem Bityutskiy authored
      AFFS stores values '1' and '2' in 'bm_flags', and I fail to see any logic when
      it prefers one or another. AFFS writes '1' only from '->put_super()', while
      '->sync_fs()' and '->write_super()' store value '2'.  So at first glance,
      it looks like we want to have '1' if we unmount.  However, this does not really
      happen in these cases:
        1. superblock is written via 'write_super()' then we unmount;
        2. we re-mount R/O, then unmount.
      which are quite typical.
      I could not find good documentation describing this field, except for one random
      piece of documentation on the internet which says that -1 means that the root
      block is valid, which is not consistent with what we have in the Linux AFFS
      Jan Kara commented on this: "I have some vague recollection that on Amiga
      boolean was usually encoded as: 0 == false, ~0 == -1 == true. But it has been
      Thus, my conclusion is that value of '1' is as good as value of '2' and we can
      just always use '2'. And Jan Kara suggested going further: "generally bm_flags
      handling looks strange. If they are 0, we mount fs read only and thus cannot
      change them.  If they are != 0, we write 2 there. So IMHO if you just removed
      bm_flags setting, nothing will really happen."
      So this patch removes the bm_flags setting completely. This makes the "clean"
      argument of the 'affs_commit_super()' function unneeded, so it is also removed.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 13 Jul, 2012 2 commits
    • Remove easily user-triggerable BUG from generic_setlease · 8d657eb3
      Dave Jones authored
      This can be trivially triggered from userspace by passing in something unexpected.
          kernel BUG at fs/locks.c:1468!
          invalid opcode: 0000 [#1] SMP
          RIP: 0010:generic_setlease+0xc2/0x100
          Call Trace:
      Signed-off-by: Dave Jones <davej@redhat.com>
      Cc: stable@kernel.org # 3.2+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • block: fix infinite loop in __getblk_slow · 91f68c89
      Jeff Moyer authored
      Commit 080399aa ("block: don't mark buffers beyond end of disk as
      mapped") exposed a bug in __getblk_slow that causes mount to hang as it
      loops infinitely waiting for a buffer that lies beyond the end of the
      disk to become uptodate.
      The problem was initially reported by Torsten Hilbrich here:
      and also reported independently here:
      and then Richard W.M.  Jones and Marcos Mello noted a few separate
      bugzillas also associated with the same issue.  This patch has been
      confirmed to fix:
      The main problem is here, in __getblk_slow:
              for (;;) {
                      struct buffer_head * bh;
                      int ret;
                      bh = __find_get_block(bdev, block, size);
                      if (bh)
                              return bh;
                      ret = grow_buffers(bdev, block, size);
                      if (ret < 0)
                              return NULL;
                      if (ret == 0)
      __find_get_block does not find the block, since it will not be marked as
      mapped, and so grow_buffers is called to fill in the buffers for the
      associated page.  I believe the for (;;) loop is there primarily to
      retry in the case of memory pressure keeping grow_buffers from
      succeeding.  However, we also continue to loop for other cases, like the
      block lying beyond the end of the disk.  So, the fix I came up with is to
      only loop when grow_buffers fails due to memory allocation issues
      (return value of 0).
      The attached patch was tested by myself, Torsten, and Rich, and was
      found to resolve the problem in all cases.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Josh Boyer <jwboyer@redhat.com>
      Cc: Stable <stable@vger.kernel.org>  # 3.0+
      [ Jens is on vacation, taking this directly  - Linus ]
      Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
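The retry policy described in the __getblk_slow commit can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `struct fake_dev`, `fake_grow()` and `get_block_retry()` are hypothetical names, and `fake_grow()` is only assumed to mimic the three-way return convention the message attributes to grow_buffers (negative for a hard error, 0 for a transient allocation failure, positive for success).

```c
#include <stdbool.h>

/* Hypothetical device model, for illustration only. */
struct fake_dev {
    long nr_blocks;          /* number of blocks on the device */
    int transient_failures;  /* simulated allocation failures still to come */
    long cached_block;       /* block currently "in cache", -1 if none */
};

/* Stand-in for grow_buffers(): <0 hard error, 0 transient, >0 success. */
int fake_grow(struct fake_dev *dev, long block)
{
    if (block < 0 || block >= dev->nr_blocks)
        return -1;                  /* beyond end of device: retrying cannot help */
    if (dev->transient_failures > 0) {
        dev->transient_failures--;  /* simulated memory pressure */
        return 0;
    }
    dev->cached_block = block;
    return 1;
}

/* Returns true once the block is cached, false on a hard error. */
bool get_block_retry(struct fake_dev *dev, long block)
{
    for (;;) {
        if (block >= 0 && dev->cached_block == block)
            return true;            /* lookup succeeded */
        if (fake_grow(dev, block) < 0)
            return false;           /* give up instead of looping forever */
        /* 0 (transient) or >0 (success): go around and look again */
    }
}
```

In this sketch a request for a block past the end of the device returns immediately, while a request that only hits simulated memory pressure keeps retrying until it succeeds.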
  3. 11 Jul, 2012 3 commits
  4. 10 Jul, 2012 1 commit
  5. 08 Jul, 2012 1 commit
  6. 07 Jul, 2012 1 commit
  7. 06 Jul, 2012 1 commit
  8. 04 Jul, 2012 5 commits
  9. 03 Jul, 2012 3 commits
    • eCryptfs: Fix lockdep warning in miscdev operations · 60d65f1f
      Tyler Hicks authored
      Don't grab the daemon mutex while holding the message context mutex.
      Addresses this lockdep warning:
       ecryptfsd/2141 is trying to acquire lock:
        (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
       but task is already holding lock:
        (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]
       which lock already depends on the new lock.
       the existing dependency chain (in reverse order) is:
       -> #1 (&(*daemon)->mux){+.+...}:
              [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
              [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
              [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
              [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
              [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
              [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
              [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
              [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
              [<ffffffff811963a4>] vfs_create+0xb4/0x120
              [<ffffffff81197865>] do_last+0x8c5/0xa10
              [<ffffffff811998f9>] path_openat+0xd9/0x460
              [<ffffffff81199da2>] do_filp_open+0x42/0xa0
              [<ffffffff81187998>] do_sys_open+0xf8/0x1d0
              [<ffffffff81187a91>] sys_open+0x21/0x30
              [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b
       -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
              [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
              [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
              [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
              [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
              [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
              [<ffffffff811887d3>] vfs_read+0xb3/0x180
              [<ffffffff811888ed>] sys_read+0x4d/0x90
              [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
    • eCryptfs: Properly check for O_RDONLY flag before doing privileged open · 9fe79d76
      Tyler Hicks authored
      If the first attempt at opening the lower file read/write fails,
      eCryptfs will retry using a privileged kthread. However, the privileged
      retry should not happen if the lower file's inode is read-only because a
      read/write open will still be unsuccessful.
      The check for determining if the open should be retried was intended to
      be based on the access mode of the lower file's open flags being
      O_RDONLY, but the check was incorrectly performed. This would cause the
      open to be retried by the privileged kthread, resulting in a second
      failed open of the lower file. This patch corrects the check to
      determine if the open request should be handled by the privileged
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
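The O_RDONLY commit message does not quote the faulty expression, so the following is only a sketch of a common way such a check goes wrong, with hypothetical helper names. On Linux, O_RDONLY is defined as 0, so masking the flags with it can never yield a non-zero result. The access mode has to be extracted with O_ACCMODE and then compared.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Assumed form of the mistake: O_RDONLY is 0, so this is never true. */
bool is_rdonly_buggy(int flags)
{
    return (flags & O_RDONLY) != 0;
}

/* Extract the access mode first, then compare it against O_RDONLY. */
bool is_rdonly(int flags)
{
    return (flags & O_ACCMODE) == O_RDONLY;
}
```

With the corrected helper, flags such as `O_RDONLY | O_NONBLOCK` are recognised as read-only, while `O_RDWR` and `O_WRONLY | O_CREAT` are not.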
    • cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize · ec01d738
      Jeff Layton authored
      When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
      that you must cap the size of the read at the client's MaxBufferSize.
      Unfortunately, testing with many older servers shows that they often
      can't service a read larger than their own MaxBufferSize.
      Since we can't assume what the server will do in this situation, we must
      be conservative here for the default. When the server can't do large
      reads, then assume that it can't satisfy any read larger than its
      MaxBufferSize either.
      Luckily almost all modern servers can do large reads, so this won't
      affect them. This is really just for older win9x and OS/2 era servers.
      Also, note that this patch just governs the default rsize. The admin can
      always override this if he so chooses.
      Cc: <stable@vger.kernel.org> # 3.2
      Reported-by: David H. Durgee <dhdurgee@acm.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steven French <sfrench@w500smf.(none)>
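The defaulting rule in the cifs commit can be written down as a small decision function. This is a sketch under stated assumptions, not the cifs code: `pick_rsize()` and all of its parameters are hypothetical, and header overhead is ignored. It models only what the message says: an explicit admin setting wins, and otherwise the default is capped at the server's MaxBufferSize when the server lacks large-read support.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 *   user_rsize          rsize requested by the admin, 0 if not given
 *   server_large_read_x whether the server advertised CAP_LARGE_READ_X
 *   server_max_buf      the server's MaxBufferSize
 *   preferred_rsize     default the client would otherwise choose
 */
size_t pick_rsize(size_t user_rsize, bool server_large_read_x,
                  size_t server_max_buf, size_t preferred_rsize)
{
    if (user_rsize != 0)
        return user_rsize;      /* the admin can always override */
    if (!server_large_read_x && preferred_rsize > server_max_buf)
        return server_max_buf;  /* be conservative with older servers */
    return preferred_rsize;
}
```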
  10. 02 Jul, 2012 9 commits
    • Btrfs: run delayed directory updates during log replay · b6305567
      Chris Mason authored
      While we are resolving directory modifications in the
      tree log, we are triggering delayed metadata updates to
      the filesystem btrees.
      This commit forces the delayed updates to run so the
      replay code can find any modifications done.  It stops
      us from crashing because the directory deletion replay
      expects items to be removed immediately from the tree.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      cc: stable@kernel.org
    • Btrfs: hold a ref on the inode during writepages · 7fd1a3f7
      Josef Bacik authored
      We can race with unlink and not actually be able to do our igrab in
      btrfs_add_ordered_extent.  This will result in all sorts of problems.
      Instead of doing the complicated work to try and handle returning an error
      properly from btrfs_add_ordered_extent, just hold a ref to the inode during
      writepages.  If we cannot grab a ref we know we're freeing this inode anyway
      and can just drop the dirty pages on the floor, because screw them we're
      going to invalidate them anyway.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix tree log remove space corner case · bdb7d303
      Josef Bacik authored
      The tree log stuff can have allocated space that we end up having split
      across a bitmap and a real extent.  The free space code does not deal with
      this, it assumes that if it finds an extent or bitmap entry that the entire
      range must fall within the entry it finds.  This isn't necessarily the case,
      so rework the remove function so it can handle this case properly.  This
      fixed two panics the user hit, first in the case where the space was
      initially in a bitmap and then in an extent entry, and then the reverse
      case.  Thanks,
      Reported-and-tested-by: Shaun Reich <sreich@kde.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix wrong check during log recovery · 6bf02314
      Liu Bo authored
      When we're evicting an inode during log recovery, we need to ensure that the inode
      is not in orphan state any more, which means inode's run_time flags has _no_
      BTRFS_INODE_HAS_ORPHAN_ITEM.  Thus, the BUG_ON was triggered because of a wrong
      check for the flags.
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS · d3a94048
      Alexander Block authored
      We used the wrong ioctl macro for the getflags ioctl before.
      As we don't have the set/getflags ioctls in the user space ioctl.h
      at the moment, it's safe to fix it now.
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Alexander Block <ablock84@googlemail.com>
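Why the choice of ioctl macro matters can be seen from how Linux encodes the transfer direction in the command number. The sketch below uses a hypothetical magic value and command number (`EXAMPLE_MAGIC`, `EXAMPLE_GETFLAGS`), not the real btrfs definitions. It assumes, for illustration, that the wrong macro was `_IOW`, which the message does not state.

```c
#include <linux/ioctl.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic and command number, for illustration only. */
#define EXAMPLE_MAGIC 'x'

/* A "get" ioctl copies data from the kernel to user space: _IOR. */
#define EXAMPLE_GETFLAGS       _IOR(EXAMPLE_MAGIC, 1, uint64_t)
/* Assumed mistake: same command declared with the opposite direction. */
#define EXAMPLE_GETFLAGS_WRONG _IOW(EXAMPLE_MAGIC, 1, uint64_t)

/* True when the encoded direction says user space reads the result. */
bool user_reads_result(unsigned int cmd)
{
    return (_IOC_DIR(cmd) & _IOC_READ) != 0;
}
```

Because the direction bits are part of the command number, switching macros changes the number itself. That matches the message's point that the fix is safe only while user space does not yet carry these ioctls in its header.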
    • Btrfs: resume balance on rw (re)mounts properly · 2b6ba629
      Ilya Dryomov authored
      This introduces btrfs_resume_balance_async(), which, given that
      restriper state was recovered earlier by btrfs_recover_balance(),
      resumes balance in btrfs-balance kthread.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • Btrfs: restore restriper state on all mounts · 68310a5e
      Ilya Dryomov authored
      Fix a bug that triggered asserts in btrfs_balance() in both normal and
      resume modes -- restriper state was not properly restored on read-only
      mounts.  This factors out resuming code from btrfs_restore_balance(),
      which is now also called earlier in the mount sequence to avoid the
      problem of some early writes getting the old profile.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • Btrfs: fix dio write vs buffered read race · c3473e83
      Josef Bacik authored
      Miao pointed out there's a problem with mixing dio writes and buffered
      reads.  If the read happens between us invalidating the page range and
      actually locking the extent we can bring in pages into page cache.  Then
      once the write finishes if somebody tries to read again it will just find
      uptodate pages and we'll read stale data.  So we need to lock the extent and
      check for uptodate bits in the range.  If there are uptodate bits we need to
      unlock and invalidate again.  This will keep this race from happening since
      we will hold the extent locked until we create the ordered extent, and then
      the read side always waits for ordered extents.  There was also a race in
      how we updated i_size, previously we were relying on the generic DIO stuff
      to adjust the i_size after the DIO had completed, but this happens outside
      of the extent lock which means reads could come in and not see the updated
      i_size.  So instead move this work into where we create the extents, and
      then this way the update ordered i_size stuff works properly in the endio
      handlers.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't count I/O statistic read errors for missing devices · 597a60fa
      Stefan Behrens authored
      It is normal behaviour of the low level btrfs function btrfs_map_bio()
      to complete a bio with -EIO if the device is missing, instead of just
      preventing the bio creation in an earlier step.
      This used to cause I/O statistic read error increments and annoying
      printk_ratelimited messages. This commit fixes the issue.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Reported-by: Carey Underwood <cwillu@cwillu.com>
  11. 28 Jun, 2012 3 commits
  12. 27 Jun, 2012 4 commits
    • Btrfs: resolve tree mod log locking issue in btrfs_next_leaf · d42244a0
      Jan Schmidt authored
      With the tree mod log, we may end up with two roots (the current root and a
      rewound version of it) both pointing to two leaves, l1 and l2, of which l2
      had already been cow-ed in the current transaction. If we don't rewind any
      tree blocks, we cannot have two roots both pointing to an already cowed tree
      Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
      the next (right) leaf. And there is push_leaf_left, which has a (cowed!)
      leaf locked and wants a lock on the previous (left) leaf.
      In order to solve this deadlock situation, we use try_lock in
      btrfs_next_leaf (only in case it's called with a tree mod log time_seq
      parameter) and if we fail to get a lock on the next leaf, we give up our lock
      on the current leaf and retry from the very beginning.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
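The try-lock-and-back-off idea in the btrfs_next_leaf commit can be sketched in user space. This is an illustration under stated assumptions, not the btrfs locking code: `struct leaf`, the `leaf_*` helpers and `advance()` are hypothetical, and the lock is a simple spinning flag built on C11 `atomic_flag`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical leaf with a trivial spinning lock, for illustration only. */
struct leaf {
    atomic_flag locked;
    int id;
};

void leaf_lock(struct leaf *l)
{
    while (atomic_flag_test_and_set(&l->locked))
        ;   /* spin until the flag was previously clear */
}

bool leaf_trylock(struct leaf *l)
{
    return !atomic_flag_test_and_set(&l->locked);
}

void leaf_unlock(struct leaf *l)
{
    atomic_flag_clear(&l->locked);
}

/*
 * Caller holds 'cur'. On success the caller holds 'next' instead and
 * true is returned. On contention 'cur' is released as well and false
 * is returned, so the caller holds nothing and must restart its walk.
 */
bool advance(struct leaf *cur, struct leaf *next)
{
    bool got_next = leaf_trylock(next);

    leaf_unlock(cur);   /* released in both cases */
    return got_next;
}
```

Because the walker never blocks on the next leaf while holding the current one, it cannot take part in a cycle with another thread that locks the same two leaves in the opposite order.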
    • Btrfs: fix tree mod log rewind of ADD operations · 19956c7e
      Jan Schmidt authored
      When a MOD_LOG_KEY_ADD operation is rewound, we remove the key from the
      tree block. If it's not the last key, removal involves a move operation.
      This move operation was explicitly done before this commit.
      However, at insertion time, there's a move operation before the actual
      addition to make room for the new key, which is recorded in the tree mod
      log as well. This means we must drop the move operation when rewinding the
      add operation, because the next operation we'll be rewinding will be the
      corresponding MOD_LOG_MOVE_KEYS operation.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
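The reasoning in the ADD-rewind commit can be checked on a toy model. The sketch below is not the btrfs tree mod log: `struct node`, `struct mod_log`, `insert_key()` and `rewind_all()` are hypothetical, keys are plain integers, and capacity checks are omitted. It shows only that, when the insertion logs its own move, rewinding the add needs no extra move.

```c
#include <string.h>

enum op_type { OP_MOVE_KEYS, OP_KEY_ADD };

/* One logged modification (hypothetical layout). */
struct op {
    enum op_type type;
    int dst, src, count;   /* used by OP_MOVE_KEYS */
    int slot;              /* used by OP_KEY_ADD */
};

struct node    { int keys[16]; int n; };
struct mod_log { struct op ops[32]; int len; };

/* Insert 'key' at 'slot', logging the move (if any) and then the add. */
void insert_key(struct node *nd, struct mod_log *lg, int slot, int key)
{
    if (slot < nd->n) {
        int count = nd->n - slot;

        memmove(&nd->keys[slot + 1], &nd->keys[slot], count * sizeof(int));
        lg->ops[lg->len++] = (struct op){ .type = OP_MOVE_KEYS,
                                          .dst = slot + 1, .src = slot,
                                          .count = count };
    }
    nd->keys[slot] = key;
    nd->n++;
    lg->ops[lg->len++] = (struct op){ .type = OP_KEY_ADD, .slot = slot };
}

/* Undo all logged operations, newest first. */
void rewind_all(struct node *nd, struct mod_log *lg)
{
    while (lg->len > 0) {
        struct op *o = &lg->ops[--lg->len];

        switch (o->type) {
        case OP_KEY_ADD:
            nd->n--;   /* no move here: the logged move undoes the shift */
            break;
        case OP_MOVE_KEYS:
            memmove(&nd->keys[o->src], &nd->keys[o->dst],
                    o->count * sizeof(int));
            break;
        }
    }
}
```

If the rewind of the add also shifted keys, the following rewind of the logged move would shift them a second time and corrupt the node.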
    • Btrfs: leave critical region in btrfs_find_all_roots as soon as possible · 155725c9
      Jan Schmidt authored
      When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref
      mutex way longer than actually required. We ought to drop it immediately
      after we're done collecting all the delayed refs.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: always put insert_ptr modifications into the tree mod log · c3e06965
      Jan Schmidt authored
      Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid
      addition to the tree mod log. In fact, we need all of those operations. This
      commit simply removes the additional parameter and makes addition to the
      tree mod log unconditional.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>