1. 14 May, 2015 1 commit
  2. 10 May, 2015 3 commits
    • Al Viro's avatar
      don't pass nameidata to ->follow_link() · 6e77137b
      Al Viro authored
      
      
      its only use is getting passed to nd_jump_link(), which can obtain
      it from current->nameidata
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6e77137b
    • Al Viro's avatar
      namei: remove restrictions on nesting depth · 894bc8c4
      Al Viro authored
      
      
      The only restriction is that on the total amount of symlinks
      crossed; how they are nested does not matter
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      894bc8c4
    • Al Viro's avatar
      new ->follow_link() and ->put_link() calling conventions · 680baacb
      Al Viro authored
      
      
      a) instead of storing the symlink body (via nd_set_link()) and returning
      an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
      that opaque pointer (into void * passed by address by caller) and returns
      the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
      symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
      is ignored in all cases except the last one.
      
      Storing NULL for opaque pointer (or not storing it at all) means no call
      of ->put_link().
      
      b) the body used to be passed to ->put_link() implicitly (via nameidata).
      Now only the opaque pointer is.  In the cases when we used the symlink body
      to free stuff, ->follow_link() now should store it as opaque pointer in addition
      to returning it.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      680baacb
  3. 10 Dec, 2014 1 commit
  4. 08 Nov, 2013 1 commit
    • Al Viro's avatar
      RCU'd vfsmounts · 48a066e7
      Al Viro authored
      
      
      * RCU-delayed freeing of vfsmounts
      * vfsmount_lock replaced with a seqlock (mount_lock)
      * sequence number from mount_lock is stored in nameidata->m_seq and
      used when we exit RCU mode
      * new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
      caller knows that vfsmount will have no surviving references.
      * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
      and doing pending mntput().
      * new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
      number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
      again to close the race and either returns success or drops the reference it
      has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
      simply decrement the refcount and sod off - aforementioned synchronize_rcu()
      makes sure that final mntput() won't come until we leave RCU mode.  We need
      that, since we don't want to end up with some lazy pathwalk racing with
      umount() and stealing the final mntput() from it - caller of umount() may
      expect it to return only once the fs is shut down and we don't want to break
      that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
      full-blown mntput() in case of mount_lock sequence number mismatch happening
      just as we'd grabbed the reference, but in those cases we won't be stealing
      the final mntput() from anything that would care.
      * mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
      SMP and UP cases are handled the same way - no ifdefs there.
      * normal pathname resolution does *not* do any writes to mount_lock.  It does,
      of course, bump the refcounts of vfsmount and dentry in the very end, but that's
      it.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      48a066e7
  5. 08 Sep, 2013 2 commits
  6. 03 Sep, 2013 1 commit
    • Jeff Layton's avatar
      vfs: allow umount to handle mountpoints without revalidating them · 8033426e
      Jeff Layton authored
      
      
      Christopher reported a regression where he was unable to unmount a NFS
      filesystem where the root had gone stale. The problem is that
      d_revalidate handles the root of the filesystem differently from other
      dentries, but d_weak_revalidate does not. We could simply fix this by
      making d_weak_revalidate return success on IS_ROOT dentries, but there
      are cases where we do want to revalidate the root of the fs.
      
      A umount is really a special case. We generally aren't interested in
      anything but the dentry and vfsmount that's attached at that point. If
      the inode turns out to be stale we just don't care since the intent is
      to stop using it anyway.
      
      Try to handle this situation better by treating umount as a special
      case in the lookup code. Have it resolve the parent using normal
      means, and then do a lookup of the final dentry without revalidating
      it. In most cases, the final lookup will come out of the dcache, but
      the case where there's a trailing symlink or !LAST_NORM entry on the
      end complicates things a bit.
      
      Cc: Neil Brown <neilb@suse.de>
      Reported-by: default avatarChristopher T Vogan <cvogan@us.ibm.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      8033426e
  7. 25 Dec, 2012 1 commit
  8. 20 Dec, 2012 2 commits
  9. 29 Jul, 2012 1 commit
  10. 14 Jul, 2012 3 commits
  11. 02 Nov, 2011 1 commit
    • Andy Whitcroft's avatar
      readlinkat: ensure we return ENOENT for the empty pathname for normal lookups · 1fa1e7f6
      Andy Whitcroft authored
      Since the commit below which added O_PATH support to the *at() calls, the
      error return for readlink/readlinkat for the empty pathname has switched
      from ENOENT to EINVAL:
      
        commit 65cfc672
        Author: Al Viro <viro@zeniv.linux.org.uk>
        Date:   Sun Mar 13 15:56:26 2011 -0400
      
          readlinkat(), fchownat() and fstatat() with empty relative pathnames
      
      This is both unexpected for userspace and makes readlink/readlinkat
      inconsistant with all other interfaces; and inconsistant with our stated
      return for these pathnames.
      
      As the readlinkat call does not have a flags parameter we cannot use the
      AT_EMPTY_PATH approach used in the other calls.  Therefore expose whether
      the original path is infact entry via a new user_path_at_empty() path
      lookup function.  Use this to determine whether to default to EINVAL or
      ENOENT for failures.
      
      Addresses http://bugs.launchpad.net/bugs/817187
      
      
      
      [akpm@linux-foundation.org: remove unused getname_flags()]
      Signed-off-by: default avatarAndy Whitcroft <apw@canonical.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      1fa1e7f6
  12. 27 Sep, 2011 1 commit
    • Linus Torvalds's avatar
      vfs: remove LOOKUP_NO_AUTOMOUNT flag · b6c8069d
      Linus Torvalds authored
      
      
      That flag no longer makes sense, since we don't look up automount points
      as eagerly any more.  Additionally, it turns out that the NO_AUTOMOUNT
      handling was buggy to begin with: it would avoid automounting even for
      cases where we really *needed* to do the automount handling, and could
      return ENOENT for autofs entries that hadn't been instantiated yet.
      
      With our new non-eager automount semantics, one discussion has been
      about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
      newfstatat() and fstatat64() system calls), but it's probably not worth
      it: you can always force at least directory automounting by simply
      adding the final '/' to the filename, which works for *all* of the stat
      family system calls, old and new.
      
      So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
      result of our bad default behavior.
      Acked-by: default avatarIan Kent <raven@themaw.net>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b6c8069d
  13. 26 Sep, 2011 1 commit
    • Linus Torvalds's avatar
      vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag · d94c177b
      Linus Torvalds authored
      
      
      Since we've now turned around and made LOOKUP_FOLLOW *not* force an
      automount, we want to add the ability to force an automount event on
      lookup even if we don't happen to have one of the other flags that force
      it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)
      
      Most cases will never want to use this, since you'd normally want to
      delay automounting as long as possible, which usually implies
      LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
      the automount any more).
      
      But Trond argued sufficiently forcefully that at a minimum bind mounting
      a file and quotactl will want to force the automount lookup.  Some other
      cases (like nfs_follow_remote_path()) could use it too, although
      LOOKUP_DIRECTORY would work there as well.
      
      This commit just adds the flag and logic, no users yet, though.  It also
      doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
      was made irrelevant by the same change that made us not follow on
      LOOKUP_FOLLOW.
      
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg KH <gregkh@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d94c177b
  14. 19 Jul, 2011 3 commits
  15. 18 Mar, 2011 1 commit
  16. 14 Mar, 2011 5 commits
    • Al Viro's avatar
      New AT_... flag: AT_EMPTY_PATH · f52e0c11
      Al Viro authored
      
      
      For name_to_handle_at(2) we'll want both ...at()-style syscall that
      would be usable for non-directory descriptors (with empty relative
      pathname).  Introduce new flag (AT_EMPTY_PATH) to deal with that and
      corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
      deal with the latter.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      f52e0c11
    • Al Viro's avatar
      reduce vfs_path_lookup() to do_path_lookup() · 5b6ca027
      Al Viro authored
      
      
      New lookup flag: LOOKUP_ROOT.  nd->root is set (and held) by caller,
      path_init() starts walking from that place and all pathname resolution
      machinery never drops nd->root if that flag is set.  That turns
      vfs_path_lookup() into a special case of do_path_lookup() *and*
      gets us down to 3 callers of link_path_walk(), making it finally
      feasible to rip the handling of trailing symlink out of link_path_walk().
      That will not only simply the living hell out of it, but make life
      much simpler for unionfs merge.  Trailing symlink handling will
      become iterative, which is a good thing for stack footprint in
      a lot of situations as well.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      5b6ca027
    • Al Viro's avatar
      get rid of nd->file · 70e9b357
      Al Viro authored
      
      
      Don't stash the struct file * used as starting point of walk in nameidata;
      pass file ** to path_init() instead.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      70e9b357
    • Al Viro's avatar
      untangle the "need_reval_dot" mess · 16c2cd71
      Al Viro authored
      
      
      instead of ad-hackery around need_reval_dot(), do the following:
      set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
      symlink traversal, on ".." and on procfs-style symlinks.  Clear on
      normal components, leave unchanged on ".".  Non-nested callers of
      link_path_walk() call handle_reval_path(), which checks that flag
      is set and that fs does want the final revalidate thing, then does
      ->d_revalidate().  In link_path_walk() all the return_reval stuff
      is gone.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      16c2cd71
    • Al Viro's avatar
      kill path_lookup() · c9c6cac0
      Al Viro authored
      
      
      all remaining callers pass LOOKUP_PARENT to it, so
      flags argument can die; renamed to kern_path_parent()
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      c9c6cac0
  17. 15 Jan, 2011 2 commits
    • David Howells's avatar
      Add an AT_NO_AUTOMOUNT flag to suppress terminal automount · 6f45b656
      David Howells authored
      
      
      Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
      point directories.  This can be used by fstatat() users to permit the
      gathering of attributes on an automount point and also prevent
      mass-automounting of a directory of automount points by ls.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarIan Kent <raven@themaw.net>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6f45b656
    • David Howells's avatar
      Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53
      David Howells authored
      
      
      Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
      sleep when it tries to transit away from one of that filesystem's directories
      during a pathwalk.  The operation is keyed off a new dentry flag
      (DCACHE_MANAGE_TRANSIT).
      
      The filesystem is allowed to be selective about which processes it holds and
      which it permits to continue on or prohibits from transiting from each flagged
      directory.  This will allow autofs to hold up client processes whilst letting
      its userspace daemon through to maintain the directory or the stuff behind it
      or mounted upon it.
      
      The ->d_manage() dentry operation:
      
      	int (*d_manage)(struct path *path, bool mounting_here);
      
      takes a pointer to the directory about to be transited away from and a flag
      indicating whether the transit is undertaken by do_add_mount() or
      do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
      
      It should return 0 if successful and to let the process continue on its way;
      -EISDIR to prohibit the caller from skipping to overmounted filesystems or
      automounting, and to use this directory; or some other error code to return to
      the user.
      
      ->d_manage() is called with namespace_sem writelocked if mounting_here is true
      and no other locks held, so it may sleep.  However, if mounting_here is true,
      it may not initiate or wait for a mount or unmount upon the parameter
      directory, even if the act is actually performed by userspace.
      
      Within fs/namei.c, follow_managed() is extended to check with d_manage() first
      on each managed directory, before transiting away from it or attempting to
      automount upon it.
      
      follow_down() is renamed follow_down_one() and should only be used where the
      filesystem deliberately intends to avoid management steps (e.g. autofs).
      
      A new follow_down() is added that incorporates the loop done by all other
      callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
      and CIFS do use it, their use is removed by converting them to use
      d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
      also takes an extra parameter to indicate if it is being called from mount code
      (with namespace_sem writelocked) which it passes to d_manage().  follow_down()
      ignores automount points so that it can be used to mount on them.
      
      __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
      DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
      sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
      that determine whether to abort or not itself.  That would allow the autofs
      daemon to continue on in rcu-walk mode.
      
      Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
      required as every tranist from that directory will cause d_manage() to be
      invoked.  It can always be set again when necessary.
      
      ==========================
      WHAT THIS MEANS FOR AUTOFS
      ==========================
      
      Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
      trigger the automounting of indirect mounts, and both of these can be called
      with i_mutex held.
      
      autofs knows that the i_mutex will be held by the caller in lookup(), and so
      can drop it before invoking the daemon - but this isn't so for d_revalidate(),
      since the lock is only held on _some_ of the code paths that call it.  This
      means that autofs can't risk dropping i_mutex from its d_revalidate() function
      before it calls the daemon.
      
      The bug could manifest itself as, for example, a process that's trying to
      validate an automount dentry that gets made to wait because that dentry is
      expired and needs cleaning up:
      
      	mkdir         S ffffffff8014e05a     0 32580  24956
      	Call Trace:
      	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
      	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
      	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
      	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
      	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
      	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
      	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
      	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
      
      versus the automount daemon which wants to remove that dentry, but can't
      because the normal process is holding the i_mutex lock:
      
      	automount     D ffffffff8014e05a     0 32581      1              32561
      	Call Trace:
      	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
      	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
      	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
      	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
      	 [<ffffffff8005d229>] tracesys+0x71/0xe0
      	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
      
      which means that the system is deadlocked.
      
      This patch allows autofs to hold up normal processes whilst the daemon goes
      ahead and does things to the dentry tree behind the automouter point without
      risking a deadlock as almost no locks are held in d_manage() and none in
      d_automount().
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Was-Acked-by: default avatarIan Kent <raven@themaw.net>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      cc53ce53
  18. 06 Jan, 2011 2 commits
    • Nick Piggin's avatar
      fs: rcu-walk for path lookup · 31e6b01f
      Nick Piggin authored
      
      
      Perform common cases of path lookups without any stores or locking in the
      ancestor dentry elements. This is called rcu-walk, as opposed to the current
      algorithm which is a refcount based walk, or ref-walk.
      
      This results in far fewer atomic operations on every path element,
      significantly improving path lookup performance. It also avoids cacheline
      bouncing on common dentries, significantly improving scalability.
      
      The overall design is like this:
      * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
      * Take the RCU lock for the entire path walk, starting with the acquiring
        of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
        not required for dentry persistence.
      * synchronize_rcu is called when unregistering a filesystem, so we can
        access d_ops and i_ops during rcu-walk.
      * Similarly take the vfsmount lock for the entire path walk. So now mnt
        refcounts are not required for persistence. Also we are free to perform mount
        lookups, and to assume dentry mount points and mount roots are stable up and
        down the path.
      * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
        so we can load this tuple atomically, and also check whether any of its
        members have changed.
      * Dentry lookups (based on parent, candidate string tuple) recheck the parent
        sequence after the child is found in case anything changed in the parent
        during the path walk.
      * inode is also RCU protected so we can load d_inode and use the inode for
        limited things.
      * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
      * i_op can be loaded.
      
      When we reach the destination dentry, we lock it, recheck lookup sequence,
      and increment its refcount and mountpoint refcount. RCU and vfsmount locks
      are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
      not match, we can not drop rcu-walk gracefully at the current point in the
      lokup, so instead return -ECHILD (for want of a better errno). This signals the
      path walking code to re-do the entire lookup with a ref-walk.
      
      Aside from the final dentry, there are other situations that may be encounted
      where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
      a reference on the last good dentry) and continue with a ref-walk. Again, if
      we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
      using ref-walk. But it is very important that we can continue with ref-walk
      for most cases, particularly to avoid the overhead of double lookups, and to
      gain the scalability advantages on common path elements (like cwd and root).
      
      The cases where rcu-walk cannot continue are:
      * NULL dentry (ie. any uncached path element)
      * parent with d_inode->i_op->permission or ACLs
      * dentries with d_revalidate
      * Following links
      
      In future patches, permission checks and d_revalidate become rcu-walk aware. It
      may be possible eventually to make following links rcu-walk aware.
      
      Uncached path elements will always require dropping to ref-walk mode, at the
      very least because i_mutex needs to be grabbed, and objects allocated.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      31e6b01f
    • Nick Piggin's avatar
      fs: dcache remove dcache_lock · b5c84bf6
      Nick Piggin authored
      
      
      dcache_lock no longer protects anything. remove it.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      b5c84bf6
  19. 22 Dec, 2009 1 commit
  20. 11 Dec, 2009 1 commit
  21. 21 Sep, 2009 1 commit
  22. 11 Jun, 2009 3 commits
  23. 09 May, 2009 1 commit
  24. 31 Dec, 2008 1 commit