1. 27 Mar, 2014 1 commit
  2. 25 Mar, 2014 2 commits
  3. 08 Nov, 2013 2 commits
  4. 16 Jul, 2013 1 commit
  5. 26 Feb, 2013 1 commit
  6. 22 Feb, 2013 1 commit
    • fs: Preserve error code in get_empty_filp(), part 2 · 39b65252
      Anatol Pomozov authored
      Allocating a file structure in get_empty_filp() can fail for several
      reasons:
       - not enough memory for file structures
       - operation is not allowed
       - user is over its limit
      Currently the function returns NULL in all cases, and we lose the exact
      reason for the error. All callers of get_empty_filp() assume that the
      function can fail only with ENFILE.
      Return the error through the pointer, and change all callers to preserve
      this error code.
      [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
      (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
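A minimal userspace sketch of the "return the error through the pointer" pattern, assuming kernel-style ERR_PTR()/IS_ERR()/PTR_ERR() helpers re-created from include/linux/err.h. get_empty_filp_sketch() and open_sketch() are hypothetical, and the mapping of the three failure reasons to ENFILE, EPERM and ENOMEM is an illustration, not the kernel's exact code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace copies of the err.h helpers: the top MAX_ERRNO addresses are
 * never handed out as valid pointers, so they can carry a negative errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct file_sketch {
	int f_flags;
};

/* Hypothetical stand-in for get_empty_filp(): each failure reason is
 * reported with its own errno instead of a bare NULL. */
static struct file_sketch *get_empty_filp_sketch(long nr_files, long max_files,
						 bool allowed)
{
	struct file_sketch *f;

	if (nr_files >= max_files)
		return ERR_PTR(-ENFILE);	/* file table is full */
	if (!allowed)
		return ERR_PTR(-EPERM);		/* e.g. a security hook refused */
	f = calloc(1, sizeof(*f));
	if (!f)
		return ERR_PTR(-ENOMEM);	/* out of memory */
	return f;
}

/* Caller pattern: propagate whatever error came back, not just -ENFILE. */
static long open_sketch(long nr_files, long max_files, bool allowed)
{
	struct file_sketch *f = get_empty_filp_sketch(nr_files, max_files, allowed);

	if (IS_ERR(f))
		return PTR_ERR(f);
	free(f);
	return 0;
}
```

With this shape, a caller such as open_sketch(0, 10, false) returns -EPERM rather than a misleading -ENFILE.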
  7. 20 Mar, 2012 1 commit
  8. 26 Jul, 2011 1 commit
  9. 24 Jul, 2011 1 commit
    • VFS : mount lock scalability for internal mounts · 423e0ab0
      Tim Chen authored
      A number of file systems that have no mount point (e.g. sockfs and
      pipefs) are not marked as long term. Therefore, when we release a
      reference to one of their file objects, mntput_no_expire takes all the
      locks in the vfs_mount lock, instead of just the local CPU's lock, to
      aggregate the reference counts. In fact, only the local lock needs to be
      taken to update the ref counts, as these file systems are in no danger
      of going away until we are ready to unregister them.
      The attached patch marks file systems mounted with kern_mount and
      without a mount point as long term. Contention on the vfs_mount lock
      is now eliminated. Before unregistering such a file system,
      kern_unmount should be called to remove the long-term flag and
      make the mount ready to be freed.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 16 Jan, 2011 1 commit
    • sanitize vfsmount refcounting changes · f03c6599
      Al Viro authored
      Instead of splitting the refcount between (per-cpu) mnt_count
      and (SMP-only) mnt_longrefs, make all references contribute
      to mnt_count again and keep track of how many of them are long-term.
      Accounting rules for longterm count:
      	* 1 for each fs_struct.root.mnt
      	* 1 for each fs_struct.pwd.mnt
      	* 1 for having non-NULL ->mnt_ns
      	* decrement to 0 happens only under vfsmount lock exclusive
      That allows a nice common case for mntput(): since we can't drop the
      final reference until after mnt_longterm has reached 0 due to the rules
      above, mntput() can grab the vfsmount lock shared and check mnt_longterm.
      If it turns out to be non-zero (which is the common case), we know
      that this is not the final mntput() and can just blindly decrement
      the percpu mnt_count.  Otherwise we grab the vfsmount lock exclusive and
      do the usual decrement-and-check of the percpu mnt_count.
      For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
      namespace.c uses the latter in places where we don't already hold
      vfsmount lock exclusive and opencodes a few remaining spots where
      we need to manipulate mnt_longterm.
      Note that we mostly revert the code outside of fs/namespace.c back
      to what we used to have; in particular, normal code doesn't need
      to care about two kinds of references, etc.  And we get to keep
      the optimization Nick's variant had bought us...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
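A simplified, single-threaded model of the mntput() fast path described above. The struct, helper names and NR_CPUS value are hypothetical, and the vfsmount lock is not actually taken: the sketch only records which lock mode each branch would need, so it illustrates the accounting rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* assumption: fixed CPU count for the sketch */

/* Instrumentation: which mode of the vfsmount lock the last put needed. */
enum mnt_lock_mode { MNT_LOCK_SHARED, MNT_LOCK_EXCL };

/* Hypothetical model: every reference is counted in mnt_count[], and
 * mnt_longterm additionally records how many of them are long-term
 * (fs_struct root/pwd, non-NULL ->mnt_ns). */
struct vfsmount_sketch {
	int mnt_count[NR_CPUS];	/* per-CPU counts; only their sum is meaningful */
	int mnt_longterm;
	enum mnt_lock_mode last_lock;
};

static void mnt_make_longterm_sketch(struct vfsmount_sketch *mnt)
{
	mnt->mnt_longterm++;
}

static void mnt_make_shortterm_sketch(struct vfsmount_sketch *mnt)
{
	/* Per the rules above, the real decrement to 0 happens only with the
	 * vfsmount lock held exclusive; no locking is modeled here. */
	assert(mnt->mnt_longterm > 0);
	mnt->mnt_longterm--;
}

/* Drops a reference; returns true if it was the final one. */
static bool mntput_sketch(struct vfsmount_sketch *mnt, int cpu)
{
	int i, sum = 0;

	mnt->last_lock = MNT_LOCK_SHARED;
	if (mnt->mnt_longterm > 0) {
		/* Common case: a long-term reference pins the mount, so this
		 * cannot be the final put; just decrement the local count. */
		mnt->mnt_count[cpu]--;
		return false;
	}
	/* Otherwise take the lock exclusive and do the full decrement-and-check. */
	mnt->last_lock = MNT_LOCK_EXCL;
	mnt->mnt_count[cpu]--;
	for (i = 0; i < NR_CPUS; i++)
		sum += mnt->mnt_count[i];
	return sum == 0;
}
```

Unlike the split scheme it replaces, long-term references here are still part of mnt_count; mnt_longterm only tells mntput() when summing can be skipped.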
  11. 12 Jan, 2011 1 commit
  12. 06 Jan, 2011 3 commits
    • fs: scale mntget/mntput · b3e19d92
      Nick Piggin authored
      The problem that this patch aims to fix is vfsmount refcounting scalability.
      We need to take a reference on the vfsmount for every successful path lookup,
      and these lookups often go to the same mount point.
      The fundamental difficulty is that a "simple" reference count can never be made
      scalable, because any time a reference is dropped, we must check whether that
      was the last reference. To do that requires communication with all other CPUs
      that may have taken a reference count.
      We can make refcounts more scalable in a couple of ways, involving keeping
      distributed counters and checking for the global-zero condition less often:
      - check the global sum once every interval (this will delay zero detection
        for some interval, so it's probably a showstopper for vfsmounts).
      - keep a local count and only take the global sum when local reaches 0 (this
        is difficult for vfsmounts, because we can't hold preempt off for the life of
        a reference, so a counter would need to be per-thread or tied strongly to a
        particular CPU which requires more locking).
      - keep a local difference of increments and decrements, which allows us to sum
        the total difference and hence find the refcount when summing all CPUs. Then,
        keep a single integer "long" refcount for slow and long lasting references,
        and only take the global sum of local counters when the long refcount is 0.
      This last scheme is what I implemented here. Attached mounts and process root
      and working directory references are "long" references, and everything else is
      a short reference.
      This allows scalable vfsmount references during path walking over mounted
      subtrees and unattached (lazy umounted) mounts with processes still running
      in them.
      This results in one fewer atomic op in the fastpath: mntget is now just a
      per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
      and non-atomic decrement in the common case. However code is otherwise bigger
      and heavier, so single threaded performance is basically a wash.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
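A single-threaded sketch of the third scheme, assuming a fixed NR_CPUS and hypothetical names. It shows why per-CPU counters may individually go negative (a reference taken on one CPU and dropped on another) while their sum, plus the long count, is still the true refcount, and why the sum is only needed once no long reference remains.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* assumption: fixed CPU count for the sketch */

/* Hypothetical model of the split scheme: short references live in
 * per-CPU difference counters, long references in one shared integer. */
struct mnt_refs_sketch {
	int local[NR_CPUS];	/* gets minus puts on each CPU; may be negative */
	int longrefs;		/* attached mounts, process root and cwd */
};

static void mntget_sketch(struct mnt_refs_sketch *m, int cpu)
{
	m->local[cpu]++;	/* per-CPU increment instead of a shared atomic */
}

/* Total refcount: the long count plus the sum of all local differences. */
static int mnt_refcount_sketch(const struct mnt_refs_sketch *m)
{
	int i, sum = m->longrefs;

	for (i = 0; i < NR_CPUS; i++)
		sum += m->local[i];
	return sum;
}

/* Drops a short reference; returns true if nothing references the mount. */
static bool mntput_sketch(struct mnt_refs_sketch *m, int cpu)
{
	m->local[cpu]--;
	if (m->longrefs > 0)
		return false;	/* pinned by a long ref: skip the global sum */
	return mnt_refcount_sketch(m) == 0;
}
```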
    • fs: improve scalability of pseudo filesystems · 4b936885
      Nick Piggin authored
      Regardless of how much we try to scale dcache, there is likely
      always going to be some fundamental contention when adding or removing children
      under the same parent. Pseudo filesystems do not seem to need connected
      dentries, because by definition they are disconnected.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: dcache reduce branches in lookup path · fb045adb
      Nick Piggin authored
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      Patched with:
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
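A sketch of the flag-caching idea, with hypothetical *_sketch types standing in for struct dentry and struct dentry_operations. The DCACHE_OP_* names follow the kernel's naming convention, but the bit values and the default hash are chosen only for this example.

```c
#include <assert.h>

/* Flag bits cached on the dentry (values chosen for this sketch). */
#define DCACHE_OP_HASH		0x1u
#define DCACHE_OP_REVALIDATE	0x2u

struct dentry_sketch;

struct dentry_operations_sketch {
	unsigned int (*d_hash)(const char *name);
	int (*d_revalidate)(struct dentry_sketch *dentry);
};

struct dentry_sketch {
	unsigned int d_flags;
	const struct dentry_operations_sketch *d_op;
};

/* Set d_op once and record which operations exist, so lookup code can
 * test a flag in the dentry itself instead of loading d_op. */
static void d_set_d_op_sketch(struct dentry_sketch *dentry,
			      const struct dentry_operations_sketch *op)
{
	dentry->d_op = op;
	dentry->d_flags &= ~(DCACHE_OP_HASH | DCACHE_OP_REVALIDATE);
	if (!op)
		return;
	if (op->d_hash)
		dentry->d_flags |= DCACHE_OP_HASH;
	if (op->d_revalidate)
		dentry->d_flags |= DCACHE_OP_REVALIDATE;
}

/* Lookup hot path: one flag test replaces "d_op && d_op->d_hash". */
static unsigned int hash_name_sketch(const struct dentry_sketch *parent,
				     const char *name)
{
	unsigned int h = 0;

	if (parent->d_flags & DCACHE_OP_HASH)
		return parent->d_op->d_hash(name);
	while (*name)
		h = h * 31 + (unsigned char)*name++;	/* arbitrary default hash */
	return h;
}

/* Example filesystem-specific hash, for demonstration only. */
static unsigned int constant_hash_sketch(const char *name)
{
	(void)name;
	return 42;
}
```

The sed command above rewrites direct d_op assignments into d_set_d_op() calls for the same reason: the flags stay correct only if every assignment goes through the setter.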
  13. 10 Dec, 2010 1 commit
  14. 29 Oct, 2010 1 commit
  15. 25 Oct, 2010 2 commits
  16. 27 May, 2010 1 commit
  17. 21 May, 2010 1 commit
  18. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      The script does the following:
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  19. 12 Mar, 2010 1 commit
    • anon_inodes: mark the anon inode private · 3836a03d
      Eric Paris authored
      Inotify was switched to use anon_inode instead of its own private filesystem
      which only had one inode in commit c44dcc56 "switch inotify_user to
      The problem with this is that now the inotify inode is not a distinct inode
      which can be managed by LSMs.  Userspace tools which use inotify were allowed
      to use the inotify inode but may not have had permission to do read/write type
      operations on the anon_inode.  After looking at the anon_inode and its users
      it looks like the best solution is to just mark the anon_inode as S_PRIVATE
      so the security system will ignore it.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
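A hedged model of why marking the inode S_PRIVATE helps: as far as we understand, the kernel's security_*() wrappers return early for IS_PRIVATE() inodes before consulting the LSM policy. The flag value, the struct and the hook below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define S_PRIVATE_SKETCH 0x200u	/* stand-in for the kernel's S_PRIVATE flag */

struct inode_sketch {
	unsigned int i_flags;
	bool policy_allows;	/* hypothetical: what the LSM policy would decide */
};

static bool is_private_sketch(const struct inode_sketch *inode)
{
	return (inode->i_flags & S_PRIVATE_SKETCH) != 0;
}

/* Hypothetical permission hook: private inodes bypass the policy check. */
static int security_inode_permission_sketch(const struct inode_sketch *inode)
{
	if (is_private_sketch(inode))
		return 0;
	return inode->policy_allows ? 0 : -EACCES;
}
```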
  20. 22 Dec, 2009 2 commits
  21. 17 Dec, 2009 2 commits
    • fs: no games with DCACHE_UNHASHED · a3a065e3
      Nick Piggin authored
      Filesystems outside the regular namespace do not have to clear DCACHE_UNHASHED
      in order to have a working /proc/$pid/fd/XXX. Nothing in proc prevents the
      fd link from being used if its dentry is not in the hash.
      Also, it does not get put into the dcache hash if DCACHE_UNHASHED is clear;
      that depends on the filesystem calling d_add or d_rehash.
      So delete the misleading comments and needless code.
      Acked-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: anon_inodes implement dname · b9aff027
      Nick Piggin authored
      Add a d_dname method for the anon_inodes filesystem, the same way the pipefs
      and sockfs pseudo filesystems do.  This allows us to remove the DCACHE_UNHASHED
      hack from anon_inodes.c (see next patch).
      [AV: inumber is useless here, dropped from anon_inodefs_dname()]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
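A simplified sketch of a d_dname-style callback for this filesystem: the name is built on demand from the dentry name, which is why, for example, an eventfd shows up as anon_inode:[eventfd] in a readlink of /proc/<pid>/fd/N. The function below is hypothetical; the kernel formats the name with its own helper, while this sketch uses snprintf() and returns NULL when the buffer is too small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical d_dname-style callback: generates the name when asked
 * instead of storing it in the dentry. */
static char *anon_inodefs_dname_sketch(const char *d_name, char *buffer,
				       int buflen)
{
	int n;

	if (buflen <= 0)
		return NULL;
	n = snprintf(buffer, (size_t)buflen, "anon_inode:%s", d_name);
	if (n < 0 || n >= buflen)
		return NULL;	/* did not fit; the kernel reports an error instead */
	return buffer;
}
```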
  22. 16 Dec, 2009 1 commit
  23. 04 Oct, 2009 1 commit
  24. 23 Sep, 2009 1 commit
    • anonfd: split interface into file creation and install · 562787a5
      Davide Libenzi authored
      Split the anonfd interface into a bare file pointer creation one, and a
      file pointer creation plus install one.
      There are cases, like the usage of eventfds inside other kernel
      interfaces, where the file pointer created by anonfd needs to be used
      inside the initialization of other structures.
      As it is right now, as soon as anon_inode_getfd() returns, the kernel can
      race with userspace closing the newly installed file descriptor.
      This patch, while keeping the old anon_inode_getfd(), introduces a new
      anon_inode_getfile() (whose services are reused in anon_inode_getfd())
      that allows splitting the file creation phase and the fd install one.
      Once all the kernel structures are initialized, the code can call the
      proper fd_install().
      Gregory expressed the need for something like this inside KVM.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
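A userspace model of the ordering this split enables: reserve an fd, create the file, finish initializing the structures that embed it, and only then publish the fd. The fd table and all *_sketch functions are hypothetical; they mirror the roles of fd reservation and fd_install() in the kernel, not their real signatures.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_OPEN_SKETCH 8	/* assumption: tiny fd table for the sketch */

struct file_sketch {
	void *private_data;
};

/* Hypothetical per-process fd table: a slot is reserved before it is
 * published, and only published slots are visible to userspace. */
static int fd_reserved[NR_OPEN_SKETCH];
static struct file_sketch *fd_table[NR_OPEN_SKETCH];

static int get_unused_fd_sketch(void)
{
	int fd;

	for (fd = 0; fd < NR_OPEN_SKETCH; fd++) {
		if (!fd_reserved[fd]) {
			fd_reserved[fd] = 1;
			return fd;
		}
	}
	return -1;
}

static void put_unused_fd_sketch(int fd)
{
	fd_reserved[fd] = 0;
}

/* Publish: from here on, another thread may close(fd) at any time. */
static void fd_install_sketch(int fd, struct file_sketch *file)
{
	fd_table[fd] = file;
}

/* Creation only: build the file but do not give it an fd yet. */
static struct file_sketch *anon_inode_getfile_sketch(void *priv)
{
	struct file_sketch *file = calloc(1, sizeof(*file));

	if (file)
		file->private_data = priv;
	return file;
}

/* Hypothetical kernel object that embeds the file (e.g. an eventfd user). */
struct consumer_sketch {
	struct file_sketch *file;
};

static int consumer_setup_sketch(struct consumer_sketch *c, void *ctx)
{
	struct file_sketch *file;
	int fd = get_unused_fd_sketch();

	if (fd < 0)
		return -1;
	file = anon_inode_getfile_sketch(ctx);
	if (!file) {
		put_unused_fd_sketch(fd);
		return -1;
	}
	c->file = file;			/* finish initialization first... */
	fd_install_sketch(fd, file);	/* ...then make the fd visible */
	return fd;
}
```

Because the fd is published last, a concurrent close() can only ever see a fully initialized consumer.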
  25. 18 Jun, 2009 1 commit
  26. 27 Mar, 2009 1 commit
  27. 31 Dec, 2008 1 commit
    • anon_inodes: use fops->owner for module refcount · e3a2a0d4
      Christian Borntraeger authored
      There is an imbalance for anonymous inodes. If the fops->owner field is set,
      the module reference count of owner is decreased on release.
      ("filp_close" --> "__fput" ---> "fops_put")
      On the other hand, anon_inode_getfd does not increase the module reference
      count of owner. This causes two problems:
      - if owner is set, the module refcount goes negative
      - if owner is not set, the module can be unloaded while code is running
      This patch changes anon_inode_getfd to be symmetric regarding fops->owner.
      I have checked all existing users of anon_inode_getfd. No one sets fops->owner;
      that's why nobody has seen the module refcount go negative. The refcounting was
      tested with a patched and unpatched KVM module (see patch 2/2). I also did an
      epoll_open/close test.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
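A counter-only sketch of the symmetry this patch restores: creating the file pins fops->owner, and the release path drops that same reference. The module and fops structs, the helpers, and the -ENOENT return value are assumptions for illustration, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical module and file_operations models. */
struct module_sketch {
	int refcnt;
	bool unloading;
};

struct fops_sketch {
	struct module_sketch *owner;	/* NULL for built-in code */
};

static bool try_module_get_sketch(struct module_sketch *mod)
{
	if (!mod)
		return true;		/* nothing to pin */
	if (mod->unloading)
		return false;
	mod->refcnt++;
	return true;
}

static void module_put_sketch(struct module_sketch *mod)
{
	if (mod)
		mod->refcnt--;
}

/* Creation pins fops->owner... */
static int anon_getfd_sketch(const struct fops_sketch *fops)
{
	if (!try_module_get_sketch(fops->owner))
		return -ENOENT;		/* error code chosen for the sketch */
	return 3;			/* hypothetical fd number */
}

/* ...and release (the fops_put() step of __fput()) unpins it, so the
 * two sides stay balanced. */
static void release_sketch(const struct fops_sketch *fops)
{
	module_put_sketch(fops->owner);
}
```

Without the get in anon_getfd_sketch(), each release would drive refcnt below zero, which is the first problem listed above.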
  28. 13 Nov, 2008 1 commit
  29. 24 Jul, 2008 2 commits
  30. 01 May, 2008 1 commit
    • [PATCH] sanitize anon_inode_getfd() · 2030a42c
      Al Viro authored
      a) none of the callers even looks at inode or file returned by anon_inode_getfd()
      b) any caller that would try to look at those would be racy, since by the time
      it returns we might have raced with close() from another thread and that
      file would be pining for fjords.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  31. 19 Mar, 2008 1 commit
  32. 17 Oct, 2007 1 commit