1. 14 Jan, 2016 1 commit
  2. 14 Dec, 2015 2 commits
  3. 16 Nov, 2015 1 commit
    • GFS2: Use rht_for_each_entry_rcu in glock_hash_walk · 3dd1dd8c
      Andrew Price authored
      This lockdep splat was being triggered on umount:
      
      [55715.973122] ===============================
      [55715.980169] [ INFO: suspicious RCU usage. ]
      [55715.981021] 4.3.0-11553-g8d3de01c-dirty #15 Tainted: G        W
      [55715.982353] -------------------------------
      [55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!
      
      The code it refers to is the rht_for_each_entry_safe usage in
      glock_hash_walk. The condition that triggers the warning is
      lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
      __rcu_dereference_protected macro.
      
      The rhashtable buckets are not changed in glock_hash_walk so it's safe
      to rely on the rcu protection. Replace the rht_for_each_entry_safe()
      usage with rht_for_each_entry_rcu(), which doesn't care whether the
      bucket lock is held if the rcu read lock is held.
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
  4. 29 Oct, 2015 1 commit
  5. 03 Sep, 2015 5 commits
  6. 18 Jun, 2015 1 commit
    • GFS2: Don't add all glocks to the lru · e7ccaf5f
      Bob Peterson authored
      The glocks used for resource groups often come and go hundreds of
      thousands of times per second. Adding them to the lru list just
      adds unnecessary contention for the lru_lock spin_lock, especially
      considering we're almost certainly going to re-use the glock and
      take it back off the lru microseconds later. We never want the
      glock shrinker to cull them anyway. This patch adds a new bit in
      the glops that determines which glock types get put onto the lru
      list and which ones don't.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
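A minimal userspace sketch of the idea described in the commit above: a per-type flag in the operations table decides whether a lock of that type is ever placed on the LRU list. The struct names, the `OPS_FLAG_LRU` bit and `should_add_to_lru()` are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bit: lock types with this bit set are LRU candidates. */
#define OPS_FLAG_LRU 0x4u

/* Stand-in for a per-type operations table. */
struct lock_ops {
    const char *name;
    unsigned int flags;
};

/* Stand-in for a lock object that points at its type's table. */
struct lock {
    const struct lock_ops *ops;
};

/* Inode-like locks are worth caching; resource-group-like locks are not. */
static const struct lock_ops inode_ops = { "inode", OPS_FLAG_LRU };
static const struct lock_ops rgrp_ops  = { "rgrp",  0 };

/* Decide per type, without touching any list lock, whether to queue on the LRU. */
static bool should_add_to_lru(const struct lock *l)
{
    return (l->ops->flags & OPS_FLAG_LRU) != 0;
}
```

Because the check reads only a constant per-type flag, a lock type that opts out never contends for the list lock at all.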
  7. 30 Mar, 2015 1 commit
  8. 20 Jan, 2015 1 commit
  9. 09 Jan, 2015 1 commit
  10. 18 Nov, 2014 1 commit
  11. 08 Oct, 2014 1 commit
  12. 18 Jul, 2014 4 commits
    • GFS2: Allow flocks to use normal glock dq rather than dq_wait · 5bef3e7c
      Bob Peterson authored
      This patch allows flock glocks to use a non-blocking dequeue rather
      than dq_wait. It also reverts the previous patch I had posted regarding
      dq_wait. The reverted patch isn't necessarily a bad idea, but I decided
      this might avoid unforeseen side effects, and was therefore safer.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Use GFP_NOFS when allocating glocks · fe0bbd29
      Steven Whitehouse authored
      Normally GFP_KERNEL is ok here, but there is now a rarely used code path
      relating to deallocation of unlinked inodes (in certain corner cases)
      which if hit at times of memory shortage can cause recursion while
      trying to free memory.
      
      One solution would be to try and move the gfs2_glock_get() call so
      that it is no longer called while another glock is held, but that
      doesn't look at all easy, so GFP_NOFS is the best solution for the
      time being.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix race in glock lru glock disposal · 94a09a39
      Steven Whitehouse authored
      We must not leave items on the LRU list with GLF_LOCK set, since
      they can be removed if the glock is brought back into use, which
      may then potentially result in a hang, waiting for GLF_LOCK to
      clear.
      
      It doesn't happen very often, since it requires a glock that has
      not been used for a long time to be brought back into use at the
      same moment that the shrinker is part way through disposing of
      glocks.
      
      The fix is to set GLF_LOCK at a later time, when we already know
      that the other locks can be obtained. Also, we now only release
      the lru_lock in case a resched is needed, rather than on every
      iteration.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Only wait for demote when last holder is dequeued · 79272b35
      Bob Peterson authored
      Function gfs2_glock_dq_wait is supposed to dequeue a glock and then
      wait for the lock to be demoted. The problem is, if this is a shared
      lock, its demote will depend on the other holders, which means you
      might end up waiting forever because the other process is blocked.
      This problem is especially apparent when dealing with nested flocks.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  13. 16 Jul, 2014 1 commit
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown authored
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{,_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15) two new action
      functions have appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 18 Apr, 2014 1 commit
  15. 12 Mar, 2014 1 commit
  16. 07 Mar, 2014 1 commit
  17. 06 Mar, 2014 1 commit
  18. 16 Jan, 2014 1 commit
    • GFS2: Don't use ENOBUFS when ENOMEM is the correct error code · ac3beb6a
      Steven Whitehouse authored
      Al Viro has tactfully pointed out that we are using the incorrect
      error code in some cases. This patch fixes that, and also removes
      the (unused) return value for glock dumping.
      
      >        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
      > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
      > we don't support STREAMS it's probably fair game, but... what the hell?
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
  19. 02 Jan, 2014 1 commit
  20. 21 Nov, 2013 1 commit
  21. 15 Oct, 2013 1 commit
    • GFS2: Use lockref for glocks · e66cf161
      Steven Whitehouse authored
      Currently glocks have an atomic reference count and also a spinlock
      which covers various internal fields, such as the state. The intent of
      this patch is to replace the spinlock and the atomic reference count
      with a lockref structure. This contains a spinlock which we can continue
      to use as before, and a reference counter which is used in conjunction
      with the spinlock to replace the previous atomic counter.
      
      As a result of this there are some new rules for reference counting on
      glocks. We need to distinguish between reference count changes under
      gl_spin (which are now just increment or decrement of the new counter,
      provided the count cannot hit zero) and those which are outside of
      gl_spin, but which now take gl_spin internally.
      
      The conversion is relatively straightforward. There is probably some
      further clean up which can be done, but the priority at this stage is to
      make the change in as simple a manner as possible.
      
      A consequence of this change is that the reference count is being
      decoupled from the lru list processing. This should allow future
      adoption of the lru_list code with glocks in due course.
      
      The reason for using the "dead" state and not just relying on 0 being
      the "invalid state" is so that in due course 0 ref counts can be
      allowable. The intent is to eventually be able to remove the ref count
      changes which are currently hidden away in state_change().
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
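The reference-counting rules described in the commit above can be illustrated with a simplified userspace sketch. It is an assumption-laden model, not the kernel's lockref: it always takes the lock (the real structure also has a lockless fast path), it uses a pthread mutex in place of a spinlock, and it tracks the "dead" state with a separate flag. All names here (`ref_lock`, `ref_get_locked`, `ref_get_not_dead`, `ref_put`) are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A lock and a plain counter kept together, plus an explicit "dead" marker. */
struct ref_lock {
    pthread_mutex_t lock;
    int count;
    bool dead;
};

static void ref_lock_init(struct ref_lock *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->count = 1;   /* the creator holds the first reference */
    r->dead = false;
}

/* Caller already holds r->lock, so this is a plain increment. */
static void ref_get_locked(struct ref_lock *r)
{
    r->count++;
}

/* Caller does not hold the lock; the lock is taken internally.
 * Fails once the object has been marked dead. */
static bool ref_get_not_dead(struct ref_lock *r)
{
    bool ok = false;

    pthread_mutex_lock(&r->lock);
    if (!r->dead) {
        r->count++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

/* Drop a reference; returns true if it was the last one, in which
 * case the object is marked dead and may be freed by the caller. */
static bool ref_put(struct ref_lock *r)
{
    bool last = false;

    pthread_mutex_lock(&r->lock);
    if (--r->count == 0) {
        r->dead = true;
        last = true;
    }
    pthread_mutex_unlock(&r->lock);
    return last;
}
```

Keeping "dead" separate from "count is zero" is what would later allow a live object to sit at a zero count, which is the motivation the commit message gives for the dead state.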
  22. 10 Sep, 2013 2 commits
    • fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Dave Chinner authored
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Glauber Costa authored
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      
      It works great in situations in which we have many objects (at least more
      than 100), because the approximation errors will be negligible.  But if
      this is not the case, especially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      100).
      
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
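The truncation described in the commit above (total / 100 = 0 when total < 100) is plain integer arithmetic and can be reproduced in userspace. The sketch below contrasts dividing first with one way of avoiding the loss, which is to scale the quotient and the remainder separately. The function names are hypothetical, and this is not presented as the code the patch itself uses.

```c
#include <stdint.h>

/* Divide first: any total below 100 truncates to 0 before scaling. */
static uint64_t scale_divide_first(uint64_t total, uint64_t pressure)
{
    return (total / 100) * pressure;
}

/* Scale quotient and remainder separately so small totals still count.
 * Multiplying only the remainder (< 100) by pressure keeps the
 * intermediate value small and avoids overflow for large totals. */
static uint64_t scale_frac(uint64_t total, uint64_t pressure)
{
    uint64_t quot = total / 100;
    uint64_t rem = total % 100;

    return quot * pressure + (rem * pressure) / 100;
}
```

With a pressure of 100 and 50 objects, the first form reports 0 shrinkable objects and the second reports 50.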
  23. 04 Sep, 2013 1 commit
  24. 20 Aug, 2013 1 commit
  25. 19 Aug, 2013 1 commit
  26. 29 Apr, 2013 1 commit
  27. 26 Apr, 2013 1 commit
    • GFS2: Flush work queue before clearing glock hash tables · 222cb538
      Bob Peterson authored
      There was a timing window when a GFS2 file system was unmounted
      that caused GFS2 to call BUG() and panic the kernel. The call
      to BUG() is meant to ensure that the glock reference count,
      gl_ref, never gets down to zero and then bounces back up again. What was
      happening during umount is that function gfs2_put_super was dequeuing
      its glocks for well-known files. In particular, we saw it on the
      journal glock, sd_jinode_gh. The dequeue caused delayed work to be
      queued for the glock state machine, to transition the lock to an
      "unlocked" state. While the work was still queued, gfs2_put_super
      called gfs2_gl_hash_clear to clear out the glock hash tables.
      If the timing was just so, the glock work function would drop the
      reference count at the time when it was being checked for zero,
      and that caused BUG() to be called. This patch calls
      flush_workqueue before clearing the glock hash tables, thereby
      ensuring that the delayed work is executed before the hash tables
      are cleared, and therefore the reference count never goes to zero
      until the glock is cleared.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  28. 10 Apr, 2013 2 commits
    • GFS2: Add origin indicator to glock demote tracing · 7bd8b2eb
      Steven Whitehouse authored
      This adds the origin indicator to the trace point for glock
      demotion, so that it is possible to see where demote requests
      have come from.
      
      Note that requests generated from the demote_rq sysfs interface
      will show as remote, since they are intended to replicate
      exactly the effect of a demote request from a remote node. It
      is still possible to tell these apart by looking at the process
      which initiated the demote request.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add origin indicator to glock callbacks · 81ffbf65
      Steven Whitehouse authored
      This patch adds a bool indicating whether the demote
      request was originated locally or remotely. This is then
      used by the iopen ->go_callback() to make 100% sure that
      it will only respond to remote callbacks.
      
      Since ->evict_inode() uses GL_NOCACHE when it attempts to
      get an exclusive lock on the iopen lock, this may result
      in extra scheduling of the workqueue in case that the
      exclusive promotion request failed. This patch prevents
      that from happening.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  29. 08 Apr, 2013 1 commit
    • GFS2: Remove gfs2_refresh_inode from inode creation path · 28fb3027
      Steven Whitehouse authored
      The original method for creating inodes used in GFS2 was to fill
      out a buffer, with all the information, and then to read that
      buffer into the in-core inode, using gfs2_refresh_inode().
      
      The problem with this approach is that all the inode's fields
      need to be calculated ahead of time, and were stored in various
      variables making the code rather complicated.
      
      The new approach is simply to allocate the in-core inode earlier
      and fill in as many fields as possible ahead of time. These can
      then be used to initialise the on disk representation. The
      code has been working towards the point where it is possible
      to remove gfs2_refresh_inode() because all the fields are
      correctly initialised ahead of time. We've now reached that
      milestone, and have reversed the order of setting up the in
      core and on disk inodes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  30. 01 Feb, 2013 1 commit
    • GFS2: Split glock lru processing into two parts · 4506a519
      Steven Whitehouse authored
      The intent here is to split the processing of the glock lru
      list into two parts, so that the selection of glocks and the
      disposal are separate functions. The plan is that further
      updates can then be made to these functions in the future
      to improve the selection of glocks and also the efficiency of
      glock disposal.
      
      The new feature which this patch brings is sorting the
      glocks to be disposed of into glock number (and thus also
      disk block number) order. Not all glocks will need i/o in
      order to dispose of them, but some will, and at least we'll
      generate mostly disk block order i/o now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
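The ordering step described in the commit above, sorting the selected entries by lock number before disposing of them, can be sketched in userspace as follows. This uses the C library's `qsort()` on an array purely for illustration, which is a swapped-in technique rather than the kernel's list handling. `struct lock_entry`, `cmp_by_number()` and `sort_for_disposal()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for an entry selected for disposal; number tracks the disk block. */
struct lock_entry {
    uint64_t number;
};

/* Three-way compare on the lock number (avoids subtraction overflow). */
static int cmp_by_number(const void *a, const void *b)
{
    const struct lock_entry *x = a;
    const struct lock_entry *y = b;

    if (x->number < y->number)
        return -1;
    if (x->number > y->number)
        return 1;
    return 0;
}

/* Put the batch into ascending number order before disposing of it,
 * so any i/o that disposal triggers is issued in roughly block order. */
static void sort_for_disposal(struct lock_entry *batch, size_t n)
{
    qsort(batch, n, sizeof(*batch), cmp_by_number);
}
```

Separating selection from disposal is what makes this possible: the whole batch is known before any entry is processed, so it can be reordered first.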