1. 10 Sep, 2013 1 commit
    • super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Glauber Costa authored
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      It works great in situations in which we have many objects (at least more
      than 100), because the approximation errors will be negligible.  But if
      this is not the case, especially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      100).
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
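The rounding problem described in 55f841ce can be sketched in a few lines of userspace C. This is only an illustration, assuming the percentage was applied by dividing by 100 first; the helper names `scaled_naive` and `scaled_frac` are hypothetical and are not taken from the patch.

```c
/* Hypothetical helper: divide first, then scale.
 * Any total below 100 collapses to zero. */
static unsigned long scaled_naive(unsigned long total, unsigned long pressure)
{
    return (total / 100) * pressure;
}

/* Hypothetical helper: scale the quotient and the remainder separately,
 * so small totals survive and total * pressure is never formed directly. */
static unsigned long scaled_frac(unsigned long total, unsigned long pressure)
{
    unsigned long quot = total / 100;
    unsigned long rem = total % 100;

    return quot * pressure + (rem * pressure) / 100;
}
```

With a pressure of 100 and 50 objects, the first form reports 0 shrinkable objects while the second reports 50.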
  2. 04 Sep, 2013 1 commit
  3. 20 Aug, 2013 1 commit
  4. 19 Aug, 2013 1 commit
  5. 29 Apr, 2013 1 commit
  6. 26 Apr, 2013 1 commit
    • GFS2: Flush work queue before clearing glock hash tables · 222cb538
      Bob Peterson authored
      There was a timing window when a GFS2 file system was unmounted
      that caused GFS2 to call BUG() and panic the kernel. The call
      to BUG() is meant to ensure that the glock reference count,
      gl_ref, never gets down to zero and bounce back up again. What was
      happening during umount is that function gfs2_put_super was dequeuing
      its glocks for well-known files. In particular, we saw it on the
      journal glock, sd_jinode_gh. The dequeue caused delayed work to be
      queued for the glock state machine, to transition the lock to an
      "unlocked" state. While the work was still queued, gfs2_put_super
      called gfs2_gl_hash_clear to clear out the glock hash tables.
      If the timing was just so, the glock work function would drop the
      reference count at the time when it was being checked for zero,
      and that caused BUG() to be called. This patch calls
      flush_workqueue before clearing the glock hash tables, thereby
      ensuring that the delayed work is executed before the hash tables
      are cleared, and therefore the reference count never goes to zero
      until the glock is cleared.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  7. 10 Apr, 2013 2 commits
    • GFS2: Add origin indicator to glock demote tracing · 7bd8b2eb
      Steven Whitehouse authored
      This adds the origin indicator to the trace point for glock
      demotion, so that it is possible to see where demote requests
      have come from.
      Note that requests generated from the demote_rq sysfs interface
      will show as remote, since they are intended to replicate
      exactly the effect of a demote request from a remote node. It
      is still possible to tell these apart by looking at the process
      which initiated the demote request.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add origin indicator to glock callbacks · 81ffbf65
      Steven Whitehouse authored
      This patch adds a bool indicating whether the demote
      request was originated locally or remotely. This is then
      used by the iopen ->go_callback() to make 100% sure that
      it will only respond to remote callbacks.
      Since ->evict_inode() uses GL_NOCACHE when it attempts to
      get an exclusive lock on the iopen lock, this may result
      in extra scheduling of the workqueue in case the
      exclusive promotion request failed. This patch prevents
      that from happening.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
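The idea in 81ffbf65 of passing an origin flag into the callback can be sketched as follows. This is a userspace model only: `struct toy_glock`, its `work_queued` counter, and `toy_iopen_go_callback` are hypothetical stand-ins, not the GFS2 definitions.

```c
#include <stdbool.h>

/* Toy stand-in for a glock; the field is hypothetical. */
struct toy_glock {
    int work_queued;    /* how many times follow-up work was scheduled */
};

/* Sketch of an iopen-style callback: only remote demote requests
 * schedule follow-up work; locally generated ones are ignored. */
static void toy_iopen_go_callback(struct toy_glock *gl, bool remote)
{
    if (!remote)
        return;
    gl->work_queued++;
}
```

A locally generated request, such as a failed exclusive promotion, then causes no extra scheduling in this model.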
  8. 08 Apr, 2013 1 commit
    • GFS2: Remove gfs2_refresh_inode from inode creation path · 28fb3027
      Steven Whitehouse authored
      The original method for creating inodes used in GFS2 was to fill
      out a buffer, with all the information, and then to read that
      buffer into the in-core inode, using gfs2_refresh_inode().
      The problem with this approach is that all the inode's fields
      need to be calculated ahead of time, and were stored in various
      variables making the code rather complicated.
      The new approach is simply to allocate the in-core inode earlier
      and fill in as many fields as possible ahead of time. These can
      then be used to initialise the on disk representation. The
      code has been working towards the point where it is possible
      to remove gfs2_refresh_inode() because all the fields are
      correctly initialised ahead of time. We've now reached that
      milestone, and have reversed the order of setting up the in
      core and on disk inodes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  9. 01 Feb, 2013 1 commit
    • GFS2: Split glock lru processing into two parts · 4506a519
      Steven Whitehouse authored
      The intent here is to split the processing of the glock lru
      list into two parts, so that the selection of glocks and the
      disposal are separate functions. The plan is that further
      updates can then be made to these functions in the future
      to improve the selection of glocks and also the efficiency of
      glock disposal.
      The new feature which this patch brings is sorting the
      glocks to be disposed of into glock number (and thus also
      disk block number) order. Not all glocks will need i/o in
      order to dispose of them, but some will, and at least we'll
      generate mostly disk block order i/o now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
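The ordering step described in 4506a519 can be illustrated with a small sketch. It assumes the selected glocks have already been gathered into an array and uses `qsort()` for brevity; kernel code would more likely sort a linked list. `struct toy_glock`, `toy_glock_cmp`, and `toy_sort_for_disposal` are hypothetical names.

```c
#include <stdlib.h>

/* Toy stand-in for a glock; only the sort key is modelled. */
struct toy_glock {
    unsigned long long number;  /* glock number, tracks disk block number */
};

/* Order glocks by ascending glock number. */
static int toy_glock_cmp(const void *a, const void *b)
{
    const struct toy_glock *gla = a;
    const struct toy_glock *glb = b;

    if (gla->number < glb->number)
        return -1;
    if (gla->number > glb->number)
        return 1;
    return 0;
}

/* Second stage of the split: order the already selected glocks
 * so that any I/O needed for disposal is issued in block order. */
static void toy_sort_for_disposal(struct toy_glock *gls, size_t n)
{
    qsort(gls, n, sizeof(*gls), toy_glock_cmp);
}
```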
  10. 29 Jan, 2013 1 commit
  11. 11 Dec, 2012 1 commit
    • mm: redefine address_space.assoc_mapping · 252aa6f5
      Rafael Aquini authored
      Overhaul struct address_space.assoc_mapping, renaming it to
      address_space.private_data and redefining its type as void *.  By this
      approach we consistently name the .private_* elements from struct
      address_space as well as allow extended usage for address_space
      association with other data structures through ->private_data.
      Also, all users of old ->assoc_mapping element are converted to reflect
      its new name and type change (->private_data).
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
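The effect of the rename in 252aa6f5 on callers can be sketched with reduced stand-in types. Only the renamed field is modelled; `struct toy_address_space`, `struct toy_owner`, and `toy_mapping_owner` are hypothetical and exist purely for illustration.

```c
#include <stddef.h>

/* Reduced stand-in for struct address_space; only the renamed field. */
struct toy_address_space {
    void *private_data;     /* previously a typed assoc_mapping pointer */
};

/* Hypothetical structure a user might associate with the mapping. */
struct toy_owner {
    int id;
};

/* Callers convert the opaque pointer back to whatever they stored. */
static struct toy_owner *toy_mapping_owner(struct toy_address_space *mapping)
{
    return mapping->private_data;
}
```

Because the field is an untyped pointer, each user has to know what it stored there; the type system no longer checks it.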
  12. 15 Nov, 2012 2 commits
  13. 14 Nov, 2012 1 commit
    • GFS2: skip dlm_unlock calls in unmount · fb6791d1
      David Teigland authored
      When unmounting, gfs2 does a full dlm_unlock operation on every
      cached lock.  This can create a very large amount of work and can
      take a long time to complete.  However, the vast majority of these
      dlm unlock operations are unnecessary because after all the unlocks
      are done, gfs2 leaves the dlm lockspace, which automatically clears
      the locks of the leaving node, without unlocking each one individually.
      So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
      remove the locks implicitly.  The one exception is when the lock's lvb is
      being used.  In this case, dlm_unlock is called because it may update the
      lvb of the resource.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
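The decision described in fb6791d1 reduces to a small predicate. The sketch below is a model under assumptions: `struct toy_lock`, its `uses_lvb` field, and `toy_needs_explicit_unlock` are hypothetical names, and the real code may test different state.

```c
#include <stdbool.h>

/* Toy stand-in for a cached lock. */
struct toy_lock {
    bool uses_lvb;  /* hypothetical: the lock value block is in use */
};

/* Returns true when an explicit unlock call is still required. */
static bool toy_needs_explicit_unlock(const struct toy_lock *lk, bool unmounting)
{
    if (!unmounting)
        return true;        /* normal operation: always unlock */
    return lk->uses_lvb;    /* unmount: only when the lvb may be updated */
}
```

Locks for which this returns false during unmount are left for the lockspace release to clear implicitly.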
  14. 07 Nov, 2012 2 commits
  15. 24 Sep, 2012 4 commits
  16. 11 Jun, 2012 2 commits
  17. 08 Jun, 2012 2 commits
    • GFS2: Use lvbs for storing rgrp information with mount option · 90306c41
      Benjamin Marzinski authored
      Instead of reading in the resource groups when gfs2 is checking
      for free space to allocate from, gfs2 can store the necessary information
      in the resource group's lvb.  Also, instead of searching for unlinked
      inodes in every resource group that's checked for free space, gfs2 can
      store the number of unlinked inodes in the lvb, and only check for
      unlinked inodes if it will find some.
      The first time a resource group is locked, the lvb must be initialized.
      Since this involves counting the unlinked inodes in the resource group,
      this takes a little extra time.  But after that, if the resource group
      is locked with GL_SKIP, the buffer head won't be read in unless it's
      actually needed.
      Enabling the resource group lvbs is done via the rgrplvb mount option.  If
      this option isn't set, the lvbs will still be set and updated, but they won't
      be verified or used by the filesystem.  To safely turn on this option, all of
      the nodes mounting the filesystem must be running code with this patch, and
      the filesystem must have been completely unmounted since they were updated.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Cache last hash bucket for glock seq_files · ba1ddcb6
      Steven Whitehouse authored
      For the glocks and glstats seq_files, which are exposed via debugfs
      we should cache the most recent hash bucket, along with the offset
      into that bucket. This allows us to restart from that point, rather
      than having to begin at the beginning each time.
      This is an idea from Eric Dumazet, however I've slightly extended it
      so that if the position from which we are due to start is at any
      point beyond the last cached point, we start from the last cached
      point, plus whatever is the appropriate offset. I don't really expect
      people to be lseeking around these files, but if they did so with only
      positive offsets, then we'd still get some of the benefit of using a
      cached offset.
      With my simple test of around 200k entries in the file, I'm seeing
      an approx 10x speed up.
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
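The caching scheme from ba1ddcb6 can be modelled with a small iterator over a fixed table of buckets. This is a sketch, not the seq_file code: `struct toy_iter`, `toy_seek`, and `TOY_BUCKETS` are hypothetical, and each bucket is represented only by its entry count.

```c
#include <stddef.h>

#define TOY_BUCKETS 4

/* Cached iterator state, kept between reads. */
struct toy_iter {
    size_t bucket;  /* cached hash bucket */
    size_t offset;  /* cached offset within that bucket */
    size_t pos;     /* file position the cache corresponds to */
    size_t steps;   /* entries walked so far, for illustration only */
};

/* Move the iterator to file position pos over buckets holding
 * counts[i] entries each. Returns 0 on success, -1 if pos is past
 * the last entry. */
static int toy_seek(struct toy_iter *it, const size_t counts[TOY_BUCKETS],
                    size_t pos)
{
    if (pos < it->pos) {
        /* Seeking backwards: the cache cannot help, so restart. */
        it->bucket = 0;
        it->offset = 0;
        it->pos = 0;
    }
    while (it->bucket < TOY_BUCKETS) {
        if (it->offset >= counts[it->bucket]) {
            /* Bucket exhausted (or empty): move to the next one. */
            it->bucket++;
            it->offset = 0;
            continue;
        }
        if (it->pos == pos)
            return 0;
        it->offset++;
        it->pos++;
        it->steps++;
    }
    return -1;
}
```

A sequential reader that asks for position 4 and then position 5 walks only one extra entry for the second request, instead of starting again from the first bucket.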
  18. 07 Jun, 2012 1 commit
  19. 28 Feb, 2012 2 commits
    • GFS2: glock statistics gathering · a245769f
      Steven Whitehouse authored
      The stats are divided into two sets: those relating to the
      super block and those relating to an individual glock. The
      super block stats are done on a per cpu basis in order to
      try and reduce the overhead of gathering them. They are also
      further divided by glock type.
      In the case of both the super block and glock statistics,
      the same information is gathered in each case. The super
      block statistics are used to provide default values for
      most of the glock statistics, so that newly created glocks
      should have, as far as possible, a sensible starting point.
      The statistics are divided into three pairs of mean and
      variance, plus two counters. The mean/variance pairs are
      smoothed exponential estimates and the algorithm used is
      one which will be very familiar to those used to the calculation
      of round trip times in network code.
      The three pairs of mean/variance measure the following:
       1. DLM lock time (non-blocking requests)
       2. DLM lock time (blocking requests)
       3. Inter-request time (again to the DLM)
      A non-blocking request is one which will complete right
      away, whatever the state of the DLM lock in question. That
      currently means any requests when (a) the current state of
      the lock is exclusive (b) the requested state is either null
      or unlocked or (c) the "try lock" flag is set. A blocking
      request covers all the other lock requests.
      There are two counters. The first is there primarily to show
      how many lock requests have been made, and thus how much data
      has gone into the mean/variance calculations. The other counter
      is counting queueing of holders at the top layer of the glock
      code. Hopefully that number will be a lot larger than the number
      of dlm lock requests issued.
      So why gather these statistics? There are several reasons
      we'd like to get a better idea of these timings:
      1. To be able to better set the glock "min hold time"
      2. To spot performance issues more easily
      3. To improve the algorithm for selecting resource groups for
      allocation (to base it on lock wait time, rather than blindly
      using a "try lock")
      Due to the smoothing action of the updates, a step change in
      some input quantity being sampled will only fully be taken
      into account after 8 samples (or 4 for the variance) and this
      needs to be carefully considered when interpreting the results.
      Knowing both the time it takes a lock request to complete and
      the average time between lock requests for a glock means we
      can compute the total percentage of the time for which the
      node is able to use a glock vs. time that the rest of the
      cluster has its share. That will be very useful when setting
      the lock min hold time.
      The other point to remember is that all times are in
      nanoseconds. Great care has been taken to ensure that we
      measure exactly the quantities that we want, as accurately
      as possible. There are always inaccuracies in any
      measuring system, but I hope this is as accurate as we
      can reasonably make it.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
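The smoothed estimates described in a245769f resemble the estimator used for network round trip times. A minimal sketch follows, assuming gains of 1/8 for the mean and 1/4 for the variance term (chosen to match the "8 samples (or 4 for the variance)" remark) and assuming the variance term is a smoothed absolute deviation; the real code may differ. `struct toy_stat` and `toy_update_stat` are hypothetical names.

```c
#include <stdint.h>

/* One mean/variance pair; all values are in nanoseconds. */
struct toy_stat {
    int64_t mean;   /* smoothed mean */
    int64_t var;    /* smoothed absolute deviation from the mean */
};

/* Fold one new sample into the pair. The 1/8 and 1/4 gains are
 * assumptions made for this sketch. */
static void toy_update_stat(struct toy_stat *s, int64_t sample)
{
    int64_t delta = sample - s->mean;
    int64_t dev = delta < 0 ? -delta : delta;

    s->mean += delta / 8;
    s->var += (dev - s->var) / 4;
}
```

Starting from zero, a first sample of 800 ns moves the mean to 100 and the variance term to 200, which shows why a step change in the input takes several samples to be reflected.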
    • GFS2: Fix race between lru_list and glock ref count · 4043b886
      Steven Whitehouse authored
      This patch fixes a narrow race window between the glock ref count
      hitting zero and glocks being removed from the lru_list.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  20. 11 Jan, 2012 1 commit
  21. 15 Jul, 2011 1 commit
  22. 25 May, 2011 2 commits
    • vmscan: change shrinker API by passing shrink_control struct · 1495f230
      Ying Han authored
      Change each shrinker's API by consolidating the existing parameters into
      shrink_control struct.  This will simplify any further features added w/o
      touching each file of shrinker.
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: fix warning]
      [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
      [akpm@linux-foundation.org: fix xfs warning]
      [akpm@linux-foundation.org: update gfs2]
      Signed-off-by: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
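The consolidation described in 1495f230 can be sketched with a reduced argument struct. The types below are userspace stand-ins: `struct toy_shrink_control`, `struct toy_cache`, and `toy_shrink` are hypothetical, and the convention that a scan count of zero means "just report the object count" is an assumption of this sketch.

```c
/* Reduced stand-in for the consolidated argument struct. */
struct toy_shrink_control {
    unsigned int gfp_mask;      /* allocation context, unused in this toy */
    unsigned long nr_to_scan;   /* how many objects to try to free */
};

/* Toy cache with a simple object count. */
struct toy_cache {
    unsigned long nr_objects;
};

/* With nr_to_scan == 0 this only reports the current object count;
 * otherwise it frees up to nr_to_scan objects and returns what is left. */
static unsigned long toy_shrink(struct toy_cache *c,
                                const struct toy_shrink_control *sc)
{
    unsigned long n = sc->nr_to_scan;

    if (n > c->nr_objects)
        n = c->nr_objects;
    c->nr_objects -= n;
    return c->nr_objects;
}
```

Adding a new parameter later means adding a field to the struct, without changing the signature of every shrinker.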
    • GFS2: Processes waiting on inode glock that no processes are holding · f90e5b5b
      Bob Peterson authored
      This patch fixes a race in the GFS2 glock state machine that may
      result in lockups.  The symptom is that all nodes but one will
      hang, waiting for a particular glock.  All the holder records
      will have the "W" (Waiting) bit set.  The other node will
      typically have the glock stuck in Exclusive mode (EX) with no
      holder records, but the dinode will be cached.  In other words,
      an entry with "I:" will appear in the glock dump for that glock,
      but nothing else.
      The race has to do with the glock "Pending Demote" bit, which
      can be set, then immediately reset, thus losing the fact that
      another node needs the glock.  The sequence of events is:
      1. Something schedules the glock workqueue (e.g. glock request from fs)
      2. The glock workqueue gets to the point between the test of the reply pending
      bit and the spin lock:
              if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                      finish_xmote(gl, gl->gl_reply);
                      drop_ref = 1;
              }
              down_read(&gfs2_umount_flush_sem);         <---- i.e. here
      3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
                  (b) the demote request which sets GLF_PENDING_DEMOTE
      4. The following test is executed:
              if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
                  gl->gl_state != LM_ST_UNLOCKED &&
                  gl->gl_demote_state != LM_ST_EXCLUSIVE) {
      This resets the pending demote flag, and gl->gl_demote_state is not equal to
      exclusive, however because the reply from the dlm arrived after we checked for
      the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
      although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
      GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE flag.
      The patch closes the timing window by only transitioning the
      "Pending demote" bit to the "demote" flag once we know the
      other conditions (not unlocked and not exclusive) are met.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
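The difference between the racy test and the fix in f90e5b5b can be modelled in single-threaded C. This sketch replaces the atomic bit operations with plain booleans and ignores locking, so it only shows the ordering of the checks; `struct toy_glock`, the `TOY_*` states, and both functions are hypothetical.

```c
#include <stdbool.h>

enum toy_state { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

/* Toy stand-in for the glock fields involved in the race. */
struct toy_glock {
    enum toy_state state;           /* current lock state */
    enum toy_state demote_state;    /* state a demote would move to */
    bool pending_demote;            /* models GLF_PENDING_DEMOTE */
    bool demote;                    /* models GLF_DEMOTE */
};

/* Racy shape: the pending bit is consumed before the other checks,
 * so the request is lost if they fail. */
static void toy_promote_buggy(struct toy_glock *gl)
{
    bool was_pending = gl->pending_demote;

    gl->pending_demote = false;
    if (was_pending && gl->state != TOY_UNLOCKED &&
        gl->demote_state != TOY_EXCLUSIVE)
        gl->demote = true;
}

/* Fixed shape: clear the pending bit only once all conditions hold. */
static void toy_promote_fixed(struct toy_glock *gl)
{
    if (gl->pending_demote && gl->state != TOY_UNLOCKED &&
        gl->demote_state != TOY_EXCLUSIVE) {
        gl->pending_demote = false;
        gl->demote = true;
    }
}
```

In the fixed shape, a demote request that arrives while the state still reads as unlocked stays pending and is picked up on a later pass.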
  23. 05 May, 2011 1 commit
  24. 25 Apr, 2011 1 commit
  25. 20 Apr, 2011 4 commits
    • GFS2: Make writeback more responsive to system conditions · 4667a0ec
      Steven Whitehouse authored
      This patch adds writeback_control to writing back the AIL
      list. This means that we can then take advantage of the
      information we get in ->write_inode() in order to set off
      some pre-emptive writeback.
      In addition, the AIL code is cleaned up a bit to make it
      a bit simpler to understand.
      There is still more which can usefully be done in this area,
      but this is a good start at least.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Optimise glock lru and end of life inodes · f42ab085
      Steven Whitehouse authored
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
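The bookkeeping described in f42ab085 (count outstanding revokes, drop the flush flag when none remain) can be sketched as below. This is a simplified model: `struct toy_glock`, `toy_add_revoke`, and `toy_revoke_committed` are hypothetical, and the point at which the real code sets or clears GLF_LFLUSH may differ.

```c
#include <stdbool.h>

/* Toy stand-in for the glock fields involved. */
struct toy_glock {
    int revokes;    /* outstanding revokes for this glock */
    bool lflush;    /* models GLF_LFLUSH: a log flush is still needed */
};

/* Record one more outstanding revoke. */
static void toy_add_revoke(struct toy_glock *gl)
{
    gl->revokes++;
}

/* Called once a revoke has been committed to the log. When the last
 * one completes there is nothing left to flush, so the flag is cleared. */
static void toy_revoke_committed(struct toy_glock *gl)
{
    if (gl->revokes > 0 && --gl->revokes == 0)
        gl->lflush = false;
}
```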
    • GFS2: Improve tracing support (adds two flags) · 627c10b7
      Steven Whitehouse authored
      This adds support for two new flags. One keeps track of whether
      the glock is on the LRU list or not. The other isn't really a
      flag as such, but an indication of whether the glock has an
      attached object or not. This indication is reported without
      any locking, which is ok since we do not dereference the object
      pointer but merely report whether it is NULL or not.
      Also, this fixes one place where a tracepoint was missing, which
      was at the point we remove deallocated blocks from the journal.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Alter point of entry to glock lru list for glocks with an address_space · 29687a2a
      Steven Whitehouse authored
      Rather than allowing the glocks to be scheduled for possible
      reclaim as soon as they have exited the journal, this patch
      delays their entry to the list until the glocks in question
      are no longer in use.
      This means that we will rely on the vm for writeback of all
      dirty data and metadata from now on. When glocks are added
      to the lru list they should be freeable much faster since all
      the I/O required to free them should have already been completed.
      This should lead to much better I/O patterns under low memory
      conditions.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  26. 31 Mar, 2011 1 commit
  27. 15 Mar, 2011 1 commit