1. 25 Jan, 2008 7 commits
    • Bob Peterson's avatar
      [GFS2] Reorganize function gfs2_glmutex_lock · 398bbe68
      Bob Peterson authored
      This patch optimizes the function gfs2_glmutex_lock.
      The basic theory is: Why bother initializing a holder, setting up
      wait bits and then waiting on them, if you know the glock can be
      yours.  So the holder stuff is placed inside the if checking if the
      glock is locked.  This one needs careful scrutiny because changing
      anything to do with locking should strike terror into one's heart.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Fabio Massimo Di Nitto's avatar
      [GFS2] Fix runtime issue with UP kernels · 1a2781cf
      Fabio Massimo Di Nitto authored
      The issue is indeed UP vs SMP and it is totally random.
      spin_is_locked() is a bad assertion because there is no correct answer on UP.
      on UP spin_is_locked() has to return either one value or another, always.
      This means that in my setup I am lucky enough to trigger the issue and your you
      are lucky enough not to.
      the patch in attachment removes the bogus calls to BUG_ON and according to David
      (in CC and thanks for the long explanation on the problem) we can rely upon
      things like lockdep to find problem that might be trying to catch.
      Signed-off-by: default avatarFabio M. Di Nitto <fabbione@ubuntu.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Don't add glocks to the journal · 2bcd610d
      Steven Whitehouse authored
      The only reason for adding glocks to the journal was to keep track
      of which locks required a log flush prior to release. We add a
      flag to the glock to allow this check to be made in a simpler way.
      This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
      and means that we can avoid extra work during the journal flush.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Remove flags no longer required · e589665e
      Steven Whitehouse authored
      The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
      depending upon which of the two waiters lists they were going to
      be queued upon. They were then tested when the holders were taken
      off the lists to ensure that the right type of holder was being
      Since we are already using separate lists, there doesn't seem a
      lot of point having these flags as well, and since setting them
      and testing them is in the fast path for locking and unlocking
      glock, this patch removes them.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Reorder writeback for glock sync · 3042a2cc
      Steven Whitehouse authored
      Previously we were doing (write data, wait for data, write metadata, wait
      for metadata). After this patch we so (write metadata, write data, wait for
      data, wait for metadata) which should be more efficient.
      Also I noticed that the drop_bh and xmote_bh functions were almost
      identical. In fact the only difference was a single test, and that
      test is such that in the drop_bh case, it would always evaluate to
      the correct result. As such we can use the xmote_bh functions in
      all the places where we were using the drop_bh function and remove
      the drop_bh functions.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Remove "reclaim limit" · c2932e03
      Steven Whitehouse authored
      This call to reclaim glocks is not needed, and in particular we don't want it
      in the fast path for locking glocks. The limit was entirely arbitrary anyway
      and we can't expect users to adjust things like this, the remaining code will
      do the right thing on its own.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Wendy Cheng's avatar
      [GFS2] Handle multiple glock demote requests · cc7e79b1
      Wendy Cheng authored
      Fix a race condition where multiple glock demote requests are sent to
      a node back-to-back. This patch does a check inside handle_callback()
      to see whether a demote request is in progress. If true, it sets a flag
      to make sure run_queue() will loop again to handle the new request,
      instead of erronously setting gl_demote_state to a different state.
      Signed-off-by: default avatarS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  2. 10 Oct, 2007 10 commits
  3. 09 Jul, 2007 4 commits
    • Steven Whitehouse's avatar
      [GFS2] Simplify multiple glock aquisition · eaf5bd3c
      Steven Whitehouse authored
      There is a bug in the code which acquires multiple glocks where if the
      initial out-of-order attempt fails part way though we can land up trying
      to acquire the wrong number of glocks. This is part of the fix for red
      hat bz #239737. The other part of the bz doesn't apply to upstream
      kernels since it was fixed by:
      Since the out-of-order code doesn't appear to add anything to the
      performance of GFS2, this patch just removed it rather than trying to
      fix it. It should be much easier to see whats going on here now. In
      addition, we don't allocate any memory unless we are using a lot of
      glocks (which is a relatively uncommon case).
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Abhijith Das's avatar
      [GFS2] Fix deallocation issues · d93cfa98
      Abhijith Das authored
      There were two issues during deallocation of unlinked inodes. The
      first was relating to the use of a "try" lock which in the case of
      the inode lock wasn't trying hard enough to deallocate in all
      circumstances (now changed to a normal glock) and in the case of
      the iopen lock didn't wait for the demotion of the shared lock before
      attempting to get the exclusive lock, and thereby sometimes (timing dependent)
      not completing the deallocation when it should have done.
      The second issue related to the lack of a way to invalidate dcache entries
      on remote nodes (now fixed by this patch) which meant that unlinks were
      taking a long time to return disk space to the fs. By adding some code to
      invalidate the dcache entries across the cluster for unlinked inodes, that
      is now fixed.
      This patch was written jointly by Abhijith Das and Steven Whitehouse.
      Signed-off-by: default avatarAbhijith Das <adas@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Clean up inode number handling · dbb7cae2
      Steven Whitehouse authored
      This patch cleans up the inode number handling code. The main difference
      is that instead of looking up the inodes using a struct gfs2_inum_host
      we now use just the no_addr member of this structure. The tests relating
      to no_formal_ino can then be done by the calling code. This has
      advantages in that we want to do different things in different code
      paths if the no_formal_ino doesn't match. In the NFS patch we want to
      return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
      no_formal_ino doesn't match and thus we can withdraw in this case.
      In order to later fix bz #201012, we need to be able to look up an inode
      without knowing no_formal_ino, as the only information that is known to
      us is the on-disk location of the inode in question.
      This patch will also help us to fix bz #236099 at a later date by
      cleaning up a lot of the code in that area.
      There are no user visible changes as a result of this patch and there
      are no changes to the on-disk format either.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Robert Peterson's avatar
      [GFS2] Addendum patch 2 for gfs2_grow · cd81a4ba
      Robert Peterson authored
      This addendum patch 2 corrects three things:
      1. It fixes a stupid mistake in the previous addendum that broke gfs2.
         Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
      2. It fixes a problem that Dave Teigland pointed out regarding the
         external declarations in ops_address.h being in the wrong place.
      3. It recasts a couple more %llu printks to (unsigned long long)
         as requested by Steve Whitehouse.
      I would have loved to put this all in one revised patch, but there was
      a rush to get some patches for RHEL5.	Therefore, the previous patches
      were applied to the git tree "as is" and therefore, I'm posting another
      addendum.  Sorry.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  4. 01 May, 2007 8 commits
    • Steven Whitehouse's avatar
      [GFS2] Uncomment sprintf_symbol calling code · 37fde8ca
      Steven Whitehouse authored
      Now that the patch from -mm has gone upstream, we can uncomment the code
      in GFS2 which uses sprintf_symbol.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Robert Peterson <rpeterso@redhat.com>
    • Robert Peterson's avatar
      [GFS2] lockdump improvements · 5f882096
      Robert Peterson authored
      The patch below consists of the following changes (in code order):
      1. I fixed a minor compiler warning regarding the printing of
         a kernel symbol address.
      2. I implemented a suggestion from Dave Teigland that moves
         the debugfs information for gfs2 into a subdirectory so
         we can easily expand our use of debugfs in the future.
         The current code keeps the glock information in:
         With the patch, the new code keeps the glock information in:
         That will allow us to create more debugfs files in the future.
      3. This fixes a bug whereby a failed mount attempt causes the
         debugfs file to not be deleted.  Failed mount attempts should
         always clean up after themselves, including deleting the
         debugfs file and/or directory.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Robert Peterson's avatar
      [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) · 7a0079d9
      Robert Peterson authored
      This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
      (lock dump) seen at the "gfs2 summit".  This also fixes the bug that caused
      garbage to be printed by the "initialized at" field.  I apologize for the
      kludge, but that code will all be ripped out anyway when the official
      sprint_symbol function becomes available in the Linux kernel.  I also
      changed some formatting so that spaces are replaced by proper tabs.
      Signed-off-by: default avatarRobert Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Robert Peterson's avatar
      [GFS2] Red Hat bz 228540: owner references · 04b933f2
      Robert Peterson authored
      In Testing the previously posted and accepted patch for
      I uncovered some gfs2 badness.  It turns out that the current
      gfs2 code saves off a process pointer when glocks is taken
      in both the glock and glock holder structures.  Those
      structures will persist in memory long after the process has
      ended; pointers to poisoned memory.
      This problem isn't caused by the 228540 fix; the new capability
      introduced by the fix just uncovered the problem.
      I wrote this patch that avoids saving process pointers
      and instead saves off the process pid.  Rather than
      referencing the bad pointers, it now does process lookups.
      There is special code that makes the output nicer for
      printing holder information for processes that have ended.
      This patch also adds a stub for the new "sprint_symbol"
      function that exists in Andrew Morton's -mm patch set, but
      won't go into the base kernel until 2.6.22, since it adds
      functionality but doesn't fix a bug.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Fix a bug on i386 due to evaluation order · 420d2a10
      Steven Whitehouse authored
      Since gcc didn't evaluate the last two terms of the expression in
      glock.c:1881 as a constant expression, it resulted in an error on
      i386 due to the lack of a 64bit divide instruction. This adds some
      brackets to fix the problem.
      This was reported by Andrew Morton.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • Steven Whitehouse's avatar
      [GFS2] Fix bz 224480 and cleanup glock demotion code · 3b8249f6
      Steven Whitehouse authored
      This patch prevents the printing of a warning message in cases where
      the fs is functioning normally by handing off responsibility for
      unlinked, but still open inodes, to another node for eventual deallocation.
      Also, there is now an improved system for ensuring that such requests
      to other nodes do not get lost. The callback on the iopen lock is
      only ever called when i_nlink == 0 and when a node is unable to deallocate
      it due to it still being in use on another node. When a node receives
      the callback therefore, it knows that i_nlink must be zero, so we mark
      it as such (in gfs2_drop_inode) in order that it will then attempt
      deallocation of the inode itself.
      As an additional benefit, queuing a demote request no longer requires
      a memory allocation. This simplifies the code for dealing with gfs2_holders
      as it removes one special case.
      There are two new fields in struct gfs2_glock. gl_demote_state is the
      state which the remote node has requested and gl_demote_time is the
      time when the request came in. Both fields are only valid when the
      GLF_DEMOTE flag is set in gl_flags.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Josef Whiter's avatar
      [GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option · 5c7342d8
      Josef Whiter authored
      If you specify an invalid mount option when trying to mount a gfs2 filesystem,
      gfs2 will oops.  The attached patch resolves this problem.
      Signed-off-by: default avatarJosef Whiter <jwhiter@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Robert Peterson's avatar
      [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) · 7c52b166
      Robert Peterson authored
      The attached patch resolves bz 228540.  This adds the capability
      for gfs2 to dump gfs2 locks through the debugfs file system.
      This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
      gfs2 because all the ioctls were stripped out.  Please see the bugzilla
      for more history about the fix.  This patch is also attached to the bugzilla
      The patch is against Steve Whitehouse's latest nmw git tree kernel
      (2.6.21-rc1) and has been tested on system trin-10.
      Signed-off-by: default avatarRobert Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  5. 07 Mar, 2007 2 commits
  6. 05 Feb, 2007 9 commits
    • Steven Whitehouse's avatar
      [GFS2] Put back semaphore to avoid umount problem · 61be084e
      Steven Whitehouse authored
      Dave Teigland fixed this bug a while back, but I managed to mistakenly
      remove the semaphore during later development. It is required to avoid
      the list of inodes changing during an invalidate_inodes call. I have
      made it an rwsem since the read side will be taken frequently during
      normal filesystem operation. The write site will only happen during
      umount of the file system.
      Also the bug only triggers when using the DLM lock manager and only then
      under certain conditions as its timing related.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Fix typo in glock.c · d043e190
      Steven Whitehouse authored
      This is a one letter typo fix in glock.c, spotted by Rob Kenna.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Compile fix for glock.c · 90101c31
      Steven Whitehouse authored
      This one liner got missed from the previous patch.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Remove queue_empty() function · 12132933
      Steven Whitehouse authored
      This function is not longer required since we do not do recursive
      locking in the glock layer. As a result all its callers can be
      replaceed with list_empty() calls.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Tidy up glops calls · b5d32bea
      Steven Whitehouse authored
      This patch doesn't make any changes to the ordering of the various
      operations related to glocking, but it does tidy up the calls to the
      glops.c functions to make the structure more obvious.
      The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
      made static within glock.c since they are called by every set of glock
      operations. The xmote_th and drop_th glock operations are then made
      conditional upon those two routines existing and called from the
      previously mentioned functions in glock.c respectively.
      Also it can be seen that the go_sync operation isn't needed since it can
      easily be replaced by calls to xmote_bh and drop_bh respectively. This
      results in no longer (confusingly) calling back into routines in glock.c
      from glops.c and also reducing the glock operations by one member.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Remove local exclusive glock mode · 1c0f4872
      Steven Whitehouse authored
      Here is a patch for GFS2 to remove the local exclusive flag. In
      the places it was used, mutex's are always held earlier in the
      call path, so it appears redundant in the LM_ST_SHARED case.
      Also, the GFS2 holders were setting local exclusive in any case where
      the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
      code where the flag was tested have been replaced with tests for the
      lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
      same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
      as globally exclusive).
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Remove unused go_callback operation · 6bd9c8c2
      Steven Whitehouse authored
      This is never used, so we might as well remove it.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Remove the "greedy" function from glock.[ch] · e5dab552
      Steven Whitehouse authored
      The "greedy" code was an attempt to retain glocks for a minimum length
      of time when they relate to mmap()ed files. The current implementation
      of this feature is not, however, ideal in that it required allocating
      memory in order to do this and its overly complicated.
      It also misses the mark by ignoring the other I/O operations which are
      just as likely to suffer from the same problem. So the plan is to remove
      this now and then add the functionality back as part of the glock state
      machine at a later date (and thus take into account all the possible
      users of this feature)
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      [GFS2] Shrink gfs2_inode memory by half · fee852e3
      Steven Whitehouse authored
      Here is something I spotted (while looking for something entirely
      different) the other day.
      Rather than using a completion in each and every struct gfs2_holder,
      this removes it in favour of hashed wait queues, thus saving a
      considerable amount of memory both on the stack (where a number of
      gfs2_holder structures are allocated) and in particular in the
      gfs2_inode which has 8 gfs2_holder structures embedded within it.
      As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
      1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
      In actual practice we get a much better result than that since
      now that a gfs2_inode is under the 2048 byte barrier, we get two
      per 4k slab page effectively halving the amount of memory required
      to store gfs2_inodes.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>