1. 11 Mar, 2010 1 commit
    • Benjamin Marzinski's avatar
      GFS2: Allow the number of committed revokes to temporarily be negative · 2e95e3f6
      Benjamin Marzinski authored
      GFS2 tracks the number of revokes and unrevokes that are part of committed
      transactions via sd_log_commited_revoke. It is possible for one process to add
      revokes during its transaction, while another process unrevokes them during its
      transaction. If the second process finishes its transaction first,
      sd_log_commited_revoke will be decremented by the number of unrevokes that the
      second process did, without first being incremented by the number of revokes
      the first process did. This is fine, since all started transactions must be
      completed before the journal can be flushed.  However, sd_log_commited_revoke
      is an unsigned integer, and log_refund() causes an assertion failure if it
      would go negative at the end of a transaction.  This patch makes
      sd_log_commited_revoke a signed integer and allows it to go negative.
      __gfs2_log_flush() still checks that it mataches the actual number of revokes.
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  2. 01 Mar, 2010 2 commits
    • Steven Whitehouse's avatar
      GFS2: Remove loopy umount code · c1184f8a
      Steven Whitehouse authored
      As a consequence of the previous patch, we can now remove the
      loop which used to be required due to the circular dependency
      between the inodes and glocks. Instead we can just invalidate
      the inodes, and then clear up any glocks which are left.
      Also we no longer need the rwsem since there is no longer any
      danger of the inode invalidation calling back into the glock
      code (and from there back into the inode code).
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse authored
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  3. 03 Feb, 2010 1 commit
  4. 03 Dec, 2009 2 commits
  5. 09 Sep, 2009 1 commit
  6. 27 Aug, 2009 1 commit
    • Steven Whitehouse's avatar
      GFS2: Remove no_formal_ino generating code · 8d8291ae
      Steven Whitehouse authored
      The inum structure used throughout GFS2 has two fields. One
      no_addr is the disk block number of the inode in question and
      is used everywhere as the inode number. The other, no_formal_ino,
      is used only as the generation number for NFS.
      Historically the no_formal_ino field was set using a complicated
      system of one global and one per-node file containing inode numbers
      in order to ensure that each no_formal_ino was unique. Also this
      code made no provision for what would happen when eventually the
      (64 bit) numbers ran out. Now I know that is pretty unlikely to
      happen given the large space of numbers, but it is possible
      The only guarantee required for no_formal_ino is that, for any
      single inode, the same number doesn't get reused too quickly.
      We already have a generation number which is kept in the inode
      and initialised from a counter in the resource group (almost
      no overhead, since we have to touch the resource group anyway
      in order to allocate an inode in the first place). Aside from
      ensuring that we never use the value 0 in the no_formal_ino
      field, we can use that counter directly.
      As a result of that change, we lose about 200 lines of code and
      also gain about 10 creates/sec on the postmark benchmark (on
      my test machine).
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  7. 24 Aug, 2009 1 commit
    • Bob Peterson's avatar
      GFS2: Add "-o errors=panic|withdraw" mount options · d34843d0
      Bob Peterson authored
      This patch adds "-o errors=panic" and "-o errors=withdraw" to the
      gfs2 mount options.  The "errors=withdraw" option is today's
      current behaviour, meaning to withdraw from the file system if a
      non-serious gfs2 error occurs.  The new "errors=panic" option
      tells gfs2 to force a kernel panic if a non-serious gfs2 file
      system error occurs.  This may be useful, for example, where
      fabric-level fencing is used that has no way to reboot (such as
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  8. 30 Jul, 2009 1 commit
  9. 21 May, 2009 2 commits
    • Steven Whitehouse's avatar
      GFS2: Be more aggressive in reclaiming unlinked inodes · 1ce97e56
      Steven Whitehouse authored
      This patch increases the frequency with which gfs2 looks
      for unlinked, but still allocated inodes. Its the equivalent
      operation to ext3's orphan list, but done with bitmaps in
      the resource groups.
      This also fixes a bug where a field in the rgrp was too small.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Add a rgrp bitmap full flag · 60a0b8f9
      Steven Whitehouse authored
      During block allocation, it is useful to know if sections of disk
      are full on a finer grained basis than a single resource group.
      This can make a performance difference when resource groups have
      larger numbers of bitmap blocks, since we no longer have to search
      them all block by block in each individual bitmap.
      The full flag is set on a per-bitmap basis when it has been
      searched and found to have no free space. It is then skipped in
      subsequent searches until the flag is reset. The resetting
      occurs if we have to drop the glock on the resource group for any
      reason, or if we deallocate some blocks within that resource
      group and thus free up some space.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  10. 20 May, 2009 1 commit
    • Steven Whitehouse's avatar
      GFS2: Improve resource group error handling · 09010978
      Steven Whitehouse authored
      This patch improves the error handling in the case where we
      discover that the summary information in the resource group
      doesn't match the bitmap information while in the process of
      allocating blocks. Originally this resulted in a kernel bug,
      but this patch changes that so that we return -EIO and print
      some messages explaining what went wrong, and how to fix it.
      We also remember locally not to try and allocate from the
      same rgrp again, so that a subsequent allocation in a
      different rgrp should succeed.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  11. 19 May, 2009 1 commit
    • Steven Whitehouse's avatar
      GFS2: Umount recovery race fix · fe64d517
      Steven Whitehouse authored
      This patch fixes a race condition where we can receive recovery
      requests part way through processing a umount. This was causing
      problems since the recovery thread had already gone away.
      Looking in more detail at the recovery code, it was really trying
      to implement a slight variation on a work queue, and that happens to
      align nicely with the recently introduced slow-work subsystem. As a
      result I've updated the code to use slow-work, rather than its own home
      grown variety of work queue.
      When using the wait_on_bit() function, I noticed that the wait function
      that was supplied as an argument was appearing in the WCHAN field, so
      I've updated the function names in order to produce more meaningful
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  12. 13 May, 2009 1 commit
    • Steven Whitehouse's avatar
      GFS2: Add commit= mount option · 48c2b613
      Steven Whitehouse authored
      It has always been possible to adjust the gfs2 log commit
      interval, but only from the sysfs interface. This adds a
      mount option, commit=<nn>, which will be familar to ext3
      The sysfs interface continues to be available as well, although
      this might be removed in the future.
      Also this patch cleans up some duplicated structures in the GFS2
      sysfs code.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  13. 24 Mar, 2009 7 commits
    • Benjamin Marzinski's avatar
      GFS2: Fix locking bug in failed shared to exclusive conversion · 02ffad08
      Benjamin Marzinski authored
      After calling out to the dlm, GFS2 sets the new state of a glock to
      gl_target in gdlm_ast().  However, gl_target is not always the lock
      state that was requested. If a conversion from shared to exclusive
      fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead
      of gl->gl_target, so that it can reacquire the lock in exlusive the next
      time around.  In this case, setting the lock to gl_target in gdlm_ast()
      will make GFS2 think that it has the glock in exclusive mode, when
      really, it doesn't have the glock locked at all.  This patch adds a new
      field to the gfs2_glock structure, gl_req, to track the mode that was
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Expose UUID via sysfs/uevent · 02e3cc70
      Steven Whitehouse authored
      Since we have a UUID, we ought to expose it to the user via sysfs
      and uevents. We already have the fs name in both of these places
      (a combination of the lock proto and lock table name) so if we add
      the UUID as well, we have a full set.
      For older filesystems (i.e. those created before mkfs.gfs2 was writing
      UUIDs by default) the sysfs file will appear zero length, and no UUID
      env var will be added to the uevents.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Support generation of discard requests · f15ab561
      Steven Whitehouse authored
      This patch allows GFS2 to generate discard requests for blocks which are
      no longer useful to the filesystem (i.e. those which have been freed as
      the result of an unlink operation). The requests are generated at the
      time which those blocks become available for reuse in the filesystem.
      In order to use this new feature, you have to specify the "discard"
      mount option. The code coalesces adjacent blocks into a single extent
      when generating the discard requests, thus generating the minimum
      If an error occurs when the request has been sent to the block device,
      then it will print a message and turn off the requests for that
      filesystem. If the problem is temporary, then you can use remount to
      turn the option back on again. There is also a nodiscard mount option
      so that you can use remount to turn discard requests off, if required.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Remove unused field from glock · ac2425e7
      Steven Whitehouse authored
      The time stamp field is unused in the glock now that we are
      using a shrinker, so that we can remove it and save sizeof(unsigned long)
      bytes in each glock.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse authored
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Remove "double" locking in quota · 22077f57
      Steven Whitehouse authored
      We only really need a single spin lock for the quota data, so
      lets just use the lru lock for now.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
    • Abhijith Das's avatar
      GFS2: change gfs2_quota_scan into a shrinker · 0a7ab79c
      Abhijith Das authored
      Deallocation of gfs2_quota_data objects now happens on-demand through a
      shrinker instead of routinely deallocating through the quotad daemon.
      Signed-off-by: default avatarAbhijith Das <adas@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  14. 05 Jan, 2009 11 commits
    • Steven Whitehouse's avatar
      GFS2: Kill two daemons with one patch · 97cc1025
      Steven Whitehouse authored
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time based criteria for shrinking glocks, they are kept
      until such time as the VM asks for more memory and then we
      demote just as many glocks as required.
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Fix "truncate in progress" hang · 813e0c46
      Steven Whitehouse authored
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      Also, as a result of this change, it opens the way for a future patch to
      make the reading of the inode's information asynchronous with respect to
      the glock workqueue, which is another improvement that has been on the list
      for some time now.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Clean up & move gfs2_quotad · 37b2c837
      Steven Whitehouse authored
      This patch is a clean up of gfs2_quotad prior to giving it an
      extra job to do in addition to the current portfolio of updating
      the quota and statfs information from time to time.
      As a result it has been moved into quota.c allowing one of the
      functions it calls to be made static. Also the clean up allows
      the two existing functions to have separate timeouts and also
      to coexist with its future role of dealing with the "truncate in
      progress" inode flag.
      The (pointless) setting of gfs2_quotad_secs is removed since we
      arrange to only wake up quotad when one of the two timers expires.
      In addition the struct gfs2_quota_data is moved into a slab cache,
      mainly for easier debugging. It should also be possible to use
      a shrinker in the future, rather than the current scheme of scanning
      the quota data entries from time to time.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Banish struct gfs2_rgrpd_host · 73f74948
      Steven Whitehouse authored
      This patch moves the final field so that we can get rid
      of struct gfs2_rgrpd_host, as promised some time ago. Also
      by rearranging the fields slightly, we are able to reduce
      the size of the gfs2_rgrpd structure at the same time.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd · cfc8b549
      Steven Whitehouse authored
      The second of three fields which need to move, in order
      to remove the struct gfs2_rgrpd_host.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Move rg_igeneration into struct gfs2_rgrpd · d8b71f73
      Steven Whitehouse authored
      This moves one of the fields of struct gfs2_rgrpd_host into
      the struct gfs2_rgrpd with the eventual aim of removing
      the struct rgrpd_host completely.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Banish struct gfs2_dinode_host · 383f01fb
      Steven Whitehouse authored
      The final field in gfs2_dinode_host was the i_flags field. Thats
      renamed to i_diskflags in order to avoid confusion with the existing
      inode flags, and moved into the inode proper at a suitable location
      to avoid creating a "hole".
      At that point struct gfs2_dinode_host is no longer needed and as
      promised (quite some time ago!) it can now be removed completely.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize · c9e98886
      Steven Whitehouse authored
      This patch moved the i_size field from the gfs2_dinode_host and
      following the ext3 convention renames it i_disksize.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Move di_eattr into "proper" inode · 3767ac21
      Steven Whitehouse authored
      This moves the di_eattr field out of gfs2_inode_host and
      into the inode proper.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Move "entries" into "proper" inode · ad6203f2
      Steven Whitehouse authored
      This moves the directory entry count into the proper inode.
      Potentially we could get this to share the space used by
      something else in the future, but this is one more step
      on the way to removing the gfs2_dinode_host structure.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: Move generation number into "proper" part of inode · bcf0b5b3
      Steven Whitehouse authored
      This moves the generation number from the gfs2_dinode_host
      into the gfs2_inode structure. Eventually the plan is to get
      rid of the gfs2_dinode_host structure completely.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  15. 26 Sep, 2008 1 commit
    • Steven Whitehouse's avatar
      GFS2: Support for I/O barriers · 254db57f
      Steven Whitehouse authored
      This patch adds barrier support to GFS2. There is not a lot of change
      really... we just add the barrier flag when we write journal header
      blocks. If the underlying device refuses to support them, we fall back
      to the previous way of doing things (wait for the I/O and hope) since
      there is nothing else we can do. There is no user configuration,
      barriers will always be on unless the device refuses to support them.
      This seems a reasonable solution to me since this is a correctness
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
  16. 18 Sep, 2008 2 commits
    • Steven Whitehouse's avatar
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse authored
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
    • Steven Whitehouse's avatar
      GFS2: The war on bloat · 37ec89e8
      Steven Whitehouse authored
      The following patch shrinks the gfs2_args structure which is embedded in
      every GFS2 superblock. It cuts down the size of the options to a single
      unsigned int (the 13 bits of bitfields will be rounded up to that size
      by the compiler) from the current 11 unsigned ints. So on x86 thats 44
      bytes shrinking to 4 bytes, in each and every GFS2 superblock.
      Signed-off-by: default avatarSteven Whitehouse <swhitho@redhat.com>
  17. 13 Aug, 2008 1 commit
    • Steven Whitehouse's avatar
      GFS2: Fix metafs mounts · 9b8df98f
      Steven Whitehouse authored
      This patch is intended to fix the issues reported in bz #457798. Instead
      of having the metafs as a separate filesystem, it becomes a second root
      of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
      is still possible (for backwards compatibility purposes) to mount it as
      type gfs2meta. A new mount flag "meta" is introduced so that its possible
      to tell the two cases apart in /proc/mounts.
      As a result it becomes possible to mount type gfs2 with -o meta and
      get the same result as mounting type gfs2meta. So it is possible to
      mount just the metafs on its own. Currently if you do this, its then
      impossible to mount the "normal" root of the gfs2 filesystem without
      first unmounting the metafs root. I'm not sure if thats a feature or
      a bug :-)
      Either way, this is a great improvement on the previous scheme and I've
      verified that it works ok with bind mounts on both the "normal" root
      and the metafs root in various combinations.
      There were also a bunch of functions in super.c which didn't belong there,
      so this moves them into ops_fstype.c where they can be static. Hopefully
      the mount/umount sequence is now more obvious as a result.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Alexander Viro <aviro@redhat.com>
  18. 10 Jul, 2008 2 commits
  19. 27 Jun, 2008 1 commit
    • Steven Whitehouse's avatar
      [GFS2] Clean up the glock core · 6802e340
      Steven Whitehouse authored
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shortly, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>