1. 01 May, 2007 11 commits
    • [GFS2] Red Hat bz 228540: owner references · 04b933f2
      Robert Peterson authored
      In testing the previously posted and accepted patch for
      https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
      I uncovered some gfs2 badness.  It turns out that the current
      gfs2 code saves off a process pointer when a glock is taken,
      in both the glock and glock holder structures.  Those
      structures persist in memory long after the process has
      ended, leaving pointers to poisoned memory.
      
      This problem isn't caused by the 228540 fix; the new capability
      introduced by the fix just uncovered the problem.
      
      I wrote this patch that avoids saving process pointers
      and instead saves off the process pid.  Rather than
      referencing the bad pointers, it now does process lookups.
      There is special code that makes the output nicer for
      printing holder information for processes that have ended.
      
      This patch also adds a stub for the new "sprint_symbol"
      function that exists in Andrew Morton's -mm patch set, but
      won't go into the base kernel until 2.6.22, since it adds
      functionality but doesn't fix a bug.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
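The pid-instead-of-pointer idea described in this commit can be sketched in userland C. This is only an illustrative model: `struct task`, `struct holder`, `lookup_task`, `describe_owner` and the "(ended)" output format are hypothetical names and choices, not the actual gfs2 code, and the table stands in for the kernel's process lookup.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for the kernel's process table. */
struct task {
    pid_t pid;
    char  comm[16];
    int   alive;
};

/* The holder records a pid, never a pointer into process memory. */
struct holder {
    pid_t owner_pid;
};

/* Look the owner up on demand; returns NULL if the process has ended. */
static const struct task *lookup_task(const struct task *tbl, size_t n, pid_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].alive && tbl[i].pid == pid)
            return &tbl[i];
    return NULL;
}

/* Format holder information, with a special case for ended processes. */
static void describe_owner(const struct holder *gh, const struct task *tbl,
                           size_t n, char *buf, size_t len)
{
    const struct task *t = lookup_task(tbl, n, gh->owner_pid);

    if (t)
        snprintf(buf, len, "owner = %d (%s)", (int)gh->owner_pid, t->comm);
    else
        snprintf(buf, len, "owner = %d (ended)", (int)gh->owner_pid);
}
```

Because the holder only stores a number, a stale holder can at worst name a process that no longer exists; it can never dereference freed memory.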
    • [GFS2] flush the log if a transaction can't allocate space · 172e045a
      Benjamin Marzinski authored
      
      
      This is a fix for bz #208514. When GFS2 frees up space, the freed blocks
      aren't available for reuse until the resource group is successfully written
      to the on-disk journal. So in rare cases, GFS2 operations will fail, saying
      that the filesystem is out of space, when in reality they are just waiting for
      a log flush. For instance, on a 1 GB filesystem, if I continually write 10 MB
      to a file and then truncate it, after a hundred iterations the write will
      fail with -ENOSPC, even though the filesystem is just 1% full.
      
      The attached patch calls a log flush in these cases.  I tested this patch
      fairly heavily to check if there were any locking issues that I missed, and
      it seems to work just fine. Also, this patch only does the log flush if
      get_local_rgrp makes a complete loop of resource groups without skipping
      any due to locking issues. The code would be slightly simpler if it just always
      did the log flush after the first failed pass, and then you would only ever have
      to go through the loop twice, instead of up to three times. However, I guessed
      that failing to find a rg simply due to locking issues would be common enough
      to skip the log flush in that case, but I'm not certain that this is the right
      way to go. Either way, I don't suppose this code will be hit all that often.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
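The retry policy described in this commit (flush the log only after a complete pass that skipped nothing, giving at most three passes) can be modelled roughly as follows. This is a simplified sketch, not the real get_local_rgrp: `struct rgrp`, `log_flush` and `find_rgrp` are hypothetical, and a busy lock is assumed to be skipped on the first pass only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified resource group (not struct gfs2_rgrpd). */
struct rgrp {
    unsigned free_blocks;     /* usable right now */
    unsigned pending_blocks;  /* freed, but not yet written to the journal */
    bool     lock_busy;       /* a try-lock on this group would fail */
};

/* Stand-in for a log flush: pending blocks become reusable. */
static void log_flush(struct rgrp *rgs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        rgs[i].free_blocks += rgs[i].pending_blocks;
        rgs[i].pending_blocks = 0;
    }
}

/* Returns the index of a group with enough space, or -1 (models -ENOSPC). */
static int find_rgrp(struct rgrp *rgs, size_t n, unsigned want, unsigned *flushes)
{
    bool flushed = false;

    for (int pass = 0; pass < 3; pass++) {
        bool skipped = false;

        for (size_t i = 0; i < n; i++) {
            /* Assumption: only the first pass skips busy groups. */
            if (pass == 0 && rgs[i].lock_busy) {
                skipped = true;
                continue;
            }
            if (rgs[i].free_blocks >= want)
                return (int)i;
        }
        if (skipped)
            continue;   /* failure may be due to locking: retry, no flush */
        if (flushed)
            break;      /* full pass after a flush still failed */
        log_flush(rgs, n);
        flushed = true;
        (*flushes)++;
    }
    return -1;
}
```

In this model the worst case is three passes: one that skips busy groups, one complete pass that fails, and one more after the flush.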
    • [GFS2] Fix log entry list corruption · 68835625
      Benjamin Marzinski authored
      
      
      When glock_lo_add and rg_lo_add attempt to add an element to the log, they
      check to see if it has already been added before locking the log. If another
      process adds that element to the log in this window between the check and
      locking the log, the element will be added to the list twice. This causes
      the log element list to become corrupted in such a way that the log element
      can never be successfully removed from the list. This patch pulls the
      list_empty() check inside the log lock, to remove this window.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
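The fix described in this commit, moving the emptiness check inside the lock, can be illustrated with a small userland sketch. The list helpers imitate the kernel's list_head, and `struct log` and `log_add_once` are hypothetical names; a pthread mutex stands in for the gfs2 log lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h->prev = h;
}

/* An element that points at itself is not on any list. */
static bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_front(struct list_head *item, struct list_head *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

struct log {
    pthread_mutex_t  lock;
    struct list_head elements;
    unsigned         count;
};

/* Adds le to the log at most once; returns true if it was added. */
static bool log_add_once(struct log *lg, struct list_head *le)
{
    bool added = false;

    pthread_mutex_lock(&lg->lock);
    /* Check under the lock, so no other thread can add le in between. */
    if (list_is_empty(le)) {
        list_add_front(le, &lg->elements);
        lg->count++;
        added = true;
    }
    pthread_mutex_unlock(&lg->lock);
    return added;
}
```

With the check outside the lock, two threads could both see an empty element and both add it, which is the double insertion the commit message describes.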
    • [GFS2] Speed up lock_dlm's locking (move sprintf) · f35ac346
      Steven Whitehouse authored
      
      
      The following patch speeds up lock_dlm's locking by moving the sprintf
      out from the lock acquisition path and into the lock creation path. This
      reduces the amount of CPU time used in acquiring locks by a fair amount.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: David Teigland <teigland@redhat.com>
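As a rough illustration of the optimisation in this commit, the sketch below formats a lock name once at creation and reuses the cached string on every acquisition. The struct, the function names and the name format are assumptions made for the example, not lock_dlm's actual definitions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define LOCK_NAME_LEN 24   /* hypothetical: 8 + 16 hex columns */

struct my_lock {
    uint32_t type;
    uint64_t number;
    char     name[LOCK_NAME_LEN + 1];   /* formatted once, at creation */
};

/* Creation path: pay for the formatting here, exactly once. */
static void my_lock_init(struct my_lock *lk, uint32_t type, uint64_t number)
{
    lk->type = type;
    lk->number = number;
    snprintf(lk->name, sizeof lk->name, "%8" PRIx32 "%16" PRIx64, type, number);
}

/* Acquisition path: no formatting, just reuse the cached name. */
static const char *my_lock_acquire_name(const struct my_lock *lk)
{
    return lk->name;
}
```

The trade-off is a few extra bytes per lock in exchange for removing string formatting from a path that runs far more often than lock creation.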
    • [DLM] Don't delete misc device if lockspace removal fails · 254da030
      Patrick Caulfield authored
      
      
      Currently if the lockspace removal fails the misc device associated with a
      lockspace is left deleted. After that there is no way to access the orphaned
      lockspace from userland.
      
      This patch recreates the misc device if dlm_release_lockspace fails. I
      believe this is better than attempting to remove the lockspace first because
      that leaves an unattached device lying around. The potential gap in which there
      is no access to the lockspace between removing the misc device and recreating it
      is acceptable ... after all the application is trying to remove it, and only new
      users of the lockspace will be affected.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
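The ordering chosen in this commit (remove the device first, then recreate it if the release fails) is a plain rollback pattern. The model below is hypothetical: `struct lockspace_model` and both functions are invented for illustration, and a `busy` flag stands in for whatever makes the real dlm_release_lockspace fail.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a lockspace and its userland misc device. */
struct lockspace_model {
    bool device_registered;
    bool busy;               /* release fails while this is set */
};

/* Stand-in for the release call; fails with -EBUSY when in use. */
static int release_lockspace_model(struct lockspace_model *ls)
{
    return ls->busy ? -EBUSY : 0;
}

static int remove_lockspace_model(struct lockspace_model *ls)
{
    int err;

    ls->device_registered = false;      /* remove the device first */
    err = release_lockspace_model(ls);
    if (err)
        ls->device_registered = true;   /* roll back: keep it reachable */
    return err;
}
```

On failure the lockspace ends up reachable from userland again, at the cost of the short window without a device that the commit message accepts.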
    • [GFS2] Fix a bug on i386 due to evaluation order · 420d2a10
      Steven Whitehouse authored
      
      
      Since gcc didn't evaluate the last two terms of the expression in
      glock.c:1881 as a constant expression, it resulted in an error on
      i386 due to the lack of a 64bit divide instruction. This adds some
      brackets to fix the problem.
      
      This was reported by Andrew Morton.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
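The expression at glock.c:1881 is not reproduced here, so the following is only a generic sketch of the evaluation-order issue, with made-up constants. Since `*` and `/` associate left to right, an unbracketed `t * A / B` divides a 64-bit run-time value, whereas bracketing the two constants lets the compiler fold them and leaves only a multiply.

```c
#include <stdint.h>

/* Hypothetical constants; the real terms in glock.c are not shown above. */
#define TICKS_PER_SEC 1000u
#define SCALE         250u

/* Parsed as (t * TICKS_PER_SEC) / SCALE: a 64-bit division at run time. */
static uint64_t scaled_unbracketed(uint64_t t)
{
    return t * TICKS_PER_SEC / SCALE;
}

/* The bracketed term is a constant expression; only a multiply remains. */
static uint64_t scaled_bracketed(uint64_t t)
{
    return t * (TICKS_PER_SEC / SCALE);
}
```

The two forms agree only when the constants divide exactly and the product does not overflow, so brackets of this kind have to be checked against the intended arithmetic.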
    • [GFS2] Fix bz 224480 and cleanup glock demotion code · 3b8249f6
      Steven Whitehouse authored
      
      
      This patch prevents the printing of a warning message in cases where
      the fs is functioning normally by handing off responsibility for
      unlinked but still open inodes to another node for eventual deallocation.
      Also, there is now an improved system for ensuring that such requests
      to other nodes do not get lost. The callback on the iopen lock is
      only ever called when i_nlink == 0 and when a node is unable to deallocate
      it due to it still being in use on another node. When a node receives
      the callback therefore, it knows that i_nlink must be zero, so we mark
      it as such (in gfs2_drop_inode) in order that it will then attempt
      deallocation of the inode itself.
      
      As an additional benefit, queuing a demote request no longer requires
      a memory allocation. This simplifies the code for dealing with gfs2_holders
      as it removes one special case.
      
      There are two new fields in struct gfs2_glock. gl_demote_state is the
      state which the remote node has requested and gl_demote_time is the
      time when the request came in. Both fields are only valid when the
      GLF_DEMOTE flag is set in gl_flags.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
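The two new fields described in this commit follow a "valid only while a flag is set" pattern, which is why queuing a demote request needs no allocation. The sketch below is a single-threaded model under stated assumptions: `struct glock_model`, `GLF_DEMOTE_MODEL` and both functions are hypothetical, and the real code's locking and bit operations are left out.

```c
#include <stdbool.h>

/* Hypothetical bit; the real GLF_DEMOTE value is not given above. */
#define GLF_DEMOTE_MODEL (1UL << 0)

struct glock_model {
    unsigned long gl_flags;
    unsigned int  gl_demote_state;  /* valid only while the flag is set */
    unsigned long gl_demote_time;   /* valid only while the flag is set */
};

/* Record a remote demote request in place: no memory allocation needed. */
static void queue_demote(struct glock_model *gl, unsigned int state,
                         unsigned long now)
{
    gl->gl_demote_state = state;
    gl->gl_demote_time = now;
    gl->gl_flags |= GLF_DEMOTE_MODEL;   /* set the flag last */
}

/* Consume a pending request, if any; the fields are ignored otherwise. */
static bool take_demote(struct glock_model *gl, unsigned int *state)
{
    if (!(gl->gl_flags & GLF_DEMOTE_MODEL))
        return false;
    *state = gl->gl_demote_state;
    gl->gl_flags &= ~GLF_DEMOTE_MODEL;
    return true;
}
```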
    • [GFS2] Fix bz 231380, unlock page before dequeing glocks in gfs2_commit_write · 1de91390
      Josef Whiter authored
      
      
      If we are writing a file, and in the middle of writing the file
      another node attempts to get a shared lock on that file (by doing a du for
      example), the process doing the writing will hang waiting on lock_page.  The
      reason is that when we have waiters on an exclusive glock, we will go
      through and flush out all dirty pages associated with that inode and release the
      lock.  The problem is that when we flush the dirty pages, we could hit a page
      that we have locked during the generic_file_buffered_write part of this
      operation.  This patch unlocks the page before we go to dequeue the lock and
      locks it immediately afterwards, since generic_file_buffered_write needs the page
      locked when the commit_write is completed.  This patch resolves the problem;
      however, if somebody sees a better way to do this please don't hesitate to yell.
      Signed-off-by: Josef Whiter <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
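The self-deadlock described in this commit, and the unlock/relock workaround, can be shown with a deliberately tiny model. Everything here is hypothetical: a boolean stands in for the page lock, `flush_page_model` stands in for the flush triggered by dequeuing the glock, and a `false` return marks the point where the real code would block in lock_page.

```c
#include <stdbool.h>

/* Hypothetical page: a flag models the page lock. */
struct page_model {
    bool locked;
    bool dirty;
};

/* Flushing must take the page lock; false means "would hang here". */
static bool flush_page_model(struct page_model *pg)
{
    if (pg->locked)
        return false;
    pg->locked = true;
    pg->dirty = false;      /* write the page back */
    pg->locked = false;
    return true;
}

/* The caller holds the page lock on entry and expects it held on return. */
static bool commit_write_model(struct page_model *pg, bool unlock_first)
{
    bool flushed;

    if (unlock_first)
        pg->locked = false;             /* unlock before dequeuing */
    flushed = flush_page_model(pg);     /* stands in for the glock dequeue */
    if (unlock_first)
        pg->locked = true;              /* relock immediately afterwards */
    return flushed;
}
```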
    • [DLM] Fix uninitialised variable in receiving · 89adc934
      Patrick Caulfield authored
      
      
      The length of the second element of the kvec array was not initialised before
      being added to the first one. This could cause invalid lengths to be passed to
      kernel_recvmsg.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
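The class of bug fixed in this commit can be illustrated in userland with `struct iovec`, the userspace counterpart of the kernel's kvec. The scenario is an assumption made for the example (a receive into a ring buffer that may wrap), and `ring_iov` is a hypothetical helper; the point is only that the second element's length is always set before the two lengths are summed.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Describe up to free_len bytes of free space in a ring buffer of `size`
 * bytes whose write position is `head`. Returns the total length.
 */
static size_t ring_iov(char *buf, size_t size, size_t head, size_t free_len,
                       struct iovec iov[2], int *niov)
{
    size_t first = size - head;         /* room before the buffer wraps */

    if (first > free_len)
        first = free_len;

    iov[0].iov_base = buf + head;
    iov[0].iov_len = first;

    /* Always initialise the second element, even when it is unused. */
    iov[1].iov_base = buf;
    iov[1].iov_len = free_len - first;

    *niov = iov[1].iov_len ? 2 : 1;
    return iov[0].iov_len + iov[1].iov_len;
}
```

If `iov[1].iov_len` were left unset in the non-wrapping case, the sum would include whatever happened to be on the stack, which is how an invalid length could reach the receive call.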
    • [GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option · 5c7342d8
      Josef Whiter authored
      
      
      If you specify an invalid mount option when trying to mount a gfs2 filesystem,
      gfs2 will oops.  The attached patch resolves this problem.
      Signed-off-by: Josef Whiter <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) · 7c52b166
      Robert Peterson authored
      
      
      The attached patch resolves bz 228540.  This adds the capability
      for gfs2 to dump gfs2 locks through the debugfs file system.
      This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
      gfs2 because all the ioctls were stripped out.  Please see the bugzilla
      for more history about the fix.  This patch is also attached to the bugzilla
      record.
      
      The patch is against Steve Whitehouse's latest nmw git tree kernel
      (2.6.21-rc1) and has been tested on system trin-10.
      Signed-off-by: Robert Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  2. 30 Apr, 2007 29 commits