• Benjamin Marzinski's avatar
    [GFS2] delay glock demote for a minimum hold time · c4f68a13
    Benjamin Marzinski authored
    When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
    a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
    call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
    too early, and is forced to call unmap_mapping_range(). This bumps the
    mapping->truncate_count sequence count, forcing do_no_page() to retry. This
    patch institutes a minimum glock hold time a tenth a second.  This insures
    that even in heavy contention cases, the node has enough time to get some
    useful work done before it gives up the glock.
    A second issue is that when gfs2_glock_dq() is called from within a page fault
    to demote a lock, and the associated page needs to be written out, it will
    try to acqire a lock on it, but it has already been locked at a higher level.
    This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
    issue. This is the same patch as Steve Whitehouse originally proposed to fix
    this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
    before it queues up the work on it.
    Signed-off-by: default avatarBenjamin E. Marzinski <bmarzins@redhat.com>
    Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
incore.h 16.1 KB