    md-cluster: change resync lock from asynchronous to synchronous · 41a9a0dc
    Guoqing Jiang authored
    
    
    If multiple nodes attempt to resync at the same time,
    they need to be serialized so they don't duplicate effort. This
    serialization is done by locking the 'resync' DLM lock.
    
    Currently if a node cannot get the lock immediately it doesn't
    request notification when the lock becomes available (i.e.
    DLM_LKF_NOQUEUE is set), so it may not reliably find out when it
    is safe to try again.
    
    Rather than trying to arrange an async wake-up when the lock
    becomes available, switch to using synchronous locking - this is
    a lot easier to think about.  As it is not permitted to block in
    the 'raid1d' thread, move the locking to the resync thread.  So
    the resync thread is forked immediately, but it blocks until the
    resync lock is available. Once the lock is acquired, it checks
    again whether any resync action is still needed.
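    
    The difference between the two approaches can be sketched in
    userspace C. This is only an analogy under stated assumptions: a
    pthread mutex stands in for the 'resync' DLM lock, and the names
    resync_lock, resync_needed, resyncs_done, try_resync_nonblocking()
    and resync_thread() are hypothetical, not taken from the md-cluster
    code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cluster-wide 'resync' DLM lock. */
static pthread_mutex_t resync_lock = PTHREAD_MUTEX_INITIALIZER;
static bool resync_needed = true;  /* protected by resync_lock */
static int resyncs_done;           /* protected by resync_lock */

/*
 * Old style, analogous to requesting the lock with DLM_LKF_NOQUEUE:
 * try once and give up if the lock is busy. Nothing tells the caller
 * when the lock becomes free, so it may never retry.
 */
static bool try_resync_nonblocking(void)
{
    if (pthread_mutex_trylock(&resync_lock) != 0)
        return false;              /* busy, and no wake-up is queued */
    if (resync_needed) {
        resync_needed = false;
        resyncs_done++;
    }
    pthread_mutex_unlock(&resync_lock);
    return true;
}

/*
 * New style: the worker thread is started immediately and blocks
 * until the lock is available. After acquiring the lock it checks
 * again whether any work is left, because another holder may have
 * completed the resync in the meantime.
 */
static void *resync_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&resync_lock);   /* may sleep; fine in a worker */
    if (resync_needed) {
        resync_needed = false;
        resyncs_done++;
    }
    pthread_mutex_unlock(&resync_lock);
    return NULL;
}
```

    In this sketch, two workers started while the lock is held both
    proceed once it is released, but only the first one to acquire it
    performs the resync; the second finds nothing left to do.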
    
    A particular symptom of the current problem is that a node can
    get stuck with "resync=pending" indefinitely.
    
    Reviewed-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
    Signed-off-by: Shaohua Li <shli@fb.com>