    IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled · d9b13c20
    Jianxin Xiong authored
    
    
    A hang has been observed while writing a file over NFSoRDMA. dmesg on
    the server contains messages like these:
    
    [  931.992501] svcrdma: Error -22 posting RDMA_READ
    [  952.076879] svcrdma: Error -22 posting RDMA_READ
    [  982.154127] svcrdma: Error -22 posting RDMA_READ
    [ 1012.235884] svcrdma: Error -22 posting RDMA_READ
    [ 1042.319194] svcrdma: Error -22 posting RDMA_READ
    
    Here is why:
    
    With the base memory management extension enabled, FRMR is used instead
    of FMR. The xprtrdma server issues each RDMA read request as the following
    bundle:
    
    (1) IB_WR_REG_MR, signaled;
    (2) IB_WR_RDMA_READ, signaled;
    (3) IB_WR_LOCAL_INV, signaled & fencing.
    
    All three requests are signaled. To generate a completion, the fast
    register work request is processed by the hfi1 send engine after it has
    been posted to the work queue, and the corresponding lkey is not valid
    until the request has been processed. However, the rdmavt driver
    validates the lkey at the time the RDMA read request is posted, so the
    post fails immediately with error -EINVAL (-22).
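
The failure mode described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the rdmavt or hfi1
code: every identifier prefixed with toy_ or TOY_ is hypothetical, the model
tracks a single lkey, and it does not model the invalidation performed by the
local invalidate request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in rdmavt or hfi1. */
enum toy_opcode { TOY_WR_REG_MR, TOY_WR_RDMA_READ, TOY_WR_LOCAL_INV };

struct toy_qp {
    bool lkey_valid;   /* single-MR model: is the lkey usable yet? */
    bool eager_reg;    /* false: flow before the patch, true: after it */
    size_t queued;     /* work requests accepted so far */
};

/* Post one work request; returns 0 or a negative errno value. */
static int toy_post_send(struct toy_qp *qp, enum toy_opcode op)
{
    assert(qp);

    switch (op) {
    case TOY_WR_REG_MR:
        /* Before the patch, registration waits for the send engine. */
        if (qp->eager_reg)
            qp->lkey_valid = true;
        break;
    case TOY_WR_RDMA_READ:
        /* The lkey is validated at post time. */
        if (!qp->lkey_valid)
            return -EINVAL;
        break;
    case TOY_WR_LOCAL_INV:
        /* Invalidation itself is not modeled here. */
        break;
    }
    qp->queued++;
    return 0;
}

/* Post the three-request bundle; stop at the first error. */
static int toy_post_bundle(struct toy_qp *qp)
{
    static const enum toy_opcode bundle[] = {
        TOY_WR_REG_MR, TOY_WR_RDMA_READ, TOY_WR_LOCAL_INV,
    };

    for (size_t i = 0; i < sizeof(bundle) / sizeof(bundle[0]); i++) {
        int ret = toy_post_send(qp, bundle[i]);
        if (ret)
            return ret;
    }
    return 0;
}
```

With eager_reg cleared, the second post in the bundle returns -EINVAL because
the registration has only been queued; with eager_reg set, all three requests
are accepted.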
    
    This patch changes the work flow of local operations (fast register and
    local invalidate) so that fast register work requests are always
    processed immediately to ensure that the corresponding lkey is valid
    when subsequent work requests are posted. Local invalidate requests are
    processed immediately if fencing is not required and no previous local
    invalidate request is pending.
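
The rule in the paragraph above can be written as a small predicate. This is
a minimal sketch that assumes a per-queue counter of pending local
invalidates; the toy_ names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a sketch of the rule, not the driver code. */
enum toy_local_op { TOY_FAST_REG, TOY_LOCAL_INV };

struct toy_sq_state {
    unsigned int pending_local_inv;  /* local invalidates still queued */
};

/* Should this local operation be processed when it is posted? */
static bool toy_process_at_post(const struct toy_sq_state *sq,
                                enum toy_local_op op, bool fence)
{
    assert(sq);

    /* Fast register requests are always processed immediately. */
    if (op == TOY_FAST_REG)
        return true;

    /* Local invalidate: only without a fence and with nothing pending. */
    return !fence && sq->pending_local_inv == 0;
}

/* Account for a local operation at post time.
 * Returns true if it was handled immediately, false if it was deferred. */
static bool toy_post_local_op(struct toy_sq_state *sq,
                              enum toy_local_op op, bool fence)
{
    bool now = toy_process_at_post(sq, op, fence);

    if (!now)
        sq->pending_local_inv++;  /* only a local invalidate can get here */
    return now;
}
```

In this model a fenced local invalidate is deferred, and any local invalidate
posted after it is deferred as well until the counter drops back to zero
(draining the counter is left out of the sketch).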
    
    To allow a completion to be generated for signaled local operations
    that have already been processed before being posted to the work queue,
    an internal send flag, RVT_SEND_COMPLETION_ONLY, is added. The hfi1 send
    engine checks this flag and, for such requests, only generates the
    completion.
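
How a completion-only flag keeps an operation from being executed twice can
be sketched as follows. The flag TOY_SEND_COMPLETION_ONLY merely stands in for
RVT_SEND_COMPLETION_ONLY; its value, the toy_wqe structure, and both functions
are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real flag definitions are not shown here. */
#define TOY_SEND_SIGNALED        (1u << 0)
#define TOY_SEND_COMPLETION_ONLY (1u << 1)

struct toy_wqe {
    unsigned int flags;
    unsigned int exec_count;  /* how many times the operation ran */
    bool completed;           /* has a completion been generated? */
};

/* Post time: optionally run the local operation right away and mark the
 * queue entry so that the send engine does not run it again. */
static void toy_post_local(struct toy_wqe *wqe, bool process_now)
{
    assert(wqe);

    if (process_now) {
        wqe->exec_count++;
        wqe->flags |= TOY_SEND_COMPLETION_ONLY;
    }
}

/* Send engine: run the operation unless it already ran at post time,
 * then generate a completion for signaled requests. */
static void toy_send_engine(struct toy_wqe *wqe)
{
    assert(wqe);

    if (!(wqe->flags & TOY_SEND_COMPLETION_ONLY))
        wqe->exec_count++;
    if (wqe->flags & TOY_SEND_SIGNALED)
        wqe->completed = true;
}
```

Either way the operation runs exactly once, and a signaled request still gets
its completion from the send engine, in queue order.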
    
    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>