    drm/i915: non-interruptible sleeps can't handle -EAGAIN · d6b2c790
    Daniel Vetter authored
    So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
    -EIO instead. Note that this isn't really an issue with
    interruptibility, but more that we have quite a few codepaths (mostly
    around kms stuff) that simply can't handle any errors, not
    even -EAGAIN. Instead of adding proper failure paths so that we could
    restart these ioctls, we've opted for the cheap way out of sleeping
    non-interruptibly. That works everywhere except when the gpu dies,
    which this patch fixes.
    
    So essentially interruptible == false means 'wait for the gpu or die
    trying'.
    
    This patch is a bit ugly because intel_ring_begin is all non-interruptible
    and hence only returns -EIO. But as the comment in there says,
    auditing all the callsites would be a pain.
    
    To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
    and intel_wait_ring_buffer. Also use the opportunity to clarify the
    different cases in i915_gem_check_wedge a bit with comments.
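
The remapping described above can be sketched in plain C. This is a minimal user-space illustration, not the driver code: `gpu_state_sketch`, `check_wedge_sketch` and `check_wedge_sketch_demo` are hypothetical names, and the `recovery_complete` branch is an assumption about what "the different cases" in i915_gem_check_wedge are. The `interruptible` flag is passed in as a parameter rather than read from shared state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real i915 structs. */
struct gpu_state_sketch {
	bool wedged;            /* a gpu hang has been detected */
	bool recovery_complete; /* assumed flag: the reset handler has finished */
};

/*
 * Sketch of the wedge check. The caller states whether it can cope with
 * a restart by passing `interruptible`.
 */
static int check_wedge_sketch(const struct gpu_state_sketch *gpu,
			      bool interruptible)
{
	if (!gpu->wedged)
		return 0;

	/* Non-interruptible callers can't restart, so never hand them -EAGAIN. */
	if (!interruptible)
		return -EIO;

	/* Assumed case: reset finished but gpu still wedged, so the reset failed. */
	if (gpu->recovery_complete)
		return -EIO;

	/* Reset still pending: an interruptible caller may retry later. */
	return -EAGAIN;
}

/* Usage example: the same hung gpu yields different errors per caller type. */
static void check_wedge_sketch_demo(void)
{
	struct gpu_state_sketch hung = { .wedged = true, .recovery_complete = false };

	assert(check_wedge_sketch(&hung, false) == -EIO);
	assert(check_wedge_sketch(&hung, true) == -EAGAIN);
}
```

Only the interruptible path can ever see -EAGAIN in this sketch, which matches the 'wait for the gpu or die trying' reading of interruptible == false.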
    
    v2: Don't access dev_priv->mm.interruptible from check_wedge - we
    might not hold dev->struct_mutex, making this racy. Instead pass
    interruptible in as a parameter. I've noticed this because I've hit a
    BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
    added in
    
    commit b4aca010
    Author: Ben Widawsky <ben@bwidawsk.net>
    Date:   Wed Apr 25 20:50:12 2012 -0700
    
        drm/i915: extract some common olr+wedge code
    
    although that commit is missing any justification for this. I guess
    it's just copy&paste, because the same commit adds the same BUG_ON
    check to check_olr, where it indeed makes sense.
    
    But in check_wedge everything we access is protected by other means,
    so this is superfluous. And because it now gets in the way (we add a
    new caller in __wait_seqno, which can be called without
    dev->struct_mutex) let's just remove it.
    
    v3: Group all the i915_gem_check_wedge refactoring into this patch, so
    that this patch here is all about not returning -EAGAIN to callsites
    that can't handle syscall restarting.
    
    v4: Add clarification of what interruptible == false means in our code,
    requested by Ben Widawsky.
    
    v5: Fix EAGAIN misspelling noticed by Chris Wilson.
    
    Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>