    drm/i915: implement inline clflush for pwrite · 58642885
    Daniel Vetter authored
    
    
    In micro-benchmarking of the usual pwrite use-pattern of alternating
    pwrites with gtt domain reads from the gpu, this yields around 30%
    improvement of pwrite throughput across all buffer sizes. The trick is
    that we can avoid clflushing cachelines that we will overwrite
    completely anyway.
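
To illustrate the trick, here is a minimal sketch of the alignment test that decides whether a write leaves any cacheline only partly overwritten. It is not the code from this patch: the helper name `partial_cacheline_write` and the explicit `cl_size` parameter are assumptions made for the example, and the cacheline size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not taken from the patch).
 * Returns true if a write of `len` bytes starting at `offset` leaves at
 * least one cacheline only partly overwritten. Only such a write needs
 * the affected lines flushed beforehand; fully overwritten lines do not.
 */
static bool partial_cacheline_write(size_t offset, size_t len, size_t cl_size)
{
    /* The mask trick below only works for power-of-two line sizes. */
    assert(cl_size != 0 && (cl_size & (cl_size - 1)) == 0);

    /* Any low bit set in offset or len means a misaligned start or end. */
    return ((offset | len) & (cl_size - 1)) != 0;
}
```

With an assumed 64-byte cacheline, a write of 128 bytes at offset 0 covers two whole lines and needs no flush before the write, while a write of 64 bytes at offset 32 straddles two lines and does.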
    
    Furthermore, for partial pwrites it gives a proportional speedup on
    top of the 30%, because we only clflush the part of the buffer we're
    actually writing.
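
The proportional saving can be made concrete by counting the cachelines a write touches. The sketch below is an illustration under stated assumptions, not code from this patch: `cachelines_touched` is a hypothetical helper, and the cacheline size is passed in explicitly and assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not taken from the patch).
 * Returns how many cachelines the byte range [offset, offset + len)
 * touches, i.e. how many lines need flushing after the write when only
 * the written range is flushed instead of the whole buffer.
 */
static size_t cachelines_touched(size_t offset, size_t len, size_t cl_size)
{
    assert(cl_size != 0 && (cl_size & (cl_size - 1)) == 0);

    if (len == 0)
        return 0;

    /* Round the first and last written byte down to their line starts. */
    size_t first = offset & ~(cl_size - 1);
    size_t last = (offset + len - 1) & ~(cl_size - 1);

    return (last - first) / cl_size + 1;
}
```

Assuming 64-byte cachelines, a 200-byte write at offset 100 touches 4 lines, whereas flushing a whole 4096-byte page would touch 64.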
    
    v2: Simplify the clflush-before-write logic, as suggested by Chris
    Wilson.
    
    v3: Finishing touches suggested by Chris Wilson:
    - add comment to needs_clflush_before and only set this if the bo is
      uncached.
    - s/needs_clflush/needs_clflush_after/ in the write paths to clearly
      differentiate it from needs_clflush_before.
    
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>