1. 04 Oct, 2012 2 commits
  2. 27 Sep, 2012 1 commit
  3. 26 Sep, 2012 1 commit
  4. 20 Sep, 2012 6 commits
  5. 17 Sep, 2012 1 commit
  6. 27 Aug, 2012 1 commit
    • drm/i915: Remove __GFP_NO_KSWAPD · d7c3b937
      Sedat Dilek authored
      
      
      When I pulled-in today's drm-intel-next into linux-next (next-20120824)
      I saw this build-breakage:
      
      drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_object_get_pages_gtt':
      drivers/gpu/drm/i915/i915_gem.c:1778:40: error: '__GFP_NO_KSWAPD' undeclared (first use in this function)
      drivers/gpu/drm/i915/i915_gem.c:1778:40: note: each undeclared identifier is reported only once for each function it appears in
      
      This is caused by commit ba099ef165f8 ("mm: remove __GFP_NO_KSWAPD")
      and commit b6beae2c2014 ("mm: remove __GFP_NO_KSWAPD fixes") in
      linux-next (next-20120824).
      
      Fix this by removing __GFP_NO_KSWAPD from drm/i915 driver.
      Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  7. 24 Aug, 2012 2 commits
    • drm/i915: Use a non-blocking wait for set-to-domain ioctl · 3236f57a
      Chris Wilson authored
      
      
      The principal use for set-to-domain is for userspace to serialise
      operations with a particular buffer, for example to maintain coherency
      with a CPU map or to ratelimit its rendering by waiting on all previous
      operations before continuing. As such we tend to hold the struct_mutex
      for long periods during the synchronisation and so cause contention
      issues with other users of the graphics device, even for independent
      operations such as memory management. An example is the contention between
      compiz and X which causes jitter in the display and a drop in peak
      throughput.
      
      The ultimate solution would be a set of fine grained locks and lockless
      operations, but an intermediate step is to first attempt the
      synchronisation for set-to-domain without holding the mutex. This
      introduces a number of race conditions, so we limit its use to the ioctl
      periphery where we have no dependent state and can safely complete with
      a locked synchronisation afterwards.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
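The optimistic wait described in this commit can be modelled in user space. This is only a sketch under stated assumptions: a pthread mutex stands in for struct_mutex, and `struct device`, `struct buffer`, `wait_seqno` and `set_to_domain` are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these types come from the i915 driver. */
struct device {
    pthread_mutex_t lock;             /* plays the role of struct_mutex */
    _Atomic uint32_t completed_seqno; /* last request the "GPU" finished */
};

struct buffer {
    uint32_t last_seqno; /* request that last touched this buffer */
    int domain;          /* current domain, changed only under the lock */
};

/* Wrap-safe "has 'completed' reached 'target'?" comparison. */
static bool seqno_passed(uint32_t completed, uint32_t target)
{
    return (int32_t)(completed - target) >= 0;
}

/* Yield until the device has completed 'target'. */
static void wait_seqno(struct device *dev, uint32_t target)
{
    while (!seqno_passed(atomic_load(&dev->completed_seqno), target))
        sched_yield();
}

static void set_to_domain(struct device *dev, struct buffer *buf, int domain)
{
    uint32_t snapshot;

    /* Sample the target under the lock, then drop the lock. */
    pthread_mutex_lock(&dev->lock);
    snapshot = buf->last_seqno;
    pthread_mutex_unlock(&dev->lock);

    /* Optimistic wait: other users can take the lock meanwhile. */
    wait_seqno(dev, snapshot);

    /* The buffer may have been reused while unlocked, so finish with a
     * locked wait. It is normally a no-op because the long wait is done. */
    pthread_mutex_lock(&dev->lock);
    wait_seqno(dev, buf->last_seqno);
    assert(seqno_passed(atomic_load(&dev->completed_seqno), buf->last_seqno));
    buf->domain = domain;
    pthread_mutex_unlock(&dev->lock);
}
```

The unlocked wait only shortens the time the lock is held; correctness still rests on the final locked wait, which matches the commit's point about completing with a locked synchronisation afterwards.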
    • drm/i915: Juggle code order to ease flow of the next patch · b361237b
      Chris Wilson authored
      
      
      Move the wait-for-rendering logic around in the file so that we can
      group it together with the subsequent variations. The general goal is to
      have the lower level routines clustered together and then the higher
      level logic building upon those low level routines that came before.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  8. 23 Aug, 2012 3 commits
  9. 21 Aug, 2012 3 commits
    • drm/i915: Try harder to allocate an mmap_offset · d8cb5086
      Chris Wilson authored
      Given the persistence of an offset for the lifetime of an object, it is
      easy to contemplate how the mmap space becomes badly fragmented to the
      point that further allocations fail with ENOSPC. Our only recourse at
      this point is to try to purge the objects to release some space and
      reattempt the allocation.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=39552
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
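The purge-and-retry recourse described in this commit follows a common pattern. The sketch below is a toy model, assuming a fixed table of slots in place of the real mmap offset space; `struct offset_space`, `alloc_offset`, `purge` and `create_mmap_offset` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 4 /* toy size of the offset space, an assumption */

struct object {
    bool purgeable;  /* may be discarded to release its offset */
    bool has_offset;
    size_t slot;
};

struct offset_space {
    struct object *owner[SLOTS];
};

/* First-fit allocation; fails with -ENOSPC when every slot is taken. */
static int alloc_offset(struct offset_space *sp, struct object *obj)
{
    for (size_t i = 0; i < SLOTS; i++) {
        if (!sp->owner[i]) {
            sp->owner[i] = obj;
            obj->slot = i;
            obj->has_offset = true;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Release the offsets of all purgeable objects; returns how many. */
static size_t purge(struct offset_space *sp)
{
    size_t freed = 0;

    for (size_t i = 0; i < SLOTS; i++) {
        struct object *o = sp->owner[i];

        if (o && o->purgeable) {
            o->has_offset = false;
            sp->owner[i] = NULL;
            freed++;
        }
    }
    return freed;
}

/* Try once, and on -ENOSPC purge and reattempt the allocation. */
static int create_mmap_offset(struct offset_space *sp, struct object *obj)
{
    int ret;

    assert(!obj->has_offset);
    ret = alloc_offset(sp, obj);
    if (ret != -ENOSPC)
        return ret;
    if (purge(sp) == 0)
        return ret; /* nothing could be released */
    return alloc_offset(sp, obj);
}
```

Retrying only after a purge that actually freed something avoids a pointless second scan when the space is full of objects that cannot be discarded.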
    • drm/i915: Add some sanity checks to unbound tracking · c4670ad0
      Chris Wilson authored
      
      
      A pair of universally true checks that just need to be put in the right
      place depending on where in the patch sequence you go. Note that
      i915_gem_object_put_pages_gtt() already gains the
      BUG_ON(obj->gtt_space), but on reflection that needed to migrate to
      put_pages().
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Track unbound pages · 6c085a72
      Chris Wilson authored
      
      
      When dealing with a working set larger than the GATT, or even the
      mappable aperture when touching through the GTT, we end up evicting
      objects only to rebind them at a new offset again later. Moving an
      object into and out of the GTT requires clflushing the pages, thus
      causing a double-clflush penalty for rebinding.
      
      To avoid having to clflush on rebinding, we can track the pages as they
      are evicted from the GTT and only relinquish those pages on memory
      pressure.
      
      As usual, if it were not for the handling of out-of-memory condition and
      having to manually shrink our own bo caches, it would be a net reduction
      of code. Alas.
      
      Note: The patch also contains a few changes to the last-hope
      evict_everything logic in i915_gem_execbuffer.c - we no longer try to
      only evict the purgeable stuff in a first try (since that's superfluous
      and only helps in OOM corner-cases, not fragmented-gtt thrashing
      situations).
      
      Also, the extraction of the get_pages retry loop from bind_to_gtt (and
      other callsites) to get_pages should imo have been a separate patch.
      
      v2: Ditch the newly added put_pages (for unbound objects only) in
      i915_gem_reset. A quick irc discussion hasn't revealed any important
      reason for this, so if we need this, I'd like to have a git blame'able
      explanation for it.
      
      v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: Split out code movements and rant a bit in the commit message
      with a few Notes. Done v2]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
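The idea of keeping pages across an unbind can be shown with a toy model. This is a rough sketch, not the driver's data structures: `struct object` and the `bind_object`, `unbind_object` and `shrink_object` helpers are hypothetical, and a counter stands in for the actual clflush work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object state for illustration only. */
struct object {
    bool bound;         /* currently mapped into the GTT */
    bool has_pages;     /* backing pages are still held */
    unsigned clflushes; /* how often the pages had to be flushed */
};

/* Binding flushes only when the pages had to be acquired afresh. */
static void bind_object(struct object *o)
{
    if (!o->has_pages) {
        o->has_pages = true;
        o->clflushes++; /* fresh pages need a flush */
    }
    o->bound = true;
}

/* Eviction from the GTT keeps the pages around as "unbound". */
static void unbind_object(struct object *o)
{
    assert(o->bound);
    o->bound = false;
}

/* Under memory pressure, only unbound objects give up their pages. */
static bool shrink_object(struct object *o)
{
    if (o->bound || !o->has_pages)
        return false;
    o->has_pages = false;
    return true;
}
```

In this model a rebind after a plain eviction costs no flush, and the flush is paid again only once the shrinker has actually released the pages.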
  10. 20 Aug, 2012 1 commit
  11. 17 Aug, 2012 1 commit
  12. 10 Aug, 2012 1 commit
    • drm/i915: Lazily apply the SNB+ seqno w/a · b2eadbc8
      Chris Wilson authored
      
      
      Avoid the forcewake overhead when simply retiring requests, as often the
      last seen seqno is good enough to satisfy the retirement process and will
      be promptly re-run in any case. Only ensure that we force the coherent
      seqno read when we are explicitly waiting upon a completion event to be
      sure that none go missing, and also for when we are reporting seqno
      values in case of error or debugging.
      
      This greatly reduces the load for userspace using the busy-ioctl to
      track active buffers, for instance halving the CPU used by X in pushing
      the pixels from a software render (flash). The effect will be even more
      magnified with userptr and so providing a zero-copy upload path in that
      instance, or in similar instances where X is simply compositing DRI
      buffers.
      
      v2: Reverse the polarity of the tachyon stream. Daniel suggested that
      'force' was too generic for the parameter name and that 'lazy_coherency'
      better encapsulated the semantics of it being an optimization and its
      purpose. Also notice that gen6_get_seqno() is only used by gen6/7
      chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
      function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
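The effect of a `lazy_coherency` parameter can be illustrated with a small model. Only the parameter name comes from the commit message; `struct ring`, `ring_get_seqno` and the `forced_reads` counter are assumptions made for this sketch, which treats the expensive coherent read as a refresh of a possibly stale cached value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a ring; not the driver's struct. */
struct ring {
    uint32_t cached_seqno;  /* last value seen, possibly stale */
    uint32_t hw_seqno;      /* what a coherent read would return */
    unsigned forced_reads;  /* counts the expensive coherent reads */
};

/*
 * With lazy_coherency set, return the last seen value and skip the
 * expensive path. Without it, pay for a coherent read first.
 */
static uint32_t ring_get_seqno(struct ring *ring, bool lazy_coherency)
{
    assert(ring != NULL);
    if (!lazy_coherency) {
        ring->forced_reads++;
        ring->cached_seqno = ring->hw_seqno;
    }
    return ring->cached_seqno;
}
```

A retire path would pass `true` because it is re-run soon anyway, while an explicit wait or an error report would pass `false` so that no completion is missed.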
  13. 26 Jul, 2012 2 commits
    • drm/i915: Export ability of changing cache levels to userspace · e6994aee
      Chris Wilson authored
      
      
      By selecting the cache level (essentially whether or not the CPU snoops
      any updates to the bo, and on more recent machines whether it resides
      inside the CPU's last-level-cache) a userspace driver is able to then
      manage all of its memory within buffer objects, if it so desires. This
      enables the userspace driver to accelerate uploads and more importantly
      downloads from the GPU and to be able to mix CPU and GPU rendering/activity
      efficiently.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: Added code comment about where we plan to stuff platform
      specific caching control bits in the ioctl struct.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Segregate memory domains in the GTT using coloring · 42d6ab48
      Chris Wilson authored
      
      
      Several functions of the GPU have the restriction that differing memory
      domains cannot be placed next to each other (as the GPU may prefetch
      beyond the end of one domain and hang as it crosses into the other
      domain). We use the facility of the drm_mm to mark ranges with a
      particular color that corresponds to the cache attributes of those pages
      in order to prevent allocating adjacent blocks of differing memory
      types.
      
      v2: Rebase on top of drm_mm coloring v2.
      v3: Fix rebinding existing gtt_space and add a verification routine.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  14. 25 Jul, 2012 11 commits
  15. 20 Jul, 2012 1 commit
  16. 15 Jul, 2012 1 commit
    • drm: Add colouring to the range allocator · 6b9d89b4
      Chris Wilson authored
      
      
      In order to support snoopable memory on non-LLC architectures (so that
      we can bind vgem objects into the i915 GATT for example), we have to
      stop the prefetcher on the GPU from crossing memory domains and so
      prevent allocation of a snoopable PTE immediately following an uncached
      PTE. To do that, we need to extend the range allocator with support for
      tracking and segregating different node colours.
      
      This will be used by i915 to segregate memory domains within the GTT.
      
      v2: Now with more drm_mm helpers and less driver interference.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@gmail.com>
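One way to picture colour-aware allocation is to shrink a candidate hole by a guard gap on each side whose neighbour has a different colour. The sketch below is a simplified model, not the drm_mm interface: `struct node`, `place_in_hole` and the 4096-byte `GUARD` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD 4096u /* assumed gap between differing colours: one page */

/* Hypothetical allocated range with a colour (e.g. a cache attribute). */
struct node {
    uint64_t start;
    uint64_t size;
    unsigned long color;
};

/*
 * Try to place 'size' bytes of the given colour in the hole
 * [hole_start, hole_end) that lies between 'prev' and 'next'. Either
 * neighbour may be NULL. The hole is shrunk by GUARD on every side whose
 * neighbour has a different colour, so differing colours never touch.
 */
static bool place_in_hole(uint64_t hole_start, uint64_t hole_end,
                          const struct node *prev, const struct node *next,
                          unsigned long color, uint64_t size, uint64_t *out)
{
    uint64_t start = hole_start;
    uint64_t end = hole_end;

    assert(hole_start <= hole_end);
    if (prev && prev->color != color)
        start += GUARD;
    if (next && next->color != color) {
        if (end < GUARD)
            return false;
        end -= GUARD;
    }
    if (end < start || end - start < size)
        return false;
    *out = start;
    return true;
}
```

For example, a three-page hole between two nodes of colour 0 can hold three pages of colour 0, but only a single page of colour 1, because a guard page is lost on each side.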
  17. 05 Jul, 2012 2 commits
    • drm/i915: properly SIGBUS on I/O errors · a9340cca
      Daniel Vetter authored
      ... instead of looping endlessly with no hope of ever serving that
      page-fault. We only need to break out of this loop when the gpu died,
      to run the reset work (and hopefully resurrect it).
      
      To clarify questions Chris raised on irc: This is about handling I/O
      errors not from our own code, but e.g. when the disk died when trying
      to swap in a gem bo. So this patch remedies the issue that the current
      handling only handles gpu-death-induced cases of -EIO. Admittedly,
      dying disks are much rarer than hanging gpus ...
      
      This seems to have been lost in:
      
      commit d9bc7e9f
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Feb 7 13:09:31 2011 +0000
      
          drm/i915: Fix infinite loop regression from 21dd3734
      
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
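The distinction this commit draws between the two sources of -EIO can be sketched as a mapping from an error code to a fault outcome. The enum, the `fault_result_from_errno` helper and the `gpu_wedged` flag are hypothetical names for this sketch; the real fault handler returns kernel-defined codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of a page fault handler. */
enum fault_result {
    FAULT_NOPAGE, /* nothing installed; the fault will simply be retried */
    FAULT_SIGBUS, /* give up and deliver SIGBUS to the process */
};

static enum fault_result fault_result_from_errno(int err, bool gpu_wedged)
{
    assert(err <= 0);
    switch (err) {
    case -EIO:
        /* A genuine I/O error (e.g. a dead disk) can never be served. */
        if (!gpu_wedged)
            return FAULT_SIGBUS;
        /* The gpu died: break out so the reset work can run, then retry. */
        /* fall through */
    case -EAGAIN:
    case 0:
        return FAULT_NOPAGE;
    default:
        return FAULT_SIGBUS;
    }
}
```

Treating every other error as SIGBUS is an assumption of this sketch; the point taken from the commit is only that -EIO leads to a retry when the gpu died and to SIGBUS otherwise.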
    • drm/i915: don't hang userspace when the gpu reset is stuck · 0a6759c6
      Daniel Vetter authored
      
      
      With the gpu reset no longer using a trylock we've increased the
      chances of userspace getting stuck quite a bit. To make that
      (hopefully) rare case more palatable, time out when waiting for the gpu
      reset code to complete and signal this little issue to the caller by
      returning -EIO.
      
      This should help userspace to somewhat gracefully fall back and
      hopefully allow the user to grab some logs and reboot the machine
      (instead of staring at a frozen X screen in agony).
      
      Suggested by Chris Wilson because I've been stubborn about allowing
      the gpu reset code not to fail, ever (by removing the trylock).
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
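A bounded wait that reports -EIO instead of blocking forever might look like the following user-space sketch. It polls a flag against a monotonic clock. `struct gpu`, `wait_for_reset` and the one-millisecond polling interval are assumptions for illustration, and the kernel would use its own wait primitives rather than polling.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime and nanosleep */
#endif
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical device state; not the driver's struct. */
struct gpu {
    atomic_bool reset_in_progress;
};

/* Milliseconds elapsed since 'since' on the monotonic clock. */
static long elapsed_ms(const struct timespec *since)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - since->tv_sec) * 1000L +
           (now.tv_nsec - since->tv_nsec) / 1000000L;
}

/*
 * Wait for a pending reset to finish, but give up after timeout_ms and
 * signal the problem to the caller with -EIO.
 */
static int wait_for_reset(struct gpu *gpu, long timeout_ms)
{
    struct timespec start;
    struct timespec nap = { 0, 1000000L }; /* poll every 1 ms */

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (atomic_load(&gpu->reset_in_progress)) {
        if (elapsed_ms(&start) >= timeout_ms)
            return -EIO;
        nanosleep(&nap, NULL);
    }
    return 0;
}
```

Returning an error after the deadline lets the caller fall back or report the failure, which is the graceful degradation the commit aims for.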