- 03 May, 2012 19 commits
-
-
Daniel Vetter authored
... and shove allow_batchbuffer in there. More dragons will follow suit. There's the curious case that we allow this for KMS ... Acked-by:
Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
It turns out throttle had an almost identical bit of code to do the wait. Now we can call the new helper directly. This is just a bonus, and not needed for the overall series. v2: remove irq_get/put which is now in __wait_seqno (Ben) Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
It's about to go away anyway. Just here to help bisection. Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
i915_wait_request is actually a fairly large function encapsulating quite a few different operations. Because being able to wait on seqnos in various conditions is useful, extracting that bit of code to a helper function seems useful v2: pull the irq_get/put as well (Ben) Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
The only time irq_get should fail is during unload or suspend. Both of these points should try to quiesce the GPU before disabling interrupts and so the atomic polling should never occur. This was recommended by Chris Wilson as a way of reducing added complexity to the polled wait which I introduced in an RFC patch. 09:57 < ickle_> it's only there as a fudge for waiting after irqs after uninstalled during s&r, we aren't actually meant to hit it 09:57 < ickle_> so maybe we should just kill the code there and fix the breakage v2: return -ENODEV instead of -EBUSY when irq_get fails Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
The waiting_seqno is not terribly useful, and as such we can remove it so that we'll be able to extract lockless code. v2: Keep the information for error_state (Chris) Check if ring is initialized in hangcheck (Chris) Capture the waiting ring (Chris) Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> [danvet: add some bikeshed to clarify a comment.] Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
This extra bit of interrupt enabling code doesn't belong in the wait seqno function. If anything we should pull it out to a helper so the throttle code can also use it. The history is a bit vague, but I am going to attempt to just dump it, unless someone can argue otherwise. Removing this allows for a shared lock free wait seqno function. To keep tabs on this issue though, the IER value is stored on error capture (recommended by Chris Wilson) v2: fixed typo EIR->IER (Ben) Fix some white space (Ben) Move IER capture to globally instead of per ring (Ben) Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> [danvet: ier is a 16 bit reg on gen2!] Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel) Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
I've missed this one. v2: Chris Wilson noticed another register. v3: Color choice improvements. Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
We always set it so there's no point in checking. We could instead add a bit that tells us whether gem is actually initialized (i.e. either kms or gem_init_ioctl called), but that's imho not worth it. So just rip it out. There's a little change in the wait_ring timeout, but we've never run with anything else than the 60 second timeout, even on dri1 userspace. Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
This ioctl used in a kms driver is only useful to create massive havoc. v2: Bail out with -ENODEV as suggested by Chris Wilson. Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
The use of the mm_list by deferred-free breaks the following patches to extend the range of objects tracked. We can simplify things if we just make the unbind during free uninterrutible. Note that unbinding should never fail, because we hold an additional reference on every active object. Only the ilk vt-d workaround breaks this, but already takes care of not failing by waiting for the gpu to quiescent non-interruptible. But the existence of the deferred free list casted some doubts on this theory, hence WARN if the unbind fails and only then retry non-interruptible. We can kill this additional code after a release in case the theory is indeed right and no one has hit that WARN. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Simplify object tracking by removing the inactive but pinned list. The only place where this was used is for counting the available memory, which is just as easy performed by checking all objects on the rare occasions it is required (application startup). For ease of debugging, we keep the reporting of pinned objects through the error-state and debugfs. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
This was only used by one external caller who would just be as happy with evict-everything, so perform the replacement and make the function private. In the process we note that unbinding the inactive list should not fail, and make it a warning instead. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Currently, we only bump the inactive LRU of an object when we bind into the GTT for a page-fault. As the object may be used many times before its mapping is zapped, we do not mark it as active as frequently as we should. Userspace should be calling set-to-GTT-domain before each pointer deference (for synchronous access) and so is a good place to mark the buffer as active. Marking the buffer as recently used places it at the end of the inactive eviction queue, though still before anything with outstanding rendering. This reduces the likelihood of evicting a buffer that is going to be used again by the CPU in the near future. This way we can hopefully avoid to kick out upload buffers right before we use them on the gpu. Note that we need to check that the object is not active or pinned, for otherwise we create havoc on the active/pinned lists, which also use obj->mm_list. The active lists are sorted by and evicted in last GPU rendering order, access by the CPU to a still active buffer therefore does not affect its eviction ordering. Pinned objects are currently excluded from eviction, therefore the only list that we need to bump for GTT access by the CPU is the inactive list. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> [danvet: Added further explanations to the commit message as discussed on irc.] Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
... and put them to so good use. Note that there's functional change in vlv clock gating code, we now no longer spuriously read back the current value of the bit. According to Bspec the high bits should always read zero, so ORing this in should have no effect. Reviewed-by:
Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by:
Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Rename obj->tiling_changed to obj->fence_dirty so that it is clear that it flags when the parameters for an active fence (including the no-fence) register are changed. Also, do not set this flag when the object does not have a fence register allocated currently and the gpu does not depend upon the unfence. This case works exactly like when a tiled object lost its fence and hence does not need additional handling for the tiling change in the code. v2: Use fence_dirty to better express what the flag tracks and add a few more details to the comments to serve as a reminder of how the GPU also uses the unfenced register slot. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add some bikeshed to the commit message about the stricter use of fence_dirty.] Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
As with one of the earlier patches in the series, we're forced to cast for copy_[to|from]_user. Again because of the nature of the GEN x86 exclusivity, this should be safe. Signed-off-by:
Ben Widawsky <benjamin.widawsky@intel.com> [danvet: Added some bikeshed.] Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 Apr, 2012 10 commits
-
-
Chris Wilson authored
We can also take advantage of the new 'no retire' mode for seqno waiting to avoid having to take a reference on the old fence object whilst flushing an existing fence. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Now that we have a routine that is able to clear the fences as well as setup up the register for a tiled object, remove the surplus routines to clear the fences. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
One clarification that we make is to the existing semantics of obj->tiling_changed to only mean that we need to update an associated fence register (including the NO_FENCE when executing an untiled but fenced GPU command). If we do not have a fence register or pending fenced GPU access for the object (after put_fence() for example), then we can clear the tiling_changed flag as any fence will necessarily be rewritten upon acquisition. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Update the existing architecture specific fence writing routines to either update the fence to point to a tiled object or to clear them in preparation to remove the other fence writing routes. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As i915_wait_request() will first check for an already passed seqno, doing it also in the caller is a waste of space for a cold path. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As the fences are stored in LRU order, we can simply reuse the oldest if we do not have an unused register. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As we now never pipeline a fence update, obj->last_fenced_ring is always the same as the obj->ring whenever obj->last_fenced_seqno is active, so remove it. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As we now no longer track a pipelined fence change, we never use ring->setup_seqno and can kill it. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Step 2 is then to replace the pipelined parameter with NULL and perform constant folding to remove dead code. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
We never succeeded in getting pipelined fencing to work (unresolved spurious GPU hangs), so begin the process of dismantling and removal the broken code. Step 1 is the removal of the pipeline parameter to get_fence(). Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 17 Apr, 2012 2 commits
-
-
Daniel Vetter authored
For some reason snb has 2 fields to set ppgtt cacheability. This one here does not exist on gen7. This might explain why ppgtt wasn't a win on snb like on ivb - not enough pte caching. v2: Fixup rebase fail. Reviewed-by:
Ben Widawsky <ben@bwidawsk.net> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
Bspec says that we need to set this: vol1c.3 "Blitter Command Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register". We don't really rely on pagefaults, but who knows what this all affects. Reviewed-by:
Ben Widawsky <ben@bwidawsk.net> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 15 Apr, 2012 1 commit
-
-
Daniel Vetter authored
... we will botch up the bit17 swizzling. Furthermore tiled pwrite is a (now) unused slowpath, so no one really cares. This fixes the last swizzling issues I have with i-g-t on my bit17 swizzling i915G. No regression, it's been broken since the dawn of gem, but it's nice for regression tracking when really _all_ i-g-t tests work. Actually this is not true, Chris Wilson noticed while reviewing this patch that the commit commit d9e86c0e Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 10 16:40:20 2010 +0000 drm/i915: Pipelined fencing [infrastructure] contained a functional change that broke things. Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 12 Apr, 2012 7 commits
-
-
Ben Widawsky authored
Waiting for seqno-1 in our object synchronization code is an implementation detail given how we've decided to do the waits within the rest of our code. Requested-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
This fixes a long standing issue where emitting the semaphore updates may have failed, but we've already updated our internal data structure. Reported-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
When I extracted the synchronization code for implementing semaphorified pageflips (74f5f6e0), I neglected the non pipelined case which also calls this code. The modesetting code wants to make sure the object has finished rendering to the frame before configuring the scanout (ie. non-pipelined case). As a result of a follow on discussion on IRC, I've decided to add a comment about the function itself which received much inspiration from Chris as well. So really, this patch was ghost-written by Chris :). Reported-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Tested-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Similar to allowing a buffer to be simultaneously read by the GPU and through the GTT, we wish to allow readback of the pages through the CPU domain whilst they are also being read by the GPU. Domain coherency is managed by allowing multiple readers, but only a single writer. This is used by mesa for its program cache which it may search for every new program every frame and then renews should it need to add. During renewal, mesa copies the program bo currently executing through a CPU mapping onto the new bo. This patch allows the search and that copy to proceed without causing a stall on the current batch. Testcase: i-g-t/tests/gem_cpu_concurrent_blit Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by:
Ben Widawsky <ben@bwidawsk.net> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
By simplifying the rules to calling get_fence when writing to the through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
This fixes a resume regression introduced in commit 7dd49065 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Mar 21 10:48:18 2012 +0000 drm/i915: Mark untiled BLT commands as fenced on gen2/3 which fixed fencing tracking for untiled blt commands. A side effect of that patch was that now also untiled objects have a non-zero obj->last_fenced_seqno to track when a fence can be set up after a pipelined tiling change. Unfortunately this was only cleared by the fence setup and teardown code, resulting in tons of untiled but inactive objects with non-zero last_fenced_seqno. Now after resume we completely reset the seqno tracking, both on the driver side (by setting dev_priv->next_seqno = 1) and on the hw side (by allocating a new hws page, which contains the seqnos). Hilarity and indefinite waits ensued from the stale seqnos in obj->last_fenced_seqno from before the suspend. The fix is to properly clear the fencing tracking state like we already do for the normal gpu rendering while moving objects off the active list. Reported-and-tested-by:
"Rafael J. Wysocki" <rjw@sisk.pl> Cc: Jiri Slaby <jslaby@suse.cz> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 Apr, 2012 1 commit
-
-
Daniel Vetter authored
Ums is already disabled, but on ilk we can additionally disable gem initialization when using user mode setting. Upstream never support ilk without kernel modesetting and not even the RHEL ilk ums backport needs gem - that driver is based on xf86-video-intel version 2.2, which is pre-gem. Reviewed-by:
Adam Jackson <ajax@redhat.com> Reviewed-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by:
Daniel Vetter <daniel.vetter@ffwll.ch>
-