drm/i915: Signal first fence from irq handler if complete

As execlists and other non-semaphore multi-engine devices coordinate between engines using interrupts, we can shave off a few 10s of microseconds of scheduling latency by doing the fence signaling from the interrupt as opposed to a RT kthread. (Realistically the delay adds about 1% to an individual cross-engine workload.) We only signal the first fence in order to limit the amount of work we move into the interrupt handler. We also have to remember that our breadcrumbs may be unordered with respect to the interrupt and so we still require the waiter process to perform some heavyweight coherency fixups, as well as traversing the tree of waiters.

v2: No need for early exit in irq handler - it breaks the flow between patches and prevents the tracepoint
v3: Restore rcu hold across irq signaling of request

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
author: Chris Wilson <chris@chris-wilson.co.uk> 2017-02-27 20:58:48 +0000
committer: Chris Wilson <chris@chris-wilson.co.uk> 2017-02-27 21:57:20 +0000
commit: 56299fb7d9047cc1d25362827073b2ac0984ed21 (patch)
tree: 311ae5bcaec77364beb1a0514886e73534a5516d /drivers/gpu/drm/i915/i915_irq.c
parent: 8d769ea7bc16c34c9dc5143be021e943014c4cd1 (diff)
1 files changed, 33 insertions, 3 deletions
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 312d30e..e06e6eb 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1033,12 +1033,42 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 static void notify_ring(struct intel_engine_cs *engine)
 {
-	bool waiters;
+	struct drm_i915_gem_request *rq = NULL;
+	struct intel_wait *wait;
 
 	atomic_inc(&engine->irq_count);
 	set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
-	waiters = intel_engine_wakeup(engine);
-	trace_intel_engine_notify(engine, waiters);
+
+	rcu_read_lock();
+
+	spin_lock(&engine->breadcrumbs.lock);
+	wait = engine->breadcrumbs.first_wait;
+	if (wait) {
+		/* We use a callback from the dma-fence to submit
+		 * requests after waiting on our own requests. To
+		 * ensure minimum delay in queuing the next request to
+		 * hardware, signal the fence now rather than wait for
+		 * the signaler to be woken up. We still wake up the
+		 * waiter in order to handle the irq-seqno coherency
+		 * issues (we may receive the interrupt before the
+		 * seqno is written, see __i915_request_irq_complete())
+		 * and to handle coalescing of multiple seqno updates
+		 * and many waiters.
+		 */
+		if (i915_seqno_passed(intel_engine_get_seqno(engine),
+				      wait->seqno))
+			rq = wait->request;
+
+		wake_up_process(wait->tsk);
+	}
+	spin_unlock(&engine->breadcrumbs.lock);
+
+	if (rq)
+		dma_fence_signal(&rq->fence);
+
+	rcu_read_unlock();
+
+	trace_intel_engine_notify(engine, wait);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
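
Note on the seqno test in notify_ring(): the handler only picks up wait->request when i915_seqno_passed() reports that the engine's current seqno has reached the waiter's seqno. Since seqnos are 32-bit counters that wrap, the comparison has to be done on the signed difference rather than with a plain >=. The following is a minimal userspace sketch of that idea, not the driver's code; the name seqno_passed() and the use of <stdint.h> types are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the driver's seqno comparison.
 * Returns true when `current` is at or after `target`, treating the
 * 32-bit counter as circular: the unsigned subtraction wraps, and the
 * sign of the result says which side of `target` we are on.
 */
static inline bool seqno_passed(uint32_t current, uint32_t target)
{
	return (int32_t)(current - target) >= 0;
}
```

With this form, a current seqno of 2 counts as having passed a target of 0xfffffffe (the counter wrapped in between), whereas a naive unsigned >= would report the opposite. The comparison is only meaningful while the two values are less than 2^31 apart.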