drm/i915: Restore context and pd for ringbuffer submission after reset

Following a reset, the context and page directory registers are lost. However, the queue of requests that we resubmit after the reset may depend upon them - the registers are restored from a context image, but that restore may be inhibited and may simply be absent from the request if it was in the middle of a sequence using the same context. If we prime the CCID/PD registers with the first request in the queue (even for the hung request), we prevent invalid memory access for the following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit c0dcb203fb009678e5be9e7782329dcfbbf16439)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
author: Chris Wilson <chris@chris-wilson.co.uk> 2017-02-07 15:24:37 +0000
committer: Jani Nikula <jani.nikula@intel.com> 2017-02-16 11:59:11 +0200
commit: ec62ed3e1d93843b382c222bc0d81546f12c97b8 (patch)
tree: 230d8cf2adb18dc6b66bdb6b7dac788129c8461c /drivers/gpu/drm/i915/intel_lrc.c
parent: 26d12c619476ccbc6725aa4a17dcb1d41d5774e7 (diff)
1 files changed, 15 insertions, 1 deletions
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2e767eb..ebf8023 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1390,7 +1390,20 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 	struct execlist_port *port = engine->execlist_port;
-	struct intel_context *ce = &request->ctx->engine[engine->id];
+	struct intel_context *ce;
+
+	/* If the request was innocent, we leave the request in the ELSP
+	 * and will try to replay it on restarting. The context image may
+	 * have been corrupted by the reset, in which case we may have
+	 * to service a new GPU hang, but more likely we can continue on
+	 * without impact.
+	 *
+	 * If the request was guilty, we presume the context is corrupt
+	 * and have to at least restore the RING register in the context
+	 * image back to the expected values to skip over the guilty request.
+	 */
+	if (!request || request->fence.error != -EIO)
+		return;
 
 	/* We want a simple context + ring to execute the breadcrumb update.
 	 * We cannot rely on the context being intact across the GPU hang,
@@ -1399,6 +1412,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
+	ce = &request->ctx->engine[engine->id];
 	execlists_init_reg_state(ce->lrc_reg_state,
 				 request->ctx, engine, ce->ring);
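
The early-return test added in the hunk above can be read as a small predicate: only a request whose fence carries -EIO is treated as guilty and has its context image rewritten. The following is a minimal standalone sketch of that decision, not driver code. The `mock_fence` and `mock_request` types and the `needs_context_reinit()` helper are hypothetical stand-ins invented for illustration; only the condition itself is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; only the
 * fence.error field used by the patch is modelled here.
 */
struct mock_fence {
	int error;
};

struct mock_request {
	struct mock_fence fence;
};

/* Hypothetical helper: returns true when the request is considered
 * guilty of the hang, i.e. when the context image would be
 * reinitialised so that the guilty request is skipped.
 */
static bool needs_context_reinit(const struct mock_request *request)
{
	/* No active request, or an innocent one: leave it queued and
	 * let it be replayed on restart (same test as the patch).
	 */
	if (!request || request->fence.error != -EIO)
		return false;

	/* Guilty request: the context is presumed corrupt. */
	return true;
}
```

Under these assumptions, a NULL request and a request with `fence.error == 0` both yield false, while a request marked with `-EIO` yields true.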