USB: EHCI: add a delay when unlinking an active QH

Michael Reutman reports that an AMD/ATI EHCI host controller on one of his computers does not stop transferring data when an active bulk QH is unlinked from the async schedule. Apparently that host controller fails to implement the IAA mechanism correctly when an active QH is unlinked. This leads to data corruption, because the controller continues to update the QH in memory when the driver doesn't expect it. As a result, the next URB submitted for that QH can hang, because the link pointers for the TD queue have been messed up. This misbehavior is observed quite regularly. To be fair, the EHCI spec (section 4.8.2) says that active QHs should not be unlinked. It goes on to recommend a procedure that involves waiting for the QH to go inactive before unlinking it. In the real world this is impractical, not least because the QH may _never_ go inactive. (What were they thinking?) Sometimes we have no choice but to unlink an active QH. In an attempt to avoid the problems that can ensue, this patch changes how the driver decides when the unlink is complete. In addition to waiting through two IAA cycles, in cases where the QH was not known to be inactive beforehand we now wait until a 2-ms period has elapsed with the host controller making no change to the QH data structure (the hw_current and hw_token fields in particular). The intuition here is that after such a long period, the endpoint must be NAKing and hopefully the QH has been dropped from the host controller's internal cache. There's no way to know if this reasoning is really valid -- the spec is no help in this regard -- but at least this approach fixes Michael's problem. The test for whether the QH is already known to be inactive involves the reason for unlinking the QH originally. If it was unlinked because it had halted, or it stopped in response to a short read, or it overlaid a dummy TD (a silicon bug), then it certainly is inactive. If it was unlinked because the TD queue was empty and no TDs have been added to the queue in the meantime, then it must be inactive. Or if the hardware status indicates that the QH is currently halted (even if that wasn't the reason for unlinking it), then it is inactive. Otherwise, if none of those checks apply, we go through the 2-ms delay. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Michael Reutman <mreutman@epiqsolutions.com> Tested-by: Michael Reutman <mreutman@epiqsolutions.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: Alan Stern <stern@rowland.harvard.edu> 2016-01-25 15:45:25 -0500
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2016-02-03 13:14:52 -0800
commit: 87d61912c23a746ee9a8a8d2fe17af217c87f761 (patch)
tree: 3f2239d651d7c2cf44df93095d34ad9cf6acd358 /drivers/usb/host/ehci-q.c
parent: f96fba0dbf8f6b0eaa313b4c230f93c9bb0dd759 (diff)
download: op-kernel-dev-87d61912c23a746ee9a8a8d2fe17af217c87f761.zip
op-kernel-dev-87d61912c23a746ee9a8a8d2fe17af217c87f761.tar.gz
1 files changed, 51 insertions, 5 deletions
diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 15dcae8..a24341e 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -1341,14 +1341,60 @@ static void end_unlink_async(struct ehci_hcd *ehci)
 	 * after the IAA interrupt occurs.  In self-defense, always go
 	 * through two IAA cycles for each QH.
 	 */
-	else if (qh->qh_state == QH_STATE_UNLINK_WAIT) {
+	else if (qh->qh_state == QH_STATE_UNLINK) {
+		/*
+		 * Second IAA cycle has finished.  Process only the first
+		 * waiting QH (NVIDIA (?) bug).
+		 */
+		list_move_tail(&qh->unlink_node, &ehci->async_idle);
+	}
+
+	/*
+	 * AMD/ATI (?) bug: The HC can continue to use an active QH long
+	 * after the IAA interrupt occurs.  To prevent problems, QHs that
+	 * may still be active will wait until 2 ms have passed with no
+	 * change to the hw_current and hw_token fields (this delay occurs
+	 * between the two IAA cycles).
+	 *
+	 * The EHCI spec (4.8.2) says that active QHs must not be removed
+	 * from the async schedule and recommends waiting until the QH
+	 * goes inactive.  This is ridiculous because the QH will _never_
+	 * become inactive if the endpoint NAKs indefinitely.
+	 */
+
+	/* Some reasons for unlinking guarantee the QH can't be active */
+	else if (qh->unlink_reason & (QH_UNLINK_HALTED |
+			QH_UNLINK_SHORT_READ | QH_UNLINK_DUMMY_OVERLAY))
+		goto DelayDone;
+
+	/* The QH can't be active if the queue was and still is empty... */
+	else if	((qh->unlink_reason & QH_UNLINK_QUEUE_EMPTY) &&
+			list_empty(&qh->qtd_list))
+		goto DelayDone;
+
+	/* ... or if the QH has halted */
+	else if	(qh->hw->hw_token & cpu_to_hc32(ehci, QTD_STS_HALT))
+		goto DelayDone;
+
+	/* Otherwise we have to wait until the QH stops changing */
+	else {
+		__hc32		qh_current, qh_token;
+
+		qh_current = qh->hw->hw_current;
+		qh_token = qh->hw->hw_token;
+		if (qh_current != ehci->old_current ||
+				qh_token != ehci->old_token) {
+			ehci->old_current = qh_current;
+			ehci->old_token = qh_token;
+			ehci_enable_event(ehci,
+					EHCI_HRTIMER_ACTIVE_UNLINK, true);
+			return;
+		}
+ DelayDone:
 		qh->qh_state = QH_STATE_UNLINK;
 		early_exit = true;
 	}
-
-	/* Otherwise process only the first waiting QH (NVIDIA bug?) */
-	else
-		list_move_tail(&qh->unlink_node, &ehci->async_idle);
+	ehci->old_current = ~0;		/* Prepare for next QH */
 
 	/* Start a new IAA cycle if any QHs are waiting for it */
 	if (!list_empty(&ehci->async_unlink))
author	Alan Stern <stern@rowland.harvard.edu>	2016-01-25 15:45:25 -0500
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-03 13:14:52 -0800
commit	87d61912c23a746ee9a8a8d2fe17af217c87f761 (patch)
tree	3f2239d651d7c2cf44df93095d34ad9cf6acd358 /drivers/usb/host/ehci-q.c
parent	f96fba0dbf8f6b0eaa313b4c230f93c9bb0dd759 (diff)
download	op-kernel-dev-87d61912c23a746ee9a8a8d2fe17af217c87f761.zip op-kernel-dev-87d61912c23a746ee9a8a8d2fe17af217c87f761.tar.gz