NTB: MFV 113bf1c9: BWD Link Recovery

The BWD NTB device drops the link if an error is encountered on the point-to-point PCI bridge. The link stays down until all errors are cleared and the link is re-established. On link down, check whether an error was detected; if so, do the necessary housekeeping to try to recover from the error and re-establish the link.

There is a potential race between the two NTB devices recovering at the same time. If the recovery times are synchronized, the link will not recover and the driver will be stuck in this loop forever. Add a random interval to the recovery time to prevent this race.

Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
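The random interval described above can be sketched in userland C as follows. This is an illustration only, not the driver code: RECOVERY_TIME_MS is a hypothetical stand-in for SOC_LINK_RECOVERY_TIME (whose value is not shown in this patch), rand() stands in for the kernel's arc4random(), and both helper names are made up for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical base recovery time in milliseconds. The real value of
 * SOC_LINK_RECOVERY_TIME is not shown in this patch.
 */
#define RECOVERY_TIME_MS 500u

/*
 * Pick a jitter in [0, base_ms). rand() is an assumption standing in for
 * arc4random(), which the patch uses in the kernel.
 */
static uint32_t
pick_jitter_ms(uint32_t base_ms)
{
	return ((uint32_t)rand() % base_ms);
}

/*
 * Convert base delay plus jitter (both in milliseconds) to scheduler
 * ticks, mirroring the (base + jitter) * hz / 1000 expression in the patch.
 */
static uint32_t
recovery_delay_ticks(uint32_t base_ms, uint32_t jitter_ms, uint32_t hz)
{
	return ((base_ms + jitter_ms) * hz / 1000);
}
```

Because the jitter is always smaller than the base time, each side waits at least one base interval and less than two. Two peers that draw different jitter values wake at different times, which is what breaks the lockstep retry described above.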
author: cem <cem@FreeBSD.org> 2015-10-13 17:20:47 +0000
committer: cem <cem@FreeBSD.org> 2015-10-13 17:20:47 +0000
commit: eb857d904bea92ef8435e43ca114aa6b24e27f6d (patch)
tree: bfb4c41bc26bdb642173bf76919707775742e4d7
parent: 87513df0117018831a9065eae7f1dc454f7088a8 (diff)
download: FreeBSD-src-eb857d904bea92ef8435e43ca114aa6b24e27f6d.zip
FreeBSD-src-eb857d904bea92ef8435e43ca114aa6b24e27f6d.tar.gz
1 files changed, 15 insertions, 1 deletions
diff --git a/sys/dev/ntb/ntb_hw/ntb_hw.c b/sys/dev/ntb/ntb_hw/ntb_hw.c
index 606c432..d6825bc 100644
--- a/sys/dev/ntb/ntb_hw/ntb_hw.c
+++ b/sys/dev/ntb/ntb_hw/ntb_hw.c
@@ -896,6 +896,7 @@ ntb_handle_heartbeat(void *arg)
 	if (rc != 0)
 		device_printf(ntb->device,
 		    "Error determining link status\n");
+
 	/* Check to see if a link error is the cause of the link down */
 	if (ntb->link_status == NTB_LINK_DOWN) {
 		status32 = ntb_reg_read(4, SOC_LTSSMSTATEJMP_OFFSET);
@@ -995,7 +996,15 @@ recover_soc_link(void *arg)
 	uint16_t status16;
 
 	soc_perform_link_restart(ntb);
-	pause("Link", SOC_LINK_RECOVERY_TIME * hz / 1000);
+
+	/*
+	 * There is a potential race between the 2 NTB devices recovering at
+	 * the same time.  If the times are the same, the link will not recover
+	 * and the driver will be stuck in this loop forever.  Add a random
+	 * interval to the recovery time to prevent this race.
+	 */
+	status32 = arc4random() % SOC_LINK_RECOVERY_TIME;
+	pause("Link", (SOC_LINK_RECOVERY_TIME + status32) * hz / 1000);
 
 	status32 = ntb_reg_read(4, SOC_LTSSMSTATEJMP_OFFSET);
 	if ((status32 & SOC_LTSSMSTATEJMP_FORCEDETECT) != 0)
@@ -1005,12 +1014,17 @@ recover_soc_link(void *arg)
 	if ((status32 & SOC_IBIST_ERR_OFLOW) != 0)
 		goto retry;
 
+	status32 = ntb_reg_read(4, ntb->reg_ofs.lnk_cntl);
+	if ((status32 & SOC_CNTL_LINK_DOWN) != 0)
+		goto out;
+
 	status16 = ntb_reg_read(2, ntb->reg_ofs.lnk_stat);
 	width = (status16 & NTB_LINK_WIDTH_MASK) >> 4;
 	speed = (status16 & NTB_LINK_SPEED_MASK);
 	if (ntb->link_width != width || ntb->link_speed != speed)
 		goto retry;
 
+out:
 	callout_reset(&ntb->heartbeat_timer, NTB_HB_TIMEOUT * hz,
 	    ntb_handle_heartbeat, ntb);
 	return;
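The checks that recover_soc_link() makes after the wait can be summarized as a small decision function. The sketch below is a hedged restatement of the patched control flow, not the driver itself: the bit values, the link_snapshot struct, and the decide_recovery() name are all hypothetical, and the branch taken on FORCEDETECT is elided from the hunk above, so treating it as a retry is an assumption.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration; the real masks are not in this patch. */
#define FORCEDETECT_BIT		(1u << 2)
#define IBIST_ERR_OFLOW_BIT	(1u << 31)
#define CNTL_LINK_DOWN_BIT	(1u << 16)

enum recovery_action {
	RECOVERY_RETRY,			/* corresponds to "goto retry" */
	RECOVERY_RESUME_HEARTBEAT	/* corresponds to the "out" label */
};

/* Hypothetical snapshot of the registers the function reads. */
struct link_snapshot {
	uint32_t ltssm_state;	/* value read at SOC_LTSSMSTATEJMP_OFFSET */
	uint32_t ibist_err;	/* value tested against SOC_IBIST_ERR_OFLOW */
	uint32_t link_cntl;	/* value read at reg_ofs.lnk_cntl */
	unsigned width;		/* negotiated link width */
	unsigned speed;		/* negotiated link speed */
};

static enum recovery_action
decide_recovery(const struct link_snapshot *s, unsigned want_width,
    unsigned want_speed)
{
	/* Assumption: a forced-detect state leads to another attempt. */
	if ((s->ltssm_state & FORCEDETECT_BIT) != 0)
		return (RECOVERY_RETRY);
	if ((s->ibist_err & IBIST_ERR_OFLOW_BIT) != 0)
		return (RECOVERY_RETRY);
	/* New in this patch: link-down bit set in link control skips the width/speed check. */
	if ((s->link_cntl & CNTL_LINK_DOWN_BIT) != 0)
		return (RECOVERY_RESUME_HEARTBEAT);
	/* A width or speed mismatch means the link did not come back as expected. */
	if (s->width != want_width || s->speed != want_speed)
		return (RECOVERY_RETRY);
	return (RECOVERY_RESUME_HEARTBEAT);
}
```

The point of the added link-control check is ordering: when the link-down bit is set, the function goes straight to rearming the heartbeat timer instead of comparing width and speed and retrying.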