summaryrefslogtreecommitdiffstats
path: root/sys/dev/isp/isp.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Redo how default Node and Port WWNs are determined (again!). This is somjacob2000-10-121-29/+66
| | | | we don't stomp on the differences between ports for a Qlogic 2202.
* some copyright cleanupsmjacob2000-09-211-6/+3
|
* Inintialize the queue index stuff from what the f/w sends back- justmjacob2000-09-211-3/+4
| | | | | | in case it's insane enough to not do what you tell it to. Print out (LOGINFO level) initiator ID.
* various fixesmjacob2000-08-271-170/+277
|
* Major whacking for core version 2.0. A major motivator for 2.0 and thesemjacob2000-08-011-938/+980
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | changes is that there's now a Solaris port of this driver, so some things in the core version had to change (not much, but some). In order, from the top.....: A lot of error strings are gathered in one place at the head of the file. This caused me to rewrite them to look consistent (with respect to things like 'Port 0x%' and 'Target %d' and 'Loop ID 0x%x'. The major mailbox function, isp_mboxcmd, now takes a third argument, which is a mask that selectively says whether mailbox command failures will be logged. This will substantially reduce a lot of spurious noise from the driver. At the first run through isp_reset we used to try and get the current running firmware's revision by issuing a mailbox command. This would invariably fail on alpha's with anything but a Qlogic 1040 since SRM doesn't *start* the f/w on these cards. Instead, we now see whether we're sitting ROM state before trying to get a running BIOS loaded f/w version. All CFGPRINTF/PRINTF/IDPRINTF macros have been replaced with calls to isp_prt. There are seperate print levels that can be independently set (see ispvar.h), which include debugging, etc. All SYS_DELAY macros are now USEC_DELAY macros. RQUEST_QUEUE_LEN and RESULT_QUEUE_LEN now take ispsoftc as a parameter- the Fibre Channel cards and the Ultra2/Ultra3 cards can have 16 bit request queue entry indices, so we can make a 1024 entry index for them instead of the 256 entries we've had until now. A major change it to fix isp_fclink_test to actually only wait the delay of time specified in the microsecond argument being passed. The problem has always been that a call to isp_mboxcmd to get he current firmware state takes an unknown (sometimes long) amount of time- this is if the firmware is busy doing PLOGIs while we ask it what's up. So, up until now, the usdelay argument has been a joke. The net effect has been that if you boot without being plugged into a good loop or into a switch, you hang. Massively annonying, and hard to fix because the actual time delta was impossible to know from just guessing. Now, using the new GET_NANOTIME macros, a precise and measured amount of USEC_DELAY calls are done so that only the specified usecdelay is allowed to pass. This means that if the initial startup of the firmware if followed by a call from isp_freebsd.c:isp_attach to isp_control(isp, ISP_FCLINK_TEST, &tdelay) where tdelay is 2 * 1000000, no more than two seconds will actually elapse before we leave concluding that the cable is unhooked. Jeez. About time.... Change the ispscsicmd entry point to isp_start, and the XS_CMD_DONE macro to a call to the platform supplied isp_done (sane naming). Limit our size of request queue completions we'll look at at interrupt time. Since we've increased the size of the Request Queue (and the size of the Response Queue proportionally), let's not create an interrupt stack overflow by having to keep a max completion list (forw links are not an option because this is common code with some platforms that don't have link space in their XS_T structures). A limit of 32 is not unreasonable- I doubt there'd be even this many request queue completions at a time- remember, most boards now use fast posting for normal command completion instead of filling out response queue entries. In the isp_mboxcmd cleanup, also create an array of command names so that "ABOUT FIRMWARE" can be printed instead of "CMD #8". Remove the isp_lostcmd function- it's been deprecated for a while. Remove isp_dumpregs- the ISP_DUMPREGS goes to the specific bus register dump fucntion. Various other cleanups.
* Raise debug level for some messages. Fix botched inversionmjacob2000-07-181-11/+9
| | | | about MBOX_COMMAND_ERROR vs. MBOX_COMMAND_PARAM_ERROR.
* Clean up ISPCTL_ABORT_CMD function to not be too chatty if it succeeds,mjacob2000-07-051-7/+14
| | | | | or even if it fails with INVALID_PARM (which just means that the handle doesn't refer to an active commane).
* Change delay loop in new isp_mboxcmd to the use of the new MBOX_WAIT_COMPLETEmjacob2000-07-041-7/+2
| | | | | macro. Change notification of completion of a mailbox command in isp_intr to MBOX_NOTIFY_COMPLETE macro.
* Fix usage of DELAY (SYS_DELAY is the platform independent localmjacob2000-06-271-469/+382
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | define). Fix stupidity wrt checking whether we've gone to LOOP_PDB_RCVD loopstate- it's okay to be greater than this state. D'oh! Protect calls to isp_pdb_sync and isp_fclink_state with IS_FC macros. Completely redo mailbox command routine (in preparation to make this possibly wait rather than poll for completion). Make a major attempt to solve the 'lost interrupt' problem 1. Problem The Qlogic cards would appear to 'lose' interrupts, i.e., a legitimate regular SCSI command placed on the request queue would never complete and the watchdog routine in the driver would eventually wakeup and catch it. This would typically only happen on Alphas, although a couple folks with 700MHz Intel platforms have also seen this. For a long time I thought it was a foulup with f/w negotiations of SYNC and/or WIDE as it always seemed to happen right after the platform it was running on had done a SET TARGET PARAMETERS mailbox command to (re)enable sync && wide (after initially forcing ASYNC/NARROW at startup). However, occasionally, the same thing would also occur for the Fibre Channel cards as well (which, ahem, have no SET TARGET PARAMETERS for transfer mode). After finally putting in a better set of watchdog routines for the platforms for this driver, it seemed to be the case that the command in question (usually a READ CAPACITY) just had up and died- the watchdog routine would catch it after ~10 seconds. For some platforms (NetBSD/OpenBSD)- an ABORT COMMAND mailbox command was sent (which would always fail- indicating that the f/w denied knowledge of this command, i.e., the f/w thought it was a done command). In any case, retrying the command worked. But this whole problem needed to be really fixed. 2. A False Step That Went in The Right Direction The mailbox code was completely rewritten to no longer try and grab the mailbox semaphore register and to try and 'by hand' complete async fast posting completions. It was also rewritten to now have separate in && out bitpatterns for registers to load to start and retrieve to complete. This means that isp_intr now handles mailbox completions. This substantially simplifies the mailbox handling code, and carries things 90% toward getting this to be a non-polled routine for this driver. This did not solve the problem, though. 3. Register Debouncing I saw some comments in some errata sheets and some notes in a Qlogic produced Linux driver (for the Qlogic 2100) that seemed to indicate that debouncing of reads of the mailbox registers might be needed, so I added this. This did not affect the problem. In fact, it made the problem worse for non-2100 cards. 5. Interrupt masking/unmasking The driver *used* to do a substantial amount of masking/unmasking of the interrupt control register. This was done to make sure that the core common code could just assume it would never get pre-empted. This apparently substantially contributed to the lost interrupt problem. The rewrite of the ICR (Interrupt Control Register), which is a separate register from the ISR (Interrupt Status Register) should not have caused any change to interrupt assertions pending. The manual does not state that it will, and the register layout seems to imply that the ICR is just an active route gate. We only enable PCI Interrupts and RISC Interrupts- this should mean that when the f/w asserts a RISC interrupt and (and the ICR allows RISC Interrupts) and we have PCI Interrupts enabled, we should get a PCI interrupt. Apparently this is a latch- not a signal route. Removing this got rid of *most* but not all, lost interrupts. 5. Watchdog Smartening I made sure that the watchdog routine would catch cases where the Qlogic's ISR showed an interrupt assertion. The watchdog routine now calls the interrupt service routine if it sees this. Some additional internal state flags were added so that the watchdog routine could then know whether the command it was in the middle of burying (because we had time it out) was in fact completed by the interrupt service routine. 6. Occasional Constipation Of Commands.. In running some very strenous high IOPs tests (generating about 11000 interrupts/second across one Qlogic 1040, one Qlogic 1080 and one Qlogic 2200 on an Alpha PC164), I found that I would get occasional but regular 'watchdog timeouts' on both the 1080 and the 2100 cards. This is under FreeBSD, and the watchdog timeout routine just marks the command in error and retries it. Invariably, right after this 'watchdog timeout' error, I'd get a command completion for the command that I had thought timed out. That is, I'd get a command completion, but the handle returned by the firmware mapped to no current command. The frequency of this problem is low under such a load- it would usually take an 30 minutes per 'lost' interrupt. I doubled the timeout for commands to see if it just was an edge case of waiting too short a period. This has no effect. I gathered and printed out microtimes for the watchdog completed command and the completion that couldn't find a command- it was always the case that the order of occurrence was "timeout, completion" separated by a time on the order of 100 to 150 ms. This caused me to consider 'firmware constipation' as to be a possible culprit. That is, resubmission of a command to the device that had suffered a watchdog timeout seemed to cause the presumed dead command to show back up. I added code in the watchdog routine that, when first entered for the command, marks the command with a flag, reissues a local timeout call for one second later, but also then issues a MARKER Request Queue entry to the Qlogic f/w. A MARKER entry is used typically after a Bus Reset to cause the f/w to get synchronized with respect to either a Bus, a Nexus or a Target. Since I've added this code, I always now see the occasional watchdog timeout, but the command that was about to be terminated always now seems to be completed after the MARKER entry is issued (and before the timeout extension fires, which would come back and *really* terminate the command).
* Once we have firmware running (if isp_reset) and this is the first timemjacob2000-06-181-34/+74
| | | | | | | | | | | | | | | | | | | | through, establish what our LUN width is. Unfortunately, we can't ask the f/w. If we loaded the f/w, we'll now assume we have expanded LUNs (SCCLUN for fibre channel, just plain 32 LUN for SCSI). If we didn't load firmware, assume 8 LUNs for SCSI and 1 LUN for Fibre Channel. We have to assume only one LUN for Fibre Channel because the LUN setting in Request Queue entries is in different places whether we have SCCLUN firmware or not, so the only LUN guaranteed to work for both is LUN 0. Clean up the rest of isp.c so that ISP2100_SCCLUN defines aren't used- instead use run time determinants based upon isp->isp_maxluns. After starting firmware, delay 500us to give it a chance to get rolling. Fix the interrupt service routine to check for both isr && sema being zero before thinking this was a spurious interrupt. Following the manuals, allow for both Mailbox as well as Queue Reponse type interrupts for regular SCSI.
* Fix some breakage about how we build WWNs. Do some other fabric relatedmjacob2000-05-091-136/+247
| | | | | | | | | | | | changes: consider a new PDB entry different if Class 3 service parameter roles change (!!!). Do some checking as we're getting a port database that traps whether things change while we're doing so. Handle N-port and F-ports correctly. Fix the fabric login loop to retain a login/binding if things haven't changed (I mean, why logout a device only to log it back in). No longer accept, after fabric logins, garbage if we can't get a PDB entry that matches the device we've just logged into- if it doesn't, log it out as it is very unlikely to still be what we thought it was. Get rid of some of the debounce loops because we could get stuck there.
* Pick up topology more sanely at f/w startup. Change the restrictions ofmjacob2000-04-211-22/+34
| | | | | | | where we can have targets (based on topology). Much more importantly, make sure all mods to isp_sendmarker or |= so we don't lose the marking of a bus that needs to have a marker sent for it.
* Slightly cleaner fabric support (whiter whites! redder reds!).. No,mjacob2000-02-291-124/+138
| | | | | | | | | | | seriously- only attempt to logout a previously logged in fabric device. Fix a longstanding bug for aborting overtime commands- handle halves have always been reversed. Clean up some error messages to indicate channel number. Approved:jkh
* Clean out residual bogosity for fast posting stuff- ISP_NO_FASTPOST_SCSImjacob2000-02-151-15/+12
| | | | | | | | | | is gone as a define. We just don't support fast posting for anything less than the 1240/1080/1280/12160 or Fibre Channel cards. Put in support for CDB's larger than 12 bytes for parallel SCSI (up to 44 bytes are allowed). Approved: jkh
* Restructure nvram reading routine to split out to separate functionsmjacob2000-02-111-362/+584
| | | | | | | | | | | | | for 1020/1X80/12160/2X00- for readability. Add in 12160 (Ultra3) support- but not with PPR just yet. Fix and clarify fetching of return parameter for getting firmware rev which for the 2200 contains the connection topology (Private Loop (NL-port), N-port, FL-port, F-port). Synthesize the connection topology for the 2100 which can only be Private Loop or FL-port. Handle a couple of new async mailbox commands which signify connection in Point-to-Point mode (N-port or F-port) or indicate various toe stubbing getting to same. Approved: jkh@freebsd.org
* clean up for SBus Ultra (yes, we do not do that here yet)mjacob2000-01-151-2/+9
|
* change debug printout lefvels for a couple of placesmjacob2000-01-091-4/+4
|
* Make Fibre Channel cards correctly note the presence/absencemjacob2000-01-041-3/+11
| | | | | of ARQ data and punt the dealing with its presence/absence to the platform layers.
* Raise default FCP logintime to 60 seconds. Move the positionmjacob2000-01-031-21/+53
| | | | | | | of where we could have seen the loop up at least once so it makes sense. Change some stuff in ispscsicmd so we don't get stuck there if the loop has never come up yet. Add in some target mode support code.
* Clean up some f/w revision checking wrt enabling fast posting.mjacob1999-12-201-76/+98
| | | | Make sure we set defaults sanely for dual-bus adapters.
* Add Dual LVD bus (1280) supportmjacob1999-12-161-40/+67
|
* turn some messages into CFGPRINT messagesmjacob1999-12-031-5/+5
|
* Clean up stupidity in the isp_handle_other_response function- indexesmjacob1999-11-211-119/+169
| | | | | | | | of queue entries have to be at least 16 bits now! If we're running a 2100 less than rev 5, turn off loop fairness (per Qlogic errata). Fix typo in checking against 2200 F/W revision. Slightly fix/reorder fabric login stuff. Change to usage of isp_getrqentry for code clarity. Add some defensive dual bus assumptions. Various cleanups, etc...
* correct moronic typomjacob1999-11-011-2/+2
|
* Use pointer to f/w in md structure as to whether f/w exists or not.mjacob1999-10-301-14/+7
| | | | | If firmware length isn't specified, extract from the 4th short into the firmware.
* I was misinformed. I cannot get away from specifying tags for FC. Some devicesmjacob1999-10-281-2/+8
| | | | are happy w/o them- some are unhappy (IBM drives).
* nuke a debug printout I thought I had already nukedmjacob1999-10-261-2/+0
|
* remember to initialize mailbox 2 for FC isp bus resetsmjacob1999-10-221-1/+1
|
* Remove some target mode stuff. It will get re-introduced in a differentmjacob1999-10-171-815/+166
| | | | | | | | | | | | | | | | | | file later. Do some pencil-sharpening types of minor changes. Change how active commands are remembered (using new inline functions to get handles, etc..). Now do a GET FIRMWARE STATUS after firing up the f/w as outgoing mailbox 2 will tell you the f/w's notion of the max commands that can be supported. Attempt to retrieve loop topology. Add in the appropriate SWIZZLE/UNSWIZZLE macros calls (this is a no-op on Little Endian machines but is needed for sparc (on other platforms)). Move the temp port database we use to find out where things have moved to after a LIP to the softc and off the kernel stack. Follow Qlogic's hint and don't bother setting a tag for commands that don't have this enabled (presumably the f/w will do it's own selection then). Use an INT_PENDING macro to check for an interrupt. The call to ISP_DMAFREE now just takes the handle- not the 'handle-1' which was a layering violation. Use CFGPRINTF in a couple of places to make things less chatty if not booting verbose, or CAMDEBUG compiles, etc..
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* More code cleanup. Go back to using FULL_LOGIN Fibre Chan if f/w is less thanmjacob1999-08-161-91/+157
| | | | | 1.17.0 level. Change where we do the loop database init. Add in the CMD_RQLATER return. Add some register debounce.
* add 2200 f/w; fix botched definemjacob1999-07-051-2/+2
|
* Roll revision levels. Add support for the Qlogic 2200 (warn aboutmjacob1999-07-021-130/+707
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | not having SCSI_ISP_SCCLUN config defined if we don't have f/w for the 2200- it's resident firmware uses SCCLUN (65535 luns)). Change the way the default LoopID is gathered (it's now a platform specific define so that some attempt at a synthetic WWN can be made in case NVRAM isn't readable). Change initialization of options a bit- don't use ADISC. Set FullDuplex mode if config options tells us to do so. Do not use FULL_LOGIN after LIP- it's the right thing to do but it causes too much loop disruption (Loop Resets). Sanity check some default values. Redo construction of port and node WWNs based upon what we have- if we have 2 in the top nibble, we can have distinct port and node WWNs. Clean up some SCCLUN related code that we obviously had never compiled (:-(). Audit commands coming int ispscsicmd and don't throw commands at Fibre devices that do not have Class 3 service parameters TARGET ROLE defined. Clean up f/w initialization a bit. Add Fabric support (or at least the first blush of it). Whew - way too much to describe here. Basically, after a LIP, hang out until we see a Loop Up or a Port DataBase Change async event, then see if we're on a Fabric (GET_PORT_NAME of FL_PORT_ID). If we are, try and scan the fabric controller for fabric devices using the GetAllNext SNS subcommand. As we find devices, announce them to the outer layer. Try and do some guard code for broken (Brocade) SNS servers (that get stuck in loops- gotta maybe do this a different way using the GP_ID3 cmd instead). Then do a scan of the lower (local loop) ids using a GET_PORT_NAME to see if the f/w has logged into anything at that loop id. If so, then do a GET_PORT_DATABASE command. Do this scan into a local database. At this point we can say the loop is 'Ready'. After this, we merge our local loop port database with our stored port database- in a as yet to be really fully exercised fashion we try and follow the logic of something having moved around. The first time we see something at a Loop ID, we fix it, for the purpose of this system instance, at that Loop ID. If things shift around so it ends up somewhere else, we still keep it at this Loop ID (our 'Target') but use the new (moved) Loop ID when we actually throw commands at it. Check for insane cases of different Loop IDs both claiming to have the same WWN- if that happens, invalidate both. Notify the outer layer of devices that have arrived and devices that have gone away. *Finally*, when this is done, search the softc's database of Fabric devices and perform logout/login actions. The Qlogic f/w maintains logout/login for all local loop devices. We have to maintain logout/login for fabric devices- total PITA. Expect to see this area undergo more change over time.
* be a bit more chatty about some speed negotiationsmjacob1999-05-121-2/+8
|
* Some massive thwunking in initialization to handle dual bus adapters. Moremjacob1999-05-111-312/+601
| | | | | | | massive thwunking to include an XS_CHANNEL value. Some changes of how parameters are reported to outer layers (including bus, e.g.). Yet more stirring around in isp_mboxcmd to try and get it right. Decode of 1080/1240 NVRAM.
* temp fix for internal queue overflow problemmjacob1999-04-141-14/+34
|
* Make firmware revision a triple. Clean up some FC init stuff formjacob1999-04-041-73/+138
| | | | | | | | | | | | | board versions with no BIOS. Separate mailbox interrupts from IOCB interrupts. Read OUTMAILBOX5 while RISC_INT is active- not after you clear it (potential race condition). Clear out older broken BIG_ENDIAN goop. Don't negotiate narrow/async for LVD busses at startup if already in LVD mode. Note usage of presumptive 1040C revision. For all the LIP, PDB Changed, Loop UP/DOWN async events, mark fw state as unknown as well as marking the need to do a getpdb on targets- after a LIP for certain the f/w has to do PRLI/PLOGI for all targets again and marking f/w state as unknown gives us a fighting chance to (start to) hold up for that to complete.
* Annoying little nigglet- apparently *some* Qlogic temporarily ignoremjacob1999-03-261-2/+10
| | | | | | settings you've just sent them and return random values if you follow the set by a get. This causes problems when you latter run a Tag-enabled command when you've command tagged mode off.
* Add in 1080 LVD support and some basis also for the 1240. The port databasemjacob1999-03-251-197/+269
| | | | printout is now enabled.
* A wad of changes- prepping for 1080/1240 support (which caused a massivemjacob1999-03-171-220/+496
| | | | | | thwank in register layout goop). A different mboxcmd approach. Some PDB change infrastructure. Some better management of loopdown/loopup events (keep them distinct from resource starvation for simq freeze/unfreeze actions).
* Roll internal release tag. Print out if we're in a 64 bit PCI slot.mjacob1999-02-091-59/+156
| | | | | | | | | | | | Use fast memory timing NVRAM parameter. Clean up and fix establishment of default target parameters. Don't use NVRAM if are flagged as not to do so (I had a busted NVRAM setup which I couldn't edit that enabled SYNC mode but disabled disconnect/reconnect and wide!!). Fix delays after resets. BUS resets not done in isp_init anymore- relegated to OS specific outer layers. Fix a buglet where you can get in a loop for a NULL xs in the completion list in isp_intr. Add in some defines that can disable fast posting. Add in code for Loop Up/Loop Down events that call into the outer layers as to what to do.
* Implement and use Fast Posting for both parallel && fibre. Redo a bit ofmjacob1999-01-301-184/+256
| | | | | | | the startup code. Implement a call to outer framework function so that asynchronous events can be handled (e.g., speed negotiation, target mode). Roll internal release tags.
* Suggested by bde@freebsd.org- memcpy not necessarily good to use. D'oh- not inmjacob1999-01-101-11/+11
| | | | | the BSD DKI. Stop being lazy and finish the defines so MEMCPY becomes bzero for FreeBSD.
* Add some prototype deadchip detection. Set FIFO bursting (1XX0 only-mjacob1999-01-101-48/+88
| | | | | | | | it's already on for the 2XX0) and detect the broken 1040A FIFO. Change bzero to MEMZERO (portability with **nux). Use memcpy for same reason. Finally detect QUEUE FULL conditions and return this as an error that will get cam_periph_error to do it's 'tagged openings now XXX' dance.
* clarify headers;move uninit to outer layer;remove watchdogmjacob1998-12-281-69/+3
|
* oops on lastmjacob1998-12-051-3/+3
|
* Remove the Target mode functions until they're in better shape. Implement somemjacob1998-12-051-91/+534
| | | | | | | suggested compilation cleanups from Eklund. Wire down a hard loop id if we are not on a platform that has the ability to get to a PCI BIOS (it still will float to the ID it gets after a LIP but at least we can try). Clarify that the expanded lun is based upon SCCLUN defines (in f/w).
* per bde (who is right about this) that an inlined fucntion with constmjacob1998-09-171-2/+24
| | | | | char * strings being returned defined in a header file included several places but only used in one module, is, uh, silly.
* Cleanliness. Don't leave defined a const char array that's only usedmjacob1998-09-171-2/+4
| | | | if target mode is defined (which it isn't, yet).
* ISP_DMASETUP now returns a value to be possibly punted to outer layers.mjacob1998-09-171-13/+25
| | | | | Turn request queue overflow messages into debug messages. Ensure on isp_restarts that we nullify the xflist array.
OpenPOWER on IntegriCloud