-rw-r--r--	sys/dev/isp/DriverManual.txt	634
-rw-r--r--	sys/dev/isp/Hardware.txt	304
2 files changed, 938 insertions, 0 deletions
diff --git a/sys/dev/isp/DriverManual.txt b/sys/dev/isp/DriverManual.txt
new file mode 100644
index 0000000..ea1968f
--- /dev/null
+++ b/sys/dev/isp/DriverManual.txt
@@ -0,0 +1,634 @@
/* $FreeBSD$ */

		Driver Theory of Operation Manual

1. Introduction

This is a short text document that describes the background of, the goals
for, and the current theory of operation for the joint Fibre Channel/SCSI
HBA driver for QLogic hardware.

Because this driver is an ongoing project, do not expect this manual
to remain entirely up to date. As with a lot of software engineering, the
ultimate documentation is the driver source. However, this manual should
serve as a solid basis for attempting to understand where the driver
started and what the current source is trying to accomplish.

The reader is expected to understand the basics of SCSI and Fibre Channel
and to be familiar with the range of platforms on which Solaris, Linux and
the variant "BSD" Open Source systems are available. A glossary and a
few references are placed at the end of the document.

There will be references to functions and structures within the body of
this document. These can easily be found within the source using editor
tags or grep. There will be few code examples here, as the code already
exists where the reader can easily find it.

2. A Brief History of this Driver

This driver originally started as part of work funded by NASA Ames
Research Center's Numerical Aerodynamic Simulation center ("NAS" for
short) for the QLogic PCI 1020 and 1040 SCSI Host Adapters, as part of my
work in porting the NetBSD Operating System to the Alpha architecture
(specifically the AlphaServer 8200 and 8400 platforms). In short, it
started as just a simple single SCSI HBA driver for the sole purpose of
running off a SCSI disk. This work took place starting in January, 1997.

Because the first implementation was for NetBSD, which runs on a very
large number of platforms, and because NetBSD supported both systems with
SBus cards (e.g., Sun SPARC systems) as well as systems with PCI cards,
and because the QLogic SCSI cards came in both SBus and PCI versions, the
initial implementation followed the very thoughtful NetBSD design tenet
of splitting drivers into what are called MI (for Machine Independent)
and MD (Machine Dependent) portions. The original design therefore
started from the premise that the driver would drive both SBus and PCI
card variants. These busses are similar but have quite different
constraints, and while the QLogic SBus and PCI cards are very similar,
there are some significant differences.

After this initial goal had been met, there began to be some talk about
looking into implementing Fibre Channel mass storage at NAS. At this time
the QLogic 2100 FC-AL HBA was about to become available. After looking at
the way it was designed, I concluded that it was so darned close to being
just like the SCSI HBAs that it would be insane to *not* leverage off of
the existing driver. So, we ended up with a driver for NetBSD that drove
PCI and SBus SCSI cards, and now also drove the QLogic 2100 FC-AL HBA.

After this, ports to non-NetBSD platforms became interesting as well.
This took the driver beyond the interest of NAS alone and into interested
support from a number of other places. Since the original NetBSD
development, the driver has been ported to FreeBSD, OpenBSD, Linux,
Solaris, and two proprietary systems.
Following from the original MI/MD
design of NetBSD, a rather successful attempt has been made to keep the
Operating System platform differences segregated and to a minimum.

Along the way, support for the 2200 has been added, as well as full
fabric and target mode support; 2300 support and an FC-IP stack are
planned.

3. Driver Design Goals

This driver did not start out the way one normally would undertake such
an effort. Normally you design via top-down methodologies, set an initial
goal, and meet it. This driver has had design goals that have changed
almost from the very first. This has been an extremely peculiar, if not
risque, experience. As a consequence, this section of this document
contains a bit of "reconstruction after the fact" in that the design
goals are as I perceive them to be now- not necessarily what they
started as.

The primary design goal now is to have a driver that can run both the
SCSI and Fibre Channel SCSI protocols on multiple OS platforms with
as little OS platform support code as possible.

The intended support targets for SCSI HBAs are the single and dual
channel PCI Ultra2 and PCI Ultra3 cards as well as the older PCI Ultra
single channel cards and SBus cards.

The intended support targets for Fibre Channel HBAs are the 2100, 2200
and 2300 PCI cards.

Fibre Channel support should include complete fabric and public loop
as well as private loop and private loop, direct-attach topologies.
FC-IP support is also a goal.

For both SCSI and Fibre Channel, simultaneous target/initiator mode
support is a goal.

Pure, raw performance is not a primary goal of this design. This design,
because it has a tremendous amount of code common across multiple
platforms, will undoubtedly never be able to beat the performance of a
driver that is specifically designed for a single platform and a single
card. However, it is a good strong secondary goal to make the performance
penalties in this design as small as possible.

Another primary aim, which almost need not be stated, is that the
implementation of platform differences must not clutter up the common
code with platform specific defines. Instead, some reasonable layering
semantics are defined such that platform specifics can be kept in the
platform specific code.

4. QLogic Hardware Architecture

In order to make the design of this driver more intelligible, some
description of the QLogic hardware architecture is in order. This will
not be an exhaustive description of how this card works, but will
note enough of the important features so that the driver design is
hopefully clearer.

4.1 Basic QLogic hardware

The QLogic HBA cards all contain a tiny 16-bit RISC-like processor and
varying sizes of SRAM. Each card contains a Bus Interface Unit (BIU)
as appropriate for the host bus (SBus or PCI). The BIUs allow access
to a set of dual-ranked 16 bit incoming and outgoing mailbox registers,
as well as access to control registers that control the RISC or access
other portions of the card (e.g., Flash BIOS). The term 'dual-ranked'
means that a write at a given host visible address stores into one
(incoming, to the HBA) mailbox register, while a read at the same
address reads another (outgoing, from the HBA) mailbox register with
completely different data.
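
To make 'dual-ranked' concrete, here is a minimal C sketch. The
register accessor names and the mailbox offset are invented for
illustration only (the real driver reaches registers through its mdvec
function vector, described later):

	/*
	 * Illustrative sketch: the accessor names and the mailbox
	 * offset are hypothetical, not the driver's actual interface.
	 */
	#include <stdint.h>

	#define MBOX0	0x70	/* assumed offset of mailbox register 0 */

	extern uint16_t	hw_read16(uint32_t off);	/* platform reg read */
	extern void	hw_write16(uint32_t off, uint16_t val); /* reg write */

	static void
	mbox_rank_demo(uint16_t opcode)
	{
		uint16_t completion;

		/* A write at MBOX0 lands in the incoming (host-to-HBA)
		 * rank of the register... */
		hw_write16(MBOX0, opcode);

		/* ...while a read at the very same offset returns the
		 * outgoing (HBA-to-host) rank, which holds completely
		 * different data. */
		completion = hw_read16(MBOX0);
		(void) completion;
	}
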
Each HBA also has core and auxiliary logic which is used either to
interface to a SCSI bus (or to external bus drivers that connect to a
SCSI bus), or to connect to a Fibre Channel bus.

4.2 Basic Control Interface

There are two principal I/O control mechanisms by which the driver
communicates with and controls the QLogic HBA. The first mechanism is to
use the incoming mailbox registers to interrupt and issue commands to
the RISC processor (with results usually, but not always, ending up in
the outgoing mailbox registers). The second mechanism is to establish,
via mailbox commands, circular request and response queues in system
memory that are then shared between the QLogic and the driver. The
request queue is used to queue requests (e.g., I/O requests) for the
QLogic HBA's RISC engine to copy into the HBA memory and process. The
response queue is used by the QLogic HBA's RISC engine to place results
of requests read from the request queue, as well as to place notification
of asynchronous events (e.g., incoming commands in target mode).

To give a bit more precise scale to the preceding description, the QLogic
HBA has 8 dual-ranked 16 bit mailbox registers, mostly for out-of-band
control purposes. The QLogic HBA then utilizes a circular request queue
of 64 byte fixed size Queue Entries to receive normal initiator mode
I/O commands (or continue target mode requests). The request queue may
have up to 256 elements for the QLogic 1020 and 1040 chipsets, but may
be considerably larger for the QLogic 12X0/12160 SCSI and QLogic 2X00
Fibre Channel chipsets.

In addition to synchronously initiated usage of mailbox commands by
the host system, the QLogic may also deliver asynchronous notifications
solely in outgoing mailbox registers. These asynchronous notifications in
mailboxes may be things like notification of SCSI bus resets, or that the
Fabric Name Server has sent a change notification, or even that a specific
I/O command completed without error (this is called 'Fast Posting'
and saves the QLogic HBA from having to write a response queue entry).

The QLogic HBA is an interrupting card, and when servicing an interrupt
you really only have to check for either a mailbox interrupt or an
interrupt notification that the response queue has an entry to be
dequeued.
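
The ring arithmetic implied by a circular queue is simple modular index
math. The following sketch uses invented names (the real driver wraps
the equivalents in macros); it shows the full-queue check a driver of
this style must make before writing a new entry:

	/* Hypothetical names; sizes vary by chipset as noted above. */
	#define RQUEST_QUEUE_LEN	256	/* 64 byte fixed entries */

	static unsigned int
	rq_next_idx(unsigned int idx)
	{
		return ((idx + 1) % RQUEST_QUEUE_LEN);
	}

	/*
	 * The queue is full when advancing the in-pointer would collide
	 * with the out-pointer, which the HBA's RISC engine advances as
	 * it copies entries into its own memory.
	 */
	static int
	rq_full(unsigned int in_idx, unsigned int out_idx)
	{
		return (rq_next_idx(in_idx) == out_idx);
	}
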
4.3 Fibre Channel SCSI out of SCSI

QLogic took the approach, in introducing the 2X00 cards, of just treating
FC-AL as a 'fat' SCSI bus (a SCSI bus with more than 15 targets). All
of the things that you really need to do with Fibre Channel with respect
to providing FC-4 services on top of a Class 3 connection are performed
by the RISC engine on the QLogic card itself. This means that from
an HBA driver point of view, very little needs to change that would
distinguish addressing a Fibre Channel disk from addressing a plain
old SCSI disk.

However, in the details it's not *quite* that simple. For example, in
order to manage Fabric Connections, the HBA driver has to do explicit
binding of entities it has queried from the name server to specific
'target' ids (targets, in this case, being a virtual entity).

Still- the HBA firmware really does nearly all of the tedious management
of Fibre Channel login state. The corollary to this sometimes is the
lack of ability to say why a particular login connection to a Fibre
Channel disk is not working well.

There are clear limits with the QLogic card in managing fabric devices.
The QLogic manages local loop devices (Loop ID or Target 0..126) itself,
but for the management of fabric devices, it has an absolute limit of
253 simultaneous connections (256 entries less 3 reserved entries).

5. Driver Architecture

5.1 Driver Assumptions

The first basic assumption for this driver is that the requirements for
a SCSI HBA driver for any system follow a 2 or 3 layer model where
there are SCSI target device drivers (drivers which drive SCSI disks,
SCSI tapes, and so on), possibly a middle services layer, and a bottom
layer that manages the transport of SCSI CDBs out a SCSI bus (or across
Fibre Channel) to a SCSI device. It's assumed that each SCSI command is
a separate structure (or pointer to a structure) that contains the SCSI
CDB and a place to store SCSI Status and SCSI Sense Data.

This turns out to be a pretty good assumption. All of the Open Source
systems (*BSD and Linux) and most of the proprietary systems have this
kind of structure. This has been the way to manage SCSI subsystems for
at least ten years.

There are some additional basic assumptions that this driver makes-
primarily in the arena of basic simple services like memory zeroing,
memory copying, delay, sleep, and microtime functions. It doesn't assume
much more than this.

5.2 Overall Driver Architecture

The driver is split into a core (machine independent) module and platform
and bus specific outer modules (machine dependent).

The core code (in the files isp.c, isp_inline.h, ispvar.h, ispreg.h and
ispmbox.h) handles:

	+ Chipset recognition, reset and firmware download (isp_reset)
	+ Board Initialization (isp_init)
	+ First level interrupt handling (response retrieval) (isp_intr)
	+ A SCSI command queueing entry point (isp_start)
	+ A set of control services accessed either via local requirements
	  within the core module or via an externally visible control
	  entry point (isp_control).

The platform/bus specific modules (and definitions) depend on each
platform, and they provide both definitions and functions for the core
module's use. Generally a platform module set is split into a bus
dependent module (where configuration is begun from and bus specific
support functions reside) and a relatively thin platform specific layer
which serves as the interconnect with the rest of this platform's SCSI
subsystem.

For ease of bus specific access issues, a centralized soft state
structure is maintained for each HBA instance (struct ispsoftc). This
soft state structure contains a machine/bus dependent vector (mdvec)
of functions that read and write hardware registers, set up DMA for the
request/response queues and Fibre Channel scratch area, set up and tear
down DMA mappings for a SCSI command, provide a pointer to firmware to
load, and other minor things.

The machine dependent outer module must provide functional entry points
for the core module:

	+ A SCSI command completion handoff point (isp_done)
	+ An asynchronous event handler (isp_async)
	+ A logging/printing function (isp_prt)

The machine dependent outer module code must also provide a set of
abstracting definitions which the core module utilizes heavily to do
its job. These are discussed in detail in the comments in the file
ispvar.h, but to give a sense of the range of what is required, let's
illustrate two basic classes of these defines.

The first class is the "structure definition/access" class. An example
of these would be:

	XS_T		Platform SCSI transaction type (i.e., command
			for HBA)
	..
	XS_TGT(xs)	gets the target from an XS_T
	..
	XS_TAG_TYPE(xs)	which type of tag to use
	..

The second class is the 'functional' class of definitions. Some examples
of this class are:

	MEMZERO(dst, amt)			platform zeroing function
	..
	MBOX_WAIT_COMPLETE(struct ispsoftc *)	wait for mailbox cmd to
						be done

Note that the former is likely to be a simple replacement with bzero or
memset on most systems, while the latter could be quite complex.
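
As a purely illustrative sketch of what a platform layer might supply,
with a toy command structure standing in for the platform's real one
(every name below is invented; see ispvar.h for the actual contracts):

	#include <string.h>

	/* A toy platform command structure standing in for the real one. */
	struct toy_xs {
		int	xs_target;
		int	xs_lun;
		int	xs_tagtype;
	};

	typedef struct toy_xs XS_T;	/* platform SCSI transaction type */

	#define XS_TGT(xs)	((xs)->xs_target)
	#define XS_LUN(xs)	((xs)->xs_lun)
	#define XS_TAG_TYPE(xs)	((xs)->xs_tagtype)

	/* The 'functional' class can be as simple as a one-liner... */
	#define MEMZERO(dst, amt)	memset((dst), 0, (amt))

	/* ...while MBOX_WAIT_COMPLETE might poll or sleep and is left
	 * to each platform's judgment. */
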
This soft state structure also contains different parameter information
based upon whether this is a SCSI HBA or a Fibre Channel HBA (which is
filled in by the core module).

In order to clear up what is undoubtedly a seeming confusion of
interconnects, a description of the typical flow of code that performs
board initialization and command transactions may help.

5.3 Initialization Code Flow

Typically a bus specific module for a platform (e.g., one that wants
to configure a PCI card) is entered via that platform's configuration
methods. If this module recognizes a card and can utilize or construct the
space for the HBA instance softc, it does so, and initializes the machine
dependent vector as well as any other platform specific information that
can be hidden in or associated with this structure.

Configuration at this point usually involves mapping in board registers
and registering an interrupt. It's quite possible that the core module's
isp_intr function is adequate to be the interrupt entry point, but often
it's more useful to have a bus specific wrapper module that calls isp_intr.

After mapping and interrupt registration are done, isp_reset is called.
Part of the isp_reset call may cause callbacks out to the bus dependent
module to perform allocation and/or mapping of Request and Response
queues (as well as a Fibre Channel scratch area if this is a Fibre
Channel HBA). The reason this is considered 'bus dependent' is that
only the bus dependent module has the information that says how I/O
mapping of the Request and Response queues could be performed (which,
e.g., on a Solaris system, is quite platform dependent). Another callback
can enable the *use* of interrupts, should this platform be able to
finish configuration in interrupt driven mode.

If isp_reset is successful at resetting the QLogic chipset, downloading
new firmware (if available) and setting it running, isp_init is called. If
isp_init is successful in doing initial board setup (including reading
NVRAM from the QLogic card), then this bus specific module calls the
platform dependent module that takes the appropriate steps to 'register'
this HBA with this platform's SCSI subsystem. Examining either the
OpenBSD or the NetBSD isp_pci.c or isp_sbus.c files may assist the reader
here in clarifying some of this.
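
In outline, the attach flow looks something like the following sketch.
Only isp_reset and isp_init are the core module's real entry points;
the state-field test is modeled on, but not copied from, the real
driver, and the toy declarations stand in for those in ispvar.h:

	/* Toy stand-ins; real declarations live in ispvar.h/isp_pci.c. */
	struct ispsoftc { int isp_state; };
	#define ISP_RESETSTATE	1
	#define ISP_INITSTATE	2

	extern void isp_reset(struct ispsoftc *);
	extern void isp_init(struct ispsoftc *);
	extern int  platform_scsi_attach(struct ispsoftc *); /* hypothetical */

	static int
	example_attach(struct ispsoftc *isp)
	{
		/* register mapping and interrupt hookup happen first */
		isp_reset(isp);		/* may call back for queue setup */
		if (isp->isp_state != ISP_RESETSTATE)
			return (-1);	/* reset or f/w download failed */
		isp_init(isp);		/* reads NVRAM, sets up the board */
		if (isp->isp_state != ISP_INITSTATE)
			return (-1);
		/* 'register' the HBA with this platform's SCSI subsystem */
		return (platform_scsi_attach(isp));
	}
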
5.4 Initiator Mode Command Code Flow

A successful execution of isp_init will lead to the driver 'registering'
itself with this platform's SCSI subsystem. One assumed action for this
is the registration of a function that the SCSI subsystem for this
platform will call when it has a SCSI command to run.

The platform specific module function that receives this will do whatever
it needs to do to prepare this command for execution in the core module.
This sounds vague, but it's also very flexible. In principle, this could
be a complete marshalling/demarshalling of this platform's SCSI command
structure (should it be impossible to represent in an XS_T). In addition,
this function can also block commands from running (if, e.g., Fibre
Channel loop state would preclude successful starting of the command).

When it's ready to do so, this function calls isp_start with the
command. This core module tries to allocate request queue space for
this command. It also calls through the machine dependent vector
function to make sure any DMA mapping for this command is done.

Now, DMA mapping here is possibly a misnomer, as more than just
DMA mapping can be done in this bus dependent function. This is
also the place where any endian byte-swizzling will be done. At any
rate, this function is called last because the process of establishing
DMA addresses for any command may in fact consume more Request Queue
entries than are currently available. If the mapping and other
functions are successful, the QLogic request queue in-pointer (a mailbox
register) is updated to indicate to the QLogic that it has a new request
to read.

If this function is unsuccessful, policy as to what to do at this point is
left to the machine dependent platform function which called isp_start. In
some platforms, temporary resource shortages can be handled by the main
SCSI subsystem. In other platforms, the machine dependent code has to
handle this.

In order to keep track of commands that are in progress, the soft state
structure contains an array of 'handles' that are associated with each
active command. When you send a command to the QLogic firmware, a portion
of the Request Queue entry can contain a non-zero handle identifier so
that at a later point in time, when reading either a Response Queue entry
or a Fast Posting mailbox completion interrupt, you can use this handle
to find the command you were waiting on. It should be noted that this is
probably one of the most dangerous areas of this driver. Corrupted
handles will lead to system panics.
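
A minimal sketch of such a handle table follows. The names and the
fixed size are invented, and the real driver's version carries more
guarding than this:

	#include <stddef.h>

	#define MAXHANDLES	256	/* arbitrary for this sketch */

	static void *active_cmds[MAXHANDLES];	/* handle 0 never used */

	/* Stash an active command; returns a non-zero handle, 0 if full. */
	static unsigned int
	handle_alloc(void *cmd)
	{
		unsigned int h;

		for (h = 1; h < MAXHANDLES; h++) {
			if (active_cmds[h] == NULL) {
				active_cmds[h] = cmd;
				return (h);
			}
		}
		return (0);
	}

	/*
	 * Recover the command when its handle shows up in a Response
	 * Queue entry or a Fast Posting mailbox completion. A handle
	 * that maps to nothing is the "corrupted handle" disaster case
	 * mentioned above.
	 */
	static void *
	handle_retrieve(unsigned int h)
	{
		void *cmd;

		if (h == 0 || h >= MAXHANDLES)
			return (NULL);
		cmd = active_cmds[h];
		active_cmds[h] = NULL;
		return (cmd);
	}
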
At some later point in time an interrupt will occur, and eventually
isp_intr will be called. This core module determines the cause of the
interrupt, and whether it is for a completing command. That is, it
determines the handle and fetches the pointer to the command out of
storage within the soft state structure. Skipping over a lot of details,
the machine dependent code supplied function isp_done is called with the
pointer to the completing command. This would then be the glue layer that
informs the SCSI subsystem for this platform that a command is complete.

5.5 Asynchronous Events

Interrupts occur for events other than commands (mailbox or request queue
started commands) completing. These are called Asynchronous Mailbox
interrupts. When some external event causes the SCSI bus to be reset,
or when a Fibre Channel loop changes state (e.g., a LIP is observed),
this generates such an asynchronous event.

Each platform module has to provide an isp_async entry point that will
handle a set of these. This isp_async entry point also handles things
which aren't properly async events but are simply natural outgrowths
of code flow for another core function (see the discussion on fabric
device management below).

5.6 Target Mode Code Flow

This section could use a lot of expansion, but this covers the basics.

The QLogic cards, when operating in target mode, follow a code flow that
is essentially the inverse of that for initiator mode described above.
In this scenario, an interrupt occurs, and present on the Response Queue
is a queue entry element defining a new command arriving from an
initiator.

This is passed to a possibly external target mode handler. This driver
provides some handling for this in a core module, but also leaves
things open enough that a completely different target mode handler
may accept this incoming queue entry.

The external target mode handler then turns around and forms up a
response to this 'response' that just arrived, which is then placed on
the Request Queue and handled very much like an initiator mode command
(i.e., calling the bus dependent DMA mapping function). If this entry
completes the command, no more need occur. But often this handles only
part of the requested command, so the QLogic firmware will rewrite the
response to the initial 'response' again onto the Response Queue,
whereupon the target mode handler will respond to that, and so on until
the command is completely handled.

Because almost no platform provides basic SCSI subsystem target mode
support, this design has been left extremely open ended, and as such
it's a bit hard to describe in more detail than this.

5.7 Locking Assumptions

The observant reader is likely by now to have asked the question, "But
what about locking? Or interrupt masking?"

The basic assumption about this is that the core module does not know
anything directly about locking or interrupt masking. It may assume upon
entry (e.g., via isp_start, isp_control, isp_intr) that appropriate
locking and interrupt masking have been done.

The platform dependent code may also therefore assume that if it is
called (e.g., isp_done or isp_async), any locking or masking that
was in place upon entry to the core module is still there. It is up
to the platform dependent code to worry about avoiding any lock nesting
issues. As an example of this, the Linux implementation simply queues
up commands completed via the callout to isp_done, and pushes them out
to the SCSI subsystem after its call to isp_intr returns (with locks
dropped appropriately, as well as avoidance of deep interrupt stacks).

Recent changes in the design have eased what had been an original
requirement that no locks or interrupt masking could be dropped while in
the core module. It's now up to each platform to figure out how to
implement this. This is principally used in the execution of mailbox
commands (which are principally used for Loop and Fabric management via
the isp_control function).
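
The contract can be pictured as a thin platform wrapper around the
core, sketched below with hypothetical locking primitives; the
deferred-completion step mirrors the Linux approach described above:

	struct ispsoftc;	/* opaque in this sketch */

	extern void platform_lock(struct ispsoftc *);	/* hypothetical */
	extern void platform_unlock(struct ispsoftc *);	/* hypothetical */
	extern void isp_intr_wrapper(struct ispsoftc *); /* calls isp_intr */
	extern void push_deferred_completions(struct ispsoftc *);

	static void
	example_isr(struct ispsoftc *isp)
	{
		/* The core runs, and calls out to isp_done/isp_async,
		 * entirely under the platform lock. */
		platform_lock(isp);
		isp_intr_wrapper(isp);	/* isp_done only queues completions */
		platform_unlock(isp);

		/* Lock dropped: now hand the queued-up commands to the
		 * SCSI subsystem without risk of lock nesting. */
		push_deferred_completions(isp);
	}
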
5.8 SCSI Specifics

The driver core or platform dependent architecture issues that are
specific to SCSI are few. There is a basic assumption that the QLogic
firmware supported Automatic Request Sense will work- there is no
particular provision for disabling its usage on a per-command basis.

5.9 Fibre Channel Specifics

Fibre Channel presents an interesting challenge here. The QLogic firmware
architecture for dealing with Fibre Channel as just a 'fat' SCSI bus
is fine on the face of it, but there are some subtle and not so subtle
problems here.

5.9.1 Firmware State

Part of the initialization (isp_init) for Fibre Channel HBAs involves
sending a command (Initialize Control Block) that establishes Node
and Port WWNs as well as topology preferences. After this occurs,
the QLogic firmware tries to traverse several states:

	FW_CONFIG_WAIT
	FW_WAIT_AL_PA
	FW_WAIT_LOGIN
	FW_READY
	FW_LOSS_OF_SYNC
	FW_ERROR
	FW_REINIT
	FW_NON_PART

It starts with FW_CONFIG_WAIT, attempts to get an AL_PA (if on an FC-AL
loop instead of being connected as an N-port), waits to log into all
FC-AL loop entities, and then hopefully transitions to the FW_READY
state.

Clearly, no command should be attempted before the FW_READY state is
achieved. Waiting for this is the job of the core internal function
isp_fclink_test (reachable via isp_control with the ISPCTL_FCLINK_TEST
function code). This function also determines connection topology (i.e.,
whether we're attached to a fabric or not).

5.9.2. Loop State Transitions- From Nil to Ready

Once the firmware has transitioned to a ready state, the state of the
connection to either an arbitrated loop or to a fabric has to be
ascertained, and the identity of all loop members (and fabric members)
validated.

This can be very complicated, and it isn't made easy by the fact that
the QLogic firmware manages PLOGI and PRLI to devices that are on a
local loop, but it is the driver that must manage PLOGI/PRLI with
devices on the fabric.

In order to manage this, the driver maintains a staging of the current
"Loop" state (where "Loop" is taken to mean FC-AL or N- or F-port
connections) in the following ascending order:

	LOOP_NIL
	LOOP_LIP_RCVD
	LOOP_PDB_RCVD
	LOOP_SCANNING_FABRIC
	LOOP_FSCAN_DONE
	LOOP_SCANNING_LOOP
	LOOP_LSCAN_DONE
	LOOP_SYNCING_PDB
	LOOP_READY

When the core code initializes the QLogic firmware, it sets the loop
state to LOOP_NIL. The first 'LIP Received' asynchronous event sets the
state to LOOP_LIP_RCVD. This should be followed by a "Port Database
Changed" asynchronous event which will set the state to LOOP_PDB_RCVD.
Each of these states, when entered, causes an isp_async event call to the
machine dependent layers with the ISPASYNC_CHANGE_NOTIFY code.

After the state of LOOP_PDB_RCVD is reached, the internal core function
isp_scan_fabric (reachable via isp_control(..ISPCTL_SCAN_FABRIC)) will,
if the connection is to a fabric, use Simple Name Server mailbox mediated
commands to dump the entire fabric contents. For each new entity, an
isp_async event will be generated that says a Fabric device has arrived
(ISPASYNC_FABRIC_DEV). The function that isp_async must perform in this
step is to insert (or possibly remove) devices that it wants to have the
QLogic firmware log into (at the LOOP_SYNCING_PDB state level).

After this has occurred, the state LOOP_FSCAN_DONE is set, and then the
internal function isp_scan_loop (isp_control(...ISPCTL_SCAN_LOOP)) can
be called, which will then scan for any local (FC-AL) entries by asking
the QLogic firmware for a Port Database entry for each possible local
loop id. It's at this level that some locally cached entries are purged
and shifting Loop IDs are managed (see section 5.9.4).

The final step after this is to call the internal function isp_pdb_sync
(isp_control(..ISPCTL_PDB_SYNC)). The purpose of this function is to
perform the PLOGI/PRLI functions for fabric devices. The next state
entered after this is LOOP_READY, which means that the driver is ready
to process commands to send to Fibre Channel devices.
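
Viewed as a state machine, the sequence above reduces to something like
the following sketch. The state names are the driver's, but the driving
loop is a deliberate simplification of what isp_fc_runstate actually
does (for one thing, it skips the transient LOOP_SCANNING_* states):

	enum loopstate {
		LOOP_NIL, LOOP_LIP_RCVD, LOOP_PDB_RCVD,
		LOOP_SCANNING_FABRIC, LOOP_FSCAN_DONE,
		LOOP_SCANNING_LOOP, LOOP_LSCAN_DONE,
		LOOP_SYNCING_PDB, LOOP_READY
	};

	static int
	advance_to_ready(enum loopstate *ls)
	{
		while (*ls != LOOP_READY) {
			switch (*ls) {
			case LOOP_PDB_RCVD:
				/* isp_control(isp, ISPCTL_SCAN_FABRIC, ..) */
				*ls = LOOP_FSCAN_DONE;
				break;
			case LOOP_FSCAN_DONE:
				/* isp_control(isp, ISPCTL_SCAN_LOOP, ..) */
				*ls = LOOP_LSCAN_DONE;
				break;
			case LOOP_LSCAN_DONE:
				/* isp_control(isp, ISPCTL_PDB_SYNC, ..) */
				*ls = LOOP_READY;
				break;
			default:
				/* still waiting on LIP / Port Database
				 * Changed asynchronous events */
				return (-1);
			}
		}
		return (0);
	}
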
5.9.3 Fibre Channel variants of Initiator Mode Code Flow

The code flow in isp_start for Fibre Channel devices is the same as it is
for SCSI devices, but with a notable exception.

Maintained within the Fibre Channel specific portion of the driver soft
state structure is a distillation of the existing population of both
local loop and fabric devices. Because Loop IDs can shift on a local
loop but we wish to retain a 'constant' Target ID (see 5.9.4), this
distillation is indexed directly via the Target ID for the command
(XS_TGT(xs)).

If there is a valid entry for this Target ID, the command is started
(with the stored 'Loop ID'). If not, the command is completed with an
error that is just like a SCSI Selection Timeout error.

This code is currently somewhat in transition. Some platforms do firmware
and loop state management (as described above) at this point. Other
platforms manage this from the machine dependent layers. The important
function to watch in this respect is isp_fc_runstate (in isp_inline.h).

5.9.4 "Target" in Fibre Channel is a fixed virtual construct

Very few systems can cope with the notion that the "Target" for a disk
device can change while you're using it. But one of the properties of
arbitrated loop is that the physical bus address for a loop member
(the AL_PA) can change depending on when and how things are inserted in
the loop.

To illustrate this, let's take an example. Let's say you start with a
loop that has 5 disks in it. At boot time, the system will likely find
them and see them in this order:

	disk#	Loop ID		Target ID
	disk0	0		0
	disk1	1		1
	disk2	2		2
	disk3	3		3
	disk4	4		4

The driver uses 'Loop ID' when it forms requests to send a command to
each disk. However, it reports to NetBSD that things exist as 'Target
ID'. As you can see here, there is perfect correspondence between disk,
Loop ID and Target ID.

Let's say you add a new disk between disk2 and disk3 while things are
running. You don't really often see this, but you *could* see this,
where the loop has to renegotiate, and you end up with:

	disk#	Loop ID		Target ID
	disk0	0		0
	disk1	1		1
	disk2	2		2
	diskN	3		?
	disk3	4		?
	disk4	5		?

Clearly, you don't want disk3's and disk4's "Target ID" to change while
you're running, since currently mounted filesystems would get trashed.

What the driver is supposed to do (this is the function of isp_scan_loop)
is regenerate things such that the following then occurs:

	disk#	Loop ID		Target ID
	disk0	0		0
	disk1	1		1
	disk2	2		2
	diskN	3		5
	disk3	4		3
	disk4	5		4

So, "Target" is a virtual entity that is maintained while you're running.
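
In code terms this is just one level of indirection. The sketch below
invents a distilled port database entry (the driver's real entries carry
WWNs and considerably more state) to show the lookup isp_start performs:

	#include <stdint.h>

	#define MAX_FC_TARG	256	/* sketch value */

	/*
	 * Invented distillation: the virtual Target ID is the stable
	 * index; the Loop ID it maps to may be rebound by isp_scan_loop
	 * (matching entries by Port WWN) after loop changes.
	 */
	struct pdb_slot {
		uint64_t	port_wwn;	/* stable device identity */
		uint16_t	loopid;		/* current loop address */
		int		valid;
	};

	static struct pdb_slot portdb[MAX_FC_TARG];

	/* isp_start path: translate XS_TGT(xs) into the Loop ID the
	 * firmware wants; a miss is completed as if it were a SCSI
	 * Selection Timeout. */
	static int
	target_to_loopid(int tgt, uint16_t *loopid)
	{
		if (tgt < 0 || tgt >= MAX_FC_TARG || !portdb[tgt].valid)
			return (-1);
		*loopid = portdb[tgt].loopid;
		return (0);
	}
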
6. Glossary

HBA - Host Bus Adapter

SCSI - Small Computer System Interface

7. References

Various URLs of interest:

	http://www.netbsd.org	- NetBSD's Web Page
	http://www.openbsd.org	- OpenBSD's Web Page
	http://www.freebsd.org	- FreeBSD's Web Page

	http://www.t10.org	- ANSI SCSI Committee's Web Page
				  (SCSI Specs)
	http://www.t11.org	- NCITS Device Interface Web Page
				  (Fibre Channel Specs)

diff --git a/sys/dev/isp/Hardware.txt b/sys/dev/isp/Hardware.txt
new file mode 100644
index 0000000..dfdc0e7
--- /dev/null
+++ b/sys/dev/isp/Hardware.txt
@@ -0,0 +1,304 @@
/* $FreeBSD$ */

	Hardware that is Known To or Should Work with This Driver

0. Intro

	This is not an endorsement for hardware vendors (there will be
	no "where to buy" URLs here, with a couple of exceptions). This
	is simply a list of things I know work, or should work, plus
	maybe a couple of notes as to what you should do to make them
	work. Corrections accepted. Even better would be to send me
	hardware so I can test it.

	I'll put a rough range of costs in US$ that I know about. No
	doubt it'll differ from your expectations.

1. HBAs

QLogic 2100, 2102
       2200, 2202, 2204

	There are various suffixes that indicate copper or optical
	connectors, or 33 vs. 66MHz PCI bus operation. None of these
	have a software impact.

	Approx cost: 1K$ for a 2200

QLogic 2300, 2312

	These are the new 2-Gigabit cards. Optical only.

	Approx cost: ??????

Antares P-0033, P-0034, P-0036

	There are many other vendors that use the QLogic 2X00 chipset.
	Some older 2100 boards (not on this list) have a bug in the ROM
	that causes a failure to download newer firmware that is larger
	than 0x7fff words.

	Approx cost: 850$ for a P-0036

	In general, the 2200 class chip is to be preferred.

2. Hubs

Vixel 1000
Vixel 2000

	Of the two, the 1000 (7 ports, vs. 12 ports) has had fewer
	problems- it's an old workhorse.

	Approx cost: 1.5K$ for Vixel 1000, 2.5K$ for 2000

Gadzoox Capellix 3000

	Don't forget to use telnet to configure the Capellix ports for
	the role you're using them in- otherwise things don't work well
	at all.

	(cost: I have no idea... certainly less than a switch)

3. Switches

Brocade Silkworm II
Brocade 2400
(other Brocades should be fine)

	Especially with revision 2 or higher f/w, this is now best
	of breed for fabrics or segmented loop (which Brocade calls
	"QuickLoop").

	For the Silkworm II, set operating mode to "Tachyon" (mode 3).

	The web interface isn't good- but telnet is what I prefer anyhow.

	You can't connect a Silkworm II and the other Brocades together
	as E-ports to make a large fabric (at least with the f/w *I*
	had for the Silkworm II).

	Approx cost of a Brocade 2400 with no GBICs is about 8K$ when
	I recently checked the US Government SEWP price list- no doubt
	it'll be a bit more for others. I'd assume around 10K$.

ANCOR SA-8

	This also is a fine switch, but you have to use a browser
	with working Java to manage it- which is a bit of a pain.
	This also supports fabric and segmented loop.

	These switches don't form E-ports with each other for a larger
	fabric.

	(cost: no idea)

McData (model unknown)

	I tried one exactly once for 30 minutes. Seemed to work once
	I added the "register FC4 types" command to the driver.

	(cost: very very expensive, 40K$ plus)

4. Cables/GBICs

	Multimode optical is adequate for Fibre Channel- the same cable
	is used for Gigabit Ethernet.

	Copper DB-9 and copper HSSDC connectors are also fine. Copper
	and optical are both rated to 1.0625Gbit- copper is naturally
	shorter (the longest I've used is a 15 meter cable, but it's
	supposed to go longer).

	The reason to use copper instead of optical is that if you step
	on one of the really fat DB-9 cables you can get, it'll survive.
	Optical usually dies quickly if you step on it.

	Approx cost: I don't know what optical is- you can expect to pay
	maybe 100$ for a 3m copper cable.

GBICs-

	I use Finisar copper and IBM optical GBICs.

	Approx cost: Copper GBICs are 70$ each. Opticals are twice that
	or more.

Vendor: (this is the one exception I'll make, because it turns out to be
	an incredible pain to find FC copper cabling and GBICs- the
	source I use for GBICs and copper cables is
	http://www.scsi-cables.com)

Other:
	There now is apparently a source for little connector boards
	to connect to bare drives: http://www.cinonic.com.

5. Storage JBODs/RAID

JMR 4-Bay

	Rinky-tink, but a solid 4 bay loop-only entry model.
	I paid 1000$ for mine- overpriced, IMO.

JMR Fortra

	I rather like this box. The blue LEDs are a very nice touch- you
	can see them very clearly from 50 feet away.

	I paid 2000$ for one used.

Sun A5X00

	Very expensive (in my opinion) but well crafted. Has two SES
	instances, so you can use the ses driver (and the example
	code in /usr/share/examples) for power/thermal/slot monitoring.

	Approx cost: The last I saw for a price list item on this was
	22K$ for an unpopulated (no disk drives) A5X00.

DataDirect E1000 RAID

	Don't connect both SCSI and FC interfaces at the same time- a
	SCSI reset will cause the DataDirect to think you want to use
	the SCSI interface, and a LIP on the FC interface will cause it
	to think you want to use the FC interface. Use only one
	connector at a time so both you and the DataDirect are sure
	about what you want.

	Cost: I have no idea.

Veritas ServPoint

	This is a software storage virtualization engine that runs on
	Sparc/Solaris in target mode for the frontend, with other FC or
	SCSI storage as the backend. FreeBSD has been used extensively
	to test it.

	Cost: I have no idea.

6. Disk Drives

	I have used lots of different Seagate and a few IBM drives and
	typically have had few problems with them. These are the bare
	drives with 40-pin SCA connectors in back. They go into the
	JBODs you assemble.

	Seagate does make, but I can no longer find, a little paddleboard
	single drive connector that goes from DB-9 FC to the 40-pin SCA
	connector- primarily for you to try and evaluate a single FC
	drive.

	All FC-AL disk drives are dual ported (i.e., have separate 'A'
	and 'B' ports- which are completely separate loops). This seems
	to work reasonably enough, but I haven't tested it much. It
	really depends on the JBOD you put them in as to whether this
	dual porting is carried to the outside world. The JMR boxes have
	it. For the Sun A5X00 you have to pay for an extra IB card to
	carry it out.

	Approx cost: You'll find that FC drives are the same cost, if
	not slightly cheaper, than the equivalent Ultra3 SCSI drives.

7. Recommended Configurations

These are recommendations that are biased toward the cautious side. They
do not represent formal engineering commitments- just suggestions as to
what I would expect to work.

A. The simplest form of a connection topology I can suggest for a small
SAN (i.e., a replacement for SCSI JBOD/RAID):

HOST
2xxx <----------> Single Unit of Storage (JBOD, RAID)

This is called a PL_DA (Private Loop, Direct Attach) topology.

B. The next most simple form of a connection topology I can suggest for
a medium local SAN (where you do not plan to do dynamic insertion
and removal of devices while I/Os are active):

HOST
2xxx <----------> +--------+
                  | Vixel  |
                  | 1000   |
                  |        +<---> Storage
                  |        |
                  |        +<---> Storage
                  |        |
                  |        +<---> Storage
                  +--------+

This is a Private Loop topology. Remember that this can get very unstable
if you make it too long. A good practice is to try it in a staged fashion.

It is possible with some units to "daisy chain", e.g.:

HOST
2xxx <----------> (JBOD, RAID) <--------> (JBOD, RAID)

In practice I have had poor results with these configurations. They
*should* work fine, but for both the JMR and the Sun A5X00 I tend to get
LIP storms, and so the second unit just isn't seen and the loop isn't
stable.
Now, this could simply be my lack of clean, newer h/w (or, in general,
a lack of h/w), but I would recommend the use of a hub if you want to
stay with Private Loop and have more than one FC target.

You should also note that this can begin to be the basis for a shared
SAN solution. For example, the above configuration can be extended to be:

HOST
2xxx <----------> +--------+
                  | Vixel  |
                  | 1000   |
                  |        +<---> Storage
                  |        |
                  |        +<---> Storage
                  |        |
                  |        +<---> Storage
HOST              |        |
2xxx <----------> +--------+

However, note that there is nothing to mediate locking of devices, and
it is also conceivable that the reboot of one host can, by causing
a LIP storm, cause problems with the I/Os from the other host.
(In other words, this topology hasn't really been made safe yet for
this driver.)

C. You can repeat the topology in #B with a switch that is set to be
in segmented loop mode. This avoids LIPs propagating where you don't
want them to- and this makes for a much more reliable, if more expensive,
SAN.

D. The next level of complexity is a Switched Fabric. The following
topology is good when you begin to want more performance. Private
and Public Arbitrated Loop, while 100MB/s, is a shared medium. Direct
connections to a switch can run full-duplex at full speed.

HOST
2xxx <----------> +---------+
                  | Brocade |
                  | 2400    |
                  |         +<---> Storage
                  |         |
                  |         +<---> Storage
                  |         |
                  |         +<---> Storage
HOST              |         |
2xxx <----------> +---------+

I would call this the best configuration available now. It can expand
substantially if you cascade switches.

There is a hard limit of about 253 devices for each QLogic HBA- and the
fabric login policy is simplistic (log them in as you find them). If
somebody actually runs into a configuration that's larger, let me know
and I'll work on some tools that would allow you some policy choices
as to which would be the interesting devices to actually connect to.