diff options
Diffstat (limited to 'Documentation/networking')
102 files changed, 24593 insertions, 0 deletions
diff --git a/Documentation/networking/.gitignore b/Documentation/networking/.gitignore new file mode 100644 index 0000000..286a568 --- /dev/null +++ b/Documentation/networking/.gitignore @@ -0,0 +1 @@ +ifenslave diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX new file mode 100644 index 0000000..1634c6d --- /dev/null +++ b/Documentation/networking/00-INDEX @@ -0,0 +1,110 @@ +00-INDEX + - this file +3c505.txt + - information on the 3Com EtherLink Plus (3c505) driver. +6pack.txt + - info on the 6pack protocol, an alternative to KISS for AX.25 +DLINK.txt + - info on the D-Link DE-600/DE-620 parallel port pocket adapters +PLIP.txt + - PLIP: The Parallel Line Internet Protocol device driver +README.sb1000 + - info on General Instrument/NextLevel SURFboard1000 cable modem. +alias.txt + - info on using alias network devices +arcnet-hardware.txt + - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc. +arcnet.txt + - info on the using the ARCnet driver itself. +atm.txt + - info on where to get ATM programs and support for Linux. +ax25.txt + - info on using AX.25 and NET/ROM code for Linux +baycom.txt + - info on the driver for Baycom style amateur radio modems +bridge.txt + - where to get user space programs for ethernet bridging with Linux. +can.txt + - documentation on CAN protocol family. +cops.txt + - info on the COPS LocalTalk Linux driver +cs89x0.txt + - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver +cxacru.txt + - Conexant AccessRunner USB ADSL Modem +de4x5.txt + - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver +decnet.txt + - info on using the DECnet networking layer in Linux. +depca.txt + - the Digital DEPCA/EtherWORKS DE1?? and DE2?? LANCE Ethernet driver +dgrs.txt + - the Digi International RightSwitch SE-X Ethernet driver +dmfe.txt + - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver. +e100.txt + - info on Intel's EtherExpress PRO/100 line of 10/100 boards +e1000.txt + - info on Intel's E1000 line of gigabit ethernet boards +eql.txt + - serial IP load balancing +ethertap.txt + - the Ethertap user space packet reception and transmission driver +ewrk3.txt + - the Digital EtherWORKS 3 DE203/4/5 Ethernet driver +filter.txt + - Linux Socket Filtering +fore200e.txt + - FORE Systems PCA-200E/SBA-200E ATM NIC driver info. +framerelay.txt + - info on using Frame Relay/Data Link Connection Identifier (DLCI). +generic_netlink.txt + - info on Generic Netlink +ip-sysctl.txt + - /proc/sys/net/ipv4/* variables +ip_dynaddr.txt + - IP dynamic address hack e.g. for auto-dialup links +ipddp.txt + - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation +iphase.txt + - Interphase PCI ATM (i)Chip IA Linux driver info. +irda.txt + - where to get IrDA (infrared) utilities and info for Linux. +lapb-module.txt + - programming information of the LAPB module. +ltpc.txt + - the Apple or Farallon LocalTalk PC card driver +multicast.txt + - Behaviour of cards under Multicast +netdevices.txt + - info on network device driver functions exported to the kernel. +olympic.txt + - IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info. +policy-routing.txt + - IP policy-based routing +ray_cs.txt + - Raylink Wireless LAN card driver info. +skfp.txt + - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. +smc9.txt + - the driver for SMC's 9000 series of Ethernet cards +smctr.txt + - SMC TokenCard TokenRing Linux driver info. +tcp.txt + - short blurb on how TCP output takes place. +tlan.txt + - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. +tms380tr.txt + - SysKonnect Token Ring ISA/PCI adapter driver info. +tuntap.txt + - TUN/TAP device driver, allowing user space Rx/Tx of packets. +vortex.txt + - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. +wavelan.txt + - AT&T GIS (nee NCR) WaveLAN card: An Ethernet-like radio transceiver +x25.txt + - general info on X.25 development. +x25-iface.txt + - description of the X.25 Packet Layer to LAPB device interface. +z8530drv.txt + - info about Linux driver for Z8530 based HDLC cards for AX.25 diff --git a/Documentation/networking/3c359.txt b/Documentation/networking/3c359.txt new file mode 100644 index 0000000..4af8071 --- /dev/null +++ b/Documentation/networking/3c359.txt @@ -0,0 +1,58 @@ + +3COM PCI TOKEN LINK VELOCITY XL TOKEN RING CARDS README + +Release 0.9.0 - Release + Jul 17th 2000 Mike Phillips + + 1.2.0 - Final + Feb 17th 2002 Mike Phillips + Updated for submission to the 2.4.x kernel. + +Thanks: + Terry Murphy from 3Com for tech docs and support, + Adam D. Ligas for testing the driver. + +Note: + This driver will NOT work with the 3C339 Token Ring cards, you need +to use the tms380 driver instead. + +Options: + +The driver accepts three options: ringspeed, pkt_buf_sz and message_level. + +These options can be specified differently for each card found. + +ringspeed: Has one of three settings 0 (default), 4 or 16. 0 will +make the card autosense the ringspeed and join at the appropriate speed, +this will be the default option for most people. 4 or 16 allow you to +explicitly force the card to operate at a certain speed. The card will fail +if you try to insert it at the wrong speed. (Although some hubs will allow +this so be *very* careful). The main purpose for explicitly setting the ring +speed is for when the card is first on the ring. In autosense mode, if the card +cannot detect any active monitors on the ring it will open at the same speed as +its last opening. This can be hazardous if this speed does not match the speed +you want the ring to operate at. + +pkt_buf_sz: This is this initial receive buffer allocation size. This will +default to 4096 if no value is entered. You may increase performance of the +driver by setting this to a value larger than the network packet size, although +the driver now re-sizes buffers based on MTU settings as well. + +message_level: Controls level of messages created by the driver. Defaults to 0: +which only displays start-up and critical messages. Presently any non-zero +value will display all soft messages as well. NB This does not turn +debugging messages on, that must be done by modified the source code. + +Variable MTU size: + +The driver can handle a MTU size upto either 4500 or 18000 depending upon +ring speed. The driver also changes the size of the receive buffers as part +of the mtu re-sizing, so if you set mtu = 18000, you will need to be able +to allocate 16 * (sk_buff with 18000 buffer size) call it 18500 bytes per ring +position = 296,000 bytes of memory space, plus of course anything +necessary for the tx sk_buff's. Remember this is per card, so if you are +building routers, gateway's etc, you could start to use a lot of memory +real fast. + +2/17/02 Mike Phillips + diff --git a/Documentation/networking/3c505.txt b/Documentation/networking/3c505.txt new file mode 100644 index 0000000..72f38b1 --- /dev/null +++ b/Documentation/networking/3c505.txt @@ -0,0 +1,45 @@ +The 3Com Etherlink Plus (3c505) driver. + +This driver now uses DMA. There is currently no support for PIO operation. +The default DMA channel is 6; this is _not_ autoprobed, so you must +make sure you configure it correctly. If loading the driver as a +module, you can do this with "modprobe 3c505 dma=n". If the driver is +linked statically into the kernel, you must either use an "ether=" +statement on the command line, or change the definition of ELP_DMA in 3c505.h. + +The driver will warn you if it has to fall back on the compiled in +default DMA channel. + +If no base address is given at boot time, the driver will autoprobe +ports 0x300, 0x280 and 0x310 (in that order). If no IRQ is given, the driver +will try to probe for it. + +The driver can be used as a loadable module. + +Theoretically, one instance of the driver can now run multiple cards, +in the standard way (when loading a module, say "modprobe 3c505 +io=0x300,0x340 irq=10,11 dma=6,7" or whatever). I have not tested +this, though. + +The driver may now support revision 2 hardware; the dependency on +being able to read the host control register has been removed. This +is also untested, since I don't have a suitable card. + +Known problems: + I still see "DMA upload timed out" messages from time to time. These +seem to be fairly non-fatal though. + The card is old and slow. + +To do: + Improve probe/setup code + Test multicast and promiscuous operation + +Authors: + The driver is mainly written by Craig Southeren, email + <craigs@ineluki.apana.org.au>. + Parts of the driver (adapting the driver to 1.1.4+ kernels, + IRQ/address detection, some changes) and this README by + Juha Laiho <jlaiho@ichaos.nullnet.fi>. + DMA mode, more fixes, etc, by Philip Blundell <pjb27@cam.ac.uk> + Multicard support, Software configurable DMA, etc., by + Christopher Collins <ccollins@pcug.org.au> diff --git a/Documentation/networking/3c509.txt b/Documentation/networking/3c509.txt new file mode 100644 index 0000000..0643e3b --- /dev/null +++ b/Documentation/networking/3c509.txt @@ -0,0 +1,210 @@ +Linux and the 3Com EtherLink III Series Ethercards (driver v1.18c and higher) +---------------------------------------------------------------------------- + +This file contains the instructions and caveats for v1.18c and higher versions +of the 3c509 driver. You should not use the driver without reading this file. + +release 1.0 +28 February 2002 +Current maintainer (corrections to): + David Ruggiero <jdr@farfalle.com> + +---------------------------------------------------------------------------- + +(0) Introduction + +The following are notes and information on using the 3Com EtherLink III series +ethercards in Linux. These cards are commonly known by the most widely-used +card's 3Com model number, 3c509. They are all 10mb/s ISA-bus cards and shouldn't +be (but sometimes are) confused with the similarly-numbered PCI-bus "3c905" +(aka "Vortex" or "Boomerang") series. Kernel support for the 3c509 family is +provided by the module 3c509.c, which has code to support all of the following +models: + + 3c509 (original ISA card) + 3c509B (later revision of the ISA card; supports full-duplex) + 3c589 (PCMCIA) + 3c589B (later revision of the 3c589; supports full-duplex) + 3c529 (MCA) + 3c579 (EISA) + +Large portions of this documentation were heavily borrowed from the guide +written the original author of the 3c509 driver, Donald Becker. The master +copy of that document, which contains notes on older versions of the driver, +currently resides on Scyld web server: http://www.scyld.com/network/3c509.html. + + +(1) Special Driver Features + +Overriding card settings + +The driver allows boot- or load-time overriding of the card's detected IOADDR, +IRQ, and transceiver settings, although this capability shouldn't generally be +needed except to enable full-duplex mode (see below). An example of the syntax +for LILO parameters for doing this: + + ether=10,0x310,3,0x3c509,eth0 + +This configures the first found 3c509 card for IRQ 10, base I/O 0x310, and +transceiver type 3 (10base2). The flag "0x3c509" must be set to avoid conflicts +with other card types when overriding the I/O address. When the driver is +loaded as a module, only the IRQ and transceiver setting may be overridden. +For example, setting two cards to 10base2/IRQ10 and AUI/IRQ11 is done by using +the xcvr and irq module options: + + options 3c509 xcvr=3,1 irq=10,11 + + +(2) Full-duplex mode + +The v1.18c driver added support for the 3c509B's full-duplex capabilities. +In order to enable and successfully use full-duplex mode, three conditions +must be met: + +(a) You must have a Etherlink III card model whose hardware supports full- +duplex operations. Currently, the only members of the 3c509 family that are +positively known to support full-duplex are the 3c509B (ISA bus) and 3c589B +(PCMCIA) cards. Cards without the "B" model designation do *not* support +full-duplex mode; these include the original 3c509 (no "B"), the original +3c589, the 3c529 (MCA bus), and the 3c579 (EISA bus). + +(b) You must be using your card's 10baseT transceiver (i.e., the RJ-45 +connector), not its AUI (thick-net) or 10base2 (thin-net/coax) interfaces. +AUI and 10base2 network cabling is physically incapable of full-duplex +operation. + +(c) Most importantly, your 3c509B must be connected to a link partner that is +itself full-duplex capable. This is almost certainly one of two things: a full- +duplex-capable Ethernet switch (*not* a hub), or a full-duplex-capable NIC on +another system that's connected directly to the 3c509B via a crossover cable. + +/////Extremely important caution concerning full-duplex mode///// +Understand that the 3c509B's hardware's full-duplex support is much more +limited than that provide by more modern network interface cards. Although +at the physical layer of the network it fully supports full-duplex operation, +the card was designed before the current Ethernet auto-negotiation (N-way) +spec was written. This means that the 3c509B family ***cannot and will not +auto-negotiate a full-duplex connection with its link partner under any +circumstances, no matter how it is initialized***. If the full-duplex mode +of the 3c509B is enabled, its link partner will very likely need to be +independently _forced_ into full-duplex mode as well; otherwise various nasty +failures will occur - at the very least, you'll see massive numbers of packet +collisions. This is one of very rare circumstances where disabling auto- +negotiation and forcing the duplex mode of a network interface card or switch +would ever be necessary or desirable. + + +(3) Available Transceiver Types + +For versions of the driver v1.18c and above, the available transceiver types are: + +0 transceiver type from EEPROM config (normally 10baseT); force half-duplex +1 AUI (thick-net / DB15 connector) +2 (undefined) +3 10base2 (thin-net == coax / BNC connector) +4 10baseT (RJ-45 connector); force half-duplex mode +8 transceiver type and duplex mode taken from card's EEPROM config settings +12 10baseT (RJ-45 connector); force full-duplex mode + +Prior to driver version 1.18c, only transceiver codes 0-4 were supported. Note +that the new transceiver codes 8 and 12 are the *only* ones that will enable +full-duplex mode, no matter what the card's detected EEPROM settings might be. +This insured that merely upgrading the driver from an earlier version would +never automatically enable full-duplex mode in an existing installation; +it must always be explicitly enabled via one of these code in order to be +activated. + + +(4a) Interpretation of error messages and common problems + +Error Messages + +eth0: Infinite loop in interrupt, status 2011. +These are "mostly harmless" message indicating that the driver had too much +work during that interrupt cycle. With a status of 0x2011 you are receiving +packets faster than they can be removed from the card. This should be rare +or impossible in normal operation. Possible causes of this error report are: + + - a "green" mode enabled that slows the processor down when there is no + keyboard activity. + + - some other device or device driver hogging the bus or disabling interrupts. + Check /proc/interrupts for excessive interrupt counts. The timer tick + interrupt should always be incrementing faster than the others. + +No received packets +If a 3c509, 3c562 or 3c589 can successfully transmit packets, but never +receives packets (as reported by /proc/net/dev or 'ifconfig') you likely +have an interrupt line problem. Check /proc/interrupts to verify that the +card is actually generating interrupts. If the interrupt count is not +increasing you likely have a physical conflict with two devices trying to +use the same ISA IRQ line. The common conflict is with a sound card on IRQ10 +or IRQ5, and the easiest solution is to move the 3c509 to a different +interrupt line. If the device is receiving packets but 'ping' doesn't work, +you have a routing problem. + +Tx Carrier Errors Reported in /proc/net/dev +If an EtherLink III appears to transmit packets, but the "Tx carrier errors" +field in /proc/net/dev increments as quickly as the Tx packet count, you +likely have an unterminated network or the incorrect media transceiver selected. + +3c509B card is not detected on machines with an ISA PnP BIOS. +While the updated driver works with most PnP BIOS programs, it does not work +with all. This can be fixed by disabling PnP support using the 3Com-supplied +setup program. + +3c509 card is not detected on overclocked machines +Increase the delay time in id_read_eeprom() from the current value, 500, +to an absurdly high value, such as 5000. + + +(4b) Decoding Status and Error Messages + +The bits in the main status register are: + +value description +0x01 Interrupt latch +0x02 Tx overrun, or Rx underrun +0x04 Tx complete +0x08 Tx FIFO room available +0x10 A complete Rx packet has arrived +0x20 A Rx packet has started to arrive +0x40 The driver has requested an interrupt +0x80 Statistics counter nearly full + +The bits in the transmit (Tx) status word are: + +value description +0x02 Out-of-window collision. +0x04 Status stack overflow (normally impossible). +0x08 16 collisions. +0x10 Tx underrun (not enough PCI bus bandwidth). +0x20 Tx jabber. +0x40 Tx interrupt requested. +0x80 Status is valid (this should always be set). + + +When a transmit error occurs the driver produces a status message such as + + eth0: Transmit error, Tx status register 82 + +The two values typically seen here are: + +0x82 +Out of window collision. This typically occurs when some other Ethernet +host is incorrectly set to full duplex on a half duplex network. + +0x88 +16 collisions. This typically occurs when the network is exceptionally busy +or when another host doesn't correctly back off after a collision. If this +error is mixed with 0x82 errors it is the result of a host incorrectly set +to full duplex (see above). + +Both of these errors are the result of network problems that should be +corrected. They do not represent driver malfunction. + + +(5) Revision history (this file) + +28Feb02 v1.0 DR New; major portions based on Becker original 3c509 docs + diff --git a/Documentation/networking/6pack.txt b/Documentation/networking/6pack.txt new file mode 100644 index 0000000..d0777a1 --- /dev/null +++ b/Documentation/networking/6pack.txt @@ -0,0 +1,175 @@ +This is the 6pack-mini-HOWTO, written by + +Andreas Könsgen DG3KQ +Internet: ajk@iehk.rwth-aachen.de +AMPR-net: dg3kq@db0pra.ampr.org +AX.25: dg3kq@db0ach.#nrw.deu.eu + +Last update: April 7, 1998 + +1. What is 6pack, and what are the advantages to KISS? + +6pack is a transmission protocol for data exchange between the PC and +the TNC over a serial line. It can be used as an alternative to KISS. + +6pack has two major advantages: +- The PC is given full control over the radio + channel. Special control data is exchanged between the PC and the TNC so + that the PC knows at any time if the TNC is receiving data, if a TNC + buffer underrun or overrun has occurred, if the PTT is + set and so on. This control data is processed at a higher priority than + normal data, so a data stream can be interrupted at any time to issue an + important event. This helps to improve the channel access and timing + algorithms as everything is computed in the PC. It would even be possible + to experiment with something completely different from the known CSMA and + DAMA channel access methods. + This kind of real-time control is especially important to supply several + TNCs that are connected between each other and the PC by a daisy chain + (however, this feature is not supported yet by the Linux 6pack driver). + +- Each packet transferred over the serial line is supplied with a checksum, + so it is easy to detect errors due to problems on the serial line. + Received packets that are corrupt are not passed on to the AX.25 layer. + Damaged packets that the TNC has received from the PC are not transmitted. + +More details about 6pack are described in the file 6pack.ps that is located +in the doc directory of the AX.25 utilities package. + +2. Who has developed the 6pack protocol? + +The 6pack protocol has been developed by Ekki Plicht DF4OR, Henning Rech +DF9IC and Gunter Jost DK7WJ. A driver for 6pack, written by Gunter Jost and +Matthias Welwarsky DG2FEF, comes along with the PC version of FlexNet. +They have also written a firmware for TNCs to perform the 6pack +protocol (see section 4 below). + +3. Where can I get the latest version of 6pack for LinuX? + +At the moment, the 6pack stuff can obtained via anonymous ftp from +db0bm.automation.fh-aachen.de. In the directory /incoming/dg3kq, +there is a file named 6pack.tgz. + +4. Preparing the TNC for 6pack operation + +To be able to use 6pack, a special firmware for the TNC is needed. The EPROM +of a newly bought TNC does not contain 6pack, so you will have to +program an EPROM yourself. The image file for 6pack EPROMs should be +available on any packet radio box where PC/FlexNet can be found. The name of +the file is 6pack.bin. This file is copyrighted and maintained by the FlexNet +team. It can be used under the terms of the license that comes along +with PC/FlexNet. Please do not ask me about the internals of this file as I +don't know anything about it. I used a textual description of the 6pack +protocol to program the Linux driver. + +TNCs contain a 64kByte EPROM, the lower half of which is used for +the firmware/KISS. The upper half is either empty or is sometimes +programmed with software called TAPR. In the latter case, the TNC +is supplied with a DIP switch so you can easily change between the +two systems. When programming a new EPROM, one of the systems is replaced +by 6pack. It is useful to replace TAPR, as this software is rarely used +nowadays. If your TNC is not equipped with the switch mentioned above, you +can build in one yourself that switches over the highest address pin +of the EPROM between HIGH and LOW level. After having inserted the new EPROM +and switched to 6pack, apply power to the TNC for a first test. The connect +and the status LED are lit for about a second if the firmware initialises +the TNC correctly. + +5. Building and installing the 6pack driver + +The driver has been tested with kernel version 2.1.90. Use with older +kernels may lead to a compilation error because the interface to a kernel +function has been changed in the 2.1.8x kernels. + +How to turn on 6pack support: + +- In the linux kernel configuration program, select the code maturity level + options menu and turn on the prompting for development drivers. + +- Select the amateur radio support menu and turn on the serial port 6pack + driver. + +- Compile and install the kernel and the modules. + +To use the driver, the kissattach program delivered with the AX.25 utilities +has to be modified. + +- Do a cd to the directory that holds the kissattach sources. Edit the + kissattach.c file. At the top, insert the following lines: + + #ifndef N_6PACK + #define N_6PACK (N_AX25+1) + #endif + + Then find the line + + int disc = N_AX25; + + and replace N_AX25 by N_6PACK. + +- Recompile kissattach. Rename it to spattach to avoid confusions. + +Installing the driver: + +- Do an insmod 6pack. Look at your /var/log/messages file to check if the + module has printed its initialization message. + +- Do a spattach as you would launch kissattach when starting a KISS port. + Check if the kernel prints the message '6pack: TNC found'. + +- From here, everything should work as if you were setting up a KISS port. + The only difference is that the network device that represents + the 6pack port is called sp instead of sl or ax. So, sp0 would be the + first 6pack port. + +Although the driver has been tested on various platforms, I still declare it +ALPHA. BE CAREFUL! Sync your disks before insmoding the 6pack module +and spattaching. Watch out if your computer behaves strangely. Read section +6 of this file about known problems. + +Note that the connect and status LEDs of the TNC are controlled in a +different way than they are when the TNC is used with PC/FlexNet. When using +FlexNet, the connect LED is on if there is a connection; the status LED is +on if there is data in the buffer of the PC's AX.25 engine that has to be +transmitted. Under Linux, the 6pack layer is beyond the AX.25 layer, +so the 6pack driver doesn't know anything about connects or data that +has not yet been transmitted. Therefore the LEDs are controlled +as they are in KISS mode: The connect LED is turned on if data is transferred +from the PC to the TNC over the serial line, the status LED if data is +sent to the PC. + +6. Known problems + +When testing the driver with 2.0.3x kernels and +operating with data rates on the radio channel of 9600 Baud or higher, +the driver may, on certain systems, sometimes print the message '6pack: +bad checksum', which is due to data loss if the other station sends two +or more subsequent packets. I have been told that this is due to a problem +with the serial driver of 2.0.3x kernels. I don't know yet if the problem +still exists with 2.1.x kernels, as I have heard that the serial driver +code has been changed with 2.1.x. + +When shutting down the sp interface with ifconfig, the kernel crashes if +there is still an AX.25 connection left over which an IP connection was +running, even if that IP connection is already closed. The problem does not +occur when there is a bare AX.25 connection still running. I don't know if +this is a problem of the 6pack driver or something else in the kernel. + +The driver has been tested as a module, not yet as a kernel-builtin driver. + +The 6pack protocol supports daisy-chaining of TNCs in a token ring, which is +connected to one serial port of the PC. This feature is not implemented +and at least at the moment I won't be able to do it because I do not have +the opportunity to build a TNC daisy-chain and test it. + +Some of the comments in the source code are inaccurate. They are left from +the SLIP/KISS driver, from which the 6pack driver has been derived. +I haven't modified or removed them yet -- sorry! The code itself needs +some cleaning and optimizing. This will be done in a later release. + +If you encounter a bug or if you have a question or suggestion concerning the +driver, feel free to mail me, using the addresses given at the beginning of +this file. + +Have fun! + +Andreas diff --git a/Documentation/networking/DLINK.txt b/Documentation/networking/DLINK.txt new file mode 100644 index 0000000..55d2443 --- /dev/null +++ b/Documentation/networking/DLINK.txt @@ -0,0 +1,203 @@ +Released 1994-06-13 + + + CONTENTS: + + 1. Introduction. + 2. License. + 3. Files in this release. + 4. Installation. + 5. Problems and tuning. + 6. Using the drivers with earlier releases. + 7. Acknowledgments. + + + 1. INTRODUCTION. + + This is a set of Ethernet drivers for the D-Link DE-600/DE-620 + pocket adapters, for the parallel port on a Linux based machine. + Some adapter "clones" will also work. Xircom is _not_ a clone... + These drivers _can_ be used as loadable modules, + and were developed for use on Linux 1.1.13 and above. + For use on Linux 1.0.X, or earlier releases, see below. + + I have used these drivers for NFS, ftp, telnet and X-clients on + remote machines. Transmissions with ftp seems to work as + good as can be expected (i.e. > 80k bytes/sec) from a + parallel port...:-) Receive speeds will be about 60-80% of this. + Depending on your machine, somewhat higher speeds can be achieved. + + All comments/fixes to Bjorn Ekwall (bj0rn@blox.se). + + + 2. LICENSE. + + This program is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public + License as published by the Free Software Foundation; either + version 2, or (at your option) any later version. + + This program is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied + warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR + PURPOSE. See the GNU General Public License for more + details. + + You should have received a copy of the GNU General Public + License along with this program; if not, write to the Free + Software Foundation, Inc., 675 Mass Ave, Cambridge, MA + 02139, USA. + + + 3. FILES IN THIS RELEASE. + + README.DLINK This file. + de600.c The Source (may it be with You :-) for the DE-600 + de620.c ditto for the DE-620 + de620.h Macros for de620.c + + If you are upgrading from the d-link tar release, there will + also be a "dlink-patches" file that will patch Linux 1.1.18: + linux/drivers/net/Makefile + linux/drivers/net/CONFIG + linux/drivers/net/MODULES + linux/drivers/net/Space.c + linux/config.in + Apply the patch by: + "cd /usr/src; patch -p0 < linux/drivers/net/dlink-patches" + The old source, "linux/drivers/net/d_link.c", can be removed. + + + 4. INSTALLATION. + + o Get the latest net binaries, according to current net.wisdom. + + o Read the NET-2 and Ethernet HOWTOs and modify your setup. + + o If your parallel port has a strange address or irq, + modify "linux/drivers/net/CONFIG" accordingly, or adjust + the parameters in the "tuning" section in the sources. + + If you are going to use the drivers as loadable modules, do _not_ + enable them while doing "make config", but instead make sure that + the drivers are included in "linux/drivers/net/MODULES". + + If you are _not_ going to use the driver(s) as loadable modules, + but instead have them included in the kernel, remember to enable + the drivers while doing "make config". + + o To include networking and DE600/DE620 support in your kernel: + # cd /linux + (as modules:) + # make config (answer yes on CONFIG_NET and CONFIG_INET) + (else included in the kernel:) + # make config (answer yes on CONFIG _NET, _INET and _DE600 or _DE620) + # make clean + # make zImage (or whatever magic you usually do) + + o I use lilo to boot multiple kernels, so that I at least + can have one working kernel :-). If you do too, append + these lines to /etc/lilo/config: + + image = /linux/zImage + label = newlinux + root = /dev/hda2 (or whatever YOU have...) + + # /etc/lilo/install + + o Do "sync" and reboot the new kernel with a D-Link + DE-600/DE-620 pocket adapter connected. + + o The adapter can be configured with ifconfig eth? + where the actual number is decided by the kernel + when the drivers are initialized. + + + 5. "PROBLEMS" AND TUNING, + + o If you see error messages from the driver, and if the traffic + stops on the adapter, try to do "ifconfig" and "route" once + more, just as in "rc.inet1". This should take care of most + problems, including effects from power loss, or adapters that + aren't connected to the printer port in some way or another. + You can somewhat change the behaviour by enabling/disabling + the macro SHUTDOWN_WHEN_LOST in the "tuning" section. + For the DE-600 there is another macro, CHECK_LOST_DE600, + that you might want to read about in the "tuning" section. + + o Some machines have trouble handling the parallel port and + the adapter at high speed. If you experience problems: + + DE-600: + - The adapter is not recognized at boot, i.e. an Ethernet + address of 00:80:c8:... is not shown, try to add another + "; SLOW_DOWN_IO" + at DE600_SLOW_DOWN in the "tuning" section. As a last resort, + uncomment: "#define REALLY_SLOW_IO" (see <asm/io.h> for hints). + + - You experience "timeout" messages: first try to add another + "; SLOW_DOWN_IO" + at DE600_SLOW_DOWN in the "tuning" section, _then_ try to + increase the value (original value: 5) at + "if (tickssofar < 5)" near line 422. + + DE-620: + - Your parallel port might be "sluggish". To cater for + this, there are the macros LOWSPEED and READ_DELAY/WRITE_DELAY + in the "tuning" section. Your first step should be to enable + LOWSPEED, and after that you can "tune" the XXX_DELAY values. + + o If the adapter _is_ recognized at boot but you get messages + about "Network Unreachable", then the problem is probably + _not_ with the driver. Check your net configuration instead + (ifconfig and route) in "rc.inet1". + + o There is some rudimentary support for debugging, look at + the source. Use "-DDE600_DEBUG=3" or "-DDE620_DEBUG=3" + when compiling, or include it in "linux/drivers/net/CONFIG". + IF YOU HAVE PROBLEMS YOU CAN'T SOLVE: PLEASE COMPILE THE DRIVER + WITH DEBUGGING ENABLED, AND SEND ME THE RESULTING OUTPUT! + + + 6. USING THE DRIVERS WITH EARLIER RELEASES. + + The later 1.1.X releases of the Linux kernel include some + changes in the networking layer (a.k.a. NET3). This affects + these drivers in a few places. The hints that follow are + _not_ tested by me, since I don't have the disk space to keep + all releases on-line. + Known needed changes to date: + - release patchfile: some patches will fail, but they should + be easy to apply "by hand", since they are trivial. + (Space.c: d_link_init() is now called de600_probe()) + - de600.c: change "mark_bh(NET_BH)" to "mark_bh(INET_BH)". + - de620.c: (maybe) change the code around "netif_rx(skb);" to be + similar to the code around "dev_rint(...)" in de600.c + + + 7. ACKNOWLEDGMENTS. + + These drivers wouldn't have been done without the base + (and support) from Ross Biro, and D-Link Systems Inc. + The driver relies upon GPL-ed source from D-Link Systems Inc. + and from Russel Nelson at Crynwr Software <nelson@crynwr.com>. + + Additional input also from: + Donald Becker <becker@super.org>, Alan Cox <A.Cox@swansea.ac.uk> + and Fred N. van Kempen <waltje@uWalt.NL.Mugnet.ORG> + + DE-600 alpha release primary victim^H^H^H^H^H^Htester: + - Erik Proper <erikp@cs.kun.nl>. + Good input also from several users, most notably + - Mark Burton <markb@ordern.demon.co.uk>. + + DE-620 alpha release victims^H^H^H^H^H^H^Htesters: + - J. Joshua Kopper <kopper@rtsg.mot.com> + - Olav Kvittem <Olav.Kvittem@uninett.no> + - Germano Caronni <caronni@nessie.cs.id.ethz.ch> + - Jeremy Fitzhardinge <jeremy@suite.sw.oz.au> + + + Happy hacking! + + Bjorn Ekwall == bj0rn@blox.se diff --git a/Documentation/networking/LICENSE.qla3xxx b/Documentation/networking/LICENSE.qla3xxx new file mode 100644 index 0000000..2f2077e --- /dev/null +++ b/Documentation/networking/LICENSE.qla3xxx @@ -0,0 +1,46 @@ +Copyright (c) 2003-2006 QLogic Corporation +QLogic Linux Networking HBA Driver + +This program includes a device driver for Linux 2.6 that may be +distributed with QLogic hardware specific firmware binary file. +You may modify and redistribute the device driver code under the +GNU General Public License as published by the Free Software +Foundation (version 2 or a later version). + +You may redistribute the hardware specific firmware binary file +under the following terms: + + 1. Redistribution of source code (only if applicable), + must retain the above copyright notice, this list of + conditions and the following disclaimer. + + 2. Redistribution in binary form must reproduce the above + copyright notice, this list of conditions and the + following disclaimer in the documentation and/or other + materials provided with the distribution. + + 3. The name of QLogic Corporation may not be used to + endorse or promote products derived from this software + without specific prior written permission + +REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE, +THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY +EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR +BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT +CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR +OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT, +TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN +ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN +COMBINATION WITH THIS PROGRAM. + diff --git a/Documentation/networking/LICENSE.qlge b/Documentation/networking/LICENSE.qlge new file mode 100644 index 0000000..123b6ed --- /dev/null +++ b/Documentation/networking/LICENSE.qlge @@ -0,0 +1,46 @@ +Copyright (c) 2003-2008 QLogic Corporation +QLogic Linux Networking HBA Driver + +This program includes a device driver for Linux 2.6 that may be +distributed with QLogic hardware specific firmware binary file. +You may modify and redistribute the device driver code under the +GNU General Public License as published by the Free Software +Foundation (version 2 or a later version). + +You may redistribute the hardware specific firmware binary file +under the following terms: + + 1. Redistribution of source code (only if applicable), + must retain the above copyright notice, this list of + conditions and the following disclaimer. + + 2. Redistribution in binary form must reproduce the above + copyright notice, this list of conditions and the + following disclaimer in the documentation and/or other + materials provided with the distribution. + + 3. The name of QLogic Corporation may not be used to + endorse or promote products derived from this software + without specific prior written permission + +REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE, +THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY +EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR +BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT +CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR +OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT, +TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN +ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN +COMBINATION WITH THIS PROGRAM. + diff --git a/Documentation/networking/Makefile b/Documentation/networking/Makefile new file mode 100644 index 0000000..6d8af1a --- /dev/null +++ b/Documentation/networking/Makefile @@ -0,0 +1,8 @@ +# kbuild trick to avoid linker error. Can be omitted if a module is built. +obj- := dummy.o + +# List of programs to build +hostprogs-y := ifenslave + +# Tell kbuild to always build the programs +always := $(hostprogs-y) diff --git a/Documentation/networking/PLIP.txt b/Documentation/networking/PLIP.txt new file mode 100644 index 0000000..ad7e3f7 --- /dev/null +++ b/Documentation/networking/PLIP.txt @@ -0,0 +1,215 @@ +PLIP: The Parallel Line Internet Protocol Device + +Donald Becker (becker@super.org) +I.D.A. Supercomputing Research Center, Bowie MD 20715 + +At some point T. Thorn will probably contribute text, +Tommy Thorn (tthorn@daimi.aau.dk) + +PLIP Introduction +----------------- + +This document describes the parallel port packet pusher for Net/LGX. +This device interface allows a point-to-point connection between two +parallel ports to appear as a IP network interface. + +What is PLIP? +============= + +PLIP is Parallel Line IP, that is, the transportation of IP packages +over a parallel port. In the case of a PC, the obvious choice is the +printer port. PLIP is a non-standard, but [can use] uses the standard +LapLink null-printer cable [can also work in turbo mode, with a PLIP +cable]. [The protocol used to pack IP packages, is a simple one +initiated by Crynwr.] + +Advantages of PLIP +================== + +It's cheap, it's available everywhere, and it's easy. + +The PLIP cable is all that's needed to connect two Linux boxes, and it +can be built for very few bucks. + +Connecting two Linux boxes takes only a second's decision and a few +minutes' work, no need to search for a [supported] netcard. This might +even be especially important in the case of notebooks, where netcards +are not easily available. + +Not requiring a netcard also means that apart from connecting the +cables, everything else is software configuration [which in principle +could be made very easy.] + +Disadvantages of PLIP +===================== + +Doesn't work over a modem, like SLIP and PPP. Limited range, 15 m. +Can only be used to connect three (?) Linux boxes. Doesn't connect to +an existing Ethernet. Isn't standard (not even de facto standard, like +SLIP). + +Performance +=========== + +PLIP easily outperforms Ethernet cards....(ups, I was dreaming, but +it *is* getting late. EOB) + +PLIP driver details +------------------- + +The Linux PLIP driver is an implementation of the original Crynwr protocol, +that uses the parallel port subsystem of the kernel in order to properly +share parallel ports between PLIP and other services. + +IRQs and trigger timeouts +========================= + +When a parallel port used for a PLIP driver has an IRQ configured to it, the +PLIP driver is signaled whenever data is sent to it via the cable, such that +when no data is available, the driver isn't being used. + +However, on some machines it is hard, if not impossible, to configure an IRQ +to a certain parallel port, mainly because it is used by some other device. +On these machines, the PLIP driver can be used in IRQ-less mode, where +the PLIP driver would constantly poll the parallel port for data waiting, +and if such data is available, process it. This mode is less efficient than +the IRQ mode, because the driver has to check the parallel port many times +per second, even when no data at all is sent. Some rough measurements +indicate that there isn't a noticeable performance drop when using IRQ-less +mode as compared to IRQ mode as far as the data transfer speed is involved. +There is a performance drop on the machine hosting the driver. + +When the PLIP driver is used in IRQ mode, the timeout used for triggering a +data transfer (the maximal time the PLIP driver would allow the other side +before announcing a timeout, when trying to handshake a transfer of some +data) is, by default, 500usec. As IRQ delivery is more or less immediate, +this timeout is quite sufficient. + +When in IRQ-less mode, the PLIP driver polls the parallel port HZ times +per second (where HZ is typically 100 on most platforms, and 1024 on an +Alpha, as of this writing). Between two such polls, there are 10^6/HZ usecs. +On an i386, for example, 10^6/100 = 10000usec. It is easy to see that it is +quite possible for the trigger timeout to expire between two such polls, as +the timeout is only 500usec long. As a result, it is required to change the +trigger timeout on the *other* side of a PLIP connection, to about +10^6/HZ usecs. If both sides of a PLIP connection are used in IRQ-less mode, +this timeout is required on both sides. + +It appears that in practice, the trigger timeout can be shorter than in the +above calculation. It isn't an important issue, unless the wire is faulty, +in which case a long timeout would stall the machine when, for whatever +reason, bits are dropped. + +A utility that can perform this change in Linux is plipconfig, which is part +of the net-tools package (its location can be found in the +Documentation/Changes file). An example command would be +'plipconfig plipX trigger 10000', where plipX is the appropriate +PLIP device. + +PLIP hardware interconnection +----------------------------- + +PLIP uses several different data transfer methods. The first (and the +only one implemented in the early version of the code) uses a standard +printer "null" cable to transfer data four bits at a time using +data bit outputs connected to status bit inputs. + +The second data transfer method relies on both machines having +bi-directional parallel ports, rather than output-only ``printer'' +ports. This allows byte-wide transfers and avoids reconstructing +nibbles into bytes, leading to much faster transfers. + +Parallel Transfer Mode 0 Cable +============================== + +The cable for the first transfer mode is a standard +printer "null" cable which transfers data four bits at a time using +data bit outputs of the first port (machine T) connected to the +status bit inputs of the second port (machine R). There are five +status inputs, and they are used as four data inputs and a clock (data +strobe) input, arranged so that the data input bits appear as contiguous +bits with standard status register implementation. + +A cable that implements this protocol is available commercially as a +"Null Printer" or "Turbo Laplink" cable. It can be constructed with +two DB-25 male connectors symmetrically connected as follows: + + STROBE output 1* + D0->ERROR 2 - 15 15 - 2 + D1->SLCT 3 - 13 13 - 3 + D2->PAPOUT 4 - 12 12 - 4 + D3->ACK 5 - 10 10 - 5 + D4->BUSY 6 - 11 11 - 6 + D5,D6,D7 are 7*, 8*, 9* + AUTOFD output 14* + INIT output 16* + SLCTIN 17 - 17 + extra grounds are 18*,19*,20*,21*,22*,23*,24* + GROUND 25 - 25 +* Do not connect these pins on either end + +If the cable you are using has a metallic shield it should be +connected to the metallic DB-25 shell at one end only. + +Parallel Transfer Mode 1 +======================== + +The second data transfer method relies on both machines having +bi-directional parallel ports, rather than output-only ``printer'' +ports. This allows byte-wide transfers, and avoids reconstructing +nibbles into bytes. This cable should not be used on unidirectional +``printer'' (as opposed to ``parallel'') ports or when the machine +isn't configured for PLIP, as it will result in output driver +conflicts and the (unlikely) possibility of damage. + +The cable for this transfer mode should be constructed as follows: + + STROBE->BUSY 1 - 11 + D0->D0 2 - 2 + D1->D1 3 - 3 + D2->D2 4 - 4 + D3->D3 5 - 5 + D4->D4 6 - 6 + D5->D5 7 - 7 + D6->D6 8 - 8 + D7->D7 9 - 9 + INIT -> ACK 16 - 10 + AUTOFD->PAPOUT 14 - 12 + SLCT->SLCTIN 13 - 17 + GND->ERROR 18 - 15 + extra grounds are 19*,20*,21*,22*,23*,24* + GROUND 25 - 25 +* Do not connect these pins on either end + +Once again, if the cable you are using has a metallic shield it should +be connected to the metallic DB-25 shell at one end only. + +PLIP Mode 0 transfer protocol +============================= + +The PLIP driver is compatible with the "Crynwr" parallel port transfer +standard in Mode 0. That standard specifies the following protocol: + + send header nibble '0x8' + count-low octet + count-high octet + ... data octets + checksum octet + +Each octet is sent as + <wait for rx. '0x1?'> <send 0x10+(octet&0x0F)> + <wait for rx. '0x0?'> <send 0x00+((octet>>4)&0x0F)> + +To start a transfer the transmitting machine outputs a nibble 0x08. +That raises the ACK line, triggering an interrupt in the receiving +machine. The receiving machine disables interrupts and raises its own ACK +line. + +Restated: + +(OUT is bit 0-4, OUT.j is bit j from OUT. IN likewise) +Send_Byte: + OUT := low nibble, OUT.4 := 1 + WAIT FOR IN.4 = 1 + OUT := high nibble, OUT.4 := 0 + WAIT FOR IN.4 = 0 diff --git a/Documentation/networking/README.ipw2100 b/Documentation/networking/README.ipw2100 new file mode 100644 index 0000000..f3fcaa4 --- /dev/null +++ b/Documentation/networking/README.ipw2100 @@ -0,0 +1,294 @@ + +Intel(R) PRO/Wireless 2100 Driver for Linux in support of: + +Intel(R) PRO/Wireless 2100 Network Connection + +Copyright (C) 2003-2006, Intel Corporation + +README.ipw2100 + +Version: git-1.1.5 +Date : January 25, 2006 + +Index +----------------------------------------------- +0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER +1. Introduction +2. Release git-1.1.5 Current Features +3. Command Line Parameters +4. Sysfs Helper Files +5. Radio Kill Switch +6. Dynamic Firmware +7. Power Management +8. Support +9. License + + +0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER +----------------------------------------------- + +Important Notice FOR ALL USERS OR DISTRIBUTORS!!!! + +Intel wireless LAN adapters are engineered, manufactured, tested, and +quality checked to ensure that they meet all necessary local and +governmental regulatory agency requirements for the regions that they +are designated and/or marked to ship into. Since wireless LANs are +generally unlicensed devices that share spectrum with radars, +satellites, and other licensed and unlicensed devices, it is sometimes +necessary to dynamically detect, avoid, and limit usage to avoid +interference with these devices. In many instances Intel is required to +provide test data to prove regional and local compliance to regional and +governmental regulations before certification or approval to use the +product is granted. Intel's wireless LAN's EEPROM, firmware, and +software driver are designed to carefully control parameters that affect +radio operation and to ensure electromagnetic compliance (EMC). These +parameters include, without limitation, RF power, spectrum usage, +channel scanning, and human exposure. + +For these reasons Intel cannot permit any manipulation by third parties +of the software provided in binary format with the wireless WLAN +adapters (e.g., the EEPROM and firmware). Furthermore, if you use any +patches, utilities, or code with the Intel wireless LAN adapters that +have been manipulated by an unauthorized party (i.e., patches, +utilities, or code (including open source code modifications) which have +not been validated by Intel), (i) you will be solely responsible for +ensuring the regulatory compliance of the products, (ii) Intel will bear +no liability, under any theory of liability for any issues associated +with the modified products, including without limitation, claims under +the warranty and/or issues arising from regulatory non-compliance, and +(iii) Intel will not provide or be required to assist in providing +support to any third parties for such modified products. + +Note: Many regulatory agencies consider Wireless LAN adapters to be +modules, and accordingly, condition system-level regulatory approval +upon receipt and review of test data documenting that the antennas and +system configuration do not cause the EMC and radio operation to be +non-compliant. + +The drivers available for download from SourceForge are provided as a +part of a development project. Conformance to local regulatory +requirements is the responsibility of the individual developer. As +such, if you are interested in deploying or shipping a driver as part of +solution intended to be used for purposes other than development, please +obtain a tested driver from Intel Customer Support at: + +http://support.intel.com/support/notebook/sb/CS-006408.htm + + +1. Introduction +----------------------------------------------- + +This document provides a brief overview of the features supported by the +IPW2100 driver project. The main project website, where the latest +development version of the driver can be found, is: + + http://ipw2100.sourceforge.net + +There you can find the not only the latest releases, but also information about +potential fixes and patches, as well as links to the development mailing list +for the driver project. + + +2. Release git-1.1.5 Current Supported Features +----------------------------------------------- +- Managed (BSS) and Ad-Hoc (IBSS) +- WEP (shared key and open) +- Wireless Tools support +- 802.1x (tested with XSupplicant 1.0.1) + +Enabled (but not supported) features: +- Monitor/RFMon mode +- WPA/WPA2 + +The distinction between officially supported and enabled is a reflection +on the amount of validation and interoperability testing that has been +performed on a given feature. + + +3. Command Line Parameters +----------------------------------------------- + +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax: + + modprobe ipw2100 [<option>=<VAL1><,VAL2>...] + +For example, to disable the radio on driver loading, enter: + + modprobe ipw2100 disable=1 + +The ipw2100 driver supports the following module parameters: + +Name Value Example: +debug 0x0-0xffffffff debug=1024 +mode 0,1,2 mode=1 /* AdHoc */ +channel int channel=3 /* Only valid in AdHoc or Monitor */ +associate boolean associate=0 /* Do NOT auto associate */ +disable boolean disable=1 /* Do not power the HW */ + + +4. Sysfs Helper Files +--------------------------- +----------------------------------------------- + +There are several ways to control the behavior of the driver. Many of the +general capabilities are exposed through the Wireless Tools (iwconfig). There +are a few capabilities that are exposed through entries in the Linux Sysfs. + + +----- Driver Level ------ +For the driver level files, look in /sys/bus/pci/drivers/ipw2100/ + + debug_level + + This controls the same global as the 'debug' module parameter. For + information on the various debugging levels available, run the 'dvals' + script found in the driver source directory. + + NOTE: 'debug_level' is only enabled if CONFIG_IPW2100_DEBUG is turn + on. + +----- Device Level ------ +For the device level files look in + + /sys/bus/pci/drivers/ipw2100/{PCI-ID}/ + +For example: + /sys/bus/pci/drivers/ipw2100/0000:02:01.0 + +For the device level files, see /sys/bus/pci/drivers/ipw2100: + + rf_kill + read - + 0 = RF kill not enabled (radio on) + 1 = SW based RF kill active (radio off) + 2 = HW based RF kill active (radio off) + 3 = Both HW and SW RF kill active (radio off) + write - + 0 = If SW based RF kill active, turn the radio back on + 1 = If radio is on, activate SW based RF kill + + NOTE: If you enable the SW based RF kill and then toggle the HW + based RF kill from ON -> OFF -> ON, the radio will NOT come back on + + +5. Radio Kill Switch +----------------------------------------------- +Most laptops provide the ability for the user to physically disable the radio. +Some vendors have implemented this as a physical switch that requires no +software to turn the radio off and on. On other laptops, however, the switch +is controlled through a button being pressed and a software driver then making +calls to turn the radio off and on. This is referred to as a "software based +RF kill switch" + +See the Sysfs helper file 'rf_kill' for determining the state of the RF switch +on your system. + + +6. Dynamic Firmware +----------------------------------------------- +As the firmware is licensed under a restricted use license, it can not be +included within the kernel sources. To enable the IPW2100 you will need a +firmware image to load into the wireless NIC's processors. + +You can obtain these images from <http://ipw2100.sf.net/firmware.php>. + +See INSTALL for instructions on installing the firmware. + + +7. Power Management +----------------------------------------------- +The IPW2100 supports the configuration of the Power Save Protocol +through a private wireless extension interface. The IPW2100 supports +the following different modes: + + off No power management. Radio is always on. + on Automatic power management + 1-5 Different levels of power management. The higher the + number the greater the power savings, but with an impact to + packet latencies. + +Power management works by powering down the radio after a certain +interval of time has passed where no packets are passed through the +radio. Once powered down, the radio remains in that state for a given +period of time. For higher power savings, the interval between last +packet processed to sleep is shorter and the sleep period is longer. + +When the radio is asleep, the access point sending data to the station +must buffer packets at the AP until the station wakes up and requests +any buffered packets. If you have an AP that does not correctly support +the PSP protocol you may experience packet loss or very poor performance +while power management is enabled. If this is the case, you will need +to try and find a firmware update for your AP, or disable power +management (via `iwconfig eth1 power off`) + +To configure the power level on the IPW2100 you use a combination of +iwconfig and iwpriv. iwconfig is used to turn power management on, off, +and set it to auto. + + iwconfig eth1 power off Disables radio power down + iwconfig eth1 power on Enables radio power management to + last set level (defaults to AUTO) + iwpriv eth1 set_power 0 Sets power level to AUTO and enables + power management if not previously + enabled. + iwpriv eth1 set_power 1-5 Set the power level as specified, + enabling power management if not + previously enabled. + +You can view the current power level setting via: + + iwpriv eth1 get_power + +It will return the current period or timeout that is configured as a string +in the form of xxxx/yyyy (z) where xxxx is the timeout interval (amount of +time after packet processing), yyyy is the period to sleep (amount of time to +wait before powering the radio and querying the access point for buffered +packets), and z is the 'power level'. If power management is turned off the +xxxx/yyyy will be replaced with 'off' -- the level reported will be the active +level if `iwconfig eth1 power on` is invoked. + + +8. Support +----------------------------------------------- + +For general development information and support, +go to: + + http://ipw2100.sf.net/ + +The ipw2100 1.1.0 driver and firmware can be downloaded from: + + http://support.intel.com + +For installation support on the ipw2100 1.1.0 driver on Linux kernels +2.6.8 or greater, email support is available from: + + http://supportmail.intel.com + +9. License +----------------------------------------------- + + Copyright(c) 2003 - 2006 Intel Corporation. All rights reserved. + + This program is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License (version 2) as + published by the Free Software Foundation. + + This program is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + You should have received a copy of the GNU General Public License along with + this program; if not, write to the Free Software Foundation, Inc., 59 + Temple Place - Suite 330, Boston, MA 02111-1307, USA. + + The full GNU General Public License is included in this distribution in the + file called LICENSE. + + License Contact Information: + James P. Ketrenos <ipw2100-admin@linux.intel.com> + Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497 + diff --git a/Documentation/networking/README.ipw2200 b/Documentation/networking/README.ipw2200 new file mode 100644 index 0000000..4f2a40f --- /dev/null +++ b/Documentation/networking/README.ipw2200 @@ -0,0 +1,472 @@ + +Intel(R) PRO/Wireless 2915ABG Driver for Linux in support of: + +Intel(R) PRO/Wireless 2200BG Network Connection +Intel(R) PRO/Wireless 2915ABG Network Connection + +Note: The Intel(R) PRO/Wireless 2915ABG Driver for Linux and Intel(R) +PRO/Wireless 2200BG Driver for Linux is a unified driver that works on +both hardware adapters listed above. In this document the Intel(R) +PRO/Wireless 2915ABG Driver for Linux will be used to reference the +unified driver. + +Copyright (C) 2004-2006, Intel Corporation + +README.ipw2200 + +Version: 1.1.2 +Date : March 30, 2006 + + +Index +----------------------------------------------- +0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER +1. Introduction +1.1. Overview of features +1.2. Module parameters +1.3. Wireless Extension Private Methods +1.4. Sysfs Helper Files +1.5. Supported channels +2. Ad-Hoc Networking +3. Interacting with Wireless Tools +3.1. iwconfig mode +3.2. iwconfig sens +4. About the Version Numbers +5. Firmware installation +6. Support +7. License + + +0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER +----------------------------------------------- + +Important Notice FOR ALL USERS OR DISTRIBUTORS!!!! + +Intel wireless LAN adapters are engineered, manufactured, tested, and +quality checked to ensure that they meet all necessary local and +governmental regulatory agency requirements for the regions that they +are designated and/or marked to ship into. Since wireless LANs are +generally unlicensed devices that share spectrum with radars, +satellites, and other licensed and unlicensed devices, it is sometimes +necessary to dynamically detect, avoid, and limit usage to avoid +interference with these devices. In many instances Intel is required to +provide test data to prove regional and local compliance to regional and +governmental regulations before certification or approval to use the +product is granted. Intel's wireless LAN's EEPROM, firmware, and +software driver are designed to carefully control parameters that affect +radio operation and to ensure electromagnetic compliance (EMC). These +parameters include, without limitation, RF power, spectrum usage, +channel scanning, and human exposure. + +For these reasons Intel cannot permit any manipulation by third parties +of the software provided in binary format with the wireless WLAN +adapters (e.g., the EEPROM and firmware). Furthermore, if you use any +patches, utilities, or code with the Intel wireless LAN adapters that +have been manipulated by an unauthorized party (i.e., patches, +utilities, or code (including open source code modifications) which have +not been validated by Intel), (i) you will be solely responsible for +ensuring the regulatory compliance of the products, (ii) Intel will bear +no liability, under any theory of liability for any issues associated +with the modified products, including without limitation, claims under +the warranty and/or issues arising from regulatory non-compliance, and +(iii) Intel will not provide or be required to assist in providing +support to any third parties for such modified products. + +Note: Many regulatory agencies consider Wireless LAN adapters to be +modules, and accordingly, condition system-level regulatory approval +upon receipt and review of test data documenting that the antennas and +system configuration do not cause the EMC and radio operation to be +non-compliant. + +The drivers available for download from SourceForge are provided as a +part of a development project. Conformance to local regulatory +requirements is the responsibility of the individual developer. As +such, if you are interested in deploying or shipping a driver as part of +solution intended to be used for purposes other than development, please +obtain a tested driver from Intel Customer Support at: + +http://support.intel.com/support/notebook/sb/CS-006408.htm + + +1. Introduction +----------------------------------------------- +The following sections attempt to provide a brief introduction to using +the Intel(R) PRO/Wireless 2915ABG Driver for Linux. + +This document is not meant to be a comprehensive manual on +understanding or using wireless technologies, but should be sufficient +to get you moving without wires on Linux. + +For information on building and installing the driver, see the INSTALL +file. + + +1.1. Overview of Features +----------------------------------------------- +The current release (1.1.2) supports the following features: + ++ BSS mode (Infrastructure, Managed) ++ IBSS mode (Ad-Hoc) ++ WEP (OPEN and SHARED KEY mode) ++ 802.1x EAP via wpa_supplicant and xsupplicant ++ Wireless Extension support ++ Full B and G rate support (2200 and 2915) ++ Full A rate support (2915 only) ++ Transmit power control ++ S state support (ACPI suspend/resume) + +The following features are currently enabled, but not officially +supported: + ++ WPA ++ long/short preamble support ++ Monitor mode (aka RFMon) + +The distinction between officially supported and enabled is a reflection +on the amount of validation and interoperability testing that has been +performed on a given feature. + + + +1.2. Command Line Parameters +----------------------------------------------- + +Like many modules used in the Linux kernel, the Intel(R) PRO/Wireless +2915ABG Driver for Linux allows configuration options to be provided +as module parameters. The most common way to specify a module parameter +is via the command line. + +The general form is: + +% modprobe ipw2200 parameter=value + +Where the supported parameter are: + + associate + Set to 0 to disable the auto scan-and-associate functionality of the + driver. If disabled, the driver will not attempt to scan + for and associate to a network until it has been configured with + one or more properties for the target network, for example configuring + the network SSID. Default is 1 (auto-associate) + + Example: % modprobe ipw2200 associate=0 + + auto_create + Set to 0 to disable the auto creation of an Ad-Hoc network + matching the channel and network name parameters provided. + Default is 1. + + channel + channel number for association. The normal method for setting + the channel would be to use the standard wireless tools + (i.e. `iwconfig eth1 channel 10`), but it is useful sometimes + to set this while debugging. Channel 0 means 'ANY' + + debug + If using a debug build, this is used to control the amount of debug + info is logged. See the 'dvals' and 'load' script for more info on + how to use this (the dvals and load scripts are provided as part + of the ipw2200 development snapshot releases available from the + SourceForge project at http://ipw2200.sf.net) + + led + Can be used to turn on experimental LED code. + 0 = Off, 1 = On. Default is 0. + + mode + Can be used to set the default mode of the adapter. + 0 = Managed, 1 = Ad-Hoc, 2 = Monitor + + +1.3. Wireless Extension Private Methods +----------------------------------------------- + +As an interface designed to handle generic hardware, there are certain +capabilities not exposed through the normal Wireless Tool interface. As +such, a provision is provided for a driver to declare custom, or +private, methods. The Intel(R) PRO/Wireless 2915ABG Driver for Linux +defines several of these to configure various settings. + +The general form of using the private wireless methods is: + + % iwpriv $IFNAME method parameters + +Where $IFNAME is the interface name the device is registered with +(typically eth1, customized via one of the various network interface +name managers, such as ifrename) + +The supported private methods are: + + get_mode + Can be used to report out which IEEE mode the driver is + configured to support. Example: + + % iwpriv eth1 get_mode + eth1 get_mode:802.11bg (6) + + set_mode + Can be used to configure which IEEE mode the driver will + support. + + Usage: + % iwpriv eth1 set_mode {mode} + Where {mode} is a number in the range 1-7: + 1 802.11a (2915 only) + 2 802.11b + 3 802.11ab (2915 only) + 4 802.11g + 5 802.11ag (2915 only) + 6 802.11bg + 7 802.11abg (2915 only) + + get_preamble + Can be used to report configuration of preamble length. + + set_preamble + Can be used to set the configuration of preamble length: + + Usage: + % iwpriv eth1 set_preamble {mode} + Where {mode} is one of: + 1 Long preamble only + 0 Auto (long or short based on connection) + + +1.4. Sysfs Helper Files: +----------------------------------------------- + +The Linux kernel provides a pseudo file system that can be used to +access various components of the operating system. The Intel(R) +PRO/Wireless 2915ABG Driver for Linux exposes several configuration +parameters through this mechanism. + +An entry in the sysfs can support reading and/or writing. You can +typically query the contents of a sysfs entry through the use of cat, +and can set the contents via echo. For example: + +% cat /sys/bus/pci/drivers/ipw2200/debug_level + +Will report the current debug level of the driver's logging subsystem +(only available if CONFIG_IPW2200_DEBUG was configured when the driver +was built). + +You can set the debug level via: + +% echo $VALUE > /sys/bus/pci/drivers/ipw2200/debug_level + +Where $VALUE would be a number in the case of this sysfs entry. The +input to sysfs files does not have to be a number. For example, the +firmware loader used by hotplug utilizes sysfs entries for transfering +the firmware image from user space into the driver. + +The Intel(R) PRO/Wireless 2915ABG Driver for Linux exposes sysfs entries +at two levels -- driver level, which apply to all instances of the driver +(in the event that there are more than one device installed) and device +level, which applies only to the single specific instance. + + +1.4.1 Driver Level Sysfs Helper Files +----------------------------------------------- + +For the driver level files, look in /sys/bus/pci/drivers/ipw2200/ + + debug_level + + This controls the same global as the 'debug' module parameter + + + +1.4.2 Device Level Sysfs Helper Files +----------------------------------------------- + +For the device level files, look in + + /sys/bus/pci/drivers/ipw2200/{PCI-ID}/ + +For example: + /sys/bus/pci/drivers/ipw2200/0000:02:01.0 + +For the device level files, see /sys/bus/pci/drivers/ipw2200: + + rf_kill + read - + 0 = RF kill not enabled (radio on) + 1 = SW based RF kill active (radio off) + 2 = HW based RF kill active (radio off) + 3 = Both HW and SW RF kill active (radio off) + write - + 0 = If SW based RF kill active, turn the radio back on + 1 = If radio is on, activate SW based RF kill + + NOTE: If you enable the SW based RF kill and then toggle the HW + based RF kill from ON -> OFF -> ON, the radio will NOT come back on + + ucode + read-only access to the ucode version number + + led + read - + 0 = LED code disabled + 1 = LED code enabled + write - + 0 = Disable LED code + 1 = Enable LED code + + NOTE: The LED code has been reported to hang some systems when + running ifconfig and is therefore disabled by default. + + +1.5. Supported channels +----------------------------------------------- + +Upon loading the Intel(R) PRO/Wireless 2915ABG Driver for Linux, a +message stating the detected geography code and the number of 802.11 +channels supported by the card will be displayed in the log. + +The geography code corresponds to a regulatory domain as shown in the +table below. + + Supported channels +Code Geography 802.11bg 802.11a + +--- Restricted 11 0 +ZZF Custom US/Canada 11 8 +ZZD Rest of World 13 0 +ZZA Custom USA & Europe & High 11 13 +ZZB Custom NA & Europe 11 13 +ZZC Custom Japan 11 4 +ZZM Custom 11 0 +ZZE Europe 13 19 +ZZJ Custom Japan 14 4 +ZZR Rest of World 14 0 +ZZH High Band 13 4 +ZZG Custom Europe 13 4 +ZZK Europe 13 24 +ZZL Europe 11 13 + + +2. Ad-Hoc Networking +----------------------------------------------- + +When using a device in an Ad-Hoc network, it is useful to understand the +sequence and requirements for the driver to be able to create, join, or +merge networks. + +The following attempts to provide enough information so that you can +have a consistent experience while using the driver as a member of an +Ad-Hoc network. + +2.1. Joining an Ad-Hoc Network +----------------------------------------------- + +The easiest way to get onto an Ad-Hoc network is to join one that +already exists. + +2.2. Creating an Ad-Hoc Network +----------------------------------------------- + +An Ad-Hoc networks is created using the syntax of the Wireless tool. + +For Example: +iwconfig eth1 mode ad-hoc essid testing channel 2 + +2.3. Merging Ad-Hoc Networks +----------------------------------------------- + + +3. Interaction with Wireless Tools +----------------------------------------------- + +3.1 iwconfig mode +----------------------------------------------- + +When configuring the mode of the adapter, all run-time configured parameters +are reset to the value used when the module was loaded. This includes +channels, rates, ESSID, etc. + +3.2 iwconfig sens +----------------------------------------------- + +The 'iwconfig ethX sens XX' command will not set the signal sensitivity +threshold, as described in iwconfig documentation, but rather the number +of consecutive missed beacons that will trigger handover, i.e. roaming +to another access point. At the same time, it will set the disassociation +threshold to 3 times the given value. + + +4. About the Version Numbers +----------------------------------------------- + +Due to the nature of open source development projects, there are +frequently changes being incorporated that have not gone through +a complete validation process. These changes are incorporated into +development snapshot releases. + +Releases are numbered with a three level scheme: + + major.minor.development + +Any version where the 'development' portion is 0 (for example +1.0.0, 1.1.0, etc.) indicates a stable version that will be made +available for kernel inclusion. + +Any version where the 'development' portion is not a 0 (for +example 1.0.1, 1.1.5, etc.) indicates a development version that is +being made available for testing and cutting edge users. The stability +and functionality of the development releases are not know. We make +efforts to try and keep all snapshots reasonably stable, but due to the +frequency of their release, and the desire to get those releases +available as quickly as possible, unknown anomalies should be expected. + +The major version number will be incremented when significant changes +are made to the driver. Currently, there are no major changes planned. + +5. Firmware installation +---------------------------------------------- + +The driver requires a firmware image, download it and extract the +files under /lib/firmware (or wherever your hotplug's firmware.agent +will look for firmware files) + +The firmware can be downloaded from the following URL: + + http://ipw2200.sf.net/ + + +6. Support +----------------------------------------------- + +For direct support of the 1.0.0 version, you can contact +http://supportmail.intel.com, or you can use the open source project +support. + +For general information and support, go to: + + http://ipw2200.sf.net/ + + +7. License +----------------------------------------------- + + Copyright(c) 2003 - 2006 Intel Corporation. All rights reserved. + + This program is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License version 2 as + published by the Free Software Foundation. + + This program is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + You should have received a copy of the GNU General Public License along with + this program; if not, write to the Free Software Foundation, Inc., 59 + Temple Place - Suite 330, Boston, MA 02111-1307, USA. + + The full GNU General Public License is included in this distribution in the + file called LICENSE. + + Contact Information: + James P. Ketrenos <ipw2100-admin@linux.intel.com> + Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497 + diff --git a/Documentation/networking/README.sb1000 b/Documentation/networking/README.sb1000 new file mode 100644 index 0000000..f82d425 --- /dev/null +++ b/Documentation/networking/README.sb1000 @@ -0,0 +1,207 @@ +sb1000 is a module network device driver for the General Instrument (also known +as NextLevel) SURFboard1000 internal cable modem board. This is an ISA card +which is used by a number of cable TV companies to provide cable modem access. +It's a one-way downstream-only cable modem, meaning that your upstream net link +is provided by your regular phone modem. + +This driver was written by Franco Venturi <fventuri@mediaone.net>. He deserves +a great deal of thanks for this wonderful piece of code! + +----------------------------------------------------------------------------- + +Support for this device is now a part of the standard Linux kernel. The +driver source code file is drivers/net/sb1000.c. In addition to this +you will need: + +1.) The "cmconfig" program. This is a utility which supplements "ifconfig" +to configure the cable modem and network interface (usually called "cm0"); +and + +2.) Several PPP scripts which live in /etc/ppp to make connecting via your +cable modem easy. + + These utilities can be obtained from: + + http://www.jacksonville.net/~fventuri/ + + in Franco's original source code distribution .tar.gz file. Support for + the sb1000 driver can be found at: + + http://home.adelphia.net/~siglercm/sb1000.html + http://linuxpower.cx/~cable/ + + along with these utilities. + +3.) The standard isapnp tools. These are necessary to configure your SB1000 +card at boot time (or afterwards by hand) since it's a PnP card. + + If you don't have these installed as a standard part of your Linux + distribution, you can find them at: + + http://www.roestock.demon.co.uk/isapnptools/ + + or check your Linux distribution binary CD or their web site. For help with + isapnp, pnpdump, or /etc/isapnp.conf, go to: + + http://www.roestock.demon.co.uk/isapnptools/isapnpfaq.html + +----------------------------------------------------------------------------- + +To make the SB1000 card work, follow these steps: + +1.) Run `make config', or `make menuconfig', or `make xconfig', whichever +you prefer, in the top kernel tree directory to set up your kernel +configuration. Make sure to say "Y" to "Prompt for development drivers" +and to say "M" to the sb1000 driver. Also say "Y" or "M" to all the standard +networking questions to get TCP/IP and PPP networking support. + +2.) *BEFORE* you build the kernel, edit drivers/net/sb1000.c. Make sure +to redefine the value of READ_DATA_PORT to match the I/O address used +by isapnp to access your PnP cards. This is the value of READPORT in +/etc/isapnp.conf or given by the output of pnpdump. + +3.) Build and install the kernel and modules as usual. + +4.) Boot your new kernel following the usual procedures. + +5.) Set up to configure the new SB1000 PnP card by capturing the output +of "pnpdump" to a file and editing this file to set the correct I/O ports, +IRQ, and DMA settings for all your PnP cards. Make sure none of the settings +conflict with one another. Then test this configuration by running the +"isapnp" command with your new config file as the input. Check for +errors and fix as necessary. (As an aside, I use I/O ports 0x110 and +0x310 and IRQ 11 for my SB1000 card and these work well for me. YMMV.) +Then save the finished config file as /etc/isapnp.conf for proper configuration +on subsequent reboots. + +6.) Download the original file sb1000-1.1.2.tar.gz from Franco's site or one of +the others referenced above. As root, unpack it into a temporary directory and +do a `make cmconfig' and then `install -c cmconfig /usr/local/sbin'. Don't do +`make install' because it expects to find all the utilities built and ready for +installation, not just cmconfig. + +7.) As root, copy all the files under the ppp/ subdirectory in Franco's +tar file into /etc/ppp, being careful not to overwrite any files that are +already in there. Then modify ppp@gi-on to set the correct login name, +phone number, and frequency for the cable modem. Also edit pap-secrets +to specify your login name and password and any site-specific information +you need. + +8.) Be sure to modify /etc/ppp/firewall to use ipchains instead of +the older ipfwadm commands from the 2.0.x kernels. There's a neat utility to +convert ipfwadm commands to ipchains commands: + + http://users.dhp.com/~whisper/ipfwadm2ipchains/ + +You may also wish to modify the firewall script to implement a different +firewalling scheme. + +9.) Start the PPP connection via the script /etc/ppp/ppp@gi-on. You must be +root to do this. It's better to use a utility like sudo to execute +frequently used commands like this with root permissions if possible. If you +connect successfully the cable modem interface will come up and you'll see a +driver message like this at the console: + + cm0: sb1000 at (0x110,0x310), csn 1, S/N 0x2a0d16d8, IRQ 11. + sb1000.c:v1.1.2 6/01/98 (fventuri@mediaone.net) + +The "ifconfig" command should show two new interfaces, ppp0 and cm0. +The command "cmconfig cm0" will give you information about the cable modem +interface. + +10.) Try pinging a site via `ping -c 5 www.yahoo.com', for example. You should +see packets received. + +11.) If you can't get site names (like www.yahoo.com) to resolve into +IP addresses (like 204.71.200.67), be sure your /etc/resolv.conf file +has no syntax errors and has the right nameserver IP addresses in it. +If this doesn't help, try something like `ping -c 5 204.71.200.67' to +see if the networking is running but the DNS resolution is where the +problem lies. + +12.) If you still have problems, go to the support web sites mentioned above +and read the information and documentation there. + +----------------------------------------------------------------------------- + +Common problems: + +1.) Packets go out on the ppp0 interface but don't come back on the cm0 +interface. It looks like I'm connected but I can't even ping any +numerical IP addresses. (This happens predominantly on Debian systems due +to a default boot-time configuration script.) + +Solution -- As root `echo 0 > /proc/sys/net/ipv4/conf/cm0/rp_filter' so it +can share the same IP address as the ppp0 interface. Note that this +command should probably be added to the /etc/ppp/cablemodem script +*right*between* the "/sbin/ifconfig" and "/sbin/cmconfig" commands. +You may need to do this to /proc/sys/net/ipv4/conf/ppp0/rp_filter as well. +If you do this to /proc/sys/net/ipv4/conf/default/rp_filter on each reboot +(in rc.local or some such) then any interfaces can share the same IP +addresses. + +2.) I get "unresolved symbol" error messages on executing `insmod sb1000.o'. + +Solution -- You probably have a non-matching kernel source tree and +/usr/include/linux and /usr/include/asm header files. Make sure you +install the correct versions of the header files in these two directories. +Then rebuild and reinstall the kernel. + +3.) When isapnp runs it reports an error, and my SB1000 card isn't working. + +Solution -- There's a problem with later versions of isapnp using the "(CHECK)" +option in the lines that allocate the two I/O addresses for the SB1000 card. +This first popped up on RH 6.0. Delete "(CHECK)" for the SB1000 I/O addresses. +Make sure they don't conflict with any other pieces of hardware first! Then +rerun isapnp and go from there. + +4.) I can't execute the /etc/ppp/ppp@gi-on file. + +Solution -- As root do `chmod ug+x /etc/ppp/ppp@gi-on'. + +5.) The firewall script isn't working (with 2.2.x and higher kernels). + +Solution -- Use the ipfwadm2ipchains script referenced above to convert the +/etc/ppp/firewall script from the deprecated ipfwadm commands to ipchains. + +6.) I'm getting *tons* of firewall deny messages in the /var/kern.log, +/var/messages, and/or /var/syslog files, and they're filling up my /var +partition!!! + +Solution -- First, tell your ISP that you're receiving DoS (Denial of Service) +and/or portscanning (UDP connection attempts) attacks! Look over the deny +messages to figure out what the attack is and where it's coming from. Next, +edit /etc/ppp/cablemodem and make sure the ",nobroadcast" option is turned on +to the "cmconfig" command (uncomment that line). If you're not receiving these +denied packets on your broadcast interface (IP address xxx.yyy.zzz.255 +typically), then someone is attacking your machine in particular. Be careful +out there.... + +7.) Everything seems to work fine but my computer locks up after a while +(and typically during a lengthy download through the cable modem)! + +Solution -- You may need to add a short delay in the driver to 'slow down' the +SURFboard because your PC might not be able to keep up with the transfer rate +of the SB1000. To do this, it's probably best to download Franco's +sb1000-1.1.2.tar.gz archive and build and install sb1000.o manually. You'll +want to edit the 'Makefile' and look for the 'SB1000_DELAY' +define. Uncomment those 'CFLAGS' lines (and comment out the default ones) +and try setting the delay to something like 60 microseconds with: +'-DSB1000_DELAY=60'. Then do `make' and as root `make install' and try +it out. If it still doesn't work or you like playing with the driver, you may +try other numbers. Remember though that the higher the delay, the slower the +driver (which slows down the rest of the PC too when it is actively +used). Thanks to Ed Daiga for this tip! + +----------------------------------------------------------------------------- + +Credits: This README came from Franco Venturi's original README file which is +still supplied with his driver .tar.gz archive. I and all other sb1000 users +owe Franco a tremendous "Thank you!" Additional thanks goes to Carl Patten +and Ralph Bonnell who are now managing the Linux SB1000 web site, and to +the SB1000 users who reported and helped debug the common problems listed +above. + + + Clemmitt Sigler + csigler@vt.edu diff --git a/Documentation/networking/alias.txt b/Documentation/networking/alias.txt new file mode 100644 index 0000000..cd12c2f --- /dev/null +++ b/Documentation/networking/alias.txt @@ -0,0 +1,53 @@ + +IP-Aliasing: +============ + +IP-aliases are additional IP-addresses/masks hooked up to a base +interface by adding a colon and a string when running ifconfig. +This string is usually numeric, but this is not a must. + +IP-Aliases are avail if CONFIG_INET (`standard' IPv4 networking) +is configured in the kernel. + + +o Alias creation. + Alias creation is done by 'magic' interface naming: eg. to create a + 200.1.1.1 alias for eth0 ... + + # ifconfig eth0:0 200.1.1.1 etc,etc.... + ~~ -> request alias #0 creation (if not yet exists) for eth0 + + The corresponding route is also set up by this command. + Please note: The route always points to the base interface. + + +o Alias deletion. + The alias is removed by shutting the alias down: + + # ifconfig eth0:0 down + ~~~~~~~~~~ -> will delete alias + + +o Alias (re-)configuring + + Aliases are not real devices, but programs should be able to configure and + refer to them as usual (ifconfig, route, etc). + + +o Relationship with main device + + If the base device is shut down the added aliases will be deleted + too. + + +Contact +------- +Please finger or e-mail me: + Juan Jose Ciarlante <jjciarla@raiz.uncu.edu.ar> + +Updated by Erik Schoenfelder <schoenfr@gaertner.DE> + +; local variables: +; mode: indented-text +; mode: auto-fill +; end: diff --git a/Documentation/networking/arcnet-hardware.txt b/Documentation/networking/arcnet-hardware.txt new file mode 100644 index 0000000..731de41 --- /dev/null +++ b/Documentation/networking/arcnet-hardware.txt @@ -0,0 +1,3133 @@ + +----------------------------------------------------------------------------- +1) This file is a supplement to arcnet.txt. Please read that for general + driver configuration help. +----------------------------------------------------------------------------- +2) This file is no longer Linux-specific. It should probably be moved out of + the kernel sources. Ideas? +----------------------------------------------------------------------------- + +Because so many people (myself included) seem to have obtained ARCnet cards +without manuals, this file contains a quick introduction to ARCnet hardware, +some cabling tips, and a listing of all jumper settings I can find. Please +e-mail apenwarr@worldvisions.ca with any settings for your particular card, +or any other information you have! + + +INTRODUCTION TO ARCNET +---------------------- + +ARCnet is a network type which works in a way similar to popular Ethernet +networks but which is also different in some very important ways. + +First of all, you can get ARCnet cards in at least two speeds: 2.5 Mbps +(slower than Ethernet) and 100 Mbps (faster than normal Ethernet). In fact, +there are others as well, but these are less common. The different hardware +types, as far as I'm aware, are not compatible and so you cannot wire a +100 Mbps card to a 2.5 Mbps card, and so on. From what I hear, my driver does +work with 100 Mbps cards, but I haven't been able to verify this myself, +since I only have the 2.5 Mbps variety. It is probably not going to saturate +your 100 Mbps card. Stop complaining. :) + +You also cannot connect an ARCnet card to any kind of Ethernet card and +expect it to work. + +There are two "types" of ARCnet - STAR topology and BUS topology. This +refers to how the cards are meant to be wired together. According to most +available documentation, you can only connect STAR cards to STAR cards and +BUS cards to BUS cards. That makes sense, right? Well, it's not quite +true; see below under "Cabling." + +Once you get past these little stumbling blocks, ARCnet is actually quite a +well-designed standard. It uses something called "modified token passing" +which makes it completely incompatible with so-called "Token Ring" cards, +but which makes transfers much more reliable than Ethernet does. In fact, +ARCnet will guarantee that a packet arrives safely at the destination, and +even if it can't possibly be delivered properly (ie. because of a cable +break, or because the destination computer does not exist) it will at least +tell the sender about it. + +Because of the carefully defined action of the "token", it will always make +a pass around the "ring" within a maximum length of time. This makes it +useful for realtime networks. + +In addition, all known ARCnet cards have an (almost) identical programming +interface. This means that with one ARCnet driver you can support any +card, whereas with Ethernet each manufacturer uses what is sometimes a +completely different programming interface, leading to a lot of different, +sometimes very similar, Ethernet drivers. Of course, always using the same +programming interface also means that when high-performance hardware +facilities like PCI bus mastering DMA appear, it's hard to take advantage of +them. Let's not go into that. + +One thing that makes ARCnet cards difficult to program for, however, is the +limit on their packet sizes; standard ARCnet can only send packets that are +up to 508 bytes in length. This is smaller than the Internet "bare minimum" +of 576 bytes, let alone the Ethernet MTU of 1500. To compensate, an extra +level of encapsulation is defined by RFC1201, which I call "packet +splitting," that allows "virtual packets" to grow as large as 64K each, +although they are generally kept down to the Ethernet-style 1500 bytes. + +For more information on the advantages and disadvantages (mostly the +advantages) of ARCnet networks, you might try the "ARCnet Trade Association" +WWW page: + http://www.arcnet.com + + +CABLING ARCNET NETWORKS +----------------------- + +This section was rewritten by + Vojtech Pavlik <vojtech@suse.cz> +using information from several people, including: + Avery Pennraun <apenwarr@worldvisions.ca> + Stephen A. Wood <saw@hallc1.cebaf.gov> + John Paul Morrison <jmorriso@bogomips.ee.ubc.ca> + Joachim Koenig <jojo@repas.de> +and Avery touched it up a bit, at Vojtech's request. + +ARCnet (the classic 2.5 Mbps version) can be connected by two different +types of cabling: coax and twisted pair. The other ARCnet-type networks +(100 Mbps TCNS and 320 kbps - 32 Mbps ARCnet Plus) use different types of +cabling (Type1, Fiber, C1, C4, C5). + +For a coax network, you "should" use 93 Ohm RG-62 cable. But other cables +also work fine, because ARCnet is a very stable network. I personally use 75 +Ohm TV antenna cable. + +Cards for coax cabling are shipped in two different variants: for BUS and +STAR network topologies. They are mostly the same. The only difference +lies in the hybrid chip installed. BUS cards use high impedance output, +while STAR use low impedance. Low impedance card (STAR) is electrically +equal to a high impedance one with a terminator installed. + +Usually, the ARCnet networks are built up from STAR cards and hubs. There +are two types of hubs - active and passive. Passive hubs are small boxes +with four BNC connectors containing four 47 Ohm resistors: + + | | wires + R + junction +-R-+-R- R 47 Ohm resistors + R + | + +The shielding is connected together. Active hubs are much more complicated; +they are powered and contain electronics to amplify the signal and send it +to other segments of the net. They usually have eight connectors. Active +hubs come in two variants - dumb and smart. The dumb variant just +amplifies, but the smart one decodes to digital and encodes back all packets +coming through. This is much better if you have several hubs in the net, +since many dumb active hubs may worsen the signal quality. + +And now to the cabling. What you can connect together: + +1. A card to a card. This is the simplest way of creating a 2-computer + network. + +2. A card to a passive hub. Remember that all unused connectors on the hub + must be properly terminated with 93 Ohm (or something else if you don't + have the right ones) terminators. + (Avery's note: oops, I didn't know that. Mine (TV cable) works + anyway, though.) + +3. A card to an active hub. Here is no need to terminate the unused + connectors except some kind of aesthetic feeling. But, there may not be + more than eleven active hubs between any two computers. That of course + doesn't limit the number of active hubs on the network. + +4. An active hub to another. + +5. An active hub to passive hub. + +Remember that you cannot connect two passive hubs together. The power loss +implied by such a connection is too high for the net to operate reliably. + +An example of a typical ARCnet network: + + R S - STAR type card + S------H--------A-------S R - Terminator + | | H - Hub + | | A - Active hub + | S----H----S + S | + | + S + +The BUS topology is very similar to the one used by Ethernet. The only +difference is in cable and terminators: they should be 93 Ohm. Ethernet +uses 50 Ohm impedance. You use T connectors to put the computers on a single +line of cable, the bus. You have to put terminators at both ends of the +cable. A typical BUS ARCnet network looks like: + + RT----T------T------T------T------TR + B B B B B B + + B - BUS type card + R - Terminator + T - T connector + +But that is not all! The two types can be connected together. According to +the official documentation the only way of connecting them is using an active +hub: + + A------T------T------TR + | B B B + S---H---S + | + S + +The official docs also state that you can use STAR cards at the ends of +BUS network in place of a BUS card and a terminator: + + S------T------T------S + B B + +But, according to my own experiments, you can simply hang a BUS type card +anywhere in middle of a cable in a STAR topology network. And more - you +can use the bus card in place of any star card if you use a terminator. Then +you can build very complicated networks fulfilling all your needs! An +example: + + S + | + RT------T-------T------H------S + B B B | + | R + S------A------T-------T-------A-------H------TR + | B B | | B + | S BT | + | | | S----A-----S + S------H---A----S | | + | | S------T----H---S | + S S B R S + +A basically different cabling scheme is used with Twisted Pair cabling. Each +of the TP cards has two RJ (phone-cord style) connectors. The cards are +then daisy-chained together using a cable connecting every two neighboring +cards. The ends are terminated with RJ 93 Ohm terminators which plug into +the empty connectors of cards on the ends of the chain. An example: + + ___________ ___________ + _R_|_ _|_|_ _|_R_ + | | | | | | + |Card | |Card | |Card | + |_____| |_____| |_____| + + +There are also hubs for the TP topology. There is nothing difficult +involved in using them; you just connect a TP chain to a hub on any end or +even at both. This way you can create almost any network configuration. +The maximum of 11 hubs between any two computers on the net applies here as +well. An example: + + RP-------P--------P--------H-----P------P-----PR + | + RP-----H--------P--------H-----P------PR + | | + PR PR + + R - RJ Terminator + P - TP Card + H - TP Hub + +Like any network, ARCnet has a limited cable length. These are the maximum +cable lengths between two active ends (an active end being an active hub or +a STAR card). + + RG-62 93 Ohm up to 650 m + RG-59/U 75 Ohm up to 457 m + RG-11/U 75 Ohm up to 533 m + IBM Type 1 150 Ohm up to 200 m + IBM Type 3 100 Ohm up to 100 m + +The maximum length of all cables connected to a passive hub is limited to 65 +meters for RG-62 cabling; less for others. You can see that using passive +hubs in a large network is a bad idea. The maximum length of a single "BUS +Trunk" is about 300 meters for RG-62. The maximum distance between the two +most distant points of the net is limited to 3000 meters. The maximum length +of a TP cable between two cards/hubs is 650 meters. + + +SETTING THE JUMPERS +------------------- + +All ARCnet cards should have a total of four or five different settings: + + - the I/O address: this is the "port" your ARCnet card is on. Probed + values in the Linux ARCnet driver are only from 0x200 through 0x3F0. (If + your card has additional ones, which is possible, please tell me.) This + should not be the same as any other device on your system. According to + a doc I got from Novell, MS Windows prefers values of 0x300 or more, + eating net connections on my system (at least) otherwise. My guess is + this may be because, if your card is at 0x2E0, probing for a serial port + at 0x2E8 will reset the card and probably mess things up royally. + - Avery's favourite: 0x300. + + - the IRQ: on 8-bit cards, it might be 2 (9), 3, 4, 5, or 7. + on 16-bit cards, it might be 2 (9), 3, 4, 5, 7, or 10-15. + + Make sure this is different from any other card on your system. Note + that IRQ2 is the same as IRQ9, as far as Linux is concerned. You can + "cat /proc/interrupts" for a somewhat complete list of which ones are in + use at any given time. Here is a list of common usages from Vojtech + Pavlik <vojtech@suse.cz>: + ("Not on bus" means there is no way for a card to generate this + interrupt) + IRQ 0 - Timer 0 (Not on bus) + IRQ 1 - Keyboard (Not on bus) + IRQ 2 - IRQ Controller 2 (Not on bus, nor does interrupt the CPU) + IRQ 3 - COM2 + IRQ 4 - COM1 + IRQ 5 - FREE (LPT2 if you have it; sometimes COM3; maybe PLIP) + IRQ 6 - Floppy disk controller + IRQ 7 - FREE (LPT1 if you don't use the polling driver; PLIP) + IRQ 8 - Realtime Clock Interrupt (Not on bus) + IRQ 9 - FREE (VGA vertical sync interrupt if enabled) + IRQ 10 - FREE + IRQ 11 - FREE + IRQ 12 - FREE + IRQ 13 - Numeric Coprocessor (Not on bus) + IRQ 14 - Fixed Disk Controller + IRQ 15 - FREE (Fixed Disk Controller 2 if you have it) + + Note: IRQ 9 is used on some video cards for the "vertical retrace" + interrupt. This interrupt would have been handy for things like + video games, as it occurs exactly once per screen refresh, but + unfortunately IBM cancelled this feature starting with the original + VGA and thus many VGA/SVGA cards do not support it. For this + reason, no modern software uses this interrupt and it can almost + always be safely disabled, if your video card supports it at all. + + If your card for some reason CANNOT disable this IRQ (usually there + is a jumper), one solution would be to clip the printed circuit + contact on the board: it's the fourth contact from the left on the + back side. I take no responsibility if you try this. + + - Avery's favourite: IRQ2 (actually IRQ9). Watch that VGA, though. + + - the memory address: Unlike most cards, ARCnets use "shared memory" for + copying buffers around. Make SURE it doesn't conflict with any other + used memory in your system! + A0000 - VGA graphics memory (ok if you don't have VGA) + B0000 - Monochrome text mode + C0000 \ One of these is your VGA BIOS - usually C0000. + E0000 / + F0000 - System BIOS + + Anything less than 0xA0000 is, well, a BAD idea since it isn't above + 640k. + - Avery's favourite: 0xD0000 + + - the station address: Every ARCnet card has its own "unique" network + address from 0 to 255. Unlike Ethernet, you can set this address + yourself with a jumper or switch (or on some cards, with special + software). Since it's only 8 bits, you can only have 254 ARCnet cards + on a network. DON'T use 0 or 255, since these are reserved (although + neat stuff will probably happen if you DO use them). By the way, if you + haven't already guessed, don't set this the same as any other ARCnet on + your network! + - Avery's favourite: 3 and 4. Not that it matters. + + - There may be ETS1 and ETS2 settings. These may or may not make a + difference on your card (many manuals call them "reserved"), but are + used to change the delays used when powering up a computer on the + network. This is only necessary when wiring VERY long range ARCnet + networks, on the order of 4km or so; in any case, the only real + requirement here is that all cards on the network with ETS1 and ETS2 + jumpers have them in the same position. Chris Hindy <chrish@io.org> + sent in a chart with actual values for this: + ET1 ET2 Response Time Reconfiguration Time + --- --- ------------- -------------------- + open open 74.7us 840us + open closed 283.4us 1680us + closed open 561.8us 1680us + closed closed 1118.6us 1680us + + Make sure you set ETS1 and ETS2 to the SAME VALUE for all cards on your + network. + +Also, on many cards (not mine, though) there are red and green LED's. +Vojtech Pavlik <vojtech@suse.cz> tells me this is what they mean: + GREEN RED Status + ----- --- ------ + OFF OFF Power off + OFF Short flashes Cabling problems (broken cable or not + terminated) + OFF (short) ON Card init + ON ON Normal state - everything OK, nothing + happens + ON Long flashes Data transfer + ON OFF Never happens (maybe when wrong ID) + + +The following is all the specific information people have sent me about +their own particular ARCnet cards. It is officially a mess, and contains +huge amounts of duplicated information. I have no time to fix it. If you +want to, PLEASE DO! Just send me a 'diff -u' of all your changes. + +The model # is listed right above specifics for that card, so you should be +able to use your text viewer's "search" function to find the entry you want. +If you don't KNOW what kind of card you have, try looking through the +various diagrams to see if you can tell. + +If your model isn't listed and/or has different settings, PLEASE PLEASE +tell me. I had to figure mine out without the manual, and it WASN'T FUN! + +Even if your ARCnet model isn't listed, but has the same jumpers as another +model that is, please e-mail me to say so. + +Cards Listed in this file (in this order, mostly): + + Manufacturer Model # Bits + ------------ ------- ---- + SMC PC100 8 + SMC PC110 8 + SMC PC120 8 + SMC PC130 8 + SMC PC270E 8 + SMC PC500 16 + SMC PC500Longboard 16 + SMC PC550Longboard 16 + SMC PC600 16 + SMC PC710 8 + SMC? LCS-8830(-T) 8/16 + Puredata PDI507 8 + CNet Tech CN120-Series 8 + CNet Tech CN160-Series 16 + Lantech? UM9065L chipset 8 + Acer 5210-003 8 + Datapoint? LAN-ARC-8 8 + Topware TA-ARC/10 8 + Thomas-Conrad 500-6242-0097 REV A 8 + Waterloo? (C)1985 Waterloo Micro. 8 + No Name -- 8/16 + No Name Taiwan R.O.C? 8 + No Name Model 9058 8 + Tiara Tiara Lancard? 8 + + +** SMC = Standard Microsystems Corp. +** CNet Tech = CNet Technology, Inc. + + +Unclassified Stuff +------------------ + - Please send any other information you can find. + + - And some other stuff (more info is welcome!): + From: root@ultraworld.xs4all.nl (Timo Hilbrink) + To: apenwarr@foxnet.net (Avery Pennarun) + Date: Wed, 26 Oct 1994 02:10:32 +0000 (GMT) + Reply-To: timoh@xs4all.nl + + [...parts deleted...] + + About the jumpers: On my PC130 there is one more jumper, located near the + cable-connector and it's for changing to star or bus topology; + closed: star - open: bus + On the PC500 are some more jumper-pins, one block labeled with RX,PDN,TXI + and another with ALE,LA17,LA18,LA19 these are undocumented.. + + [...more parts deleted...] + + --- CUT --- + + +** Standard Microsystems Corp (SMC) ** +PC100, PC110, PC120, PC130 (8-bit cards) +PC500, PC600 (16-bit cards) +--------------------------------- + - mainly from Avery Pennarun <apenwarr@worldvisions.ca>. Values depicted + are from Avery's setup. + - special thanks to Timo Hilbrink <timoh@xs4all.nl> for noting that PC120, + 130, 500, and 600 all have the same switches as Avery's PC100. + PC500/600 have several extra, undocumented pins though. (?) + - PC110 settings were verified by Stephen A. Wood <saw@cebaf.gov> + - Also, the JP- and S-numbers probably don't match your card exactly. Try + to find jumpers/switches with the same number of settings - it's + probably more reliable. + + + JP5 [|] : : : : +(IRQ Setting) IRQ2 IRQ3 IRQ4 IRQ5 IRQ7 + Put exactly one jumper on exactly one set of pins. + + + 1 2 3 4 5 6 7 8 9 10 + S1 /----------------------------------\ +(I/O and Memory | 1 1 * 0 0 0 0 * 1 1 0 1 | + addresses) \----------------------------------/ + |--| |--------| |--------| + (a) (b) (m) + + WARNING. It's very important when setting these which way + you're holding the card, and which way you think is '1'! + + If you suspect that your settings are not being made + correctly, try reversing the direction or inverting the + switch positions. + + a: The first digit of the I/O address. + Setting Value + ------- ----- + 00 0 + 01 1 + 10 2 + 11 3 + + b: The second digit of the I/O address. + Setting Value + ------- ----- + 0000 0 + 0001 1 + 0010 2 + ... ... + 1110 E + 1111 F + + The I/O address is in the form ab0. For example, if + a is 0x2 and b is 0xE, the address will be 0x2E0. + + DO NOT SET THIS LESS THAN 0x200!!!!! + + + m: The first digit of the memory address. + Setting Value + ------- ----- + 0000 0 + 0001 1 + 0010 2 + ... ... + 1110 E + 1111 F + + The memory address is in the form m0000. For example, if + m is D, the address will be 0xD0000. + + DO NOT SET THIS TO C0000, F0000, OR LESS THAN A0000! + + 1 2 3 4 5 6 7 8 + S2 /--------------------------\ +(Station Address) | 1 1 0 0 0 0 0 0 | + \--------------------------/ + + Setting Value + ------- ----- + 00000000 00 + 10000000 01 + 01000000 02 + ... + 01111111 FE + 11111111 FF + + Note that this is binary with the digits reversed! + + DO NOT SET THIS TO 0 OR 255 (0xFF)! + + +***************************************************************************** + +** Standard Microsystems Corp (SMC) ** +PC130E/PC270E (8-bit cards) +--------------------------- + - from Juergen Seifert <seifert@htwm.de> + + +STANDARD MICROSYSTEMS CORPORATION (SMC) ARCNET(R)-PC130E/PC270E +=============================================================== + +This description has been written by Juergen Seifert <seifert@htwm.de> +using information from the following Original SMC Manual + + "Configuration Guide for + ARCNET(R)-PC130E/PC270 + Network Controller Boards + Pub. # 900.044A + June, 1989" + +ARCNET is a registered trademark of the Datapoint Corporation +SMC is a registered trademark of the Standard Microsystems Corporation + +The PC130E is an enhanced version of the PC130 board, is equipped with a +standard BNC female connector for connection to RG-62/U coax cable. +Since this board is designed both for point-to-point connection in star +networks and for connection to bus networks, it is downwardly compatible +with all the other standard boards designed for coax networks (that is, +the PC120, PC110 and PC100 star topology boards and the PC220, PC210 and +PC200 bus topology boards). + +The PC270E is an enhanced version of the PC260 board, is equipped with two +modular RJ11-type jacks for connection to twisted pair wiring. +It can be used in a star or a daisy-chained network. + + + 8 7 6 5 4 3 2 1 + ________________________________________________________________ + | | S1 | | + | |_________________| | + | Offs|Base |I/O Addr | + | RAM Addr | ___| + | ___ ___ CR3 |___| + | | \/ | CR4 |___| + | | PROM | ___| + | | | N | | 8 + | | SOCKET | o | | 7 + | |________| d | | 6 + | ___________________ e | | 5 + | | | A | S | 4 + | |oo| EXT2 | | d | 2 | 3 + | |oo| EXT1 | SMC | d | | 2 + | |oo| ROM | 90C63 | r |___| 1 + | |oo| IRQ7 | | |o| _____| + | |oo| IRQ5 | | |o| | J1 | + | |oo| IRQ4 | | STAR |_____| + | |oo| IRQ3 | | | J2 | + | |oo| IRQ2 |___________________| |_____| + |___ ______________| + | | + |_____________________________________________| + +Legend: + +SMC 90C63 ARCNET Controller / Transceiver /Logic +S1 1-3: I/O Base Address Select + 4-6: Memory Base Address Select + 7-8: RAM Offset Select +S2 1-8: Node ID Select +EXT Extended Timeout Select +ROM ROM Enable Select +STAR Selected - Star Topology (PC130E only) + Deselected - Bus Topology (PC130E only) +CR3/CR4 Diagnostic LEDs +J1 BNC RG62/U Connector (PC130E only) +J1 6-position Telephone Jack (PC270E only) +J2 6-position Telephone Jack (PC270E only) + +Setting one of the switches to Off/Open means "1", On/Closed means "0". + + +Setting the Node ID +------------------- + +The eight switches in group S2 are used to set the node ID. +These switches work in a way similar to the PC100-series cards; see that +entry for more information. + + +Setting the I/O Base Address +---------------------------- + +The first three switches in switch group S1 are used to select one +of eight possible I/O Base addresses using the following table + + + Switch | Hex I/O + 1 2 3 | Address + -------|-------- + 0 0 0 | 260 + 0 0 1 | 290 + 0 1 0 | 2E0 (Manufacturer's default) + 0 1 1 | 2F0 + 1 0 0 | 300 + 1 0 1 | 350 + 1 1 0 | 380 + 1 1 1 | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer requires 2K of a 16K block of RAM. The base of this +16K block can be located in any of eight positions. +Switches 4-6 of switch group S1 select the Base of the 16K block. +Within that 16K address space, the buffer may be assigned any one of four +positions, determined by the offset, switches 7 and 8 of group S1. + + Switch | Hex RAM | Hex ROM + 4 5 6 7 8 | Address | Address *) + -----------|---------|----------- + 0 0 0 0 0 | C0000 | C2000 + 0 0 0 0 1 | C0800 | C2000 + 0 0 0 1 0 | C1000 | C2000 + 0 0 0 1 1 | C1800 | C2000 + | | + 0 0 1 0 0 | C4000 | C6000 + 0 0 1 0 1 | C4800 | C6000 + 0 0 1 1 0 | C5000 | C6000 + 0 0 1 1 1 | C5800 | C6000 + | | + 0 1 0 0 0 | CC000 | CE000 + 0 1 0 0 1 | CC800 | CE000 + 0 1 0 1 0 | CD000 | CE000 + 0 1 0 1 1 | CD800 | CE000 + | | + 0 1 1 0 0 | D0000 | D2000 (Manufacturer's default) + 0 1 1 0 1 | D0800 | D2000 + 0 1 1 1 0 | D1000 | D2000 + 0 1 1 1 1 | D1800 | D2000 + | | + 1 0 0 0 0 | D4000 | D6000 + 1 0 0 0 1 | D4800 | D6000 + 1 0 0 1 0 | D5000 | D6000 + 1 0 0 1 1 | D5800 | D6000 + | | + 1 0 1 0 0 | D8000 | DA000 + 1 0 1 0 1 | D8800 | DA000 + 1 0 1 1 0 | D9000 | DA000 + 1 0 1 1 1 | D9800 | DA000 + | | + 1 1 0 0 0 | DC000 | DE000 + 1 1 0 0 1 | DC800 | DE000 + 1 1 0 1 0 | DD000 | DE000 + 1 1 0 1 1 | DD800 | DE000 + | | + 1 1 1 0 0 | E0000 | E2000 + 1 1 1 0 1 | E0800 | E2000 + 1 1 1 1 0 | E1000 | E2000 + 1 1 1 1 1 | E1800 | E2000 + +*) To enable the 8K Boot PROM install the jumper ROM. + The default is jumper ROM not installed. + + +Setting the Timeouts and Interrupt +---------------------------------- + +The jumpers labeled EXT1 and EXT2 are used to determine the timeout +parameters. These two jumpers are normally left open. + +To select a hardware interrupt level set one (only one!) of the jumpers +IRQ2, IRQ3, IRQ4, IRQ5, IRQ7. The Manufacturer's default is IRQ2. + + +Configuring the PC130E for Star or Bus Topology +----------------------------------------------- + +The single jumper labeled STAR is used to configure the PC130E board for +star or bus topology. +When the jumper is installed, the board may be used in a star network, when +it is removed, the board can be used in a bus topology. + + +Diagnostic LEDs +--------------- + +Two diagnostic LEDs are visible on the rear bracket of the board. +The green LED monitors the network activity: the red one shows the +board activity: + + Green | Status Red | Status + -------|------------------- ---------|------------------- + on | normal activity flash/on | data transfer + blink | reconfiguration off | no data transfer; + off | defective board or | incorrect memory or + | node ID is zero | I/O address + + +***************************************************************************** + +** Standard Microsystems Corp (SMC) ** +PC500/PC550 Longboard (16-bit cards) +------------------------------------- + - from Juergen Seifert <seifert@htwm.de> + + +STANDARD MICROSYSTEMS CORPORATION (SMC) ARCNET-PC500/PC550 Long Board +===================================================================== + +Note: There is another Version of the PC500 called Short Version, which + is different in hard- and software! The most important differences + are: + - The long board has no Shared memory. + - On the long board the selection of the interrupt is done by binary + coded switch, on the short board directly by jumper. + +[Avery's note: pay special attention to that: the long board HAS NO SHARED +MEMORY. This means the current Linux-ARCnet driver can't use these cards. +I have obtained a PC500Longboard and will be doing some experiments on it in +the future, but don't hold your breath. Thanks again to Juergen Seifert for +his advice about this!] + +This description has been written by Juergen Seifert <seifert@htwm.de> +using information from the following Original SMC Manual + + "Configuration Guide for + SMC ARCNET-PC500/PC550 + Series Network Controller Boards + Pub. # 900.033 Rev. A + November, 1989" + +ARCNET is a registered trademark of the Datapoint Corporation +SMC is a registered trademark of the Standard Microsystems Corporation + +The PC500 is equipped with a standard BNC female connector for connection +to RG-62/U coax cable. +The board is designed both for point-to-point connection in star networks +and for connection to bus networks. + +The PC550 is equipped with two modular RJ11-type jacks for connection +to twisted pair wiring. +It can be used in a star or a daisy-chained (BUS) network. + + 1 + 0 9 8 7 6 5 4 3 2 1 6 5 4 3 2 1 + ____________________________________________________________________ + < | SW1 | | SW2 | | + > |_____________________| |_____________| | + < IRQ |I/O Addr | + > ___| + < CR4 |___| + > CR3 |___| + < ___| + > N | | 8 + < o | | 7 + > d | S | 6 + < e | W | 5 + > A | 3 | 4 + < d | | 3 + > d | | 2 + < r |___| 1 + > |o| _____| + < |o| | J1 | + > 3 1 JP6 |_____| + < |o|o| JP2 | J2 | + > |o|o| |_____| + < 4 2__ ______________| + > | | | + <____| |_____________________________________________| + +Legend: + +SW1 1-6: I/O Base Address Select + 7-10: Interrupt Select +SW2 1-6: Reserved for Future Use +SW3 1-8: Node ID Select +JP2 1-4: Extended Timeout Select +JP6 Selected - Star Topology (PC500 only) + Deselected - Bus Topology (PC500 only) +CR3 Green Monitors Network Activity +CR4 Red Monitors Board Activity +J1 BNC RG62/U Connector (PC500 only) +J1 6-position Telephone Jack (PC550 only) +J2 6-position Telephone Jack (PC550 only) + +Setting one of the switches to Off/Open means "1", On/Closed means "0". + + +Setting the Node ID +------------------- + +The eight switches in group SW3 are used to set the node ID. Each node +attached to the network must have an unique node ID which must be +different from 0. +Switch 1 serves as the least significant bit (LSB). + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Value + -------|------- + 1 | 1 + 2 | 2 + 3 | 4 + 4 | 8 + 5 | 16 + 6 | 32 + 7 | 64 + 8 | 128 + +Some Examples: + + Switch | Hex | Decimal + 8 7 6 5 4 3 2 1 | Node ID | Node ID + ----------------|---------|--------- + 0 0 0 0 0 0 0 0 | not allowed + 0 0 0 0 0 0 0 1 | 1 | 1 + 0 0 0 0 0 0 1 0 | 2 | 2 + 0 0 0 0 0 0 1 1 | 3 | 3 + . . . | | + 0 1 0 1 0 1 0 1 | 55 | 85 + . . . | | + 1 0 1 0 1 0 1 0 | AA | 170 + . . . | | + 1 1 1 1 1 1 0 1 | FD | 253 + 1 1 1 1 1 1 1 0 | FE | 254 + 1 1 1 1 1 1 1 1 | FF | 255 + + +Setting the I/O Base Address +---------------------------- + +The first six switches in switch group SW1 are used to select one +of 32 possible I/O Base addresses using the following table + + Switch | Hex I/O + 6 5 4 3 2 1 | Address + -------------|-------- + 0 1 0 0 0 0 | 200 + 0 1 0 0 0 1 | 210 + 0 1 0 0 1 0 | 220 + 0 1 0 0 1 1 | 230 + 0 1 0 1 0 0 | 240 + 0 1 0 1 0 1 | 250 + 0 1 0 1 1 0 | 260 + 0 1 0 1 1 1 | 270 + 0 1 1 0 0 0 | 280 + 0 1 1 0 0 1 | 290 + 0 1 1 0 1 0 | 2A0 + 0 1 1 0 1 1 | 2B0 + 0 1 1 1 0 0 | 2C0 + 0 1 1 1 0 1 | 2D0 + 0 1 1 1 1 0 | 2E0 (Manufacturer's default) + 0 1 1 1 1 1 | 2F0 + 1 1 0 0 0 0 | 300 + 1 1 0 0 0 1 | 310 + 1 1 0 0 1 0 | 320 + 1 1 0 0 1 1 | 330 + 1 1 0 1 0 0 | 340 + 1 1 0 1 0 1 | 350 + 1 1 0 1 1 0 | 360 + 1 1 0 1 1 1 | 370 + 1 1 1 0 0 0 | 380 + 1 1 1 0 0 1 | 390 + 1 1 1 0 1 0 | 3A0 + 1 1 1 0 1 1 | 3B0 + 1 1 1 1 0 0 | 3C0 + 1 1 1 1 0 1 | 3D0 + 1 1 1 1 1 0 | 3E0 + 1 1 1 1 1 1 | 3F0 + + +Setting the Interrupt +--------------------- + +Switches seven through ten of switch group SW1 are used to select the +interrupt level. The interrupt level is binary coded, so selections +from 0 to 15 would be possible, but only the following eight values will +be supported: 3, 4, 5, 7, 9, 10, 11, 12. + + Switch | IRQ + 10 9 8 7 | + ---------|-------- + 0 0 1 1 | 3 + 0 1 0 0 | 4 + 0 1 0 1 | 5 + 0 1 1 1 | 7 + 1 0 0 1 | 9 (=2) (default) + 1 0 1 0 | 10 + 1 0 1 1 | 11 + 1 1 0 0 | 12 + + +Setting the Timeouts +-------------------- + +The two jumpers JP2 (1-4) are used to determine the timeout parameters. +These two jumpers are normally left open. +Refer to the COM9026 Data Sheet for alternate configurations. + + +Configuring the PC500 for Star or Bus Topology +---------------------------------------------- + +The single jumper labeled JP6 is used to configure the PC500 board for +star or bus topology. +When the jumper is installed, the board may be used in a star network, when +it is removed, the board can be used in a bus topology. + + +Diagnostic LEDs +--------------- + +Two diagnostic LEDs are visible on the rear bracket of the board. +The green LED monitors the network activity: the red one shows the +board activity: + + Green | Status Red | Status + -------|------------------- ---------|------------------- + on | normal activity flash/on | data transfer + blink | reconfiguration off | no data transfer; + off | defective board or | incorrect memory or + | node ID is zero | I/O address + + +***************************************************************************** + +** SMC ** +PC710 (8-bit card) +------------------ + - from J.S. van Oosten <jvoosten@compiler.tdcnet.nl> + +Note: this data is gathered by experimenting and looking at info of other +cards. However, I'm sure I got 99% of the settings right. + +The SMC710 card resembles the PC270 card, but is much more basic (i.e. no +LEDs, RJ11 jacks, etc.) and 8 bit. Here's a little drawing: + + _______________________________________ + | +---------+ +---------+ |____ + | | S2 | | S1 | | + | +---------+ +---------+ | + | | + | +===+ __ | + | | R | | | X-tal ###___ + | | O | |__| ####__'| + | | M | || ### + | +===+ | + | | + | .. JP1 +----------+ | + | .. | big chip | | + | .. | 90C63 | | + | .. | | | + | .. +----------+ | + ------- ----------- + ||||||||||||||||||||| + +The row of jumpers at JP1 actually consists of 8 jumpers, (sometimes +labelled) the same as on the PC270, from top to bottom: EXT2, EXT1, ROM, +IRQ7, IRQ5, IRQ4, IRQ3, IRQ2 (gee, wonder what they would do? :-) ) + +S1 and S2 perform the same function as on the PC270, only their numbers +are swapped (S1 is the nodeaddress, S2 sets IO- and RAM-address). + +I know it works when connected to a PC110 type ARCnet board. + + +***************************************************************************** + +** Possibly SMC ** +LCS-8830(-T) (8 and 16-bit cards) +--------------------------------- + - from Mathias Katzer <mkatzer@HRZ.Uni-Bielefeld.DE> + - Marek Michalkiewicz <marekm@i17linuxb.ists.pwr.wroc.pl> says the + LCS-8830 is slightly different from LCS-8830-T. These are 8 bit, BUS + only (the JP0 jumper is hardwired), and BNC only. + +This is a LCS-8830-T made by SMC, I think ('SMC' only appears on one PLCC, +nowhere else, not even on the few Xeroxed sheets from the manual). + +SMC ARCnet Board Type LCS-8830-T + + ------------------------------------ + | | + | JP3 88 8 JP2 | + | ##### | \ | + | ##### ET1 ET2 ###| + | 8 ###| + | U3 SW 1 JP0 ###| Phone Jacks + | -- ###| + | | | | + | | | SW2 | + | | | | + | | | ##### | + | -- ##### #### BNC Connector + | #### + | 888888 JP1 | + | 234567 | + -- ------- + ||||||||||||||||||||||||||| + -------------------------- + + +SW1: DIP-Switches for Station Address +SW2: DIP-Switches for Memory Base and I/O Base addresses + +JP0: If closed, internal termination on (default open) +JP1: IRQ Jumpers +JP2: Boot-ROM enabled if closed +JP3: Jumpers for response timeout + +U3: Boot-ROM Socket + + +ET1 ET2 Response Time Idle Time Reconfiguration Time + + 78 86 840 + X 285 316 1680 + X 563 624 1680 + X X 1130 1237 1680 + +(X means closed jumper) + +(DIP-Switch downwards means "0") + +The station address is binary-coded with SW1. + +The I/O base address is coded with DIP-Switches 6,7 and 8 of SW2: + +Switches Base +678 Address +000 260-26f +100 290-29f +010 2e0-2ef +110 2f0-2ff +001 300-30f +101 350-35f +011 380-38f +111 3e0-3ef + + +DIP Switches 1-5 of SW2 encode the RAM and ROM Address Range: + +Switches RAM ROM +12345 Address Range Address Range +00000 C:0000-C:07ff C:2000-C:3fff +10000 C:0800-C:0fff +01000 C:1000-C:17ff +11000 C:1800-C:1fff +00100 C:4000-C:47ff C:6000-C:7fff +10100 C:4800-C:4fff +01100 C:5000-C:57ff +11100 C:5800-C:5fff +00010 C:C000-C:C7ff C:E000-C:ffff +10010 C:C800-C:Cfff +01010 C:D000-C:D7ff +11010 C:D800-C:Dfff +00110 D:0000-D:07ff D:2000-D:3fff +10110 D:0800-D:0fff +01110 D:1000-D:17ff +11110 D:1800-D:1fff +00001 D:4000-D:47ff D:6000-D:7fff +10001 D:4800-D:4fff +01001 D:5000-D:57ff +11001 D:5800-D:5fff +00101 D:8000-D:87ff D:A000-D:bfff +10101 D:8800-D:8fff +01101 D:9000-D:97ff +11101 D:9800-D:9fff +00011 D:C000-D:c7ff D:E000-D:ffff +10011 D:C800-D:cfff +01011 D:D000-D:d7ff +11011 D:D800-D:dfff +00111 E:0000-E:07ff E:2000-E:3fff +10111 E:0800-E:0fff +01111 E:1000-E:17ff +11111 E:1800-E:1fff + + +***************************************************************************** + +** PureData Corp ** +PDI507 (8-bit card) +-------------------- + - from Mark Rejhon <mdrejhon@magi.com> (slight modifications by Avery) + - Avery's note: I think PDI508 cards (but definitely NOT PDI508Plus cards) + are mostly the same as this. PDI508Plus cards appear to be mainly + software-configured. + +Jumpers: + There is a jumper array at the bottom of the card, near the edge + connector. This array is labelled J1. They control the IRQs and + something else. Put only one jumper on the IRQ pins. + + ETS1, ETS2 are for timing on very long distance networks. See the + more general information near the top of this file. + + There is a J2 jumper on two pins. A jumper should be put on them, + since it was already there when I got the card. I don't know what + this jumper is for though. + + There is a two-jumper array for J3. I don't know what it is for, + but there were already two jumpers on it when I got the card. It's + a six pin grid in a two-by-three fashion. The jumpers were + configured as follows: + + .-------. + o | o o | + :-------: ------> Accessible end of card with connectors + o | o o | in this direction -------> + `-------' + +Carl de Billy <CARL@carainfo.com> explains J3 and J4: + + J3 Diagram: + + .-------. + o | o o | + :-------: TWIST Technology + o | o o | + `-------' + .-------. + | o o | o + :-------: COAX Technology + | o o | o + `-------' + + - If using coax cable in a bus topology the J4 jumper must be removed; + place it on one pin. + + - If using bus topology with twisted pair wiring move the J3 + jumpers so they connect the middle pin and the pins closest to the RJ11 + Connectors. Also the J4 jumper must be removed; place it on one pin of + J4 jumper for storage. + + - If using star topology with twisted pair wiring move the J3 + jumpers so they connect the middle pin and the pins closest to the RJ11 + connectors. + + +DIP Switches: + + The DIP switches accessible on the accessible end of the card while + it is installed, is used to set the ARCnet address. There are 8 + switches. Use an address from 1 to 254. + + Switch No. + 12345678 ARCnet address + ----------------------------------------- + 00000000 FF (Don't use this!) + 00000001 FE + 00000010 FD + .... + 11111101 2 + 11111110 1 + 11111111 0 (Don't use this!) + + There is another array of eight DIP switches at the top of the + card. There are five labelled MS0-MS4 which seem to control the + memory address, and another three labelled IO0-IO2 which seem to + control the base I/O address of the card. + + This was difficult to test by trial and error, and the I/O addresses + are in a weird order. This was tested by setting the DIP switches, + rebooting the computer, and attempting to load ARCETHER at various + addresses (mostly between 0x200 and 0x400). The address that caused + the red transmit LED to blink, is the one that I thought works. + + Also, the address 0x3D0 seem to have a special meaning, since the + ARCETHER packet driver loaded fine, but without the red LED + blinking. I don't know what 0x3D0 is for though. I recommend using + an address of 0x300 since Windows may not like addresses below + 0x300. + + IO Switch No. + 210 I/O address + ------------------------------- + 111 0x260 + 110 0x290 + 101 0x2E0 + 100 0x2F0 + 011 0x300 + 010 0x350 + 001 0x380 + 000 0x3E0 + + The memory switches set a reserved address space of 0x1000 bytes + (0x100 segment units, or 4k). For example if I set an address of + 0xD000, it will use up addresses 0xD000 to 0xD100. + + The memory switches were tested by booting using QEMM386 stealth, + and using LOADHI to see what address automatically became excluded + from the upper memory regions, and then attempting to load ARCETHER + using these addresses. + + I recommend using an ARCnet memory address of 0xD000, and putting + the EMS page frame at 0xC000 while using QEMM stealth mode. That + way, you get contiguous high memory from 0xD100 almost all the way + the end of the megabyte. + + Memory Switch 0 (MS0) didn't seem to work properly when set to OFF + on my card. It could be malfunctioning on my card. Experiment with + it ON first, and if it doesn't work, set it to OFF. (It may be a + modifier for the 0x200 bit?) + + MS Switch No. + 43210 Memory address + -------------------------------- + 00001 0xE100 (guessed - was not detected by QEMM) + 00011 0xE000 (guessed - was not detected by QEMM) + 00101 0xDD00 + 00111 0xDC00 + 01001 0xD900 + 01011 0xD800 + 01101 0xD500 + 01111 0xD400 + 10001 0xD100 + 10011 0xD000 + 10101 0xCD00 + 10111 0xCC00 + 11001 0xC900 (guessed - crashes tested system) + 11011 0xC800 (guessed - crashes tested system) + 11101 0xC500 (guessed - crashes tested system) + 11111 0xC400 (guessed - crashes tested system) + + +***************************************************************************** + +** CNet Technology Inc. ** +120 Series (8-bit cards) +------------------------ + - from Juergen Seifert <seifert@htwm.de> + + +CNET TECHNOLOGY INC. (CNet) ARCNET 120A SERIES +============================================== + +This description has been written by Juergen Seifert <seifert@htwm.de> +using information from the following Original CNet Manual + + "ARCNET + USER'S MANUAL + for + CN120A + CN120AB + CN120TP + CN120ST + CN120SBT + P/N:12-01-0007 + Revision 3.00" + +ARCNET is a registered trademark of the Datapoint Corporation + +P/N 120A ARCNET 8 bit XT/AT Star +P/N 120AB ARCNET 8 bit XT/AT Bus +P/N 120TP ARCNET 8 bit XT/AT Twisted Pair +P/N 120ST ARCNET 8 bit XT/AT Star, Twisted Pair +P/N 120SBT ARCNET 8 bit XT/AT Star, Bus, Twisted Pair + + __________________________________________________________________ + | | + | ___| + | LED |___| + | ___| + | N | | ID7 + | o | | ID6 + | d | S | ID5 + | e | W | ID4 + | ___________________ A | 2 | ID3 + | | | d | | ID2 + | | | 1 2 3 4 5 6 7 8 d | | ID1 + | | | _________________ r |___| ID0 + | | 90C65 || SW1 | ____| + | JP 8 7 | ||_________________| | | + | |o|o| JP1 | | | J2 | + | |o|o| |oo| | | JP 1 1 1 | | + | ______________ | | 0 1 2 |____| + | | PROM | |___________________| |o|o|o| _____| + | > SOCKET | JP 6 5 4 3 2 |o|o|o| | J1 | + | |______________| |o|o|o|o|o| |o|o|o| |_____| + |_____ |o|o|o|o|o| ______________| + | | + |_____________________________________________| + +Legend: + +90C65 ARCNET Probe +S1 1-5: Base Memory Address Select + 6-8: Base I/O Address Select +S2 1-8: Node ID Select (ID0-ID7) +JP1 ROM Enable Select +JP2 IRQ2 +JP3 IRQ3 +JP4 IRQ4 +JP5 IRQ5 +JP6 IRQ7 +JP7/JP8 ET1, ET2 Timeout Parameters +JP10/JP11 Coax / Twisted Pair Select (CN120ST/SBT only) +JP12 Terminator Select (CN120AB/ST/SBT only) +J1 BNC RG62/U Connector (all except CN120TP) +J2 Two 6-position Telephone Jack (CN120TP/ST/SBT only) + +Setting one of the switches to Off means "1", On means "0". + + +Setting the Node ID +------------------- + +The eight switches in SW2 are used to set the node ID. Each node attached +to the network must have an unique node ID which must be different from 0. +Switch 1 (ID0) serves as the least significant bit (LSB). + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Label | Value + -------|-------|------- + 1 | ID0 | 1 + 2 | ID1 | 2 + 3 | ID2 | 4 + 4 | ID3 | 8 + 5 | ID4 | 16 + 6 | ID5 | 32 + 7 | ID6 | 64 + 8 | ID7 | 128 + +Some Examples: + + Switch | Hex | Decimal + 8 7 6 5 4 3 2 1 | Node ID | Node ID + ----------------|---------|--------- + 0 0 0 0 0 0 0 0 | not allowed + 0 0 0 0 0 0 0 1 | 1 | 1 + 0 0 0 0 0 0 1 0 | 2 | 2 + 0 0 0 0 0 0 1 1 | 3 | 3 + . . . | | + 0 1 0 1 0 1 0 1 | 55 | 85 + . . . | | + 1 0 1 0 1 0 1 0 | AA | 170 + . . . | | + 1 1 1 1 1 1 0 1 | FD | 253 + 1 1 1 1 1 1 1 0 | FE | 254 + 1 1 1 1 1 1 1 1 | FF | 255 + + +Setting the I/O Base Address +---------------------------- + +The last three switches in switch block SW1 are used to select one +of eight possible I/O Base addresses using the following table + + + Switch | Hex I/O + 6 7 8 | Address + ------------|-------- + ON ON ON | 260 + OFF ON ON | 290 + ON OFF ON | 2E0 (Manufacturer's default) + OFF OFF ON | 2F0 + ON ON OFF | 300 + OFF ON OFF | 350 + ON OFF OFF | 380 + OFF OFF OFF | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer (RAM) requires 2K. The base of this buffer can be +located in any of eight positions. The address of the Boot Prom is +memory base + 8K or memory base + 0x2000. +Switches 1-5 of switch block SW1 select the Memory Base address. + + Switch | Hex RAM | Hex ROM + 1 2 3 4 5 | Address | Address *) + --------------------|---------|----------- + ON ON ON ON ON | C0000 | C2000 + ON ON OFF ON ON | C4000 | C6000 + ON ON ON OFF ON | CC000 | CE000 + ON ON OFF OFF ON | D0000 | D2000 (Manufacturer's default) + ON ON ON ON OFF | D4000 | D6000 + ON ON OFF ON OFF | D8000 | DA000 + ON ON ON OFF OFF | DC000 | DE000 + ON ON OFF OFF OFF | E0000 | E2000 + +*) To enable the Boot ROM install the jumper JP1 + +Note: Since the switches 1 and 2 are always set to ON it may be possible + that they can be used to add an offset of 2K, 4K or 6K to the base + address, but this feature is not documented in the manual and I + haven't tested it yet. + + +Setting the Interrupt Line +-------------------------- + +To select a hardware interrupt level install one (only one!) of the jumpers +JP2, JP3, JP4, JP5, JP6. JP2 is the default. + + Jumper | IRQ + -------|----- + 2 | 2 + 3 | 3 + 4 | 4 + 5 | 5 + 6 | 7 + + +Setting the Internal Terminator on CN120AB/TP/SBT +-------------------------------------------------- + +The jumper JP12 is used to enable the internal terminator. + + ----- + 0 | 0 | + ----- ON | | ON + | 0 | | 0 | + | | OFF ----- OFF + | 0 | 0 + ----- + Terminator Terminator + disabled enabled + + +Selecting the Connector Type on CN120ST/SBT +------------------------------------------- + + JP10 JP11 JP10 JP11 + ----- ----- + 0 0 | 0 | | 0 | + ----- ----- | | | | + | 0 | | 0 | | 0 | | 0 | + | | | | ----- ----- + | 0 | | 0 | 0 0 + ----- ----- + Coaxial Cable Twisted Pair Cable + (Default) + + +Setting the Timeout Parameters +------------------------------ + +The jumpers labeled EXT1 and EXT2 are used to determine the timeout +parameters. These two jumpers are normally left open. + + + +***************************************************************************** + +** CNet Technology Inc. ** +160 Series (16-bit cards) +------------------------- + - from Juergen Seifert <seifert@htwm.de> + +CNET TECHNOLOGY INC. (CNet) ARCNET 160A SERIES +============================================== + +This description has been written by Juergen Seifert <seifert@htwm.de> +using information from the following Original CNet Manual + + "ARCNET + USER'S MANUAL + for + CN160A + CN160AB + CN160TP + P/N:12-01-0006 + Revision 3.00" + +ARCNET is a registered trademark of the Datapoint Corporation + +P/N 160A ARCNET 16 bit XT/AT Star +P/N 160AB ARCNET 16 bit XT/AT Bus +P/N 160TP ARCNET 16 bit XT/AT Twisted Pair + + ___________________________________________________________________ + < _________________________ ___| + > |oo| JP2 | | LED |___| + < |oo| JP1 | 9026 | LED |___| + > |_________________________| ___| + < N | | ID7 + > 1 o | | ID6 + < 1 2 3 4 5 6 7 8 9 0 d | S | ID5 + > _______________ _____________________ e | W | ID4 + < | PROM | | SW1 | A | 2 | ID3 + > > SOCKET | |_____________________| d | | ID2 + < |_______________| | IO-Base | MEM | d | | ID1 + > r |___| ID0 + < ____| + > | | + < | J1 | + > | | + < |____| + > 1 1 1 1 | + < 3 4 5 6 7 JP 8 9 0 1 2 3 | + > |o|o|o|o|o| |o|o|o|o|o|o| | + < |o|o|o|o|o| __ |o|o|o|o|o|o| ___________| + > | | | + <____________| |_______________________________________| + +Legend: + +9026 ARCNET Probe +SW1 1-6: Base I/O Address Select + 7-10: Base Memory Address Select +SW2 1-8: Node ID Select (ID0-ID7) +JP1/JP2 ET1, ET2 Timeout Parameters +JP3-JP13 Interrupt Select +J1 BNC RG62/U Connector (CN160A/AB only) +J1 Two 6-position Telephone Jack (CN160TP only) +LED + +Setting one of the switches to Off means "1", On means "0". + + +Setting the Node ID +------------------- + +The eight switches in SW2 are used to set the node ID. Each node attached +to the network must have an unique node ID which must be different from 0. +Switch 1 (ID0) serves as the least significant bit (LSB). + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Label | Value + -------|-------|------- + 1 | ID0 | 1 + 2 | ID1 | 2 + 3 | ID2 | 4 + 4 | ID3 | 8 + 5 | ID4 | 16 + 6 | ID5 | 32 + 7 | ID6 | 64 + 8 | ID7 | 128 + +Some Examples: + + Switch | Hex | Decimal + 8 7 6 5 4 3 2 1 | Node ID | Node ID + ----------------|---------|--------- + 0 0 0 0 0 0 0 0 | not allowed + 0 0 0 0 0 0 0 1 | 1 | 1 + 0 0 0 0 0 0 1 0 | 2 | 2 + 0 0 0 0 0 0 1 1 | 3 | 3 + . . . | | + 0 1 0 1 0 1 0 1 | 55 | 85 + . . . | | + 1 0 1 0 1 0 1 0 | AA | 170 + . . . | | + 1 1 1 1 1 1 0 1 | FD | 253 + 1 1 1 1 1 1 1 0 | FE | 254 + 1 1 1 1 1 1 1 1 | FF | 255 + + +Setting the I/O Base Address +---------------------------- + +The first six switches in switch block SW1 are used to select the I/O Base +address using the following table: + + Switch | Hex I/O + 1 2 3 4 5 6 | Address + ------------------------|-------- + OFF ON ON OFF OFF ON | 260 + OFF ON OFF ON ON OFF | 290 + OFF ON OFF OFF OFF ON | 2E0 (Manufacturer's default) + OFF ON OFF OFF OFF OFF | 2F0 + OFF OFF ON ON ON ON | 300 + OFF OFF ON OFF ON OFF | 350 + OFF OFF OFF ON ON ON | 380 + OFF OFF OFF OFF OFF ON | 3E0 + +Note: Other IO-Base addresses seem to be selectable, but only the above + combinations are documented. + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The switches 7-10 of switch block SW1 are used to select the Memory +Base address of the RAM (2K) and the PROM. + + Switch | Hex RAM | Hex ROM + 7 8 9 10 | Address | Address + ----------------|---------|----------- + OFF OFF ON ON | C0000 | C8000 + OFF OFF ON OFF | D0000 | D8000 (Default) + OFF OFF OFF ON | E0000 | E8000 + +Note: Other MEM-Base addresses seem to be selectable, but only the above + combinations are documented. + + +Setting the Interrupt Line +-------------------------- + +To select a hardware interrupt level install one (only one!) of the jumpers +JP3 through JP13 using the following table: + + Jumper | IRQ + -------|----------------- + 3 | 14 + 4 | 15 + 5 | 12 + 6 | 11 + 7 | 10 + 8 | 3 + 9 | 4 + 10 | 5 + 11 | 6 + 12 | 7 + 13 | 2 (=9) Default! + +Note: - Do not use JP11=IRQ6, it may conflict with your Floppy Disk + Controller + - Use JP3=IRQ14 only, if you don't have an IDE-, MFM-, or RLL- + Hard Disk, it may conflict with their controllers + + +Setting the Timeout Parameters +------------------------------ + +The jumpers labeled JP1 and JP2 are used to determine the timeout +parameters. These two jumpers are normally left open. + + +***************************************************************************** + +** Lantech ** +8-bit card, unknown model +------------------------- + - from Vlad Lungu <vlungu@ugal.ro> - his e-mail address seemed broken at + the time I tried to reach him. Sorry Vlad, if you didn't get my reply. + + ________________________________________________________________ + | 1 8 | + | ___________ __| + | | SW1 | LED |__| + | |__________| | + | ___| + | _____________________ |S | 8 + | | | |W | + | | | |2 | + | | | |__| 1 + | | UM9065L | |o| JP4 ____|____ + | | | |o| | CN | + | | | |________| + | | | | + | |___________________| | + | | + | | + | _____________ | + | | | | + | | PROM | |ooooo| JP6 | + | |____________| |ooooo| | + |_____________ _ _| + |____________________________________________| |__| + + +UM9065L : ARCnet Controller + +SW 1 : Shared Memory Address and I/O Base + + ON=0 + + 12345|Memory Address + -----|-------------- + 00001| D4000 + 00010| CC000 + 00110| D0000 + 01110| D1000 + 01101| D9000 + 10010| CC800 + 10011| DC800 + 11110| D1800 + +It seems that the bits are considered in reverse order. Also, you must +observe that some of those addresses are unusual and I didn't probe them; I +used a memory dump in DOS to identify them. For the 00000 configuration and +some others that I didn't write here the card seems to conflict with the +video card (an S3 GENDAC). I leave the full decoding of those addresses to +you. + + 678| I/O Address + ---|------------ + 000| 260 + 001| failed probe + 010| 2E0 + 011| 380 + 100| 290 + 101| 350 + 110| failed probe + 111| 3E0 + +SW 2 : Node ID (binary coded) + +JP 4 : Boot PROM enable CLOSE - enabled + OPEN - disabled + +JP 6 : IRQ set (ONLY ONE jumper on 1-5 for IRQ 2-6) + + +***************************************************************************** + +** Acer ** +8-bit card, Model 5210-003 +-------------------------- + - from Vojtech Pavlik <vojtech@suse.cz> using portions of the existing + arcnet-hardware file. + +This is a 90C26 based card. Its configuration seems similar to the SMC +PC100, but has some additional jumpers I don't know the meaning of. + + __ + | | + ___________|__|_________________________ + | | | | + | | BNC | | + | |______| ___| + | _____________________ |___ + | | | | + | | Hybrid IC | | + | | | o|o J1 | + | |_____________________| 8|8 | + | 8|8 J5 | + | o|o | + | 8|8 | + |__ 8|8 | + (|__| LED o|o | + | 8|8 | + | 8|8 J15 | + | | + | _____ | + | | | _____ | + | | | | | ___| + | | | | | | + | _____ | ROM | | UFS | | + | | | | | | | | + | | | ___ | | | | | + | | | | | |__.__| |__.__| | + | | NCR | |XTL| _____ _____ | + | | | |___| | | | | | + | |90C26| | | | | | + | | | | RAM | | UFS | | + | | | J17 o|o | | | | | + | | | J16 o|o | | | | | + | |__.__| |__.__| |__.__| | + | ___ | + | | |8 | + | |SW2| | + | | | | + | |___|1 | + | ___ | + | | |10 J18 o|o | + | | | o|o | + | |SW1| o|o | + | | | J21 o|o | + | |___|1 | + | | + |____________________________________| + + +Legend: + +90C26 ARCNET Chip +XTL 20 MHz Crystal +SW1 1-6 Base I/O Address Select + 7-10 Memory Address Select +SW2 1-8 Node ID Select (ID0-ID7) +J1-J5 IRQ Select +J6-J21 Unknown (Probably extra timeouts & ROM enable ...) +LED1 Activity LED +BNC Coax connector (STAR ARCnet) +RAM 2k of SRAM +ROM Boot ROM socket +UFS Unidentified Flying Sockets + + +Setting the Node ID +------------------- + +The eight switches in SW2 are used to set the node ID. Each node attached +to the network must have an unique node ID which must not be 0. +Switch 1 (ID0) serves as the least significant bit (LSB). + +Setting one of the switches to OFF means "1", ON means "0". + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Value + -------|------- + 1 | 1 + 2 | 2 + 3 | 4 + 4 | 8 + 5 | 16 + 6 | 32 + 7 | 64 + 8 | 128 + +Don't set this to 0 or 255; these values are reserved. + + +Setting the I/O Base Address +---------------------------- + +The switches 1 to 6 of switch block SW1 are used to select one +of 32 possible I/O Base addresses using the following tables + + | Hex + Switch | Value + -------|------- + 1 | 200 + 2 | 100 + 3 | 80 + 4 | 40 + 5 | 20 + 6 | 10 + +The I/O address is sum of all switches set to "1". Remember that +the I/O address space bellow 0x200 is RESERVED for mainboard, so +switch 1 should be ALWAYS SET TO OFF. + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer (RAM) requires 2K. The base of this buffer can be +located in any of sixteen positions. However, the addresses below +A0000 are likely to cause system hang because there's main RAM. + +Jumpers 7-10 of switch block SW1 select the Memory Base address. + + Switch | Hex RAM + 7 8 9 10 | Address + ----------------|--------- + OFF OFF OFF OFF | F0000 (conflicts with main BIOS) + OFF OFF OFF ON | E0000 + OFF OFF ON OFF | D0000 + OFF OFF ON ON | C0000 (conflicts with video BIOS) + OFF ON OFF OFF | B0000 (conflicts with mono video) + OFF ON OFF ON | A0000 (conflicts with graphics) + + +Setting the Interrupt Line +-------------------------- + +Jumpers 1-5 of the jumper block J1 control the IRQ level. ON means +shorted, OFF means open. + + Jumper | IRQ + 1 2 3 4 5 | + ---------------------------- + ON OFF OFF OFF OFF | 7 + OFF ON OFF OFF OFF | 5 + OFF OFF ON OFF OFF | 4 + OFF OFF OFF ON OFF | 3 + OFF OFF OFF OFF ON | 2 + + +Unknown jumpers & sockets +------------------------- + +I know nothing about these. I just guess that J16&J17 are timeout +jumpers and maybe one of J18-J21 selects ROM. Also J6-J10 and +J11-J15 are connecting IRQ2-7 to some pins on the UFSs. I can't +guess the purpose. + + +***************************************************************************** + +** Datapoint? ** +LAN-ARC-8, an 8-bit card +------------------------ + - from Vojtech Pavlik <vojtech@suse.cz> + +This is another SMC 90C65-based ARCnet card. I couldn't identify the +manufacturer, but it might be DataPoint, because the card has the +original arcNet logo in its upper right corner. + + _______________________________________________________ + | _________ | + | | SW2 | ON arcNet | + | |_________| OFF ___| + | _____________ 1 ______ 8 | | 8 + | | | SW1 | XTAL | ____________ | S | + | > RAM (2k) | |______|| | | W | + | |_____________| | H | | 3 | + | _________|_____ y | |___| 1 + | _________ | | |b | | + | |_________| | | |r | | + | | SMC | |i | | + | | 90C65| |d | | + | _________ | | | | | + | | SW1 | ON | | |I | | + | |_________| OFF |_________|_____/C | _____| + | 1 8 | | | |___ + | ______________ | | | BNC |___| + | | | |____________| |_____| + | > EPROM SOCKET | _____________ | + | |______________| |_____________| | + | ______________| + | | + |________________________________________| + +Legend: + +90C65 ARCNET Chip +SW1 1-5: Base Memory Address Select + 6-8: Base I/O Address Select +SW2 1-8: Node ID Select +SW3 1-5: IRQ Select + 6-7: Extra Timeout + 8 : ROM Enable +BNC Coax connector +XTAL 20 MHz Crystal + + +Setting the Node ID +------------------- + +The eight switches in SW3 are used to set the node ID. Each node attached +to the network must have an unique node ID which must not be 0. +Switch 1 serves as the least significant bit (LSB). + +Setting one of the switches to Off means "1", On means "0". + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Value + -------|------- + 1 | 1 + 2 | 2 + 3 | 4 + 4 | 8 + 5 | 16 + 6 | 32 + 7 | 64 + 8 | 128 + + +Setting the I/O Base Address +---------------------------- + +The last three switches in switch block SW1 are used to select one +of eight possible I/O Base addresses using the following table + + + Switch | Hex I/O + 6 7 8 | Address + ------------|-------- + ON ON ON | 260 + OFF ON ON | 290 + ON OFF ON | 2E0 (Manufacturer's default) + OFF OFF ON | 2F0 + ON ON OFF | 300 + OFF ON OFF | 350 + ON OFF OFF | 380 + OFF OFF OFF | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer (RAM) requires 2K. The base of this buffer can be +located in any of eight positions. The address of the Boot Prom is +memory base + 0x2000. +Jumpers 3-5 of switch block SW1 select the Memory Base address. + + Switch | Hex RAM | Hex ROM + 1 2 3 4 5 | Address | Address *) + --------------------|---------|----------- + ON ON ON ON ON | C0000 | C2000 + ON ON OFF ON ON | C4000 | C6000 + ON ON ON OFF ON | CC000 | CE000 + ON ON OFF OFF ON | D0000 | D2000 (Manufacturer's default) + ON ON ON ON OFF | D4000 | D6000 + ON ON OFF ON OFF | D8000 | DA000 + ON ON ON OFF OFF | DC000 | DE000 + ON ON OFF OFF OFF | E0000 | E2000 + +*) To enable the Boot ROM set the switch 8 of switch block SW3 to position ON. + +The switches 1 and 2 probably add 0x0800 and 0x1000 to RAM base address. + + +Setting the Interrupt Line +-------------------------- + +Switches 1-5 of the switch block SW3 control the IRQ level. + + Jumper | IRQ + 1 2 3 4 5 | + ---------------------------- + ON OFF OFF OFF OFF | 3 + OFF ON OFF OFF OFF | 4 + OFF OFF ON OFF OFF | 5 + OFF OFF OFF ON OFF | 7 + OFF OFF OFF OFF ON | 2 + + +Setting the Timeout Parameters +------------------------------ + +The switches 6-7 of the switch block SW3 are used to determine the timeout +parameters. These two switches are normally left in the OFF position. + + +***************************************************************************** + +** Topware ** +8-bit card, TA-ARC/10 +------------------------- + - from Vojtech Pavlik <vojtech@suse.cz> + +This is another very similar 90C65 card. Most of the switches and jumpers +are the same as on other clones. + + _____________________________________________________________________ +| ___________ | | ______ | +| |SW2 NODE ID| | | | XTAL | | +| |___________| | Hybrid IC | |______| | +| ___________ | | __| +| |SW1 MEM+I/O| |_________________________| LED1|__|) +| |___________| 1 2 | +| J3 |o|o| TIMEOUT ______| +| ______________ |o|o| | | +| | | ___________________ | RJ | +| > EPROM SOCKET | | \ |------| +|J2 |______________| | | | | +||o| | | |______| +||o| ROM ENABLE | SMC | _________ | +| _____________ | 90C65 | |_________| _____| +| | | | | | |___ +| > RAM (2k) | | | | BNC |___| +| |_____________| | | |_____| +| |____________________| | +| ________ IRQ 2 3 4 5 7 ___________ | +||________| |o|o|o|o|o| |___________| | +|________ J1|o|o|o|o|o| ______________| + | | + |_____________________________________________| + +Legend: + +90C65 ARCNET Chip +XTAL 20 MHz Crystal +SW1 1-5 Base Memory Address Select + 6-8 Base I/O Address Select +SW2 1-8 Node ID Select (ID0-ID7) +J1 IRQ Select +J2 ROM Enable +J3 Extra Timeout +LED1 Activity LED +BNC Coax connector (BUS ARCnet) +RJ Twisted Pair Connector (daisy chain) + + +Setting the Node ID +------------------- + +The eight switches in SW2 are used to set the node ID. Each node attached to +the network must have an unique node ID which must not be 0. Switch 1 (ID0) +serves as the least significant bit (LSB). + +Setting one of the switches to Off means "1", On means "0". + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Label | Value + -------|-------|------- + 1 | ID0 | 1 + 2 | ID1 | 2 + 3 | ID2 | 4 + 4 | ID3 | 8 + 5 | ID4 | 16 + 6 | ID5 | 32 + 7 | ID6 | 64 + 8 | ID7 | 128 + +Setting the I/O Base Address +---------------------------- + +The last three switches in switch block SW1 are used to select one +of eight possible I/O Base addresses using the following table: + + + Switch | Hex I/O + 6 7 8 | Address + ------------|-------- + ON ON ON | 260 (Manufacturer's default) + OFF ON ON | 290 + ON OFF ON | 2E0 + OFF OFF ON | 2F0 + ON ON OFF | 300 + OFF ON OFF | 350 + ON OFF OFF | 380 + OFF OFF OFF | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer (RAM) requires 2K. The base of this buffer can be +located in any of eight positions. The address of the Boot Prom is +memory base + 0x2000. +Jumpers 3-5 of switch block SW1 select the Memory Base address. + + Switch | Hex RAM | Hex ROM + 1 2 3 4 5 | Address | Address *) + --------------------|---------|----------- + ON ON ON ON ON | C0000 | C2000 + ON ON OFF ON ON | C4000 | C6000 (Manufacturer's default) + ON ON ON OFF ON | CC000 | CE000 + ON ON OFF OFF ON | D0000 | D2000 + ON ON ON ON OFF | D4000 | D6000 + ON ON OFF ON OFF | D8000 | DA000 + ON ON ON OFF OFF | DC000 | DE000 + ON ON OFF OFF OFF | E0000 | E2000 + +*) To enable the Boot ROM short the jumper J2. + +The jumpers 1 and 2 probably add 0x0800 and 0x1000 to RAM address. + + +Setting the Interrupt Line +-------------------------- + +Jumpers 1-5 of the jumper block J1 control the IRQ level. ON means +shorted, OFF means open. + + Jumper | IRQ + 1 2 3 4 5 | + ---------------------------- + ON OFF OFF OFF OFF | 2 + OFF ON OFF OFF OFF | 3 + OFF OFF ON OFF OFF | 4 + OFF OFF OFF ON OFF | 5 + OFF OFF OFF OFF ON | 7 + + +Setting the Timeout Parameters +------------------------------ + +The jumpers J3 are used to set the timeout parameters. These two +jumpers are normally left open. + + +***************************************************************************** + +** Thomas-Conrad ** +Model #500-6242-0097 REV A (8-bit card) +--------------------------------------- + - from Lars Karlsson <100617.3473@compuserve.com> + + ________________________________________________________ + | ________ ________ |_____ + | |........| |........| | + | |________| |________| ___| + | SW 3 SW 1 | | + | Base I/O Base Addr. Station | | + | address | | + | ______ switch | | + | | | | | + | | | |___| + | | | ______ |___._ + | |______| |______| ____| BNC + | Jumper- _____| Connector + | Main chip block _ __| ' + | | | | RJ Connector + | |_| | with 110 Ohm + | |__ Terminator + | ___________ __| + | |...........| | RJ-jack + | |...........| _____ | (unused) + | |___________| |_____| |__ + | Boot PROM socket IRQ-jumpers |_ Diagnostic + |________ __ _| LED (red) + | | | | | | | | | | | | | | | | | | | | | | + | | | | | | | | | | | | | | | | | | | | |________| + | + | + +And here are the settings for some of the switches and jumpers on the cards. + + + I/O + + 1 2 3 4 5 6 7 8 + +2E0----- 0 0 0 1 0 0 0 1 +2F0----- 0 0 0 1 0 0 0 0 +300----- 0 0 0 0 1 1 1 1 +350----- 0 0 0 0 1 1 1 0 + +"0" in the above example means switch is off "1" means that it is on. + + + ShMem address. + + 1 2 3 4 5 6 7 8 + +CX00--0 0 1 1 | | | +DX00--0 0 1 0 | +X000--------- 1 1 | +X400--------- 1 0 | +X800--------- 0 1 | +XC00--------- 0 0 +ENHANCED----------- 1 +COMPATIBLE--------- 0 + + + IRQ + + + 3 4 5 7 2 + . . . . . + . . . . . + + +There is a DIP-switch with 8 switches, used to set the shared memory address +to be used. The first 6 switches set the address, the 7th doesn't have any +function, and the 8th switch is used to select "compatible" or "enhanced". +When I got my two cards, one of them had this switch set to "enhanced". That +card didn't work at all, it wasn't even recognized by the driver. The other +card had this switch set to "compatible" and it behaved absolutely normally. I +guess that the switch on one of the cards, must have been changed accidentally +when the card was taken out of its former host. The question remains +unanswered, what is the purpose of the "enhanced" position? + +[Avery's note: "enhanced" probably either disables shared memory (use IO +ports instead) or disables IO ports (use memory addresses instead). This +varies by the type of card involved. I fail to see how either of these +enhance anything. Send me more detailed information about this mode, or +just use "compatible" mode instead.] + + +***************************************************************************** + +** Waterloo Microsystems Inc. ?? ** +8-bit card (C) 1985 +------------------- + - from Robert Michael Best <rmb117@cs.usask.ca> + +[Avery's note: these don't work with my driver for some reason. These cards +SEEM to have settings similar to the PDI508Plus, which is +software-configured and doesn't work with my driver either. The "Waterloo +chip" is a boot PROM, probably designed specifically for the University of +Waterloo. If you have any further information about this card, please +e-mail me.] + +The probe has not been able to detect the card on any of the J2 settings, +and I tried them again with the "Waterloo" chip removed. + + _____________________________________________________________________ +| \/ \/ ___ __ __ | +| C4 C4 |^| | M || ^ ||^| | +| -- -- |_| | 5 || || | C3 | +| \/ \/ C10 |___|| ||_| | +| C4 C4 _ _ | | ?? | +| -- -- | \/ || | | +| | || | | +| | || C1 | | +| | || | \/ _____| +| | C6 || | C9 | |___ +| | || | -- | BNC |___| +| | || | >C7| |_____| +| | || | | +| __ __ |____||_____| 1 2 3 6 | +|| ^ | >C4| |o|o|o|o|o|o| J2 >C4| | +|| | |o|o|o|o|o|o| | +|| C2 | >C4| >C4| | +|| | >C8| | +|| | 2 3 4 5 6 7 IRQ >C4| | +||_____| |o|o|o|o|o|o| J3 | +|_______ |o|o|o|o|o|o| _______________| + | | + |_____________________________________________| + +C1 -- "COM9026 + SMC 8638" + In a chip socket. + +C2 -- "@Copyright + Waterloo Microsystems Inc. + 1985" + In a chip Socket with info printed on a label covering a round window + showing the circuit inside. (The window indicates it is an EPROM chip.) + +C3 -- "COM9032 + SMC 8643" + In a chip socket. + +C4 -- "74LS" + 9 total no sockets. + +M5 -- "50006-136 + 20.000000 MHZ + MTQ-T1-S3 + 0 M-TRON 86-40" + Metallic case with 4 pins, no socket. + +C6 -- "MOSTEK@TC8643 + MK6116N-20 + MALAYSIA" + No socket. + +C7 -- No stamp or label but in a 20 pin chip socket. + +C8 -- "PAL10L8CN + 8623" + In a 20 pin socket. + +C9 -- "PAl16R4A-2CN + 8641" + In a 20 pin socket. + +C10 -- "M8640 + NMC + 9306N" + In an 8 pin socket. + +?? -- Some components on a smaller board and attached with 20 pins all + along the side closest to the BNC connector. The are coated in a dark + resin. + +On the board there are two jumper banks labeled J2 and J3. The +manufacturer didn't put a J1 on the board. The two boards I have both +came with a jumper box for each bank. + +J2 -- Numbered 1 2 3 4 5 6. + 4 and 5 are not stamped due to solder points. + +J3 -- IRQ 2 3 4 5 6 7 + +The board itself has a maple leaf stamped just above the irq jumpers +and "-2 46-86" beside C2. Between C1 and C6 "ASS 'Y 300163" and "@1986 +CORMAN CUSTOM ELECTRONICS CORP." stamped just below the BNC connector. +Below that "MADE IN CANADA" + + +***************************************************************************** + +** No Name ** +8-bit cards, 16-bit cards +------------------------- + - from Juergen Seifert <seifert@htwm.de> + +NONAME 8-BIT ARCNET +=================== + +I have named this ARCnet card "NONAME", since there is no name of any +manufacturer on the Installation manual nor on the shipping box. The only +hint to the existence of a manufacturer at all is written in copper, +it is "Made in Taiwan" + +This description has been written by Juergen Seifert <seifert@htwm.de> +using information from the Original + "ARCnet Installation Manual" + + + ________________________________________________________________ + | |STAR| BUS| T/P| | + | |____|____|____| | + | _____________________ | + | | | | + | | | | + | | | | + | | SMC | | + | | | | + | | COM90C65 | | + | | | | + | | | | + | |__________-__________| | + | _____| + | _______________ | CN | + | | PROM | |_____| + | > SOCKET | | + | |_______________| 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 | + | _______________ _______________ | + | |o|o|o|o|o|o|o|o| | SW1 || SW2 || + | |o|o|o|o|o|o|o|o| |_______________||_______________|| + |___ 2 3 4 5 7 E E R Node ID IOB__|__MEM____| + | \ IRQ / T T O | + |__________________1_2_M______________________| + +Legend: + +COM90C65: ARCnet Probe +S1 1-8: Node ID Select +S2 1-3: I/O Base Address Select + 4-6: Memory Base Address Select + 7-8: RAM Offset Select +ET1, ET2 Extended Timeout Select +ROM ROM Enable Select +CN RG62 Coax Connector +STAR| BUS | T/P Three fields for placing a sign (colored circle) + indicating the topology of the card + +Setting one of the switches to Off means "1", On means "0". + + +Setting the Node ID +------------------- + +The eight switches in group SW1 are used to set the node ID. +Each node attached to the network must have an unique node ID which +must be different from 0. +Switch 8 serves as the least significant bit (LSB). + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Value + -------|------- + 8 | 1 + 7 | 2 + 6 | 4 + 5 | 8 + 4 | 16 + 3 | 32 + 2 | 64 + 1 | 128 + +Some Examples: + + Switch | Hex | Decimal + 1 2 3 4 5 6 7 8 | Node ID | Node ID + ----------------|---------|--------- + 0 0 0 0 0 0 0 0 | not allowed + 0 0 0 0 0 0 0 1 | 1 | 1 + 0 0 0 0 0 0 1 0 | 2 | 2 + 0 0 0 0 0 0 1 1 | 3 | 3 + . . . | | + 0 1 0 1 0 1 0 1 | 55 | 85 + . . . | | + 1 0 1 0 1 0 1 0 | AA | 170 + . . . | | + 1 1 1 1 1 1 0 1 | FD | 253 + 1 1 1 1 1 1 1 0 | FE | 254 + 1 1 1 1 1 1 1 1 | FF | 255 + + +Setting the I/O Base Address +---------------------------- + +The first three switches in switch group SW2 are used to select one +of eight possible I/O Base addresses using the following table + + Switch | Hex I/O + 1 2 3 | Address + ------------|-------- + ON ON ON | 260 + ON ON OFF | 290 + ON OFF ON | 2E0 (Manufacturer's default) + ON OFF OFF | 2F0 + OFF ON ON | 300 + OFF ON OFF | 350 + OFF OFF ON | 380 + OFF OFF OFF | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer requires 2K of a 16K block of RAM. The base of this +16K block can be located in any of eight positions. +Switches 4-6 of switch group SW2 select the Base of the 16K block. +Within that 16K address space, the buffer may be assigned any one of four +positions, determined by the offset, switches 7 and 8 of group SW2. + + Switch | Hex RAM | Hex ROM + 4 5 6 7 8 | Address | Address *) + -----------|---------|----------- + 0 0 0 0 0 | C0000 | C2000 + 0 0 0 0 1 | C0800 | C2000 + 0 0 0 1 0 | C1000 | C2000 + 0 0 0 1 1 | C1800 | C2000 + | | + 0 0 1 0 0 | C4000 | C6000 + 0 0 1 0 1 | C4800 | C6000 + 0 0 1 1 0 | C5000 | C6000 + 0 0 1 1 1 | C5800 | C6000 + | | + 0 1 0 0 0 | CC000 | CE000 + 0 1 0 0 1 | CC800 | CE000 + 0 1 0 1 0 | CD000 | CE000 + 0 1 0 1 1 | CD800 | CE000 + | | + 0 1 1 0 0 | D0000 | D2000 (Manufacturer's default) + 0 1 1 0 1 | D0800 | D2000 + 0 1 1 1 0 | D1000 | D2000 + 0 1 1 1 1 | D1800 | D2000 + | | + 1 0 0 0 0 | D4000 | D6000 + 1 0 0 0 1 | D4800 | D6000 + 1 0 0 1 0 | D5000 | D6000 + 1 0 0 1 1 | D5800 | D6000 + | | + 1 0 1 0 0 | D8000 | DA000 + 1 0 1 0 1 | D8800 | DA000 + 1 0 1 1 0 | D9000 | DA000 + 1 0 1 1 1 | D9800 | DA000 + | | + 1 1 0 0 0 | DC000 | DE000 + 1 1 0 0 1 | DC800 | DE000 + 1 1 0 1 0 | DD000 | DE000 + 1 1 0 1 1 | DD800 | DE000 + | | + 1 1 1 0 0 | E0000 | E2000 + 1 1 1 0 1 | E0800 | E2000 + 1 1 1 1 0 | E1000 | E2000 + 1 1 1 1 1 | E1800 | E2000 + +*) To enable the 8K Boot PROM install the jumper ROM. + The default is jumper ROM not installed. + + +Setting Interrupt Request Lines (IRQ) +------------------------------------- + +To select a hardware interrupt level set one (only one!) of the jumpers +IRQ2, IRQ3, IRQ4, IRQ5 or IRQ7. The manufacturer's default is IRQ2. + + +Setting the Timeouts +-------------------- + +The two jumpers labeled ET1 and ET2 are used to determine the timeout +parameters (response and reconfiguration time). Every node in a network +must be set to the same timeout values. + + ET1 ET2 | Response Time (us) | Reconfiguration Time (ms) + --------|--------------------|-------------------------- + Off Off | 78 | 840 (Default) + Off On | 285 | 1680 + On Off | 563 | 1680 + On On | 1130 | 1680 + +On means jumper installed, Off means jumper not installed + + +NONAME 16-BIT ARCNET +==================== + +The manual of my 8-Bit NONAME ARCnet Card contains another description +of a 16-Bit Coax / Twisted Pair Card. This description is incomplete, +because there are missing two pages in the manual booklet. (The table +of contents reports pages ... 2-9, 2-11, 2-12, 3-1, ... but inside +the booklet there is a different way of counting ... 2-9, 2-10, A-1, +(empty page), 3-1, ..., 3-18, A-1 (again), A-2) +Also the picture of the board layout is not as good as the picture of +8-Bit card, because there isn't any letter like "SW1" written to the +picture. +Should somebody have such a board, please feel free to complete this +description or to send a mail to me! + +This description has been written by Juergen Seifert <seifert@htwm.de> +using information from the Original + "ARCnet Installation Manual" + + + ___________________________________________________________________ + < _________________ _________________ | + > | SW? || SW? | | + < |_________________||_________________| | + > ____________________ | + < | | | + > | | | + < | | | + > | | | + < | | | + > | | | + < | | | + > |____________________| | + < ____| + > ____________________ | | + < | | | J1 | + > | < | | + < |____________________| ? ? ? ? ? ? |____| + > |o|o|o|o|o|o| | + < |o|o|o|o|o|o| | + > | + < __ ___________| + > | | | + <____________| |_______________________________________| + + +Setting one of the switches to Off means "1", On means "0". + + +Setting the Node ID +------------------- + +The eight switches in group SW2 are used to set the node ID. +Each node attached to the network must have an unique node ID which +must be different from 0. +Switch 8 serves as the least significant bit (LSB). + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Value + -------|------- + 8 | 1 + 7 | 2 + 6 | 4 + 5 | 8 + 4 | 16 + 3 | 32 + 2 | 64 + 1 | 128 + +Some Examples: + + Switch | Hex | Decimal + 1 2 3 4 5 6 7 8 | Node ID | Node ID + ----------------|---------|--------- + 0 0 0 0 0 0 0 0 | not allowed + 0 0 0 0 0 0 0 1 | 1 | 1 + 0 0 0 0 0 0 1 0 | 2 | 2 + 0 0 0 0 0 0 1 1 | 3 | 3 + . . . | | + 0 1 0 1 0 1 0 1 | 55 | 85 + . . . | | + 1 0 1 0 1 0 1 0 | AA | 170 + . . . | | + 1 1 1 1 1 1 0 1 | FD | 253 + 1 1 1 1 1 1 1 0 | FE | 254 + 1 1 1 1 1 1 1 1 | FF | 255 + + +Setting the I/O Base Address +---------------------------- + +The first three switches in switch group SW1 are used to select one +of eight possible I/O Base addresses using the following table + + Switch | Hex I/O + 3 2 1 | Address + ------------|-------- + ON ON ON | 260 + ON ON OFF | 290 + ON OFF ON | 2E0 (Manufacturer's default) + ON OFF OFF | 2F0 + OFF ON ON | 300 + OFF ON OFF | 350 + OFF OFF ON | 380 + OFF OFF OFF | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer requires 2K of a 16K block of RAM. The base of this +16K block can be located in any of eight positions. +Switches 6-8 of switch group SW1 select the Base of the 16K block. +Within that 16K address space, the buffer may be assigned any one of four +positions, determined by the offset, switches 4 and 5 of group SW1. + + Switch | Hex RAM | Hex ROM + 8 7 6 5 4 | Address | Address + -----------|---------|----------- + 0 0 0 0 0 | C0000 | C2000 + 0 0 0 0 1 | C0800 | C2000 + 0 0 0 1 0 | C1000 | C2000 + 0 0 0 1 1 | C1800 | C2000 + | | + 0 0 1 0 0 | C4000 | C6000 + 0 0 1 0 1 | C4800 | C6000 + 0 0 1 1 0 | C5000 | C6000 + 0 0 1 1 1 | C5800 | C6000 + | | + 0 1 0 0 0 | CC000 | CE000 + 0 1 0 0 1 | CC800 | CE000 + 0 1 0 1 0 | CD000 | CE000 + 0 1 0 1 1 | CD800 | CE000 + | | + 0 1 1 0 0 | D0000 | D2000 (Manufacturer's default) + 0 1 1 0 1 | D0800 | D2000 + 0 1 1 1 0 | D1000 | D2000 + 0 1 1 1 1 | D1800 | D2000 + | | + 1 0 0 0 0 | D4000 | D6000 + 1 0 0 0 1 | D4800 | D6000 + 1 0 0 1 0 | D5000 | D6000 + 1 0 0 1 1 | D5800 | D6000 + | | + 1 0 1 0 0 | D8000 | DA000 + 1 0 1 0 1 | D8800 | DA000 + 1 0 1 1 0 | D9000 | DA000 + 1 0 1 1 1 | D9800 | DA000 + | | + 1 1 0 0 0 | DC000 | DE000 + 1 1 0 0 1 | DC800 | DE000 + 1 1 0 1 0 | DD000 | DE000 + 1 1 0 1 1 | DD800 | DE000 + | | + 1 1 1 0 0 | E0000 | E2000 + 1 1 1 0 1 | E0800 | E2000 + 1 1 1 1 0 | E1000 | E2000 + 1 1 1 1 1 | E1800 | E2000 + + +Setting Interrupt Request Lines (IRQ) +------------------------------------- + +?????????????????????????????????????? + + +Setting the Timeouts +-------------------- + +?????????????????????????????????????? + + +***************************************************************************** + +** No Name ** +8-bit cards ("Made in Taiwan R.O.C.") +----------- + - from Vojtech Pavlik <vojtech@suse.cz> + +I have named this ARCnet card "NONAME", since I got only the card with +no manual at all and the only text identifying the manufacturer is +"MADE IN TAIWAN R.O.C" printed on the card. + + ____________________________________________________________ + | 1 2 3 4 5 6 7 8 | + | |o|o| JP1 o|o|o|o|o|o|o|o| ON | + | + o|o|o|o|o|o|o|o| ___| + | _____________ o|o|o|o|o|o|o|o| OFF _____ | | ID7 + | | | SW1 | | | | ID6 + | > RAM (2k) | ____________________ | H | | S | ID5 + | |_____________| | || y | | W | ID4 + | | || b | | 2 | ID3 + | | || r | | | ID2 + | | || i | | | ID1 + | | 90C65 || d | |___| ID0 + | SW3 | || | | + | |o|o|o|o|o|o|o|o| ON | || I | | + | |o|o|o|o|o|o|o|o| | || C | | + | |o|o|o|o|o|o|o|o| OFF |____________________|| | _____| + | 1 2 3 4 5 6 7 8 | | | |___ + | ______________ | | | BNC |___| + | | | |_____| |_____| + | > EPROM SOCKET | | + | |______________| | + | ______________| + | | + |_____________________________________________| + +Legend: + +90C65 ARCNET Chip +SW1 1-5: Base Memory Address Select + 6-8: Base I/O Address Select +SW2 1-8: Node ID Select (ID0-ID7) +SW3 1-5: IRQ Select + 6-7: Extra Timeout + 8 : ROM Enable +JP1 Led connector +BNC Coax connector + +Although the jumpers SW1 and SW3 are marked SW, not JP, they are jumpers, not +switches. + +Setting the jumpers to ON means connecting the upper two pins, off the bottom +two - or - in case of IRQ setting, connecting none of them at all. + +Setting the Node ID +------------------- + +The eight switches in SW2 are used to set the node ID. Each node attached +to the network must have an unique node ID which must not be 0. +Switch 1 (ID0) serves as the least significant bit (LSB). + +Setting one of the switches to Off means "1", On means "0". + +The node ID is the sum of the values of all switches set to "1" +These values are: + + Switch | Label | Value + -------|-------|------- + 1 | ID0 | 1 + 2 | ID1 | 2 + 3 | ID2 | 4 + 4 | ID3 | 8 + 5 | ID4 | 16 + 6 | ID5 | 32 + 7 | ID6 | 64 + 8 | ID7 | 128 + +Some Examples: + + Switch | Hex | Decimal + 8 7 6 5 4 3 2 1 | Node ID | Node ID + ----------------|---------|--------- + 0 0 0 0 0 0 0 0 | not allowed + 0 0 0 0 0 0 0 1 | 1 | 1 + 0 0 0 0 0 0 1 0 | 2 | 2 + 0 0 0 0 0 0 1 1 | 3 | 3 + . . . | | + 0 1 0 1 0 1 0 1 | 55 | 85 + . . . | | + 1 0 1 0 1 0 1 0 | AA | 170 + . . . | | + 1 1 1 1 1 1 0 1 | FD | 253 + 1 1 1 1 1 1 1 0 | FE | 254 + 1 1 1 1 1 1 1 1 | FF | 255 + + +Setting the I/O Base Address +---------------------------- + +The last three switches in switch block SW1 are used to select one +of eight possible I/O Base addresses using the following table + + + Switch | Hex I/O + 6 7 8 | Address + ------------|-------- + ON ON ON | 260 + OFF ON ON | 290 + ON OFF ON | 2E0 (Manufacturer's default) + OFF OFF ON | 2F0 + ON ON OFF | 300 + OFF ON OFF | 350 + ON OFF OFF | 380 + OFF OFF OFF | 3E0 + + +Setting the Base Memory (RAM) buffer Address +-------------------------------------------- + +The memory buffer (RAM) requires 2K. The base of this buffer can be +located in any of eight positions. The address of the Boot Prom is +memory base + 0x2000. +Jumpers 3-5 of jumper block SW1 select the Memory Base address. + + Switch | Hex RAM | Hex ROM + 1 2 3 4 5 | Address | Address *) + --------------------|---------|----------- + ON ON ON ON ON | C0000 | C2000 + ON ON OFF ON ON | C4000 | C6000 + ON ON ON OFF ON | CC000 | CE000 + ON ON OFF OFF ON | D0000 | D2000 (Manufacturer's default) + ON ON ON ON OFF | D4000 | D6000 + ON ON OFF ON OFF | D8000 | DA000 + ON ON ON OFF OFF | DC000 | DE000 + ON ON OFF OFF OFF | E0000 | E2000 + +*) To enable the Boot ROM set the jumper 8 of jumper block SW3 to position ON. + +The jumpers 1 and 2 probably add 0x0800, 0x1000 and 0x1800 to RAM adders. + +Setting the Interrupt Line +-------------------------- + +Jumpers 1-5 of the jumper block SW3 control the IRQ level. + + Jumper | IRQ + 1 2 3 4 5 | + ---------------------------- + ON OFF OFF OFF OFF | 2 + OFF ON OFF OFF OFF | 3 + OFF OFF ON OFF OFF | 4 + OFF OFF OFF ON OFF | 5 + OFF OFF OFF OFF ON | 7 + + +Setting the Timeout Parameters +------------------------------ + +The jumpers 6-7 of the jumper block SW3 are used to determine the timeout +parameters. These two jumpers are normally left in the OFF position. + + +***************************************************************************** + +** No Name ** +(Generic Model 9058) +-------------------- + - from Andrew J. Kroll <ag784@freenet.buffalo.edu> + - Sorry this sat in my to-do box for so long, Andrew! (yikes - over a + year!) + _____ + | < + | .---' + ________________________________________________________________ | | + | | SW2 | | | + | ___________ |_____________| | | + | | | 1 2 3 4 5 6 ___| | + | > 6116 RAM | _________ 8 | | | + | |___________| |20MHzXtal| 7 | | | + | |_________| __________ 6 | S | | + | 74LS373 | |- 5 | W | | + | _________ | E |- 4 | | | + | >_______| ______________|..... P |- 3 | 3 | | + | | | : O |- 2 | | | + | | | : X |- 1 |___| | + | ________________ | | : Y |- | | + | | SW1 | | SL90C65 | : |- | | + | |________________| | | : B |- | | + | 1 2 3 4 5 6 7 8 | | : O |- | | + | |_________o____|..../ A |- _______| | + | ____________________ | R |- | |------, + | | | | D |- | BNC | # | + | > 2764 PROM SOCKET | |__________|- |_______|------' + | |____________________| _________ | | + | >________| <- 74LS245 | | + | | | + |___ ______________| | + |H H H H H H H H H H H H H H H H H H H H H H H| | | + |U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U| | | + \| +Legend: + +SL90C65 ARCNET Controller / Transceiver /Logic +SW1 1-5: IRQ Select + 6: ET1 + 7: ET2 + 8: ROM ENABLE +SW2 1-3: Memory Buffer/PROM Address + 3-6: I/O Address Map +SW3 1-8: Node ID Select +BNC BNC RG62/U Connection + *I* have had success using RG59B/U with *NO* terminators! + What gives?! + +SW1: Timeouts, Interrupt and ROM +--------------------------------- + +To select a hardware interrupt level set one (only one!) of the dip switches +up (on) SW1...(switches 1-5) +IRQ3, IRQ4, IRQ5, IRQ7, IRQ2. The Manufacturer's default is IRQ2. + +The switches on SW1 labeled EXT1 (switch 6) and EXT2 (switch 7) +are used to determine the timeout parameters. These two dip switches +are normally left off (down). + + To enable the 8K Boot PROM position SW1 switch 8 on (UP) labeled ROM. + The default is jumper ROM not installed. + + +Setting the I/O Base Address +---------------------------- + +The last three switches in switch group SW2 are used to select one +of eight possible I/O Base addresses using the following table + + + Switch | Hex I/O + 4 5 6 | Address + -------|-------- + 0 0 0 | 260 + 0 0 1 | 290 + 0 1 0 | 2E0 (Manufacturer's default) + 0 1 1 | 2F0 + 1 0 0 | 300 + 1 0 1 | 350 + 1 1 0 | 380 + 1 1 1 | 3E0 + + +Setting the Base Memory Address (RAM & ROM) +------------------------------------------- + +The memory buffer requires 2K of a 16K block of RAM. The base of this +16K block can be located in any of eight positions. +Switches 1-3 of switch group SW2 select the Base of the 16K block. +(0 = DOWN, 1 = UP) +I could, however, only verify two settings... + + Switch| Hex RAM | Hex ROM + 1 2 3 | Address | Address + ------|---------|----------- + 0 0 0 | E0000 | E2000 + 0 0 1 | D0000 | D2000 (Manufacturer's default) + 0 1 0 | ????? | ????? + 0 1 1 | ????? | ????? + 1 0 0 | ????? | ????? + 1 0 1 | ????? | ????? + 1 1 0 | ????? | ????? + 1 1 1 | ????? | ????? + + +Setting the Node ID +------------------- + +The eight switches in group SW3 are used to set the node ID. +Each node attached to the network must have an unique node ID which +must be different from 0. +Switch 1 serves as the least significant bit (LSB). +switches in the DOWN position are OFF (0) and in the UP position are ON (1) + +The node ID is the sum of the values of all switches set to "1" +These values are: + Switch | Value + -------|------- + 1 | 1 + 2 | 2 + 3 | 4 + 4 | 8 + 5 | 16 + 6 | 32 + 7 | 64 + 8 | 128 + +Some Examples: + + Switch# | Hex | Decimal +8 7 6 5 4 3 2 1 | Node ID | Node ID +----------------|---------|--------- +0 0 0 0 0 0 0 0 | not allowed <-. +0 0 0 0 0 0 0 1 | 1 | 1 | +0 0 0 0 0 0 1 0 | 2 | 2 | +0 0 0 0 0 0 1 1 | 3 | 3 | + . . . | | | +0 1 0 1 0 1 0 1 | 55 | 85 | + . . . | | + Don't use 0 or 255! +1 0 1 0 1 0 1 0 | AA | 170 | + . . . | | | +1 1 1 1 1 1 0 1 | FD | 253 | +1 1 1 1 1 1 1 0 | FE | 254 | +1 1 1 1 1 1 1 1 | FF | 255 <-' + + +***************************************************************************** + +** Tiara ** +(model unknown) +------------------------- + - from Christoph Lameter <christoph@lameter.com> + + +Here is information about my card as far as I could figure it out: +----------------------------------------------- tiara +Tiara LanCard of Tiara Computer Systems. + ++----------------------------------------------+ +! ! Transmitter Unit ! ! +! +------------------+ ------- +! MEM Coax Connector +! ROM 7654321 <- I/O ------- +! : : +--------+ ! +! : : ! 90C66LJ! +++ +! : : ! ! !D Switch to set +! : : ! ! !I the Nodenumber +! : : +--------+ !P +! !++ +! 234567 <- IRQ ! ++------------!!!!!!!!!!!!!!!!!!!!!!!!--------+ + !!!!!!!!!!!!!!!!!!!!!!!! + +0 = Jumper Installed +1 = Open + +Top Jumper line Bit 7 = ROM Enable 654=Memory location 321=I/O + +Settings for Memory Location (Top Jumper Line) +456 Address selected +000 C0000 +001 C4000 +010 CC000 +011 D0000 +100 D4000 +101 D8000 +110 DC000 +111 E0000 + +Settings for I/O Address (Top Jumper Line) +123 Port +000 260 +001 290 +010 2E0 +011 2F0 +100 300 +101 350 +110 380 +111 3E0 + +Settings for IRQ Selection (Lower Jumper Line) +234567 +011111 IRQ 2 +101111 IRQ 3 +110111 IRQ 4 +111011 IRQ 5 +111110 IRQ 7 + +***************************************************************************** + + +Other Cards +----------- + +I have no information on other models of ARCnet cards at the moment. Please +send any and all info to: + apenwarr@worldvisions.ca + +Thanks. diff --git a/Documentation/networking/arcnet.txt b/Documentation/networking/arcnet.txt new file mode 100644 index 0000000..7960125 --- /dev/null +++ b/Documentation/networking/arcnet.txt @@ -0,0 +1,555 @@ +---------------------------------------------------------------------------- +NOTE: See also arcnet-hardware.txt in this directory for jumper-setting +and cabling information if you're like many of us and didn't happen to get a +manual with your ARCnet card. +---------------------------------------------------------------------------- + +Since no one seems to listen to me otherwise, perhaps a poem will get your +attention: + This driver's getting fat and beefy, + But my cat is still named Fifi. + +Hmm, I think I'm allowed to call that a poem, even though it's only two +lines. Hey, I'm in Computer Science, not English. Give me a break. + +The point is: I REALLY REALLY REALLY REALLY REALLY want to hear from you if +you test this and get it working. Or if you don't. Or anything. + +ARCnet 0.32 ALPHA first made it into the Linux kernel 1.1.80 - this was +nice, but after that even FEWER people started writing to me because they +didn't even have to install the patch. <sigh> + +Come on, be a sport! Send me a success report! + +(hey, that was even better than my original poem... this is getting bad!) + + +-------- +WARNING: +-------- + +If you don't e-mail me about your success/failure soon, I may be forced to +start SINGING. And we don't want that, do we? + +(You know, it might be argued that I'm pushing this point a little too much. +If you think so, why not flame me in a quick little e-mail? Please also +include the type of card(s) you're using, software, size of network, and +whether it's working or not.) + +My e-mail address is: apenwarr@worldvisions.ca + + +--------------------------------------------------------------------------- + + +These are the ARCnet drivers for Linux. + + +This new release (2.91) has been put together by David Woodhouse +<dwmw2@infradead.org>, in an attempt to tidy up the driver after adding support +for yet another chipset. Now the generic support has been separated from the +individual chipset drivers, and the source files aren't quite so packed with +#ifdefs! I've changed this file a bit, but kept it in the first person from +Avery, because I didn't want to completely rewrite it. + +The previous release resulted from many months of on-and-off effort from me +(Avery Pennarun), many bug reports/fixes and suggestions from others, and in +particular a lot of input and coding from Tomasz Motylewski. Starting with +ARCnet 2.10 ALPHA, Tomasz's all-new-and-improved RFC1051 support has been +included and seems to be working fine! + + +Where do I discuss these drivers? +--------------------------------- + +Tomasz has been so kind as to set up a new and improved mailing list. +Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR +REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the +list, mail to linux-arcnet@tichy.ch.uj.edu.pl. + +There are archives of the mailing list at: + http://tichy.ch.uj.edu.pl/lists/linux-arcnet + +The people on linux-net@vger.kernel.org have also been known to be very +helpful, especially when we're talking about ALPHA Linux kernels that may or +may not work right in the first place. + + +Other Drivers and Info +---------------------- + +You can try my ARCNET page on the World Wide Web at: + http://www.worldvisions.ca/~apenwarr/arcnet/ + +Also, SMC (one of the companies that makes ARCnet cards) has a WWW site you +might be interested in, which includes several drivers for various cards +including ARCnet. Try: + http://www.smc.com/ + +Performance Technologies makes various network software that supports +ARCnet: + http://www.perftech.com/ or ftp to ftp.perftech.com. + +Novell makes a networking stack for DOS which includes ARCnet drivers. Try +FTPing to ftp.novell.com. + +You can get the Crynwr packet driver collection (including arcether.com, the +one you'll want to use with ARCnet cards) from +oak.oakland.edu:/simtel/msdos/pktdrvr. It won't work perfectly on a 386+ +without patches, though, and also doesn't like several cards. Fixed +versions are available on my WWW page, or via e-mail if you don't have WWW +access. + + +Installing the Driver +--------------------- + +All you will need to do in order to install the driver is: + make config + (be sure to choose ARCnet in the network devices + and at least one chipset driver.) + make clean + make zImage + +If you obtained this ARCnet package as an upgrade to the ARCnet driver in +your current kernel, you will need to first copy arcnet.c over the one in +the linux/drivers/net directory. + +You will know the driver is installed properly if you get some ARCnet +messages when you reboot into the new Linux kernel. + +There are four chipset options: + + 1. Standard ARCnet COM90xx chipset. + +This is the normal ARCnet card, which you've probably got. This is the only +chipset driver which will autoprobe if not told where the card is. +It following options on the command line: + com90xx=[<io>[,<irq>[,<shmem>]]][,<name>] | <name> + +If you load the chipset support as a module, the options are: + io=<io> irq=<irq> shmem=<shmem> device=<name> + +To disable the autoprobe, just specify "com90xx=" on the kernel command line. +To specify the name alone, but allow autoprobe, just put "com90xx=<name>" + + 2. ARCnet COM20020 chipset. + +This is the new chipset from SMC with support for promiscuous mode (packet +sniffing), extra diagnostic information, etc. Unfortunately, there is no +sensible method of autoprobing for these cards. You must specify the I/O +address on the kernel command line. +The command line options are: + com20020=<io>[,<irq>[,<node_ID>[,backplane[,CKP[,timeout]]]]][,name] + +If you load the chipset support as a module, the options are: + io=<io> irq=<irq> node=<node_ID> backplane=<backplane> clock=<CKP> + timeout=<timeout> device=<name> + +The COM20020 chipset allows you to set the node ID in software, overriding the +default which is still set in DIP switches on the card. If you don't have the +COM20020 data sheets, and you don't know what the other three options refer +to, then they won't interest you - forget them. + + 3. ARCnet COM90xx chipset in IO-mapped mode. + +This will also work with the normal ARCnet cards, but doesn't use the shared +memory. It performs less well than the above driver, but is provided in case +you have a card which doesn't support shared memory, or (strangely) in case +you have so many ARCnet cards in your machine that you run out of shmem slots. +If you don't give the IO address on the kernel command line, then the driver +will not find the card. +The command line options are: + com90io=<io>[,<irq>][,<name>] + +If you load the chipset support as a module, the options are: + io=<io> irq=<irq> device=<name> + + 4. ARCnet RIM I cards. + +These are COM90xx chips which are _completely_ memory mapped. The support for +these is not tested. If you have one, please mail the author with a success +report. All options must be specified, except the device name. +Command line options: + arcrimi=<shmem>,<irq>,<node_ID>[,<name>] + +If you load the chipset support as a module, the options are: + shmem=<shmem> irq=<irq> node=<node_ID> device=<name> + + +Loadable Module Support +----------------------- + +Configure and rebuild Linux. When asked, answer 'm' to "Generic ARCnet +support" and to support for your ARCnet chipset if you want to use the +loadable module. You can also say 'y' to "Generic ARCnet support" and 'm' +to the chipset support if you wish. + + make config + make clean + make zImage + make modules + +If you're using a loadable module, you need to use insmod to load it, and +you can specify various characteristics of your card on the command +line. (In recent versions of the driver, autoprobing is much more reliable +and works as a module, so most of this is now unnecessary.) + +For example: + cd /usr/src/linux/modules + insmod arcnet.o + insmod com90xx.o + insmod com20020.o io=0x2e0 device=eth1 + + +Using the Driver +---------------- + +If you build your kernel with ARCnet COM90xx support included, it should +probe for your card automatically when you boot. If you use a different +chipset driver complied into the kernel, you must give the necessary options +on the kernel command line, as detailed above. + +Go read the NET-2-HOWTO and ETHERNET-HOWTO for Linux; they should be +available where you picked up this driver. Think of your ARCnet as a +souped-up (or down, as the case may be) Ethernet card. + +By the way, be sure to change all references from "eth0" to "arc0" in the +HOWTOs. Remember that ARCnet isn't a "true" Ethernet, and the device name +is DIFFERENT. + + +Multiple Cards in One Computer +------------------------------ + +Linux has pretty good support for this now, but since I've been busy, the +ARCnet driver has somewhat suffered in this respect. COM90xx support, if +compiled into the kernel, will (try to) autodetect all the installed cards. + +If you have other cards, with support compiled into the kernel, then you can +just repeat the options on the kernel command line, e.g.: +LILO: linux com20020=0x2e0 com20020=0x380 com90io=0x260 + +If you have the chipset support built as a loadable module, then you need to +do something like this: + insmod -o arc0 com90xx + insmod -o arc1 com20020 io=0x2e0 + insmod -o arc2 com90xx +The ARCnet drivers will now sort out their names automatically. + + +How do I get it to work with...? +-------------------------------- + +NFS: Should be fine linux->linux, just pretend you're using Ethernet cards. + oak.oakland.edu:/simtel/msdos/nfs has some nice DOS clients. There + is also a DOS-based NFS server called SOSS. It doesn't multitask + quite the way Linux does (actually, it doesn't multitask AT ALL) but + you never know what you might need. + + With AmiTCP (and possibly others), you may need to set the following + options in your Amiga nfstab: MD 1024 MR 1024 MW 1024 + (Thanks to Christian Gottschling <ferksy@indigo.tng.oche.de> + for this.) + + Probably these refer to maximum NFS data/read/write block sizes. I + don't know why the defaults on the Amiga didn't work; write to me if + you know more. + +DOS: If you're using the freeware arcether.com, you might want to install + the driver patch from my web page. It helps with PC/TCP, and also + can get arcether to load if it timed out too quickly during + initialization. In fact, if you use it on a 386+ you REALLY need + the patch, really. + +Windows: See DOS :) Trumpet Winsock works fine with either the Novell or + Arcether client, assuming you remember to load winpkt of course. + +LAN Manager and Windows for Workgroups: These programs use protocols that + are incompatible with the Internet standard. They try to pretend + the cards are Ethernet, and confuse everyone else on the network. + + However, v2.00 and higher of the Linux ARCnet driver supports this + protocol via the 'arc0e' device. See the section on "Multiprotocol + Support" for more information. + + Using the freeware Samba server and clients for Linux, you can now + interface quite nicely with TCP/IP-based WfWg or Lan Manager + networks. + +Windows 95: Tools are included with Win95 that let you use either the LANMAN + style network drivers (NDIS) or Novell drivers (ODI) to handle your + ARCnet packets. If you use ODI, you'll need to use the 'arc0' + device with Linux. If you use NDIS, then try the 'arc0e' device. + See the "Multiprotocol Support" section below if you need arc0e, + you're completely insane, and/or you need to build some kind of + hybrid network that uses both encapsulation types. + +OS/2: I've been told it works under Warp Connect with an ARCnet driver from + SMC. You need to use the 'arc0e' interface for this. If you get + the SMC driver to work with the TCP/IP stuff included in the + "normal" Warp Bonus Pack, let me know. + + ftp.microsoft.com also has a freeware "Lan Manager for OS/2" client + which should use the same protocol as WfWg does. I had no luck + installing it under Warp, however. Please mail me with any results. + +NetBSD/AmiTCP: These use an old version of the Internet standard ARCnet + protocol (RFC1051) which is compatible with the Linux driver v2.10 + ALPHA and above using the arc0s device. (See "Multiprotocol ARCnet" + below.) ** Newer versions of NetBSD apparently support RFC1201. + + +Using Multiprotocol ARCnet +-------------------------- + +The ARCnet driver v2.10 ALPHA supports three protocols, each on its own +"virtual network device": + + arc0 - RFC1201 protocol, the official Internet standard which just + happens to be 100% compatible with Novell's TRXNET driver. + Version 1.00 of the ARCnet driver supported _only_ this + protocol. arc0 is the fastest of the three protocols (for + whatever reason), and allows larger packets to be used + because it supports RFC1201 "packet splitting" operations. + Unless you have a specific need to use a different protocol, + I strongly suggest that you stick with this one. + + arc0e - "Ethernet-Encapsulation" which sends packets over ARCnet + that are actually a lot like Ethernet packets, including the + 6-byte hardware addresses. This protocol is compatible with + Microsoft's NDIS ARCnet driver, like the one in WfWg and + LANMAN. Because the MTU of 493 is actually smaller than the + one "required" by TCP/IP (576), there is a chance that some + network operations will not function properly. The Linux + TCP/IP layer can compensate in most cases, however, by + automatically fragmenting the TCP/IP packets to make them + fit. arc0e also works slightly more slowly than arc0, for + reasons yet to be determined. (Probably it's the smaller + MTU that does it.) + + arc0s - The "[s]imple" RFC1051 protocol is the "previous" Internet + standard that is completely incompatible with the new + standard. Some software today, however, continues to + support the old standard (and only the old standard) + including NetBSD and AmiTCP. RFC1051 also does not support + RFC1201's packet splitting, and the MTU of 507 is still + smaller than the Internet "requirement," so it's quite + possible that you may run into problems. It's also slower + than RFC1201 by about 25%, for the same reason as arc0e. + + The arc0s support was contributed by Tomasz Motylewski + and modified somewhat by me. Bugs are probably my fault. + +You can choose not to compile arc0e and arc0s into the driver if you want - +this will save you a bit of memory and avoid confusion when eg. trying to +use the "NFS-root" stuff in recent Linux kernels. + +The arc0e and arc0s devices are created automatically when you first +ifconfig the arc0 device. To actually use them, though, you need to also +ifconfig the other virtual devices you need. There are a number of ways you +can set up your network then: + + +1. Single Protocol. + + This is the simplest way to configure your network: use just one of the + two available protocols. As mentioned above, it's a good idea to use + only arc0 unless you have a good reason (like some other software, ie. + WfWg, that only works with arc0e). + + If you need only arc0, then the following commands should get you going: + ifconfig arc0 MY.IP.ADD.RESS + route add MY.IP.ADD.RESS arc0 + route add -net SUB.NET.ADD.RESS arc0 + [add other local routes here] + + If you need arc0e (and only arc0e), it's a little different: + ifconfig arc0 MY.IP.ADD.RESS + ifconfig arc0e MY.IP.ADD.RESS + route add MY.IP.ADD.RESS arc0e + route add -net SUB.NET.ADD.RESS arc0e + + arc0s works much the same way as arc0e. + + +2. More than one protocol on the same wire. + + Now things start getting confusing. To even try it, you may need to be + partly crazy. Here's what *I* did. :) Note that I don't include arc0s in + my home network; I don't have any NetBSD or AmiTCP computers, so I only + use arc0s during limited testing. + + I have three computers on my home network; two Linux boxes (which prefer + RFC1201 protocol, for reasons listed above), and one XT that can't run + Linux but runs the free Microsoft LANMAN Client instead. + + Worse, one of the Linux computers (freedom) also has a modem and acts as + a router to my Internet provider. The other Linux box (insight) also has + its own IP address and needs to use freedom as its default gateway. The + XT (patience), however, does not have its own Internet IP address and so + I assigned it one on a "private subnet" (as defined by RFC1597). + + To start with, take a simple network with just insight and freedom. + Insight needs to: + - talk to freedom via RFC1201 (arc0) protocol, because I like it + more and it's faster. + - use freedom as its Internet gateway. + + That's pretty easy to do. Set up insight like this: + ifconfig arc0 insight + route add insight arc0 + route add freedom arc0 /* I would use the subnet here (like I said + to to in "single protocol" above), + but the rest of the subnet + unfortunately lies across the PPP + link on freedom, which confuses + things. */ + route add default gw freedom + + And freedom gets configured like so: + ifconfig arc0 freedom + route add freedom arc0 + route add insight arc0 + /* and default gateway is configured by pppd */ + + Great, now insight talks to freedom directly on arc0, and sends packets + to the Internet through freedom. If you didn't know how to do the above, + you should probably stop reading this section now because it only gets + worse. + + Now, how do I add patience into the network? It will be using LANMAN + Client, which means I need the arc0e device. It needs to be able to talk + to both insight and freedom, and also use freedom as a gateway to the + Internet. (Recall that patience has a "private IP address" which won't + work on the Internet; that's okay, I configured Linux IP masquerading on + freedom for this subnet). + + So patience (necessarily; I don't have another IP number from my + provider) has an IP address on a different subnet than freedom and + insight, but needs to use freedom as an Internet gateway. Worse, most + DOS networking programs, including LANMAN, have braindead networking + schemes that rely completely on the netmask and a 'default gateway' to + determine how to route packets. This means that to get to freedom or + insight, patience WILL send through its default gateway, regardless of + the fact that both freedom and insight (courtesy of the arc0e device) + could understand a direct transmission. + + I compensate by giving freedom an extra IP address - aliased 'gatekeeper' + - that is on my private subnet, the same subnet that patience is on. I + then define gatekeeper to be the default gateway for patience. + + To configure freedom (in addition to the commands above): + ifconfig arc0e gatekeeper + route add gatekeeper arc0e + route add patience arc0e + + This way, freedom will send all packets for patience through arc0e, + giving its IP address as gatekeeper (on the private subnet). When it + talks to insight or the Internet, it will use its "freedom" Internet IP + address. + + You will notice that we haven't configured the arc0e device on insight. + This would work, but is not really necessary, and would require me to + assign insight another special IP number from my private subnet. Since + both insight and patience are using freedom as their default gateway, the + two can already talk to each other. + + It's quite fortunate that I set things up like this the first time (cough + cough) because it's really handy when I boot insight into DOS. There, it + runs the Novell ODI protocol stack, which only works with RFC1201 ARCnet. + In this mode it would be impossible for insight to communicate directly + with patience, since the Novell stack is incompatible with Microsoft's + Ethernet-Encap. Without changing any settings on freedom or patience, I + simply set freedom as the default gateway for insight (now in DOS, + remember) and all the forwarding happens "automagically" between the two + hosts that would normally not be able to communicate at all. + + For those who like diagrams, I have created two "virtual subnets" on the + same physical ARCnet wire. You can picture it like this: + + + [RFC1201 NETWORK] [ETHER-ENCAP NETWORK] + (registered Internet subnet) (RFC1597 private subnet) + + (IP Masquerade) + /---------------\ * /---------------\ + | | * | | + | +-Freedom-*-Gatekeeper-+ | + | | | * | | + \-------+-------/ | * \-------+-------/ + | | | + Insight | Patience + (Internet) + + + +It works: what now? +------------------- + +Send mail describing your setup, preferably including driver version, kernel +version, ARCnet card model, CPU type, number of systems on your network, and +list of software in use to me at the following address: + apenwarr@worldvisions.ca + +I do send (sometimes automated) replies to all messages I receive. My email +can be weird (and also usually gets forwarded all over the place along the +way to me), so if you don't get a reply within a reasonable time, please +resend. + + +It doesn't work: what now? +-------------------------- + +Do the same as above, but also include the output of the ifconfig and route +commands, as well as any pertinent log entries (ie. anything that starts +with "arcnet:" and has shown up since the last reboot) in your mail. + +If you want to try fixing it yourself (I strongly recommend that you mail me +about the problem first, since it might already have been solved) you may +want to try some of the debug levels available. For heavy testing on +D_DURING or more, it would be a REALLY good idea to kill your klogd daemon +first! D_DURING displays 4-5 lines for each packet sent or received. D_TX, +D_RX, and D_SKB actually DISPLAY each packet as it is sent or received, +which is obviously quite big. + +Starting with v2.40 ALPHA, the autoprobe routines have changed +significantly. In particular, they won't tell you why the card was not +found unless you turn on the D_INIT_REASONS debugging flag. + +Once the driver is running, you can run the arcdump shell script (available +from me or in the full ARCnet package, if you have it) as root to list the +contents of the arcnet buffers at any time. To make any sense at all out of +this, you should grab the pertinent RFCs. (some are listed near the top of +arcnet.c). arcdump assumes your card is at 0xD0000. If it isn't, edit the +script. + +Buffers 0 and 1 are used for receiving, and Buffers 2 and 3 are for sending. +Ping-pong buffers are implemented both ways. + +If your debug level includes D_DURING and you did NOT define SLOW_XMIT_COPY, +the buffers are cleared to a constant value of 0x42 every time the card is +reset (which should only happen when you do an ifconfig up, or when Linux +decides that the driver is broken). During a transmit, unused parts of the +buffer will be cleared to 0x42 as well. This is to make it easier to figure +out which bytes are being used by a packet. + +You can change the debug level without recompiling the kernel by typing: + ifconfig arc0 down metric 1xxx + /etc/rc.d/rc.inet1 +where "xxx" is the debug level you want. For example, "metric 1015" would put +you at debug level 15. Debug level 7 is currently the default. + +Note that the debug level is (starting with v1.90 ALPHA) a binary +combination of different debug flags; so debug level 7 is really 1+2+4 or +D_NORMAL+D_EXTRA+D_INIT. To include D_DURING, you would add 16 to this, +resulting in debug level 23. + +If you don't understand that, you probably don't want to know anyway. +E-mail me about your problem. + + +I want to send money: what now? +------------------------------- + +Go take a nap or something. You'll feel better in the morning. diff --git a/Documentation/networking/atm.txt b/Documentation/networking/atm.txt new file mode 100644 index 0000000..82921ce --- /dev/null +++ b/Documentation/networking/atm.txt @@ -0,0 +1,8 @@ +In order to use anything but the most primitive functions of ATM, +several user-mode programs are required to assist the kernel. These +programs and related material can be found via the ATM on Linux Web +page at http://linux-atm.sourceforge.net/ + +If you encounter problems with ATM, please report them on the ATM +on Linux mailing list. Subscription information, archives, etc., +can be found on http://linux-atm.sourceforge.net/ diff --git a/Documentation/networking/ax25.txt b/Documentation/networking/ax25.txt new file mode 100644 index 0000000..8257dbf --- /dev/null +++ b/Documentation/networking/ax25.txt @@ -0,0 +1,10 @@ +To use the amateur radio protocols within Linux you will need to get a +suitable copy of the AX.25 Utilities. More detailed information about +AX.25, NET/ROM and ROSE, associated programs and and utilities can be +found on http://www.linux-ax25.org. + +There is an active mailing list for discussing Linux amateur radio matters +called linux-hams@vger.kernel.org. To subscribe to it, send a message to +majordomo@vger.kernel.org with the words "subscribe linux-hams" in the body +of the message, the subject field is ignored. You don't need to be +subscribed to post but of course that means you might miss an answer. diff --git a/Documentation/networking/baycom.txt b/Documentation/networking/baycom.txt new file mode 100644 index 0000000..4e68849 --- /dev/null +++ b/Documentation/networking/baycom.txt @@ -0,0 +1,158 @@ + LINUX DRIVERS FOR BAYCOM MODEMS + + Thomas M. Sailer, HB9JNX/AE4WA, <sailer@ife.ee.ethz.ch> + +!!NEW!! (04/98) The drivers for the baycom modems have been split into +separate drivers as they did not share any code, and the driver +and device names have changed. + +This document describes the Linux Kernel Drivers for simple Baycom style +amateur radio modems. + +The following drivers are available: + +baycom_ser_fdx: + This driver supports the SER12 modems either full or half duplex. + Its baud rate may be changed via the `baud' module parameter, + therefore it supports just about every bit bang modem on a + serial port. Its devices are called bcsf0 through bcsf3. + This is the recommended driver for SER12 type modems, + however if you have a broken UART clone that does not have working + delta status bits, you may try baycom_ser_hdx. + +baycom_ser_hdx: + This is an alternative driver for SER12 type modems. + It only supports half duplex, and only 1200 baud. Its devices + are called bcsh0 through bcsh3. Use this driver only if baycom_ser_fdx + does not work with your UART. + +baycom_par: + This driver supports the par96 and picpar modems. + Its devices are called bcp0 through bcp3. + +baycom_epp: + This driver supports the EPP modem. + Its devices are called bce0 through bce3. + This driver is work-in-progress. + +The following modems are supported: + +ser12: This is a very simple 1200 baud AFSK modem. The modem consists only + of a modulator/demodulator chip, usually a TI TCM3105. The computer + is responsible for regenerating the receiver bit clock, as well as + for handling the HDLC protocol. The modem connects to a serial port, + hence the name. Since the serial port is not used as an async serial + port, the kernel driver for serial ports cannot be used, and this + driver only supports standard serial hardware (8250, 16450, 16550) + +par96: This is a modem for 9600 baud FSK compatible to the G3RUH standard. + The modem does all the filtering and regenerates the receiver clock. + Data is transferred from and to the PC via a shift register. + The shift register is filled with 16 bits and an interrupt is signalled. + The PC then empties the shift register in a burst. This modem connects + to the parallel port, hence the name. The modem leaves the + implementation of the HDLC protocol and the scrambler polynomial to + the PC. + +picpar: This is a redesign of the par96 modem by Henning Rech, DF9IC. The modem + is protocol compatible to par96, but uses only three low power ICs + and can therefore be fed from the parallel port and does not require + an additional power supply. Furthermore, it incorporates a carrier + detect circuitry. + +EPP: This is a high-speed modem adaptor that connects to an enhanced parallel port. + Its target audience is users working over a high speed hub (76.8kbit/s). + +eppfpga: This is a redesign of the EPP adaptor. + + + +All of the above modems only support half duplex communications. However, +the driver supports the KISS (see below) fullduplex command. It then simply +starts to send as soon as there's a packet to transmit and does not care +about DCD, i.e. it starts to send even if there's someone else on the channel. +This command is required by some implementations of the DAMA channel +access protocol. + + +The Interface of the drivers + +Unlike previous drivers, these drivers are no longer character devices, +but they are now true kernel network interfaces. Installation is therefore +simple. Once installed, four interfaces named bc{sf,sh,p,e}[0-3] are available. +sethdlc from the ax25 utilities may be used to set driver states etc. +Users of userland AX.25 stacks may use the net2kiss utility (also available +in the ax25 utilities package) to convert packets of a network interface +to a KISS stream on a pseudo tty. There's also a patch available from +me for WAMPES which allows attaching a kernel network interface directly. + + +Configuring the driver + +Every time a driver is inserted into the kernel, it has to know which +modems it should access at which ports. This can be done with the setbaycom +utility. If you are only using one modem, you can also configure the +driver from the insmod command line (or by means of an option line in +/etc/modprobe.conf). + +Examples: + modprobe baycom_ser_fdx mode="ser12*" iobase=0x3f8 irq=4 + sethdlc -i bcsf0 -p mode "ser12*" io 0x3f8 irq 4 + +Both lines configure the first port to drive a ser12 modem at the first +serial port (COM1 under DOS). The * in the mode parameter instructs the driver to use +the software DCD algorithm (see below). + + insmod baycom_par mode="picpar" iobase=0x378 + sethdlc -i bcp0 -p mode "picpar" io 0x378 + +Both lines configure the first port to drive a picpar modem at the +first parallel port (LPT1 under DOS). (Note: picpar implies +hardware DCD, par96 implies software DCD). + +The channel access parameters can be set with sethdlc -a or kissparms. +Note that both utilities interpret the values slightly differently. + + +Hardware DCD versus Software DCD + +To avoid collisions on the air, the driver must know when the channel is +busy. This is the task of the DCD circuitry/software. The driver may either +utilise a software DCD algorithm (options=1) or use a DCD signal from +the hardware (options=0). + +ser12: if software DCD is utilised, the radio's squelch should always be + open. It is highly recommended to use the software DCD algorithm, + as it is much faster than most hardware squelch circuitry. The + disadvantage is a slightly higher load on the system. + +par96: the software DCD algorithm for this type of modem is rather poor. + The modem simply does not provide enough information to implement + a reasonable DCD algorithm in software. Therefore, if your radio + feeds the DCD input of the PAR96 modem, the use of the hardware + DCD circuitry is recommended. + +picpar: the picpar modem features a builtin DCD hardware, which is highly + recommended. + + + +Compatibility with the rest of the Linux kernel + +The serial driver and the baycom serial drivers compete +for the same hardware resources. Of course only one driver can access a given +interface at a time. The serial driver grabs all interfaces it can find at +startup time. Therefore the baycom drivers subsequently won't be able to +access a serial port. You might therefore find it necessary to release +a port owned by the serial driver with 'setserial /dev/ttyS# uart none', where +# is the number of the interface. The baycom drivers do not reserve any +ports at startup, unless one is specified on the 'insmod' command line. Another +method to solve the problem is to compile all drivers as modules and +leave it to kmod to load the correct driver depending on the application. + +The parallel port drivers (baycom_par, baycom_epp) now use the parport subsystem +to arbitrate the ports between different client drivers. + +vy 73s de +Tom Sailer, sailer@ife.ee.ethz.ch +hb9jnx @ hb9w.ampr.org diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt new file mode 100644 index 0000000..688dfe1 --- /dev/null +++ b/Documentation/networking/bonding.txt @@ -0,0 +1,2346 @@ + + Linux Ethernet Bonding Driver HOWTO + + Latest update: 12 November 2007 + +Initial release : Thomas Davis <tadavis at lbl.gov> +Corrections, HA extensions : 2000/10/03-15 : + - Willy Tarreau <willy at meta-x.org> + - Constantine Gavrilov <const-g at xpert.com> + - Chad N. Tindel <ctindel at ieee dot org> + - Janice Girouard <girouard at us dot ibm dot com> + - Jay Vosburgh <fubar at us dot ibm dot com> + +Reorganized and updated Feb 2005 by Jay Vosburgh +Added Sysfs information: 2006/04/24 + - Mitch Williams <mitch.a.williams at intel.com> + +Introduction +============ + + The Linux bonding driver provides a method for aggregating +multiple network interfaces into a single logical "bonded" interface. +The behavior of the bonded interfaces depends upon the mode; generally +speaking, modes provide either hot standby or load balancing services. +Additionally, link integrity monitoring may be performed. + + The bonding driver originally came from Donald Becker's +beowulf patches for kernel 2.0. It has changed quite a bit since, and +the original tools from extreme-linux and beowulf sites will not work +with this version of the driver. + + For new versions of the driver, updated userspace tools, and +who to ask for help, please follow the links at the end of this file. + +Table of Contents +================= + +1. Bonding Driver Installation + +2. Bonding Driver Options + +3. Configuring Bonding Devices +3.1 Configuration with Sysconfig Support +3.1.1 Using DHCP with Sysconfig +3.1.2 Configuring Multiple Bonds with Sysconfig +3.2 Configuration with Initscripts Support +3.2.1 Using DHCP with Initscripts +3.2.2 Configuring Multiple Bonds with Initscripts +3.3 Configuring Bonding Manually with Ifenslave +3.3.1 Configuring Multiple Bonds Manually +3.4 Configuring Bonding Manually via Sysfs + +4. Querying Bonding Configuration +4.1 Bonding Configuration +4.2 Network Configuration + +5. Switch Configuration + +6. 802.1q VLAN Support + +7. Link Monitoring +7.1 ARP Monitor Operation +7.2 Configuring Multiple ARP Targets +7.3 MII Monitor Operation + +8. Potential Trouble Sources +8.1 Adventures in Routing +8.2 Ethernet Device Renaming +8.3 Painfully Slow Or No Failed Link Detection By Miimon + +9. SNMP agents + +10. Promiscuous mode + +11. Configuring Bonding for High Availability +11.1 High Availability in a Single Switch Topology +11.2 High Availability in a Multiple Switch Topology +11.2.1 HA Bonding Mode Selection for Multiple Switch Topology +11.2.2 HA Link Monitoring for Multiple Switch Topology + +12. Configuring Bonding for Maximum Throughput +12.1 Maximum Throughput in a Single Switch Topology +12.1.1 MT Bonding Mode Selection for Single Switch Topology +12.1.2 MT Link Monitoring for Single Switch Topology +12.2 Maximum Throughput in a Multiple Switch Topology +12.2.1 MT Bonding Mode Selection for Multiple Switch Topology +12.2.2 MT Link Monitoring for Multiple Switch Topology + +13. Switch Behavior Issues +13.1 Link Establishment and Failover Delays +13.2 Duplicated Incoming Packets + +14. Hardware Specific Considerations +14.1 IBM BladeCenter + +15. Frequently Asked Questions + +16. Resources and Links + + +1. Bonding Driver Installation +============================== + + Most popular distro kernels ship with the bonding driver +already available as a module and the ifenslave user level control +program installed and ready for use. If your distro does not, or you +have need to compile bonding from source (e.g., configuring and +installing a mainline kernel from kernel.org), you'll need to perform +the following steps: + +1.1 Configure and build the kernel with bonding +----------------------------------------------- + + The current version of the bonding driver is available in the +drivers/net/bonding subdirectory of the most recent kernel source +(which is available on http://kernel.org). Most users "rolling their +own" will want to use the most recent kernel from kernel.org. + + Configure kernel with "make menuconfig" (or "make xconfig" or +"make config"), then select "Bonding driver support" in the "Network +device support" section. It is recommended that you configure the +driver as module since it is currently the only way to pass parameters +to the driver or configure more than one bonding device. + + Build and install the new kernel and modules, then continue +below to install ifenslave. + +1.2 Install ifenslave Control Utility +------------------------------------- + + The ifenslave user level control program is included in the +kernel source tree, in the file Documentation/networking/ifenslave.c. +It is generally recommended that you use the ifenslave that +corresponds to the kernel that you are using (either from the same +source tree or supplied with the distro), however, ifenslave +executables from older kernels should function (but features newer +than the ifenslave release are not supported). Running an ifenslave +that is newer than the kernel is not supported, and may or may not +work. + + To install ifenslave, do the following: + +# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave +# cp ifenslave /sbin/ifenslave + + If your kernel source is not in "/usr/src/linux," then replace +"/usr/src/linux/include" in the above with the location of your kernel +source include directory. + + You may wish to back up any existing /sbin/ifenslave, or, for +testing or informal use, tag the ifenslave to the kernel version +(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10). + +IMPORTANT NOTE: + + If you omit the "-I" or specify an incorrect directory, you +may end up with an ifenslave that is incompatible with the kernel +you're trying to build it for. Some distros (e.g., Red Hat from 7.1 +onwards) do not have /usr/include/linux symbolically linked to the +default kernel source include directory. + +SECOND IMPORTANT NOTE: + If you plan to configure bonding using sysfs, you do not need +to use ifenslave. + +2. Bonding Driver Options +========================= + + Options for the bonding driver are supplied as parameters to the +bonding module at load time, or are specified via sysfs. + + Module options may be given as command line arguments to the +insmod or modprobe command, but are usually specified in either the +/etc/modules.conf or /etc/modprobe.conf configuration file, or in a +distro-specific configuration file (some of which are detailed in the next +section). + + Details on bonding support for sysfs is provided in the +"Configuring Bonding Manually via Sysfs" section, below. + + The available bonding driver parameters are listed below. If a +parameter is not specified the default value is used. When initially +configuring a bond, it is recommended "tail -f /var/log/messages" be +run in a separate window to watch for bonding driver error messages. + + It is critical that either the miimon or arp_interval and +arp_ip_target parameters be specified, otherwise serious network +degradation will occur during link failures. Very few devices do not +support at least miimon, so there is really no reason not to use it. + + Options with textual values will accept either the text name +or, for backwards compatibility, the option value. E.g., +"mode=802.3ad" and "mode=4" set the same mode. + + The parameters are as follows: + +arp_interval + + Specifies the ARP link monitoring frequency in milliseconds. + + The ARP monitor works by periodically checking the slave + devices to determine whether they have sent or received + traffic recently (the precise criteria depends upon the + bonding mode, and the state of the slave). Regular traffic is + generated via ARP probes issued for the addresses specified by + the arp_ip_target option. + + This behavior can be modified by the arp_validate option, + below. + + If ARP monitoring is used in an etherchannel compatible mode + (modes 0 and 2), the switch should be configured in a mode + that evenly distributes packets across all links. If the + switch is configured to distribute the packets in an XOR + fashion, all replies from the ARP targets will be received on + the same link which could cause the other team members to + fail. ARP monitoring should not be used in conjunction with + miimon. A value of 0 disables ARP monitoring. The default + value is 0. + +arp_ip_target + + Specifies the IP addresses to use as ARP monitoring peers when + arp_interval is > 0. These are the targets of the ARP request + sent to determine the health of the link to the targets. + Specify these values in ddd.ddd.ddd.ddd format. Multiple IP + addresses must be separated by a comma. At least one IP + address must be given for ARP monitoring to function. The + maximum number of targets that can be specified is 16. The + default value is no IP addresses. + +arp_validate + + Specifies whether or not ARP probes and replies should be + validated in the active-backup mode. This causes the ARP + monitor to examine the incoming ARP requests and replies, and + only consider a slave to be up if it is receiving the + appropriate ARP traffic. + + Possible values are: + + none or 0 + + No validation is performed. This is the default. + + active or 1 + + Validation is performed only for the active slave. + + backup or 2 + + Validation is performed only for backup slaves. + + all or 3 + + Validation is performed for all slaves. + + For the active slave, the validation checks ARP replies to + confirm that they were generated by an arp_ip_target. Since + backup slaves do not typically receive these replies, the + validation performed for backup slaves is on the ARP request + sent out via the active slave. It is possible that some + switch or network configurations may result in situations + wherein the backup slaves do not receive the ARP requests; in + such a situation, validation of backup slaves must be + disabled. + + This option is useful in network configurations in which + multiple bonding hosts are concurrently issuing ARPs to one or + more targets beyond a common switch. Should the link between + the switch and target fail (but not the switch itself), the + probe traffic generated by the multiple bonding instances will + fool the standard ARP monitor into considering the links as + still up. Use of the arp_validate option can resolve this, as + the ARP monitor will only consider ARP requests and replies + associated with its own instance of bonding. + + This option was added in bonding version 3.1.0. + +downdelay + + Specifies the time, in milliseconds, to wait before disabling + a slave after a link failure has been detected. This option + is only valid for the miimon link monitor. The downdelay + value should be a multiple of the miimon value; if not, it + will be rounded down to the nearest multiple. The default + value is 0. + +fail_over_mac + + Specifies whether active-backup mode should set all slaves to + the same MAC address at enslavement (the traditional + behavior), or, when enabled, perform special handling of the + bond's MAC address in accordance with the selected policy. + + Possible values are: + + none or 0 + + This setting disables fail_over_mac, and causes + bonding to set all slaves of an active-backup bond to + the same MAC address at enslavement time. This is the + default. + + active or 1 + + The "active" fail_over_mac policy indicates that the + MAC address of the bond should always be the MAC + address of the currently active slave. The MAC + address of the slaves is not changed; instead, the MAC + address of the bond changes during a failover. + + This policy is useful for devices that cannot ever + alter their MAC address, or for devices that refuse + incoming broadcasts with their own source MAC (which + interferes with the ARP monitor). + + The down side of this policy is that every device on + the network must be updated via gratuitous ARP, + vs. just updating a switch or set of switches (which + often takes place for any traffic, not just ARP + traffic, if the switch snoops incoming traffic to + update its tables) for the traditional method. If the + gratuitous ARP is lost, communication may be + disrupted. + + When this policy is used in conjuction with the mii + monitor, devices which assert link up prior to being + able to actually transmit and receive are particularly + susecptible to loss of the gratuitous ARP, and an + appropriate updelay setting may be required. + + follow or 2 + + The "follow" fail_over_mac policy causes the MAC + address of the bond to be selected normally (normally + the MAC address of the first slave added to the bond). + However, the second and subsequent slaves are not set + to this MAC address while they are in a backup role; a + slave is programmed with the bond's MAC address at + failover time (and the formerly active slave receives + the newly active slave's MAC address). + + This policy is useful for multiport devices that + either become confused or incur a performance penalty + when multiple ports are programmed with the same MAC + address. + + + The default policy is none, unless the first slave cannot + change its MAC address, in which case the active policy is + selected by default. + + This option may be modified via sysfs only when no slaves are + present in the bond. + + This option was added in bonding version 3.2.0. The "follow" + policy was added in bonding version 3.3.0. + +lacp_rate + + Option specifying the rate in which we'll ask our link partner + to transmit LACPDU packets in 802.3ad mode. Possible values + are: + + slow or 0 + Request partner to transmit LACPDUs every 30 seconds + + fast or 1 + Request partner to transmit LACPDUs every 1 second + + The default is slow. + +max_bonds + + Specifies the number of bonding devices to create for this + instance of the bonding driver. E.g., if max_bonds is 3, and + the bonding driver is not already loaded, then bond0, bond1 + and bond2 will be created. The default value is 1. Specifying + a value of 0 will load bonding, but will not create any devices. + +miimon + + Specifies the MII link monitoring frequency in milliseconds. + This determines how often the link state of each slave is + inspected for link failures. A value of zero disables MII + link monitoring. A value of 100 is a good starting point. + The use_carrier option, below, affects how the link state is + determined. See the High Availability section for additional + information. The default value is 0. + +mode + + Specifies one of the bonding policies. The default is + balance-rr (round robin). Possible values are: + + balance-rr or 0 + + Round-robin policy: Transmit packets in sequential + order from the first available slave through the + last. This mode provides load balancing and fault + tolerance. + + active-backup or 1 + + Active-backup policy: Only one slave in the bond is + active. A different slave becomes active if, and only + if, the active slave fails. The bond's MAC address is + externally visible on only one port (network adapter) + to avoid confusing the switch. + + In bonding version 2.6.2 or later, when a failover + occurs in active-backup mode, bonding will issue one + or more gratuitous ARPs on the newly active slave. + One gratuitous ARP is issued for the bonding master + interface and each VLAN interfaces configured above + it, provided that the interface has at least one IP + address configured. Gratuitous ARPs issued for VLAN + interfaces are tagged with the appropriate VLAN id. + + This mode provides fault tolerance. The primary + option, documented below, affects the behavior of this + mode. + + balance-xor or 2 + + XOR policy: Transmit based on the selected transmit + hash policy. The default policy is a simple [(source + MAC address XOR'd with destination MAC address) modulo + slave count]. Alternate transmit policies may be + selected via the xmit_hash_policy option, described + below. + + This mode provides load balancing and fault tolerance. + + broadcast or 3 + + Broadcast policy: transmits everything on all slave + interfaces. This mode provides fault tolerance. + + 802.3ad or 4 + + IEEE 802.3ad Dynamic link aggregation. Creates + aggregation groups that share the same speed and + duplex settings. Utilizes all slaves in the active + aggregator according to the 802.3ad specification. + + Slave selection for outgoing traffic is done according + to the transmit hash policy, which may be changed from + the default simple XOR policy via the xmit_hash_policy + option, documented below. Note that not all transmit + policies may be 802.3ad compliant, particularly in + regards to the packet mis-ordering requirements of + section 43.2.4 of the 802.3ad standard. Differing + peer implementations will have varying tolerances for + noncompliance. + + Prerequisites: + + 1. Ethtool support in the base drivers for retrieving + the speed and duplex of each slave. + + 2. A switch that supports IEEE 802.3ad Dynamic link + aggregation. + + Most switches will require some type of configuration + to enable 802.3ad mode. + + balance-tlb or 5 + + Adaptive transmit load balancing: channel bonding that + does not require any special switch support. The + outgoing traffic is distributed according to the + current load (computed relative to the speed) on each + slave. Incoming traffic is received by the current + slave. If the receiving slave fails, another slave + takes over the MAC address of the failed receiving + slave. + + Prerequisite: + + Ethtool support in the base drivers for retrieving the + speed of each slave. + + balance-alb or 6 + + Adaptive load balancing: includes balance-tlb plus + receive load balancing (rlb) for IPV4 traffic, and + does not require any special switch support. The + receive load balancing is achieved by ARP negotiation. + The bonding driver intercepts the ARP Replies sent by + the local system on their way out and overwrites the + source hardware address with the unique hardware + address of one of the slaves in the bond such that + different peers use different hardware addresses for + the server. + + Receive traffic from connections created by the server + is also balanced. When the local system sends an ARP + Request the bonding driver copies and saves the peer's + IP information from the ARP packet. When the ARP + Reply arrives from the peer, its hardware address is + retrieved and the bonding driver initiates an ARP + reply to this peer assigning it to one of the slaves + in the bond. A problematic outcome of using ARP + negotiation for balancing is that each time that an + ARP request is broadcast it uses the hardware address + of the bond. Hence, peers learn the hardware address + of the bond and the balancing of receive traffic + collapses to the current slave. This is handled by + sending updates (ARP Replies) to all the peers with + their individually assigned hardware address such that + the traffic is redistributed. Receive traffic is also + redistributed when a new slave is added to the bond + and when an inactive slave is re-activated. The + receive load is distributed sequentially (round robin) + among the group of highest speed slaves in the bond. + + When a link is reconnected or a new slave joins the + bond the receive traffic is redistributed among all + active slaves in the bond by initiating ARP Replies + with the selected MAC address to each of the + clients. The updelay parameter (detailed below) must + be set to a value equal or greater than the switch's + forwarding delay so that the ARP Replies sent to the + peers will not be blocked by the switch. + + Prerequisites: + + 1. Ethtool support in the base drivers for retrieving + the speed of each slave. + + 2. Base driver support for setting the hardware + address of a device while it is open. This is + required so that there will always be one slave in the + team using the bond hardware address (the + curr_active_slave) while having a unique hardware + address for each slave in the bond. If the + curr_active_slave fails its hardware address is + swapped with the new curr_active_slave that was + chosen. + +num_grat_arp + + Specifies the number of gratuitous ARPs to be issued after a + failover event. One gratuitous ARP is issued immediately after + the failover, subsequent ARPs are sent at a rate of one per link + monitor interval (arp_interval or miimon, whichever is active). + + The valid range is 0 - 255; the default value is 1. This option + affects only the active-backup mode. This option was added for + bonding version 3.3.0. + +primary + + A string (eth0, eth2, etc) specifying which slave is the + primary device. The specified device will always be the + active slave while it is available. Only when the primary is + off-line will alternate devices be used. This is useful when + one slave is preferred over another, e.g., when one slave has + higher throughput than another. + + The primary option is only valid for active-backup mode. + +updelay + + Specifies the time, in milliseconds, to wait before enabling a + slave after a link recovery has been detected. This option is + only valid for the miimon link monitor. The updelay value + should be a multiple of the miimon value; if not, it will be + rounded down to the nearest multiple. The default value is 0. + +use_carrier + + Specifies whether or not miimon should use MII or ETHTOOL + ioctls vs. netif_carrier_ok() to determine the link + status. The MII or ETHTOOL ioctls are less efficient and + utilize a deprecated calling sequence within the kernel. The + netif_carrier_ok() relies on the device driver to maintain its + state with netif_carrier_on/off; at this writing, most, but + not all, device drivers support this facility. + + If bonding insists that the link is up when it should not be, + it may be that your network device driver does not support + netif_carrier_on/off. The default state for netif_carrier is + "carrier on," so if a driver does not support netif_carrier, + it will appear as if the link is always up. In this case, + setting use_carrier to 0 will cause bonding to revert to the + MII / ETHTOOL ioctl method to determine the link state. + + A value of 1 enables the use of netif_carrier_ok(), a value of + 0 will use the deprecated MII / ETHTOOL ioctls. The default + value is 1. + +xmit_hash_policy + + Selects the transmit hash policy to use for slave selection in + balance-xor and 802.3ad modes. Possible values are: + + layer2 + + Uses XOR of hardware MAC addresses to generate the + hash. The formula is + + (source MAC XOR destination MAC) modulo slave count + + This algorithm will place all traffic to a particular + network peer on the same slave. + + This algorithm is 802.3ad compliant. + + layer2+3 + + This policy uses a combination of layer2 and layer3 + protocol information to generate the hash. + + Uses XOR of hardware MAC addresses and IP addresses to + generate the hash. The formula is + + (((source IP XOR dest IP) AND 0xffff) XOR + ( source MAC XOR destination MAC )) + modulo slave count + + This algorithm will place all traffic to a particular + network peer on the same slave. For non-IP traffic, + the formula is the same as for the layer2 transmit + hash policy. + + This policy is intended to provide a more balanced + distribution of traffic than layer2 alone, especially + in environments where a layer3 gateway device is + required to reach most destinations. + + This algorithm is 802.3ad compliant. + + layer3+4 + + This policy uses upper layer protocol information, + when available, to generate the hash. This allows for + traffic to a particular network peer to span multiple + slaves, although a single connection will not span + multiple slaves. + + The formula for unfragmented TCP and UDP packets is + + ((source port XOR dest port) XOR + ((source IP XOR dest IP) AND 0xffff) + modulo slave count + + For fragmented TCP or UDP packets and all other IP + protocol traffic, the source and destination port + information is omitted. For non-IP traffic, the + formula is the same as for the layer2 transmit hash + policy. + + This policy is intended to mimic the behavior of + certain switches, notably Cisco switches with PFC2 as + well as some Foundry and IBM products. + + This algorithm is not fully 802.3ad compliant. A + single TCP or UDP conversation containing both + fragmented and unfragmented packets will see packets + striped across two interfaces. This may result in out + of order delivery. Most traffic types will not meet + this criteria, as TCP rarely fragments traffic, and + most UDP traffic is not involved in extended + conversations. Other implementations of 802.3ad may + or may not tolerate this noncompliance. + + The default value is layer2. This option was added in bonding + version 2.6.3. In earlier versions of bonding, this parameter + does not exist, and the layer2 policy is the only policy. The + layer2+3 value was added for bonding version 3.2.2. + + +3. Configuring Bonding Devices +============================== + + You can configure bonding using either your distro's network +initialization scripts, or manually using either ifenslave or the +sysfs interface. Distros generally use one of two packages for the +network initialization scripts: initscripts or sysconfig. Recent +versions of these packages have support for bonding, while older +versions do not. + + We will first describe the options for configuring bonding for +distros using versions of initscripts and sysconfig with full or +partial support for bonding, then provide information on enabling +bonding without support from the network initialization scripts (i.e., +older versions of initscripts or sysconfig). + + If you're unsure whether your distro uses sysconfig or +initscripts, or don't know if it's new enough, have no fear. +Determining this is fairly straightforward. + + First, issue the command: + +$ rpm -qf /sbin/ifup + + It will respond with a line of text starting with either +"initscripts" or "sysconfig," followed by some numbers. This is the +package that provides your network initialization scripts. + + Next, to determine if your installation supports bonding, +issue the command: + +$ grep ifenslave /sbin/ifup + + If this returns any matches, then your initscripts or +sysconfig has support for bonding. + +3.1 Configuration with Sysconfig Support +---------------------------------------- + + This section applies to distros using a version of sysconfig +with bonding support, for example, SuSE Linux Enterprise Server 9. + + SuSE SLES 9's networking configuration system does support +bonding, however, at this writing, the YaST system configuration +front end does not provide any means to work with bonding devices. +Bonding devices can be managed by hand, however, as follows. + + First, if they have not already been configured, configure the +slave devices. On SLES 9, this is most easily done by running the +yast2 sysconfig configuration utility. The goal is for to create an +ifcfg-id file for each slave device. The simplest way to accomplish +this is to configure the devices for DHCP (this is only to get the +file ifcfg-id file created; see below for some issues with DHCP). The +name of the configuration file for each device will be of the form: + +ifcfg-id-xx:xx:xx:xx:xx:xx + + Where the "xx" portion will be replaced with the digits from +the device's permanent MAC address. + + Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been +created, it is necessary to edit the configuration files for the slave +devices (the MAC addresses correspond to those of the slave devices). +Before editing, the file will contain multiple lines, and will look +something like this: + +BOOTPROTO='dhcp' +STARTMODE='on' +USERCTL='no' +UNIQUE='XNzu.WeZGOGF+4wE' +_nm_name='bus-pci-0001:61:01.0' + + Change the BOOTPROTO and STARTMODE lines to the following: + +BOOTPROTO='none' +STARTMODE='off' + + Do not alter the UNIQUE or _nm_name lines. Remove any other +lines (USERCTL, etc). + + Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified, +it's time to create the configuration file for the bonding device +itself. This file is named ifcfg-bondX, where X is the number of the +bonding device to create, starting at 0. The first such file is +ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig +network configuration system will correctly start multiple instances +of bonding. + + The contents of the ifcfg-bondX file is as follows: + +BOOTPROTO="static" +BROADCAST="10.0.2.255" +IPADDR="10.0.2.10" +NETMASK="255.255.0.0" +NETWORK="10.0.2.0" +REMOTE_IPADDR="" +STARTMODE="onboot" +BONDING_MASTER="yes" +BONDING_MODULE_OPTS="mode=active-backup miimon=100" +BONDING_SLAVE0="eth0" +BONDING_SLAVE1="bus-pci-0000:06:08.1" + + Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK +values with the appropriate values for your network. + + The STARTMODE specifies when the device is brought online. +The possible values are: + + onboot: The device is started at boot time. If you're not + sure, this is probably what you want. + + manual: The device is started only when ifup is called + manually. Bonding devices may be configured this + way if you do not wish them to start automatically + at boot for some reason. + + hotplug: The device is started by a hotplug event. This is not + a valid choice for a bonding device. + + off or ignore: The device configuration is ignored. + + The line BONDING_MASTER='yes' indicates that the device is a +bonding master device. The only useful value is "yes." + + The contents of BONDING_MODULE_OPTS are supplied to the +instance of the bonding module for this device. Specify the options +for the bonding mode, link monitoring, and so on here. Do not include +the max_bonds bonding parameter; this will confuse the configuration +system if you have multiple bonding devices. + + Finally, supply one BONDING_SLAVEn="slave device" for each +slave. where "n" is an increasing value, one for each slave. The +"slave device" is either an interface name, e.g., "eth0", or a device +specifier for the network device. The interface name is easier to +find, but the ethN names are subject to change at boot time if, e.g., +a device early in the sequence has failed. The device specifiers +(bus-pci-0000:06:08.1 in the example above) specify the physical +network device, and will not change unless the device's bus location +changes (for example, it is moved from one PCI slot to another). The +example above uses one of each type for demonstration purposes; most +configurations will choose one or the other for all slave devices. + + When all configuration files have been modified or created, +networking must be restarted for the configuration changes to take +effect. This can be accomplished via the following: + +# /etc/init.d/network restart + + Note that the network control script (/sbin/ifdown) will +remove the bonding module as part of the network shutdown processing, +so it is not necessary to remove the module by hand if, e.g., the +module parameters have changed. + + Also, at this writing, YaST/YaST2 will not manage bonding +devices (they do not show bonding interfaces on its list of network +devices). It is necessary to edit the configuration file by hand to +change the bonding configuration. + + Additional general options and details of the ifcfg file +format can be found in an example ifcfg template file: + +/etc/sysconfig/network/ifcfg.template + + Note that the template does not document the various BONDING_ +settings described above, but does describe many of the other options. + +3.1.1 Using DHCP with Sysconfig +------------------------------- + + Under sysconfig, configuring a device with BOOTPROTO='dhcp' +will cause it to query DHCP for its IP address information. At this +writing, this does not function for bonding devices; the scripts +attempt to obtain the device address from DHCP prior to adding any of +the slave devices. Without active slaves, the DHCP requests are not +sent to the network. + +3.1.2 Configuring Multiple Bonds with Sysconfig +----------------------------------------------- + + The sysconfig network initialization system is capable of +handling multiple bonding devices. All that is necessary is for each +bonding instance to have an appropriately configured ifcfg-bondX file +(as described above). Do not specify the "max_bonds" parameter to any +instance of bonding, as this will confuse sysconfig. If you require +multiple bonding devices with identical parameters, create multiple +ifcfg-bondX files. + + Because the sysconfig scripts supply the bonding module +options in the ifcfg-bondX file, it is not necessary to add them to +the system /etc/modules.conf or /etc/modprobe.conf configuration file. + +3.2 Configuration with Initscripts Support +------------------------------------------ + + This section applies to distros using a recent version of +initscripts with bonding support, for example, Red Hat Enterprise Linux +version 3 or later, Fedora, etc. On these systems, the network +initialization scripts have knowledge of bonding, and can be configured to +control bonding devices. Note that older versions of the initscripts +package have lower levels of support for bonding; this will be noted where +applicable. + + These distros will not automatically load the network adapter +driver unless the ethX device is configured with an IP address. +Because of this constraint, users must manually configure a +network-script file for all physical adapters that will be members of +a bondX link. Network script files are located in the directory: + +/etc/sysconfig/network-scripts + + The file name must be prefixed with "ifcfg-eth" and suffixed +with the adapter's physical adapter number. For example, the script +for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0. +Place the following text in the file: + +DEVICE=eth0 +USERCTL=no +ONBOOT=yes +MASTER=bond0 +SLAVE=yes +BOOTPROTO=none + + The DEVICE= line will be different for every ethX device and +must correspond with the name of the file, i.e., ifcfg-eth1 must have +a device line of DEVICE=eth1. The setting of the MASTER= line will +also depend on the final bonding interface name chosen for your bond. +As with other network devices, these typically start at 0, and go up +one for each device, i.e., the first bonding instance is bond0, the +second is bond1, and so on. + + Next, create a bond network script. The file name for this +script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is +the number of the bond. For bond0 the file is named "ifcfg-bond0", +for bond1 it is named "ifcfg-bond1", and so on. Within that file, +place the following text: + +DEVICE=bond0 +IPADDR=192.168.1.1 +NETMASK=255.255.255.0 +NETWORK=192.168.1.0 +BROADCAST=192.168.1.255 +ONBOOT=yes +BOOTPROTO=none +USERCTL=no + + Be sure to change the networking specific lines (IPADDR, +NETMASK, NETWORK and BROADCAST) to match your network configuration. + + For later versions of initscripts, such as that found with Fedora +7 and Red Hat Enterprise Linux version 5 (or later), it is possible, and, +indeed, preferable, to specify the bonding options in the ifcfg-bond0 +file, e.g. a line of the format: + +BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=+192.168.1.254" + + will configure the bond with the specified options. The options +specified in BONDING_OPTS are identical to the bonding module parameters +except for the arp_ip_target field. Each target should be included as a +separate option and should be preceded by a '+' to indicate it should be +added to the list of queried targets, e.g., + + arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2 + + is the proper syntax to specify multiple targets. When specifying +options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or +/etc/modprobe.conf. + + For older versions of initscripts that do not support +BONDING_OPTS, it is necessary to edit /etc/modules.conf (or +/etc/modprobe.conf, depending upon your distro) to load the bonding module +with your desired options when the bond0 interface is brought up. The +following lines in /etc/modules.conf (or modprobe.conf) will load the +bonding module, and select its options: + +alias bond0 bonding +options bond0 mode=balance-alb miimon=100 + + Replace the sample parameters with the appropriate set of +options for your configuration. + + Finally run "/etc/rc.d/init.d/network restart" as root. This +will restart the networking subsystem and your bond link should be now +up and running. + +3.2.1 Using DHCP with Initscripts +--------------------------------- + + Recent versions of initscripts (the versions supplied with Fedora +Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to +work) have support for assigning IP information to bonding devices via +DHCP. + + To configure bonding for DHCP, configure it as described +above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" +and add a line consisting of "TYPE=Bonding". Note that the TYPE value +is case sensitive. + +3.2.2 Configuring Multiple Bonds with Initscripts +------------------------------------------------- + + Initscripts packages that are included with Fedora 7 and Red Hat +Enterprise Linux 5 support multiple bonding interfaces by simply +specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the +number of the bond. This support requires sysfs support in the kernel, +and a bonding driver of version 3.0.0 or later. Other configurations may +not support this method for specifying multiple bonding interfaces; for +those instances, see the "Configuring Multiple Bonds Manually" section, +below. + +3.3 Configuring Bonding Manually with Ifenslave +----------------------------------------------- + + This section applies to distros whose network initialization +scripts (the sysconfig or initscripts package) do not have specific +knowledge of bonding. One such distro is SuSE Linux Enterprise Server +version 8. + + The general method for these systems is to place the bonding +module parameters into /etc/modules.conf or /etc/modprobe.conf (as +appropriate for the installed distro), then add modprobe and/or +ifenslave commands to the system's global init script. The name of +the global init script differs; for sysconfig, it is +/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. + + For example, if you wanted to make a simple bond of two e100 +devices (presumed to be eth0 and eth1), and have it persist across +reboots, edit the appropriate file (/etc/init.d/boot.local or +/etc/rc.d/rc.local), and add the following: + +modprobe bonding mode=balance-alb miimon=100 +modprobe e100 +ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up +ifenslave bond0 eth0 +ifenslave bond0 eth1 + + Replace the example bonding module parameters and bond0 +network configuration (IP address, netmask, etc) with the appropriate +values for your configuration. + + Unfortunately, this method will not provide support for the +ifup and ifdown scripts on the bond devices. To reload the bonding +configuration, it is necessary to run the initialization script, e.g., + +# /etc/init.d/boot.local + + or + +# /etc/rc.d/rc.local + + It may be desirable in such a case to create a separate script +which only initializes the bonding configuration, then call that +separate script from within boot.local. This allows for bonding to be +enabled without re-running the entire global init script. + + To shut down the bonding devices, it is necessary to first +mark the bonding device itself as being down, then remove the +appropriate device driver modules. For our example above, you can do +the following: + +# ifconfig bond0 down +# rmmod bonding +# rmmod e100 + + Again, for convenience, it may be desirable to create a script +with these commands. + + +3.3.1 Configuring Multiple Bonds Manually +----------------------------------------- + + This section contains information on configuring multiple +bonding devices with differing options for those systems whose network +initialization scripts lack support for configuring multiple bonds. + + If you require multiple bonding devices, but all with the same +options, you may wish to use the "max_bonds" module parameter, +documented above. + + To create multiple bonding devices with differing options, it is +preferrable to use bonding parameters exported by sysfs, documented in the +section below. + + For versions of bonding without sysfs support, the only means to +provide multiple instances of bonding with differing options is to load +the bonding driver multiple times. Note that current versions of the +sysconfig network initialization scripts handle this automatically; if +your distro uses these scripts, no special action is needed. See the +section Configuring Bonding Devices, above, if you're not sure about your +network initialization scripts. + + To load multiple instances of the module, it is necessary to +specify a different name for each instance (the module loading system +requires that every loaded module, even multiple instances of the same +module, have a unique name). This is accomplished by supplying multiple +sets of bonding options in /etc/modprobe.conf, for example: + +alias bond0 bonding +options bond0 -o bond0 mode=balance-rr miimon=100 + +alias bond1 bonding +options bond1 -o bond1 mode=balance-alb miimon=50 + + will load the bonding module two times. The first instance is +named "bond0" and creates the bond0 device in balance-rr mode with an +miimon of 100. The second instance is named "bond1" and creates the +bond1 device in balance-alb mode with an miimon of 50. + + In some circumstances (typically with older distributions), +the above does not work, and the second bonding instance never sees +its options. In that case, the second options line can be substituted +as follows: + +install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ + mode=balance-alb miimon=50 + + This may be repeated any number of times, specifying a new and +unique name in place of bond1 for each subsequent instance. + + It has been observed that some Red Hat supplied kernels are unable +to rename modules at load time (the "-o bond1" part). Attempts to pass +that option to modprobe will produce an "Operation not permitted" error. +This has been reported on some Fedora Core kernels, and has been seen on +RHEL 4 as well. On kernels exhibiting this problem, it will be impossible +to configure multiple bonds with differing parameters (as they are older +kernels, and also lack sysfs support). + +3.4 Configuring Bonding Manually via Sysfs +------------------------------------------ + + Starting with version 3.0.0, Channel Bonding may be configured +via the sysfs interface. This interface allows dynamic configuration +of all bonds in the system without unloading the module. It also +allows for adding and removing bonds at runtime. Ifenslave is no +longer required, though it is still supported. + + Use of the sysfs interface allows you to use multiple bonds +with different configurations without having to reload the module. +It also allows you to use multiple, differently configured bonds when +bonding is compiled into the kernel. + + You must have the sysfs filesystem mounted to configure +bonding this way. The examples in this document assume that you +are using the standard mount point for sysfs, e.g. /sys. If your +sysfs filesystem is mounted elsewhere, you will need to adjust the +example paths accordingly. + +Creating and Destroying Bonds +----------------------------- +To add a new bond foo: +# echo +foo > /sys/class/net/bonding_masters + +To remove an existing bond bar: +# echo -bar > /sys/class/net/bonding_masters + +To show all existing bonds: +# cat /sys/class/net/bonding_masters + +NOTE: due to 4K size limitation of sysfs files, this list may be +truncated if you have more than a few hundred bonds. This is unlikely +to occur under normal operating conditions. + +Adding and Removing Slaves +-------------------------- + Interfaces may be enslaved to a bond using the file +/sys/class/net/<bond>/bonding/slaves. The semantics for this file +are the same as for the bonding_masters file. + +To enslave interface eth0 to bond bond0: +# ifconfig bond0 up +# echo +eth0 > /sys/class/net/bond0/bonding/slaves + +To free slave eth0 from bond bond0: +# echo -eth0 > /sys/class/net/bond0/bonding/slaves + + When an interface is enslaved to a bond, symlinks between the +two are created in the sysfs filesystem. In this case, you would get +/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and +/sys/class/net/eth0/master pointing to /sys/class/net/bond0. + + This means that you can tell quickly whether or not an +interface is enslaved by looking for the master symlink. Thus: +# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves +will free eth0 from whatever bond it is enslaved to, regardless of +the name of the bond interface. + +Changing a Bond's Configuration +------------------------------- + Each bond may be configured individually by manipulating the +files located in /sys/class/net/<bond name>/bonding + + The names of these files correspond directly with the command- +line parameters described elsewhere in this file, and, with the +exception of arp_ip_target, they accept the same values. To see the +current setting, simply cat the appropriate file. + + A few examples will be given here; for specific usage +guidelines for each parameter, see the appropriate section in this +document. + +To configure bond0 for balance-alb mode: +# ifconfig bond0 down +# echo 6 > /sys/class/net/bond0/bonding/mode + - or - +# echo balance-alb > /sys/class/net/bond0/bonding/mode + NOTE: The bond interface must be down before the mode can be +changed. + +To enable MII monitoring on bond0 with a 1 second interval: +# echo 1000 > /sys/class/net/bond0/bonding/miimon + NOTE: If ARP monitoring is enabled, it will disabled when MII +monitoring is enabled, and vice-versa. + +To add ARP targets: +# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target +# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target + NOTE: up to 10 target addresses may be specified. + +To remove an ARP target: +# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target + +Example Configuration +--------------------- + We begin with the same example that is shown in section 3.3, +executed with sysfs, and without using ifenslave. + + To make a simple bond of two e100 devices (presumed to be eth0 +and eth1), and have it persist across reboots, edit the appropriate +file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the +following: + +modprobe bonding +modprobe e100 +echo balance-alb > /sys/class/net/bond0/bonding/mode +ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up +echo 100 > /sys/class/net/bond0/bonding/miimon +echo +eth0 > /sys/class/net/bond0/bonding/slaves +echo +eth1 > /sys/class/net/bond0/bonding/slaves + + To add a second bond, with two e1000 interfaces in +active-backup mode, using ARP monitoring, add the following lines to +your init script: + +modprobe e1000 +echo +bond1 > /sys/class/net/bonding_masters +echo active-backup > /sys/class/net/bond1/bonding/mode +ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up +echo +192.168.2.100 /sys/class/net/bond1/bonding/arp_ip_target +echo 2000 > /sys/class/net/bond1/bonding/arp_interval +echo +eth2 > /sys/class/net/bond1/bonding/slaves +echo +eth3 > /sys/class/net/bond1/bonding/slaves + + +4. Querying Bonding Configuration +================================= + +4.1 Bonding Configuration +------------------------- + + Each bonding device has a read-only file residing in the +/proc/net/bonding directory. The file contents include information +about the bonding configuration, options and state of each slave. + + For example, the contents of /proc/net/bonding/bond0 after the +driver is loaded with parameters of mode=0 and miimon=1000 is +generally as follows: + + Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004) + Bonding Mode: load balancing (round-robin) + Currently Active Slave: eth0 + MII Status: up + MII Polling Interval (ms): 1000 + Up Delay (ms): 0 + Down Delay (ms): 0 + + Slave Interface: eth1 + MII Status: up + Link Failure Count: 1 + + Slave Interface: eth0 + MII Status: up + Link Failure Count: 1 + + The precise format and contents will change depending upon the +bonding configuration, state, and version of the bonding driver. + +4.2 Network configuration +------------------------- + + The network configuration can be inspected using the ifconfig +command. Bonding devices will have the MASTER flag set; Bonding slave +devices will have the SLAVE flag set. The ifconfig output does not +contain information on which slaves are associated with which masters. + + In the example below, the bond0 interface is the master +(MASTER) while eth0 and eth1 are slaves (SLAVE). Notice all slaves of +bond0 have the same MAC address (HWaddr) as bond0 for all modes except +TLB and ALB that require a unique MAC address for each slave. + +# /sbin/ifconfig +bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 + inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 + UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 + RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0 + TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0 + collisions:0 txqueuelen:0 + +eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 + UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 + RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 + TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 + collisions:0 txqueuelen:100 + Interrupt:10 Base address:0x1080 + +eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 + UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 + RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 + TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 + collisions:0 txqueuelen:100 + Interrupt:9 Base address:0x1400 + +5. Switch Configuration +======================= + + For this section, "switch" refers to whatever system the +bonded devices are directly connected to (i.e., where the other end of +the cable plugs into). This may be an actual dedicated switch device, +or it may be another regular system (e.g., another computer running +Linux), + + The active-backup, balance-tlb and balance-alb modes do not +require any specific configuration of the switch. + + The 802.3ad mode requires that the switch have the appropriate +ports configured as an 802.3ad aggregation. The precise method used +to configure this varies from switch to switch, but, for example, a +Cisco 3550 series switch requires that the appropriate ports first be +grouped together in a single etherchannel instance, then that +etherchannel is set to mode "lacp" to enable 802.3ad (instead of +standard EtherChannel). + + The balance-rr, balance-xor and broadcast modes generally +require that the switch have the appropriate ports grouped together. +The nomenclature for such a group differs between switches, it may be +called an "etherchannel" (as in the Cisco example, above), a "trunk +group" or some other similar variation. For these modes, each switch +will also have its own configuration options for the switch's transmit +policy to the bond. Typical choices include XOR of either the MAC or +IP addresses. The transmit policy of the two peers does not need to +match. For these three modes, the bonding mode really selects a +transmit policy for an EtherChannel group; all three will interoperate +with another EtherChannel group. + + +6. 802.1q VLAN Support +====================== + + It is possible to configure VLAN devices over a bond interface +using the 8021q driver. However, only packets coming from the 8021q +driver and passing through bonding will be tagged by default. Self +generated packets, for example, bonding's learning packets or ARP +packets generated by either ALB mode or the ARP monitor mechanism, are +tagged internally by bonding itself. As a result, bonding must +"learn" the VLAN IDs configured above it, and use those IDs to tag +self generated packets. + + For reasons of simplicity, and to support the use of adapters +that can do VLAN hardware acceleration offloading, the bonding +interface declares itself as fully hardware offloading capable, it gets +the add_vid/kill_vid notifications to gather the necessary +information, and it propagates those actions to the slaves. In case +of mixed adapter types, hardware accelerated tagged packets that +should go through an adapter that is not offloading capable are +"un-accelerated" by the bonding driver so the VLAN tag sits in the +regular location. + + VLAN interfaces *must* be added on top of a bonding interface +only after enslaving at least one slave. The bonding interface has a +hardware address of 00:00:00:00:00:00 until the first slave is added. +If the VLAN interface is created prior to the first enslavement, it +would pick up the all-zeroes hardware address. Once the first slave +is attached to the bond, the bond device itself will pick up the +slave's hardware address, which is then available for the VLAN device. + + Also, be aware that a similar problem can occur if all slaves +are released from a bond that still has one or more VLAN interfaces on +top of it. When a new slave is added, the bonding interface will +obtain its hardware address from the first slave, which might not +match the hardware address of the VLAN interfaces (which was +ultimately copied from an earlier slave). + + There are two methods to insure that the VLAN device operates +with the correct hardware address if all slaves are removed from a +bond interface: + + 1. Remove all VLAN interfaces then recreate them + + 2. Set the bonding interface's hardware address so that it +matches the hardware address of the VLAN interfaces. + + Note that changing a VLAN interface's HW address would set the +underlying device -- i.e. the bonding interface -- to promiscuous +mode, which might not be what you want. + + +7. Link Monitoring +================== + + The bonding driver at present supports two schemes for +monitoring a slave device's link state: the ARP monitor and the MII +monitor. + + At the present time, due to implementation restrictions in the +bonding driver itself, it is not possible to enable both ARP and MII +monitoring simultaneously. + +7.1 ARP Monitor Operation +------------------------- + + The ARP monitor operates as its name suggests: it sends ARP +queries to one or more designated peer systems on the network, and +uses the response as an indication that the link is operating. This +gives some assurance that traffic is actually flowing to and from one +or more peers on the local network. + + The ARP monitor relies on the device driver itself to verify +that traffic is flowing. In particular, the driver must keep up to +date the last receive time, dev->last_rx, and transmit start time, +dev->trans_start. If these are not updated by the driver, then the +ARP monitor will immediately fail any slaves using that driver, and +those slaves will stay down. If networking monitoring (tcpdump, etc) +shows the ARP requests and replies on the network, then it may be that +your device driver is not updating last_rx and trans_start. + +7.2 Configuring Multiple ARP Targets +------------------------------------ + + While ARP monitoring can be done with just one target, it can +be useful in a High Availability setup to have several targets to +monitor. In the case of just one target, the target itself may go +down or have a problem making it unresponsive to ARP requests. Having +an additional target (or several) increases the reliability of the ARP +monitoring. + + Multiple ARP targets must be separated by commas as follows: + +# example options for ARP monitoring with three targets +alias bond0 bonding +options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9 + + For just a single target the options would resemble: + +# example options for ARP monitoring with one target +alias bond0 bonding +options bond0 arp_interval=60 arp_ip_target=192.168.0.100 + + +7.3 MII Monitor Operation +------------------------- + + The MII monitor monitors only the carrier state of the local +network interface. It accomplishes this in one of three ways: by +depending upon the device driver to maintain its carrier state, by +querying the device's MII registers, or by making an ethtool query to +the device. + + If the use_carrier module parameter is 1 (the default value), +then the MII monitor will rely on the driver for carrier state +information (via the netif_carrier subsystem). As explained in the +use_carrier parameter information, above, if the MII monitor fails to +detect carrier loss on the device (e.g., when the cable is physically +disconnected), it may be that the driver does not support +netif_carrier. + + If use_carrier is 0, then the MII monitor will first query the +device's (via ioctl) MII registers and check the link state. If that +request fails (not just that it returns carrier down), then the MII +monitor will make an ethtool ETHOOL_GLINK request to attempt to obtain +the same information. If both methods fail (i.e., the driver either +does not support or had some error in processing both the MII register +and ethtool requests), then the MII monitor will assume the link is +up. + +8. Potential Sources of Trouble +=============================== + +8.1 Adventures in Routing +------------------------- + + When bonding is configured, it is important that the slave +devices not have routes that supersede routes of the master (or, +generally, not have routes at all). For example, suppose the bonding +device bond0 has two slaves, eth0 and eth1, and the routing table is +as follows: + +Kernel IP routing table +Destination Gateway Genmask Flags MSS Window irtt Iface +10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0 +10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1 +10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0 +127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo + + This routing configuration will likely still update the +receive/transmit times in the driver (needed by the ARP monitor), but +may bypass the bonding driver (because outgoing traffic to, in this +case, another host on network 10 would use eth0 or eth1 before bond0). + + The ARP monitor (and ARP itself) may become confused by this +configuration, because ARP requests (generated by the ARP monitor) +will be sent on one interface (bond0), but the corresponding reply +will arrive on a different interface (eth0). This reply looks to ARP +as an unsolicited ARP reply (because ARP matches replies on an +interface basis), and is discarded. The MII monitor is not affected +by the state of the routing table. + + The solution here is simply to insure that slaves do not have +routes of their own, and if for some reason they must, those routes do +not supersede routes of their master. This should generally be the +case, but unusual configurations or errant manual or automatic static +route additions may cause trouble. + +8.2 Ethernet Device Renaming +---------------------------- + + On systems with network configuration scripts that do not +associate physical devices directly with network interface names (so +that the same physical device always has the same "ethX" name), it may +be necessary to add some special logic to either /etc/modules.conf or +/etc/modprobe.conf (depending upon which is installed on the system). + + For example, given a modules.conf containing the following: + +alias bond0 bonding +options bond0 mode=some-mode miimon=50 +alias eth0 tg3 +alias eth1 tg3 +alias eth2 e1000 +alias eth3 e1000 + + If neither eth0 and eth1 are slaves to bond0, then when the +bond0 interface comes up, the devices may end up reordered. This +happens because bonding is loaded first, then its slave device's +drivers are loaded next. Since no other drivers have been loaded, +when the e1000 driver loads, it will receive eth0 and eth1 for its +devices, but the bonding configuration tries to enslave eth2 and eth3 +(which may later be assigned to the tg3 devices). + + Adding the following: + +add above bonding e1000 tg3 + + causes modprobe to load e1000 then tg3, in that order, when +bonding is loaded. This command is fully documented in the +modules.conf manual page. + + On systems utilizing modprobe.conf (or modprobe.conf.local), +an equivalent problem can occur. In this case, the following can be +added to modprobe.conf (or modprobe.conf.local, as appropriate), as +follows (all on one line; it has been split here for clarity): + +install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; + /sbin/modprobe --ignore-install bonding + + This will, when loading the bonding module, rather than +performing the normal action, instead execute the provided command. +This command loads the device drivers in the order needed, then calls +modprobe with --ignore-install to cause the normal action to then take +place. Full documentation on this can be found in the modprobe.conf +and modprobe manual pages. + +8.3. Painfully Slow Or No Failed Link Detection By Miimon +--------------------------------------------------------- + + By default, bonding enables the use_carrier option, which +instructs bonding to trust the driver to maintain carrier state. + + As discussed in the options section, above, some drivers do +not support the netif_carrier_on/_off link state tracking system. +With use_carrier enabled, bonding will always see these links as up, +regardless of their actual state. + + Additionally, other drivers do support netif_carrier, but do +not maintain it in real time, e.g., only polling the link state at +some fixed interval. In this case, miimon will detect failures, but +only after some long period of time has expired. If it appears that +miimon is very slow in detecting link failures, try specifying +use_carrier=0 to see if that improves the failure detection time. If +it does, then it may be that the driver checks the carrier state at a +fixed interval, but does not cache the MII register values (so the +use_carrier=0 method of querying the registers directly works). If +use_carrier=0 does not improve the failover, then the driver may cache +the registers, or the problem may be elsewhere. + + Also, remember that miimon only checks for the device's +carrier state. It has no way to determine the state of devices on or +beyond other ports of a switch, or if a switch is refusing to pass +traffic while still maintaining carrier on. + +9. SNMP agents +=============== + + If running SNMP agents, the bonding driver should be loaded +before any network drivers participating in a bond. This requirement +is due to the interface index (ipAdEntIfIndex) being associated to +the first interface found with a given IP address. That is, there is +only one ipAdEntIfIndex for each IP address. For example, if eth0 and +eth1 are slaves of bond0 and the driver for eth0 is loaded before the +bonding driver, the interface for the IP address will be associated +with the eth0 interface. This configuration is shown below, the IP +address 192.168.1.1 has an interface index of 2 which indexes to eth0 +in the ifDescr table (ifDescr.2). + + interfaces.ifTable.ifEntry.ifDescr.1 = lo + interfaces.ifTable.ifEntry.ifDescr.2 = eth0 + interfaces.ifTable.ifEntry.ifDescr.3 = eth1 + interfaces.ifTable.ifEntry.ifDescr.4 = eth2 + interfaces.ifTable.ifEntry.ifDescr.5 = eth3 + interfaces.ifTable.ifEntry.ifDescr.6 = bond0 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1 + + This problem is avoided by loading the bonding driver before +any network drivers participating in a bond. Below is an example of +loading the bonding driver first, the IP address 192.168.1.1 is +correctly associated with ifDescr.2. + + interfaces.ifTable.ifEntry.ifDescr.1 = lo + interfaces.ifTable.ifEntry.ifDescr.2 = bond0 + interfaces.ifTable.ifEntry.ifDescr.3 = eth0 + interfaces.ifTable.ifEntry.ifDescr.4 = eth1 + interfaces.ifTable.ifEntry.ifDescr.5 = eth2 + interfaces.ifTable.ifEntry.ifDescr.6 = eth3 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5 + ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1 + + While some distributions may not report the interface name in +ifDescr, the association between the IP address and IfIndex remains +and SNMP functions such as Interface_Scan_Next will report that +association. + +10. Promiscuous mode +==================== + + When running network monitoring tools, e.g., tcpdump, it is +common to enable promiscuous mode on the device, so that all traffic +is seen (instead of seeing only traffic destined for the local host). +The bonding driver handles promiscuous mode changes to the bonding +master device (e.g., bond0), and propagates the setting to the slave +devices. + + For the balance-rr, balance-xor, broadcast, and 802.3ad modes, +the promiscuous mode setting is propagated to all slaves. + + For the active-backup, balance-tlb and balance-alb modes, the +promiscuous mode setting is propagated only to the active slave. + + For balance-tlb mode, the active slave is the slave currently +receiving inbound traffic. + + For balance-alb mode, the active slave is the slave used as a +"primary." This slave is used for mode-specific control traffic, for +sending to peers that are unassigned or if the load is unbalanced. + + For the active-backup, balance-tlb and balance-alb modes, when +the active slave changes (e.g., due to a link failure), the +promiscuous setting will be propagated to the new active slave. + +11. Configuring Bonding for High Availability +============================================= + + High Availability refers to configurations that provide +maximum network availability by having redundant or backup devices, +links or switches between the host and the rest of the world. The +goal is to provide the maximum availability of network connectivity +(i.e., the network always works), even though other configurations +could provide higher throughput. + +11.1 High Availability in a Single Switch Topology +-------------------------------------------------- + + If two hosts (or a host and a single switch) are directly +connected via multiple physical links, then there is no availability +penalty to optimizing for maximum bandwidth. In this case, there is +only one switch (or peer), so if it fails, there is no alternative +access to fail over to. Additionally, the bonding load balance modes +support link monitoring of their members, so if individual links fail, +the load will be rebalanced across the remaining devices. + + See Section 13, "Configuring Bonding for Maximum Throughput" +for information on configuring bonding with one peer device. + +11.2 High Availability in a Multiple Switch Topology +---------------------------------------------------- + + With multiple switches, the configuration of bonding and the +network changes dramatically. In multiple switch topologies, there is +a trade off between network availability and usable bandwidth. + + Below is a sample network, configured to maximize the +availability of the network: + + | | + |port3 port3| + +-----+----+ +-----+----+ + | |port2 ISL port2| | + | switch A +--------------------------+ switch B | + | | | | + +-----+----+ +-----++---+ + |port1 port1| + | +-------+ | + +-------------+ host1 +---------------+ + eth0 +-------+ eth1 + + In this configuration, there is a link between the two +switches (ISL, or inter switch link), and multiple ports connecting to +the outside world ("port3" on each switch). There is no technical +reason that this could not be extended to a third switch. + +11.2.1 HA Bonding Mode Selection for Multiple Switch Topology +------------------------------------------------------------- + + In a topology such as the example above, the active-backup and +broadcast modes are the only useful bonding modes when optimizing for +availability; the other modes require all links to terminate on the +same peer for them to behave rationally. + +active-backup: This is generally the preferred mode, particularly if + the switches have an ISL and play together well. If the + network configuration is such that one switch is specifically + a backup switch (e.g., has lower capacity, higher cost, etc), + then the primary option can be used to insure that the + preferred link is always used when it is available. + +broadcast: This mode is really a special purpose mode, and is suitable + only for very specific needs. For example, if the two + switches are not connected (no ISL), and the networks beyond + them are totally independent. In this case, if it is + necessary for some specific one-way traffic to reach both + independent networks, then the broadcast mode may be suitable. + +11.2.2 HA Link Monitoring Selection for Multiple Switch Topology +---------------------------------------------------------------- + + The choice of link monitoring ultimately depends upon your +switch. If the switch can reliably fail ports in response to other +failures, then either the MII or ARP monitors should work. For +example, in the above example, if the "port3" link fails at the remote +end, the MII monitor has no direct means to detect this. The ARP +monitor could be configured with a target at the remote end of port3, +thus detecting that failure without switch support. + + In general, however, in a multiple switch topology, the ARP +monitor can provide a higher level of reliability in detecting end to +end connectivity failures (which may be caused by the failure of any +individual component to pass traffic for any reason). Additionally, +the ARP monitor should be configured with multiple targets (at least +one for each switch in the network). This will insure that, +regardless of which switch is active, the ARP monitor has a suitable +target to query. + + Note, also, that of late many switches now support a functionality +generally referred to as "trunk failover." This is a feature of the +switch that causes the link state of a particular switch port to be set +down (or up) when the state of another switch port goes down (or up). +It's purpose is to propogate link failures from logically "exterior" ports +to the logically "interior" ports that bonding is able to monitor via +miimon. Availability and configuration for trunk failover varies by +switch, but this can be a viable alternative to the ARP monitor when using +suitable switches. + +12. Configuring Bonding for Maximum Throughput +============================================== + +12.1 Maximizing Throughput in a Single Switch Topology +------------------------------------------------------ + + In a single switch configuration, the best method to maximize +throughput depends upon the application and network environment. The +various load balancing modes each have strengths and weaknesses in +different environments, as detailed below. + + For this discussion, we will break down the topologies into +two categories. Depending upon the destination of most traffic, we +categorize them into either "gatewayed" or "local" configurations. + + In a gatewayed configuration, the "switch" is acting primarily +as a router, and the majority of traffic passes through this router to +other networks. An example would be the following: + + + +----------+ +----------+ + | |eth0 port1| | to other networks + | Host A +---------------------+ router +-------------------> + | +---------------------+ | Hosts B and C are out + | |eth1 port2| | here somewhere + +----------+ +----------+ + + The router may be a dedicated router device, or another host +acting as a gateway. For our discussion, the important point is that +the majority of traffic from Host A will pass through the router to +some other network before reaching its final destination. + + In a gatewayed network configuration, although Host A may +communicate with many other systems, all of its traffic will be sent +and received via one other peer on the local network, the router. + + Note that the case of two systems connected directly via +multiple physical links is, for purposes of configuring bonding, the +same as a gatewayed configuration. In that case, it happens that all +traffic is destined for the "gateway" itself, not some other network +beyond the gateway. + + In a local configuration, the "switch" is acting primarily as +a switch, and the majority of traffic passes through this switch to +reach other stations on the same network. An example would be the +following: + + +----------+ +----------+ +--------+ + | |eth0 port1| +-------+ Host B | + | Host A +------------+ switch |port3 +--------+ + | +------------+ | +--------+ + | |eth1 port2| +------------------+ Host C | + +----------+ +----------+port4 +--------+ + + + Again, the switch may be a dedicated switch device, or another +host acting as a gateway. For our discussion, the important point is +that the majority of traffic from Host A is destined for other hosts +on the same local network (Hosts B and C in the above example). + + In summary, in a gatewayed configuration, traffic to and from +the bonded device will be to the same MAC level peer on the network +(the gateway itself, i.e., the router), regardless of its final +destination. In a local configuration, traffic flows directly to and +from the final destinations, thus, each destination (Host B, Host C) +will be addressed directly by their individual MAC addresses. + + This distinction between a gatewayed and a local network +configuration is important because many of the load balancing modes +available use the MAC addresses of the local network source and +destination to make load balancing decisions. The behavior of each +mode is described below. + + +12.1.1 MT Bonding Mode Selection for Single Switch Topology +----------------------------------------------------------- + + This configuration is the easiest to set up and to understand, +although you will have to decide which bonding mode best suits your +needs. The trade offs for each mode are detailed below: + +balance-rr: This mode is the only mode that will permit a single + TCP/IP connection to stripe traffic across multiple + interfaces. It is therefore the only mode that will allow a + single TCP/IP stream to utilize more than one interface's + worth of throughput. This comes at a cost, however: the + striping generally results in peer systems receiving packets out + of order, causing TCP/IP's congestion control system to kick + in, often by retransmitting segments. + + It is possible to adjust TCP/IP's congestion limits by + altering the net.ipv4.tcp_reordering sysctl parameter. The + usual default value is 3, and the maximum useful value is 127. + For a four interface balance-rr bond, expect that a single + TCP/IP stream will utilize no more than approximately 2.3 + interface's worth of throughput, even after adjusting + tcp_reordering. + + Note that the fraction of packets that will be delivered out of + order is highly variable, and is unlikely to be zero. The level + of reordering depends upon a variety of factors, including the + networking interfaces, the switch, and the topology of the + configuration. Speaking in general terms, higher speed network + cards produce more reordering (due to factors such as packet + coalescing), and a "many to many" topology will reorder at a + higher rate than a "many slow to one fast" configuration. + + Many switches do not support any modes that stripe traffic + (instead choosing a port based upon IP or MAC level addresses); + for those devices, traffic for a particular connection flowing + through the switch to a balance-rr bond will not utilize greater + than one interface's worth of bandwidth. + + If you are utilizing protocols other than TCP/IP, UDP for + example, and your application can tolerate out of order + delivery, then this mode can allow for single stream datagram + performance that scales near linearly as interfaces are added + to the bond. + + This mode requires the switch to have the appropriate ports + configured for "etherchannel" or "trunking." + +active-backup: There is not much advantage in this network topology to + the active-backup mode, as the inactive backup devices are all + connected to the same peer as the primary. In this case, a + load balancing mode (with link monitoring) will provide the + same level of network availability, but with increased + available bandwidth. On the plus side, active-backup mode + does not require any configuration of the switch, so it may + have value if the hardware available does not support any of + the load balance modes. + +balance-xor: This mode will limit traffic such that packets destined + for specific peers will always be sent over the same + interface. Since the destination is determined by the MAC + addresses involved, this mode works best in a "local" network + configuration (as described above), with destinations all on + the same local network. This mode is likely to be suboptimal + if all your traffic is passed through a single router (i.e., a + "gatewayed" network configuration, as described above). + + As with balance-rr, the switch ports need to be configured for + "etherchannel" or "trunking." + +broadcast: Like active-backup, there is not much advantage to this + mode in this type of network topology. + +802.3ad: This mode can be a good choice for this type of network + topology. The 802.3ad mode is an IEEE standard, so all peers + that implement 802.3ad should interoperate well. The 802.3ad + protocol includes automatic configuration of the aggregates, + so minimal manual configuration of the switch is needed + (typically only to designate that some set of devices is + available for 802.3ad). The 802.3ad standard also mandates + that frames be delivered in order (within certain limits), so + in general single connections will not see misordering of + packets. The 802.3ad mode does have some drawbacks: the + standard mandates that all devices in the aggregate operate at + the same speed and duplex. Also, as with all bonding load + balance modes other than balance-rr, no single connection will + be able to utilize more than a single interface's worth of + bandwidth. + + Additionally, the linux bonding 802.3ad implementation + distributes traffic by peer (using an XOR of MAC addresses), + so in a "gatewayed" configuration, all outgoing traffic will + generally use the same device. Incoming traffic may also end + up on a single device, but that is dependent upon the + balancing policy of the peer's 8023.ad implementation. In a + "local" configuration, traffic will be distributed across the + devices in the bond. + + Finally, the 802.3ad mode mandates the use of the MII monitor, + therefore, the ARP monitor is not available in this mode. + +balance-tlb: The balance-tlb mode balances outgoing traffic by peer. + Since the balancing is done according to MAC address, in a + "gatewayed" configuration (as described above), this mode will + send all traffic across a single device. However, in a + "local" network configuration, this mode balances multiple + local network peers across devices in a vaguely intelligent + manner (not a simple XOR as in balance-xor or 802.3ad mode), + so that mathematically unlucky MAC addresses (i.e., ones that + XOR to the same value) will not all "bunch up" on a single + interface. + + Unlike 802.3ad, interfaces may be of differing speeds, and no + special switch configuration is required. On the down side, + in this mode all incoming traffic arrives over a single + interface, this mode requires certain ethtool support in the + network device driver of the slave interfaces, and the ARP + monitor is not available. + +balance-alb: This mode is everything that balance-tlb is, and more. + It has all of the features (and restrictions) of balance-tlb, + and will also balance incoming traffic from local network + peers (as described in the Bonding Module Options section, + above). + + The only additional down side to this mode is that the network + device driver must support changing the hardware address while + the device is open. + +12.1.2 MT Link Monitoring for Single Switch Topology +---------------------------------------------------- + + The choice of link monitoring may largely depend upon which +mode you choose to use. The more advanced load balancing modes do not +support the use of the ARP monitor, and are thus restricted to using +the MII monitor (which does not provide as high a level of end to end +assurance as the ARP monitor). + +12.2 Maximum Throughput in a Multiple Switch Topology +----------------------------------------------------- + + Multiple switches may be utilized to optimize for throughput +when they are configured in parallel as part of an isolated network +between two or more systems, for example: + + +-----------+ + | Host A | + +-+---+---+-+ + | | | + +--------+ | +---------+ + | | | + +------+---+ +-----+----+ +-----+----+ + | Switch A | | Switch B | | Switch C | + +------+---+ +-----+----+ +-----+----+ + | | | + +--------+ | +---------+ + | | | + +-+---+---+-+ + | Host B | + +-----------+ + + In this configuration, the switches are isolated from one +another. One reason to employ a topology such as this is for an +isolated network with many hosts (a cluster configured for high +performance, for example), using multiple smaller switches can be more +cost effective than a single larger switch, e.g., on a network with 24 +hosts, three 24 port switches can be significantly less expensive than +a single 72 port switch. + + If access beyond the network is required, an individual host +can be equipped with an additional network device connected to an +external network; this host then additionally acts as a gateway. + +12.2.1 MT Bonding Mode Selection for Multiple Switch Topology +------------------------------------------------------------- + + In actual practice, the bonding mode typically employed in +configurations of this type is balance-rr. Historically, in this +network configuration, the usual caveats about out of order packet +delivery are mitigated by the use of network adapters that do not do +any kind of packet coalescing (via the use of NAPI, or because the +device itself does not generate interrupts until some number of +packets has arrived). When employed in this fashion, the balance-rr +mode allows individual connections between two hosts to effectively +utilize greater than one interface's bandwidth. + +12.2.2 MT Link Monitoring for Multiple Switch Topology +------------------------------------------------------ + + Again, in actual practice, the MII monitor is most often used +in this configuration, as performance is given preference over +availability. The ARP monitor will function in this topology, but its +advantages over the MII monitor are mitigated by the volume of probes +needed as the number of systems involved grows (remember that each +host in the network is configured with bonding). + +13. Switch Behavior Issues +========================== + +13.1 Link Establishment and Failover Delays +------------------------------------------- + + Some switches exhibit undesirable behavior with regard to the +timing of link up and down reporting by the switch. + + First, when a link comes up, some switches may indicate that +the link is up (carrier available), but not pass traffic over the +interface for some period of time. This delay is typically due to +some type of autonegotiation or routing protocol, but may also occur +during switch initialization (e.g., during recovery after a switch +failure). If you find this to be a problem, specify an appropriate +value to the updelay bonding module option to delay the use of the +relevant interface(s). + + Second, some switches may "bounce" the link state one or more +times while a link is changing state. This occurs most commonly while +the switch is initializing. Again, an appropriate updelay value may +help. + + Note that when a bonding interface has no active links, the +driver will immediately reuse the first link that goes up, even if the +updelay parameter has been specified (the updelay is ignored in this +case). If there are slave interfaces waiting for the updelay timeout +to expire, the interface that first went into that state will be +immediately reused. This reduces down time of the network if the +value of updelay has been overestimated, and since this occurs only in +cases with no connectivity, there is no additional penalty for +ignoring the updelay. + + In addition to the concerns about switch timings, if your +switches take a long time to go into backup mode, it may be desirable +to not activate a backup interface immediately after a link goes down. +Failover may be delayed via the downdelay bonding module option. + +13.2 Duplicated Incoming Packets +-------------------------------- + + NOTE: Starting with version 3.0.2, the bonding driver has logic to +suppress duplicate packets, which should largely eliminate this problem. +The following description is kept for reference. + + It is not uncommon to observe a short burst of duplicated +traffic when the bonding device is first used, or after it has been +idle for some period of time. This is most easily observed by issuing +a "ping" to some other host on the network, and noticing that the +output from ping flags duplicates (typically one per slave). + + For example, on a bond in active-backup mode with five slaves +all connected to one switch, the output may appear as follows: + +# ping -n 10.0.4.2 +PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) +64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms +64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms +64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms + + This is not due to an error in the bonding driver, rather, it +is a side effect of how many switches update their MAC forwarding +tables. Initially, the switch does not associate the MAC address in +the packet with a particular switch port, and so it may send the +traffic to all ports until its MAC forwarding table is updated. Since +the interfaces attached to the bond may occupy multiple ports on a +single switch, when the switch (temporarily) floods the traffic to all +ports, the bond device receives multiple copies of the same packet +(one per slave device). + + The duplicated packet behavior is switch dependent, some +switches exhibit this, and some do not. On switches that display this +behavior, it can be induced by clearing the MAC forwarding table (on +most Cisco switches, the privileged command "clear mac address-table +dynamic" will accomplish this). + +14. Hardware Specific Considerations +==================================== + + This section contains additional information for configuring +bonding on specific hardware platforms, or for interfacing bonding +with particular switches or other devices. + +14.1 IBM BladeCenter +-------------------- + + This applies to the JS20 and similar systems. + + On the JS20 blades, the bonding driver supports only +balance-rr, active-backup, balance-tlb and balance-alb modes. This is +largely due to the network topology inside the BladeCenter, detailed +below. + +JS20 network adapter information +-------------------------------- + + All JS20s come with two Broadcom Gigabit Ethernet ports +integrated on the planar (that's "motherboard" in IBM-speak). In the +BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to +I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. +An add-on Broadcom daughter card can be installed on a JS20 to provide +two more Gigabit Ethernet ports. These ports, eth2 and eth3, are +wired to I/O Modules 3 and 4, respectively. + + Each I/O Module may contain either a switch or a passthrough +module (which allows ports to be directly connected to an external +switch). Some bonding modes require a specific BladeCenter internal +network topology in order to function; these are detailed below. + + Additional BladeCenter-specific networking information can be +found in two IBM Redbooks (www.ibm.com/redbooks): + +"IBM eServer BladeCenter Networking Options" +"IBM eServer BladeCenter Layer 2-7 Network Switching" + +BladeCenter networking configuration +------------------------------------ + + Because a BladeCenter can be configured in a very large number +of ways, this discussion will be confined to describing basic +configurations. + + Normally, Ethernet Switch Modules (ESMs) are used in I/O +modules 1 and 2. In this configuration, the eth0 and eth1 ports of a +JS20 will be connected to different internal switches (in the +respective I/O modules). + + A passthrough module (OPM or CPM, optical or copper, +passthrough module) connects the I/O module directly to an external +switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 +interfaces of a JS20 can be redirected to the outside world and +connected to a common external switch. + + Depending upon the mix of ESMs and PMs, the network will +appear to bonding as either a single switch topology (all PMs) or as a +multiple switch topology (one or more ESMs, zero or more PMs). It is +also possible to connect ESMs together, resulting in a configuration +much like the example in "High Availability in a Multiple Switch +Topology," above. + +Requirements for specific modes +------------------------------- + + The balance-rr mode requires the use of passthrough modules +for devices in the bond, all connected to an common external switch. +That switch must be configured for "etherchannel" or "trunking" on the +appropriate ports, as is usual for balance-rr. + + The balance-alb and balance-tlb modes will function with +either switch modules or passthrough modules (or a mix). The only +specific requirement for these modes is that all network interfaces +must be able to reach all destinations for traffic sent over the +bonding device (i.e., the network must converge at some point outside +the BladeCenter). + + The active-backup mode has no additional requirements. + +Link monitoring issues +---------------------- + + When an Ethernet Switch Module is in place, only the ARP +monitor will reliably detect link loss to an external switch. This is +nothing unusual, but examination of the BladeCenter cabinet would +suggest that the "external" network ports are the ethernet ports for +the system, when it fact there is a switch between these "external" +ports and the devices on the JS20 system itself. The MII monitor is +only able to detect link failures between the ESM and the JS20 system. + + When a passthrough module is in place, the MII monitor does +detect failures to the "external" port, which is then directly +connected to the JS20 system. + +Other concerns +-------------- + + The Serial Over LAN (SoL) link is established over the primary +ethernet (eth0) only, therefore, any loss of link to eth0 will result +in losing your SoL connection. It will not fail over with other +network traffic, as the SoL system is beyond the control of the +bonding driver. + + It may be desirable to disable spanning tree on the switch +(either the internal Ethernet Switch Module, or an external switch) to +avoid fail-over delay issues when using bonding. + + +15. Frequently Asked Questions +============================== + +1. Is it SMP safe? + + Yes. The old 2.0.xx channel bonding patch was not SMP safe. +The new driver was designed to be SMP safe from the start. + +2. What type of cards will work with it? + + Any Ethernet type cards (you can even mix cards - a Intel +EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, +devices need not be of the same speed. + + Starting with version 3.2.1, bonding also supports Infiniband +slaves in active-backup mode. + +3. How many bonding devices can I have? + + There is no limit. + +4. How many slaves can a bonding device have? + + This is limited only by the number of network interfaces Linux +supports and/or the number of network cards you can place in your +system. + +5. What happens when a slave link dies? + + If link monitoring is enabled, then the failing device will be +disabled. The active-backup mode will fail over to a backup link, and +other modes will ignore the failed link. The link will continue to be +monitored, and should it recover, it will rejoin the bond (in whatever +manner is appropriate for the mode). See the sections on High +Availability and the documentation for each mode for additional +information. + + Link monitoring can be enabled via either the miimon or +arp_interval parameters (described in the module parameters section, +above). In general, miimon monitors the carrier state as sensed by +the underlying network device, and the arp monitor (arp_interval) +monitors connectivity to another host on the local network. + + If no link monitoring is configured, the bonding driver will +be unable to detect link failures, and will assume that all links are +always available. This will likely result in lost packets, and a +resulting degradation of performance. The precise performance loss +depends upon the bonding mode and network configuration. + +6. Can bonding be used for High Availability? + + Yes. See the section on High Availability for details. + +7. Which switches/systems does it work with? + + The full answer to this depends upon the desired mode. + + In the basic balance modes (balance-rr and balance-xor), it +works with any system that supports etherchannel (also called +trunking). Most managed switches currently available have such +support, and many unmanaged switches as well. + + The advanced balance modes (balance-tlb and balance-alb) do +not have special switch requirements, but do need device drivers that +support specific features (described in the appropriate section under +module parameters, above). + + In 802.3ad mode, it works with systems that support IEEE +802.3ad Dynamic Link Aggregation. Most managed and many unmanaged +switches currently available support 802.3ad. + + The active-backup mode should work with any Layer-II switch. + +8. Where does a bonding device get its MAC address from? + + When using slave devices that have fixed MAC addresses, or when +the fail_over_mac option is enabled, the bonding device's MAC address is +the MAC address of the active slave. + + For other configurations, if not explicitly configured (with +ifconfig or ip link), the MAC address of the bonding device is taken from +its first slave device. This MAC address is then passed to all following +slaves and remains persistent (even if the first slave is removed) until +the bonding device is brought down or reconfigured. + + If you wish to change the MAC address, you can set it with +ifconfig or ip link: + +# ifconfig bond0 hw ether 00:11:22:33:44:55 + +# ip link set bond0 address 66:77:88:99:aa:bb + + The MAC address can be also changed by bringing down/up the +device and then changing its slaves (or their order): + +# ifconfig bond0 down ; modprobe -r bonding +# ifconfig bond0 .... up +# ifenslave bond0 eth... + + This method will automatically take the address from the next +slave that is added. + + To restore your slaves' MAC addresses, you need to detach them +from the bond (`ifenslave -d bond0 eth0'). The bonding driver will +then restore the MAC addresses that the slaves had before they were +enslaved. + +16. Resources and Links +======================= + +The latest version of the bonding driver can be found in the latest +version of the linux kernel, found on http://kernel.org + +The latest version of this document can be found in either the latest +kernel source (named Documentation/networking/bonding.txt), or on the +bonding sourceforge site: + +http://www.sourceforge.net/projects/bonding + +Discussions regarding the bonding driver take place primarily on the +bonding-devel mailing list, hosted at sourceforge.net. If you have +questions or problems, post them to the list. The list address is: + +bonding-devel@lists.sourceforge.net + + The administrative interface (to subscribe or unsubscribe) can +be found at: + +https://lists.sourceforge.net/lists/listinfo/bonding-devel + +Donald Becker's Ethernet Drivers and diag programs may be found at : + - http://www.scyld.com/network/ + +You will also find a lot of information regarding Ethernet, NWay, MII, +etc. at www.scyld.com. + +-- END -- diff --git a/Documentation/networking/bridge.txt b/Documentation/networking/bridge.txt new file mode 100644 index 0000000..bec69a8 --- /dev/null +++ b/Documentation/networking/bridge.txt @@ -0,0 +1,8 @@ +In order to use the Ethernet bridging functionality, you'll need the +userspace tools. These programs and documentation are available +at http://www.linux-foundation.org/en/Net:Bridge. The download page is +http://prdownloads.sourceforge.net/bridge. + +If you still have questions, don't hesitate to post to the mailing list +(more info http://lists.osdl.org/mailman/listinfo/bridge). + diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt new file mode 100644 index 0000000..2035bc4 --- /dev/null +++ b/Documentation/networking/can.txt @@ -0,0 +1,665 @@ +============================================================================ + +can.txt + +Readme file for the Controller Area Network Protocol Family (aka Socket CAN) + +This file contains + + 1 Overview / What is Socket CAN + + 2 Motivation / Why using the socket API + + 3 Socket CAN concept + 3.1 receive lists + 3.2 local loopback of sent frames + 3.3 network security issues (capabilities) + 3.4 network problem notifications + + 4 How to use Socket CAN + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) + 4.1.1 RAW socket option CAN_RAW_FILTER + 4.1.2 RAW socket option CAN_RAW_ERR_FILTER + 4.1.3 RAW socket option CAN_RAW_LOOPBACK + 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS + 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.3 connected transport protocols (SOCK_SEQPACKET) + 4.4 unconnected transport protocols (SOCK_DGRAM) + + 5 Socket CAN core module + 5.1 can.ko module params + 5.2 procfs content + 5.3 writing own CAN protocol modules + + 6 CAN network drivers + 6.1 general settings + 6.2 local loopback of sent frames + 6.3 CAN controller hardware filters + 6.4 The virtual CAN driver (vcan) + 6.5 currently supported CAN hardware + 6.6 todo + + 7 Credits + +============================================================================ + +1. Overview / What is Socket CAN +-------------------------------- + +The socketcan package is an implementation of CAN protocols +(Controller Area Network) for Linux. CAN is a networking technology +which has widespread use in automation, embedded devices, and +automotive fields. While there have been other CAN implementations +for Linux based on character devices, Socket CAN uses the Berkeley +socket API, the Linux network stack and implements the CAN device +drivers as network interfaces. The CAN socket API has been designed +as similar as possible to the TCP/IP protocols to allow programmers, +familiar with network programming, to easily learn how to use CAN +sockets. + +2. Motivation / Why using the socket API +---------------------------------------- + +There have been CAN implementations for Linux before Socket CAN so the +question arises, why we have started another project. Most existing +implementations come as a device driver for some CAN hardware, they +are based on character devices and provide comparatively little +functionality. Usually, there is only a hardware-specific device +driver which provides a character device interface to send and +receive raw CAN frames, directly to/from the controller hardware. +Queueing of frames and higher-level transport protocols like ISO-TP +have to be implemented in user space applications. Also, most +character-device implementations support only one single process to +open the device at a time, similar to a serial interface. Exchanging +the CAN controller requires employment of another device driver and +often the need for adaption of large parts of the application to the +new driver's API. + +Socket CAN was designed to overcome all of these limitations. A new +protocol family has been implemented which provides a socket interface +to user space applications and which builds upon the Linux network +layer, so to use all of the provided queueing functionality. A device +driver for CAN controller hardware registers itself with the Linux +network layer as a network device, so that CAN frames from the +controller can be passed up to the network layer and on to the CAN +protocol family module and also vice-versa. Also, the protocol family +module provides an API for transport protocol modules to register, so +that any number of transport protocols can be loaded or unloaded +dynamically. In fact, the can core module alone does not provide any +protocol and cannot be used without loading at least one additional +protocol module. Multiple sockets can be opened at the same time, +on different or the same protocol module and they can listen/send +frames on different or the same CAN IDs. Several sockets listening on +the same interface for frames with the same CAN ID are all passed the +same received matching CAN frames. An application wishing to +communicate using a specific transport protocol, e.g. ISO-TP, just +selects that protocol when opening the socket, and then can read and +write application data byte streams, without having to deal with +CAN-IDs, frames, etc. + +Similar functionality visible from user-space could be provided by a +character device, too, but this would lead to a technically inelegant +solution for a couple of reasons: + +* Intricate usage. Instead of passing a protocol argument to + socket(2) and using bind(2) to select a CAN interface and CAN ID, an + application would have to do all these operations using ioctl(2)s. + +* Code duplication. A character device cannot make use of the Linux + network queueing code, so all that code would have to be duplicated + for CAN networking. + +* Abstraction. In most existing character-device implementations, the + hardware-specific device driver for a CAN controller directly + provides the character device for the application to work with. + This is at least very unusual in Unix systems for both, char and + block devices. For example you don't have a character device for a + certain UART of a serial interface, a certain sound chip in your + computer, a SCSI or IDE controller providing access to your hard + disk or tape streamer device. Instead, you have abstraction layers + which provide a unified character or block device interface to the + application on the one hand, and a interface for hardware-specific + device drivers on the other hand. These abstractions are provided + by subsystems like the tty layer, the audio subsystem or the SCSI + and IDE subsystems for the devices mentioned above. + + The easiest way to implement a CAN device driver is as a character + device without such a (complete) abstraction layer, as is done by most + existing drivers. The right way, however, would be to add such a + layer with all the functionality like registering for certain CAN + IDs, supporting several open file descriptors and (de)multiplexing + CAN frames between them, (sophisticated) queueing of CAN frames, and + providing an API for device drivers to register with. However, then + it would be no more difficult, or may be even easier, to use the + networking framework provided by the Linux kernel, and this is what + Socket CAN does. + + The use of the networking framework of the Linux kernel is just the + natural and most appropriate way to implement CAN for Linux. + +3. Socket CAN concept +--------------------- + + As described in chapter 2 it is the main goal of Socket CAN to + provide a socket interface to user space applications which builds + upon the Linux network layer. In contrast to the commonly known + TCP/IP and ethernet networking, the CAN bus is a broadcast-only(!) + medium that has no MAC-layer addressing like ethernet. The CAN-identifier + (can_id) is used for arbitration on the CAN-bus. Therefore the CAN-IDs + have to be chosen uniquely on the bus. When designing a CAN-ECU + network the CAN-IDs are mapped to be sent by a specific ECU. + For this reason a CAN-ID can be treated best as a kind of source address. + + 3.1 receive lists + + The network transparent access of multiple applications leads to the + problem that different applications may be interested in the same + CAN-IDs from the same CAN network interface. The Socket CAN core + module - which implements the protocol family CAN - provides several + high efficient receive lists for this reason. If e.g. a user space + application opens a CAN RAW socket, the raw protocol module itself + requests the (range of) CAN-IDs from the Socket CAN core that are + requested by the user. The subscription and unsubscription of + CAN-IDs can be done for specific CAN interfaces or for all(!) known + CAN interfaces with the can_rx_(un)register() functions provided to + CAN protocol modules by the SocketCAN core (see chapter 5). + To optimize the CPU usage at runtime the receive lists are split up + into several specific lists per device that match the requested + filter complexity for a given use-case. + + 3.2 local loopback of sent frames + + As known from other networking concepts the data exchanging + applications may run on the same or different nodes without any + change (except for the according addressing information): + + ___ ___ ___ _______ ___ + | _ | | _ | | _ | | _ _ | | _ | + ||A|| ||B|| ||C|| ||A| |B|| ||C|| + |___| |___| |___| |_______| |___| + | | | | | + -----------------(1)- CAN bus -(2)--------------- + + To ensure that application A receives the same information in the + example (2) as it would receive in example (1) there is need for + some kind of local loopback of the sent CAN frames on the appropriate + node. + + The Linux network devices (by default) just can handle the + transmission and reception of media dependent frames. Due to the + arbitration on the CAN bus the transmission of a low prio CAN-ID + may be delayed by the reception of a high prio CAN frame. To + reflect the correct* traffic on the node the loopback of the sent + data has to be performed right after a successful transmission. If + the CAN network interface is not capable of performing the loopback for + some reason the SocketCAN core can do this task as a fallback solution. + See chapter 6.2 for details (recommended). + + The loopback functionality is enabled by default to reflect standard + networking behaviour for CAN applications. Due to some requests from + the RT-SocketCAN group the loopback optionally may be disabled for each + separate socket. See sockopts from the CAN RAW sockets in chapter 4.1. + + * = you really like to have this when you're running analyser tools + like 'candump' or 'cansniffer' on the (same) node. + + 3.3 network security issues (capabilities) + + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since the currently implemented CAN_RAW and CAN_BCM sockets can only + send and receive frames to/from CAN interfaces it does not affect + security of others networks to allow all users to access the CAN. + To enable non-root users to access CAN_RAW and CAN_BCM protocol + sockets the Kconfig options CAN_RAW_USER and/or CAN_BCM_USER may be + selected at kernel compile time. + + 3.4 network problem notifications + + The use of the CAN bus may lead to several problems on the physical + and media access control layer. Detecting and logging of these lower + layer problems is a vital requirement for CAN users to identify + hardware issues on the physical transceiver layer as well as + arbitration problems and error frames caused by the different + ECUs. The occurrence of detected errors are important for diagnosis + and have to be logged together with the exact timestamp. For this + reason the CAN interface driver can generate so called Error Frames + that can optionally be passed to the user application in the same + way as other CAN frames. Whenever an error on the physical layer + or the MAC layer is detected (e.g. by the CAN controller) the driver + creates an appropriate error frame. Error frames can be requested by + the user application using the common CAN filter mechanisms. Inside + this filter definition the (interested) type of errors may be + selected. The reception of error frames is disabled by default. + +4. How to use Socket CAN +------------------------ + + Like TCP/IP, you first need to open a socket for communicating over a + CAN network. Since Socket CAN implements a new protocol family, you + need to pass PF_CAN as the first argument to the socket(2) system + call. Currently, there are two CAN protocols to choose from, the raw + socket protocol and the broadcast manager (BCM). So to open a socket, + you would write + + s = socket(PF_CAN, SOCK_RAW, CAN_RAW); + + and + + s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM); + + respectively. After the successful creation of the socket, you would + normally use the bind(2) system call to bind the socket to a CAN + interface (which is different from TCP/IP due to different addressing + - see chapter 3). After binding (CAN_RAW) or connecting (CAN_BCM) + the socket, you can read(2) and write(2) from/to the socket or use + send(2), sendto(2), sendmsg(2) and the recv* counterpart operations + on the socket as usual. There are also CAN specific socket options + described below. + + The basic CAN frame structure and the sockaddr structure are defined + in include/linux/can.h: + + struct can_frame { + canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */ + __u8 can_dlc; /* data length code: 0 .. 8 */ + __u8 data[8] __attribute__((aligned(8))); + }; + + The alignment of the (linear) payload data[] to a 64bit boundary + allows the user to define own structs and unions to easily access the + CAN payload. There is no given byteorder on the CAN bus by + default. A read(2) system call on a CAN_RAW socket transfers a + struct can_frame to the user space. + + The sockaddr_can structure has an interface index like the + PF_PACKET socket, that also binds to a specific interface: + + struct sockaddr_can { + sa_family_t can_family; + int can_ifindex; + union { + /* transport protocol class address info (e.g. ISOTP) */ + struct { canid_t rx_id, tx_id; } tp; + + /* reserved for future CAN protocols address information */ + } can_addr; + }; + + To determine the interface index an appropriate ioctl() has to + be used (example for CAN_RAW sockets without error checking): + + int s; + struct sockaddr_can addr; + struct ifreq ifr; + + s = socket(PF_CAN, SOCK_RAW, CAN_RAW); + + strcpy(ifr.ifr_name, "can0" ); + ioctl(s, SIOCGIFINDEX, &ifr); + + addr.can_family = AF_CAN; + addr.can_ifindex = ifr.ifr_ifindex; + + bind(s, (struct sockaddr *)&addr, sizeof(addr)); + + (..) + + To bind a socket to all(!) CAN interfaces the interface index must + be 0 (zero). In this case the socket receives CAN frames from every + enabled CAN interface. To determine the originating CAN interface + the system call recvfrom(2) may be used instead of read(2). To send + on a socket that is bound to 'any' interface sendto(2) is needed to + specify the outgoing interface. + + Reading CAN frames from a bound CAN_RAW socket (see above) consists + of reading a struct can_frame: + + struct can_frame frame; + + nbytes = read(s, &frame, sizeof(struct can_frame)); + + if (nbytes < 0) { + perror("can raw socket read"); + return 1; + } + + /* paraniod check ... */ + if (nbytes < sizeof(struct can_frame)) { + fprintf(stderr, "read: incomplete CAN frame\n"); + return 1; + } + + /* do something with the received CAN frame */ + + Writing CAN frames can be done similarly, with the write(2) system call: + + nbytes = write(s, &frame, sizeof(struct can_frame)); + + When the CAN interface is bound to 'any' existing CAN interface + (addr.can_ifindex = 0) it is recommended to use recvfrom(2) if the + information about the originating CAN interface is needed: + + struct sockaddr_can addr; + struct ifreq ifr; + socklen_t len = sizeof(addr); + struct can_frame frame; + + nbytes = recvfrom(s, &frame, sizeof(struct can_frame), + 0, (struct sockaddr*)&addr, &len); + + /* get interface name of the received CAN frame */ + ifr.ifr_ifindex = addr.can_ifindex; + ioctl(s, SIOCGIFNAME, &ifr); + printf("Received a CAN frame from interface %s", ifr.ifr_name); + + To write CAN frames on sockets bound to 'any' CAN interface the + outgoing interface has to be defined certainly. + + strcpy(ifr.ifr_name, "can0"); + ioctl(s, SIOCGIFINDEX, &ifr); + addr.can_ifindex = ifr.ifr_ifindex; + addr.can_family = AF_CAN; + + nbytes = sendto(s, &frame, sizeof(struct can_frame), + 0, (struct sockaddr*)&addr, sizeof(addr)); + + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) + + Using CAN_RAW sockets is extensively comparable to the commonly + known access to CAN character devices. To meet the new possibilities + provided by the multi user SocketCAN approach, some reasonable + defaults are set at RAW socket binding time: + + - The filters are set to exactly one filter receiving everything + - The socket only receives valid data frames (=> no error frames) + - The loopback of sent CAN frames is enabled (see chapter 3.2) + - The socket does not receive its own sent frames (in loopback mode) + + These default settings may be changed before or after binding the socket. + To use the referenced definitions of the socket options for CAN_RAW + sockets, include <linux/can/raw.h>. + + 4.1.1 RAW socket option CAN_RAW_FILTER + + The reception of CAN frames using CAN_RAW sockets can be controlled + by defining 0 .. n filters with the CAN_RAW_FILTER socket option. + + The CAN filter structure is defined in include/linux/can.h: + + struct can_filter { + canid_t can_id; + canid_t can_mask; + }; + + A filter matches, when + + <received_can_id> & mask == can_id & mask + + which is analogous to known CAN controllers hardware filter semantics. + The filter can be inverted in this semantic, when the CAN_INV_FILTER + bit is set in can_id element of the can_filter structure. In + contrast to CAN controller hardware filters the user may set 0 .. n + receive filters for each open socket separately: + + struct can_filter rfilter[2]; + + rfilter[0].can_id = 0x123; + rfilter[0].can_mask = CAN_SFF_MASK; + rfilter[1].can_id = 0x200; + rfilter[1].can_mask = 0x700; + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, &rfilter, sizeof(rfilter)); + + To disable the reception of CAN frames on the selected CAN_RAW socket: + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, NULL, 0); + + To set the filters to zero filters is quite obsolete as not read + data causes the raw socket to discard the received CAN frames. But + having this 'send only' use-case we may remove the receive list in the + Kernel to save a little (really a very little!) CPU usage. + + 4.1.2 RAW socket option CAN_RAW_ERR_FILTER + + As described in chapter 3.4 the CAN interface driver can generate so + called Error Frames that can optionally be passed to the user + application in the same way as other CAN frames. The possible + errors are divided into different error classes that may be filtered + using the appropriate error mask. To register for every possible + error condition CAN_ERR_MASK can be used as value for the error mask. + The values for the error mask are defined in linux/can/error.h . + + can_err_mask_t err_mask = ( CAN_ERR_TX_TIMEOUT | CAN_ERR_BUSOFF ); + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER, + &err_mask, sizeof(err_mask)); + + 4.1.3 RAW socket option CAN_RAW_LOOPBACK + + To meet multi user needs the local loopback is enabled by default + (see chapter 3.2 for details). But in some embedded use-cases + (e.g. when only one application uses the CAN bus) this loopback + functionality can be disabled (separately for each socket): + + int loopback = 0; /* 0 = disabled, 1 = enabled (default) */ + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_LOOPBACK, &loopback, sizeof(loopback)); + + 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS + + When the local loopback is enabled, all the sent CAN frames are + looped back to the open CAN sockets that registered for the CAN + frames' CAN-ID on this given interface to meet the multi user + needs. The reception of the CAN frames on the same socket that was + sending the CAN frame is assumed to be unwanted and therefore + disabled by default. This default behaviour may be changed on + demand: + + int recv_own_msgs = 1; /* 0 = disabled (default), 1 = enabled */ + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, + &recv_own_msgs, sizeof(recv_own_msgs)); + + 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.3 connected transport protocols (SOCK_SEQPACKET) + 4.4 unconnected transport protocols (SOCK_DGRAM) + + +5. Socket CAN core module +------------------------- + + The Socket CAN core module implements the protocol family + PF_CAN. CAN protocol modules are loaded by the core module at + runtime. The core module provides an interface for CAN protocol + modules to subscribe needed CAN IDs (see chapter 3.1). + + 5.1 can.ko module params + + - stats_timer: To calculate the Socket CAN core statistics + (e.g. current/maximum frames per second) this 1 second timer is + invoked at can.ko module start time by default. This timer can be + disabled by using stattimer=0 on the module commandline. + + - debug: (removed since SocketCAN SVN r546) + + 5.2 procfs content + + As described in chapter 3.1 the Socket CAN core uses several filter + lists to deliver received CAN frames to CAN protocol modules. These + receive lists, their filters and the count of filter matches can be + checked in the appropriate receive list. All entries contain the + device and a protocol module identifier: + + foo@bar:~$ cat /proc/net/can/rcvlist_all + + receive list 'rx_all': + (vcan3: no entry) + (vcan2: no entry) + (vcan1: no entry) + device can_id can_mask function userdata matches ident + vcan0 000 00000000 f88e6370 f6c6f400 0 raw + (any: no entry) + + In this example an application requests any CAN traffic from vcan0. + + rcvlist_all - list for unfiltered entries (no filter operations) + rcvlist_eff - list for single extended frame (EFF) entries + rcvlist_err - list for error frames masks + rcvlist_fil - list for mask/value filters + rcvlist_inv - list for mask/value filters (inverse semantic) + rcvlist_sff - list for single standard frame (SFF) entries + + Additional procfs files in /proc/net/can + + stats - Socket CAN core statistics (rx/tx frames, match ratios, ...) + reset_stats - manual statistic reset + version - prints the Socket CAN core version and the ABI version + + 5.3 writing own CAN protocol modules + + To implement a new protocol in the protocol family PF_CAN a new + protocol has to be defined in include/linux/can.h . + The prototypes and definitions to use the Socket CAN core can be + accessed by including include/linux/can/core.h . + In addition to functions that register the CAN protocol and the + CAN device notifier chain there are functions to subscribe CAN + frames received by CAN interfaces and to send CAN frames: + + can_rx_register - subscribe CAN frames from a specific interface + can_rx_unregister - unsubscribe CAN frames from a specific interface + can_send - transmit a CAN frame (optional with local loopback) + + For details see the kerneldoc documentation in net/can/af_can.c or + the source code of net/can/raw.c or net/can/bcm.c . + +6. CAN network drivers +---------------------- + + Writing a CAN network device driver is much easier than writing a + CAN character device driver. Similar to other known network device + drivers you mainly have to deal with: + + - TX: Put the CAN frame from the socket buffer to the CAN controller. + - RX: Put the CAN frame from the CAN controller to the socket buffer. + + See e.g. at Documentation/networking/netdevices.txt . The differences + for writing CAN network device driver are described below: + + 6.1 general settings + + dev->type = ARPHRD_CAN; /* the netdevice hardware type */ + dev->flags = IFF_NOARP; /* CAN has no arp */ + + dev->mtu = sizeof(struct can_frame); + + The struct can_frame is the payload of each socket buffer in the + protocol family PF_CAN. + + 6.2 local loopback of sent frames + + As described in chapter 3.2 the CAN network device driver should + support a local loopback functionality similar to the local echo + e.g. of tty devices. In this case the driver flag IFF_ECHO has to be + set to prevent the PF_CAN core from locally echoing sent frames + (aka loopback) as fallback solution: + + dev->flags = (IFF_NOARP | IFF_ECHO); + + 6.3 CAN controller hardware filters + + To reduce the interrupt load on deep embedded systems some CAN + controllers support the filtering of CAN IDs or ranges of CAN IDs. + These hardware filter capabilities vary from controller to + controller and have to be identified as not feasible in a multi-user + networking approach. The use of the very controller specific + hardware filters could make sense in a very dedicated use-case, as a + filter on driver level would affect all users in the multi-user + system. The high efficient filter sets inside the PF_CAN core allow + to set different multiple filters for each socket separately. + Therefore the use of hardware filters goes to the category 'handmade + tuning on deep embedded systems'. The author is running a MPC603e + @133MHz with four SJA1000 CAN controllers from 2002 under heavy bus + load without any problems ... + + 6.4 The virtual CAN driver (vcan) + + Similar to the network loopback devices, vcan offers a virtual local + CAN interface. A full qualified address on CAN consists of + + - a unique CAN Identifier (CAN ID) + - the CAN bus this CAN ID is transmitted on (e.g. can0) + + so in common use cases more than one virtual CAN interface is needed. + + The virtual CAN interfaces allow the transmission and reception of CAN + frames without real CAN controller hardware. Virtual CAN network + devices are usually named 'vcanX', like vcan0 vcan1 vcan2 ... + When compiled as a module the virtual CAN driver module is called vcan.ko + + Since Linux Kernel version 2.6.24 the vcan driver supports the Kernel + netlink interface to create vcan network devices. The creation and + removal of vcan network devices can be managed with the ip(8) tool: + + - Create a virtual CAN network interface: + ip link add type vcan + + - Create a virtual CAN network interface with a specific name 'vcan42': + ip link add dev vcan42 type vcan + + - Remove a (virtual CAN) network interface 'vcan42': + ip link del vcan42 + + The tool 'vcan' from the SocketCAN SVN repository on BerliOS is obsolete. + + Virtual CAN network device creation in older Kernels: + In Linux Kernel versions < 2.6.24 the vcan driver creates 4 vcan + netdevices at module load time by default. This value can be changed + with the module parameter 'numdev'. E.g. 'modprobe vcan numdev=8' + + 6.5 currently supported CAN hardware + + On the project website http://developer.berlios.de/projects/socketcan + there are different drivers available: + + vcan: Virtual CAN interface driver (if no real hardware is available) + sja1000: Philips SJA1000 CAN controller (recommended) + i82527: Intel i82527 CAN controller + mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200) + ccan: CCAN controller core (e.g. inside SOC h7202) + slcan: For a bunch of CAN adaptors that are attached via a + serial line ASCII protocol (for serial / USB adaptors) + + Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport) + from PEAK Systemtechnik support the CAN netdevice driver model + since Linux driver v6.0: http://www.peak-system.com/linux/index.htm + + Please check the Mailing Lists on the berlios OSS project website. + + 6.6 todo + + The configuration interface for CAN network drivers is still an open + issue that has not been finalized in the socketcan project. Also the + idea of having a library module (candev.ko) that holds functions + that are needed by all CAN netdevices is not ready to ship. + Your contribution is welcome. + +7. Credits +---------- + + Oliver Hartkopp (PF_CAN core, filters, drivers, bcm) + Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan) + Jan Kizka (RT-SocketCAN core, Socket-API reconciliation) + Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews) + Robert Schwebel (design reviews, PTXdist integration) + Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers) + Benedikt Spranger (reviews) + Thomas Gleixner (LKML reviews, coding style, posting hints) + Andrey Volkov (kernel subtree structure, ioctls, mscan driver) + Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003) + Klaus Hitschler (PEAK driver integration) + Uwe Koppe (CAN netdevices with PF_PACKET approach) + Michael Schulze (driver layer loopback requirement, RT CAN drivers review) diff --git a/Documentation/networking/cops.txt b/Documentation/networking/cops.txt new file mode 100644 index 0000000..3e344b4 --- /dev/null +++ b/Documentation/networking/cops.txt @@ -0,0 +1,63 @@ +Text File for the COPS LocalTalk Linux driver (cops.c). + By Jay Schulist <jschlst@samba.org> + +This driver has two modes and they are: Dayna mode and Tangent mode. +Each mode corresponds with the type of card. It has been found +that there are 2 main types of cards and all other cards are +the same and just have different names or only have minor differences +such as more IO ports. As this driver is tested it will +become more clear exactly what cards are supported. + +Right now these cards are known to work with the COPS driver. The +LT-200 cards work in a somewhat more limited capacity than the +DL200 cards, which work very well and are in use by many people. + +TANGENT driver mode: + Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200 +DAYNA driver mode: + Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95, + Farallon PhoneNET PC III, Farallon PhoneNET PC II +Other cards possibly supported mode unknown though: + Dayna DL2000 (Full length) + +The COPS driver defaults to using Dayna mode. To change the driver's +mode if you built a driver with dual support use board_type=1 or +board_type=2 for Dayna or Tangent with insmod. + +** Operation/loading of the driver. +Use modprobe like this: /sbin/modprobe cops.o (IO #) (IRQ #) +If you do not specify any options the driver will try and use the IO = 0x240, +IRQ = 5. As of right now I would only use IRQ 5 for the card, if autoprobing. + +To load multiple COPS driver Localtalk cards you can do one of the following. + +insmod cops io=0x240 irq=5 +insmod -o cops2 cops io=0x260 irq=3 + +Or in lilo.conf put something like this: + append="ether=5,0x240,lt0 ether=3,0x260,lt1" + +Then bring up the interface with ifconfig. It will look something like this: +lt0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-F7-00-00-00-00-00-00-00-00 + inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0 + UP BROADCAST RUNNING NOARP MULTICAST MTU:600 Metric:1 + RX packets:0 errors:0 dropped:0 overruns:0 frame:0 + TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 coll:0 + +** Netatalk Configuration +You will need to configure atalkd with something like the following to make +it work with the cops.c driver. + +* For single LTalk card use. +dummy -seed -phase 2 -net 2000 -addr 2000.10 -zone "1033" +lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033" + +* For multiple cards, Ethernet and LocalTalk. +eth0 -seed -phase 2 -net 3000 -addr 3000.20 -zone "1033" +lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033" + +* For multiple LocalTalk cards, and an Ethernet card. +* Order seems to matter here, Ethernet last. +lt0 -seed -phase 1 -net 1000 -addr 1000.10 -zone "LocalTalk1" +lt1 -seed -phase 1 -net 2000 -addr 2000.20 -zone "LocalTalk2" +eth0 -seed -phase 2 -net 3000 -addr 3000.30 -zone "EtherTalk" diff --git a/Documentation/networking/cs89x0.txt b/Documentation/networking/cs89x0.txt new file mode 100644 index 0000000..c725d33 --- /dev/null +++ b/Documentation/networking/cs89x0.txt @@ -0,0 +1,703 @@ + +NOTE +---- + +This document was contributed by Cirrus Logic for kernel 2.2.5. This version +has been updated for 2.3.48 by Andrew Morton. + +Cirrus make a copy of this driver available at their website, as +described below. In general, you should use the driver version which +comes with your Linux distribution. + + + +CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS +Linux Network Interface Driver ver. 2.00 <kernel 2.3.48> +=============================================================================== + + +TABLE OF CONTENTS + +1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS + 1.1 Product Overview + 1.2 Driver Description + 1.2.1 Driver Name + 1.2.2 File in the Driver Package + 1.3 System Requirements + 1.4 Licensing Information + +2.0 ADAPTER INSTALLATION and CONFIGURATION + 2.1 CS8900-based Adapter Configuration + 2.2 CS8920-based Adapter Configuration + +3.0 LOADING THE DRIVER AS A MODULE + +4.0 COMPILING THE DRIVER + 4.1 Compiling the Driver as a Loadable Module + 4.2 Compiling the driver to support memory mode + 4.3 Compiling the driver to support Rx DMA + 4.4 Compiling the Driver into the Kernel + +5.0 TESTING AND TROUBLESHOOTING + 5.1 Known Defects and Limitations + 5.2 Testing the Adapter + 5.2.1 Diagnostic Self-Test + 5.2.2 Diagnostic Network Test + 5.3 Using the Adapter's LEDs + 5.4 Resolving I/O Conflicts + +6.0 TECHNICAL SUPPORT + 6.1 Contacting Cirrus Logic's Technical Support + 6.2 Information Required Before Contacting Technical Support + 6.3 Obtaining the Latest Driver Version + 6.4 Current maintainer + 6.5 Kernel boot parameters + + +1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS +=============================================================================== + + +1.1 PRODUCT OVERVIEW + +The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow +IEEE 802.3 standards and support half or full-duplex operation in ISA bus +computers on 10 Mbps Ethernet networks. The adapters are designed for operation +in 16-bit ISA or EISA bus expansion slots and are available in +10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5 +or fiber networks). + +CS8920-based adapters are similar to the CS8900-based adapter with additional +features for Plug and Play (PnP) support and Wakeup Frame recognition. As +such, the configuration procedures differ somewhat between the two types of +adapters. Refer to the "Adapter Configuration" section for details on +configuring both types of adapters. + + +1.2 DRIVER DESCRIPTION + +The CS8900/CS8920 Ethernet Adapter driver for Linux supports the Linux +v2.3.48 or greater kernel. It can be compiled directly into the kernel +or loaded at run-time as a device driver module. + +1.2.1 Driver Name: cs89x0 + +1.2.2 Files in the Driver Archive: + +The files in the driver at Cirrus' website include: + + readme.txt - this file + build - batch file to compile cs89x0.c. + cs89x0.c - driver C code + cs89x0.h - driver header file + cs89x0.o - pre-compiled module (for v2.2.5 kernel) + config/Config.in - sample file to include cs89x0 driver in the kernel. + config/Makefile - sample file to include cs89x0 driver in the kernel. + config/Space.c - sample file to include cs89x0 driver in the kernel. + + + +1.3 SYSTEM REQUIREMENTS + +The following hardware is required: + + * Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter + + * IBM or IBM-compatible PC with: + * An 80386 or higher processor + * 16 bytes of contiguous IO space available between 210h - 370h + * One available IRQ (5,10,11,or 12 for the CS8900, 3-7,9-15 for CS8920). + + * Appropriate cable (and connector for AUI, 10BASE-2) for your network + topology. + +The following software is required: + +* LINUX kernel version 2.3.48 or higher + + * CS8900/20 Setup Utility (DOS-based) + + * LINUX kernel sources for your kernel (if compiling into kernel) + + * GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel + or a module) + + + +1.4 LICENSING INFORMATION + +This program is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free Software +Foundation, version 1. + +This program is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +more details. + +For a full copy of the GNU General Public License, write to the Free Software +Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + + + +2.0 ADAPTER INSTALLATION and CONFIGURATION +=============================================================================== + +Both the CS8900 and CS8920-based adapters can be configured using parameters +stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup +Utility if you want to change the adapter's configuration in EEPROM. + +When loading the driver as a module, you can specify many of the adapter's +configuration parameters on the command-line to override the EEPROM's settings +or for interface configuration when an EEPROM is not used. (CS8920-based +adapters must use an EEPROM.) See Section 3.0 LOADING THE DRIVER AS A MODULE. + +Since the CS8900/20 Setup Utility is a DOS-based application, you must install +and configure the adapter in a DOS-based system using the CS8900/20 Setup +Utility before installation in the target LINUX system. (Not required if +installing a CS8900-based adapter and the default configuration is acceptable.) + + +2.1 CS8900-BASED ADAPTER CONFIGURATION + +CS8900-based adapters shipped from Cirrus Logic have been configured +with the following "default" settings: + + Operation Mode: Memory Mode + IRQ: 10 + Base I/O Address: 300 + Memory Base Address: D0000 + Optimization: DOS Client + Transmission Mode: Half-duplex + BootProm: None + Media Type: Autodetect (3-media cards) or + 10BASE-T (10BASE-T only adapter) + +You should only change the default configuration settings if conflicts with +another adapter exists. To change the adapter's configuration, run the +CS8900/20 Setup Utility. + + +2.2 CS8920-BASED ADAPTER CONFIGURATION + +CS8920-based adapters are shipped from Cirrus Logic configured as Plug +and Play (PnP) enabled. However, since the cs89x0 driver does NOT +support PnP, you must install the CS8920 adapter in a DOS-based PC and +run the CS8900/20 Setup Utility to disable PnP and configure the +adapter before installation in the target Linux system. Failure to do +this will leave the adapter inactive and the driver will be unable to +communicate with the adapter. + + + **************************************************************** + * CS8920-BASED ADAPTERS: * + * * + * CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. * + * THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST * + * RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND * + * TO ACTIVATE THE ADAPTER. * + **************************************************************** + + + + +3.0 LOADING THE DRIVER AS A MODULE +=============================================================================== + +If the driver is compiled as a loadable module, you can load the driver module +with the 'modprobe' command. Many of the adapter's configuration parameters can +be specified as command-line arguments to the load command. This facility +provides a means to override the EEPROM's settings or for interface +configuration when an EEPROM is not used. + +Example: + + insmod cs89x0.o io=0x200 irq=0xA media=aui + +This example loads the module and configures the adapter to use an IO port base +address of 200h, interrupt 10, and use the AUI media connection. The following +configuration options are available on the command line: + +* io=### - specify IO address (200h-360h) +* irq=## - specify interrupt level +* use_dma=1 - Enable DMA +* dma=# - specify dma channel (Driver is compiled to support + Rx DMA only) +* dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16. +* media=rj45 - specify media type + or media=bnc + or media=aui + or media=auto +* duplex=full - specify forced half/full/autonegotiate duplex + or duplex=half + or duplex=auto +* debug=# - debug level (only available if the driver was compiled + for debugging) + +NOTES: + +a) If an EEPROM is present, any specified command-line parameter + will override the corresponding configuration value stored in + EEPROM. + +b) The "io" parameter must be specified on the command-line. + +c) The driver's hardware probe routine is designed to avoid + writing to I/O space until it knows that there is a cs89x0 + card at the written addresses. This could cause problems + with device probing. To avoid this behaviour, add one + to the `io=' module parameter. This doesn't actually change + the I/O address, but it is a flag to tell the driver + to partially initialise the hardware before trying to + identify the card. This could be dangerous if you are + not sure that there is a cs89x0 card at the provided address. + + For example, to scan for an adapter located at IO base 0x300, + specify an IO address of 0x301. + +d) The "duplex=auto" parameter is only supported for the CS8920. + +e) The minimum command-line configuration required if an EEPROM is + not present is: + + io + irq + media type (no autodetect) + +f) The following additional parameters are CS89XX defaults (values + used with no EEPROM or command-line argument). + + * DMA Burst = enabled + * IOCHRDY Enabled = enabled + * UseSA = enabled + * CS8900 defaults to half-duplex if not specified on command-line + * CS8920 defaults to autoneg if not specified on command-line + * Use reset defaults for other config parameters + * dma_mode = 0 + +g) You can use ifconfig to set the adapter's Ethernet address. + +h) Many Linux distributions use the 'modprobe' command to load + modules. This program uses the '/etc/conf.modules' file to + determine configuration information which is passed to a driver + module when it is loaded. All the configuration options which are + described above may be placed within /etc/conf.modules. + + For example: + + > cat /etc/conf.modules + ... + alias eth0 cs89x0 + options cs89x0 io=0x0200 dma=5 use_dma=1 + ... + + In this example we are telling the module system that the + ethernet driver for this machine should use the cs89x0 driver. We + are asking 'modprobe' to pass the 'io', 'dma' and 'use_dma' + arguments to the driver when it is loaded. + +i) Cirrus recommend that the cs89x0 use the ISA DMA channels 5, 6 or + 7. You will probably find that other DMA channels will not work. + +j) The cs89x0 supports DMA for receiving only. DMA mode is + significantly more efficient. Flooding a 400 MHz Celeron machine + with large ping packets consumes 82% of its CPU capacity in non-DMA + mode. With DMA this is reduced to 45%. + +k) If your Linux kernel was compiled with inbuilt plug-and-play + support you will be able to find information about the cs89x0 card + with the command + + cat /proc/isapnp + +l) If during DMA operation you find erratic behavior or network data + corruption you should use your PC's BIOS to slow the EISA bus clock. + +m) If the cs89x0 driver is compiled directly into the kernel + (non-modular) then its I/O address is automatically determined by + ISA bus probing. The IRQ number, media options, etc are determined + from the card's EEPROM. + +n) If the cs89x0 driver is compiled directly into the kernel, DMA + mode may be selected by providing the kernel with a boot option + 'cs89x0_dma=N' where 'N' is the desired DMA channel number (5, 6 or 7). + + Kernel boot options may be provided on the LILO command line: + + LILO boot: linux cs89x0_dma=5 + + or they may be placed in /etc/lilo.conf: + + image=/boot/bzImage-2.3.48 + append="cs89x0_dma=5" + label=linux + root=/dev/hda5 + read-only + + The DMA Rx buffer size is hardwired to 16 kbytes in this mode. + (64k mode is not available). + + +4.0 COMPILING THE DRIVER +=============================================================================== + +The cs89x0 driver can be compiled directly into the kernel or compiled into +a loadable device driver module. + + +4.1 COMPILING THE DRIVER AS A LOADABLE MODULE + +To compile the driver into a loadable module, use the following command +(single command line, without quotes): + +"gcc -D__KERNEL__ -I/usr/src/linux/include -I/usr/src/linux/net/inet -Wall +-Wstrict-prototypes -O2 -fomit-frame-pointer -DMODULE -DCONFIG_MODVERSIONS +-c cs89x0.c" + +4.2 COMPILING THE DRIVER TO SUPPORT MEMORY MODE + +Support for memory mode was not carried over into the 2.3 series kernels. + +4.3 COMPILING THE DRIVER TO SUPPORT Rx DMA + +The compile-time optionality for DMA was removed in the 2.3 kernel +series. DMA support is now unconditionally part of the driver. It is +enabled by the 'use_dma=1' module option. + +4.4 COMPILING THE DRIVER INTO THE KERNEL + +If your Linux distribution already has support for the cs89x0 driver +then simply copy the source file to the /usr/src/linux/drivers/net +directory to replace the original ones and run the make utility to +rebuild the kernel. See Step 3 for rebuilding the kernel. + +If your Linux does not include the cs89x0 driver, you need to edit three +configuration files, copy the source file to the /usr/src/linux/drivers/net +directory, and then run the make utility to rebuild the kernel. + +1. Edit the following configuration files by adding the statements as +indicated. (When possible, try to locate the added text to the section of the +file containing similar statements). + + +a.) In /usr/src/linux/drivers/net/Config.in, add: + +tristate 'CS89x0 support' CONFIG_CS89x0 + +Example: + + if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then + tristate 'ICL EtherTeam 16i/32 support' CONFIG_ETH16I + fi + + tristate 'CS89x0 support' CONFIG_CS89x0 + + tristate 'NE2000/NE1000 support' CONFIG_NE2000 + if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then + tristate 'NI5210 support' CONFIG_NI52 + + +b.) In /usr/src/linux/drivers/net/Makefile, add the following lines: + +ifeq ($(CONFIG_CS89x0),y) +L_OBJS += cs89x0.o +else + ifeq ($(CONFIG_CS89x0),m) + M_OBJS += cs89x0.o + endif +endif + + +c.) In /linux/drivers/net/Space.c file, add the line: + +extern int cs89x0_probe(struct device *dev); + + +Example: + + extern int ultra_probe(struct device *dev); + extern int wd_probe(struct device *dev); + extern int el2_probe(struct device *dev); + + extern int cs89x0_probe(struct device *dev); + + extern int ne_probe(struct device *dev); + extern int hp_probe(struct device *dev); + extern int hp_plus_probe(struct device *dev); + + +Also add: + + #ifdef CONFIG_CS89x0 + { cs89x0_probe,0 }, + #endif + + +2.) Copy the driver source files (cs89x0.c and cs89x0.h) +into the /usr/src/linux/drivers/net directory. + + +3.) Go to /usr/src/linux directory and run 'make config' followed by 'make' +(or make bzImage) to rebuild the kernel. + +4.) Use the DOS 'setup' utility to disable plug and play on the NIC. + + +5.0 TESTING AND TROUBLESHOOTING +=============================================================================== + +5.1 KNOWN DEFECTS and LIMITATIONS + +Refer to the RELEASE.TXT file distributed as part of this archive for a list of +known defects, driver limitations, and work arounds. + + +5.2 TESTING THE ADAPTER + +Once the adapter has been installed and configured, the diagnostic option of +the CS8900/20 Setup Utility can be used to test the functionality of the +adapter and its network connection. Use the diagnostics 'Self Test' option to +test the functionality of the adapter with the hardware configuration you have +assigned. You can use the diagnostics 'Network Test' to test the ability of the +adapter to communicate across the Ethernet with another PC equipped with a +CS8900/20-based adapter card (it must also be running the CS8900/20 Setup +Utility). + + NOTE: The Setup Utility's diagnostics are designed to run in a + DOS-only operating system environment. DO NOT run the diagnostics + from a DOS or command prompt session under Windows 95, Windows NT, + OS/2, or other operating system. + +To run the diagnostics tests on the CS8900/20 adapter: + + 1.) Boot DOS on the PC and start the CS8900/20 Setup Utility. + + 2.) The adapter's current configuration is displayed. Hit the ENTER key to + get to the main menu. + + 4.) Select 'Diagnostics' (ALT-G) from the main menu. + * Select 'Self-Test' to test the adapter's basic functionality. + * Select 'Network Test' to test the network connection and cabling. + + +5.2.1 DIAGNOSTIC SELF-TEST + +The diagnostic self-test checks the adapter's basic functionality as well as +its ability to communicate across the ISA bus based on the system resources +assigned during hardware configuration. The following tests are performed: + + * IO Register Read/Write Test + The IO Register Read/Write test insures that the CS8900/20 can be + accessed in IO mode, and that the IO base address is correct. + + * Shared Memory Test + The Shared Memory test insures the CS8900/20 can be accessed in memory + mode and that the range of memory addresses assigned does not conflict + with other devices in the system. + + * Interrupt Test + The Interrupt test insures there are no conflicts with the assigned IRQ + signal. + + * EEPROM Test + The EEPROM test insures the EEPROM can be read. + + * Chip RAM Test + The Chip RAM test insures the 4K of memory internal to the CS8900/20 is + working properly. + + * Internal Loop-back Test + The Internal Loop Back test insures the adapter's transmitter and + receiver are operating properly. If this test fails, make sure the + adapter's cable is connected to the network (check for LED activity for + example). + + * Boot PROM Test + The Boot PROM test insures the Boot PROM is present, and can be read. + Failure indicates the Boot PROM was not successfully read due to a + hardware problem or due to a conflicts on the Boot PROM address + assignment. (Test only applies if the adapter is configured to use the + Boot PROM option.) + +Failure of a test item indicates a possible system resource conflict with +another device on the ISA bus. In this case, you should use the Manual Setup +option to reconfigure the adapter by selecting a different value for the system +resource that failed. + + +5.2.2 DIAGNOSTIC NETWORK TEST + +The Diagnostic Network Test verifies a working network connection by +transferring data between two CS8900/20 adapters installed in different PCs +on the same network. (Note: the diagnostic network test should not be run +between two nodes across a router.) + +This test requires that each of the two PCs have a CS8900/20-based adapter +installed and have the CS8900/20 Setup Utility running. The first PC is +configured as a Responder and the other PC is configured as an Initiator. +Once the Initiator is started, it sends data frames to the Responder which +returns the frames to the Initiator. + +The total number of frames received and transmitted are displayed on the +Initiator's display, along with a count of the number of frames received and +transmitted OK or in error. The test can be terminated anytime by the user at +either PC. + +To setup the Diagnostic Network Test: + + 1.) Select a PC with a CS8900/20-based adapter and a known working network + connection to act as the Responder. Run the CS8900/20 Setup Utility + and select 'Diagnostics -> Network Test -> Responder' from the main + menu. Hit ENTER to start the Responder. + + 2.) Return to the PC with the CS8900/20-based adapter you want to test and + start the CS8900/20 Setup Utility. + + 3.) From the main menu, Select 'Diagnostic -> Network Test -> Initiator'. + Hit ENTER to start the test. + +You may stop the test on the Initiator at any time while allowing the Responder +to continue running. In this manner, you can move to additional PCs and test +them by starting the Initiator on another PC without having to stop/start the +Responder. + + + +5.3 USING THE ADAPTER'S LEDs + +The 2 and 3-media adapters have two LEDs visible on the back end of the board +located near the 10Base-T connector. + +Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T +connection. (Only applies to 10Base-T. The green LED has no significance for +a 10Base-2 or AUI connection.) + +TX/RX LED: The yellow LED lights briefly each time the adapter transmits or +receives data. (The yellow LED will appear to "flicker" on a typical network.) + + +5.4 RESOLVING I/O CONFLICTS + +An IO conflict occurs when two or more adapter use the same ISA resource (IO +address, memory address or IRQ). You can usually detect an IO conflict in one +of four ways after installing and or configuring the CS8900/20-based adapter: + + 1.) The system does not boot properly (or at all). + + 2.) The driver cannot communicate with the adapter, reporting an "Adapter + not found" error message. + + 3.) You cannot connect to the network or the driver will not load. + + 4.) If you have configured the adapter to run in memory mode but the driver + reports it is using IO mode when loading, this is an indication of a + memory address conflict. + +If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a +diagnostic self-test. Normally, the ISA resource in conflict will fail the +self-test. If so, reconfigure the adapter selecting another choice for the +resource in conflict. Run the diagnostics again to check for further IO +conflicts. + +In some cases, such as when the PC will not boot, it may be necessary to remove +the adapter and reconfigure it by installing it in another PC to run the +CS8900/20 Setup Utility. Once reinstalled in the target system, run the +diagnostics self-test to ensure the new configuration is free of conflicts +before loading the driver again. + +When manually configuring the adapter, keep in mind the typical ISA system +resource usage as indicated in the tables below. + +I/O Address Device IRQ Device +----------- -------- --- -------- + 200-20F Game I/O adapter 3 COM2, Bus Mouse + 230-23F Bus Mouse 4 COM1 + 270-27F LPT3: third parallel port 5 LPT2 + 2F0-2FF COM2: second serial port 6 Floppy Disk controller + 320-32F Fixed disk controller 7 LPT1 + 8 Real-time Clock + 9 EGA/VGA display adapter + 12 Mouse (PS/2) +Memory Address Device 13 Math Coprocessor +-------------- --------------------- 14 Hard Disk controller +A000-BFFF EGA Graphics Adapter +A000-C7FF VGA Graphics Adapter +B000-BFFF Mono Graphics Adapter +B800-BFFF Color Graphics Adapter +E000-FFFF AT BIOS + + + + +6.0 TECHNICAL SUPPORT +=============================================================================== + +6.1 CONTACTING CIRRUS LOGIC'S TECHNICAL SUPPORT + +Cirrus Logic's CS89XX Technical Application Support can be reached at: + +Telephone :(800) 888-5016 (from inside U.S. and Canada) + :(512) 442-7555 (from outside the U.S. and Canada) +Fax :(512) 912-3871 +Email :ethernet@crystal.cirrus.com +WWW :http://www.cirrus.com + + +6.2 INFORMATION REQUIRED BEFORE CONTACTING TECHNICAL SUPPORT + +Before contacting Cirrus Logic for technical support, be prepared to provide as +Much of the following information as possible. + +1.) Adapter type (CRD8900, CDB8900, CDB8920, etc.) + +2.) Adapter configuration + + * IO Base, Memory Base, IO or memory mode enabled, IRQ, DMA channel + * Plug and Play enabled/disabled (CS8920-based adapters only) + * Configured for media auto-detect or specific media type (which type). + +3.) PC System's Configuration + + * Plug and Play system (yes/no) + * BIOS (make and version) + * System make and model + * CPU (type and speed) + * System RAM + * SCSI Adapter + +4.) Software + + * CS89XX driver and version + * Your network operating system and version + * Your system's OS version + * Version of all protocol support files + +5.) Any Error Message displayed. + + + +6.3 OBTAINING THE LATEST DRIVER VERSION + +You can obtain the latest CS89XX drivers and support software from Cirrus Logic's +Web site. You can also contact Cirrus Logic's Technical Support (email: +ethernet@crystal.cirrus.com) and request that you be registered for automatic +software-update notification. + +Cirrus Logic maintains a web page at http://www.cirrus.com with the +latest drivers and technical publications. + + +6.4 Current maintainer + +In February 2000 the maintenance of this driver was assumed by Andrew +Morton. + +6.5 Kernel module parameters + +For use in embedded environments with no cs89x0 EEPROM, the kernel boot +parameter `cs89x0_media=' has been implemented. Usage is: + + cs89x0_media=rj45 or + cs89x0_media=aui or + cs89x0_media=bnc + diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt new file mode 100644 index 0000000..b074681 --- /dev/null +++ b/Documentation/networking/cxacru.txt @@ -0,0 +1,84 @@ +Firmware is required for this device: http://accessrunner.sourceforge.net/ + +While it is capable of managing/maintaining the ADSL connection without the +module loaded, the device will sometimes stop responding after unloading the +driver and it is necessary to unplug/remove power to the device to fix this. + +Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ +these are directories named cxacruN where N is the device number. A symlink +named device points to the USB interface device's directory which contains +several sysfs attribute files for retrieving device statistics: + +* adsl_controller_version + +* adsl_headend +* adsl_headend_environment + Information about the remote headend. + +* downstream_attenuation (dB) +* downstream_bits_per_frame +* downstream_rate (kbps) +* downstream_snr_margin (dB) + Downstream stats. + +* upstream_attenuation (dB) +* upstream_bits_per_frame +* upstream_rate (kbps) +* upstream_snr_margin (dB) +* transmitter_power (dBm/Hz) + Upstream stats. + +* downstream_crc_errors +* downstream_fec_errors +* downstream_hec_errors +* upstream_crc_errors +* upstream_fec_errors +* upstream_hec_errors + Error counts. + +* line_startable + Indicates that ADSL support on the device + is/can be enabled, see adsl_start. + +* line_status + "initialising" + "down" + "attempting to activate" + "training" + "channel analysis" + "exchange" + "waiting" + "up" + + Changes between "down" and "attempting to activate" + if there is no signal. + +* link_status + "not connected" + "connected" + "lost" + +* mac_address + +* modulation + "ANSI T1.413" + "ITU-T G.992.1 (G.DMT)" + "ITU-T G.992.2 (G.LITE)" + +* startup_attempts + Count of total attempts to initialise ADSL. + +To enable/disable ADSL, the following can be written to the adsl_state file: + "start" + "stop + "restart" (stops, waits 1.5s, then starts) + "poll" (used to resume status polling if it was disabled due to failure) + +Changes in adsl/line state are reported via kernel log messages: + [4942145.150704] ATM dev 0: ADSL state: running + [4942243.663766] ATM dev 0: ADSL line: down + [4942249.665075] ATM dev 0: ADSL line: attempting to activate + [4942253.654954] ATM dev 0: ADSL line: training + [4942255.666387] ATM dev 0: ADSL line: channel analysis + [4942259.656262] ATM dev 0: ADSL line: exchange + [2635357.696901] ATM dev 0: ADSL line: up (8128 kb/s down | 832 kb/s up) diff --git a/Documentation/networking/cxgb.txt b/Documentation/networking/cxgb.txt new file mode 100644 index 0000000..20a8876 --- /dev/null +++ b/Documentation/networking/cxgb.txt @@ -0,0 +1,352 @@ + Chelsio N210 10Gb Ethernet Network Controller + + Driver Release Notes for Linux + + Version 2.1.1 + + June 20, 2005 + +CONTENTS +======== + INTRODUCTION + FEATURES + PERFORMANCE + DRIVER MESSAGES + KNOWN ISSUES + SUPPORT + + +INTRODUCTION +============ + + This document describes the Linux driver for Chelsio 10Gb Ethernet Network + Controller. This driver supports the Chelsio N210 NIC and is backward + compatible with the Chelsio N110 model 10Gb NICs. + + +FEATURES +======== + + Adaptive Interrupts (adaptive-rx) + --------------------------------- + + This feature provides an adaptive algorithm that adjusts the interrupt + coalescing parameters, allowing the driver to dynamically adapt the latency + settings to achieve the highest performance during various types of network + load. + + The interface used to control this feature is ethtool. Please see the + ethtool manpage for additional usage information. + + By default, adaptive-rx is disabled. + To enable adaptive-rx: + + ethtool -C <interface> adaptive-rx on + + To disable adaptive-rx, use ethtool: + + ethtool -C <interface> adaptive-rx off + + After disabling adaptive-rx, the timer latency value will be set to 50us. + You may set the timer latency after disabling adaptive-rx: + + ethtool -C <interface> rx-usecs <microseconds> + + An example to set the timer latency value to 100us on eth0: + + ethtool -C eth0 rx-usecs 100 + + You may also provide a timer latency value while disabling adaptive-rx: + + ethtool -C <interface> adaptive-rx off rx-usecs <microseconds> + + If adaptive-rx is disabled and a timer latency value is specified, the timer + will be set to the specified value until changed by the user or until + adaptive-rx is enabled. + + To view the status of the adaptive-rx and timer latency values: + + ethtool -c <interface> + + + TCP Segmentation Offloading (TSO) Support + ----------------------------------------- + + This feature, also known as "large send", enables a system's protocol stack + to offload portions of outbound TCP processing to a network interface card + thereby reducing system CPU utilization and enhancing performance. + + The interface used to control this feature is ethtool version 1.8 or higher. + Please see the ethtool manpage for additional usage information. + + By default, TSO is enabled. + To disable TSO: + + ethtool -K <interface> tso off + + To enable TSO: + + ethtool -K <interface> tso on + + To view the status of TSO: + + ethtool -k <interface> + + +PERFORMANCE +=========== + + The following information is provided as an example of how to change system + parameters for "performance tuning" an what value to use. You may or may not + want to change these system parameters, depending on your server/workstation + application. Doing so is not warranted in any way by Chelsio Communications, + and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss + of data or damage to equipment. + + Your distribution may have a different way of doing things, or you may prefer + a different method. These commands are shown only to provide an example of + what to do and are by no means definitive. + + Making any of the following system changes will only last until you reboot + your system. You may want to write a script that runs at boot-up which + includes the optimal settings for your system. + + Setting PCI Latency Timer: + setpci -d 1425:* 0x0c.l=0x0000F800 + + Disabling TCP timestamp: + sysctl -w net.ipv4.tcp_timestamps=0 + + Disabling SACK: + sysctl -w net.ipv4.tcp_sack=0 + + Setting large number of incoming connection requests: + sysctl -w net.ipv4.tcp_max_syn_backlog=3000 + + Setting maximum receive socket buffer size: + sysctl -w net.core.rmem_max=1024000 + + Setting maximum send socket buffer size: + sysctl -w net.core.wmem_max=1024000 + + Set smp_affinity (on a multiprocessor system) to a single CPU: + echo 1 > /proc/irq/<interrupt_number>/smp_affinity + + Setting default receive socket buffer size: + sysctl -w net.core.rmem_default=524287 + + Setting default send socket buffer size: + sysctl -w net.core.wmem_default=524287 + + Setting maximum option memory buffers: + sysctl -w net.core.optmem_max=524287 + + Setting maximum backlog (# of unprocessed packets before kernel drops): + sysctl -w net.core.netdev_max_backlog=300000 + + Setting TCP read buffers (min/default/max): + sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000" + + Setting TCP write buffers (min/pressure/max): + sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000" + + Setting TCP buffer space (min/pressure/max): + sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000" + + TCP window size for single connections: + The receive buffer (RX_WINDOW) size must be at least as large as the + Bandwidth-Delay Product of the communication link between the sender and + receiver. Due to the variations of RTT, you may want to increase the buffer + size up to 2 times the Bandwidth-Delay Product. Reference page 289 of + "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens. + At 10Gb speeds, use the following formula: + RX_WINDOW >= 1.25MBytes * RTT(in milliseconds) + Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000 + RX_WINDOW sizes of 256KB - 512KB should be sufficient. + Setting the min, max, and default receive buffer (RX_WINDOW) size: + sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>" + + TCP window size for multiple connections: + The receive buffer (RX_WINDOW) size may be calculated the same as single + connections, but should be divided by the number of connections. The + smaller window prevents congestion and facilitates better pacing, + especially if/when MAC level flow control does not work well or when it is + not supported on the machine. Experimentation may be necessary to attain + the correct value. This method is provided as a starting point for the + correct receive buffer size. + Setting the min, max, and default receive buffer (RX_WINDOW) size is + performed in the same manner as single connection. + + +DRIVER MESSAGES +=============== + + The following messages are the most common messages logged by syslog. These + may be found in /var/log/messages. + + Driver up: + Chelsio Network Driver - version 2.1.1 + + NIC detected: + eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit + + Link up: + eth#: link is up at 10 Gbps, full duplex + + Link down: + eth#: link is down + + +KNOWN ISSUES +============ + + These issues have been identified during testing. The following information + is provided as a workaround to the problem. In some cases, this problem is + inherent to Linux or to a particular Linux Distribution and/or hardware + platform. + + 1. Large number of TCP retransmits on a multiprocessor (SMP) system. + + On a system with multiple CPUs, the interrupt (IRQ) for the network + controller may be bound to more than one CPU. This will cause TCP + retransmits if the packet data were to be split across different CPUs + and re-assembled in a different order than expected. + + To eliminate the TCP retransmits, set smp_affinity on the particular + interrupt to a single CPU. You can locate the interrupt (IRQ) used on + the N110/N210 by using ifconfig: + ifconfig <dev_name> | grep Interrupt + Set the smp_affinity to a single CPU: + echo 1 > /proc/irq/<interrupt_number>/smp_affinity + + It is highly suggested that you do not run the irqbalance daemon on your + system, as this will change any smp_affinity setting you have applied. + The irqbalance daemon runs on a 10 second interval and binds interrupts + to the least loaded CPU determined by the daemon. To disable this daemon: + chkconfig --level 2345 irqbalance off + + By default, some Linux distributions enable the kernel feature, + irqbalance, which performs the same function as the daemon. To disable + this feature, add the following line to your bootloader: + noirqbalance + + Example using the Grub bootloader: + title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp) + root (hd0,0) + kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance + initrd /initrd-2.4.21-27.ELsmp.img + + 2. After running insmod, the driver is loaded and the incorrect network + interface is brought up without running ifup. + + When using 2.4.x kernels, including RHEL kernels, the Linux kernel + invokes a script named "hotplug". This script is primarily used to + automatically bring up USB devices when they are plugged in, however, + the script also attempts to automatically bring up a network interface + after loading the kernel module. The hotplug script does this by scanning + the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking + for HWADDR=<mac_address>. + + If the hotplug script does not find the HWADDRR within any of the + ifcfg-eth# files, it will bring up the device with the next available + interface name. If this interface is already configured for a different + network card, your new interface will have incorrect IP address and + network settings. + + To solve this issue, you can add the HWADDR=<mac_address> key to the + interface config file of your network controller. + + To disable this "hotplug" feature, you may add the driver (module name) + to the "blacklist" file located in /etc/hotplug. It has been noted that + this does not work for network devices because the net.agent script + does not use the blacklist file. Simply remove, or rename, the net.agent + script located in /etc/hotplug to disable this feature. + + 3. Transport Protocol (TP) hangs when running heavy multi-connection traffic + on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset. + + If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel + chipset, you may experience the "133-Mhz Mode Split Completion Data + Corruption" bug identified by AMD while using a 133Mhz PCI-X card on the + bus PCI-X bus. + + AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel + can provide stale data via split completion cycles to a PCI-X card that + is operating at 133 Mhz", causing data corruption. + + AMD's provides three workarounds for this problem, however, Chelsio + recommends the first option for best performance with this bug: + + For 133Mhz secondary bus operation, limit the transaction length and + the number of outstanding transactions, via BIOS configuration + programming of the PCI-X card, to the following: + + Data Length (bytes): 1k + Total allowed outstanding transactions: 2 + + Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004, + section 56, "133-MHz Mode Split Completion Data Corruption" for more + details with this bug and workarounds suggested by AMD. + + It may be possible to work outside AMD's recommended PCI-X settings, try + increasing the Data Length to 2k bytes for increased performance. If you + have issues with these settings, please revert to the "safe" settings + and duplicate the problem before submitting a bug or asking for support. + + NOTE: The default setting on most systems is 8 outstanding transactions + and 2k bytes data length. + + 4. On multiprocessor systems, it has been noted that an application which + is handling 10Gb networking can switch between CPUs causing degraded + and/or unstable performance. + + If running on an SMP system and taking performance measurements, it + is suggested you either run the latest netperf-2.4.0+ or use a binding + tool such as Tim Hockin's procstate utilities (runon) + <http://www.hockin.org/~thockin/procstate/>. + + Binding netserver and netperf (or other applications) to particular + CPUs will have a significant difference in performance measurements. + You may need to experiment which CPU to bind the application to in + order to achieve the best performance for your system. + + If you are developing an application designed for 10Gb networking, + please keep in mind you may want to look at kernel functions + sched_setaffinity & sched_getaffinity to bind your application. + + If you are just running user-space applications such as ftp, telnet, + etc., you may want to try the runon tool provided by Tim Hockin's + procstate utility. You could also try binding the interface to a + particular CPU: runon 0 ifup eth0 + + +SUPPORT +======= + + If you have problems with the software or hardware, please contact our + customer support team via email at support@chelsio.com or check our website + at http://www.chelsio.com + +=============================================================================== + + Chelsio Communications + 370 San Aleso Ave. + Suite 100 + Sunnyvale, CA 94085 + http://www.chelsio.com + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License, version 2, as +published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License along +with this program; if not, write to the Free Software Foundation, Inc., +59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED +WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. + + Copyright (c) 2003-2005 Chelsio Communications. All rights reserved. + +=============================================================================== diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt new file mode 100644 index 0000000..39131a3 --- /dev/null +++ b/Documentation/networking/dccp.txt @@ -0,0 +1,156 @@ +DCCP protocol +============ + + +Contents +======== + +- Introduction +- Missing features +- Socket options +- Notes + +Introduction +============ + +Datagram Congestion Control Protocol (DCCP) is an unreliable, connection +oriented protocol designed to solve issues present in UDP and TCP, particularly +for real-time and multimedia (streaming) traffic. +It divides into a base protocol (RFC 4340) and plugable congestion control +modules called CCIDs. Like plugable TCP congestion control, at least one CCID +needs to be enabled in order for the protocol to function properly. In the Linux +implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as +the TCP-friendly CCID3 (RFC 4342), are optional. +For a brief introduction to CCIDs and suggestions for choosing a CCID to match +given applications, see section 10 of RFC 4340. + +It has a base protocol and pluggable congestion control IDs (CCIDs). + +DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol +is at http://www.ietf.org/html.charters/dccp-charter.html + +Missing features +================ + +The Linux DCCP implementation does not currently support all the features that are +specified in RFCs 4340...42. + +The known bugs are at: + http://linux-net.osdl.org/index.php/TODO#DCCP + +For more up-to-date versions of the DCCP implementation, please consider using +the experimental DCCP test tree; instructions for checking this out are on: +http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree + + +Socket options +============== + +DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of +service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, +the socket will fall back to 0 (which means that no meaningful service code +is present). On active sockets this is set before connect(); specifying more +than one code has no effect (all subsequent service codes are ignored). The +case is different for passive sockets, where multiple service codes (up to 32) +can be set before calling bind(). + +DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet +size (application payload size) in bytes, see RFC 4340, section 14. + +DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold +timewait state when closing the connection (RFC 4340, 8.3). The usual case is +that the closing server sends a CloseReq, whereupon the client holds timewait +state. When this boolean socket option is on, the server sends a Close instead +and will enter TIMEWAIT. This option must be set after accept() returns. + +DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the +partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums +always cover the entire packet and that only fully covered application data is +accepted by the receiver. Hence, when using this feature on the sender, it must +be enabled at the receiver, too with suitable choice of CsCov. + +DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the + range 0..15 are acceptable. The default setting is 0 (full coverage), + values between 1..15 indicate partial coverage. +DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it + sets a threshold, where again values 0..15 are acceptable. The default + of 0 means that all packets with a partial coverage will be discarded. + Values in the range 1..15 indicate that packets with minimally such a + coverage value are also acceptable. The higher the number, the more + restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage + settings are inherited to the child socket after accept(). + +The following two options apply to CCID 3 exclusively and are getsockopt()-only. +In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned. +DCCP_SOCKOPT_CCID_RX_INFO + Returns a `struct tfrc_rx_info' in optval; the buffer for optval and + optlen must be set to at least sizeof(struct tfrc_rx_info). +DCCP_SOCKOPT_CCID_TX_INFO + Returns a `struct tfrc_tx_info' in optval; the buffer for optval and + optlen must be set to at least sizeof(struct tfrc_tx_info). + +On unidirectional connections it is useful to close the unused half-connection +via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs. + +Sysctl variables +================ +Several DCCP default parameters can be managed by the following sysctls +(sysctl net.dccp.default or /proc/sys/net/dccp/default): + +request_retries + The number of active connection initiation retries (the number of + Requests minus one) before timing out. In addition, it also governs + the behaviour of the other, passive side: this variable also sets + the number of times DCCP repeats sending a Response when the initial + handshake does not progress from RESPOND to OPEN (i.e. when no Ack + is received after the initial Request). This value should be greater + than 0, suggested is less than 10. Analogue of tcp_syn_retries. + +retries1 + How often a DCCP Response is retransmitted until the listening DCCP + side considers its connecting peer dead. Analogue of tcp_retries1. + +retries2 + The number of times a general DCCP packet is retransmitted. This has + importance for retransmitted acknowledgments and feature negotiation, + data packets are never retransmitted. Analogue of tcp_retries2. + +send_ndp = 1 + Whether or not to send NDP count options (sec. 7.7.2). + +send_ackvec = 1 + Whether or not to send Ack Vector options (sec. 11.5). + +ack_ratio = 2 + The default Ack Ratio (sec. 11.3) to use. + +tx_ccid = 2 + Default CCID for the sender-receiver half-connection. + +rx_ccid = 2 + Default CCID for the receiver-sender half-connection. + +seq_window = 100 + The initial sequence window (sec. 7.5.2). + +tx_qlen = 5 + The size of the transmit buffer in packets. A value of 0 corresponds + to an unbounded transmit buffer. + +sync_ratelimit = 125 ms + The timeout between subsequent DCCP-Sync packets sent in response to + sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit + of this parameter is milliseconds; a value of 0 disables rate-limiting. + +IOCTLS +====== +FIONREAD + Works as in udp(7): returns in the `int' argument pointer the size of + the next pending datagram in bytes, or 0 when no datagram is pending. + +Notes +===== + +DCCP does not travel through NAT successfully at present on many boxes. This is +because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT +support for DCCP has been added. diff --git a/Documentation/networking/de4x5.txt b/Documentation/networking/de4x5.txt new file mode 100644 index 0000000..c8e4ca9 --- /dev/null +++ b/Documentation/networking/de4x5.txt @@ -0,0 +1,178 @@ + Originally, this driver was written for the Digital Equipment + Corporation series of EtherWORKS Ethernet cards: + + DE425 TP/COAX EISA + DE434 TP PCI + DE435 TP/COAX/AUI PCI + DE450 TP/COAX/AUI PCI + DE500 10/100 PCI Fasternet + + but it will now attempt to support all cards which conform to the + Digital Semiconductor SROM Specification. The driver currently + recognises the following chips: + + DC21040 (no SROM) + DC21041[A] + DC21140[A] + DC21142 + DC21143 + + So far the driver is known to work with the following cards: + + KINGSTON + Linksys + ZNYX342 + SMC8432 + SMC9332 (w/new SROM) + ZNYX31[45] + ZNYX346 10/100 4 port (can act as a 10/100 bridge!) + + The driver has been tested on a relatively busy network using the DE425, + DE434, DE435 and DE500 cards and benchmarked with 'ttcp': it transferred + 16M of data to a DECstation 5000/200 as follows: + + TCP UDP + TX RX TX RX + DE425 1030k 997k 1170k 1128k + DE434 1063k 995k 1170k 1125k + DE435 1063k 995k 1170k 1125k + DE500 1063k 998k 1170k 1125k in 10Mb/s mode + + All values are typical (in kBytes/sec) from a sample of 4 for each + measurement. Their error is +/-20k on a quiet (private) network and also + depend on what load the CPU has. + + ========================================================================= + + The ability to load this driver as a loadable module has been included + and used extensively during the driver development (to save those long + reboot sequences). Loadable module support under PCI and EISA has been + achieved by letting the driver autoprobe as if it were compiled into the + kernel. Do make sure you're not sharing interrupts with anything that + cannot accommodate interrupt sharing! + + To utilise this ability, you have to do 8 things: + + 0) have a copy of the loadable modules code installed on your system. + 1) copy de4x5.c from the /linux/drivers/net directory to your favourite + temporary directory. + 2) for fixed autoprobes (not recommended), edit the source code near + line 5594 to reflect the I/O address you're using, or assign these when + loading by: + + insmod de4x5 io=0xghh where g = bus number + hh = device number + + NB: autoprobing for modules is now supported by default. You may just + use: + + insmod de4x5 + + to load all available boards. For a specific board, still use + the 'io=?' above. + 3) compile de4x5.c, but include -DMODULE in the command line to ensure + that the correct bits are compiled (see end of source code). + 4) if you are wanting to add a new card, goto 5. Otherwise, recompile a + kernel with the de4x5 configuration turned off and reboot. + 5) insmod de4x5 [io=0xghh] + 6) run the net startup bits for your new eth?? interface(s) manually + (usually /etc/rc.inet[12] at boot time). + 7) enjoy! + + To unload a module, turn off the associated interface(s) + 'ifconfig eth?? down' then 'rmmod de4x5'. + + Automedia detection is included so that in principle you can disconnect + from, e.g. TP, reconnect to BNC and things will still work (after a + pause whilst the driver figures out where its media went). My tests + using ping showed that it appears to work.... + + By default, the driver will now autodetect any DECchip based card. + Should you have a need to restrict the driver to DIGITAL only cards, you + can compile with a DEC_ONLY define, or if loading as a module, use the + 'dec_only=1' parameter. + + I've changed the timing routines to use the kernel timer and scheduling + functions so that the hangs and other assorted problems that occurred + while autosensing the media should be gone. A bonus for the DC21040 + auto media sense algorithm is that it can now use one that is more in + line with the rest (the DC21040 chip doesn't have a hardware timer). + The downside is the 1 'jiffies' (10ms) resolution. + + IEEE 802.3u MII interface code has been added in anticipation that some + products may use it in the future. + + The SMC9332 card has a non-compliant SROM which needs fixing - I have + patched this driver to detect it because the SROM format used complies + to a previous DEC-STD format. + + I have removed the buffer copies needed for receive on Intels. I cannot + remove them for Alphas since the Tulip hardware only does longword + aligned DMA transfers and the Alphas get alignment traps with non + longword aligned data copies (which makes them really slow). No comment. + + I have added SROM decoding routines to make this driver work with any + card that supports the Digital Semiconductor SROM spec. This will help + all cards running the dc2114x series chips in particular. Cards using + the dc2104x chips should run correctly with the basic driver. I'm in + debt to <mjacob@feral.com> for the testing and feedback that helped get + this feature working. So far we have tested KINGSTON, SMC8432, SMC9332 + (with the latest SROM complying with the SROM spec V3: their first was + broken), ZNYX342 and LinkSys. ZNYX314 (dual 21041 MAC) and ZNYX 315 + (quad 21041 MAC) cards also appear to work despite their incorrectly + wired IRQs. + + I have added a temporary fix for interrupt problems when some SCSI cards + share the same interrupt as the DECchip based cards. The problem occurs + because the SCSI card wants to grab the interrupt as a fast interrupt + (runs the service routine with interrupts turned off) vs. this card + which really needs to run the service routine with interrupts turned on. + This driver will now add the interrupt service routine as a fast + interrupt if it is bounced from the slow interrupt. THIS IS NOT A + RECOMMENDED WAY TO RUN THE DRIVER and has been done for a limited time + until people sort out their compatibility issues and the kernel + interrupt service code is fixed. YOU SHOULD SEPARATE OUT THE FAST + INTERRUPT CARDS FROM THE SLOW INTERRUPT CARDS to ensure that they do not + run on the same interrupt. PCMCIA/CardBus is another can of worms... + + Finally, I think I have really fixed the module loading problem with + more than one DECchip based card. As a side effect, I don't mess with + the device structure any more which means that if more than 1 card in + 2.0.x is installed (4 in 2.1.x), the user will have to edit + linux/drivers/net/Space.c to make room for them. Hence, module loading + is the preferred way to use this driver, since it doesn't have this + limitation. + + Where SROM media detection is used and full duplex is specified in the + SROM, the feature is ignored unless lp->params.fdx is set at compile + time OR during a module load (insmod de4x5 args='eth??:fdx' [see + below]). This is because there is no way to automatically detect full + duplex links except through autonegotiation. When I include the + autonegotiation feature in the SROM autoconf code, this detection will + occur automatically for that case. + + Command line arguments are now allowed, similar to passing arguments + through LILO. This will allow a per adapter board set up of full duplex + and media. The only lexical constraints are: the board name (dev->name) + appears in the list before its parameters. The list of parameters ends + either at the end of the parameter list or with another board name. The + following parameters are allowed: + + fdx for full duplex + autosense to set the media/speed; with the following + sub-parameters: + TP, TP_NW, BNC, AUI, BNC_AUI, 100Mb, 10Mb, AUTO + + Case sensitivity is important for the sub-parameters. They *must* be + upper case. Examples: + + insmod de4x5 args='eth1:fdx autosense=BNC eth0:autosense=100Mb'. + + For a compiled in driver, in linux/drivers/net/CONFIG, place e.g. + DE4X5_OPTS = -DDE4X5_PARM='"eth0:fdx autosense=AUI eth2:autosense=TP"' + + Yes, I know full duplex isn't permissible on BNC or AUI; they're just + examples. By default, full duplex is turned off and AUTO is the default + autosense setting. In reality, I expect only the full duplex option to + be used. Note the use of single quotes in the two examples above and the + lack of commas to separate items. diff --git a/Documentation/networking/decnet.txt b/Documentation/networking/decnet.txt new file mode 100644 index 0000000..d896895 --- /dev/null +++ b/Documentation/networking/decnet.txt @@ -0,0 +1,232 @@ + Linux DECnet Networking Layer Information + =========================================== + +1) Other documentation.... + + o Project Home Pages + http://www.chygwyn.com/DECnet/ - Kernel info + http://linux-decnet.sourceforge.net/ - Userland tools + http://www.sourceforge.net/projects/linux-decnet/ - Status page + +2) Configuring the kernel + +Be sure to turn on the following options: + + CONFIG_DECNET (obviously) + CONFIG_PROC_FS (to see what's going on) + CONFIG_SYSCTL (for easy configuration) + +if you want to try out router support (not properly debugged yet) +you'll need the following options as well... + + CONFIG_DECNET_ROUTER (to be able to add/delete routes) + CONFIG_NETFILTER (will be required for the DECnet routing daemon) + + CONFIG_DECNET_ROUTE_FWMARK is optional + +Don't turn on SIOCGIFCONF support for DECnet unless you are really sure +that you need it, in general you won't and it can cause ifconfig to +malfunction. + +Run time configuration has changed slightly from the 2.4 system. If you +want to configure an endnode, then the simplified procedure is as follows: + + o Set the MAC address on your ethernet card before starting _any_ other + network protocols. + +As soon as your network card is brought into the UP state, DECnet should +start working. If you need something more complicated or are unsure how +to set the MAC address, see the next section. Also all configurations which +worked with 2.4 will work under 2.5 with no change. + +3) Command line options + +You can set a DECnet address on the kernel command line for compatibility +with the 2.4 configuration procedure, but in general it's not needed any more. +If you do st a DECnet address on the command line, it has only one purpose +which is that its added to the addresses on the loopback device. + +With 2.4 kernels, DECnet would only recognise addresses as local if they +were added to the loopback device. In 2.5, any local interface address +can be used to loop back to the local machine. Of course this does not +prevent you adding further addresses to the loopback device if you +want to. + +N.B. Since the address list of an interface determines the addresses for +which "hello" messages are sent, if you don't set an address on the loopback +interface then you won't see any entries in /proc/net/neigh for the local +host until such time as you start a connection. This doesn't affect the +operation of the local communications in any other way though. + +The kernel command line takes options looking like the following: + + decnet.addr=1,2 + +the two numbers are the node address 1,2 = 1.2 For 2.2.xx kernels +and early 2.3.xx kernels, you must use a comma when specifying the +DECnet address like this. For more recent 2.3.xx kernels, you may +use almost any character except space, although a `.` would be the most +obvious choice :-) + +There used to be a third number specifying the node type. This option +has gone away in favour of a per interface node type. This is now set +using /proc/sys/net/decnet/conf/<dev>/forwarding. This file can be +set with a single digit, 0=EndNode, 1=L1 Router and 2=L2 Router. + +There are also equivalent options for modules. The node address can +also be set through the /proc/sys/net/decnet/ files, as can other system +parameters. + +Currently the only supported devices are ethernet and ip_gre. The +ethernet address of your ethernet card has to be set according to the DECnet +address of the node in order for it to be autoconfigured (and then appear in +/proc/net/decnet_dev). There is a utility available at the above +FTP sites called dn2ethaddr which can compute the correct ethernet +address to use. The address can be set by ifconfig either before or +at the time the device is brought up. If you are using RedHat you can +add the line: + + MACADDR=AA:00:04:00:03:04 + +or something similar, to /etc/sysconfig/network-scripts/ifcfg-eth0 or +wherever your network card's configuration lives. Setting the MAC address +of your ethernet card to an address starting with "hi-ord" will cause a +DECnet address which matches to be added to the interface (which you can +verify with iproute2). + +The default device for routing can be set through the /proc filesystem +by setting /proc/sys/net/decnet/default_device to the +device you want DECnet to route packets out of when no specific route +is available. Usually this will be eth0, for example: + + echo -n "eth0" >/proc/sys/net/decnet/default_device + +If you don't set the default device, then it will default to the first +ethernet card which has been autoconfigured as described above. You can +confirm that by looking in the default_device file of course. + +There is a list of what the other files under /proc/sys/net/decnet/ do +on the kernel patch web site (shown above). + +4) Run time kernel configuration + +This is either done through the sysctl/proc interface (see the kernel web +pages for details on what the various options do) or through the iproute2 +package in the same way as IPv4/6 configuration is performed. + +Documentation for iproute2 is included with the package, although there is +as yet no specific section on DECnet, most of the features apply to both +IP and DECnet, albeit with DECnet addresses instead of IP addresses and +a reduced functionality. + +If you want to configure a DECnet router you'll need the iproute2 package +since its the _only_ way to add and delete routes currently. Eventually +there will be a routing daemon to send and receive routing messages for +each interface and update the kernel routing tables accordingly. The +routing daemon will use netfilter to listen to routing packets, and +rtnetlink to update the kernels routing tables. + +The DECnet raw socket layer has been removed since it was there purely +for use by the routing daemon which will now use netfilter (a much cleaner +and more generic solution) instead. + +5) How can I tell if its working ? + +Here is a quick guide of what to look for in order to know if your DECnet +kernel subsystem is working. + + - Is the node address set (see /proc/sys/net/decnet/node_address) + - Is the node of the correct type + (see /proc/sys/net/decnet/conf/<dev>/forwarding) + - Is the Ethernet MAC address of each Ethernet card set to match + the DECnet address. If in doubt use the dn2ethaddr utility available + at the ftp archive. + - If the previous two steps are satisfied, and the Ethernet card is up, + you should find that it is listed in /proc/net/decnet_dev and also + that it appears as a directory in /proc/sys/net/decnet/conf/. The + loopback device (lo) should also appear and is required to communicate + within a node. + - If you have any DECnet routers on your network, they should appear + in /proc/net/decnet_neigh, otherwise this file will only contain the + entry for the node itself (if it doesn't check to see if lo is up). + - If you want to send to any node which is not listed in the + /proc/net/decnet_neigh file, you'll need to set the default device + to point to an Ethernet card with connection to a router. This is + again done with the /proc/sys/net/decnet/default_device file. + - Try starting a simple server and client, like the dnping/dnmirror + over the loopback interface. With luck they should communicate. + For this step and those after, you'll need the DECnet library + which can be obtained from the above ftp sites as well as the + actual utilities themselves. + - If this seems to work, then try talking to a node on your local + network, and see if you can obtain the same results. + - At this point you are on your own... :-) + +6) How to send a bug report + +If you've found a bug and want to report it, then there are several things +you can do to help me work out exactly what it is that is wrong. Useful +information (_most_ of which _is_ _essential_) includes: + + - What kernel version are you running ? + - What version of the patch are you running ? + - How far though the above set of tests can you get ? + - What is in the /proc/decnet* files and /proc/sys/net/decnet/* files ? + - Which services are you running ? + - Which client caused the problem ? + - How much data was being transferred ? + - Was the network congested ? + - How can the problem be reproduced ? + - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of + tcpdump don't understand how to dump DECnet properly, so including + the hex listing of the packet contents is _essential_, usually the -x flag. + You may also need to increase the length grabbed with the -s flag. The + -e flag also provides very useful information (ethernet MAC addresses)) + +7) MAC FAQ + +A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet +interact and how to get the best performance from your hardware. + +Ethernet cards are designed to normally only pass received network frames +to a host computer when they are addressed to it, or to the broadcast address. + +Linux has an interface which allows the setting of extra addresses for +an ethernet card to listen to. If the ethernet card supports it, the +filtering operation will be done in hardware, if not the extra unwanted packets +received will be discarded by the host computer. In the latter case, +significant processor time and bus bandwidth can be used up on a busy +network (see the NAPI documentation for a longer explanation of these +effects). + +DECnet makes use of this interface to allow running DECnet on an ethernet +card which has already been configured using TCP/IP (presumably using the +built in MAC address of the card, as usual) and/or to allow multiple DECnet +addresses on each physical interface. If you do this, be aware that if your +ethernet card doesn't support perfect hashing in its MAC address filter +then your computer will be doing more work than required. Some cards +will simply set themselves into promiscuous mode in order to receive +packets from the DECnet specified addresses. So if you have one of these +cards its better to set the MAC address of the card as described above +to gain the best efficiency. Better still is to use a card which supports +NAPI as well. + + +8) Mailing list + +If you are keen to get involved in development, or want to ask questions +about configuration, or even just report bugs, then there is a mailing +list that you can join, details are at: + +http://sourceforge.net/mail/?group_id=4993 + +9) Legal Info + +The Linux DECnet project team have placed their code under the GPL. The +software is provided "as is" and without warranty express or implied. +DECnet is a trademark of Compaq. This software is not a product of +Compaq. We acknowledge the help of people at Compaq in providing extra +documentation above and beyond what was previously publicly available. + +Steve Whitehouse <SteveW@ACM.org> + diff --git a/Documentation/networking/depca.txt b/Documentation/networking/depca.txt new file mode 100644 index 0000000..24c6b26 --- /dev/null +++ b/Documentation/networking/depca.txt @@ -0,0 +1,92 @@ + +DE10x +===== + +Memory Addresses: + + SW1 SW2 SW3 SW4 +64K on on on on d0000 dbfff + off on on on c0000 cbfff + off off on on e0000 ebfff + +32K on on off on d8000 dbfff + off on off on c8000 cbfff + off off off on e8000 ebfff + +DBR ROM on on dc000 dffff + off on cc000 cffff + off off ec000 effff + +Note that the 2K mode is set by SW3/SW4 on/off or off/off. Address +assignment is through the RBSA register. + +I/O Address: + SW5 +0x300 on +0x200 off + +Remote Boot: + SW6 +Disable on +Enable off + +Remote Boot Timeout: + SW7 +2.5min on +30s off + +IRQ: + SW8 SW9 SW10 SW11 SW12 +2 on off off off off +3 off on off off off +4 off off on off off +5 off off off on off +7 off off off off on + +DE20x +===== + +Memory Size: + + SW3 SW4 +64K on on +32K off on +2K on off +2K off off + +Start Addresses: + + SW1 SW2 SW3 SW4 +64K on on on on c0000 cffff + on off on on d0000 dffff + off on on on e0000 effff + +32K on on off off c8000 cffff + on off off off d8000 dffff + off on off off e8000 effff + +Illegal off off - - - - + +I/O Address: + SW5 +0x300 on +0x200 off + +Remote Boot: + SW6 +Disable on +Enable off + +Remote Boot Timeout: + SW7 +2.5min on +30s off + +IRQ: + SW8 SW9 SW10 SW11 SW12 +5 on off off off off +9 off on off off off +10 off off on off off +11 off off off on off +15 off off off off on + diff --git a/Documentation/networking/dl2k.txt b/Documentation/networking/dl2k.txt new file mode 100644 index 0000000..10e8490 --- /dev/null +++ b/Documentation/networking/dl2k.txt @@ -0,0 +1,281 @@ + + D-Link DL2000-based Gigabit Ethernet Adapter Installation + for Linux + May 23, 2002 + +Contents +======== + - Compatibility List + - Quick Install + - Compiling the Driver + - Installing the Driver + - Option parameter + - Configuration Script Sample + - Troubleshooting + + +Compatibility List +================= +Adapter Support: + +D-Link DGE-550T Gigabit Ethernet Adapter. +D-Link DGE-550SX Gigabit Ethernet Adapter. +D-Link DL2000-based Gigabit Ethernet Adapter. + + +The driver support Linux kernel 2.4.7 later. We had tested it +on the environments below. + + . Red Hat v6.2 (update kernel to 2.4.7) + . Red Hat v7.0 (update kernel to 2.4.7) + . Red Hat v7.1 (kernel 2.4.7) + . Red Hat v7.2 (kernel 2.4.7-10) + + +Quick Install +============= +Install linux driver as following command: + +1. make all +2. insmod dl2k.ko +3. ifconfig eth0 up 10.xxx.xxx.xxx netmask 255.0.0.0 + ^^^^^^^^^^^^^^^\ ^^^^^^^^\ + IP NETMASK +Now eth0 should active, you can test it by "ping" or get more information by +"ifconfig". If tested ok, continue the next step. + +4. cp dl2k.ko /lib/modules/`uname -r`/kernel/drivers/net +5. Add the following line to /etc/modprobe.conf: + alias eth0 dl2k +6. Run "netconfig" or "netconf" to create configuration script ifcfg-eth0 + located at /etc/sysconfig/network-scripts or create it manually. + [see - Configuration Script Sample] +7. Driver will automatically load and configure at next boot time. + +Compiling the Driver +==================== + In Linux, NIC drivers are most commonly configured as loadable modules. +The approach of building a monolithic kernel has become obsolete. The driver +can be compiled as part of a monolithic kernel, but is strongly discouraged. +The remainder of this section assumes the driver is built as a loadable module. +In the Linux environment, it is a good idea to rebuild the driver from the +source instead of relying on a precompiled version. This approach provides +better reliability since a precompiled driver might depend on libraries or +kernel features that are not present in a given Linux installation. + +The 3 files necessary to build Linux device driver are dl2k.c, dl2k.h and +Makefile. To compile, the Linux installation must include the gcc compiler, +the kernel source, and the kernel headers. The Linux driver supports Linux +Kernels 2.4.7. Copy the files to a directory and enter the following command +to compile and link the driver: + +CD-ROM drive +------------ + +[root@XXX /] mkdir cdrom +[root@XXX /] mount -r -t iso9660 -o conv=auto /dev/cdrom /cdrom +[root@XXX /] cd root +[root@XXX /root] mkdir dl2k +[root@XXX /root] cd dl2k +[root@XXX dl2k] cp /cdrom/linux/dl2k.tgz /root/dl2k +[root@XXX dl2k] tar xfvz dl2k.tgz +[root@XXX dl2k] make all + +Floppy disc drive +----------------- + +[root@XXX /] cd root +[root@XXX /root] mkdir dl2k +[root@XXX /root] cd dl2k +[root@XXX dl2k] mcopy a:/linux/dl2k.tgz /root/dl2k +[root@XXX dl2k] tar xfvz dl2k.tgz +[root@XXX dl2k] make all + +Installing the Driver +===================== + + Manual Installation + ------------------- + Once the driver has been compiled, it must be loaded, enabled, and bound + to a protocol stack in order to establish network connectivity. To load a + module enter the command: + + insmod dl2k.o + + or + + insmod dl2k.o <optional parameter> ; add parameter + + =============================================================== + example: insmod dl2k.o media=100mbps_hd + or insmod dl2k.o media=3 + or insmod dl2k.o media=3,2 ; for 2 cards + =============================================================== + + Please reference the list of the command line parameters supported by + the Linux device driver below. + + The insmod command only loads the driver and gives it a name of the form + eth0, eth1, etc. To bring the NIC into an operational state, + it is necessary to issue the following command: + + ifconfig eth0 up + + Finally, to bind the driver to the active protocol (e.g., TCP/IP with + Linux), enter the following command: + + ifup eth0 + + Note that this is meaningful only if the system can find a configuration + script that contains the necessary network information. A sample will be + given in the next paragraph. + + The commands to unload a driver are as follows: + + ifdown eth0 + ifconfig eth0 down + rmmod dl2k.o + + The following are the commands to list the currently loaded modules and + to see the current network configuration. + + lsmod + ifconfig + + + Automated Installation + ---------------------- + This section describes how to install the driver such that it is + automatically loaded and configured at boot time. The following description + is based on a Red Hat 6.0/7.0 distribution, but it can easily be ported to + other distributions as well. + + Red Hat v6.x/v7.x + ----------------- + 1. Copy dl2k.o to the network modules directory, typically + /lib/modules/2.x.x-xx/net or /lib/modules/2.x.x/kernel/drivers/net. + 2. Locate the boot module configuration file, most commonly modprobe.conf + or modules.conf (for 2.4) in the /etc directory. Add the following lines: + + alias ethx dl2k + options dl2k <optional parameters> + + where ethx will be eth0 if the NIC is the only ethernet adapter, eth1 if + one other ethernet adapter is installed, etc. Refer to the table in the + previous section for the list of optional parameters. + 3. Locate the network configuration scripts, normally the + /etc/sysconfig/network-scripts directory, and create a configuration + script named ifcfg-ethx that contains network information. + 4. Note that for most Linux distributions, Red Hat included, a configuration + utility with a graphical user interface is provided to perform steps 2 + and 3 above. + + +Parameter Description +===================== +You can install this driver without any additional parameter. However, if you +are going to have extensive functions then it is necessary to set extra +parameter. Below is a list of the command line parameters supported by the +Linux device +driver. + +mtu=packet_size - Specifies the maximum packet size. default + is 1500. + +media=media_type - Specifies the media type the NIC operates at. + autosense Autosensing active media. + 10mbps_hd 10Mbps half duplex. + 10mbps_fd 10Mbps full duplex. + 100mbps_hd 100Mbps half duplex. + 100mbps_fd 100Mbps full duplex. + 1000mbps_fd 1000Mbps full duplex. + 1000mbps_hd 1000Mbps half duplex. + 0 Autosensing active media. + 1 10Mbps half duplex. + 2 10Mbps full duplex. + 3 100Mbps half duplex. + 4 100Mbps full duplex. + 5 1000Mbps half duplex. + 6 1000Mbps full duplex. + + By default, the NIC operates at autosense. + 1000mbps_fd and 1000mbps_hd types are only + available for fiber adapter. + +vlan=n - Specifies the VLAN ID. If vlan=0, the + Virtual Local Area Network (VLAN) function is + disable. + +jumbo=[0|1] - Specifies the jumbo frame support. If jumbo=1, + the NIC accept jumbo frames. By default, this + function is disabled. + Jumbo frame usually improve the performance + int gigabit. + This feature need jumbo frame compatible + remote. + +rx_coalesce=m - Number of rx frame handled each interrupt. +rx_timeout=n - Rx DMA wait time for an interrupt. + If set rx_coalesce > 0, hardware only assert + an interrupt for m frames. Hardware won't + assert rx interrupt until m frames received or + reach timeout of n * 640 nano seconds. + Set proper rx_coalesce and rx_timeout can + reduce congestion collapse and overload which + has been a bottleneck for high speed network. + + For example, rx_coalesce=10 rx_timeout=800. + that is, hardware assert only 1 interrupt + for 10 frames received or timeout of 512 us. + +tx_coalesce=n - Number of tx frame handled each interrupt. + Set n > 1 can reduce the interrupts + congestion usually lower performance of + high speed network card. Default is 16. + +tx_flow=[1|0] - Specifies the Tx flow control. If tx_flow=0, + the Tx flow control disable else driver + autodetect. +rx_flow=[1|0] - Specifies the Rx flow control. If rx_flow=0, + the Rx flow control enable else driver + autodetect. + + +Configuration Script Sample +=========================== +Here is a sample of a simple configuration script: + +DEVICE=eth0 +USERCTL=no +ONBOOT=yes +POOTPROTO=none +BROADCAST=207.200.5.255 +NETWORK=207.200.5.0 +NETMASK=255.255.255.0 +IPADDR=207.200.5.2 + + +Troubleshooting +=============== +Q1. Source files contain ^ M behind every line. + Make sure all files are Unix file format (no LF). Try the following + shell command to convert files. + + cat dl2k.c | col -b > dl2k.tmp + mv dl2k.tmp dl2k.c + + OR + + cat dl2k.c | tr -d "\r" > dl2k.tmp + mv dl2k.tmp dl2k.c + +Q2: Could not find header files (*.h) ? + To compile the driver, you need kernel header files. After + installing the kernel source, the header files are usually located in + /usr/src/linux/include, which is the default include directory configured + in Makefile. For some distributions, there is a copy of header files in + /usr/src/include/linux and /usr/src/include/asm, that you can change the + INCLUDEDIR in Makefile to /usr/include without installing kernel source. + Note that RH 7.0 didn't provide correct header files in /usr/include, + including those files will make a wrong version driver. + diff --git a/Documentation/networking/dm9000.txt b/Documentation/networking/dm9000.txt new file mode 100644 index 0000000..65df3de --- /dev/null +++ b/Documentation/networking/dm9000.txt @@ -0,0 +1,167 @@ +DM9000 Network driver +===================== + +Copyright 2008 Simtec Electronics, + Ben Dooks <ben@simtec.co.uk> <ben-linux@fluff.org> + + +Introduction +------------ + +This file describes how to use the DM9000 platform-device based network driver +that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h. + +The driver supports three DM9000 variants, the DM9000E which is the first chip +supported as well as the newer DM9000A and DM9000B devices. It is currently +maintained and tested by Ben Dooks, who should be CC: to any patches for this +driver. + + +Defining the platform device +---------------------------- + +The minimum set of resources attached to the platform device are as follows: + + 1) The physical address of the address register + 2) The physical address of the data register + 3) The IRQ line the device's interrupt pin is connected to. + +These resources should be specified in that order, as the ordering of the +two address regions is important (the driver expects these to be address +and then data). + +An example from arch/arm/mach-s3c2410/mach-bast.c is: + +static struct resource bast_dm9k_resource[] = { + [0] = { + .start = S3C2410_CS5 + BAST_PA_DM9000, + .end = S3C2410_CS5 + BAST_PA_DM9000 + 3, + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40, + .end = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f, + .flags = IORESOURCE_MEM, + }, + [2] = { + .start = IRQ_DM9000, + .end = IRQ_DM9000, + .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL, + } +}; + +static struct platform_device bast_device_dm9k = { + .name = "dm9000", + .id = 0, + .num_resources = ARRAY_SIZE(bast_dm9k_resource), + .resource = bast_dm9k_resource, +}; + +Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags, +as this will generate a warning if it is not present. The trigger from +the flags field will be passed to request_irq() when registering the IRQ +handler to ensure that the IRQ is setup correctly. + +This shows a typical platform device, without the optional configuration +platform data supplied. The next example uses the same resources, but adds +the optional platform data to pass extra configuration data: + +static struct dm9000_plat_data bast_dm9k_platdata = { + .flags = DM9000_PLATF_16BITONLY, +}; + +static struct platform_device bast_device_dm9k = { + .name = "dm9000", + .id = 0, + .num_resources = ARRAY_SIZE(bast_dm9k_resource), + .resource = bast_dm9k_resource, + .dev = { + .platform_data = &bast_dm9k_platdata, + } +}; + +The platform data is defined in include/linux/dm9000.h and described below. + + +Platform data +------------- + +Extra platform data for the DM9000 can describe the IO bus width to the +device, whether or not an external PHY is attached to the device and +the availability of an external configuration EEPROM. + +The flags for the platform data .flags field are as follows: + +DM9000_PLATF_8BITONLY + + The IO should be done with 8bit operations. + +DM9000_PLATF_16BITONLY + + The IO should be done with 16bit operations. + +DM9000_PLATF_32BITONLY + + The IO should be done with 32bit operations. + +DM9000_PLATF_EXT_PHY + + The chip is connected to an external PHY. + +DM9000_PLATF_NO_EEPROM + + This can be used to signify that the board does not have an + EEPROM, or that the EEPROM should be hidden from the user. + +DM9000_PLATF_SIMPLE_PHY + + Switch to using the simpler PHY polling method which does not + try and read the MII PHY state regularly. This is only available + when using the internal PHY. See the section on link state polling + for more information. + + The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry + "Force simple NSR based PHY polling" allows this flag to be + forced on at build time. + + +PHY Link state polling +---------------------- + +The driver keeps track of the link state and informs the network core +about link (carrier) availablilty. This is managed by several methods +depending on the version of the chip and on which PHY is being used. + +For the internal PHY, the original (and currently default) method is +to read the MII state, either when the status changes if we have the +necessary interrupt support in the chip or every two seconds via a +periodic timer. + +To reduce the overhead for the internal PHY, there is now the option +of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY +platform data option to read the summary information without the +expensive MII accesses. This method is faster, but does not print +as much information. + +When using an external PHY, the driver currently has to poll the MII +link status as there is no method for getting an interrupt on link change. + + +DM9000A / DM9000B +----------------- + +These chips are functionally similar to the DM9000E and are supported easily +by the same driver. The features are: + + 1) Interrupt on internal PHY state change. This means that the periodic + polling of the PHY status may be disabled on these devices when using + the internal PHY. + + 2) TCP/UDP checksum offloading, which the driver does not currently support. + + +ethtool +------- + +The driver supports the ethtool interface for access to the driver +state information, the PHY state and the EEPROM. diff --git a/Documentation/networking/dmfe.txt b/Documentation/networking/dmfe.txt new file mode 100644 index 0000000..8006c22 --- /dev/null +++ b/Documentation/networking/dmfe.txt @@ -0,0 +1,65 @@ +Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux. + +This program is free software; you can redistribute it and/or +modify it under the terms of the GNU General Public License +as published by the Free Software Foundation; either version 2 +of the License, or (at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + + +This driver provides kernel support for Davicom DM9102(A)/DM9132/DM9801 ethernet cards ( CNET +10/100 ethernet cards uses Davicom chipset too, so this driver supports CNET cards too ).If you +didn't compile this driver as a module, it will automatically load itself on boot and print a +line similar to : + + dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17) + +If you compiled this driver as a module, you have to load it on boot.You can load it with command : + + insmod dmfe + +This way it will autodetect the device mode.This is the suggested way to load the module.Or you can pass +a mode= setting to module while loading, like : + + insmod dmfe mode=0 # Force 10M Half Duplex + insmod dmfe mode=1 # Force 100M Half Duplex + insmod dmfe mode=4 # Force 10M Full Duplex + insmod dmfe mode=5 # Force 100M Full Duplex + +Next you should configure your network interface with a command similar to : + + ifconfig eth0 172.22.3.18 + ^^^^^^^^^^^ + Your IP Address + +Then you may have to modify the default routing table with command : + + route add default eth0 + + +Now your ethernet card should be up and running. + + +TODO: + +Implement pci_driver::suspend() and pci_driver::resume() power management methods. +Check on 64 bit boxes. +Check and fix on big endian boxes. +Test and make sure PCI latency is now correct for all cases. + + +Authors: + +Sten Wang <sten_wang@davicom.com.tw > : Original Author +Tobias Ringstrom <tori@unhappy.mine.nu> : Current Maintainer + +Contributors: + +Marcelo Tosatti <marcelo@conectiva.com.br> +Alan Cox <alan@lxorguk.ukuu.org.uk> +Jeff Garzik <jgarzik@pobox.com> +Vojtech Pavlik <vojtech@suse.cz> diff --git a/Documentation/networking/driver.txt b/Documentation/networking/driver.txt new file mode 100644 index 0000000..ea72d2e --- /dev/null +++ b/Documentation/networking/driver.txt @@ -0,0 +1,94 @@ +Document about softnet driver issues + +Transmit path guidelines: + +1) The hard_start_xmit method must never return '1' under any + normal circumstances. It is considered a hard error unless + there is no way your device can tell ahead of time when it's + transmit function will become busy. + + Instead it must maintain the queue properly. For example, + for a driver implementing scatter-gather this means: + + static int drv_hard_start_xmit(struct sk_buff *skb, + struct net_device *dev) + { + struct drv *dp = dev->priv; + + lock_tx(dp); + ... + /* This is a hard error log it. */ + if (TX_BUFFS_AVAIL(dp) <= (skb_shinfo(skb)->nr_frags + 1)) { + netif_stop_queue(dev); + unlock_tx(dp); + printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n", + dev->name); + return 1; + } + + ... queue packet to card ... + ... update tx consumer index ... + + if (TX_BUFFS_AVAIL(dp) <= (MAX_SKB_FRAGS + 1)) + netif_stop_queue(dev); + + ... + unlock_tx(dp); + ... + } + + And then at the end of your TX reclamation event handling: + + if (netif_queue_stopped(dp->dev) && + TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1)) + netif_wake_queue(dp->dev); + + For a non-scatter-gather supporting card, the three tests simply become: + + /* This is a hard error log it. */ + if (TX_BUFFS_AVAIL(dp) <= 0) + + and: + + if (TX_BUFFS_AVAIL(dp) == 0) + + and: + + if (netif_queue_stopped(dp->dev) && + TX_BUFFS_AVAIL(dp) > 0) + netif_wake_queue(dp->dev); + +2) Do not forget to update netdev->trans_start to jiffies after + each new tx packet is given to the hardware. + +3) A hard_start_xmit method must not modify the shared parts of a + cloned SKB. + +4) Do not forget that once you return 0 from your hard_start_xmit + method, it is your driver's responsibility to free up the SKB + and in some finite amount of time. + + For example, this means that it is not allowed for your TX + mitigation scheme to let TX packets "hang out" in the TX + ring unreclaimed forever if no new TX packets are sent. + This error can deadlock sockets waiting for send buffer room + to be freed up. + + If you return 1 from the hard_start_xmit method, you must not keep + any reference to that SKB and you must not attempt to free it up. + +Probing guidelines: + +1) Any hardware layer address you obtain for your device should + be verified. For example, for ethernet check it with + linux/etherdevice.h:is_valid_ether_addr() + +Close/stop guidelines: + +1) After the dev->stop routine has been called, the hardware must + not receive or transmit any data. All in flight packets must + be aborted. If necessary, poll or wait for completion of + any reset commands. + +2) The dev->stop routine will be called by unregister_netdevice + if device is still UP. diff --git a/Documentation/networking/e100.txt b/Documentation/networking/e100.txt new file mode 100644 index 0000000..944aa55 --- /dev/null +++ b/Documentation/networking/e100.txt @@ -0,0 +1,206 @@ +Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters +============================================================== + +November 15, 2005 + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Building and Installation +- Driver Configuration Parameters +- Additional Configurations +- Known Issues +- Support + + +In This Release +=============== + +This file describes the Linux* Base Driver for the Intel(R) PRO/100 Family of +Adapters. This driver includes support for Itanium(R)2-based systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel PRO/100 adapter. + +The following features are now available in supported kernels: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + + +Identifying Your Adapter +======================== + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + http://support.intel.com/support/network/adapter/pro100/21397.htm + +For the latest Intel network drivers for Linux, refer to the following +website. In the search field, enter your adapter name or type, or use the +networking link on the left to search for your adapter: + + http://downloadfinder.intel.com/scripts-df/support_intel.asp + +Driver Configuration Parameters +=============================== + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +Rx Descriptors: Number of receive descriptors. A receive descriptor is a data + structure that describes a receive buffer and its attributes to the network + controller. The data in the descriptor is used by the controller to write + data from the controller to host memory. In the 3.x.x driver the valid range + for this parameter is 64-256. The default value is 64. This parameter can be + changed using the command: + + ethtool -G eth? rx n, where n is the number of desired rx descriptors. + +Tx Descriptors: Number of transmit descriptors. A transmit descriptor is a data + structure that describes a transmit buffer and its attributes to the network + controller. The data in the descriptor is used by the controller to read + data from the host memory to the controller. In the 3.x.x driver the valid + range for this parameter is 64-256. The default value is 64. This parameter + can be changed using the command: + + ethtool -G eth? tx n, where n is the number of desired tx descriptors. + +Speed/Duplex: The driver auto-negotiates the link speed and duplex settings by + default. Ethtool can be used as follows to force speed/duplex. + + ethtool -s eth? autoneg off speed {10|100} duplex {full|half} + + NOTE: setting the speed/duplex to incorrect values will cause the link to + fail. + +Event Log Message Level: The driver uses the message level flag to log events + to syslog. The message level can be set at driver load time. It can also be + set using the command: + + ethtool -s eth? msglvl n + + +Additional Configurations +========================= + + Configuring the Driver on Different Distributions + ------------------------------------------------- + + Configuring a network driver to load properly when the system is started is + distribution dependent. Typically, the configuration process involves adding + an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing + other system startup scripts and/or configuration files. Many popular Linux + distributions ship with tools to make these changes for you. To learn the + proper way to configure a network device for your system, refer to your + distribution documentation. If during this process you are asked for the + driver or module name, the name for the Linux Base Driver for the Intel + PRO/100 Family of Adapters is e100. + + As an example, if you install the e100 driver for two PRO/100 adapters + (eth0 and eth1), add the following to modules.conf or modprobe.conf: + + alias eth0 e100 + alias eth1 e100 + + Viewing Link Messages + --------------------- + In order to see link messages and other Intel driver information on your + console, you must set the dmesg level up to six. This can be done by + entering the following on the command line before loading the e100 driver: + + dmesg -n 8 + + If you wish to see all messages issued by the driver, including debug + messages, set the dmesg level to eight. + + NOTE: This setting is not saved across reboots. + + + Ethtool + ------- + + The driver utilizes the ethtool interface for driver configuration and + diagnostics, as well as displaying statistical information. Ethtool + version 1.6 or later is required for this functionality. + + The latest release of ethtool can be found from + http://sourceforge.net/projects/gkernel. + + NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support + for a more complete ethtool feature set can be enabled by upgrading + ethtool to ethtool-1.8.1. + + + Enabling Wake on LAN* (WoL) + --------------------------- + WoL is provided through the Ethtool* utility. Ethtool is included with Red + Hat* 8.0. For other Linux distributions, download and install Ethtool from + the following website: http://sourceforge.net/projects/gkernel. + + For instructions on enabling WoL with Ethtool, refer to the Ethtool man page. + + WoL will be enabled on the system during the next shut down or reboot. For + this driver version, in order to enable WoL, the e100 driver must be + loaded when shutting down or rebooting the system. + + + NAPI + ---- + + NAPI (Rx polling mode) is supported in the e100 driver. + + See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. + + Multiple Interfaces on Same Ethernet Broadcast Network + ------------------------------------------------------ + + Due to the default ARP behavior on Linux, it is not possible to have + one system on two IP networks in the same Ethernet broadcast domain + (non-partitioned switch) behave as expected. All Ethernet interfaces + will respond to IP traffic for any IP address assigned to the system. + This results in unbalanced receive traffic. + + If you have multiple interfaces in a server, either turn on ARP + filtering by + + (1) entering: echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + (this only works if your kernel's version is higher than 2.4.5), or + + (2) installing the interfaces in separate broadcast domains (either + in different switches or in a switch partitioned to VLANs). + + +Support +======= + +For general information, go to the Intel support website at: + + http://support.intel.com + + or the Intel Wired Networking project hosted by Sourceforge at: + + http://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported +kernel with a supported adapter, email the specific information related to the +issue to e1000-devel@lists.sourceforge.net. + + +License +======= + +This software program is released under the terms of a license agreement +between you ('Licensee') and Intel. Do not use or load this software or any +associated materials (collectively, the 'Software') until you have carefully +read the full terms and conditions of the file COPYING located in this software +package. By loading or using the Software, you agree to the terms of this +Agreement. If you do not agree with the terms of this Agreement, do not install +or use the Software. + +* Other names and brands may be claimed as the property of others. diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt new file mode 100644 index 0000000..2df7186 --- /dev/null +++ b/Documentation/networking/e1000.txt @@ -0,0 +1,642 @@ +Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters +=============================================================== + +September 26, 2006 + + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Building and Installation +- Command Line Parameters +- Speed and Duplex Configuration +- Additional Configurations +- Known Issues +- Support + + +In This Release +=============== + +This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family +of Adapters. This driver includes support for Itanium(R)2-based systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel PRO/1000 adapter. All hardware requirements listed +apply to use with Linux. + +The following features are now available in supported kernels: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and ifconfig to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. + +NOTE: The Intel(R) 82562v 10/100 Network Connection only provides 10/100 +support. + + +Identifying Your Adapter +======================== + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + http://support.intel.com/support/network/adapter/pro100/21397.htm + +For the latest Intel network drivers for Linux, refer to the following +website. In the search field, enter your adapter name or type, or use the +networking link on the left to search for your adapter: + + http://downloadfinder.intel.com/scripts-df/support_intel.asp + + +Command Line Parameters +======================= + +If the driver is built as a module, the following optional parameters +are used by entering them on the command line with the modprobe command +using this syntax: + + modprobe e1000 [<option>=<VAL1>,<VAL2>,...] + +For example, with two PRO/1000 PCI adapters, entering: + + modprobe e1000 TxDescriptors=80,128 + +loads the e1000 driver with 80 TX descriptors for the first adapter and +128 TX descriptors for the second adapter. + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +NOTES: For more information about the AutoNeg, Duplex, and Speed + parameters, see the "Speed and Duplex Configuration" section in + this document. + + For more information about the InterruptThrottleRate, + RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay + parameters, see the application note at: + http://www.intel.com/design/network/applnots/ap450.htm + + A descriptor describes a data buffer and attributes related to + the data buffer. This information is accessed by the hardware. + + +AutoNeg +------- +(Supported only on adapters with copper connections) +Valid Range: 0x01-0x0F, 0x20-0x2F +Default Value: 0x2F + +This parameter is a bit-mask that specifies the speed and duplex settings +advertised by the adapter. When this parameter is used, the Speed and +Duplex parameters must not be specified. + +NOTE: Refer to the Speed and Duplex section of this readme for more + information on the AutoNeg parameter. + + +Duplex +------ +(Supported only on adapters with copper connections) +Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full) +Default Value: 0 + +This defines the direction in which data is allowed to flow. Can be +either one or two-directional. If both Duplex and the link partner are +set to auto-negotiate, the board auto-detects the correct duplex. If the +link partner is forced (either full or half), Duplex defaults to half- +duplex. + + +FlowControl +----------- +Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) +Default Value: Reads flow control settings from the EEPROM + +This parameter controls the automatic generation(Tx) and response(Rx) +to Ethernet PAUSE frames. + + +InterruptThrottleRate +--------------------- +(not supported on Intel(R) 82542, 82543 or 82544-based adapters) +Valid Range: 0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative) +Default Value: 3 + +The driver can limit the amount of interrupts per second that the adapter +will generate for incoming packets. It does this by writing a value to the +adapter that is based on the maximum amount of interrupts that the adapter +will generate per second. + +Setting InterruptThrottleRate to a value greater or equal to 100 +will program the adapter to send out a maximum of that many interrupts +per second, even if more packets have come in. This reduces interrupt +load on the system and can lower CPU utilization under heavy load, +but will increase latency as packets are not processed as quickly. + +The default behaviour of the driver previously assumed a static +InterruptThrottleRate value of 8000, providing a good fallback value for +all traffic types,but lacking in small packet performance and latency. +The hardware can handle many more small packets per second however, and +for this reason an adaptive interrupt moderation algorithm was implemented. + +Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which +it dynamically adjusts the InterruptThrottleRate value based on the traffic +that it receives. After determining the type of incoming traffic in the last +timeframe, it will adjust the InterruptThrottleRate to an appropriate value +for that traffic. + +The algorithm classifies the incoming traffic every interval into +classes. Once the class is determined, the InterruptThrottleRate value is +adjusted to suit that traffic type the best. There are three classes defined: +"Bulk traffic", for large amounts of packets of normal size; "Low latency", +for small amounts of traffic and/or a significant percentage of small +packets; and "Lowest latency", for almost completely small packets or +minimal traffic. + +In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 +for traffic that falls in class "Bulk traffic". If traffic falls in the "Low +latency" or "Lowest latency" class, the InterruptThrottleRate is increased +stepwise to 20000. This default mode is suitable for most applications. + +For situations where low latency is vital such as cluster or +grid computing, the algorithm can reduce latency even more when +InterruptThrottleRate is set to mode 1. In this mode, which operates +the same as mode 3, the InterruptThrottleRate will be increased stepwise to +70000 for traffic in class "Lowest latency". + +Setting InterruptThrottleRate to 0 turns off any interrupt moderation +and may improve small packet latency, but is generally not suitable +for bulk throughput traffic. + +NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and + RxAbsIntDelay parameters. In other words, minimizing the receive + and/or transmit absolute delays does not force the controller to + generate more interrupts than what the Interrupt Throttle Rate + allows. + +CAUTION: If you are using the Intel(R) PRO/1000 CT Network Connection + (controller 82547), setting InterruptThrottleRate to a value + greater than 75,000, may hang (stop transmitting) adapters + under certain network conditions. If this occurs a NETDEV + WATCHDOG message is logged in the system event log. In + addition, the controller is automatically reset, restoring + the network connection. To eliminate the potential for the + hang, ensure that InterruptThrottleRate is set no greater + than 75,000 and is not set to 0. + +NOTE: When e1000 is loaded with default settings and multiple adapters + are in use simultaneously, the CPU utilization may increase non- + linearly. In order to limit the CPU utilization without impacting + the overall throughput, we recommend that you load the driver as + follows: + + modprobe e1000 InterruptThrottleRate=3000,3000,3000 + + This sets the InterruptThrottleRate to 3000 interrupts/sec for + the first, second, and third instances of the driver. The range + of 2000 to 3000 interrupts per second works on a majority of + systems and is a good starting point, but the optimal value will + be platform-specific. If CPU utilization is not a concern, use + RX_POLLING (NAPI) and default driver settings. + + + +RxDescriptors +------------- +Valid Range: 80-256 for 82542 and 82543-based adapters + 80-4096 for all other supported adapters +Default Value: 256 + +This value specifies the number of receive buffer descriptors allocated +by the driver. Increasing this value allows the driver to buffer more +incoming packets, at the expense of increased system memory utilization. + +Each descriptor is 16 bytes. A receive buffer is also allocated for each +descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending +on the MTU setting. The maximum MTU size is 16110. + +NOTE: MTU designates the frame size. It only needs to be set for Jumbo + Frames. Depending on the available system resources, the request + for a higher number of receive descriptors may be denied. In this + case, use a lower number. + + +RxIntDelay +---------- +Valid Range: 0-65535 (0=off) +Default Value: 0 + +This value delays the generation of receive interrupts in units of 1.024 +microseconds. Receive interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. Increasing this value adds +extra latency to frame reception and can end up decreasing the throughput +of TCP traffic. If the system is reporting dropped receives, this value +may be set too high, causing the driver to run out of available receive +descriptors. + +CAUTION: When setting RxIntDelay to a value other than 0, adapters may + hang (stop transmitting) under certain network conditions. If + this occurs a NETDEV WATCHDOG message is logged in the system + event log. In addition, the controller is automatically reset, + restoring the network connection. To eliminate the potential + for the hang ensure that RxIntDelay is set to 0. + + +RxAbsIntDelay +------------- +(This parameter is supported only on 82540, 82545 and later adapters.) +Valid Range: 0-65535 (0=off) +Default Value: 128 + +This value, in units of 1.024 microseconds, limits the delay in which a +receive interrupt is generated. Useful only if RxIntDelay is non-zero, +this value ensures that an interrupt is generated after the initial +packet is received within the set amount of time. Proper tuning, +along with RxIntDelay, may improve traffic throughput in specific network +conditions. + + +Speed +----- +(This parameter is supported only on adapters with copper connections.) +Valid Settings: 0, 10, 100, 1000 +Default Value: 0 (auto-negotiate at all supported speeds) + +Speed forces the line speed to the specified value in megabits per second +(Mbps). If this parameter is not specified or is set to 0 and the link +partner is set to auto-negotiate, the board will auto-detect the correct +speed. Duplex should also be set when Speed is set to either 10 or 100. + + +TxDescriptors +------------- +Valid Range: 80-256 for 82542 and 82543-based adapters + 80-4096 for all other supported adapters +Default Value: 256 + +This value is the number of transmit descriptors allocated by the driver. +Increasing this value allows the driver to queue more transmits. Each +descriptor is 16 bytes. + +NOTE: Depending on the available system resources, the request for a + higher number of transmit descriptors may be denied. In this case, + use a lower number. + + +TxIntDelay +---------- +Valid Range: 0-65535 (0=off) +Default Value: 64 + +This value delays the generation of transmit interrupts in units of +1.024 microseconds. Transmit interrupt reduction can improve CPU +efficiency if properly tuned for specific network traffic. If the +system is reporting dropped transmits, this value may be set too high +causing the driver to run out of available transmit descriptors. + + +TxAbsIntDelay +------------- +(This parameter is supported only on 82540, 82545 and later adapters.) +Valid Range: 0-65535 (0=off) +Default Value: 64 + +This value, in units of 1.024 microseconds, limits the delay in which a +transmit interrupt is generated. Useful only if TxIntDelay is non-zero, +this value ensures that an interrupt is generated after the initial +packet is sent on the wire within the set amount of time. Proper tuning, +along with TxIntDelay, may improve traffic throughput in specific +network conditions. + +XsumRX +------ +(This parameter is NOT supported on the 82542-based adapter.) +Valid Range: 0-1 +Default Value: 1 + +A value of '1' indicates that the driver should enable IP checksum +offload for received packets (both UDP and TCP) to the adapter hardware. + + +Speed and Duplex Configuration +============================== + +Three keywords are used to control the speed and duplex configuration. +These keywords are Speed, Duplex, and AutoNeg. + +If the board uses a fiber interface, these keywords are ignored, and the +fiber interface board only links at 1000 Mbps full-duplex. + +For copper-based boards, the keywords interact as follows: + + The default operation is auto-negotiate. The board advertises all + supported speed and duplex combinations, and it links at the highest + common speed and duplex mode IF the link partner is set to auto-negotiate. + + If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps + is advertised (The 1000BaseT spec requires auto-negotiation.) + + If Speed = 10 or 100, then both Speed and Duplex should be set. Auto- + negotiation is disabled, and the AutoNeg parameter is ignored. Partner + SHOULD also be forced. + +The AutoNeg parameter is used when more control is required over the +auto-negotiation process. It should be used when you wish to control which +speed and duplex combinations are advertised during the auto-negotiation +process. + +The parameter may be specified as either a decimal or hexadecimal value as +determined by the bitmap below. + +Bit position 7 6 5 4 3 2 1 0 +Decimal Value 128 64 32 16 8 4 2 1 +Hex value 80 40 20 10 8 4 2 1 +Speed (Mbps) N/A N/A 1000 N/A 100 100 10 10 +Duplex Full Full Half Full Half + +Some examples of using AutoNeg: + + modprobe e1000 AutoNeg=0x01 (Restricts autonegotiation to 10 Half) + modprobe e1000 AutoNeg=1 (Same as above) + modprobe e1000 AutoNeg=0x02 (Restricts autonegotiation to 10 Full) + modprobe e1000 AutoNeg=0x03 (Restricts autonegotiation to 10 Half or 10 Full) + modprobe e1000 AutoNeg=0x04 (Restricts autonegotiation to 100 Half) + modprobe e1000 AutoNeg=0x05 (Restricts autonegotiation to 10 Half or 100 + Half) + modprobe e1000 AutoNeg=0x020 (Restricts autonegotiation to 1000 Full) + modprobe e1000 AutoNeg=32 (Same as above) + +Note that when this parameter is used, Speed and Duplex must not be specified. + +If the link partner is forced to a specific speed and duplex, then this +parameter should not be used. Instead, use the Speed and Duplex parameters +previously mentioned to force the adapter to the same speed and duplex. + + +Additional Configurations +========================= + + Configuring the Driver on Different Distributions + ------------------------------------------------- + Configuring a network driver to load properly when the system is started + is distribution dependent. Typically, the configuration process involves + adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well + as editing other system startup scripts and/or configuration files. Many + popular Linux distributions ship with tools to make these changes for you. + To learn the proper way to configure a network device for your system, + refer to your distribution documentation. If during this process you are + asked for the driver or module name, the name for the Linux Base Driver + for the Intel(R) PRO/1000 Family of Adapters is e1000. + + As an example, if you install the e1000 driver for two PRO/1000 adapters + (eth0 and eth1) and set the speed and duplex to 10full and 100half, add + the following to modules.conf or or modprobe.conf: + + alias eth0 e1000 + alias eth1 e1000 + options e1000 Speed=10,100 Duplex=2,1 + + Viewing Link Messages + --------------------- + Link messages will not be displayed to the console if the distribution is + restricting system messages. In order to see network driver link messages + on your console, set dmesg to eight by entering the following: + + dmesg -n 8 + + NOTE: This setting is not saved across reboots. + + Jumbo Frames + ------------ + Jumbo Frames support is enabled by changing the MTU to a value larger than + the default of 1500. Use the ifconfig command to increase the MTU size. + For example: + + ifconfig eth<x> mtu 9000 up + + This setting is not saved across reboots. It can be made permanent if + you add: + + MTU=9000 + + to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example + applies to the Red Hat distributions; other distributions may store this + setting in a different location. + + Notes: + + - To enable Jumbo Frames, increase the MTU size on the interface beyond + 1500. + + - The maximum MTU setting for Jumbo Frames is 16110. This value coincides + with the maximum Jumbo Frames size of 16128. + + - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or + loss of link. + + - Some Intel gigabit adapters that support Jumbo Frames have a frame size + limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes. + The adapters with this limitation are based on the Intel(R) 82571EB, + 82572EI, 82573L and 80003ES2LAN controller. These correspond to the + following product names: + Intel(R) PRO/1000 PT Server Adapter + Intel(R) PRO/1000 PT Desktop Adapter + Intel(R) PRO/1000 PT Network Connection + Intel(R) PRO/1000 PT Dual Port Server Adapter + Intel(R) PRO/1000 PT Dual Port Network Connection + Intel(R) PRO/1000 PF Server Adapter + Intel(R) PRO/1000 PF Network Connection + Intel(R) PRO/1000 PF Dual Port Server Adapter + Intel(R) PRO/1000 PB Server Connection + Intel(R) PRO/1000 PL Network Connection + Intel(R) PRO/1000 EB Network Connection with I/O Acceleration + Intel(R) PRO/1000 EB Backplane Connection with I/O Acceleration + Intel(R) PRO/1000 PT Quad Port Server Adapter + + - Adapters based on the Intel(R) 82542 and 82573V/E controller do not + support Jumbo Frames. These correspond to the following product names: + Intel(R) PRO/1000 Gigabit Server Adapter + Intel(R) PRO/1000 PM Network Connection + + - The following adapters do not support Jumbo Frames: + Intel(R) 82562V 10/100 Network Connection + Intel(R) 82566DM Gigabit Network Connection + Intel(R) 82566DC Gigabit Network Connection + Intel(R) 82566MM Gigabit Network Connection + Intel(R) 82566MC Gigabit Network Connection + Intel(R) 82562GT 10/100 Network Connection + Intel(R) 82562G 10/100 Network Connection + + + Ethtool + ------- + The driver utilizes the ethtool interface for driver configuration and + diagnostics, as well as displaying statistical information. Ethtool + version 1.6 or later is required for this functionality. + + The latest release of ethtool can be found from + http://sourceforge.net/projects/gkernel. + + NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support + for a more complete ethtool feature set can be enabled by upgrading + ethtool to ethtool-1.8.1. + + Enabling Wake on LAN* (WoL) + --------------------------- + WoL is configured through the Ethtool* utility. Ethtool is included with + all versions of Red Hat after Red Hat 7.2. For other Linux distributions, + download and install Ethtool from the following website: + http://sourceforge.net/projects/gkernel. + + For instructions on enabling WoL with Ethtool, refer to the website listed + above. + + WoL will be enabled on the system during the next shut down or reboot. + For this driver version, in order to enable WoL, the e1000 driver must be + loaded when shutting down or rebooting the system. + + Wake On LAN is only supported on port A for the following devices: + Intel(R) PRO/1000 PT Dual Port Network Connection + Intel(R) PRO/1000 PT Dual Port Server Connection + Intel(R) PRO/1000 PT Dual Port Server Adapter + Intel(R) PRO/1000 PF Dual Port Server Adapter + Intel(R) PRO/1000 PT Quad Port Server Adapter + + NAPI + ---- + NAPI (Rx polling mode) is enabled in the e1000 driver. + + See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. + + +Known Issues +============ + +Dropped Receive Packets on Half-duplex 10/100 Networks +------------------------------------------------------ +If you have an Intel PCI Express adapter running at 10mbps or 100mbps, half- +duplex, you may observe occasional dropped receive packets. There are no +workarounds for this problem in this network configuration. The network must +be updated to operate in full-duplex, and/or 1000mbps only. + +Jumbo Frames System Requirement +------------------------------- +Memory allocation failures have been observed on Linux systems with 64 MB +of RAM or less that are running Jumbo Frames. If you are using Jumbo +Frames, your system may require more than the advertised minimum +requirement of 64 MB of system memory. + +Performance Degradation with Jumbo Frames +----------------------------------------- +Degradation in throughput performance may be observed in some Jumbo frames +environments. If this is observed, increasing the application's socket +buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values +may help. See the specific application manual and +/usr/src/linux*/Documentation/ +networking/ip-sysctl.txt for more details. + +Jumbo Frames on Foundry BigIron 8000 switch +------------------------------------------- +There is a known issue using Jumbo frames when connected to a Foundry +BigIron 8000 switch. This is a 3rd party limitation. If you experience +loss of packets, lower the MTU size. + +Allocating Rx Buffers when Using Jumbo Frames +--------------------------------------------- +Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if +the available memory is heavily fragmented. This issue may be seen with PCI-X +adapters or with packet split disabled. This can be reduced or eliminated +by changing the amount of available memory for receive buffer allocation, by +increasing /proc/sys/vm/min_free_kbytes. + +Multiple Interfaces on Same Ethernet Broadcast Network +------------------------------------------------------ +Due to the default ARP behavior on Linux, it is not possible to have +one system on two IP networks in the same Ethernet broadcast domain +(non-partitioned switch) behave as expected. All Ethernet interfaces +will respond to IP traffic for any IP address assigned to the system. +This results in unbalanced receive traffic. + +If you have multiple interfaces in a server, either turn on ARP +filtering by entering: + + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter +(this only works if your kernel's version is higher than 2.4.5), + +NOTE: This setting is not saved across reboots. The configuration +change can be made permanent by adding the line: + net.ipv4.conf.all.arp_filter = 1 +to the file /etc/sysctl.conf + + or, + +install the interfaces in separate broadcast domains (either in +different switches or in a switch partitioned to VLANs). + +82541/82547 can't link or are slow to link with some link partners +----------------------------------------------------------------- +There is a known compatibility issue with 82541/82547 and some +low-end switches where the link will not be established, or will +be slow to establish. In particular, these switches are known to +be incompatible with 82541/82547: + + Planex FXG-08TE + I-O Data ETG-SH8 + +To workaround this issue, the driver can be compiled with an override +of the PHY's master/slave setting. Forcing master or forcing slave +mode will improve time-to-link. + + # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n> + +Where <n> is: + + 0 = Hardware default + 1 = Master mode + 2 = Slave mode + 3 = Auto master/slave + +Disable rx flow control with ethtool +------------------------------------ +In order to disable receive flow control using ethtool, you must turn +off auto-negotiation on the same command line. + +For example: + + ethtool -A eth? autoneg off rx off + +Unplugging network cable while ethtool -p is running +---------------------------------------------------- +In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging +the network cable while ethtool -p is running will cause the system to +become unresponsive to keyboard commands, except for control-alt-delete. +Restarting the system appears to be the only remedy. + + +Support +======= + +For general information, go to the Intel support website at: + + http://support.intel.com + +or the Intel Wired Networking project hosted by Sourceforge at: + + http://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported +kernel with a supported adapter, email the specific information related +to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/eql.txt b/Documentation/networking/eql.txt new file mode 100644 index 0000000..0f15501 --- /dev/null +++ b/Documentation/networking/eql.txt @@ -0,0 +1,528 @@ + EQL Driver: Serial IP Load Balancing HOWTO + Simon "Guru Aleph-Null" Janes, simon@ncm.com + v1.1, February 27, 1995 + + This is the manual for the EQL device driver. EQL is a software device + that lets you load-balance IP serial links (SLIP or uncompressed PPP) + to increase your bandwidth. It will not reduce your latency (i.e. ping + times) except in the case where you already have lots of traffic on + your link, in which it will help them out. This driver has been tested + with the 1.1.75 kernel, and is known to have patched cleanly with + 1.1.86. Some testing with 1.1.92 has been done with the v1.1 patch + which was only created to patch cleanly in the very latest kernel + source trees. (Yes, it worked fine.) + + 1. Introduction + + Which is worse? A huge fee for a 56K leased line or two phone lines? + It's probably the former. If you find yourself craving more bandwidth, + and have a ISP that is flexible, it is now possible to bind modems + together to work as one point-to-point link to increase your + bandwidth. All without having to have a special black box on either + side. + + + The eql driver has only been tested with the Livingston PortMaster-2e + terminal server. I do not know if other terminal servers support load- + balancing, but I do know that the PortMaster does it, and does it + almost as well as the eql driver seems to do it (-- Unfortunately, in + my testing so far, the Livingston PortMaster 2e's load-balancing is a + good 1 to 2 KB/s slower than the test machine working with a 28.8 Kbps + and 14.4 Kbps connection. However, I am not sure that it really is + the PortMaster, or if it's Linux's TCP drivers. I'm told that Linux's + TCP implementation is pretty fast though.--) + + + I suggest to ISPs out there that it would probably be fair to charge + a load-balancing client 75% of the cost of the second line and 50% of + the cost of the third line etc... + + + Hey, we can all dream you know... + + + 2. Kernel Configuration + + Here I describe the general steps of getting a kernel up and working + with the eql driver. From patching, building, to installing. + + + 2.1. Patching The Kernel + + If you do not have or cannot get a copy of the kernel with the eql + driver folded into it, get your copy of the driver from + ftp://slaughter.ncm.com/pub/Linux/LOAD_BALANCING/eql-1.1.tar.gz. + Unpack this archive someplace obvious like /usr/local/src/. It will + create the following files: + + + + ______________________________________________________________________ + -rw-r--r-- guru/ncm 198 Jan 19 18:53 1995 eql-1.1/NO-WARRANTY + -rw-r--r-- guru/ncm 30620 Feb 27 21:40 1995 eql-1.1/eql-1.1.patch + -rwxr-xr-x guru/ncm 16111 Jan 12 22:29 1995 eql-1.1/eql_enslave + -rw-r--r-- guru/ncm 2195 Jan 10 21:48 1995 eql-1.1/eql_enslave.c + ______________________________________________________________________ + + Unpack a recent kernel (something after 1.1.92) someplace convenient + like say /usr/src/linux-1.1.92.eql. Use symbolic links to point + /usr/src/linux to this development directory. + + + Apply the patch by running the commands: + + + ______________________________________________________________________ + cd /usr/src + patch </usr/local/src/eql-1.1/eql-1.1.patch + ______________________________________________________________________ + + + + + + 2.2. Building The Kernel + + After patching the kernel, run make config and configure the kernel + for your hardware. + + + After configuration, make and install according to your habit. + + + 3. Network Configuration + + So far, I have only used the eql device with the DSLIP SLIP connection + manager by Matt Dillon (-- "The man who sold his soul to code so much + so quickly."--) . How you configure it for other "connection" + managers is up to you. Most other connection managers that I've seen + don't do a very good job when it comes to handling more than one + connection. + + + 3.1. /etc/rc.d/rc.inet1 + + In rc.inet1, ifconfig the eql device to the IP address you usually use + for your machine, and the MTU you prefer for your SLIP lines. One + could argue that MTU should be roughly half the usual size for two + modems, one-third for three, one-fourth for four, etc... But going + too far below 296 is probably overkill. Here is an example ifconfig + command that sets up the eql device: + + + + ______________________________________________________________________ + ifconfig eql 198.67.33.239 mtu 1006 + ______________________________________________________________________ + + + + + + Once the eql device is up and running, add a static default route to + it in the routing table using the cool new route syntax that makes + life so much easier: + + + + ______________________________________________________________________ + route add default eql + ______________________________________________________________________ + + + 3.2. Enslaving Devices By Hand + + Enslaving devices by hand requires two utility programs: eql_enslave + and eql_emancipate (-- eql_emancipate hasn't been written because when + an enslaved device "dies", it is automatically taken out of the queue. + I haven't found a good reason to write it yet... other than for + completeness, but that isn't a good motivator is it?--) + + + The syntax for enslaving a device is "eql_enslave <master-name> + <slave-name> <estimated-bps>". Here are some example enslavings: + + + + ______________________________________________________________________ + eql_enslave eql sl0 28800 + eql_enslave eql ppp0 14400 + eql_enslave eql sl1 57600 + ______________________________________________________________________ + + + + + + When you want to free a device from its life of slavery, you can + either down the device with ifconfig (eql will automatically bury the + dead slave and remove it from its queue) or use eql_emancipate to free + it. (-- Or just ifconfig it down, and the eql driver will take it out + for you.--) + + + + ______________________________________________________________________ + eql_emancipate eql sl0 + eql_emancipate eql ppp0 + eql_emancipate eql sl1 + ______________________________________________________________________ + + + + + + 3.3. DSLIP Configuration for the eql Device + + The general idea is to bring up and keep up as many SLIP connections + as you need, automatically. + + + 3.3.1. /etc/slip/runslip.conf + + Here is an example runslip.conf: + + + + + + + + + + + + + + + + ______________________________________________________________________ + name sl-line-1 + enabled + baud 38400 + mtu 576 + ducmd -e /etc/slip/dialout/cua2-288.xp -t 9 + command eql_enslave eql $interface 28800 + address 198.67.33.239 + line /dev/cua2 + + name sl-line-2 + enabled + baud 38400 + mtu 576 + ducmd -e /etc/slip/dialout/cua3-288.xp -t 9 + command eql_enslave eql $interface 28800 + address 198.67.33.239 + line /dev/cua3 + ______________________________________________________________________ + + + + + + 3.4. Using PPP and the eql Device + + I have not yet done any load-balancing testing for PPP devices, mainly + because I don't have a PPP-connection manager like SLIP has with + DSLIP. I did find a good tip from LinuxNET:Billy for PPP performance: + make sure you have asyncmap set to something so that control + characters are not escaped. + + + I tried to fix up a PPP script/system for redialing lost PPP + connections for use with the eql driver the weekend of Feb 25-26 '95 + (Hereafter known as the 8-hour PPP Hate Festival). Perhaps later this + year. + + + 4. About the Slave Scheduler Algorithm + + The slave scheduler probably could be replaced with a dozen other + things and push traffic much faster. The formula in the current set + up of the driver was tuned to handle slaves with wildly different + bits-per-second "priorities". + + + All testing I have done was with two 28.8 V.FC modems, one connecting + at 28800 bps or slower, and the other connecting at 14400 bps all the + time. + + + One version of the scheduler was able to push 5.3 K/s through the + 28800 and 14400 connections, but when the priorities on the links were + very wide apart (57600 vs. 14400) the "faster" modem received all + traffic and the "slower" modem starved. + + + 5. Testers' Reports + + Some people have experimented with the eql device with newer + kernels (than 1.1.75). I have since updated the driver to patch + cleanly in newer kernels because of the removal of the old "slave- + balancing" driver config option. + + + o icee from LinuxNET patched 1.1.86 without any rejects and was able + to boot the kernel and enslave a couple of ISDN PPP links. + + 5.1. Randolph Bentson's Test Report + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + From bentson@grieg.seaslug.org Wed Feb 8 19:08:09 1995 + Date: Tue, 7 Feb 95 22:57 PST + From: Randolph Bentson <bentson@grieg.seaslug.org> + To: guru@ncm.com + Subject: EQL driver tests + + + I have been checking out your eql driver. (Nice work, that!) + Although you may already done this performance testing, here + are some data I've discovered. + + Randolph Bentson + bentson@grieg.seaslug.org + + --------------------------------------------------------- + + + A pseudo-device driver, EQL, written by Simon Janes, can be used + to bundle multiple SLIP connections into what appears to be a + single connection. This allows one to improve dial-up network + connectivity gradually, without having to buy expensive DSU/CSU + hardware and services. + + I have done some testing of this software, with two goals in + mind: first, to ensure it actually works as described and + second, as a method of exercising my device driver. + + The following performance measurements were derived from a set + of SLIP connections run between two Linux systems (1.1.84) using + a 486DX2/66 with a Cyclom-8Ys and a 486SLC/40 with a Cyclom-16Y. + (Ports 0,1,2,3 were used. A later configuration will distribute + port selection across the different Cirrus chips on the boards.) + Once a link was established, I timed a binary ftp transfer of + 289284 bytes of data. If there were no overhead (packet headers, + inter-character and inter-packet delays, etc.) the transfers + would take the following times: + + bits/sec seconds + 345600 8.3 + 234600 12.3 + 172800 16.7 + 153600 18.8 + 76800 37.6 + 57600 50.2 + 38400 75.3 + 28800 100.4 + 19200 150.6 + 9600 301.3 + + A single line running at the lower speeds and with large packets + comes to within 2% of this. Performance is limited for the higher + speeds (as predicted by the Cirrus databook) to an aggregate of + about 160 kbits/sec. The next round of testing will distribute + the load across two or more Cirrus chips. + + The good news is that one gets nearly the full advantage of the + second, third, and fourth line's bandwidth. (The bad news is + that the connection establishment seemed fragile for the higher + speeds. Once established, the connection seemed robust enough.) + + #lines speed mtu seconds theory actual %of + kbit/sec duration speed speed max + 3 115200 900 _ 345600 + 3 115200 400 18.1 345600 159825 46 + 2 115200 900 _ 230400 + 2 115200 600 18.1 230400 159825 69 + 2 115200 400 19.3 230400 149888 65 + 4 57600 900 _ 234600 + 4 57600 600 _ 234600 + 4 57600 400 _ 234600 + 3 57600 600 20.9 172800 138413 80 + 3 57600 900 21.2 172800 136455 78 + 3 115200 600 21.7 345600 133311 38 + 3 57600 400 22.5 172800 128571 74 + 4 38400 900 25.2 153600 114795 74 + 4 38400 600 26.4 153600 109577 71 + 4 38400 400 27.3 153600 105965 68 + 2 57600 900 29.1 115200 99410.3 86 + 1 115200 900 30.7 115200 94229.3 81 + 2 57600 600 30.2 115200 95789.4 83 + 3 38400 900 30.3 115200 95473.3 82 + 3 38400 600 31.2 115200 92719.2 80 + 1 115200 600 31.3 115200 92423 80 + 2 57600 400 32.3 115200 89561.6 77 + 1 115200 400 32.8 115200 88196.3 76 + 3 38400 400 33.5 115200 86353.4 74 + 2 38400 900 43.7 76800 66197.7 86 + 2 38400 600 44 76800 65746.4 85 + 2 38400 400 47.2 76800 61289 79 + 4 19200 900 50.8 76800 56945.7 74 + 4 19200 400 53.2 76800 54376.7 70 + 4 19200 600 53.7 76800 53870.4 70 + 1 57600 900 54.6 57600 52982.4 91 + 1 57600 600 56.2 57600 51474 89 + 3 19200 900 60.5 57600 47815.5 83 + 1 57600 400 60.2 57600 48053.8 83 + 3 19200 600 62 57600 46658.7 81 + 3 19200 400 64.7 57600 44711.6 77 + 1 38400 900 79.4 38400 36433.8 94 + 1 38400 600 82.4 38400 35107.3 91 + 2 19200 900 84.4 38400 34275.4 89 + 1 38400 400 86.8 38400 33327.6 86 + 2 19200 600 87.6 38400 33023.3 85 + 2 19200 400 91.2 38400 31719.7 82 + 4 9600 900 94.7 38400 30547.4 79 + 4 9600 400 106 38400 27290.9 71 + 4 9600 600 110 38400 26298.5 68 + 3 9600 900 118 28800 24515.6 85 + 3 9600 600 120 28800 24107 83 + 3 9600 400 131 28800 22082.7 76 + 1 19200 900 155 19200 18663.5 97 + 1 19200 600 161 19200 17968 93 + 1 19200 400 170 19200 17016.7 88 + 2 9600 600 176 19200 16436.6 85 + 2 9600 900 180 19200 16071.3 83 + 2 9600 400 181 19200 15982.5 83 + 1 9600 900 305 9600 9484.72 98 + 1 9600 600 314 9600 9212.87 95 + 1 9600 400 332 9600 8713.37 90 + + + + + + 5.2. Anthony Healy's Report + + + + + + + + Date: Mon, 13 Feb 1995 16:17:29 +1100 (EST) + From: Antony Healey <ahealey@st.nepean.uws.edu.au> + To: Simon Janes <guru@ncm.com> + Subject: Re: Load Balancing + + Hi Simon, + I've installed your patch and it works great. I have trialed + it over twin SL/IP lines, just over null modems, but I was + able to data at over 48Kb/s [ISDN link -Simon]. I managed a + transfer of up to 7.5 Kbyte/s on one go, but averaged around + 6.4 Kbyte/s, which I think is pretty cool. :) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/Documentation/networking/ewrk3.txt b/Documentation/networking/ewrk3.txt new file mode 100644 index 0000000..90e9e5f --- /dev/null +++ b/Documentation/networking/ewrk3.txt @@ -0,0 +1,46 @@ +The EtherWORKS 3 driver in this distribution is designed to work with all +kernels > 1.1.33 (approx) and includes tools in the 'ewrk3tools' +subdirectory to allow set up of the card, similar to the MSDOS +'NICSETUP.EXE' tools provided on the DOS drivers disk (type 'make' in that +subdirectory to make the tools). + +The supported cards are DE203, DE204 and DE205. All other cards are NOT +supported - refer to 'depca.c' for running the LANCE based network cards and +'de4x5.c' for the DIGITAL Semiconductor PCI chip based adapters from +Digital. + +The ability to load this driver as a loadable module has been included and +used extensively during the driver development (to save those long reboot +sequences). To utilise this ability, you have to do 8 things: + + 0) have a copy of the loadable modules code installed on your system. + 1) copy ewrk3.c from the /linux/drivers/net directory to your favourite + temporary directory. + 2) edit the source code near line 1898 to reflect the I/O address and + IRQ you're using. + 3) compile ewrk3.c, but include -DMODULE in the command line to ensure + that the correct bits are compiled (see end of source code). + 4) if you are wanting to add a new card, goto 5. Otherwise, recompile a + kernel with the ewrk3 configuration turned off and reboot. + 5) insmod ewrk3.o + [Alan Cox: Changed this so you can insmod ewrk3.o irq=x io=y] + [Adam Kropelin: Multiple cards now supported by irq=x1,x2 io=y1,y2] + 6) run the net startup bits for your new eth?? interface manually + (usually /etc/rc.inet[12] at boot time). + 7) enjoy! + + Note that autoprobing is not allowed in loadable modules - the system is + already up and running and you're messing with interrupts. + + To unload a module, turn off the associated interface + 'ifconfig eth?? down' then 'rmmod ewrk3'. + +The performance we've achieved so far has been measured through the 'ttcp' +tool at 975kB/s. This measures the total TCP stack performance which +includes the card, so don't expect to get much nearer the 1.25MB/s +theoretical Ethernet rate. + + +Enjoy! + +Dave diff --git a/Documentation/networking/fib_trie.txt b/Documentation/networking/fib_trie.txt new file mode 100644 index 0000000..0723db7 --- /dev/null +++ b/Documentation/networking/fib_trie.txt @@ -0,0 +1,145 @@ + LC-trie implementation notes. + +Node types +---------- +leaf + An end node with data. This has a copy of the relevant key, along + with 'hlist' with routing table entries sorted by prefix length. + See struct leaf and struct leaf_info. + +trie node or tnode + An internal node, holding an array of child (leaf or tnode) pointers, + indexed through a subset of the key. See Level Compression. + +A few concepts explained +------------------------ +Bits (tnode) + The number of bits in the key segment used for indexing into the + child array - the "child index". See Level Compression. + +Pos (tnode) + The position (in the key) of the key segment used for indexing into + the child array. See Path Compression. + +Path Compression / skipped bits + Any given tnode is linked to from the child array of its parent, using + a segment of the key specified by the parent's "pos" and "bits" + In certain cases, this tnode's own "pos" will not be immediately + adjacent to the parent (pos+bits), but there will be some bits + in the key skipped over because they represent a single path with no + deviations. These "skipped bits" constitute Path Compression. + Note that the search algorithm will simply skip over these bits when + searching, making it necessary to save the keys in the leaves to + verify that they actually do match the key we are searching for. + +Level Compression / child arrays + the trie is kept level balanced moving, under certain conditions, the + children of a full child (see "full_children") up one level, so that + instead of a pure binary tree, each internal node ("tnode") may + contain an arbitrarily large array of links to several children. + Conversely, a tnode with a mostly empty child array (see empty_children) + may be "halved", having some of its children moved downwards one level, + in order to avoid ever-increasing child arrays. + +empty_children + the number of positions in the child array of a given tnode that are + NULL. + +full_children + the number of children of a given tnode that aren't path compressed. + (in other words, they aren't NULL or leaves and their "pos" is equal + to this tnode's "pos"+"bits"). + + (The word "full" here is used more in the sense of "complete" than + as the opposite of "empty", which might be a tad confusing.) + +Comments +--------- + +We have tried to keep the structure of the code as close to fib_hash as +possible to allow verification and help up reviewing. + +fib_find_node() + A good start for understanding this code. This function implements a + straightforward trie lookup. + +fib_insert_node() + Inserts a new leaf node in the trie. This is bit more complicated than + fib_find_node(). Inserting a new node means we might have to run the + level compression algorithm on part of the trie. + +trie_leaf_remove() + Looks up a key, deletes it and runs the level compression algorithm. + +trie_rebalance() + The key function for the dynamic trie after any change in the trie + it is run to optimize and reorganize. Tt will walk the trie upwards + towards the root from a given tnode, doing a resize() at each step + to implement level compression. + +resize() + Analyzes a tnode and optimizes the child array size by either inflating + or shrinking it repeatedly until it fulfills the criteria for optimal + level compression. This part follows the original paper pretty closely + and there may be some room for experimentation here. + +inflate() + Doubles the size of the child array within a tnode. Used by resize(). + +halve() + Halves the size of the child array within a tnode - the inverse of + inflate(). Used by resize(); + +fn_trie_insert(), fn_trie_delete(), fn_trie_select_default() + The route manipulation functions. Should conform pretty closely to the + corresponding functions in fib_hash. + +fn_trie_flush() + This walks the full trie (using nextleaf()) and searches for empty + leaves which have to be removed. + +fn_trie_dump() + Dumps the routing table ordered by prefix length. This is somewhat + slower than the corresponding fib_hash function, as we have to walk the + entire trie for each prefix length. In comparison, fib_hash is organized + as one "zone"/hash per prefix length. + +Locking +------- + +fib_lock is used for an RW-lock in the same way that this is done in fib_hash. +However, the functions are somewhat separated for other possible locking +scenarios. It might conceivably be possible to run trie_rebalance via RCU +to avoid read_lock in the fn_trie_lookup() function. + +Main lookup mechanism +--------------------- +fn_trie_lookup() is the main lookup function. + +The lookup is in its simplest form just like fib_find_node(). We descend the +trie, key segment by key segment, until we find a leaf. check_leaf() does +the fib_semantic_match in the leaf's sorted prefix hlist. + +If we find a match, we are done. + +If we don't find a match, we enter prefix matching mode. The prefix length, +starting out at the same as the key length, is reduced one step at a time, +and we backtrack upwards through the trie trying to find a longest matching +prefix. The goal is always to reach a leaf and get a positive result from the +fib_semantic_match mechanism. + +Inside each tnode, the search for longest matching prefix consists of searching +through the child array, chopping off (zeroing) the least significant "1" of +the child index until we find a match or the child index consists of nothing but +zeros. + +At this point we backtrack (t->stats.backtrack++) up the trie, continuing to +chop off part of the key in order to find the longest matching prefix. + +At this point we will repeatedly descend subtries to look for a match, and there +are some optimizations available that can provide us with "shortcuts" to avoid +descending into dead ends. Look for "HL_OPTIMIZE" sections in the code. + +To alleviate any doubts about the correctness of the route selection process, +a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which +gives userland access to fib_lookup(). diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt new file mode 100644 index 0000000..bbf2005 --- /dev/null +++ b/Documentation/networking/filter.txt @@ -0,0 +1,42 @@ +filter.txt: Linux Socket Filtering +Written by: Jay Schulist <jschlst@samba.org> + +Introduction +============ + + Linux Socket Filtering is derived from the Berkeley +Packet Filter. There are some distinct differences between +the BSD and Linux Kernel Filtering. + +Linux Socket Filtering (LSF) allows a user-space program to +attach a filter onto any socket and allow or disallow certain +types of data to come through the socket. LSF follows exactly +the same filter code structure as the BSD Berkeley Packet Filter +(BPF), so referring to the BSD bpf.4 manpage is very helpful in +creating filters. + +LSF is much simpler than BPF. One does not have to worry about +devices or anything like that. You simply create your filter +code, send it to the kernel via the SO_ATTACH_FILTER ioctl and +if your filter code passes the kernel check on it, you then +immediately begin filtering data on that socket. + +You can also detach filters from your socket via the +SO_DETACH_FILTER ioctl. This will probably not be used much +since when you close a socket that has a filter on it the +filter is automagically removed. The other less common case +may be adding a different filter on the same socket where you had another +filter that is still running: the kernel takes care of removing +the old one and placing your new one in its place, assuming your +filter has passed the checks, otherwise if it fails the old filter +will remain on that socket. + +Examples +======== + +Ioctls- +setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_FILTER, &Filter, sizeof(Filter)); +setsockopt(sockfd, SOL_SOCKET, SO_DETACH_FILTER, &value, sizeof(value)); + +See the BSD bpf.4 manpage and the BSD Packet Filter paper written by +Steven McCanne and Van Jacobson of Lawrence Berkeley Laboratory. diff --git a/Documentation/networking/fore200e.txt b/Documentation/networking/fore200e.txt new file mode 100644 index 0000000..b1f337f --- /dev/null +++ b/Documentation/networking/fore200e.txt @@ -0,0 +1,66 @@ + +FORE Systems PCA-200E/SBA-200E ATM NIC driver +--------------------------------------------- + +This driver adds support for the FORE Systems 200E-series ATM adapters +to the Linux operating system. It is based on the earlier PCA-200E driver +written by Uwe Dannowski. + +The driver simultaneously supports PCA-200E and SBA-200E adapters on +i386, alpha (untested), powerpc, sparc and sparc64 archs. + +The intent is to enable the use of different models of FORE adapters at the +same time, by hosts that have several bus interfaces (such as PCI+SBUS, +PCI+MCA or PCI+EISA). + +Only PCI and SBUS devices are currently supported by the driver, but support +for other bus interfaces such as EISA should not be too hard to add (this may +be more tricky for the MCA bus, though, as FORE made some MCA-specific +modifications to the adapter's AALI interface). + + +Firmware Copyright Notice +------------------------- + +Please read the fore200e_firmware_copyright file present +in the linux/drivers/atm directory for details and restrictions. + + +Firmware Updates +---------------- + +The FORE Systems 200E-series driver is shipped with firmware data being +uploaded to the ATM adapters at system boot time or at module loading time. +The supplied firmware images should work with all adapters. + +However, if you encounter problems (the firmware doesn't start or the driver +is unable to read the PROM data), you may consider trying another firmware +version. Alternative binary firmware images can be found somewhere on the +ForeThought CD-ROM supplied with your adapter by FORE Systems. + +You can also get the latest firmware images from FORE Systems at +http://www.fore.com. Register TACTics Online and go to +the 'software updates' pages. The firmware binaries are part of +the various ForeThought software distributions. + +Notice that different versions of the PCA-200E firmware exist, depending +on the endianess of the host architecture. The driver is shipped with +both little and big endian PCA firmware images. + +Name and location of the new firmware images can be set at kernel +configuration time: + +1. Copy the new firmware binary files (with .bin, .bin1 or .bin2 suffix) + to some directory, such as linux/drivers/atm. + +2. Reconfigure your kernel to set the new firmware name and location. + Expected pathnames are absolute or relative to the drivers/atm directory. + +3. Rebuild and re-install your kernel or your module. + + +Feedback +-------- + +Feedback is welcome. Please send success stories/bug reports/ +patches/improvement/comments/flames to <lizzi@cnam.fr>. diff --git a/Documentation/networking/framerelay.txt b/Documentation/networking/framerelay.txt new file mode 100644 index 0000000..1a0b720 --- /dev/null +++ b/Documentation/networking/framerelay.txt @@ -0,0 +1,39 @@ +Frame Relay (FR) support for linux is built into a two tiered system of device +drivers. The upper layer implements RFC1490 FR specification, and uses the +Data Link Connection Identifier (DLCI) as its hardware address. Usually these +are assigned by your network supplier, they give you the number/numbers of +the Virtual Connections (VC) assigned to you. + +Each DLCI is a point-to-point link between your machine and a remote one. +As such, a separate device is needed to accommodate the routing. Within the +net-tools archives is 'dlcicfg'. This program will communicate with the +base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'... +The configuration script will ask you how many DLCIs you need, as well as +how many DLCIs you want to assign to each Frame Relay Access Device (FRAD). + +The DLCI uses a number of function calls to communicate with the FRAD, all +of which are stored in the FRAD's private data area. assoc/deassoc, +activate/deactivate and dlci_config. The DLCI supplies a receive function +to the FRAD to accept incoming packets. + +With this initial offering, only 1 FRAD driver is available. With many thanks +to Sangoma Technologies, David Mandelstam & Gene Kozin, the S502A, S502E & +S508 are supported. This driver is currently set up for only FR, but as +Sangoma makes more firmware modules available, it can be updated to provide +them as well. + +Configuration of the FRAD makes use of another net-tools program, 'fradcfg'. +This program makes use of a configuration file (which dlcicfg can also read) +to specify the types of boards to be configured as FRADs, as well as perform +any board specific configuration. The Sangoma module of fradcfg loads the +FR firmware into the card, sets the irq/port/memory information, and provides +an initial configuration. + +Additional FRAD device drivers can be added as hardware is available. + +At this time, the dlcicfg and fradcfg programs have not been incorporated into +the net-tools distribution. They can be found at ftp.invlogic.com, in +/pub/linux. Note that with OS/2 FTPD, you end up in /pub by default, so just +use 'cd linux'. v0.10 is for use on pre-2.0.3 and earlier, v0.15 is for +pre-2.0.4 and later. + diff --git a/Documentation/networking/gen_stats.txt b/Documentation/networking/gen_stats.txt new file mode 100644 index 0000000..70e6275 --- /dev/null +++ b/Documentation/networking/gen_stats.txt @@ -0,0 +1,117 @@ +Generic networking statistics for netlink users +====================================================================== + +Statistic counters are grouped into structs: + +Struct TLV type Description +---------------------------------------------------------------------- +gnet_stats_basic TCA_STATS_BASIC Basic statistics +gnet_stats_rate_est TCA_STATS_RATE_EST Rate estimator +gnet_stats_queue TCA_STATS_QUEUE Queue statistics +none TCA_STATS_APP Application specific + + +Collecting: +----------- + +Declare the statistic structs you need: +struct mystruct { + struct gnet_stats_basic bstats; + struct gnet_stats_queue qstats; + ... +}; + +Update statistics: +mystruct->tstats.packet++; +mystruct->qstats.backlog += skb->pkt_len; + + +Export to userspace (Dump): +--------------------------- + +my_dumping_routine(struct sk_buff *skb, ...) +{ + struct gnet_dump dump; + + if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump) < 0) + goto rtattr_failure; + + if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 || + gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 || + gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0) + goto rtattr_failure; + + if (gnet_stats_finish_copy(&dump) < 0) + goto rtattr_failure; + ... +} + +TCA_STATS/TCA_XSTATS backward compatibility: +-------------------------------------------- + +Prior users of struct tc_stats and xstats can maintain backward +compatibility by calling the compat wrappers to keep providing the +existing TLV types. + +my_dumping_routine(struct sk_buff *skb, ...) +{ + if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS, + TCA_XSTATS, &mystruct->lock, &dump) < 0) + goto rtattr_failure; + ... +} + +A struct tc_stats will be filled out during gnet_stats_copy_* calls +and appended to the skb. TCA_XSTATS is provided if gnet_stats_copy_app +was called. + + +Locking: +-------- + +Locks are taken before writing and released once all statistics have +been written. Locks are always released in case of an error. You +are responsible for making sure that the lock is initialized. + + +Rate Estimator: +-------------- + +0) Prepare an estimator attribute. Most likely this would be in user + space. The value of this TLV should contain a tc_estimator structure. + As usual, such a TLV needs to be 32 bit aligned and therefore the + length needs to be appropriately set, etc. The estimator interval + and ewma log need to be converted to the appropriate values. + tc_estimator.c::tc_setup_estimator() is advisable to be used as the + conversion routine. It does a few clever things. It takes a time + interval in microsecs, a time constant also in microsecs and a struct + tc_estimator to be populated. The returned tc_estimator can be + transported to the kernel. Transfer such a structure in a TLV of type + TCA_RATE to your code in the kernel. + +In the kernel when setting up: +1) make sure you have basic stats and rate stats setup first. +2) make sure you have initialized stats lock that is used to setup such + stats. +3) Now initialize a new estimator: + + int ret = gen_new_estimator(my_basicstats,my_rate_est_stats, + mystats_lock, attr_with_tcestimator_struct); + + if ret == 0 + success + else + failed + +From now on, every time you dump my_rate_est_stats it will contain +up-to-date info. + +Once you are done, call gen_kill_estimator(my_basicstats, +my_rate_est_stats) Make sure that my_basicstats and my_rate_est_stats +are still valid (i.e still exist) at the time of making this call. + + +Authors: +-------- +Thomas Graf <tgraf@suug.ch> +Jamal Hadi Salim <hadi@cyberus.ca> diff --git a/Documentation/networking/generic-hdlc.txt b/Documentation/networking/generic-hdlc.txt new file mode 100644 index 0000000..31bc8b7 --- /dev/null +++ b/Documentation/networking/generic-hdlc.txt @@ -0,0 +1,132 @@ +Generic HDLC layer +Krzysztof Halasa <khc@pm.waw.pl> + + +Generic HDLC layer currently supports: +1. Frame Relay (ANSI, CCITT, Cisco and no LMI). + - Normal (routed) and Ethernet-bridged (Ethernet device emulation) + interfaces can share a single PVC. + - ARP support (no InARP support in the kernel - there is an + experimental InARP user-space daemon available on: + http://www.kernel.org/pub/linux/utils/net/hdlc/). +2. raw HDLC - either IP (IPv4) interface or Ethernet device emulation. +3. Cisco HDLC. +4. PPP (uses syncppp.c). +5. X.25 (uses X.25 routines). + +Generic HDLC is a protocol driver only - it needs a low-level driver +for your particular hardware. + +Ethernet device emulation (using HDLC or Frame-Relay PVC) is compatible +with IEEE 802.1Q (VLANs) and 802.1D (Ethernet bridging). + + +Make sure the hdlc.o and the hardware driver are loaded. It should +create a number of "hdlc" (hdlc0 etc) network devices, one for each +WAN port. You'll need the "sethdlc" utility, get it from: + http://www.kernel.org/pub/linux/utils/net/hdlc/ + +Compile sethdlc.c utility: + gcc -O2 -Wall -o sethdlc sethdlc.c +Make sure you're using a correct version of sethdlc for your kernel. + +Use sethdlc to set physical interface, clock rate, HDLC mode used, +and add any required PVCs if using Frame Relay. +Usually you want something like: + + sethdlc hdlc0 clock int rate 128000 + sethdlc hdlc0 cisco interval 10 timeout 25 +or + sethdlc hdlc0 rs232 clock ext + sethdlc hdlc0 fr lmi ansi + sethdlc hdlc0 create 99 + ifconfig hdlc0 up + ifconfig pvc0 localIP pointopoint remoteIP + +In Frame Relay mode, ifconfig master hdlc device up (without assigning +any IP address to it) before using pvc devices. + + +Setting interface: + +* v35 | rs232 | x21 | t1 | e1 - sets physical interface for a given port + if the card has software-selectable interfaces + loopback - activate hardware loopback (for testing only) +* clock ext - both RX clock and TX clock external +* clock int - both RX clock and TX clock internal +* clock txint - RX clock external, TX clock internal +* clock txfromrx - RX clock external, TX clock derived from RX clock +* rate - sets clock rate in bps (for "int" or "txint" clock only) + + +Setting protocol: + +* hdlc - sets raw HDLC (IP-only) mode + nrz / nrzi / fm-mark / fm-space / manchester - sets transmission code + no-parity / crc16 / crc16-pr0 (CRC16 with preset zeros) / crc32-itu + crc16-itu (CRC16 with ITU-T polynomial) / crc16-itu-pr0 - sets parity + +* hdlc-eth - Ethernet device emulation using HDLC. Parity and encoding + as above. + +* cisco - sets Cisco HDLC mode (IP, IPv6 and IPX supported) + interval - time in seconds between keepalive packets + timeout - time in seconds after last received keepalive packet before + we assume the link is down + +* ppp - sets synchronous PPP mode + +* x25 - sets X.25 mode + +* fr - Frame Relay mode + lmi ansi / ccitt / cisco / none - LMI (link management) type + dce - Frame Relay DCE (network) side LMI instead of default DTE (user). + It has nothing to do with clocks! + t391 - link integrity verification polling timer (in seconds) - user + t392 - polling verification timer (in seconds) - network + n391 - full status polling counter - user + n392 - error threshold - both user and network + n393 - monitored events count - both user and network + +Frame-Relay only: +* create n | delete n - adds / deletes PVC interface with DLCI #n. + Newly created interface will be named pvc0, pvc1 etc. + +* create ether n | delete ether n - adds a device for Ethernet-bridged + frames. The device will be named pvceth0, pvceth1 etc. + + + + +Board-specific issues +--------------------- + +n2.o and c101.o need parameters to work: + + insmod n2 hw=io,irq,ram,ports[:io,irq,...] +example: + insmod n2 hw=0x300,10,0xD0000,01 + +or + insmod c101 hw=irq,ram[:irq,...] +example: + insmod c101 hw=9,0xdc000 + +If built into the kernel, these drivers need kernel (command line) parameters: + n2.hw=io,irq,ram,ports:... +or + c101.hw=irq,ram:... + + + +If you have a problem with N2, C101 or PLX200SYN card, you can issue the +"private" command to see port's packet descriptor rings (in kernel logs): + + sethdlc hdlc0 private + +The hardware driver has to be build with #define DEBUG_RINGS. +Attaching this info to bug reports would be helpful. Anyway, let me know +if you have problems using this. + +For patches and other info look at: +<http://www.kernel.org/pub/linux/utils/net/hdlc/>. diff --git a/Documentation/networking/generic_netlink.txt b/Documentation/networking/generic_netlink.txt new file mode 100644 index 0000000..d4f8b8b --- /dev/null +++ b/Documentation/networking/generic_netlink.txt @@ -0,0 +1,3 @@ +A wiki document on how to use Generic Netlink can be found here: + + * http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO diff --git a/Documentation/networking/gianfar.txt b/Documentation/networking/gianfar.txt new file mode 100644 index 0000000..ad474ea --- /dev/null +++ b/Documentation/networking/gianfar.txt @@ -0,0 +1,72 @@ +The Gianfar Ethernet Driver +Sysfs File description + +Author: Andy Fleming <afleming@freescale.com> +Updated: 2005-07-28 + +SYSFS + +Several of the features of the gianfar driver are controlled +through sysfs files. These are: + +bd_stash: +To stash RX Buffer Descriptors in the L2, echo 'on' or '1' to +bd_stash, echo 'off' or '0' to disable + +rx_stash_len: +To stash the first n bytes of the packet in L2, echo the number +of bytes to buf_stash_len. echo 0 to disable. + +WARNING: You could really screw these up if you set them too low or high! +fifo_threshold: +To change the number of bytes the controller needs in the +fifo before it starts transmission, echo the number of bytes to +fifo_thresh. Range should be 0-511. + +fifo_starve: +When the FIFO has less than this many bytes during a transmit, it +enters starve mode, and increases the priority of TX memory +transactions. To change, echo the number of bytes to +fifo_starve. Range should be 0-511. + +fifo_starve_off: +Once in starve mode, the FIFO remains there until it has this +many bytes. To change, echo the number of bytes to +fifo_starve_off. Range should be 0-511. + +CHECKSUM OFFLOADING + +The eTSEC controller (first included in parts from late 2005 like +the 8548) has the ability to perform TCP, UDP, and IP checksums +in hardware. The Linux kernel only offloads the TCP and UDP +checksums (and always performs the pseudo header checksums), so +the driver only supports checksumming for TCP/IP and UDP/IP +packets. Use ethtool to enable or disable this feature for RX +and TX. + +VLAN + +In order to use VLAN, please consult Linux documentation on +configuring VLANs. The gianfar driver supports hardware insertion and +extraction of VLAN headers, but not filtering. Filtering will be +done by the kernel. + +MULTICASTING + +The gianfar driver supports using the group hash table on the +TSEC (and the extended hash table on the eTSEC) for multicast +filtering. On the eTSEC, the exact-match MAC registers are used +before the hash tables. See Linux documentation on how to join +multicast groups. + +PADDING + +The gianfar driver supports padding received frames with 2 bytes +to align the IP header to a 16-byte boundary, when supported by +hardware. + +ETHTOOL + +The gianfar driver supports the use of ethtool for many +configuration options. You must run ethtool only on currently +open interfaces. See ethtool documentation for details. diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c new file mode 100644 index 0000000..1b96ccd --- /dev/null +++ b/Documentation/networking/ifenslave.c @@ -0,0 +1,1103 @@ +/* Mode: C; + * ifenslave.c: Configure network interfaces for parallel routing. + * + * This program controls the Linux implementation of running multiple + * network interfaces in parallel. + * + * Author: Donald Becker <becker@cesdis.gsfc.nasa.gov> + * Copyright 1994-1996 Donald Becker + * + * This program is free software; you can redistribute it + * and/or modify it under the terms of the GNU General Public + * License as published by the Free Software Foundation. + * + * The author may be reached as becker@CESDIS.gsfc.nasa.gov, or C/O + * Center of Excellence in Space Data and Information Sciences + * Code 930.5, Goddard Space Flight Center, Greenbelt MD 20771 + * + * Changes : + * - 2000/10/02 Willy Tarreau <willy at meta-x.org> : + * - few fixes. Master's MAC address is now correctly taken from + * the first device when not previously set ; + * - detach support : call BOND_RELEASE to detach an enslaved interface. + * - give a mini-howto from command-line help : # ifenslave -h + * + * - 2001/02/16 Chad N. Tindel <ctindel at ieee dot org> : + * - Master is now brought down before setting the MAC address. In + * the 2.4 kernel you can't change the MAC address while the device is + * up because you get EBUSY. + * + * - 2001/09/13 Takao Indoh <indou dot takao at jp dot fujitsu dot com> + * - Added the ability to change the active interface on a mode 1 bond + * at runtime. + * + * - 2001/10/23 Chad N. Tindel <ctindel at ieee dot org> : + * - No longer set the MAC address of the master. The bond device will + * take care of this itself + * - Try the SIOC*** versions of the bonding ioctls before using the + * old versions + * - 2002/02/18 Erik Habbinga <erik_habbinga @ hp dot com> : + * - ifr2.ifr_flags was not initialized in the hwaddr_notset case, + * SIOCGIFFLAGS now called before hwaddr_notset test + * + * - 2002/10/31 Tony Cureington <tony.cureington * hp_com> : + * - If the master does not have a hardware address when the first slave + * is enslaved, the master is assigned the hardware address of that + * slave - there is a comment in bonding.c stating "ifenslave takes + * care of this now." This corrects the problem of slaves having + * different hardware addresses in active-backup mode when + * multiple interfaces are specified on a single ifenslave command + * (ifenslave bond0 eth0 eth1). + * + * - 2003/03/18 - Tsippy Mendelson <tsippy.mendelson at intel dot com> and + * Shmulik Hen <shmulik.hen at intel dot com> + * - Moved setting the slave's mac address and openning it, from + * the application to the driver. This enables support of modes + * that need to use the unique mac address of each slave. + * The driver also takes care of closing the slave and restoring its + * original mac address upon release. + * In addition, block possibility of enslaving before the master is up. + * This prevents putting the system in an undefined state. + * + * - 2003/05/01 - Amir Noam <amir.noam at intel dot com> + * - Added ABI version control to restore compatibility between + * new/old ifenslave and new/old bonding. + * - Prevent adding an adapter that is already a slave. + * Fixes the problem of stalling the transmission and leaving + * the slave in a down state. + * + * - 2003/05/01 - Shmulik Hen <shmulik.hen at intel dot com> + * - Prevent enslaving if the bond device is down. + * Fixes the problem of leaving the system in unstable state and + * halting when trying to remove the module. + * - Close socket on all abnormal exists. + * - Add versioning scheme that follows that of the bonding driver. + * current version is 1.0.0 as a base line. + * + * - 2003/05/22 - Jay Vosburgh <fubar at us dot ibm dot com> + * - ifenslave -c was broken; it's now fixed + * - Fixed problem with routes vanishing from master during enslave + * processing. + * + * - 2003/05/27 - Amir Noam <amir.noam at intel dot com> + * - Fix backward compatibility issues: + * For drivers not using ABI versions, slave was set down while + * it should be left up before enslaving. + * Also, master was not set down and the default set_mac_address() + * would fail and generate an error message in the system log. + * - For opt_c: slave should not be set to the master's setting + * while it is running. It was already set during enslave. To + * simplify things, it is now handled separately. + * + * - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com> + * - Code cleanup and style changes + * set version to 1.1.0 + */ + +#define APP_VERSION "1.1.0" +#define APP_RELDATE "December 1, 2003" +#define APP_NAME "ifenslave" + +static char *version = +APP_NAME ".c:v" APP_VERSION " (" APP_RELDATE ")\n" +"o Donald Becker (becker@cesdis.gsfc.nasa.gov).\n" +"o Detach support added on 2000/10/02 by Willy Tarreau (willy at meta-x.org).\n" +"o 2.4 kernel support added on 2001/02/16 by Chad N. Tindel\n" +" (ctindel at ieee dot org).\n"; + +static const char *usage_msg = +"Usage: ifenslave [-f] <master-if> <slave-if> [<slave-if>...]\n" +" ifenslave -d <master-if> <slave-if> [<slave-if>...]\n" +" ifenslave -c <master-if> <slave-if>\n" +" ifenslave --help\n"; + +static const char *help_msg = +"\n" +" To create a bond device, simply follow these three steps :\n" +" - ensure that the required drivers are properly loaded :\n" +" # modprobe bonding ; modprobe <3c59x|eepro100|pcnet32|tulip|...>\n" +" - assign an IP address to the bond device :\n" +" # ifconfig bond0 <addr> netmask <mask> broadcast <bcast>\n" +" - attach all the interfaces you need to the bond device :\n" +" # ifenslave [{-f|--force}] bond0 eth0 [eth1 [eth2]...]\n" +" If bond0 didn't have a MAC address, it will take eth0's. Then, all\n" +" interfaces attached AFTER this assignment will get the same MAC addr.\n" +" (except for ALB/TLB modes)\n" +"\n" +" To set the bond device down and automatically release all the slaves :\n" +" # ifconfig bond0 down\n" +"\n" +" To detach a dead interface without setting the bond device down :\n" +" # ifenslave {-d|--detach} bond0 eth0 [eth1 [eth2]...]\n" +"\n" +" To change active slave :\n" +" # ifenslave {-c|--change-active} bond0 eth0\n" +"\n" +" To show master interface info\n" +" # ifenslave bond0\n" +"\n" +" To show all interfaces info\n" +" # ifenslave {-a|--all-interfaces}\n" +"\n" +" To be more verbose\n" +" # ifenslave {-v|--verbose} ...\n" +"\n" +" # ifenslave {-u|--usage} Show usage\n" +" # ifenslave {-V|--version} Show version\n" +" # ifenslave {-h|--help} This message\n" +"\n"; + +#include <unistd.h> +#include <stdlib.h> +#include <stdio.h> +#include <ctype.h> +#include <string.h> +#include <errno.h> +#include <fcntl.h> +#include <getopt.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/ioctl.h> +#include <linux/if.h> +#include <net/if_arp.h> +#include <linux/if_ether.h> +#include <linux/if_bonding.h> +#include <linux/sockios.h> + +typedef unsigned long long u64; /* hack, so we may include kernel's ethtool.h */ +typedef __uint32_t u32; /* ditto */ +typedef __uint16_t u16; /* ditto */ +typedef __uint8_t u8; /* ditto */ +#include <linux/ethtool.h> + +struct option longopts[] = { + /* { name has_arg *flag val } */ + {"all-interfaces", 0, 0, 'a'}, /* Show all interfaces. */ + {"change-active", 0, 0, 'c'}, /* Change the active slave. */ + {"detach", 0, 0, 'd'}, /* Detach a slave interface. */ + {"force", 0, 0, 'f'}, /* Force the operation. */ + {"help", 0, 0, 'h'}, /* Give help */ + {"usage", 0, 0, 'u'}, /* Give usage */ + {"verbose", 0, 0, 'v'}, /* Report each action taken. */ + {"version", 0, 0, 'V'}, /* Emit version information. */ + { 0, 0, 0, 0} +}; + +/* Command-line flags. */ +unsigned int +opt_a = 0, /* Show-all-interfaces flag. */ +opt_c = 0, /* Change-active-slave flag. */ +opt_d = 0, /* Detach a slave interface. */ +opt_f = 0, /* Force the operation. */ +opt_h = 0, /* Help */ +opt_u = 0, /* Usage */ +opt_v = 0, /* Verbose flag. */ +opt_V = 0; /* Version */ + +int skfd = -1; /* AF_INET socket for ioctl() calls.*/ +int abi_ver = 0; /* userland - kernel ABI version */ +int hwaddr_set = 0; /* Master's hwaddr is set */ +int saved_errno; + +struct ifreq master_mtu, master_flags, master_hwaddr; +struct ifreq slave_mtu, slave_flags, slave_hwaddr; + +struct dev_ifr { + struct ifreq *req_ifr; + char *req_name; + int req_type; +}; + +struct dev_ifr master_ifra[] = { + {&master_mtu, "SIOCGIFMTU", SIOCGIFMTU}, + {&master_flags, "SIOCGIFFLAGS", SIOCGIFFLAGS}, + {&master_hwaddr, "SIOCGIFHWADDR", SIOCGIFHWADDR}, + {NULL, "", 0} +}; + +struct dev_ifr slave_ifra[] = { + {&slave_mtu, "SIOCGIFMTU", SIOCGIFMTU}, + {&slave_flags, "SIOCGIFFLAGS", SIOCGIFFLAGS}, + {&slave_hwaddr, "SIOCGIFHWADDR", SIOCGIFHWADDR}, + {NULL, "", 0} +}; + +static void if_print(char *ifname); +static int get_drv_info(char *master_ifname); +static int get_if_settings(char *ifname, struct dev_ifr ifra[]); +static int get_slave_flags(char *slave_ifname); +static int set_master_hwaddr(char *master_ifname, struct sockaddr *hwaddr); +static int set_slave_hwaddr(char *slave_ifname, struct sockaddr *hwaddr); +static int set_slave_mtu(char *slave_ifname, int mtu); +static int set_if_flags(char *ifname, short flags); +static int set_if_up(char *ifname, short flags); +static int set_if_down(char *ifname, short flags); +static int clear_if_addr(char *ifname); +static int set_if_addr(char *master_ifname, char *slave_ifname); +static int change_active(char *master_ifname, char *slave_ifname); +static int enslave(char *master_ifname, char *slave_ifname); +static int release(char *master_ifname, char *slave_ifname); +#define v_print(fmt, args...) \ + if (opt_v) \ + fprintf(stderr, fmt, ## args ) + +int main(int argc, char *argv[]) +{ + char **spp, *master_ifname, *slave_ifname; + int c, i, rv; + int res = 0; + int exclusive = 0; + + while ((c = getopt_long(argc, argv, "acdfhuvV", longopts, 0)) != EOF) { + switch (c) { + case 'a': opt_a++; exclusive++; break; + case 'c': opt_c++; exclusive++; break; + case 'd': opt_d++; exclusive++; break; + case 'f': opt_f++; exclusive++; break; + case 'h': opt_h++; exclusive++; break; + case 'u': opt_u++; exclusive++; break; + case 'v': opt_v++; break; + case 'V': opt_V++; exclusive++; break; + + case '?': + fprintf(stderr, usage_msg); + res = 2; + goto out; + } + } + + /* options check */ + if (exclusive > 1) { + fprintf(stderr, usage_msg); + res = 2; + goto out; + } + + if (opt_v || opt_V) { + printf(version); + if (opt_V) { + res = 0; + goto out; + } + } + + if (opt_u) { + printf(usage_msg); + res = 0; + goto out; + } + + if (opt_h) { + printf(usage_msg); + printf(help_msg); + res = 0; + goto out; + } + + /* Open a basic socket */ + if ((skfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) { + perror("socket"); + res = 1; + goto out; + } + + if (opt_a) { + if (optind == argc) { + /* No remaining args */ + /* show all interfaces */ + if_print((char *)NULL); + goto out; + } else { + /* Just show usage */ + fprintf(stderr, usage_msg); + res = 2; + goto out; + } + } + + /* Copy the interface name */ + spp = argv + optind; + master_ifname = *spp++; + + if (master_ifname == NULL) { + fprintf(stderr, usage_msg); + res = 2; + goto out; + } + + /* exchange abi version with bonding module */ + res = get_drv_info(master_ifname); + if (res) { + fprintf(stderr, + "Master '%s': Error: handshake with driver failed. " + "Aborting\n", + master_ifname); + goto out; + } + + slave_ifname = *spp++; + + if (slave_ifname == NULL) { + if (opt_d || opt_c) { + fprintf(stderr, usage_msg); + res = 2; + goto out; + } + + /* A single arg means show the + * configuration for this interface + */ + if_print(master_ifname); + goto out; + } + + res = get_if_settings(master_ifname, master_ifra); + if (res) { + /* Probably a good reason not to go on */ + fprintf(stderr, + "Master '%s': Error: get settings failed: %s. " + "Aborting\n", + master_ifname, strerror(res)); + goto out; + } + + /* check if master is indeed a master; + * if not then fail any operation + */ + if (!(master_flags.ifr_flags & IFF_MASTER)) { + fprintf(stderr, + "Illegal operation; the specified interface '%s' " + "is not a master. Aborting\n", + master_ifname); + res = 1; + goto out; + } + + /* check if master is up; if not then fail any operation */ + if (!(master_flags.ifr_flags & IFF_UP)) { + fprintf(stderr, + "Illegal operation; the specified master interface " + "'%s' is not up.\n", + master_ifname); + res = 1; + goto out; + } + + /* Only for enslaving */ + if (!opt_c && !opt_d) { + sa_family_t master_family = master_hwaddr.ifr_hwaddr.sa_family; + unsigned char *hwaddr = + (unsigned char *)master_hwaddr.ifr_hwaddr.sa_data; + + /* The family '1' is ARPHRD_ETHER for ethernet. */ + if (master_family != 1 && !opt_f) { + fprintf(stderr, + "Illegal operation: The specified master " + "interface '%s' is not ethernet-like.\n " + "This program is designed to work with " + "ethernet-like network interfaces.\n " + "Use the '-f' option to force the " + "operation.\n", + master_ifname); + res = 1; + goto out; + } + + /* Check master's hw addr */ + for (i = 0; i < 6; i++) { + if (hwaddr[i] != 0) { + hwaddr_set = 1; + break; + } + } + + if (hwaddr_set) { + v_print("current hardware address of master '%s' " + "is %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x, " + "type %d\n", + master_ifname, + hwaddr[0], hwaddr[1], + hwaddr[2], hwaddr[3], + hwaddr[4], hwaddr[5], + master_family); + } + } + + /* Accepts only one slave */ + if (opt_c) { + /* change active slave */ + res = get_slave_flags(slave_ifname); + if (res) { + fprintf(stderr, + "Slave '%s': Error: get flags failed. " + "Aborting\n", + slave_ifname); + goto out; + } + res = change_active(master_ifname, slave_ifname); + if (res) { + fprintf(stderr, + "Master '%s', Slave '%s': Error: " + "Change active failed\n", + master_ifname, slave_ifname); + } + } else { + /* Accept multiple slaves */ + do { + if (opt_d) { + /* detach a slave interface from the master */ + rv = get_slave_flags(slave_ifname); + if (rv) { + /* Can't work with this slave. */ + /* remember the error and skip it*/ + fprintf(stderr, + "Slave '%s': Error: get flags " + "failed. Skipping\n", + slave_ifname); + res = rv; + continue; + } + rv = release(master_ifname, slave_ifname); + if (rv) { + fprintf(stderr, + "Master '%s', Slave '%s': Error: " + "Release failed\n", + master_ifname, slave_ifname); + res = rv; + } + } else { + /* attach a slave interface to the master */ + rv = get_if_settings(slave_ifname, slave_ifra); + if (rv) { + /* Can't work with this slave. */ + /* remember the error and skip it*/ + fprintf(stderr, + "Slave '%s': Error: get " + "settings failed: %s. " + "Skipping\n", + slave_ifname, strerror(rv)); + res = rv; + continue; + } + rv = enslave(master_ifname, slave_ifname); + if (rv) { + fprintf(stderr, + "Master '%s', Slave '%s': Error: " + "Enslave failed\n", + master_ifname, slave_ifname); + res = rv; + } + } + } while ((slave_ifname = *spp++) != NULL); + } + +out: + if (skfd >= 0) { + close(skfd); + } + + return res; +} + +static short mif_flags; + +/* Get the inteface configuration from the kernel. */ +static int if_getconfig(char *ifname) +{ + struct ifreq ifr; + int metric, mtu; /* Parameters of the master interface. */ + struct sockaddr dstaddr, broadaddr, netmask; + unsigned char *hwaddr; + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFFLAGS, &ifr) < 0) + return -1; + mif_flags = ifr.ifr_flags; + printf("The result of SIOCGIFFLAGS on %s is %x.\n", + ifname, ifr.ifr_flags); + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFADDR, &ifr) < 0) + return -1; + printf("The result of SIOCGIFADDR is %2.2x.%2.2x.%2.2x.%2.2x.\n", + ifr.ifr_addr.sa_data[0], ifr.ifr_addr.sa_data[1], + ifr.ifr_addr.sa_data[2], ifr.ifr_addr.sa_data[3]); + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFHWADDR, &ifr) < 0) + return -1; + + /* Gotta convert from 'char' to unsigned for printf(). */ + hwaddr = (unsigned char *)ifr.ifr_hwaddr.sa_data; + printf("The result of SIOCGIFHWADDR is type %d " + "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x.\n", + ifr.ifr_hwaddr.sa_family, hwaddr[0], hwaddr[1], + hwaddr[2], hwaddr[3], hwaddr[4], hwaddr[5]); + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFMETRIC, &ifr) < 0) { + metric = 0; + } else + metric = ifr.ifr_metric; + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFMTU, &ifr) < 0) + mtu = 0; + else + mtu = ifr.ifr_mtu; + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFDSTADDR, &ifr) < 0) { + memset(&dstaddr, 0, sizeof(struct sockaddr)); + } else + dstaddr = ifr.ifr_dstaddr; + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFBRDADDR, &ifr) < 0) { + memset(&broadaddr, 0, sizeof(struct sockaddr)); + } else + broadaddr = ifr.ifr_broadaddr; + + strcpy(ifr.ifr_name, ifname); + if (ioctl(skfd, SIOCGIFNETMASK, &ifr) < 0) { + memset(&netmask, 0, sizeof(struct sockaddr)); + } else + netmask = ifr.ifr_netmask; + + return 0; +} + +static void if_print(char *ifname) +{ + char buff[1024]; + struct ifconf ifc; + struct ifreq *ifr; + int i; + + if (ifname == (char *)NULL) { + ifc.ifc_len = sizeof(buff); + ifc.ifc_buf = buff; + if (ioctl(skfd, SIOCGIFCONF, &ifc) < 0) { + perror("SIOCGIFCONF failed"); + return; + } + + ifr = ifc.ifc_req; + for (i = ifc.ifc_len / sizeof(struct ifreq); --i >= 0; ifr++) { + if (if_getconfig(ifr->ifr_name) < 0) { + fprintf(stderr, + "%s: unknown interface.\n", + ifr->ifr_name); + continue; + } + + if (((mif_flags & IFF_UP) == 0) && !opt_a) continue; + /*ife_print(&ife);*/ + } + } else { + if (if_getconfig(ifname) < 0) { + fprintf(stderr, + "%s: unknown interface.\n", ifname); + } + } +} + +static int get_drv_info(char *master_ifname) +{ + struct ifreq ifr; + struct ethtool_drvinfo info; + char *endptr; + + memset(&ifr, 0, sizeof(ifr)); + strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ); + ifr.ifr_data = (caddr_t)&info; + + info.cmd = ETHTOOL_GDRVINFO; + strncpy(info.driver, "ifenslave", 32); + snprintf(info.fw_version, 32, "%d", BOND_ABI_VERSION); + + if (ioctl(skfd, SIOCETHTOOL, &ifr) < 0) { + if (errno == EOPNOTSUPP) { + goto out; + } + + saved_errno = errno; + v_print("Master '%s': Error: get bonding info failed %s\n", + master_ifname, strerror(saved_errno)); + return 1; + } + + abi_ver = strtoul(info.fw_version, &endptr, 0); + if (*endptr) { + v_print("Master '%s': Error: got invalid string as an ABI " + "version from the bonding module\n", + master_ifname); + return 1; + } + +out: + v_print("ABI ver is %d\n", abi_ver); + + return 0; +} + +static int change_active(char *master_ifname, char *slave_ifname) +{ + struct ifreq ifr; + int res = 0; + + if (!(slave_flags.ifr_flags & IFF_SLAVE)) { + fprintf(stderr, + "Illegal operation: The specified slave interface " + "'%s' is not a slave\n", + slave_ifname); + return 1; + } + + strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ); + strncpy(ifr.ifr_slave, slave_ifname, IFNAMSIZ); + if ((ioctl(skfd, SIOCBONDCHANGEACTIVE, &ifr) < 0) && + (ioctl(skfd, BOND_CHANGE_ACTIVE_OLD, &ifr) < 0)) { + saved_errno = errno; + v_print("Master '%s': Error: SIOCBONDCHANGEACTIVE failed: " + "%s\n", + master_ifname, strerror(saved_errno)); + res = 1; + } + + return res; +} + +static int enslave(char *master_ifname, char *slave_ifname) +{ + struct ifreq ifr; + int res = 0; + + if (slave_flags.ifr_flags & IFF_SLAVE) { + fprintf(stderr, + "Illegal operation: The specified slave interface " + "'%s' is already a slave\n", + slave_ifname); + return 1; + } + + res = set_if_down(slave_ifname, slave_flags.ifr_flags); + if (res) { + fprintf(stderr, + "Slave '%s': Error: bring interface down failed\n", + slave_ifname); + return res; + } + + if (abi_ver < 2) { + /* Older bonding versions would panic if the slave has no IP + * address, so get the IP setting from the master. + */ + set_if_addr(master_ifname, slave_ifname); + } else { + res = clear_if_addr(slave_ifname); + if (res) { + fprintf(stderr, + "Slave '%s': Error: clear address failed\n", + slave_ifname); + return res; + } + } + + if (master_mtu.ifr_mtu != slave_mtu.ifr_mtu) { + res = set_slave_mtu(slave_ifname, master_mtu.ifr_mtu); + if (res) { + fprintf(stderr, + "Slave '%s': Error: set MTU failed\n", + slave_ifname); + return res; + } + } + + if (hwaddr_set) { + /* Master already has an hwaddr + * so set it's hwaddr to the slave + */ + if (abi_ver < 1) { + /* The driver is using an old ABI, so + * the application sets the slave's + * hwaddr + */ + res = set_slave_hwaddr(slave_ifname, + &(master_hwaddr.ifr_hwaddr)); + if (res) { + fprintf(stderr, + "Slave '%s': Error: set hw address " + "failed\n", + slave_ifname); + goto undo_mtu; + } + + /* For old ABI the application needs to bring the + * slave back up + */ + res = set_if_up(slave_ifname, slave_flags.ifr_flags); + if (res) { + fprintf(stderr, + "Slave '%s': Error: bring interface " + "down failed\n", + slave_ifname); + goto undo_slave_mac; + } + } + /* The driver is using a new ABI, + * so the driver takes care of setting + * the slave's hwaddr and bringing + * it up again + */ + } else { + /* No hwaddr for master yet, so + * set the slave's hwaddr to it + */ + if (abi_ver < 1) { + /* For old ABI, the master needs to be + * down before setting it's hwaddr + */ + res = set_if_down(master_ifname, master_flags.ifr_flags); + if (res) { + fprintf(stderr, + "Master '%s': Error: bring interface " + "down failed\n", + master_ifname); + goto undo_mtu; + } + } + + res = set_master_hwaddr(master_ifname, + &(slave_hwaddr.ifr_hwaddr)); + if (res) { + fprintf(stderr, + "Master '%s': Error: set hw address " + "failed\n", + master_ifname); + goto undo_mtu; + } + + if (abi_ver < 1) { + /* For old ABI, bring the master + * back up + */ + res = set_if_up(master_ifname, master_flags.ifr_flags); + if (res) { + fprintf(stderr, + "Master '%s': Error: bring interface " + "up failed\n", + master_ifname); + goto undo_master_mac; + } + } + + hwaddr_set = 1; + } + + /* Do the real thing */ + strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ); + strncpy(ifr.ifr_slave, slave_ifname, IFNAMSIZ); + if ((ioctl(skfd, SIOCBONDENSLAVE, &ifr) < 0) && + (ioctl(skfd, BOND_ENSLAVE_OLD, &ifr) < 0)) { + saved_errno = errno; + v_print("Master '%s': Error: SIOCBONDENSLAVE failed: %s\n", + master_ifname, strerror(saved_errno)); + res = 1; + } + + if (res) { + goto undo_master_mac; + } + + return 0; + +/* rollback (best effort) */ +undo_master_mac: + set_master_hwaddr(master_ifname, &(master_hwaddr.ifr_hwaddr)); + hwaddr_set = 0; + goto undo_mtu; +undo_slave_mac: + set_slave_hwaddr(slave_ifname, &(slave_hwaddr.ifr_hwaddr)); +undo_mtu: + set_slave_mtu(slave_ifname, slave_mtu.ifr_mtu); + return res; +} + +static int release(char *master_ifname, char *slave_ifname) +{ + struct ifreq ifr; + int res = 0; + + if (!(slave_flags.ifr_flags & IFF_SLAVE)) { + fprintf(stderr, + "Illegal operation: The specified slave interface " + "'%s' is not a slave\n", + slave_ifname); + return 1; + } + + strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ); + strncpy(ifr.ifr_slave, slave_ifname, IFNAMSIZ); + if ((ioctl(skfd, SIOCBONDRELEASE, &ifr) < 0) && + (ioctl(skfd, BOND_RELEASE_OLD, &ifr) < 0)) { + saved_errno = errno; + v_print("Master '%s': Error: SIOCBONDRELEASE failed: %s\n", + master_ifname, strerror(saved_errno)); + return 1; + } else if (abi_ver < 1) { + /* The driver is using an old ABI, so we'll set the interface + * down to avoid any conflicts due to same MAC/IP + */ + res = set_if_down(slave_ifname, slave_flags.ifr_flags); + if (res) { + fprintf(stderr, + "Slave '%s': Error: bring interface " + "down failed\n", + slave_ifname); + } + } + + /* set to default mtu */ + set_slave_mtu(slave_ifname, 1500); + + return res; +} + +static int get_if_settings(char *ifname, struct dev_ifr ifra[]) +{ + int i; + int res = 0; + + for (i = 0; ifra[i].req_ifr; i++) { + strncpy(ifra[i].req_ifr->ifr_name, ifname, IFNAMSIZ); + res = ioctl(skfd, ifra[i].req_type, ifra[i].req_ifr); + if (res < 0) { + saved_errno = errno; + v_print("Interface '%s': Error: %s failed: %s\n", + ifname, ifra[i].req_name, + strerror(saved_errno)); + + return saved_errno; + } + } + + return 0; +} + +static int get_slave_flags(char *slave_ifname) +{ + int res = 0; + + strncpy(slave_flags.ifr_name, slave_ifname, IFNAMSIZ); + res = ioctl(skfd, SIOCGIFFLAGS, &slave_flags); + if (res < 0) { + saved_errno = errno; + v_print("Slave '%s': Error: SIOCGIFFLAGS failed: %s\n", + slave_ifname, strerror(saved_errno)); + } else { + v_print("Slave %s: flags %04X.\n", + slave_ifname, slave_flags.ifr_flags); + } + + return res; +} + +static int set_master_hwaddr(char *master_ifname, struct sockaddr *hwaddr) +{ + unsigned char *addr = (unsigned char *)hwaddr->sa_data; + struct ifreq ifr; + int res = 0; + + strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ); + memcpy(&(ifr.ifr_hwaddr), hwaddr, sizeof(struct sockaddr)); + res = ioctl(skfd, SIOCSIFHWADDR, &ifr); + if (res < 0) { + saved_errno = errno; + v_print("Master '%s': Error: SIOCSIFHWADDR failed: %s\n", + master_ifname, strerror(saved_errno)); + return res; + } else { + v_print("Master '%s': hardware address set to " + "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x.\n", + master_ifname, addr[0], addr[1], addr[2], + addr[3], addr[4], addr[5]); + } + + return res; +} + +static int set_slave_hwaddr(char *slave_ifname, struct sockaddr *hwaddr) +{ + unsigned char *addr = (unsigned char *)hwaddr->sa_data; + struct ifreq ifr; + int res = 0; + + strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ); + memcpy(&(ifr.ifr_hwaddr), hwaddr, sizeof(struct sockaddr)); + res = ioctl(skfd, SIOCSIFHWADDR, &ifr); + if (res < 0) { + saved_errno = errno; + + v_print("Slave '%s': Error: SIOCSIFHWADDR failed: %s\n", + slave_ifname, strerror(saved_errno)); + + if (saved_errno == EBUSY) { + v_print(" The device is busy: it must be idle " + "before running this command.\n"); + } else if (saved_errno == EOPNOTSUPP) { + v_print(" The device does not support setting " + "the MAC address.\n" + " Your kernel likely does not support slave " + "devices.\n"); + } else if (saved_errno == EINVAL) { + v_print(" The device's address type does not match " + "the master's address type.\n"); + } + return res; + } else { + v_print("Slave '%s': hardware address set to " + "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x.\n", + slave_ifname, addr[0], addr[1], addr[2], + addr[3], addr[4], addr[5]); + } + + return res; +} + +static int set_slave_mtu(char *slave_ifname, int mtu) +{ + struct ifreq ifr; + int res = 0; + + ifr.ifr_mtu = mtu; + strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ); + + res = ioctl(skfd, SIOCSIFMTU, &ifr); + if (res < 0) { + saved_errno = errno; + v_print("Slave '%s': Error: SIOCSIFMTU failed: %s\n", + slave_ifname, strerror(saved_errno)); + } else { + v_print("Slave '%s': MTU set to %d.\n", slave_ifname, mtu); + } + + return res; +} + +static int set_if_flags(char *ifname, short flags) +{ + struct ifreq ifr; + int res = 0; + + ifr.ifr_flags = flags; + strncpy(ifr.ifr_name, ifname, IFNAMSIZ); + + res = ioctl(skfd, SIOCSIFFLAGS, &ifr); + if (res < 0) { + saved_errno = errno; + v_print("Interface '%s': Error: SIOCSIFFLAGS failed: %s\n", + ifname, strerror(saved_errno)); + } else { + v_print("Interface '%s': flags set to %04X.\n", ifname, flags); + } + + return res; +} + +static int set_if_up(char *ifname, short flags) +{ + return set_if_flags(ifname, flags | IFF_UP); +} + +static int set_if_down(char *ifname, short flags) +{ + return set_if_flags(ifname, flags & ~IFF_UP); +} + +static int clear_if_addr(char *ifname) +{ + struct ifreq ifr; + int res = 0; + + strncpy(ifr.ifr_name, ifname, IFNAMSIZ); + ifr.ifr_addr.sa_family = AF_INET; + memset(ifr.ifr_addr.sa_data, 0, sizeof(ifr.ifr_addr.sa_data)); + + res = ioctl(skfd, SIOCSIFADDR, &ifr); + if (res < 0) { + saved_errno = errno; + v_print("Interface '%s': Error: SIOCSIFADDR failed: %s\n", + ifname, strerror(saved_errno)); + } else { + v_print("Interface '%s': address cleared\n", ifname); + } + + return res; +} + +static int set_if_addr(char *master_ifname, char *slave_ifname) +{ + struct ifreq ifr; + int res; + unsigned char *ipaddr; + int i; + struct { + char *req_name; + char *desc; + int g_ioctl; + int s_ioctl; + } ifra[] = { + {"IFADDR", "addr", SIOCGIFADDR, SIOCSIFADDR}, + {"DSTADDR", "destination addr", SIOCGIFDSTADDR, SIOCSIFDSTADDR}, + {"BRDADDR", "broadcast addr", SIOCGIFBRDADDR, SIOCSIFBRDADDR}, + {"NETMASK", "netmask", SIOCGIFNETMASK, SIOCSIFNETMASK}, + {NULL, NULL, 0, 0}, + }; + + for (i = 0; ifra[i].req_name; i++) { + strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ); + res = ioctl(skfd, ifra[i].g_ioctl, &ifr); + if (res < 0) { + int saved_errno = errno; + + v_print("Interface '%s': Error: SIOCG%s failed: %s\n", + master_ifname, ifra[i].req_name, + strerror(saved_errno)); + + ifr.ifr_addr.sa_family = AF_INET; + memset(ifr.ifr_addr.sa_data, 0, + sizeof(ifr.ifr_addr.sa_data)); + } + + strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ); + res = ioctl(skfd, ifra[i].s_ioctl, &ifr); + if (res < 0) { + int saved_errno = errno; + + v_print("Interface '%s': Error: SIOCS%s failed: %s\n", + slave_ifname, ifra[i].req_name, + strerror(saved_errno)); + + } + + ipaddr = (unsigned char *)ifr.ifr_addr.sa_data; + v_print("Interface '%s': set IP %s to %d.%d.%d.%d\n", + slave_ifname, ifra[i].desc, + ipaddr[0], ipaddr[1], ipaddr[2], ipaddr[3]); + } + + return 0; +} + +/* + * Local variables: + * version-control: t + * kept-new-versions: 5 + * c-indent-level: 4 + * c-basic-offset: 4 + * tab-width: 4 + * compile-command: "gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave" + * End: + */ + diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt new file mode 100644 index 0000000..d849326 --- /dev/null +++ b/Documentation/networking/ip-sysctl.txt @@ -0,0 +1,1268 @@ +/proc/sys/net/ipv4/* Variables: + +ip_forward - BOOLEAN + 0 - disabled (default) + not 0 - enabled + + Forward Packets between interfaces. + + This variable is special, its change resets all configuration + parameters to their default state (RFC1122 for hosts, RFC1812 + for routers) + +ip_default_ttl - INTEGER + default 64 + +ip_no_pmtu_disc - BOOLEAN + Disable Path MTU Discovery. + default FALSE + +min_pmtu - INTEGER + default 562 - minimum discovered Path MTU + +mtu_expires - INTEGER + Time, in seconds, that cached PMTU information is kept. + +min_adv_mss - INTEGER + The advertised MSS depends on the first hop route MTU, but will + never be lower than this setting. + +IP Fragmentation: + +ipfrag_high_thresh - INTEGER + Maximum memory used to reassemble IP fragments. When + ipfrag_high_thresh bytes of memory is allocated for this purpose, + the fragment handler will toss packets until ipfrag_low_thresh + is reached. + +ipfrag_low_thresh - INTEGER + See ipfrag_high_thresh + +ipfrag_time - INTEGER + Time in seconds to keep an IP fragment in memory. + +ipfrag_secret_interval - INTEGER + Regeneration interval (in seconds) of the hash secret (or lifetime + for the hash secret) for IP fragments. + Default: 600 + +ipfrag_max_dist - INTEGER + ipfrag_max_dist is a non-negative integer value which defines the + maximum "disorder" which is allowed among fragments which share a + common IP source address. Note that reordering of packets is + not unusual, but if a large number of fragments arrive from a source + IP address while a particular fragment queue remains incomplete, it + probably indicates that one or more fragments belonging to that queue + have been lost. When ipfrag_max_dist is positive, an additional check + is done on fragments before they are added to a reassembly queue - if + ipfrag_max_dist (or more) fragments have arrived from a particular IP + address between additions to any IP fragment queue using that source + address, it's presumed that one or more fragments in the queue are + lost. The existing fragment queue will be dropped, and a new one + started. An ipfrag_max_dist value of zero disables this check. + + Using a very small value, e.g. 1 or 2, for ipfrag_max_dist can + result in unnecessarily dropping fragment queues when normal + reordering of packets occurs, which could lead to poor application + performance. Using a very large value, e.g. 50000, increases the + likelihood of incorrectly reassembling IP fragments that originate + from different IP datagrams, which could result in data corruption. + Default: 64 + +INET peer storage: + +inet_peer_threshold - INTEGER + The approximate size of the storage. Starting from this threshold + entries will be thrown aggressively. This threshold also determines + entries' time-to-live and time intervals between garbage collection + passes. More entries, less time-to-live, less GC interval. + +inet_peer_minttl - INTEGER + Minimum time-to-live of entries. Should be enough to cover fragment + time-to-live on the reassembling side. This minimum time-to-live is + guaranteed if the pool size is less than inet_peer_threshold. + Measured in seconds. + +inet_peer_maxttl - INTEGER + Maximum time-to-live of entries. Unused entries will expire after + this period of time if there is no memory pressure on the pool (i.e. + when the number of entries in the pool is very small). + Measured in seconds. + +inet_peer_gc_mintime - INTEGER + Minimum interval between garbage collection passes. This interval is + in effect under high memory pressure on the pool. + Measured in seconds. + +inet_peer_gc_maxtime - INTEGER + Minimum interval between garbage collection passes. This interval is + in effect under low (or absent) memory pressure on the pool. + Measured in seconds. + +TCP variables: + +somaxconn - INTEGER + Limit of socket listen() backlog, known in userspace as SOMAXCONN. + Defaults to 128. See also tcp_max_syn_backlog for additional tuning + for TCP sockets. + +tcp_abc - INTEGER + Controls Appropriate Byte Count (ABC) defined in RFC3465. + ABC is a way of increasing congestion window (cwnd) more slowly + in response to partial acknowledgments. + Possible values are: + 0 increase cwnd once per acknowledgment (no ABC) + 1 increase cwnd once per acknowledgment of full sized segment + 2 allow increase cwnd by two if acknowledgment is + of two segments to compensate for delayed acknowledgments. + Default: 0 (off) + +tcp_abort_on_overflow - BOOLEAN + If listening service is too slow to accept new connections, + reset them. Default state is FALSE. It means that if overflow + occurred due to a burst, connection will recover. Enable this + option _only_ if you are really sure that listening daemon + cannot be tuned to accept connections faster. Enabling this + option can harm clients of your server. + +tcp_adv_win_scale - INTEGER + Count buffering overhead as bytes/2^tcp_adv_win_scale + (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), + if it is <= 0. + Default: 2 + +tcp_allowed_congestion_control - STRING + Show/set the congestion control choices available to non-privileged + processes. The list is a subset of those listed in + tcp_available_congestion_control. + Default is "reno" and the default setting (tcp_congestion_control). + +tcp_app_win - INTEGER + Reserve max(window/2^tcp_app_win, mss) of window for application + buffer. Value 0 is special, it means that nothing is reserved. + Default: 31 + +tcp_available_congestion_control - STRING + Shows the available congestion control choices that are registered. + More congestion control algorithms may be available as modules, + but not loaded. + +tcp_base_mss - INTEGER + The initial value of search_low to be used by the packetization layer + Path MTU discovery (MTU probing). If MTU probing is enabled, + this is the initial MSS used by the connection. + +tcp_congestion_control - STRING + Set the congestion control algorithm to be used for new + connections. The algorithm "reno" is always available, but + additional choices may be available based on kernel configuration. + Default is set as part of kernel configuration. + +tcp_dsack - BOOLEAN + Allows TCP to send "duplicate" SACKs. + +tcp_ecn - BOOLEAN + Enable Explicit Congestion Notification in TCP. + +tcp_fack - BOOLEAN + Enable FACK congestion avoidance and fast retransmission. + The value is not used, if tcp_sack is not enabled. + +tcp_fin_timeout - INTEGER + Time to hold socket in state FIN-WAIT-2, if it was closed + by our side. Peer can be broken and never close its side, + or even died unexpectedly. Default value is 60sec. + Usual value used in 2.2 was 180 seconds, you may restore + it, but remember that if your machine is even underloaded WEB server, + you risk to overflow memory with kilotons of dead sockets, + FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1, + because they eat maximum 1.5K of memory, but they tend + to live longer. Cf. tcp_max_orphans. + +tcp_frto - INTEGER + Enables Forward RTO-Recovery (F-RTO) defined in RFC4138. + F-RTO is an enhanced recovery algorithm for TCP retransmission + timeouts. It is particularly beneficial in wireless environments + where packet loss is typically due to random radio interference + rather than intermediate router congestion. F-RTO is sender-side + only modification. Therefore it does not require any support from + the peer. + + If set to 1, basic version is enabled. 2 enables SACK enhanced + F-RTO if flow uses SACK. The basic version can be used also when + SACK is in use though scenario(s) with it exists where F-RTO + interacts badly with the packet counting of the SACK enabled TCP + flow. + +tcp_frto_response - INTEGER + When F-RTO has detected that a TCP retransmission timeout was + spurious (i.e, the timeout would have been avoided had TCP set a + longer retransmission timeout), TCP has several options what to do + next. Possible values are: + 0 Rate halving based; a smooth and conservative response, + results in halved cwnd and ssthresh after one RTT + 1 Very conservative response; not recommended because even + though being valid, it interacts poorly with the rest of + Linux TCP, halves cwnd and ssthresh immediately + 2 Aggressive response; undoes congestion control measures + that are now known to be unnecessary (ignoring the + possibility of a lost retransmission that would require + TCP to be more cautious), cwnd and ssthresh are restored + to the values prior timeout + Default: 0 (rate halving based) + +tcp_keepalive_time - INTEGER + How often TCP sends out keepalive messages when keepalive is enabled. + Default: 2hours. + +tcp_keepalive_probes - INTEGER + How many keepalive probes TCP sends out, until it decides that the + connection is broken. Default value: 9. + +tcp_keepalive_intvl - INTEGER + How frequently the probes are send out. Multiplied by + tcp_keepalive_probes it is time to kill not responding connection, + after probes started. Default value: 75sec i.e. connection + will be aborted after ~11 minutes of retries. + +tcp_low_latency - BOOLEAN + If set, the TCP stack makes decisions that prefer lower + latency as opposed to higher throughput. By default, this + option is not set meaning that higher throughput is preferred. + An example of an application where this default should be + changed would be a Beowulf compute cluster. + Default: 0 + +tcp_max_orphans - INTEGER + Maximal number of TCP sockets not attached to any user file handle, + held by system. If this number is exceeded orphaned connections are + reset immediately and warning is printed. This limit exists + only to prevent simple DoS attacks, you _must_ not rely on this + or lower the limit artificially, but rather increase it + (probably, after increasing installed memory), + if network conditions require more than default value, + and tune network services to linger and kill such states + more aggressively. Let me to remind again: each orphan eats + up to ~64K of unswappable memory. + +tcp_max_syn_backlog - INTEGER + Maximal number of remembered connection requests, which are + still did not receive an acknowledgment from connecting client. + Default value is 1024 for systems with more than 128Mb of memory, + and 128 for low memory machines. If server suffers of overload, + try to increase this number. + +tcp_max_tw_buckets - INTEGER + Maximal number of timewait sockets held by system simultaneously. + If this number is exceeded time-wait socket is immediately destroyed + and warning is printed. This limit exists only to prevent + simple DoS attacks, you _must_ not lower the limit artificially, + but rather increase it (probably, after increasing installed memory), + if network conditions require more than default value. + +tcp_mem - vector of 3 INTEGERs: min, pressure, max + min: below this number of pages TCP is not bothered about its + memory appetite. + + pressure: when amount of memory allocated by TCP exceeds this number + of pages, TCP moderates its memory consumption and enters memory + pressure mode, which is exited when memory consumption falls + under "min". + + max: number of pages allowed for queueing by all TCP sockets. + + Defaults are calculated at boot time from amount of available + memory. + +tcp_moderate_rcvbuf - BOOLEAN + If set, TCP performs receive buffer auto-tuning, attempting to + automatically size the buffer (no greater than tcp_rmem[2]) to + match the size required by the path for full throughput. Enabled by + default. + +tcp_mtu_probing - INTEGER + Controls TCP Packetization-Layer Path MTU Discovery. Takes three + values: + 0 - Disabled + 1 - Disabled by default, enabled when an ICMP black hole detected + 2 - Always enabled, use initial MSS of tcp_base_mss. + +tcp_no_metrics_save - BOOLEAN + By default, TCP saves various connection metrics in the route cache + when the connection closes, so that connections established in the + near future can use these to set initial conditions. Usually, this + increases overall performance, but may sometimes cause performance + degradation. If set, TCP will not cache metrics on closing + connections. + +tcp_orphan_retries - INTEGER + How may times to retry before killing TCP connection, closed + by our side. Default value 7 corresponds to ~50sec-16min + depending on RTO. If you machine is loaded WEB server, + you should think about lowering this value, such sockets + may consume significant resources. Cf. tcp_max_orphans. + +tcp_reordering - INTEGER + Maximal reordering of packets in a TCP stream. + Default: 3 + +tcp_retrans_collapse - BOOLEAN + Bug-to-bug compatibility with some broken printers. + On retransmit try to send bigger packets to work around bugs in + certain TCP stacks. + +tcp_retries1 - INTEGER + How many times to retry before deciding that something is wrong + and it is necessary to report this suspicion to network layer. + Minimal RFC value is 3, it is default, which corresponds + to ~3sec-8min depending on RTO. + +tcp_retries2 - INTEGER + How may times to retry before killing alive TCP connection. + RFC1122 says that the limit should be longer than 100 sec. + It is too small number. Default value 15 corresponds to ~13-30min + depending on RTO. + +tcp_rfc1337 - BOOLEAN + If set, the TCP stack behaves conforming to RFC1337. If unset, + we are not conforming to RFC, but prevent TCP TIME_WAIT + assassination. + Default: 0 + +tcp_rmem - vector of 3 INTEGERs: min, default, max + min: Minimal size of receive buffer used by TCP sockets. + It is guaranteed to each TCP socket, even under moderate memory + pressure. + Default: 8K + + default: initial size of receive buffer used by TCP sockets. + This value overrides net.core.rmem_default used by other protocols. + Default: 87380 bytes. This value results in window of 65535 with + default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit + less for default tcp_app_win. See below about these variables. + + max: maximal size of receive buffer allowed for automatically + selected receiver buffers for TCP socket. This value does not override + net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables + automatic tuning of that socket's receive buffer size, in which + case this value is ignored. + Default: between 87380B and 4MB, depending on RAM size. + +tcp_sack - BOOLEAN + Enable select acknowledgments (SACKS). + +tcp_slow_start_after_idle - BOOLEAN + If set, provide RFC2861 behavior and time out the congestion + window after an idle period. An idle period is defined at + the current RTO. If unset, the congestion window will not + be timed out after an idle period. + Default: 1 + +tcp_stdurg - BOOLEAN + Use the Host requirements interpretation of the TCP urgent pointer field. + Most hosts use the older BSD interpretation, so if you turn this on + Linux might not communicate correctly with them. + Default: FALSE + +tcp_synack_retries - INTEGER + Number of times SYNACKs for a passive TCP connection attempt will + be retransmitted. Should not be higher than 255. Default value + is 5, which corresponds to ~180seconds. + +tcp_syncookies - BOOLEAN + Only valid when the kernel was compiled with CONFIG_SYNCOOKIES + Send out syncookies when the syn backlog queue of a socket + overflows. This is to prevent against the common 'SYN flood attack' + Default: FALSE + + Note, that syncookies is fallback facility. + It MUST NOT be used to help highly loaded servers to stand + against legal connection rate. If you see SYN flood warnings + in your logs, but investigation shows that they occur + because of overload with legal connections, you should tune + another parameters until this warning disappear. + See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow. + + syncookies seriously violate TCP protocol, do not allow + to use TCP extensions, can result in serious degradation + of some services (f.e. SMTP relaying), visible not by you, + but your clients and relays, contacting you. While you see + SYN flood warnings in logs not being really flooded, your server + is seriously misconfigured. + +tcp_syn_retries - INTEGER + Number of times initial SYNs for an active TCP connection attempt + will be retransmitted. Should not be higher than 255. Default value + is 5, which corresponds to ~180seconds. + +tcp_timestamps - BOOLEAN + Enable timestamps as defined in RFC1323. + +tcp_tso_win_divisor - INTEGER + This allows control over what percentage of the congestion window + can be consumed by a single TSO frame. + The setting of this parameter is a choice between burstiness and + building larger TSO frames. + Default: 3 + +tcp_tw_recycle - BOOLEAN + Enable fast recycling TIME-WAIT sockets. Default value is 0. + It should not be changed without advice/request of technical + experts. + +tcp_tw_reuse - BOOLEAN + Allow to reuse TIME-WAIT sockets for new connections when it is + safe from protocol viewpoint. Default value is 0. + It should not be changed without advice/request of technical + experts. + +tcp_window_scaling - BOOLEAN + Enable window scaling as defined in RFC1323. + +tcp_wmem - vector of 3 INTEGERs: min, default, max + min: Amount of memory reserved for send buffers for TCP sockets. + Each TCP socket has rights to use it due to fact of its birth. + Default: 4K + + default: initial size of send buffer used by TCP sockets. This + value overrides net.core.wmem_default used by other protocols. + It is usually lower than net.core.wmem_default. + Default: 16K + + max: Maximal amount of memory allowed for automatically tuned + send buffers for TCP sockets. This value does not override + net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables + automatic tuning of that socket's send buffer size, in which case + this value is ignored. + Default: between 64K and 4MB, depending on RAM size. + +tcp_workaround_signed_windows - BOOLEAN + If set, assume no receipt of a window scaling option means the + remote TCP is broken and treats the window as a signed quantity. + If unset, assume the remote TCP is not broken even if we do + not receive a window scaling option from them. + Default: 0 + +tcp_dma_copybreak - INTEGER + Lower limit, in bytes, of the size of socket reads that will be + offloaded to a DMA copy engine, if one is present in the system + and CONFIG_NET_DMA is enabled. + Default: 4096 + +UDP variables: + +udp_mem - vector of 3 INTEGERs: min, pressure, max + Number of pages allowed for queueing by all UDP sockets. + + min: Below this number of pages UDP is not bothered about its + memory appetite. When amount of memory allocated by UDP exceeds + this number, UDP starts to moderate memory usage. + + pressure: This value was introduced to follow format of tcp_mem. + + max: Number of pages allowed for queueing by all UDP sockets. + + Default is calculated at boot time from amount of available memory. + +udp_rmem_min - INTEGER + Minimal size of receive buffer used by UDP sockets in moderation. + Each UDP socket is able to use the size for receiving data, even if + total pages of UDP sockets exceed udp_mem pressure. The unit is byte. + Default: 4096 + +udp_wmem_min - INTEGER + Minimal size of send buffer used by UDP sockets in moderation. + Each UDP socket is able to use the size for sending data, even if + total pages of UDP sockets exceed udp_mem pressure. The unit is byte. + Default: 4096 + +CIPSOv4 Variables: + +cipso_cache_enable - BOOLEAN + If set, enable additions to and lookups from the CIPSO label mapping + cache. If unset, additions are ignored and lookups always result in a + miss. However, regardless of the setting the cache is still + invalidated when required when means you can safely toggle this on and + off and the cache will always be "safe". + Default: 1 + +cipso_cache_bucket_size - INTEGER + The CIPSO label cache consists of a fixed size hash table with each + hash bucket containing a number of cache entries. This variable limits + the number of entries in each hash bucket; the larger the value the + more CIPSO label mappings that can be cached. When the number of + entries in a given hash bucket reaches this limit adding new entries + causes the oldest entry in the bucket to be removed to make room. + Default: 10 + +cipso_rbm_optfmt - BOOLEAN + Enable the "Optimized Tag 1 Format" as defined in section 3.4.2.6 of + the CIPSO draft specification (see Documentation/netlabel for details). + This means that when set the CIPSO tag will be padded with empty + categories in order to make the packet data 32-bit aligned. + Default: 0 + +cipso_rbm_structvalid - BOOLEAN + If set, do a very strict check of the CIPSO option when + ip_options_compile() is called. If unset, relax the checks done during + ip_options_compile(). Either way is "safe" as errors are caught else + where in the CIPSO processing code but setting this to 0 (False) should + result in less work (i.e. it should be faster) but could cause problems + with other implementations that require strict checking. + Default: 0 + +IP Variables: + +ip_local_port_range - 2 INTEGERS + Defines the local port range that is used by TCP and UDP to + choose the local port. The first number is the first, the + second the last local port number. Default value depends on + amount of memory available on the system: + > 128Mb 32768-61000 + < 128Mb 1024-4999 or even less. + This number defines number of active connections, which this + system can issue simultaneously to systems not supporting + TCP extensions (timestamps). With tcp_tw_recycle enabled + (i.e. by default) range 1024-4999 is enough to issue up to + 2000 connections per second to systems supporting timestamps. + +ip_nonlocal_bind - BOOLEAN + If set, allows processes to bind() to non-local IP addresses, + which can be quite useful - but may break some applications. + Default: 0 + +ip_dynaddr - BOOLEAN + If set non-zero, enables support for dynamic addresses. + If set to a non-zero value larger than 1, a kernel log + message will be printed when dynamic address rewriting + occurs. + Default: 0 + +icmp_echo_ignore_all - BOOLEAN + If set non-zero, then the kernel will ignore all ICMP ECHO + requests sent to it. + Default: 0 + +icmp_echo_ignore_broadcasts - BOOLEAN + If set non-zero, then the kernel will ignore all ICMP ECHO and + TIMESTAMP requests sent to it via broadcast/multicast. + Default: 1 + +icmp_ratelimit - INTEGER + Limit the maximal rates for sending ICMP packets whose type matches + icmp_ratemask (see below) to specific targets. + 0 to disable any limiting, + otherwise the minimal space between responses in milliseconds. + Default: 1000 + +icmp_ratemask - INTEGER + Mask made of ICMP types for which rates are being limited. + Significant bits: IHGFEDCBA9876543210 + Default mask: 0000001100000011000 (6168) + + Bit definitions (see include/linux/icmp.h): + 0 Echo Reply + 3 Destination Unreachable * + 4 Source Quench * + 5 Redirect + 8 Echo Request + B Time Exceeded * + C Parameter Problem * + D Timestamp Request + E Timestamp Reply + F Info Request + G Info Reply + H Address Mask Request + I Address Mask Reply + + * These are rate limited by default (see default mask above) + +icmp_ignore_bogus_error_responses - BOOLEAN + Some routers violate RFC1122 by sending bogus responses to broadcast + frames. Such violations are normally logged via a kernel warning. + If this is set to TRUE, the kernel will not give such warnings, which + will avoid log file clutter. + Default: FALSE + +icmp_errors_use_inbound_ifaddr - BOOLEAN + + If zero, icmp error messages are sent with the primary address of + the exiting interface. + + If non-zero, the message will be sent with the primary address of + the interface that received the packet that caused the icmp error. + This is the behaviour network many administrators will expect from + a router. And it can make debugging complicated network layouts + much easier. + + Note that if no primary address exists for the interface selected, + then the primary address of the first non-loopback interface that + has one will be used regardless of this setting. + + Default: 0 + +igmp_max_memberships - INTEGER + Change the maximum number of multicast groups we can subscribe to. + Default: 20 + +conf/interface/* changes special settings per interface (where "interface" is + the name of your network interface) +conf/all/* is special, changes the settings for all interfaces + + +log_martians - BOOLEAN + Log packets with impossible addresses to kernel log. + log_martians for the interface will be enabled if at least one of + conf/{all,interface}/log_martians is set to TRUE, + it will be disabled otherwise + +accept_redirects - BOOLEAN + Accept ICMP redirect messages. + accept_redirects for the interface will be enabled if: + - both conf/{all,interface}/accept_redirects are TRUE in the case forwarding + for the interface is enabled + or + - at least one of conf/{all,interface}/accept_redirects is TRUE in the case + forwarding for the interface is disabled + accept_redirects for the interface will be disabled otherwise + default TRUE (host) + FALSE (router) + +forwarding - BOOLEAN + Enable IP forwarding on this interface. + +mc_forwarding - BOOLEAN + Do multicast routing. The kernel needs to be compiled with CONFIG_MROUTE + and a multicast routing daemon is required. + conf/all/mc_forwarding must also be set to TRUE to enable multicast routing + for the interface + +medium_id - INTEGER + Integer value used to differentiate the devices by the medium they + are attached to. Two devices can have different id values when + the broadcast packets are received only on one of them. + The default value 0 means that the device is the only interface + to its medium, value of -1 means that medium is not known. + + Currently, it is used to change the proxy_arp behavior: + the proxy_arp feature is enabled for packets forwarded between + two devices attached to different media. + +proxy_arp - BOOLEAN + Do proxy arp. + proxy_arp for the interface will be enabled if at least one of + conf/{all,interface}/proxy_arp is set to TRUE, + it will be disabled otherwise + +shared_media - BOOLEAN + Send(router) or accept(host) RFC1620 shared media redirects. + Overrides ip_secure_redirects. + shared_media for the interface will be enabled if at least one of + conf/{all,interface}/shared_media is set to TRUE, + it will be disabled otherwise + default TRUE + +secure_redirects - BOOLEAN + Accept ICMP redirect messages only for gateways, + listed in default gateway list. + secure_redirects for the interface will be enabled if at least one of + conf/{all,interface}/secure_redirects is set to TRUE, + it will be disabled otherwise + default TRUE + +send_redirects - BOOLEAN + Send redirects, if router. + send_redirects for the interface will be enabled if at least one of + conf/{all,interface}/send_redirects is set to TRUE, + it will be disabled otherwise + Default: TRUE + +bootp_relay - BOOLEAN + Accept packets with source address 0.b.c.d destined + not to this host as local ones. It is supposed, that + BOOTP relay daemon will catch and forward such packets. + conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay + for the interface + default FALSE + Not Implemented Yet. + +accept_source_route - BOOLEAN + Accept packets with SRR option. + conf/all/accept_source_route must also be set to TRUE to accept packets + with SRR option on the interface + default TRUE (router) + FALSE (host) + +rp_filter - BOOLEAN + 1 - do source validation by reversed path, as specified in RFC1812 + Recommended option for single homed hosts and stub network + routers. Could cause troubles for complicated (not loop free) + networks running a slow unreliable protocol (sort of RIP), + or using static routes. + + 0 - No source validation. + + conf/all/rp_filter must also be set to TRUE to do source validation + on the interface + + Default value is 0. Note that some distributions enable it + in startup scripts. + +arp_filter - BOOLEAN + 1 - Allows you to have multiple network interfaces on the same + subnet, and have the ARPs for each interface be answered + based on whether or not the kernel would route a packet from + the ARP'd IP out that interface (therefore you must use source + based routing for this to work). In other words it allows control + of which cards (usually 1) will respond to an arp request. + + 0 - (default) The kernel can respond to arp requests with addresses + from other interfaces. This may seem wrong but it usually makes + sense, because it increases the chance of successful communication. + IP addresses are owned by the complete host on Linux, not by + particular interfaces. Only for more complex setups like load- + balancing, does this behaviour cause problems. + + arp_filter for the interface will be enabled if at least one of + conf/{all,interface}/arp_filter is set to TRUE, + it will be disabled otherwise + +arp_announce - INTEGER + Define different restriction levels for announcing the local + source IP address from IP packets in ARP requests sent on + interface: + 0 - (default) Use any local address, configured on any interface + 1 - Try to avoid local addresses that are not in the target's + subnet for this interface. This mode is useful when target + hosts reachable via this interface require the source IP + address in ARP requests to be part of their logical network + configured on the receiving interface. When we generate the + request we will check all our subnets that include the + target IP and will preserve the source address if it is from + such subnet. If there is no such subnet we select source + address according to the rules for level 2. + 2 - Always use the best local address for this target. + In this mode we ignore the source address in the IP packet + and try to select local address that we prefer for talks with + the target host. Such local address is selected by looking + for primary IP addresses on all our subnets on the outgoing + interface that include the target IP address. If no suitable + local address is found we select the first local address + we have on the outgoing interface or on all other interfaces, + with the hope we will receive reply for our request and + even sometimes no matter the source IP address we announce. + + The max value from conf/{all,interface}/arp_announce is used. + + Increasing the restriction level gives more chance for + receiving answer from the resolved target while decreasing + the level announces more valid sender's information. + +arp_ignore - INTEGER + Define different modes for sending replies in response to + received ARP requests that resolve local target IP addresses: + 0 - (default): reply for any local target IP address, configured + on any interface + 1 - reply only if the target IP address is local address + configured on the incoming interface + 2 - reply only if the target IP address is local address + configured on the incoming interface and both with the + sender's IP address are part from same subnet on this interface + 3 - do not reply for local addresses configured with scope host, + only resolutions for global and link addresses are replied + 4-7 - reserved + 8 - do not reply for all local addresses + + The max value from conf/{all,interface}/arp_ignore is used + when ARP request is received on the {interface} + +arp_accept - BOOLEAN + Define behavior when gratuitous arp replies are received: + 0 - drop gratuitous arp frames + 1 - accept gratuitous arp frames + +app_solicit - INTEGER + The maximum number of probes to send to the user space ARP daemon + via netlink before dropping back to multicast probes (see + mcast_solicit). Defaults to 0. + +disable_policy - BOOLEAN + Disable IPSEC policy (SPD) for this interface + +disable_xfrm - BOOLEAN + Disable IPSEC encryption on this interface, whatever the policy + + + +tag - INTEGER + Allows you to write a number, which can be used as required. + Default value is 0. + +Alexey Kuznetsov. +kuznet@ms2.inr.ac.ru + +Updated by: +Andi Kleen +ak@muc.de +Nicolas Delon +delon.nicolas@wanadoo.fr + + + + +/proc/sys/net/ipv6/* Variables: + +IPv6 has no global variables such as tcp_*. tcp_* settings under ipv4/ also +apply to IPv6 [XXX?]. + +bindv6only - BOOLEAN + Default value for IPV6_V6ONLY socket option, + which restricts use of the IPv6 socket to IPv6 communication + only. + TRUE: disable IPv4-mapped address feature + FALSE: enable IPv4-mapped address feature + + Default: FALSE (as specified in RFC2553bis) + +IPv6 Fragmentation: + +ip6frag_high_thresh - INTEGER + Maximum memory used to reassemble IPv6 fragments. When + ip6frag_high_thresh bytes of memory is allocated for this purpose, + the fragment handler will toss packets until ip6frag_low_thresh + is reached. + +ip6frag_low_thresh - INTEGER + See ip6frag_high_thresh + +ip6frag_time - INTEGER + Time in seconds to keep an IPv6 fragment in memory. + +ip6frag_secret_interval - INTEGER + Regeneration interval (in seconds) of the hash secret (or lifetime + for the hash secret) for IPv6 fragments. + Default: 600 + +conf/default/*: + Change the interface-specific default settings. + + +conf/all/*: + Change all the interface-specific settings. + + [XXX: Other special features than forwarding?] + +conf/all/forwarding - BOOLEAN + Enable global IPv6 forwarding between all interfaces. + + IPv4 and IPv6 work differently here; e.g. netfilter must be used + to control which interfaces may forward packets and which not. + + This also sets all interfaces' Host/Router setting + 'forwarding' to the specified value. See below for details. + + This referred to as global forwarding. + +proxy_ndp - BOOLEAN + Do proxy ndp. + +conf/interface/*: + Change special settings per interface. + + The functional behaviour for certain settings is different + depending on whether local forwarding is enabled or not. + +accept_ra - BOOLEAN + Accept Router Advertisements; autoconfigure using them. + + Functional default: enabled if local forwarding is disabled. + disabled if local forwarding is enabled. + +accept_ra_defrtr - BOOLEAN + Learn default router in Router Advertisement. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + +accept_ra_pinfo - BOOLEAN + Learn Prefix Information in Router Advertisement. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + +accept_ra_rt_info_max_plen - INTEGER + Maximum prefix length of Route Information in RA. + + Route Information w/ prefix larger than or equal to this + variable shall be ignored. + + Functional default: 0 if accept_ra_rtr_pref is enabled. + -1 if accept_ra_rtr_pref is disabled. + +accept_ra_rtr_pref - BOOLEAN + Accept Router Preference in RA. + + Functional default: enabled if accept_ra is enabled. + disabled if accept_ra is disabled. + +accept_redirects - BOOLEAN + Accept Redirects. + + Functional default: enabled if local forwarding is disabled. + disabled if local forwarding is enabled. + +accept_source_route - INTEGER + Accept source routing (routing extension header). + + >= 0: Accept only routing header type 2. + < 0: Do not accept routing header. + + Default: 0 + +autoconf - BOOLEAN + Autoconfigure addresses using Prefix Information in Router + Advertisements. + + Functional default: enabled if accept_ra_pinfo is enabled. + disabled if accept_ra_pinfo is disabled. + +dad_transmits - INTEGER + The amount of Duplicate Address Detection probes to send. + Default: 1 + +forwarding - BOOLEAN + Configure interface-specific Host/Router behaviour. + + Note: It is recommended to have the same setting on all + interfaces; mixed router/host scenarios are rather uncommon. + + FALSE: + + By default, Host behaviour is assumed. This means: + + 1. IsRouter flag is not set in Neighbour Advertisements. + 2. Router Solicitations are being sent when necessary. + 3. If accept_ra is TRUE (default), accept Router + Advertisements (and do autoconfiguration). + 4. If accept_redirects is TRUE (default), accept Redirects. + + TRUE: + + If local forwarding is enabled, Router behaviour is assumed. + This means exactly the reverse from the above: + + 1. IsRouter flag is set in Neighbour Advertisements. + 2. Router Solicitations are not sent. + 3. Router Advertisements are ignored. + 4. Redirects are ignored. + + Default: FALSE if global forwarding is disabled (default), + otherwise TRUE. + +hop_limit - INTEGER + Default Hop Limit to set. + Default: 64 + +mtu - INTEGER + Default Maximum Transfer Unit + Default: 1280 (IPv6 required minimum) + +router_probe_interval - INTEGER + Minimum interval (in seconds) between Router Probing described + in RFC4191. + + Default: 60 + +router_solicitation_delay - INTEGER + Number of seconds to wait after interface is brought up + before sending Router Solicitations. + Default: 1 + +router_solicitation_interval - INTEGER + Number of seconds to wait between Router Solicitations. + Default: 4 + +router_solicitations - INTEGER + Number of Router Solicitations to send until assuming no + routers are present. + Default: 3 + +use_tempaddr - INTEGER + Preference for Privacy Extensions (RFC3041). + <= 0 : disable Privacy Extensions + == 1 : enable Privacy Extensions, but prefer public + addresses over temporary addresses. + > 1 : enable Privacy Extensions and prefer temporary + addresses over public addresses. + Default: 0 (for most devices) + -1 (for point-to-point devices and loopback devices) + +temp_valid_lft - INTEGER + valid lifetime (in seconds) for temporary addresses. + Default: 604800 (7 days) + +temp_prefered_lft - INTEGER + Preferred lifetime (in seconds) for temporary addresses. + Default: 86400 (1 day) + +max_desync_factor - INTEGER + Maximum value for DESYNC_FACTOR, which is a random value + that ensures that clients don't synchronize with each + other and generate new addresses at exactly the same time. + value is in seconds. + Default: 600 + +regen_max_retry - INTEGER + Number of attempts before give up attempting to generate + valid temporary addresses. + Default: 5 + +max_addresses - INTEGER + Number of maximum addresses per interface. 0 disables limitation. + It is recommended not set too large value (or 0) because it would + be too easy way to crash kernel to allow to create too much of + autoconfigured addresses. + Default: 16 + +disable_ipv6 - BOOLEAN + Disable IPv6 operation. + Default: FALSE (enable IPv6 operation) + +accept_dad - INTEGER + Whether to accept DAD (Duplicate Address Detection). + 0: Disable DAD + 1: Enable DAD (default) + 2: Enable DAD, and disable IPv6 operation if MAC-based duplicate + link-local address has been found. + +icmp/*: +ratelimit - INTEGER + Limit the maximal rates for sending ICMPv6 packets. + 0 to disable any limiting, + otherwise the minimal space between responses in milliseconds. + Default: 1000 + + +IPv6 Update by: +Pekka Savola <pekkas@netcore.fi> +YOSHIFUJI Hideaki / USAGI Project <yoshfuji@linux-ipv6.org> + + +/proc/sys/net/bridge/* Variables: + +bridge-nf-call-arptables - BOOLEAN + 1 : pass bridged ARP traffic to arptables' FORWARD chain. + 0 : disable this. + Default: 1 + +bridge-nf-call-iptables - BOOLEAN + 1 : pass bridged IPv4 traffic to iptables' chains. + 0 : disable this. + Default: 1 + +bridge-nf-call-ip6tables - BOOLEAN + 1 : pass bridged IPv6 traffic to ip6tables' chains. + 0 : disable this. + Default: 1 + +bridge-nf-filter-vlan-tagged - BOOLEAN + 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables. + 0 : disable this. + Default: 1 + +bridge-nf-filter-pppoe-tagged - BOOLEAN + 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables. + 0 : disable this. + Default: 1 + + +proc/sys/net/sctp/* Variables: + +addip_enable - BOOLEAN + Enable or disable extension of Dynamic Address Reconfiguration + (ADD-IP) functionality specified in RFC5061. This extension provides + the ability to dynamically add and remove new addresses for the SCTP + associations. + + 1: Enable extension. + + 0: Disable extension. + + Default: 0 + +addip_noauth_enable - BOOLEAN + Dynamic Address Reconfiguration (ADD-IP) requires the use of + authentication to protect the operations of adding or removing new + addresses. This requirement is mandated so that unauthorized hosts + would not be able to hijack associations. However, older + implementations may not have implemented this requirement while + allowing the ADD-IP extension. For reasons of interoperability, + we provide this variable to control the enforcement of the + authentication requirement. + + 1: Allow ADD-IP extension to be used without authentication. This + should only be set in a closed environment for interoperability + with older implementations. + + 0: Enforce the authentication requirement + + Default: 0 + +auth_enable - BOOLEAN + Enable or disable Authenticated Chunks extension. This extension + provides the ability to send and receive authenticated chunks and is + required for secure operation of Dynamic Address Reconfiguration + (ADD-IP) extension. + + 1: Enable this extension. + 0: Disable this extension. + + Default: 0 + +prsctp_enable - BOOLEAN + Enable or disable the Partial Reliability extension (RFC3758) which + is used to notify peers that a given DATA should no longer be expected. + + 1: Enable extension + 0: Disable + + Default: 1 + +max_burst - INTEGER + The limit of the number of new packets that can be initially sent. It + controls how bursty the generated traffic can be. + + Default: 4 + +association_max_retrans - INTEGER + Set the maximum number for retransmissions that an association can + attempt deciding that the remote end is unreachable. If this value + is exceeded, the association is terminated. + + Default: 10 + +max_init_retransmits - INTEGER + The maximum number of retransmissions of INIT and COOKIE-ECHO chunks + that an association will attempt before declaring the destination + unreachable and terminating. + + Default: 8 + +path_max_retrans - INTEGER + The maximum number of retransmissions that will be attempted on a given + path. Once this threshold is exceeded, the path is considered + unreachable, and new traffic will use a different path when the + association is multihomed. + + Default: 5 + +rto_initial - INTEGER + The initial round trip timeout value in milliseconds that will be used + in calculating round trip times. This is the initial time interval + for retransmissions. + + Default: 3000 + +rto_max - INTEGER + The maximum value (in milliseconds) of the round trip timeout. This + is the largest time interval that can elapse between retransmissions. + + Default: 60000 + +rto_min - INTEGER + The minimum value (in milliseconds) of the round trip timeout. This + is the smallest time interval the can elapse between retransmissions. + + Default: 1000 + +hb_interval - INTEGER + The interval (in milliseconds) between HEARTBEAT chunks. These chunks + are sent at the specified interval on idle paths to probe the state of + a given path between 2 associations. + + Default: 30000 + +sack_timeout - INTEGER + The amount of time (in milliseconds) that the implementation will wait + to send a SACK. + + Default: 200 + +valid_cookie_life - INTEGER + The default lifetime of the SCTP cookie (in milliseconds). The cookie + is used during association establishment. + + Default: 60000 + +cookie_preserve_enable - BOOLEAN + Enable or disable the ability to extend the lifetime of the SCTP cookie + that is used during the establishment phase of SCTP association + + 1: Enable cookie lifetime extension. + 0: Disable + + Default: 1 + +rcvbuf_policy - INTEGER + Determines if the receive buffer is attributed to the socket or to + association. SCTP supports the capability to create multiple + associations on a single socket. When using this capability, it is + possible that a single stalled association that's buffering a lot + of data may block other associations from delivering their data by + consuming all of the receive buffer space. To work around this, + the rcvbuf_policy could be set to attribute the receiver buffer space + to each association instead of the socket. This prevents the described + blocking. + + 1: rcvbuf space is per association + 0: recbuf space is per socket + + Default: 0 + +sndbuf_policy - INTEGER + Similar to rcvbuf_policy above, this applies to send buffer space. + + 1: Send buffer is tracked per association + 0: Send buffer is tracked per socket. + + Default: 0 + +sctp_mem - vector of 3 INTEGERs: min, pressure, max + Number of pages allowed for queueing by all SCTP sockets. + + min: Below this number of pages SCTP is not bothered about its + memory appetite. When amount of memory allocated by SCTP exceeds + this number, SCTP starts to moderate memory usage. + + pressure: This value was introduced to follow format of tcp_mem. + + max: Number of pages allowed for queueing by all SCTP sockets. + + Default is calculated at boot time from amount of available memory. + +sctp_rmem - vector of 3 INTEGERs: min, default, max + See tcp_rmem for a description. + +sctp_wmem - vector of 3 INTEGERs: min, default, max + See tcp_wmem for a description. + +UNDOCUMENTED: + +/proc/sys/net/core/* + dev_weight FIXME + +/proc/sys/net/unix/* + max_dgram_qlen FIXME + +/proc/sys/net/irda/* + fast_poll_increase FIXME + warn_noreply_time FIXME + discovery_slots FIXME + slot_timeout FIXME + max_baud_rate FIXME + discovery_timeout FIXME + lap_keepalive_time FIXME + max_noreply_time FIXME + max_tx_data_size FIXME + max_tx_window FIXME + min_tx_turn_time FIXME diff --git a/Documentation/networking/ip_dynaddr.txt b/Documentation/networking/ip_dynaddr.txt new file mode 100644 index 0000000..45f3c12 --- /dev/null +++ b/Documentation/networking/ip_dynaddr.txt @@ -0,0 +1,29 @@ +IP dynamic address hack-port v0.03 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This stuff allows diald ONESHOT connections to get established by +dynamically changing packet source address (and socket's if local procs). +It is implemented for TCP diald-box connections(1) and IP_MASQuerading(2). + +If enabled[*] and forwarding interface has changed: + 1) Socket (and packet) source address is rewritten ON RETRANSMISSIONS + while in SYN_SENT state (diald-box processes). + 2) Out-bounded MASQueraded source address changes ON OUTPUT (when + internal host does retransmission) until a packet from outside is + received by the tunnel. + +This is specially helpful for auto dialup links (diald), where the +``actual'' outgoing address is unknown at the moment the link is +going up. So, the *same* (local AND masqueraded) connections requests that +bring the link up will be able to get established. + +[*] At boot, by default no address rewriting is attempted. + To enable: + # echo 1 > /proc/sys/net/ipv4/ip_dynaddr + To enable verbose mode: + # echo 2 > /proc/sys/net/ipv4/ip_dynaddr + To disable (default) + # echo 0 > /proc/sys/net/ipv4/ip_dynaddr + +Enjoy! + +-- Juanjo <jjciarla@raiz.uncu.edu.ar> diff --git a/Documentation/networking/ipddp.txt b/Documentation/networking/ipddp.txt new file mode 100644 index 0000000..661a555 --- /dev/null +++ b/Documentation/networking/ipddp.txt @@ -0,0 +1,78 @@ +Text file for ipddp.c: + AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation + +This text file is written by Jay Schulist <jschlst@samba.org> + +Introduction +------------ + +AppleTalk-IP (IPDDP) is the method computers connected to AppleTalk +networks can use to communicate via IP. AppleTalk-IP is simply IP datagrams +inside AppleTalk packets. + +Through this driver you can either allow your Linux box to communicate +IP over an AppleTalk network or you can provide IP gatewaying functions +for your AppleTalk users. + +You can currently encapsulate or decapsulate AppleTalk-IP on LocalTalk, +EtherTalk and PPPTalk. The only limit on the protocol is that of what +kernel AppleTalk layer and drivers are available. + +Each mode requires its own user space software. + +Compiling AppleTalk-IP Decapsulation/Encapsulation +================================================= + +AppleTalk-IP decapsulation needs to be compiled into your kernel. You +will need to turn on AppleTalk-IP driver support. Then you will need to +select ONE of the two options; IP to AppleTalk-IP encapsulation support or +AppleTalk-IP to IP decapsulation support. If you compile the driver +statically you will only be able to use the driver for the function you have +enabled in the kernel. If you compile the driver as a module you can +select what mode you want it to run in via a module loading param. +ipddp_mode=1 for AppleTalk-IP encapsulation and ipddp_mode=2 for +AppleTalk-IP to IP decapsulation. + +Basic instructions for user space tools +======================================= + +To enable AppleTalk-IP decapsulation/encapsulation you will need the +proper tools. You can get the tools for decapsulation from +http://spacs1.spacs.k12.wi.us/~jschlst/index.html and for encapsulation +from http://www.maths.unm.edu/~bradford/ltpc.html + +I will briefly describe the operation of the tools, but you will +need to consult the supporting documentation for each set of tools. + +Decapsulation - You will need to download a software package called +MacGate. In this distribution there will be a tool called MacRoute +which enables you to add routes to the kernel for your Macs by hand. +Also the tool MacRegGateWay is included to register the +proper IP Gateway and IP addresses for your machine. Included in this +distribution is a patch to netatalk-1.4b2+asun2.0a17.2 (available from +ftp.u.washington.edu/pub/user-supported/asun/) this patch is optional +but it allows automatic adding and deleting of routes for Macs. (Handy +for locations with large Mac installations) + +Encapsulation - You will need to download a software daemon called ipddpd. +This software expects there to be an AppleTalk-IP gateway on the network. +You will also need to add the proper routes to route your Linux box's IP +traffic out the ipddp interface. + +Common Uses of ipddp.c +---------------------- +Of course AppleTalk-IP decapsulation and encapsulation, but specifically +decapsulation is being used most for connecting LocalTalk networks to +IP networks. Although it has been used on EtherTalk networks to allow +Macs that are only able to tunnel IP over EtherTalk. + +Encapsulation has been used to allow a Linux box stuck on a LocalTalk +network to use IP. It should work equally well if you are stuck on an +EtherTalk only network. + +Further Assistance +------------------- +You can contact me (Jay Schulist <jschlst@samba.org>) with any +questions regarding decapsulation or encapsulation. Bradford W. Johnson +<johns393@maroon.tc.umn.edu> originally wrote the ipddp.c driver for IP +encapsulation in AppleTalk. diff --git a/Documentation/networking/iphase.txt b/Documentation/networking/iphase.txt new file mode 100644 index 0000000..55eac4a --- /dev/null +++ b/Documentation/networking/iphase.txt @@ -0,0 +1,158 @@ + + READ ME FISRT + ATM (i)Chip IA Linux Driver Source +-------------------------------------------------------------------------------- + Read This Before You Begin! +-------------------------------------------------------------------------------- + +Description +----------- + +This is the README file for the Interphase PCI ATM (i)Chip IA Linux driver +source release. + +The features and limitations of this driver are as follows: + - A single VPI (VPI value of 0) is supported. + - Supports 4K VCs for the server board (with 512K control memory) and 1K + VCs for the client board (with 128K control memory). + - UBR, ABR and CBR service categories are supported. + - Only AAL5 is supported. + - Supports setting of PCR on the VCs. + - Multiple adapters in a system are supported. + - All variants of Interphase ATM PCI (i)Chip adapter cards are supported, + including x575 (OC3, control memory 128K , 512K and packet memory 128K, + 512K and 1M), x525 (UTP25) and x531 (DS3 and E3). See + http://www.iphase.com/site/iphase-web/?epi_menuItemID=e196f04b4b3b40502f150882e21046a0 + for details. + - Only x86 platforms are supported. + - SMP is supported. + + +Before You Start +---------------- + + +Installation +------------ + +1. Installing the adapters in the system + To install the ATM adapters in the system, follow the steps below. + a. Login as root. + b. Shut down the system and power off the system. + c. Install one or more ATM adapters in the system. + d. Connect each adapter to a port on an ATM switch. The green 'Link' + LED on the front panel of the adapter will be on if the adapter is + connected to the switch properly when the system is powered up. + e. Power on and boot the system. + +2. [ Removed ] + +3. Rebuild kernel with ABR support + [ a. and b. removed ] + c. Reconfigure the kernel, choose the Interphase ia driver through "make + menuconfig" or "make xconfig". + d. Rebuild the kernel, loadable modules and the atm tools. + e. Install the new built kernel and modules and reboot. + +4. Load the adapter hardware driver (ia driver) if it is built as a module + a. Login as root. + b. Change directory to /lib/modules/<kernel-version>/atm. + c. Run "insmod suni.o;insmod iphase.o" + The yellow 'status' LED on the front panel of the adapter will blink + while the driver is loaded in the system. + d. To verify that the 'ia' driver is loaded successfully, run the + following command: + + cat /proc/atm/devices + + If the driver is loaded successfully, the output of the command will + be similar to the following lines: + + Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ... + 0 ia xxxxxxxxx 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 ) + + You can also check the system log file /var/log/messages for messages + related to the ATM driver. + +5. Ia Driver Configuration + +5.1 Configuration of adapter buffers + The (i)Chip boards have 3 different packet RAM size variants: 128K, 512K and + 1M. The RAM size decides the number of buffers and buffer size. The default + size and number of buffers are set as following: + + Total Rx RAM Tx RAM Rx Buf Tx Buf Rx buf Tx buf + RAM size size size size size cnt cnt + -------- ------ ------ ------ ------ ------ ------ + 128K 64K 64K 10K 10K 6 6 + 512K 256K 256K 10K 10K 25 25 + 1M 512K 512K 10K 10K 51 51 + + These setting should work well in most environments, but can be + changed by typing the following command: + + insmod <IA_DIR>/ia.o IA_RX_BUF=<RX_CNT> IA_RX_BUF_SZ=<RX_SIZE> \ + IA_TX_BUF=<TX_CNT> IA_TX_BUF_SZ=<TX_SIZE> + Where: + RX_CNT = number of receive buffers in the range (1-128) + RX_SIZE = size of receive buffers in the range (48-64K) + TX_CNT = number of transmit buffers in the range (1-128) + TX_SIZE = size of transmit buffers in the range (48-64K) + + 1. Transmit and receive buffer size must be a multiple of 4. + 2. Care should be taken so that the memory required for the + transmit and receive buffers is less than or equal to the + total adapter packet memory. + +5.2 Turn on ia debug trace + + When the ia driver is built with the CONFIG_ATM_IA_DEBUG flag, the driver + can provide more debug trace if needed. There is a bit mask variable, + IADebugFlag, which controls the output of the traces. You can find the bit + map of the IADebugFlag in iphase.h. + The debug trace can be turn on through the insmod command line option, for + example, "insmod iphase.o IADebugFlag=0xffffffff" can turn on all the debug + traces together with loading the driver. + +6. Ia Driver Test Using ttcp_atm and PVC + + For the PVC setup, the test machines can either be connected back-to-back or + through a switch. If connected through the switch, the switch must be + configured for the PVC(s). + + a. For UBR test: + At the test machine intended to receive data, type: + ttcp_atm -r -a -s 0.100 + At the other test machine, type: + ttcp_atm -t -a -s 0.100 -n 10000 + Run "ttcp_atm -h" to display more options of the ttcp_atm tool. + b. For ABR test: + It is the same as the UBR testing, but with an extra command option: + -Pabr:max_pcr=<xxx> + where: + xxx = the maximum peak cell rate, from 170 - 353207. + This option must be set on both the machines. + c. For CBR test: + It is the same as the UBR testing, but with an extra command option: + -Pcbr:max_pcr=<xxx> + where: + xxx = the maximum peak cell rate, from 170 - 353207. + This option may only be set on the transmit machine. + + +OUTSTANDING ISSUES +------------------ + + + +Contact Information +------------------- + + Customer Support: + United States: Telephone: (214) 654-5555 + Fax: (214) 654-5500 + E-Mail: intouch@iphase.com + Europe: Telephone: 33 (0)1 41 15 44 00 + Fax: 33 (0)1 41 15 12 13 + World Wide Web: http://www.iphase.com + Anonymous FTP: ftp.iphase.com diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt new file mode 100644 index 0000000..4ccdbca --- /dev/null +++ b/Documentation/networking/ipvs-sysctl.txt @@ -0,0 +1,143 @@ +/proc/sys/net/ipv4/vs/* Variables: + +am_droprate - INTEGER + default 10 + + It sets the always mode drop rate, which is used in the mode 3 + of the drop_rate defense. + +amemthresh - INTEGER + default 1024 + + It sets the available memory threshold (in pages), which is + used in the automatic modes of defense. When there is no + enough available memory, the respective strategy will be + enabled and the variable is automatically set to 2, otherwise + the strategy is disabled and the variable is set to 1. + +cache_bypass - BOOLEAN + 0 - disabled (default) + not 0 - enabled + + If it is enabled, forward packets to the original destination + directly when no cache server is available and destination + address is not local (iph->daddr is RTN_UNICAST). It is mostly + used in transparent web cache cluster. + +debug_level - INTEGER + 0 - transmission error messages (default) + 1 - non-fatal error messages + 2 - configuration + 3 - destination trash + 4 - drop entry + 5 - service lookup + 6 - scheduling + 7 - connection new/expire, lookup and synchronization + 8 - state transition + 9 - binding destination, template checks and applications + 10 - IPVS packet transmission + 11 - IPVS packet handling (ip_vs_in/ip_vs_out) + 12 or more - packet traversal + + Only available when IPVS is compiled with the CONFIG_IPVS_DEBUG + + Higher debugging levels include the messages for lower debugging + levels, so setting debug level 2, includes level 0, 1 and 2 + messages. Thus, logging becomes more and more verbose the higher + the level. + +drop_entry - INTEGER + 0 - disabled (default) + + The drop_entry defense is to randomly drop entries in the + connection hash table, just in order to collect back some + memory for new connections. In the current code, the + drop_entry procedure can be activated every second, then it + randomly scans 1/32 of the whole and drops entries that are in + the SYN-RECV/SYNACK state, which should be effective against + syn-flooding attack. + + The valid values of drop_entry are from 0 to 3, where 0 means + that this strategy is always disabled, 1 and 2 mean automatic + modes (when there is no enough available memory, the strategy + is enabled and the variable is automatically set to 2, + otherwise the strategy is disabled and the variable is set to + 1), and 3 means that that the strategy is always enabled. + +drop_packet - INTEGER + 0 - disabled (default) + + The drop_packet defense is designed to drop 1/rate packets + before forwarding them to real servers. If the rate is 1, then + drop all the incoming packets. + + The value definition is the same as that of the drop_entry. In + the automatic mode, the rate is determined by the follow + formula: rate = amemthresh / (amemthresh - available_memory) + when available memory is less than the available memory + threshold. When the mode 3 is set, the always mode drop rate + is controlled by the /proc/sys/net/ipv4/vs/am_droprate. + +expire_nodest_conn - BOOLEAN + 0 - disabled (default) + not 0 - enabled + + The default value is 0, the load balancer will silently drop + packets when its destination server is not available. It may + be useful, when user-space monitoring program deletes the + destination server (because of server overload or wrong + detection) and add back the server later, and the connections + to the server can continue. + + If this feature is enabled, the load balancer will expire the + connection immediately when a packet arrives and its + destination server is not available, then the client program + will be notified that the connection is closed. This is + equivalent to the feature some people requires to flush + connections when its destination is not available. + +expire_quiescent_template - BOOLEAN + 0 - disabled (default) + not 0 - enabled + + When set to a non-zero value, the load balancer will expire + persistent templates when the destination server is quiescent. + This may be useful, when a user makes a destination server + quiescent by setting its weight to 0 and it is desired that + subsequent otherwise persistent connections are sent to a + different destination server. By default new persistent + connections are allowed to quiescent destination servers. + + If this feature is enabled, the load balancer will expire the + persistence template if it is to be used to schedule a new + connection and the destination server is quiescent. + +nat_icmp_send - BOOLEAN + 0 - disabled (default) + not 0 - enabled + + It controls sending icmp error messages (ICMP_DEST_UNREACH) + for VS/NAT when the load balancer receives packets from real + servers but the connection entries don't exist. + +secure_tcp - INTEGER + 0 - disabled (default) + + The secure_tcp defense is to use a more complicated state + transition table and some possible short timeouts of each + state. In the VS/NAT, it delays the entering the ESTABLISHED + until the real server starts to send data and ACK packet + (after 3-way handshake). + + The value definition is the same as that of drop_entry or + drop_packet. + +sync_threshold - INTEGER + default 3 + + It sets synchronization threshold, which is the minimum number + of incoming packets that a connection needs to receive before + the connection will be synchronized. A connection will be + synchronized, every time the number of its incoming packets + modulus 50 equals the threshold. The range of the threshold is + from 0 to 49. diff --git a/Documentation/networking/irda.txt b/Documentation/networking/irda.txt new file mode 100644 index 0000000..bff26c1 --- /dev/null +++ b/Documentation/networking/irda.txt @@ -0,0 +1,10 @@ +To use the IrDA protocols within Linux you will need to get a suitable copy +of the IrDA Utilities. More detailed information about these and associated +programs can be found on http://irda.sourceforge.net/ + +For more information about how to use the IrDA protocol stack, see the +Linux Infrared HOWTO by Werner Heuser <wehe@tuxmobil.org>: +<http://www.tuxmobil.org/Infrared-HOWTO/Infrared-HOWTO.html> + +There is an active mailing list for discussing Linux-IrDA matters called + irda-users@lists.sourceforge.net diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt new file mode 100644 index 0000000..a0d0ffb --- /dev/null +++ b/Documentation/networking/ixgb.txt @@ -0,0 +1,433 @@ +Linux Base Driver for 10 Gigabit Intel(R) Network Connection +============================================================= + +October 9, 2007 + + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Building and Installation +- Command Line Parameters +- Improving Performance +- Additional Configurations +- Known Issues/Troubleshooting +- Support + + + +In This Release +=============== + +This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) +Network Connection. This driver includes support for Itanium(R)2-based +systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and ifconfig to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. + + +Identifying Your Adapter +======================== + +The following Intel network adapters are compatible with the drivers in this +release: + +Controller Adapter Name Physical Layer +---------- ------------ -------------- +82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber) + Server Adapters 10G Base-SR (850 nm optical fiber) + 10G Base-CX4(twin-axial copper cabling) + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + http://support.intel.com/support/network/sb/CS-012904.htm + + +Building and Installation +========================= + +select m for "Intel(R) PRO/10GbE support" located at: + Location: + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) +1. make modules && make modules_install + +2. Load the module: + +    modprobe ixgb <parameter>=<value> + + The insmod command can be used if the full + path to the driver module is specified. For example: + + insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko + + With 2.6 based kernels also make sure that older ixgb drivers are + removed from the kernel, before loading the new module: + + rmmod ixgb; modprobe ixgb + +3. Assign an IP address to the interface by entering the following, where + x is the interface number: + + ifconfig ethx <IP_address> + +4. Verify that the interface works. Enter the following, where <IP_address> + is the IP address for another machine on the same subnet as the interface + that is being tested: + + ping <IP_address> + + +Command Line Parameters +======================= + +If the driver is built as a module, the following optional parameters are +used by entering them on the command line with the modprobe command using +this syntax: + + modprobe ixgb [<option>=<VAL1>,<VAL2>,...] + +For example, with two 10GbE PCI adapters, entering: + + modprobe ixgb TxDescriptors=80,128 + +loads the ixgb driver with 80 TX resources for the first adapter and 128 TX +resources for the second adapter. + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +FlowControl +Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) +Default: Read from the EEPROM + If EEPROM is not detected, default is 1 + This parameter controls the automatic generation(Tx) and response(Rx) to + Ethernet PAUSE frames. There are hardware bugs associated with enabling + Tx flow control so beware. + +RxDescriptors +Valid Range: 64-512 +Default Value: 512 + This value is the number of receive descriptors allocated by the driver. + Increasing this value allows the driver to buffer more incoming packets. + Each descriptor is 16 bytes. A receive buffer is also allocated for + each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, + depending on the MTU setting. When the MTU size is 1500 or less, the + receive buffer size is 2048 bytes. When the MTU is greater than 1500 the + receive buffer size will be either 4056, 8192, or 16384 bytes. The + maximum MTU size is 16114. + +RxIntDelay +Valid Range: 0-65535 (0=off) +Default Value: 72 + This value delays the generation of receive interrupts in units of + 0.8192 microseconds. Receive interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame reception and can end up + decreasing the throughput of TCP traffic. If the system is reporting + dropped receives, this value may be set too high, causing the driver to + run out of available receive descriptors. + +TxDescriptors +Valid Range: 64-4096 +Default Value: 256 + This value is the number of transmit descriptors allocated by the driver. + Increasing this value allows the driver to queue more transmits. Each + descriptor is 16 bytes. + +XsumRX +Valid Range: 0-1 +Default Value: 1 + A value of '1' indicates that the driver should enable IP checksum + offload for received packets (both UDP and TCP) to the adapter hardware. + + +Improving Performance +===================== + +With the 10 Gigabit server adapters, the default Linux configuration will +very likely limit the total available throughput artificially. There is a set +of configuration changes that, when applied together, will increase the ability +of Linux to transmit and receive data. The following enhancements were +originally acquired from settings published at http://www.spec.org/web99/ for +various submitted results using Linux. + +NOTE: These changes are only suggestions, and serve as a starting point for + tuning your network performance. + +The changes are made in three major ways, listed in order of greatest effect: +- Use ifconfig to modify the mtu (maximum transmission unit) and the txqueuelen + parameter. +- Use sysctl to modify /proc parameters (essentially kernel tuning) +- Use setpci to modify the MMRBC field in PCI-X configuration space to increase + transmit burst lengths on the bus. + +NOTE: setpci modifies the adapter's configuration registers to allow it to read +up to 4k bytes at a time (for transmits). However, for some systems the +behavior after modifying this register may be undefined (possibly errors of +some kind). A power-cycle, hard reset or explicitly setting the e6 register +back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a +stable configuration. + +- COPY these lines and paste them into ixgb_perf.sh: +#!/bin/bash +echo "configuring network performance , edit this file to change the interface +or device ID of 10GbE card" +# set mmrbc to 4k reads, modify only Intel 10GbE device IDs +# replace 1a48 with appropriate 10GbE device's ID installed on the system, +# if needed. +setpci -d 8086:1a48 e6.b=2e +# set the MTU (max transmission unit) - it requires your switch and clients +# to change as well. +# set the txqueuelen +# your ixgb adapter should be loaded as eth1 for this to work, change if needed +ifconfig eth1 mtu 9000 txqueuelen 1000 up +# call the sysctl utility to modify /proc/sys entries +sysctl -p ./sysctl_ixgb.conf +- END ixgb_perf.sh + +- COPY these lines and paste them into sysctl_ixgb.conf: +# some of the defaults may be different for your kernel +# call this file with sysctl -p <this file> +# these are just suggested values that worked well to increase throughput in +# several network benchmark tests, your mileage may vary + +### IPV4 specific settings +# turn TCP timestamp support off, default 1, reduces CPU use +net.ipv4.tcp_timestamps = 0 +# turn SACK support off, default on +# on systems with a VERY fast bus -> memory interface this is the big gainer +net.ipv4.tcp_sack = 0 +# set min/default/max TCP read buffer, default 4096 87380 174760 +net.ipv4.tcp_rmem = 10000000 10000000 10000000 +# set min/pressure/max TCP write buffer, default 4096 16384 131072 +net.ipv4.tcp_wmem = 10000000 10000000 10000000 +# set min/pressure/max TCP buffer space, default 31744 32256 32768 +net.ipv4.tcp_mem = 10000000 10000000 10000000 + +### CORE settings (mostly for socket and UDP effect) +# set maximum receive socket buffer size, default 131071 +net.core.rmem_max = 524287 +# set maximum send socket buffer size, default 131071 +net.core.wmem_max = 524287 +# set default receive socket buffer size, default 65535 +net.core.rmem_default = 524287 +# set default send socket buffer size, default 65535 +net.core.wmem_default = 524287 +# set maximum amount of option memory buffers, default 10240 +net.core.optmem_max = 524287 +# set number of unprocessed input packets before kernel starts dropping them; default 300 +net.core.netdev_max_backlog = 300000 +- END sysctl_ixgb.conf + +Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface +your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's +ID installed on the system. + +NOTE: Unless these scripts are added to the boot process, these changes will + only last only until the next system reboot. + + +Resolving Slow UDP Traffic +-------------------------- +If your server does not seem to be able to receive UDP traffic as fast as it +can receive TCP traffic, it could be because Linux, by default, does not set +the network stack buffers as large as they need to be to support high UDP +transfer rates. One way to alleviate this problem is to allow more memory to +be used by the IP stack to store incoming data. + +For instance, use the commands: + sysctl -w net.core.rmem_max=262143 +and + sysctl -w net.core.rmem_default=262143 +to increase the read buffer memory max and default to 262143 (256k - 1) from +defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables +will increase the amount of memory used by the network stack for receives, and +can be increased significantly more if necessary for your application. + + +Additional Configurations +========================= + + Configuring the Driver on Different Distributions + ------------------------------------------------- + Configuring a network driver to load properly when the system is started is + distribution dependent. Typically, the configuration process involves adding + an alias line to /etc/modprobe.conf as well as editing other system startup + scripts and/or configuration files. Many popular Linux distributions ship + with tools to make these changes for you. To learn the proper way to + configure a network device for your system, refer to your distribution + documentation. If during this process you are asked for the driver or module + name, the name for the Linux Base Driver for the Intel 10GbE Family of + Adapters is ixgb. + + Viewing Link Messages + --------------------- + Link messages will not be displayed to the console if the distribution is + restricting system messages. In order to see network driver link messages on + your console, set dmesg to eight by entering the following: + + dmesg -n 8 + + NOTE: This setting is not saved across reboots. + + + Jumbo Frames + ------------ + The driver supports Jumbo Frames for all adapters. Jumbo Frames support is + enabled by changing the MTU to a value larger than the default of 1500. + The maximum value for the MTU is 16114. Use the ifconfig command to + increase the MTU size. For example: + + ifconfig ethx mtu 9000 up + + The maximum MTU setting for Jumbo Frames is 16114. This value coincides + with the maximum Jumbo Frames size of 16128. + + + Ethtool + ------- + The driver utilizes the ethtool interface for driver configuration and + diagnostics, as well as displaying statistical information. Ethtool + version 1.6 or later is required for this functionality. + + The latest release of ethtool can be found from + http://sourceforge.net/projects/gkernel + + NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support + for a more complete ethtool feature set can be enabled by upgrading + to the latest version. + + + NAPI + ---- + + NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled + or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI + + See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. + + +Known Issues/Troubleshooting +============================ + + NOTE: After installing the driver, if your Intel Network Connection is not + working, verify in the "In This Release" section of the readme that you have + installed the correct driver. + + Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with + Fujitsu XENPAK Module in SmartBits Chassis + --------------------------------------------------------------------- + Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 + Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits + chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni. + The CRC errors may be received either by the Intel(R) PRO/10GbE CX4 + Server adapter or the SmartBits. If this situation occurs using a different + cable assembly may resolve the issue. + + CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl + Switch Port + ------------------------------------------------------------------------ + Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server + adapter is connected to an HP Procurve 3400cl switch port using short cables + (1 m or shorter). If this situation occurs, using a longer cable may resolve + the issue. + + Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that + Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC + errors may be received either by the CX4 Server adapter or at the switch. If + this situation occurs, using a different cable assembly may resolve the issue. + + + Jumbo Frames System Requirement + ------------------------------- + Memory allocation failures have been observed on Linux systems with 64 MB + of RAM or less that are running Jumbo Frames. If you are using Jumbo + Frames, your system may require more than the advertised minimum + requirement of 64 MB of system memory. + + + Performance Degradation with Jumbo Frames + ----------------------------------------- + Degradation in throughput performance may be observed in some Jumbo frames + environments. If this is observed, increasing the application's socket buffer + size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. + See the specific application manual and /usr/src/linux*/Documentation/ + networking/ip-sysctl.txt for more details. + + + Allocating Rx Buffers when Using Jumbo Frames + --------------------------------------------- + Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if + the available memory is heavily fragmented. This issue may be seen with PCI-X + adapters or with packet split disabled. This can be reduced or eliminated + by changing the amount of available memory for receive buffer allocation, by + increasing /proc/sys/vm/min_free_kbytes. + + + Multiple Interfaces on Same Ethernet Broadcast Network + ------------------------------------------------------ + Due to the default ARP behavior on Linux, it is not possible to have + one system on two IP networks in the same Ethernet broadcast domain + (non-partitioned switch) behave as expected. All Ethernet interfaces + will respond to IP traffic for any IP address assigned to the system. + This results in unbalanced receive traffic. + + If you have multiple interfaces in a server, do either of the following: + + - Turn on ARP filtering by entering: + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + + - Install the interfaces in separate broadcast domains - either in + different switches or in a switch partitioned to VLANs. + + + UDP Stress Test Dropped Packet Issue + -------------------------------------- + Under small packets UDP stress test with 10GbE driver, the Linux system + may drop UDP packets due to the fullness of socket buffers. You may want + to change the driver's Flow Control variables to the minimum value for + controlling packet reception. + + + Tx Hangs Possible Under Stress + ------------------------------ + Under stress conditions, if TX hangs occur, turning off TSO + "ethtool -K eth0 tso off" may resolve the problem. + + +Support +======= + +For general information, go to the Intel support website at: + + http://support.intel.com + +or the Intel Wired Networking project hosted by Sourceforge at: + + http://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported +kernel with a supported adapter, email the specific information related +to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt new file mode 100644 index 0000000..2451f55 --- /dev/null +++ b/Documentation/networking/l2tp.txt @@ -0,0 +1,169 @@ +This brief document describes how to use the kernel's PPPoL2TP driver +to provide L2TP functionality. L2TP is a protocol that tunnels one or +more PPP sessions over a UDP tunnel. It is commonly used for VPNs +(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP +network infrastructure. + +Design +====== + +The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by +which PPP frames carried through an L2TP session are passed through +the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all +PPP interaction with the peer. PPP network interfaces are created for +each local PPP endpoint. + +The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP +control and data frames. L2TP control frames carry messages between +L2TP clients/servers and are used to setup / teardown tunnels and +sessions. An L2TP client or server is implemented in userspace and +will use a regular UDP socket per tunnel. L2TP data frames carry PPP +frames, which may be PPP control or PPP data. The kernel's PPP +subsystem arranges for PPP control frames to be delivered to pppd, +while data frames are forwarded as usual. + +Each tunnel and session within a tunnel is assigned a unique tunnel_id +and session_id. These ids are carried in the L2TP header of every +control and data packet. The pppol2tp driver uses them to lookup +internal tunnel and/or session contexts. Zero tunnel / session ids are +treated specially - zero ids are never assigned to tunnels or sessions +in the network. In the driver, the tunnel context keeps a pointer to +the tunnel UDP socket. The session context keeps a pointer to the +PPPoL2TP socket, as well as other data that lets the driver interface +to the kernel PPP subsystem. + +Note that the pppol2tp kernel driver handles only L2TP data frames; +L2TP control frames are simply passed up to userspace in the UDP +tunnel socket. The kernel handles all datapath aspects of the +protocol, including data packet resequencing (if enabled). + +There are a number of requirements on the userspace L2TP daemon in +order to use the pppol2tp driver. + +1. Use a UDP socket per tunnel. + +2. Create a single PPPoL2TP socket per tunnel bound to a special null + session id. This is used only for communicating with the driver but + must remain open while the tunnel is active. Opening this tunnel + management socket causes the driver to mark the tunnel socket as an + L2TP UDP encapsulation socket and flags it for use by the + referenced tunnel id. This hooks up the UDP receive path via + udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed + in this special PPPoX socket. + +3. Create a PPPoL2TP socket per L2TP session. This is typically done + by starting pppd with the pppol2tp plugin and appropriate + arguments. A PPPoL2TP tunnel management socket (Step 2) must be + created before the first PPPoL2TP session socket is created. + +When creating PPPoL2TP sockets, the application provides information +to the driver about the socket in a socket connect() call. Source and +destination tunnel and session ids are provided, as well as the file +descriptor of a UDP socket. See struct pppol2tp_addr in +include/linux/if_ppp.h. Note that zero tunnel / session ids are +treated specially. When creating the per-tunnel PPPoL2TP management +socket in Step 2 above, zero source and destination session ids are +specified, which tells the driver to prepare the supplied UDP file +descriptor for use as an L2TP tunnel socket. + +Userspace may control behavior of the tunnel or session using +setsockopt and ioctl on the PPPoX socket. The following socket +options are supported:- + +DEBUG - bitmask of debug message categories. See below. +SENDSEQ - 0 => don't send packets with sequence numbers + 1 => send packets with sequence numbers +RECVSEQ - 0 => receive packet sequence numbers are optional + 1 => drop receive packets without sequence numbers +LNSMODE - 0 => act as LAC. + 1 => act as LNS. +REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder. + +Only the DEBUG option is supported by the special tunnel management +PPPoX socket. + +In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided +to retrieve tunnel and session statistics from the kernel using the +PPPoX socket of the appropriate tunnel or session. + +Debugging +========= + +The driver supports a flexible debug scheme where kernel trace +messages may be optionally enabled per tunnel and per session. Care is +needed when debugging a live system since the messages are not +rate-limited and a busy system could be swamped. Userspace uses +setsockopt on the PPPoX socket to set a debug mask. + +The following debug mask bits are available: + +PPPOL2TP_MSG_DEBUG verbose debug (if compiled in) +PPPOL2TP_MSG_CONTROL userspace - kernel interface +PPPOL2TP_MSG_SEQ sequence numbers handling +PPPOL2TP_MSG_DATA data packets + +Sample Userspace Code +===================== + +1. Create tunnel management PPPoX socket + + kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); + if (kernel_fd >= 0) { + struct sockaddr_pppol2tp sax; + struct sockaddr_in const *peer_addr; + + peer_addr = l2tp_tunnel_get_peer_addr(tunnel); + memset(&sax, 0, sizeof(sax)); + sax.sa_family = AF_PPPOX; + sax.sa_protocol = PX_PROTO_OL2TP; + sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */ + sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr; + sax.pppol2tp.addr.sin_port = peer_addr->sin_port; + sax.pppol2tp.addr.sin_family = AF_INET; + sax.pppol2tp.s_tunnel = tunnel_id; + sax.pppol2tp.s_session = 0; /* special case: mgmt socket */ + sax.pppol2tp.d_tunnel = 0; + sax.pppol2tp.d_session = 0; /* special case: mgmt socket */ + + if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) { + perror("connect failed"); + result = -errno; + goto err; + } + } + +2. Create session PPPoX data socket + + struct sockaddr_pppol2tp sax; + int fd; + + /* Note, the target socket must be bound already, else it will not be ready */ + sax.sa_family = AF_PPPOX; + sax.sa_protocol = PX_PROTO_OL2TP; + sax.pppol2tp.fd = tunnel_fd; + sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr; + sax.pppol2tp.addr.sin_port = addr->sin_port; + sax.pppol2tp.addr.sin_family = AF_INET; + sax.pppol2tp.s_tunnel = tunnel_id; + sax.pppol2tp.s_session = session_id; + sax.pppol2tp.d_tunnel = peer_tunnel_id; + sax.pppol2tp.d_session = peer_session_id; + + /* session_fd is the fd of the session's PPPoL2TP socket. + * tunnel_fd is the fd of the tunnel UDP socket. + */ + fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); + if (fd < 0 ) { + return -errno; + } + return 0; + +Miscellanous +============ + +The PPPoL2TP driver was developed as part of the OpenL2TP project by +Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server, +designed from the ground up to have the L2TP datapath in the +kernel. The project also implemented the pppol2tp plugin for pppd +which allows pppd to use the kernel driver. Details can be found at +http://openl2tp.sourceforge.net. diff --git a/Documentation/networking/lapb-module.txt b/Documentation/networking/lapb-module.txt new file mode 100644 index 0000000..d4fc8f2 --- /dev/null +++ b/Documentation/networking/lapb-module.txt @@ -0,0 +1,263 @@ + The Linux LAPB Module Interface 1.3 + + Jonathan Naylor 29.12.96 + +Changed (Henner Eisen, 2000-10-29): int return value for data_indication() + +The LAPB module will be a separately compiled module for use by any parts of +the Linux operating system that require a LAPB service. This document +defines the interfaces to, and the services provided by this module. The +term module in this context does not imply that the LAPB module is a +separately loadable module, although it may be. The term module is used in +its more standard meaning. + +The interface to the LAPB module consists of functions to the module, +callbacks from the module to indicate important state changes, and +structures for getting and setting information about the module. + +Structures +---------- + +Probably the most important structure is the skbuff structure for holding +received and transmitted data, however it is beyond the scope of this +document. + +The two LAPB specific structures are the LAPB initialisation structure and +the LAPB parameter structure. These will be defined in a standard header +file, <linux/lapb.h>. The header file <net/lapb.h> is internal to the LAPB +module and is not for use. + +LAPB Initialisation Structure +----------------------------- + +This structure is used only once, in the call to lapb_register (see below). +It contains information about the device driver that requires the services +of the LAPB module. + +struct lapb_register_struct { + void (*connect_confirmation)(int token, int reason); + void (*connect_indication)(int token, int reason); + void (*disconnect_confirmation)(int token, int reason); + void (*disconnect_indication)(int token, int reason); + int (*data_indication)(int token, struct sk_buff *skb); + void (*data_transmit)(int token, struct sk_buff *skb); +}; + +Each member of this structure corresponds to a function in the device driver +that is called when a particular event in the LAPB module occurs. These will +be described in detail below. If a callback is not required (!!) then a NULL +may be substituted. + + +LAPB Parameter Structure +------------------------ + +This structure is used with the lapb_getparms and lapb_setparms functions +(see below). They are used to allow the device driver to get and set the +operational parameters of the LAPB implementation for a given connection. + +struct lapb_parms_struct { + unsigned int t1; + unsigned int t1timer; + unsigned int t2; + unsigned int t2timer; + unsigned int n2; + unsigned int n2count; + unsigned int window; + unsigned int state; + unsigned int mode; +}; + +T1 and T2 are protocol timing parameters and are given in units of 100ms. N2 +is the maximum number of tries on the link before it is declared a failure. +The window size is the maximum number of outstanding data packets allowed to +be unacknowledged by the remote end, the value of the window is between 1 +and 7 for a standard LAPB link, and between 1 and 127 for an extended LAPB +link. + +The mode variable is a bit field used for setting (at present) three values. +The bit fields have the following meanings: + +Bit Meaning +0 LAPB operation (0=LAPB_STANDARD 1=LAPB_EXTENDED). +1 [SM]LP operation (0=LAPB_SLP 1=LAPB=MLP). +2 DTE/DCE operation (0=LAPB_DTE 1=LAPB_DCE) +3-31 Reserved, must be 0. + +Extended LAPB operation indicates the use of extended sequence numbers and +consequently larger window sizes, the default is standard LAPB operation. +MLP operation is the same as SLP operation except that the addresses used by +LAPB are different to indicate the mode of operation, the default is Single +Link Procedure. The difference between DCE and DTE operation is (i) the +addresses used for commands and responses, and (ii) when the DCE is not +connected, it sends DM without polls set, every T1. The upper case constant +names will be defined in the public LAPB header file. + + +Functions +--------- + +The LAPB module provides a number of function entry points. + + +int lapb_register(void *token, struct lapb_register_struct); + +This must be called before the LAPB module may be used. If the call is +successful then LAPB_OK is returned. The token must be a unique identifier +generated by the device driver to allow for the unique identification of the +instance of the LAPB link. It is returned by the LAPB module in all of the +callbacks, and is used by the device driver in all calls to the LAPB module. +For multiple LAPB links in a single device driver, multiple calls to +lapb_register must be made. The format of the lapb_register_struct is given +above. The return values are: + +LAPB_OK LAPB registered successfully. +LAPB_BADTOKEN Token is already registered. +LAPB_NOMEM Out of memory + + +int lapb_unregister(void *token); + +This releases all the resources associated with a LAPB link. Any current +LAPB link will be abandoned without further messages being passed. After +this call, the value of token is no longer valid for any calls to the LAPB +function. The valid return values are: + +LAPB_OK LAPB unregistered successfully. +LAPB_BADTOKEN Invalid/unknown LAPB token. + + +int lapb_getparms(void *token, struct lapb_parms_struct *parms); + +This allows the device driver to get the values of the current LAPB +variables, the lapb_parms_struct is described above. The valid return values +are: + +LAPB_OK LAPB getparms was successful. +LAPB_BADTOKEN Invalid/unknown LAPB token. + + +int lapb_setparms(void *token, struct lapb_parms_struct *parms); + +This allows the device driver to set the values of the current LAPB +variables, the lapb_parms_struct is described above. The values of t1timer, +t2timer and n2count are ignored, likewise changing the mode bits when +connected will be ignored. An error implies that none of the values have +been changed. The valid return values are: + +LAPB_OK LAPB getparms was successful. +LAPB_BADTOKEN Invalid/unknown LAPB token. +LAPB_INVALUE One of the values was out of its allowable range. + + +int lapb_connect_request(void *token); + +Initiate a connect using the current parameter settings. The valid return +values are: + +LAPB_OK LAPB is starting to connect. +LAPB_BADTOKEN Invalid/unknown LAPB token. +LAPB_CONNECTED LAPB module is already connected. + + +int lapb_disconnect_request(void *token); + +Initiate a disconnect. The valid return values are: + +LAPB_OK LAPB is starting to disconnect. +LAPB_BADTOKEN Invalid/unknown LAPB token. +LAPB_NOTCONNECTED LAPB module is not connected. + + +int lapb_data_request(void *token, struct sk_buff *skb); + +Queue data with the LAPB module for transmitting over the link. If the call +is successful then the skbuff is owned by the LAPB module and may not be +used by the device driver again. The valid return values are: + +LAPB_OK LAPB has accepted the data. +LAPB_BADTOKEN Invalid/unknown LAPB token. +LAPB_NOTCONNECTED LAPB module is not connected. + + +int lapb_data_received(void *token, struct sk_buff *skb); + +Queue data with the LAPB module which has been received from the device. It +is expected that the data passed to the LAPB module has skb->data pointing +to the beginning of the LAPB data. If the call is successful then the skbuff +is owned by the LAPB module and may not be used by the device driver again. +The valid return values are: + +LAPB_OK LAPB has accepted the data. +LAPB_BADTOKEN Invalid/unknown LAPB token. + + +Callbacks +--------- + +These callbacks are functions provided by the device driver for the LAPB +module to call when an event occurs. They are registered with the LAPB +module with lapb_register (see above) in the structure lapb_register_struct +(see above). + + +void (*connect_confirmation)(void *token, int reason); + +This is called by the LAPB module when a connection is established after +being requested by a call to lapb_connect_request (see above). The reason is +always LAPB_OK. + + +void (*connect_indication)(void *token, int reason); + +This is called by the LAPB module when the link is established by the remote +system. The value of reason is always LAPB_OK. + + +void (*disconnect_confirmation)(void *token, int reason); + +This is called by the LAPB module when an event occurs after the device +driver has called lapb_disconnect_request (see above). The reason indicates +what has happened. In all cases the LAPB link can be regarded as being +terminated. The values for reason are: + +LAPB_OK The LAPB link was terminated normally. +LAPB_NOTCONNECTED The remote system was not connected. +LAPB_TIMEDOUT No response was received in N2 tries from the remote + system. + + +void (*disconnect_indication)(void *token, int reason); + +This is called by the LAPB module when the link is terminated by the remote +system or another event has occurred to terminate the link. This may be +returned in response to a lapb_connect_request (see above) if the remote +system refused the request. The values for reason are: + +LAPB_OK The LAPB link was terminated normally by the remote + system. +LAPB_REFUSED The remote system refused the connect request. +LAPB_NOTCONNECTED The remote system was not connected. +LAPB_TIMEDOUT No response was received in N2 tries from the remote + system. + + +int (*data_indication)(void *token, struct sk_buff *skb); + +This is called by the LAPB module when data has been received from the +remote system that should be passed onto the next layer in the protocol +stack. The skbuff becomes the property of the device driver and the LAPB +module will not perform any more actions on it. The skb->data pointer will +be pointing to the first byte of data after the LAPB header. + +This method should return NET_RX_DROP (as defined in the header +file include/linux/netdevice.h) if and only if the frame was dropped +before it could be delivered to the upper layer. + + +void (*data_transmit)(void *token, struct sk_buff *skb); + +This is called by the LAPB module when data is to be transmitted to the +remote system by the device driver. The skbuff becomes the property of the +device driver and the LAPB module will not perform any more actions on it. +The skb->data pointer will be pointing to the first byte of the LAPB header. diff --git a/Documentation/networking/ltpc.txt b/Documentation/networking/ltpc.txt new file mode 100644 index 0000000..fe2a912 --- /dev/null +++ b/Documentation/networking/ltpc.txt @@ -0,0 +1,131 @@ +This is the ALPHA version of the ltpc driver. + +In order to use it, you will need at least version 1.3.3 of the +netatalk package, and the Apple or Farallon LocalTalk PC card. +There are a number of different LocalTalk cards for the PC; this +driver applies only to the one with the 65c02 processor chip on it. + +To include it in the kernel, select the CONFIG_LTPC switch in the +configuration dialog. You can also compile it as a module. + +While the driver will attempt to autoprobe the I/O port address, IRQ +line, and DMA channel of the card, this does not always work. For +this reason, you should be prepared to supply these parameters +yourself. (see "Card Configuration" below for how to determine or +change the settings on your card) + +When the driver is compiled into the kernel, you can add a line such +as the following to your /etc/lilo.conf: + + append="ltpc=0x240,9,1" + +where the parameters (in order) are the port address, IRQ, and DMA +channel. The second and third values can be omitted, in which case +the driver will try to determine them itself. + +If you load the driver as a module, you can pass the parameters "io=", +"irq=", and "dma=" on the command line with insmod or modprobe, or add +them as options in /etc/modprobe.conf: + + alias lt0 ltpc # autoload the module when the interface is configured + options ltpc io=0x240 irq=9 dma=1 + +Before starting up the netatalk demons (perhaps in rc.local), you +need to add a line such as: + + /sbin/ifconfig lt0 127.0.0.42 + +The address is unimportant - however, the card needs to be configured +with ifconfig so that Netatalk can find it. + +The appropriate netatalk configuration depends on whether you are +attached to a network that includes AppleTalk routers or not. If, +like me, you are simply connecting to your home Macintoshes and +printers, you need to set up netatalk to "seed". The way I do this +is to have the lines + + dummy -seed -phase 2 -net 2000 -addr 2000.26 -zone "1033" + lt0 -seed -phase 1 -net 1033 -addr 1033.27 -zone "1033" + +in my atalkd.conf. What is going on here is that I need to fool +netatalk into thinking that there are two AppleTalk interfaces +present; otherwise, it refuses to seed. This is a hack, and a more +permanent solution would be to alter the netatalk code. Also, make +sure you have the correct name for the dummy interface - If it's +compiled as a module, you will need to refer to it as "dummy0" or some +such. + +If you are attached to an extended AppleTalk network, with routers on +it, then you don't need to fool around with this -- the appropriate +line in atalkd.conf is + + lt0 -phase 1 + +-------------------------------------- + +Card Configuration: + +The interrupts and so forth are configured via the dipswitch on the +board. Set the switches so as not to conflict with other hardware. + + Interrupts -- set at most one. If none are set, the driver uses + polled mode. Because the card was developed in the XT era, the + original documentation refers to IRQ2. Since you'll be running + this on an AT (or later) class machine, that really means IRQ9. + + SW1 IRQ 4 + SW2 IRQ 3 + SW3 IRQ 9 (2 in original card documentation only applies to XT) + + + DMA -- choose DMA 1 or 3, and set both corresponding switches. + + SW4 DMA 3 + SW5 DMA 1 + SW6 DMA 3 + SW7 DMA 1 + + + I/O address -- choose one. + + SW8 220 / 240 + +-------------------------------------- + +IP: + +Yes, it is possible to do IP over LocalTalk. However, you can't just +treat the LocalTalk device like an ordinary Ethernet device, even if +that's what it looks like to Netatalk. + +Instead, you follow the same procedure as for doing IP in EtherTalk. +See Documentation/networking/ipddp.txt for more information about the +kernel driver and userspace tools needed. + +-------------------------------------- + +BUGS: + +IRQ autoprobing often doesn't work on a cold boot. To get around +this, either compile the driver as a module, or pass the parameters +for the card to the kernel as described above. + +Also, as usual, autoprobing is not recommended when you use the driver +as a module. (though it usually works at boot time, at least) + +Polled mode is *really* slow sometimes, but this seems to depend on +the configuration of the network. + +It may theoretically be possible to use two LTPC cards in the same +machine, but this is unsupported, so if you really want to do this, +you'll probably have to hack the initialization code a bit. + +______________________________________ + +THANKS: + Thanks to Alan Cox for helpful discussions early on in this +work, and to Denis Hainsworth for doing the bleeding-edge testing. + +-- Bradford Johnson <bradford@math.umn.edu> + +-- Updated 11/09/1998 by David Huggins-Daines <dhd@debian.org> diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt new file mode 100644 index 0000000..84906ef --- /dev/null +++ b/Documentation/networking/mac80211-injection.txt @@ -0,0 +1,79 @@ +How to use packet injection with mac80211 +========================================= + +mac80211 now allows arbitrary packets to be injected down any Monitor Mode +interface from userland. The packet you inject needs to be composed in the +following format: + + [ radiotap header ] + [ ieee80211 header ] + [ payload ] + +The radiotap format is discussed in +./Documentation/networking/radiotap-headers.txt. + +Despite 13 radiotap argument types are currently defined, most only make sense +to appear on received packets. The following information is parsed from the +radiotap headers and used to control injection: + + * IEEE80211_RADIOTAP_RATE + + rate in 500kbps units, automatic if invalid or not present + + + * IEEE80211_RADIOTAP_ANTENNA + + antenna to use, automatic if not present + + + * IEEE80211_RADIOTAP_DBM_TX_POWER + + transmit power in dBm, automatic if not present + + + * IEEE80211_RADIOTAP_FLAGS + + IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated + IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available + IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the + current fragmentation threshold. Note that + this flag is only reliable when software + fragmentation is enabled) + +The injection code can also skip all other currently defined radiotap fields +facilitating replay of captured radiotap headers directly. + +Here is an example valid radiotap header defining these three parameters + + 0x00, 0x00, // <-- radiotap version + 0x0b, 0x00, // <- radiotap header length + 0x04, 0x0c, 0x00, 0x00, // <-- bitmap + 0x6c, // <-- rate + 0x0c, //<-- tx power + 0x01 //<-- antenna + +The ieee80211 header follows immediately afterwards, looking for example like +this: + + 0x08, 0x01, 0x00, 0x00, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0x13, 0x22, 0x33, 0x44, 0x55, 0x66, + 0x13, 0x22, 0x33, 0x44, 0x55, 0x66, + 0x10, 0x86 + +Then lastly there is the payload. + +After composing the packet contents, it is sent by send()-ing it to a logical +mac80211 interface that is in Monitor mode. Libpcap can also be used, +(which is easier than doing the work to bind the socket to the right +interface), along the following lines: + + ppcap = pcap_open_live(szInterfaceName, 800, 1, 20, szErrbuf); +... + r = pcap_inject(ppcap, u8aSendBuffer, nLength); + +You can also find sources for a complete inject test applet here: + +http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer + +Andy Green <andy@warmcat.com> diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/README new file mode 100644 index 0000000..2ff8ccb --- /dev/null +++ b/Documentation/networking/mac80211_hwsim/README @@ -0,0 +1,67 @@ +mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211 +Copyright (c) 2008, Jouni Malinen <j@w1.fi> + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License version 2 as +published by the Free Software Foundation. + + +Introduction + +mac80211_hwsim is a Linux kernel module that can be used to simulate +arbitrary number of IEEE 802.11 radios for mac80211. It can be used to +test most of the mac80211 functionality and user space tools (e.g., +hostapd and wpa_supplicant) in a way that matches very closely with +the normal case of using real WLAN hardware. From the mac80211 view +point, mac80211_hwsim is yet another hardware driver, i.e., no changes +to mac80211 are needed to use this testing tool. + +The main goal for mac80211_hwsim is to make it easier for developers +to test their code and work with new features to mac80211, hostapd, +and wpa_supplicant. The simulated radios do not have the limitations +of real hardware, so it is easy to generate an arbitrary test setup +and always reproduce the same setup for future tests. In addition, +since all radio operation is simulated, any channel can be used in +tests regardless of regulatory rules. + +mac80211_hwsim kernel module has a parameter 'radios' that can be used +to select how many radios are simulated (default 2). This allows +configuration of both very simply setups (e.g., just a single access +point and a station) or large scale tests (multiple access points with +hundreds of stations). + +mac80211_hwsim works by tracking the current channel of each virtual +radio and copying all transmitted frames to all other radios that are +currently enabled and on the same channel as the transmitting +radio. Software encryption in mac80211 is used so that the frames are +actually encrypted over the virtual air interface to allow more +complete testing of encryption. + +A global monitoring netdev, hwsim#, is created independent of +mac80211. This interface can be used to monitor all transmitted frames +regardless of channel. + + +Simple example + +This example shows how to use mac80211_hwsim to simulate two radios: +one to act as an access point and the other as a station that +associates with the AP. hostapd and wpa_supplicant are used to take +care of WPA2-PSK authentication. In addition, hostapd is also +processing access point side of association. + +Please note that the current Linux kernel does not enable AP mode, so a +simple patch is needed to enable AP mode selection: +http://johannes.sipsolutions.net/patches/kernel/all/LATEST/006-allow-ap-vlan-modes.patch + + +# Build mac80211_hwsim as part of kernel configuration + +# Load the module +modprobe mac80211_hwsim + +# Run hostapd (AP) for wlan0 +hostapd hostapd.conf + +# Run wpa_supplicant (station) for wlan1 +wpa_supplicant -Dwext -iwlan1 -c wpa_supplicant.conf diff --git a/Documentation/networking/mac80211_hwsim/hostapd.conf b/Documentation/networking/mac80211_hwsim/hostapd.conf new file mode 100644 index 0000000..08cde7e --- /dev/null +++ b/Documentation/networking/mac80211_hwsim/hostapd.conf @@ -0,0 +1,11 @@ +interface=wlan0 +driver=nl80211 + +hw_mode=g +channel=1 +ssid=mac80211 test + +wpa=2 +wpa_key_mgmt=WPA-PSK +wpa_pairwise=CCMP +wpa_passphrase=12345678 diff --git a/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf b/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf new file mode 100644 index 0000000..299128c --- /dev/null +++ b/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf @@ -0,0 +1,10 @@ +ctrl_interface=/var/run/wpa_supplicant + +network={ + ssid="mac80211 test" + psk="12345678" + key_mgmt=WPA-PSK + proto=WPA2 + pairwise=CCMP + group=CCMP +} diff --git a/Documentation/networking/multicast.txt b/Documentation/networking/multicast.txt new file mode 100644 index 0000000..b06c8c6 --- /dev/null +++ b/Documentation/networking/multicast.txt @@ -0,0 +1,63 @@ +Behaviour of Cards Under Multicast +================================== + +This is how they currently behave, not what the hardware can do--for example, +the Lance driver doesn't use its filter, even though the code for loading +it is in the DEC Lance-based driver. + +The following are requirements for multicasting +----------------------------------------------- +AppleTalk Multicast hardware filtering not important but + avoid cards only doing promisc +IP-Multicast Multicast hardware filters really help +IP-MRoute AllMulti hardware filters are of no help + + +Board Multicast AllMulti Promisc Filter +------------------------------------------------------------------------ +3c501 YES YES YES Software +3c503 YES YES YES Hardware +3c505 YES NO YES Hardware +3c507 NO NO NO N/A +3c509 YES YES YES Software +3c59x YES YES YES Software +ac3200 YES YES YES Hardware +apricot YES PROMISC YES Hardware +arcnet NO NO NO N/A +at1700 PROMISC PROMISC YES Software +atp PROMISC PROMISC YES Software +cs89x0 YES YES YES Software +de4x5 YES YES YES Hardware +de600 NO NO NO N/A +de620 PROMISC PROMISC YES Software +depca YES PROMISC YES Hardware +dmfe YES YES YES Software(*) +e2100 YES YES YES Hardware +eepro YES PROMISC YES Hardware +eexpress NO NO NO N/A +ewrk3 YES PROMISC YES Hardware +hp-plus YES YES YES Hardware +hp YES YES YES Hardware +hp100 YES YES YES Hardware +ibmtr NO NO NO N/A +ioc3-eth YES YES YES Hardware +lance YES YES YES Software(#) +ne YES YES YES Hardware +ni52 <------------------ Buggy ------------------> +ni65 YES YES YES Software(#) +seeq NO NO NO N/A +sgiseek <------------------ Buggy ------------------> +smc-ultra YES YES YES Hardware +sunlance YES YES YES Hardware +tulip YES YES YES Hardware +wavelan YES PROMISC YES Hardware +wd YES YES YES Hardware +xirc2ps_cs YES YES YES Hardware +znet YES YES YES Software + + +PROMISC = This multicast mode is in fact promiscuous mode. Avoid using +cards who go PROMISC on any multicast in a multicast kernel. + +(#) = Hardware multicast support is not used yet. +(*) = Hardware support for Davicom 9132 chipset only. diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 0000000..4caa0e3 --- /dev/null +++ b/Documentation/networking/multiqueue.txt @@ -0,0 +1,79 @@ + + HOWTO for multiqueue network device support + =========================================== + +Section 1: Base driver requirements for implementing multiqueue support + +Intro: Kernel support for multiqueue devices +--------------------------------------------------------- + +Kernel support for multiqueue devices is always present. + +Section 1: Base driver requirements for implementing multiqueue support +----------------------------------------------------------------------- + +Base drivers are required to use the new alloc_etherdev_mq() or +alloc_netdev_mq() functions to allocate the subqueues for the device. The +underlying kernel API will take care of the allocation and deallocation of +the subqueue memory, as well as netdev configuration of where the queues +exist in memory. + +The base driver will also need to manage the queues as it does the global +netdev->queue_lock today. Therefore base drivers should use the +netif_{start|stop|wake}_subqueue() functions to manage each queue while the +device is still operational. netdev->queue_lock is still used when the device +comes online or when it's completely shut down (unregister_netdev(), etc.). + + +Section 2: Qdisc support for multiqueue devices + +----------------------------------------------- + +Currently two qdiscs are optimized for multiqueue devices. The first is the +default pfifo_fast qdisc. This qdisc supports one qdisc per hardware queue. +A new round-robin qdisc, sch_multiq also supports multiple hardware queues. The +qdisc is responsible for classifying the skb's and then directing the skb's to +bands and queues based on the value in skb->queue_mapping. Use this field in +the base driver to determine which queue to send the skb to. + +sch_multiq has been added for hardware that wishes to avoid head-of-line +blocking. It will cycle though the bands and verify that the hardware queue +associated with the band is not stopped prior to dequeuing a packet. + +On qdisc load, the number of bands is based on the number of queues on the +hardware. Once the association is made, any skb with skb->queue_mapping set, +will be queued to the band associated with the hardware queue. + + +Section 3: Brief howto using MULTIQ for multiqueue devices +--------------------------------------------------------------- + +The userspace command 'tc,' part of the iproute2 package, is used to configure +qdiscs. To add the MULTIQ qdisc to your network device, assuming the device +is called eth0, run the following command: + +# tc qdisc add dev eth0 root handle 1: multiq + +The qdisc will allocate the number of bands to equal the number of queues that +the device reports, and bring the qdisc online. Assuming eth0 has 4 Tx +queues, the band mapping would look like: + +band 0 => queue 0 +band 1 => queue 1 +band 2 => queue 2 +band 3 => queue 3 + +Traffic will begin flowing through each queue based on either the simple_tx_hash +function or based on netdev->select_queue() if you have it defined. + +The behavior of tc filters remains the same. However a new tc action, +skbedit, has been added. Assuming you wanted to route all traffic to a +specific host, for example 192.168.0.3, through a specific queue you could use +this action and establish a filter such as: + +tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \ + match ip dst 192.168.0.3 \ + action skbedit queue_mapping 3 + +Author: Alexander Duyck <alexander.h.duyck@intel.com> +Original Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt new file mode 100644 index 0000000..3c2f2b3 --- /dev/null +++ b/Documentation/networking/netconsole.txt @@ -0,0 +1,156 @@ + +started by Ingo Molnar <mingo@redhat.com>, 2001.09.17 +2.6 port and netpoll api by Matt Mackall <mpm@selenic.com>, Sep 9 2003 + +Please send bug reports to Matt Mackall <mpm@selenic.com> +and Satyam Sharma <satyam.sharma@gmail.com> + +Introduction: +============= + +This module logs kernel printk messages over UDP allowing debugging of +problem where disk logging fails and serial consoles are impractical. + +It can be used either built-in or as a module. As a built-in, +netconsole initializes immediately after NIC cards and will bring up +the specified interface as soon as possible. While this doesn't allow +capture of early kernel panics, it does capture most of the boot +process. + +Sender and receiver configuration: +================================== + +It takes a string configuration parameter "netconsole" in the +following format: + + netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr] + + where + src-port source for UDP packets (defaults to 6665) + src-ip source IP to use (interface address) + dev network interface (eth0) + tgt-port port for logging agent (6666) + tgt-ip IP address for logging agent + tgt-macaddr ethernet MAC address for logging agent (broadcast) + +Examples: + + linux netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc + + or + + insmod netconsole netconsole=@/,@10.0.0.2/ + +It also supports logging to multiple remote agents by specifying +parameters for the multiple agents separated by semicolons and the +complete string enclosed in "quotes", thusly: + + modprobe netconsole netconsole="@/,@10.0.0.2/;@/eth1,6892@10.0.0.3/" + +Built-in netconsole starts immediately after the TCP stack is +initialized and attempts to bring up the supplied dev at the supplied +address. + +The remote host can run either 'netcat -u -l -p <port>' or syslogd. + +Dynamic reconfiguration: +======================== + +Dynamic reconfigurability is a useful addition to netconsole that enables +remote logging targets to be dynamically added, removed, or have their +parameters reconfigured at runtime from a configfs-based userspace interface. +[ Note that the parameters of netconsole targets that were specified/created +from the boot/module option are not exposed via this interface, and hence +cannot be modified dynamically. ] + +To include this feature, select CONFIG_NETCONSOLE_DYNAMIC when building the +netconsole module (or kernel, if netconsole is built-in). + +Some examples follow (where configfs is mounted at the /sys/kernel/config +mountpoint). + +To add a remote logging target (target names can be arbitrary): + + cd /sys/kernel/config/netconsole/ + mkdir target1 + +Note that newly created targets have default parameter values (as mentioned +above) and are disabled by default -- they must first be enabled by writing +"1" to the "enabled" attribute (usually after setting parameters accordingly) +as described below. + +To remove a target: + + rmdir /sys/kernel/config/netconsole/othertarget/ + +The interface exposes these parameters of a netconsole target to userspace: + + enabled Is this target currently enabled? (read-write) + dev_name Local network interface name (read-write) + local_port Source UDP port to use (read-write) + remote_port Remote agent's UDP port (read-write) + local_ip Source IP address to use (read-write) + remote_ip Remote agent's IP address (read-write) + local_mac Local interface's MAC address (read-only) + remote_mac Remote agent's MAC address (read-write) + +The "enabled" attribute is also used to control whether the parameters of +a target can be updated or not -- you can modify the parameters of only +disabled targets (i.e. if "enabled" is 0). + +To update a target's parameters: + + cat enabled # check if enabled is 1 + echo 0 > enabled # disable the target (if required) + echo eth2 > dev_name # set local interface + echo 10.0.0.4 > remote_ip # update some parameter + echo cb:a9:87:65:43:21 > remote_mac # update more parameters + echo 1 > enabled # enable target again + +You can also update the local interface dynamically. This is especially +useful if you want to use interfaces that have newly come up (and may not +have existed when netconsole was loaded / initialized). + +Miscellaneous notes: +==================== + +WARNING: the default target ethernet setting uses the broadcast +ethernet address to send packets, which can cause increased load on +other systems on the same ethernet segment. + +TIP: some LAN switches may be configured to suppress ethernet broadcasts +so it is advised to explicitly specify the remote agents' MAC addresses +from the config parameters passed to netconsole. + +TIP: to find out the MAC address of, say, 10.0.0.2, you may try using: + + ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2 + +TIP: in case the remote logging agent is on a separate LAN subnet than +the sender, it is suggested to try specifying the MAC address of the +default gateway (you may use /sbin/route -n to find it out) as the +remote MAC address instead. + +NOTE: the network device (eth1 in the above case) can run any kind +of other network traffic, netconsole is not intrusive. Netconsole +might cause slight delays in other traffic if the volume of kernel +messages is high, but should have no other impact. + +NOTE: if you find that the remote logging agent is not receiving or +printing all messages from the sender, it is likely that you have set +the "console_loglevel" parameter (on the sender) to only send high +priority messages to the console. You can change this at runtime using: + + dmesg -n 8 + +or by specifying "debug" on the kernel command line at boot, to send +all kernel messages to the console. A specific value for this parameter +can also be set using the "loglevel" kernel boot option. See the +dmesg(8) man page and Documentation/kernel-parameters.txt for details. + +Netconsole was designed to be as instantaneous as possible, to +enable the logging of even the most critical kernel bugs. It works +from IRQ contexts as well, and does not enable interrupts while +sending packets. Due to these unique needs, configuration cannot +be more automatic, and some fundamental limitations will remain: +only IP networks, UDP packets and ethernet devices are supported. diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt new file mode 100644 index 0000000..d0f71fc --- /dev/null +++ b/Documentation/networking/netdevices.txt @@ -0,0 +1,108 @@ + +Network Devices, the Kernel, and You! + + +Introduction +============ +The following is a random collection of documentation regarding +network devices. + +struct net_device allocation rules +================================== +Network device structures need to persist even after module is unloaded and +must be allocated with kmalloc. If device has registered successfully, +it will be freed on last use by free_netdev. This is required to handle the +pathologic case cleanly (example: rmmod mydriver </sys/class/net/myeth/mtu ) + +There are routines in net_init.c to handle the common cases of +alloc_etherdev, alloc_netdev. These reserve extra space for driver +private data which gets freed when the network device is freed. If +separately allocated data is attached to the network device +(dev->priv) then it is up to the module exit handler to free that. + +MTU +=== +Each network device has a Maximum Transfer Unit. The MTU does not +include any link layer protocol overhead. Upper layer protocols must +not pass a socket buffer (skb) to a device to transmit with more data +than the mtu. The MTU does not include link layer header overhead, so +for example on Ethernet if the standard MTU is 1500 bytes used, the +actual skb will contain up to 1514 bytes because of the Ethernet +header. Devices should allow for the 4 byte VLAN header as well. + +Segmentation Offload (GSO, TSO) is an exception to this rule. The +upper layer protocol may pass a large socket buffer to the device +transmit routine, and the device will break that up into separate +packets based on the current MTU. + +MTU is symmetrical and applies both to receive and transmit. A device +must be able to receive at least the maximum size packet allowed by +the MTU. A network device may use the MTU as mechanism to size receive +buffers, but the device should allow packets with VLAN header. With +standard Ethernet mtu of 1500 bytes, the device should allow up to +1518 byte packets (1500 + 14 header + 4 tag). The device may either: +drop, truncate, or pass up oversize packets, but dropping oversize +packets is preferred. + + +struct net_device synchronization rules +======================================= +dev->open: + Synchronization: rtnl_lock() semaphore. + Context: process + +dev->stop: + Synchronization: rtnl_lock() semaphore. + Context: process + Note1: netif_running() is guaranteed false + Note2: dev->poll() is guaranteed to be stopped + +dev->do_ioctl: + Synchronization: rtnl_lock() semaphore. + Context: process + +dev->get_stats: + Synchronization: dev_base_lock rwlock. + Context: nominally process, but don't sleep inside an rwlock + +dev->hard_start_xmit: + Synchronization: netif_tx_lock spinlock. + + When the driver sets NETIF_F_LLTX in dev->features this will be + called without holding netif_tx_lock. In this case the driver + has to lock by itself when needed. It is recommended to use a try lock + for this and return NETDEV_TX_LOCKED when the spin lock fails. + The locking there should also properly protect against + set_multicast_list. Note that the use of NETIF_F_LLTX is deprecated. + Dont use it for new drivers. + + Context: Process with BHs disabled or BH (timer), + will be called with interrupts disabled by netconsole. + + Return codes: + o NETDEV_TX_OK everything ok. + o NETDEV_TX_BUSY Cannot transmit packet, try later + Usually a bug, means queue start/stop flow control is broken in + the driver. Note: the driver must NOT put the skb in its DMA ring. + o NETDEV_TX_LOCKED Locking failed, please retry quickly. + Only valid when NETIF_F_LLTX is set. + +dev->tx_timeout: + Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + Notes: netif_queue_stopped() is guaranteed true + +dev->set_multicast_list: + Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + +struct napi_struct synchronization rules +======================================== +napi->poll: + Synchronization: NAPI_STATE_SCHED bit in napi->state. Device + driver's dev->close method will invoke napi_disable() on + all NAPI instances which will do a sleeping poll on the + NAPI_STATE_SCHED napi->state bit, waiting for all pending + NAPI activity to cease. + Context: softirq + will be called with interrupts disabled by netconsole. diff --git a/Documentation/networking/netif-msg.txt b/Documentation/networking/netif-msg.txt new file mode 100644 index 0000000..c967ddb --- /dev/null +++ b/Documentation/networking/netif-msg.txt @@ -0,0 +1,79 @@ + +________________ +NETIF Msg Level + +The design of the network interface message level setting. + +History + + The design of the debugging message interface was guided and + constrained by backwards compatibility previous practice. It is useful + to understand the history and evolution in order to understand current + practice and relate it to older driver source code. + + From the beginning of Linux, each network device driver has had a local + integer variable that controls the debug message level. The message + level ranged from 0 to 7, and monotonically increased in verbosity. + + The message level was not precisely defined past level 3, but were + always implemented within +-1 of the specified level. Drivers tended + to shed the more verbose level messages as they matured. + 0 Minimal messages, only essential information on fatal errors. + 1 Standard messages, initialization status. No run-time messages + 2 Special media selection messages, generally timer-driver. + 3 Interface starts and stops, including normal status messages + 4 Tx and Rx frame error messages, and abnormal driver operation + 5 Tx packet queue information, interrupt events. + 6 Status on each completed Tx packet and received Rx packets + 7 Initial contents of Tx and Rx packets + + Initially this message level variable was uniquely named in each driver + e.g. "lance_debug", so that a kernel symbolic debugger could locate and + modify the setting. When kernel modules became common, the variables + were consistently renamed to "debug" and allowed to be set as a module + parameter. + + This approach worked well. However there is always a demand for + additional features. Over the years the following emerged as + reasonable and easily implemented enhancements + Using an ioctl() call to modify the level. + Per-interface rather than per-driver message level setting. + More selective control over the type of messages emitted. + + The netif_msg recommendation adds these features with only a minor + complexity and code size increase. + + The recommendation is the following points + Retaining the per-driver integer variable "debug" as a module + parameter with a default level of '1'. + + Adding a per-interface private variable named "msg_enable". The + variable is a bit map rather than a level, and is initialized as + 1 << debug + Or more precisely + debug < 0 ? 0 : 1 << min(sizeof(int)-1, debug) + + Messages should changes from + if (debug > 1) + printk(MSG_DEBUG "%s: ... + to + if (np->msg_enable & NETIF_MSG_LINK) + printk(MSG_DEBUG "%s: ... + + +The set of message levels is named + Old level Name Bit position + 0 NETIF_MSG_DRV 0x0001 + 1 NETIF_MSG_PROBE 0x0002 + 2 NETIF_MSG_LINK 0x0004 + 2 NETIF_MSG_TIMER 0x0004 + 3 NETIF_MSG_IFDOWN 0x0008 + 3 NETIF_MSG_IFUP 0x0008 + 4 NETIF_MSG_RX_ERR 0x0010 + 4 NETIF_MSG_TX_ERR 0x0010 + 5 NETIF_MSG_TX_QUEUED 0x0020 + 5 NETIF_MSG_INTR 0x0020 + 6 NETIF_MSG_TX_DONE 0x0040 + 6 NETIF_MSG_RX_STATUS 0x0040 + 7 NETIF_MSG_PKTDATA 0x0080 + diff --git a/Documentation/networking/olympic.txt b/Documentation/networking/olympic.txt new file mode 100644 index 0000000..c65a940 --- /dev/null +++ b/Documentation/networking/olympic.txt @@ -0,0 +1,79 @@ + +IBM PCI Pit/Pit-Phy/Olympic CHIPSET BASED TOKEN RING CARDS README + +Release 0.2.0 - Release + June 8th 1999 Peter De Schrijver & Mike Phillips +Release 0.9.C - Release + April 18th 2001 Mike Phillips + +Thanks: +Erik De Cock, Adrian Bridgett and Frank Fiene for their +patience and testing. +Donald Champion for the cardbus support +Kyle Lucke for the dma api changes. +Jonathon Bitner for hardware support. +Everybody on linux-tr for their continued support. + +Options: + +The driver accepts four options: ringspeed, pkt_buf_sz, +message_level and network_monitor. + +These options can be specified differently for each card found. + +ringspeed: Has one of three settings 0 (default), 4 or 16. 0 will +make the card autosense the ringspeed and join at the appropriate speed, +this will be the default option for most people. 4 or 16 allow you to +explicitly force the card to operate at a certain speed. The card will fail +if you try to insert it at the wrong speed. (Although some hubs will allow +this so be *very* careful). The main purpose for explicitly setting the ring +speed is for when the card is first on the ring. In autosense mode, if the card +cannot detect any active monitors on the ring it will not open, so you must +re-init the card at the appropriate speed. Unfortunately at present the only +way of doing this is rmmod and insmod which is a bit tough if it is compiled +in the kernel. + +pkt_buf_sz: This is this initial receive buffer allocation size. This will +default to 4096 if no value is entered. You may increase performance of the +driver by setting this to a value larger than the network packet size, although +the driver now re-sizes buffers based on MTU settings as well. + +message_level: Controls level of messages created by the driver. Defaults to 0: +which only displays start-up and critical messages. Presently any non-zero +value will display all soft messages as well. NB This does not turn +debugging messages on, that must be done by modified the source code. + +network_monitor: Any non-zero value will provide a quasi network monitoring +mode. All unexpected MAC frames (beaconing etc.) will be received +by the driver and the source and destination addresses printed. +Also an entry will be added in /proc/net called olympic_tr%d, where tr%d +is the registered device name, i.e tr0, tr1, etc. This displays low +level information about the configuration of the ring and the adapter. +This feature has been designed for network administrators to assist in +the diagnosis of network / ring problems. (This used to OLYMPIC_NETWORK_MONITOR, +but has now changed to allow each adapter to be configured differently and +to alleviate the necessity to re-compile olympic to turn the option on). + +Multi-card: + +The driver will detect multiple cards and will work with shared interrupts, +each card is assigned the next token ring device, i.e. tr0 , tr1, tr2. The +driver should also happily reside in the system with other drivers. It has +been tested with ibmtr.c running, and I personally have had one Olicom PCI +card and two IBM olympic cards (all on the same interrupt), all running +together. + +Variable MTU size: + +The driver can handle a MTU size upto either 4500 or 18000 depending upon +ring speed. The driver also changes the size of the receive buffers as part +of the mtu re-sizing, so if you set mtu = 18000, you will need to be able +to allocate 16 * (sk_buff with 18000 buffer size) call it 18500 bytes per ring +position = 296,000 bytes of memory space, plus of course anything +necessary for the tx sk_buff's. Remember this is per card, so if you are +building routers, gateway's etc, you could start to use a lot of memory +real fast. + + +6/8/99 Peter De Schrijver and Mike Phillips + diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt new file mode 100644 index 0000000..c9074f9 --- /dev/null +++ b/Documentation/networking/operstates.txt @@ -0,0 +1,161 @@ + +1. Introduction + +Linux distinguishes between administrative and operational state of an +interface. Administrative state is the result of "ip link set dev +<dev> up or down" and reflects whether the administrator wants to use +the device for traffic. + +However, an interface is not usable just because the admin enabled it +- ethernet requires to be plugged into the switch and, depending on +a site's networking policy and configuration, an 802.1X authentication +to be performed before user data can be transferred. Operational state +shows the ability of an interface to transmit this user data. + +Thanks to 802.1X, userspace must be granted the possibility to +influence operational state. To accommodate this, operational state is +split into two parts: Two flags that can be set by the driver only, and +a RFC2863 compatible state that is derived from these flags, a policy, +and changeable from userspace under certain rules. + + +2. Querying from userspace + +Both admin and operational state can be queried via the netlink +operation RTM_GETLINK. It is also possible to subscribe to RTMGRP_LINK +to be notified of updates. This is important for setting from userspace. + +These values contain interface state: + +ifinfomsg::if_flags & IFF_UP: + Interface is admin up +ifinfomsg::if_flags & IFF_RUNNING: + Interface is in RFC2863 operational state UP or UNKNOWN. This is for + backward compatibility, routing daemons, dhcp clients can use this + flag to determine whether they should use the interface. +ifinfomsg::if_flags & IFF_LOWER_UP: + Driver has signaled netif_carrier_on() +ifinfomsg::if_flags & IFF_DORMANT: + Driver has signaled netif_dormant_on() + +These interface flags can also be queried without netlink using the +SIOCGIFFLAGS ioctl. + +TLV IFLA_OPERSTATE + +contains RFC2863 state of the interface in numeric representation: + +IF_OPER_UNKNOWN (0): + Interface is in unknown state, neither driver nor userspace has set + operational state. Interface must be considered for user data as + setting operational state has not been implemented in every driver. +IF_OPER_NOTPRESENT (1): + Unused in current kernel (notpresent interfaces normally disappear), + just a numerical placeholder. +IF_OPER_DOWN (2): + Interface is unable to transfer data on L1, f.e. ethernet is not + plugged or interface is ADMIN down. +IF_OPER_LOWERLAYERDOWN (3): + Interfaces stacked on an interface that is IF_OPER_DOWN show this + state (f.e. VLAN). +IF_OPER_TESTING (4): + Unused in current kernel. +IF_OPER_DORMANT (5): + Interface is L1 up, but waiting for an external event, f.e. for a + protocol to establish. (802.1X) +IF_OPER_UP (6): + Interface is operational up and can be used. + +This TLV can also be queried via sysfs. + +TLV IFLA_LINKMODE + +contains link policy. This is needed for userspace interaction +described below. + +This TLV can also be queried via sysfs. + + +3. Kernel driver API + +Kernel drivers have access to two flags that map to IFF_LOWER_UP and +IFF_DORMANT. These flags can be set from everywhere, even from +interrupts. It is guaranteed that only the driver has write access, +however, if different layers of the driver manipulate the same flag, +the driver has to provide the synchronisation needed. + +__LINK_STATE_NOCARRIER, maps to !IFF_LOWER_UP: + +The driver uses netif_carrier_on() to clear and netif_carrier_off() to +set this flag. On netif_carrier_off(), the scheduler stops sending +packets. The name 'carrier' and the inversion are historical, think of +it as lower layer. + +netif_carrier_ok() can be used to query that bit. + +__LINK_STATE_DORMANT, maps to IFF_DORMANT: + +Set by the driver to express that the device cannot yet be used +because some driver controlled protocol establishment has to +complete. Corresponding functions are netif_dormant_on() to set the +flag, netif_dormant_off() to clear it and netif_dormant() to query. + +On device allocation, networking core sets the flags equivalent to +netif_carrier_ok() and !netif_dormant(). + + +Whenever the driver CHANGES one of these flags, a workqueue event is +scheduled to translate the flag combination to IFLA_OPERSTATE as +follows: + +!netif_carrier_ok(): + IF_OPER_LOWERLAYERDOWN if the interface is stacked, IF_OPER_DOWN + otherwise. Kernel can recognise stacked interfaces because their + ifindex != iflink. + +netif_carrier_ok() && netif_dormant(): + IF_OPER_DORMANT + +netif_carrier_ok() && !netif_dormant(): + IF_OPER_UP if userspace interaction is disabled. Otherwise + IF_OPER_DORMANT with the possibility for userspace to initiate the + IF_OPER_UP transition afterwards. + + +4. Setting from userspace + +Applications have to use the netlink interface to influence the +RFC2863 operational state of an interface. Setting IFLA_LINKMODE to 1 +via RTM_SETLINK instructs the kernel that an interface should go to +IF_OPER_DORMANT instead of IF_OPER_UP when the combination +netif_carrier_ok() && !netif_dormant() is set by the +driver. Afterwards, the userspace application can set IFLA_OPERSTATE +to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set +netif_carrier_off() or netif_dormant_on(). Changes made by userspace +are multicasted on the netlink group RTMGRP_LINK. + +So basically a 802.1X supplicant interacts with the kernel like this: + +-subscribe to RTMGRP_LINK +-set IFLA_LINKMODE to 1 via RTM_SETLINK +-query RTM_GETLINK once to get initial state +-if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until + netlink multicast signals this state +-do 802.1X, eventually abort if flags go down again +-send RTM_SETLINK to set operstate to IF_OPER_UP if authentication + succeeds, IF_OPER_DORMANT otherwise +-see how operstate and IFF_RUNNING is echoed via netlink multicast +-set interface back to IF_OPER_DORMANT if 802.1X reauthentication + fails +-restart if kernel changes IFF_LOWER_UP or IFF_DORMANT flag + +if supplicant goes down, bring back IFLA_LINKMODE to 0 and +IFLA_OPERSTATE to a sane value. + +A routing daemon or dhcp client just needs to care for IFF_RUNNING or +waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before +considering the interface / querying a DHCP address. + + +For technical questions and/or comments please e-mail to Stefan Rompf +(stefan at loplof.de). diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt new file mode 100644 index 0000000..07c53d5 --- /dev/null +++ b/Documentation/networking/packet_mmap.txt @@ -0,0 +1,399 @@ +-------------------------------------------------------------------------------- ++ ABSTRACT +-------------------------------------------------------------------------------- + +This file documents the CONFIG_PACKET_MMAP option available with the PACKET +socket interface on 2.4 and 2.6 kernels. This type of sockets is used for +capture network traffic with utilities like tcpdump or any other that uses +the libpcap library. + +You can find the latest version of this document at + + http://pusa.uv.es/~ulisses/packet_mmap/ + +Please send me your comments to + + Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es> + +------------------------------------------------------------------------------- ++ Why use PACKET_MMAP +-------------------------------------------------------------------------------- + +In Linux 2.4/2.6 if PACKET_MMAP is not enabled, the capture process is very +inefficient. It uses very limited buffers and requires one system call +to capture each packet, it requires two if you want to get packet's +timestamp (like libpcap always does). + +In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size +configurable circular buffer mapped in user space. This way reading packets just +needs to wait for them, most of the time there is no need to issue a single +system call. By using a shared buffer between the kernel and the user +also has the benefit of minimizing packet copies. + +It's fine to use PACKET_MMAP to improve the performance of the capture process, +but it isn't everything. At least, if you are capturing at high speeds (this +is relative to the cpu speed), you should check if the device driver of your +network interface card supports some sort of interrupt load mitigation or +(even better) if it supports NAPI, also make sure it is enabled. + +-------------------------------------------------------------------------------- ++ How to use CONFIG_PACKET_MMAP +-------------------------------------------------------------------------------- + +From the user standpoint, you should use the higher level libpcap library, which +is a de facto standard, portable across nearly all operating systems +including Win32. + +Said that, at time of this writing, official libpcap 0.8.1 is out and doesn't include +support for PACKET_MMAP, and also probably the libpcap included in your distribution. + +I'm aware of two implementations of PACKET_MMAP in libpcap: + + http://pusa.uv.es/~ulisses/packet_mmap/ (by Simon Patarin, based on libpcap 0.6.2) + http://public.lanl.gov/cpw/ (by Phil Wood, based on lastest libpcap) + +The rest of this document is intended for people who want to understand +the low level details or want to improve libpcap by including PACKET_MMAP +support. + +-------------------------------------------------------------------------------- ++ How to use CONFIG_PACKET_MMAP directly +-------------------------------------------------------------------------------- + +From the system calls stand point, the use of PACKET_MMAP involves +the following process: + + +[setup] socket() -------> creation of the capture socket + setsockopt() ---> allocation of the circular buffer (ring) + mmap() ---------> mapping of the allocated buffer to the + user process + +[capture] poll() ---------> to wait for incoming packets + +[shutdown] close() --------> destruction of the capture socket and + deallocation of all associated + resources. + + +socket creation and destruction is straight forward, and is done +the same way with or without PACKET_MMAP: + +int fd; + +fd= socket(PF_PACKET, mode, htons(ETH_P_ALL)) + +where mode is SOCK_RAW for the raw interface were link level +information can be captured or SOCK_DGRAM for the cooked +interface where link level information capture is not +supported and a link level pseudo-header is provided +by the kernel. + +The destruction of the socket and all associated resources +is done by a simple call to close(fd). + +Next I will describe PACKET_MMAP settings and it's constraints, +also the mapping of the circular buffer in the user process and +the use of this buffer. + +-------------------------------------------------------------------------------- ++ PACKET_MMAP settings +-------------------------------------------------------------------------------- + + +To setup PACKET_MMAP from user level code is done with a call like + + setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req)) + +The most significant argument in the previous call is the req parameter, +this parameter must to have the following structure: + + struct tpacket_req + { + unsigned int tp_block_size; /* Minimal size of contiguous block */ + unsigned int tp_block_nr; /* Number of blocks */ + unsigned int tp_frame_size; /* Size of frame */ + unsigned int tp_frame_nr; /* Total number of frames */ + }; + +This structure is defined in /usr/include/linux/if_packet.h and establishes a +circular buffer (ring) of unswappable memory mapped in the capture process. +Being mapped in the capture process allows reading the captured frames and +related meta-information like timestamps without requiring a system call. + +Captured frames are grouped in blocks. Each block is a physically contiguous +region of memory and holds tp_block_size/tp_frame_size frames. The total number +of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because + + frames_per_block = tp_block_size/tp_frame_size + +indeed, packet_set_ring checks that the following condition is true + + frames_per_block * tp_block_nr == tp_frame_nr + + +Lets see an example, with the following values: + + tp_block_size= 4096 + tp_frame_size= 2048 + tp_block_nr = 4 + tp_frame_nr = 8 + +we will get the following buffer structure: + + block #1 block #2 ++---------+---------+ +---------+---------+ +| frame 1 | frame 2 | | frame 3 | frame 4 | ++---------+---------+ +---------+---------+ + + block #3 block #4 ++---------+---------+ +---------+---------+ +| frame 5 | frame 6 | | frame 7 | frame 8 | ++---------+---------+ +---------+---------+ + +A frame can be of any size with the only condition it can fit in a block. A block +can only hold an integer number of frames, or in other words, a frame cannot +be spawned accross two blocks, so there are some details you have to take into +account when choosing the frame_size. See "Mapping and use of the circular +buffer (ring)". + + +-------------------------------------------------------------------------------- ++ PACKET_MMAP setting constraints +-------------------------------------------------------------------------------- + +In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch), +the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or +16384 in a 64 bit architecture. For information on these kernel versions +see http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt + + Block size limit +------------------ + +As stated earlier, each block is a contiguous physical region of memory. These +memory regions are allocated with calls to the __get_free_pages() function. As +the name indicates, this function allocates pages of memory, and the second +argument is "order" or a power of two number of pages, that is +(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes, +order=2 ==> 16384 bytes, etc. The maximum size of a +region allocated by __get_free_pages is determined by the MAX_ORDER macro. More +precisely the limit can be calculated as: + + PAGE_SIZE << MAX_ORDER + + In a i386 architecture PAGE_SIZE is 4096 bytes + In a 2.4/i386 kernel MAX_ORDER is 10 + In a 2.6/i386 kernel MAX_ORDER is 11 + +So get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel +respectively, with an i386 architecture. + +User space programs can include /usr/include/sys/user.h and +/usr/include/linux/mmzone.h to get PAGE_SIZE MAX_ORDER declarations. + +The pagesize can also be determined dynamically with the getpagesize (2) +system call. + + + Block number limit +-------------------- + +To understand the constraints of PACKET_MMAP, we have to see the structure +used to hold the pointers to each block. + +Currently, this structure is a dynamically allocated vector with kmalloc +called pg_vec, its size limits the number of blocks that can be allocated. + + +---+---+---+---+ + | x | x | x | x | + +---+---+---+---+ + | | | | + | | | v + | | v block #4 + | v block #3 + v block #2 + block #1 + + +kmalloc allocates any number of bytes of physically contiguous memory from +a pool of pre-determined sizes. This pool of memory is maintained by the slab +allocator which is at the end the responsible for doing the allocation and +hence which imposes the maximum memory that kmalloc can allocate. + +In a 2.4/2.6 kernel and the i386 architecture, the limit is 131072 bytes. The +predetermined sizes that kmalloc uses can be checked in the "size-<bytes>" +entries of /proc/slabinfo + +In a 32 bit architecture, pointers are 4 bytes long, so the total number of +pointers to blocks is + + 131072/4 = 32768 blocks + + + PACKET_MMAP buffer size calculator +------------------------------------ + +Definitions: + +<size-max> : is the maximum size of allocable with kmalloc (see /proc/slabinfo) +<pointer size>: depends on the architecture -- sizeof(void *) +<page size> : depends on the architecture -- PAGE_SIZE or getpagesize (2) +<max-order> : is the value defined with MAX_ORDER +<frame size> : it's an upper bound of frame's capture size (more on this later) + +from these definitions we will derive + + <block number> = <size-max>/<pointer size> + <block size> = <pagesize> << <max-order> + +so, the max buffer size is + + <block number> * <block size> + +and, the number of frames be + + <block number> * <block size> / <frame size> + +Suppose the following parameters, which apply for 2.6 kernel and an +i386 architecture: + + <size-max> = 131072 bytes + <pointer size> = 4 bytes + <pagesize> = 4096 bytes + <max-order> = 11 + +and a value for <frame size> of 2048 bytes. These parameters will yield + + <block number> = 131072/4 = 32768 blocks + <block size> = 4096 << 11 = 8 MiB. + +and hence the buffer will have a 262144 MiB size. So it can hold +262144 MiB / 2048 bytes = 134217728 frames + + +Actually, this buffer size is not possible with an i386 architecture. +Remember that the memory is allocated in kernel space, in the case of +an i386 kernel's memory size is limited to 1GiB. + +All memory allocations are not freed until the socket is closed. The memory +allocations are done with GFP_KERNEL priority, this basically means that +the allocation can wait and swap other process' memory in order to allocate +the necessary memory, so normally limits can be reached. + + Other constraints +------------------- + +If you check the source code you will see that what I draw here as a frame +is not only the link level frame. At the beginning of each frame there is a +header called struct tpacket_hdr used in PACKET_MMAP to hold link level's frame +meta information like timestamp. So what we draw here a frame it's really +the following (from include/linux/if_packet.h): + +/* + Frame structure: + + - Start. Frame must be aligned to TPACKET_ALIGNMENT=16 + - struct tpacket_hdr + - pad to TPACKET_ALIGNMENT=16 + - struct sockaddr_ll + - Gap, chosen so that packet data (Start+tp_net) aligns to + TPACKET_ALIGNMENT=16 + - Start+tp_mac: [ Optional MAC header ] + - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16. + - Pad to align to TPACKET_ALIGNMENT=16 + */ + + + The following are conditions that are checked in packet_set_ring + + tp_block_size must be a multiple of PAGE_SIZE (1) + tp_frame_size must be greater than TPACKET_HDRLEN (obvious) + tp_frame_size must be a multiple of TPACKET_ALIGNMENT + tp_frame_nr must be exactly frames_per_block*tp_block_nr + +Note that tp_block_size should be chosen to be a power of two or there will +be a waste of memory. + +-------------------------------------------------------------------------------- ++ Mapping and use of the circular buffer (ring) +-------------------------------------------------------------------------------- + +The mapping of the buffer in the user process is done with the conventional +mmap function. Even the circular buffer is compound of several physically +discontiguous blocks of memory, they are contiguous to the user space, hence +just one call to mmap is needed: + + mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); + +If tp_frame_size is a divisor of tp_block_size frames will be +contiguously spaced by tp_frame_size bytes. If not, each +tp_block_size/tp_frame_size frames there will be a gap between +the frames. This is because a frame cannot be spawn across two +blocks. + +At the beginning of each frame there is an status field (see +struct tpacket_hdr). If this field is 0 means that the frame is ready +to be used for the kernel, If not, there is a frame the user can read +and the following flags apply: + + from include/linux/if_packet.h + + #define TP_STATUS_COPY 2 + #define TP_STATUS_LOSING 4 + #define TP_STATUS_CSUMNOTREADY 8 + + +TP_STATUS_COPY : This flag indicates that the frame (and associated + meta information) has been truncated because it's + larger than tp_frame_size. This packet can be + read entirely with recvfrom(). + + In order to make this work it must to be + enabled previously with setsockopt() and + the PACKET_COPY_THRESH option. + + The number of frames than can be buffered to + be read with recvfrom is limited like a normal socket. + See the SO_RCVBUF option in the socket (7) man page. + +TP_STATUS_LOSING : indicates there were packet drops from last time + statistics where checked with getsockopt() and + the PACKET_STATISTICS option. + +TP_STATUS_CSUMNOTREADY: currently it's used for outgoing IP packets which + it's checksum will be done in hardware. So while + reading the packet we should not try to check the + checksum. + +for convenience there are also the following defines: + + #define TP_STATUS_KERNEL 0 + #define TP_STATUS_USER 1 + +The kernel initializes all frames to TP_STATUS_KERNEL, when the kernel +receives a packet it puts in the buffer and updates the status with +at least the TP_STATUS_USER flag. Then the user can read the packet, +once the packet is read the user must zero the status field, so the kernel +can use again that frame buffer. + +The user can use poll (any other variant should apply too) to check if new +packets are in the ring: + + struct pollfd pfd; + + pfd.fd = fd; + pfd.revents = 0; + pfd.events = POLLIN|POLLRDNORM|POLLERR; + + if (status == TP_STATUS_KERNEL) + retval = poll(&pfd, 1, timeout); + +It doesn't incur in a race condition to first check the status value and +then poll for frames. + +-------------------------------------------------------------------------------- ++ THANKS +-------------------------------------------------------------------------------- + + Jesse Brandeburg, for fixing my grammathical/spelling errors + diff --git a/Documentation/networking/phonet.txt b/Documentation/networking/phonet.txt new file mode 100644 index 0000000..6a07e45 --- /dev/null +++ b/Documentation/networking/phonet.txt @@ -0,0 +1,175 @@ +Linux Phonet protocol family +============================ + +Introduction +------------ + +Phonet is a packet protocol used by Nokia cellular modems for both IPC +and RPC. With the Linux Phonet socket family, Linux host processes can +receive and send messages from/to the modem, or any other external +device attached to the modem. The modem takes care of routing. + +Phonet packets can be exchanged through various hardware connections +depending on the device, such as: + - USB with the CDC Phonet interface, + - infrared, + - Bluetooth, + - an RS232 serial port (with a dedicated "FBUS" line discipline), + - the SSI bus with some TI OMAP processors. + + +Packets format +-------------- + +Phonet packets have a common header as follows: + + struct phonethdr { + uint8_t pn_media; /* Media type (link-layer identifier) */ + uint8_t pn_rdev; /* Receiver device ID */ + uint8_t pn_sdev; /* Sender device ID */ + uint8_t pn_res; /* Resource ID or function */ + uint16_t pn_length; /* Big-endian message byte length (minus 6) */ + uint8_t pn_robj; /* Receiver object ID */ + uint8_t pn_sobj; /* Sender object ID */ + }; + +On Linux, the link-layer header includes the pn_media byte (see below). +The next 7 bytes are part of the network-layer header. + +The device ID is split: the 6 higher-order bits consitute the device +address, while the 2 lower-order bits are used for multiplexing, as are +the 8-bit object identifiers. As such, Phonet can be considered as a +network layer with 6 bits of address space and 10 bits for transport +protocol (much like port numbers in IP world). + +The modem always has address number zero. All other device have a their +own 6-bit address. + + +Link layer +---------- + +Phonet links are always point-to-point links. The link layer header +consists of a single Phonet media type byte. It uniquely identifies the +link through which the packet is transmitted, from the modem's +perspective. Each Phonet network device shall prepend and set the media +type byte as appropriate. For convenience, a common phonet_header_ops +link-layer header operations structure is provided. It sets the +media type according to the network device hardware address. + +Linux Phonet network interfaces support a dedicated link layer packets +type (ETH_P_PHONET) which is out of the Ethernet type range. They can +only send and receive Phonet packets. + +The virtual TUN tunnel device driver can also be used for Phonet. This +requires IFF_TUN mode, _without_ the IFF_NO_PI flag. In this case, +there is no link-layer header, so there is no Phonet media type byte. + +Note that Phonet interfaces are not allowed to re-order packets, so +only the (default) Linux FIFO qdisc should be used with them. + + +Network layer +------------- + +The Phonet socket address family maps the Phonet packet header: + + struct sockaddr_pn { + sa_family_t spn_family; /* AF_PHONET */ + uint8_t spn_obj; /* Object ID */ + uint8_t spn_dev; /* Device ID */ + uint8_t spn_resource; /* Resource or function */ + uint8_t spn_zero[...]; /* Padding */ + }; + +The resource field is only used when sending and receiving; +It is ignored by bind() and getsockname(). + + +Low-level datagram protocol +--------------------------- + +Applications can send Phonet messages using the Phonet datagram socket +protocol from the PF_PHONET family. Each socket is bound to one of the +2^10 object IDs available, and can send and receive packets with any +other peer. + + struct sockaddr_pn addr = { .spn_family = AF_PHONET, }; + ssize_t len; + socklen_t addrlen = sizeof(addr); + int fd; + + fd = socket(PF_PHONET, SOCK_DGRAM, 0); + bind(fd, (struct sockaddr *)&addr, sizeof(addr)); + /* ... */ + + sendto(fd, msg, msglen, 0, (struct sockaddr *)&addr, sizeof(addr)); + len = recvfrom(fd, buf, sizeof(buf), 0, + (struct sockaddr *)&addr, &addrlen); + +This protocol follows the SOCK_DGRAM connection-less semantics. +However, connect() and getpeername() are not supported, as they did +not seem useful with Phonet usages (could be added easily). + + +Phonet Pipe protocol +-------------------- + +The Phonet Pipe protocol is a simple sequenced packets protocol +with end-to-end congestion control. It uses the passive listening +socket paradigm. The listening socket is bound to an unique free object +ID. Each listening socket can handle up to 255 simultaneous +connections, one per accept()'d socket. + + int lfd, cfd; + + lfd = socket(PF_PHONET, SOCK_SEQPACKET, PN_PROTO_PIPE); + listen (lfd, INT_MAX); + + /* ... */ + cfd = accept(lfd, NULL, NULL); + for (;;) + { + char buf[...]; + ssize_t len = read(cfd, buf, sizeof(buf)); + + /* ... */ + + write(cfd, msg, msglen); + } + +Connections are established between two endpoints by a "third party" +application. This means that both endpoints are passive; so connect() +is not possible. + +WARNING: +When polling a connected pipe socket for writability, there is an +intrinsic race condition whereby writability might be lost between the +polling and the writing system calls. In this case, the socket will +block until write becomes possible again, unless non-blocking mode +is enabled. + + +The pipe protocol provides two socket options at the SOL_PNPIPE level: + + PNPIPE_ENCAP accepts one integer value (int) of: + + PNPIPE_ENCAP_NONE: The socket operates normally (default). + + PNPIPE_ENCAP_IP: The socket is used as a backend for a virtual IP + interface. This requires CAP_NET_ADMIN capability. GPRS data + support on Nokia modems can use this. Note that the socket cannot + be reliably poll()'d or read() from while in this mode. + + PNPIPE_IFINDEX is a read-only integer value. It contains the + interface index of the network interface created by PNPIPE_ENCAP, + or zero if encapsulation is off. + + +Authors +------- + +Linux Phonet was initially written by Sakari Ailus. +Other contributors include Mikä Liljeberg, Andras Domokos, +Carlos Chinea and Rémi Denis-Courmont. +Copyright (C) 2008 Nokia Corporation. diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt new file mode 100644 index 0000000..88bb71b --- /dev/null +++ b/Documentation/networking/phy.txt @@ -0,0 +1,329 @@ + +------- +PHY Abstraction Layer +(Updated 2008-04-08) + +Purpose + + Most network devices consist of set of registers which provide an interface + to a MAC layer, which communicates with the physical connection through a + PHY. The PHY concerns itself with negotiating link parameters with the link + partner on the other side of the network connection (typically, an ethernet + cable), and provides a register interface to allow drivers to determine what + settings were chosen, and to configure what settings are allowed. + + While these devices are distinct from the network devices, and conform to a + standard layout for the registers, it has been common practice to integrate + the PHY management code with the network driver. This has resulted in large + amounts of redundant code. Also, on embedded systems with multiple (and + sometimes quite different) ethernet controllers connected to the same + management bus, it is difficult to ensure safe use of the bus. + + Since the PHYs are devices, and the management busses through which they are + accessed are, in fact, busses, the PHY Abstraction Layer treats them as such. + In doing so, it has these goals: + + 1) Increase code-reuse + 2) Increase overall code-maintainability + 3) Speed development time for new network drivers, and for new systems + + Basically, this layer is meant to provide an interface to PHY devices which + allows network driver writers to write as little code as possible, while + still providing a full feature set. + +The MDIO bus + + Most network devices are connected to a PHY by means of a management bus. + Different devices use different busses (though some share common interfaces). + In order to take advantage of the PAL, each bus interface needs to be + registered as a distinct device. + + 1) read and write functions must be implemented. Their prototypes are: + + int write(struct mii_bus *bus, int mii_id, int regnum, u16 value); + int read(struct mii_bus *bus, int mii_id, int regnum); + + mii_id is the address on the bus for the PHY, and regnum is the register + number. These functions are guaranteed not to be called from interrupt + time, so it is safe for them to block, waiting for an interrupt to signal + the operation is complete + + 2) A reset function is necessary. This is used to return the bus to an + initialized state. + + 3) A probe function is needed. This function should set up anything the bus + driver needs, setup the mii_bus structure, and register with the PAL using + mdiobus_register. Similarly, there's a remove function to undo all of + that (use mdiobus_unregister). + + 4) Like any driver, the device_driver structure must be configured, and init + exit functions are used to register the driver. + + 5) The bus must also be declared somewhere as a device, and registered. + + As an example for how one driver implemented an mdio bus driver, see + drivers/net/gianfar_mii.c and arch/ppc/syslib/mpc85xx_devices.c + +Connecting to a PHY + + Sometime during startup, the network driver needs to establish a connection + between the PHY device, and the network device. At this time, the PHY's bus + and drivers need to all have been loaded, so it is ready for the connection. + At this point, there are several ways to connect to the PHY: + + 1) The PAL handles everything, and only calls the network driver when + the link state changes, so it can react. + + 2) The PAL handles everything except interrupts (usually because the + controller has the interrupt registers). + + 3) The PAL handles everything, but checks in with the driver every second, + allowing the network driver to react first to any changes before the PAL + does. + + 4) The PAL serves only as a library of functions, with the network device + manually calling functions to update status, and configure the PHY + + +Letting the PHY Abstraction Layer do Everything + + If you choose option 1 (The hope is that every driver can, but to still be + useful to drivers that can't), connecting to the PHY is simple: + + First, you need a function to react to changes in the link state. This + function follows this protocol: + + static void adjust_link(struct net_device *dev); + + Next, you need to know the device name of the PHY connected to this device. + The name will look something like, "0:00", where the first number is the + bus id, and the second is the PHY's address on that bus. Typically, + the bus is responsible for making its ID unique. + + Now, to connect, just call this function: + + phydev = phy_connect(dev, phy_name, &adjust_link, flags, interface); + + phydev is a pointer to the phy_device structure which represents the PHY. If + phy_connect is successful, it will return the pointer. dev, here, is the + pointer to your net_device. Once done, this function will have started the + PHY's software state machine, and registered for the PHY's interrupt, if it + has one. The phydev structure will be populated with information about the + current state, though the PHY will not yet be truly operational at this + point. + + flags is a u32 which can optionally contain phy-specific flags. + This is useful if the system has put hardware restrictions on + the PHY/controller, of which the PHY needs to be aware. + + interface is a u32 which specifies the connection type used + between the controller and the PHY. Examples are GMII, MII, + RGMII, and SGMII. For a full list, see include/linux/phy.h + + Now just make sure that phydev->supported and phydev->advertising have any + values pruned from them which don't make sense for your controller (a 10/100 + controller may be connected to a gigabit capable PHY, so you would need to + mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions + for these bitfields. Note that you should not SET any bits, or the PHY may + get put into an unsupported state. + + Lastly, once the controller is ready to handle network traffic, you call + phy_start(phydev). This tells the PAL that you are ready, and configures the + PHY to connect to the network. If you want to handle your own interrupts, + just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start. + Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL. + + When you want to disconnect from the network (even if just briefly), you call + phy_stop(phydev). + +Keeping Close Tabs on the PAL + + It is possible that the PAL's built-in state machine needs a little help to + keep your network device and the PHY properly in sync. If so, you can + register a helper function when connecting to the PHY, which will be called + every second before the state machine reacts to any changes. To do this, you + need to manually call phy_attach() and phy_prepare_link(), and then call + phy_start_machine() with the second argument set to point to your special + handler. + + Currently there are no examples of how to use this functionality, and testing + on it has been limited because the author does not have any drivers which use + it (they all use option 1). So Caveat Emptor. + +Doing it all yourself + + There's a remote chance that the PAL's built-in state machine cannot track + the complex interactions between the PHY and your network device. If this is + so, you can simply call phy_attach(), and not call phy_start_machine or + phy_prepare_link(). This will mean that phydev->state is entirely yours to + handle (phy_start and phy_stop toggle between some of the states, so you + might need to avoid them). + + An effort has been made to make sure that useful functionality can be + accessed without the state-machine running, and most of these functions are + descended from functions which did not interact with a complex state-machine. + However, again, no effort has been made so far to test running without the + state machine, so tryer beware. + + Here is a brief rundown of the functions: + + int phy_read(struct phy_device *phydev, u16 regnum); + int phy_write(struct phy_device *phydev, u16 regnum, u16 val); + + Simple read/write primitives. They invoke the bus's read/write function + pointers. + + void phy_print_status(struct phy_device *phydev); + + A convenience function to print out the PHY status neatly. + + int phy_clear_interrupt(struct phy_device *phydev); + int phy_config_interrupt(struct phy_device *phydev, u32 interrupts); + + Clear the PHY's interrupt, and configure which ones are allowed, + respectively. Currently only supports all on, or all off. + + int phy_enable_interrupts(struct phy_device *phydev); + int phy_disable_interrupts(struct phy_device *phydev); + + Functions which enable/disable PHY interrupts, clearing them + before and after, respectively. + + int phy_start_interrupts(struct phy_device *phydev); + int phy_stop_interrupts(struct phy_device *phydev); + + Requests the IRQ for the PHY interrupts, then enables them for + start, or disables then frees them for stop. + + struct phy_device * phy_attach(struct net_device *dev, const char *phy_id, + u32 flags, phy_interface_t interface); + + Attaches a network device to a particular PHY, binding the PHY to a generic + driver if none was found during bus initialization. Passes in + any phy-specific flags as needed. + + int phy_start_aneg(struct phy_device *phydev); + + Using variables inside the phydev structure, either configures advertising + and resets autonegotiation, or disables autonegotiation, and configures + forced settings. + + static inline int phy_read_status(struct phy_device *phydev); + + Fills the phydev structure with up-to-date information about the current + settings in the PHY. + + void phy_sanitize_settings(struct phy_device *phydev) + + Resolves differences between currently desired settings, and + supported settings for the given PHY device. Does not make + the changes in the hardware, though. + + int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); + int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd); + + Ethtool convenience functions. + + int phy_mii_ioctl(struct phy_device *phydev, + struct mii_ioctl_data *mii_data, int cmd); + + The MII ioctl. Note that this function will completely screw up the state + machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to + use this only to write registers which are not standard, and don't set off + a renegotiation. + + +PHY Device Drivers + + With the PHY Abstraction Layer, adding support for new PHYs is + quite easy. In some cases, no work is required at all! However, + many PHYs require a little hand-holding to get up-and-running. + +Generic PHY driver + + If the desired PHY doesn't have any errata, quirks, or special + features you want to support, then it may be best to not add + support, and let the PHY Abstraction Layer's Generic PHY Driver + do all of the work. + +Writing a PHY driver + + If you do need to write a PHY driver, the first thing to do is + make sure it can be matched with an appropriate PHY device. + This is done during bus initialization by reading the device's + UID (stored in registers 2 and 3), then comparing it to each + driver's phy_id field by ANDing it with each driver's + phy_id_mask field. Also, it needs a name. Here's an example: + + static struct phy_driver dm9161_driver = { + .phy_id = 0x0181b880, + .name = "Davicom DM9161E", + .phy_id_mask = 0x0ffffff0, + ... + } + + Next, you need to specify what features (speed, duplex, autoneg, + etc) your PHY device and driver support. Most PHYs support + PHY_BASIC_FEATURES, but you can look in include/mii.h for other + features. + + Each driver consists of a number of function pointers: + + config_init: configures PHY into a sane state after a reset. + For instance, a Davicom PHY requires descrambling disabled. + probe: Does any setup needed by the driver + suspend/resume: power management + config_aneg: Changes the speed/duplex/negotiation settings + read_status: Reads the current speed/duplex/negotiation settings + ack_interrupt: Clear a pending interrupt + config_intr: Enable or disable interrupts + remove: Does any driver take-down + + Of these, only config_aneg and read_status are required to be + assigned by the driver code. The rest are optional. Also, it is + preferred to use the generic phy driver's versions of these two + functions if at all possible: genphy_read_status and + genphy_config_aneg. If this is not possible, it is likely that + you only need to perform some actions before and after invoking + these functions, and so your functions will wrap the generic + ones. + + Feel free to look at the Marvell, Cicada, and Davicom drivers in + drivers/net/phy/ for examples (the lxt and qsemi drivers have + not been tested as of this writing) + +Board Fixups + + Sometimes the specific interaction between the platform and the PHY requires + special handling. For instance, to change where the PHY's clock input is, + or to add a delay to account for latency issues in the data path. In order + to support such contingencies, the PHY Layer allows platform code to register + fixups to be run when the PHY is brought up (or subsequently reset). + + When the PHY Layer brings up a PHY it checks to see if there are any fixups + registered for it, matching based on UID (contained in the PHY device's phy_id + field) and the bus identifier (contained in phydev->dev.bus_id). Both must + match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as + wildcards for the bus ID and UID, respectively. + + When a match is found, the PHY layer will invoke the run function associated + with the fixup. This function is passed a pointer to the phy_device of + interest. It should therefore only operate on that PHY. + + The platform code can either register the fixup using phy_register_fixup(): + + int phy_register_fixup(const char *phy_id, + u32 phy_uid, u32 phy_uid_mask, + int (*run)(struct phy_device *)); + + Or using one of the two stubs, phy_register_fixup_for_uid() and + phy_register_fixup_for_id(): + + int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask, + int (*run)(struct phy_device *)); + int phy_register_fixup_for_id(const char *phy_id, + int (*run)(struct phy_device *)); + + The stubs set one of the two matching criteria, and set the other one to + match anything. + diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt new file mode 100644 index 0000000..c6cf4a3 --- /dev/null +++ b/Documentation/networking/pktgen.txt @@ -0,0 +1,248 @@ + + + HOWTO for the linux packet generator + ------------------------------------ + +Date: 041221 + +Enable CONFIG_NET_PKTGEN to compile and build pktgen.o either in kernel +or as module. Module is preferred. insmod pktgen if needed. Once running +pktgen creates a thread on each CPU where each thread has affinity to its CPU. +Monitoring and controlling is done via /proc. Easiest to select a suitable +a sample script and configure. + +On a dual CPU: + +ps aux | grep pkt +root 129 0.3 0.0 0 0 ? SW 2003 523:20 [pktgen/0] +root 130 0.3 0.0 0 0 ? SW 2003 509:50 [pktgen/1] + + +For monitoring and control pktgen creates: + /proc/net/pktgen/pgctrl + /proc/net/pktgen/kpktgend_X + /proc/net/pktgen/ethX + + +Viewing threads +=============== +/proc/net/pktgen/kpktgend_0 +Name: kpktgend_0 max_before_softirq: 10000 +Running: +Stopped: eth1 +Result: OK: max_before_softirq=10000 + +Most important the devices assigned to thread. Note! A device can only belong +to one thread. + + +Viewing devices +=============== + +Parm section holds configured info. Current hold running stats. +Result is printed after run or after interruption. Example: + +/proc/net/pktgen/eth1 + +Params: count 10000000 min_pkt_size: 60 max_pkt_size: 60 + frags: 0 delay: 0 clone_skb: 1000000 ifname: eth1 + flows: 0 flowlen: 0 + dst_min: 10.10.11.2 dst_max: + src_min: src_max: + src_mac: 00:00:00:00:00:00 dst_mac: 00:04:23:AC:FD:82 + udp_src_min: 9 udp_src_max: 9 udp_dst_min: 9 udp_dst_max: 9 + src_mac_count: 0 dst_mac_count: 0 + Flags: +Current: + pkts-sofar: 10000000 errors: 39664 + started: 1103053986245187us stopped: 1103053999346329us idle: 880401us + seq_num: 10000011 cur_dst_mac_offset: 0 cur_src_mac_offset: 0 + cur_saddr: 0x10a0a0a cur_daddr: 0x20b0a0a + cur_udp_dst: 9 cur_udp_src: 9 + flows: 0 +Result: OK: 13101142(c12220741+d880401) usec, 10000000 (60byte,0frags) + 763292pps 390Mb/sec (390805504bps) errors: 39664 + +Configuring threads and devices +================================ +This is done via the /proc interface easiest done via pgset in the scripts + +Examples: + + pgset "clone_skb 1" sets the number of copies of the same packet + pgset "clone_skb 0" use single SKB for all transmits + pgset "pkt_size 9014" sets packet size to 9014 + pgset "frags 5" packet will consist of 5 fragments + pgset "count 200000" sets number of packets to send, set to zero + for continuous sends until explicitly stopped. + + pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds + + pgset "dst 10.0.0.1" sets IP destination address + (BEWARE! This generator is very aggressive!) + + pgset "dst_min 10.0.0.1" Same as dst + pgset "dst_max 10.0.0.254" Set the maximum destination IP. + pgset "src_min 10.0.0.1" Set the minimum (or only) source IP. + pgset "src_max 10.0.0.254" Set the maximum source IP. + pgset "dst6 fec0::1" IPV6 destination address + pgset "src6 fec0::2" IPV6 source address + pgset "dstmac 00:00:00:00:00:00" sets MAC destination address + pgset "srcmac 00:00:00:00:00:00" sets MAC source address + + pgset "src_mac_count 1" Sets the number of MACs we'll range through. + The 'minimum' MAC is what you set with srcmac. + + pgset "dst_mac_count 1" Sets the number of MACs we'll range through. + The 'minimum' MAC is what you set with dstmac. + + pgset "flag [name]" Set a flag to determine behaviour. Current flags + are: IPSRC_RND #IP Source is random (between min/max), + IPDST_RND, UDPSRC_RND, + UDPDST_RND, MACSRC_RND, MACDST_RND + MPLS_RND, VID_RND, SVID_RND + + pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then + cycle through the port range. + + pgset "udp_src_max 9" set UDP source port max. + pgset "udp_dst_min 9" set UDP destination port min, If < udp_dst_max, then + cycle through the port range. + pgset "udp_dst_max 9" set UDP destination port max. + + pgset "mpls 0001000a,0002000a,0000000a" set MPLS labels (in this example + outer label=16,middle label=32, + inner label=0 (IPv4 NULL)) Note that + there must be no spaces between the + arguments. Leading zeros are required. + Do not set the bottom of stack bit, + that's done automatically. If you do + set the bottom of stack bit, that + indicates that you want to randomly + generate that address and the flag + MPLS_RND will be turned on. You + can have any mix of random and fixed + labels in the label stack. + + pgset "mpls 0" turn off mpls (or any invalid argument works too!) + + pgset "vlan_id 77" set VLAN ID 0-4095 + pgset "vlan_p 3" set priority bit 0-7 (default 0) + pgset "vlan_cfi 0" set canonical format identifier 0-1 (default 0) + + pgset "svlan_id 22" set SVLAN ID 0-4095 + pgset "svlan_p 3" set priority bit 0-7 (default 0) + pgset "svlan_cfi 0" set canonical format identifier 0-1 (default 0) + + pgset "vlan_id 9999" > 4095 remove vlan and svlan tags + pgset "svlan 9999" > 4095 remove svlan tag + + + pgset "tos XX" set former IPv4 TOS field (e.g. "tos 28" for AF11 no ECN, default 00) + pgset "traffic_class XX" set former IPv6 TRAFFIC CLASS (e.g. "traffic_class B8" for EF no ECN, default 00) + + pgset stop aborts injection. Also, ^C aborts generator. + + +Example scripts +=============== + +A collection of small tutorial scripts for pktgen is in examples dir. + +pktgen.conf-1-1 # 1 CPU 1 dev +pktgen.conf-1-2 # 1 CPU 2 dev +pktgen.conf-2-1 # 2 CPU's 1 dev +pktgen.conf-2-2 # 2 CPU's 2 dev +pktgen.conf-1-1-rdos # 1 CPU 1 dev w. route DoS +pktgen.conf-1-1-ip6 # 1 CPU 1 dev ipv6 +pktgen.conf-1-1-ip6-rdos # 1 CPU 1 dev ipv6 w. route DoS +pktgen.conf-1-1-flows # 1 CPU 1 dev multiple flows. + +Run in shell: ./pktgen.conf-X-Y It does all the setup including sending. + + +Interrupt affinity +=================== +Note when adding devices to a specific CPU there good idea to also assign +/proc/irq/XX/smp_affinity so the TX-interrupts gets bound to the same CPU. +as this reduces cache bouncing when freeing skb's. + + +Current commands and configuration options +========================================== + +** Pgcontrol commands: + +start +stop + +** Thread commands: + +add_device +rem_device_all +max_before_softirq + + +** Device commands: + +count +clone_skb +debug + +frags +delay + +src_mac_count +dst_mac_count + +pkt_size +min_pkt_size +max_pkt_size + +mpls + +udp_src_min +udp_src_max + +udp_dst_min +udp_dst_max + +flag + IPSRC_RND + TXSIZE_RND + IPDST_RND + UDPSRC_RND + UDPDST_RND + MACSRC_RND + MACDST_RND + +dst_min +dst_max + +src_min +src_max + +dst_mac +src_mac + +clear_counters + +dst6 +src6 + +flows +flowlen + +References: +ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/ +ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/ + +Paper from Linux-Kongress in Erlangen 2004. +ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/pktgen_paper.pdf + +Thanks to: +Grant Grundler for testing on IA-64 and parisc, Harald Welte, Lennert Buytenhek +Stephen Hemminger, Andi Kleen, Dave Miller and many others. + + +Good luck with the linux net-development. diff --git a/Documentation/networking/policy-routing.txt b/Documentation/networking/policy-routing.txt new file mode 100644 index 0000000..36f6936 --- /dev/null +++ b/Documentation/networking/policy-routing.txt @@ -0,0 +1,150 @@ +Classes +------- + + "Class" is a complete routing table in common sense. + I.e. it is tree of nodes (destination prefix, tos, metric) + with attached information: gateway, device etc. + This tree is looked up as specified in RFC1812 5.2.4.3 + 1. Basic match + 2. Longest match + 3. Weak TOS. + 4. Metric. (should not be in kernel space, but they are) + 5. Additional pruning rules. (not in kernel space). + + We have two special type of nodes: + REJECT - abort route lookup and return an error value. + THROW - abort route lookup in this class. + + + Currently the number of classes is limited to 255 + (0 is reserved for "not specified class") + + Three classes are builtin: + + RT_CLASS_LOCAL=255 - local interface addresses, + broadcasts, nat addresses. + + RT_CLASS_MAIN=254 - all normal routes are put there + by default. + + RT_CLASS_DEFAULT=253 - if ip_fib_model==1, then + normal default routes are put there, if ip_fib_model==2 + all gateway routes are put there. + + +Rules +----- + Rule is a record of (src prefix, src interface, tos, dst prefix) + with attached information. + + Rule types: + RTP_ROUTE - lookup in attached class + RTP_NAT - lookup in attached class and if a match is found, + translate packet source address. + RTP_MASQUERADE - lookup in attached class and if a match is found, + masquerade packet as sourced by us. + RTP_DROP - silently drop the packet. + RTP_REJECT - drop the packet and send ICMP NET UNREACHABLE. + RTP_PROHIBIT - drop the packet and send ICMP COMM. ADM. PROHIBITED. + + Rule flags: + RTRF_LOG - log route creations. + RTRF_VALVE - One way route (used with masquerading) + +Default setup: + +root@amber:/pub/ip-routing # iproute -r +Kernel routing policy rules +Pref Source Destination TOS Iface Cl + 0 default default 00 * 255 + 254 default default 00 * 254 + 255 default default 00 * 253 + + +Lookup algorithm +---------------- + + We scan rules list, and if a rule is matched, apply it. + If a route is found, return it. + If it is not found or a THROW node was matched, continue + to scan rules. + +Applications +------------ + +1. Just ignore classes. All the routes are put into MAIN class + (and/or into DEFAULT class). + + HOWTO: iproute add PREFIX [ tos TOS ] [ gw GW ] [ dev DEV ] + [ metric METRIC ] [ reject ] ... (look at iproute utility) + + or use route utility from current net-tools. + +2. Opposite case. Just forget all that you know about routing + tables. Every rule is supplied with its own gateway, device + info. record. This approach is not appropriate for automated + route maintenance, but it is ideal for manual configuration. + + HOWTO: iproute addrule [ from PREFIX ] [ to PREFIX ] [ tos TOS ] + [ dev INPUTDEV] [ pref PREFERENCE ] route [ gw GATEWAY ] + [ dev OUTDEV ] ..... + + Warning: As of now the size of the routing table in this + approach is limited to 256. If someone likes this model, I'll + relax this limitation. + +3. OSPF classes (see RFC1583, RFC1812 E.3.3) + Very clean, stable and robust algorithm for OSPF routing + domains. Unfortunately, it is not widely used in the Internet. + + Proposed setup: + 255 local addresses + 254 interface routes + 253 ASE routes with external metric + 252 ASE routes with internal metric + 251 inter-area routes + 250 intra-area routes for 1st area + 249 intra-area routes for 2nd area + etc. + + Rules: + iproute addrule class 253 + iproute addrule class 252 + iproute addrule class 251 + iproute addrule to a-prefix-for-1st-area class 250 + iproute addrule to another-prefix-for-1st-area class 250 + ... + iproute addrule to a-prefix-for-2nd-area class 249 + ... + + Area classes must be terminated with reject record. + iproute add default reject class 250 + iproute add default reject class 249 + ... + +4. The Variant Router Requirements Algorithm (RFC1812 E.3.2) + Create 16 classes for different TOS values. + It is a funny, but pretty useless algorithm. + I listed it just to show the power of new routing code. + +5. All the variety of combinations...... + + +GATED +----- + + Gated does not understand classes, but it will work + happily in MAIN+DEFAULT. All policy routes can be set + and maintained manually. + +IMPORTANT NOTE +-------------- + route.c has a compilation time switch CONFIG_IP_LOCAL_RT_POLICY. + If it is set, locally originated packets are routed + using all the policy list. This is not very convenient and + pretty ambiguous when used with NAT and masquerading. + I set it to FALSE by default. + + +Alexey Kuznetov +kuznet@ms2.inr.ac.ru diff --git a/Documentation/networking/ppp_generic.txt b/Documentation/networking/ppp_generic.txt new file mode 100644 index 0000000..15b5172 --- /dev/null +++ b/Documentation/networking/ppp_generic.txt @@ -0,0 +1,432 @@ + PPP Generic Driver and Channel Interface + ---------------------------------------- + + Paul Mackerras + paulus@samba.org + 7 Feb 2002 + +The generic PPP driver in linux-2.4 provides an implementation of the +functionality which is of use in any PPP implementation, including: + +* the network interface unit (ppp0 etc.) +* the interface to the networking code +* PPP multilink: splitting datagrams between multiple links, and + ordering and combining received fragments +* the interface to pppd, via a /dev/ppp character device +* packet compression and decompression +* TCP/IP header compression and decompression +* detecting network traffic for demand dialling and for idle timeouts +* simple packet filtering + +For sending and receiving PPP frames, the generic PPP driver calls on +the services of PPP `channels'. A PPP channel encapsulates a +mechanism for transporting PPP frames from one machine to another. A +PPP channel implementation can be arbitrarily complex internally but +has a very simple interface with the generic PPP code: it merely has +to be able to send PPP frames, receive PPP frames, and optionally +handle ioctl requests. Currently there are PPP channel +implementations for asynchronous serial ports, synchronous serial +ports, and for PPP over ethernet. + +This architecture makes it possible to implement PPP multilink in a +natural and straightforward way, by allowing more than one channel to +be linked to each ppp network interface unit. The generic layer is +responsible for splitting datagrams on transmit and recombining them +on receive. + + +PPP channel API +--------------- + +See include/linux/ppp_channel.h for the declaration of the types and +functions used to communicate between the generic PPP layer and PPP +channels. + +Each channel has to provide two functions to the generic PPP layer, +via the ppp_channel.ops pointer: + +* start_xmit() is called by the generic layer when it has a frame to + send. The channel has the option of rejecting the frame for + flow-control reasons. In this case, start_xmit() should return 0 + and the channel should call the ppp_output_wakeup() function at a + later time when it can accept frames again, and the generic layer + will then attempt to retransmit the rejected frame(s). If the frame + is accepted, the start_xmit() function should return 1. + +* ioctl() provides an interface which can be used by a user-space + program to control aspects of the channel's behaviour. This + procedure will be called when a user-space program does an ioctl + system call on an instance of /dev/ppp which is bound to the + channel. (Usually it would only be pppd which would do this.) + +The generic PPP layer provides seven functions to channels: + +* ppp_register_channel() is called when a channel has been created, to + notify the PPP generic layer of its presence. For example, setting + a serial port to the PPPDISC line discipline causes the ppp_async + channel code to call this function. + +* ppp_unregister_channel() is called when a channel is to be + destroyed. For example, the ppp_async channel code calls this when + a hangup is detected on the serial port. + +* ppp_output_wakeup() is called by a channel when it has previously + rejected a call to its start_xmit function, and can now accept more + packets. + +* ppp_input() is called by a channel when it has received a complete + PPP frame. + +* ppp_input_error() is called by a channel when it has detected that a + frame has been lost or dropped (for example, because of a FCS (frame + check sequence) error). + +* ppp_channel_index() returns the channel index assigned by the PPP + generic layer to this channel. The channel should provide some way + (e.g. an ioctl) to transmit this back to user-space, as user-space + will need it to attach an instance of /dev/ppp to this channel. + +* ppp_unit_number() returns the unit number of the ppp network + interface to which this channel is connected, or -1 if the channel + is not connected. + +Connecting a channel to the ppp generic layer is initiated from the +channel code, rather than from the generic layer. The channel is +expected to have some way for a user-level process to control it +independently of the ppp generic layer. For example, with the +ppp_async channel, this is provided by the file descriptor to the +serial port. + +Generally a user-level process will initialize the underlying +communications medium and prepare it to do PPP. For example, with an +async tty, this can involve setting the tty speed and modes, issuing +modem commands, and then going through some sort of dialog with the +remote system to invoke PPP service there. We refer to this process +as `discovery'. Then the user-level process tells the medium to +become a PPP channel and register itself with the generic PPP layer. +The channel then has to report the channel number assigned to it back +to the user-level process. From that point, the PPP negotiation code +in the PPP daemon (pppd) can take over and perform the PPP +negotiation, accessing the channel through the /dev/ppp interface. + +At the interface to the PPP generic layer, PPP frames are stored in +skbuff structures and start with the two-byte PPP protocol number. +The frame does *not* include the 0xff `address' byte or the 0x03 +`control' byte that are optionally used in async PPP. Nor is there +any escaping of control characters, nor are there any FCS or framing +characters included. That is all the responsibility of the channel +code, if it is needed for the particular medium. That is, the skbuffs +presented to the start_xmit() function contain only the 2-byte +protocol number and the data, and the skbuffs presented to ppp_input() +must be in the same format. + +The channel must provide an instance of a ppp_channel struct to +represent the channel. The channel is free to use the `private' field +however it wishes. The channel should initialize the `mtu' and +`hdrlen' fields before calling ppp_register_channel() and not change +them until after ppp_unregister_channel() returns. The `mtu' field +represents the maximum size of the data part of the PPP frames, that +is, it does not include the 2-byte protocol number. + +If the channel needs some headroom in the skbuffs presented to it for +transmission (i.e., some space free in the skbuff data area before the +start of the PPP frame), it should set the `hdrlen' field of the +ppp_channel struct to the amount of headroom required. The generic +PPP layer will attempt to provide that much headroom but the channel +should still check if there is sufficient headroom and copy the skbuff +if there isn't. + +On the input side, channels should ideally provide at least 2 bytes of +headroom in the skbuffs presented to ppp_input(). The generic PPP +code does not require this but will be more efficient if this is done. + + +Buffering and flow control +-------------------------- + +The generic PPP layer has been designed to minimize the amount of data +that it buffers in the transmit direction. It maintains a queue of +transmit packets for the PPP unit (network interface device) plus a +queue of transmit packets for each attached channel. Normally the +transmit queue for the unit will contain at most one packet; the +exceptions are when pppd sends packets by writing to /dev/ppp, and +when the core networking code calls the generic layer's start_xmit() +function with the queue stopped, i.e. when the generic layer has +called netif_stop_queue(), which only happens on a transmit timeout. +The start_xmit function always accepts and queues the packet which it +is asked to transmit. + +Transmit packets are dequeued from the PPP unit transmit queue and +then subjected to TCP/IP header compression and packet compression +(Deflate or BSD-Compress compression), as appropriate. After this +point the packets can no longer be reordered, as the decompression +algorithms rely on receiving compressed packets in the same order that +they were generated. + +If multilink is not in use, this packet is then passed to the attached +channel's start_xmit() function. If the channel refuses to take +the packet, the generic layer saves it for later transmission. The +generic layer will call the channel's start_xmit() function again +when the channel calls ppp_output_wakeup() or when the core +networking code calls the generic layer's start_xmit() function +again. The generic layer contains no timeout and retransmission +logic; it relies on the core networking code for that. + +If multilink is in use, the generic layer divides the packet into one +or more fragments and puts a multilink header on each fragment. It +decides how many fragments to use based on the length of the packet +and the number of channels which are potentially able to accept a +fragment at the moment. A channel is potentially able to accept a +fragment if it doesn't have any fragments currently queued up for it +to transmit. The channel may still refuse a fragment; in this case +the fragment is queued up for the channel to transmit later. This +scheme has the effect that more fragments are given to higher- +bandwidth channels. It also means that under light load, the generic +layer will tend to fragment large packets across all the channels, +thus reducing latency, while under heavy load, packets will tend to be +transmitted as single fragments, thus reducing the overhead of +fragmentation. + + +SMP safety +---------- + +The PPP generic layer has been designed to be SMP-safe. Locks are +used around accesses to the internal data structures where necessary +to ensure their integrity. As part of this, the generic layer +requires that the channels adhere to certain requirements and in turn +provides certain guarantees to the channels. Essentially the channels +are required to provide the appropriate locking on the ppp_channel +structures that form the basis of the communication between the +channel and the generic layer. This is because the channel provides +the storage for the ppp_channel structure, and so the channel is +required to provide the guarantee that this storage exists and is +valid at the appropriate times. + +The generic layer requires these guarantees from the channel: + +* The ppp_channel object must exist from the time that + ppp_register_channel() is called until after the call to + ppp_unregister_channel() returns. + +* No thread may be in a call to any of ppp_input(), ppp_input_error(), + ppp_output_wakeup(), ppp_channel_index() or ppp_unit_number() for a + channel at the time that ppp_unregister_channel() is called for that + channel. + +* ppp_register_channel() and ppp_unregister_channel() must be called + from process context, not interrupt or softirq/BH context. + +* The remaining generic layer functions may be called at softirq/BH + level but must not be called from a hardware interrupt handler. + +* The generic layer may call the channel start_xmit() function at + softirq/BH level but will not call it at interrupt level. Thus the + start_xmit() function may not block. + +* The generic layer will only call the channel ioctl() function in + process context. + +The generic layer provides these guarantees to the channels: + +* The generic layer will not call the start_xmit() function for a + channel while any thread is already executing in that function for + that channel. + +* The generic layer will not call the ioctl() function for a channel + while any thread is already executing in that function for that + channel. + +* By the time a call to ppp_unregister_channel() returns, no thread + will be executing in a call from the generic layer to that channel's + start_xmit() or ioctl() function, and the generic layer will not + call either of those functions subsequently. + + +Interface to pppd +----------------- + +The PPP generic layer exports a character device interface called +/dev/ppp. This is used by pppd to control PPP interface units and +channels. Although there is only one /dev/ppp, each open instance of +/dev/ppp acts independently and can be attached either to a PPP unit +or a PPP channel. This is achieved using the file->private_data field +to point to a separate object for each open instance of /dev/ppp. In +this way an effect similar to Solaris' clone open is obtained, +allowing us to control an arbitrary number of PPP interfaces and +channels without having to fill up /dev with hundreds of device names. + +When /dev/ppp is opened, a new instance is created which is initially +unattached. Using an ioctl call, it can then be attached to an +existing unit, attached to a newly-created unit, or attached to an +existing channel. An instance attached to a unit can be used to send +and receive PPP control frames, using the read() and write() system +calls, along with poll() if necessary. Similarly, an instance +attached to a channel can be used to send and receive PPP frames on +that channel. + +In multilink terms, the unit represents the bundle, while the channels +represent the individual physical links. Thus, a PPP frame sent by a +write to the unit (i.e., to an instance of /dev/ppp attached to the +unit) will be subject to bundle-level compression and to fragmentation +across the individual links (if multilink is in use). In contrast, a +PPP frame sent by a write to the channel will be sent as-is on that +channel, without any multilink header. + +A channel is not initially attached to any unit. In this state it can +be used for PPP negotiation but not for the transfer of data packets. +It can then be connected to a PPP unit with an ioctl call, which +makes it available to send and receive data packets for that unit. + +The ioctl calls which are available on an instance of /dev/ppp depend +on whether it is unattached, attached to a PPP interface, or attached +to a PPP channel. The ioctl calls which are available on an +unattached instance are: + +* PPPIOCNEWUNIT creates a new PPP interface and makes this /dev/ppp + instance the "owner" of the interface. The argument should point to + an int which is the desired unit number if >= 0, or -1 to assign the + lowest unused unit number. Being the owner of the interface means + that the interface will be shut down if this instance of /dev/ppp is + closed. + +* PPPIOCATTACH attaches this instance to an existing PPP interface. + The argument should point to an int containing the unit number. + This does not make this instance the owner of the PPP interface. + +* PPPIOCATTCHAN attaches this instance to an existing PPP channel. + The argument should point to an int containing the channel number. + +The ioctl calls available on an instance of /dev/ppp attached to a +channel are: + +* PPPIOCDETACH detaches the instance from the channel. This ioctl is + deprecated since the same effect can be achieved by closing the + instance. In order to prevent possible races this ioctl will fail + with an EINVAL error if more than one file descriptor refers to this + instance (i.e. as a result of dup(), dup2() or fork()). + +* PPPIOCCONNECT connects this channel to a PPP interface. The + argument should point to an int containing the interface unit + number. It will return an EINVAL error if the channel is already + connected to an interface, or ENXIO if the requested interface does + not exist. + +* PPPIOCDISCONN disconnects this channel from the PPP interface that + it is connected to. It will return an EINVAL error if the channel + is not connected to an interface. + +* All other ioctl commands are passed to the channel ioctl() function. + +The ioctl calls that are available on an instance that is attached to +an interface unit are: + +* PPPIOCSMRU sets the MRU (maximum receive unit) for the interface. + The argument should point to an int containing the new MRU value. + +* PPPIOCSFLAGS sets flags which control the operation of the + interface. The argument should be a pointer to an int containing + the new flags value. The bits in the flags value that can be set + are: + SC_COMP_TCP enable transmit TCP header compression + SC_NO_TCP_CCID disable connection-id compression for + TCP header compression + SC_REJ_COMP_TCP disable receive TCP header decompression + SC_CCP_OPEN Compression Control Protocol (CCP) is + open, so inspect CCP packets + SC_CCP_UP CCP is up, may (de)compress packets + SC_LOOP_TRAFFIC send IP traffic to pppd + SC_MULTILINK enable PPP multilink fragmentation on + transmitted packets + SC_MP_SHORTSEQ expect short multilink sequence + numbers on received multilink fragments + SC_MP_XSHORTSEQ transmit short multilink sequence nos. + + The values of these flags are defined in <linux/if_ppp.h>. Note + that the values of the SC_MULTILINK, SC_MP_SHORTSEQ and + SC_MP_XSHORTSEQ bits are ignored if the CONFIG_PPP_MULTILINK option + is not selected. + +* PPPIOCGFLAGS returns the value of the status/control flags for the + interface unit. The argument should point to an int where the ioctl + will store the flags value. As well as the values listed above for + PPPIOCSFLAGS, the following bits may be set in the returned value: + SC_COMP_RUN CCP compressor is running + SC_DECOMP_RUN CCP decompressor is running + SC_DC_ERROR CCP decompressor detected non-fatal error + SC_DC_FERROR CCP decompressor detected fatal error + +* PPPIOCSCOMPRESS sets the parameters for packet compression or + decompression. The argument should point to a ppp_option_data + structure (defined in <linux/if_ppp.h>), which contains a + pointer/length pair which should describe a block of memory + containing a CCP option specifying a compression method and its + parameters. The ppp_option_data struct also contains a `transmit' + field. If this is 0, the ioctl will affect the receive path, + otherwise the transmit path. + +* PPPIOCGUNIT returns, in the int pointed to by the argument, the unit + number of this interface unit. + +* PPPIOCSDEBUG sets the debug flags for the interface to the value in + the int pointed to by the argument. Only the least significant bit + is used; if this is 1 the generic layer will print some debug + messages during its operation. This is only intended for debugging + the generic PPP layer code; it is generally not helpful for working + out why a PPP connection is failing. + +* PPPIOCGDEBUG returns the debug flags for the interface in the int + pointed to by the argument. + +* PPPIOCGIDLE returns the time, in seconds, since the last data + packets were sent and received. The argument should point to a + ppp_idle structure (defined in <linux/ppp_defs.h>). If the + CONFIG_PPP_FILTER option is enabled, the set of packets which reset + the transmit and receive idle timers is restricted to those which + pass the `active' packet filter. + +* PPPIOCSMAXCID sets the maximum connection-ID parameter (and thus the + number of connection slots) for the TCP header compressor and + decompressor. The lower 16 bits of the int pointed to by the + argument specify the maximum connection-ID for the compressor. If + the upper 16 bits of that int are non-zero, they specify the maximum + connection-ID for the decompressor, otherwise the decompressor's + maximum connection-ID is set to 15. + +* PPPIOCSNPMODE sets the network-protocol mode for a given network + protocol. The argument should point to an npioctl struct (defined + in <linux/if_ppp.h>). The `protocol' field gives the PPP protocol + number for the protocol to be affected, and the `mode' field + specifies what to do with packets for that protocol: + + NPMODE_PASS normal operation, transmit and receive packets + NPMODE_DROP silently drop packets for this protocol + NPMODE_ERROR drop packets and return an error on transmit + NPMODE_QUEUE queue up packets for transmit, drop received + packets + + At present NPMODE_ERROR and NPMODE_QUEUE have the same effect as + NPMODE_DROP. + +* PPPIOCGNPMODE returns the network-protocol mode for a given + protocol. The argument should point to an npioctl struct with the + `protocol' field set to the PPP protocol number for the protocol of + interest. On return the `mode' field will be set to the network- + protocol mode for that protocol. + +* PPPIOCSPASS and PPPIOCSACTIVE set the `pass' and `active' packet + filters. These ioctls are only available if the CONFIG_PPP_FILTER + option is selected. The argument should point to a sock_fprog + structure (defined in <linux/filter.h>) containing the compiled BPF + instructions for the filter. Packets are dropped if they fail the + `pass' filter; otherwise, if they fail the `active' filter they are + passed but they do not reset the transmit or receive idle timer. + +* PPPIOCSMRRU enables or disables multilink processing for received + packets and sets the multilink MRRU (maximum reconstructed receive + unit). The argument should point to an int containing the new MRRU + value. If the MRRU value is 0, processing of received multilink + fragments is disabled. This ioctl is only available if the + CONFIG_PPP_MULTILINK option is selected. + +Last modified: 7-feb-2002 diff --git a/Documentation/networking/proc_net_tcp.txt b/Documentation/networking/proc_net_tcp.txt new file mode 100644 index 0000000..4a79209 --- /dev/null +++ b/Documentation/networking/proc_net_tcp.txt @@ -0,0 +1,48 @@ +This document describes the interfaces /proc/net/tcp and /proc/net/tcp6. +Note that these interfaces are deprecated in favor of tcp_diag. + +These /proc interfaces provide information about currently active TCP +connections, and are implemented by tcp4_seq_show() in net/ipv4/tcp_ipv4.c +and tcp6_seq_show() in net/ipv6/tcp_ipv6.c, respectively. + +It will first list all listening TCP sockets, and next list all established +TCP connections. A typical entry of /proc/net/tcp would look like this (split +up into 3 parts because of the length of the line): + + 46: 010310AC:9C4C 030310AC:1770 01 + | | | | | |--> connection state + | | | | |------> remote TCP port number + | | | |-------------> remote IPv4 address + | | |--------------------> local TCP port number + | |---------------------------> local IPv4 address + |----------------------------------> number of entry + + 00000150:00000000 01:00000019 00000000 + | | | | |--> number of unrecovered RTO timeouts + | | | |----------> number of jiffies until timer expires + | | |----------------> timer_active (see below) + | |----------------------> receive-queue + |-------------------------------> transmit-queue + + 1000 0 54165785 4 cd1e6040 25 4 27 3 -1 + | | | | | | | | | |--> slow start size threshold, + | | | | | | | | | or -1 if the threshold + | | | | | | | | | is >= 0xFFFF + | | | | | | | | |----> sending congestion window + | | | | | | | |-------> (ack.quick<<1)|ack.pingpong + | | | | | | |---------> Predicted tick of soft clock + | | | | | | (delayed ACK control data) + | | | | | |------------> retransmit timeout + | | | | |------------------> location of socket in memory + | | | |-----------------------> socket reference count + | | |-----------------------------> inode + | |----------------------------------> unanswered 0-window probes + |---------------------------------------------> uid + +timer_active: + 0 no timer is pending + 1 retransmit-timer is pending + 2 another timer (e.g. delayed ack or keepalive) is pending + 3 this is a socket in TIME_WAIT state. Not all fields will contain + data (or even exist) + 4 zero window probe timer is pending diff --git a/Documentation/networking/radiotap-headers.txt b/Documentation/networking/radiotap-headers.txt new file mode 100644 index 0000000..953331c --- /dev/null +++ b/Documentation/networking/radiotap-headers.txt @@ -0,0 +1,152 @@ +How to use radiotap headers +=========================== + +Pointer to the radiotap include file +------------------------------------ + +Radiotap headers are variable-length and extensible, you can get most of the +information you need to know on them from: + +./include/net/ieee80211_radiotap.h + +This document gives an overview and warns on some corner cases. + + +Structure of the header +----------------------- + +There is a fixed portion at the start which contains a u32 bitmap that defines +if the possible argument associated with that bit is present or not. So if b0 +of the it_present member of ieee80211_radiotap_header is set, it means that +the header for argument index 0 (IEEE80211_RADIOTAP_TSFT) is present in the +argument area. + + < 8-byte ieee80211_radiotap_header > + [ <possible argument bitmap extensions ... > ] + [ <argument> ... ] + +At the moment there are only 13 possible argument indexes defined, but in case +we run out of space in the u32 it_present member, it is defined that b31 set +indicates that there is another u32 bitmap following (shown as "possible +argument bitmap extensions..." above), and the start of the arguments is moved +forward 4 bytes each time. + +Note also that the it_len member __le16 is set to the total number of bytes +covered by the ieee80211_radiotap_header and any arguments following. + + +Requirements for arguments +-------------------------- + +After the fixed part of the header, the arguments follow for each argument +index whose matching bit is set in the it_present member of +ieee80211_radiotap_header. + + - the arguments are all stored little-endian! + + - the argument payload for a given argument index has a fixed size. So + IEEE80211_RADIOTAP_TSFT being present always indicates an 8-byte argument is + present. See the comments in ./include/net/ieee80211_radiotap.h for a nice + breakdown of all the argument sizes + + - the arguments must be aligned to a boundary of the argument size using + padding. So a u16 argument must start on the next u16 boundary if it isn't + already on one, a u32 must start on the next u32 boundary and so on. + + - "alignment" is relative to the start of the ieee80211_radiotap_header, ie, + the first byte of the radiotap header. The absolute alignment of that first + byte isn't defined. So even if the whole radiotap header is starting at, eg, + address 0x00000003, still the first byte of the radiotap header is treated as + 0 for alignment purposes. + + - the above point that there may be no absolute alignment for multibyte + entities in the fixed radiotap header or the argument region means that you + have to take special evasive action when trying to access these multibyte + entities. Some arches like Blackfin cannot deal with an attempt to + dereference, eg, a u16 pointer that is pointing to an odd address. Instead + you have to use a kernel API get_unaligned() to dereference the pointer, + which will do it bytewise on the arches that require that. + + - The arguments for a given argument index can be a compound of multiple types + together. For example IEEE80211_RADIOTAP_CHANNEL has an argument payload + consisting of two u16s of total length 4. When this happens, the padding + rule is applied dealing with a u16, NOT dealing with a 4-byte single entity. + + +Example valid radiotap header +----------------------------- + + 0x00, 0x00, // <-- radiotap version + pad byte + 0x0b, 0x00, // <- radiotap header length + 0x04, 0x0c, 0x00, 0x00, // <-- bitmap + 0x6c, // <-- rate (in 500kHz units) + 0x0c, //<-- tx power + 0x01 //<-- antenna + + +Using the Radiotap Parser +------------------------- + +If you are having to parse a radiotap struct, you can radically simplify the +job by using the radiotap parser that lives in net/wireless/radiotap.c and has +its prototypes available in include/net/cfg80211.h. You use it like this: + +#include <net/cfg80211.h> + +/* buf points to the start of the radiotap header part */ + +int MyFunction(u8 * buf, int buflen) +{ + int pkt_rate_100kHz = 0, antenna = 0, pwr = 0; + struct ieee80211_radiotap_iterator iterator; + int ret = ieee80211_radiotap_iterator_init(&iterator, buf, buflen); + + while (!ret) { + + ret = ieee80211_radiotap_iterator_next(&iterator); + + if (ret) + continue; + + /* see if this argument is something we can use */ + + switch (iterator.this_arg_index) { + /* + * You must take care when dereferencing iterator.this_arg + * for multibyte types... the pointer is not aligned. Use + * get_unaligned((type *)iterator.this_arg) to dereference + * iterator.this_arg for type "type" safely on all arches. + */ + case IEEE80211_RADIOTAP_RATE: + /* radiotap "rate" u8 is in + * 500kbps units, eg, 0x02=1Mbps + */ + pkt_rate_100kHz = (*iterator.this_arg) * 5; + break; + + case IEEE80211_RADIOTAP_ANTENNA: + /* radiotap uses 0 for 1st ant */ + antenna = *iterator.this_arg); + break; + + case IEEE80211_RADIOTAP_DBM_TX_POWER: + pwr = *iterator.this_arg; + break; + + default: + break; + } + } /* while more rt headers */ + + if (ret != -ENOENT) + return TXRX_DROP; + + /* discard the radiotap header part */ + buf += iterator.max_length; + buflen -= iterator.max_length; + + ... + +} + +Andy Green <andy@warmcat.com> diff --git a/Documentation/networking/ray_cs.txt b/Documentation/networking/ray_cs.txt new file mode 100644 index 0000000..145d27a --- /dev/null +++ b/Documentation/networking/ray_cs.txt @@ -0,0 +1,150 @@ +September 21, 1999 + +Copyright (c) 1998 Corey Thomas (corey@world.std.com) + +This file is the documentation for the Raylink Wireless LAN card driver for +Linux. The Raylink wireless LAN card is a PCMCIA card which provides IEEE +802.11 compatible wireless network connectivity at 1 and 2 megabits/second. +See http://www.raytheon.com/micro/raylink/ for more information on the Raylink +card. This driver is in early development and does have bugs. See the known +bugs and limitations at the end of this document for more information. +This driver also works with WebGear's Aviator 2.4 and Aviator Pro +wireless LAN cards. + +As of kernel 2.3.18, the ray_cs driver is part of the Linux kernel +source. My web page for the development of ray_cs is at +http://world.std.com/~corey/raylink.html and I can be emailed at +corey@world.std.com + +The kernel driver is based on ray_cs-1.62.tgz + +The driver at my web page is intended to be used as an add on to +David Hinds pcmcia package. All the command line parameters are +available when compiled as a module. When built into the kernel, only +the essid= string parameter is available via the kernel command line. +This will change after the method of sorting out parameters for all +the PCMCIA drivers is agreed upon. If you must have a built in driver +with nondefault parameters, they can be edited in +/usr/src/linux/drivers/net/pcmcia/ray_cs.c. Searching for module_param +will find them all. + +Information on card services is available at: + http://pcmcia-cs.sourceforge.net/ + + +Card services user programs are still required for PCMCIA devices. +pcmcia-cs-3.1.1 or greater is required for the kernel version of +the driver. + +Currently, ray_cs is not part of David Hinds card services package, +so the following magic is required. + +At the end of the /etc/pcmcia/config.opts file, add the line: +source ./ray_cs.opts +This will make card services read the ray_cs.opts file +when starting. Create the file /etc/pcmcia/ray_cs.opts containing the +following: + +#### start of /etc/pcmcia/ray_cs.opts ################### +# Configuration options for Raylink Wireless LAN PCMCIA card +device "ray_cs" + class "network" module "misc/ray_cs" + +card "RayLink PC Card WLAN Adapter" + manfid 0x01a6, 0x0000 + bind "ray_cs" + +module "misc/ray_cs" opts "" +#### end of /etc/pcmcia/ray_cs.opts ##################### + + +To join an existing network with +different parameters, contact the network administrator for the +configuration information, and edit /etc/pcmcia/ray_cs.opts. +Add the parameters below between the empty quotes. + +Parameters for ray_cs driver which may be specified in ray_cs.opts: + +bc integer 0 = normal mode (802.11 timing) + 1 = slow down inter frame timing to allow + operation with older breezecom access + points. + +beacon_period integer beacon period in Kilo-microseconds + legal values = must be integer multiple + of hop dwell + default = 256 + +country integer 1 = USA (default) + 2 = Europe + 3 = Japan + 4 = Korea + 5 = Spain + 6 = France + 7 = Israel + 8 = Australia + +essid string ESS ID - network name to join + string with maximum length of 32 chars + default value = "ADHOC_ESSID" + +hop_dwell integer hop dwell time in Kilo-microseconds + legal values = 16,32,64,128(default),256 + +irq_mask integer linux standard 16 bit value 1bit/IRQ + lsb is IRQ 0, bit 1 is IRQ 1 etc. + Used to restrict choice of IRQ's to use. + Recommended method for controlling + interrupts is in /etc/pcmcia/config.opts + +net_type integer 0 (default) = adhoc network, + 1 = infrastructure + +phy_addr string string containing new MAC address in + hex, must start with x eg + x00008f123456 + +psm integer 0 = continuously active + 1 = power save mode (not useful yet) + +pc_debug integer (0-5) larger values for more verbose + logging. Replaces ray_debug. + +ray_debug integer Replaced with pc_debug + +ray_mem_speed integer defaults to 500 + +sniffer integer 0 = not sniffer (default) + 1 = sniffer which can be used to record all + network traffic using tcpdump or similar, + but no normal network use is allowed. + +translate integer 0 = no translation (encapsulate frames) + 1 = translation (RFC1042/802.1) + + +More on sniffer mode: + +tcpdump does not understand 802.11 headers, so it can't +interpret the contents, but it can record to a file. This is only +useful for debugging 802.11 lowlevel protocols that are not visible to +linux. If you want to watch ftp xfers, or do similar things, you +don't need to use sniffer mode. Also, some packet types are never +sent up by the card, so you will never see them (ack, rts, cts, probe +etc.) There is a simple program (showcap) included in the ray_cs +package which parses the 802.11 headers. + +Known Problems and missing features + + Does not work with non x86 + + Does not work with SMP + + Support for defragmenting frames is not yet debugged, and in + fact is known to not work. I have never encountered a net set + up to fragment, but still, it should be fixed. + + The ioctl support is incomplete. The hardware address cannot be set + using ifconfig yet. If a different hardware address is needed, it may + be set using the phy_addr parameter in ray_cs.opts. This requires + a card insertion to take effect. diff --git a/Documentation/networking/regulatory.txt b/Documentation/networking/regulatory.txt new file mode 100644 index 0000000..a96989a --- /dev/null +++ b/Documentation/networking/regulatory.txt @@ -0,0 +1,194 @@ +Linux wireless regulatory documentation +--------------------------------------- + +This document gives a brief review over how the Linux wireless +regulatory infrastructure works. + +More up to date information can be obtained at the project's web page: + +http://wireless.kernel.org/en/developers/Regulatory + +Keeping regulatory domains in userspace +--------------------------------------- + +Due to the dynamic nature of regulatory domains we keep them +in userspace and provide a framework for userspace to upload +to the kernel one regulatory domain to be used as the central +core regulatory domain all wireless devices should adhere to. + +How to get regulatory domains to the kernel +------------------------------------------- + +Userspace gets a regulatory domain in the kernel by having +a userspace agent build it and send it via nl80211. Only +expected regulatory domains will be respected by the kernel. + +A currently available userspace agent which can accomplish this +is CRDA - central regulatory domain agent. Its documented here: + +http://wireless.kernel.org/en/developers/Regulatory/CRDA + +Essentially the kernel will send a udev event when it knows +it needs a new regulatory domain. A udev rule can be put in place +to trigger crda to send the respective regulatory domain for a +specific ISO/IEC 3166 alpha2. + +Below is an example udev rule which can be used: + +# Example file, should be put in /etc/udev/rules.d/regulatory.rules +KERNEL=="regulatory*", ACTION=="change", SUBSYSTEM=="platform", RUN+="/sbin/crda" + +The alpha2 is passed as an environment variable under the variable COUNTRY. + +Who asks for regulatory domains? +-------------------------------- + +* Users + +Users can use iw: + +http://wireless.kernel.org/en/users/Documentation/iw + +An example: + + # set regulatory domain to "Costa Rica" + iw reg set CR + +This will request the kernel to set the regulatory domain to +the specificied alpha2. The kernel in turn will then ask userspace +to provide a regulatory domain for the alpha2 specified by the user +by sending a uevent. + +* Wireless subsystems for Country Information elements + +The kernel will send a uevent to inform userspace a new +regulatory domain is required. More on this to be added +as its integration is added. + +* Drivers + +If drivers determine they need a specific regulatory domain +set they can inform the wireless core using regulatory_hint(). +They have two options -- they either provide an alpha2 so that +crda can provide back a regulatory domain for that country or +they can build their own regulatory domain based on internal +custom knowledge so the wireless core can respect it. + +*Most* drivers will rely on the first mechanism of providing a +regulatory hint with an alpha2. For these drivers there is an additional +check that can be used to ensure compliance based on custom EEPROM +regulatory data. This additional check can be used by drivers by +registering on its struct wiphy a reg_notifier() callback. This notifier +is called when the core's regulatory domain has been changed. The driver +can use this to review the changes made and also review who made them +(driver, user, country IE) and determine what to allow based on its +internal EEPROM data. Devices drivers wishing to be capable of world +roaming should use this callback. More on world roaming will be +added to this document when its support is enabled. + +Device drivers who provide their own built regulatory domain +do not need a callback as the channels registered by them are +the only ones that will be allowed and therefore *additional* +cannels cannot be enabled. + +Example code - drivers hinting an alpha2: +------------------------------------------ + +This example comes from the zd1211rw device driver. You can start +by having a mapping of your device's EEPROM country/regulatory +domain value to to a specific alpha2 as follows: + +static struct zd_reg_alpha2_map reg_alpha2_map[] = { + { ZD_REGDOMAIN_FCC, "US" }, + { ZD_REGDOMAIN_IC, "CA" }, + { ZD_REGDOMAIN_ETSI, "DE" }, /* Generic ETSI, use most restrictive */ + { ZD_REGDOMAIN_JAPAN, "JP" }, + { ZD_REGDOMAIN_JAPAN_ADD, "JP" }, + { ZD_REGDOMAIN_SPAIN, "ES" }, + { ZD_REGDOMAIN_FRANCE, "FR" }, + +Then you can define a routine to map your read EEPROM value to an alpha2, +as follows: + +static int zd_reg2alpha2(u8 regdomain, char *alpha2) +{ + unsigned int i; + struct zd_reg_alpha2_map *reg_map; + for (i = 0; i < ARRAY_SIZE(reg_alpha2_map); i++) { + reg_map = ®_alpha2_map[i]; + if (regdomain == reg_map->reg) { + alpha2[0] = reg_map->alpha2[0]; + alpha2[1] = reg_map->alpha2[1]; + return 0; + } + } + return 1; +} + +Lastly, you can then hint to the core of your discovered alpha2, if a match +was found. You need to do this after you have registered your wiphy. You +are expected to do this during initialization. + + r = zd_reg2alpha2(mac->regdomain, alpha2); + if (!r) + regulatory_hint(hw->wiphy, alpha2, NULL); + +Example code - drivers providing a built in regulatory domain: +-------------------------------------------------------------- + +If you have regulatory information you can obtain from your +driver and you *need* to use this we let you build a regulatory domain +structure and pass it to the wireless core. To do this you should +kmalloc() a structure big enough to hold your regulatory domain +structure and you should then fill it with your data. Finally you simply +call regulatory_hint() with the regulatory domain structure in it. + +Bellow is a simple example, with a regulatory domain cached using the stack. +Your implementation may vary (read EEPROM cache instead, for example). + +Example cache of some regulatory domain + +struct ieee80211_regdomain mydriver_jp_regdom = { + .n_reg_rules = 3, + .alpha2 = "JP", + //.alpha2 = "99", /* If I have no alpha2 to map it to */ + .reg_rules = { + /* IEEE 802.11b/g, channels 1..14 */ + REG_RULE(2412-20, 2484+20, 40, 6, 20, 0), + /* IEEE 802.11a, channels 34..48 */ + REG_RULE(5170-20, 5240+20, 40, 6, 20, + NL80211_RRF_PASSIVE_SCAN), + /* IEEE 802.11a, channels 52..64 */ + REG_RULE(5260-20, 5320+20, 40, 6, 20, + NL80211_RRF_NO_IBSS | + NL80211_RRF_DFS), + } +}; + +Then in some part of your code after your wiphy has been registered: + + int r; + struct ieee80211_regdomain *rd; + int size_of_regd; + int num_rules = mydriver_jp_regdom.n_reg_rules; + unsigned int i; + + size_of_regd = sizeof(struct ieee80211_regdomain) + + (num_rules * sizeof(struct ieee80211_reg_rule)); + + rd = kzalloc(size_of_regd, GFP_KERNEL); + if (!rd) + return -ENOMEM; + + memcpy(rd, &mydriver_jp_regdom, sizeof(struct ieee80211_regdomain)); + + for (i=0; i < num_rules; i++) { + memcpy(&rd->reg_rules[i], &mydriver_jp_regdom.reg_rules[i], + sizeof(struct ieee80211_reg_rule)); + } + r = regulatory_hint(hw->wiphy, NULL, rd); + if (r) { + kfree(rd); + return r; + } + diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt new file mode 100644 index 0000000..c3669a3 --- /dev/null +++ b/Documentation/networking/rxrpc.txt @@ -0,0 +1,866 @@ + ====================== + RxRPC NETWORK PROTOCOL + ====================== + +The RxRPC protocol driver provides a reliable two-phase transport on top of UDP +that can be used to perform RxRPC remote operations. This is done over sockets +of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and +receive data, aborts and errors. + +Contents of this document: + + (*) Overview. + + (*) RxRPC protocol summary. + + (*) AF_RXRPC driver model. + + (*) Control messages. + + (*) Socket options. + + (*) Security. + + (*) Example client usage. + + (*) Example server usage. + + (*) AF_RXRPC kernel interface. + + +======== +OVERVIEW +======== + +RxRPC is a two-layer protocol. There is a session layer which provides +reliable virtual connections using UDP over IPv4 (or IPv6) as the transport +layer, but implements a real network protocol; and there's the presentation +layer which renders structured data to binary blobs and back again using XDR +(as does SunRPC): + + +-------------+ + | Application | + +-------------+ + | XDR | Presentation + +-------------+ + | RxRPC | Session + +-------------+ + | UDP | Transport + +-------------+ + + +AF_RXRPC provides: + + (1) Part of an RxRPC facility for both kernel and userspace applications by + making the session part of it a Linux network protocol (AF_RXRPC). + + (2) A two-phase protocol. The client transmits a blob (the request) and then + receives a blob (the reply), and the server receives the request and then + transmits the reply. + + (3) Retention of the reusable bits of the transport system set up for one call + to speed up subsequent calls. + + (4) A secure protocol, using the Linux kernel's key retention facility to + manage security on the client end. The server end must of necessity be + more active in security negotiations. + +AF_RXRPC does not provide XDR marshalling/presentation facilities. That is +left to the application. AF_RXRPC only deals in blobs. Even the operation ID +is just the first four bytes of the request blob, and as such is beyond the +kernel's interest. + + +Sockets of AF_RXRPC family are: + + (1) created as type SOCK_DGRAM; + + (2) provided with a protocol of the type of underlying transport they're going + to use - currently only PF_INET is supported. + + +The Andrew File System (AFS) is an example of an application that uses this and +that has both kernel (filesystem) and userspace (utility) components. + + +====================== +RXRPC PROTOCOL SUMMARY +====================== + +An overview of the RxRPC protocol: + + (*) RxRPC sits on top of another networking protocol (UDP is the only option + currently), and uses this to provide network transport. UDP ports, for + example, provide transport endpoints. + + (*) RxRPC supports multiple virtual "connections" from any given transport + endpoint, thus allowing the endpoints to be shared, even to the same + remote endpoint. + + (*) Each connection goes to a particular "service". A connection may not go + to multiple services. A service may be considered the RxRPC equivalent of + a port number. AF_RXRPC permits multiple services to share an endpoint. + + (*) Client-originating packets are marked, thus a transport endpoint can be + shared between client and server connections (connections have a + direction). + + (*) Up to a billion connections may be supported concurrently between one + local transport endpoint and one service on one remote endpoint. An RxRPC + connection is described by seven numbers: + + Local address } + Local port } Transport (UDP) address + Remote address } + Remote port } + Direction + Connection ID + Service ID + + (*) Each RxRPC operation is a "call". A connection may make up to four + billion calls, but only up to four calls may be in progress on a + connection at any one time. + + (*) Calls are two-phase and asymmetric: the client sends its request data, + which the service receives; then the service transmits the reply data + which the client receives. + + (*) The data blobs are of indefinite size, the end of a phase is marked with a + flag in the packet. The number of packets of data making up one blob may + not exceed 4 billion, however, as this would cause the sequence number to + wrap. + + (*) The first four bytes of the request data are the service operation ID. + + (*) Security is negotiated on a per-connection basis. The connection is + initiated by the first data packet on it arriving. If security is + requested, the server then issues a "challenge" and then the client + replies with a "response". If the response is successful, the security is + set for the lifetime of that connection, and all subsequent calls made + upon it use that same security. In the event that the server lets a + connection lapse before the client, the security will be renegotiated if + the client uses the connection again. + + (*) Calls use ACK packets to handle reliability. Data packets are also + explicitly sequenced per call. + + (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs. + A hard-ACK indicates to the far side that all the data received to a point + has been received and processed; a soft-ACK indicates that the data has + been received but may yet be discarded and re-requested. The sender may + not discard any transmittable packets until they've been hard-ACK'd. + + (*) Reception of a reply data packet implicitly hard-ACK's all the data + packets that make up the request. + + (*) An call is complete when the request has been sent, the reply has been + received and the final hard-ACK on the last packet of the reply has + reached the server. + + (*) An call may be aborted by either end at any time up to its completion. + + +===================== +AF_RXRPC DRIVER MODEL +===================== + +About the AF_RXRPC driver: + + (*) The AF_RXRPC protocol transparently uses internal sockets of the transport + protocol to represent transport endpoints. + + (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC + connections are handled transparently. One client socket may be used to + make multiple simultaneous calls to the same service. One server socket + may handle calls from many clients. + + (*) Additional parallel client connections will be initiated to support extra + concurrent calls, up to a tunable limit. + + (*) Each connection is retained for a certain amount of time [tunable] after + the last call currently using it has completed in case a new call is made + that could reuse it. + + (*) Each internal UDP socket is retained [tunable] for a certain amount of + time [tunable] after the last connection using it discarded, in case a new + connection is made that could use it. + + (*) A client-side connection is only shared between calls if they have have + the same key struct describing their security (and assuming the calls + would otherwise share the connection). Non-secured calls would also be + able to share connections with each other. + + (*) A server-side connection is shared if the client says it is. + + (*) ACK'ing is handled by the protocol driver automatically, including ping + replying. + + (*) SO_KEEPALIVE automatically pings the other side to keep the connection + alive [TODO]. + + (*) If an ICMP error is received, all calls affected by that error will be + aborted with an appropriate network error passed through recvmsg(). + + +Interaction with the user of the RxRPC socket: + + (*) A socket is made into a server socket by binding an address with a + non-zero service ID. + + (*) In the client, sending a request is achieved with one or more sendmsgs, + followed by the reply being received with one or more recvmsgs. + + (*) The first sendmsg for a request to be sent from a client contains a tag to + be used in all other sendmsgs or recvmsgs associated with that call. The + tag is carried in the control data. + + (*) connect() is used to supply a default destination address for a client + socket. This may be overridden by supplying an alternate address to the + first sendmsg() of a call (struct msghdr::msg_name). + + (*) If connect() is called on an unbound client, a random local port will + bound before the operation takes place. + + (*) A server socket may also be used to make client calls. To do this, the + first sendmsg() of the call must specify the target address. The server's + transport endpoint is used to send the packets. + + (*) Once the application has received the last message associated with a call, + the tag is guaranteed not to be seen again, and so it can be used to pin + client resources. A new call can then be initiated with the same tag + without fear of interference. + + (*) In the server, a request is received with one or more recvmsgs, then the + the reply is transmitted with one or more sendmsgs, and then the final ACK + is received with a last recvmsg. + + (*) When sending data for a call, sendmsg is given MSG_MORE if there's more + data to come on that call. + + (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more + data to come for that call. + + (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg + to indicate the terminal message for that call. + + (*) A call may be aborted by adding an abort control message to the control + data. Issuing an abort terminates the kernel's use of that call's tag. + Any messages waiting in the receive queue for that call will be discarded. + + (*) Aborts, busy notifications and challenge packets are delivered by recvmsg, + and control data messages will be set to indicate the context. Receiving + an abort or a busy message terminates the kernel's use of that call's tag. + + (*) The control data part of the msghdr struct is used for a number of things: + + (*) The tag of the intended or affected call. + + (*) Sending or receiving errors, aborts and busy notifications. + + (*) Notifications of incoming calls. + + (*) Sending debug requests and receiving debug replies [TODO]. + + (*) When the kernel has received and set up an incoming call, it sends a + message to server application to let it know there's a new call awaiting + its acceptance [recvmsg reports a special control message]. The server + application then uses sendmsg to assign a tag to the new call. Once that + is done, the first part of the request data will be delivered by recvmsg. + + (*) The server application has to provide the server socket with a keyring of + secret keys corresponding to the security types it permits. When a secure + connection is being set up, the kernel looks up the appropriate secret key + in the keyring and then sends a challenge packet to the client and + receives a response packet. The kernel then checks the authorisation of + the packet and either aborts the connection or sets up the security. + + (*) The name of the key a client will use to secure its communications is + nominated by a socket option. + + +Notes on recvmsg: + + (*) If there's a sequence of data messages belonging to a particular call on + the receive queue, then recvmsg will keep working through them until: + + (a) it meets the end of that call's received data, + + (b) it meets a non-data message, + + (c) it meets a message belonging to a different call, or + + (d) it fills the user buffer. + + If recvmsg is called in blocking mode, it will keep sleeping, awaiting the + reception of further data, until one of the above four conditions is met. + + (2) MSG_PEEK operates similarly, but will return immediately if it has put any + data in the buffer rather than sleeping until it can fill the buffer. + + (3) If a data message is only partially consumed in filling a user buffer, + then the remainder of that message will be left on the front of the queue + for the next taker. MSG_TRUNC will never be flagged. + + (4) If there is more data to be had on a call (it hasn't copied the last byte + of the last data message in that phase yet), then MSG_MORE will be + flagged. + + +================ +CONTROL MESSAGES +================ + +AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex +calls, to invoke certain actions and to report certain conditions. These are: + + MESSAGE ID SRT DATA MEANING + ======================= === =========== =============================== + RXRPC_USER_CALL_ID sr- User ID App's call specifier + RXRPC_ABORT srt Abort code Abort code to issue/received + RXRPC_ACK -rt n/a Final ACK received + RXRPC_NET_ERROR -rt error num Network error on call + RXRPC_BUSY -rt n/a Call rejected (server busy) + RXRPC_LOCAL_ERROR -rt error num Local error encountered + RXRPC_NEW_CALL -r- n/a New call received + RXRPC_ACCEPT s-- n/a Accept new call + + (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message) + + (*) RXRPC_USER_CALL_ID + + This is used to indicate the application's call ID. It's an unsigned long + that the app specifies in the client by attaching it to the first data + message or in the server by passing it in association with an RXRPC_ACCEPT + message. recvmsg() passes it in conjunction with all messages except + those of the RXRPC_NEW_CALL message. + + (*) RXRPC_ABORT + + This is can be used by an application to abort a call by passing it to + sendmsg, or it can be delivered by recvmsg to indicate a remote abort was + received. Either way, it must be associated with an RXRPC_USER_CALL_ID to + specify the call affected. If an abort is being sent, then error EBADSLT + will be returned if there is no call with that user ID. + + (*) RXRPC_ACK + + This is delivered to a server application to indicate that the final ACK + of a call was received from the client. It will be associated with an + RXRPC_USER_CALL_ID to indicate the call that's now complete. + + (*) RXRPC_NET_ERROR + + This is delivered to an application to indicate that an ICMP error message + was encountered in the process of trying to talk to the peer. An + errno-class integer value will be included in the control message data + indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call + affected. + + (*) RXRPC_BUSY + + This is delivered to a client application to indicate that a call was + rejected by the server due to the server being busy. It will be + associated with an RXRPC_USER_CALL_ID to indicate the rejected call. + + (*) RXRPC_LOCAL_ERROR + + This is delivered to an application to indicate that a local error was + encountered and that a call has been aborted because of it. An + errno-class integer value will be included in the control message data + indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call + affected. + + (*) RXRPC_NEW_CALL + + This is delivered to indicate to a server application that a new call has + arrived and is awaiting acceptance. No user ID is associated with this, + as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT. + + (*) RXRPC_ACCEPT + + This is used by a server application to attempt to accept a call and + assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID + to indicate the user ID to be assigned. If there is no call to be + accepted (it may have timed out, been aborted, etc.), then sendmsg will + return error ENODATA. If the user ID is already in use by another call, + then error EBADSLT will be returned. + + +============== +SOCKET OPTIONS +============== + +AF_RXRPC sockets support a few socket options at the SOL_RXRPC level: + + (*) RXRPC_SECURITY_KEY + + This is used to specify the description of the key to be used. The key is + extracted from the calling process's keyrings with request_key() and + should be of "rxrpc" type. + + The optval pointer points to the description string, and optlen indicates + how long the string is, without the NUL terminator. + + (*) RXRPC_SECURITY_KEYRING + + Similar to above but specifies a keyring of server secret keys to use (key + type "keyring"). See the "Security" section. + + (*) RXRPC_EXCLUSIVE_CONNECTION + + This is used to request that new connections should be used for each call + made subsequently on this socket. optval should be NULL and optlen 0. + + (*) RXRPC_MIN_SECURITY_LEVEL + + This is used to specify the minimum security level required for calls on + this socket. optval must point to an int containing one of the following + values: + + (a) RXRPC_SECURITY_PLAIN + + Encrypted checksum only. + + (b) RXRPC_SECURITY_AUTH + + Encrypted checksum plus packet padded and first eight bytes of packet + encrypted - which includes the actual packet length. + + (c) RXRPC_SECURITY_ENCRYPTED + + Encrypted checksum plus entire packet padded and encrypted, including + actual packet length. + + +======== +SECURITY +======== + +Currently, only the kerberos 4 equivalent protocol has been implemented +(security index 2 - rxkad). This requires the rxkad module to be loaded and, +on the client, tickets of the appropriate type to be obtained from the AFS +kaserver or the kerberos server and installed as "rxrpc" type keys. This is +normally done using the klog program. An example simple klog program can be +found at: + + http://people.redhat.com/~dhowells/rxrpc/klog.c + +The payload provided to add_key() on the client should be of the following +form: + + struct rxrpc_key_sec2_v1 { + uint16_t security_index; /* 2 */ + uint16_t ticket_length; /* length of ticket[] */ + uint32_t expiry; /* time at which expires */ + uint8_t kvno; /* key version number */ + uint8_t __pad[3]; + uint8_t session_key[8]; /* DES session key */ + uint8_t ticket[0]; /* the encrypted ticket */ + }; + +Where the ticket blob is just appended to the above structure. + + +For the server, keys of type "rxrpc_s" must be made available to the server. +They have a description of "<serviceID>:<securityIndex>" (eg: "52:2" for an +rxkad key for the AFS VL service). When such a key is created, it should be +given the server's secret key as the instantiation data (see the example +below). + + add_key("rxrpc_s", "52:2", secret_key, 8, keyring); + +A keyring is passed to the server socket by naming it in a sockopt. The server +socket then looks the server secret keys up in this keyring when secure +incoming connections are made. This can be seen in an example program that can +be found at: + + http://people.redhat.com/~dhowells/rxrpc/listen.c + + +==================== +EXAMPLE CLIENT USAGE +==================== + +A client would issue an operation by: + + (1) An RxRPC socket is set up by: + + client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); + + Where the third parameter indicates the protocol family of the transport + socket used - usually IPv4 but it can also be IPv6 [TODO]. + + (2) A local address can optionally be bound: + + struct sockaddr_rxrpc srx = { + .srx_family = AF_RXRPC, + .srx_service = 0, /* we're a client */ + .transport_type = SOCK_DGRAM, /* type of transport socket */ + .transport.sin_family = AF_INET, + .transport.sin_port = htons(7000), /* AFS callback */ + .transport.sin_address = 0, /* all local interfaces */ + }; + bind(client, &srx, sizeof(srx)); + + This specifies the local UDP port to be used. If not given, a random + non-privileged port will be used. A UDP port may be shared between + several unrelated RxRPC sockets. Security is handled on a basis of + per-RxRPC virtual connection. + + (3) The security is set: + + const char *key = "AFS:cambridge.redhat.com"; + setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key)); + + This issues a request_key() to get the key representing the security + context. The minimum security level can be set: + + unsigned int sec = RXRPC_SECURITY_ENCRYPTED; + setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL, + &sec, sizeof(sec)); + + (4) The server to be contacted can then be specified (alternatively this can + be done through sendmsg): + + struct sockaddr_rxrpc srx = { + .srx_family = AF_RXRPC, + .srx_service = VL_SERVICE_ID, + .transport_type = SOCK_DGRAM, /* type of transport socket */ + .transport.sin_family = AF_INET, + .transport.sin_port = htons(7005), /* AFS volume manager */ + .transport.sin_address = ..., + }; + connect(client, &srx, sizeof(srx)); + + (5) The request data should then be posted to the server socket using a series + of sendmsg() calls, each with the following control message attached: + + RXRPC_USER_CALL_ID - specifies the user ID for this call + + MSG_MORE should be set in msghdr::msg_flags on all but the last part of + the request. Multiple requests may be made simultaneously. + + If a call is intended to go to a destination other then the default + specified through connect(), then msghdr::msg_name should be set on the + first request message of that call. + + (6) The reply data will then be posted to the server socket for recvmsg() to + pick up. MSG_MORE will be flagged by recvmsg() if there's more reply data + for a particular call to be read. MSG_EOR will be set on the terminal + read for a call. + + All data will be delivered with the following control message attached: + + RXRPC_USER_CALL_ID - specifies the user ID for this call + + If an abort or error occurred, this will be returned in the control data + buffer instead, and MSG_EOR will be flagged to indicate the end of that + call. + + +==================== +EXAMPLE SERVER USAGE +==================== + +A server would be set up to accept operations in the following manner: + + (1) An RxRPC socket is created by: + + server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); + + Where the third parameter indicates the address type of the transport + socket used - usually IPv4. + + (2) Security is set up if desired by giving the socket a keyring with server + secret keys in it: + + keyring = add_key("keyring", "AFSkeys", NULL, 0, + KEY_SPEC_PROCESS_KEYRING); + + const char secret_key[8] = { + 0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 }; + add_key("rxrpc_s", "52:2", secret_key, 8, keyring); + + setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7); + + The keyring can be manipulated after it has been given to the socket. This + permits the server to add more keys, replace keys, etc. whilst it is live. + + (2) A local address must then be bound: + + struct sockaddr_rxrpc srx = { + .srx_family = AF_RXRPC, + .srx_service = VL_SERVICE_ID, /* RxRPC service ID */ + .transport_type = SOCK_DGRAM, /* type of transport socket */ + .transport.sin_family = AF_INET, + .transport.sin_port = htons(7000), /* AFS callback */ + .transport.sin_address = 0, /* all local interfaces */ + }; + bind(server, &srx, sizeof(srx)); + + (3) The server is then set to listen out for incoming calls: + + listen(server, 100); + + (4) The kernel notifies the server of pending incoming connections by sending + it a message for each. This is received with recvmsg() on the server + socket. It has no data, and has a single dataless control message + attached: + + RXRPC_NEW_CALL + + The address that can be passed back by recvmsg() at this point should be + ignored since the call for which the message was posted may have gone by + the time it is accepted - in which case the first call still on the queue + will be accepted. + + (5) The server then accepts the new call by issuing a sendmsg() with two + pieces of control data and no actual data: + + RXRPC_ACCEPT - indicate connection acceptance + RXRPC_USER_CALL_ID - specify user ID for this call + + (6) The first request data packet will then be posted to the server socket for + recvmsg() to pick up. At that point, the RxRPC address for the call can + be read from the address fields in the msghdr struct. + + Subsequent request data will be posted to the server socket for recvmsg() + to collect as it arrives. All but the last piece of the request data will + be delivered with MSG_MORE flagged. + + All data will be delivered with the following control message attached: + + RXRPC_USER_CALL_ID - specifies the user ID for this call + + (8) The reply data should then be posted to the server socket using a series + of sendmsg() calls, each with the following control messages attached: + + RXRPC_USER_CALL_ID - specifies the user ID for this call + + MSG_MORE should be set in msghdr::msg_flags on all but the last message + for a particular call. + + (9) The final ACK from the client will be posted for retrieval by recvmsg() + when it is received. It will take the form of a dataless message with two + control messages attached: + + RXRPC_USER_CALL_ID - specifies the user ID for this call + RXRPC_ACK - indicates final ACK (no data) + + MSG_EOR will be flagged to indicate that this is the final message for + this call. + +(10) Up to the point the final packet of reply data is sent, the call can be + aborted by calling sendmsg() with a dataless message with the following + control messages attached: + + RXRPC_USER_CALL_ID - specifies the user ID for this call + RXRPC_ABORT - indicates abort code (4 byte data) + + Any packets waiting in the socket's receive queue will be discarded if + this is issued. + +Note that all the communications for a particular service take place through +the one server socket, using control messages on sendmsg() and recvmsg() to +determine the call affected. + + +========================= +AF_RXRPC KERNEL INTERFACE +========================= + +The AF_RXRPC module also provides an interface for use by in-kernel utilities +such as the AFS filesystem. This permits such a utility to: + + (1) Use different keys directly on individual client calls on one socket + rather than having to open a whole slew of sockets, one for each key it + might want to use. + + (2) Avoid having RxRPC call request_key() at the point of issue of a call or + opening of a socket. Instead the utility is responsible for requesting a + key at the appropriate point. AFS, for instance, would do this during VFS + operations such as open() or unlink(). The key is then handed through + when the call is initiated. + + (3) Request the use of something other than GFP_KERNEL to allocate memory. + + (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be + intercepted before they get put into the socket Rx queue and the socket + buffers manipulated directly. + +To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket, +bind an address as appropriate and listen if it's to be a server socket, but +then it passes this to the kernel interface functions. + +The kernel interface functions are as follows: + + (*) Begin a new client call. + + struct rxrpc_call * + rxrpc_kernel_begin_call(struct socket *sock, + struct sockaddr_rxrpc *srx, + struct key *key, + unsigned long user_call_ID, + gfp_t gfp); + + This allocates the infrastructure to make a new RxRPC call and assigns + call and connection numbers. The call will be made on the UDP port that + the socket is bound to. The call will go to the destination address of a + connected client socket unless an alternative is supplied (srx is + non-NULL). + + If a key is supplied then this will be used to secure the call instead of + the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls + secured in this way will still share connections if at all possible. + + The user_call_ID is equivalent to that supplied to sendmsg() in the + control data buffer. It is entirely feasible to use this to point to a + kernel data structure. + + If this function is successful, an opaque reference to the RxRPC call is + returned. The caller now holds a reference on this and it must be + properly ended. + + (*) End a client call. + + void rxrpc_kernel_end_call(struct rxrpc_call *call); + + This is used to end a previously begun call. The user_call_ID is expunged + from AF_RXRPC's knowledge and will not be seen again in association with + the specified call. + + (*) Send data through a call. + + int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg, + size_t len); + + This is used to supply either the request part of a client call or the + reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the + data buffers to be used. msg_iov may not be NULL and must point + exclusively to in-kernel virtual addresses. msg.msg_flags may be given + MSG_MORE if there will be subsequent data sends for this call. + + The msg must not specify a destination address, control data or any flags + other than MSG_MORE. len is the total amount of data to transmit. + + (*) Abort a call. + + void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code); + + This is used to abort a call if it's still in an abortable state. The + abort code specified will be placed in the ABORT message sent. + + (*) Intercept received RxRPC messages. + + typedef void (*rxrpc_interceptor_t)(struct sock *sk, + unsigned long user_call_ID, + struct sk_buff *skb); + + void + rxrpc_kernel_intercept_rx_messages(struct socket *sock, + rxrpc_interceptor_t interceptor); + + This installs an interceptor function on the specified AF_RXRPC socket. + All messages that would otherwise wind up in the socket's Rx queue are + then diverted to this function. Note that care must be taken to process + the messages in the right order to maintain DATA message sequentiality. + + The interceptor function itself is provided with the address of the socket + and handling the incoming message, the ID assigned by the kernel utility + to the call and the socket buffer containing the message. + + The skb->mark field indicates the type of message: + + MARK MEANING + =============================== ======================================= + RXRPC_SKB_MARK_DATA Data message + RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call + RXRPC_SKB_MARK_BUSY Client call rejected as server busy + RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer + RXRPC_SKB_MARK_NET_ERROR Network error detected + RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered + RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance + + The remote abort message can be probed with rxrpc_kernel_get_abort_code(). + The two error messages can be probed with rxrpc_kernel_get_error_number(). + A new call can be accepted with rxrpc_kernel_accept_call(). + + Data messages can have their contents extracted with the usual bunch of + socket buffer manipulation functions. A data message can be determined to + be the last one in a sequence with rxrpc_kernel_is_data_last(). When a + data message has been used up, rxrpc_kernel_data_delivered() should be + called on it.. + + Non-data messages should be handled to rxrpc_kernel_free_skb() to dispose + of. It is possible to get extra refs on all types of message for later + freeing, but this may pin the state of a call until the message is finally + freed. + + (*) Accept an incoming call. + + struct rxrpc_call * + rxrpc_kernel_accept_call(struct socket *sock, + unsigned long user_call_ID); + + This is used to accept an incoming call and to assign it a call ID. This + function is similar to rxrpc_kernel_begin_call() and calls accepted must + be ended in the same way. + + If this function is successful, an opaque reference to the RxRPC call is + returned. The caller now holds a reference on this and it must be + properly ended. + + (*) Reject an incoming call. + + int rxrpc_kernel_reject_call(struct socket *sock); + + This is used to reject the first incoming call on the socket's queue with + a BUSY message. -ENODATA is returned if there were no incoming calls. + Other errors may be returned if the call had been aborted (-ECONNABORTED) + or had timed out (-ETIME). + + (*) Record the delivery of a data message and free it. + + void rxrpc_kernel_data_delivered(struct sk_buff *skb); + + This is used to record a data message as having been delivered and to + update the ACK state for the call. The socket buffer will be freed. + + (*) Free a message. + + void rxrpc_kernel_free_skb(struct sk_buff *skb); + + This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC + socket. + + (*) Determine if a data message is the last one on a call. + + bool rxrpc_kernel_is_data_last(struct sk_buff *skb); + + This is used to determine if a socket buffer holds the last data message + to be received for a call (true will be returned if it does, false + if not). + + The data message will be part of the reply on a client call and the + request on an incoming call. In the latter case there will be more + messages, but in the former case there will not. + + (*) Get the abort code from an abort message. + + u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb); + + This is used to extract the abort code from a remote abort message. + + (*) Get the error number from a local or network error message. + + int rxrpc_kernel_get_error_number(struct sk_buff *skb); + + This is used to extract the error number from a message indicating either + a local error occurred or a network error occurred. + + (*) Allocate a null key for doing anonymous security. + + struct key *rxrpc_get_null_key(const char *keyname); + + This is used to allocate a null RxRPC key that can be used to indicate + anonymous security for a particular domain. diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt new file mode 100644 index 0000000..c3d6b4d --- /dev/null +++ b/Documentation/networking/s2io.txt @@ -0,0 +1,150 @@ +Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver. + +Contents +======= +- 1. Introduction +- 2. Identifying the adapter/interface +- 3. Features supported +- 4. Command line parameters +- 5. Performance suggestions +- 6. Available Downloads + + +1. Introduction: +This Linux driver supports Neterion's Xframe I PCI-X 1.0 and +Xframe II PCI-X 2.0 adapters. It supports several features +such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on. +See below for complete list of features. +All features are supported for both IPv4 and IPv6. + +2. Identifying the adapter/interface: +a. Insert the adapter(s) in your system. +b. Build and load driver +# insmod s2io.ko +c. View log messages +# dmesg | tail -40 +You will see messages similar to: +eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA +eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA +eth4: Device is on 64 bit 133MHz PCIX(M1) bus + +The above messages identify the adapter type(Xframe I/II), adapter revision, +driver version, interface name(eth3, eth4), Interrupt type(INTA, MSI, MSI-X). +In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed +as well. + +To associate an interface with a physical adapter use "ethtool -p <ethX>". +The corresponding adapter's LED will blink multiple times. + +3. Features supported: +a. Jumbo frames. Xframe I/II supports MTU upto 9600 bytes, +modifiable using ifconfig command. + +b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit +and receive, TSO. + +c. Multi-buffer receive mode. Scattering of packet across multiple +buffers. Currently driver supports 2-buffer mode which yields +significant performance improvement on certain platforms(SGI Altix, +IBM xSeries). + +d. MSI/MSI-X. Can be enabled on platforms which support this feature +(IA64, Xeon) resulting in noticeable performance improvement(upto 7% +on certain platforms). + +e. Statistics. Comprehensive MAC-level and software statistics displayed +using "ethtool -S" option. + +f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings, +with multiple steering options. + +4. Command line parameters +a. tx_fifo_num +Number of transmit queues +Valid range: 1-8 +Default: 1 + +b. rx_ring_num +Number of receive rings +Valid range: 1-8 +Default: 1 + +c. tx_fifo_len +Size of each transmit queue +Valid range: Total length of all queues should not exceed 8192 +Default: 4096 + +d. rx_ring_sz +Size of each receive ring(in 4K blocks) +Valid range: Limited by memory on system +Default: 30 + +e. intr_type +Specifies interrupt type. Possible values 0(INTA), 2(MSI-X) +Valid values: 0, 2 +Default: 2 + +5. Performance suggestions +General: +a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration) +b. Set TCP windows size to optimal value. +For instance, for MTU=1500 a value of 210K has been observed to result in +good performance. +# sysctl -w net.ipv4.tcp_rmem="210000 210000 210000" +# sysctl -w net.ipv4.tcp_wmem="210000 210000 210000" +For MTU=9000, TCP window size of 10 MB is recommended. +# sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000" +# sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000" + +Transmit performance: +a. By default, the driver respects BIOS settings for PCI bus parameters. +However, you may want to experiment with PCI bus parameters +max-split-transactions(MOST) and MMRBC (use setpci command). +A MOST value of 2 has been found optimal for Opterons and 3 for Itanium. +It could be different for your hardware. +Set MMRBC to 4K**. + +For example you can set +For opteron +#setpci -d 17d5:* 62=1d +For Itanium +#setpci -d 17d5:* 62=3d + +For detailed description of the PCI registers, please see Xframe User Guide. + +b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this +parameter. +c. Turn on TSO(using "ethtool -K") +# ethtool -K <ethX> tso on + +Receive performance: +a. By default, the driver respects BIOS settings for PCI bus parameters. +However, you may want to set PCI latency timer to 248. +#setpci -d 17d5:* LATENCY_TIMER=f8 +For detailed description of the PCI registers, please see Xframe User Guide. +b. Use 2-buffer mode. This results in large performance boost on +certain platforms(eg. SGI Altix, IBM xSeries). +c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to +set/verify this option. +d. Enable NAPI feature(in kernel configuration Device Drivers ---> Network +device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to +bring down CPU utilization. + +** For AMD opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are +recommended as safe parameters. +For more information, please review the AMD8131 errata at +http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26310.pdf + +6. Available Downloads +Neterion "s2io" driver in Red Hat and Suse 2.6-based distributions is kept up +to date, also the latest "s2io" code (including support for 2.4 kernels) is +available via "Support" link on the Neterion site: http://www.neterion.com. + +For Xframe User Guide (Programming manual), visit ftp site ns1.s2io.com, +user: linuxdocs password: HALdocs + +7. Support +For further support please contact either your 10GbE Xframe NIC vendor (IBM, +HP, SGI etc.) or click on the "Support" link on the Neterion site: +http://www.neterion.com. + diff --git a/Documentation/networking/sctp.txt b/Documentation/networking/sctp.txt new file mode 100644 index 0000000..0c790a7 --- /dev/null +++ b/Documentation/networking/sctp.txt @@ -0,0 +1,38 @@ +Linux Kernel SCTP + +This is the current BETA release of the Linux Kernel SCTP reference +implementation. + +SCTP (Stream Control Transmission Protocol) is a IP based, message oriented, +reliable transport protocol, with congestion control, support for +transparent multi-homing, and multiple ordered streams of messages. +RFC2960 defines the core protocol. The IETF SIGTRAN working group originally +developed the SCTP protocol and later handed the protocol over to the +Transport Area (TSVWG) working group for the continued evolvement of SCTP as a +general purpose transport. + +See the IETF website (http://www.ietf.org) for further documents on SCTP. +See http://www.ietf.org/rfc/rfc2960.txt + +The initial project goal is to create an Linux kernel reference implementation +of SCTP that is RFC 2960 compliant and provides an programming interface +referred to as the UDP-style API of the Sockets Extensions for SCTP, as +proposed in IETF Internet-Drafts. + + +Caveats: + +-lksctp can be built as statically or as a module. However, be aware that +module removal of lksctp is not yet a safe activity. + +-There is tentative support for IPv6, but most work has gone towards +implementation and testing lksctp on IPv4. + + +For more information, please visit the lksctp project website: + http://www.sf.net/projects/lksctp + +Or contact the lksctp developers through the mailing list: + <lksctp-developers@lists.sourceforge.net> + + diff --git a/Documentation/networking/secid.txt b/Documentation/networking/secid.txt new file mode 100644 index 0000000..95ea067 --- /dev/null +++ b/Documentation/networking/secid.txt @@ -0,0 +1,14 @@ +flowi structure: + +The secid member in the flow structure is used in LSMs (e.g. SELinux) to indicate +the label of the flow. This label of the flow is currently used in selecting +matching labeled xfrm(s). + +If this is an outbound flow, the label is derived from the socket, if any, or +the incoming packet this flow is being generated as a response to (e.g. tcp +resets, timewait ack, etc.). It is also conceivable that the label could be +derived from other sources such as process context, device, etc., in special +cases, as may be appropriate. + +If this is an inbound flow, the label is derived from the IPSec security +associations, if any, used by the packet. diff --git a/Documentation/networking/skfp.txt b/Documentation/networking/skfp.txt new file mode 100644 index 0000000..abfddf8 --- /dev/null +++ b/Documentation/networking/skfp.txt @@ -0,0 +1,220 @@ +(C)Copyright 1998-2000 SysKonnect, +=========================================================================== + +skfp.txt created 11-May-2000 + +Readme File for skfp.o v2.06 + + +This file contains +(1) OVERVIEW +(2) SUPPORTED ADAPTERS +(3) GENERAL INFORMATION +(4) INSTALLATION +(5) INCLUSION OF THE ADAPTER IN SYSTEM START +(6) TROUBLESHOOTING +(7) FUNCTION OF THE ADAPTER LEDS +(8) HISTORY + +=========================================================================== + + + +(1) OVERVIEW +============ + +This README explains how to use the driver 'skfp' for Linux with your +network adapter. + +Chapter 2: Contains a list of all network adapters that are supported by + this driver. + +Chapter 3: Gives some general information. + +Chapter 4: Describes common problems and solutions. + +Chapter 5: Shows the changed functionality of the adapter LEDs. + +Chapter 6: History of development. + +*** + + +(2) SUPPORTED ADAPTERS +====================== + +The network driver 'skfp' supports the following network adapters: +SysKonnect adapters: + - SK-5521 (SK-NET FDDI-UP) + - SK-5522 (SK-NET FDDI-UP DAS) + - SK-5541 (SK-NET FDDI-FP) + - SK-5543 (SK-NET FDDI-LP) + - SK-5544 (SK-NET FDDI-LP DAS) + - SK-5821 (SK-NET FDDI-UP64) + - SK-5822 (SK-NET FDDI-UP64 DAS) + - SK-5841 (SK-NET FDDI-FP64) + - SK-5843 (SK-NET FDDI-LP64) + - SK-5844 (SK-NET FDDI-LP64 DAS) +Compaq adapters (not tested): + - Netelligent 100 FDDI DAS Fibre SC + - Netelligent 100 FDDI SAS Fibre SC + - Netelligent 100 FDDI DAS UTP + - Netelligent 100 FDDI SAS UTP + - Netelligent 100 FDDI SAS Fibre MIC +*** + + +(3) GENERAL INFORMATION +======================= + +From v2.01 on, the driver is integrated in the linux kernel sources. +Therefor, the installation is the same as for any other adapter +supported by the kernel. +Refer to the manual of your distribution about the installation +of network adapters. +Makes my life much easier :-) +*** + + +(4) TROUBLESHOOTING +=================== + +If you run into problems during installation, check those items: + +Problem: The FDDI adapter cannot be found by the driver. +Reason: Look in /proc/pci for the following entry: + 'FDDI network controller: SysKonnect SK-FDDI-PCI ...' + If this entry exists, then the FDDI adapter has been + found by the system and should be able to be used. + If this entry does not exist or if the file '/proc/pci' + is not there, then you may have a hardware problem or PCI + support may not be enabled in your kernel. + The adapter can be checked using the diagnostic program + which is available from the SysKonnect web site: + www.syskonnect.de + Some COMPAQ machines have a problem with PCI under + Linux. This is described in the 'PCI howto' document + (included in some distributions or available from the + www, e.g. at 'www.linux.org') and no workaround is available. + +Problem: You want to use your computer as a router between + multiple IP subnetworks (using multiple adapters), but + you cannot reach computers in other subnetworks. +Reason: Either the router's kernel is not configured for IP + forwarding or there is a problem with the routing table + and gateway configuration in at least one of the + computers. + +If your problem is not listed here, please contact our +technical support for help. +You can send email to: + linux@syskonnect.de +When contacting our technical support, +please ensure that the following information is available: +- System Manufacturer and Model +- Boards in your system +- Distribution +- Kernel version + +*** + + +(5) FUNCTION OF THE ADAPTER LEDS +================================ + + The functionality of the LED's on the FDDI network adapters was + changed in SMT version v2.82. With this new SMT version, the yellow + LED works as a ring operational indicator. An active yellow LED + indicates that the ring is down. The green LED on the adapter now + works as a link indicator where an active GREEN LED indicates that + the respective port has a physical connection. + + With versions of SMT prior to v2.82 a ring up was indicated if the + yellow LED was off while the green LED(s) showed the connection + status of the adapter. During a ring down the green LED was off and + the yellow LED was on. + + All implementations indicate that a driver is not loaded if + all LEDs are off. + +*** + + +(6) HISTORY +=========== + +v2.06 (20000511) (In-Kernel version) + New features: + - 64 bit support + - new pci dma interface + - in kernel 2.3.99 + +v2.05 (20000217) (In-Kernel version) + New features: + - Changes for 2.3.45 kernel + +v2.04 (20000207) (Standalone version) + New features: + - Added rx/tx byte counter + +v2.03 (20000111) (Standalone version) + Problems fixed: + - Fixed printk statements from v2.02 + +v2.02 (991215) (Standalone version) + Problems fixed: + - Removed unnecessary output + - Fixed path for "printver.sh" in makefile + +v2.01 (991122) (In-Kernel version) + New features: + - Integration in Linux kernel sources + - Support for memory mapped I/O. + +v2.00 (991112) + New features: + - Full source released under GPL + +v1.05 (991023) + Problems fixed: + - Compilation with kernel version 2.2.13 failed + +v1.04 (990427) + Changes: + - New SMT module included, changing LED functionality + Problems fixed: + - Synchronization on SMP machines was buggy + +v1.03 (990325) + Problems fixed: + - Interrupt routing on SMP machines could be incorrect + +v1.02 (990310) + New features: + - Support for kernel versions 2.2.x added + - Kernel patch instead of private duplicate of kernel functions + +v1.01 (980812) + Problems fixed: + Connection hangup with telnet + Slow telnet connection + +v1.00 beta 01 (980507) + New features: + None. + Problems fixed: + None. + Known limitations: + - tar archive instead of standard package format (rpm). + - FDDI statistic is empty. + - not tested with 2.1.xx kernels + - integration in kernel not tested + - not tested simultaneously with FDDI adapters from other vendors. + - only X86 processors supported. + - SBA (Synchronous Bandwidth Allocator) parameters can + not be configured. + - does not work on some COMPAQ machines. See the PCI howto + document for details about this problem. + - data corruption with kernel versions below 2.0.33. + +*** End of information file *** diff --git a/Documentation/networking/smc9.txt b/Documentation/networking/smc9.txt new file mode 100644 index 0000000..d1e1507 --- /dev/null +++ b/Documentation/networking/smc9.txt @@ -0,0 +1,42 @@ + +SMC 9xxxx Driver +Revision 0.12 +3/5/96 +Copyright 1996 Erik Stahlman +Released under terms of the GNU General Public License. + +This file contains the instructions and caveats for my SMC9xxx driver. You +should not be using the driver without reading this file. + +Things to note about installation: + + 1. The driver should work on all kernels from 1.2.13 until 1.3.71. + (A kernel patch is supplied for 1.3.71 ) + + 2. If you include this into the kernel, you might need to change some + options, such as for forcing IRQ. + + + 3. To compile as a module, run 'make' . + Make will give you the appropriate options for various kernel support. + + 4. Loading the driver as a module : + + use: insmod smc9194.o + optional parameters: + io=xxxx : your base address + irq=xx : your irq + ifport=x : 0 for whatever is default + 1 for twisted pair + 2 for AUI ( or BNC on some cards ) + +How to obtain the latest version? + +FTP: + ftp://fenris.campus.vt.edu/smc9/smc9-12.tar.gz + ftp://sfbox.vt.edu/filebox/F/fenris/smc9/smc9-12.tar.gz + + +Contacting me: + erik@mail.vt.edu + diff --git a/Documentation/networking/smctr.txt b/Documentation/networking/smctr.txt new file mode 100644 index 0000000..9af25b8 --- /dev/null +++ b/Documentation/networking/smctr.txt @@ -0,0 +1,66 @@ +Text File for the SMC TokenCard TokenRing Linux driver (smctr.c). + By Jay Schulist <jschlst@samba.org> + +The Linux SMC Token Ring driver works with the SMC TokenCard Elite (8115T) +ISA and SMC TokenCard Elite/A (8115T/A) MCA adapters. + +Latest information on this driver can be obtained on the Linux-SNA WWW site. +Please point your browser to: http://www.linux-sna.org + +This driver is rather simple to use. Select Y to Token Ring adapter support +in the kernel configuration. A choice for SMC Token Ring adapters will +appear. This drives supports all SMC ISA/MCA adapters. Choose this +option. I personally recommend compiling the driver as a module (M), but if you +you would like to compile it statically answer Y instead. + +This driver supports multiple adapters without the need to load multiple copies +of the driver. You should be able to load up to 7 adapters without any kernel +modifications, if you are in need of more please contact the maintainer of this +driver. + +Load the driver either by lilo/loadlin or as a module. When a module using the +following command will suffice for most: + +# modprobe smctr +smctr.c: v1.00 12/6/99 by jschlst@samba.org +tr0: SMC TokenCard 8115T at Io 0x300, Irq 10, Rom 0xd8000, Ram 0xcc000. + +Now just setup the device via ifconfig and set and routes you may have. After +this you are ready to start sending some tokens. + +Errata: +1). For anyone wondering where to pick up the SMC adapters please browse + to http://www.smc.com + +2). If you are the first/only Token Ring Client on a Token Ring LAN, please + specify the ringspeed with the ringspeed=[4/16] module option. If no + ringspeed is specified the driver will attempt to autodetect the ring + speed and/or if the adapter is the first/only station on the ring take + the appropriate actions. + + NOTE: Default ring speed is 16MB UTP. + +3). PnP support for this adapter sucks. I recommend hard setting the + IO/MEM/IRQ by the jumpers on the adapter. If this is not possible + load the module with the following io=[ioaddr] mem=[mem_addr] + irq=[irq_num]. + + The following IRQ, IO, and MEM settings are supported. + + IO ports: + 0x200, 0x220, 0x240, 0x260, 0x280, 0x2A0, 0x2C0, 0x2E0, 0x300, + 0x320, 0x340, 0x360, 0x380. + + IRQs: + 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15 + + Memory addresses: + 0xA0000, 0xA4000, 0xA8000, 0xAC000, 0xB0000, 0xB4000, + 0xB8000, 0xBC000, 0xC0000, 0xC4000, 0xC8000, 0xCC000, + 0xD0000, 0xD4000, 0xD8000, 0xDC000, 0xE0000, 0xE4000, + 0xE8000, 0xEC000, 0xF0000, 0xF4000, 0xF8000, 0xFC000 + +This driver is under the GNU General Public License. Its Firmware image is +included as an initialized C-array and is licensed by SMC to the Linux +users of this driver. However no warranty about its fitness is expressed or +implied by SMC. diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt new file mode 100644 index 0000000..4b4adb8 --- /dev/null +++ b/Documentation/networking/spider_net.txt @@ -0,0 +1,204 @@ + + The Spidernet Device Driver + =========================== + +Written by Linas Vepstas <linas@austin.ibm.com> + +Version of 7 June 2007 + +Abstract +======== +This document sketches the structure of portions of the spidernet +device driver in the Linux kernel tree. The spidernet is a gigabit +ethernet device built into the Toshiba southbridge commonly used +in the SONY Playstation 3 and the IBM QS20 Cell blade. + +The Structure of the RX Ring. +============================= +The receive (RX) ring is a circular linked list of RX descriptors, +together with three pointers into the ring that are used to manage its +contents. + +The elements of the ring are called "descriptors" or "descrs"; they +describe the received data. This includes a pointer to a buffer +containing the received data, the buffer size, and various status bits. + +There are three primary states that a descriptor can be in: "empty", +"full" and "not-in-use". An "empty" or "ready" descriptor is ready +to receive data from the hardware. A "full" descriptor has data in it, +and is waiting to be emptied and processed by the OS. A "not-in-use" +descriptor is neither empty or full; it is simply not ready. It may +not even have a data buffer in it, or is otherwise unusable. + +During normal operation, on device startup, the OS (specifically, the +spidernet device driver) allocates a set of RX descriptors and RX +buffers. These are all marked "empty", ready to receive data. This +ring is handed off to the hardware, which sequentially fills in the +buffers, and marks them "full". The OS follows up, taking the full +buffers, processing them, and re-marking them empty. + +This filling and emptying is managed by three pointers, the "head" +and "tail" pointers, managed by the OS, and a hardware current +descriptor pointer (GDACTDPA). The GDACTDPA points at the descr +currently being filled. When this descr is filled, the hardware +marks it full, and advances the GDACTDPA by one. Thus, when there is +flowing RX traffic, every descr behind it should be marked "full", +and everything in front of it should be "empty". If the hardware +discovers that the current descr is not empty, it will signal an +interrupt, and halt processing. + +The tail pointer tails or trails the hardware pointer. When the +hardware is ahead, the tail pointer will be pointing at a "full" +descr. The OS will process this descr, and then mark it "not-in-use", +and advance the tail pointer. Thus, when there is flowing RX traffic, +all of the descrs in front of the tail pointer should be "full", and +all of those behind it should be "not-in-use". When RX traffic is not +flowing, then the tail pointer can catch up to the hardware pointer. +The OS will then note that the current tail is "empty", and halt +processing. + +The head pointer (somewhat mis-named) follows after the tail pointer. +When traffic is flowing, then the head pointer will be pointing at +a "not-in-use" descr. The OS will perform various housekeeping duties +on this descr. This includes allocating a new data buffer and +dma-mapping it so as to make it visible to the hardware. The OS will +then mark the descr as "empty", ready to receive data. Thus, when there +is flowing RX traffic, everything in front of the head pointer should +be "not-in-use", and everything behind it should be "empty". If no +RX traffic is flowing, then the head pointer can catch up to the tail +pointer, at which point the OS will notice that the head descr is +"empty", and it will halt processing. + +Thus, in an idle system, the GDACTDPA, tail and head pointers will +all be pointing at the same descr, which should be "empty". All of the +other descrs in the ring should be "empty" as well. + +The show_rx_chain() routine will print out the the locations of the +GDACTDPA, tail and head pointers. It will also summarize the contents +of the ring, starting at the tail pointer, and listing the status +of the descrs that follow. + +A typical example of the output, for a nearly idle system, might be + +net eth1: Total number of descrs=256 +net eth1: Chain tail located at descr=20 +net eth1: Chain head is at 20 +net eth1: HW curr desc (GDACTDPA) is at 21 +net eth1: Have 1 descrs with stat=x40800101 +net eth1: HW next desc (GDACNEXTDA) is at 22 +net eth1: Last 255 descrs with stat=xa0800000 + +In the above, the hardware has filled in one descr, number 20. Both +head and tail are pointing at 20, because it has not yet been emptied. +Meanwhile, hw is pointing at 21, which is free. + +The "Have nnn decrs" refers to the descr starting at the tail: in this +case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers +to all of the rest of the descrs, from the last status change. The "nnn" +is a count of how many descrs have exactly the same status. + +The status x4... corresponds to "full" and status xa... corresponds +to "empty". The actual value printed is RXCOMST_A. + +In the device driver source code, a different set of names are +used for these same concepts, so that + +"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa +"full" == SPIDER_NET_DESCR_FRAME_END == 0x4 +"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf + + +The RX RAM full bug/feature +=========================== + +As long as the OS can empty out the RX buffers at a rate faster than +the hardware can fill them, there is no problem. If, for some reason, +the OS fails to empty the RX ring fast enough, the hardware GDACTDPA +pointer will catch up to the head, notice the not-empty condition, +ad stop. However, RX packets may still continue arriving on the wire. +The spidernet chip can save some limited number of these in local RAM. +When this local ram fills up, the spider chip will issue an interrupt +indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit +will be set in GHIINT1STS). When the RX ram full condition occurs, +a certain bug/feature is triggered that has to be specially handled. +This section describes the special handling for this condition. + +When the OS finally has a chance to run, it will empty out the RX ring. +In particular, it will clear the descriptor on which the hardware had +stopped. However, once the hardware has decided that a certain +descriptor is invalid, it will not restart at that descriptor; instead +it will restart at the next descr. This potentially will lead to a +deadlock condition, as the tail pointer will be pointing at this descr, +which, from the OS point of view, is empty; the OS will be waiting for +this descr to be filled. However, the hardware has skipped this descr, +and is filling the next descrs. Since the OS doesn't see this, there +is a potential deadlock, with the OS waiting for one descr to fill, +while the hardware is waiting for a different set of descrs to become +empty. + +A call to show_rx_chain() at this point indicates the nature of the +problem. A typical print when the network is hung shows the following: + +net eth1: Spider RX RAM full, incoming packets might be discarded! +net eth1: Total number of descrs=256 +net eth1: Chain tail located at descr=255 +net eth1: Chain head is at 255 +net eth1: HW curr desc (GDACTDPA) is at 0 +net eth1: Have 1 descrs with stat=xa0800000 +net eth1: HW next desc (GDACNEXTDA) is at 1 +net eth1: Have 127 descrs with stat=x40800101 +net eth1: Have 1 descrs with stat=x40800001 +net eth1: Have 126 descrs with stat=x40800101 +net eth1: Last 1 descrs with stat=xa0800000 + +Both the tail and head pointers are pointing at descr 255, which is +marked xa... which is "empty". Thus, from the OS point of view, there +is nothing to be done. In particular, there is the implicit assumption +that everything in front of the "empty" descr must surely also be empty, +as explained in the last section. The OS is waiting for descr 255 to +become non-empty, which, in this case, will never happen. + +The HW pointer is at descr 0. This descr is marked 0x4.. or "full". +Since its already full, the hardware can do nothing more, and thus has +halted processing. Notice that descrs 0 through 254 are all marked +"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is +descr 254, since tail was at 255.) Thus, the system is deadlocked, +and there can be no forward progress; the OS thinks there's nothing +to do, and the hardware has nowhere to put incoming data. + +This bug/feature is worked around with the spider_net_resync_head_ptr() +routine. When the driver receives RX interrupts, but an examination +of the RX chain seems to show it is empty, then it is probable that +the hardware has skipped a descr or two (sometimes dozens under heavy +network conditions). The spider_net_resync_head_ptr() subroutine will +search the ring for the next full descr, and the driver will resume +operations there. Since this will leave "holes" in the ring, there +is also a spider_net_resync_tail_ptr() that will skip over such holes. + +As of this writing, the spider_net_resync() strategy seems to work very +well, even under heavy network loads. + + +The TX ring +=========== +The TX ring uses a low-watermark interrupt scheme to make sure that +the TX queue is appropriately serviced for large packet sizes. + +For packet sizes greater than about 1KBytes, the kernel can fill +the TX ring quicker than the device can drain it. Once the ring +is full, the netdev is stopped. When there is room in the ring, +the netdev needs to be reawakened, so that more TX packets are placed +in the ring. The hardware can empty the ring about four times per jiffy, +so its not appropriate to wait for the poll routine to refill, since +the poll routine runs only once per jiffy. The low-watermark mechanism +marks a descr about 1/4th of the way from the bottom of the queue, so +that an interrupt is generated when the descr is processed. This +interrupt wakes up the netdev, which can then refill the queue. +For large packets, this mechanism generates a relatively small number +of interrupts, about 1K/sec. For smaller packets, this will drop to zero +interrupts, as the hardware can empty the queue faster than the kernel +can fill it. + + + ======= END OF DOCUMENT ======== + diff --git a/Documentation/networking/tc-actions-env-rules.txt b/Documentation/networking/tc-actions-env-rules.txt new file mode 100644 index 0000000..dcadf6f --- /dev/null +++ b/Documentation/networking/tc-actions-env-rules.txt @@ -0,0 +1,30 @@ + +The "enviromental" rules for authors of any new tc actions are: + +1) If you stealeth or borroweth any packet thou shalt be branching +from the righteous path and thou shalt cloneth. + +For example if your action queues a packet to be processed later, +or intentionally branches by redirecting a packet, then you need to +clone the packet. + +There are certain fields in the skb tc_verd that need to be reset so we +avoid loops, etc. A few are generic enough that skb_act_clone() +resets them for you, so invoke skb_act_clone() rather than skb_clone(). + +2) If you munge any packet thou shalt call pskb_expand_head in the case +someone else is referencing the skb. After that you "own" the skb. +You must also tell us if it is ok to munge the packet (TC_OK2MUNGE), +this way any action downstream can stomp on the packet. + +3) Dropping packets you don't own is a no-no. You simply return +TC_ACT_SHOT to the caller and they will drop it. + +The "enviromental" rules for callers of actions (qdiscs etc) are: + +*) Thou art responsible for freeing anything returned as being +TC_ACT_SHOT/STOLEN/QUEUED. If none of TC_ACT_SHOT/STOLEN/QUEUED is +returned, then all is great and you don't need to do anything. + +Post on netdev if something is unclear. + diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt new file mode 100644 index 0000000..7d11bb5 --- /dev/null +++ b/Documentation/networking/tcp.txt @@ -0,0 +1,106 @@ +TCP protocol +============ + +Last updated: 9 February 2008 + +Contents +======== + +- Congestion control +- How the new TCP output machine [nyi] works + +Congestion control +================== + +The following variables are used in the tcp_sock for congestion control: +snd_cwnd The size of the congestion window +snd_ssthresh Slow start threshold. We are in slow start if + snd_cwnd is less than this. +snd_cwnd_cnt A counter used to slow down the rate of increase + once we exceed slow start threshold. +snd_cwnd_clamp This is the maximum size that snd_cwnd can grow to. +snd_cwnd_stamp Timestamp for when congestion window last validated. +snd_cwnd_used Used as a highwater mark for how much of the + congestion window is in use. It is used to adjust + snd_cwnd down when the link is limited by the + application rather than the network. + +As of 2.6.13, Linux supports pluggable congestion control algorithms. +A congestion control mechanism can be registered through functions in +tcp_cong.c. The functions used by the congestion control mechanism are +registered via passing a tcp_congestion_ops struct to +tcp_register_congestion_control. As a minimum name, ssthresh, +cong_avoid, min_cwnd must be valid. + +Private data for a congestion control mechanism is stored in tp->ca_priv. +tcp_ca(tp) returns a pointer to this space. This is preallocated space - it +is important to check the size of your private data will fit this space, or +alternatively space could be allocated elsewhere and a pointer to it could +be stored here. + +There are three kinds of congestion control algorithms currently: The +simplest ones are derived from TCP reno (highspeed, scalable) and just +provide an alternative the congestion window calculation. More complex +ones like BIC try to look at other events to provide better +heuristics. There are also round trip time based algorithms like +Vegas and Westwood+. + +Good TCP congestion control is a complex problem because the algorithm +needs to maintain fairness and performance. Please review current +research and RFC's before developing new modules. + +The method that is used to determine which congestion control mechanism is +determined by the setting of the sysctl net.ipv4.tcp_congestion_control. +The default congestion control will be the last one registered (LIFO); +so if you built everything as modules, the default will be reno. If you +build with the defaults from Kconfig, then CUBIC will be builtin (not a +module) and it will end up the default. + +If you really want a particular default value then you will need +to set it with the sysctl. If you use a sysctl, the module will be autoloaded +if needed and you will get the expected protocol. If you ask for an +unknown congestion method, then the sysctl attempt will fail. + +If you remove a tcp congestion control module, then you will get the next +available one. Since reno cannot be built as a module, and cannot be +deleted, it will always be available. + +How the new TCP output machine [nyi] works. +=========================================== + +Data is kept on a single queue. The skb->users flag tells us if the frame is +one that has been queued already. To add a frame we throw it on the end. Ack +walks down the list from the start. + +We keep a set of control flags + + + sk->tcp_pend_event + + TCP_PEND_ACK Ack needed + TCP_ACK_NOW Needed now + TCP_WINDOW Window update check + TCP_WINZERO Zero probing + + + sk->transmit_queue The transmission frame begin + sk->transmit_new First new frame pointer + sk->transmit_end Where to add frames + + sk->tcp_last_tx_ack Last ack seen + sk->tcp_dup_ack Dup ack count for fast retransmit + + +Frames are queued for output by tcp_write. We do our best to send the frames +off immediately if possible, but otherwise queue and compute the body +checksum in the copy. + +When a write is done we try to clear any pending events and piggy back them. +If the window is full we queue full sized frames. On the first timeout in +zero window we split this. + +On a timer we walk the retransmit list to send any retransmits, update the +backoff timers etc. A change of route table stamp causes a change of header +and recompute. We add any new tcp level headers and refinish the checksum +before sending. + diff --git a/Documentation/networking/tlan.txt b/Documentation/networking/tlan.txt new file mode 100644 index 0000000..7e6aa5b --- /dev/null +++ b/Documentation/networking/tlan.txt @@ -0,0 +1,117 @@ +(C) 1997-1998 Caldera, Inc. +(C) 1998 James Banks +(C) 1999-2001 Torben Mathiasen <tmm@image.dk, torben.mathiasen@compaq.com> + +For driver information/updates visit http://opensource.compaq.com + + +TLAN driver for Linux, version 1.14a +README + + +I. Supported Devices. + + Only PCI devices will work with this driver. + + Supported: + Vendor ID Device ID Name + 0e11 ae32 Compaq Netelligent 10/100 TX PCI UTP + 0e11 ae34 Compaq Netelligent 10 T PCI UTP + 0e11 ae35 Compaq Integrated NetFlex 3/P + 0e11 ae40 Compaq Netelligent Dual 10/100 TX PCI UTP + 0e11 ae43 Compaq Netelligent Integrated 10/100 TX UTP + 0e11 b011 Compaq Netelligent 10/100 TX Embedded UTP + 0e11 b012 Compaq Netelligent 10 T/2 PCI UTP/Coax + 0e11 b030 Compaq Netelligent 10/100 TX UTP + 0e11 f130 Compaq NetFlex 3/P + 0e11 f150 Compaq NetFlex 3/P + 108d 0012 Olicom OC-2325 + 108d 0013 Olicom OC-2183 + 108d 0014 Olicom OC-2326 + + + Caveats: + + I am not sure if 100BaseTX daughterboards (for those cards which + support such things) will work. I haven't had any solid evidence + either way. + + However, if a card supports 100BaseTx without requiring an add + on daughterboard, it should work with 100BaseTx. + + The "Netelligent 10 T/2 PCI UTP/Coax" (b012) device is untested, + but I do not expect any problems. + + +II. Driver Options + 1. You can append debug=x to the end of the insmod line to get + debug messages, where x is a bit field where the bits mean + the following: + + 0x01 Turn on general debugging messages. + 0x02 Turn on receive debugging messages. + 0x04 Turn on transmit debugging messages. + 0x08 Turn on list debugging messages. + + 2. You can append aui=1 to the end of the insmod line to cause + the adapter to use the AUI interface instead of the 10 Base T + interface. This is also what to do if you want to use the BNC + connector on a TLAN based device. (Setting this option on a + device that does not have an AUI/BNC connector will probably + cause it to not function correctly.) + + 3. You can set duplex=1 to force half duplex, and duplex=2 to + force full duplex. + + 4. You can set speed=10 to force 10Mbs operation, and speed=100 + to force 100Mbs operation. (I'm not sure what will happen + if a card which only supports 10Mbs is forced into 100Mbs + mode.) + + 5. You have to use speed=X duplex=Y together now. If you just + do "insmod tlan.o speed=100" the driver will do Auto-Neg. + To force a 10Mbps Half-Duplex link do "insmod tlan.o speed=10 + duplex=1". + + 6. If the driver is built into the kernel, you can use the 3rd + and 4th parameters to set aui and debug respectively. For + example: + + ether=0,0,0x1,0x7,eth0 + + This sets aui to 0x1 and debug to 0x7, assuming eth0 is a + supported TLAN device. + + The bits in the third byte are assigned as follows: + + 0x01 = aui + 0x02 = use half duplex + 0x04 = use full duplex + 0x08 = use 10BaseT + 0x10 = use 100BaseTx + + You also need to set both speed and duplex settings when forcing + speeds with kernel-parameters. + ether=0,0,0x12,0,eth0 will force link to 100Mbps Half-Duplex. + + 7. If you have more than one tlan adapter in your system, you can + use the above options on a per adapter basis. To force a 100Mbit/HD + link with your eth1 adapter use: + + insmod tlan speed=0,100 duplex=0,1 + + Now eth0 will use auto-neg and eth1 will be forced to 100Mbit/HD. + Note that the tlan driver supports a maximum of 8 adapters. + + +III. Things to try if you have problems. + 1. Make sure your card's PCI id is among those listed in + section I, above. + 2. Make sure routing is correct. + 3. Try forcing different speed/duplex settings + + +There is also a tlan mailing list which you can join by sending "subscribe tlan" +in the body of an email to majordomo@vuser.vu.union.edu. +There is also a tlan website at http://opensource.compaq.com + diff --git a/Documentation/networking/tms380tr.txt b/Documentation/networking/tms380tr.txt new file mode 100644 index 0000000..1f73e13 --- /dev/null +++ b/Documentation/networking/tms380tr.txt @@ -0,0 +1,147 @@ +Text file for the Linux SysKonnect Token Ring ISA/PCI Adapter Driver. + Text file by: Jay Schulist <jschlst@samba.org> + +The Linux SysKonnect Token Ring driver works with the SysKonnect TR4/16(+) ISA, +SysKonnect TR4/16(+) PCI, SysKonnect TR4/16 PCI, and older revisions of the +SK NET TR4/16 ISA card. + +Latest information on this driver can be obtained on the Linux-SNA WWW site. +Please point your browser to: +http://www.linux-sna.org + +Many thanks to Christoph Goos for his excellent work on this driver and +SysKonnect for donating the adapters to Linux-SNA for the testing and +maintenance of this device driver. + +Important information to be noted: +1. Adapters can be slow to open (~20 secs) and close (~5 secs), please be + patient. +2. This driver works very well when autoprobing for adapters. Why even + think about those nasty io/int/dma settings of modprobe when the driver + will do it all for you! + +This driver is rather simple to use. Select Y to Token Ring adapter support +in the kernel configuration. A choice for SysKonnect Token Ring adapters will +appear. This drives supports all SysKonnect ISA and PCI adapters. Choose this +option. I personally recommend compiling the driver as a module (M), but if you +you would like to compile it statically answer Y instead. + +This driver supports multiple adapters without the need to load multiple copies +of the driver. You should be able to load up to 7 adapters without any kernel +modifications, if you are in need of more please contact the maintainer of this +driver. + +Load the driver either by lilo/loadlin or as a module. When a module using the +following command will suffice for most: + +# modprobe sktr + +This will produce output similar to the following: (Output is user specific) + +sktr.c: v1.01 08/29/97 by Christoph Goos +tr0: SK NET TR 4/16 PCI found at 0x6100, using IRQ 17. +tr1: SK NET TR 4/16 PCI found at 0x6200, using IRQ 16. +tr2: SK NET TR 4/16 ISA found at 0xa20, using IRQ 10 and DMA 5. + +Now just setup the device via ifconfig and set and routes you may have. After +this you are ready to start sending some tokens. + +Errata: +For anyone wondering where to pick up the SysKonnect adapters please browse +to http://www.syskonnect.com + +This driver is under the GNU General Public License. Its Firmware image is +included as an initialized C-array and is licensed by SysKonnect to the Linux +users of this driver. However no warranty about its fitness is expressed or +implied by SysKonnect. + +Below find attached the setting for the SK NET TR 4/16 ISA adapters +------------------------------------------------------------------- + + *************************** + *** C O N T E N T S *** + *************************** + + 1) Location of DIP-Switch W1 + 2) Default settings + 3) DIP-Switch W1 description + + + ============================================================== + CHAPTER 1 LOCATION OF DIP-SWITCH + ============================================================== + +UÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ +þUÄÄÄÄÄÄ¿ UÄÄÄÄÄ¿ UÄÄÄ¿ þ +þAÄÄÄÄÄÄU W1 AÄÄÄÄÄU UÄÄÄÄ¿ þ þ þ +þUÄÄÄÄÄÄ¿ þ þ þ þ UÄÄÅ¿ +þAÄÄÄÄÄÄU UÄÄÄÄÄÄÄÄÄÄÄ¿ AÄÄÄÄU þ þ þ þþ +þUÄÄÄÄÄÄ¿ þ þ UÄÄÄ¿ AÄÄÄU AÄÄÅU +þAÄÄÄÄÄÄU þ TMS380C26 þ þ þ þ +þUÄÄÄÄÄÄ¿ þ þ AÄÄÄU AÄ¿ +þAÄÄÄÄÄÄU þ þ þ þ +þ AÄÄÄÄÄÄÄÄÄÄÄU þ þ +þ þ þ +þ AÄU +þ þ +þ þ +þ þ +þ þ +AÄÄÄÄÄÄÄÄÄÄÄÄAÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄAÄÄAÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄAÄÄÄÄÄÄÄÄÄU + AÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄU AÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄU + + ============================================================== + CHAPTER 2 DEFAULT SETTINGS + ============================================================== + + W1 1 2 3 4 5 6 7 8 + +------------------------------+ + | ON X | + | OFF X X X X X X X | + +------------------------------+ + + W1.1 = ON Adapter drives address lines SA17..19 + W1.2 - 1.5 = OFF BootROM disabled + W1.6 - 1.8 = OFF I/O address 0A20h + + ============================================================== + CHAPTER 3 DIP SWITCH W1 DESCRIPTION + ============================================================== + + UÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄ¿ ON + þ 1 þ 2 þ 3 þ 4 þ 5 þ 6 þ 7 þ 8 þ + AÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄAÄÄÄU OFF + |AD | BootROM Addr. | I/O | + +-+-+-------+-------+-----+-----+ + | | | + | | +------ 6 7 8 + | | ON ON ON 1900h + | | ON ON OFF 0900h + | | ON OFF ON 1980h + | | ON OFF OFF 0980h + | | OFF ON ON 1b20h + | | OFF ON OFF 0b20h + | | OFF OFF ON 1a20h + | | OFF OFF OFF 0a20h (+) + | | + | | + | +-------- 2 3 4 5 + | OFF x x x disabled (+) + | ON ON ON ON C0000 + | ON ON ON OFF C4000 + | ON ON OFF ON C8000 + | ON ON OFF OFF CC000 + | ON OFF ON ON D0000 + | ON OFF ON OFF D4000 + | ON OFF OFF ON D8000 + | ON OFF OFF OFF DC000 + | + | + +----- 1 + OFF adapter does NOT drive SA<17..19> + ON adapter drives SA<17..19> (+) + + + (+) means default setting + + ******************************** diff --git a/Documentation/networking/tproxy.txt b/Documentation/networking/tproxy.txt new file mode 100644 index 0000000..7b5996d --- /dev/null +++ b/Documentation/networking/tproxy.txt @@ -0,0 +1,85 @@ +Transparent proxy support +========================= + +This feature adds Linux 2.2-like transparent proxy support to current kernels. +To use it, enable NETFILTER_TPROXY, the socket match and the TPROXY target in +your kernel config. You will need policy routing too, so be sure to enable that +as well. + + +1. Making non-local sockets work +================================ + +The idea is that you identify packets with destination address matching a local +socket on your box, set the packet mark to a certain value, and then match on that +value using policy routing to have those packets delivered locally: + +# iptables -t mangle -N DIVERT +# iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT +# iptables -t mangle -A DIVERT -j MARK --set-mark 1 +# iptables -t mangle -A DIVERT -j ACCEPT + +# ip rule add fwmark 1 lookup 100 +# ip route add local 0.0.0.0/0 dev lo table 100 + +Because of certain restrictions in the IPv4 routing output code you'll have to +modify your application to allow it to send datagrams _from_ non-local IP +addresses. All you have to do is enable the (SOL_IP, IP_TRANSPARENT) socket +option before calling bind: + +fd = socket(AF_INET, SOCK_STREAM, 0); +/* - 8< -*/ +int value = 1; +setsockopt(fd, SOL_IP, IP_TRANSPARENT, &value, sizeof(value)); +/* - 8< -*/ +name.sin_family = AF_INET; +name.sin_port = htons(0xCAFE); +name.sin_addr.s_addr = htonl(0xDEADBEEF); +bind(fd, &name, sizeof(name)); + +A trivial patch for netcat is available here: +http://people.netfilter.org/hidden/tproxy/netcat-ip_transparent-support.patch + + +2. Redirecting traffic +====================== + +Transparent proxying often involves "intercepting" traffic on a router. This is +usually done with the iptables REDIRECT target; however, there are serious +limitations of that method. One of the major issues is that it actually +modifies the packets to change the destination address -- which might not be +acceptable in certain situations. (Think of proxying UDP for example: you won't +be able to find out the original destination address. Even in case of TCP +getting the original destination address is racy.) + +The 'TPROXY' target provides similar functionality without relying on NAT. Simply +add rules like this to the iptables ruleset above: + +# iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \ + --tproxy-mark 0x1/0x1 --on-port 50080 + +Note that for this to work you'll have to modify the proxy to enable (SOL_IP, +IP_TRANSPARENT) for the listening socket. + + +3. Iptables extensions +====================== + +To use tproxy you'll need to have the 'socket' and 'TPROXY' modules +compiled for iptables. A patched version of iptables is available +here: http://git.balabit.hu/?p=bazsi/iptables-tproxy.git + + +4. Application support +====================== + +4.1. Squid +---------- + +Squid 3.HEAD has support built-in. To use it, pass +'--enable-linux-netfilter' to configure and set the 'tproxy' option on +the HTTP listener you redirect traffic to with the TPROXY iptables +target. + +For more information please consult the following page on the Squid +wiki: http://wiki.squid-cache.org/Features/Tproxy4 diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt new file mode 100644 index 0000000..839cbb7 --- /dev/null +++ b/Documentation/networking/tuntap.txt @@ -0,0 +1,150 @@ +Universal TUN/TAP device driver. +Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com> + + Linux, Solaris drivers + Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com> + + FreeBSD TAP driver + Copyright (c) 1999-2000 Maksim Yevmenkin <m_evmenkin@yahoo.com> + + Revision of this document 2002 by Florian Thiel <florian.thiel@gmx.net> + +1. Description + TUN/TAP provides packet reception and transmission for user space programs. + It can be seen as a simple Point-to-Point or Ethernet device, which, + instead of receiving packets from physical media, receives them from + user space program and instead of sending packets via physical media + writes them to the user space program. + + In order to use the driver a program has to open /dev/net/tun and issue a + corresponding ioctl() to register a network device with the kernel. A network + device will appear as tunXX or tapXX, depending on the options chosen. When + the program closes the file descriptor, the network device and all + corresponding routes will disappear. + + Depending on the type of device chosen the userspace program has to read/write + IP packets (with tun) or ethernet frames (with tap). Which one is being used + depends on the flags given with the ioctl(). + + The package from http://vtun.sourceforge.net/tun contains two simple examples + for how to use tun and tap devices. Both programs work like a bridge between + two network interfaces. + br_select.c - bridge based on select system call. + br_sigio.c - bridge based on async io and SIGIO signal. + However, the best example is VTun http://vtun.sourceforge.net :)) + +2. Configuration + Create device node: + mkdir /dev/net (if it doesn't exist already) + mknod /dev/net/tun c 10 200 + + Set permissions: + e.g. chmod 0666 /dev/net/tun + There's no harm in allowing the device to be accessible by non-root users, + since CAP_NET_ADMIN is required for creating network devices or for + connecting to network devices which aren't owned by the user in question. + If you want to create persistent devices and give ownership of them to + unprivileged users, then you need the /dev/net/tun device to be usable by + those users. + + Driver module autoloading + + Make sure that "Kernel module loader" - module auto-loading + support is enabled in your kernel. The kernel should load it on + first access. + + Manual loading + insert the module by hand: + modprobe tun + + If you do it the latter way, you have to load the module every time you + need it, if you do it the other way it will be automatically loaded when + /dev/net/tun is being opened. + +3. Program interface + 3.1 Network device allocation: + + char *dev should be the name of the device with a format string (e.g. + "tun%d"), but (as far as I can see) this can be any valid network device name. + Note that the character pointer becomes overwritten with the real device name + (e.g. "tun0") + + #include <linux/if.h> + #include <linux/if_tun.h> + + int tun_alloc(char *dev) + { + struct ifreq ifr; + int fd, err; + + if( (fd = open("/dev/net/tun", O_RDWR)) < 0 ) + return tun_alloc_old(dev); + + memset(&ifr, 0, sizeof(ifr)); + + /* Flags: IFF_TUN - TUN device (no Ethernet headers) + * IFF_TAP - TAP device + * + * IFF_NO_PI - Do not provide packet information + */ + ifr.ifr_flags = IFF_TUN; + if( *dev ) + strncpy(ifr.ifr_name, dev, IFNAMSIZ); + + if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ){ + close(fd); + return err; + } + strcpy(dev, ifr.ifr_name); + return fd; + } + + 3.2 Frame format: + If flag IFF_NO_PI is not set each frame format is: + Flags [2 bytes] + Proto [2 bytes] + Raw protocol(IP, IPv6, etc) frame. + +Universal TUN/TAP device driver Frequently Asked Question. + +1. What platforms are supported by TUN/TAP driver ? +Currently driver has been written for 3 Unices: + Linux kernels 2.2.x, 2.4.x + FreeBSD 3.x, 4.x, 5.x + Solaris 2.6, 7.0, 8.0 + +2. What is TUN/TAP driver used for? +As mentioned above, main purpose of TUN/TAP driver is tunneling. +It is used by VTun (http://vtun.sourceforge.net). + +Another interesting application using TUN/TAP is pipsecd +(http://perso.enst.fr/~beyssac/pipsec/), an userspace IPSec +implementation that can use complete kernel routing (unlike FreeS/WAN). + +3. How does Virtual network device actually work ? +Virtual network device can be viewed as a simple Point-to-Point or +Ethernet device, which instead of receiving packets from a physical +media, receives them from user space program and instead of sending +packets via physical media sends them to the user space program. + +Let's say that you configured IPX on the tap0, then whenever +the kernel sends an IPX packet to tap0, it is passed to the application +(VTun for example). The application encrypts, compresses and sends it to +the other side over TCP or UDP. The application on the other side decompresses +and decrypts the data received and writes the packet to the TAP device, +the kernel handles the packet like it came from real physical device. + +4. What is the difference between TUN driver and TAP driver? +TUN works with IP frames. TAP works with Ethernet frames. + +This means that you have to read/write IP packets when you are using tun and +ethernet frames when using tap. + +5. What is the difference between BPF and TUN/TAP driver? +BPF is an advanced packet filter. It can be attached to existing +network interface. It does not provide a virtual network interface. +A TUN/TAP driver does provide a virtual network interface and it is possible +to attach BPF to this interface. + +6. Does TAP driver support kernel Ethernet bridging? +Yes. Linux and FreeBSD drivers support Ethernet bridging. diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt new file mode 100644 index 0000000..855d8da --- /dev/null +++ b/Documentation/networking/udplite.txt @@ -0,0 +1,281 @@ + =========================================================================== + The UDP-Lite protocol (RFC 3828) + =========================================================================== + + + UDP-Lite is a Standards-Track IETF transport protocol whose characteristic + is a variable-length checksum. This has advantages for transport of multimedia + (video, VoIP) over wireless networks, as partly damaged packets can still be + fed into the codec instead of being discarded due to a failed checksum test. + + This file briefly describes the existing kernel support and the socket API. + For in-depth information, you can consult: + + o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/ + From here you can also download some example application source code. + + o The UDP-Lite HOWTO on + http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt + + o The Wireshark UDP-Lite WiKi (with capture files): + http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol + + o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt + + + I) APPLICATIONS + + Several applications have been ported successfully to UDP-Lite. Ethereal + (now called wireshark) has UDP-Litev4/v6 support by default. The tarball on + + http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz + + has source code for several v4/v6 client-server and network testing examples. + + Porting applications to UDP-Lite is straightforward: only socket level and + IPPROTO need to be changed; senders additionally set the checksum coverage + length (default = header length = 8). Details are in the next section. + + + II) PROGRAMMING API + + UDP-Lite provides a connectionless, unreliable datagram service and hence + uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is + very easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2) + call so that the statement looks like: + + s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE); + + or, respectively, + + s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE); + + With just the above change you are able to run UDP-Lite services or connect + to UDP-Lite servers. The kernel will assume that you are not interested in + using partial checksum coverage and so emulate UDP mode (full coverage). + + To make use of the partial checksum coverage facilities requires setting a + single socket option, which takes an integer specifying the coverage length: + + * Sender checksum coverage: UDPLITE_SEND_CSCOV + + For example, + + int val = 20; + setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int)); + + sets the checksum coverage length to 20 bytes (12b data + 8b header). + Of each packet only the first 20 bytes (plus the pseudo-header) will be + checksummed. This is useful for RTP applications which have a 12-byte + base header. + + + * Receiver checksum coverage: UDPLITE_RECV_CSCOV + + This option is the receiver-side analogue. It is truly optional, i.e. not + required to enable traffic with partial checksum coverage. Its function is + that of a traffic filter: when enabled, it instructs the kernel to drop + all packets which have a coverage _less_ than this value. For example, if + RTP and UDP headers are to be protected, a receiver can enforce that only + packets with a minimum coverage of 20 are admitted: + + int min = 20; + setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int)); + + The calls to getsockopt(2) are analogous. Being an extension and not a stand- + alone protocol, all socket options known from UDP can be used in exactly the + same manner as before, e.g. UDP_CORK or UDP_ENCAP. + + A detailed discussion of UDP-Lite checksum coverage options is in section IV. + + + III) HEADER FILES + + The socket API requires support through header files in /usr/include: + + * /usr/include/netinet/in.h + to define IPPROTO_UDPLITE + + * /usr/include/netinet/udplite.h + for UDP-Lite header fields and protocol constants + + For testing purposes, the following can serve as a `mini' header file: + + #define IPPROTO_UDPLITE 136 + #define SOL_UDPLITE 136 + #define UDPLITE_SEND_CSCOV 10 + #define UDPLITE_RECV_CSCOV 11 + + Ready-made header files for various distros are in the UDP-Lite tarball. + + + IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS + + To enable debugging messages, the log level need to be set to 8, as most + messages use the KERN_DEBUG level (7). + + 1) Sender Socket Options + + If the sender specifies a value of 0 as coverage length, the module + assumes full coverage, transmits a packet with coverage length of 0 + and according checksum. If the sender specifies a coverage < 8 and + different from 0, the kernel assumes 8 as default value. Finally, + if the specified coverage length exceeds the packet length, the packet + length is used instead as coverage length. + + 2) Receiver Socket Options + + The receiver specifies the minimum value of the coverage length it + is willing to accept. A value of 0 here indicates that the receiver + always wants the whole of the packet covered. In this case, all + partially covered packets are dropped and an error is logged. + + It is not possible to specify illegal values (<0 and <8); in these + cases the default of 8 is assumed. + + All packets arriving with a coverage value less than the specified + threshold are discarded, these events are also logged. + + 3) Disabling the Checksum Computation + + On both sender and receiver, checksumming will always be performed + and cannot be disabled using SO_NO_CHECK. Thus + + setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... ); + + will always will be ignored, while the value of + + getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...); + + is meaningless (as in TCP). Packets with a zero checksum field are + illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded. + + 4) Fragmentation + + The checksum computation respects both buffersize and MTU. The size + of UDP-Lite packets is determined by the size of the send buffer. The + minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF + in include/net/sock.h), the default value is configurable as + net.core.wmem_default or via setting the SO_SNDBUF socket(7) + option. The maximum upper bound for the send buffer is determined + by net.core.wmem_max. + + Given a payload size larger than the send buffer size, UDP-Lite will + split the payload into several individual packets, filling up the + send buffer size in each case. + + The precise value also depends on the interface MTU. The interface MTU, + in turn, may trigger IP fragmentation. In this case, the generated + UDP-Lite packet is split into several IP packets, of which only the + first one contains the L4 header. + + The send buffer size has implications on the checksum coverage length. + Consider the following example: + + Payload: 1536 bytes Send Buffer: 1024 bytes + MTU: 1500 bytes Coverage Length: 856 bytes + + UDP-Lite will ship the 1536 bytes in two separate packets: + + Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes + Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes + + The coverage packet covers the UDP-Lite header and 848 bytes of the + payload in the first packet, the second packet is fully covered. Note + that for the second packet, the coverage length exceeds the packet + length. The kernel always re-adjusts the coverage length to the packet + length in such cases. + + As an example of what happens when one UDP-Lite packet is split into + several tiny fragments, consider the following example. + + Payload: 1024 bytes Send buffer size: 1024 bytes + MTU: 300 bytes Coverage length: 575 bytes + + +-+-----------+--------------+--------------+--------------+ + |8| 272 | 280 | 280 | 280 | + +-+-----------+--------------+--------------+--------------+ + 280 560 840 1032 + ^ + *****checksum coverage************* + + The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte + header). According to the interface MTU, these are split into 4 IP + packets (280 byte IP payload + 20 byte IP header). The kernel module + sums the contents of the entire first two packets, plus 15 bytes of + the last packet before releasing the fragments to the IP module. + + To see the analogous case for IPv6 fragmentation, consider a link + MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum + coverage is less than 1232 bytes (MTU minus IPv6/fragment header + lengths), only the first fragment needs to be considered. When using + larger checksum coverage lengths, each eligible fragment needs to be + checksummed. Suppose we have a checksum coverage of 3062. The buffer + of 3356 bytes will be split into the following fragments: + + Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data + Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data + Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data + + The first two fragments have to be checksummed in full, of the last + fragment only 598 (= 3062 - 2*1232) bytes are checksummed. + + While it is important that such cases are dealt with correctly, they + are (annoyingly) rare: UDP-Lite is designed for optimising multimedia + performance over wireless (or generally noisy) links and thus smaller + coverage lengths are likely to be expected. + + + V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING + + Exceptional and error conditions are logged to syslog at the KERN_DEBUG + level. Live statistics about UDP-Lite are available in /proc/net/snmp + and can (with newer versions of netstat) be viewed using + + netstat -svu + + This displays UDP-Lite statistics variables, whose meaning is as follows. + + InDatagrams: The total number of datagrams delivered to users. + + NoPorts: Number of packets received to an unknown port. + These cases are counted separately (not as InErrors). + + InErrors: Number of erroneous UDP-Lite packets. Errors include: + * internal socket queue receive errors + * packet too short (less than 8 bytes or stated + coverage length exceeds received length) + * xfrm4_policy_check() returned with error + * application has specified larger min. coverage + length than that of incoming packet + * checksum coverage violated + * bad checksum + + OutDatagrams: Total number of sent datagrams. + + These statistics derive from the UDP MIB (RFC 2013). + + + VI) IPTABLES + + There is packet match support for UDP-Lite as well as support for the LOG target. + If you copy and paste the following line into /etc/protocols, + + udplite 136 UDP-Lite # UDP-Lite [RFC 3828] + + then + iptables -A INPUT -p udplite -j LOG + + will produce logging output to syslog. Dropping and rejecting packets also works. + + + VII) MAINTAINER ADDRESS + + The UDP-Lite patch was developed at + University of Aberdeen + Electronics Research Group + Department of Engineering + Fraser Noble Building + Aberdeen AB24 3UE; UK + The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. Initial + code was developed by William Stanislaus, <william@erg.abdn.ac.uk>. diff --git a/Documentation/networking/vortex.txt b/Documentation/networking/vortex.txt new file mode 100644 index 0000000..bd70976 --- /dev/null +++ b/Documentation/networking/vortex.txt @@ -0,0 +1,448 @@ +Documentation/networking/vortex.txt +Andrew Morton +30 April 2000 + + +This document describes the usage and errata of the 3Com "Vortex" device +driver for Linux, 3c59x.c. + +The driver was written by Donald Becker <becker@scyld.com> + +Don is no longer the prime maintainer of this version of the driver. +Please report problems to one or more of: + + Andrew Morton + Netdev mailing list <netdev@vger.kernel.org> + Linux kernel mailing list <linux-kernel@vger.kernel.org> + +Please note the 'Reporting and Diagnosing Problems' section at the end +of this file. + + +Since kernel 2.3.99-pre6, this driver incorporates the support for the +3c575-series Cardbus cards which used to be handled by 3c575_cb.c. + +This driver supports the following hardware: + + 3c590 Vortex 10Mbps + 3c592 EISA 10Mbps Demon/Vortex + 3c597 EISA Fast Demon/Vortex + 3c595 Vortex 100baseTx + 3c595 Vortex 100baseT4 + 3c595 Vortex 100base-MII + 3c900 Boomerang 10baseT + 3c900 Boomerang 10Mbps Combo + 3c900 Cyclone 10Mbps TPO + 3c900 Cyclone 10Mbps Combo + 3c900 Cyclone 10Mbps TPC + 3c900B-FL Cyclone 10base-FL + 3c905 Boomerang 100baseTx + 3c905 Boomerang 100baseT4 + 3c905B Cyclone 100baseTx + 3c905B Cyclone 10/100/BNC + 3c905B-FX Cyclone 100baseFx + 3c905C Tornado + 3c920B-EMB-WNM (ATI Radeon 9100 IGP) + 3c980 Cyclone + 3c980C Python-T + 3cSOHO100-TX Hurricane + 3c555 Laptop Hurricane + 3c556 Laptop Tornado + 3c556B Laptop Hurricane + 3c575 [Megahertz] 10/100 LAN CardBus + 3c575 Boomerang CardBus + 3CCFE575BT Cyclone CardBus + 3CCFE575CT Tornado CardBus + 3CCFE656 Cyclone CardBus + 3CCFEM656B Cyclone+Winmodem CardBus + 3CXFEM656C Tornado+Winmodem CardBus + 3c450 HomePNA Tornado + 3c920 Tornado + 3c982 Hydra Dual Port A + 3c982 Hydra Dual Port B + 3c905B-T4 + 3c920B-EMB-WNM Tornado + +Module parameters +================= + +There are several parameters which may be provided to the driver when +its module is loaded. These are usually placed in /etc/modprobe.conf +(/etc/modules.conf in 2.4). Example: + +options 3c59x debug=3 rx_copybreak=300 + +If you are using the PCMCIA tools (cardmgr) then the options may be +placed in /etc/pcmcia/config.opts: + +module "3c59x" opts "debug=3 rx_copybreak=300" + + +The supported parameters are: + +debug=N + + Where N is a number from 0 to 7. Anything above 3 produces a lot + of output in your system logs. debug=1 is default. + +options=N1,N2,N3,... + + Each number in the list provides an option to the corresponding + network card. So if you have two 3c905's and you wish to provide + them with option 0x204 you would use: + + options=0x204,0x204 + + The individual options are composed of a number of bitfields which + have the following meanings: + + Possible media type settings + 0 10baseT + 1 10Mbs AUI + 2 undefined + 3 10base2 (BNC) + 4 100base-TX + 5 100base-FX + 6 MII (Media Independent Interface) + 7 Use default setting from EEPROM + 8 Autonegotiate + 9 External MII + 10 Use default setting from EEPROM + + When generating a value for the 'options' setting, the above media + selection values may be OR'ed (or added to) the following: + + 0x8000 Set driver debugging level to 7 + 0x4000 Set driver debugging level to 2 + 0x0400 Enable Wake-on-LAN + 0x0200 Force full duplex mode. + 0x0010 Bus-master enable bit (Old Vortex cards only) + + For example: + + insmod 3c59x options=0x204 + + will force full-duplex 100base-TX, rather than allowing the usual + autonegotiation. + +global_options=N + + Sets the `options' parameter for all 3c59x NICs in the machine. + Entries in the `options' array above will override any setting of + this. + +full_duplex=N1,N2,N3... + + Similar to bit 9 of 'options'. Forces the corresponding card into + full-duplex mode. Please use this in preference to the `options' + parameter. + + In fact, please don't use this at all! You're better off getting + autonegotiation working properly. + +global_full_duplex=N1 + + Sets full duplex mode for all 3c59x NICs in the machine. Entries + in the `full_duplex' array above will override any setting of this. + +flow_ctrl=N1,N2,N3... + + Use 802.3x MAC-layer flow control. The 3com cards only support the + PAUSE command, which means that they will stop sending packets for a + short period if they receive a PAUSE frame from the link partner. + + The driver only allows flow control on a link which is operating in + full duplex mode. + + This feature does not appear to work on the 3c905 - only 3c905B and + 3c905C have been tested. + + The 3com cards appear to only respond to PAUSE frames which are + sent to the reserved destination address of 01:80:c2:00:00:01. They + do not honour PAUSE frames which are sent to the station MAC address. + +rx_copybreak=M + + The driver preallocates 32 full-sized (1536 byte) network buffers + for receiving. When a packet arrives, the driver has to decide + whether to leave the packet in its full-sized buffer, or to allocate + a smaller buffer and copy the packet across into it. + + This is a speed/space tradeoff. + + The value of rx_copybreak is used to decide when to make the copy. + If the packet size is less than rx_copybreak, the packet is copied. + The default value for rx_copybreak is 200 bytes. + +max_interrupt_work=N + + The driver's interrupt service routine can handle many receive and + transmit packets in a single invocation. It does this in a loop. + The value of max_interrupt_work governs how mnay times the interrupt + service routine will loop. The default value is 32 loops. If this + is exceeded the interrupt service routine gives up and generates a + warning message "eth0: Too much work in interrupt". + +hw_checksums=N1,N2,N3,... + + Recent 3com NICs are able to generate IPv4, TCP and UDP checksums + in hardware. Linux has used the Rx checksumming for a long time. + The "zero copy" patch which is planned for the 2.4 kernel series + allows you to make use of the NIC's DMA scatter/gather and transmit + checksumming as well. + + The driver is set up so that, when the zerocopy patch is applied, + all Tornado and Cyclone devices will use S/G and Tx checksums. + + This module parameter has been provided so you can override this + decision. If you think that Tx checksums are causing a problem, you + may disable the feature with `hw_checksums=0'. + + If you think your NIC should be performing Tx checksumming and the + driver isn't enabling it, you can force the use of hardware Tx + checksumming with `hw_checksums=1'. + + The driver drops a message in the logfiles to indicate whether or + not it is using hardware scatter/gather and hardware Tx checksums. + + Scatter/gather and hardware checksums provide considerable + performance improvement for the sendfile() system call, but a small + decrease in throughput for send(). There is no effect upon receive + efficiency. + +compaq_ioaddr=N +compaq_irq=N +compaq_device_id=N + + "Variables to work-around the Compaq PCI BIOS32 problem".... + +watchdog=N + + Sets the time duration (in milliseconds) after which the kernel + decides that the transmitter has become stuck and needs to be reset. + This is mainly for debugging purposes, although it may be advantageous + to increase this value on LANs which have very high collision rates. + The default value is 5000 (5.0 seconds). + +enable_wol=N1,N2,N3,... + + Enable Wake-on-LAN support for the relevant interface. Donald + Becker's `ether-wake' application may be used to wake suspended + machines. + + Also enables the NIC's power management support. + +global_enable_wol=N + + Sets enable_wol mode for all 3c59x NICs in the machine. Entries in + the `enable_wol' array above will override any setting of this. + +Media selection +--------------- + +A number of the older NICs such as the 3c590 and 3c900 series have +10base2 and AUI interfaces. + +Prior to January, 2001 this driver would autoeselect the 10base2 or AUI +port if it didn't detect activity on the 10baseT port. It would then +get stuck on the 10base2 port and a driver reload was necessary to +switch back to 10baseT. This behaviour could not be prevented with a +module option override. + +Later (current) versions of the driver _do_ support locking of the +media type. So if you load the driver module with + + modprobe 3c59x options=0 + +it will permanently select the 10baseT port. Automatic selection of +other media types does not occur. + + +Transmit error, Tx status register 82 +------------------------------------- + +This is a common error which is almost always caused by another host on +the same network being in full-duplex mode, while this host is in +half-duplex mode. You need to find that other host and make it run in +half-duplex mode or fix this host to run in full-duplex mode. + +As a last resort, you can force the 3c59x driver into full-duplex mode +with + + options 3c59x full_duplex=1 + +but this has to be viewed as a workaround for broken network gear and +should only really be used for equipment which cannot autonegotiate. + + +Additional resources +-------------------- + +Details of the device driver implementation are at the top of the source file. + +Additional documentation is available at Don Becker's Linux Drivers site: + + http://www.scyld.com/vortex.html + +Donald Becker's driver development site: + + http://www.scyld.com/network.html + +Donald's vortex-diag program is useful for inspecting the NIC's state: + + http://www.scyld.com/ethercard_diag.html + +Donald's mii-diag program may be used for inspecting and manipulating +the NIC's Media Independent Interface subsystem: + + http://www.scyld.com/ethercard_diag.html#mii-diag + +Donald's wake-on-LAN page: + + http://www.scyld.com/wakeonlan.html + +3Com's DOS-based application for setting up the NICs EEPROMs: + + ftp://ftp.3com.com/pub/nic/3c90x/3c90xx2.exe + + +Autonegotiation notes +--------------------- + + The driver uses a one-minute heartbeat for adapting to changes in + the external LAN environment if link is up and 5 seconds if link is down. + This means that when, for example, a machine is unplugged from a hubbed + 10baseT LAN plugged into a switched 100baseT LAN, the throughput + will be quite dreadful for up to sixty seconds. Be patient. + + Cisco interoperability note from Walter Wong <wcw+@CMU.EDU>: + + On a side note, adding HAS_NWAY seems to share a problem with the + Cisco 6509 switch. Specifically, you need to change the spanning + tree parameter for the port the machine is plugged into to 'portfast' + mode. Otherwise, the negotiation fails. This has been an issue + we've noticed for a while but haven't had the time to track down. + + Cisco switches (Jeff Busch <jbusch@deja.com>) + + My "standard config" for ports to which PC's/servers connect directly: + + interface FastEthernet0/N + description machinename + load-interval 30 + spanning-tree portfast + + If autonegotiation is a problem, you may need to specify "speed + 100" and "duplex full" as well (or "speed 10" and "duplex half"). + + WARNING: DO NOT hook up hubs/switches/bridges to these + specially-configured ports! The switch will become very confused. + + +Reporting and diagnosing problems +--------------------------------- + +Maintainers find that accurate and complete problem reports are +invaluable in resolving driver problems. We are frequently not able to +reproduce problems and must rely on your patience and efforts to get to +the bottom of the problem. + +If you believe you have a driver problem here are some of the +steps you should take: + +- Is it really a driver problem? + + Eliminate some variables: try different cards, different + computers, different cables, different ports on the switch/hub, + different versions of the kernel or of the driver, etc. + +- OK, it's a driver problem. + + You need to generate a report. Typically this is an email to the + maintainer and/or linux-net@vger.kernel.org. The maintainer's + email address will be in the driver source or in the MAINTAINERS file. + +- The contents of your report will vary a lot depending upon the + problem. If it's a kernel crash then you should refer to the + REPORTING-BUGS file. + + But for most problems it is useful to provide the following: + + o Kernel version, driver version + + o A copy of the banner message which the driver generates when + it is initialised. For example: + + eth0: 3Com PCI 3c905C Tornado at 0xa400, 00:50:da:6a:88:f0, IRQ 19 + 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. + MII transceiver found at address 24, status 782d. + Enabling bus-master transmits and whole-frame receives. + + NOTE: You must provide the `debug=2' modprobe option to generate + a full detection message. Please do this: + + modprobe 3c59x debug=2 + + o If it is a PCI device, the relevant output from 'lspci -vx', eg: + + 00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74) + Subsystem: 3Com Corporation: Unknown device 9200 + Flags: bus master, medium devsel, latency 32, IRQ 19 + I/O ports at a400 [size=128] + Memory at db000000 (32-bit, non-prefetchable) [size=128] + Expansion ROM at <unassigned> [disabled] [size=128K] + Capabilities: [dc] Power Management version 2 + 00: b7 10 00 92 07 00 10 02 74 00 00 02 08 20 00 00 + 10: 01 a4 00 00 00 00 00 db 00 00 00 00 00 00 00 00 + 20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10 + 30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a + + o A description of the environment: 10baseT? 100baseT? + full/half duplex? switched or hubbed? + + o Any additional module parameters which you may be providing to the driver. + + o Any kernel logs which are produced. The more the merrier. + If this is a large file and you are sending your report to a + mailing list, mention that you have the logfile, but don't send + it. If you're reporting direct to the maintainer then just send + it. + + To ensure that all kernel logs are available, add the + following line to /etc/syslog.conf: + + kern.* /var/log/messages + + Then restart syslogd with: + + /etc/rc.d/init.d/syslog restart + + (The above may vary, depending upon which Linux distribution you use). + + o If your problem is reproducible then that's great. Try the + following: + + 1) Increase the debug level. Usually this is done via: + + a) modprobe driver debug=7 + b) In /etc/modprobe.conf (or /etc/modules.conf for 2.4): + options driver debug=7 + + 2) Recreate the problem with the higher debug level, + send all logs to the maintainer. + + 3) Download you card's diagnostic tool from Donald + Becker's website <http://www.scyld.com/ethercard_diag.html>. + Download mii-diag.c as well. Build these. + + a) Run 'vortex-diag -aaee' and 'mii-diag -v' when the card is + working correctly. Save the output. + + b) Run the above commands when the card is malfunctioning. Send + both sets of output. + +Finally, please be patient and be prepared to do some work. You may +end up working on this problem for a week or more as the maintainer +asks more questions, asks for more tests, asks for patches to be +applied, etc. At the end of it all, the problem may even remain +unresolved. diff --git a/Documentation/networking/wavelan.txt b/Documentation/networking/wavelan.txt new file mode 100644 index 0000000..afa6e52 --- /dev/null +++ b/Documentation/networking/wavelan.txt @@ -0,0 +1,73 @@ + The Wavelan drivers saga + ------------------------ + + By Jean Tourrilhes <jt@hpl.hp.com> + + The Wavelan is a Radio network adapter designed by +Lucent. Under this generic name is hidden quite a variety of hardware, +and many Linux driver to support it. + The get the full story on Wireless LANs, please consult : + http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/ + +"wavelan" driver (old ISA Wavelan) +---------------- + o Config : Network device -> Wireless LAN -> AT&T WaveLAN + o Location : .../drivers/net/wireless/wavelan* + o in-line doc : .../drivers/net/wireless/wavelan.p.h + o on-line doc : + http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Wavelan.html + + This is the driver for the ISA version of the first generation +of the Wavelan, now discontinued. The device is 2 Mb/s, composed of a +Intel 82586 controller and a Lucent Modem, and is NOT 802.11 compliant. + The driver has been tested with the following hardware : + o Wavelan ISA 915 MHz (full length ISA card) + o Wavelan ISA 915 MHz 2.0 (half length ISA card) + o Wavelan ISA 2.4 GHz (full length ISA card, fixed frequency) + o Wavelan ISA 2.4 GHz 2.0 (half length ISA card, frequency selectable) + o Above cards with the optional DES encryption feature + +"wavelan_cs" driver (old Pcmcia Wavelan) +------------------- + o Config : Network device -> PCMCIA network -> + Pcmcia Wireless LAN -> AT&T/Lucent WaveLAN + o Location : .../drivers/net/pcmcia/wavelan* + o in-line doc : .../drivers/net/pcmcia/wavelan_cs.h + o on-line doc : + http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Wavelan.html + + This is the driver for the PCMCIA version of the first +generation of the Wavelan, now discontinued. The device is 2 Mb/s, +composed of a Intel 82593 controller (totally different from the 82586) +and a Lucent Modem, and NOT 802.11 compatible. + The driver has been tested with the following hardware : + o Wavelan Pcmcia 915 MHz 2.0 (Pcmcia card + separate + modem/antenna block) + o Wavelan Pcmcia 2.4 GHz 2.0 (Pcmcia card + separate + modem/antenna block) + +"wvlan_cs" driver (Wavelan IEEE, GPL) +----------------- + o Config : Not yet in kernel + o Location : Pcmcia package 3.1.10+ + o on-line doc : http://www.fasta.fh-dortmund.de/users/andy/wvlan/ + + This is the driver for the current generation of Wavelan IEEE, +which is 802.11 compatible. Depending on version, it is 2 Mb/s or 11 +Mb/s, with or without encryption, all implemented in Lucent specific +DSP (the Hermes). + This is a GPL full source PCMCIA driver (ISA is just a Pcmcia +card with ISA-Pcmcia bridge). + +"wavelan2_cs" driver (Wavelan IEEE, binary) +-------------------- + o Config : Not yet in kernel + o Location : ftp://sourceforge.org/pcmcia/contrib/ + + This driver support exactly the same hardware as the previous +driver, the main difference is that it is based on a binary library +and supported by Lucent. + + I hope it clears the confusion ;-) + + Jean diff --git a/Documentation/networking/x25-iface.txt b/Documentation/networking/x25-iface.txt new file mode 100644 index 0000000..975cc87 --- /dev/null +++ b/Documentation/networking/x25-iface.txt @@ -0,0 +1,123 @@ + X.25 Device Driver Interface 1.1 + + Jonathan Naylor 26.12.96 + +This is a description of the messages to be passed between the X.25 Packet +Layer and the X.25 device driver. They are designed to allow for the easy +setting of the LAPB mode from within the Packet Layer. + +The X.25 device driver will be coded normally as per the Linux device driver +standards. Most X.25 device drivers will be moderately similar to the +already existing Ethernet device drivers. However unlike those drivers, the +X.25 device driver has a state associated with it, and this information +needs to be passed to and from the Packet Layer for proper operation. + +All messages are held in sk_buff's just like real data to be transmitted +over the LAPB link. The first byte of the skbuff indicates the meaning of +the rest of the skbuff, if any more information does exist. + + +Packet Layer to Device Driver +----------------------------- + +First Byte = 0x00 + +This indicates that the rest of the skbuff contains data to be transmitted +over the LAPB link. The LAPB link should already exist before any data is +passed down. + +First Byte = 0x01 + +Establish the LAPB link. If the link is already established then the connect +confirmation message should be returned as soon as possible. + +First Byte = 0x02 + +Terminate the LAPB link. If it is already disconnected then the disconnect +confirmation message should be returned as soon as possible. + +First Byte = 0x03 + +LAPB parameters. To be defined. + + +Device Driver to Packet Layer +----------------------------- + +First Byte = 0x00 + +This indicates that the rest of the skbuff contains data that has been +received over the LAPB link. + +First Byte = 0x01 + +LAPB link has been established. The same message is used for both a LAPB +link connect_confirmation and a connect_indication. + +First Byte = 0x02 + +LAPB link has been terminated. This same message is used for both a LAPB +link disconnect_confirmation and a disconnect_indication. + +First Byte = 0x03 + +LAPB parameters. To be defined. + + + +Possible Problems +================= + +(Henner Eisen, 2000-10-28) + +The X.25 packet layer protocol depends on a reliable datalink service. +The LAPB protocol provides such reliable service. But this reliability +is not preserved by the Linux network device driver interface: + +- With Linux 2.4.x (and above) SMP kernels, packet ordering is not + preserved. Even if a device driver calls netif_rx(skb1) and later + netif_rx(skb2), skb2 might be delivered to the network layer + earlier that skb1. +- Data passed upstream by means of netif_rx() might be dropped by the + kernel if the backlog queue is congested. + +The X.25 packet layer protocol will detect this and reset the virtual +call in question. But many upper layer protocols are not designed to +handle such N-Reset events gracefully. And frequent N-Reset events +will always degrade performance. + +Thus, driver authors should make netif_rx() as reliable as possible: + +SMP re-ordering will not occur if the driver's interrupt handler is +always executed on the same CPU. Thus, + +- Driver authors should use irq affinity for the interrupt handler. + +The probability of packet loss due to backlog congestion can be +reduced by the following measures or a combination thereof: + +(1) Drivers for kernel versions 2.4.x and above should always check the + return value of netif_rx(). If it returns NET_RX_DROP, the + driver's LAPB protocol must not confirm reception of the frame + to the peer. + This will reliably suppress packet loss. The LAPB protocol will + automatically cause the peer to re-transmit the dropped packet + later. + The lapb module interface was modified to support this. Its + data_indication() method should now transparently pass the + netif_rx() return value to the (lapb mopdule) caller. +(2) Drivers for kernel versions 2.2.x should always check the global + variable netdev_dropping when a new frame is received. The driver + should only call netif_rx() if netdev_dropping is zero. Otherwise + the driver should not confirm delivery of the frame and drop it. + Alternatively, the driver can queue the frame internally and call + netif_rx() later when netif_dropping is 0 again. In that case, delivery + confirmation should also be deferred such that the internal queue + cannot grow to much. + This will not reliably avoid packet loss, but the probability + of packet loss in netif_rx() path will be significantly reduced. +(3) Additionally, driver authors might consider to support + CONFIG_NET_HW_FLOWCONTROL. This allows the driver to be woken up + when a previously congested backlog queue becomes empty again. + The driver could uses this for flow-controlling the peer by means + of the LAPB protocol's flow-control service. diff --git a/Documentation/networking/x25.txt b/Documentation/networking/x25.txt new file mode 100644 index 0000000..c91c6d7 --- /dev/null +++ b/Documentation/networking/x25.txt @@ -0,0 +1,44 @@ +Linux X.25 Project + +As my third year dissertation at University I have taken it upon myself to +write an X.25 implementation for Linux. My aim is to provide a complete X.25 +Packet Layer and a LAPB module to allow for "normal" X.25 to be run using +Linux. There are two sorts of X.25 cards available, intelligent ones that +implement LAPB on the card itself, and unintelligent ones that simply do +framing, bit-stuffing and checksumming. These both need to be handled by the +system. + +I therefore decided to write the implementation such that as far as the +Packet Layer is concerned, the link layer was being performed by a lower +layer of the Linux kernel and therefore it did not concern itself with +implementation of LAPB. Therefore the LAPB modules would be called by +unintelligent X.25 card drivers and not by intelligent ones, this would +provide a uniform device driver interface, and simplify configuration. + +To confuse matters a little, an 802.2 LLC implementation for Linux is being +written which will allow X.25 to be run over an Ethernet (or Token Ring) and +conform with the JNT "Pink Book", this will have a different interface to +the Packet Layer but there will be no confusion since the class of device +being served by the LLC will be completely separate from LAPB. The LLC +implementation is being done as part of another protocol project (SNA) and +by a different author. + +Just when you thought that it could not become more confusing, another +option appeared, XOT. This allows X.25 Packet Layer frames to operate over +the Internet using TCP/IP as a reliable link layer. RFC1613 specifies the +format and behaviour of the protocol. If time permits this option will also +be actively considered. + +A linux-x25 mailing list has been created at vger.kernel.org to support the +development and use of Linux X.25. It is early days yet, but interested +parties are welcome to subscribe to it. Just send a message to +majordomo@vger.kernel.org with the following in the message body: + +subscribe linux-x25 +end + +The contents of the Subject line are ignored. + +Jonathan + +g4klx@g4klx.demon.co.uk diff --git a/Documentation/networking/xfrm_proc.txt b/Documentation/networking/xfrm_proc.txt new file mode 100644 index 0000000..d0d8baf --- /dev/null +++ b/Documentation/networking/xfrm_proc.txt @@ -0,0 +1,74 @@ +XFRM proc - /proc/net/xfrm_* files +================================== +Masahide NAKAMURA <nakam@linux-ipv6.org> + + +Transformation Statistics +------------------------- +xfrm_proc is a statistics shown factor dropped by transformation +for developer. +It is a counter designed from current transformation source code +and defined like linux private MIB. + +Inbound statistics +~~~~~~~~~~~~~~~~~~ +XfrmInError: + All errors which is not matched others +XfrmInBufferError: + No buffer is left +XfrmInHdrError: + Header error +XfrmInNoStates: + No state is found + i.e. Either inbound SPI, address, or IPsec protocol at SA is wrong +XfrmInStateProtoError: + Transformation protocol specific error + e.g. SA key is wrong +XfrmInStateModeError: + Transformation mode specific error +XfrmInStateSeqError: + Sequence error + i.e. Sequence number is out of window +XfrmInStateExpired: + State is expired +XfrmInStateMismatch: + State has mismatch option + e.g. UDP encapsulation type is mismatch +XfrmInStateInvalid: + State is invalid +XfrmInTmplMismatch: + No matching template for states + e.g. Inbound SAs are correct but SP rule is wrong +XfrmInNoPols: + No policy is found for states + e.g. Inbound SAs are correct but no SP is found +XfrmInPolBlock: + Policy discards +XfrmInPolError: + Policy error + +Outbound errors +~~~~~~~~~~~~~~~ +XfrmOutError: + All errors which is not matched others +XfrmOutBundleGenError: + Bundle generation error +XfrmOutBundleCheckError: + Bundle check error +XfrmOutNoStates: + No state is found +XfrmOutStateProtoError: + Transformation protocol specific error +XfrmOutStateModeError: + Transformation mode specific error +XfrmOutStateSeqError: + Sequence error + i.e. Sequence number overflow +XfrmOutStateExpired: + State is expired +XfrmOutPolBlock: + Policy discards +XfrmOutPolDead: + Policy is dead +XfrmOutPolError: + Policy error diff --git a/Documentation/networking/xfrm_sync.txt b/Documentation/networking/xfrm_sync.txt new file mode 100644 index 0000000..d7aac9d --- /dev/null +++ b/Documentation/networking/xfrm_sync.txt @@ -0,0 +1,169 @@ + +The sync patches work is based on initial patches from +Krisztian <hidden@balabit.hu> and others and additional patches +from Jamal <hadi@cyberus.ca>. + +The end goal for syncing is to be able to insert attributes + generate +events so that the an SA can be safely moved from one machine to another +for HA purposes. +The idea is to synchronize the SA so that the takeover machine can do +the processing of the SA as accurate as possible if it has access to it. + +We already have the ability to generate SA add/del/upd events. +These patches add ability to sync and have accurate lifetime byte (to +ensure proper decay of SAs) and replay counters to avoid replay attacks +with as minimal loss at failover time. +This way a backup stays as closely uptodate as an active member. + +Because the above items change for every packet the SA receives, +it is possible for a lot of the events to be generated. +For this reason, we also add a nagle-like algorithm to restrict +the events. i.e we are going to set thresholds to say "let me +know if the replay sequence threshold is reached or 10 secs have passed" +These thresholds are set system-wide via sysctls or can be updated +per SA. + +The identified items that need to be synchronized are: +- the lifetime byte counter +note that: lifetime time limit is not important if you assume the failover +machine is known ahead of time since the decay of the time countdown +is not driven by packet arrival. +- the replay sequence for both inbound and outbound + +1) Message Structure +---------------------- + +nlmsghdr:aevent_id:optional-TLVs. + +The netlink message types are: + +XFRM_MSG_NEWAE and XFRM_MSG_GETAE. + +A XFRM_MSG_GETAE does not have TLVs. +A XFRM_MSG_NEWAE will have at least two TLVs (as is +discussed further below). + +aevent_id structure looks like: + + struct xfrm_aevent_id { + struct xfrm_usersa_id sa_id; + xfrm_address_t saddr; + __u32 flags; + __u32 reqid; + }; + +The unique SA is identified by the combination of xfrm_usersa_id, +reqid and saddr. + +flags are used to indicate different things. The possible +flags are: + XFRM_AE_RTHR=1, /* replay threshold*/ + XFRM_AE_RVAL=2, /* replay value */ + XFRM_AE_LVAL=4, /* lifetime value */ + XFRM_AE_ETHR=8, /* expiry timer threshold */ + XFRM_AE_CR=16, /* Event cause is replay update */ + XFRM_AE_CE=32, /* Event cause is timer expiry */ + XFRM_AE_CU=64, /* Event cause is policy update */ + +How these flags are used is dependent on the direction of the +message (kernel<->user) as well the cause (config, query or event). +This is described below in the different messages. + +The pid will be set appropriately in netlink to recognize direction +(0 to the kernel and pid = processid that created the event +when going from kernel to user space) + +A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS +to get notified of these events. + +2) TLVS reflect the different parameters: +----------------------------------------- + +a) byte value (XFRMA_LTIME_VAL) +This TLV carries the running/current counter for byte lifetime since +last event. + +b)replay value (XFRMA_REPLAY_VAL) +This TLV carries the running/current counter for replay sequence since +last event. + +c)replay threshold (XFRMA_REPLAY_THRESH) +This TLV carries the threshold being used by the kernel to trigger events +when the replay sequence is exceeded. + +d) expiry timer (XFRMA_ETIMER_THRESH) +This is a timer value in milliseconds which is used as the nagle +value to rate limit the events. + +3) Default configurations for the parameters: +---------------------------------------------- + +By default these events should be turned off unless there is +at least one listener registered to listen to the multicast +group XFRMNLGRP_AEVENTS. + +Programs installing SAs will need to specify the two thresholds, however, +in order to not change existing applications such as racoon +we also provide default threshold values for these different parameters +in case they are not specified. + +the two sysctls/proc entries are: +a) /proc/sys/net/core/sysctl_xfrm_aevent_etime +used to provide default values for the XFRMA_ETIMER_THRESH in incremental +units of time of 100ms. The default is 10 (1 second) + +b) /proc/sys/net/core/sysctl_xfrm_aevent_rseqth +used to provide default values for XFRMA_REPLAY_THRESH parameter +in incremental packet count. The default is two packets. + +4) Message types +---------------- + +a) XFRM_MSG_GETAE issued by user-->kernel. +XFRM_MSG_GETAE does not carry any TLVs. +The response is a XFRM_MSG_NEWAE which is formatted based on what +XFRM_MSG_GETAE queried for. +The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. +*if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved +*if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved + +b) XFRM_MSG_NEWAE is issued by either user space to configure +or kernel to announce events or respond to a XFRM_MSG_GETAE. + +i) user --> kernel to configure a specific SA. +any of the values or threshold parameters can be updated by passing the +appropriate TLV. +A response is issued back to the sender in user space to indicate success +or failure. +In the case of success, additionally an event with +XFRM_MSG_NEWAE is also issued to any listeners as described in iii). + +ii) kernel->user direction as a response to XFRM_MSG_GETAE +The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. +The threshold TLVs will be included if explicitly requested in +the XFRM_MSG_GETAE message. + +iii) kernel->user to report as event if someone sets any values or +thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above). +In such a case XFRM_AE_CU flag is set to inform the user that +the change happened as a result of an update. +The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. + +iv) kernel->user to report event when replay threshold or a timeout +is exceeded. +In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout +happened) is set to inform the user what happened. +Note the two flags are mutually exclusive. +The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. + +Exceptions to threshold settings +-------------------------------- + +If you have an SA that is getting hit by traffic in bursts such that +there is a period where the timer threshold expires with no packets +seen, then an odd behavior is seen as follows: +The first packet arrival after a timer expiry will trigger a timeout +aevent; i.e we dont wait for a timeout period or a packet threshold +to be reached. This is done for simplicity and efficiency reasons. + +-JHS diff --git a/Documentation/networking/xfrm_sysctl.txt b/Documentation/networking/xfrm_sysctl.txt new file mode 100644 index 0000000..5bbd167 --- /dev/null +++ b/Documentation/networking/xfrm_sysctl.txt @@ -0,0 +1,4 @@ +/proc/sys/net/core/xfrm_* Variables: + +xfrm_acq_expires - INTEGER + default 30 - hard timeout in seconds for acquire requests diff --git a/Documentation/networking/z8530drv.txt b/Documentation/networking/z8530drv.txt new file mode 100644 index 0000000..2206abb --- /dev/null +++ b/Documentation/networking/z8530drv.txt @@ -0,0 +1,657 @@ +This is a subset of the documentation. To use this driver you MUST have the +full package from: + +Internet: +========= + +1. ftp://ftp.ccac.rwth-aachen.de/pub/jr/z8530drv-utils_3.0-3.tar.gz + +2. ftp://ftp.pspt.fi/pub/ham/linux/ax25/z8530drv-utils_3.0-3.tar.gz + +Please note that the information in this document may be hopelessly outdated. +A new version of the documentation, along with links to other important +Linux Kernel AX.25 documentation and programs, is available on +http://yaina.de/jreuter + +----------------------------------------------------------------------------- + + + SCC.C - Linux driver for Z8530 based HDLC cards for AX.25 + + ******************************************************************** + + (c) 1993,2000 by Joerg Reuter DL1BKE <jreuter@yaina.de> + + portions (c) 1993 Guido ten Dolle PE1NNZ + + for the complete copyright notice see >> Copying.Z8530DRV << + + ******************************************************************** + + +1. Initialization of the driver +=============================== + +To use the driver, 3 steps must be performed: + + 1. if compiled as module: loading the module + 2. Setup of hardware, MODEM and KISS parameters with sccinit + 3. Attach each channel to the Linux kernel AX.25 with "ifconfig" + +Unlike the versions below 2.4 this driver is a real network device +driver. If you want to run xNOS instead of our fine kernel AX.25 +use a 2.x version (available from above sites) or read the +AX.25-HOWTO on how to emulate a KISS TNC on network device drivers. + + +1.1 Loading the module +====================== + +(If you're going to compile the driver as a part of the kernel image, + skip this chapter and continue with 1.2) + +Before you can use a module, you'll have to load it with + + insmod scc.o + +please read 'man insmod' that comes with module-init-tools. + +You should include the insmod in one of the /etc/rc.d/rc.* files, +and don't forget to insert a call of sccinit after that. It +will read your /etc/z8530drv.conf. + +1.2. /etc/z8530drv.conf +======================= + +To setup all parameters you must run /sbin/sccinit from one +of your rc.*-files. This has to be done BEFORE you can +"ifconfig" an interface. Sccinit reads the file /etc/z8530drv.conf +and sets the hardware, MODEM and KISS parameters. A sample file is +delivered with this package. Change it to your needs. + +The file itself consists of two main sections. + +1.2.1 configuration of hardware parameters +========================================== + +The hardware setup section defines the following parameters for each +Z8530: + +chip 1 +data_a 0x300 # data port A +ctrl_a 0x304 # control port A +data_b 0x301 # data port B +ctrl_b 0x305 # control port B +irq 5 # IRQ No. 5 +pclock 4915200 # clock +board BAYCOM # hardware type +escc no # enhanced SCC chip? (8580/85180/85280) +vector 0 # latch for interrupt vector +special no # address of special function register +option 0 # option to set via sfr + + +chip - this is just a delimiter to make sccinit a bit simpler to + program. A parameter has no effect. + +data_a - the address of the data port A of this Z8530 (needed) +ctrl_a - the address of the control port A (needed) +data_b - the address of the data port B (needed) +ctrl_b - the address of the control port B (needed) + +irq - the used IRQ for this chip. Different chips can use different + IRQs or the same. If they share an interrupt, it needs to be + specified within one chip-definition only. + +pclock - the clock at the PCLK pin of the Z8530 (option, 4915200 is + default), measured in Hertz + +board - the "type" of the board: + + SCC type value + --------------------------------- + PA0HZP SCC card PA0HZP + EAGLE card EAGLE + PC100 card PC100 + PRIMUS-PC (DG9BL) card PRIMUS + BayCom (U)SCC card BAYCOM + +escc - if you want support for ESCC chips (8580, 85180, 85280), set + this to "yes" (option, defaults to "no") + +vector - address of the vector latch (aka "intack port") for PA0HZP + cards. There can be only one vector latch for all chips! + (option, defaults to 0) + +special - address of the special function register on several cards. + (option, defaults to 0) + +option - The value you write into that register (option, default is 0) + +You can specify up to four chips (8 channels). If this is not enough, +just change + + #define MAXSCC 4 + +to a higher value. + +Example for the BAYCOM USCC: +---------------------------- + +chip 1 +data_a 0x300 # data port A +ctrl_a 0x304 # control port A +data_b 0x301 # data port B +ctrl_b 0x305 # control port B +irq 5 # IRQ No. 5 (#) +board BAYCOM # hardware type (*) +# +# SCC chip 2 +# +chip 2 +data_a 0x302 +ctrl_a 0x306 +data_b 0x303 +ctrl_b 0x307 +board BAYCOM + +An example for a PA0HZP card: +----------------------------- + +chip 1 +data_a 0x153 +data_b 0x151 +ctrl_a 0x152 +ctrl_b 0x150 +irq 9 +pclock 4915200 +board PA0HZP +vector 0x168 +escc no +# +# +# +chip 2 +data_a 0x157 +data_b 0x155 +ctrl_a 0x156 +ctrl_b 0x154 +irq 9 +pclock 4915200 +board PA0HZP +vector 0x168 +escc no + +A DRSI would should probably work with this: +-------------------------------------------- +(actually: two DRSI cards...) + +chip 1 +data_a 0x303 +data_b 0x301 +ctrl_a 0x302 +ctrl_b 0x300 +irq 7 +pclock 4915200 +board DRSI +escc no +# +# +# +chip 2 +data_a 0x313 +data_b 0x311 +ctrl_a 0x312 +ctrl_b 0x310 +irq 7 +pclock 4915200 +board DRSI +escc no + +Note that you cannot use the on-board baudrate generator off DRSI +cards. Use "mode dpll" for clock source (see below). + +This is based on information provided by Mike Bilow (and verified +by Paul Helay) + +The utility "gencfg" +-------------------- + +If you only know the parameters for the PE1CHL driver for DOS, +run gencfg. It will generate the correct port addresses (I hope). +Its parameters are exactly the same as the ones you use with +the "attach scc" command in net, except that the string "init" must +not appear. Example: + +gencfg 2 0x150 4 2 0 1 0x168 9 4915200 + +will print a skeleton z8530drv.conf for the OptoSCC to stdout. + +gencfg 2 0x300 2 4 5 -4 0 7 4915200 0x10 + +does the same for the BAYCOM USCC card. In my opinion it is much easier +to edit scc_config.h... + + +1.2.2 channel configuration +=========================== + +The channel definition is divided into three sub sections for each +channel: + +An example for scc0: + +# DEVICE + +device scc0 # the device for the following params + +# MODEM / BUFFERS + +speed 1200 # the default baudrate +clock dpll # clock source: + # dpll = normal half duplex operation + # external = MODEM provides own Rx/Tx clock + # divider = use full duplex divider if + # installed (1) +mode nrzi # HDLC encoding mode + # nrzi = 1k2 MODEM, G3RUH 9k6 MODEM + # nrz = DF9IC 9k6 MODEM + # +bufsize 384 # size of buffers. Note that this must include + # the AX.25 header, not only the data field! + # (optional, defaults to 384) + +# KISS (Layer 1) + +txdelay 36 # (see chapter 1.4) +persist 64 +slot 8 +tail 8 +fulldup 0 +wait 12 +min 3 +maxkey 7 +idle 3 +maxdef 120 +group 0 +txoff off +softdcd on +slip off + +The order WITHIN these sections is unimportant. The order OF these +sections IS important. The MODEM parameters are set with the first +recognized KISS parameter... + +Please note that you can initialize the board only once after boot +(or insmod). You can change all parameters but "mode" and "clock" +later with the Sccparam program or through KISS. Just to avoid +security holes... + +(1) this divider is usually mounted on the SCC-PBC (PA0HZP) or not + present at all (BayCom). It feeds back the output of the DPLL + (digital pll) as transmit clock. Using this mode without a divider + installed will normally result in keying the transceiver until + maxkey expires --- of course without sending anything (useful). + +2. Attachment of a channel by your AX.25 software +================================================= + +2.1 Kernel AX.25 +================ + +To set up an AX.25 device you can simply type: + + ifconfig scc0 44.128.1.1 hw ax25 dl0tha-7 + +This will create a network interface with the IP number 44.128.20.107 +and the callsign "dl0tha". If you do not have any IP number (yet) you +can use any of the 44.128.0.0 network. Note that you do not need +axattach. The purpose of axattach (like slattach) is to create a KISS +network device linked to a TTY. Please read the documentation of the +ax25-utils and the AX.25-HOWTO to learn how to set the parameters of +the kernel AX.25. + +2.2 NOS, NET and TFKISS +======================= + +Since the TTY driver (aka KISS TNC emulation) is gone you need +to emulate the old behaviour. The cost of using these programs is +that you probably need to compile the kernel AX.25, regardless of whether +you actually use it or not. First setup your /etc/ax25/axports, +for example: + + 9k6 dl0tha-9 9600 255 4 9600 baud port (scc3) + axlink dl0tha-15 38400 255 4 Link to NOS + +Now "ifconfig" the scc device: + + ifconfig scc3 44.128.1.1 hw ax25 dl0tha-9 + +You can now axattach a pseudo-TTY: + + axattach /dev/ptys0 axlink + +and start your NOS and attach /dev/ptys0 there. The problem is that +NOS is reachable only via digipeating through the kernel AX.25 +(disastrous on a DAMA controlled channel). To solve this problem, +configure "rxecho" to echo the incoming frames from "9k6" to "axlink" +and outgoing frames from "axlink" to "9k6" and start: + + rxecho + +Or simply use "kissbridge" coming with z8530drv-utils: + + ifconfig scc3 hw ax25 dl0tha-9 + kissbridge scc3 /dev/ptys0 + + +3. Adjustment and Display of parameters +======================================= + +3.1 Displaying SCC Parameters: +============================== + +Once a SCC channel has been attached, the parameter settings and +some statistic information can be shown using the param program: + +dl1bke-u:~$ sccstat scc0 + +Parameters: + +speed : 1200 baud +txdelay : 36 +persist : 255 +slottime : 0 +txtail : 8 +fulldup : 1 +waittime : 12 +mintime : 3 sec +maxkeyup : 7 sec +idletime : 3 sec +maxdefer : 120 sec +group : 0x00 +txoff : off +softdcd : on +SLIP : off + +Status: + +HDLC Z8530 Interrupts Buffers +----------------------------------------------------------------------- +Sent : 273 RxOver : 0 RxInts : 125074 Size : 384 +Received : 1095 TxUnder: 0 TxInts : 4684 NoSpace : 0 +RxErrors : 1591 ExInts : 11776 +TxErrors : 0 SpInts : 1503 +Tx State : idle + + +The status info shown is: + +Sent - number of frames transmitted +Received - number of frames received +RxErrors - number of receive errors (CRC, ABORT) +TxErrors - number of discarded Tx frames (due to various reasons) +Tx State - status of the Tx interrupt handler: idle/busy/active/tail (2) +RxOver - number of receiver overruns +TxUnder - number of transmitter underruns +RxInts - number of receiver interrupts +TxInts - number of transmitter interrupts +EpInts - number of receiver special condition interrupts +SpInts - number of external/status interrupts +Size - maximum size of an AX.25 frame (*with* AX.25 headers!) +NoSpace - number of times a buffer could not get allocated + +An overrun is abnormal. If lots of these occur, the product of +baudrate and number of interfaces is too high for the processing +power of your computer. NoSpace errors are unlikely to be caused by the +driver or the kernel AX.25. + + +3.2 Setting Parameters +====================== + + +The setting of parameters of the emulated KISS TNC is done in the +same way in the SCC driver. You can change parameters by using +the kissparms program from the ax25-utils package or use the program +"sccparam": + + sccparam <device> <paramname> <decimal-|hexadecimal value> + +You can change the following parameters: + +param : value +------------------------ +speed : 1200 +txdelay : 36 +persist : 255 +slottime : 0 +txtail : 8 +fulldup : 1 +waittime : 12 +mintime : 3 +maxkeyup : 7 +idletime : 3 +maxdefer : 120 +group : 0x00 +txoff : off +softdcd : on +SLIP : off + + +The parameters have the following meaning: + +speed: + The baudrate on this channel in bits/sec + + Example: sccparam /dev/scc3 speed 9600 + +txdelay: + The delay (in units of 10 ms) after keying of the + transmitter, until the first byte is sent. This is usually + called "TXDELAY" in a TNC. When 0 is specified, the driver + will just wait until the CTS signal is asserted. This + assumes the presence of a timer or other circuitry in the + MODEM and/or transmitter, that asserts CTS when the + transmitter is ready for data. + A normal value of this parameter is 30-36. + + Example: sccparam /dev/scc0 txd 20 + +persist: + This is the probability that the transmitter will be keyed + when the channel is found to be free. It is a value from 0 + to 255, and the probability is (value+1)/256. The value + should be somewhere near 50-60, and should be lowered when + the channel is used more heavily. + + Example: sccparam /dev/scc2 persist 20 + +slottime: + This is the time between samples of the channel. It is + expressed in units of 10 ms. About 200-300 ms (value 20-30) + seems to be a good value. + + Example: sccparam /dev/scc0 slot 20 + +tail: + The time the transmitter will remain keyed after the last + byte of a packet has been transferred to the SCC. This is + necessary because the CRC and a flag still have to leave the + SCC before the transmitter is keyed down. The value depends + on the baudrate selected. A few character times should be + sufficient, e.g. 40ms at 1200 baud. (value 4) + The value of this parameter is in 10 ms units. + + Example: sccparam /dev/scc2 4 + +full: + The full-duplex mode switch. This can be one of the following + values: + + 0: The interface will operate in CSMA mode (the normal + half-duplex packet radio operation) + 1: Fullduplex mode, i.e. the transmitter will be keyed at + any time, without checking the received carrier. It + will be unkeyed when there are no packets to be sent. + 2: Like 1, but the transmitter will remain keyed, also + when there are no packets to be sent. Flags will be + sent in that case, until a timeout (parameter 10) + occurs. + + Example: sccparam /dev/scc0 fulldup off + +wait: + The initial waittime before any transmit attempt, after the + frame has been queue for transmit. This is the length of + the first slot in CSMA mode. In full duplex modes it is + set to 0 for maximum performance. + The value of this parameter is in 10 ms units. + + Example: sccparam /dev/scc1 wait 4 + +maxkey: + The maximal time the transmitter will be keyed to send + packets, in seconds. This can be useful on busy CSMA + channels, to avoid "getting a bad reputation" when you are + generating a lot of traffic. After the specified time has + elapsed, no new frame will be started. Instead, the trans- + mitter will be switched off for a specified time (parameter + min), and then the selected algorithm for keyup will be + started again. + The value 0 as well as "off" will disable this feature, + and allow infinite transmission time. + + Example: sccparam /dev/scc0 maxk 20 + +min: + This is the time the transmitter will be switched off when + the maximum transmission time is exceeded. + + Example: sccparam /dev/scc3 min 10 + +idle + This parameter specifies the maximum idle time in full duplex + 2 mode, in seconds. When no frames have been sent for this + time, the transmitter will be keyed down. A value of 0 is + has same result as the fullduplex mode 1. This parameter + can be disabled. + + Example: sccparam /dev/scc2 idle off # transmit forever + +maxdefer + This is the maximum time (in seconds) to wait for a free channel + to send. When this timer expires the transmitter will be keyed + IMMEDIATELY. If you love to get trouble with other users you + should set this to a very low value ;-) + + Example: sccparam /dev/scc0 maxdefer 240 # 2 minutes + + +txoff: + When this parameter has the value 0, the transmission of packets + is enable. Otherwise it is disabled. + + Example: sccparam /dev/scc2 txoff on + +group: + It is possible to build special radio equipment to use more than + one frequency on the same band, e.g. using several receivers and + only one transmitter that can be switched between frequencies. + Also, you can connect several radios that are active on the same + band. In these cases, it is not possible, or not a good idea, to + transmit on more than one frequency. The SCC driver provides a + method to lock transmitters on different interfaces, using the + "param <interface> group <x>" command. This will only work when + you are using CSMA mode (parameter full = 0). + The number <x> must be 0 if you want no group restrictions, and + can be computed as follows to create restricted groups: + <x> is the sum of some OCTAL numbers: + + 200 This transmitter will only be keyed when all other + transmitters in the group are off. + 100 This transmitter will only be keyed when the carrier + detect of all other interfaces in the group is off. + 0xx A byte that can be used to define different groups. + Interfaces are in the same group, when the logical AND + between their xx values is nonzero. + + Examples: + When 2 interfaces use group 201, their transmitters will never be + keyed at the same time. + When 2 interfaces use group 101, the transmitters will only key + when both channels are clear at the same time. When group 301, + the transmitters will not be keyed at the same time. + + Don't forget to convert the octal numbers into decimal before + you set the parameter. + + Example: (to be written) + +softdcd: + use a software dcd instead of the real one... Useful for a very + slow squelch. + + Example: sccparam /dev/scc0 soft on + + +4. Problems +=========== + +If you have tx-problems with your BayCom USCC card please check +the manufacturer of the 8530. SGS chips have a slightly +different timing. Try Zilog... A solution is to write to register 8 +instead to the data port, but this won't work with the ESCC chips. +*SIGH!* + +A very common problem is that the PTT locks until the maxkeyup timer +expires, although interrupts and clock source are correct. In most +cases compiling the driver with CONFIG_SCC_DELAY (set with +make config) solves the problems. For more hints read the (pseudo) FAQ +and the documentation coming with z8530drv-utils. + +I got reports that the driver has problems on some 386-based systems. +(i.e. Amstrad) Those systems have a bogus AT bus timing which will +lead to delayed answers on interrupts. You can recognize these +problems by looking at the output of Sccstat for the suspected +port. If it shows under- and overruns you own such a system. + +Delayed processing of received data: This depends on + +- the kernel version + +- kernel profiling compiled or not + +- a high interrupt load + +- a high load of the machine --- running X, Xmorph, XV and Povray, + while compiling the kernel... hmm ... even with 32 MB RAM ... ;-) + Or running a named for the whole .ampr.org domain on an 8 MB + box... + +- using information from rxecho or kissbridge. + +Kernel panics: please read /linux/README and find out if it +really occurred within the scc driver. + +If you cannot solve a problem, send me + +- a description of the problem, +- information on your hardware (computer system, scc board, modem) +- your kernel version +- the output of cat /proc/net/z8530 + +4. Thor RLC100 +============== + +Mysteriously this board seems not to work with the driver. Anyone +got it up-and-running? + + +Many thanks to Linus Torvalds and Alan Cox for including the driver +in the Linux standard distribution and their support. + +Joerg Reuter ampr-net: dl1bke@db0pra.ampr.org + AX-25 : DL1BKE @ DB0ABH.#BAY.DEU.EU + Internet: jreuter@yaina.de + WWW : http://yaina.de/jreuter |