diff options
Diffstat (limited to 'Documentation')
36 files changed, 1743 insertions, 369 deletions
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 1af0f2d..2ffb0d6 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -33,7 +33,9 @@ pci_alloc_consistent(struct pci_dev *dev, size_t size, Consistent memory is memory for which a write by either the device or the processor can immediately be read by the processor or device -without having to worry about caching effects. +without having to worry about caching effects. (You may however need +to make sure to flush the processor's write buffers before telling +devices to read that memory.) This routine allocates a region of <size> bytes of consistent memory. it also returns a <dma_handle> which may be cast to an unsigned @@ -304,12 +306,12 @@ dma address with dma_mapping_error(). A non zero return value means the mapping could not be created and the driver should take appropriate action (eg reduce current DMA mapping usage or delay and try again later). -int -dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, - enum dma_data_direction direction) -int -pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) + int + dma_map_sg(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) + int + pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, + int nents, int direction) Maps a scatter gather list from the block layer. @@ -327,12 +329,33 @@ critical that the driver do something, in the case of a block driver aborting the request or even oopsing is better than doing nothing and corrupting the filesystem. -void -dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries, - enum dma_data_direction direction) -void -pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) +With scatterlists, you use the resulting mapping like this: + + int i, count = dma_map_sg(dev, sglist, nents, direction); + struct scatterlist *sg; + + for (i = 0, sg = sglist; i < count; i++, sg++) { + hw_address[i] = sg_dma_address(sg); + hw_len[i] = sg_dma_len(sg); + } + +where nents is the number of entries in the sglist. + +The implementation is free to merge several consecutive sglist entries +into one (e.g. with an IOMMU, or if several pages just happen to be +physically contiguous) and returns the actual number of sg entries it +mapped them to. On failure 0, is returned. + +Then you should loop count times (note: this can be less than nents times) +and use sg_dma_address() and sg_dma_len() macros where you previously +accessed sg->address and sg->length as shown above. + + void + dma_unmap_sg(struct device *dev, struct scatterlist *sg, + int nhwentries, enum dma_data_direction direction) + void + pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, + int nents, int direction) unmap the previously mapped scatter/gather list. All the parameters must be the same as those and passed in to the scatter/gather mapping diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt index ee4bb73..7c717699 100644 --- a/Documentation/DMA-mapping.txt +++ b/Documentation/DMA-mapping.txt @@ -58,11 +58,15 @@ translating each of those pages back to a kernel address using something like __va(). [ EDIT: Update this when we integrate Gerd Knorr's generic code which does this. ] -This rule also means that you may not use kernel image addresses -(ie. items in the kernel's data/text/bss segment, or your driver's) -nor may you use kernel stack addresses for DMA. Both of these items -might be mapped somewhere entirely different than the rest of physical -memory. +This rule also means that you may use neither kernel image addresses +(items in data/text/bss segments), nor module image addresses, nor +stack addresses for DMA. These could all be mapped somewhere entirely +different than the rest of physical memory. Even if those classes of +memory could physically work with DMA, you'd need to ensure the I/O +buffers were cacheline-aligned. Without that, you'd see cacheline +sharing problems (data corruption) on CPUs with DMA-incoherent caches. +(The CPU could write to one word, DMA would write to a different one +in the same cache line, and one of them could be overwritten.) Also, this means that you cannot take the return of a kmap() call and DMA to/from that. This is similar to vmalloc(). @@ -194,7 +198,7 @@ document for how to handle this case. Finally, if your device can only drive the low 24-bits of address during PCI bus mastering you might do something like: - if (pci_set_dma_mask(pdev, 0x00ffffff)) { + if (pci_set_dma_mask(pdev, DMA_24BIT_MASK)) { printk(KERN_WARNING "mydev: 24-bit DMA addressing not available.\n"); goto ignore_this_device; @@ -212,7 +216,7 @@ functions (for example a sound card provides playback and record functions) and the various different functions have _different_ DMA addressing limitations, you may wish to probe each mask and only provide the functionality which the machine can handle. It -is important that the last call to pci_set_dma_mask() be for the +is important that the last call to pci_set_dma_mask() be for the most specific mask. Here is pseudo-code showing how this might be done: @@ -284,6 +288,11 @@ There are two types of DMA mappings: in order to get correct behavior on all platforms. + Also, on some platforms your driver may need to flush CPU write + buffers in much the same way as it needs to flush write buffers + found in PCI bridges (such as by reading a register's value + after writing it). + - Streaming DMA mappings which are usually mapped for one DMA transfer, unmapped right after it (unless you use pci_dma_sync_* below) and for which hardware can optimize for sequential accesses. @@ -303,6 +312,9 @@ There are two types of DMA mappings: Neither type of DMA mapping has alignment restrictions that come from PCI, although some devices may have such restrictions. +Also, systems with caches that aren't DMA-coherent will work better +when the underlying buffers don't share cache lines with other data. + Using Consistent DMA mappings. diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index 5bcbb6e..f869b03 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -705,7 +705,7 @@ and other resources, etc. <sect1><title>ata_scsi_error()</title> <para> - ata_scsi_error() is the current hostt->eh_strategy_handler() + ata_scsi_error() is the current transportt->eh_strategy_handler() for libata. As discussed above, this will be entered in two cases - timeout and ATAPI error completion. This function calls low level libata driver's eng_timeout() callback, the diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 6c9e746..915ae8c 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -603,7 +603,8 @@ start exactly where you are now. ---------- -Thanks to Paolo Ciarrocchi who allowed the "Development Process" section +Thanks to Paolo Ciarrocchi who allowed the "Development Process" +(http://linux.tar.bz/articles/2.6-development_process) section to be based on text he had written, and to Randy Dunlap and Gerrit Huizenga for some of the list of things you should and should not say. Also thanks to Pat Mochel, Hanna Linder, Randy Dunlap, Kay Sievers, diff --git a/Documentation/block/switching-sched.txt b/Documentation/block/switching-sched.txt new file mode 100644 index 0000000..5fa130a --- /dev/null +++ b/Documentation/block/switching-sched.txt @@ -0,0 +1,22 @@ +As of the Linux 2.6.10 kernel, it is now possible to change the +IO scheduler for a given block device on the fly (thus making it possible, +for instance, to set the CFQ scheduler for the system default, but +set a specific device to use the anticipatory or noop schedulers - which +can improve that device's throughput). + +To set a specific scheduler, simply do this: + +echo SCHEDNAME > /sys/block/DEV/queue/scheduler + +where SCHEDNAME is the name of a defined IO scheduler, and DEV is the +device name (hda, hdb, sga, or whatever you happen to have). + +The list of defined schedulers can be found by simply doing +a "cat /sys/block/DEV/queue/scheduler" - the list of valid names +will be displayed, with the currently selected scheduler in brackets: + +# cat /sys/block/hda/queue/scheduler +noop anticipatory deadline [cfq] +# echo anticipatory > /sys/block/hda/queue/scheduler +# cat /sys/block/hda/queue/scheduler +noop [anticipatory] deadline cfq diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt index 5009805..ffdb532 100644 --- a/Documentation/cpu-freq/index.txt +++ b/Documentation/cpu-freq/index.txt @@ -53,4 +53,4 @@ the CPUFreq Mailing list: * http://lists.linux.org.uk/mailman/listinfo/cpufreq Clock and voltage scaling for the SA-1100: -* http://www.lart.tudelft.nl/projects/scaling +* http://www.lartmaker.nl/projects/scaling diff --git a/Documentation/devices.txt b/Documentation/devices.txt index 3c406ac..b369a8c 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -1721,11 +1721,6 @@ Your cooperation is appreciated. These devices support the same API as the generic SCSI devices. - 97 block Packet writing for CD/DVD devices - 0 = /dev/pktcdvd0 First packet-writing module - 1 = /dev/pktcdvd1 Second packet-writing module - ... - 98 char Control and Measurement Device (comedi) 0 = /dev/comedi0 First comedi device 1 = /dev/comedi1 Second comedi device diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware index 15fc8fb..4820366 100644 --- a/Documentation/dvb/get_dvb_firmware +++ b/Documentation/dvb/get_dvb_firmware @@ -259,9 +259,9 @@ sub dibusb { } sub nxt2002 { - my $sourcefile = "Broadband4PC_4_2_11.zip"; + my $sourcefile = "Technisat_DVB-PC_4_4_COMPACT.zip"; my $url = "http://www.bbti.us/download/windows/$sourcefile"; - my $hash = "c6d2ea47a8f456d887ada0cfb718ff2a"; + my $hash = "476befae8c7c1bb9648954060b1eec1f"; my $outfile = "dvb-fe-nxt2002.fw"; my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1); @@ -269,8 +269,8 @@ sub nxt2002 { wgetfile($sourcefile, $url); unzip($sourcefile, $tmpdir); - verify("$tmpdir/SkyNETU.sys", $hash); - extract("$tmpdir/SkyNETU.sys", 375832, 5908, $outfile); + verify("$tmpdir/SkyNET.sys", $hash); + extract("$tmpdir/SkyNET.sys", 331624, 5908, $outfile); $outfile; } diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 59d0c74..43ab119 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -25,8 +25,9 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- -What: drivers depending on OBSOLETE_OSS_DRIVER -When: January 2006 +What: drivers that were depending on OBSOLETE_OSS_DRIVER + (config options already removed) +When: before 2.6.19 Why: OSS drivers with ALSA replacements Who: Adrian Bunk <bunk@stusta.de> @@ -56,6 +57,15 @@ Who: Jody McIntyre <scjody@steamballoon.com> --------------------------- +What: sbp2: module parameter "force_inquiry_hack" +When: July 2006 +Why: Superceded by parameter "workarounds". Both parameters are meant to be + used ad-hoc and for single devices only, i.e. not in modprobe.conf, + therefore the impact of this feature replacement should be low. +Who: Stefan Richter <stefanr@s5r6.in-berlin.de> + +--------------------------- + What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. When: July 2006 Why: V4L1 AP1 was replaced by V4L2 API. during migration from 2.4 to 2.6 @@ -71,14 +81,6 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br> --------------------------- -What: remove EXPORT_SYMBOL(panic_timeout) -When: April 2006 -Files: kernel/panic.c -Why: No modular usage in the kernel. -Who: Adrian Bunk <bunk@stusta.de> - ---------------------------- - What: remove EXPORT_SYMBOL(insert_resource) When: April 2006 Files: kernel/resource.c diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index c8bce82..89b1d19 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -246,6 +246,7 @@ class/ devices/ firmware/ net/ +fs/ devices/ contains a filesystem representation of the device tree. It maps directly to the internal kernel device tree, which is a hierarchy of @@ -264,6 +265,10 @@ drivers/ contains a directory for each device driver that is loaded for devices on that particular bus (this assumes that drivers do not span multiple bus types). +fs/ contains a directory for some filesystems. Currently each +filesystem wanting to export attributes must create its own hierarchy +below fs/ (see ./fuse.txt for an example). + More information can driver-model specific features can be found in Documentation/driver-model/. diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index adaa899..3a2e552 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -694,7 +694,7 @@ struct file_operations ---------------------- This describes how the VFS can manipulate an open file. As of kernel -2.6.13, the following members are defined: +2.6.17, the following members are defined: struct file_operations { loff_t (*llseek) (struct file *, loff_t, int); @@ -723,6 +723,10 @@ struct file_operations { int (*check_flags)(int); int (*dir_notify)(struct file *filp, unsigned long arg); int (*flock) (struct file *, int, struct file_lock *); + ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned +int); + ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned +int); }; Again, all methods are called without any locks being held, unless @@ -790,6 +794,12 @@ otherwise noted. flock: called by the flock(2) system call + splice_write: called by the VFS to splice data from a pipe to a file. This + method is used by the splice(2) system call + + splice_read: called by the VFS to splice data from file to a pipe. This + method is used by the splice(2) system call + Note that the file operations are implemented by the specific filesystem in which the inode resides. When opening a device node (character or block special) most filesystems will call special diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 43e836c..e9cc8bb 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -105,20 +105,3 @@ on the setup, so I think that the choice on what firmware to make persistent should be left to userspace. - - Why register_firmware()+__init can be useful: - - For boot devices needing firmware. - - To make the transition easier: - The firmware can be declared __init and register_firmware() - called on module_init. Then the firmware is warranted to be - there even if "firmware hotplug userspace" is not there yet or - it doesn't yet provide the needed firmware. - Once the firmware is widely available in userspace, it can be - removed from the kernel. Or made optional (CONFIG_.*_FIRMWARE). - - In either case, if firmware hotplug support is there, it can move the - firmware out of kernel memory into the real filesystem for later - usage. - - Note: If persistence is implemented on top of initramfs, - register_firmware() may not be appropriate. - diff --git a/Documentation/firmware_class/firmware_sample_driver.c b/Documentation/firmware_class/firmware_sample_driver.c index ad3edab..87feccd 100644 --- a/Documentation/firmware_class/firmware_sample_driver.c +++ b/Documentation/firmware_class/firmware_sample_driver.c @@ -5,8 +5,6 @@ * * Sample code on how to use request_firmware() from drivers. * - * Note that register_firmware() is currently useless. - * */ #include <linux/module.h> @@ -17,11 +15,6 @@ #include "linux/firmware.h" -#define WE_CAN_NEED_FIRMWARE_BEFORE_USERSPACE_IS_AVAILABLE -#ifdef WE_CAN_NEED_FIRMWARE_BEFORE_USERSPACE_IS_AVAILABLE -char __init inkernel_firmware[] = "let's say that this is firmware\n"; -#endif - static struct device ghost_device = { .bus_id = "ghost0", }; @@ -104,10 +97,6 @@ static void sample_probe_async(void) static int sample_init(void) { -#ifdef WE_CAN_NEED_FIRMWARE_BEFORE_USERSPACE_IS_AVAILABLE - register_firmware("sample_driver_fw", inkernel_firmware, - sizeof(inkernel_firmware)); -#endif device_initialize(&ghost_device); /* since there is no real hardware insertion I just call the * sample probe functions here */ diff --git a/Documentation/i2c/busses/i2c-parport b/Documentation/i2c/busses/i2c-parport index d9f23c0..77b995d 100644 --- a/Documentation/i2c/busses/i2c-parport +++ b/Documentation/i2c/busses/i2c-parport @@ -12,18 +12,22 @@ meant as a replacement for the older, individual drivers: teletext adapters) It currently supports the following devices: - * Philips adapter - * home brew teletext adapter - * Velleman K8000 adapter - * ELV adapter - * Analog Devices evaluation boards (ADM1025, ADM1030, ADM1031, ADM1032) - * Barco LPT->DVI (K5800236) adapter + * (type=0) Philips adapter + * (type=1) home brew teletext adapter + * (type=2) Velleman K8000 adapter + * (type=3) ELV adapter + * (type=4) Analog Devices ADM1032 evaluation board + * (type=5) Analog Devices evaluation boards: ADM1025, ADM1030, ADM1031 + * (type=6) Barco LPT->DVI (K5800236) adapter These devices use different pinout configurations, so you have to tell the driver what you have, using the type module parameter. There is no way to autodetect the devices. Support for different pinout configurations can be easily added when needed. +Earlier kernels defaulted to type=0 (Philips). But now, if the type +parameter is missing, the driver will simply fail to initialize. + Building your own adapter ------------------------- diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset new file mode 100644 index 0000000..85a64de --- /dev/null +++ b/Documentation/isdn/README.gigaset @@ -0,0 +1,286 @@ +GigaSet 307x Device Driver +========================== + +1. Requirements + ------------ +1.1. Hardware + -------- + This release supports the connection of the Gigaset 307x/417x family of + ISDN DECT bases via Gigaset M101 Data, Gigaset M105 Data or direct USB + connection. The following devices are reported to be compatible: + 307x/417x: + Gigaset SX255isdn + Gigaset SX353isdn + Sinus 45 [AB] isdn (Deutsche Telekom) + Sinus 721X/XA + Vox Chicago 390 ISDN (KPN Telecom) + M101: + Sinus 45 Data 1 (Telekom) + M105: + Gigaset USB Adapter DECT + Sinus 45 Data 2 (Telekom) + Sinus 721 data + Chicago 390 USB (KPN) + See also http://www.erbze.info/sinus_gigaset.htm and + http://gigaset307x.sourceforge.net/ + + We had also reports from users of Gigaset M105 who could use the drivers + with SX 100 and CX 100 ISDN bases (only in unimodem mode, see section 2.4.) + If you have another device that works with our driver, please let us know. + For example, Gigaset SX205isdn/Sinus 721 X SE and Gigaset SX303isdn bases + are just versions without answering machine of models known to work, so + they should work just as well; but so far we are lacking positive reports + on these. + + Chances of getting an USB device to work are good if the output of + lsusb + at the command line contains one of the following: + ID 0681:0001 + ID 0681:0002 + ID 0681:0009 + ID 0681:0021 + ID 0681:0022 + +1.2. Software + -------- + The driver works with ISDN4linux and so can be used with any software + which is able to use ISDN4linux for ISDN connections (voice or data). + CAPI4Linux support is planned but not yet available. + + There are some user space tools available at + http://sourceforge.net/projects/gigaset307x/ + which provide access to additional device specific functions like SMS, + phonebook or call journal. + + +2. How to use the driver + --------------------- +2.1. Modules + ------- + To get the device working, you have to load the proper kernel module. You + can do this using + modprobe modulename + where modulename is usb_gigaset (M105) or bas_gigaset (direct USB + connection to the base). + +2.2. Device nodes for user space programs + ------------------------------------ + The device can be accessed from user space (eg. by the user space tools + mentioned in 1.2.) through the device nodes: + + - /dev/ttyGU0 for M105 (USB data boxes) + - /dev/ttyGB0 for the base driver (direct USB connection) + + You can also select a "default device" which is used by the frontends when + no device node is given as parameter, by creating a symlink /dev/ttyG to + one of them, eg.: + + ln -s /dev/ttyGB0 /dev/ttyG + +2.3. ISDN4linux + ---------- + This is the "normal" mode of operation. After loading the module you can + set up the ISDN system just as you'd do with any ISDN card. + Your distribution should provide some configuration utility. + If not, you can use some HOWTOs like + http://www.linuxhaven.de/dlhp/HOWTO/DE-ISDN-HOWTO-5.html + If this doesn't work, because you have some recent device like SX100 where + debug output (see section 3.2.) shows something like this when dialing + CMD Received: ERROR + Available Params: 0 + Connection State: 0, Response: -1 + gigaset_process_response: resp_code -1 in ConState 0 ! + Timeout occurred + you might need to use unimodem mode: + +2.4. Unimodem mode + ------------- + This is needed for some devices [e.g. SX100] as they have problems with + the "normal" commands. + + If you have installed the command line tool gigacontr, you can enter + unimodem mode using + gigacontr --mode unimodem + You can switch back using + gigacontr --mode isdn + + You can also load the driver using e.g. + modprobe usb_gigaset startmode=0 + to prevent the driver from starting in "isdn4linux mode". + + In this mode the device works like a modem connected to a serial port + (the /dev/ttyGU0, ... mentioned above) which understands the commands + ATZ init, reset + => OK or ERROR + ATD + ATDT dial + => OK, CONNECT, + BUSY, + NO DIAL TONE, + NO CARRIER, + NO ANSWER + <pause>+++<pause> change to command mode when connected + ATH hangup + + You can use some configuration tool of your distribution to configure this + "modem" or configure pppd/wvdial manually. There are some example ppp + configuration files and chat scripts in the gigaset-VERSION/ppp directory. + Please note that the USB drivers are not able to change the state of the + control lines (the M105 driver can be configured to use some undocumented + control requests, if you really need the control lines, though). This means + you must use "Stupid Mode" if you are using wvdial or you should use the + nocrtscts option of pppd. + You must also assure that the ppp_async module is loaded with the parameter + flag_time=0. You can do this e.g. by adding a line like + + options ppp_async flag_time=0 + + to /etc/modprobe.conf. If your distribution has some local module + configuration file like /etc/modprobe.conf.local, + using that should be preferred. + +2.5. Call-ID (CID) mode + ------------------ + Call-IDs are numbers used to tag commands to, and responses from, the + Gigaset base in order to support the simultaneous handling of multiple + ISDN calls. Their use can be enabled ("CID mode") or disabled ("Unimodem + mode"). Without Call-IDs (in Unimodem mode), only a very limited set of + functions is available. It allows outgoing data connections only, but + does not signal incoming calls or other base events. + + DECT cordless data devices (M10x) permanently occupy the cordless + connection to the base while Call-IDs are activated. As the Gigaset + bases only support one DECT data connection at a time, this prevents + other DECT cordless data devices from accessing the base. + + During active operation, the driver switches to the necessary mode + automatically. However, for the reasons above, the mode chosen when + the device is not in use (idle) can be selected by the user. + - If you want to receive incoming calls, you can use the default + settings (CID mode). + - If you have several DECT data devices (M10x) which you want to use + in turn, select Unimodem mode by passing the parameter "cidmode=0" to + the driver ("modprobe usb_gigaset cidmode=0" or modprobe.conf). + + If you want both of these at once, you are out of luck. + + You can also use /sys/module/<name>/parameters/cidmode for changing + the CID mode setting (<name> is usb_gigaset or bas_gigaset). + + +3. Troubleshooting + --------------- +3.1. Solutions to frequently reported problems + ----------------------------------------- + Problem: + You have a slow provider and isdn4linux gives up dialing too early. + Solution: + Load the isdn module using the dialtimeout option. You can do this e.g. + by adding a line like + + options isdn dialtimeout=15 + + to /etc/modprobe.conf. If your distribution has some local module + configuration file like /etc/modprobe.conf.local, + using that should be preferred. + + Problem: + Your isdn script aborts with a message about isdnlog. + Solution: + Try deactivating (or commenting out) isdnlog. This driver does not + support it. + + Problem: + You have two or more DECT data adapters (M101/M105) and only the + first one you turn on works. + Solution: + Select Unimodem mode for all DECT data adapters. (see section 2.4.) + +3.2. Telling the driver to provide more information + ---------------------------------------------- + Building the driver with the "Gigaset debugging" kernel configuration + option (CONFIG_GIGASET_DEBUG) gives it the ability to produce additional + information useful for debugging. + + You can control the amount of debugging information the driver produces by + writing an appropriate value to /sys/module/gigaset/parameters/debug, e.g. + echo 0 > /sys/module/gigaset/parameters/debug + switches off debugging output completely, + echo 0x10a020 > /sys/module/gigaset/parameters/debug + enables the standard set of debugging output messages. These values are + bit patterns where every bit controls a certain type of debugging output. + See the constants DEBUG_* in the source file gigaset.h for details. + + The initial value can be set using the debug parameter when loading the + module "gigaset", e.g. by adding a line + options gigaset debug=0 + to /etc/modprobe.conf, ... + + Generated debugging information can be found + - as output of the command + dmesg + - in system log files written by your syslog daemon, usually + in /var/log/, e.g. /var/log/messages. + +3.3. Reporting problems and bugs + --------------------------- + If you can't solve problems with the driver on your own, feel free to + use one of the forums, bug trackers, or mailing lists on + http://sourceforge.net/projects/gigaset307x + or write an electronic mail to the maintainers. + + Try to provide as much information as possible, such as + - distribution + - kernel version (uname -r) + - gcc version (gcc --version) + - hardware architecture (uname -m, ...) + - type and firmware version of your device (base and wireless module, + if any) + - output of "lsusb -v" (if using an USB device) + - error messages + - relevant system log messages (it would help if you activate debug + output as described in 3.2.) + + For help with general configuration problems not specific to our driver, + such as isdn4linux and network configuration issues, please refer to the + appropriate forums and newsgroups. + +3.4. Reporting problem solutions + --------------------------- + If you solved a problem with our drivers, wrote startup scripts for your + distribution, ... feel free to contact us (using one of the places + mentioned in 3.3.). We'd like to add scripts, hints, documentation + to the driver and/or the project web page. + + +4. Links, other software + --------------------- + - Sourceforge project developing this driver and associated tools + http://sourceforge.net/projects/gigaset307x + - Yahoo! Group on the Siemens Gigaset family of devices + http://de.groups.yahoo.com/group/Siemens-Gigaset + - Siemens Gigaset/T-Sinus compatibility table + http://www.erbze.info/sinus_gigaset.htm + + +5. Credits + ------- + Thanks to + + Karsten Keil + for his help with isdn4linux + Deti Fliegl + for his base driver code + Dennis Dietrich + for his kernel 2.6 patches + Andreas Rummel + for his work and logs to get unimodem mode working + Andreas Degert + for his logs and patches to get cx 100 working + Dietrich Feist + for his generous donation of one M105 and two M101 cordless adapters + Christoph Schweers + for his generous donation of a M34 device + + and all the other people who sent logs and other information. + diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt index fcccf24..61fc079 100644 --- a/Documentation/kbuild/modules.txt +++ b/Documentation/kbuild/modules.txt @@ -44,7 +44,7 @@ What is covered within this file is mainly information to authors of modules. The author of an external modules should supply a makefile that hides most of the complexity so one only has to type 'make' to build the module. A complete example will be present in -chapter ¤. Creating a kbuild file for an external module". +chapter 4, "Creating a kbuild file for an external module". === 2. How to build external modules diff --git a/Documentation/laptop-mode.txt b/Documentation/laptop-mode.txt index b18e216..5696e87 100644 --- a/Documentation/laptop-mode.txt +++ b/Documentation/laptop-mode.txt @@ -919,11 +919,11 @@ int main(int argc, char **argv) int settle_time = 60; /* Parse the simple command-line */ - if (ac == 2) - disk = av[1]; - else if (ac == 4) { - settle_time = atoi(av[2]); - disk = av[3]; + if (argc == 2) + disk = argv[1]; + else if (argc == 4) { + settle_time = atoi(argv[2]); + disk = argv[3]; } else usage(); diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index f855031..4710845 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -19,6 +19,7 @@ Contents: - Control dependencies. - SMP barrier pairing. - Examples of memory barrier sequences. + - Read memory barriers vs load speculation. (*) Explicit kernel barriers. @@ -248,7 +249,7 @@ And there are a number of things that _must_ or _must_not_ be assumed: we may get either of: STORE *A = X; Y = LOAD *A; - STORE *A = Y; + STORE *A = Y = X; ========================= @@ -344,9 +345,12 @@ Memory barriers come in four basic varieties: (4) General memory barriers. - A general memory barrier is a combination of both a read memory barrier - and a write memory barrier. It is a partial ordering over both loads and - stores. + A general memory barrier gives a guarantee that all the LOAD and STORE + operations specified before the barrier will appear to happen before all + the LOAD and STORE operations specified after the barrier with respect to + the other components of the system. + + A general memory barrier is a partial ordering over both loads and stores. General memory barriers imply both read and write memory barriers, and so can substitute for either. @@ -546,9 +550,9 @@ write barrier, though, again, a general barrier is viable: =============== =============== a = 1; <write barrier> - b = 2; x = a; + b = 2; x = b; <read barrier> - y = b; + y = a; Or: @@ -563,6 +567,18 @@ Or: Basically, the read barrier always has to be there, even though it can be of the "weaker" type. +[!] Note that the stores before the write barrier would normally be expected to +match the loads after the read barrier or data dependency barrier, and vice +versa: + + CPU 1 CPU 2 + =============== =============== + a = 1; }---- --->{ v = c + b = 2; } \ / { w = d + <write barrier> \ <read barrier> + c = 3; } / \ { x = a; + d = 4; }---- --->{ y = b; + EXAMPLES OF MEMORY BARRIER SEQUENCES ------------------------------------ @@ -600,8 +616,8 @@ STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E | | +------+ +-------+ : : | - | Sequence in which stores committed to memory system - | by CPU 1 + | Sequence in which stores are committed to the + | memory system by CPU 1 V @@ -610,6 +626,7 @@ loads. Consider the following sequence of events: CPU 1 CPU 2 ======================= ======================= + { B = 7; X = 9; Y = 8; C = &Y } STORE A = 1 STORE B = 2 <write barrier> @@ -651,7 +668,20 @@ In the above example, CPU 2 perceives that B is 7, despite the load of *C (which would be B) coming after the the LOAD of C. If, however, a data dependency barrier were to be placed between the load of C -and the load of *C (ie: B) on CPU 2, then the following will occur: +and the load of *C (ie: B) on CPU 2: + + CPU 1 CPU 2 + ======================= ======================= + { B = 7; X = 9; Y = 8; C = &Y } + STORE A = 1 + STORE B = 2 + <write barrier> + STORE C = &B LOAD X + STORE D = 4 LOAD C (gets &B) + <data dependency barrier> + LOAD *C (reads B) + +then the following will occur: +-------+ : : : : | | +------+ +-------+ @@ -669,14 +699,12 @@ and the load of *C (ie: B) on CPU 2, then the following will occur: | : : | | | : : | CPU 2 | | +-------+ | | - \ | X->9 |------>| | - \ +-------+ | | - ----->| B->2 | | | - +-------+ | | - Makes sure all effects ---> ddddddddddddddddd | | - prior to the store of C +-------+ | | - are perceptible to | B->2 |------>| | - successive loads +-------+ | | + | | X->9 |------>| | + | +-------+ | | + Makes sure all effects ---> \ ddddddddddddddddd | | + prior to the store of C \ +-------+ | | + are perceptible to ----->| B->2 |------>| | + subsequent loads +-------+ | | : : +-------+ @@ -685,73 +713,239 @@ following sequence of events: CPU 1 CPU 2 ======================= ======================= + { A = 0, B = 9 } STORE A=1 - STORE B=2 - STORE C=3 <write barrier> - STORE D=4 - STORE E=5 - LOAD A + STORE B=2 LOAD B - LOAD C - LOAD D - LOAD E + LOAD A Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1: - +-------+ : : - | | +------+ - | |------>| C=3 | } - | | : +------+ } - | | : | A=1 | } - | | : +------+ } - | CPU 1 | : | B=2 | }--- - | | +------+ } \ - | | wwwwwwwwwwwww} \ - | | +------+ } \ : : +-------+ - | | : | E=5 | } \ +-------+ | | - | | : +------+ } \ { | C->3 |------>| | - | |------>| D=4 | } \ { +-------+ : | | - | | +------+ \ { | E->5 | : | | - +-------+ : : \ { +-------+ : | | - Transfer -->{ | A->1 | : | CPU 2 | - from CPU 1 { +-------+ : | | - to CPU 2 { | D->4 | : | | - { +-------+ : | | - { | B->2 |------>| | - +-------+ | | - : : +-------+ - - -If, however, a read barrier were to be placed between the load of C and the -load of D on CPU 2, then the partial ordering imposed by CPU 1 will be -perceived correctly by CPU 2. + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | | A->0 |------>| | + | +-------+ | | + | : : +-------+ + \ : : + \ +-------+ + ---->| A->1 | + +-------+ + : : - +-------+ : : - | | +------+ - | |------>| C=3 | } - | | : +------+ } - | | : | A=1 | }--- - | | : +------+ } \ - | CPU 1 | : | B=2 | } \ - | | +------+ \ - | | wwwwwwwwwwwwwwww \ - | | +------+ \ : : +-------+ - | | : | E=5 | } \ +-------+ | | - | | : +------+ }--- \ { | C->3 |------>| | - | |------>| D=4 | } \ \ { +-------+ : | | - | | +------+ \ -->{ | B->2 | : | | - +-------+ : : \ { +-------+ : | | - \ { | A->1 | : | CPU 2 | - \ +-------+ | | - At this point the read ----> \ rrrrrrrrrrrrrrrrr | | - barrier causes all effects \ +-------+ | | - prior to the storage of C \ { | E->5 | : | | - to be perceptible to CPU 2 -->{ +-------+ : | | - { | D->4 |------>| | - +-------+ | | - : : +-------+ + +If, however, a read barrier were to be placed between the load of E and the +load of A on CPU 2: + + CPU 1 CPU 2 + ======================= ======================= + { A = 0, B = 9 } + STORE A=1 + <write barrier> + STORE B=2 + LOAD B + <read barrier> + LOAD A + +then the partial ordering imposed by CPU 1 will be perceived correctly by CPU +2: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | : : | | + | : : | | + At this point the read ----> \ rrrrrrrrrrrrrrrrr | | + barrier causes all effects \ +-------+ | | + prior to the storage of B ---->| A->1 |------>| | + to be perceptible to CPU 2 +-------+ | | + : : +-------+ + + +To illustrate this more completely, consider what could happen if the code +contained a load of A either side of the read barrier: + + CPU 1 CPU 2 + ======================= ======================= + { A = 0, B = 9 } + STORE A=1 + <write barrier> + STORE B=2 + LOAD B + LOAD A [first load of A] + <read barrier> + LOAD A [second load of A] + +Even though the two loads of A both occur after the load of B, they may both +come up with different values: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | : : | | + | : : | | + | +-------+ | | + | | A->0 |------>| 1st | + | +-------+ | | + At this point the read ----> \ rrrrrrrrrrrrrrrrr | | + barrier causes all effects \ +-------+ | | + prior to the storage of B ---->| A->1 |------>| 2nd | + to be perceptible to CPU 2 +-------+ | | + : : +-------+ + + +But it may be that the update to A from CPU 1 becomes perceptible to CPU 2 +before the read barrier completes anyway: + + +-------+ : : : : + | | +------+ +-------+ + | |------>| A=1 |------ --->| A->0 | + | | +------+ \ +-------+ + | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 | + | | +------+ | +-------+ + | |------>| B=2 |--- | : : + | | +------+ \ | : : +-------+ + +-------+ : : \ | +-------+ | | + ---------->| B->2 |------>| | + | +-------+ | CPU 2 | + | : : | | + \ : : | | + \ +-------+ | | + ---->| A->1 |------>| 1st | + +-------+ | | + rrrrrrrrrrrrrrrrr | | + +-------+ | | + | A->1 |------>| 2nd | + +-------+ | | + : : +-------+ + + +The guarantee is that the second load will always come up with A == 1 if the +load of B came up with B == 2. No such guarantee exists for the first load of +A; that may come up with either A == 0 or A == 1. + + +READ MEMORY BARRIERS VS LOAD SPECULATION +---------------------------------------- + +Many CPUs speculate with loads: that is they see that they will need to load an +item from memory, and they find a time where they're not using the bus for any +other loads, and so do the load in advance - even though they haven't actually +got to that point in the instruction execution flow yet. This permits the +actual load instruction to potentially complete immediately because the CPU +already has the value to hand. + +It may turn out that the CPU didn't actually need the value - perhaps because a +branch circumvented the load - in which case it can discard the value or just +cache it for later use. + +Consider: + + CPU 1 CPU 2 + ======================= ======================= + LOAD B + DIVIDE } Divide instructions generally + DIVIDE } take a long time to perform + LOAD A + +Which might appear as this: + + : : +-------+ + +-------+ | | + --->| B->2 |------>| | + +-------+ | CPU 2 | + : :DIVIDE | | + +-------+ | | + The CPU being busy doing a ---> --->| A->0 |~~~~ | | + division speculates on the +-------+ ~ | | + LOAD of A : : ~ | | + : :DIVIDE | | + : : ~ | | + Once the divisions are complete --> : : ~-->| | + the CPU can then perform the : : | | + LOAD with immediate effect : : +-------+ + + +Placing a read barrier or a data dependency barrier just before the second +load: + + CPU 1 CPU 2 + ======================= ======================= + LOAD B + DIVIDE + DIVIDE + <read barrier> + LOAD A + +will force any value speculatively obtained to be reconsidered to an extent +dependent on the type of barrier used. If there was no change made to the +speculated memory location, then the speculated value will just be used: + + : : +-------+ + +-------+ | | + --->| B->2 |------>| | + +-------+ | CPU 2 | + : :DIVIDE | | + +-------+ | | + The CPU being busy doing a ---> --->| A->0 |~~~~ | | + division speculates on the +-------+ ~ | | + LOAD of A : : ~ | | + : :DIVIDE | | + : : ~ | | + : : ~ | | + rrrrrrrrrrrrrrrr~ | | + : : ~ | | + : : ~-->| | + : : | | + : : +-------+ + + +but if there was an update or an invalidation from another CPU pending, then +the speculation will be cancelled and the value reloaded: + + : : +-------+ + +-------+ | | + --->| B->2 |------>| | + +-------+ | CPU 2 | + : :DIVIDE | | + +-------+ | | + The CPU being busy doing a ---> --->| A->0 |~~~~ | | + division speculates on the +-------+ ~ | | + LOAD of A : : ~ | | + : :DIVIDE | | + : : ~ | | + : : ~ | | + rrrrrrrrrrrrrrrrr | | + +-------+ | | + The speculation is discarded ---> --->| A->1 |------>| | + and an updated value is +-------+ | | + retrieved : : +-------+ ======================== @@ -829,8 +1023,8 @@ There are some more advanced barrier functions: (*) smp_mb__after_atomic_inc(); These are for use with atomic add, subtract, increment and decrement - functions, especially when used for reference counting. These functions - do not imply memory barriers. + functions that don't return a value, especially when used for reference + counting. These functions do not imply memory barriers. As an example, consider a piece of code that marks an object as being dead and then decrements the object's reference count: @@ -887,7 +1081,7 @@ IMPLICIT KERNEL MEMORY BARRIERS =============================== Some of the other functions in the linux kernel imply memory barriers, amongst -which are locking, scheduling and memory allocation functions. +which are locking and scheduling functions. This specification is a _minimum_ guarantee; any particular architecture may provide more substantial guarantees, but these may not be relied upon outside @@ -952,6 +1146,20 @@ equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. barriers is that the effects instructions outside of a critical section may seep into the inside of the critical section. +A LOCK followed by an UNLOCK may not be assumed to be full memory barrier +because it is possible for an access preceding the LOCK to happen after the +LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the +two accesses can themselves then cross: + + *A = a; + LOCK + UNLOCK + *B = b; + +may occur as: + + LOCK, STORE *B, STORE *A, UNLOCK + Locks and semaphores may not provide any guarantee of ordering on UP compiled systems, and so cannot be counted on in such a situation to actually achieve anything at all - especially with respect to I/O accesses - unless combined @@ -1002,8 +1210,6 @@ Other functions that imply barriers: (*) schedule() and similar imply full memory barriers. - (*) Memory allocation and release functions imply full memory barriers. - ================================= INTER-CPU LOCKING BARRIER EFFECTS @@ -1017,7 +1223,7 @@ conflict on any particular lock. LOCKS VS MEMORY ACCESSES ------------------------ -Consider the following: the system has a pair of spinlocks (N) and (Q), and +Consider the following: the system has a pair of spinlocks (M) and (Q), and three CPUs; then should the following sequence of events occur: CPU 1 CPU 2 @@ -1263,15 +1469,17 @@ else. ATOMIC OPERATIONS ----------------- -Though they are technically interprocessor interaction considerations, atomic -operations are noted specially as they do _not_ generally imply memory -barriers. The possible offenders include: +Whilst they are technically interprocessor interaction considerations, atomic +operations are noted specially as some of them imply full memory barriers and +some don't, but they're very heavily relied on as a group throughout the +kernel. + +Any atomic operation that modifies some state in memory and returns information +about the state (old or new) implies an SMP-conditional general memory barrier +(smp_mb()) on each side of the actual operation. These include: xchg(); cmpxchg(); - test_and_set_bit(); - test_and_clear_bit(); - test_and_change_bit(); atomic_cmpxchg(); atomic_inc_return(); atomic_dec_return(); @@ -1282,21 +1490,31 @@ barriers. The possible offenders include: atomic_sub_and_test(); atomic_add_negative(); atomic_add_unless(); + test_and_set_bit(); + test_and_clear_bit(); + test_and_change_bit(); + +These are used for such things as implementing LOCK-class and UNLOCK-class +operations and adjusting reference counters towards object destruction, and as +such the implicit memory barrier effects are necessary. -These may be used for such things as implementing LOCK operations or controlling -the lifetime of objects by decreasing their reference counts. In such cases -they need preceding memory barriers. -The following may also be possible offenders as they may be used as UNLOCK -operations. +The following operation are potential problems as they do _not_ imply memory +barriers, but might be used for implementing such things as UNLOCK-class +operations: + atomic_set(); set_bit(); clear_bit(); change_bit(); - atomic_set(); +With these the appropriate explicit memory barrier should be used if necessary +(smp_mb__before_clear_bit() for instance). -The following are a little tricky: + +The following also do _not_ imply memory barriers, and so may require explicit +memory barriers under some circumstances (smp_mb__before_atomic_dec() for +instance)): atomic_add(); atomic_sub(); @@ -1317,10 +1535,12 @@ specific order. Basically, each usage case has to be carefully considered as to whether memory -barriers are needed or not. The simplest rule is probably: if the atomic -operation is protected by a lock, then it does not require a barrier unless -there's another operation within the critical section with respect to which an -ordering must be maintained. +barriers are needed or not. + +[!] Note that special memory barrier primitives are available for these +situations because on some CPUs the atomic instructions used imply full memory +barriers, and so barrier instructions are superfluous in conjunction with them, +and in such cases the special barrier primitives will be no-ops. See Documentation/atomic_ops.txt for more information. @@ -1650,7 +1870,7 @@ CPU's caches by some other cache event: smp_wmb(); <A:modify v=2> <C:busy> <C:queue v=2> - p = &b; q = p; + p = &v; q = p; <D:request p> <B:modify p=&v> <D:commit p=&v> <D:read p> diff --git a/Documentation/mtrr.txt b/Documentation/mtrr.txt index b78af1c..c39ac39 100644 --- a/Documentation/mtrr.txt +++ b/Documentation/mtrr.txt @@ -138,19 +138,29 @@ Reading MTRRs from a C program using ioctl()'s: */ #include <stdio.h> +#include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> #include <errno.h> -#define MTRR_NEED_STRINGS #include <asm/mtrr.h> #define TRUE 1 #define FALSE 0 #define ERRSTRING strerror (errno) +static char *mtrr_strings[MTRR_NUM_TYPES] = +{ + "uncachable", /* 0 */ + "write-combining", /* 1 */ + "?", /* 2 */ + "?", /* 3 */ + "write-through", /* 4 */ + "write-protect", /* 5 */ + "write-back", /* 6 */ +}; int main () { @@ -232,13 +242,22 @@ Creating MTRRs from a C programme using ioctl()'s: #include <fcntl.h> #include <sys/ioctl.h> #include <errno.h> -#define MTRR_NEED_STRINGS #include <asm/mtrr.h> #define TRUE 1 #define FALSE 0 #define ERRSTRING strerror (errno) +static char *mtrr_strings[MTRR_NUM_TYPES] = +{ + "uncachable", /* 0 */ + "write-combining", /* 1 */ + "?", /* 2 */ + "?", /* 3 */ + "write-through", /* 4 */ + "write-protect", /* 5 */ + "write-back", /* 6 */ +}; int main (int argc, char **argv) { diff --git a/Documentation/networking/README.ipw2200 b/Documentation/networking/README.ipw2200 index acb30c5..4f2a40f 100644 --- a/Documentation/networking/README.ipw2200 +++ b/Documentation/networking/README.ipw2200 @@ -14,8 +14,8 @@ Copyright (C) 2004-2006, Intel Corporation README.ipw2200 -Version: 1.0.8 -Date : October 20, 2005 +Version: 1.1.2 +Date : March 30, 2006 Index @@ -103,7 +103,7 @@ file. 1.1. Overview of Features ----------------------------------------------- -The current release (1.0.8) supports the following features: +The current release (1.1.2) supports the following features: + BSS mode (Infrastructure, Managed) + IBSS mode (Ad-Hoc) @@ -247,8 +247,8 @@ and can set the contents via echo. For example: % cat /sys/bus/pci/drivers/ipw2200/debug_level Will report the current debug level of the driver's logging subsystem -(only available if CONFIG_IPW_DEBUG was configured when the driver was -built). +(only available if CONFIG_IPW2200_DEBUG was configured when the driver +was built). You can set the debug level via: diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 8d8b4e5..afac780 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1,7 +1,7 @@ Linux Ethernet Bonding Driver HOWTO - Latest update: 21 June 2005 + Latest update: 24 April 2006 Initial release : Thomas Davis <tadavis at lbl.gov> Corrections, HA extensions : 2000/10/03-15 : @@ -12,6 +12,8 @@ Corrections, HA extensions : 2000/10/03-15 : - Jay Vosburgh <fubar at us dot ibm dot com> Reorganized and updated Feb 2005 by Jay Vosburgh +Added Sysfs information: 2006/04/24 + - Mitch Williams <mitch.a.williams at intel.com> Introduction ============ @@ -38,61 +40,62 @@ Table of Contents 2. Bonding Driver Options 3. Configuring Bonding Devices -3.1 Configuration with sysconfig support -3.1.1 Using DHCP with sysconfig -3.1.2 Configuring Multiple Bonds with sysconfig -3.2 Configuration with initscripts support -3.2.1 Using DHCP with initscripts -3.2.2 Configuring Multiple Bonds with initscripts -3.3 Configuring Bonding Manually +3.1 Configuration with Sysconfig Support +3.1.1 Using DHCP with Sysconfig +3.1.2 Configuring Multiple Bonds with Sysconfig +3.2 Configuration with Initscripts Support +3.2.1 Using DHCP with Initscripts +3.2.2 Configuring Multiple Bonds with Initscripts +3.3 Configuring Bonding Manually with Ifenslave 3.3.1 Configuring Multiple Bonds Manually +3.4 Configuring Bonding Manually via Sysfs -5. Querying Bonding Configuration -5.1 Bonding Configuration -5.2 Network Configuration +4. Querying Bonding Configuration +4.1 Bonding Configuration +4.2 Network Configuration -6. Switch Configuration +5. Switch Configuration -7. 802.1q VLAN Support +6. 802.1q VLAN Support -8. Link Monitoring -8.1 ARP Monitor Operation -8.2 Configuring Multiple ARP Targets -8.3 MII Monitor Operation +7. Link Monitoring +7.1 ARP Monitor Operation +7.2 Configuring Multiple ARP Targets +7.3 MII Monitor Operation -9. Potential Trouble Sources -9.1 Adventures in Routing -9.2 Ethernet Device Renaming -9.3 Painfully Slow Or No Failed Link Detection By Miimon +8. Potential Trouble Sources +8.1 Adventures in Routing +8.2 Ethernet Device Renaming +8.3 Painfully Slow Or No Failed Link Detection By Miimon -10. SNMP agents +9. SNMP agents -11. Promiscuous mode +10. Promiscuous mode -12. Configuring Bonding for High Availability -12.1 High Availability in a Single Switch Topology -12.2 High Availability in a Multiple Switch Topology -12.2.1 HA Bonding Mode Selection for Multiple Switch Topology -12.2.2 HA Link Monitoring for Multiple Switch Topology +11. Configuring Bonding for High Availability +11.1 High Availability in a Single Switch Topology +11.2 High Availability in a Multiple Switch Topology +11.2.1 HA Bonding Mode Selection for Multiple Switch Topology +11.2.2 HA Link Monitoring for Multiple Switch Topology -13. Configuring Bonding for Maximum Throughput -13.1 Maximum Throughput in a Single Switch Topology -13.1.1 MT Bonding Mode Selection for Single Switch Topology -13.1.2 MT Link Monitoring for Single Switch Topology -13.2 Maximum Throughput in a Multiple Switch Topology -13.2.1 MT Bonding Mode Selection for Multiple Switch Topology -13.2.2 MT Link Monitoring for Multiple Switch Topology +12. Configuring Bonding for Maximum Throughput +12.1 Maximum Throughput in a Single Switch Topology +12.1.1 MT Bonding Mode Selection for Single Switch Topology +12.1.2 MT Link Monitoring for Single Switch Topology +12.2 Maximum Throughput in a Multiple Switch Topology +12.2.1 MT Bonding Mode Selection for Multiple Switch Topology +12.2.2 MT Link Monitoring for Multiple Switch Topology -14. Switch Behavior Issues -14.1 Link Establishment and Failover Delays -14.2 Duplicated Incoming Packets +13. Switch Behavior Issues +13.1 Link Establishment and Failover Delays +13.2 Duplicated Incoming Packets -15. Hardware Specific Considerations -15.1 IBM BladeCenter +14. Hardware Specific Considerations +14.1 IBM BladeCenter -16. Frequently Asked Questions +15. Frequently Asked Questions -17. Resources and Links +16. Resources and Links 1. Bonding Driver Installation @@ -156,6 +159,9 @@ you're trying to build it for. Some distros (e.g., Red Hat from 7.1 onwards) do not have /usr/include/linux symbolically linked to the default kernel source include directory. +SECOND IMPORTANT NOTE: + If you plan to configure bonding using sysfs, you do not need +to use ifenslave. 2. Bonding Driver Options ========================= @@ -270,7 +276,7 @@ mode In bonding version 2.6.2 or later, when a failover occurs in active-backup mode, bonding will issue one or more gratuitous ARPs on the newly active slave. - One gratutious ARP is issued for the bonding master + One gratuitous ARP is issued for the bonding master interface and each VLAN interfaces configured above it, provided that the interface has at least one IP address configured. Gratuitous ARPs issued for VLAN @@ -377,7 +383,7 @@ mode When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies - with the selected mac address to each of the + with the selected MAC address to each of the clients. The updelay parameter (detailed below) must be set to a value equal or greater than the switch's forwarding delay so that the ARP Replies sent to the @@ -498,11 +504,12 @@ not exist, and the layer2 policy is the only policy. 3. Configuring Bonding Devices ============================== - There are, essentially, two methods for configuring bonding: -with support from the distro's network initialization scripts, and -without. Distros generally use one of two packages for the network -initialization scripts: initscripts or sysconfig. Recent versions of -these packages have support for bonding, while older versions do not. + You can configure bonding using either your distro's network +initialization scripts, or manually using either ifenslave or the +sysfs interface. Distros generally use one of two packages for the +network initialization scripts: initscripts or sysconfig. Recent +versions of these packages have support for bonding, while older +versions do not. We will first describe the options for configuring bonding for distros using versions of initscripts and sysconfig with full or @@ -530,7 +537,7 @@ $ grep ifenslave /sbin/ifup If this returns any matches, then your initscripts or sysconfig has support for bonding. -3.1 Configuration with sysconfig support +3.1 Configuration with Sysconfig Support ---------------------------------------- This section applies to distros using a version of sysconfig @@ -538,7 +545,7 @@ with bonding support, for example, SuSE Linux Enterprise Server 9. SuSE SLES 9's networking configuration system does support bonding, however, at this writing, the YaST system configuration -frontend does not provide any means to work with bonding devices. +front end does not provide any means to work with bonding devices. Bonding devices can be managed by hand, however, as follows. First, if they have not already been configured, configure the @@ -660,7 +667,7 @@ format can be found in an example ifcfg template file: Note that the template does not document the various BONDING_ settings described above, but does describe many of the other options. -3.1.1 Using DHCP with sysconfig +3.1.1 Using DHCP with Sysconfig ------------------------------- Under sysconfig, configuring a device with BOOTPROTO='dhcp' @@ -670,7 +677,7 @@ attempt to obtain the device address from DHCP prior to adding any of the slave devices. Without active slaves, the DHCP requests are not sent to the network. -3.1.2 Configuring Multiple Bonds with sysconfig +3.1.2 Configuring Multiple Bonds with Sysconfig ----------------------------------------------- The sysconfig network initialization system is capable of @@ -685,7 +692,7 @@ ifcfg-bondX files. options in the ifcfg-bondX file, it is not necessary to add them to the system /etc/modules.conf or /etc/modprobe.conf configuration file. -3.2 Configuration with initscripts support +3.2 Configuration with Initscripts Support ------------------------------------------ This section applies to distros using a version of initscripts @@ -756,7 +763,7 @@ options for your configuration. will restart the networking subsystem and your bond link should be now up and running. -3.2.1 Using DHCP with initscripts +3.2.1 Using DHCP with Initscripts --------------------------------- Recent versions of initscripts (the version supplied with @@ -768,7 +775,7 @@ above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" and add a line consisting of "TYPE=Bonding". Note that the TYPE value is case sensitive. -3.2.2 Configuring Multiple Bonds with initscripts +3.2.2 Configuring Multiple Bonds with Initscripts ------------------------------------------------- At this writing, the initscripts package does not directly @@ -784,8 +791,8 @@ Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels exhibiting this problem, it will be impossible to configure multiple bonds with differing parameters. -3.3 Configuring Bonding Manually --------------------------------- +3.3 Configuring Bonding Manually with Ifenslave +----------------------------------------------- This section applies to distros whose network initialization scripts (the sysconfig or initscripts package) do not have specific @@ -889,11 +896,139 @@ install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ This may be repeated any number of times, specifying a new and unique name in place of bond1 for each subsequent instance. +3.4 Configuring Bonding Manually via Sysfs +------------------------------------------ + + Starting with version 3.0, Channel Bonding may be configured +via the sysfs interface. This interface allows dynamic configuration +of all bonds in the system without unloading the module. It also +allows for adding and removing bonds at runtime. Ifenslave is no +longer required, though it is still supported. + + Use of the sysfs interface allows you to use multiple bonds +with different configurations without having to reload the module. +It also allows you to use multiple, differently configured bonds when +bonding is compiled into the kernel. + + You must have the sysfs filesystem mounted to configure +bonding this way. The examples in this document assume that you +are using the standard mount point for sysfs, e.g. /sys. If your +sysfs filesystem is mounted elsewhere, you will need to adjust the +example paths accordingly. + +Creating and Destroying Bonds +----------------------------- +To add a new bond foo: +# echo +foo > /sys/class/net/bonding_masters + +To remove an existing bond bar: +# echo -bar > /sys/class/net/bonding_masters + +To show all existing bonds: +# cat /sys/class/net/bonding_masters + +NOTE: due to 4K size limitation of sysfs files, this list may be +truncated if you have more than a few hundred bonds. This is unlikely +to occur under normal operating conditions. + +Adding and Removing Slaves +-------------------------- + Interfaces may be enslaved to a bond using the file +/sys/class/net/<bond>/bonding/slaves. The semantics for this file +are the same as for the bonding_masters file. + +To enslave interface eth0 to bond bond0: +# ifconfig bond0 up +# echo +eth0 > /sys/class/net/bond0/bonding/slaves + +To free slave eth0 from bond bond0: +# echo -eth0 > /sys/class/net/bond0/bonding/slaves + + NOTE: The bond must be up before slaves can be added. All +slaves are freed when the interface is brought down. + + When an interface is enslaved to a bond, symlinks between the +two are created in the sysfs filesystem. In this case, you would get +/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and +/sys/class/net/eth0/master pointing to /sys/class/net/bond0. + + This means that you can tell quickly whether or not an +interface is enslaved by looking for the master symlink. Thus: +# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves +will free eth0 from whatever bond it is enslaved to, regardless of +the name of the bond interface. + +Changing a Bond's Configuration +------------------------------- + Each bond may be configured individually by manipulating the +files located in /sys/class/net/<bond name>/bonding + + The names of these files correspond directly with the command- +line parameters described elsewhere in in this file, and, with the +exception of arp_ip_target, they accept the same values. To see the +current setting, simply cat the appropriate file. + + A few examples will be given here; for specific usage +guidelines for each parameter, see the appropriate section in this +document. + +To configure bond0 for balance-alb mode: +# ifconfig bond0 down +# echo 6 > /sys/class/net/bond0/bonding/mode + - or - +# echo balance-alb > /sys/class/net/bond0/bonding/mode + NOTE: The bond interface must be down before the mode can be +changed. + +To enable MII monitoring on bond0 with a 1 second interval: +# echo 1000 > /sys/class/net/bond0/bonding/miimon + NOTE: If ARP monitoring is enabled, it will disabled when MII +monitoring is enabled, and vice-versa. + +To add ARP targets: +# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target +# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target + NOTE: up to 10 target addresses may be specified. + +To remove an ARP target: +# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target + +Example Configuration +--------------------- + We begin with the same example that is shown in section 3.3, +executed with sysfs, and without using ifenslave. + + To make a simple bond of two e100 devices (presumed to be eth0 +and eth1), and have it persist across reboots, edit the appropriate +file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the +following: + +modprobe bonding +modprobe e100 +echo balance-alb > /sys/class/net/bond0/bonding/mode +ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up +echo 100 > /sys/class/net/bond0/bonding/miimon +echo +eth0 > /sys/class/net/bond0/bonding/slaves +echo +eth1 > /sys/class/net/bond0/bonding/slaves + + To add a second bond, with two e1000 interfaces in +active-backup mode, using ARP monitoring, add the following lines to +your init script: + +modprobe e1000 +echo +bond1 > /sys/class/net/bonding_masters +echo active-backup > /sys/class/net/bond1/bonding/mode +ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up +echo +192.168.2.100 /sys/class/net/bond1/bonding/arp_ip_target +echo 2000 > /sys/class/net/bond1/bonding/arp_interval +echo +eth2 > /sys/class/net/bond1/bonding/slaves +echo +eth3 > /sys/class/net/bond1/bonding/slaves + -5. Querying Bonding Configuration +4. Querying Bonding Configuration ================================= -5.1 Bonding Configuration +4.1 Bonding Configuration ------------------------- Each bonding device has a read-only file residing in the @@ -923,7 +1058,7 @@ generally as follows: The precise format and contents will change depending upon the bonding configuration, state, and version of the bonding driver. -5.2 Network configuration +4.2 Network configuration ------------------------- The network configuration can be inspected using the ifconfig @@ -958,7 +1093,7 @@ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 collisions:0 txqueuelen:100 Interrupt:9 Base address:0x1400 -6. Switch Configuration +5. Switch Configuration ======================= For this section, "switch" refers to whatever system the @@ -991,7 +1126,7 @@ transmit policy for an EtherChannel group; all three will interoperate with another EtherChannel group. -7. 802.1q VLAN Support +6. 802.1q VLAN Support ====================== It is possible to configure VLAN devices over a bond interface @@ -1042,7 +1177,7 @@ underlying device -- i.e. the bonding interface -- to promiscuous mode, which might not be what you want. -8. Link Monitoring +7. Link Monitoring ================== The bonding driver at present supports two schemes for @@ -1053,7 +1188,7 @@ monitor. bonding driver itself, it is not possible to enable both ARP and MII monitoring simultaneously. -8.1 ARP Monitor Operation +7.1 ARP Monitor Operation ------------------------- The ARP monitor operates as its name suggests: it sends ARP @@ -1071,7 +1206,7 @@ those slaves will stay down. If networking monitoring (tcpdump, etc) shows the ARP requests and replies on the network, then it may be that your device driver is not updating last_rx and trans_start. -8.2 Configuring Multiple ARP Targets +7.2 Configuring Multiple ARP Targets ------------------------------------ While ARP monitoring can be done with just one target, it can @@ -1094,7 +1229,7 @@ alias bond0 bonding options bond0 arp_interval=60 arp_ip_target=192.168.0.100 -8.3 MII Monitor Operation +7.3 MII Monitor Operation ------------------------- The MII monitor monitors only the carrier state of the local @@ -1120,14 +1255,14 @@ does not support or had some error in processing both the MII register and ethtool requests), then the MII monitor will assume the link is up. -9. Potential Sources of Trouble +8. Potential Sources of Trouble =============================== -9.1 Adventures in Routing +8.1 Adventures in Routing ------------------------- When bonding is configured, it is important that the slave -devices not have routes that supercede routes of the master (or, +devices not have routes that supersede routes of the master (or, generally, not have routes at all). For example, suppose the bonding device bond0 has two slaves, eth0 and eth1, and the routing table is as follows: @@ -1154,11 +1289,11 @@ by the state of the routing table. The solution here is simply to insure that slaves do not have routes of their own, and if for some reason they must, those routes do -not supercede routes of their master. This should generally be the +not supersede routes of their master. This should generally be the case, but unusual configurations or errant manual or automatic static route additions may cause trouble. -9.2 Ethernet Device Renaming +8.2 Ethernet Device Renaming ---------------------------- On systems with network configuration scripts that do not @@ -1207,7 +1342,7 @@ modprobe with --ignore-install to cause the normal action to then take place. Full documentation on this can be found in the modprobe.conf and modprobe manual pages. -9.3. Painfully Slow Or No Failed Link Detection By Miimon +8.3. Painfully Slow Or No Failed Link Detection By Miimon --------------------------------------------------------- By default, bonding enables the use_carrier option, which @@ -1235,7 +1370,7 @@ carrier state. It has no way to determine the state of devices on or beyond other ports of a switch, or if a switch is refusing to pass traffic while still maintaining carrier on. -10. SNMP agents +9. SNMP agents =============== If running SNMP agents, the bonding driver should be loaded @@ -1281,7 +1416,7 @@ ifDescr, the association between the IP address and IfIndex remains and SNMP functions such as Interface_Scan_Next will report that association. -11. Promiscuous mode +10. Promiscuous mode ==================== When running network monitoring tools, e.g., tcpdump, it is @@ -1308,7 +1443,7 @@ sending to peers that are unassigned or if the load is unbalanced. the active slave changes (e.g., due to a link failure), the promiscuous setting will be propagated to the new active slave. -12. Configuring Bonding for High Availability +11. Configuring Bonding for High Availability ============================================= High Availability refers to configurations that provide @@ -1318,7 +1453,7 @@ goal is to provide the maximum availability of network connectivity (i.e., the network always works), even though other configurations could provide higher throughput. -12.1 High Availability in a Single Switch Topology +11.1 High Availability in a Single Switch Topology -------------------------------------------------- If two hosts (or a host and a single switch) are directly @@ -1332,7 +1467,7 @@ the load will be rebalanced across the remaining devices. See Section 13, "Configuring Bonding for Maximum Throughput" for information on configuring bonding with one peer device. -12.2 High Availability in a Multiple Switch Topology +11.2 High Availability in a Multiple Switch Topology ---------------------------------------------------- With multiple switches, the configuration of bonding and the @@ -1359,7 +1494,7 @@ switches (ISL, or inter switch link), and multiple ports connecting to the outside world ("port3" on each switch). There is no technical reason that this could not be extended to a third switch. -12.2.1 HA Bonding Mode Selection for Multiple Switch Topology +11.2.1 HA Bonding Mode Selection for Multiple Switch Topology ------------------------------------------------------------- In a topology such as the example above, the active-backup and @@ -1381,7 +1516,7 @@ broadcast: This mode is really a special purpose mode, and is suitable necessary for some specific one-way traffic to reach both independent networks, then the broadcast mode may be suitable. -12.2.2 HA Link Monitoring Selection for Multiple Switch Topology +11.2.2 HA Link Monitoring Selection for Multiple Switch Topology ---------------------------------------------------------------- The choice of link monitoring ultimately depends upon your @@ -1402,10 +1537,10 @@ regardless of which switch is active, the ARP monitor has a suitable target to query. -13. Configuring Bonding for Maximum Throughput +12. Configuring Bonding for Maximum Throughput ============================================== -13.1 Maximizing Throughput in a Single Switch Topology +12.1 Maximizing Throughput in a Single Switch Topology ------------------------------------------------------ In a single switch configuration, the best method to maximize @@ -1476,7 +1611,7 @@ destination to make load balancing decisions. The behavior of each mode is described below. -13.1.1 MT Bonding Mode Selection for Single Switch Topology +12.1.1 MT Bonding Mode Selection for Single Switch Topology ----------------------------------------------------------- This configuration is the easiest to set up and to understand, @@ -1607,7 +1742,7 @@ balance-alb: This mode is everything that balance-tlb is, and more. device driver must support changing the hardware address while the device is open. -13.1.2 MT Link Monitoring for Single Switch Topology +12.1.2 MT Link Monitoring for Single Switch Topology ---------------------------------------------------- The choice of link monitoring may largely depend upon which @@ -1616,7 +1751,7 @@ support the use of the ARP monitor, and are thus restricted to using the MII monitor (which does not provide as high a level of end to end assurance as the ARP monitor). -13.2 Maximum Throughput in a Multiple Switch Topology +12.2 Maximum Throughput in a Multiple Switch Topology ----------------------------------------------------- Multiple switches may be utilized to optimize for throughput @@ -1651,7 +1786,7 @@ a single 72 port switch. can be equipped with an additional network device connected to an external network; this host then additionally acts as a gateway. -13.2.1 MT Bonding Mode Selection for Multiple Switch Topology +12.2.1 MT Bonding Mode Selection for Multiple Switch Topology ------------------------------------------------------------- In actual practice, the bonding mode typically employed in @@ -1664,7 +1799,7 @@ packets has arrived). When employed in this fashion, the balance-rr mode allows individual connections between two hosts to effectively utilize greater than one interface's bandwidth. -13.2.2 MT Link Monitoring for Multiple Switch Topology +12.2.2 MT Link Monitoring for Multiple Switch Topology ------------------------------------------------------ Again, in actual practice, the MII monitor is most often used @@ -1674,10 +1809,10 @@ advantages over the MII monitor are mitigated by the volume of probes needed as the number of systems involved grows (remember that each host in the network is configured with bonding). -14. Switch Behavior Issues +13. Switch Behavior Issues ========================== -14.1 Link Establishment and Failover Delays +13.1 Link Establishment and Failover Delays ------------------------------------------- Some switches exhibit undesirable behavior with regard to the @@ -1712,7 +1847,7 @@ switches take a long time to go into backup mode, it may be desirable to not activate a backup interface immediately after a link goes down. Failover may be delayed via the downdelay bonding module option. -14.2 Duplicated Incoming Packets +13.2 Duplicated Incoming Packets -------------------------------- It is not uncommon to observe a short burst of duplicated @@ -1751,14 +1886,14 @@ behavior, it can be induced by clearing the MAC forwarding table (on most Cisco switches, the privileged command "clear mac address-table dynamic" will accomplish this). -15. Hardware Specific Considerations +14. Hardware Specific Considerations ==================================== This section contains additional information for configuring bonding on specific hardware platforms, or for interfacing bonding with particular switches or other devices. -15.1 IBM BladeCenter +14.1 IBM BladeCenter -------------------- This applies to the JS20 and similar systems. @@ -1861,7 +1996,7 @@ bonding driver. avoid fail-over delay issues when using bonding. -16. Frequently Asked Questions +15. Frequently Asked Questions ============================== 1. Is it SMP safe? @@ -1925,7 +2060,7 @@ not have special switch requirements, but do need device drivers that support specific features (described in the appropriate section under module parameters, above). - In 802.3ad mode, it works with with systems that support IEEE + In 802.3ad mode, it works with systems that support IEEE 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged switches currently available support 802.3ad. diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt new file mode 100644 index 0000000..4a21d9b --- /dev/null +++ b/Documentation/networking/operstates.txt @@ -0,0 +1,161 @@ + +1. Introduction + +Linux distinguishes between administrative and operational state of an +interface. Admininstrative state is the result of "ip link set dev +<dev> up or down" and reflects whether the administrator wants to use +the device for traffic. + +However, an interface is not usable just because the admin enabled it +- ethernet requires to be plugged into the switch and, depending on +a site's networking policy and configuration, an 802.1X authentication +to be performed before user data can be transferred. Operational state +shows the ability of an interface to transmit this user data. + +Thanks to 802.1X, userspace must be granted the possibility to +influence operational state. To accommodate this, operational state is +split into two parts: Two flags that can be set by the driver only, and +a RFC2863 compatible state that is derived from these flags, a policy, +and changeable from userspace under certain rules. + + +2. Querying from userspace + +Both admin and operational state can be queried via the netlink +operation RTM_GETLINK. It is also possible to subscribe to RTMGRP_LINK +to be notified of updates. This is important for setting from userspace. + +These values contain interface state: + +ifinfomsg::if_flags & IFF_UP: + Interface is admin up +ifinfomsg::if_flags & IFF_RUNNING: + Interface is in RFC2863 operational state UP or UNKNOWN. This is for + backward compatibility, routing daemons, dhcp clients can use this + flag to determine whether they should use the interface. +ifinfomsg::if_flags & IFF_LOWER_UP: + Driver has signaled netif_carrier_on() +ifinfomsg::if_flags & IFF_DORMANT: + Driver has signaled netif_dormant_on() + +These interface flags can also be queried without netlink using the +SIOCGIFFLAGS ioctl. + +TLV IFLA_OPERSTATE + +contains RFC2863 state of the interface in numeric representation: + +IF_OPER_UNKNOWN (0): + Interface is in unknown state, neither driver nor userspace has set + operational state. Interface must be considered for user data as + setting operational state has not been implemented in every driver. +IF_OPER_NOTPRESENT (1): + Unused in current kernel (notpresent interfaces normally disappear), + just a numerical placeholder. +IF_OPER_DOWN (2): + Interface is unable to transfer data on L1, f.e. ethernet is not + plugged or interface is ADMIN down. +IF_OPER_LOWERLAYERDOWN (3): + Interfaces stacked on an interface that is IF_OPER_DOWN show this + state (f.e. VLAN). +IF_OPER_TESTING (4): + Unused in current kernel. +IF_OPER_DORMANT (5): + Interface is L1 up, but waiting for an external event, f.e. for a + protocol to establish. (802.1X) +IF_OPER_UP (6): + Interface is operational up and can be used. + +This TLV can also be queried via sysfs. + +TLV IFLA_LINKMODE + +contains link policy. This is needed for userspace interaction +described below. + +This TLV can also be queried via sysfs. + + +3. Kernel driver API + +Kernel drivers have access to two flags that map to IFF_LOWER_UP and +IFF_DORMANT. These flags can be set from everywhere, even from +interrupts. It is guaranteed that only the driver has write access, +however, if different layers of the driver manipulate the same flag, +the driver has to provide the synchronisation needed. + +__LINK_STATE_NOCARRIER, maps to !IFF_LOWER_UP: + +The driver uses netif_carrier_on() to clear and netif_carrier_off() to +set this flag. On netif_carrier_off(), the scheduler stops sending +packets. The name 'carrier' and the inversion are historical, think of +it as lower layer. + +netif_carrier_ok() can be used to query that bit. + +__LINK_STATE_DORMANT, maps to IFF_DORMANT: + +Set by the driver to express that the device cannot yet be used +because some driver controlled protocol establishment has to +complete. Corresponding functions are netif_dormant_on() to set the +flag, netif_dormant_off() to clear it and netif_dormant() to query. + +On device allocation, networking core sets the flags equivalent to +netif_carrier_ok() and !netif_dormant(). + + +Whenever the driver CHANGES one of these flags, a workqueue event is +scheduled to translate the flag combination to IFLA_OPERSTATE as +follows: + +!netif_carrier_ok(): + IF_OPER_LOWERLAYERDOWN if the interface is stacked, IF_OPER_DOWN + otherwise. Kernel can recognise stacked interfaces because their + ifindex != iflink. + +netif_carrier_ok() && netif_dormant(): + IF_OPER_DORMANT + +netif_carrier_ok() && !netif_dormant(): + IF_OPER_UP if userspace interaction is disabled. Otherwise + IF_OPER_DORMANT with the possibility for userspace to initiate the + IF_OPER_UP transition afterwards. + + +4. Setting from userspace + +Applications have to use the netlink interface to influence the +RFC2863 operational state of an interface. Setting IFLA_LINKMODE to 1 +via RTM_SETLINK instructs the kernel that an interface should go to +IF_OPER_DORMANT instead of IF_OPER_UP when the combination +netif_carrier_ok() && !netif_dormant() is set by the +driver. Afterwards, the userspace application can set IFLA_OPERSTATE +to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set +netif_carrier_off() or netif_dormant_on(). Changes made by userspace +are multicasted on the netlink group RTMGRP_LINK. + +So basically a 802.1X supplicant interacts with the kernel like this: + +-subscribe to RTMGRP_LINK +-set IFLA_LINKMODE to 1 via RTM_SETLINK +-query RTM_GETLINK once to get initial state +-if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until + netlink multicast signals this state +-do 802.1X, eventually abort if flags go down again +-send RTM_SETLINK to set operstate to IF_OPER_UP if authentication + succeeds, IF_OPER_DORMANT otherwise +-see how operstate and IFF_RUNNING is echoed via netlink multicast +-set interface back to IF_OPER_DORMANT if 802.1X reauthentication + fails +-restart if kernel changes IFF_LOWER_UP or IFF_DORMANT flag + +if supplicant goes down, bring back IFLA_LINKMODE to 0 and +IFLA_OPERSTATE to a sane value. + +A routing daemon or dhcp client just needs to care for IFF_RUNNING or +waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before +considering the interface / querying a DHCP address. + + +For technical questions and/or comments please e-mail to Stefan Rompf +(stefan at loplof.de). diff --git a/Documentation/networking/xfrm_sync.txt b/Documentation/networking/xfrm_sync.txt new file mode 100644 index 0000000..8be626f --- /dev/null +++ b/Documentation/networking/xfrm_sync.txt @@ -0,0 +1,166 @@ + +The sync patches work is based on initial patches from +Krisztian <hidden@balabit.hu> and others and additional patches +from Jamal <hadi@cyberus.ca>. + +The end goal for syncing is to be able to insert attributes + generate +events so that the an SA can be safely moved from one machine to another +for HA purposes. +The idea is to synchronize the SA so that the takeover machine can do +the processing of the SA as accurate as possible if it has access to it. + +We already have the ability to generate SA add/del/upd events. +These patches add ability to sync and have accurate lifetime byte (to +ensure proper decay of SAs) and replay counters to avoid replay attacks +with as minimal loss at failover time. +This way a backup stays as closely uptodate as an active member. + +Because the above items change for every packet the SA receives, +it is possible for a lot of the events to be generated. +For this reason, we also add a nagle-like algorithm to restrict +the events. i.e we are going to set thresholds to say "let me +know if the replay sequence threshold is reached or 10 secs have passed" +These thresholds are set system-wide via sysctls or can be updated +per SA. + +The identified items that need to be synchronized are: +- the lifetime byte counter +note that: lifetime time limit is not important if you assume the failover +machine is known ahead of time since the decay of the time countdown +is not driven by packet arrival. +- the replay sequence for both inbound and outbound + +1) Message Structure +---------------------- + +nlmsghdr:aevent_id:optional-TLVs. + +The netlink message types are: + +XFRM_MSG_NEWAE and XFRM_MSG_GETAE. + +A XFRM_MSG_GETAE does not have TLVs. +A XFRM_MSG_NEWAE will have at least two TLVs (as is +discussed further below). + +aevent_id structure looks like: + + struct xfrm_aevent_id { + struct xfrm_usersa_id sa_id; + __u32 flags; + }; + +xfrm_usersa_id in this message layout identifies the SA. + +flags are used to indicate different things. The possible +flags are: + XFRM_AE_RTHR=1, /* replay threshold*/ + XFRM_AE_RVAL=2, /* replay value */ + XFRM_AE_LVAL=4, /* lifetime value */ + XFRM_AE_ETHR=8, /* expiry timer threshold */ + XFRM_AE_CR=16, /* Event cause is replay update */ + XFRM_AE_CE=32, /* Event cause is timer expiry */ + XFRM_AE_CU=64, /* Event cause is policy update */ + +How these flags are used is dependent on the direction of the +message (kernel<->user) as well the cause (config, query or event). +This is described below in the different messages. + +The pid will be set appropriately in netlink to recognize direction +(0 to the kernel and pid = processid that created the event +when going from kernel to user space) + +A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS +to get notified of these events. + +2) TLVS reflect the different parameters: +----------------------------------------- + +a) byte value (XFRMA_LTIME_VAL) +This TLV carries the running/current counter for byte lifetime since +last event. + +b)replay value (XFRMA_REPLAY_VAL) +This TLV carries the running/current counter for replay sequence since +last event. + +c)replay threshold (XFRMA_REPLAY_THRESH) +This TLV carries the threshold being used by the kernel to trigger events +when the replay sequence is exceeded. + +d) expiry timer (XFRMA_ETIMER_THRESH) +This is a timer value in milliseconds which is used as the nagle +value to rate limit the events. + +3) Default configurations for the parameters: +---------------------------------------------- + +By default these events should be turned off unless there is +at least one listener registered to listen to the multicast +group XFRMNLGRP_AEVENTS. + +Programs installing SAs will need to specify the two thresholds, however, +in order to not change existing applications such as racoon +we also provide default threshold values for these different parameters +in case they are not specified. + +the two sysctls/proc entries are: +a) /proc/sys/net/core/sysctl_xfrm_aevent_etime +used to provide default values for the XFRMA_ETIMER_THRESH in incremental +units of time of 100ms. The default is 10 (1 second) + +b) /proc/sys/net/core/sysctl_xfrm_aevent_rseqth +used to provide default values for XFRMA_REPLAY_THRESH parameter +in incremental packet count. The default is two packets. + +4) Message types +---------------- + +a) XFRM_MSG_GETAE issued by user-->kernel. +XFRM_MSG_GETAE does not carry any TLVs. +The response is a XFRM_MSG_NEWAE which is formatted based on what +XFRM_MSG_GETAE queried for. +The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. +*if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved +*if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved + +b) XFRM_MSG_NEWAE is issued by either user space to configure +or kernel to announce events or respond to a XFRM_MSG_GETAE. + +i) user --> kernel to configure a specific SA. +any of the values or threshold parameters can be updated by passing the +appropriate TLV. +A response is issued back to the sender in user space to indicate success +or failure. +In the case of success, additionally an event with +XFRM_MSG_NEWAE is also issued to any listeners as described in iii). + +ii) kernel->user direction as a response to XFRM_MSG_GETAE +The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. +The threshold TLVs will be included if explicitly requested in +the XFRM_MSG_GETAE message. + +iii) kernel->user to report as event if someone sets any values or +thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above). +In such a case XFRM_AE_CU flag is set to inform the user that +the change happened as a result of an update. +The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. + +iv) kernel->user to report event when replay threshold or a timeout +is exceeded. +In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout +happened) is set to inform the user what happened. +Note the two flags are mutually exclusive. +The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs. + +Exceptions to threshold settings +-------------------------------- + +If you have an SA that is getting hit by traffic in bursts such that +there is a period where the timer threshold expires with no packets +seen, then an odd behavior is seen as follows: +The first packet arrival after a timer expiry will trigger a timeout +aevent; i.e we dont wait for a timeout period or a packet threshold +to be reached. This is done for simplicity and efficiency reasons. + +-JHS diff --git a/Documentation/pci.txt b/Documentation/pci.txt index 711210b..66bbbf1 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt @@ -259,7 +259,17 @@ on the bus need to be capable of doing it, so this is something which needs to be handled by platform and generic code, not individual drivers. -8. Obsolete functions +8. Vendor and device identifications +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +For the future, let's avoid adding device ids to include/linux/pci_ids.h. + +PCI_VENDOR_ID_xxx for vendors, and a hex constant for device ids. + +Rationale: PCI_VENDOR_ID_xxx constants are re-used, but device ids are not. + Further, device ids are arbitrary hex numbers, normally used only in a + single location, the pci_device_id table. + +9. Obsolete functions ~~~~~~~~~~~~~~~~~~~~~ There are several functions which you might come across when trying to port an old driver to the new PCI interface. They are no longer present diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt index d18a57d..43a889f 100644 --- a/Documentation/power/video.txt +++ b/Documentation/power/video.txt @@ -140,7 +140,7 @@ IBM TP T41p s3_bios (2), switch to X after resume IBM TP T42 s3_bios (2) IBM ThinkPad T42p (2373-GTG) s3_bios (2) IBM TP X20 ??? (*) -IBM TP X30 s3_bios (2) +IBM TP X30 s3_bios, s3_mode (4) IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight. IBM TP X32 none (1), but backlight is on and video is trashed after long suspend. s3_bios,s3_mode (4) works too. Perhaps that gets better results? IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4) diff --git a/Documentation/scsi/ChangeLog.megaraid b/Documentation/scsi/ChangeLog.megaraid index 09f6300..c173806 100644 --- a/Documentation/scsi/ChangeLog.megaraid +++ b/Documentation/scsi/ChangeLog.megaraid @@ -1,3 +1,28 @@ +Release Date : Mon Apr 11 12:27:22 EST 2006 - Seokmann Ju <sju@lsil.com> +Current Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module) +Older Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module) + +1. Fixed a bug in megaraid_reset_handler(). + Customer reported "Unable to handle kernel NULL pointer dereference + at virtual address 00000000" when system goes to reset condition + for some reason. It happened randomly. + Root Cause: in the megaraid_reset_handler(), there is possibility not + returning pending packets in the pend_list if there are multiple + pending packets. + Fix: Made the change in the driver so that it will return all packets + in the pend_list. + +2. Added change request. + As found in the following URL, rmb() only didn't help the + problem. I had to increase the loop counter to 0xFFFFFF. (6 F's) + http://marc.theaimsgroup.com/?l=linux-scsi&m=110971060502497&w=2 + + I attached a patch for your reference, too. + Could you check and get this fix in your driver? + + Best Regards, + Jun'ichi Nomura + Release Date : Fri Nov 11 12:27:22 EST 2005 - Seokmann Ju <sju@lsil.com> Current Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module) Older Version : 2.20.4.6 (scsi module), 2.20.2.6 (cmm module) diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt index 331afd7..ce767b9 100644 --- a/Documentation/scsi/scsi_eh.txt +++ b/Documentation/scsi/scsi_eh.txt @@ -19,9 +19,9 @@ TABLE OF CONTENTS [2-1-1] Overview [2-1-2] Flow of scmds through EH [2-1-3] Flow of control - [2-2] EH through hostt->eh_strategy_handler() - [2-2-1] Pre hostt->eh_strategy_handler() SCSI midlayer conditions - [2-2-2] Post hostt->eh_strategy_handler() SCSI midlayer conditions + [2-2] EH through transportt->eh_strategy_handler() + [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions + [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions [2-2-3] Things to consider @@ -413,9 +413,9 @@ scmd->allowed. layer of failure of the scmds. -[2-2] EH through hostt->eh_strategy_handler() +[2-2] EH through transportt->eh_strategy_handler() - hostt->eh_strategy_handler() is invoked in the place of + transportt->eh_strategy_handler() is invoked in the place of scsi_unjam_host() and it is responsible for whole recovery process. On completion, the handler should have made lower layers forget about all failed scmds and either ready for new commands or offline. Also, @@ -424,7 +424,7 @@ SCSI midlayer. IOW, of the steps described in [2-1-2], all steps except for #1 must be implemented by eh_strategy_handler(). -[2-2-1] Pre hostt->eh_strategy_handler() SCSI midlayer conditions +[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions The following conditions are true on entry to the handler. @@ -437,7 +437,7 @@ except for #1 must be implemented by eh_strategy_handler(). - shost->host_failed == shost->host_busy -[2-2-2] Post hostt->eh_strategy_handler() SCSI midlayer conditions +[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions The following conditions must be true on exit from the handler. diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt index 8bbae3e..75a535a 100644 --- a/Documentation/scsi/scsi_mid_low_api.txt +++ b/Documentation/scsi/scsi_mid_low_api.txt @@ -804,7 +804,6 @@ Summary: eh_bus_reset_handler - issue SCSI bus reset eh_device_reset_handler - issue SCSI device reset eh_host_reset_handler - reset host (host bus adapter) - eh_strategy_handler - driver supplied alternate to scsi_unjam_host() info - supply information about given host ioctl - driver can respond to ioctls proc_info - supports /proc/scsi/{driver_name}/{host_no} @@ -970,24 +969,6 @@ Details: /** - * eh_strategy_handler - driver supplied alternate to scsi_unjam_host() - * @shp: host on which error has occurred - * - * Returns TRUE if host unjammed, else FALSE. - * - * Locks: none - * - * Calling context: kernel thread - * - * Notes: Invoked from scsi_eh thread. LLD supplied alternate to - * scsi_unjam_host() found in scsi_error.c - * - * Optionally defined in: LLD - **/ - int eh_strategy_handler(struct Scsi_Host * shp) - - -/** * info - supply information about given host: driver name plus data * to distinguish given host * @shp: host to supply information about diff --git a/Documentation/serial/driver b/Documentation/serial/driver index 42ef997..88ad615 100644 --- a/Documentation/serial/driver +++ b/Documentation/serial/driver @@ -3,14 +3,11 @@ -------------------- - $Id: driver,v 1.10 2002/07/22 15:27:30 rmk Exp $ - - This document is meant as a brief overview of some aspects of the new serial driver. It is not complete, any questions you have should be directed to <rmk@arm.linux.org.uk> -The reference implementation is contained within serial_amba.c. +The reference implementation is contained within amba_pl011.c. @@ -31,6 +28,11 @@ The serial core provides a few helper functions. This includes identifing the correct port structure (via uart_get_console) and decoding command line arguments (uart_parse_options). +There is also a helper function (uart_write_console) which performs a +character by character write, translating newlines to CRLF sequences. +Driver writers are recommended to use this function rather than implementing +their own version. + Locking ------- @@ -86,6 +88,7 @@ hardware. - TIOCM_DTR DTR signal. - TIOCM_OUT1 OUT1 signal. - TIOCM_OUT2 OUT2 signal. + - TIOCM_LOOP Set the port into loopback mode. If the appropriate bit is set, the signal should be driven active. If the bit is clear, the signal should be driven inactive. @@ -141,6 +144,10 @@ hardware. enable_ms(port) Enable the modem status interrupts. + This method may be called multiple times. Modem status + interrupts should be disabled when the shutdown method is + called. + Locking: port->lock taken. Interrupts: locally disabled. This call must not sleep @@ -160,6 +167,8 @@ hardware. state. Enable the port for reception. It should not activate RTS nor DTR; this will be done via a separate call to set_mctrl. + This method will only be called when the port is initially opened. + Locking: port_sem taken. Interrupts: globally disabled. @@ -169,6 +178,11 @@ hardware. RTS nor DTR; this will have already been done via a separate call to set_mctrl. + Drivers must not access port->info once this call has completed. + + This method will only be called when there are no more users of + this port. + Locking: port_sem taken. Interrupts: caller dependent. @@ -200,12 +214,13 @@ hardware. The interaction of the iflag bits is as follows (parity error given as an example): Parity error INPCK IGNPAR - None n/a n/a character received - Yes n/a 0 character discarded - Yes 0 1 character received, marked as + n/a 0 n/a character received, marked as + TTY_NORMAL + None 1 n/a character received, marked as TTY_NORMAL - Yes 1 1 character received, marked as + Yes 1 0 character received, marked as TTY_PARITY + Yes 1 1 character discarded Other flags may be used (eg, xon/xoff characters) if your hardware supports hardware "soft" flow control. diff --git a/Documentation/sound/alsa/Audiophile-Usb.txt b/Documentation/sound/alsa/Audiophile-Usb.txt index 4692c8e..b535c2a 100644 --- a/Documentation/sound/alsa/Audiophile-Usb.txt +++ b/Documentation/sound/alsa/Audiophile-Usb.txt @@ -1,4 +1,4 @@ - Guide to using M-Audio Audiophile USB with ALSA and Jack v1.2 + Guide to using M-Audio Audiophile USB with ALSA and Jack v1.3 ======================================================== Thibault Le Meur <Thibault.LeMeur@supelec.fr> @@ -22,16 +22,16 @@ The device has 4 audio interfaces, and 2 MIDI ports: * Midi In (Mi) * Midi Out (Mo) -The internal DAC/ADC has the following caracteristics: +The internal DAC/ADC has the following characteristics: * sample depth of 16 or 24 bits * sample rate from 8kHz to 96kHz -* Two ports can't use different sample depths at the same time.Moreover, the +* Two ports can't use different sample depths at the same time. Moreover, the Audiophile USB documentation gives the following Warning: "Please exit any audio application running before switching between bit depths" Due to the USB 1.1 bandwidth limitation, a limited number of interfaces can be activated at the same time depending on the audio mode selected: - * 16-bit/48kHz ==> 4 channels in/ 4 channels out + * 16-bit/48kHz ==> 4 channels in/4 channels out - Ai+Ao+Di+Do * 24-bit/48kHz ==> 4 channels in/2 channels out, or 2 channels in/4 channels out @@ -41,8 +41,8 @@ activated at the same time depending on the audio mode selected: Important facts about the Digital interface: -------------------------------------------- - * The Do port additionnaly supports surround-encoded AC-3 and DTS passthrough, -though I haven't tested it under linux + * The Do port additionally supports surround-encoded AC-3 and DTS passthrough, +though I haven't tested it under Linux - Note that in this setup only the Do interface can be enabled * Apart from recording an audio digital stream, enabling the Di port is a way to synchronize the device to an external sample clock @@ -60,24 +60,23 @@ synchronization error (for instance sound played at an odd sample rate) The Audiophile USB MIDI ports will be automatically supported once the following modules have been loaded: * snd-usb-audio - * snd-seq * snd-seq-midi -No additionnal setting is required. +No additional setting is required. 2.2 - Audio ports ----------------- Audio functions of the Audiophile USB device are handled by the snd-usb-audio module. This module can work in a default mode (without any device-specific -parameter), or in an advanced mode with the device-specific parameter called +parameter), or in an "advanced" mode with the device-specific parameter called "device_setup". 2.2.1 - Default Alsa driver mode -The default behaviour of the snd-usb-audio driver is to parse the device +The default behavior of the snd-usb-audio driver is to parse the device capabilities at startup and enable all functions inside the device (including -all ports at any sample rates and any sample depths supported). This approach +all ports at any supported sample rates and sample depths). This approach has the advantage to let the driver easily switch from sample rates/depths automatically according to the need of the application claiming the device. @@ -114,9 +113,9 @@ gain). For people having this problem, the snd-usb-audio module has a new module parameter called "device_setup". -2.2.2.1 - Initializing the working mode of the Audiohile USB +2.2.2.1 - Initializing the working mode of the Audiophile USB -As far as the Audiohile USB device is concerned, this value let the user +As far as the Audiophile USB device is concerned, this value let the user specify: * the sample depth * the sample rate @@ -174,20 +173,20 @@ The parameter can be given: IMPORTANT NOTE WHEN SWITCHING CONFIGURATION: ------------------------------------------- - * You may need to _first_ intialize the module with the correct device_setup + * You may need to _first_ initialize the module with the correct device_setup parameter and _only_after_ turn on the Audiophile USB device * This is especially true when switching the sample depth: - - first trun off the device - - de-register the snd-usb-audio module - - change the device_setup parameter (by either manually reprobing the module - or changing modprobe.conf) + - first turn off the device + - de-register the snd-usb-audio module (modprobe -r) + - change the device_setup parameter by changing the device_setup + option in /etc/modprobe.conf - turn on the device 2.2.2.3 - Audiophile USB's device_setup structure If you want to understand the device_setup magic numbers for the Audiophile USB, you need some very basic understanding of binary computation. However, -this is not required to use the parameter and you may skip thi section. +this is not required to use the parameter and you may skip this section. The device_setup is one byte long and its structure is the following: @@ -231,11 +230,11 @@ Caution: 2.2.3 - USB implementation details for this device -You may safely skip this section if you're not interrested in driver +You may safely skip this section if you're not interested in driver development. -This section describes some internals aspect of the device and summarize the -data I got by usb-snooping the windows and linux drivers. +This section describes some internal aspects of the device and summarize the +data I got by usb-snooping the windows and Linux drivers. The M-Audio Audiophile USB has 7 USB Interfaces: a "USB interface": @@ -277,9 +276,9 @@ Here is a short description of the AltSettings capabilities: - 16-bit depth, 8-48kHz sample mode - Synch playback (Do), audio format type III IEC1937_AC-3 -In order to ensure a correct intialization of the device, the driver +In order to ensure a correct initialization of the device, the driver _must_know_ how the device will be used: - * if DTS is choosen, only Interface 2 with AltSet nb.6 must be + * if DTS is chosen, only Interface 2 with AltSet nb.6 must be registered * if 96KHz only AltSets nb.1 of each interface must be selected * if samples are using 24bits/48KHz then AltSet 2 must me used if @@ -290,7 +289,7 @@ _must_know_ how the device will be used: is not connected When device_setup is given as a parameter to the snd-usb-audio module, the -parse_audio_enpoint function uses a quirk called +parse_audio_endpoints function uses a quirk called "audiophile_skip_setting_quirk" in order to prevent AltSettings not corresponding to device_setup from being registered in the driver. @@ -317,9 +316,8 @@ However you may see the following warning message: using the "default" ALSA device. This is less efficient than it could be. Consider using a hardware device instead rather than using the plug layer." - 3.2 - Patching alsa to use direct pcm device -------------------------------------------- +-------------------------------------------- A patch for Jack by Andreas Steinmetz adds support for Big Endian devices. However it has not been included in the CVS tree. @@ -331,3 +329,32 @@ After having applied the patch you can run jackd with the following command line: % jackd -R -dalsa -Phw:1,0 -r48000 -p128 -n2 -D -Chw:1,1 +3.2 - Getting 2 input and/or output interfaces in Jack +------------------------------------------------------ + +As you can see, starting the Jack server this way will only enable 1 stereo +input (Di or Ai) and 1 stereo output (Ao or Do). + +This is due to the following restrictions: +* Jack can only open one capture device and one playback device at a time +* The Audiophile USB is seen as 2 (or three) Alsa devices: hw:1,0, hw:1,1 + (and optionally hw:1,2) +If you want to get Ai+Di and/or Ao+Do support with Jack, you would need to +combine the Alsa devices into one logical "complex" device. + +If you want to give it a try, I recommend reading the information from +this page: http://www.sound-man.co.uk/linuxaudio/ice1712multi.html +It is related to another device (ice1712) but can be adapted to suit +the Audiophile USB. + +Enabling multiple Audiophile USB interfaces for Jackd will certainly require: +* patching Jack with the previously mentioned "Big Endian" patch +* patching Jackd with the MMAP_COMPLEX patch (see the ice1712 page) +* patching the alsa-lib/src/pcm/pcm_multi.c file (see the ice1712 page) +* define a multi device (combination of hw:1,0 and hw:1,1) in your .asoundrc + file +* start jackd with this device + +I had no success in testing this for now, but this may be due to my OS +configuration. If you have any success with this kind of setup, please +drop me an email. diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl index 6feef9e..1faf763 100644 --- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl @@ -1123,8 +1123,8 @@ if ((err = pci_enable_device(pci)) < 0) return err; /* check PCI availability (28bit DMA) */ - if (pci_set_dma_mask(pci, 0x0fffffff) < 0 || - pci_set_consistent_dma_mask(pci, 0x0fffffff) < 0) { + if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || + pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { printk(KERN_ERR "error to set 28bit mask DMA\n"); pci_disable_device(pci); return -ENXIO; @@ -1172,7 +1172,7 @@ } /* PCI IDs */ - static struct pci_device_id snd_mychip_ids[] = { + static struct pci_device_id snd_mychip_ids[] __devinitdata = { { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, .... @@ -1216,7 +1216,7 @@ The allocation of PCI resources is done in the <function>probe()</function> function, and usually an extra <function>xxx_create()</function> function is written for this - purpose. + purpose. </para> <para> @@ -1225,7 +1225,7 @@ allocating resources. Also, you need to set the proper PCI DMA mask to limit the accessed i/o range. In some cases, you might need to call <function>pci_set_master()</function> function, - too. + too. </para> <para> @@ -1236,8 +1236,8 @@ <![CDATA[ if ((err = pci_enable_device(pci)) < 0) return err; - if (pci_set_dma_mask(pci, 0x0fffffff) < 0 || - pci_set_consistent_dma_mask(pci, 0x0fffffff) < 0) { + if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || + pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { printk(KERN_ERR "error to set 28bit mask DMA\n"); pci_disable_device(pci); return -ENXIO; @@ -1256,13 +1256,13 @@ functions. Unlike ALSA ver.0.5.x., there are no helpers for that. And these resources must be released in the destructor function (see below). Also, on ALSA 0.9.x, you don't need to - allocate (pseudo-)DMA for PCI like ALSA 0.5.x. + allocate (pseudo-)DMA for PCI like ALSA 0.5.x. </para> <para> Now assume that this PCI device has an I/O port with 8 bytes and an interrupt. Then struct <structname>mychip</structname> will have the - following fields: + following fields: <informalexample> <programlisting> @@ -1565,7 +1565,7 @@ <informalexample> <programlisting> <![CDATA[ - static struct pci_device_id snd_mychip_ids[] = { + static struct pci_device_id snd_mychip_ids[] __devinitdata = { { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, .... diff --git a/Documentation/spi/pxa2xx b/Documentation/spi/pxa2xx new file mode 100644 index 0000000..9c45f3d --- /dev/null +++ b/Documentation/spi/pxa2xx @@ -0,0 +1,234 @@ +PXA2xx SPI on SSP driver HOWTO +=================================================== +This a mini howto on the pxa2xx_spi driver. The driver turns a PXA2xx +synchronous serial port into a SPI master controller +(see Documentation/spi/spi_summary). The driver has the following features + +- Support for any PXA2xx SSP +- SSP PIO and SSP DMA data transfers. +- External and Internal (SSPFRM) chip selects. +- Per slave device (chip) configuration. +- Full suspend, freeze, resume support. + +The driver is built around a "spi_message" fifo serviced by workqueue and a +tasklet. The workqueue, "pump_messages", drives message fifo and the tasklet +(pump_transfer) is responsible for queuing SPI transactions and setting up and +launching the dma/interrupt driven transfers. + +Declaring PXA2xx Master Controllers +----------------------------------- +Typically a SPI master is defined in the arch/.../mach-*/board-*.c as a +"platform device". The master configuration is passed to the driver via a table +found in include/asm-arm/arch-pxa/pxa2xx_spi.h: + +struct pxa2xx_spi_master { + enum pxa_ssp_type ssp_type; + u32 clock_enable; + u16 num_chipselect; + u8 enable_dma; +}; + +The "pxa2xx_spi_master.ssp_type" field must have a value between 1 and 3 and +informs the driver which features a particular SSP supports. + +The "pxa2xx_spi_master.clock_enable" field is used to enable/disable the +corresponding SSP peripheral block in the "Clock Enable Register (CKEN"). See +the "PXA2xx Developer Manual" section "Clocks and Power Management". + +The "pxa2xx_spi_master.num_chipselect" field is used to determine the number of +slave device (chips) attached to this SPI master. + +The "pxa2xx_spi_master.enable_dma" field informs the driver that SSP DMA should +be used. This caused the driver to acquire two DMA channels: rx_channel and +tx_channel. The rx_channel has a higher DMA service priority the tx_channel. +See the "PXA2xx Developer Manual" section "DMA Controller". + +NSSP MASTER SAMPLE +------------------ +Below is a sample configuration using the PXA255 NSSP. + +static struct resource pxa_spi_nssp_resources[] = { + [0] = { + .start = __PREG(SSCR0_P(2)), /* Start address of NSSP */ + .end = __PREG(SSCR0_P(2)) + 0x2c, /* Range of registers */ + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = IRQ_NSSP, /* NSSP IRQ */ + .end = IRQ_NSSP, + .flags = IORESOURCE_IRQ, + }, +}; + +static struct pxa2xx_spi_master pxa_nssp_master_info = { + .ssp_type = PXA25x_NSSP, /* Type of SSP */ + .clock_enable = CKEN9_NSSP, /* NSSP Peripheral clock */ + .num_chipselect = 1, /* Matches the number of chips attached to NSSP */ + .enable_dma = 1, /* Enables NSSP DMA */ +}; + +static struct platform_device pxa_spi_nssp = { + .name = "pxa2xx-spi", /* MUST BE THIS VALUE, so device match driver */ + .id = 2, /* Bus number, MUST MATCH SSP number 1..n */ + .resource = pxa_spi_nssp_resources, + .num_resources = ARRAY_SIZE(pxa_spi_nssp_resources), + .dev = { + .platform_data = &pxa_nssp_master_info, /* Passed to driver */ + }, +}; + +static struct platform_device *devices[] __initdata = { + &pxa_spi_nssp, +}; + +static void __init board_init(void) +{ + (void)platform_add_device(devices, ARRAY_SIZE(devices)); +} + +Declaring Slave Devices +----------------------- +Typically each SPI slave (chip) is defined in the arch/.../mach-*/board-*.c +using the "spi_board_info" structure found in "linux/spi/spi.h". See +"Documentation/spi/spi_summary" for additional information. + +Each slave device attached to the PXA must provide slave specific configuration +information via the structure "pxa2xx_spi_chip" found in +"include/asm-arm/arch-pxa/pxa2xx_spi.h". The pxa2xx_spi master controller driver +will uses the configuration whenever the driver communicates with the slave +device. + +struct pxa2xx_spi_chip { + u8 tx_threshold; + u8 rx_threshold; + u8 dma_burst_size; + u32 timeout_microsecs; + u8 enable_loopback; + void (*cs_control)(u32 command); +}; + +The "pxa2xx_spi_chip.tx_threshold" and "pxa2xx_spi_chip.rx_threshold" fields are +used to configure the SSP hardware fifo. These fields are critical to the +performance of pxa2xx_spi driver and misconfiguration will result in rx +fifo overruns (especially in PIO mode transfers). Good default values are + + .tx_threshold = 12, + .rx_threshold = 4, + +The "pxa2xx_spi_chip.dma_burst_size" field is used to configure PXA2xx DMA +engine and is related the "spi_device.bits_per_word" field. Read and understand +the PXA2xx "Developer Manual" sections on the DMA controller and SSP Controllers +to determine the correct value. An SSP configured for byte-wide transfers would +use a value of 8. + +The "pxa2xx_spi_chip.timeout_microsecs" fields is used to efficiently handle +trailing bytes in the SSP receiver fifo. The correct value for this field is +dependent on the SPI bus speed ("spi_board_info.max_speed_hz") and the specific +slave device. Please note the the PXA2xx SSP 1 does not support trailing byte +timeouts and must busy-wait any trailing bytes. + +The "pxa2xx_spi_chip.enable_loopback" field is used to place the SSP porting +into internal loopback mode. In this mode the SSP controller internally +connects the SSPTX pin the the SSPRX pin. This is useful for initial setup +testing. + +The "pxa2xx_spi_chip.cs_control" field is used to point to a board specific +function for asserting/deasserting a slave device chip select. If the field is +NULL, the pxa2xx_spi master controller driver assumes that the SSP port is +configured to use SSPFRM instead. + +NSSP SALVE SAMPLE +----------------- +The pxa2xx_spi_chip structure is passed to the pxa2xx_spi driver in the +"spi_board_info.controller_data" field. Below is a sample configuration using +the PXA255 NSSP. + +/* Chip Select control for the CS8415A SPI slave device */ +static void cs8415a_cs_control(u32 command) +{ + if (command & PXA2XX_CS_ASSERT) + GPCR(2) = GPIO_bit(2); + else + GPSR(2) = GPIO_bit(2); +} + +/* Chip Select control for the CS8405A SPI slave device */ +static void cs8405a_cs_control(u32 command) +{ + if (command & PXA2XX_CS_ASSERT) + GPCR(3) = GPIO_bit(3); + else + GPSR(3) = GPIO_bit(3); +} + +static struct pxa2xx_spi_chip cs8415a_chip_info = { + .tx_threshold = 12, /* SSP hardward FIFO threshold */ + .rx_threshold = 4, /* SSP hardward FIFO threshold */ + .dma_burst_size = 8, /* Byte wide transfers used so 8 byte bursts */ + .timeout_microsecs = 64, /* Wait at least 64usec to handle trailing */ + .cs_control = cs8415a_cs_control, /* Use external chip select */ +}; + +static struct pxa2xx_spi_chip cs8405a_chip_info = { + .tx_threshold = 12, /* SSP hardward FIFO threshold */ + .rx_threshold = 4, /* SSP hardward FIFO threshold */ + .dma_burst_size = 8, /* Byte wide transfers used so 8 byte bursts */ + .timeout_microsecs = 64, /* Wait at least 64usec to handle trailing */ + .cs_control = cs8405a_cs_control, /* Use external chip select */ +}; + +static struct spi_board_info streetracer_spi_board_info[] __initdata = { + { + .modalias = "cs8415a", /* Name of spi_driver for this device */ + .max_speed_hz = 3686400, /* Run SSP as fast a possbile */ + .bus_num = 2, /* Framework bus number */ + .chip_select = 0, /* Framework chip select */ + .platform_data = NULL; /* No spi_driver specific config */ + .controller_data = &cs8415a_chip_info, /* Master chip config */ + .irq = STREETRACER_APCI_IRQ, /* Slave device interrupt */ + }, + { + .modalias = "cs8405a", /* Name of spi_driver for this device */ + .max_speed_hz = 3686400, /* Run SSP as fast a possbile */ + .bus_num = 2, /* Framework bus number */ + .chip_select = 1, /* Framework chip select */ + .controller_data = &cs8405a_chip_info, /* Master chip config */ + .irq = STREETRACER_APCI_IRQ, /* Slave device interrupt */ + }, +}; + +static void __init streetracer_init(void) +{ + spi_register_board_info(streetracer_spi_board_info, + ARRAY_SIZE(streetracer_spi_board_info)); +} + + +DMA and PIO I/O Support +----------------------- +The pxa2xx_spi driver support both DMA and interrupt driven PIO message +transfers. The driver defaults to PIO mode and DMA transfers must enabled by +setting the "enable_dma" flag in the "pxa2xx_spi_master" structure and and +ensuring that the "pxa2xx_spi_chip.dma_burst_size" field is non-zero. The DMA +mode support both coherent and stream based DMA mappings. + +The following logic is used to determine the type of I/O to be used on +a per "spi_transfer" basis: + +if !enable_dma or dma_burst_size == 0 then + always use PIO transfers + +if spi_message.is_dma_mapped and rx_dma_buf != 0 and tx_dma_buf != 0 then + use coherent DMA mode + +if rx_buf and tx_buf are aligned on 8 byte boundary then + use streaming DMA mode + +otherwise + use PIO transfer + +THANKS TO +--------- + +David Brownell and others for mentoring the development of this driver. + diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary index a5ffba3..068732d3 100644 --- a/Documentation/spi/spi-summary +++ b/Documentation/spi/spi-summary @@ -414,7 +414,33 @@ to get the driver-private data allocated for that device. The driver will initialize the fields of that spi_master, including the bus number (maybe the same as the platform device ID) and three methods used to interact with the SPI core and SPI protocol drivers. It will -also initialize its own internal state. +also initialize its own internal state. (See below about bus numbering +and those methods.) + +After you initialize the spi_master, then use spi_register_master() to +publish it to the rest of the system. At that time, device nodes for +the controller and any predeclared spi devices will be made available, +and the driver model core will take care of binding them to drivers. + +If you need to remove your SPI controller driver, spi_unregister_master() +will reverse the effect of spi_register_master(). + + +BUS NUMBERING + +Bus numbering is important, since that's how Linux identifies a given +SPI bus (shared SCK, MOSI, MISO). Valid bus numbers start at zero. On +SOC systems, the bus numbers should match the numbers defined by the chip +manufacturer. For example, hardware controller SPI2 would be bus number 2, +and spi_board_info for devices connected to it would use that number. + +If you don't have such hardware-assigned bus number, and for some reason +you can't just assign them, then provide a negative bus number. That will +then be replaced by a dynamically assigned number. You'd then need to treat +this as a non-static configuration (see above). + + +SPI MASTER METHODS master->setup(struct spi_device *spi) This sets up the device clock rate, SPI mode, and word sizes. @@ -431,6 +457,9 @@ also initialize its own internal state. state it dynamically associates with that device. If you do that, be sure to provide the cleanup() method to free that state. + +SPI MESSAGE QUEUE + The bulk of the driver will be managing the I/O queue fed by transfer(). That queue could be purely conceptual. For example, a driver used only @@ -440,6 +469,9 @@ But the queue will probably be very real, using message->queue, PIO, often DMA (especially if the root filesystem is in SPI flash), and execution contexts like IRQ handlers, tasklets, or workqueues (such as keventd). Your driver can be as fancy, or as simple, as you need. +Such a transfer() method would normally just add the message to a +queue, and then start some asynchronous transfer engine (unless it's +already running). THANKS TO diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index 1ad9af1..687104b 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt @@ -27,12 +27,21 @@ number of free hugetlb pages at any time. It also displays information about the configured hugepage size - this is needed for generating the proper alignment and size of the arguments to the above system calls. -The output of "cat /proc/meminfo" will have output like: +The output of "cat /proc/meminfo" will have lines like: ..... HugePages_Total: xxx HugePages_Free: yyy -Hugepagesize: zzz KB +HugePages_Rsvd: www +Hugepagesize: zzz kB + +where: +HugePages_Total is the size of the pool of hugepages. +HugePages_Free is the number of hugepages in the pool that are not yet +allocated. +HugePages_Rsvd is short for "reserved," and is the number of hugepages +for which a commitment to allocate from the pool has been made, but no +allocation has yet been made. It's vaguely analogous to overcommit. /proc/filesystems should also show a filesystem of type "hugetlbfs" configured in the kernel. @@ -42,11 +51,11 @@ pages in the kernel. Super user can dynamically request more (or free some pre-configured) hugepages. The allocation (or deallocation) of hugetlb pages is possible only if there are enough physically contiguous free pages in system (freeing of hugepages is -possible only if there are enough hugetlb pages free that can be transfered +possible only if there are enough hugetlb pages free that can be transferred back to regular memory pool). -Pages that are used as hugetlb pages are reserved inside the kernel and can -not be used for other purposes. +Pages that are used as hugetlb pages are reserved inside the kernel and cannot +be used for other purposes. Once the kernel with Hugetlb page support is built and running, a user can use either the mmap system call or shared memory system calls to start using @@ -60,7 +69,7 @@ Use the following command to dynamically allocate/deallocate hugepages: This command will try to configure 20 hugepages in the system. The success or failure of allocation depends on the amount of physically contiguous memory that is preset in system at this time. System administrators may want -to put this command in one of the local rc init file. This will enable the +to put this command in one of the local rc init files. This will enable the kernel to request huge pages early in the boot process (when the possibility of getting physical contiguous pages is still very high). @@ -78,8 +87,8 @@ the uid and gid of the current process are taken. The mode option sets the mode of root of file system to value & 0777. This value is given in octal. By default the value 0755 is picked. The size option sets the maximum value of memory (huge pages) allowed for that filesystem (/mnt/huge). The size is -rounded down to HPAGE_SIZE. The option nr_inode sets the maximum number of -inodes that /mnt/huge can use. If the size or nr_inode options are not +rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of +inodes that /mnt/huge can use. If the size or nr_inodes options are not provided on command line then no limits are set. For size and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For example, size=2K has the same meaning as size=2048. An example is given at @@ -88,7 +97,7 @@ the end of this document. read and write system calls are not supported on files that reside on hugetlb file systems. -A regular chown, chgrp and chmod commands (with right permissions) could be +Regular chown, chgrp, and chmod commands (with right permissions) could be used to change the file attributes on hugetlbfs. Also, it is important to note that no such mount command is required if the @@ -96,8 +105,8 @@ applications are going to use only shmat/shmget system calls. Users who wish to use hugetlb page via shared memory segment should be a member of a supplementary group and system admin needs to configure that gid into /proc/sys/vm/hugetlb_shm_group. It is possible for same or different -applications to use any combination of mmaps and shm* calls. Though the -mount of filesystem will be required for using mmaps. +applications to use any combination of mmaps and shm* calls, though the +mount of filesystem will be required for using mmap calls. ******************************************************************* diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt index c5beb54..21ed511 100644 --- a/Documentation/watchdog/watchdog-api.txt +++ b/Documentation/watchdog/watchdog-api.txt @@ -36,6 +36,9 @@ timeout or margin. The simplest way to ping the watchdog is to write some data to the device. So a very simple watchdog daemon would look like this: +#include <stdlib.h> +#include <fcntl.h> + int main(int argc, const char *argv[]) { int fd=open("/dev/watchdog",O_WRONLY); if (fd==-1) { diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index 1921353..f2cd6ef 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt @@ -151,6 +151,11 @@ NUMA numa=fake=X Fake X nodes and ignore NUMA setup of the actual machine. + numa=hotadd=percent + Only allow hotadd memory to preallocate page structures upto + percent of already available memory. + numa=hotadd=0 will disable hotadd memory. + ACPI acpi=off Don't enable ACPI |