From 15293df82bd1c15196e7cb336130c243e9a41806 Mon Sep 17 00:00:00 2001 From: Detlef Riekenberg Date: Thu, 10 Dec 2009 13:55:48 +0100 Subject: vgaarbiter: fix a typo in the vgaarbiter Documentation I detected a typo, while reading "Documentation/vgaarbiter.txt". Fix the 'fieldd' mispelling. Signed-off-by: Detlef Riekenberg Signed-off-by: Jesse Barnes --- Documentation/vgaarbiter.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vgaarbiter.txt b/Documentation/vgaarbiter.txt index 987f9b0..43a9b06 100644 --- a/Documentation/vgaarbiter.txt +++ b/Documentation/vgaarbiter.txt @@ -103,7 +103,7 @@ I.2 libpciaccess ---------------- To use the vga arbiter char device it was implemented an API inside the -libpciaccess library. One fieldd was added to struct pci_device (each device +libpciaccess library. One field was added to struct pci_device (each device on the system): /* the type of resource decoded by the device */ -- cgit v1.1 From 1de6129f381b4907013ccea08a3bdea8c966d50a Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori Date: Fri, 18 Dec 2009 12:37:48 +0100 Subject: block: remove Documentation/block/as-iosched.txt Commit 492af6350a5ccf087e4964104a276ed358811458 removed the AS IO scheduler, so remove its documentation too. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe --- Documentation/block/00-INDEX | 2 - Documentation/block/as-iosched.txt | 172 ------------------------------------- 2 files changed, 174 deletions(-) delete mode 100644 Documentation/block/as-iosched.txt (limited to 'Documentation') diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX index 961a051..a406286f 100644 --- a/Documentation/block/00-INDEX +++ b/Documentation/block/00-INDEX @@ -1,7 +1,5 @@ 00-INDEX - This file -as-iosched.txt - - Anticipatory IO scheduler barrier.txt - I/O Barriers biodoc.txt diff --git a/Documentation/block/as-iosched.txt b/Documentation/block/as-iosched.txt deleted file mode 100644 index 738b72b..0000000 --- a/Documentation/block/as-iosched.txt +++ /dev/null @@ -1,172 +0,0 @@ -Anticipatory IO scheduler -------------------------- -Nick Piggin 13 Sep 2003 - -Attention! Database servers, especially those using "TCQ" disks should -investigate performance with the 'deadline' IO scheduler. Any system with high -disk performance requirements should do so, in fact. - -If you see unusual performance characteristics of your disk systems, or you -see big performance regressions versus the deadline scheduler, please email -me. Database users don't bother unless you're willing to test a lot of patches -from me ;) its a known issue. - -Also, users with hardware RAID controllers, doing striping, may find -highly variable performance results with using the as-iosched. The -as-iosched anticipatory implementation is based on the notion that a disk -device has only one physical seeking head. A striped RAID controller -actually has a head for each physical device in the logical RAID device. - -However, setting the antic_expire (see tunable parameters below) produces -very similar behavior to the deadline IO scheduler. - -Selecting IO schedulers ------------------------ -Refer to Documentation/block/switching-sched.txt for information on -selecting an io scheduler on a per-device basis. - -Anticipatory IO scheduler Policies ----------------------------------- -The as-iosched implementation implements several layers of policies -to determine when an IO request is dispatched to the disk controller. -Here are the policies outlined, in order of application. - -1. one-way Elevator algorithm. - -The elevator algorithm is similar to that used in deadline scheduler, with -the addition that it allows limited backward movement of the elevator -(i.e. seeks backwards). A seek backwards can occur when choosing between -two IO requests where one is behind the elevator's current position, and -the other is in front of the elevator's position. If the seek distance to -the request in back of the elevator is less than half the seek distance to -the request in front of the elevator, then the request in back can be chosen. -Backward seeks are also limited to a maximum of MAXBACK (1024*1024) sectors. -This favors forward movement of the elevator, while allowing opportunistic -"short" backward seeks. - -2. FIFO expiration times for reads and for writes. - -This is again very similar to the deadline IO scheduler. The expiration -times for requests on these lists is tunable using the parameters read_expire -and write_expire discussed below. When a read or a write expires in this way, -the IO scheduler will interrupt its current elevator sweep or read anticipation -to service the expired request. - -3. Read and write request batching - -A batch is a collection of read requests or a collection of write -requests. The as scheduler alternates dispatching read and write batches -to the driver. In the case a read batch, the scheduler submits read -requests to the driver as long as there are read requests to submit, and -the read batch time limit has not been exceeded (read_batch_expire). -The read batch time limit begins counting down only when there are -competing write requests pending. - -In the case of a write batch, the scheduler submits write requests to -the driver as long as there are write requests available, and the -write batch time limit has not been exceeded (write_batch_expire). -However, the length of write batches will be gradually shortened -when read batches frequently exceed their time limit. - -When changing between batch types, the scheduler waits for all requests -from the previous batch to complete before scheduling requests for the -next batch. - -The read and write fifo expiration times described in policy 2 above -are checked only when in scheduling IO of a batch for the corresponding -(read/write) type. So for example, the read FIFO timeout values are -tested only during read batches. Likewise, the write FIFO timeout -values are tested only during write batches. For this reason, -it is generally not recommended for the read batch time -to be longer than the write expiration time, nor for the write batch -time to exceed the read expiration time (see tunable parameters below). - -When the IO scheduler changes from a read to a write batch, -it begins the elevator from the request that is on the head of the -write expiration FIFO. Likewise, when changing from a write batch to -a read batch, scheduler begins the elevator from the first entry -on the read expiration FIFO. - -4. Read anticipation. - -Read anticipation occurs only when scheduling a read batch. -This implementation of read anticipation allows only one read request -to be dispatched to the disk controller at a time. In -contrast, many write requests may be dispatched to the disk controller -at a time during a write batch. It is this characteristic that can make -the anticipatory scheduler perform anomalously with controllers supporting -TCQ, or with hardware striped RAID devices. Setting the antic_expire -queue parameter (see below) to zero disables this behavior, and the -anticipatory scheduler behaves essentially like the deadline scheduler. - -When read anticipation is enabled (antic_expire is not zero), reads -are dispatched to the disk controller one at a time. -At the end of each read request, the IO scheduler examines its next -candidate read request from its sorted read list. If that next request -is from the same process as the request that just completed, -or if the next request in the queue is "very close" to the -just completed request, it is dispatched immediately. Otherwise, -statistics (average think time, average seek distance) on the process -that submitted the just completed request are examined. If it seems -likely that that process will submit another request soon, and that -request is likely to be near the just completed request, then the IO -scheduler will stop dispatching more read requests for up to (antic_expire) -milliseconds, hoping that process will submit a new request near the one -that just completed. If such a request is made, then it is dispatched -immediately. If the antic_expire wait time expires, then the IO scheduler -will dispatch the next read request from the sorted read queue. - -To decide whether an anticipatory wait is worthwhile, the scheduler -maintains statistics for each process that can be used to compute -mean "think time" (the time between read requests), and mean seek -distance for that process. One observation is that these statistics -are associated with each process, but those statistics are not associated -with a specific IO device. So for example, if a process is doing IO -on several file systems on separate devices, the statistics will be -a combination of IO behavior from all those devices. - - -Tuning the anticipatory IO scheduler ------------------------------------- -When using 'as', the anticipatory IO scheduler there are 5 parameters under -/sys/block/*/queue/iosched/. All are units of milliseconds. - -The parameters are: -* read_expire - Controls how long until a read request becomes "expired". It also controls the - interval between which expired requests are served, so set to 50, a request - might take anywhere < 100ms to be serviced _if_ it is the next on the - expired list. Obviously request expiration strategies won't make the disk - go faster. The result basically equates to the timeslice a single reader - gets in the presence of other IO. 100*((seek time / read_expire) + 1) is - very roughly the % streaming read efficiency your disk should get with - multiple readers. - -* read_batch_expire - Controls how much time a batch of reads is given before pending writes are - served. A higher value is more efficient. This might be set below read_expire - if writes are to be given higher priority than reads, but reads are to be - as efficient as possible when there are no writes. Generally though, it - should be some multiple of read_expire. - -* write_expire, and -* write_batch_expire are equivalent to the above, for writes. - -* antic_expire - Controls the maximum amount of time we can anticipate a good read (one - with a short seek distance from the most recently completed request) before - giving up. Many other factors may cause anticipation to be stopped early, - or some processes will not be "anticipated" at all. Should be a bit higher - for big seek time devices though not a linear correspondence - most - processes have only a few ms thinktime. - -In addition to the tunables above there is a read-only file named est_time -which, when read, will show: - - - The probability of a task exiting without a cooperating task - submitting an anticipated IO. - - - The current mean think time. - - - The seek distance used to determine if an incoming IO is better. - -- cgit v1.1 From 360b6e5cab1cea1d838b0100956ce0d3dbccbb6f Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 18 Dec 2009 15:16:56 -0800 Subject: Documentation: Update mmiotrace.txt Fix typos, spellos, hyphenation, line lengths. BTW: are there some userspace tools? There is a reference to some at the wiki page, but there are no tools listed there. Signed-off-by: Randy Dunlap Acked-by: Pekka Paalanen LKML-Reference: <4B2C0D68.6080401@oracle.com> Signed-off-by: Ingo Molnar --- Documentation/trace/mmiotrace.txt | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/trace/mmiotrace.txt b/Documentation/trace/mmiotrace.txt index 162effb..664e738 100644 --- a/Documentation/trace/mmiotrace.txt +++ b/Documentation/trace/mmiotrace.txt @@ -44,7 +44,8 @@ Check for lost events. Usage ----- -Make sure debugfs is mounted to /sys/kernel/debug. If not, (requires root privileges) +Make sure debugfs is mounted to /sys/kernel/debug. +If not (requires root privileges): $ mount -t debugfs debugfs /sys/kernel/debug Check that the driver you are about to trace is not loaded. @@ -91,7 +92,7 @@ $ dmesg > dmesg.txt $ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt and then send the .tar.gz file. The trace compresses considerably. Replace "pciid" and "nick" with the PCI ID or model name of your piece of hardware -under investigation and your nick name. +under investigation and your nickname. How Mmiotrace Works @@ -100,7 +101,7 @@ How Mmiotrace Works Access to hardware IO-memory is gained by mapping addresses from PCI bus by calling one of the ioremap_*() functions. Mmiotrace is hooked into the __ioremap() function and gets called whenever a mapping is created. Mapping is -an event that is recorded into the trace log. Note, that ISA range mappings +an event that is recorded into the trace log. Note that ISA range mappings are not caught, since the mapping always exists and is returned directly. MMIO accesses are recorded via page faults. Just before __ioremap() returns, @@ -122,11 +123,11 @@ Trace Log Format ---------------- The raw log is text and easily filtered with e.g. grep and awk. One record is -one line in the log. A record starts with a keyword, followed by keyword -dependant arguments. Arguments are separated by a space, or continue until the +one line in the log. A record starts with a keyword, followed by keyword- +dependent arguments. Arguments are separated by a space, or continue until the end of line. The format for version 20070824 is as follows: -Explanation Keyword Space separated arguments +Explanation Keyword Space-separated arguments --------------------------------------------------------------------------- read event R width, timestamp, map id, physical, value, PC, PID @@ -136,7 +137,7 @@ iounmap event UNMAP timestamp, map id, PC, PID marker MARK timestamp, text version VERSION the string "20070824" info for reader LSPCI one line from lspci -v -PCI address map PCIDEV space separated /proc/bus/pci/devices data +PCI address map PCIDEV space-separated /proc/bus/pci/devices data unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual -- cgit v1.1 From b41df645c829d961068aecd30909c2675acbaaea Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 18 Dec 2009 15:17:04 -0800 Subject: Documentation: Update tracepoint-analysis.txt Fix grammar, spelling, punctuation, hyphenation, section numbering. Tell what PCL means. Signed-off-by: Randy Dunlap Cc: Mel Gorman Cc: Steven Rostedt LKML-Reference: <4B2C0D70.4030707@oracle.com> Signed-off-by: Ingo Molnar --- Documentation/trace/tracepoint-analysis.txt | 60 ++++++++++++++--------------- 1 file changed, 30 insertions(+), 30 deletions(-) (limited to 'Documentation') diff --git a/Documentation/trace/tracepoint-analysis.txt b/Documentation/trace/tracepoint-analysis.txt index 5eb4e487..87bee3c 100644 --- a/Documentation/trace/tracepoint-analysis.txt +++ b/Documentation/trace/tracepoint-analysis.txt @@ -10,8 +10,8 @@ Tracepoints (see Documentation/trace/tracepoints.txt) can be used without creating custom kernel modules to register probe functions using the event tracing infrastructure. -Simplistically, tracepoints will represent an important event that when can -be taken in conjunction with other tracepoints to build a "Big Picture" of +Simplistically, tracepoints represent important events that can be +taken in conjunction with other tracepoints to build a "Big Picture" of what is going on within the system. There are a large number of methods for gathering and interpreting these events. Lacking any current Best Practises, this document describes some of the methods that can be used. @@ -33,12 +33,12 @@ calling will give a fair indication of the number of events available. -2.2 PCL +2.2 PCL (Performance Counters for Linux) ------- -Discovery and enumeration of all counters and events, including tracepoints +Discovery and enumeration of all counters and events, including tracepoints, are available with the perf tool. Getting a list of available events is a -simple case of +simple case of: $ perf list 2>&1 | grep Tracepoint ext4:ext4_free_inode [Tracepoint event] @@ -49,19 +49,19 @@ simple case of [ .... remaining output snipped .... ] -2. Enabling Events +3. Enabling Events ================== -2.1 System-Wide Event Enabling +3.1 System-Wide Event Enabling ------------------------------ See Documentation/trace/events.txt for a proper description on how events can be enabled system-wide. A short example of enabling all events related -to page allocation would look something like +to page allocation would look something like: $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done -2.2 System-Wide Event Enabling with SystemTap +3.2 System-Wide Event Enabling with SystemTap --------------------------------------------- In SystemTap, tracepoints are accessible using the kernel.trace() function @@ -86,7 +86,7 @@ were allocating the pages. print_count() } -2.3 System-Wide Event Enabling with PCL +3.3 System-Wide Event Enabling with PCL --------------------------------------- By specifying the -a switch and analysing sleep, the system-wide events @@ -107,16 +107,16 @@ for a duration of time can be examined. Similarly, one could execute a shell and exit it as desired to get a report at that point. -2.4 Local Event Enabling +3.4 Local Event Enabling ------------------------ Documentation/trace/ftrace.txt describes how to enable events on a per-thread basis using set_ftrace_pid. -2.5 Local Event Enablement with PCL +3.5 Local Event Enablement with PCL ----------------------------------- -Events can be activate and tracked for the duration of a process on a local +Events can be activated and tracked for the duration of a process on a local basis using PCL such as follows. $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ @@ -131,18 +131,18 @@ basis using PCL such as follows. 0.973913387 seconds time elapsed -3. Event Filtering +4. Event Filtering ================== Documentation/trace/ftrace.txt covers in-depth how to filter events in ftrace. Obviously using grep and awk of trace_pipe is an option as well as any script reading trace_pipe. -4. Analysing Event Variances with PCL +5. Analysing Event Variances with PCL ===================================== Any workload can exhibit variances between runs and it can be important -to know what the standard deviation in. By and large, this is left to the +to know what the standard deviation is. By and large, this is left to the performance analyst to do it by hand. In the event that the discrete event occurrences are useful to the performance analyst, then perf can be used. @@ -166,7 +166,7 @@ In the event that some higher-level event is required that depends on some aggregation of discrete events, then a script would need to be developed. Using --repeat, it is also possible to view how events are fluctuating over -time on a system wide basis using -a and sleep. +time on a system-wide basis using -a and sleep. $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ -e kmem:mm_pagevec_free \ @@ -180,7 +180,7 @@ time on a system wide basis using -a and sleep. 1.002251757 seconds time elapsed ( +- 0.005% ) -5. Higher-Level Analysis with Helper Scripts +6. Higher-Level Analysis with Helper Scripts ============================================ When events are enabled the events that are triggering can be read from @@ -190,11 +190,11 @@ be gathered on-line as appropriate. Examples of post-processing might include o Reading information from /proc for the PID that triggered the event o Deriving a higher-level event from a series of lower-level events. - o Calculate latencies between two events + o Calculating latencies between two events Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example script that can read trace_pipe from STDIN or a copy of a trace. When used -on-line, it can be interrupted once to generate a report without existing +on-line, it can be interrupted once to generate a report without exiting and twice to exit. Simplistically, the script just reads STDIN and counts up events but it @@ -212,12 +212,12 @@ also can do more such as processes, the parent process responsible for creating all the helpers can be identified -6. Lower-Level Analysis with PCL +7. Lower-Level Analysis with PCL ================================ -There may also be a requirement to identify what functions with a program +There may also be a requirement to identify what functions within a program were generating events within the kernel. To begin this sort of analysis, the -data must be recorded. At the time of writing, this required root +data must be recorded. At the time of writing, this required root: $ perf record -c 1 \ -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ @@ -253,11 +253,11 @@ perf report. # (For more details, try: perf report --sort comm,dso,symbol) # -According to this, the vast majority of events occured triggered on events -within the VDSO. With simple binaries, this will often be the case so lets +According to this, the vast majority of events triggered on events +within the VDSO. With simple binaries, this will often be the case so let's take a slightly different example. In the course of writing this, it was -noticed that X was generating an insane amount of page allocations so lets look -at it +noticed that X was generating an insane amount of page allocations so let's look +at it: $ perf record -c 1 -f \ -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ @@ -280,8 +280,8 @@ This was interrupted after a few seconds and # (For more details, try: perf report --sort comm,dso,symbol) # -So, almost half of the events are occuring in a library. To get an idea which -symbol. +So, almost half of the events are occurring in a library. To get an idea which +symbol: $ perf report --sort comm,dso,symbol # Samples: 27666 @@ -297,7 +297,7 @@ symbol. 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path 0.00% Xorg [kernel] [k] ftrace_trace_userstack -To see where within the function pixmanFillsse2 things are going wrong +To see where within the function pixmanFillsse2 things are going wrong: $ perf annotate pixmanFillsse2 [ ... ] -- cgit v1.1 From 7e25f44cbf8d95a9748fdfd19c06145f19fd10e3 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 18 Dec 2009 15:17:12 -0800 Subject: Documentation: Update ftrace-design.txt Correct grammos. Spell out words. Add missing words. Consistent use of "mcount()" function name. Signed-off-by: Randy Dunlap Acked-by: Steven Rostedt LKML-Reference: <4B2C0D78.6060707@oracle.com> Signed-off-by: Ingo Molnar --- Documentation/trace/ftrace-design.txt | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index 641a1ef..239f14b 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt @@ -53,14 +53,14 @@ size of the mcount call that is embedded in the function). For example, if the function foo() calls bar(), when the bar() function calls mcount(), the arguments mcount() will pass to the tracer are: "frompc" - the address bar() will use to return to foo() - "selfpc" - the address bar() (with _mcount() size adjustment) + "selfpc" - the address bar() (with mcount() size adjustment) Also keep in mind that this mcount function will be called *a lot*, so optimizing for the default case of no tracer will help the smooth running of your system when tracing is disabled. So the start of the mcount function is -typically the bare min with checking things before returning. That also means -the code flow should usually kept linear (i.e. no branching in the nop case). -This is of course an optimization and not a hard requirement. +typically the bare minimum with checking things before returning. That also +means the code flow should usually be kept linear (i.e. no branching in the nop +case). This is of course an optimization and not a hard requirement. Here is some pseudo code that should help (these functions should actually be implemented in assembly): @@ -131,10 +131,10 @@ some functions to save (hijack) and restore the return address. The mcount function should check the function pointers ftrace_graph_return (compare to ftrace_stub) and ftrace_graph_entry (compare to -ftrace_graph_entry_stub). If either of those are not set to the relevant stub +ftrace_graph_entry_stub). If either of those is not set to the relevant stub function, call the arch-specific function ftrace_graph_caller which in turn calls the arch-specific function prepare_ftrace_return. Neither of these -function names are strictly required, but you should use them anyways to stay +function names is strictly required, but you should use them anyway to stay consistent across the architecture ports -- easier to compare & contrast things. @@ -144,7 +144,7 @@ but the first argument should be a pointer to the "frompc". Typically this is located on the stack. This allows the function to hijack the return address temporarily to have it point to the arch-specific function return_to_handler. That function will simply call the common ftrace_return_to_handler function and -that will return the original return address with which, you can return to the +that will return the original return address with which you can return to the original call site. Here is the updated mcount pseudo code: -- cgit v1.1 From 6d3b82f2d31f22085e5711b28dddcb9fb3d97a25 Mon Sep 17 00:00:00 2001 From: Fang Wenqi Date: Thu, 24 Dec 2009 17:51:42 -0500 Subject: ext4: Update documentation to correct the inode_readahead_blks option name Per commit 240799cd, the option name for readahead should be inode_readahead_blks, not inode_readahead. Signed-off-by: Fang Wenqi Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/ext4.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index af6885c..e1def17 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -196,7 +196,7 @@ nobarrier This also requires an IO stack which can support also be used to enable or disable barriers, for consistency with other ext4 mount options. -inode_readahead=n This tuning parameter controls the maximum +inode_readahead_blks=n This tuning parameter controls the maximum number of inode table blocks that ext4's inode table readahead algorithm will pre-read into the buffer cache. The default value is 32 blocks. -- cgit v1.1 From fdfa68298816192d47fc7443d1e2f00fa1420985 Mon Sep 17 00:00:00 2001 From: Masanari Iida Date: Sun, 27 Dec 2009 00:09:45 +0900 Subject: ALSA: Fix a typo in Procfile.txt Fix a typo in Documentation/sound/alsa/Procfile.txt Signed-off-by Masanari Iida Signed-off-by: Takashi Iwai --- Documentation/sound/alsa/Procfile.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt index 719a819..07301de 100644 --- a/Documentation/sound/alsa/Procfile.txt +++ b/Documentation/sound/alsa/Procfile.txt @@ -95,7 +95,7 @@ card*/pcm*/xrun_debug It takes an integer value, can be changed by writing to this file, such as - # cat 5 > /proc/asound/card0/pcm0p/xrun_debug + # echo 5 > /proc/asound/card0/pcm0p/xrun_debug The value consists of the following bit flags: bit 0 = Enable XRUN/jiffies debug messages -- cgit v1.1 From 169220f88f0f26f4450ac0bc8ff0f807b453ec58 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Sat, 26 Dec 2009 22:52:16 -0200 Subject: thinkpad-acpi: update volume subdriver documentation Signed-off-by: Henrique de Moraes Holschuh Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 58 ++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 169091f..75afa12 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1092,8 +1092,8 @@ WARNING: its level up and down at every change. -Volume control --------------- +Volume control (Console Audio control) +-------------------------------------- procfs: /proc/acpi/ibm/volume ALSA: "ThinkPad Console Audio Control", default ID: "ThinkPadEC" @@ -1110,9 +1110,53 @@ the desktop environment to just provide on-screen-display feedback. Software volume control should be done only in the main AC97/HDA mixer. -This feature allows volume control on ThinkPad models with a digital -volume knob (when available, not all models have it), as well as -mute/unmute control. The available commands are: + +About the ThinkPad Console Audio control: + +ThinkPads have a built-in amplifier and muting circuit that drives the +console headphone and speakers. This circuit is after the main AC97 +or HDA mixer in the audio path, and under exclusive control of the +firmware. + +ThinkPads have three special hotkeys to interact with the console +audio control: volume up, volume down and mute. + +It is worth noting that the normal way the mute function works (on +ThinkPads that do not have a "mute LED") is: + +1. Press mute to mute. It will *always* mute, you can press it as + many times as you want, and the sound will remain mute. + +2. Press either volume key to unmute the ThinkPad (it will _not_ + change the volume, it will just unmute). + +This is a very superior design when compared to the cheap software-only +mute-toggle solution found on normal consumer laptops: you can be +absolutely sure the ThinkPad will not make noise if you press the mute +button, no matter the previous state. + +The IBM ThinkPads, and the earlier Lenovo ThinkPads have variable-gain +amplifiers driving the speakers and headphone output, and the firmware +also handles volume control for the headphone and speakers on these +ThinkPads without any help from the operating system (this volume +control stage exists after the main AC97 or HDA mixer in the audio +path). + +The newer Lenovo models only have firmware mute control, and depend on +the main HDA mixer to do volume control (which is done by the operating +system). In this case, the volume keys are filtered out for unmute +key press (there are some firmware bugs in this area) and delivered as +normal key presses to the operating system (thinkpad-acpi is not +involved). + + +The ThinkPad-ACPI volume control: + +The preferred way to interact with the Console Audio control is the +ALSA interface. + +The legacy procfs interface allows one to read the current state, +and if volume control is enabled, accepts the following commands: echo up >/proc/acpi/ibm/volume echo down >/proc/acpi/ibm/volume @@ -1121,12 +1165,10 @@ mute/unmute control. The available commands are: echo 'level ' >/proc/acpi/ibm/volume The number range is 0 to 14 although not all of them may be -distinct. The unmute the volume after the mute command, use either the +distinct. To unmute the volume after the mute command, use either the up or down command (the level command will not unmute the volume), or the unmute command. -The current volume level and mute state is shown in the file. - You can use the volume_capabilities parameter to tell the driver whether your thinkpad has volume control or mute-only control: volume_capabilities=1 for mixers with mute and volume control, -- cgit v1.1 From dab4b911a5327859bb8f969249c6978c26cd4853 Mon Sep 17 00:00:00 2001 From: Jan Kiszka Date: Sun, 6 Dec 2009 18:24:15 +0100 Subject: KVM: x86: Extend KVM_SET_VCPU_EVENTS with selective updates User space may not want to overwrite asynchronously changing VCPU event states on write-back. So allow to skip nmi.pending and sipi_vector by setting corresponding bits in the flags field of kvm_vcpu_events. [avi: advertise the bits in KVM_GET_VCPU_EVENTS] Signed-off-by: Jan Kiszka Signed-off-by: Avi Kivity --- Documentation/kvm/api.txt | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index e1a1141..2811e45 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -685,7 +685,7 @@ struct kvm_vcpu_events { __u8 pad; } nmi; __u32 sipi_vector; - __u32 flags; /* must be zero */ + __u32 flags; }; 4.30 KVM_SET_VCPU_EVENTS @@ -701,6 +701,14 @@ vcpu. See KVM_GET_VCPU_EVENTS for the data structure. +Fields that may be modified asynchronously by running VCPUs can be excluded +from the update. These fields are nmi.pending and sipi_vector. Keep the +corresponding bits in the flags field cleared to suppress overwriting the +current in-kernel state. The bits are: + +KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel +KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector + 5. The kvm_run structure -- cgit v1.1 From d7f0eea9e431e1b8b0742a74db1a9490730b2a25 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Wed, 30 Dec 2009 15:36:42 +0800 Subject: ACPI: introduce kernel parameter acpi_sleep=sci_force_enable Introduce kernel parameter acpi_sleep=sci_force_enable some laptop requires SCI_EN being set directly on resume, or else they hung somewhere in the resume code path. We already have a blacklist for these laptops but we still need this option, especially when debugging some suspend/resume problems, in case there are systems that need this workaround and are not yet in the blacklist. Signed-off-by: Zhang Rui Acked-by: Rafael J. Wysocki Signed-off-by: Len Brown --- Documentation/kernel-parameters.txt | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 5ba4d9d..736d456 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -240,7 +240,7 @@ and is between 256 and 4096 characters. It is defined in the file acpi_sleep= [HW,ACPI] Sleep options Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig, - old_ordering, s4_nonvs } + old_ordering, s4_nonvs, sci_force_enable } See Documentation/power/video.txt for information on s3_bios and s3_mode. s3_beep is for debugging; it makes the PC's speaker beep @@ -253,6 +253,9 @@ and is between 256 and 4096 characters. It is defined in the file of _PTS is used by default). s4_nonvs prevents the kernel from saving/restoring the ACPI NVS memory during hibernation. + sci_force_enable causes the kernel to set SCI_EN directly + on resume from S1/S3 (which is against the ACPI spec, + but some broken systems don't work without it). acpi_use_timer_override [HW,ACPI] Use timer override. For some broken Nvidia NF5 boards -- cgit v1.1 From 6aff43f817ddc54fcd6f0215bfba5d334b0bbbbd Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Sat, 2 Jan 2010 21:41:53 +0900 Subject: nilfs2: update mailing list address This replaces the list address for nilfs discussion to linux-nilfs at vger.kernel.org from users at nilfs.org. Signed-off-by: Ryusuke Konishi --- Documentation/filesystems/nilfs2.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index 4949fca..839efd8 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt @@ -28,7 +28,7 @@ described in the man pages included in the package. Project web page: http://www.nilfs.org/en/ Download page: http://www.nilfs.org/en/download.html Git tree web page: http://www.nilfs.org/git/ -NILFS mailing lists: http://www.nilfs.org/mailman/listinfo/users +List info: http://vger.kernel.org/vger-lists.html#linux-nilfs Caveats ======= -- cgit v1.1 From 143724fd3d3c154009fe95846dcbf7afadca8ab1 Mon Sep 17 00:00:00 2001 From: H Hartley Sweeten Date: Fri, 1 Jan 2010 20:35:41 -0800 Subject: Documentation: fix ioremap return type ioremap() returns a void __iomem * not a char *. Update the documentation file to reflect this. Signed-off-by: H Hartley Sweeten Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/IO-mapping.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/IO-mapping.txt b/Documentation/IO-mapping.txt index 78a4406..1b5aa10 100644 --- a/Documentation/IO-mapping.txt +++ b/Documentation/IO-mapping.txt @@ -157,7 +157,7 @@ For such memory, you can do things like * access only the 640k-1MB area, so anything else * has to be remapped. */ - char * baseptr = ioremap(0xFC000000, 1024*1024); + void __iomem *baseptr = ioremap(0xFC000000, 1024*1024); /* write a 'A' to the offset 10 of the area */ writeb('A',baseptr+10); -- cgit v1.1 From 8d9f99c335ef66e4c44afe8f61816b0edeafba91 Mon Sep 17 00:00:00 2001 From: H Hartley Sweeten Date: Fri, 1 Jan 2010 20:35:54 -0800 Subject: DocBook: fix ioremap return type ioremap() returns a void __iomem * not an unsigned long. Update the Documentation file to reflect this. Signed-off-by: H Hartley Sweeten Signed-off-by: Randy Dunlap Cc: David Woodhouse Signed-off-by: Linus Torvalds --- Documentation/DocBook/mtdnand.tmpl | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl index f508a8a..5e7d84b 100644 --- a/Documentation/DocBook/mtdnand.tmpl +++ b/Documentation/DocBook/mtdnand.tmpl @@ -174,7 +174,7 @@ static struct mtd_info *board_mtd; -static unsigned long baseaddr; +static void __iomem *baseaddr; Static example @@ -182,7 +182,7 @@ static unsigned long baseaddr; static struct mtd_info board_mtd; static struct nand_chip board_chip; -static unsigned long baseaddr; +static void __iomem *baseaddr; @@ -283,8 +283,8 @@ int __init board_init (void) } /* map physical address */ - baseaddr = (unsigned long)ioremap(CHIP_PHYSICAL_ADDRESS, 1024); - if(!baseaddr){ + baseaddr = ioremap(CHIP_PHYSICAL_ADDRESS, 1024); + if (!baseaddr) { printk("Ioremap to access NAND chip failed\n"); err = -EIO; goto out_mtd; @@ -316,7 +316,7 @@ int __init board_init (void) goto out; out_ior: - iounmap((void *)baseaddr); + iounmap(baseaddr); out_mtd: kfree (board_mtd); out: @@ -341,7 +341,7 @@ static void __exit board_cleanup (void) nand_release (board_mtd); /* unmap physical address */ - iounmap((void *)baseaddr); + iounmap(baseaddr); /* Free the MTD device structure */ kfree (board_mtd); -- cgit v1.1 From 4207a152bc242effd0b8231143aa5b9f7a1593a9 Mon Sep 17 00:00:00 2001 From: Kusanagi Kouichi Date: Fri, 1 Jan 2010 20:36:09 -0800 Subject: Documentation: Rename Documentation/DMA-mapping.txt It seems that Documentation/DMA-mapping.txt was supposed to be renamed to Documentation/PCI/PCI-DMA-mapping.txt. Signed-off-by: Kusanagi Kouichi Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/DMA-mapping.txt | 766 ---------------------------------- Documentation/PCI/PCI-DMA-mapping.txt | 766 ++++++++++++++++++++++++++++++++++ Documentation/block/biodoc.txt | 2 +- 3 files changed, 767 insertions(+), 767 deletions(-) delete mode 100644 Documentation/DMA-mapping.txt create mode 100644 Documentation/PCI/PCI-DMA-mapping.txt (limited to 'Documentation') diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt deleted file mode 100644 index ecad88d..0000000 --- a/Documentation/DMA-mapping.txt +++ /dev/null @@ -1,766 +0,0 @@ - Dynamic DMA mapping - =================== - - David S. Miller - Richard Henderson - Jakub Jelinek - -This document describes the DMA mapping system in terms of the pci_ -API. For a similar API that works for generic devices, see -DMA-API.txt. - -Most of the 64bit platforms have special hardware that translates bus -addresses (DMA addresses) into physical addresses. This is similar to -how page tables and/or a TLB translates virtual addresses to physical -addresses on a CPU. This is needed so that e.g. PCI devices can -access with a Single Address Cycle (32bit DMA address) any page in the -64bit physical address space. Previously in Linux those 64bit -platforms had to set artificial limits on the maximum RAM size in the -system, so that the virt_to_bus() static scheme works (the DMA address -translation tables were simply filled on bootup to map each bus -address to the physical page __pa(bus_to_virt())). - -So that Linux can use the dynamic DMA mapping, it needs some help from the -drivers, namely it has to take into account that DMA addresses should be -mapped only for the time they are actually used and unmapped after the DMA -transfer. - -The following API will work of course even on platforms where no such -hardware exists, see e.g. arch/x86/include/asm/pci.h for how it is implemented on -top of the virt_to_bus interface. - -First of all, you should make sure - -#include - -is in your driver. This file will obtain for you the definition of the -dma_addr_t (which can hold any valid DMA address for the platform) -type which should be used everywhere you hold a DMA (bus) address -returned from the DMA mapping functions. - - What memory is DMA'able? - -The first piece of information you must know is what kernel memory can -be used with the DMA mapping facilities. There has been an unwritten -set of rules regarding this, and this text is an attempt to finally -write them down. - -If you acquired your memory via the page allocator -(i.e. __get_free_page*()) or the generic memory allocators -(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from -that memory using the addresses returned from those routines. - -This means specifically that you may _not_ use the memory/addresses -returned from vmalloc() for DMA. It is possible to DMA to the -_underlying_ memory mapped into a vmalloc() area, but this requires -walking page tables to get the physical addresses, and then -translating each of those pages back to a kernel address using -something like __va(). [ EDIT: Update this when we integrate -Gerd Knorr's generic code which does this. ] - -This rule also means that you may use neither kernel image addresses -(items in data/text/bss segments), nor module image addresses, nor -stack addresses for DMA. These could all be mapped somewhere entirely -different than the rest of physical memory. Even if those classes of -memory could physically work with DMA, you'd need to ensure the I/O -buffers were cacheline-aligned. Without that, you'd see cacheline -sharing problems (data corruption) on CPUs with DMA-incoherent caches. -(The CPU could write to one word, DMA would write to a different one -in the same cache line, and one of them could be overwritten.) - -Also, this means that you cannot take the return of a kmap() -call and DMA to/from that. This is similar to vmalloc(). - -What about block I/O and networking buffers? The block I/O and -networking subsystems make sure that the buffers they use are valid -for you to DMA from/to. - - DMA addressing limitations - -Does your device have any DMA addressing limitations? For example, is -your device only capable of driving the low order 24-bits of address -on the PCI bus for SAC DMA transfers? If so, you need to inform the -PCI layer of this fact. - -By default, the kernel assumes that your device can address the full -32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs -to be increased. And for a device with limitations, as discussed in -the previous paragraph, it needs to be decreased. - -pci_alloc_consistent() by default will return 32-bit DMA addresses. -PCI-X specification requires PCI-X devices to support 64-bit -addressing (DAC) for all transactions. And at least one platform (SGI -SN2) requires 64-bit consistent allocations to operate correctly when -the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(), -it's good practice to call pci_set_consistent_dma_mask() to set the -appropriate mask even if your device only supports 32-bit DMA -(default) and especially if it's a PCI-X device. - -For correct operation, you must interrogate the PCI layer in your -device probe routine to see if the PCI controller on the machine can -properly support the DMA addressing limitation your device has. It is -good style to do this even if your device holds the default setting, -because this shows that you did think about these issues wrt. your -device. - -The query is performed via a call to pci_set_dma_mask(): - - int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask); - -The query for consistent allocations is performed via a call to -pci_set_consistent_dma_mask(): - - int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask); - -Here, pdev is a pointer to the PCI device struct of your device, and -device_mask is a bit mask describing which bits of a PCI address your -device supports. It returns zero if your card can perform DMA -properly on the machine given the address mask you provided. - -If it returns non-zero, your device cannot perform DMA properly on -this platform, and attempting to do so will result in undefined -behavior. You must either use a different mask, or not use DMA. - -This means that in the failure case, you have three options: - -1) Use another DMA mask, if possible (see below). -2) Use some non-DMA mode for data transfer, if possible. -3) Ignore this device and do not initialize it. - -It is recommended that your driver print a kernel KERN_WARNING message -when you end up performing either #2 or #3. In this manner, if a user -of your driver reports that performance is bad or that the device is not -even detected, you can ask them for the kernel messages to find out -exactly why. - -The standard 32-bit addressing PCI device would do something like -this: - - if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { - printk(KERN_WARNING - "mydev: No suitable DMA available.\n"); - goto ignore_this_device; - } - -Another common scenario is a 64-bit capable device. The approach -here is to try for 64-bit DAC addressing, but back down to a -32-bit mask should that fail. The PCI platform code may fail the -64-bit mask not because the platform is not capable of 64-bit -addressing. Rather, it may fail in this case simply because -32-bit SAC addressing is done more efficiently than DAC addressing. -Sparc64 is one platform which behaves in this way. - -Here is how you would handle a 64-bit capable device which can drive -all 64-bits when accessing streaming DMA: - - int using_dac; - - if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { - using_dac = 1; - } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { - using_dac = 0; - } else { - printk(KERN_WARNING - "mydev: No suitable DMA available.\n"); - goto ignore_this_device; - } - -If a card is capable of using 64-bit consistent allocations as well, -the case would look like this: - - int using_dac, consistent_using_dac; - - if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { - using_dac = 1; - consistent_using_dac = 1; - pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); - } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { - using_dac = 0; - consistent_using_dac = 0; - pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)); - } else { - printk(KERN_WARNING - "mydev: No suitable DMA available.\n"); - goto ignore_this_device; - } - -pci_set_consistent_dma_mask() will always be able to set the same or a -smaller mask as pci_set_dma_mask(). However for the rare case that a -device driver only uses consistent allocations, one would have to -check the return value from pci_set_consistent_dma_mask(). - -Finally, if your device can only drive the low 24-bits of -address during PCI bus mastering you might do something like: - - if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) { - printk(KERN_WARNING - "mydev: 24-bit DMA addressing not available.\n"); - goto ignore_this_device; - } - -When pci_set_dma_mask() is successful, and returns zero, the PCI layer -saves away this mask you have provided. The PCI layer will use this -information later when you make DMA mappings. - -There is a case which we are aware of at this time, which is worth -mentioning in this documentation. If your device supports multiple -functions (for example a sound card provides playback and record -functions) and the various different functions have _different_ -DMA addressing limitations, you may wish to probe each mask and -only provide the functionality which the machine can handle. It -is important that the last call to pci_set_dma_mask() be for the -most specific mask. - -Here is pseudo-code showing how this might be done: - - #define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32) - #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24) - - struct my_sound_card *card; - struct pci_dev *pdev; - - ... - if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) { - card->playback_enabled = 1; - } else { - card->playback_enabled = 0; - printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n", - card->name); - } - if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) { - card->record_enabled = 1; - } else { - card->record_enabled = 0; - printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n", - card->name); - } - -A sound card was used as an example here because this genre of PCI -devices seems to be littered with ISA chips given a PCI front end, -and thus retaining the 16MB DMA addressing limitations of ISA. - - Types of DMA mappings - -There are two types of DMA mappings: - -- Consistent DMA mappings which are usually mapped at driver - initialization, unmapped at the end and for which the hardware should - guarantee that the device and the CPU can access the data - in parallel and will see updates made by each other without any - explicit software flushing. - - Think of "consistent" as "synchronous" or "coherent". - - The current default is to return consistent memory in the low 32 - bits of the PCI bus space. However, for future compatibility you - should set the consistent mask even if this default is fine for your - driver. - - Good examples of what to use consistent mappings for are: - - - Network card DMA ring descriptors. - - SCSI adapter mailbox command data structures. - - Device firmware microcode executed out of - main memory. - - The invariant these examples all require is that any CPU store - to memory is immediately visible to the device, and vice - versa. Consistent mappings guarantee this. - - IMPORTANT: Consistent DMA memory does not preclude the usage of - proper memory barriers. The CPU may reorder stores to - consistent memory just as it may normal memory. Example: - if it is important for the device to see the first word - of a descriptor updated before the second, you must do - something like: - - desc->word0 = address; - wmb(); - desc->word1 = DESC_VALID; - - in order to get correct behavior on all platforms. - - Also, on some platforms your driver may need to flush CPU write - buffers in much the same way as it needs to flush write buffers - found in PCI bridges (such as by reading a register's value - after writing it). - -- Streaming DMA mappings which are usually mapped for one DMA transfer, - unmapped right after it (unless you use pci_dma_sync_* below) and for which - hardware can optimize for sequential accesses. - - This of "streaming" as "asynchronous" or "outside the coherency - domain". - - Good examples of what to use streaming mappings for are: - - - Networking buffers transmitted/received by a device. - - Filesystem buffers written/read by a SCSI device. - - The interfaces for using this type of mapping were designed in - such a way that an implementation can make whatever performance - optimizations the hardware allows. To this end, when using - such mappings you must be explicit about what you want to happen. - -Neither type of DMA mapping has alignment restrictions that come -from PCI, although some devices may have such restrictions. -Also, systems with caches that aren't DMA-coherent will work better -when the underlying buffers don't share cache lines with other data. - - - Using Consistent DMA mappings. - -To allocate and map large (PAGE_SIZE or so) consistent DMA regions, -you should do: - - dma_addr_t dma_handle; - - cpu_addr = pci_alloc_consistent(pdev, size, &dma_handle); - -where pdev is a struct pci_dev *. This may be called in interrupt context. -You should use dma_alloc_coherent (see DMA-API.txt) for buses -where devices don't have struct pci_dev (like ISA, EISA). - -This argument is needed because the DMA translations may be bus -specific (and often is private to the bus which the device is attached -to). - -Size is the length of the region you want to allocate, in bytes. - -This routine will allocate RAM for that region, so it acts similarly to -__get_free_pages (but takes size instead of a page order). If your -driver needs regions sized smaller than a page, you may prefer using -the pci_pool interface, described below. - -The consistent DMA mapping interfaces, for non-NULL pdev, will by -default return a DMA address which is SAC (Single Address Cycle) -addressable. Even if the device indicates (via PCI dma mask) that it -may address the upper 32-bits and thus perform DAC cycles, consistent -allocation will only return > 32-bit PCI addresses for DMA if the -consistent dma mask has been explicitly changed via -pci_set_consistent_dma_mask(). This is true of the pci_pool interface -as well. - -pci_alloc_consistent returns two values: the virtual address which you -can use to access it from the CPU and dma_handle which you pass to the -card. - -The cpu return address and the DMA bus master address are both -guaranteed to be aligned to the smallest PAGE_SIZE order which -is greater than or equal to the requested size. This invariant -exists (for example) to guarantee that if you allocate a chunk -which is smaller than or equal to 64 kilobytes, the extent of the -buffer you receive will not cross a 64K boundary. - -To unmap and free such a DMA region, you call: - - pci_free_consistent(pdev, size, cpu_addr, dma_handle); - -where pdev, size are the same as in the above call and cpu_addr and -dma_handle are the values pci_alloc_consistent returned to you. -This function may not be called in interrupt context. - -If your driver needs lots of smaller memory regions, you can write -custom code to subdivide pages returned by pci_alloc_consistent, -or you can use the pci_pool API to do that. A pci_pool is like -a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages. -Also, it understands common hardware constraints for alignment, -like queue heads needing to be aligned on N byte boundaries. - -Create a pci_pool like this: - - struct pci_pool *pool; - - pool = pci_pool_create(name, pdev, size, align, alloc); - -The "name" is for diagnostics (like a kmem_cache name); pdev and size -are as above. The device's hardware alignment requirement for this -type of data is "align" (which is expressed in bytes, and must be a -power of two). If your device has no boundary crossing restrictions, -pass 0 for alloc; passing 4096 says memory allocated from this pool -must not cross 4KByte boundaries (but at that time it may be better to -go for pci_alloc_consistent directly instead). - -Allocate memory from a pci pool like this: - - cpu_addr = pci_pool_alloc(pool, flags, &dma_handle); - -flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor -holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent, -this returns two values, cpu_addr and dma_handle. - -Free memory that was allocated from a pci_pool like this: - - pci_pool_free(pool, cpu_addr, dma_handle); - -where pool is what you passed to pci_pool_alloc, and cpu_addr and -dma_handle are the values pci_pool_alloc returned. This function -may be called in interrupt context. - -Destroy a pci_pool by calling: - - pci_pool_destroy(pool); - -Make sure you've called pci_pool_free for all memory allocated -from a pool before you destroy the pool. This function may not -be called in interrupt context. - - DMA Direction - -The interfaces described in subsequent portions of this document -take a DMA direction argument, which is an integer and takes on -one of the following values: - - PCI_DMA_BIDIRECTIONAL - PCI_DMA_TODEVICE - PCI_DMA_FROMDEVICE - PCI_DMA_NONE - -One should provide the exact DMA direction if you know it. - -PCI_DMA_TODEVICE means "from main memory to the PCI device" -PCI_DMA_FROMDEVICE means "from the PCI device to main memory" -It is the direction in which the data moves during the DMA -transfer. - -You are _strongly_ encouraged to specify this as precisely -as you possibly can. - -If you absolutely cannot know the direction of the DMA transfer, -specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in -either direction. The platform guarantees that you may legally -specify this, and that it will work, but this may be at the -cost of performance for example. - -The value PCI_DMA_NONE is to be used for debugging. One can -hold this in a data structure before you come to know the -precise direction, and this will help catch cases where your -direction tracking logic has failed to set things up properly. - -Another advantage of specifying this value precisely (outside of -potential platform-specific optimizations of such) is for debugging. -Some platforms actually have a write permission boolean which DMA -mappings can be marked with, much like page protections in the user -program address space. Such platforms can and do report errors in the -kernel logs when the PCI controller hardware detects violation of the -permission setting. - -Only streaming mappings specify a direction, consistent mappings -implicitly have a direction attribute setting of -PCI_DMA_BIDIRECTIONAL. - -The SCSI subsystem tells you the direction to use in the -'sc_data_direction' member of the SCSI command your driver is -working on. - -For Networking drivers, it's a rather simple affair. For transmit -packets, map/unmap them with the PCI_DMA_TODEVICE direction -specifier. For receive packets, just the opposite, map/unmap them -with the PCI_DMA_FROMDEVICE direction specifier. - - Using Streaming DMA mappings - -The streaming DMA mapping routines can be called from interrupt -context. There are two versions of each map/unmap, one which will -map/unmap a single memory region, and one which will map/unmap a -scatterlist. - -To map a single region, you do: - - struct pci_dev *pdev = mydev->pdev; - dma_addr_t dma_handle; - void *addr = buffer->ptr; - size_t size = buffer->len; - - dma_handle = pci_map_single(pdev, addr, size, direction); - -and to unmap it: - - pci_unmap_single(pdev, dma_handle, size, direction); - -You should call pci_unmap_single when the DMA activity is finished, e.g. -from the interrupt which told you that the DMA transfer is done. - -Using cpu pointers like this for single mappings has a disadvantage, -you cannot reference HIGHMEM memory in this way. Thus, there is a -map/unmap interface pair akin to pci_{map,unmap}_single. These -interfaces deal with page/offset pairs instead of cpu pointers. -Specifically: - - struct pci_dev *pdev = mydev->pdev; - dma_addr_t dma_handle; - struct page *page = buffer->page; - unsigned long offset = buffer->offset; - size_t size = buffer->len; - - dma_handle = pci_map_page(pdev, page, offset, size, direction); - - ... - - pci_unmap_page(pdev, dma_handle, size, direction); - -Here, "offset" means byte offset within the given page. - -With scatterlists, you map a region gathered from several regions by: - - int i, count = pci_map_sg(pdev, sglist, nents, direction); - struct scatterlist *sg; - - for_each_sg(sglist, sg, count, i) { - hw_address[i] = sg_dma_address(sg); - hw_len[i] = sg_dma_len(sg); - } - -where nents is the number of entries in the sglist. - -The implementation is free to merge several consecutive sglist entries -into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any -consecutive sglist entries can be merged into one provided the first one -ends and the second one starts on a page boundary - in fact this is a huge -advantage for cards which either cannot do scatter-gather or have very -limited number of scatter-gather entries) and returns the actual number -of sg entries it mapped them to. On failure 0 is returned. - -Then you should loop count times (note: this can be less than nents times) -and use sg_dma_address() and sg_dma_len() macros where you previously -accessed sg->address and sg->length as shown above. - -To unmap a scatterlist, just call: - - pci_unmap_sg(pdev, sglist, nents, direction); - -Again, make sure DMA activity has already finished. - -PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be - the _same_ one you passed into the pci_map_sg call, - it should _NOT_ be the 'count' value _returned_ from the - pci_map_sg call. - -Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} -counterpart, because the bus address space is a shared resource (although -in some ports the mapping is per each BUS so less devices contend for the -same bus address space) and you could render the machine unusable by eating -all bus addresses. - -If you need to use the same streaming DMA region multiple times and touch -the data in between the DMA transfers, the buffer needs to be synced -properly in order for the cpu and device to see the most uptodate and -correct copy of the DMA buffer. - -So, firstly, just map it with pci_map_{single,sg}, and after each DMA -transfer call either: - - pci_dma_sync_single_for_cpu(pdev, dma_handle, size, direction); - -or: - - pci_dma_sync_sg_for_cpu(pdev, sglist, nents, direction); - -as appropriate. - -Then, if you wish to let the device get at the DMA area again, -finish accessing the data with the cpu, and then before actually -giving the buffer to the hardware call either: - - pci_dma_sync_single_for_device(pdev, dma_handle, size, direction); - -or: - - pci_dma_sync_sg_for_device(dev, sglist, nents, direction); - -as appropriate. - -After the last DMA transfer call one of the DMA unmap routines -pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* -call till pci_unmap_*, then you don't have to call the pci_dma_sync_* -routines at all. - -Here is pseudo code which shows a situation in which you would need -to use the pci_dma_sync_*() interfaces. - - my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) - { - dma_addr_t mapping; - - mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE); - - cp->rx_buf = buffer; - cp->rx_len = len; - cp->rx_dma = mapping; - - give_rx_buf_to_card(cp); - } - - ... - - my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs) - { - struct my_card *cp = devid; - - ... - if (read_card_status(cp) == RX_BUF_TRANSFERRED) { - struct my_card_header *hp; - - /* Examine the header to see if we wish - * to accept the data. But synchronize - * the DMA transfer with the CPU first - * so that we see updated contents. - */ - pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma, - cp->rx_len, - PCI_DMA_FROMDEVICE); - - /* Now it is safe to examine the buffer. */ - hp = (struct my_card_header *) cp->rx_buf; - if (header_is_ok(hp)) { - pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len, - PCI_DMA_FROMDEVICE); - pass_to_upper_layers(cp->rx_buf); - make_and_setup_new_rx_buf(cp); - } else { - /* Just sync the buffer and give it back - * to the card. - */ - pci_dma_sync_single_for_device(cp->pdev, - cp->rx_dma, - cp->rx_len, - PCI_DMA_FROMDEVICE); - give_rx_buf_to_card(cp); - } - } - } - -Drivers converted fully to this interface should not use virt_to_bus any -longer, nor should they use bus_to_virt. Some drivers have to be changed a -little bit, because there is no longer an equivalent to bus_to_virt in the -dynamic DMA mapping scheme - you have to always store the DMA addresses -returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single -calls (pci_map_sg stores them in the scatterlist itself if the platform -supports dynamic DMA mapping in hardware) in your driver structures and/or -in the card registers. - -All PCI drivers should be using these interfaces with no exceptions. -It is planned to completely remove virt_to_bus() and bus_to_virt() as -they are entirely deprecated. Some ports already do not provide these -as it is impossible to correctly support them. - - Optimizing Unmap State Space Consumption - -On many platforms, pci_unmap_{single,page}() is simply a nop. -Therefore, keeping track of the mapping address and length is a waste -of space. Instead of filling your drivers up with ifdefs and the like -to "work around" this (which would defeat the whole purpose of a -portable API) the following facilities are provided. - -Actually, instead of describing the macros one by one, we'll -transform some example code. - -1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures. - Example, before: - - struct ring_state { - struct sk_buff *skb; - dma_addr_t mapping; - __u32 len; - }; - - after: - - struct ring_state { - struct sk_buff *skb; - DECLARE_PCI_UNMAP_ADDR(mapping) - DECLARE_PCI_UNMAP_LEN(len) - }; - - NOTE: DO NOT put a semicolon at the end of the DECLARE_*() - macro. - -2) Use pci_unmap_{addr,len}_set to set these values. - Example, before: - - ringp->mapping = FOO; - ringp->len = BAR; - - after: - - pci_unmap_addr_set(ringp, mapping, FOO); - pci_unmap_len_set(ringp, len, BAR); - -3) Use pci_unmap_{addr,len} to access these values. - Example, before: - - pci_unmap_single(pdev, ringp->mapping, ringp->len, - PCI_DMA_FROMDEVICE); - - after: - - pci_unmap_single(pdev, - pci_unmap_addr(ringp, mapping), - pci_unmap_len(ringp, len), - PCI_DMA_FROMDEVICE); - -It really should be self-explanatory. We treat the ADDR and LEN -separately, because it is possible for an implementation to only -need the address in order to perform the unmap operation. - - Platform Issues - -If you are just writing drivers for Linux and do not maintain -an architecture port for the kernel, you can safely skip down -to "Closing". - -1) Struct scatterlist requirements. - - Struct scatterlist must contain, at a minimum, the following - members: - - struct page *page; - unsigned int offset; - unsigned int length; - - The base address is specified by a "page+offset" pair. - - Previous versions of struct scatterlist contained a "void *address" - field that was sometimes used instead of page+offset. As of Linux - 2.5., page+offset is always used, and the "address" field has been - deleted. - -2) More to come... - - Handling Errors - -DMA address space is limited on some architectures and an allocation -failure can be determined by: - -- checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0 - -- checking the returned dma_addr_t of pci_map_single and pci_map_page - by using pci_dma_mapping_error(): - - dma_addr_t dma_handle; - - dma_handle = pci_map_single(pdev, addr, size, direction); - if (pci_dma_mapping_error(pdev, dma_handle)) { - /* - * reduce current DMA mapping usage, - * delay and try again later or - * reset driver. - */ - } - - Closing - -This document, and the API itself, would not be in it's current -form without the feedback and suggestions from numerous individuals. -We would like to specifically mention, in no particular order, the -following people: - - Russell King - Leo Dagum - Ralf Baechle - Grant Grundler - Jay Estabrook - Thomas Sailer - Andrea Arcangeli - Jens Axboe - David Mosberger-Tang diff --git a/Documentation/PCI/PCI-DMA-mapping.txt b/Documentation/PCI/PCI-DMA-mapping.txt new file mode 100644 index 0000000..ecad88d --- /dev/null +++ b/Documentation/PCI/PCI-DMA-mapping.txt @@ -0,0 +1,766 @@ + Dynamic DMA mapping + =================== + + David S. Miller + Richard Henderson + Jakub Jelinek + +This document describes the DMA mapping system in terms of the pci_ +API. For a similar API that works for generic devices, see +DMA-API.txt. + +Most of the 64bit platforms have special hardware that translates bus +addresses (DMA addresses) into physical addresses. This is similar to +how page tables and/or a TLB translates virtual addresses to physical +addresses on a CPU. This is needed so that e.g. PCI devices can +access with a Single Address Cycle (32bit DMA address) any page in the +64bit physical address space. Previously in Linux those 64bit +platforms had to set artificial limits on the maximum RAM size in the +system, so that the virt_to_bus() static scheme works (the DMA address +translation tables were simply filled on bootup to map each bus +address to the physical page __pa(bus_to_virt())). + +So that Linux can use the dynamic DMA mapping, it needs some help from the +drivers, namely it has to take into account that DMA addresses should be +mapped only for the time they are actually used and unmapped after the DMA +transfer. + +The following API will work of course even on platforms where no such +hardware exists, see e.g. arch/x86/include/asm/pci.h for how it is implemented on +top of the virt_to_bus interface. + +First of all, you should make sure + +#include + +is in your driver. This file will obtain for you the definition of the +dma_addr_t (which can hold any valid DMA address for the platform) +type which should be used everywhere you hold a DMA (bus) address +returned from the DMA mapping functions. + + What memory is DMA'able? + +The first piece of information you must know is what kernel memory can +be used with the DMA mapping facilities. There has been an unwritten +set of rules regarding this, and this text is an attempt to finally +write them down. + +If you acquired your memory via the page allocator +(i.e. __get_free_page*()) or the generic memory allocators +(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from +that memory using the addresses returned from those routines. + +This means specifically that you may _not_ use the memory/addresses +returned from vmalloc() for DMA. It is possible to DMA to the +_underlying_ memory mapped into a vmalloc() area, but this requires +walking page tables to get the physical addresses, and then +translating each of those pages back to a kernel address using +something like __va(). [ EDIT: Update this when we integrate +Gerd Knorr's generic code which does this. ] + +This rule also means that you may use neither kernel image addresses +(items in data/text/bss segments), nor module image addresses, nor +stack addresses for DMA. These could all be mapped somewhere entirely +different than the rest of physical memory. Even if those classes of +memory could physically work with DMA, you'd need to ensure the I/O +buffers were cacheline-aligned. Without that, you'd see cacheline +sharing problems (data corruption) on CPUs with DMA-incoherent caches. +(The CPU could write to one word, DMA would write to a different one +in the same cache line, and one of them could be overwritten.) + +Also, this means that you cannot take the return of a kmap() +call and DMA to/from that. This is similar to vmalloc(). + +What about block I/O and networking buffers? The block I/O and +networking subsystems make sure that the buffers they use are valid +for you to DMA from/to. + + DMA addressing limitations + +Does your device have any DMA addressing limitations? For example, is +your device only capable of driving the low order 24-bits of address +on the PCI bus for SAC DMA transfers? If so, you need to inform the +PCI layer of this fact. + +By default, the kernel assumes that your device can address the full +32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs +to be increased. And for a device with limitations, as discussed in +the previous paragraph, it needs to be decreased. + +pci_alloc_consistent() by default will return 32-bit DMA addresses. +PCI-X specification requires PCI-X devices to support 64-bit +addressing (DAC) for all transactions. And at least one platform (SGI +SN2) requires 64-bit consistent allocations to operate correctly when +the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(), +it's good practice to call pci_set_consistent_dma_mask() to set the +appropriate mask even if your device only supports 32-bit DMA +(default) and especially if it's a PCI-X device. + +For correct operation, you must interrogate the PCI layer in your +device probe routine to see if the PCI controller on the machine can +properly support the DMA addressing limitation your device has. It is +good style to do this even if your device holds the default setting, +because this shows that you did think about these issues wrt. your +device. + +The query is performed via a call to pci_set_dma_mask(): + + int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask); + +The query for consistent allocations is performed via a call to +pci_set_consistent_dma_mask(): + + int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask); + +Here, pdev is a pointer to the PCI device struct of your device, and +device_mask is a bit mask describing which bits of a PCI address your +device supports. It returns zero if your card can perform DMA +properly on the machine given the address mask you provided. + +If it returns non-zero, your device cannot perform DMA properly on +this platform, and attempting to do so will result in undefined +behavior. You must either use a different mask, or not use DMA. + +This means that in the failure case, you have three options: + +1) Use another DMA mask, if possible (see below). +2) Use some non-DMA mode for data transfer, if possible. +3) Ignore this device and do not initialize it. + +It is recommended that your driver print a kernel KERN_WARNING message +when you end up performing either #2 or #3. In this manner, if a user +of your driver reports that performance is bad or that the device is not +even detected, you can ask them for the kernel messages to find out +exactly why. + +The standard 32-bit addressing PCI device would do something like +this: + + if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { + printk(KERN_WARNING + "mydev: No suitable DMA available.\n"); + goto ignore_this_device; + } + +Another common scenario is a 64-bit capable device. The approach +here is to try for 64-bit DAC addressing, but back down to a +32-bit mask should that fail. The PCI platform code may fail the +64-bit mask not because the platform is not capable of 64-bit +addressing. Rather, it may fail in this case simply because +32-bit SAC addressing is done more efficiently than DAC addressing. +Sparc64 is one platform which behaves in this way. + +Here is how you would handle a 64-bit capable device which can drive +all 64-bits when accessing streaming DMA: + + int using_dac; + + if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { + using_dac = 1; + } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { + using_dac = 0; + } else { + printk(KERN_WARNING + "mydev: No suitable DMA available.\n"); + goto ignore_this_device; + } + +If a card is capable of using 64-bit consistent allocations as well, +the case would look like this: + + int using_dac, consistent_using_dac; + + if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { + using_dac = 1; + consistent_using_dac = 1; + pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); + } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { + using_dac = 0; + consistent_using_dac = 0; + pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)); + } else { + printk(KERN_WARNING + "mydev: No suitable DMA available.\n"); + goto ignore_this_device; + } + +pci_set_consistent_dma_mask() will always be able to set the same or a +smaller mask as pci_set_dma_mask(). However for the rare case that a +device driver only uses consistent allocations, one would have to +check the return value from pci_set_consistent_dma_mask(). + +Finally, if your device can only drive the low 24-bits of +address during PCI bus mastering you might do something like: + + if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) { + printk(KERN_WARNING + "mydev: 24-bit DMA addressing not available.\n"); + goto ignore_this_device; + } + +When pci_set_dma_mask() is successful, and returns zero, the PCI layer +saves away this mask you have provided. The PCI layer will use this +information later when you make DMA mappings. + +There is a case which we are aware of at this time, which is worth +mentioning in this documentation. If your device supports multiple +functions (for example a sound card provides playback and record +functions) and the various different functions have _different_ +DMA addressing limitations, you may wish to probe each mask and +only provide the functionality which the machine can handle. It +is important that the last call to pci_set_dma_mask() be for the +most specific mask. + +Here is pseudo-code showing how this might be done: + + #define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32) + #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24) + + struct my_sound_card *card; + struct pci_dev *pdev; + + ... + if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) { + card->playback_enabled = 1; + } else { + card->playback_enabled = 0; + printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n", + card->name); + } + if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) { + card->record_enabled = 1; + } else { + card->record_enabled = 0; + printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n", + card->name); + } + +A sound card was used as an example here because this genre of PCI +devices seems to be littered with ISA chips given a PCI front end, +and thus retaining the 16MB DMA addressing limitations of ISA. + + Types of DMA mappings + +There are two types of DMA mappings: + +- Consistent DMA mappings which are usually mapped at driver + initialization, unmapped at the end and for which the hardware should + guarantee that the device and the CPU can access the data + in parallel and will see updates made by each other without any + explicit software flushing. + + Think of "consistent" as "synchronous" or "coherent". + + The current default is to return consistent memory in the low 32 + bits of the PCI bus space. However, for future compatibility you + should set the consistent mask even if this default is fine for your + driver. + + Good examples of what to use consistent mappings for are: + + - Network card DMA ring descriptors. + - SCSI adapter mailbox command data structures. + - Device firmware microcode executed out of + main memory. + + The invariant these examples all require is that any CPU store + to memory is immediately visible to the device, and vice + versa. Consistent mappings guarantee this. + + IMPORTANT: Consistent DMA memory does not preclude the usage of + proper memory barriers. The CPU may reorder stores to + consistent memory just as it may normal memory. Example: + if it is important for the device to see the first word + of a descriptor updated before the second, you must do + something like: + + desc->word0 = address; + wmb(); + desc->word1 = DESC_VALID; + + in order to get correct behavior on all platforms. + + Also, on some platforms your driver may need to flush CPU write + buffers in much the same way as it needs to flush write buffers + found in PCI bridges (such as by reading a register's value + after writing it). + +- Streaming DMA mappings which are usually mapped for one DMA transfer, + unmapped right after it (unless you use pci_dma_sync_* below) and for which + hardware can optimize for sequential accesses. + + This of "streaming" as "asynchronous" or "outside the coherency + domain". + + Good examples of what to use streaming mappings for are: + + - Networking buffers transmitted/received by a device. + - Filesystem buffers written/read by a SCSI device. + + The interfaces for using this type of mapping were designed in + such a way that an implementation can make whatever performance + optimizations the hardware allows. To this end, when using + such mappings you must be explicit about what you want to happen. + +Neither type of DMA mapping has alignment restrictions that come +from PCI, although some devices may have such restrictions. +Also, systems with caches that aren't DMA-coherent will work better +when the underlying buffers don't share cache lines with other data. + + + Using Consistent DMA mappings. + +To allocate and map large (PAGE_SIZE or so) consistent DMA regions, +you should do: + + dma_addr_t dma_handle; + + cpu_addr = pci_alloc_consistent(pdev, size, &dma_handle); + +where pdev is a struct pci_dev *. This may be called in interrupt context. +You should use dma_alloc_coherent (see DMA-API.txt) for buses +where devices don't have struct pci_dev (like ISA, EISA). + +This argument is needed because the DMA translations may be bus +specific (and often is private to the bus which the device is attached +to). + +Size is the length of the region you want to allocate, in bytes. + +This routine will allocate RAM for that region, so it acts similarly to +__get_free_pages (but takes size instead of a page order). If your +driver needs regions sized smaller than a page, you may prefer using +the pci_pool interface, described below. + +The consistent DMA mapping interfaces, for non-NULL pdev, will by +default return a DMA address which is SAC (Single Address Cycle) +addressable. Even if the device indicates (via PCI dma mask) that it +may address the upper 32-bits and thus perform DAC cycles, consistent +allocation will only return > 32-bit PCI addresses for DMA if the +consistent dma mask has been explicitly changed via +pci_set_consistent_dma_mask(). This is true of the pci_pool interface +as well. + +pci_alloc_consistent returns two values: the virtual address which you +can use to access it from the CPU and dma_handle which you pass to the +card. + +The cpu return address and the DMA bus master address are both +guaranteed to be aligned to the smallest PAGE_SIZE order which +is greater than or equal to the requested size. This invariant +exists (for example) to guarantee that if you allocate a chunk +which is smaller than or equal to 64 kilobytes, the extent of the +buffer you receive will not cross a 64K boundary. + +To unmap and free such a DMA region, you call: + + pci_free_consistent(pdev, size, cpu_addr, dma_handle); + +where pdev, size are the same as in the above call and cpu_addr and +dma_handle are the values pci_alloc_consistent returned to you. +This function may not be called in interrupt context. + +If your driver needs lots of smaller memory regions, you can write +custom code to subdivide pages returned by pci_alloc_consistent, +or you can use the pci_pool API to do that. A pci_pool is like +a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages. +Also, it understands common hardware constraints for alignment, +like queue heads needing to be aligned on N byte boundaries. + +Create a pci_pool like this: + + struct pci_pool *pool; + + pool = pci_pool_create(name, pdev, size, align, alloc); + +The "name" is for diagnostics (like a kmem_cache name); pdev and size +are as above. The device's hardware alignment requirement for this +type of data is "align" (which is expressed in bytes, and must be a +power of two). If your device has no boundary crossing restrictions, +pass 0 for alloc; passing 4096 says memory allocated from this pool +must not cross 4KByte boundaries (but at that time it may be better to +go for pci_alloc_consistent directly instead). + +Allocate memory from a pci pool like this: + + cpu_addr = pci_pool_alloc(pool, flags, &dma_handle); + +flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor +holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent, +this returns two values, cpu_addr and dma_handle. + +Free memory that was allocated from a pci_pool like this: + + pci_pool_free(pool, cpu_addr, dma_handle); + +where pool is what you passed to pci_pool_alloc, and cpu_addr and +dma_handle are the values pci_pool_alloc returned. This function +may be called in interrupt context. + +Destroy a pci_pool by calling: + + pci_pool_destroy(pool); + +Make sure you've called pci_pool_free for all memory allocated +from a pool before you destroy the pool. This function may not +be called in interrupt context. + + DMA Direction + +The interfaces described in subsequent portions of this document +take a DMA direction argument, which is an integer and takes on +one of the following values: + + PCI_DMA_BIDIRECTIONAL + PCI_DMA_TODEVICE + PCI_DMA_FROMDEVICE + PCI_DMA_NONE + +One should provide the exact DMA direction if you know it. + +PCI_DMA_TODEVICE means "from main memory to the PCI device" +PCI_DMA_FROMDEVICE means "from the PCI device to main memory" +It is the direction in which the data moves during the DMA +transfer. + +You are _strongly_ encouraged to specify this as precisely +as you possibly can. + +If you absolutely cannot know the direction of the DMA transfer, +specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in +either direction. The platform guarantees that you may legally +specify this, and that it will work, but this may be at the +cost of performance for example. + +The value PCI_DMA_NONE is to be used for debugging. One can +hold this in a data structure before you come to know the +precise direction, and this will help catch cases where your +direction tracking logic has failed to set things up properly. + +Another advantage of specifying this value precisely (outside of +potential platform-specific optimizations of such) is for debugging. +Some platforms actually have a write permission boolean which DMA +mappings can be marked with, much like page protections in the user +program address space. Such platforms can and do report errors in the +kernel logs when the PCI controller hardware detects violation of the +permission setting. + +Only streaming mappings specify a direction, consistent mappings +implicitly have a direction attribute setting of +PCI_DMA_BIDIRECTIONAL. + +The SCSI subsystem tells you the direction to use in the +'sc_data_direction' member of the SCSI command your driver is +working on. + +For Networking drivers, it's a rather simple affair. For transmit +packets, map/unmap them with the PCI_DMA_TODEVICE direction +specifier. For receive packets, just the opposite, map/unmap them +with the PCI_DMA_FROMDEVICE direction specifier. + + Using Streaming DMA mappings + +The streaming DMA mapping routines can be called from interrupt +context. There are two versions of each map/unmap, one which will +map/unmap a single memory region, and one which will map/unmap a +scatterlist. + +To map a single region, you do: + + struct pci_dev *pdev = mydev->pdev; + dma_addr_t dma_handle; + void *addr = buffer->ptr; + size_t size = buffer->len; + + dma_handle = pci_map_single(pdev, addr, size, direction); + +and to unmap it: + + pci_unmap_single(pdev, dma_handle, size, direction); + +You should call pci_unmap_single when the DMA activity is finished, e.g. +from the interrupt which told you that the DMA transfer is done. + +Using cpu pointers like this for single mappings has a disadvantage, +you cannot reference HIGHMEM memory in this way. Thus, there is a +map/unmap interface pair akin to pci_{map,unmap}_single. These +interfaces deal with page/offset pairs instead of cpu pointers. +Specifically: + + struct pci_dev *pdev = mydev->pdev; + dma_addr_t dma_handle; + struct page *page = buffer->page; + unsigned long offset = buffer->offset; + size_t size = buffer->len; + + dma_handle = pci_map_page(pdev, page, offset, size, direction); + + ... + + pci_unmap_page(pdev, dma_handle, size, direction); + +Here, "offset" means byte offset within the given page. + +With scatterlists, you map a region gathered from several regions by: + + int i, count = pci_map_sg(pdev, sglist, nents, direction); + struct scatterlist *sg; + + for_each_sg(sglist, sg, count, i) { + hw_address[i] = sg_dma_address(sg); + hw_len[i] = sg_dma_len(sg); + } + +where nents is the number of entries in the sglist. + +The implementation is free to merge several consecutive sglist entries +into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any +consecutive sglist entries can be merged into one provided the first one +ends and the second one starts on a page boundary - in fact this is a huge +advantage for cards which either cannot do scatter-gather or have very +limited number of scatter-gather entries) and returns the actual number +of sg entries it mapped them to. On failure 0 is returned. + +Then you should loop count times (note: this can be less than nents times) +and use sg_dma_address() and sg_dma_len() macros where you previously +accessed sg->address and sg->length as shown above. + +To unmap a scatterlist, just call: + + pci_unmap_sg(pdev, sglist, nents, direction); + +Again, make sure DMA activity has already finished. + +PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be + the _same_ one you passed into the pci_map_sg call, + it should _NOT_ be the 'count' value _returned_ from the + pci_map_sg call. + +Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} +counterpart, because the bus address space is a shared resource (although +in some ports the mapping is per each BUS so less devices contend for the +same bus address space) and you could render the machine unusable by eating +all bus addresses. + +If you need to use the same streaming DMA region multiple times and touch +the data in between the DMA transfers, the buffer needs to be synced +properly in order for the cpu and device to see the most uptodate and +correct copy of the DMA buffer. + +So, firstly, just map it with pci_map_{single,sg}, and after each DMA +transfer call either: + + pci_dma_sync_single_for_cpu(pdev, dma_handle, size, direction); + +or: + + pci_dma_sync_sg_for_cpu(pdev, sglist, nents, direction); + +as appropriate. + +Then, if you wish to let the device get at the DMA area again, +finish accessing the data with the cpu, and then before actually +giving the buffer to the hardware call either: + + pci_dma_sync_single_for_device(pdev, dma_handle, size, direction); + +or: + + pci_dma_sync_sg_for_device(dev, sglist, nents, direction); + +as appropriate. + +After the last DMA transfer call one of the DMA unmap routines +pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* +call till pci_unmap_*, then you don't have to call the pci_dma_sync_* +routines at all. + +Here is pseudo code which shows a situation in which you would need +to use the pci_dma_sync_*() interfaces. + + my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) + { + dma_addr_t mapping; + + mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE); + + cp->rx_buf = buffer; + cp->rx_len = len; + cp->rx_dma = mapping; + + give_rx_buf_to_card(cp); + } + + ... + + my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs) + { + struct my_card *cp = devid; + + ... + if (read_card_status(cp) == RX_BUF_TRANSFERRED) { + struct my_card_header *hp; + + /* Examine the header to see if we wish + * to accept the data. But synchronize + * the DMA transfer with the CPU first + * so that we see updated contents. + */ + pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma, + cp->rx_len, + PCI_DMA_FROMDEVICE); + + /* Now it is safe to examine the buffer. */ + hp = (struct my_card_header *) cp->rx_buf; + if (header_is_ok(hp)) { + pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len, + PCI_DMA_FROMDEVICE); + pass_to_upper_layers(cp->rx_buf); + make_and_setup_new_rx_buf(cp); + } else { + /* Just sync the buffer and give it back + * to the card. + */ + pci_dma_sync_single_for_device(cp->pdev, + cp->rx_dma, + cp->rx_len, + PCI_DMA_FROMDEVICE); + give_rx_buf_to_card(cp); + } + } + } + +Drivers converted fully to this interface should not use virt_to_bus any +longer, nor should they use bus_to_virt. Some drivers have to be changed a +little bit, because there is no longer an equivalent to bus_to_virt in the +dynamic DMA mapping scheme - you have to always store the DMA addresses +returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single +calls (pci_map_sg stores them in the scatterlist itself if the platform +supports dynamic DMA mapping in hardware) in your driver structures and/or +in the card registers. + +All PCI drivers should be using these interfaces with no exceptions. +It is planned to completely remove virt_to_bus() and bus_to_virt() as +they are entirely deprecated. Some ports already do not provide these +as it is impossible to correctly support them. + + Optimizing Unmap State Space Consumption + +On many platforms, pci_unmap_{single,page}() is simply a nop. +Therefore, keeping track of the mapping address and length is a waste +of space. Instead of filling your drivers up with ifdefs and the like +to "work around" this (which would defeat the whole purpose of a +portable API) the following facilities are provided. + +Actually, instead of describing the macros one by one, we'll +transform some example code. + +1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures. + Example, before: + + struct ring_state { + struct sk_buff *skb; + dma_addr_t mapping; + __u32 len; + }; + + after: + + struct ring_state { + struct sk_buff *skb; + DECLARE_PCI_UNMAP_ADDR(mapping) + DECLARE_PCI_UNMAP_LEN(len) + }; + + NOTE: DO NOT put a semicolon at the end of the DECLARE_*() + macro. + +2) Use pci_unmap_{addr,len}_set to set these values. + Example, before: + + ringp->mapping = FOO; + ringp->len = BAR; + + after: + + pci_unmap_addr_set(ringp, mapping, FOO); + pci_unmap_len_set(ringp, len, BAR); + +3) Use pci_unmap_{addr,len} to access these values. + Example, before: + + pci_unmap_single(pdev, ringp->mapping, ringp->len, + PCI_DMA_FROMDEVICE); + + after: + + pci_unmap_single(pdev, + pci_unmap_addr(ringp, mapping), + pci_unmap_len(ringp, len), + PCI_DMA_FROMDEVICE); + +It really should be self-explanatory. We treat the ADDR and LEN +separately, because it is possible for an implementation to only +need the address in order to perform the unmap operation. + + Platform Issues + +If you are just writing drivers for Linux and do not maintain +an architecture port for the kernel, you can safely skip down +to "Closing". + +1) Struct scatterlist requirements. + + Struct scatterlist must contain, at a minimum, the following + members: + + struct page *page; + unsigned int offset; + unsigned int length; + + The base address is specified by a "page+offset" pair. + + Previous versions of struct scatterlist contained a "void *address" + field that was sometimes used instead of page+offset. As of Linux + 2.5., page+offset is always used, and the "address" field has been + deleted. + +2) More to come... + + Handling Errors + +DMA address space is limited on some architectures and an allocation +failure can be determined by: + +- checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0 + +- checking the returned dma_addr_t of pci_map_single and pci_map_page + by using pci_dma_mapping_error(): + + dma_addr_t dma_handle; + + dma_handle = pci_map_single(pdev, addr, size, direction); + if (pci_dma_mapping_error(pdev, dma_handle)) { + /* + * reduce current DMA mapping usage, + * delay and try again later or + * reset driver. + */ + } + + Closing + +This document, and the API itself, would not be in it's current +form without the feedback and suggestions from numerous individuals. +We would like to specifically mention, in no particular order, the +following people: + + Russell King + Leo Dagum + Ralf Baechle + Grant Grundler + Jay Estabrook + Thomas Sailer + Andrea Arcangeli + Jens Axboe + David Mosberger-Tang diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 8d2158a..6fab97e 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address do not have a corresponding kernel virtual address space mapping) and low-memory pages. -Note: Please refer to Documentation/DMA-mapping.txt for a discussion +Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion on PCI high mem DMA aspects and mapping of scatter gather lists, and support for 64 bit PCI. -- cgit v1.1 From c5114a1cd6d84b2b3144c1c3e093c80ca6c30f47 Mon Sep 17 00:00:00 2001 From: Clemens Ladisch Date: Sun, 10 Jan 2010 20:52:34 +0100 Subject: hwmon: (k10temp) Blacklist more family 10h processors The latest version of the Revision Guide for AMD Family 10h Processors lists two more processor revisions which may be affected by erratum 319. Change the blacklisting code to correctly detect those processors, by implementing AMD's recommended algorithm. Signed-off-by: Clemens Ladisch Signed-off-by: Jean Delvare Cc: Andreas Herrmann --- Documentation/hwmon/k10temp | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/k10temp b/Documentation/hwmon/k10temp index a7a18d4..6526eee 100644 --- a/Documentation/hwmon/k10temp +++ b/Documentation/hwmon/k10temp @@ -3,8 +3,8 @@ Kernel driver k10temp Supported chips: * AMD Family 10h processors: - Socket F: Quad-Core/Six-Core/Embedded Opteron - Socket AM2+: Opteron, Phenom (II) X3/X4 + Socket F: Quad-Core/Six-Core/Embedded Opteron (but see below) + Socket AM2+: Quad-Core Opteron, Phenom (II) X3/X4, Athlon X2 (but see below) Socket AM3: Quad-Core Opteron, Athlon/Phenom II X2/X3/X4, Sempron II Socket S1G3: Athlon II, Sempron, Turion II * AMD Family 11h processors: @@ -36,10 +36,15 @@ Description This driver permits reading of the internal temperature sensor of AMD Family 10h and 11h processors. -All these processors have a sensor, but on older revisions of Family 10h -processors, the sensor may return inconsistent values (erratum 319). The -driver will refuse to load on these revisions unless you specify the -"force=1" module parameter. +All these processors have a sensor, but on those for Socket F or AM2+, +the sensor may return inconsistent values (erratum 319). The driver +will refuse to load on these revisions unless you specify the "force=1" +module parameter. + +Due to technical reasons, the driver can detect only the mainboard's +socket type, not the processor's actual capabilities. Therefore, if you +are using an AM3 processor on an AM2+ mainboard, you can safely use the +"force=1" parameter. There is one temperature measurement value, available as temp1_input in sysfs. It is measured in degrees Celsius with a resolution of 1/8th degree. -- cgit v1.1 From cb5a8b2c92febbed57126e1b8416dfd7607ff03d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 8 Jan 2010 14:42:34 -0800 Subject: docs: large update to ioctl-number.txt Add many ioctl definitions to ioctl-number.txt. Fix some whitespace/formatting. Correct some filenames/paths. Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/ioctl/ioctl-number.txt | 203 +++++++++++++++++++++++++++-------- 1 file changed, 159 insertions(+), 44 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 9473749..35cf64d 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -56,10 +56,11 @@ Following this convention is good because: (5) When following the convention, the driver code can use generic code to copy the parameters between user and kernel space. -This table lists ioctls visible from user land for Linux/i386. It contains -most drivers up to 2.3.14, but I know I am missing some. +This table lists ioctls visible from user land for Linux/x86. It contains +most drivers up to 2.6.31, but I know I am missing some. There has been +no attempt to list non-X86 architectures or ioctls from drivers/staging/. -Code Seq# Include File Comments +Code Seq#(hex) Include File Comments ======================================================== 0x00 00-1F linux/fs.h conflict! 0x00 00-1F scsi/scsi_ioctl.h conflict! @@ -69,119 +70,228 @@ Code Seq# Include File Comments 0x03 all linux/hdreg.h 0x04 D2-DC linux/umsdos_fs.h Dead since 2.6.11, but don't reuse these. 0x06 all linux/lp.h -0x09 all linux/md.h +0x09 all linux/raid/md_u.h +0x10 00-0F drivers/char/s390/vmcp.h 0x12 all linux/fs.h linux/blkpg.h 0x1b all InfiniBand Subsystem 0x20 all drivers/cdrom/cm206.h 0x22 all scsi/sg.h '#' 00-3F IEEE 1394 Subsystem Block for the entire subsystem +'$' 00-0F linux/perf_counter.h, linux/perf_event.h '1' 00-1F PPS kit from Ulrich Windl +'2' 01-04 linux/i2o.h +'3' 00-0F drivers/s390/char/raw3270.h conflict! +'3' 00-1F linux/suspend_ioctls.h conflict! + and kernel/power/user.c '8' all SNP8023 advanced NIC card -'A' 00-1F linux/apm_bios.h +'@' 00-0F linux/radeonfb.h conflict! +'@' 00-0F drivers/video/aty/aty128fb.c conflict! +'A' 00-1F linux/apm_bios.h conflict! +'A' 00-0F linux/agpgart.h conflict! + and drivers/char/agp/compat_ioctl.h +'A' 00-7F sound/asound.h conflict! +'B' 00-1F linux/cciss_ioctl.h conflict! +'B' 00-0F include/linux/pmu.h conflict! 'B' C0-FF advanced bbus -'C' all linux/soundcard.h +'C' all linux/soundcard.h conflict! +'C' 01-2F linux/capi.h conflict! +'C' F0-FF drivers/net/wan/cosa.h conflict! 'D' all arch/s390/include/asm/dasd.h -'E' all linux/input.h -'F' all linux/fb.h -'H' all linux/hiddev.h -'I' all linux/isdn.h +'D' 40-5F drivers/scsi/dpt/dtpi_ioctl.h +'D' 05 drivers/scsi/pmcraid.h +'E' all linux/input.h conflict! +'E' 00-0F xen/evtchn.h conflict! +'F' all linux/fb.h conflict! +'F' 01-02 drivers/scsi/pmcraid.h conflict! +'F' 20 drivers/video/fsl-diu-fb.h conflict! +'F' 20 drivers/video/intelfb/intelfb.h conflict! +'F' 20 linux/ivtvfb.h conflict! +'F' 20 linux/matroxfb.h conflict! +'F' 20 drivers/video/aty/atyfb_base.c conflict! +'F' 00-0F video/da8xx-fb.h conflict! +'F' 80-8F linux/arcfb.h conflict! +'F' DD video/sstfb.h conflict! +'G' 00-3F drivers/misc/sgi-gru/grulib.h conflict! +'G' 00-0F linux/gigaset_dev.h conflict! +'H' 00-7F linux/hiddev.h conflict! +'H' 00-0F linux/hidraw.h conflict! +'H' 00-0F sound/asound.h conflict! +'H' 20-40 sound/asound_fm.h conflict! +'H' 80-8F sound/sfnt_info.h conflict! +'H' 10-8F sound/emu10k1.h conflict! +'H' 10-1F sound/sb16_csp.h conflict! +'H' 10-1F sound/hda_hwdep.h conflict! +'H' 40-4F sound/hdspm.h conflict! +'H' 40-4F sound/hdsp.h conflict! +'H' 90 sound/usb/usx2y/usb_stream.h +'H' C0-F0 net/bluetooth/hci.h conflict! +'H' C0-DF net/bluetooth/hidp/hidp.h conflict! +'H' C0-DF net/bluetooth/cmtp/cmtp.h conflict! +'H' C0-DF net/bluetooth/bnep/bnep.h conflict! +'I' all linux/isdn.h conflict! +'I' 00-0F drivers/isdn/divert/isdn_divert.h conflict! +'I' 40-4F linux/mISDNif.h conflict! 'J' 00-1F drivers/scsi/gdth_ioctl.h 'K' all linux/kd.h -'L' 00-1F linux/loop.h -'L' 20-2F driver/usb/misc/vstusb.h +'L' 00-1F linux/loop.h conflict! +'L' 10-1F drivers/scsi/mpt2sas/mpt2sas_ctl.h conflict! +'L' 20-2F linux/usb/vstusb.h 'L' E0-FF linux/ppdd.h encrypted disk device driver -'M' all linux/soundcard.h +'M' all linux/soundcard.h conflict! +'M' 01-16 mtd/mtd-abi.h conflict! + and drivers/mtd/mtdchar.c +'M' 01-03 drivers/scsi/megaraid/megaraid_sas.h +'M' 00-0F drivers/video/fsl-diu-fb.h conflict! 'N' 00-1F drivers/usb/scanner.h -'O' 00-02 include/mtd/ubi-user.h UBI -'P' all linux/soundcard.h +'O' 00-06 mtd/ubi-user.h UBI +'P' all linux/soundcard.h conflict! +'P' 60-6F sound/sscape_ioctl.h conflict! +'P' 00-0F drivers/usb/class/usblp.c conflict! 'Q' all linux/soundcard.h -'R' 00-1F linux/random.h +'R' 00-1F linux/random.h conflict! +'R' 01 linux/rfkill.h conflict! +'R' 01-0F media/rds.h conflict! +'R' C0-DF net/bluetooth/rfcomm.h 'S' all linux/cdrom.h conflict! 'S' 80-81 scsi/scsi_ioctl.h conflict! 'S' 82-FF scsi/scsi.h conflict! +'S' 00-7F sound/asequencer.h conflict! 'T' all linux/soundcard.h conflict! +'T' 00-AF sound/asound.h conflict! 'T' all arch/x86/include/asm/ioctls.h conflict! -'U' 00-EF linux/drivers/usb/usb.h -'V' all linux/vt.h +'T' C0-DF linux/if_tun.h conflict! +'U' all sound/asound.h conflict! +'U' 00-0F drivers/media/video/uvc/uvcvideo.h conflict! +'U' 00-CF linux/uinput.h conflict! +'U' 00-EF linux/usbdevice_fs.h +'U' C0-CF drivers/bluetooth/hci_uart.h +'V' all linux/vt.h conflict! +'V' all linux/videodev2.h conflict! +'V' C0 linux/ivtvfb.h conflict! +'V' C0 linux/ivtv.h conflict! +'V' C0 media/davinci/vpfe_capture.h conflict! +'V' C0 media/si4713.h conflict! +'V' C0-CF drivers/media/video/mxb.h conflict! 'W' 00-1F linux/watchdog.h conflict! 'W' 00-1F linux/wanrouter.h conflict! -'X' all linux/xfs_fs.h +'W' 00-3F sound/asound.h conflict! +'X' all fs/xfs/xfs_fs.h conflict! + and fs/xfs/linux-2.6/xfs_ioctl32.h + and include/linux/falloc.h + and linux/fs.h +'X' all fs/ocfs2/ocfs_fs.h conflict! +'X' 01 linux/pktcdvd.h conflict! 'Y' all linux/cyclades.h -'[' 00-07 linux/usb/usbtmc.h USB Test and Measurement Devices +'Z' 14-15 drivers/message/fusion/mptctl.h +'[' 00-07 linux/usb/tmc.h USB Test and Measurement Devices -'a' all ATM on linux +'a' all linux/atm*.h, linux/sonet.h ATM on linux -'b' 00-FF bit3 vme host bridge +'b' 00-FF conflict! bit3 vme host bridge +'b' 00-0F media/bt819.h conflict! +'c' all linux/cm4000_cs.h conflict! 'c' 00-7F linux/comstats.h conflict! 'c' 00-7F linux/coda.h conflict! -'c' 80-9F arch/s390/include/asm/chsc.h -'c' A0-AF arch/x86/include/asm/msr.h +'c' 00-1F linux/chio.h conflict! +'c' 80-9F arch/s390/include/asm/chsc.h conflict! +'c' A0-AF arch/x86/include/asm/msr.h conflict! 'd' 00-FF linux/char/drm/drm/h conflict! +'d' 02-40 pcmcia/ds.h conflict! +'d' 10-3F drivers/media/video/dabusb.h conflict! +'d' C0-CF drivers/media/video/saa7191.h conflict! 'd' F0-FF linux/digi1.h 'e' all linux/digi1.h conflict! -'e' 00-1F net/irda/irtty.h conflict! -'f' 00-1F linux/ext2_fs.h -'h' 00-7F Charon filesystem +'e' 00-1F drivers/net/irda/irtty-sir.h conflict! +'f' 00-1F linux/ext2_fs.h conflict! +'f' 00-1F linux/ext3_fs.h conflict! +'f' 00-0F fs/jfs/jfs_dinode.h conflict! +'f' 00-0F fs/ext4/ext4.h conflict! +'f' 00-0F linux/fs.h conflict! +'f' 00-0F fs/ocfs2/ocfs2_fs.h conflict! +'g' 00-0F linux/usb/gadgetfs.h +'g' 20-2F linux/usb/g_printer.h +'h' 00-7F conflict! Charon filesystem -'i' 00-3F linux/i2o.h +'h' 00-1F linux/hpet.h conflict! +'i' 00-3F linux/i2o-dev.h conflict! +'i' 0B-1F linux/ipmi.h conflict! +'i' 80-8F linux/i8k.h 'j' 00-3F linux/joystick.h +'k' 00-0F linux/spi/spidev.h conflict! +'k' 00-05 video/kyro.h conflict! 'l' 00-3F linux/tcfs_fs.h transparent cryptographic file system 'l' 40-7F linux/udf_fs_i.h in development: -'m' 00-09 linux/mmtimer.h +'m' 00-09 linux/mmtimer.h conflict! 'm' all linux/mtio.h conflict! 'm' all linux/soundcard.h conflict! 'm' all linux/synclink.h conflict! +'m' 00-19 drivers/message/fusion/mptctl.h conflict! +'m' 00 drivers/scsi/megaraid/megaraid_ioctl.h conflict! 'm' 00-1F net/irda/irmod.h conflict! -'n' 00-7F linux/ncp_fs.h +'n' 00-7F linux/ncp_fs.h and fs/ncpfs/ioctl.c 'n' 80-8F linux/nilfs2_fs.h NILFS2 -'n' E0-FF video/matrox.h matroxfb +'n' E0-FF linux/matroxfb.h matroxfb 'o' 00-1F fs/ocfs2/ocfs2_fs.h OCFS2 -'o' 00-03 include/mtd/ubi-user.h conflict! (OCFS2 and UBI overlaps) -'o' 40-41 include/mtd/ubi-user.h UBI -'o' 01-A1 include/linux/dvb/*.h DVB +'o' 00-03 mtd/ubi-user.h conflict! (OCFS2 and UBI overlaps) +'o' 40-41 mtd/ubi-user.h UBI +'o' 01-A1 linux/dvb/*.h DVB 'p' 00-0F linux/phantom.h conflict! (OpenHaptics needs this) +'p' 00-1F linux/rtc.h conflict! 'p' 00-3F linux/mc146818rtc.h conflict! 'p' 40-7F linux/nvram.h -'p' 80-9F user-space parport +'p' 80-9F linux/ppdev.h user-space parport -'p' a1-a4 linux/pps.h LinuxPPS +'p' A1-A4 linux/pps.h LinuxPPS 'q' 00-1F linux/serio.h -'q' 80-FF Internet PhoneJACK, Internet LineJACK - -'r' 00-1F linux/msdos_fs.h +'q' 80-FF linux/telephony.h Internet PhoneJACK, Internet LineJACK + linux/ixjuser.h +'r' 00-1F linux/msdos_fs.h and fs/fat/dir.c 's' all linux/cdk.h 't' 00-7F linux/if_ppp.h 't' 80-8F linux/isdn_ppp.h +'t' 90 linux/toshiba.h 'u' 00-1F linux/smb_fs.h -'v' 00-1F linux/ext2_fs.h conflict! 'v' all linux/videodev.h conflict! +'v' 00-1F linux/ext2_fs.h conflict! +'v' 00-1F linux/fs.h conflict! +'v' 00-0F linux/sonypi.h conflict! +'v' C0-CF drivers/media/video/ov511.h conflict! +'v' C0-DF media/pwc-ioctl.h conflict! +'v' C0-FF linux/meye.h conflict! +'v' C0-CF drivers/media/video/zoran/zoran.h conflict! +'v' D0-DF drivers/media/video/cpia2/cpia2dev.h conflict! 'w' all CERN SCI driver 'y' 00-1F packet based user level communications -'z' 00-3F CAN bus card +'z' 00-3F CAN bus card conflict! -'z' 40-7F CAN bus card +'z' 40-7F CAN bus card conflict! +'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! 0x80 00-1F linux/fb.h 0x81 00-1F linux/videotext.h +0x88 00-3F media/ovcamchip.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range +0x89 E0-EF linux/dn.h PROTOPRIVATE range 0x89 F0-FF linux/sockios.h SIOCDEVPRIVATE range 0x8B all linux/wireless.h 0x8C 00-3F WiNRADiO driver 0x90 00 drivers/cdrom/sbpcd.h +0x92 00-0F drivers/usb/mon/mon_bin.c 0x93 60-7F linux/auto_fs.h +0x94 all fs/btrfs/ioctl.h 0x99 00-0F 537-Addinboard driver 0xA0 all linux/sdp/sdp.h Industrial Device Project @@ -192,17 +302,22 @@ Code Seq# Include File Comments 0xAB 00-1F linux/nbd.h 0xAC 00-1F linux/raw.h 0xAD 00 Netfilter device in development: - + 0xAE all linux/kvm.h Kernel-based Virtual Machine 0xB0 all RATIO devices in development: 0xB1 00-1F PPPoX +0xC0 00-0F linux/usb/iowarrior.h 0xCB 00-1F CBM serial IEC bus in development: +0xCD 01 linux/reiserfs_fs.h +0xCF 02 fs/cifs/ioctl.c +0xDB 00-0F drivers/char/mwave/mwavepub.h 0xDD 00-3F ZFCP device driver see drivers/s390/scsi/ -0xF3 00-3F video/sisfb.h sisfb (in development) +0xF3 00-3F drivers/usb/misc/sisusbvga/sisusb.h sisfb (in development) 0xF4 00-1F video/mbxfb.h mbxfb +0xFD all linux/dm-ioctl.h -- cgit v1.1 From 1306d603fcf1f6682f8575d1ff23631a24184b21 Mon Sep 17 00:00:00 2001 From: KOSAKI Motohiro Date: Fri, 8 Jan 2010 14:42:56 -0800 Subject: proc: partially revert "procfs: provide stack information for threads" Commit d899bf7b (procfs: provide stack information for threads) introduced to show stack information in /proc/{pid}/status. But it cause large performance regression. Unfortunately /proc/{pid}/status is used ps command too and ps is one of most important component. Because both to take mmap_sem and page table walk are heavily operation. If many process run, the ps performance is, [before d899bf7b] % perf stat ps >/dev/null Performance counter stats for 'ps': 4090.435806 task-clock-msecs # 0.032 CPUs 229 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 234 page-faults # 0.000 M/sec 8587565207 cycles # 2099.425 M/sec 9866662403 instructions # 1.149 IPC 3789415411 cache-references # 926.409 M/sec 30419509 cache-misses # 7.437 M/sec 128.859521955 seconds time elapsed [after d899bf7b] % perf stat ps > /dev/null Performance counter stats for 'ps': 4305.081146 task-clock-msecs # 0.028 CPUs 480 context-switches # 0.000 M/sec 2 CPU-migrations # 0.000 M/sec 237 page-faults # 0.000 M/sec 9021211334 cycles # 2095.480 M/sec 10605887536 instructions # 1.176 IPC 3612650999 cache-references # 839.160 M/sec 23917502 cache-misses # 5.556 M/sec 152.277819582 seconds time elapsed Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps provide almost same information. we can use it. Commit d899bf7b introduced two features: 1) Add the annotattion of [thread stack: xxxx] mark to /proc/{pid}/task/{tid}/maps. 2) Add StackUsage field to /proc/{pid}/status. I only revert (2), because I haven't seen (1) cause regression. Signed-off-by: KOSAKI Motohiro Cc: Stefani Seibold Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Alexey Dobriyan Cc: "Eric W. Biederman" Cc: Randy Dunlap Cc: Andrew Morton Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 2 -- 1 file changed, 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 220cc63..0d07513 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -177,7 +177,6 @@ read the file /proc/PID/status: CapBnd: ffffffffffffffff voluntary_ctxt_switches: 0 nonvoluntary_ctxt_switches: 1 - Stack usage: 12 kB This shows you nearly the same information you would get if you viewed it with the ps command. In fact, ps uses the proc file system to obtain its @@ -231,7 +230,6 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7) Mems_allowed_list Same as previous, but in "list format" voluntary_ctxt_switches number of voluntary context switches nonvoluntary_ctxt_switches number of non voluntary context switches - Stack usage: stack usage high water mark (round up to page size) .............................................................................. Table 1-3: Contents of the statm files (as of 2.6.8-rc3) -- cgit v1.1 From b5430a04e995081a308b4419bd0940f2badc6e6b Mon Sep 17 00:00:00 2001 From: Tomaz Mertelj Date: Fri, 8 Jan 2010 14:43:04 -0800 Subject: hwmon: driver for Texas Instruments amc6821 chip Signed-off-by: Cc: Jean Delvare Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/hwmon/amc6821 | 102 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 Documentation/hwmon/amc6821 (limited to 'Documentation') diff --git a/Documentation/hwmon/amc6821 b/Documentation/hwmon/amc6821 new file mode 100644 index 0000000..ced8359 --- /dev/null +++ b/Documentation/hwmon/amc6821 @@ -0,0 +1,102 @@ +Kernel driver amc6821 +===================== + +Supported chips: + Texas Instruments AMC6821 + Prefix: 'amc6821' + Addresses scanned: 0x18, 0x19, 0x1a, 0x2c, 0x2d, 0x2e, 0x4c, 0x4d, 0x4e + Datasheet: http://focus.ti.com/docs/prod/folders/print/amc6821.html + +Authors: + Tomaz Mertelj + + +Description +----------- + +This driver implements support for the Texas Instruments amc6821 chip. +The chip has one on-chip and one remote temperature sensor and one pwm fan +regulator. +The pwm can be controlled either from software or automatically. + +The driver provides the following sensor accesses in sysfs: + +temp1_input ro on-chip temperature +temp1_min rw " +temp1_max rw " +temp1_crit rw " +temp1_min_alarm ro " +temp1_max_alarm ro " +temp1_crit_alarm ro " + +temp2_input ro remote temperature +temp2_min rw " +temp2_max rw " +temp2_crit rw " +temp2_min_alarm ro " +temp2_max_alarm ro " +temp2_crit_alarm ro " +temp2_fault ro " + +fan1_input ro tachometer speed +fan1_min rw " +fan1_max rw " +fan1_fault ro " +fan1_div rw Fan divisor can be either 2 or 4. + +pwm1 rw pwm1 +pwm1_enable rw regulator mode, 1=open loop, 2=fan controlled + by remote temperature, 3=fan controlled by + combination of the on-chip temperature and + remote-sensor temperature, +pwm1_auto_channels_temp ro 1 if pwm_enable==2, 3 if pwm_enable==3 +pwm1_auto_point1_pwm ro Hardwired to 0, shared for both + temperature channels. +pwm1_auto_point2_pwm rw This value is shared for both temperature + channels. +pwm1_auto_point3_pwm rw Hardwired to 255, shared for both + temperature channels. + +temp1_auto_point1_temp ro Hardwired to temp2_auto_point1_temp + which is rw. Below this temperature fan stops. +temp1_auto_point2_temp rw The low-temperature limit of the proportional + range. Below this temperature + pwm1 = pwm1_auto_point2_pwm. It can go from + 0 degree C to 124 degree C in steps of + 4 degree C. Read it out after writing to get + the actual value. +temp1_auto_point3_temp rw Above this temperature fan runs at maximum + speed. It can go from temp1_auto_point2_temp. + It can only have certain discrete values + which depend on temp1_auto_point2_temp and + pwm1_auto_point2_pwm. Read it out after + writing to get the actual value. + +temp2_auto_point1_temp rw Must be between 0 degree C and 63 degree C and + it defines the passive cooling temperature. + Below this temperature the fan stops in + the closed loop mode. +temp2_auto_point2_temp rw The low-temperature limit of the proportional + range. Below this temperature + pwm1 = pwm1_auto_point2_pwm. It can go from + 0 degree C to 124 degree C in steps + of 4 degree C. + +temp2_auto_point3_temp rw Above this temperature fan runs at maximum + speed. It can only have certain discrete + values which depend on temp2_auto_point2_temp + and pwm1_auto_point2_pwm. Read it out after + writing to get actual value. + + +Module parameters +----------------- + +If your board has a BIOS that initializes the amc6821 correctly, you should +load the module with: init=0. + +If your board BIOS doesn't initialize the chip, or you want +different settings, you can set the following parameters: +init=1, +pwminv: 0 default pwm output, 1 inverts pwm output. + -- cgit v1.1 From 006b4298f26984d514546fe4e53371761f66b643 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 8 Jan 2010 14:43:07 -0800 Subject: Documentation: update ring-buffer-design.txt Fix typos, grammos, spellos, hyphenation. Signed-off-by: Randy Dunlap Acked-by: Steven Rostedt Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/trace/ring-buffer-design.txt | 56 +++++++++++++++--------------- 1 file changed, 28 insertions(+), 28 deletions(-) (limited to 'Documentation') diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt index 5b1d23d..d299ff3 100644 --- a/Documentation/trace/ring-buffer-design.txt +++ b/Documentation/trace/ring-buffer-design.txt @@ -33,9 +33,9 @@ head_page - a pointer to the page that the reader will use next tail_page - a pointer to the page that will be written to next -commit_page - a pointer to the page with the last finished non nested write. +commit_page - a pointer to the page with the last finished non-nested write. -cmpxchg - hardware assisted atomic transaction that performs the following: +cmpxchg - hardware-assisted atomic transaction that performs the following: A = B iff previous A == C @@ -52,15 +52,15 @@ The Generic Ring Buffer The ring buffer can be used in either an overwrite mode or in producer/consumer mode. -Producer/consumer mode is where the producer were to fill up the +Producer/consumer mode is where if the producer were to fill up the buffer before the consumer could free up anything, the producer will stop writing to the buffer. This will lose most recent events. -Overwrite mode is where the produce were to fill up the buffer +Overwrite mode is where if the producer were to fill up the buffer before the consumer could free up anything, the producer will overwrite the older data. This will lose the oldest events. -No two writers can write at the same time (on the same per cpu buffer), +No two writers can write at the same time (on the same per-cpu buffer), but a writer may interrupt another writer, but it must finish writing before the previous writer may continue. This is very important to the algorithm. The writers act like a "stack". The way interrupts works @@ -79,16 +79,16 @@ the interrupt doing a write as well. Readers can happen at any time. But no two readers may run at the same time, nor can a reader preempt/interrupt another reader. A reader -can not preempt/interrupt a writer, but it may read/consume from the +cannot preempt/interrupt a writer, but it may read/consume from the buffer at the same time as a writer is writing, but the reader must be on another processor to do so. A reader may read on its own processor and can be preempted by a writer. -A writer can preempt a reader, but a reader can not preempt a writer. +A writer can preempt a reader, but a reader cannot preempt a writer. But a reader can read the buffer at the same time (on another processor) as a writer. -The ring buffer is made up of a list of pages held together by a link list. +The ring buffer is made up of a list of pages held together by a linked list. At initialization a reader page is allocated for the reader that is not part of the ring buffer. @@ -102,7 +102,7 @@ the head page. The reader has its own page to use. At start up time, this page is allocated but is not attached to the list. When the reader wants -to read from the buffer, if its page is empty (like it is on start up) +to read from the buffer, if its page is empty (like it is on start-up), it will swap its page with the head_page. The old reader page will become part of the ring buffer and the head_page will be removed. The page after the inserted page (old reader_page) will become the @@ -206,7 +206,7 @@ The main pointers: commit page - the page that last finished a write. -The commit page only is updated by the outer most writer in the +The commit page only is updated by the outermost writer in the writer stack. A writer that preempts another writer will not move the commit page. @@ -281,7 +281,7 @@ with the previous write. The commit pointer points to the last write location that was committed without preempting another write. When a write that preempted another write is committed, it only becomes a pending commit -and will not be a full commit till all writes have been committed. +and will not be a full commit until all writes have been committed. The commit page points to the page that has the last full commit. The tail page points to the page with the last write (before @@ -292,7 +292,7 @@ be several pages ahead. If the tail page catches up to the commit page then no more writes may take place (regardless of the mode of the ring buffer: overwrite and produce/consumer). -The order of pages are: +The order of pages is: head page commit page @@ -311,7 +311,7 @@ Possible scenario: There is a special case that the head page is after either the commit page and possibly the tail page. That is when the commit (and tail) page has been swapped with the reader page. This is because the head page is always -part of the ring buffer, but the reader page is not. When ever there +part of the ring buffer, but the reader page is not. Whenever there has been less than a full page that has been committed inside the ring buffer, and a reader swaps out a page, it will be swapping out the commit page. @@ -338,7 +338,7 @@ and a reader swaps out a page, it will be swapping out the commit page. In this case, the head page will not move when the tail and commit move back into the ring buffer. -The reader can not swap a page into the ring buffer if the commit page +The reader cannot swap a page into the ring buffer if the commit page is still on that page. If the read meets the last commit (real commit not pending or reserved), then there is nothing more to read. The buffer is considered empty until another full commit finishes. @@ -395,7 +395,7 @@ The main idea behind the lockless algorithm is to combine the moving of the head_page pointer with the swapping of pages with the reader. State flags are placed inside the pointer to the page. To do this, each page must be aligned in memory by 4 bytes. This will allow the 2 -least significant bits of the address to be used as flags. Since +least significant bits of the address to be used as flags, since they will always be zero for the address. To get the address, simply mask out the flags. @@ -460,7 +460,7 @@ When the reader tries to swap the page with the ring buffer, it will also use cmpxchg. If the flag bit in the pointer to the head page does not have the HEADER flag set, the compare will fail and the reader will need to look for the new head page and try again. -Note, the flag UPDATE and HEADER are never set at the same time. +Note, the flags UPDATE and HEADER are never set at the same time. The reader swaps the reader page as follows: @@ -539,7 +539,7 @@ updated to the reader page. | +-----------------------------+ | +------------------------------------+ -Another important point. The page that the reader page points back to +Another important point: The page that the reader page points back to by its previous pointer (the one that now points to the new head page) never points back to the reader page. That is because the reader page is not part of the ring buffer. Traversing the ring buffer via the next pointers @@ -572,7 +572,7 @@ not be able to swap the head page from the buffer, nor will it be able to move the head page, until the writer is finished with the move. This eliminates any races that the reader can have on the writer. The reader -must spin, and this is why the reader can not preempt the writer. +must spin, and this is why the reader cannot preempt the writer. tail page | @@ -659,9 +659,9 @@ before pushing the head page. If it is, then it can be assumed that the tail page wrapped the buffer, and we must drop new writes. This is not a race condition, because the commit page can only be moved -by the outter most writer (the writer that was preempted). +by the outermost writer (the writer that was preempted). This means that the commit will not move while a writer is moving the -tail page. The reader can not swap the reader page if it is also being +tail page. The reader cannot swap the reader page if it is also being used as the commit page. The reader can simply check that the commit is off the reader page. Once the commit page leaves the reader page it will never go back on it unless a reader does another swap with the @@ -733,7 +733,7 @@ The write converts the head page pointer to UPDATE. --->| |<---| |<---| |<---| |<--- +---+ +---+ +---+ +---+ -But if a nested writer preempts here. It will see that the next +But if a nested writer preempts here, it will see that the next page is a head page, but it is also nested. It will detect that it is nested and will save that information. The detection is the fact that it sees the UPDATE flag instead of a HEADER or NORMAL @@ -761,7 +761,7 @@ to NORMAL. --->| |<---| |<---| |<---| |<--- +---+ +---+ +---+ +---+ -After the nested writer finishes, the outer most writer will convert +After the nested writer finishes, the outermost writer will convert the UPDATE pointer to NORMAL. @@ -812,7 +812,7 @@ head page. +---+ +---+ +---+ +---+ The nested writer moves the tail page forward. But does not set the old -update page to NORMAL because it is not the outer most writer. +update page to NORMAL because it is not the outermost writer. tail page | @@ -892,7 +892,7 @@ It will return to the first writer. --->| |<---| |<---| |<---| |<--- +---+ +---+ +---+ +---+ -The first writer can not know atomically test if the tail page moved +The first writer cannot know atomically if the tail page moved while it updates the HEAD page. It will then update the head page to what it thinks is the new head page. @@ -923,9 +923,9 @@ if the tail page is either where it use to be or on the next page: --->| |<---| |<---| |<---| |<--- +---+ +---+ +---+ +---+ -If tail page != A and tail page does not equal B, then it must reset the -pointer back to NORMAL. The fact that it only needs to worry about -nested writers, it only needs to check this after setting the HEAD page. +If tail page != A and tail page != B, then it must reset the pointer +back to NORMAL. The fact that it only needs to worry about nested +writers means that it only needs to check this after setting the HEAD page. (first writer) @@ -939,7 +939,7 @@ nested writers, it only needs to check this after setting the HEAD page. +---+ +---+ +---+ +---+ Now the writer can update the head page. This is also why the head page must -remain in UPDATE and only reset by the outer most writer. This prevents +remain in UPDATE and only reset by the outermost writer. This prevents the reader from seeing the incorrect head page. -- cgit v1.1 From d2b34e20c1f431604e0dde910c3ff271c84ed706 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 8 Jan 2010 14:43:09 -0800 Subject: documentation: update kernel-doc-nano-HOWTO information Remove comments about function short descriptions not allowed to be on multiple lines (that was fixed/changed recently). Add comments that function "section header:" names need to be unique per function/struct/union/typedef/enum. Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kernel-doc-nano-HOWTO.txt | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt index 348b9e5..27a52b3 100644 --- a/Documentation/kernel-doc-nano-HOWTO.txt +++ b/Documentation/kernel-doc-nano-HOWTO.txt @@ -214,11 +214,13 @@ The format of the block comment is like this: * (section header: (section description)? )* (*)?*/ -The short function description ***cannot be multiline***, but the other -descriptions can be (and they can contain blank lines). If you continue -that initial short description onto a second line, that second line will -appear further down at the beginning of the description section, which is -almost certainly not what you had in mind. +All "description" text can span multiple lines, although the +function_name & its short description are traditionally on a single line. +Description text may also contain blank lines (i.e., lines that contain +only a "*"). + +"section header:" names must be unique per function (or struct, +union, typedef, enum). Avoid putting a spurious blank line after the function name, or else the description will be repeated! -- cgit v1.1 From aa4e2e171385bb77b4da8b760d26dea2aa291587 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Mon, 11 Jan 2010 15:53:45 -0800 Subject: Documentation/3c509: document ethtool support 3c509 was changed to support ethtool in 2002, making the 'xcvr' module parameter obsolete in most cases. More recently 3c509 was converted to the modern driver model and this parameter was removed. Fix the documentation to refer to ethtool rather than the module parameter. Signed-off-by: Ben Hutchings Signed-off-by: David S. Miller --- Documentation/networking/3c509.txt | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/3c509.txt b/Documentation/networking/3c509.txt index 0643e3b..3c45d5d 100644 --- a/Documentation/networking/3c509.txt +++ b/Documentation/networking/3c509.txt @@ -48,11 +48,11 @@ for LILO parameters for doing this: This configures the first found 3c509 card for IRQ 10, base I/O 0x310, and transceiver type 3 (10base2). The flag "0x3c509" must be set to avoid conflicts with other card types when overriding the I/O address. When the driver is -loaded as a module, only the IRQ and transceiver setting may be overridden. -For example, setting two cards to 10base2/IRQ10 and AUI/IRQ11 is done by using -the xcvr and irq module options: +loaded as a module, only the IRQ may be overridden. For example, +setting two cards to IRQ10 and IRQ11 is done by using the irq module +option: - options 3c509 xcvr=3,1 irq=10,11 + options 3c509 irq=10,11 (2) Full-duplex mode @@ -77,6 +77,8 @@ operation. itself full-duplex capable. This is almost certainly one of two things: a full- duplex-capable Ethernet switch (*not* a hub), or a full-duplex-capable NIC on another system that's connected directly to the 3c509B via a crossover cable. + +Full-duplex mode can be enabled using 'ethtool'. /////Extremely important caution concerning full-duplex mode///// Understand that the 3c509B's hardware's full-duplex support is much more @@ -113,6 +115,8 @@ This insured that merely upgrading the driver from an earlier version would never automatically enable full-duplex mode in an existing installation; it must always be explicitly enabled via one of these code in order to be activated. + +The transceiver type can be changed using 'ethtool'. (4a) Interpretation of error messages and common problems -- cgit v1.1 From ceafe1d2fe33e92691bfdbd5a93ed259c3da7b60 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 11 Jan 2010 10:50:53 -0200 Subject: feature-removal-schedule: Add v4l1 drivers obsoleted by gspca sub drivers This patch adds the ov511, quickcam_messenger, w9968cf, stv680 and ovcamchip v4l1 drivers to the feature removal schedule as the devices they support are now all also supported by v4l2 gspca sub drivers. This patch also adds the v4l2 vc0301 driver for removal as it duplicates functionality of the gspca_zc3xx driver, zc0301 only supports 2 USB-ID's (because it only supports a limited set of sensors) wich are also supported by the gspca_zc3xx driver (which supports 53 USB-ID's in total). [mchehab@redhat.com: change "when" to 2.6.35] Signed-off-by: Hans de Goede Signed-off-by: Mauro Carvalho Chehab --- Documentation/feature-removal-schedule.txt | 49 ++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 870d190..0a46833 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -493,3 +493,52 @@ Why: These two features use non-standard interfaces. There are the Who: Corentin Chary ---------------------------- + +What: usbvideo quickcam_messenger driver +When: 2.6.35 +Files: drivers/media/video/usbvideo/quickcam_messenger.[ch] +Why: obsolete v4l1 driver replaced by gspca_stv06xx +Who: Hans de Goede + +---------------------------- + +What: ov511 v4l1 driver +When: 2.6.35 +Files: drivers/media/video/ov511.[ch] +Why: obsolete v4l1 driver replaced by gspca_ov519 +Who: Hans de Goede + +---------------------------- + +What: w9968cf v4l1 driver +When: 2.6.35 +Files: drivers/media/video/w9968cf*.[ch] +Why: obsolete v4l1 driver replaced by gspca_ov519 +Who: Hans de Goede + +---------------------------- + +What: ovcamchip sensor framework +When: 2.6.35 +Files: drivers/media/video/ovcamchip/* +Why: Only used by obsoleted v4l1 drivers +Who: Hans de Goede + +---------------------------- + +What: stv680 v4l1 driver +When: 2.6.35 +Files: drivers/media/video/stv680.[ch] +Why: obsolete v4l1 driver replaced by gspca_stv0680 +Who: Hans de Goede + +---------------------------- + +What: zc0301 v4l driver +When: 2.6.35 +Files: drivers/media/video/zc0301/* +Why: Duplicate functionality with the gspca_zc3xx driver, zc0301 only + supports 2 USB-ID's (because it only supports a limited set of + sensors) wich are also supported by the gspca_zc3xx driver + (which supports 53 USB-ID's in total) +Who: Hans de Goede -- cgit v1.1 From 6993b1bb1e62367f500789835a1f747e12259f07 Mon Sep 17 00:00:00 2001 From: Yang Hongyang Date: Mon, 25 Jan 2010 11:10:32 +0800 Subject: tracing/documentation: Fix a typo in ftrace.txt 'ftrace' is no longer the name of the function tracer, to activate the function trace 'echo function > current_tracer' is to be used instead of 'echo ftrace > current_tracer'. Update the documentation to reflect the current implementation. Signed-off-by: Yang Hongyang LKML-Reference: <4B5D0BA8.20106@cn.fujitsu.com> Signed-off-by: Steven Rostedt --- Documentation/trace/ftrace.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 8179692..bab3040 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -1625,7 +1625,7 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt: # echo sys_nanosleep hrtimer_interrupt \ > set_ftrace_filter - # echo ftrace > current_tracer + # echo function > current_tracer # echo 1 > tracing_enabled # usleep 1 # echo 0 > tracing_enabled -- cgit v1.1 From 03688970347bfea32823953a7ce5886d1713205f Mon Sep 17 00:00:00 2001 From: Mike Frysinger Date: Fri, 22 Jan 2010 08:12:47 -0500 Subject: tracing/documentation: Cover new frame pointer semantics Update the graph tracer examples to cover the new frame pointer semantics (in terms of passing it along). Move the HAVE_FUNCTION_GRAPH_FP_TEST docs out of the Kconfig, into the right place, and expand on the details. Signed-off-by: Mike Frysinger LKML-Reference: <1264165967-18938-1-git-send-email-vapier@gentoo.org> Signed-off-by: Steven Rostedt --- Documentation/trace/ftrace-design.txt | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index 239f14b..6a5a579 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt @@ -1,5 +1,6 @@ function tracer guts ==================== + By Mike Frysinger Introduction ------------ @@ -173,14 +174,16 @@ void ftrace_graph_caller(void) unsigned long *frompc = &...; unsigned long selfpc = - MCOUNT_INSN_SIZE; - prepare_ftrace_return(frompc, selfpc); + /* passing frame pointer up is optional -- see below */ + prepare_ftrace_return(frompc, selfpc, frame_pointer); /* restore all state needed by the ABI */ } #endif -For information on how to implement prepare_ftrace_return(), simply look at -the x86 version. The only architecture-specific piece in it is the setup of +For information on how to implement prepare_ftrace_return(), simply look at the +x86 version (the frame pointer passing is optional; see the next section for +more information). The only architecture-specific piece in it is the setup of the fault recovery table (the asm(...) code). The rest should be the same across architectures. @@ -205,6 +208,23 @@ void return_to_handler(void) #endif +HAVE_FUNCTION_GRAPH_FP_TEST +--------------------------- + +An arch may pass in a unique value (frame pointer) to both the entering and +exiting of a function. On exit, the value is compared and if it does not +match, then it will panic the kernel. This is largely a sanity check for bad +code generation with gcc. If gcc for your port sanely updates the frame +pointer under different opitmization levels, then ignore this option. + +However, adding support for it isn't terribly difficult. In your assembly code +that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument. +Then in the C version of that function, do what the x86 port does and pass it +along to ftrace_push_return_trace() instead of a stub value of 0. + +Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer. + + HAVE_FTRACE_NMI_ENTER --------------------- -- cgit v1.1 From f6bdc2303da6786cc22a7d24b6790e9f75b4cfdc Mon Sep 17 00:00:00 2001 From: Henrik Rydberg Date: Thu, 28 Jan 2010 22:28:28 -0800 Subject: Input: update multi-touch protocol documentation This patch documents a new ABS_MT parameter and adds further text to clarify some points around the MT protocol. Requested-by: Yoonyoung Shim Requested-by: Mika Kuoppala Requested-by: Peter Hutterer Signed-off-by: Henrik Rydberg Signed-off-by: Dmitry Torokhov --- Documentation/input/multi-touch-protocol.txt | 48 +++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/input/multi-touch-protocol.txt b/Documentation/input/multi-touch-protocol.txt index a12ea3b..8490480 100644 --- a/Documentation/input/multi-touch-protocol.txt +++ b/Documentation/input/multi-touch-protocol.txt @@ -27,12 +27,30 @@ set of events/packets. A set of ABS_MT events with the desired properties is defined. The events are divided into categories, to allow for partial implementation. The -minimum set consists of ABS_MT_TOUCH_MAJOR, ABS_MT_POSITION_X and -ABS_MT_POSITION_Y, which allows for multiple fingers to be tracked. If the -device supports it, the ABS_MT_WIDTH_MAJOR may be used to provide the size -of the approaching finger. Anisotropy and direction may be specified with -ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. The -ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a +minimum set consists of ABS_MT_POSITION_X and ABS_MT_POSITION_Y, which +allows for multiple fingers to be tracked. If the device supports it, the +ABS_MT_TOUCH_MAJOR and ABS_MT_WIDTH_MAJOR may be used to provide the size +of the contact area and approaching finger, respectively. + +The TOUCH and WIDTH parameters have a geometrical interpretation; imagine +looking through a window at someone gently holding a finger against the +glass. You will see two regions, one inner region consisting of the part +of the finger actually touching the glass, and one outer region formed by +the perimeter of the finger. The diameter of the inner region is the +ABS_MT_TOUCH_MAJOR, the diameter of the outer region is +ABS_MT_WIDTH_MAJOR. Now imagine the person pressing the finger harder +against the glass. The inner region will increase, and in general, the +ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR, which is always smaller than +unity, is related to the finger pressure. For pressure-based devices, +ABS_MT_PRESSURE may be used to provide the pressure on the contact area +instead. + +In addition to the MAJOR parameters, the oval shape of the finger can be +described by adding the MINOR parameters, such that MAJOR and MINOR are the +major and minor axis of an ellipse. Finally, the orientation of the oval +shape can be describe with the ORIENTATION parameter. + +The ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a finger or a pen or something else. Devices with more granular information may specify general shapes as blobs, i.e., as a sequence of rectangular shapes grouped together by an ABS_MT_BLOB_ID. Finally, for the few devices @@ -42,11 +60,9 @@ report finger tracking from hardware [5]. Here is what a minimal event sequence for a two-finger touch would look like: - ABS_MT_TOUCH_MAJOR ABS_MT_POSITION_X ABS_MT_POSITION_Y SYN_MT_REPORT - ABS_MT_TOUCH_MAJOR ABS_MT_POSITION_X ABS_MT_POSITION_Y SYN_MT_REPORT @@ -87,6 +103,12 @@ the contact. The ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR approximates the notion of pressure. The fingers of the hand and the palm all have different characteristic widths [1]. +ABS_MT_PRESSURE + +The pressure, in arbitrary units, on the contact area. May be used instead +of TOUCH and WIDTH for pressure-based devices or any device with a spatial +signal intensity distribution. + ABS_MT_ORIENTATION The orientation of the ellipse. The value should describe a signed quarter @@ -170,6 +192,16 @@ There are a few devices that support trackingID in hardware. User space can make use of these native identifiers to reduce bandwidth and cpu usage. +Gestures +-------- + +In the specific application of creating gesture events, the TOUCH and WIDTH +parameters can be used to, e.g., approximate finger pressure or distinguish +between index finger and thumb. With the addition of the MINOR parameters, +one can also distinguish between a sweeping finger and a pointing finger, +and with ORIENTATION, one can detect twisting of fingers. + + Notes ----- -- cgit v1.1 From a225a5cc2c25e1217e68ee2ab8dd4454573fa308 Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Tue, 2 Feb 2010 13:44:15 -0800 Subject: fault injection: correct function names in documentation init_fault_attr_entries() should be init_fault_attr_dentries(). cleanup_fault_attr_entries() should be cleanup_fault_attr_dentries(). Signed-off-by: Anton Blanchard Acked-by: Akinobu Mita Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/fault-injection/fault-injection.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt index 0793056..7be15e4 100644 --- a/Documentation/fault-injection/fault-injection.txt +++ b/Documentation/fault-injection/fault-injection.txt @@ -143,8 +143,8 @@ o provide a way to configure fault attributes failslab, fail_page_alloc, and fail_make_request use this way. Helper functions: - init_fault_attr_entries(entries, attr, name); - void cleanup_fault_attr_entries(entries); + init_fault_attr_dentries(entries, attr, name); + void cleanup_fault_attr_dentries(entries); - module parameters -- cgit v1.1