diff options
author | Jiri Kosina <jkosina@suse.cz> | 2013-05-28 10:09:22 +0200 |
---|---|---|
committer | Jiri Kosina <jkosina@suse.cz> | 2013-05-28 10:09:29 +0200 |
commit | 864bfb25b57a6766ea689befa5cf09a4353281ce (patch) | |
tree | 478941ca6e76b8d77b71f5827a3ce751c46a72b5 /Documentation | |
parent | 071361d3473ebb8142907470ff12d59c59f6be72 (diff) | |
parent | 49717cb40410fe4b563968680ff7c513967504c6 (diff) | |
download | op-kernel-dev-864bfb25b57a6766ea689befa5cf09a4353281ce.zip op-kernel-dev-864bfb25b57a6766ea689befa5cf09a4353281ce.tar.gz |
Merge branch 'master' into for-next
Merge with 49717cb ("kthread: Document ways of reducing OS jitter due
to per-CPU kthreads") to be able to apply fixup patch on top of it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Diffstat (limited to 'Documentation')
27 files changed, 373 insertions, 66 deletions
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 31ef8fe..79e789b8 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -217,9 +217,14 @@ over a rather long period of time, but improvements are always welcome! whether the increased speed is worth it. 8. Although synchronize_rcu() is slower than is call_rcu(), it - usually results in simpler code. So, unless update performance - is critically important or the updaters cannot block, - synchronize_rcu() should be used in preference to call_rcu(). + usually results in simpler code. So, unless update performance is + critically important, the updaters cannot block, or the latency of + synchronize_rcu() is visible from userspace, synchronize_rcu() + should be used in preference to call_rcu(). Furthermore, + kfree_rcu() usually results in even simpler code than does + synchronize_rcu() without synchronize_rcu()'s multi-millisecond + latency. So please take advantage of kfree_rcu()'s "fire and + forget" memory-freeing capabilities where it applies. An especially important property of the synchronize_rcu() primitive is that it automatically self-limits: if grace periods @@ -268,7 +273,8 @@ over a rather long period of time, but improvements are always welcome! e. Periodically invoke synchronize_rcu(), permitting a limited number of updates per grace period. - The same cautions apply to call_rcu_bh() and call_rcu_sched(). + The same cautions apply to call_rcu_bh(), call_rcu_sched(), + call_srcu(), and kfree_rcu(). 9. All RCU list-traversal primitives, which include rcu_dereference(), list_for_each_entry_rcu(), and @@ -296,9 +302,9 @@ over a rather long period of time, but improvements are always welcome! all currently executing rcu_read_lock()-protected RCU read-side critical sections complete. It does -not- necessarily guarantee that all currently running interrupts, NMIs, preempt_disable() - code, or idle loops will complete. Therefore, if you do not have - rcu_read_lock()-protected read-side critical sections, do -not- - use synchronize_rcu(). + code, or idle loops will complete. Therefore, if your + read-side critical sections are protected by something other + than rcu_read_lock(), do -not- use synchronize_rcu(). Similarly, disabling preemption is not an acceptable substitute for rcu_read_lock(). Code that attempts to use preemption @@ -401,9 +407,9 @@ over a rather long period of time, but improvements are always welcome! read-side critical sections. It is the responsibility of the RCU update-side primitives to deal with this. -17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and - the __rcu sparse checks to validate your RCU code. These - can help find problems as follows: +17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the + __rcu sparse checks (enabled by CONFIG_SPARSE_RCU_POINTER) to + validate your RCU code. These can help find problems as follows: CONFIG_PROVE_RCU: check that accesses to RCU-protected data structures are carried out under the proper RCU diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt index a102d4b..cd83d23 100644 --- a/Documentation/RCU/lockdep.txt +++ b/Documentation/RCU/lockdep.txt @@ -64,6 +64,11 @@ checking of rcu_dereference() primitives: but retain the compiler constraints that prevent duplicating or coalescsing. This is useful when when testing the value of the pointer itself, for example, against NULL. + rcu_access_index(idx): + Return the value of the index and omit all barriers, but + retain the compiler constraints that prevent duplicating + or coalescsing. This is useful when when testing the + value of the index itself, for example, against -1. The rcu_dereference_check() check expression can be any boolean expression, but would normally include a lockdep expression. However, diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt index 38428c1..2e319d1 100644 --- a/Documentation/RCU/rcubarrier.txt +++ b/Documentation/RCU/rcubarrier.txt @@ -79,7 +79,20 @@ complete. Pseudo-code using rcu_barrier() is as follows: 2. Execute rcu_barrier(). 3. Allow the module to be unloaded. -The rcutorture module makes use of rcu_barrier in its exit function +There are also rcu_barrier_bh(), rcu_barrier_sched(), and srcu_barrier() +functions for the other flavors of RCU, and you of course must match +the flavor of rcu_barrier() with that of call_rcu(). If your module +uses multiple flavors of call_rcu(), then it must also use multiple +flavors of rcu_barrier() when unloading that module. For example, if +it uses call_rcu_bh(), call_srcu() on srcu_struct_1, and call_srcu() on +srcu_struct_2(), then the following three lines of code will be required +when unloading: + + 1 rcu_barrier_bh(); + 2 srcu_barrier(&srcu_struct_1); + 3 srcu_barrier(&srcu_struct_2); + +The rcutorture module makes use of rcu_barrier() in its exit function as follows: 1 static void diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 1927151..e38b8df 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -92,14 +92,14 @@ If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, more information is printed with the stall-warning message, for example: INFO: rcu_preempt detected stall on CPU - 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 + 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543 (t=65000 jiffies) In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is printed: INFO: rcu_preempt detected stall on CPU - 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D (t=65000 jiffies) The "(64628 ticks this GP)" indicates that this CPU has taken more @@ -116,13 +116,28 @@ number between the two "/"s is the value of the nesting, which will be a small positive number if in the idle loop and a very large positive number (as shown above) otherwise. -For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is -not in the process of trying to force itself into dyntick-idle state, the -"." indicates that the CPU has not given up forcing RCU into dyntick-idle -mode (it would be "H" otherwise), and the "timer not pending" indicates -that the CPU has not recently forced RCU into dyntick-idle mode (it -would otherwise indicate the number of microseconds remaining in this -forced state). +The "softirq=" portion of the message tracks the number of RCU softirq +handlers that the stalled CPU has executed. The number before the "/" +is the number that had executed since boot at the time that this CPU +last noted the beginning of a grace period, which might be the current +(stalled) grace period, or it might be some earlier grace period (for +example, if the CPU might have been in dyntick-idle mode for an extended +time period. The number after the "/" is the number that have executed +since boot until the current time. If this latter number stays constant +across repeated stall-warning messages, it is possible that RCU's softirq +handlers are no longer able to execute on this CPU. This can happen if +the stalled CPU is spinning with interrupts are disabled, or, in -rt +kernels, if a high-priority process is starving RCU's softirq handler. + +For CONFIG_RCU_FAST_NO_HZ kernels, the "last_accelerate:" prints the +low-order 16 bits (in hex) of the jiffies counter when this CPU last +invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked +rcu_accelerate_cbs() from rcu_prepare_for_idle(). The "nonlazy_posted:" +prints the number of non-lazy callbacks posted since the last call to +rcu_needs_cpu(). Finally, an "L" indicates that there are currently +no non-lazy callbacks ("." is printed otherwise, as shown above) and +"D" indicates that dyntick-idle processing is enabled ("." is printed +otherwise, for example, if disabled via the "nohz=" kernel boot parameter). Multiple Warnings From One Stall diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 0cc7820..10df0b8 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -265,9 +265,9 @@ rcu_dereference() rcu_read_lock(); p = rcu_dereference(head.next); rcu_read_unlock(); - x = p->address; + x = p->address; /* BUG!!! */ rcu_read_lock(); - y = p->data; + y = p->data; /* BUG!!! */ rcu_read_unlock(); Holding a reference from one RCU read-side critical section diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index c379a2a..aa0c1e6 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -60,8 +60,7 @@ own source tree. For example: "dontdiff" is a list of files which are generated by the kernel during the build process, and should be ignored in any diff(1)-generated patch. The "dontdiff" file is included in the kernel tree in -2.6.12 and later. For earlier kernel versions, you can get it -from <http://www.xenotime.net/linux/doc/dontdiff>. +2.6.12 and later. Make sure your patch does not include any extra files which do not belong in a patch submission. Make sure to review your patch -after- diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt index 56fb62b..b428556 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.txt @@ -30,6 +30,7 @@ The target is named "raid" and it accepts the following parameters: raid10 Various RAID10 inspired algorithms chosen by additional params - RAID10: Striped Mirrors (aka 'Striping on top of mirrors') - RAID1E: Integrated Adjacent Stripe Mirroring + - RAID1E: Integrated Offset Stripe Mirroring - and other similar RAID10 variants Reference: Chapter 4 of @@ -64,15 +65,15 @@ The target is named "raid" and it accepts the following parameters: synchronisation state for each region. [raid10_copies <# copies>] - [raid10_format near] + [raid10_format <near|far|offset>] These two options are used to alter the default layout of a RAID10 configuration. The number of copies is can be - specified, but the default is 2. There are other variations - to how the copies are laid down - the default and only current - option is "near". Near copies are what most people think of - with respect to mirroring. If these options are left - unspecified, or 'raid10_copies 2' and/or 'raid10_format near' - are given, then the layouts for 2, 3 and 4 devices are: + specified, but the default is 2. There are also three + variations to how the copies are laid down - the default + is "near". Near copies are what most people think of with + respect to mirroring. If these options are left unspecified, + or 'raid10_copies 2' and/or 'raid10_format near' are given, + then the layouts for 2, 3 and 4 devices are: 2 drives 3 drives 4 drives -------- ---------- -------------- A1 A1 A1 A1 A2 A1 A1 A2 A2 @@ -85,6 +86,33 @@ The target is named "raid" and it accepts the following parameters: 3-device layout is what might be called a 'RAID1E - Integrated Adjacent Stripe Mirroring'. + If 'raid10_copies 2' and 'raid10_format far', then the layouts + for 2, 3 and 4 devices are: + 2 drives 3 drives 4 drives + -------- -------------- -------------------- + A1 A2 A1 A2 A3 A1 A2 A3 A4 + A3 A4 A4 A5 A6 A5 A6 A7 A8 + A5 A6 A7 A8 A9 A9 A10 A11 A12 + .. .. .. .. .. .. .. .. .. + A2 A1 A3 A1 A2 A2 A1 A4 A3 + A4 A3 A6 A4 A5 A6 A5 A8 A7 + A6 A5 A9 A7 A8 A10 A9 A12 A11 + .. .. .. .. .. .. .. .. .. + + If 'raid10_copies 2' and 'raid10_format offset', then the + layouts for 2, 3 and 4 devices are: + 2 drives 3 drives 4 drives + -------- ------------ ----------------- + A1 A2 A1 A2 A3 A1 A2 A3 A4 + A2 A1 A3 A1 A2 A2 A1 A4 A3 + A3 A4 A4 A5 A6 A5 A6 A7 A8 + A4 A3 A6 A4 A5 A6 A5 A8 A7 + A5 A6 A7 A8 A9 A9 A10 A11 A12 + A6 A5 A9 A7 A8 A10 A9 A12 A11 + .. .. .. .. .. .. .. .. .. + Here we see layouts closely akin to 'RAID1E - Integrated + Offset Stripe Mirroring'. + <#raid_devs>: The number of devices composing the array. Each device consists of two entries. The first is the device containing the metadata (if any); the second is the one containing the @@ -142,3 +170,5 @@ Version History 1.3.0 Added support for RAID 10 1.3.1 Allow device replacement/rebuild for RAID 10 1.3.2 Fix/improve redundancy checking for RAID10 +1.4.0 Non-functional change. Removes arg from mapping function. +1.4.1 Add RAID10 "far" and "offset" algorithm support. diff --git a/Documentation/hwmon/adm1275 b/Documentation/hwmon/adm1275 index 2cfa256..15b4a20 100644 --- a/Documentation/hwmon/adm1275 +++ b/Documentation/hwmon/adm1275 @@ -15,7 +15,7 @@ Supported chips: Addresses scanned: - Datasheet: www.analog.com/static/imported-files/data_sheets/ADM1276.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/adt7410 b/Documentation/hwmon/adt7410 index 9600400..58150c4 100644 --- a/Documentation/hwmon/adt7410 +++ b/Documentation/hwmon/adt7410 @@ -4,9 +4,14 @@ Kernel driver adt7410 Supported chips: * Analog Devices ADT7410 Prefix: 'adt7410' - Addresses scanned: I2C 0x48 - 0x4B + Addresses scanned: None Datasheet: Publicly available at the Analog Devices website http://www.analog.com/static/imported-files/data_sheets/ADT7410.pdf + * Analog Devices ADT7420 + Prefix: 'adt7420' + Addresses scanned: None + Datasheet: Publicly available at the Analog Devices website + http://www.analog.com/static/imported-files/data_sheets/ADT7420.pdf Author: Hartmut Knaack <knaack.h@gmx.de> @@ -27,6 +32,10 @@ value per second or even justget one sample on demand for power saving. Besides, it can completely power down its ADC, if power management is required. +The ADT7420 is register compatible, the only differences being the package, +a slightly narrower operating temperature range (-40°C to +150°C), and a +better accuracy (0.25°C instead of 0.50°C.) + Configuration Notes ------------------- diff --git a/Documentation/hwmon/jc42 b/Documentation/hwmon/jc42 index 1650771..868d74d 100644 --- a/Documentation/hwmon/jc42 +++ b/Documentation/hwmon/jc42 @@ -49,7 +49,7 @@ Supported chips: Addresses scanned: I2C 0x18 - 0x1f Author: - Guenter Roeck <guenter.roeck@ericsson.com> + Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/lineage-pem b/Documentation/hwmon/lineage-pem index 2ba5ed1..83b2ddc 100644 --- a/Documentation/hwmon/lineage-pem +++ b/Documentation/hwmon/lineage-pem @@ -8,7 +8,7 @@ Supported devices: Documentation: http://www.lineagepower.com/oem/pdf/CPLI2C.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/lm25066 b/Documentation/hwmon/lm25066 index a21db81..26025e4 100644 --- a/Documentation/hwmon/lm25066 +++ b/Documentation/hwmon/lm25066 @@ -19,7 +19,7 @@ Supported chips: Datasheet: http://www.national.com/pf/LM/LM5066.html -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978 index c365f9b..e4d75c6 100644 --- a/Documentation/hwmon/ltc2978 +++ b/Documentation/hwmon/ltc2978 @@ -5,13 +5,13 @@ Supported chips: * Linear Technology LTC2978 Prefix: 'ltc2978' Addresses scanned: - - Datasheet: http://cds.linear.com/docs/Datasheet/2978fa.pdf + Datasheet: http://www.linear.com/product/ltc2978 * Linear Technology LTC3880 Prefix: 'ltc3880' Addresses scanned: - - Datasheet: http://cds.linear.com/docs/Datasheet/3880f.pdf + Datasheet: http://www.linear.com/product/ltc3880 -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/ltc4261 b/Documentation/hwmon/ltc4261 index eba2e2c..9378a75 100644 --- a/Documentation/hwmon/ltc4261 +++ b/Documentation/hwmon/ltc4261 @@ -8,7 +8,7 @@ Supported chips: Datasheet: http://cds.linear.com/docs/Datasheet/42612fb.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/max16064 b/Documentation/hwmon/max16064 index f8b4780..d59cc78 100644 --- a/Documentation/hwmon/max16064 +++ b/Documentation/hwmon/max16064 @@ -7,7 +7,7 @@ Supported chips: Addresses scanned: - Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX16064.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/max16065 b/Documentation/hwmon/max16065 index c11f64a..208a29e 100644 --- a/Documentation/hwmon/max16065 +++ b/Documentation/hwmon/max16065 @@ -24,7 +24,7 @@ Supported chips: http://datasheets.maxim-ic.com/en/ds/MAX16070-MAX16071.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/max34440 b/Documentation/hwmon/max34440 index 47651ff..37cbf47 100644 --- a/Documentation/hwmon/max34440 +++ b/Documentation/hwmon/max34440 @@ -27,7 +27,7 @@ Supported chips: Addresses scanned: - Datasheet: http://datasheets.maximintegrated.com/en/ds/MAX34461.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/max8688 b/Documentation/hwmon/max8688 index fe84987..e780786 100644 --- a/Documentation/hwmon/max8688 +++ b/Documentation/hwmon/max8688 @@ -7,7 +7,7 @@ Supported chips: Addresses scanned: - Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX8688.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/pmbus b/Documentation/hwmon/pmbus index 3d3a0f9..cf756ed 100644 --- a/Documentation/hwmon/pmbus +++ b/Documentation/hwmon/pmbus @@ -34,7 +34,7 @@ Supported chips: Addresses scanned: - Datasheet: n.a. -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/smm665 b/Documentation/hwmon/smm665 index 59e3161..a341eee 100644 --- a/Documentation/hwmon/smm665 +++ b/Documentation/hwmon/smm665 @@ -29,7 +29,7 @@ Supported chips: http://www.summitmicro.com/prod_select/summary/SMM766/SMM766_2086.pdf http://www.summitmicro.com/prod_select/summary/SMM766B/SMM766B_2122.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Module Parameters diff --git a/Documentation/hwmon/ucd9000 b/Documentation/hwmon/ucd9000 index 0df5f27..805e33e 100644 --- a/Documentation/hwmon/ucd9000 +++ b/Documentation/hwmon/ucd9000 @@ -11,7 +11,7 @@ Supported chips: http://focus.ti.com/lit/ds/symlink/ucd9090.pdf http://focus.ti.com/lit/ds/symlink/ucd90910.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/ucd9200 b/Documentation/hwmon/ucd9200 index fd7d07b..1e8060e 100644 --- a/Documentation/hwmon/ucd9200 +++ b/Documentation/hwmon/ucd9200 @@ -15,7 +15,7 @@ Supported chips: http://focus.ti.com/lit/ds/symlink/ucd9246.pdf http://focus.ti.com/lit/ds/symlink/ucd9248.pdf -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/hwmon/zl6100 b/Documentation/hwmon/zl6100 index 3d924b6..756b57c 100644 --- a/Documentation/hwmon/zl6100 +++ b/Documentation/hwmon/zl6100 @@ -54,7 +54,7 @@ http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146401 http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146256 -Author: Guenter Roeck <guenter.roeck@ericsson.com> +Author: Guenter Roeck <linux@roeck-us.net> Description diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index da519d3..a9d780e 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2459,9 +2459,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. In kernels built with CONFIG_RCU_NOCB_CPU=y, set the specified list of CPUs to be no-callback CPUs. Invocation of these CPUs' RCU callbacks will - be offloaded to "rcuoN" kthreads created for - that purpose. This reduces OS jitter on the + be offloaded to "rcuox/N" kthreads created for + that purpose, where "x" is "b" for RCU-bh, "p" + for RCU-preempt, and "s" for RCU-sched, and "N" + is the CPU number. This reduces OS jitter on the offloaded CPUs, which can be useful for HPC and + real-time workloads. It can also improve energy efficiency for asymmetric multiprocessors. @@ -2485,6 +2488,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted. leaf rcu_node structure. Useful for very large systems. + rcutree.jiffies_till_first_fqs= [KNL,BOOT] + Set delay from grace-period initialization to + first attempt to force quiescent states. + Units are jiffies, minimum value is zero, + and maximum value is HZ. + + rcutree.jiffies_till_next_fqs= [KNL,BOOT] + Set delay between subsequent attempts to force + quiescent states. Units are jiffies, minimum + value is one, and maximum value is HZ. + rcutree.qhimark= [KNL,BOOT] Set threshold of queued RCU callbacks over which batch limiting is disabled. @@ -2499,16 +2513,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted. rcutree.rcu_cpu_stall_timeout= [KNL,BOOT] Set timeout for RCU CPU stall warning messages. - rcutree.jiffies_till_first_fqs= [KNL,BOOT] - Set delay from grace-period initialization to - first attempt to force quiescent states. - Units are jiffies, minimum value is zero, - and maximum value is HZ. + rcutree.rcu_idle_gp_delay= [KNL,BOOT] + Set wakeup interval for idle CPUs that have + RCU callbacks (RCU_FAST_NO_HZ=y). - rcutree.jiffies_till_next_fqs= [KNL,BOOT] - Set delay between subsequent attempts to force - quiescent states. Units are jiffies, minimum - value is one, and maximum value is HZ. + rcutree.rcu_idle_lazy_gp_delay= [KNL,BOOT] + Set wakeup interval for idle CPUs that have + only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y). + Lazy RCU callbacks are those which RCU can + prove do nothing more than free memory. rcutorture.fqs_duration= [KNL,BOOT] Set duration of force_quiescent_state bursts. diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt new file mode 100644 index 0000000..cbf7ae41 --- /dev/null +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -0,0 +1,202 @@ +REDUCING OS JITTER DUE TO PER-CPU KTHREADS + +This document lists per-CPU kthreads in the Linux kernel and presents +options to control their OS jitter. Note that non-per-CPU kthreads are +not listed here. To reduce OS jitter from non-per-CPU kthreads, bind +them to a "housekeeping" CPU dedicated to such work. + + +REFERENCES + +o Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. + +o Documentation/cgroups: Using cgroups to bind tasks to sets of CPUs. + +o man taskset: Using the taskset command to bind tasks to sets + of CPUs. + +o man sched_setaffinity: Using the sched_setaffinity() system + call to bind tasks to sets of CPUs. + +o /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state, + writing "0" to offline and "1" to online. + +o In order to locate kernel-generated OS jitter on CPU N: + + cd /sys/kernel/debug/tracing + echo 1 > max_graph_depth # Increase the "1" for more detail + echo function_graph > current_tracer + # run workload + cat per_cpu/cpuN/trace + + +KTHREADS + +Name: ehca_comp/%u +Purpose: Periodically process Infiniband-related work. +To reduce its OS jitter, do any of the following: +1. Don't use eHCA Infiniband hardware, instead choosing hardware + that does not require per-CPU kthreads. This will prevent these + kthreads from being created in the first place. (This will + work for most people, as this hardware, though important, is + relatively old and is produced in relatively low unit volumes.) +2. Do all eHCA-Infiniband-related work on other CPUs, including + interrupts. +3. Rework the eHCA driver so that its per-CPU kthreads are + provisioned only on selected CPUs. + + +Name: irq/%d-%s +Purpose: Handle threaded interrupts. +To reduce its OS jitter, do the following: +1. Use irq affinity to force the irq threads to execute on + some other CPU. + +Name: kcmtpd_ctr_%d +Purpose: Handle Bluetooth work. +To reduce its OS jitter, do one of the following: +1. Don't use Bluetooth, in which case these kthreads won't be + created in the first place. +2. Use irq affinity to force Bluetooth-related interrupts to + occur on some other CPU and furthermore initiate all + Bluetooth activity on some other CPU. + +Name: ksoftirqd/%u +Purpose: Execute softirq handlers when threaded or when under heavy load. +To reduce its OS jitter, each softirq vector must be handled +separately as follows: +TIMER_SOFTIRQ: Do all of the following: +1. To the extent possible, keep the CPU out of the kernel when it + is non-idle, for example, by avoiding system calls and by forcing + both kernel threads and interrupts to execute elsewhere. +2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force + the CPU offline, then bring it back online. This forces + recurring timers to migrate elsewhere. If you are concerned + with multiple CPUs, force them all offline before bringing the + first one back online. Once you have onlined the CPUs in question, + do not offline any other CPUs, because doing so could force the + timer back onto one of the CPUs in question. +NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following: +1. Force networking interrupts onto other CPUs. +2. Initiate any network I/O on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +BLOCK_SOFTIRQ: Do all of the following: +1. Force block-device interrupts onto some other CPU. +2. Initiate any block I/O on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +BLOCK_IOPOLL_SOFTIRQ: Do all of the following: +1. Force block-device interrupts onto some other CPU. +2. Initiate any block I/O and block-I/O polling on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +TASKLET_SOFTIRQ: Do one or more of the following: +1. Avoid use of drivers that use tasklets. (Such drivers will contain + calls to things like tasklet_schedule().) +2. Convert all drivers that you must use from tasklets to workqueues. +3. Force interrupts for drivers using tasklets onto other CPUs, + and also do I/O involving these drivers on other CPUs. +SCHED_SOFTIRQ: Do all of the following: +1. Avoid sending scheduler IPIs to the CPU to be de-jittered, + for example, ensure that at most one runnable kthread is present + on that CPU. If a thread that expects to run on the de-jittered + CPU awakens, the scheduler will send an IPI that can result in + a subsequent SCHED_SOFTIRQ. +2. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y, + CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU + to be de-jittered is marked as an adaptive-ticks CPU using the + "nohz_full=" boot parameter. This reduces the number of + scheduler-clock interrupts that the de-jittered CPU receives, + minimizing its chances of being selected to do the load balancing + work that runs in SCHED_SOFTIRQ context. +3. To the extent possible, keep the CPU out of the kernel when it + is non-idle, for example, by avoiding system calls and by + forcing both kernel threads and interrupts to execute elsewhere. + This further reduces the number of scheduler-clock interrupts + received by the de-jittered CPU. +HRTIMER_SOFTIRQ: Do all of the following: +1. To the extent possible, keep the CPU out of the kernel when it + is non-idle. For example, avoid system calls and force both + kernel threads and interrupts to execute elsewhere. +2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the + CPU offline, then bring it back online. This forces recurring + timers to migrate elsewhere. If you are concerned with multiple + CPUs, force them all offline before bringing the first one + back online. Once you have onlined the CPUs in question, do not + offline any other CPUs, because doing so could force the timer + back onto one of the CPUs in question. +RCU_SOFTIRQ: Do at least one of the following: +1. Offload callbacks and keep the CPU in either dyntick-idle or + adaptive-ticks state by doing all of the following: + a. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y, + CONFIG_NO_HZ_FULL=y, and, in addition ensure that the CPU + to be de-jittered is marked as an adaptive-ticks CPU using + the "nohz_full=" boot parameter. Bind the rcuo kthreads + to housekeeping CPUs, which can tolerate OS jitter. + b. To the extent possible, keep the CPU out of the kernel + when it is non-idle, for example, by avoiding system + calls and by forcing both kernel threads and interrupts + to execute elsewhere. +2. Enable RCU to do its processing remotely via dyntick-idle by + doing all of the following: + a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y. + b. Ensure that the CPU goes idle frequently, allowing other + CPUs to detect that it has passed through an RCU quiescent + state. If the kernel is built with CONFIG_NO_HZ_FULL=y, + userspace execution also allows other CPUs to detect that + the CPU in question has passed through a quiescent state. + c. To the extent possible, keep the CPU out of the kernel + when it is non-idle, for example, by avoiding system + calls and by forcing both kernel threads and interrupts + to execute elsewhere. + +Name: rcuc/%u +Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels. +To reduce its OS jitter, do at least one of the following: +1. Build the kernel with CONFIG_PREEMPT=n. This prevents these + kthreads from being created in the first place, and also obviates + the need for RCU priority boosting. This approach is feasible + for workloads that do not require high degrees of responsiveness. +2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these + kthreads from being created in the first place. This approach + is feasible only if your workload never requires RCU priority + boosting, for example, if you ensure frequent idle time on all + CPUs that might execute within the kernel. +3. Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y, + which offloads all RCU callbacks to kthreads that can be moved + off of CPUs susceptible to OS jitter. This approach prevents the + rcuc/%u kthreads from having any work to do, so that they are + never awakened. +4. Ensure that the CPU never enters the kernel, and, in particular, + avoid initiating any CPU hotplug operations on this CPU. This is + another way of preventing any callbacks from being queued on the + CPU, again preventing the rcuc/%u kthreads from having any work + to do. + +Name: rcuob/%d, rcuop/%d, and rcuos/%d +Purpose: Offload RCU callbacks from the corresponding CPU. +To reduce its OS jitter, do at least one of the following: +1. Use affinity, cgroups, or other mechanism to force these kthreads + to execute on some other CPU. +2. Build with CONFIG_RCU_NOCB_CPUS=n, which will prevent these + kthreads from being created in the first place. However, please + note that this will not eliminate OS jitter, but will instead + shift it to RCU_SOFTIRQ. + +Name: watchdog/%u +Purpose: Detect software lockups on each CPU. +To reduce its OS jitter, do at least one of the following: +1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these + kthreads from being created in the first place. +2. Echo a zero to /proc/sys/kernel/watchdog to disable the + watchdog timer. +3. Echo a large number of /proc/sys/kernel/watchdog_thresh in + order to reduce the frequency of OS jitter due to the watchdog + timer down to a level that is acceptable for your workload. diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt index 3035d00..425c51d 100644 --- a/Documentation/power/opp.txt +++ b/Documentation/power/opp.txt @@ -1,6 +1,5 @@ -*=============* -* OPP Library * -*=============* +Operating Performance Points (OPP) Library +========================================== (C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated @@ -16,15 +15,31 @@ Contents 1. Introduction =============== +1.1 What is an Operating Performance Point (OPP)? + Complex SoCs of today consists of a multiple sub-modules working in conjunction. In an operational system executing varied use cases, not all modules in the SoC need to function at their highest performing frequency all the time. To facilitate this, sub-modules in a SoC are grouped into domains, allowing some -domains to run at lower voltage and frequency while other domains are loaded -more. The set of discrete tuples consisting of frequency and voltage pairs that +domains to run at lower voltage and frequency while other domains run at +voltage/frequency pairs that are higher. + +The set of discrete tuples consisting of frequency and voltage pairs that the device will support per domain are called Operating Performance Points or OPPs. +As an example: +Let us consider an MPU device which supports the following: +{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V}, +{1GHz at minimum voltage of 1.3V} + +We can represent these as three OPPs as the following {Hz, uV} tuples: +{300000000, 1000000} +{800000000, 1200000} +{1000000000, 1300000} + +1.2 Operating Performance Points Library + OPP library provides a set of helper functions to organize and query the OPP information. The library is located in drivers/base/power/opp.c and the header is located in include/linux/opp.h. OPP library can be enabled by enabling diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index e8a6aa4..6e95356 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -170,5 +170,5 @@ Reminder: sizeof() result is of type size_t. Thank you for your cooperation and attention. -By Randy Dunlap <rdunlap@xenotime.net> and +By Randy Dunlap <rdunlap@infradead.org> and Andrew Murray <amurray@mpc-data.co.uk> |