diff options
-rw-r--r-- | Documentation/RCU/trace.txt | 254 | ||||
-rw-r--r-- | Documentation/RCU/whatisRCU.txt | 2 | ||||
-rw-r--r-- | include/linux/hardirq.h | 24 | ||||
-rw-r--r-- | include/linux/rcupdate.h | 6 | ||||
-rw-r--r-- | include/linux/rcutiny.h | 95 | ||||
-rw-r--r-- | include/linux/srcu.h | 1 | ||||
-rw-r--r-- | init/Kconfig | 9 | ||||
-rw-r--r-- | kernel/Makefile | 1 | ||||
-rw-r--r-- | kernel/rcupdate.c | 4 | ||||
-rw-r--r-- | kernel/rcutiny.c | 291 | ||||
-rw-r--r-- | kernel/rcutorture.c | 31 | ||||
-rw-r--r-- | kernel/rcutree.c | 7 | ||||
-rw-r--r-- | kernel/rcutree_trace.c | 8 | ||||
-rw-r--r-- | kernel/softirq.c | 2 | ||||
-rw-r--r-- | kernel/srcu.c | 74 |
15 files changed, 570 insertions, 239 deletions
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 187bbf1..8608fd8 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -1,185 +1,10 @@ CONFIG_RCU_TRACE debugfs Files and Formats -The rcupreempt and rcutree implementations of RCU provide debugfs trace -output that summarizes counters and state. This information is useful for -debugging RCU itself, and can sometimes also help to debug abuses of RCU. -Note that the rcuclassic implementation of RCU does not provide debugfs -trace output. - -The following sections describe the debugfs files and formats for -preemptable RCU (rcupreempt) and hierarchical RCU (rcutree). - - -Preemptable RCU debugfs Files and Formats - -This implementation of RCU provides three debugfs files under the -top-level directory RCU: rcu/rcuctrs (which displays the per-CPU -counters used by preemptable RCU) rcu/rcugp (which displays grace-period -counters), and rcu/rcustats (which internal counters for debugging RCU). - -The output of "cat rcu/rcuctrs" looks as follows: - -CPU last cur F M - 0 5 -5 0 0 - 1 -1 0 0 0 - 2 0 1 0 0 - 3 0 1 0 0 - 4 0 1 0 0 - 5 0 1 0 0 - 6 0 2 0 0 - 7 0 -1 0 0 - 8 0 1 0 0 -ggp = 26226, state = waitzero - -The per-CPU fields are as follows: - -o "CPU" gives the CPU number. Offline CPUs are not displayed. - -o "last" gives the value of the counter that is being decremented - for the current grace period phase. In the example above, - the counters sum to 4, indicating that there are still four - RCU read-side critical sections still running that started - before the last counter flip. - -o "cur" gives the value of the counter that is currently being - both incremented (by rcu_read_lock()) and decremented (by - rcu_read_unlock()). In the example above, the counters sum to - 1, indicating that there is only one RCU read-side critical section - still running that started after the last counter flip. - -o "F" indicates whether RCU is waiting for this CPU to acknowledge - a counter flip. In the above example, RCU is not waiting on any, - which is consistent with the state being "waitzero" rather than - "waitack". - -o "M" indicates whether RCU is waiting for this CPU to execute a - memory barrier. In the above example, RCU is not waiting on any, - which is consistent with the state being "waitzero" rather than - "waitmb". - -o "ggp" is the global grace-period counter. - -o "state" is the RCU state, which can be one of the following: - - o "idle": there is no grace period in progress. - - o "waitack": RCU just incremented the global grace-period - counter, which has the effect of reversing the roles of - the "last" and "cur" counters above, and is waiting for - all the CPUs to acknowledge the flip. Once the flip has - been acknowledged, CPUs will no longer be incrementing - what are now the "last" counters, so that their sum will - decrease monotonically down to zero. - - o "waitzero": RCU is waiting for the sum of the "last" counters - to decrease to zero. - - o "waitmb": RCU is waiting for each CPU to execute a memory - barrier, which ensures that instructions from a given CPU's - last RCU read-side critical section cannot be reordered - with instructions following the memory-barrier instruction. - -The output of "cat rcu/rcugp" looks as follows: - -oldggp=48870 newggp=48873 - -Note that reading from this file provokes a synchronize_rcu(). The -"oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before -executing the synchronize_rcu(), and the "newggp" value is also the -"ggp" value, but taken after the synchronize_rcu() command returns. - - -The output of "cat rcu/rcugp" looks as follows: - -na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871 -1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640 -z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639 - -These are counters tracking internal preemptable-RCU events, however, -some of them may be useful for debugging algorithms using RCU. In -particular, the "nl", "wl", and "dl" values track the number of RCU -callbacks in various states. The fields are as follows: - -o "na" is the total number of RCU callbacks that have been enqueued - since boot. - -o "nl" is the number of RCU callbacks waiting for the previous - grace period to end so that they can start waiting on the next - grace period. - -o "wa" is the total number of RCU callbacks that have started waiting - for a grace period since boot. "na" should be roughly equal to - "nl" plus "wa". - -o "wl" is the number of RCU callbacks currently waiting for their - grace period to end. - -o "da" is the total number of RCU callbacks whose grace periods - have completed since boot. "wa" should be roughly equal to - "wl" plus "da". - -o "dr" is the total number of RCU callbacks that have been removed - from the list of callbacks ready to invoke. "dr" should be roughly - equal to "da". - -o "di" is the total number of RCU callbacks that have been invoked - since boot. "di" should be roughly equal to "da", though some - early versions of preemptable RCU had a bug so that only the - last CPU's count of invocations was displayed, rather than the - sum of all CPU's counts. - -o "1" is the number of calls to rcu_try_flip(). This should be - roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1" - described below. In other words, the number of times that - the state machine is visited should be equal to the sum of the - number of times that each state is visited plus the number of - times that the state-machine lock acquisition failed. - -o "e1" is the number of times that rcu_try_flip() was unable to - acquire the fliplock. - -o "i1" is the number of calls to rcu_try_flip_idle(). - -o "ie1" is the number of times rcu_try_flip_idle() exited early - due to the calling CPU having no work for RCU. - -o "g1" is the number of times that rcu_try_flip_idle() decided - to start a new grace period. "i1" should be roughly equal to - "ie1" plus "g1". - -o "a1" is the number of calls to rcu_try_flip_waitack(). - -o "ae1" is the number of times that rcu_try_flip_waitack() found - that at least one CPU had not yet acknowledge the new grace period - (AKA "counter flip"). - -o "a2" is the number of time rcu_try_flip_waitack() found that - all CPUs had acknowledged. "a1" should be roughly equal to - "ae1" plus "a2". (This particular output was collected on - a 128-CPU machine, hence the smaller-than-usual fraction of - calls to rcu_try_flip_waitack() finding all CPUs having already - acknowledged.) - -o "z1" is the number of calls to rcu_try_flip_waitzero(). - -o "ze1" is the number of times that rcu_try_flip_waitzero() found - that not all of the old RCU read-side critical sections had - completed. - -o "z2" is the number of times that rcu_try_flip_waitzero() finds - the sum of the counters equal to zero, in other words, that - all of the old RCU read-side critical sections had completed. - The value of "z1" should be roughly equal to "ze1" plus - "z2". - -o "m1" is the number of calls to rcu_try_flip_waitmb(). - -o "me1" is the number of times that rcu_try_flip_waitmb() finds - that at least one CPU has not yet executed a memory barrier. - -o "m2" is the number of times that rcu_try_flip_waitmb() finds that - all CPUs have executed a memory barrier. +The rcutree implementation of RCU provides debugfs trace output that +summarizes counters and state. This information is useful for debugging +RCU itself, and can sometimes also help to debug abuses of RCU. +The following sections describe the debugfs files and formats. Hierarchical RCU debugfs Files and Formats @@ -210,9 +35,10 @@ rcu_bh: 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 -The first section lists the rcu_data structures for rcu, the second for -rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system. -The fields are as follows: +The first section lists the rcu_data structures for rcu_sched, the second +for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an +additional section for rcu_preempt. Each section has one line per CPU, +or eight for this 8-CPU system. The fields are as follows: o The number at the beginning of each line is the CPU number. CPUs numbers followed by an exclamation mark are offline, @@ -223,9 +49,9 @@ o The number at the beginning of each line is the CPU number. o "c" is the count of grace periods that this CPU believes have completed. CPUs in dynticks idle mode may lag quite a ways - behind, for example, CPU 4 under "rcu" above, which has slept - through the past 25 RCU grace periods. It is not unusual to - see CPUs lagging by thousands of grace periods. + behind, for example, CPU 4 under "rcu_sched" above, which has + slept through the past 25 RCU grace periods. It is not unusual + to see CPUs lagging by thousands of grace periods. o "g" is the count of grace periods that this CPU believes have started. Again, CPUs in dynticks idle mode may lag behind. @@ -308,8 +134,10 @@ The output of "cat rcu/rcugp" looks as follows: rcu_sched: completed=33062 gpnum=33063 rcu_bh: completed=464 gpnum=464 -Again, this output is for both "rcu" and "rcu_bh". The fields are -taken from the rcu_state structure, and are as follows: +Again, this output is for both "rcu_sched" and "rcu_bh". Note that +kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional +"rcu_preempt" line. The fields are taken from the rcu_state structure, +and are as follows: o "completed" is the number of grace periods that have completed. It is comparable to the "c" field from rcu/rcudata in that a @@ -324,23 +152,24 @@ o "gpnum" is the number of grace periods that have started. It is If these two fields are equal (as they are for "rcu_bh" above), then there is no grace period in progress, in other words, RCU is idle. On the other hand, if the two fields differ (as they - do for "rcu" above), then an RCU grace period is in progress. + do for "rcu_sched" above), then an RCU grace period is in progress. The output of "cat rcu/rcuhier" looks as follows, with very long lines: -c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 -1/1 0:127 ^0 -3/3 0:35 ^0 0/0 36:71 ^1 0/0 72:107 ^2 0/0 108:127 ^3 -3/3f 0:5 ^0 2/3 6:11 ^1 0/0 12:17 ^2 0/0 18:23 ^3 0/0 24:29 ^4 0/0 30:35 ^5 0/0 36:41 ^0 0/0 42:47 ^1 0/0 48:53 ^2 0/0 54:59 ^3 0/0 60:65 ^4 0/0 66:71 ^5 0/0 72:77 ^0 0/0 78:83 ^1 0/0 84:89 ^2 0/0 90:95 ^3 0/0 96:101 ^4 0/0 102:107 ^5 0/0 108:113 ^0 0/0 114:119 ^1 0/0 120:125 ^2 0/0 126:127 ^3 +c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0 +1/1 .>. 0:127 ^0 +3/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 +3/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 rcu_bh: -c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 -0/1 0:127 ^0 -0/3 0:35 ^0 0/0 36:71 ^1 0/0 72:107 ^2 0/0 108:127 ^3 -0/3f 0:5 ^0 0/3 6:11 ^1 0/0 12:17 ^2 0/0 18:23 ^3 0/0 24:29 ^4 0/0 30:35 ^5 0/0 36:41 ^0 0/0 42:47 ^1 0/0 48:53 ^2 0/0 54:59 ^3 0/0 60:65 ^4 0/0 66:71 ^5 0/0 72:77 ^0 0/0 78:83 ^1 0/0 84:89 ^2 0/0 90:95 ^3 0/0 96:101 ^4 0/0 102:107 ^5 0/0 108:113 ^0 0/0 114:119 ^1 0/0 120:125 ^2 0/0 126:127 ^3 +c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0 +0/1 .>. 0:127 ^0 +0/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 +0/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 -This is once again split into "rcu" and "rcu_bh" portions. The fields are -as follows: +This is once again split into "rcu_sched" and "rcu_bh" portions, +and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional +"rcu_preempt" section. The fields are as follows: o "c" is exactly the same as "completed" under rcu/rcugp. @@ -372,6 +201,11 @@ o "fqlh" is the number of calls to force_quiescent_state() that exited immediately (without even being counted in nfqs above) due to contention on ->fqslock. +o "oqlen" is the number of callbacks on the "orphan" callback + list. RCU callbacks are placed on this list by CPUs going + offline, and are "adopted" either by the CPU helping the outgoing + CPU or by the next rcu_barrier*() call, whichever comes first. + o Each element of the form "1/1 0:127 ^0" represents one struct rcu_node. Each line represents one level of the hierarchy, from root to leaves. It is best to think of the rcu_data structures @@ -379,7 +213,7 @@ o Each element of the form "1/1 0:127 ^0" represents one struct might be either one, two, or three levels of rcu_node structures, depending on the relationship between CONFIG_RCU_FANOUT and CONFIG_NR_CPUS. - + o The numbers separated by the "/" are the qsmask followed by the qsmaskinit. The qsmask will have one bit set for each entity in the next lower level that @@ -389,10 +223,19 @@ o Each element of the form "1/1 0:127 ^0" represents one struct The value of qsmaskinit is assigned to that of qsmask at the beginning of each grace period. - For example, for "rcu", the qsmask of the first entry - of the lowest level is 0x14, meaning that we are still - waiting for CPUs 2 and 4 to check in for the current - grace period. + For example, for "rcu_sched", the qsmask of the first + entry of the lowest level is 0x14, meaning that we + are still waiting for CPUs 2 and 4 to check in for the + current grace period. + + o The characters separated by the ">" indicate the state + of the blocked-tasks lists. A "T" preceding the ">" + indicates that at least one task blocked in an RCU + read-side critical section blocks the current grace + period, while a "." preceding the ">" indicates otherwise. + The character following the ">" indicates similarly for + the next grace period. A "T" should appear in this + field only for rcu-preempt. o The numbers separated by the ":" are the range of CPUs served by this struct rcu_node. This can be helpful @@ -431,8 +274,9 @@ rcu_bh: 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542 -As always, this is once again split into "rcu" and "rcu_bh" portions. -The fields are as follows: +As always, this is once again split into "rcu_sched" and "rcu_bh" +portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional +"rcu_preempt" section. The fields are as follows: o "np" is the number of times that __rcu_pending() has been invoked for the corresponding flavor of RCU. diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index e41a7fe..d542ca2 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -830,7 +830,7 @@ sched: Critical sections Grace period Barrier SRCU: Critical sections Grace period Barrier srcu_read_lock synchronize_srcu N/A - srcu_read_unlock + srcu_read_unlock synchronize_srcu_expedited SRCU: Initialization/cleanup init_srcu_struct diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index 6d527ee..d5b3876 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -139,10 +139,34 @@ static inline void account_system_vtime(struct task_struct *tsk) #endif #if defined(CONFIG_NO_HZ) +#if defined(CONFIG_TINY_RCU) +extern void rcu_enter_nohz(void); +extern void rcu_exit_nohz(void); + +static inline void rcu_irq_enter(void) +{ + rcu_exit_nohz(); +} + +static inline void rcu_irq_exit(void) +{ + rcu_enter_nohz(); +} + +static inline void rcu_nmi_enter(void) +{ +} + +static inline void rcu_nmi_exit(void) +{ +} + +#else extern void rcu_irq_enter(void); extern void rcu_irq_exit(void); extern void rcu_nmi_enter(void); extern void rcu_nmi_exit(void); +#endif #else # define rcu_irq_enter() do { } while (0) # define rcu_irq_exit() do { } while (0) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3ebd0b7..2f1bc42 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -68,11 +68,17 @@ extern int sched_expedited_torture_stats(char *page); /* Internal to kernel */ extern void rcu_init(void); extern void rcu_scheduler_starting(void); +#ifndef CONFIG_TINY_RCU extern int rcu_needs_cpu(int cpu); +#else +static inline int rcu_needs_cpu(int cpu) { return 0; } +#endif extern int rcu_scheduler_active; #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU) #include <linux/rcutree.h> +#elif defined(CONFIG_TINY_RCU) +#include <linux/rcutiny.h> #else #error "Unknown RCU implementation specified to kernel configuration" #endif diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h new file mode 100644 index 0000000..2c1fe83 --- /dev/null +++ b/include/linux/rcutiny.h @@ -0,0 +1,95 @@ +/* + * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright IBM Corporation, 2008 + * + * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> + * + * For detailed explanation of Read-Copy Update mechanism see - + * Documentation/RCU + */ +#ifndef __LINUX_TINY_H +#define __LINUX_TINY_H + +#include <linux/cache.h> + +void rcu_sched_qs(int cpu); +void rcu_bh_qs(int cpu); + +#define __rcu_read_lock() preempt_disable() +#define __rcu_read_unlock() preempt_enable() +#define __rcu_read_lock_bh() local_bh_disable() +#define __rcu_read_unlock_bh() local_bh_enable() +#define call_rcu_sched call_rcu + +#define rcu_init_sched() do { } while (0) +extern void rcu_check_callbacks(int cpu, int user); +extern void __rcu_init(void); + +/* + * Return the number of grace periods. + */ +static inline long rcu_batches_completed(void) +{ + return 0; +} + +/* + * Return the number of bottom-half grace periods. + */ +static inline long rcu_batches_completed_bh(void) +{ + return 0; +} + +extern int rcu_expedited_torture_stats(char *page); + +static inline void synchronize_rcu_expedited(void) +{ + synchronize_sched(); +} + +static inline void synchronize_rcu_bh_expedited(void) +{ + synchronize_sched(); +} + +struct notifier_block; +extern int rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu); + +#ifdef CONFIG_NO_HZ + +extern void rcu_enter_nohz(void); +extern void rcu_exit_nohz(void); + +#else /* #ifdef CONFIG_NO_HZ */ + +static inline void rcu_enter_nohz(void) +{ +} + +static inline void rcu_exit_nohz(void) +{ +} + +#endif /* #else #ifdef CONFIG_NO_HZ */ + +static inline void exit_rcu(void) +{ +} + +#endif /* __LINUX_RCUTINY_H */ diff --git a/include/linux/srcu.h b/include/linux/srcu.h index aca0eee..4765d97 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -48,6 +48,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp); int srcu_read_lock(struct srcu_struct *sp) __acquires(sp); void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp); void synchronize_srcu(struct srcu_struct *sp); +void synchronize_srcu_expedited(struct srcu_struct *sp); long srcu_batches_completed(struct srcu_struct *sp); #endif diff --git a/init/Kconfig b/init/Kconfig index 09c5c64..d367bf6 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -334,6 +334,15 @@ config TREE_PREEMPT_RCU is also required. It also scales down nicely to smaller systems. +config TINY_RCU + bool "UP-only small-memory-footprint RCU" + depends on !SMP + help + This option selects the RCU implementation that is + designed for UP systems from which real-time response + is not required. This option greatly reduces the + memory footprint of RCU. + endchoice config RCU_TRACE diff --git a/kernel/Makefile b/kernel/Makefile index b8d4cd8..e9a0e6e 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -82,6 +82,7 @@ obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o obj-$(CONFIG_TREE_RCU) += rcutree.o obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o +obj-$(CONFIG_TINY_RCU) += rcutiny.o obj-$(CONFIG_RELAY) += relay.o obj-$(CONFIG_SYSCTL) += utsname_sysctl.o obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index 4001833..7625f20 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -67,6 +67,8 @@ void wakeme_after_rcu(struct rcu_head *head) complete(&rcu->completion); } +#ifndef CONFIG_TINY_RCU + #ifdef CONFIG_TREE_PREEMPT_RCU /** @@ -157,6 +159,8 @@ void synchronize_rcu_bh(void) } EXPORT_SYMBOL_GPL(synchronize_rcu_bh); +#endif /* #ifndef CONFIG_TINY_RCU */ + static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self, unsigned long action, void *hcpu) { diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c new file mode 100644 index 0000000..b33ec3aa --- /dev/null +++ b/kernel/rcutiny.c @@ -0,0 +1,291 @@ +/* + * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright IBM Corporation, 2008 + * + * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> + * + * For detailed explanation of Read-Copy Update mechanism see - + * Documentation/RCU + */ +#include <linux/moduleparam.h> +#include <linux/completion.h> +#include <linux/interrupt.h> +#include <linux/notifier.h> +#include <linux/rcupdate.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/sched.h> +#include <linux/types.h> +#include <linux/init.h> +#include <linux/time.h> +#include <linux/cpu.h> + +/* Global control variables for rcupdate callback mechanism. */ +struct rcu_ctrlblk { + struct rcu_head *rcucblist; /* List of pending callbacks (CBs). */ + struct rcu_head **donetail; /* ->next pointer of last "done" CB. */ + struct rcu_head **curtail; /* ->next pointer of last CB. */ +}; + +/* Definition for rcupdate control block. */ +static struct rcu_ctrlblk rcu_ctrlblk = { + .donetail = &rcu_ctrlblk.rcucblist, + .curtail = &rcu_ctrlblk.rcucblist, +}; + +static struct rcu_ctrlblk rcu_bh_ctrlblk = { + .donetail = &rcu_bh_ctrlblk.rcucblist, + .curtail = &rcu_bh_ctrlblk.rcucblist, +}; + +#ifdef CONFIG_NO_HZ + +static long rcu_dynticks_nesting = 1; + +/* + * Enter dynticks-idle mode, which is an extended quiescent state + * if we have fully entered that mode (i.e., if the new value of + * dynticks_nesting is zero). + */ +void rcu_enter_nohz(void) +{ + if (--rcu_dynticks_nesting == 0) + rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */ +} + +/* + * Exit dynticks-idle mode, so that we are no longer in an extended + * quiescent state. + */ +void rcu_exit_nohz(void) +{ + rcu_dynticks_nesting++; +} + +#endif /* #ifdef CONFIG_NO_HZ */ + +/* + * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc(). + * Also disable irqs to avoid confusion due to interrupt handlers + * invoking call_rcu(). + */ +static int rcu_qsctr_help(struct rcu_ctrlblk *rcp) +{ + unsigned long flags; + + local_irq_save(flags); + if (rcp->rcucblist != NULL && + rcp->donetail != rcp->curtail) { + rcp->donetail = rcp->curtail; + local_irq_restore(flags); + return 1; + } + local_irq_restore(flags); + + return 0; +} + +/* + * Record an rcu quiescent state. And an rcu_bh quiescent state while we + * are at it, given that any rcu quiescent state is also an rcu_bh + * quiescent state. Use "+" instead of "||" to defeat short circuiting. + */ +void rcu_sched_qs(int cpu) +{ + if (rcu_qsctr_help(&rcu_ctrlblk) + rcu_qsctr_help(&rcu_bh_ctrlblk)) + raise_softirq(RCU_SOFTIRQ); +} + +/* + * Record an rcu_bh quiescent state. + */ +void rcu_bh_qs(int cpu) +{ + if (rcu_qsctr_help(&rcu_bh_ctrlblk)) + raise_softirq(RCU_SOFTIRQ); +} + +/* + * Check to see if the scheduling-clock interrupt came from an extended + * quiescent state, and, if so, tell RCU about it. + */ +void rcu_check_callbacks(int cpu, int user) +{ + if (user || + (idle_cpu(cpu) && + !in_softirq() && + hardirq_count() <= (1 << HARDIRQ_SHIFT))) + rcu_sched_qs(cpu); + else if (!in_softirq()) + rcu_bh_qs(cpu); +} + +/* + * Helper function for rcu_process_callbacks() that operates on the + * specified rcu_ctrlkblk structure. + */ +static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp) +{ + struct rcu_head *next, *list; + unsigned long flags; + + /* If no RCU callbacks ready to invoke, just return. */ + if (&rcp->rcucblist == rcp->donetail) + return; + + /* Move the ready-to-invoke callbacks to a local list. */ + local_irq_save(flags); + list = rcp->rcucblist; + rcp->rcucblist = *rcp->donetail; + *rcp->donetail = NULL; + if (rcp->curtail == rcp->donetail) + rcp->curtail = &rcp->rcucblist; + rcp->donetail = &rcp->rcucblist; + local_irq_restore(flags); + + /* Invoke the callbacks on the local list. */ + while (list) { + next = list->next; + prefetch(next); + list->func(list); + list = next; + } +} + +/* + * Invoke any callbacks whose grace period has completed. + */ +static void rcu_process_callbacks(struct softirq_action *unused) +{ + __rcu_process_callbacks(&rcu_ctrlblk); + __rcu_process_callbacks(&rcu_bh_ctrlblk); +} + +/* + * Null function to handle CPU being onlined. Longer term, we want to + * make TINY_RCU avoid using rcupdate.c, but later... + */ +int rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) +{ + return NOTIFY_OK; +} + +/* + * Wait for a grace period to elapse. But it is illegal to invoke + * synchronize_sched() from within an RCU read-side critical section. + * Therefore, any legal call to synchronize_sched() is a quiescent + * state, and so on a UP system, synchronize_sched() need do nothing. + * Ditto for synchronize_rcu_bh(). (But Lai Jiangshan points out the + * benefits of doing might_sleep() to reduce latency.) + * + * Cool, huh? (Due to Josh Triplett.) + * + * But we want to make this a static inline later. + */ +void synchronize_sched(void) +{ + cond_resched(); +} +EXPORT_SYMBOL_GPL(synchronize_sched); + +void synchronize_rcu_bh(void) +{ + synchronize_sched(); +} +EXPORT_SYMBOL_GPL(synchronize_rcu_bh); + +/* + * Helper function for call_rcu() and call_rcu_bh(). + */ +static void __call_rcu(struct rcu_head *head, + void (*func)(struct rcu_head *rcu), + struct rcu_ctrlblk *rcp) +{ + unsigned long flags; + + head->func = func; + head->next = NULL; + + local_irq_save(flags); + *rcp->curtail = head; + rcp->curtail = &head->next; + local_irq_restore(flags); +} + +/* + * Post an RCU callback to be invoked after the end of an RCU grace + * period. But since we have but one CPU, that would be after any + * quiescent state. + */ +void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu)) +{ + __call_rcu(head, func, &rcu_ctrlblk); +} +EXPORT_SYMBOL_GPL(call_rcu); + +/* + * Post an RCU bottom-half callback to be invoked after any subsequent + * quiescent state. + */ +void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu)) +{ + __call_rcu(head, func, &rcu_bh_ctrlblk); +} +EXPORT_SYMBOL_GPL(call_rcu_bh); + +void rcu_barrier(void) +{ + struct rcu_synchronize rcu; + + init_completion(&rcu.completion); + /* Will wake me after RCU finished. */ + call_rcu(&rcu.head, wakeme_after_rcu); + /* Wait for it. */ + wait_for_completion(&rcu.completion); +} +EXPORT_SYMBOL_GPL(rcu_barrier); + +void rcu_barrier_bh(void) +{ + struct rcu_synchronize rcu; + + init_completion(&rcu.completion); + /* Will wake me after RCU finished. */ + call_rcu_bh(&rcu.head, wakeme_after_rcu); + /* Wait for it. */ + wait_for_completion(&rcu.completion); +} +EXPORT_SYMBOL_GPL(rcu_barrier_bh); + +void rcu_barrier_sched(void) +{ + struct rcu_synchronize rcu; + + init_completion(&rcu.completion); + /* Will wake me after RCU finished. */ + call_rcu_sched(&rcu.head, wakeme_after_rcu); + /* Wait for it. */ + wait_for_completion(&rcu.completion); +} +EXPORT_SYMBOL_GPL(rcu_barrier_sched); + +void __rcu_init(void) +{ + open_softirq(RCU_SOFTIRQ, rcu_process_callbacks); +} diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c index 697c0a0..3dd0ca2 100644 --- a/kernel/rcutorture.c +++ b/kernel/rcutorture.c @@ -547,6 +547,25 @@ static struct rcu_torture_ops srcu_ops = { .name = "srcu" }; +static void srcu_torture_synchronize_expedited(void) +{ + synchronize_srcu_expedited(&srcu_ctl); +} + +static struct rcu_torture_ops srcu_expedited_ops = { + .init = srcu_torture_init, + .cleanup = srcu_torture_cleanup, + .readlock = srcu_torture_read_lock, + .read_delay = srcu_read_delay, + .readunlock = srcu_torture_read_unlock, + .completed = srcu_torture_completed, + .deferred_free = rcu_sync_torture_deferred_free, + .sync = srcu_torture_synchronize_expedited, + .cb_barrier = NULL, + .stats = srcu_torture_stats, + .name = "srcu_expedited" +}; + /* * Definitions for sched torture testing. */ @@ -592,7 +611,7 @@ static struct rcu_torture_ops sched_ops = { .name = "sched" }; -static struct rcu_torture_ops sched_ops_sync = { +static struct rcu_torture_ops sched_sync_ops = { .init = rcu_sync_torture_init, .cleanup = NULL, .readlock = sched_torture_read_lock, @@ -1098,8 +1117,8 @@ rcu_torture_init(void) int firsterr = 0; static struct rcu_torture_ops *torture_ops[] = { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops, - &sched_expedited_ops, - &srcu_ops, &sched_ops, &sched_ops_sync, }; + &srcu_ops, &srcu_expedited_ops, + &sched_ops, &sched_sync_ops, &sched_expedited_ops, }; mutex_lock(&fullstop_mutex); @@ -1110,8 +1129,12 @@ rcu_torture_init(void) break; } if (i == ARRAY_SIZE(torture_ops)) { - printk(KERN_ALERT "rcutorture: invalid torture type: \"%s\"\n", + printk(KERN_ALERT "rcu-torture: invalid torture type: \"%s\"\n", torture_type); + printk(KERN_ALERT "rcu-torture types:"); + for (i = 0; i < ARRAY_SIZE(torture_ops); i++) + printk(KERN_ALERT " %s", torture_ops[i]->name); + printk(KERN_ALERT "\n"); mutex_unlock(&fullstop_mutex); return -EINVAL; } diff --git a/kernel/rcutree.c b/kernel/rcutree.c index f3077c0..69f4efe 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -51,6 +51,8 @@ /* Data structures. */ +static struct lock_class_key rcu_root_class; + #define RCU_STATE_INITIALIZER(name) { \ .level = { &name.node[0] }, \ .levelcnt = { \ @@ -1685,8 +1687,7 @@ static void __init rcu_init_one(struct rcu_state *rsp) cpustride *= rsp->levelspread[i]; rnp = rsp->level[i]; for (j = 0; j < rsp->levelcnt[i]; j++, rnp++) { - if (rnp != rcu_get_root(rsp)) - spin_lock_init(&rnp->lock); + spin_lock_init(&rnp->lock); rnp->gpnum = 0; rnp->qsmask = 0; rnp->qsmaskinit = 0; @@ -1709,7 +1710,7 @@ static void __init rcu_init_one(struct rcu_state *rsp) INIT_LIST_HEAD(&rnp->blocked_tasks[1]); } } - spin_lock_init(&rcu_get_root(rsp)->lock); + lockdep_set_class(&rcu_get_root(rsp)->lock, &rcu_root_class); } /* diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c index 4b31c77..1984cdc 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcutree_trace.c @@ -155,12 +155,14 @@ static const struct file_operations rcudata_csv_fops = { static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp) { + long gpnum; int level = 0; struct rcu_node *rnp; + gpnum = rsp->gpnum; seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x " "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n", - rsp->completed, rsp->gpnum, rsp->signaled, + rsp->completed, gpnum, rsp->signaled, (long)(rsp->jiffies_force_qs - jiffies), (int)(jiffies & 0xffff), rsp->n_force_qs, rsp->n_force_qs_ngp, @@ -171,8 +173,10 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp) seq_puts(m, "\n"); level = rnp->level; } - seq_printf(m, "%lx/%lx %d:%d ^%d ", + seq_printf(m, "%lx/%lx %c>%c %d:%d ^%d ", rnp->qsmask, rnp->qsmaskinit, + "T."[list_empty(&rnp->blocked_tasks[gpnum & 1])], + "T."[list_empty(&rnp->blocked_tasks[!(gpnum & 1)])], rnp->grplo, rnp->grphi, rnp->grpnum); } seq_puts(m, "\n"); diff --git a/kernel/softirq.c b/kernel/softirq.c index f8749e5..21939d9 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -302,9 +302,9 @@ void irq_exit(void) if (!in_interrupt() && local_softirq_pending()) invoke_softirq(); + rcu_irq_exit(); #ifdef CONFIG_NO_HZ /* Make sure that timer wheel updates are propagated */ - rcu_irq_exit(); if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched()) tick_nohz_stop_sched_tick(0); #endif diff --git a/kernel/srcu.c b/kernel/srcu.c index b0aeeaf..818d7d9 100644 --- a/kernel/srcu.c +++ b/kernel/srcu.c @@ -49,6 +49,7 @@ int init_srcu_struct(struct srcu_struct *sp) sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array); return (sp->per_cpu_ref ? 0 : -ENOMEM); } +EXPORT_SYMBOL_GPL(init_srcu_struct); /* * srcu_readers_active_idx -- returns approximate number of readers @@ -97,6 +98,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp) free_percpu(sp->per_cpu_ref); sp->per_cpu_ref = NULL; } +EXPORT_SYMBOL_GPL(cleanup_srcu_struct); /** * srcu_read_lock - register a new reader for an SRCU-protected structure. @@ -118,6 +120,7 @@ int srcu_read_lock(struct srcu_struct *sp) preempt_enable(); return idx; } +EXPORT_SYMBOL_GPL(srcu_read_lock); /** * srcu_read_unlock - unregister a old reader from an SRCU-protected structure. @@ -136,22 +139,12 @@ void srcu_read_unlock(struct srcu_struct *sp, int idx) per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--; preempt_enable(); } +EXPORT_SYMBOL_GPL(srcu_read_unlock); -/** - * synchronize_srcu - wait for prior SRCU read-side critical-section completion - * @sp: srcu_struct with which to synchronize. - * - * Flip the completed counter, and wait for the old count to drain to zero. - * As with classic RCU, the updater must use some separate means of - * synchronizing concurrent updates. Can block; must be called from - * process context. - * - * Note that it is illegal to call synchornize_srcu() from the corresponding - * SRCU read-side critical section; doing so will result in deadlock. - * However, it is perfectly legal to call synchronize_srcu() on one - * srcu_struct from some other srcu_struct's read-side critical section. +/* + * Helper function for synchronize_srcu() and synchronize_srcu_expedited(). */ -void synchronize_srcu(struct srcu_struct *sp) +void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void)) { int idx; @@ -173,7 +166,7 @@ void synchronize_srcu(struct srcu_struct *sp) return; } - synchronize_sched(); /* Force memory barrier on all CPUs. */ + sync_func(); /* Force memory barrier on all CPUs. */ /* * The preceding synchronize_sched() ensures that any CPU that @@ -190,7 +183,7 @@ void synchronize_srcu(struct srcu_struct *sp) idx = sp->completed & 0x1; sp->completed++; - synchronize_sched(); /* Force memory barrier on all CPUs. */ + sync_func(); /* Force memory barrier on all CPUs. */ /* * At this point, because of the preceding synchronize_sched(), @@ -203,7 +196,7 @@ void synchronize_srcu(struct srcu_struct *sp) while (srcu_readers_active_idx(sp, idx)) schedule_timeout_interruptible(1); - synchronize_sched(); /* Force memory barrier on all CPUs. */ + sync_func(); /* Force memory barrier on all CPUs. */ /* * The preceding synchronize_sched() forces all srcu_read_unlock() @@ -237,6 +230,47 @@ void synchronize_srcu(struct srcu_struct *sp) } /** + * synchronize_srcu - wait for prior SRCU read-side critical-section completion + * @sp: srcu_struct with which to synchronize. + * + * Flip the completed counter, and wait for the old count to drain to zero. + * As with classic RCU, the updater must use some separate means of + * synchronizing concurrent updates. Can block; must be called from + * process context. + * + * Note that it is illegal to call synchronize_srcu() from the corresponding + * SRCU read-side critical section; doing so will result in deadlock. + * However, it is perfectly legal to call synchronize_srcu() on one + * srcu_struct from some other srcu_struct's read-side critical section. + */ +void synchronize_srcu(struct srcu_struct *sp) +{ + __synchronize_srcu(sp, synchronize_sched); +} +EXPORT_SYMBOL_GPL(synchronize_srcu); + +/** + * synchronize_srcu_expedited - like synchronize_srcu, but less patient + * @sp: srcu_struct with which to synchronize. + * + * Flip the completed counter, and wait for the old count to drain to zero. + * As with classic RCU, the updater must use some separate means of + * synchronizing concurrent updates. Can block; must be called from + * process context. + * + * Note that it is illegal to call synchronize_srcu_expedited() + * from the corresponding SRCU read-side critical section; doing so + * will result in deadlock. However, it is perfectly legal to call + * synchronize_srcu_expedited() on one srcu_struct from some other + * srcu_struct's read-side critical section. + */ +void synchronize_srcu_expedited(struct srcu_struct *sp) +{ + __synchronize_srcu(sp, synchronize_sched_expedited); +} +EXPORT_SYMBOL_GPL(synchronize_srcu_expedited); + +/** * srcu_batches_completed - return batches completed. * @sp: srcu_struct on which to report batch completion. * @@ -248,10 +282,4 @@ long srcu_batches_completed(struct srcu_struct *sp) { return sp->completed; } - -EXPORT_SYMBOL_GPL(init_srcu_struct); -EXPORT_SYMBOL_GPL(cleanup_srcu_struct); -EXPORT_SYMBOL_GPL(srcu_read_lock); -EXPORT_SYMBOL_GPL(srcu_read_unlock); -EXPORT_SYMBOL_GPL(synchronize_srcu); EXPORT_SYMBOL_GPL(srcu_batches_completed); |