Correct nr_processes() when CPUs have been unplugged

nr_processes() returns the sum of the per cpu counter process_counts for all online CPUs. This counter is incremented for the current CPU on fork() and decremented for the current CPU on exit(). Since a process does not necessarily fork and exit on the same CPU the process_count for an individual CPU can be either positive or negative and effectively has no meaning in isolation. Therefore calculating the sum of process_counts over only the online CPUs omits the processes which were started or stopped on any CPU which has since been unplugged. Only the sum of process_counts across all possible CPUs has meaning. The only caller of nr_processes() is proc_root_getattr() which calculates the number of links to /proc as stat->nlink = proc_root.nlink + nr_processes(); You don't have to be all that unlucky for the nr_processes() to return a negative value leading to a negative number of links (or rather, an apparently enormous number of links). If this happens then you can get failures where things like "ls /proc" start to fail because they got an -EOVERFLOW from some stat() call. Example with some debugging inserted to show what goes on: # ps haux|wc -l nr_processes: CPU0: 90 nr_processes: CPU1: 1030 nr_processes: CPU2: -900 nr_processes: CPU3: -136 nr_processes: TOTAL: 84 proc_root_getattr. nlink 12 + nr_processes() 84 = 96 84 # echo 0 >/sys/devices/system/cpu/cpu1/online # ps haux|wc -l nr_processes: CPU0: 85 nr_processes: CPU2: -901 nr_processes: CPU3: -137 nr_processes: TOTAL: -953 proc_root_getattr. nlink 12 + nr_processes() -953 = -941 75 # stat /proc/ nr_processes: CPU0: 84 nr_processes: CPU2: -901 nr_processes: CPU3: -137 nr_processes: TOTAL: -954 proc_root_getattr. nlink 12 + nr_processes() -954 = -942 File: `/proc/' Size: 0 Blocks: 0 IO Block: 1024 directory Device: 3h/3d Inode: 1 Links: 4294966354 Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2009-11-03 09:06:55.000000000 +0000 Modify: 2009-11-03 09:06:55.000000000 +0000 Change: 2009-11-03 09:06:55.000000000 +0000 I'm not 100% convinced that the per_cpu regions remain valid for offline CPUs, although my testing suggests that they do. If not then I think the correct solution would be to aggregate the process_count for a given CPU into a global base value in cpu_down(). This bug appears to pre-date the transition to git and it looks like it may even have been present in linux-2.6.0-test7-bk3 since it looks like the code Rusty patched in http://lwn.net/Articles/64773/ was already wrong. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Ian Campbell <Ian.Campbell@citrix.com> 2009-11-03 10:11:14 +0000
committer: Linus Torvalds <torvalds@linux-foundation.org> 2009-11-03 07:52:39 -0800
commit: 1d510750941a53a1d3049c1d33c75d6dfcd78618 (patch)
tree: 03f501224f839aa897cf540e29595aeda8551052 /kernel
parent: 1c211849d893b14cc923a18708923954fdd2c63e (diff)
download: op-kernel-dev-1d510750941a53a1d3049c1d33c75d6dfcd78618.zip
op-kernel-dev-1d510750941a53a1d3049c1d33c75d6dfcd78618.tar.gz
1 files changed, 1 insertions, 1 deletions
diff --git a/kernel/fork.c b/kernel/fork.c
index 4c20fff..166b8c4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -91,7 +91,7 @@ int nr_processes(void)
 	int cpu;
 	int total = 0;
 
-	for_each_online_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		total += per_cpu(process_counts, cpu);
 
 	return total;
author	Ian Campbell <Ian.Campbell@citrix.com>	2009-11-03 10:11:14 +0000
committer	Linus Torvalds <torvalds@linux-foundation.org>	2009-11-03 07:52:39 -0800
commit	1d510750941a53a1d3049c1d33c75d6dfcd78618 (patch)
tree	03f501224f839aa897cf540e29595aeda8551052 /kernel
parent	1c211849d893b14cc923a18708923954fdd2c63e (diff)
download	op-kernel-dev-1d510750941a53a1d3049c1d33c75d6dfcd78618.zip op-kernel-dev-1d510750941a53a1d3049c1d33c75d6dfcd78618.tar.gz