op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	[PATCH] zone_reclaim: configurable off node allocation period.	Christoph Lameter	2006-02-01	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the zone_reclaim code has a fixed window of 30 seconds of off node allocations should a local zone have no unused pagecache pages left. Reclaim will be attempted again after this timeout period to avoid repeated useless scans for memory. This is also useful to established sufficiently large off node allocation chunks to relieve the local node. It may be beneficial to adjust that time period for some special situations. For example if memory use was exceeding node capacity one may want to give up for longer periods of time. If memory spikes intermittendly then one may want to shorten the time period to reduce the number of off node allocations. This patch allows just that.... Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] zone_reclaim: minor fixes	Christoph Lameter	2006-02-01	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- If we only reclaim nr_pages then its okay to stay on node. Switch from > to >= for the comparison. - vm_table[] entry for zone_reclaim_mode is a bit screwed up. - Add empty lines around shrink_zone to show that this is the central function to be called. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] swsusp: do not change log level during suspend/resume	Rafael J. Wysocki	2006-02-01	2	-11/+6
\| \| \| \| \| \| \| \| \| \| \|	Prevent the kernel from setting the log level to 10 unconditionally during suspend/resume which was needed in the past for debugging, but generally is undesirable. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sys_sched_getaffinity() & hotplug	Jack Steiner	2006-02-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change sched_getaffinity() so that it returns a bitmap that indicates the legally schedulable cpus that a task is allowed to run on. Without this patch, if CONFIG_HOTPLUG_CPU is enabled, sched_getaffinity() unconditionally returns (at least on IA64) a mask with NR_CPUS bits set. This conveys no useful infornmation except for a kernel compile option. This fixes a breakage we obseved running recent kernels. We have MPI jobs that use sched_getaffinity() to determine where to place their threads. Placing them on non-existant cpus is problematic :-) Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nathan Lynch <ntl@pobox.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] kernel/posix-timers.c: remove do_posix_clock_notimer_create()	Adrian Bunk	2006-02-01	1	-6/+0
\| \| \| \| \| \| \| \| \|	This function is neither used nor has any real contents. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: set correct initial expiry time for relative SIGEV_NONE timers	Thomas Gleixner	2006-02-01	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \|	The expiry time for relative timers with SIGEV_NONE set was never updated to the correct value. Pointed out by George Anzinger. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: add back lost credit lines	Thomas Gleixner	2006-02-01	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	At some point we added credits to people who actively helped to bring k/hr-timers along. This was lost in the big code revamp. Add it back. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: cleanups and simplifications	George Anzinger	2006-02-01	3	-64/+34
\| \| \| \| \| \| \| \| \| \| \| \|	Clean up the interface to hrtimers by changing the init code to pass the mode as well as the clock. This allow the init code to select the correct base and eliminates extra timer re-init code in posix-timers. We also simplify the restart interface nanosleep use. Signed-off-by: George Anzinger <george@mvista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: fix posix-timer requeue race	akpm@osdl.org	2006-02-01	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Steven Rostedtrostedt@goodmis.org <rostedt@goodmis.org> CPU0 expires a posix-timer and runs the callback function. The signal is queued. After releasing the posix-timer lock and before returning to hrtimer_run_queue CPU0 gets interrupted. CPU1 delivers the queued signal and rearms the timer. CPU0 comes back to hrtimer_run_queue and sets the timer state to expired. The next modification of the timer can result in an oops, because the state information is wrong. Keep track of state = RUNNING and check if the state has been in the return path of hrtimer_run_queue. In case the state has been changed, ignore a restart request and do not touch the state variable. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: fix oldvalue return in setitimer	Thomas Gleixner	2006-02-01	1	-5/+5
\| \| \| \| \| \| \| \| \|	This resolves bugzilla bug#5617. The oldvalue of the timer was read after the timer was cancelled, so the remaining time was always zero. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: fix possible use of NULL pointer in posix-timers	Thomas Gleixner	2006-02-01	1	-1/+2
\| \| \| \| \| \| \| \|	Fixup the conversion of posix-timers to hrtimers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimers: fixup itimer conversion	Thomas Gleixner	2006-02-01	1	-1/+10
\| \| \| \| \| \| \| \| \| \|	The itimer conversion removed the locking which protects the timer and variables in the shared signal structure. Steven Rostedt found the problem in the latest -rt patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] swsusp: use bytes as image size units	Rafael J. Wysocki	2006-02-01	3	-9/+9
\| \| \| \| \| \| \| \| \| \|	Make swsusp use bytes as the image size units, which is needed for future compatibility. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] "Fix uidhash_lock <-> RXU deadlock" fix	Andrew Morton	2006-01-31	1	-10/+17
\| \| \| \| \| \| \| \| \| \|	I get storms of warnings from local_bh_enable(). Better-tested patches, please. Cc: Ingo Molnar <mingo@elte.hu> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] rcu_torture_lock deadlock fix	Ingo Molnar	2006-01-31	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	rcu_torture_lock is used in a softirq-unsafe manner, but it is also taken by rcu_torture_cb(), which may execute in softirq-context, resulting in potential deadlocks. The fix is to acquire rcu_torture_lock in a softirq-safe manner. With this fix applied, the rcu-torture code passes validation. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] fix uidhash_lock <-> RCU deadlock	Ingo Molnar	2006-01-31	1	-8/+17
\| \| \| \| \| \| \| \| \| \| \|	RCU task-struct freeing can call free_uid(), which is taking uidhash_lock - while other users of uidhash_lock are softirq-unsafe. The fix is to always take the uidhash_spinlock in a softirq-safe manner. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Fix boot-time slowdown for measure_migration_cost	Ingo Molnar	2006-01-31	1	-3/+3
\| \| \| \| \| \| \|	This reduces the amount of time the migration cost calculations cost during bootup. Based on numbers by Tony Luck <tony.luck@intel.com>. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	Don't try to "validate" a non-existing timeval.	Linus Torvalds	2006-01-31	1	-1/+1
\| \| \| \| \| \| \| \|	settime() with a NULL timeval is silly but legal. Noticed by Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] EDAC: atomic scrub operations	Alan Cox	2006-01-18	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EDAC requires a way to scrub memory if an ECC error is found and the chipset does not do the work automatically. That means rewriting memory locations atomically with respect to all CPUs _and_ bus masters. That means we can't use atomic_add(foo, 0) as it gets optimised for non-SMP This adds a function to include/asm-foo/atomic.h for the platforms currently supported which implements a scrub of a mapped block. It also adjusts a few other files include order where atomic.h is included before types.h as this now causes an error as atomic_scrub uses u32. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Generic sys_rt_sigsuspend()	David Woodhouse	2006-01-18	2	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \|	The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of sys_rt_sigsuspend() instead of duplicating it for each architecture. This provides such an implementation and makes arch/powerpc use it. It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] fix sched_setscheduler semantics	Jason Baron	2006-01-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, a negative policy argument passed into the 'sys_sched_setscheduler()' system call, will return with success. However, the manpage for 'sys_sched_setscheduler' says: EINVAL The scheduling policy is not one of the recognized policies, or the parameter p does not make sense for the policy. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Zone reclaim: proc override	Christoph Lameter	2006-01-18	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \|	proc support for zone reclaim This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be used to override the automatic determination of the zone reclaim made on bootup. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] kernel/hrtimer.c sparse warning fix	Ingo Molnar	2006-01-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	fix the following sparse warning: kernel/hrtimer.c:665:34: warning: incorrect type in argument 2 (different address spaces) kernel/hrtimer.c:665:34: expected void const from kernel/hrtimer.c:665:34: got struct timespec [noderef] <noident><asn:1> kernel/hrtimer.c:664:2: warning: dereference of noderef expression Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] build kernel/intermodule.c only when required	Adrian Bunk	2006-01-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Build kernel/intermodule.c only when required. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] hrtimer comment tweak	Jonathan Corbet	2006-01-16	1	-1/+1
\| \| \| \| \| \| \|	Fix a comment which missed an update cycle somewhere. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial	Linus Torvalds	2006-01-15	2	-2/+2
\|\
\| *	correct email address of Manfred Spraul	Christian Kujau	2006-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I tried to send the forcedeth maintainer an email, but it came back with: "The mail address manfreds@colorfullife.com is not read anymore. Please resent your mail to manfred@ instead of manfreds@." This patch fixes this. Signed-off-by: Adrian Bunk <bunk@stusta.de>
\| *	SOFTWARE_SUSPEND: fix a typo in the dependencies	Adrian Bunk	2006-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a typo in the dependencies of SOFTWARE_SUSPEND. This patch is based on a report by Jean-Luc Leger <reiga@dspnet.fr.eu.org>. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Pavel Machek <pavel@ucw.cz>
* \|	[PATCH] cpuset oom lock fix	Paul Jackson	2006-01-14	1	-5/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem, reported in: http://bugzilla.kernel.org/show_bug.cgi?id=5859 and by various other email messages and lkml posts is that the cpuset hook in the oom (out of memory) code can try to take a cpuset semaphore while holding the tasklist_lock (a spinlock). One must not sleep while holding a spinlock. The fix seems easy enough - move the cpuset semaphore region outside the tasklist_lock region. This required a few lines of mechanism to implement. The oom code where the locking needs to be changed does not have access to the cpuset locks, which are internal to kernel/cpuset.c only. So I provided a couple more cpuset interface routines, available to the rest of the kernel, which simple take and drop the lock needed here (cpusets callback_sem). Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* \|	[PATCH] s390: spinlock fixes	Martin Schwidefsky	2006-01-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove useless spin_retry_counter and fix compilation for UP kernels. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* \|	[PATCH] Unlinline a bunch of other functions	Arjan van de Ven	2006-01-14	6	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the "inline" keyword from a bunch of big functions in the kernel with the goal of shrinking it by 30kb to 40kb Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* \|	[PATCH] sched: add new SCHED_BATCH policy	Ingo Molnar	2006-01-14	2	-16/+36
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new SCHED_BATCH (3) scheduling policy: such tasks are presumed CPU-intensive, and will acquire a constant +5 priority level penalty. Such policy is nice for workloads that are non-interactive, but which do not want to give up their nice levels. The policy is also useful for workloads that want a deterministic scheduling policy without interactivity causing extra preemptions (between that workload's tasks). Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Merge master.kernel.org:/pub/scm/linux/kernel/git/tglx/hrtimer-2.6	Linus Torvalds	2006-01-12	1	-15/+19
\|\
\| *	[hrtimer] Enforce resolution as lower limit of intervals	Thomas Gleixner	2006-01-12	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Roman Zippel pointed out that the missing lower limit of intervals leads to an accounting error in the overrun count. Enforce the lower limit of intervals to resolution in the timer forwarding code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| *	[hrtimer] Change resolution storage to ktime_t format	Thomas Gleixner	2006-01-12	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the storage format of the per base resolution to ktime_t to make it easier accessible in the hrtimers code. Change the resolution from (NSEC_PER_SEC/HZ) to TICK_NSEC as Roman pointed out. TICK_NSEC is closer to the real resolution. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| *	[hrtimer] Remove listhead from hrtimer struct	Thomas Gleixner	2006-01-12	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The list_head in the hrtimer structure was introduced for easy access to the first timer with the further extensions of real high resolution timers in mind, but it turned out in the course of development that it is not necessary for the standard use case. Remove the list head and access the first expiry timer by a datafield in the timer base. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* \|	[PATCH] sched: filter affine wakeups	akpm@osdl.org	2006-01-12	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	) From: Nick Piggin <nickpiggin@yahoo.com.au> Track the last waker CPU, and only consider wakeup-balancing if there's a match between current waker CPU and the previous waker CPU. This ensures that there is some correlation between two subsequent wakeup events before we move the task. Should help random-wakeup workloads on large SMP systems, by reducing the migration attempts by a factor of nr_cpus. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* \|	[PATCH] scheduler cache-hot-autodetect	akpm@osdl.org	2006-01-12	1	-0/+468
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	) From: Ingo Molnar <mingo@elte.hu> This is the latest version of the scheduler cache-hot-auto-tune patch. The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this: - I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT distances 0 and 1, and on SMP distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more assymetric systems. [ People hacking systems that have assymetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but lets first see the problem systems, if they exist at all. Lets not overdesign. ] Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didnt set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem: - Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cachesizes. The code searches for the maximum migration cost, which occurs when the working set of the test-workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs. This, amongst other things, has the positive effect hat if e.g. two CPUs share a L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that dont share any caches. (The reliable measurement of migration costs is tricky - see the source for details.) Furthermore i've added various boot-time options to override/tune migration behavior. Firstly, there's a blanket override for autodetection: migration_cost=1000,2000,3000 will override the depth 0/1/2 values with 1msec/2msec/3msec values. Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values: migration_factor=120 will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0. I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good: Dual Celeron (128K L2 cache): --------------------- migration cost matrix (max_cache_size: 131072, cpu: 467 MHz): --------------------- [00] [01] [00]: - 1.7(1) [01]: 1.7(1) - --------------------- cacheflush times [2]: 0.0 (0) 1.7 (1784008) --------------------- Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs. Dual HT P4 (512K L2 cache): --------------------- migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz): --------------------- [00] [01] [02] [03] [00]: - 0.4(1) 0.0(0) 0.4(1) [01]: 0.4(1) - 0.4(1) 0.0(0) [02]: 0.0(0) 0.4(1) - 0.4(1) [03]: 0.4(1) 0.0(0) 0.4(1) - --------------------- cacheflush times [2]: 0.0 (33900) 0.4 (448514) --------------------- Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs. 8-way P3/Xeon [2MB L2 cache]: --------------------- migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz): --------------------- [00] [01] [02] [03] [04] [05] [06] [07] [00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) [04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) [05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) [06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) [07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - --------------------- cacheflush times [2]: 0.0 (0) 19.2 (19281756) --------------------- This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: <wilder@us.ibm.com> Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_align	Ravikiran G Thirumalai	2006-01-11	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	vSMP specific alignment patch to 1. Define INTERNODE_CACHE_SHIFT for vSMP 2. Use this for alignment of critical structures 3. Use INTERNODE_CACHE_SHIFT for ARCH_MIN_TASKALIGN, and let the slab align task_struct allocations to the internode cacheline size 4. Introduce and use ARCH_MIN_MMSTRUCT_ALIGN for mm_struct slab allocations. Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com> Signed-off-by: Shai Fultheim <shai@scalemp.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] x86_64: Make the cpu_*_maps in kernel/sched.c read mostly	Andi Kleen	2006-01-11	1	-3/+3
\| \| \| \| \| \| \|	They are referred to often so avoid potential false sharing for them. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] move capable() to capability.h	Randy.Dunlap	2006-01-11	13	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	- Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] uninline capable()	Ingo Molnar	2006-01-11	1	-0/+12
\| \| \| \| \| \| \| \| \| \|	Uninline capable(). Saves 2K of kernel text on a generic .config, and 1K on a tiny config. In addition it makes the use of capable more consistent between CONFIG_SECURITY and !CONFIG_SECURITY Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] kprobes: fix unloading of self probed module	Keshavamurthy Anil S	2006-01-11	1	-10/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a kprobes modules is written in such a way that probes are inserted on itself, then unload of that moudle was not possible due to reference couning on the same module. The below patch makes a check and incrementes the module refcount only if it is not a self probed module. We need to allow modules to probe themself for kprobes performance measurements This patch has been tested on several x86_64, ppc64 and IA64 architectures. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] fix/simplify mutex debugging code	David Woodhouse	2006-01-11	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as arguments instead, since all its callers were just calculating the 'to' address for themselves anyway... (and sometimes doing so badly). Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mutex: trivial whitespace cleanups	Ingo Molnar	2006-01-10	2	-5/+1
\| \| \| \| \|	Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mark mutex_lock*() as might_sleep()	Ingo Molnar	2006-01-10	1	-0/+2
\| \| \| \| \| \| \| \|	Mark mutex_lock() and mutex_lock_interruptible() as might_sleep() functions. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] fix i386 mutex fastpath on FRAME_POINTER && !DEBUG_MUTEXES	Ingo Molnar	2006-01-10	1	-9/+0
\| \| \| \| \| \| \| \| \|	Call the mutex slowpath more conservatively - e.g. FRAME_POINTERS can change the calling convention, in which case a direct branch to the slowpath becomes illegal. Bug found by Hugh Dickins. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] remove unnecessary asm/mutex.h from kernel/mutex-debug.c	Ingo Molnar	2006-01-10	1	-2/+0
\| \| \| \| \| \| \| \|	Remove unnecessary (and incorrect) inclusion of asm/mutex.h, pointed out by David Howells. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] rcu: fix hotplug-cpu ->donelist leak	Oleg Nesterov	2006-01-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pointed out by Srivatsa Vaddagiri <vatsa@in.ibm.com>. rcu_do_batch() stops after processing maxbatch callbacks on ->donelist leaving rcu_tasklet in TASKLET_STATE_SCHED state. If CPU_DEAD event happens remaining ->donelist entries are lost, rcu_offline_cpu() kills this tasklet. With this patch ->donelist migrates along with ->curlist and ->nxtlist to the current cpu. Compile tested. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] rcu: join rcu_ctrlblk and rcu_state	Oleg Nesterov	2006-01-10	1	-44/+38
\| \| \| \| \| \| \| \| \| \| \|	This patch moves rcu_state into the rcu_ctrlblk. I think there are no reasons why we should have 2 different variables to control rcu state. Every user of rcu_state has also "rcu_ctrlblk *rcp" in the parameter list. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>