op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	kernel-doc: fix sched.c missing parameter	Randy Dunlap	2008-04-22	1	-0/+1
\| \| \| \| \| \| \| \| \|	Add missing kernel-doc in kernel/sched.c: Warning(linux-2.6.25-git3//kernel/sched.c:7044): No description found for parameter 'span' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	sched: features fix	Ingo Molnar	2008-04-19	1	-13/+2
\| \| \| \|	Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: /debug/sched_features	Peter Zijlstra	2008-04-19	1	-22/+140
\| \| \| \| \| \| \| \|	provide a text based interface to the scheduler features; this saves the 'user' from setting bits using decimal arithmetic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: add SCHED_FEAT_DEADLINE	Ingo Molnar	2008-04-19	1	-1/+3
\| \| \| \| \| \|	unused at the moment. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fair: weight calculations	Peter Zijlstra	2008-04-19	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to level the hierarchy, we need to calculate load based on the root view. That is, each task's load is in the same unit. A / \ B 1 / \ 2 3 To compute 1's load we do: weight(1) -------------- rq_weight(A) To compute 2's load we do: weight(2) weight(B) ------------ * ----------- rq_weight(B) rw_weight(A) This yields load fractions in comparable units. The consequence is that it changes virtual time. We used to have: time_{i} vtime_{i} = ------------ weight_{i} vtime = \Sum vtime_{i} = time / rq_weight. But with the new way of load calculation we get that vtime equals time. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fair-group: de-couple load-balancing from the rb-trees	Peter Zijlstra	2008-04-19	1	-2/+8
\| \| \| \| \| \| \| \|	De-couple load-balancing from the rb-trees, so that I can change their organization. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fair-group: SMP-nice for group scheduling	Peter Zijlstra	2008-04-19	1	-34/+463
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement SMP nice support for the full group hierarchy. On each load-balance action, compile a sched_domain wide view of the full task_group tree. We compute the domain wide view when walking down the hierarchy, and readjust the weights when walking back up. After collecting and readjusting the domain wide view, we try to balance the tasks within the task_groups. The current approach is a naively balance each task group until we've moved the targeted amount of load. Inspired by Srivatsa Vaddsgiri's previous code and Abhishek Chandra's H-SMP paper. XXX: there will be some numerical issues due to the limited nature of SCHED_LOAD_SCALE wrt to representing a task_groups influence on the total weight. When the tree is deep enough, or the task weight small enough, we'll run out of bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Abhishek Chandra <chandra@cs.umn.edu> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched, cpuset: customize sched domains, core	Hidetoshi Seto	2008-04-19	1	-5/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[rebased for sched-devel/latest] - Add a new cpuset file, having levels: sched_relax_domain_level - Modify partition_sched_domains() and build_sched_domains() to take attributes parameter passed from cpuset. - Fill newidle_idx for node domains which currently unused but might be required if sched_relax_domain_level become higher. - We can change the default level by boot option 'relax_domain_level='. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: rt: multi level group constraints	Peter Zijlstra	2008-04-19	1	-0/+33
\| \| \| \| \| \| \|	multi level rt constraints Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: task_group hierarchy	Peter Zijlstra	2008-04-19	1	-0/+20
\| \| \| \| \| \| \|	Add the full parent<->child relation thing into task_groups as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix the task_group hierarchy for UID grouping	Peter Zijlstra	2008-04-19	1	-2/+41
\| \| \| \| \| \| \| \|	UID grouping doesn't actually have a task_group representing the root of the task_group tree. Add one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: allow the group scheduler to have multiple levels	Dhaval Giani	2008-04-19	1	-32/+53
\| \| \| \| \| \| \| \| \|	This patch makes the group scheduler multi hierarchy aware. [a.p.zijlstra@chello.nl: rt-parts and assorted fixes] Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: mix tasks and groups	Dhaval Giani	2008-04-19	1	-2/+49
\| \| \| \| \| \| \| \| \| \| \| \|	This patch allows tasks and groups to exist in the same cfs_rq. With this change the CFS group scheduling follows a 1/(M+N) model from a 1/(1+N) fairness model where M tasks and N groups exist at the cfs_rq level. [a.p.zijlstra@chello.nl: rt bits and assorted fixes] Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix checks	Ingo Molnar	2008-04-19	1	-4/+6
\| \| \| \|	Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: old sleeper bonus	Peter Zijlstra	2008-04-19	1	-1/+3
\| \| \| \| \|	Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: add new set_cpus_allowed_ptr function	Mike Travis	2008-04-19	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new function that accepts a pointer to the "newly allowed cpus" cpumask argument. int set_cpus_allowed_ptr(struct task_struct p, const cpumask_t new_mask) The current set_cpus_allowed() function is modified to use the above but this does not result in an ABI change. And with some compiler optimization help, it may not introduce any additional overhead. Additionally, to enforce the read only nature of the new_mask arg, the "const" property is migrated to sub-functions called by set_cpus_allowed. This silences compiler warnings. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	init: move setup of nr_cpu_ids to as early as possible	Mike Travis	2008-04-19	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Move the setting of nr_cpu_ids from sched_init() to start_kernel() so that it's available as early as possible. Note that an arch has the option of setting it even earlier if need be, but it should not result in a different value than the setup_nr_cpu_ids() function. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: remove another cpumask_t variable from stack	Mike Travis	2008-04-19	1	-9/+6
\| \| \| \| \| \| \| \|	* Remove another cpumask_t variable from stack that was missed in the last kernel_sched_c updates. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	cpumask: reduce stack usage in SD_x_INIT initializers	Mike Travis	2008-04-19	1	-120/+248
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove empty cpumask_t (and all non-zero/non-null) variables in SD__INIT macros. Use memset(0) to clear. Also, don't inline the initializer functions to save on stack space in build_sched_domains(). Merge change to include/linux/topology.h that uses the new node_to_cpumask_ptr function in the nr_cpus_node macro into this patch. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	nodemask: use new node_to_cpumask_ptr function	Mike Travis	2008-04-19	1	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Use new node_to_cpumask_ptr. This creates a pointer to the cpumask for a given node. This definition is in mm patch: asm-generic-add-node_to_cpumask_ptr-macro.patch * Use new set_cpus_allowed_ptr function. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function [x86/latest]: x86: add cpus_scnprintf function Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	generic: reduce stack pressure in sched_affinity	Mike Travis	2008-04-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Modify sched_affinity functions to pass cpumask_t variables by reference instead of by value. * Use new set_cpus_allowed_ptr function. Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: Paul Jackson <pj@sgi.com> Cc: Cliff Wickman <cpw@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer	Mike Travis	2008-04-19	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Modify cpuset_cpus_allowed to return the currently allowed cpuset via a pointer argument instead of as the function return value. * Use new set_cpus_allowed_ptr function. * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses. Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: remove fixed NR_CPUS sized arrays in kernel_sched_c	Mike Travis	2008-04-19	1	-28/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Change fixed size arrays to per_cpu variables or dynamically allocated arrays in sched_init() and sched_init_smp(). (1) static struct sched_entity init_sched_entity_p[NR_CPUS]; (1) static struct cfs_rq init_cfs_rq_p[NR_CPUS]; (1) static struct sched_rt_entity init_sched_rt_entity_p[NR_CPUS]; (1) static struct rt_rq init_rt_rq_p[NR_CPUS]; static struct sched_group *sched_group_nodes_bycpu[NR_CPUS]; (1) - these arrays are allocated via alloc_bootmem_low() Change sched_domain_debug_one() to use cpulist_scnprintf instead of cpumask_scnprintf. This reduces the output buffer required and improves readability when large NR_CPU count machines arrive. * In sched_create_group() we allocate new arrays based on nr_cpu_ids. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: allow cpuacct stats to be reset	Dhaval Giani	2008-04-19	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Currently the schedstats implementation does not allow the statistics to be reset. This patch aims to allow that. echo 0 > cpuacct.usage resets the usage. Any other value is not allowed and returns -EINVAL. Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: cleanup cpuacct variable names	Dhaval Giani	2008-04-19	1	-9/+9
\| \| \| \| \| \| \| \| \|	Change the variable names to the common convention for the cpuacct subsystem. Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: rt-group: smp balancing	Peter Zijlstra	2008-04-19	1	-3/+37
\| \| \| \| \| \| \| \| \| \| \| \|	Currently the rt group scheduling does a per cpu runtime limit, however the rt load balancer makes no guarantees about an equal spread of real- time tasks, just that at any one time, the highest priority tasks run. Solve this by making the runtime limit a global property by borrowing excessive runtime from the other cpus once the local limit runs out. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: rt-group: synchonised bandwidth period	Peter Zijlstra	2008-04-19	1	-49/+211
\| \| \| \| \| \| \| \| \| \| \|	Various SMP balancing algorithms require that the bandwidth period run in sync. Possible improvements are moving the rt_bandwidth thing into root_domain and keeping a span per rt_bandwidth which marks throttled cpus. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: remove sysctl_sched_batch_wakeup_granularity	Ingo Molnar	2008-04-19	1	-1/+0
\| \| \| \| \| \|	it's unused. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: reenable sync wakeups	Ingo Molnar	2008-04-19	1	-5/+5
\| \| \| \|	Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: cache hot buddy	Ingo Molnar	2008-04-19	1	-7/+9
\| \| \| \|	Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: feat affine wakeups	Ingo Molnar	2008-04-19	1	-1/+3
\| \| \| \|	Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: introduce SCHED_FEAT_SYNC_WAKEUPS, turn it off	Ingo Molnar	2008-04-19	1	-1/+6
\| \| \| \| \| \| \| \|	turn off sync wakeups by default. They are not needed anymore - the buddy logic should be smart enough to keep the system from overscheduling. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix rq->clock overflows detection with CONFIG_NO_HZ	Guillaume Chazarain	2008-04-19	1	-3/+35
\| \| \| \| \| \| \| \| \|	When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC. We check that the number of skipped ticks matches the clock jump seen in __update_rq_clock(). Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: sched.c needs tick.h	Reynes Philippe	2008-04-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	kernel/sched.c:506: erreur: implicit declaration of function tick_get_tick_sched kernel/sched.c:506: erreur: invalid type argument of -> kernel/sched.c:506: erreur: NOHZ_MODE_INACTIVE undeclared (first use in this function) kernel/sched.c:506: erreur: (Each undeclared identifier is reported only once kernel/sched.c:506: erreur: for each function it appears in.) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: make cpu_clock() globally synchronous	Ingo Molnar	2008-04-19	1	-3/+49
\| \| \| \| \| \| \| \| \| \|	Alexey Zaytsev reported (and bisected) that the introduction of cpu_clock() in printk made the timestamps jump back and forth. Make cpu_clock() more reliable while still keeping it fast when it's called frequently. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	NOHZ: reevaluate idle sleep length after add_timer_on()	Thomas Gleixner	2008-03-26	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	add_timer_on() can add a timer on a CPU which is currently in a long idle sleep, but the timer wheel is not reevaluated by the nohz code on that CPU. So a timer can be delayed for quite a long time. This triggered a false positive in the clocksource watchdog code. To avoid this we need to wake up the idle CPU and enforce the reevaluation of the timer wheel for the next timer event. Add a function, which checks a given CPU for idle state, marks the idle task with NEED_RESCHED and sends a reschedule IPI to notify the other CPU of the change in the timer wheel. Call this function from add_timer_on(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org -- include/linux/sched.h \| 6 ++++++ kernel/sched.c \| 43 +++++++++++++++++++++++++++++++++++++++++++ kernel/timer.c \| 10 +++++++++- 3 files changed, 58 insertions(+), 1 deletion(-)
*	sched: add arch_update_cpu_topology hook.	Heiko Carstens	2008-03-21	1	-0/+5
\| \| \| \| \| \| \| \| \|	Will be called each time the scheduling domains are rebuild. Needed for architectures that don't have a static cpu topology. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: add exported arch_reinit_sched_domains() to header file.	Heiko Carstens	2008-03-21	1	-1/+1
\| \| \| \| \| \| \| \|	Needed so it can be called from outside of sched.c. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: remove double unlikely from schedule()	Roel Kluin	2008-03-21	1	-1/+1
\| \| \| \| \| \| \|	Combine two unlikely's Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: cleanup old and rarely used 'debug' features.	Peter Zijlstra	2008-03-21	1	-6/+2
\| \| \| \| \| \| \| \| \|	TREE_AVG and APPROX_AVG are initial task placement policies that have been disabled for a long while.. time to remove them. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: wakeup-buddy tasks are cache-hot	Ingo Molnar	2008-03-19	1	-0/+6
\| \| \| \| \| \| \| \|	Wakeup-buddy tasks are cache-hot - this makes it a bit harder for the load-balancer to tear them apart. (but it's still possible, if the load is sufficiently assymetric) Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: improve affine wakeups	Ingo Molnar	2008-03-19	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	improve affine wakeups. Maintain the 'overlap' metric based on CFS's sum_exec_runtime - which means the amount of time a task executes after it wakes up some other task. Use the 'overlap' for the wakeup decisions: if the 'overlap' is short, it means there's strong workload coupling between this task and the woken up task. If the 'overlap' is large then the workload is decoupled and the scheduler will move them to separate CPUs more easily. ( Also slightly move the preempt_check within try_to_wake_up() - this has no effect on functionality but allows 'early wakeups' (for still-on-rq tasks) to be correctly accounted as well.) Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix overload performance: buddy wakeups	Peter Zijlstra	2008-03-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we schedule to the leftmost task in the runqueue. When the runtimes are very short because of some server/client ping-pong, especially in over-saturated workloads, this will cycle through all tasks trashing the cache. Reduce cache trashing by keeping dependent tasks together by running newly woken tasks first. However, by not running the leftmost task first we could starve tasks because the wakee can gain unlimited runtime. Therefore we only run the wakee if its within a small (wakeup_granularity) window of the leftmost task. This preserves fairness, but does alternate server/client task groups. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix calc_delta_mine()	Ingo Molnar	2008-03-15	1	-1/+1
\| \| \| \| \| \| \|	lw->weight can be 0 for a short time during bootup. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
*	sched: fix update_load_add()/sub()	Ingo Molnar	2008-03-15	1	-0/+2
\| \| \| \| \| \| \| \|	Clear the cached inverse value when updating load. This is needed for calc_delta_mine() to work correctly when using the rq load. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
*	sched: fix race in schedule()	Hiroshi Shimamoto	2008-03-15	1	-22/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a hard to trigger crash seen in the -rt kernel that also affects the vanilla scheduler. There is a race condition between schedule() and some dequeue/enqueue functions; rt_mutex_setprio(), __setscheduler() and sched_move_task(). When scheduling to idle, idle_balance() is called to pull tasks from other busy processor. It might drop the rq lock. It means that those 3 functions encounter on_rq=0 and running=1. The current task should be put when running. Here is a possible scenario: CPU0 CPU1 \| schedule() \| ->deactivate_task() \| ->idle_balance() \| -->load_balance_newidle() rt_mutex_setprio() \| \| --->double_lock_balance() get lock rel lock * on_rq=0, ruuning=1 \| * sched_class is changed \| rel lock get lock : \| : ->put_prev_task_rt() ->pick_next_task_fair() => panic The current process of CPU1(P1) is scheduling. Deactivated P1, and the scheduler looks for another process on other CPU's runqueue because CPU1 will be idle. idle_balance(), load_balance_newidle() and double_lock_balance() are called and double_lock_balance() could drop the rq lock. On the other hand, CPU0 is trying to boost the priority of P1. The result of boosting only P1's prio and sched_class are changed to RT. The sched entities of P1 and P1's group are never put. It makes cfs_rq invalid, because the cfs_rq has curr and no leaf, but pick_next_task_fair() is called, then the kernel panics. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	keep rd->online and cpu_online_map in sync	Gregory Haskins	2008-03-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible to allow the root-domain cache of online cpus to become out of sync with the global cpu_online_map. This is because we currently trigger removal of cpus too early in the notifier chain. Other DOWN_PREPARE handlers may in fact run and reconfigure the root-domain topology, thereby stomping on our own offline handling. The end result is that rd->online may become out of sync with cpu_online_map, which results in potential task misrouting. So change the offline handling to be more tightly coupled with the global offline process by triggering on CPU_DYING intead of CPU_DOWN_PREPARE. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	Revert "cpu hotplug: adjust root-domain->online span in response to hotplug ↵	Gregory Haskins	2008-03-11	1	-7/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	event" This reverts commit 393d94d98b19089ec172566e23557997931b137e. Lets fix this right. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	cpu hotplug: adjust root-domain->online span in response to hotplug event	Gregory Haskins	2008-03-09	1	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently set the root-domain online span automatically when the domain is added to the cpu if the cpu is already a member of cpu_online_map. This was done as a hack/bug-fix for s2ram, but it also causes a problem with hotplug CPU_DOWN transitioning. The right way to fix the original problem is to actually respond to CPU_UP events, instead of CPU_ONLINE, which is already too late. This solves the hung reboot regression reported by Andrew Morton and others. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	sched: don't allow rt_runtime_us to be zero for groups having rt tasks	Dhaval Giani	2008-03-07	1	-0/+17
\| \| \| \| \| \| \| \| \| \|	This patch checks if we can set the rt_runtime_us to 0. If there is a realtime task in the group, we don't want to set the rt_runtime_us as 0 or bad things will happen. (that task wont get any CPU time despite being TASK_RUNNNG) Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>