summaryrefslogtreecommitdiffstats
path: root/kernel/sched.c
Commit message (Collapse)AuthorAgeFilesLines
* sched: sync wakeups preempt tooIngo Molnar2007-10-151-10/+1
| | | | | | | | | | make sure sync wakeups preempt too - the scheduler will not overschedule as we've got various throttles against that. As a result, sync wakeups can be used more widely in the kernel (to signal wakeup affinity between tasks), and no arbitrary latencies will be introduced either. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: affine sync wakeupsIngo Molnar2007-10-151-1/+7
| | | | | | | | make sync wakeups affine for cache-cold tasks: if a cache-cold task is woken up by a sync wakeup then use the opportunity to migrate it straight away. (the two tasks are 'related' because they communicate) Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: guest CPU accounting: maintain stats in account_system_time()Laurent Vivier2007-10-151-0/+25
| | | | | | | | | | | | | | | modify account_system_time() to add cputime to cpustat->guest if we are running a VCPU. We add this cputime to cpustat->user instead of cpustat->system because this part of KVM code is in fact user code although it is executed in the kernel. We duplicate VCPU time between guest and user to allow an unmodified "top(1)" to display correct value. A modified "top(1)" is able to display good cpu user time and cpu guest time by subtracting cpu guest time from cpu user time. Update "gtime" in task_struct accordingly. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: domain sysctl fixes: add terminator commentMilton Miller2007-10-151-0/+1
| | | | | | | | we had an incorrect-terminator bug in sd_alloc_ctl_domain_table() before, so add a comment that documents it. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: domain sysctl fixes: do not crash on allocation failureMilton Miller2007-10-151-2/+8
| | | | | | | | | Now that we are calling this at runtime, a more relaxed error path is suggested. If an allocation fails, we just register the partial table, which will show empty directories. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: domain sysctl fixes: unregister the sysctl table before domainsMilton Miller2007-10-151-4/+30
| | | | | | | | | Unregister and free the sysctl table before destroying domains, then rebuild and register after creating the new domains. This prevents the sysctl table from pointing to freed memory for root to write. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: domain sysctl fixes: use for_each_online_cpu()Milton Miller2007-10-151-1/+2
| | | | | | | | | | init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu variables. If the cpus_possible mask is not contigious this will result in a crash referencing unallocated data. If the online mask is not contigious then we would show offline cpus and miss online ones. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: domain sysctl fixes: use kcalloc()Milton Miller2007-10-151-3/+2
| | | | | | | kcalloc checks for n * sizeof(element) overflows and it zeros. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: allow the immediate migration of cache-cold tasksIngo Molnar2007-10-151-1/+7
| | | | | | allow the immediate migration of cache-cold tasks. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: debug, improve migration statisticsIngo Molnar2007-10-151-22/+54
| | | | | | | add new migration statistics when SCHED_DEBUG and SCHEDSTATS is enabled. Available in /proc/<PID>/sched. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: activate task_hot() only on fair-scheduled tasksPeter Zijlstra2007-10-151-3/+8
| | | | | | | | activate task_hot() only for fair-scheduled tasks (i.e. disable it for RT tasks). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: reintroduce cache-hot affinityIngo Molnar2007-10-151-0/+27
| | | | | | | | reintroduce a simplified version of cache-hot/cold scheduling affinity. This improves performance with certain SMP workloads, such as sysbench. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: do not normalize kernel threads via SysRq-NIngo Molnar2007-10-151-11/+7
| | | | | | | | | | do not normalize kernel threads via SysRq-N: the migration threads, softlockup threads, etc. might be essential for the system to function properly. So only zap user tasks. pointed out by Andi Kleen. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove stale comment from sched_group_set_shares()Andi Kleen2007-10-151-2/+0
| | | | | | | | | remove stale comment from sched_group_set_shares(). Function never returns -EINVAL. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up is_migration_thread()Ingo Molnar2007-10-151-6/+9
| | | | | | clean up is_migration_thread() and turn it into an inline function. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: cleanup: refactor normalize_rt_tasksAndi Kleen2007-10-151-20/+23
| | | | | | | | Replace a particularly ugly ifdef with an inline and a new macro. Also split up the function to be easier to read. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: cleanup: refactor common code of sleep_on / wait_for_completionAndi Kleen2007-10-151-139/+49
| | | | | | | | | | | | | Refactor common code of sleep_on / wait_for_completion These functions were largely cut'n'pasted. This moves the common code into single helpers instead. Advantage is about 1k less code on x86-64 and 91 lines of code removed. It adds one function call to the non timeout version of the functions; i don't expect this to be measurable. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: cleanup: remove unnecessary gotosAndi Kleen2007-10-151-165/+162
| | | | | | | | | | Replace loops implemented with gotos with real loops. Replace err = ...; goto x; x: return err; with return ...; No functional changes. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: prevent wakeup over-schedulingMike Galbraith2007-10-151-1/+3
| | | | | | | | | | | Prevent wakeup over-scheduling. Once a task has been preempted by a task of the same or lower priority, it becomes ineligible for repeated preemption by same until it has been ticked, or slept. Instead, the task is marked for preemption at the next tick. Tasks of higher priority still preempt immediately. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: disable forced preemption by defaultPeter Zijlstra2007-10-151-1/+3
| | | | | | | | Implement feature bit to disable forced preemption. This way it can be checked whether a workload is overscheduling or not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: some proc entries are missed in sched_domain sys_ctl debug codeZou Nan hai2007-10-151-3/+3
| | | | | | | | | | | cache_nice_tries and flags entry do not appear in proc fs sched_domain directory, because ctl_table entry is skipped. This patch fixes the issue. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix rt ptracer monopolizing CPUGautham R Shenoy2007-10-151-1/+1
| | | | | | | | | | | | | yield() in wait_task_inactive(), can cause a high priority thread to be scheduled back in, and there by loop forever while it is waiting for some lower priority thread which is unfortunately still on the runqueue. Use schedule_timeout_uninterruptible(1) instead. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Credit: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: group scheduling, sysfs tunablesDhaval Giani2007-10-151-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | Add tunables in sysfs to modify a user's cpu share. A directory is created in sysfs for each new user in the system. /sys/kernel/uids/<uid>/cpu_share Reading this file returns the cpu shares granted for the user. Writing into this file modifies the cpu share for the user. Only an administrator is allowed to modify a user's cpu share. Ex: # cd /sys/kernel/uids/ # cat 512/cpu_share 1024 # echo 2048 > 512/cpu_share # cat 512/cpu_share 2048 # Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: export cpu_clock()Paul E. McKenney2007-10-151-0/+1
| | | | | | | | export cpu_clock() - the preferred API instead of sched_clock(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix: move the CPU check into ->task_new_fair()Ingo Molnar2007-10-151-4/+1
| | | | | | | | | | | | | noticed by Peter Zijlstra: fix: move the CPU check into ->task_new_fair(), this way we can call place_entity() and get child ->vruntime right at initial wakeup time. (without this there can be large latencies) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* sched: cleanup: rename task_grp to task_groupIngo Molnar2007-10-151-18/+18
| | | | | | | cleanup: rename task_grp to task_group. No need to save two characters and 'grp' is annoying to read. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVGIngo Molnar2007-10-151-2/+2
| | | | | | | cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG, to make SCHED_FEAT_ names more consistent. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: kfree(NULL) is validIngo Molnar2007-10-151-8/+5
| | | | | | | | | | | | | | kfree(NULL) is valid. pointed out by checkpatch.pl. the fix shrinks the code a bit: text data bss dec hex filename 40024 3842 100 43966 abbe sched.o.before 40002 3842 100 43944 aba8 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: style cleanupIngo Molnar2007-10-151-1/+1
| | | | | | fix up __setup() style bug - noticed via checkpatch.pl. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: break out if printing a warning in sched_domain_debug()Ingo Molnar2007-10-151-0/+3
| | | | | | | checkpatch.pl and Andy Whitcroft noticed the following bug: we did not break out after printing an error. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=yIngo Molnar2007-10-151-2/+1
| | | | | | | run sched_domain_debug() if CONFIG_SCHED_DEBUG=y, instead of relying on the hand-crafted SCHED_DOMAIN_DEBUG switch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: tidy up SCHED_RRDmitry Adamushko2007-10-151-24/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - make timeslices of SCHED_RR tasks constant and not dependent on task's static_prio [1] ; - remove obsolete code (timeslice related bits); - make sched_rr_get_interval() return something more meaningful [2] for SCHED_OTHER tasks. [1] according to the following link, it's not compliant with SUSv3 (not sure though, what is the reference for us :-) http://lkml.org/lkml/2007/3/7/656 [2] the interval is dynamic and can be depicted as follows "should a task be one of the runnable tasks at this particular moment, it would expect to run for this interval of time before being re-scheduled by the scheduler tick". (i.e. it's more precise if a task is runnable at the moment) yeah, this seems to require task_rq_lock/unlock() but this is not a hot path. results: (SCHED_FIFO) dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval time_slice: 0 : 0 (SCHED_RR) dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval time_slice: 0 : 99984800 (SCHED_NORMAL) dimm@earth:~/storage/prog$ ./rr_interval time_slice: 0 : 19996960 (SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result) dimm@earth:~/storage/prog$ taskset 1 ./rr_interval time_slice: 0 : 9998480 Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: uninline schedulerAlexey Dobriyan2007-10-151-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * save ~300 bytes * activate_idle_task() was moved to avoid a warning bloat-o-meter output: add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <=== function old new delta __enqueue_entity - 165 +165 finish_task_switch - 110 +110 update_curr_rt - 79 +79 __load_balance_iterator - 32 +32 __task_rq_unlock - 28 +28 find_process_by_pid - 24 +24 do_sched_setscheduler 133 123 -10 sys_sched_rr_get_interval 176 165 -11 sys_sched_getparam 156 145 -11 normalize_rt_tasks 482 470 -12 sched_getaffinity 112 99 -13 sys_sched_getscheduler 86 72 -14 sched_setaffinity 226 212 -14 sched_setscheduler 666 642 -24 load_balance_start_fair 33 9 -24 load_balance_next_fair 33 9 -24 dequeue_task_rt 133 67 -66 put_prev_task_rt 97 28 -69 schedule_tail 133 50 -83 schedule 682 594 -88 enqueue_entity 499 366 -133 task_new_fair 317 180 -137 Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: optimize schedule() a bit on SMPIngo Molnar2007-10-151-2/+6
| | | | | | | | | | | | | | optimize schedule() a bit on SMP, by moving the rq-clock update outside the rq lock. code size is the same: text data bss dec hex filename 25725 2666 96 28487 6f47 sched.o.before 25725 2666 96 28487 6f47 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: whitespace cleanupsIngo Molnar2007-10-151-26/+26
| | | | | | | | | | | more whitespace cleanups. No code changed: text data bss dec hex filename 26553 2790 288 29631 73bf sched.o.before 26553 2790 288 29631 73bf sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: mark scheduling classes as constIngo Molnar2007-10-151-12/+5
| | | | | | | | | | | | mark scheduling classes as const. The speeds up the code a bit and shrinks it: text data bss dec hex filename 40027 4018 292 44337 ad31 sched.o.before 40190 3842 292 44324 ad24 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: group scheduler SMP migration fixSrivatsa Vaddagiri2007-10-151-1/+4
| | | | | | | | group scheduler SMP migration fix: use task_cfs_rq(p) to get to the relevant fair-scheduling runqueue of a task, rq->cfs is not the right one. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up schedstats, cnt -> countIngo Molnar2007-10-151-12/+12
| | | | | | | | | | | | | | | | rename all 'cnt' fields and variables to the less yucky 'count' name. yuckage noticed by Andrew Morton. no change in code, other than the /proc/sched_debug bkl_count string got a bit larger: text data bss dec hex filename 38236 3506 24 41766 a326 sched.o.before 38240 3506 24 41770 a32a sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: clean up sched_fork()Hiroshi Shimamoto2007-10-151-5/+2
| | | | | | | | | | | | | | | | | | The adjusting sched_class is a missing part of the already existing "do not leak PI boosting priority to the child" at the sched_fork(). This patch moves the adjusting sched_class from wake_up_new_task() to sched_fork(). this also shrinks the code a bit: text data bss dec hex filename 40111 4018 292 44421 ad85 sched.o.before 40102 4018 292 44412 ad7c sched.o.after Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: fix sched_fork()Ingo Molnar2007-10-151-1/+1
| | | | | | | | | | fix sched_fork(): large latencies at new task creation time because the ->vruntime was not fixed up cross-CPU, if the parent got migrated after the child's CPU got set up. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: undo some of the recent changesIngo Molnar2007-10-151-1/+0
| | | | | | | | undo some of the recent changes that are not needed after all, such as last_min_vruntime. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* sched: remove condition from set_task_cpu()Ingo Molnar2007-10-151-3/+1
| | | | | | | | | remove condition from set_task_cpu(). Now that ->vruntime is not global anymore, it should (and does) work fine without it too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* sched debug: check spreadPeter Zijlstra2007-10-151-0/+3
| | | | | | | | | | debug feature: check how well we schedule within a reasonable vruntime 'spread' range. (note that CPU overload can increase the spread, so this is not a hard condition, but normal loads should be within the spread.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* sched: add vslicePeter Zijlstra2007-10-151-0/+2
| | | | | | | | | | add vslice: the load-dependent "virtual slice" a task should run ideally, so that the observed latency stays within the sched_latency window. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched debug: BKL usage statisticsIngo Molnar2007-10-151-0/+9
| | | | | | | | add per task and per rq BKL usage statistics. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: add fair-user schedulerSrivatsa Vaddagiri2007-10-151-0/+9
| | | | | | | | | | | | | | | | | | | Enable user-id based fair group scheduling. This is useful for anyone who wants to test the group scheduler w/o having to enable CONFIG_CGROUPS. A separate scheduling group (i.e struct task_grp) is automatically created for every new user added to the system. Upon uid change for a task, it is made to move to the corresponding scheduling group. A /proc tunable (/proc/root_user_share) is also provided to tune root user's quota of cpu bandwidth. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: clean up code under CONFIG_FAIR_GROUP_SCHEDSrivatsa Vaddagiri2007-10-151-110/+62
| | | | | | | | | | | | | | | | With the view of supporting user-id based fair scheduling (and not just container-based fair scheduling), this patch renames several functions and makes them independent of whether they are being used for container or user-id based fair scheduling. Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating less-sized array for tg->cfs_rq[] and tf->se[]). Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: revert recent removal of set_curr_task()Srivatsa Vaddagiri2007-10-151-8/+26
| | | | | | | | | | Revert removal of set_curr_task. Use put_prev_task/set_curr_task when changing groups/policies Signed-off-by: Srivatsa Vaddagiri < vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()Dmitry Adamushko2007-10-151-26/+10
| | | | | | | | | | | | | | | | rework enqueue/dequeue_entity() to get rid of sched_class::set_curr_task(). This simplifies sched_setscheduler(), rt_mutex_setprio() and sched_move_tasks(). text data bss dec hex filename 24330 2734 20 27084 69cc sched.o.before 24233 2730 20 26983 6967 sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* sched: simplify sched_class::yield_task()Dmitry Adamushko2007-10-151-1/+1
| | | | | | | | | | | | | | the 'p' (task_struct) parameter in the sched_class :: yield_task() is redundant as the caller is always the 'current'. Get rid of it. text data bss dec hex filename 24341 2734 20 27095 69d7 sched.o.before 24330 2734 20 27084 69cc sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
OpenPOWER on IntegriCloud