path: root/sys/kern/sched_ule.c
Commit log, newest first. Each entry: author, date (files changed, -deleted/+added lines), followed by the commit message.
...
* jeff, 2007-01-20 (1 file, -7/+1)
  - We do need to IPI the idlethread on some systems. It may be stuck in a
    power saving mode otherwise.
  - If the thread is already bound in sched_bind() unbind it before re-binding
    it to a new cpu. I don't like these semantics but they are expected by some
    code in the tree. Patch by jkoshy.
* jeff, 2007-01-20 (1 file, -15/+25)
  - In tdq_transfer() always set NEEDRESCHED when necessary regardless of the
    ipi settings. If NEEDRESCHED is set and an ipi is later delivered it will
    clear it rather than cause extra context switches. However, if we miss
    setting it we can have terrible latency.
  - In sched_bind() correctly implement bind. Also be slightly more tolerant of
    code which calls bind multiple times. However, we don't change binding if
    another call is made with a different cpu. This does not presently work
    with hwpmc which I believe should be changed.
* jeff, 2007-01-19 (1 file, -237/+290)
  Major revamp of ULE's cpu load balancing:
  - Switch back to direct modification of remote CPU run queues. This added a
    lot of complexity with questionable gain. It's easy enough to reimplement
    if it's shown to help on huge machines.
  - Re-implement the old tdq_transfer() call as tdq_pickidle(). Change
    sched_add() so we have selectable cpu choosers and simplify the logic a
    bit here.
  - Implement tdq_pickpri() as the new default cpu chooser. This algorithm is
    similar to Solaris in that it tries to always run the threads with the
    best priorities. It is actually slightly more complex than Solaris's
    algorithm because we also tend to favor the local cpu over other cpus,
    which costs some latency but also potentially enables cache sharing
    between the waking thread and the woken thread.
  - Add a bunch of tunables that can be used to measure effects of different
    load balancing strategies. Most of these will go away once the algorithm
    is more definite.
  - Add a new mechanism to steal threads from busy cpus when we idle. This is
    enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The
    threshold is the required length of a tdq's run queue before another cpu
    will be able to steal runnable threads. This prevents most of the queue
    imbalances that contribute to long latencies.
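The chooser described in the tdq_pickpri() bullet and the steal_busy mechanism reduce to a simple decision: use an idle cpu if one exists (preferring the local one), otherwise run where the currently running thread has the worst priority, and let idle cpus steal only from queues longer than the threshold. Below is a minimal userland sketch of that decision; the struct, field names, and the busy_thresh default are invented stand-ins, not the kernel's tdq structures or sysctl values.

    struct cpu_sketch {
        int load;      /* runnable threads queued on this cpu */
        int run_pri;   /* priority of the running thread (higher number = worse) */
        int idle;      /* nonzero while the cpu sits in the idle loop */
    };

    static int busy_thresh = 4;   /* stand-in for the kern.sched.busy_thresh knob */

    /* Choose a cpu for a waking thread of priority 'pri'; 'self' is the waker's cpu. */
    static int
    pick_cpu(const struct cpu_sketch *cpus, int ncpus, int self, int pri)
    {
        int i, worst = self;

        if (cpus[self].idle)            /* favor the local cpu: possible cache sharing */
            return (self);
        for (i = 0; i < ncpus; i++)     /* any idle cpu beats preempting someone */
            if (cpus[i].idle)
                return (i);
        for (i = 0; i < ncpus; i++)     /* otherwise find the worst-priority runner */
            if (cpus[i].run_pri > cpus[worst].run_pri)
                worst = i;
        /* Only move there if our thread actually beats what is running. */
        return (cpus[worst].run_pri > pri ? worst : self);
    }

    /* An idling cpu may steal runnable threads only from sufficiently long queues. */
    static int
    can_steal_from(const struct cpu_sketch *victim)
    {
        return (victim->load >= busy_thresh);
    }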
* jeff, 2007-01-06 (1 file, -1/+1)
  - Don't let SCHED_TICK_TOTAL() return less than hz. This can cause integer
    divide faults in roundup() later if it is able to return 0. For some
    reason this bug only shows up on my laptop and not my testboxes.
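The fix amounts to clamping the sampling window before it is used as a divisor, so a freshly started thread (whose first and last tick stamps coincide) can never feed a zero into the later roundup()/divide. A hedged sketch of the idea with invented names; the real macro is SCHED_TICK_TOTAL() in sched_ule.c.

    /*
     * Width of the window, in ticks, over which a thread's cpu use is sampled.
     * 'ltick' and 'ftick' are the last and first tick stamps recorded for the
     * thread.  Clamping to at least hz keeps later divisions from seeing 0.
     */
    static int
    tick_window(int ltick, int ftick, int hz)
    {
        int total = ltick - ftick;

        return (total < hz ? hz : total);
    }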
* jeff, 2007-01-06 (1 file, -59/+45)
  - Fix the sched_priority() invalid priority bugs. Use roundup() instead of
    max() when computing the divisor in SCHED_TICK_PRI(). This prevents cases
    where rounding down would allow the quotient to exceed SCHED_PRI_RANGE.
  - Garbage collect some unused flags and fields.
  - Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply
    duplicated this functionality.
  - Re-enable the rebalancer by default and fix the sysctl so it can be
    modified.
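The roundup()-vs-max() distinction matters because a divisor rounded down lets the tick-to-priority quotient overshoot the allowed range by one bucket. The self-contained example below demonstrates the effect with invented constants; it is not the kernel's SCHED_TICK_PRI() definition.

    #include <stdio.h>

    #define PRI_RANGE 64                                        /* invented bucket count */
    #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))     /* as in sys/param.h */

    /* Map accumulated run ticks (0..total) onto 0..PRI_RANGE-1. */
    static int
    ticks_to_pri(int ticks, int total)
    {
        int divisor = roundup(total, PRI_RANGE) / PRI_RANGE;    /* rounded up */

        return (ticks / divisor);
    }

    int
    main(void)
    {
        /* total = 1000: rounded-up divisor is 16 -> 999/16 = 62, in range.     */
        /* A rounded-down divisor (1000/64 = 15) would give 999/15 = 66, out of */
        /* range, which is the class of bug the commit above fixes.             */
        printf("%d\n", ticks_to_pri(999, 1000));
        return (0);
    }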
* jeff, 2007-01-06 (1 file, -1/+1)
  - Don't IPI unless we're going to interrupt something exiting in the kernel;
    otherwise we can afford the latency. This makes a significant performance
    improvement.
* jeff, 2007-01-05 (1 file, -22/+38)
  - Fix a comparison in sched_choose() that caused cpus to be constantly
    marked idle, thus breaking cpu load balancing.
  - Change sched_interact_update() to fix cases where the stored history has
    expanded significantly rather than handling them in the callers. This
    fixes a case where sched_priority() could compute a bad value.
  - Add a sysctl to disable the global load balancer for experimentation.
* jeff, 2007-01-05 (1 file, -9/+35)
  - ftick was initialized to -1 for init and any of its children. Fix this by
    setting ftick = ltick = ticks in schedinit().
  - Update the priority when we are pulled off of the run queue and when we
    are inserted onto the run queue so that it more accurately reflects our
    present status. This is important for efficient priority propagation
    functioning.
  - Move the frequency test into sched_pctcpu_update() so we don't repeat it
    each time we'd like to call it.
  - Put some temporary work-around code in sched_priority() in case the tick
    mechanism produces a bad priority. Eventually this should revert to an
    assert again.
* jeff, 2007-01-04 (1 file, -22/+52)
  - Only allow the tdq_idx to increase by one each tick rather than up to the
    most recently chosen index. This significantly improves nice behavior.
    This allows a lower priority thread to run some multiple of times before
    the higher priority thread makes it to the front of the queue. A nice +20
    cpu hog now only gets ~5% of the cpu when running with a nice 0 cpu hog
    and about 1.5% with a nice -20 hog. A nice difference of 1 makes a 4%
    difference in cpu usage between two hogs.
  - Track a separate insert and removal index. When the removal index is empty
    it is updated to point at the current insert index.
  - Don't remove and re-add a thread to the runq when it is being adjusted
    down in priority.
  - Pull some conditional code out of sched_tick(). It's looking a bit large
    now.
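The insert/removal indexes describe a calendar-style circular queue: the insertion point advances by at most one bucket per tick, worse-nice threads are queued farther ahead of it, and the dequeue side drains buckets starting from its own removal index. The sketch below is a loose stand-alone model with plain arrays and invented names; in the real code the removal index snaps up to the insert index when its bucket empties, while this model simply scans forward.

    #define NQUEUES 64            /* invented bucket count for the model */

    struct calendar {
        int idx;                  /* insert index: advances at most once per tick */
        int ridx;                 /* removal index used by the dispatcher */
        int count[NQUEUES];       /* threads queued in each bucket */
    };

    /* Called once per tick: let the insertion point creep forward by one bucket. */
    static void
    calendar_tick(struct calendar *c)
    {
        c->idx = (c->idx + 1) % NQUEUES;
    }

    /* Worse priorities land farther ahead of the insertion point, so a nice +20
     * hog comes up for service far less often than a nice 0 thread. */
    static void
    calendar_insert(struct calendar *c, int pri_offset)
    {
        c->count[(c->idx + pri_offset) % NQUEUES]++;
    }

    /* Dequeue from the first non-empty bucket at or after the removal index. */
    static int
    calendar_choose(struct calendar *c)
    {
        int scanned;

        for (scanned = 0; scanned < NQUEUES; scanned++) {
            if (c->count[c->ridx] > 0) {
                c->count[c->ridx]--;
                return (c->ridx);
            }
            c->ridx = (c->ridx + 1) % NQUEUES;
        }
        return (-1);              /* nothing runnable in the timeshare ring */
    }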
* jeff, 2007-01-04 (1 file, -410/+330)
  ULE 2.0:
  - Remove the double queue mechanism for timeshare threads. It was slow due
    to excess cache lines in play, caused suboptimal scheduling behavior with
    niced and other non-interactive processes, complicated priority lending,
    etc.
  - Use a circular queue with a floating starting index for timeshare threads.
    Enforces fairness by moving the insertion point closer to threads with
    worse priorities over time.
  - Give interactive timeshare threads real-time user-space priorities and
    place them on the realtime/ithd queue.
  - Select non-interactive timeshare thread priorities based on their cpu
    utilization over the last 10 seconds combined with the nice value. This
    gives us more sane priorities and behavior in a loaded system as compared
    to the old method of using the interactivity score. The interactive score
    quickly hit a ceiling if threads were non-interactive and penalized new
    hog threads.
  - Use one slice size for all threads. The slice is not currently dynamically
    set to adjust scheduling behavior of different threads.
  - Add some new sysctls for scheduling parameters.
  Bug fixes/Clean up:
  - Fix zeroing of td_sched after initialization in sched_fork_thread() caused
    by recent ksegrp removal.
  - Fix KSE interactivity issues related to frequent forking and exiting of
    kse threads. We simply disable the penalty for thread creation and exit
    for kse threads.
  - Cleanup the cpu estimator by using tickincr here as well. Keep ticks and
    ltick/ftick in the same frequency. Previously ticks were stathz and others
    were hz.
  - Lots of new and updated comments.
  - Many many others.
  Tested on: up x86/amd64, 8way amd64.
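For the non-interactive timeshare priorities described above, the computation is essentially "base timeshare priority plus a penalty proportional to recent cpu use plus the nice value". The sketch below illustrates that shape only; the constants and names are invented, and the real scaling lives in sched_priority() and its helper macros.

    /* Invented range: lower numbers are better priorities, as in FreeBSD. */
    #define PRI_TS_MIN   88       /* best priority a timeshare thread can get  */
    #define PRI_TS_MAX   135      /* worst priority a timeshare thread can get */
    #define PRI_CPU_SPAN 40       /* how much of the range recent %cpu can add */

    /*
     * pct_cpu: 0..100, the thread's cpu share over roughly the last 10 seconds.
     * nice:    -20..20.
     */
    static int
    timeshare_pri(int pct_cpu, int nice)
    {
        int pri = PRI_TS_MIN + (pct_cpu * PRI_CPU_SPAN) / 100 + nice;

        if (pri < PRI_TS_MIN)
            pri = PRI_TS_MIN;     /* clamp into the timeshare band */
        if (pri > PRI_TS_MAX)
            pri = PRI_TS_MAX;
        return (pri);
    }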
* jeff, 2006-12-29 (1 file, -12/+12)
  - More search and replace prettying.
* jeff, 2006-12-29 (1 file, -206/+201)
  - Clean up a bit after the most recent KSE restructuring.
* julian, 2006-12-06 (1 file, -12/+13)
  Changes to try to fix sched_ule.c, courtesy of David Xu.
* julian, 2006-12-06 (1 file, -578/+553)
  Threading cleanup.. part 2 of several.
  Make part of John Birrell's KSE patch permanent.. Specifically, remove:
  Any reference of the ksegrp structure. This feature was never fully utilised
  and made things overly complicated. All code in the scheduler that tried to
  make threaded programs fair to unthreaded programs. Libpthread processes
  will already do this to some extent and libthr processes already disable it.
  Also:
  Since this makes such a big change to the scheduler(s), take the opportunity
  to rename some structures and elements that had to be moved anyhow. This
  makes the code a lot more readable.
  The ULE scheduler compiles again but I have no idea if it works.
  The 4bsd scheduler still requires a little cleaning and some functions that
  now do ALMOST nothing will go away, but I thought I'd do that as a separate
  commit.
  Tested by David Xu, and Dan Eischen using libthr and libpthread.
* maxim, 2006-11-08 (1 file, -2/+2)
  o Fix a couple of obvious typos.
* jb, 2006-10-26 (1 file, -234/+173)
  Make KSE a kernel option, turned on by default in all GENERIC kernel configs
  except sun4v (which doesn't process signals properly with KSE).
  Reviewed by: davidxu@
* davidxu, 2006-08-25 (1 file, -1/+57)
  Add user priority loaning code to support priority propagation for 1:1
  threading's POSIX priority mutexes. The code is a no-op unless the
  priority-aware umtx code is committed.
* davidxu, 2006-06-15 (1 file, -0/+13)
  Add scheduler API sched_relinquish(); the API is used to implement the
  yield() and sched_yield() syscalls. Every scheduler has its own way to
  relinquish the cpu; the ULE and CORE schedulers have two internal
  run-queues, and a timesharing thread which calls the yield() syscall should
  be moved to the inactive queue.
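A sketch of what such a relinquish hook does for a timesharing thread, per the description above: move it to the inactive (next) run queue so its peers on the active queue get to run, then switch away. Everything here is a placeholder skeleton, not the committed sched_relinquish() body.

    struct thread_sketch {
        int pri_class;            /* scheduling class of the thread */
    };

    enum { CLASS_TIMESHARE = 1, CLASS_REALTIME = 2 };   /* invented class tags */

    static void move_to_inactive_queue(struct thread_sketch *td) { (void)td; }
    static void requeue(struct thread_sketch *td) { (void)td; }
    static void switch_away(struct thread_sketch *td) { (void)td; }

    /* yield() and sched_yield() both funnel into a hook shaped like this. */
    static void
    relinquish_sketch(struct thread_sketch *td)
    {
        if (td->pri_class == CLASS_TIMESHARE)
            move_to_inactive_queue(td);   /* let the active queue drain first */
        else
            requeue(td);                  /* other classes simply requeue */
        switch_away(td);                  /* voluntary context switch */
    }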
* davidxu, 2006-06-13 (1 file, -0/+5)
  Add scheduler CORE, work I did half a year ago and recently picked up again.
  The scheduler is forked from ULE, but the algorithm used to detect an
  interactive process is almost completely different from ULE's; it comes from
  the Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I
  still use the same word "score" as a priority boost as in the ULE scheduler.
  Briefly, the scheduler has the following characteristics:
  1. A timesharing process's nice value is seriously respected; the timeslice
     and the interaction detection algorithm are based on the nice value.
  2. Per-cpu scheduling queues and load balancing.
  3. O(1) scheduling.
  4. Some cpu affinity code in the wakeup path.
  5. Support for POSIX SCHED_FIFO and SCHED_RR.
  Unlike the 4BSD and ULE schedulers, which use the fuzzy RQ_PPQ, this
  scheduler uses 256 priority queues. Unlike ULE, which uses both pull and
  push, this scheduler uses only the pull method; the main reason is to let a
  relatively idle cpu do the work. But currently the whole scheduler is
  protected by the big sched_lock, so the benefit is not visible, and it can
  really be worse than nothing because all other cpus are locked out while we
  are doing balancing work, a problem the 4BSD scheduler does not have. The
  scheduler does not support hyperthreading very well; in fact it does not
  distinguish between physical and logical CPUs, which should be improved in
  the future. The scheduler has a priority inversion problem on MP machines;
  it is not good for realtime scheduling and can cause realtime processes to
  starve. That said, MySQL super-smack seems to run better on my Pentium-D
  machine when using libthr, whether on a UP or SMP kernel.
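Characteristic 3 above, O(1) scheduling over 256 strict priority queues, is conventionally implemented with a bitmap of non-empty queues so that picking the next thread is a find-first-set operation instead of a scan over threads. The snippet below is a generic self-contained illustration of that technique, not the CORE scheduler's code.

    #include <stdint.h>
    #include <strings.h>          /* ffs() */

    #define NPRI 256              /* one strict queue per priority level */

    struct rq_bitmap {
        uint32_t word[NPRI / 32]; /* bit set => that priority's queue is non-empty */
    };

    static void
    rq_setbit(struct rq_bitmap *rq, int pri)
    {
        rq->word[pri / 32] |= 1u << (pri % 32);
    }

    static void
    rq_clrbit(struct rq_bitmap *rq, int pri)
    {
        rq->word[pri / 32] &= ~(1u << (pri % 32));
    }

    /* O(1): at most NPRI/32 word tests plus one ffs(), independent of how many
     * threads are runnable.  Returns the best (lowest) non-empty priority. */
    static int
    rq_findbest(const struct rq_bitmap *rq)
    {
        int w;

        for (w = 0; w < NPRI / 32; w++)
            if (rq->word[w] != 0)
                return (w * 32 + ffs((int)rq->word[w]) - 1);
        return (-1);              /* nothing runnable */
    }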
* davidxu, 2006-06-06 (1 file, -1/+1)
  Make ke_rqindex unsigned.
* davidxu, 2005-12-27 (1 file, -1/+1)
  Use variable i instead of variable cpus as an index to get correct kseq.
* davidxu, 2005-12-19 (1 file, -19/+31)
  Fix a bug in the slice calculation code: the current code uses hz, but
  sched_clock() is driven by the stat clock (stathz).
  Submitted by: taku at tackymt dot homeip dot net
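The bug class here is mixing clock frequencies: a slice sized in hz ticks but decremented by a routine driven at stathz lasts hz/stathz times longer than intended. A tiny illustration of sizing the slice in the units of the clock that will actually consume it; the names are invented for the sketch.

    /*
     * Convert a slice length in milliseconds into ticks of the clock that will
     * decrement it.  Sizing the slice with hz (often 1000) while sched_clock()
     * runs at stathz (often ~128) stretches every slice by roughly 8x.
     */
    static int
    slice_ticks(int slice_ms, int clock_hz)
    {
        int t = (slice_ms * clock_hz) / 1000;

        return (t > 0 ? t : 1);   /* never hand out a zero-tick slice */
    }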
* davidxu, 2005-09-22 (1 file, -1/+3)
  Temporarily disable the nice threshold detection code, as it can starve a
  thread holding a critical resource, e.g. a mutex or other implicit
  synchronization flags. Give a thread which exceeds the nice threshold a
  minimum time slice.
  PR: kern/86087
* davidxu, 2005-08-19 (1 file, -8/+8)
  Move up the code testing KEF_HOLD to avoid ke_cpu being changed unexpectedly
  for PRI_ITHD and PRI_REALTIME threads.
* davidxu, 2005-08-08 (1 file, -1/+9)
  Try our best to keep a preempted thread at the front of the run queue; this
  seems to improve performance a bit for some workloads, but we still see
  interactive lag unless the cpu idling race is fixed.
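The idea is that a thread which lost the cpu involuntarily should be the first to get it back at its priority level rather than queueing behind peers that were already waiting. Below is a minimal model of head-versus-tail insertion keyed on why the thread is being queued; the flag name is invented for the sketch (the kernel's queueing code takes a hint along the lines of SRQ_PREEMPTED).

    #include <sys/queue.h>        /* TAILQ_* macros (BSD and glibc provide this) */

    struct tsk {
        TAILQ_ENTRY(tsk) link;
    };
    TAILQ_HEAD(tskq, tsk);

    #define QF_PREEMPTED 0x01     /* sketch flag: thread was involuntarily switched out */

    /* Preempted threads go to the head so they resume before equal-priority peers. */
    static void
    enqueue(struct tskq *q, struct tsk *t, int flags)
    {
        if (flags & QF_PREEMPTED)
            TAILQ_INSERT_HEAD(q, t, link);
        else
            TAILQ_INSERT_TAIL(q, t, link);
    }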
* davidxu, 2005-07-31 (1 file, -0/+4)
  If a thread was removed from system run queue, kse_assign shouldn't add it
  again.
* delphij, 2005-07-25 (1 file, -2/+4)
  Cast to uintptr_t when the compiler complains. This unbreaks the ULE
  scheduler, which was broken by the recent atomic_ptr() change.
* peter, 2005-06-24 (1 file, -1/+2)
  Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being
  in opt_global.h and forcing a global recompile when only a few files
  reference it.
  Approved by: re
* jeff, 2005-06-07 (1 file, -3/+11)
  - Fix the case where we're not preempting but there is already a newtd, as
    this happens via thread_switchout(). I don't particularly like the
    structure of the code here. We twice call out to thread code when a thread
    is voluntarily switching: once to thread_switchout() and once to
    slot_fill(), while sched_4BSD does even more work which is redundant to
    select another thread to use our remaining slice. This should be
    simplified in the future, but for now I'm only going to fix the bug, not
    the bad design.
* jeff, 2005-06-04 (1 file, -1/+1)
  - It's 2005 already, I've been working on this for three years.
* jeff, 2005-06-04 (1 file, -72/+37)
  - Don't SLOT_USE() in the preempt case, sched_add() has already taken the
    slot for us. Previously, we would take two slots on every preempt, and
    setrunqueue() would fix it up for us in the non threaded case. The
    threaded case was simply broken.
  - Clean up flags, prototypes, comments.
* jkoshy, 2005-04-19 (1 file, -1/+22)
  Bring a working snapshot of hwpmc(4), its associated libraries, userland
  utilities and documentation into -CURRENT. Bump FreeBSD_version.
  Reviewed by: alc, jhb (kernel changes)
* ups, 2005-04-08 (1 file, -1/+1)
  Sprinkle some volatile magic and rearrange things a bit to avoid race
  conditions in critical_exit now that it no longer blocks interrupts.
  Reviewed by: jhb
* jeff, 2005-02-23 (1 file, -2/+0)
  - A test in sched_switch() is no longer necessary and it is incorrect when
    td0 is preempted before it voluntarily switches.
  Discovered by: Arjan Van Leeuwen <avleeuwen@gmail.com>
* jeff, 2005-02-04 (1 file, -2/+2)
  - Add ke_runq == NULL to the conditions which will cause us to abort
    adjusting timeshare loads in sched_class(). This is only important if the
    thread has never run, otherwise the state checks should work as expected.
* jhb, 2004-12-30 (1 file, -3/+3)
  Fix a typo and two whitespace nits.
* jhb, 2004-12-30 (1 file, -20/+77)
  Rework the interface between priority propagation (lending) and the
  schedulers a bit to ensure more correct handling of priorities and fewer
  priority inversions:
  - Add two functions to the sched(9) API to handle priority lending:
    sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these
    functions to ask the scheduler to lend a thread a set priority and to tell
    the scheduler when it thinks it is ok for a thread to stop borrowing
    priority. The unlend case is slightly complex in that the turnstile code
    tells the scheduler what the minimum priority of the thread needs to be to
    satisfy the requirements of any other threads blocked on locks owned by
    the thread in question. The scheduler then decides whether the thread can
    go back to normal mode (if its normal priority is high enough to satisfy
    the pending lock requests) or if it should continue to use the priority
    specified to the sched_unlend_prio() call. This involves adding a new
    per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for
    priority elevation.
  - Schedulers now refuse to lower the priority of a thread that is currently
    borrowing another thread's priority.
  - If a scheduler changes the priority of a thread that is currently sitting
    on a turnstile, it will call a new function turnstile_adjust() to inform
    the turnstile code of the change. This function resorts the thread on the
    priority list of the turnstile if needed, and if the thread ends up at the
    head of the list (due to having the highest priority) and its priority was
    raised, then it will propagate that new priority to the owner of the lock
    it is blocked on.
  Some additional fixes specific to the 4BSD scheduler include:
  - Common code for updating the priority of a thread when the user priority
    of its associated kse group has been changed is consolidated in a new
    static function resetpriority_thread(). One change to this function is
    that it will now only adjust the priority of a thread if it already has a
    time sharing priority, thus preserving any boosts from a tsleep() until
    the thread returns to userland. Also, resetpriority() no longer calls
    maybe_resched() on each thread in the group. Instead, the code calling
    resetpriority() is responsible for calling resetpriority_thread() on any
    threads that need to be updated.
  - schedcpu() now uses resetpriority_thread() instead of just calling
    sched_prio() directly after it updates a kse group's user priority.
  - sched_clock() now uses resetpriority_thread() rather than writing directly
    to td_priority.
  - sched_nice() now updates all the priorities of the threads after the group
    priority has been adjusted.
  Discussed with: bde
  Reviewed by: ups, jeffr
  Tested on: 4bsd, ule
  Tested on: i386, alpha, sparc64
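In rough terms the lend/unlend pair works like this: lending raises the thread's effective priority and marks it as borrowing; unlending restores the thread's own priority only if that still satisfies the best priority any remaining lock waiter requires, otherwise the thread keeps running at the required priority. The sketch below models that logic with simplified fields; it is not the sched(9) implementation.

    #include <stdbool.h>

    struct td_sketch {
        int  base_pri;            /* priority the thread earned on its own      */
        int  pri;                 /* effective priority (lower number = better) */
        bool borrowing;           /* stand-in for the TDF_BORROWING state       */
    };

    /* Lend: boost the effective priority and remember that it is borrowed. */
    static void
    lend_prio(struct td_sketch *td, int prio)
    {
        td->borrowing = true;
        if (prio < td->pri)
            td->pri = prio;
    }

    /*
     * Unlend: 'required' is the best priority still needed by threads blocked
     * on locks this thread owns.  If the thread's own priority satisfies them,
     * it stops borrowing; otherwise it keeps running at the required priority.
     */
    static void
    unlend_prio(struct td_sketch *td, int required)
    {
        if (td->base_pri <= required) {
            td->pri = td->base_pri;
            td->borrowing = false;
        } else {
            td->pri = required;
        }
    }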
* jeff, 2004-12-26 (1 file, -4/+0)
  - Unintentionally checked in a debugging panic. Remove that.
* jeff, 2004-12-26 (1 file, -127/+140)
  - Fix a long standing problem where an ithread would not honor sched_pin().
  - Remove the sched_add wrapper that used sched_add_internal() as a backend.
    Its only purpose was to interpret one flag and turn it into an int. Do the
    right thing and interpret the flag in sched_add() instead.
  - Pass the flag argument to sched_add() to kseq_runq_add() so that we can
    get the SRQ_PREEMPT optimization too.
  - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT
    counts, otherwise the slot counts are adjusted as soon as we enter
    sched_add() or sched_rem() rather than when the thread is actually placed
    on the run queue. This greatly simplifies the handling of slots.
  - Remove the explicit prevention of migration for ithreads on non-x86
    platforms. This was never shown to have any real benefit.
  - Remove the unused class argument to KSE_CAN_MIGRATE().
  - Add ktr points for thread migration events.
  - Fix a long standing bug on platforms which don't initialize the cpu
    topology. The ksg_maxid variable was never correctly set on these
    platforms which caused the long term load balancer to never inspect more
    than the first group or processor.
  - Fix another bug which prevented the long term load balancer from working
    properly. If stathz != hz we can't expect sched_clock() to be called on
    the exact tick count that we're anticipating.
  - Rearrange sched_switch() a bit to reduce indentation levels.
* jeff, 2004-12-26 (1 file, -32/+14)
  - Remove earlier KTR_ULE tracepoints.
  - Define new KTR_SCHED points so that we can graph the operation of the
    scheduler.
* jeff, 2004-12-14 (1 file, -9/+0)
  - Garbage collect several unused members of struct kse and struct ksegrp. As
    best as I can tell, some of these were never used.
* jeff, 2004-12-14 (1 file, -11/+25)
  - In kseq_choose(), don't recalculate slice values for processes with a nice
    of 0. Doing so can cause an infinite loop because they should be running,
    but a nice -20 process could prevent them from doing so.
  - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority
    elevated due to priority propagation. If a thread has had its priority
    elevated, we assume that it must go on the current queue and it must get a
    slice.
  - In sched_userret() if our priority was elevated and we shouldn't have a
    timeslice, yield here until we should.
  Found/Tested by: glebius
* jeff, 2004-12-13 (1 file, -16/+16)
  - Take up a 'slot' while we're on the assigned queue, waiting to be posted
    to another processor. Otherwise, kern_switch() gets confused and tries to
    sched_add(NULL).
* jeff, 2004-11-11 (1 file, -0/+4)
  - Temporarily disable the nice -20 throttling code. It has some interaction
    with APM that I do not understand yet.
  Reported & Tested by: glebius
* jeff, 2004-10-30 (1 file, -2/+4)
  - When choosing a thread on the run queue, check to see if its nice is
    outside of the nice threshold due to a recently awoken thread with a lower
    nice value. This further reduces the amount of time a positively niced
    thread gets while running in conjunction with a workload that has many
    short sleeps (ie buildworld).
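One way to picture the check: when a thread is dequeued, compare its nice value against the best (lowest) nice currently runnable, which may have dropped since the thread was queued because a less-niced thread woke up; candidates outside the window get only a token share. This is a simplified stand-alone sketch with an invented threshold, not the kseq code.

    #define NICE_WINDOW 5         /* invented width of the acceptable nice window */

    /*
     * Decide how large a slice a dequeued thread deserves.  'nice' is the
     * candidate's nice value; 'min_nice' is the lowest nice value among
     * currently runnable threads (e.g. a nice -20 thread may have just woken up).
     */
    static int
    slice_for(int nice, int min_nice, int full_slice, int min_slice)
    {
        if (nice - min_nice > NICE_WINDOW)
            return (min_slice);   /* outside the window: token slice only */
        return (full_slice);
    }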
* jeff, 2004-10-30 (1 file, -1/+1)
  - In sched_prio() check to see if the kse is assigned to a runq as the check
    for TD_ON_RUNQ() no longer means the thread is really on a run-queue. I
    suspect this state should be re-evaluated as it must mean something else
    now. This fixes ULE+KSE+PREEMPTION on UP x86.
* julian, 2004-10-05 (1 file, -1/+1)
  Fix whitespace botch that only showed up in the commit message diff :-/
  MFC after: 4 days
* julian, 2004-10-05 (1 file, -6/+14)
  When preempting a thread, put it back on the HEAD of its run queue.
  (Only really implemented in 4bsd)
  MFC after: 4 days
* julian, 2004-10-05 (1 file, -0/+2)
  Oops. left out part of the diff.
  MFC after: 4 days
* julian, 2004-10-05 (1 file, -16/+30)
  Use some macros to track available scheduler slots to allow easier
  debugging.
  MFC after: 4 days