path: root/sys/kern/sched_ule.c
Commit history (newest first). Each entry shows the author, commit date, and diffstat (files changed, lines -removed/+added).
* jhb, 2004-02-27 (1 file, -2/+2):
  Switch the sleep/wakeup and condition variable implementations to use the
  sleep queue interface:
  - Sleep queues attempt to merge some of the benefits of both sleep queues
    and condition variables. Having sleep queues in a hash table avoids
    having to allocate a queue head for each wait channel. Thus, struct cv
    has shrunk down to just a single char * pointer now. However, the hash
    table does not hold threads directly, but queue heads. This means that
    once you have located a queue in the hash bucket, you no longer have to
    walk the rest of the hash chain looking for threads. Instead, you have
    a list of all the threads sleeping on that wait channel, as sketched
    below.
  - Outside of the sleepq code and the sleep/cv code the kernel no longer
    differentiates between cv's and sleep/wakeup. For example, calls to
    abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
    Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
    cv_waitq_remove() have been replaced with calls to sleepq_remove().
  - The sched_sleep() function no longer accepts a priority argument as
    sleeps no longer inherently bump the priority. Instead, this is solely
    a property of msleep(), which explicitly calls sched_prio() before
    blocking.
  - The TDF_ONSLEEPQ flag has been dropped as it was never used. The
    associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
    dropped and replaced with a single explicit clearing of td_wchan.
    TD_SET_ONSLEEPQ() would really have only made sense if it had taken the
    wait channel and message as arguments anyway. Now that that only
    happens in one place, a macro would be overkill.
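  A minimal sketch of the hash-of-queue-heads idea, assuming hypothetical
  field and hash-function names (the real interface is sleepq(9) in
  kern/subr_sleepqueue.c):

      #include <sys/param.h>
      #include <sys/queue.h>

      struct thread;

      /* One queue head per wait channel, chained in a small hash table. */
      #define SQ_HASH_SIZE    128
      #define SQ_HASH(wc)     (((uintptr_t)(wc) >> 8) % SQ_HASH_SIZE)

      struct sleepqueue {
              LIST_ENTRY(sleepqueue)  sq_hash;     /* hash chain of heads */
              TAILQ_HEAD(, thread)    sq_blocked;  /* sleepers on channel */
              void                   *sq_wchan;    /* wait channel (key) */
      };

      static LIST_HEAD(, sleepqueue) sq_chains[SQ_HASH_SIZE];

      /*
       * One walk of the hash chain finds the queue head; every thread
       * sleeping on 'wchan' is then on sq_blocked, so no further chain
       * walking is needed.
       */
      static struct sleepqueue *
      sleepq_lookup(void *wchan)
      {
              struct sleepqueue *sq;

              LIST_FOREACH(sq, &sq_chains[SQ_HASH(wchan)], sq_hash)
                      if (sq->sq_wchan == wchan)
                              return (sq);
              return (NULL);
      }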
* jeff, 2004-02-01 (1 file, -1/+1):
  - Allow interactive tasks to use the maximum time-slice. This is not as
    detrimental as I thought it would be in the case of massive process
    storms from a shell and it makes regular desktop usage noticeably
    better.
* jeff, 2004-02-01 (1 file, -3/+27):
  - Add a new member to struct kseq called ksq_sysload. This is intended to
    track the load for the sched_load() function. In the SMP case this
    member is not defined because it would be redundant with the ksg_load
    member which already tracks the non ithd load.
  - For sched_load() in the UP case simply return ksq_sysload. In the SMP
    case traverse the list of kseq groups and sum up their ksg_load fields
    (see the sketch below).
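  A sketch of the described sched_load(); the kseq_groups[]/ksg_maxid
  iteration and the UP kseq_cpu name are assumptions based on the commit
  text:

      int
      sched_load(void)
      {
      #ifdef SMP
              int total, i;

              total = 0;
              /* ksg_load already excludes ithreads, so just sum it. */
              for (i = 0; i <= ksg_maxid; i++)
                      total += kseq_groups[i].ksg_load;
              return (total);
      #else
              return (kseq_cpu.ksq_sysload);
      #endif
      }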
* jeff, 2004-01-25 (1 file, -3/+0):
  - sched_strict has been dead for a long time now. Get rid of it.
* jeff, 2004-01-25 (1 file, -4/+4):
  - Clean up KASSERTs.
* jeff, 2004-01-25 (1 file, -2/+1):
  - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
    SW_INVOL. Assert that one of these is set in mi_switch() and properly
    adjust the rusage statistics (see the sketch below). This is to
    simplify the large number of users of this interface, which were
    previously all required to adjust the proper counter prior to calling
    mi_switch(). This also facilitates more switch and locking
    optimizations.
  - Change all callers of mi_switch() to pass the appropriate parameter
    and remove direct references to the process statistics.
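  A sketch of the flag contract; the SW_VOL/SW_INVOL values and the
  p_stats layout are assumptions, while ru_nvcsw/ru_nivcsw are the
  standard BSD rusage context-switch counters:

      #define SW_VOL    0x0001        /* voluntary context switch */
      #define SW_INVOL  0x0002        /* involuntary context switch */

      void
      mi_switch(int flags)
      {
              struct thread *td = curthread;

              KASSERT((flags & (SW_VOL | SW_INVOL)) != 0,
                  ("mi_switch: switch must be voluntary or involuntary"));
              if (flags & SW_VOL)
                      td->td_proc->p_stats->p_ru.ru_nvcsw++;
              else
                      td->td_proc->p_stats->p_ru.ru_nivcsw++;
              /* ... choose the next thread and cpu_switch() to it ... */
      }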
* jeff, 2003-12-20 (1 file, -7/+1):
  - Make our transfer decisions based on load and not transferable load. A
    cpu could have been bogged down with non-transferable load and still
    not migrated a new thread to an idle cpu. This required some
    benchmarking and tuning to get right as the comment above it suggests.
* jeff, 2003-12-20 (1 file, -0/+10):
  - Enable ithread migration on x86. This is done to work around a bug in
    the IO APIC on Xeons that prevents round-robin interrupt assignment
    from working.
* jeff, 2003-12-20 (1 file, -9/+14):
  - In kseq_transfer() return if smp has not been started.
  - In sched_add(), do the idle check prior to the transfer check so that
    we don't try to transfer load from an idle cpu. This fixes panics
    caused by IPIs on UP machines running SMP kernels.
  Reported/Debugged by: seanc
* jeff, 2003-12-20 (1 file, -1/+2):
  - Running interactive tasks with the minimum time-slice is fine for vi
    and sh, but not so great for mozilla, X, etc. Add a fixed define for
    the slice size granted to interactive KSEs.
* jeff, 2003-12-14 (1 file, -4/+2):
  - Assign the ke_cpu field in kseq_notify() so that all of our callers do
    not have to do it.
  - Set the ke_runq to NULL in sched_add() before calling kseq_notify().
    Otherwise we may panic in sched_add() if INVARIANTS is on.
* jeff, 2003-12-12 (1 file, -47/+130):
  - Now that we have kseq groups, balance them separately. The new
    sched_balance_groups() function does intra-group balancing while
    sched_balance() balances the available groups.
  - Pick a random time between 0 ticks and hz * 2 ticks to restart each
    balancing process. Each balancer has its own timeout.
  - Pick a random place in the list of groups to start the search for the
    lowest and highest group loads. This prevents us from preferring a
    group based on numeric position (see the sketch below).
  - Use a nasty hack to stop us from preferring cpu 0. The problem is that
    softclock always runs on cpu 0, so it always has a little extra load.
    We ignore this load in the balancer for now. In the future softclock
    should run on a random cpu and these hacks can go away.
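  A sketch of the randomized restart and scan origin; timeout(9) and
  arc4random(9) are real kernel facilities of the period, while the group
  bookkeeping (ksg_maxid and the load tracking) is assumed:

      static void
      sched_balance(void *arg)
      {
              int i, cnt, start;

              /*
               * Start the scan at a random group so that ties are not
               * always resolved in favor of the numerically lowest group.
               */
              start = arc4random() % (ksg_maxid + 1);
              for (cnt = 0, i = start; cnt <= ksg_maxid;
                  cnt++, i = (i + 1) % (ksg_maxid + 1)) {
                      /* ... record the lowest and highest ksg_load ... */
              }
              /* ... migrate one thread from highest to lowest group ... */

              /* Rearm at a random time within the next two seconds. */
              timeout(sched_balance, NULL, arc4random() % (hz * 2));
      }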
* jeff, 2003-12-11 (1 file, -1/+2):
  - Don't let the pctcpu rate limiter throttle us if we have recorded over
    SCHED_CPU_TICKS ticks. This was allowing processes to display
    (1/SCHED_CPU_TIME * 100)% more cpu than they had used.
* jeff, 2003-12-11 (1 file, -15/+21):
  - In sched_switch(), if a thread has been assigned, don't touch the
    runqueues or load. These things have already been taken care of in
    sched_bind(), which should be the only place that we're switching in
    an assigned thread.
* jeff, 2003-12-11 (1 file, -116/+263):
  - Add support for CPU groups to ule. All SMT cores on the same physical
    cpu are added to a group.
  - Don't place a cpu into the kseq_idle bitmask until all cpus in that
    group have idled (see the sketch below).
  - Prefer idle groups over idle group members in the new kseq_transfer()
    function. In this way we will prefer to balance load across full cores
    rather than add further load to a partial core.
  - Before a cpu goes idle, check the other group members for threads.
    Since SMT cpus may freely share threads, this is cheap.
  - SMT cores may now be individually pinned and bound to. This contrasts
    the old mechanism, where binding or pinning would have allowed a
    thread to run on any available cpu.
  - Remove some unnecessary logic from sched_switch(). Priority
    propagation should be properly taken care of in sched_prio() now.
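  A sketch of the group-aware idle accounting; the ksg_* field names
  follow the commit text, and the mask handling is an assumption
  (PCPU_GET(cpumask) and atomic_set_int() are real primitives of the era):

      static void
      kseq_setidle(struct kseq *kseq)
      {
              struct kseq_group *ksg = kseq->ksq_group;

              ksg->ksg_idlemask |= PCPU_GET(cpumask);
              if (ksg->ksg_idlemask != ksg->ksg_cpumask)
                      return;         /* a sibling is still busy */
              /* Whole physical core is idle; advertise the group. */
              atomic_set_int(&kseq_idle, ksg->ksg_mask);
      }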
* peter, 2003-12-07 (1 file, -1/+1):
  rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64). Be
  sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad things
  happen(TM). This is why beast.freebsd.org paniced with ULE (illustrated
  below).
  Reviewed by: jeff
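  The bug in miniature; rqb_word_t and NBBY are the real runq/param.h
  names, while RQB_BIT() as written here is illustrative:

      /* Broken: the constant 1 is an int, so bits 32..63 are lost. */
      #define RQB_BIT_BAD(pri) \
          (1 << ((pri) % (sizeof(rqb_word_t) * NBBY)))

      /* Fixed: widen the constant to the word type before shifting. */
      #define RQB_BIT(pri) \
          ((rqb_word_t)1 << ((pri) % (sizeof(rqb_word_t) * NBBY)))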
* jhb, 2003-12-03 (1 file, -1/+1):
  Fix all users of mp_maxid to use the same semantics, namely:
  1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
  2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
  (See the loop below.)
  Approved by: re (scottl)
  Tested on: i386, amd64, alpha
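  The canonical iteration under these semantics; note the inclusive bound,
  with CPU_ABSENT() skipping holes in the ID space:

      int cpu;

      for (cpu = 0; cpu <= mp_maxid; cpu++) {
              if (CPU_ABSENT(cpu))
                      continue;
              /* ... per-CPU work ... */
      }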
* jeff, 2003-11-17 (1 file, -3/+3):
  - Mark ksq_assigned as volatile so that when this code is used without
    sched_lock we can be sure that we'll pick up the new value.
* jeff, 2003-11-17 (1 file, -52/+4):
  - Remove long-dead code. rslices hasn't been used in some time and
    neither has sched_pickcpu().
* jeff, 2003-11-15 (1 file, -61/+83):
  - Introduce kseq_runq_{add,rem}(), which are used to insert and remove
    kses from the run queues. Also, on SMP, we track the transferable
    count here. Threads are transferable only as long as they are on the
    run queue (see the sketch below).
  - Previously, we adjusted our load balancing based on the transferable
    count minus the number of actual cpus. This was done to account for
    the threads which were likely to be running. All of this logic is
    simpler now that transferable accounts for only those threads which
    can actually be taken. Updated various places in sched_add() and
    kseq_balance() to account for this.
  - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
    really doing. The load is accounted for separately from the runq
    because the load is accounted for even as the thread is running.
  - Fix a bug in sched_class() where we weren't properly using the
    PRI_BASE() version of the kg_pri_class.
  - Add a large comment that describes the impact of a seemingly simple
    conditional in sched_add().
  - Also in sched_add(), check the transferable count and
    KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the
    frequency of access for kseq_idle, which is a shared resource.
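  A sketch of the wrappers; runq_add()/runq_remove() are the real runq
  primitives of the period, while the ksq_transferable field and the
  KSE_CAN_MIGRATE() predicate follow the commit text:

      static __inline void
      kseq_runq_add(struct kseq *kseq, struct kse *ke)
      {
      #ifdef SMP
              /* Only threads on a run queue can be stolen. */
              if (KSE_CAN_MIGRATE(ke))
                      kseq->ksq_transferable++;
      #endif
              runq_add(ke->ke_runq, ke);
      }

      static __inline void
      kseq_runq_rem(struct kseq *kseq, struct kse *ke)
      {
      #ifdef SMP
              if (KSE_CAN_MIGRATE(ke))
                      kseq->ksq_transferable--;
      #endif
              runq_remove(ke->ke_runq, ke);
      }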
* jeff, 2003-11-06 (1 file, -1/+1):
  - Somehow I botched my last commit. Add an extra ( to fix things up. I'm
    still not sure how this happened.
  Reported by: ps
* jeff, 2003-11-06 (1 file, -17/+3):
  - Remove the local definition of sched_pin and unpin. They are provided
    in sched.h now.
  - Respect the td pin count.
* jeff, 2003-11-05 (1 file, -3/+4):
  - It's ok if sched_runnable() has races in it; we don't need the
    sched_lock here unless we have something on the assigned queue.
* jeff, 2003-11-04 (1 file, -2/+53):
  - Add initial support for pinning and binding.
* jeff, 2003-11-03 (1 file, -66/+17):
  - Remove kseq_find(); we no longer scan other cpus' run queues when we
    go idle. They figure out that we're idle fast enough that the cache
    pollution introduced by scanning their run queue is more expensive
    than waiting a little longer.
  - Add kseq_setidle() to mark us as being idle. Use this in place of
    kseq_find().
  - Remove kseq_load_highest(); kseq_find() was the only consumer of this
    interface. kseq_balance() has its own customized version that finds
    the lowest and highest loads simultaneously.
  Continuously told that this would be faster by: terry
* jeff, 2003-11-02 (1 file, -33/+50):
  - Remove the ksq_loads[] array. We are only interested in three counts:
    the total load, the timeshare load, and the number of threads that can
    be migrated to another cpu. Account for these separately.
  - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a
    KSE can be migrated to another CPU. Currently, this only checks to see
    if we're an interrupt handler. Eventually this will also be used to
    support CPU binding (see the sketch below).
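  A sketch of the predicate; PRI_ITHD is the real interrupt-thread
  priority class and PRI_BASE() is named elsewhere in this log, but the
  exact expression is an assumption:

      #define KSE_CAN_MIGRATE(ke) \
          (PRI_BASE((ke)->ke_ksegrp->kg_pri_class) != PRI_ITHD)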
* jeff, 2003-11-02 (1 file, -1/+2):
  - In sched_prio(), only force us onto the current queue if our priority
    is being elevated (numerically smaller).
* jeff, 2003-11-02 (1 file, -10/+11):
  - Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used
    in slice assignment. Add a comment describing what it does.
  - Remove a stale XXX comment; the nice should not impact the
    interactivity, as nice adjustments only affect non-interactive tasks
    in ULE.
  - Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them
    at least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve
    nice +20 tasks, as intended.
* jeff, 2003-11-02 (1 file, -5/+5):
  - Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV.
  - SCHED_PRI_NRESV does not have the off-by-one error in PRIO_TOTAL, so
    we do not have to account for it in the few places that we use it.
  Requested by: bde
* jeff, 2003-11-02 (1 file, -27/+56):
  - Change sched_interact_update() to only accept slp+runtime values
    between 0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the
    algorithm quite a bit. Before, it dealt with arbitrary values, which
    required us to do nasty integer division tricks that didn't quite work
    out correctly.
  - Change sched_wakeup() to detect conditions where the slp+runtime could
    exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for
    longer than 6 seconds. In this case, we'll just clear the runtime and
    set the sleep time to the max.
  - Define a new function, sched_interact_fork(), which updates the
    slp+runtime of a newly forked thread. We want to limit the amount of
    history retained from the parent so that we learn the child's behavior
    quickly. We don't, however, want to decay it to nothing. Previously,
    we would simply divide each parameter by 100 whenever we forked. After
    a few forks the values would reach 0 and tasks would not be considered
    interactive. (Both routines are sketched below.)
  - Add another KTR entry, clean up some existing entries.
  - Remove a useless sched_interact_update() from sched_priority(). This
    is already done by the callers that require it.
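  A sketch of both routines under the stated bound. The halving and the
  SCHED_SLP_RUN_FORK cap are plausible readings of the commit text, not
  the literal code; kg_slptime/kg_runtime follow the ksegrp naming
  convention used elsewhere in this log:

      static void
      sched_interact_update(struct ksegrp *kg)
      {
              u_int64_t sum;

              sum = kg->kg_slptime + kg->kg_runtime;
              if (sum > SCHED_SLP_RUN_MAX * 2) {
                      /* Halve both so their ratio (the score) is kept. */
                      kg->kg_slptime /= 2;
                      kg->kg_runtime /= 2;
              }
      }

      static void
      sched_interact_fork(struct ksegrp *kg)
      {
              u_int64_t sum;

              sum = kg->kg_slptime + kg->kg_runtime;
              /* Keep some parental history, but not an unbounded amount. */
              if (sum > SCHED_SLP_RUN_FORK) {
                      kg->kg_slptime =
                          kg->kg_slptime * SCHED_SLP_RUN_FORK / sum;
                      kg->kg_runtime =
                          kg->kg_runtime * SCHED_SLP_RUN_FORK / sum;
              }
      }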
* jeff, 2003-10-31 (1 file, -78/+222):
  - Add static to local functions and data where it was missing.
  - Add an IPI-based mechanism for migrating kses. This mechanism is
    broken down into several components. This is intended to reduce cache
    thrashing by eliminating most cases where one cpu touches another's
    run queues.
  - kseq_notify() appends a kse to a lockless singly linked list and
    conditionally sends an IPI to the target processor. Right now this is
    protected by sched_lock, but at some point I'd like to get rid of the
    global lock. This is why I used something more complicated than a
    standard queue. (Both halves of the hand-off are sketched below.)
  - kseq_assign() processes our list of kses that have been assigned to us
    by other processors. This simply calls sched_add() for each item on
    the list after clearing the new KEF_ASSIGNED flag. This flag is used
    to indicate that we have been appended to the assigned queue but not
    added to the run queue yet.
  - In sched_add(), instead of adding a KSE to another processor's queue,
    we use kseq_notify() so that we don't touch their queue. Also in
    sched_add(), if KEF_ASSIGNED is already set, return immediately. This
    can happen if a thread is removed and readded so that the priority is
    recorded properly.
  - In sched_rem(), return immediately if KEF_ASSIGNED is set. All callers
    immediately readd simply to adjust priorities etc.
  - In sched_choose(), if we're running an IDLE task or the per-cpu idle
    thread, set our cpumask bit in 'kseq_idle' so that other processors
    may know that we are idle. Before this, make a single pass through the
    run queues of other processors so that we may find work more
    immediately if it is available.
  - In sched_runnable(), don't scan each processor's run queue; they will
    IPI us if they have work for us to do.
  - In sched_add(), if we're adding a thread that can be migrated and we
    have plenty of work to do, try to migrate the thread to an idle kseq.
  - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
    consideration.
  - No longer use kseq_choose() to steal threads; it can lose its last
    argument.
  - Create a new function runq_steal() which operates like runq_choose()
    but skips threads based on some criteria. Currently it will not steal
    PRI_ITHD threads. In the future this will be used for CPU binding.
  - Create a kseq_steal() that checks each run queue with runq_steal();
    use kseq_steal() in the places where we used kseq_choose() to steal
    with before.
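  A sketch of the lockless hand-off. atomic_cmpset_ptr(), ipi_selected(),
  and IPI_AST are real primitives of the era; the list fields
  (ksq_assigned, ke_assign) follow the commit text and KSEQ_CPU() is
  assumed to map a cpu id to its kseq:

      static void
      kseq_notify(struct kse *ke, int cpu)
      {
              struct kseq *kseq = KSEQ_CPU(cpu);

              ke->ke_flags |= KEF_ASSIGNED;
              /* Push onto the target's singly linked list with CAS. */
              do {
                      ke->ke_assign = kseq->ksq_assigned;
              } while (!atomic_cmpset_ptr(&kseq->ksq_assigned,
                  ke->ke_assign, ke));
              ipi_selected(1 << cpu, IPI_AST);        /* nudge the target */
      }

      static void
      kseq_assign(struct kseq *kseq)
      {
              struct kse *ke, *nke;

              /* Claim the entire list in one atomic swap. */
              do {
                      ke = kseq->ksq_assigned;
              } while (!atomic_cmpset_ptr(&kseq->ksq_assigned, ke, NULL));
              for (; ke != NULL; ke = nke) {
                      nke = ke->ke_assign;
                      ke->ke_flags &= ~KEF_ASSIGNED;
                      sched_add(ke->ke_thread);
              }
      }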
* bde, 2003-10-29 (1 file, -3/+0):
  Removed the sched_nest variable in sched_switch(). Context switches
  always begin with sched_lock held but not recursed, so this variable was
  always 0.

  Removed fixup of sched_lock.mtx_recurse after context switches in
  sched_switch(). Context switches always end with this variable in the
  same state that it began in, so there is no need to fix it up. Only
  sched_lock.mtx_lock really needs a fixup.

  Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
  that sched_lock is owned and not recursed after it is fixed up. This
  assertion must match the one in mi_switch(), and if sched_lock were
  recursed then a non-null fixup of sched_lock.mtx_recurse would probably
  be needed again, unlike in sched_switch(), since fork_exit() doesn't
  return to its caller in the normal way.
* jeff, 2003-10-28 (1 file, -10/+2):
  - Only change the run queue in sched_prio() if the kse is non-null.
    Threads can be in the TD_ON_RUNQ state and not have an associated kse.
  - Remove the PRI_IDLE special case from sched_clock(); it was not
    actually necessary.
* jeff, 2003-10-27 (1 file, -56/+50):
  - Use a better algorithm in sched_pctcpu_update().
    Contributed by: Thomaswuerfl@gmx.de
  - In sched_prio(), adjust the run queue for threads which may need to
    move to the current queue due to priority propagation.
  - In sched_switch(), fix a style bug introduced when the KSE support
    went in. Columns are 80 chars wide, not 90.
  - In sched_switch(), fix the comparison in the idle case and explicitly
    re-initialize the runq in the not-propagated case.
  - Remove dead code in sched_clock().
  - In sched_clock(), if we're an IDLE class td, set NEEDRESCHED so that
    threads that have become runnable will get a chance to.
  - In sched_runnable(), if we're not the IDLETD, we should not consider
    curthread when examining the load. This mimics the 4BSD behavior of
    returning 0 when the only runnable thread is running.
  - In sched_userret(), remove the code for setting NEEDRESCHED entirely.
    This is not necessary and is not implemented in 4BSD.
  - Use the correct comparison in sched_add() when checking to see if an
    idle prio task has had its priority temporarily elevated.
* jeff, 2003-10-20 (1 file, -0/+2):
  - If a thread is not bound to a kse return 0 from sched_pctcpu().
  Reported by: pawel.worach@nordea.com
* jeff, 2003-10-16 (1 file, -8/+10):
  - Only kse_reassign() in the !running case.
  Reported by: kris
* jeff, 2003-10-16 (1 file, -1/+1):
  - Call sched_add() with the correct argument on SMP.
  Reported by: Valentin Chopov <valentin@valcho.net>
* jeff, 2003-10-16 (1 file, -3/+1):
  - Fix a minor problem with my last commit: we don't want to return from
    sched_switch if the thread is running; we want to fall through and
    pick a new thread because we have been preempted.
* jeff, 2003-10-16 (1 file, -8/+9):
  - Collapse sched_switchin() and sched_switchout() into sched_switch().
    Now mi_switch() calls sched_switch() which calls cpu_switch(). This is
    actually one less function call than it had been.
* jeff, 2003-10-16 (1 file, -7/+14):
  - Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td
    argument rather than a kse (see the prototypes below).
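  The new shapes of the entry points, sketched from the commit text (exact
  prototypes of the period may differ slightly):

      void    sched_add(struct thread *td);
      void    sched_rem(struct thread *td);
      void    sched_clock(struct thread *td);
      fixpt_t sched_pctcpu(struct thread *td);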
* jeff, 2003-10-16 (1 file, -8/+6):
  - The non-iterative algorithm for interact_update was broken due to
    rounding errors. This was the source of the majority of the
    interactivity problems. Reintroduce the old algorithm and its XXX.
  - Up the interactivity threshold to 30. It really could stand to be even
    a tiny bit higher.
  - Let the sleep and run time accumulate up to 5 seconds of history
    rather than two. This helps stop XFree86 from becoming non-interactive
    during bursts of activity.
* jeff, 2003-10-15 (1 file, -3/+10):
  - If our user_pri doesn't match our actual priority, our priority has
    been elevated, either due to priority propagation or because we're in
    the kernel; in either case, put us on the current queue so that we
    don't stop others from using important resources. At some point the
    priority elevations from sleeping in the kernel should go away.
  - Remove an optimization in sched_userret(). Before, we would only set
    NEEDRESCHED if there was something of a higher priority available.
    This is a trivial optimization and it breaks priority propagation
    because it doesn't take threads which we may be blocking into account.
    Notice that the thread which is blocking others gets up to one tick of
    cpu time before we honor this NEEDRESCHED in sched_clock().
* jeff, 2003-10-12 (1 file, -8/+7):
  - In SCHED_CURR(), add holding Giant to the list of criteria that will
    keep you on the current queue. In the future, it would be nice if
    priority propagation could deterministically pluck a thread off of the
    next queue and put it on the current queue. Until then this hack stops
    us from holding up our entire current queue, including interrupt
    handlers, while a thread on the next queue is blocked while holding
    Giant.
  - Inherit our pctcpu information from our parent.
* jeff, 2003-10-04 (1 file, -4/+6):
  - Change a lame iterative algorithm to a constant time algorithm. Remove
    the XXX that complains about it as well.
  Submitted by: ThomasWuerfl@gmx.de
* jeff, 2003-09-20 (1 file, -10/+11):
  - Somewhere along the line I stupidly removed critical logic from
    sched_pctcpu_update(). This caused erroneous cpu times in TOP for
    processes that were asleep. Replace the code that was removed.
* davidxu, 2003-08-26 (1 file, -18/+17):
  Let SA processes work under the ULE scheduler; previously they would
  panic the kernel.
  Reviewed by: jeff
* sam, 2003-08-19 (1 file, -1/+1):
  Change instances of callout_init that specify MPSAFE behaviour to use
  CALLOUT_MPSAFE instead of "1" for the second parameter. This does not
  change the behaviour; it just makes the intent more clear (example
  below).
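  The change in miniature; callout_init() and CALLOUT_MPSAFE are the real
  callout(9) API of the period:

      struct callout c;

      callout_init(&c, CALLOUT_MPSAFE);   /* was: callout_init(&c, 1); */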
* jeff, 2003-07-08 (1 file, -7/+13):
  - When stealing a kse in kseq_move(), ignore the current kseq's min nice
    value. We want to steal any thread, even one that is not given a slice
    on its current queue.
* jeff, 2003-07-07 (1 file, -0/+2):
  - Clean up an unused variable.
  Submitted by: Steve Kargl <skg@routmask.apl.washington.edu>
* jeff, 2003-07-04 (1 file, -13/+53):
  - Parse the cpu topology map in sched_setup() (see the sketch below).
  - Associate logical CPUs on the same physical core with the same kseq.
  - Adjust code that assumed there would only be one running thread in any
    kseq.
  - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef. This is a start
    towards HyperThreading support, but it isn't quite there yet.
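  A sketch of consuming the topology map; struct cpu_top/cpu_group and the
  smp_topology pointer are the sys/smp.h types of that era, while the
  kseq_idmap[]/KSEQ_CPU() mapping is an assumption:

      struct cpu_group *cg;
      int i, j, first;

      if (smp_topology != NULL) {
              for (i = 0; i < smp_topology->ct_count; i++) {
                      cg = &smp_topology->ct_group[i];
                      first = ffs(cg->cg_mask) - 1;
                      /* All logical CPUs in a group share one kseq. */
                      for (j = 0; j < MAXCPU; j++)
                              if (cg->cg_mask & (1 << j))
                                      kseq_idmap[j] = KSEQ_CPU(first);
              }
      }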