path: root/sys/kern/sched_4bsd.c
Commit log (commit message, author, date, files changed, lines changed):
* Rework the interface between priority propagation (lending) and the
  schedulers a bit to ensure more correct handling of priorities and fewer
  priority inversions:
  - Add two functions to the sched(9) API to handle priority lending:
    sched_lend_prio() and sched_unlend_prio().  The turnstile code uses these
    functions to ask the scheduler to lend a thread a set priority and to
    tell the scheduler when it thinks it is ok for a thread to stop borrowing
    priority.  The unlend case is slightly complex in that the turnstile code
    tells the scheduler what the minimum priority of the thread needs to be
    to satisfy the requirements of any other threads blocked on locks owned
    by the thread in question.  The scheduler then decides whether the thread
    can go back to normal mode (if its normal priority is high enough to
    satisfy the pending lock requests) or if it should continue to use the
    priority specified in the sched_unlend_prio() call.  This involves adding
    a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag
    for priority elevation.
  - Schedulers now refuse to lower the priority of a thread that is currently
    borrowing another thread's priority.
  - If a scheduler changes the priority of a thread that is currently sitting
    on a turnstile, it will call a new function turnstile_adjust() to inform
    the turnstile code of the change.  This function resorts the thread on
    the priority list of the turnstile if needed, and if the thread ends up
    at the head of the list (due to having the highest priority) and its
    priority was raised, then it will propagate that new priority to the
    owner of the lock it is blocked on.
  Some additional fixes specific to the 4BSD scheduler include:
  - Common code for updating the priority of a thread when the user priority
    of its associated kse group has changed has been consolidated in a new
    static function resetpriority_thread().  One change to this function is
    that it will now only adjust the priority of a thread if it already has
    a time-sharing priority, thus preserving any boosts from a tsleep()
    until the thread returns to userland.  Also, resetpriority() no longer
    calls maybe_resched() on each thread in the group.  Instead, the code
    calling resetpriority() is responsible for calling resetpriority_thread()
    on any threads that need to be updated.
  - schedcpu() now uses resetpriority_thread() instead of just calling
    sched_prio() directly after it updates a kse group's user priority.
  - sched_clock() now uses resetpriority_thread() rather than writing
    directly to td_priority.
  - sched_nice() now updates all the priorities of the threads after the
    group priority has been adjusted.
  Discussed with: bde
  Reviewed by:    ups, jeffr
  Tested on:      4bsd, ule
  Tested on:      i386, alpha, sparc64
  (jhb, 2004-12-30, 1 file, -14/+102)
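  The unlend decision above can be illustrated with a short sketch.  Only
  sched_lend_prio(), sched_unlend_prio(), sched_prio() and TDF_BORROWING are
  named in the message; the field names and the timeshare-range test below
  are assumptions for illustration, not the committed code.

    /*
     * Minimal sketch of sched_unlend_prio(): decide whether the thread can
     * drop back to its normal priority or must keep borrowing.
     */
    void
    sched_unlend_prio(struct thread *td, u_char prio)
    {
            u_char base_pri;

            /* The thread's "normal" priority: its user priority unless it
             * is currently running with a kernel sleep-priority boost. */
            if (td->td_base_pri >= PRI_MIN_TIMESHARE &&
                td->td_base_pri <= PRI_MAX_TIMESHARE)
                    base_pri = td->td_ksegrp->kg_user_pri;
            else
                    base_pri = td->td_base_pri;

            if (prio >= base_pri) {
                    /* The normal priority already satisfies every blocked
                     * thread: stop borrowing and return to it. */
                    td->td_flags &= ~TDF_BORROWING;
                    sched_prio(td, base_pri);
            } else {
                    /* Not high enough yet; keep the lent priority the
                     * turnstile code asked for. */
                    sched_lend_prio(td, prio);
            }
    }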
* - Wrap the thread count adjustment in sched_load_add() and sched_load_rem()
    so that we may place some ktr entries nearby.
  - Define other KTR_SCHED tracepoints so that we may graph the operation
    of the scheduler.
  (jeff, 2004-12-26, 1 file, -6/+30)
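  As a sketch of the pattern (the counter name and trace format below are
  assumptions; only sched_load_add()/sched_load_rem() and KTR_SCHED come
  from the message):

    static int sched_tdcnt;         /* running + runnable non-ithread threads */

    static void
    sched_load_add(void)
    {
            sched_tdcnt++;
            /* The trace point sits right next to the adjustment it records. */
            CTR1(KTR_SCHED, "global load: %d", sched_tdcnt);
    }

    static void
    sched_load_rem(void)
    {
            sched_tdcnt--;
            CTR1(KTR_SCHED, "global load: %d", sched_tdcnt);
    }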
* - Garbage collect several unused members of struct kse and struct ksegrp.
    As best as I can tell, some of these were never used.
  (jeff, 2004-12-14, 1 file, -9/+0)
* Propagate TDF_NEEDRESCHED to replacement thread in sched_switch().
  Reviewed by:  julian, jhb (in October)
  Approved by:  sam (mentor)
  MFC after:    4 weeks
  (ups, 2004-12-07, 1 file, -0/+3)
* When preempting a thread, put it back on the HEAD of its run queue.
  (Only really implemented in 4bsd.)
  MFC after:    4 days
  (julian, 2004-10-05, 1 file, -26/+28)
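  A sketch of the intent only (the flag name SRQ_PREEMPTED and the head/tail
  helpers below are illustrative stand-ins for the real runq code, not the
  commit itself): a preempted thread re-enters at the head of its run queue
  so it resumes ahead of equal-priority threads, while a normally awakened
  thread goes on the tail.

    static struct runq runq;                /* the global 4BSD run queue */

    static void
    sched_add_sketch(struct thread *td, int flags)
    {
            if (flags & SRQ_PREEMPTED)
                    runq_insert_head(&runq, td);    /* resume ahead of peers */
            else
                    runq_insert_tail(&runq, td);    /* normal FIFO placement */
    }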
* Use some macros to track available scheduler slots to allow
  easier debugging.
  MFC after:    4 days
  (julian, 2004-10-05, 1 file, -4/+26)
* clean up thread runq accounting a bit.
  MFC after:    3 days
  (julian, 2004-09-16, 1 file, -1/+13)
* Add some kasserts
  (julian, 2004-09-13, 1 file, -0/+2)
* Revert the previous round of changes to td_pinned.  The scheduler isn't
  fully initialized when the pmap layer tries to call sched_pin() early in
  the boot, and this results in a quick panic.  Use ke_pinned instead, as
  was originally done with Tor's patch.
  Approved by:  julian
  (scottl, 2004-09-11, 1 file, -23/+1)
* Make up my mind if cpu pinning is stored in the thread structure or the
  scheduler specific extension to it.  Put it in the extension, as the
  implementation details of how the pinning is done needn't be visible
  outside the scheduler.
  Submitted by: tegge (of course!) (with changes)
  MFC after:    3 days
  (julian, 2004-09-10, 1 file, -2/+22)
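  The shape of the change in the two entries above amounts to where a pin
  counter lives; a minimal sketch (the ke_pinned name appears in the revert
  message above, while the td_kse accessor is an assumption):

    void
    sched_pin(void)
    {
            curthread->td_kse->ke_pinned++;  /* count kept in scheduler-private data */
    }

    void
    sched_unpin(void)
    {
            curthread->td_kse->ke_pinned--;
    }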
* Add some code to allow threads to nominate a sibling to run if they are
  going to sleep.
  MFC after:    1 week
  (julian, 2004-09-10, 1 file, -1/+47)
* Don't do IPIs on behalf of interrupt threads.
  Just punt straight on through to the preemption code.
  Make a KASSERT out of a condition that can no longer occur.
  MFC after:    1 week
  (julian, 2004-09-06, 1 file, -2/+3)
* slight code cleanup
  MFC after:    1 week
  (julian, 2004-09-05, 1 file, -2/+2)
* turn on IPIs for 4bsd scheduler by default.
  MFC after:    1 week
  (julian, 2004-09-05, 1 file, -2/+2)
* Refactor a bunch of scheduler code to give basically the same behaviour
  but with slightly cleaned up interfaces.
  The KSE structure has become the same as the "per thread scheduler
  private data" structure.  In order to not make the diffs too great,
  one is #defined as the other at this time.
  The KSE (or td_sched) structure is now allocated per thread and has no
  allocation code of its own.
  Concurrency for a KSEGRP is now kept track of via a simple pair of
  counters rather than using KSE structures as tokens.
  Since the KSE structure is different in each scheduler, kern_switch.c
  is now included at the end of each scheduler.  Nothing outside the
  scheduler knows the contents of the KSE (aka td_sched) structure.
  The fields in the ksegrp structure that have to do with the scheduler's
  queueing mechanisms are now moved to the kg_sched structure (the
  per-ksegrp scheduler private data structure).  In other words, how the
  scheduler queues and keeps track of threads is no one's business except
  the scheduler's.  This should allow people to write experimental
  schedulers with completely different internal structuring.
  A scheduler call sched_set_concurrency(kg, N) has been added that
  notifies the scheduler that no more than N threads from that ksegrp
  should be allowed to be concurrently scheduled.  This is also used to
  enforce 'fairness' at this time, so that a ksegrp with 10000 threads
  cannot swamp the run queue and force out a process with 1 thread, since
  the current code will not set the concurrency above NCPU, and both
  schedulers will not allow more than that many onto the system run queue
  at a time.  Each scheduler should eventually develop its own methods to
  do this now that they are effectively separated.
  Rejig libthr's kernel interface to follow the same code paths as libkse
  for scope system threads.  This has slightly hurt libthr's performance,
  but I will work to recover as much of it as I can.
  Thread exit code has been cleaned up greatly.  exit and exec code now
  transitions a process back to 'standard non-threaded mode' before taking
  the next step.
  Reviewed by:  scottl, peter
  MFC after:    1 week
  (julian, 2004-09-05, 1 file, -63/+116)
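  A sketch of what the pair of counters can look like (the member names and
  struct layout here are assumptions; only sched_set_concurrency(kg, N)
  itself is named above):

    struct kg_sched {
            int     skg_concurrency;        /* max threads allowed to run concurrently */
            int     skg_avail_opennings;    /* unused "slots" remaining */
    };

    void
    sched_set_concurrency(struct ksegrp *kg, int concurrency)
    {
            struct kg_sched *kgs = kg->kg_sched;

            /* Grow or shrink the pool of available slots by the change. */
            kgs->skg_avail_opennings += concurrency - kgs->skg_concurrency;
            kgs->skg_concurrency = concurrency;
    }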
* Don't declare a function we are not defining.
  (julian, 2004-09-03, 1 file, -0/+2)
* fix compile for UP
  (julian, 2004-09-03, 1 file, -0/+4)
* Oops, finish last commit:
  moved the variables but not the declarations.
  (julian, 2004-09-03, 1 file, -0/+1)
* Move 4bsd-specific experimental IPI code into the 4bsd file.
  Move the sysctls into kern.sched.
  (julian, 2004-09-03, 1 file, -0/+126)
* Give the 4bsd scheduler the ability to wake up idle processors
  when there is new work to be done.
  MFC after:    5 days
  (julian, 2004-09-01, 1 file, -18/+59)
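  A sketch of the mechanism only (the function name below is hypothetical
  and the committed code is considerably more elaborate and tunable): when
  new work is queued and another CPU is idle, poke it with an IPI so it
  reschedules immediately instead of waiting for its next clock interrupt.

    static int
    kick_an_idle_cpu(void)
    {
            cpumask_t map;

            map = idle_cpus_mask & ~PCPU_GET(cpumask);
            if (map == 0)
                    return (0);             /* nobody is idle; nothing to poke */
            ipi_selected(map, IPI_AST);     /* make the idle CPU(s) reschedule */
            return (1);
    }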
* Give setrunqueue() and sched_add() more of a clue as to
  where they are coming from and what is expected from them.
  MFC after:    2 days
  (julian, 2004-09-01, 1 file, -5/+10)
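  The "clue" takes the form of a flags argument on both calls; a sketch of
  the kind of hints involved (the names follow the SRQ_* convention this
  work introduced, but the exact set and values here are illustrative):

    #define SRQ_BORING      0x0000          /* no special circumstances */
    #define SRQ_YIELDING    0x0001          /* we are yielding (from mi_switch) */
    #define SRQ_OURSELF     0x0002          /* it is ourself (from mi_switch) */
    #define SRQ_INTR        0x0004          /* it is probably urgent */

    void setrunqueue(struct thread *td, int flags);
    void sched_add(struct thread *td, int flags);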
* diff reduction for upcoming patch.  Use a macro that masks
  some of the odd goings on with sub-structures, because they will go
  away anyhow.
  (julian, 2004-08-22, 1 file, -8/+9)
* Properly keep track of how many kses are on the system run queue(s).
  (julian, 2004-08-11, 1 file, -2/+3)
* Increase the amount of data exported by KTR in the KTR_RUNQ setting.
  This extra data is needed to really follow what is going on in the
  threaded case.
  (julian, 2004-08-09, 1 file, -6/+5)
* Clean up whitespace, increase consistency and correctness.
  Submitted by: bde
  (scottl, 2004-07-23, 1 file, -8/+6)
* When calling scheduler entrypoints for creating new threads and processes,
  specify "us" as the thread, not the process/ksegrp/kse.  You can always
  find the others from the thread, but the converse is not true.
  Theoretically this would lead to runtime being allocated to the wrong
  entity in some cases, though it is not clear how often this actually
  happened.  (It would only affect threaded processes and would probably
  be pretty benign, but it WAS a bug.)
  Reviewed by:  peter
  (julian, 2004-07-18, 1 file, -14/+14)
* - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
    since they are only accessed by curthread and thus do not need any
    locking.
  - Move pr_addr and pr_ticks out of struct uprof (which is per-process)
    and directly into struct thread as td_profil_addr and td_profil_ticks,
    as these variables are really per-thread.  (They are used to defer an
    addupc_intr() that was too "hard" until ast().)
  (jhb, 2004-07-16, 1 file, -1/+2)
* Set TDF_NEEDRESCHED when a higher priority thread is scheduled in
  sched_add() rather than just doing it in sched_wakeup().  The old
  ithread preemption code used to set NEEDRESCHED unconditionally if it
  didn't preempt which masked this bug in SCHED_4BSD.
  Noticed by:   jake
  Reported by:  kensmith, marcel
  (jhb, 2004-07-13, 1 file, -1/+1)
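  The check that now runs from sched_add() amounts to a maybe_resched()
  call; a simplified sketch (the real function in this file also asserts
  that sched_lock is held):

    static void
    maybe_resched(struct thread *td)
    {
            /* A lower numeric value means a higher priority. */
            if (td->td_priority < curthread->td_priority)
                    curthread->td_flags |= TDF_NEEDRESCHED;
    }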
* Implement preemption of kernel threads natively in the scheduler rather
  than as one-off hacks in various other parts of the kernel:
  - Add a function maybe_preempt() that is called from sched_add() to
    determine if a thread about to be added to a run queue should be
    preempted to directly.  If it is not safe to preempt or if the new
    thread does not have a high enough priority, then the function returns
    false and sched_add() adds the thread to the run queue.  If the thread
    should be preempted to but the current thread is in a nested critical
    section, then the flag TDF_OWEPREEMPT is set and the thread is added
    to the run queue.  Otherwise, mi_switch() is called immediately and
    the thread is never added to the run queue since it is switched to
    directly.  When exiting an outermost critical section, if
    TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform
    the deferred preemption.
  - Remove explicit preemption from ithread_schedule() as calling
    setrunqueue() now does all the correct work.  This also removes the
    do_switch argument from ithread_schedule().
  - Do not use the manual preemption code in mtx_unlock if the
    architecture supports native preemption.
  - Don't call mi_switch() in a loop during shutdown to give ithreads a
    chance to run if the architecture supports native preemption, since
    the ithreads will just preempt DELAY().
  - Don't call mi_switch() from the page zeroing idle thread for
    architectures that support native preemption, as it is unnecessary.
  - Native preemption is enabled on the same archs that supported ithread
    preemption, namely alpha, i386, and amd64.
  This change should largely be a NOP for the default case as committed,
  except that we will do fewer context switches in a few cases and will
  avoid the run queues completely when preempting.
  Approved by:  scottl (with his re@ hat)
  (jhb, 2004-07-02, 1 file, -1/+11)
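  A skeleton of the maybe_preempt() decision described above (simplified;
  the committed function performs additional safety checks before
  switching):

    int
    maybe_preempt(struct thread *td)
    {
            struct thread *ctd = curthread;

            if (td->td_priority >= ctd->td_priority)
                    return (0);             /* not important enough to preempt */
            if (ctd->td_critnest > 1) {
                    /* Inside a nested critical section: note that we owe a
                     * preemption and take it at the outermost
                     * critical_exit() instead. */
                    ctd->td_flags |= TDF_OWEPREEMPT;
                    return (0);
            }
            /* Switch to the new thread directly; it never touches the
             * run queue. */
            TD_SET_RUNNING(td);
            mi_switch(SW_INVOL, td);
            return (1);
    }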
* - Change mi_switch() and sched_switch() to accept an optional thread to
    switch to.  If a non-NULL thread pointer is passed in, then the CPU
    will switch to that thread directly rather than calling choosethread()
    to pick a thread to switch to.
  - Make sched_switch() aware of idle threads and know to do
    TD_SET_CAN_RUN() instead of sticking them on the run queue, rather
    than requiring all callers of mi_switch() to know to do this if they
    can be called from an idlethread.
  - Move constants for arguments to mi_switch() and thread_single() out of
    the middle of the function prototypes and up above into their own
    section.
  (jhb, 2004-07-02, 1 file, -6/+11)
* Fix another typo in the previous commit.
  (scottl, 2004-06-21, 1 file, -1/+1)
* Fix typo that somehow crept into the previous commit
  (scottl, 2004-06-21, 1 file, -1/+1)
* Add the sysctl node 'kern.sched.name' that has the name of the scheduler
  currently in use.  Move the 4bsd kern.quantum node to kern.sched.quantum
  for consistency.
  (scottl, 2004-06-21, 1 file, -1/+8)
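  A sketch of the resulting sysctl layout (declaration details are
  illustrative rather than copied from the commit; the quantum knob is
  shown as a plain integer here for brevity):

    static int sched_quantum;       /* quantum, in ticks */

    SYSCTL_NODE(_kern, OID_AUTO, sched, CTLFLAG_RD, 0, "Scheduler");
    SYSCTL_STRING(_kern_sched, OID_AUTO, name, CTLFLAG_RD, "4BSD", 0,
        "Scheduler name");
    SYSCTL_INT(_kern_sched, OID_AUTO, quantum, CTLFLAG_RW, &sched_quantum,
        0, "Roundrobin scheduling quantum");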
* Nice is a property of a process as a whole.
  I mistakenly moved it to the ksegroup when breaking up the process
  structure.  Put it back in the proc structure.
  (julian, 2004-06-16, 1 file, -5/+8)
* Remove advertising clause from University of California Regents' license,
  per letter dated July 22, 1999.
  Approved by:  core
  (imp, 2004-04-05, 1 file, -4/+0)
* Try not to crash instantly when signalling a libthr program to death.
  (dfr, 2004-04-05, 1 file, -1/+1)
* The roundrobin callout from sched_4bsd is MPSAFE, so set up the
  callout as MPSAFE to avoid grabbing Giant.
  Reviewed by:  jhb
  (rwatson, 2004-03-05, 1 file, -1/+1)
* Switch the sleep/wakeup and condition variable implementations to use the
  sleep queue interface:
  - Sleep queues attempt to merge some of the benefits of both sleep queues
    and condition variables.  Having sleep queues in a hash table avoids
    having to allocate a queue head for each wait channel.  Thus, struct cv
    has shrunk down to just a single char * pointer now.  However, the hash
    table does not hold threads directly, but queue heads.  This means that
    once you have located a queue in the hash bucket, you no longer have to
    walk the rest of the hash chain looking for threads.  Instead, you have
    a list of all the threads sleeping on that wait channel.
  - Outside of the sleepq code and the sleep/cv code the kernel no longer
    differentiates between cv's and sleep/wakeup.  For example, calls to
    abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
    Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
    cv_waitq_remove() have been replaced with calls to sleepq_remove().
  - The sched_sleep() function no longer accepts a priority argument as
    sleeps no longer inherently bump the priority.  Instead, this is solely
    a property of msleep(), which explicitly calls sched_prio() before
    blocking.
  - The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
    associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
    dropped and replaced with a single explicit clearing of td_wchan.
    TD_SET_ONSLEEPQ() would really have only made sense if it had taken the
    wait channel and message as arguments anyway.  Now that that only
    happens in one place, a macro would be overkill.
  (jhb, 2004-02-27, 1 file, -2/+2)
* - Disable ithread binding in all cases for now.  This doesn't make as much
    sense with sched_4bsd as it does with sched_ule.
  - Use P_NOLOAD instead of the absence of td->td_ithd to determine whether
    or not a thread should be accounted for in sched_tdcnt.
  (jeff, 2004-02-01, 1 file, -13/+5)
* - Keep a variable 'sched_tdcnt' that is used for the local implementation
    of sched_load().  This variable tracks the number of running and
    runnable non ithd threads.  This removes the need to traverse the proc
    table and discover how many threads are runnable.
  (jeff, 2004-02-01, 1 file, -2/+19)
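  A sketch of the accounting described in this entry and the previous one
  (the helper name here is hypothetical; the point is that sched_load()
  becomes a counter read instead of a proc-table walk):

    static int sched_tdcnt;         /* running + runnable non-P_NOLOAD threads */

    /* Called wherever a thread enters or leaves the runnable state. */
    static void
    sched_load_adjust(struct thread *td, int delta)
    {
            if ((td->td_proc->p_flag & P_NOLOAD) == 0)
                    sched_tdcnt += delta;
    }

    int
    sched_load(void)
    {
            return (sched_tdcnt);
    }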
* - Correct function names listed in KASSERTs.  These were copied from
    other code and it was sloppy of me not to adjust these sooner.
  (jeff, 2004-01-25, 1 file, -10/+11)
* - Implement cpu pinning and binding.  This is accomplished by keeping a
    per-cpu run queue that is only used for pinned or bound threads.
  Submitted by: Chris Bradfield <chrisb@ation.org>
  (jeff, 2004-01-25, 1 file, -12/+128)
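  A sketch of the queue selection this implies (field and flag names are
  assumptions): pinned or bound threads are placed on a per-CPU run queue
  that only that CPU ever dequeues from, everything else on the shared one.

    static struct runq runq;                /* shared by all CPUs */
    static struct runq runq_pcpu[MAXCPU];   /* one run queue per CPU */

    static struct runq *
    sched_choose_runq(struct thread *td)
    {
            struct kse *ke = td->td_kse;

            if (ke->ke_flags & KEF_BOUND)           /* sched_bind() target */
                    return (&runq_pcpu[ke->ke_cpu]);
            if (td->td_pinned != 0)                 /* must stay on this CPU */
                    return (&runq_pcpu[PCPU_GET(cpuid)]);
            return (&runq);
    }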
* Create a separate kthread that executes schedcpu() once a second.  Because
  schedcpu() locks an sx lock (allproc_lock) which can sleep if it fails to
  acquire the lock, it is not safe to execute this in a callout handler from
  softclock().
  (jhb, 2003-12-26, 1 file, -6/+21)
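  A sketch of the dedicated thread (names and the exact sleep idiom are
  approximate; kthread creation boilerplate is omitted):

    static int schedcpu_wchan;      /* private wait channel, never woken */

    static void
    schedcpu_thread(void *arg)
    {
            for (;;) {
                    schedcpu();     /* may safely sleep on allproc_lock here */
                    tsleep(&schedcpu_wchan, 0, "-", hz);    /* ~1 second */
            }
    }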
* Quick fix for scaling of statclock ticks in the SMP case.  As explained
  in the log message for kern_sched.c 1.83 (which should have been
  repo-copied to preserve history for this file), the (4BSD) scheduler
  algorithm only works right if stathz is nearly 128 Hz.  The old commit
  log said 64 Hz; the scheduler actually wants nearly 16 Hz but there was
  a scale factor of 4 to give the requirement of 64 Hz, and rev.1.83
  changed the scale factor so that the requirement became 128 Hz.  The
  change of the scale factor was incomplete in the SMP case.  Then
  scheduling ticks are provided by smp_ncpu CPUs, and the scheduler cannot
  tell the difference between this and 1 CPU providing scheduling ticks
  smp_ncpu times faster, so we need another scale factor of smp_ncpu or an
  algorithm change.
  This quick fix uses the scale factor without even trying to optimize the
  runtime divisions required for this as is done for the other scale
  factor.
  The main algorithmic problem is the clamp on the scheduling tick counts.
  This was 295; it is now approximately 295 * smp_ncpu.  When the limit is
  reached, threads get free timeslices and scheduling becomes very unfair
  to the threads that don't hit the limit.  The limit can be reached and
  maintained in the worst case if the load average is larger than
  (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with 2 CPUs
  before this change), so there are algorithmic problems even for a load
  average of 1.  Fortunately, the worst case isn't common enough for the
  problem to be very noticeable (it is mainly for niced CPU hogs competing
  with less nice CPU hogs).
  (bde, 2003-11-09, 1 file, -0/+4)
* Return a reasonable number for top or ps to display for an M:N thread,
  since there is no direct association between an M:N thread and a kse.
  Sometimes a thread does not have a kse; in that case, return a pctcpu
  from its last kse.  It is not perfect, but it gives a good number to be
  displayed.
  (davidxu, 2003-11-08, 1 file, -0/+2)
* Removed sched_nest variable in sched_switch().  Context switches always
  begin with sched_lock held but not recursed, so this variable was always
  0.
  Removed fixup of sched_lock.mtx_recurse after context switches in
  sched_switch().  Context switches always end with this variable in the
  same state that it began in, so there is no need to fix it up.  Only
  sched_lock.mtx_lock really needs a fixup.
  Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
  that sched_lock is owned and not recursed after it is fixed up.  This
  assertion must match the one in mi_switch(), and if sched_lock were
  recursed then a non-null fixup of sched_lock.mtx_recurse would probably
  be needed again, unlike in sched_switch(), since fork_exit() doesn't
  return to its caller in the normal way.
  (bde, 2003-10-29, 1 file, -3/+0)
* - The kse may be null in sched_pctcpu().
  Reported by:  kris
  (jeff, 2003-10-16, 1 file, -1/+7)
* - Collapse sched_switchin() and sched_switchout() into sched_switch().
    Now mi_switch() calls sched_switch() which calls cpu_switch().  This is
    actually one less function call than it had been.
  (jeff, 2003-10-16, 1 file, -9/+10)
* - Update the sched api.  sched_{add,rem,clock,pctcpu} now all accept a td
    argument rather than a kse.
  (jeff, 2003-10-16, 1 file, -8/+14)
* Change instances of callout_init that specify MPSAFE behaviour to
  use CALLOUT_MPSAFE instead of "1" for the second parameter.  This does
  not change the behaviour; it just makes the intent more clear.
  (sam, 2003-08-19, 1 file, -1/+1)
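  For example (illustrative; the roundrobin callout in this file is the
  kind of spot affected):

    static struct callout roundrobin_callout;

    static void
    roundrobin_setup(void)
    {
            /* Was: callout_init(&roundrobin_callout, 1); */
            callout_init(&roundrobin_callout, CALLOUT_MPSAFE);
    }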