path: root/sys/kern/sched_ule.c
Commit message | Author | Age | Files | Lines
...
* clean up thread runq accounting a bit. | julian | 2004-09-16 | 1 | -0/+2
    MFC after: 3 days
* Revert the previous round of changes to td_pinned. | scottl | 2004-09-11 | 1 | -24/+2
    The scheduler isn't fully initialized when the pmap layer tries to call
    sched_pin() early in boot, which results in a quick panic. Use ke_pinned
    instead, as was originally done with Tor's patch.
    Approved by: julian
* Try committing from the right tree this time. | julian | 2004-09-11 | 1 | -3/+3
    MFC after: 2 days
* Make up my mind if cpu pinning is stored in the thread structure or the scheduler-specific extension to it. | julian | 2004-09-10 | 1 | -1/+22
    Put it in the extension, as the implementation details of how the pinning
    is done needn't be visible outside the scheduler.
    Submitted by: tegge (of course!) (with changes)
    MFC after: 3 days
* Add some code to allow threads to nominate a sibling to run if they are going to sleep. | julian | 2004-09-10 | 1 | -1/+1
    MFC after: 1 week
* Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. | julian | 2004-09-05 | 1 | -108/+163
    The KSE structure has become the same as the "per thread scheduler
    private data" structure. In order to not make the diffs too great,
    one is #defined as the other at this time.

    The KSE (or td_sched) structure is now allocated per thread and has no
    allocation code of its own.

    Concurrency for a KSEGRP is now kept track of via a simple pair of
    counters rather than using KSE structures as tokens.

    Since the KSE structure is different in each scheduler, kern_switch.c
    is now included at the end of each scheduler. Nothing outside the
    scheduler knows the contents of the KSE (aka td_sched) structure.

    The fields in the ksegrp structure that are to do with the scheduler's
    queueing mechanisms are now moved to the kg_sched structure (the per
    ksegrp scheduler private data structure). In other words, how the
    scheduler queues and keeps track of threads is no-one's business except
    the scheduler's. This should allow people to write experimental
    schedulers with completely different internal structuring.

    A scheduler call sched_set_concurrency(kg, N) has been added that
    notifies the scheduler that no more than N threads from that ksegrp
    should be allowed to be concurrently scheduled. This is also used to
    enforce 'fairness' at this time, so that a ksegrp with 10000 threads
    cannot swamp the run queue and force out a process with 1 thread, since
    the current code will not set the concurrency above NCPU, and both
    schedulers will not allow more than that many onto the system run queue
    at a time. Each scheduler should eventually develop its own methods to
    do this now that they are effectively separated.

    Rejig libthr's kernel interface to follow the same code paths as libkse
    for scope-system threads. This has slightly hurt libthr's performance,
    but I will work to recover as much of it as I can.

    Thread exit code has been cleaned up greatly. exit and exec code now
    transitions a process back to 'standard non-threaded mode' before taking
    the next step.
    Reviewed by: scottl, peter
    MFC after: 1 week
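The "simple pair of counters" idea mentioned above is easy to illustrate in isolation. The following is a minimal, self-contained sketch of tracking per-group concurrency with two counters; the struct, field, and function names are made up for the example and are not the actual kg_sched members or kernel API.

    #include <assert.h>

    /* Hypothetical per-ksegrp scheduler-private data: two counters replace
     * the old scheme of handing out KSE structures as "run tokens". */
    struct kg_sched_demo {
        int skg_concurrency;    /* max threads allowed on the run queues */
        int skg_runnable;       /* threads currently using a slot */
    };

    /* Tell the scheduler that at most 'n' threads of this group may be
     * scheduled concurrently (the real call is sched_set_concurrency()). */
    static void
    set_concurrency(struct kg_sched_demo *skg, int n)
    {
        skg->skg_concurrency = n;
    }

    /* Returns 1 if another thread from the group may be put on a run queue. */
    static int
    slot_available(struct kg_sched_demo *skg)
    {
        return (skg->skg_runnable < skg->skg_concurrency);
    }

    int
    main(void)
    {
        struct kg_sched_demo skg = { 0, 0 };

        set_concurrency(&skg, 2);        /* e.g. NCPU == 2 */
        assert(slot_available(&skg));
        skg.skg_runnable++;              /* first thread added to a runq */
        skg.skg_runnable++;              /* second thread added */
        assert(!slot_available(&skg));   /* a third thread must wait */
        return (0);
    }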
* Turn PREEMPTION into a kernel option. | scottl | 2004-09-02 | 1 | -0/+14
    Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime
    warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION
    warning in kern_switch.c).
    This is a possible MT5 candidate.
* Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. | julian | 2004-09-01 | 1 | -4/+13
    MFC after: 2 days
* Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. | peter | 2004-08-28 | 1 | -4/+2
    The thread was being assigned to a runq without adding in the load, which
    would cause the counter to hit -1.
* - Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. | jeff | 2004-08-12 | 1 | -7/+19
    Use this in sched_prio() and sched_switch() to stop us from migrating
    threads that are in short term sleeps or are runnable. These extra
    migrations were added in the patches to support KSE.
    - Only set NEEDRESCHED if the thread we're adding in sched_add() is a
      lower priority and is being placed on the current queue.
    - Fix some minor whitespace problems.
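A "hold" flag of this kind short-circuits the migration decision in the add-to-runqueue path. Below is a tiny stand-alone sketch of that pattern; the flag value, struct, and field names are illustrative and not the exact kernel definitions.

    #include <stdio.h>

    #define KEF_HOLD 0x0008   /* illustrative value: thread must stay on its CPU */

    struct kse_demo {
        int ke_flags;
        int ke_cpu;           /* cpu the thread last ran on */
    };

    /* Decide where to enqueue: migrate only if the hold flag is clear. */
    static int
    pick_cpu(struct kse_demo *ke, int idle_cpu, int self_cpu)
    {
        if (ke->ke_flags & KEF_HOLD) {
            ke->ke_flags &= ~KEF_HOLD;   /* the hold is consumed by one add */
            return (ke->ke_cpu);         /* stay put, e.g. a short-term sleep */
        }
        return (idle_cpu != -1 ? idle_cpu : self_cpu);
    }

    int
    main(void)
    {
        struct kse_demo ke = { KEF_HOLD, 3 };

        printf("held thread stays on cpu %d\n", pick_cpu(&ke, 0, 1));   /* 3 */
        printf("next add may migrate to cpu %d\n", pick_cpu(&ke, 0, 1)); /* 0 */
        return (0);
    }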
* - Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. | jeff | 2004-08-10 | 1 | -34/+76
    This prevents any potential problems with sched_pin() being used around
    calls to setrunqueue().
    - Change the sched_add() load balancing algorithm to try to migrate on
      wakeup. This attempts to place threads that communicate with each other
      on the same CPU.
    - Don't clear the idle counts in kseq_transfer(), let the cpus do that
      when they call sched_add() from kseq_assign().
    - Correct a few out of date comments.
    - Make sure the ke_cpu field is correct when we preempt.
    - Call kseq_assign() from sched_clock() to catch any assignments that
      were done without IPI. Presently all assignments are done with an IPI,
      but I'm trying a patch that limits that.
    - Don't migrate a thread if it is still runnable in sched_add().
      Previously, this could only happen for KSE threads, but due to changes
      to sched_switch() all threads went through this path.
    - Remove some code that was added with preemption but is not necessary.
* Avoid casts as lvalues. | kan | 2004-07-28 | 1 | -2/+2
* Clean up whitespace, increase consistency and correctness. | scottl | 2004-07-23 | 1 | -5/+3
    Submitted by: bde
* When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread, not the process/ksegrp/kse. | julian | 2004-07-18 | 1 | -15/+18
    You can always find the others from the thread, but the converse is not
    true. Theoretically this would lead to runtime being allocated to the
    wrong entity in some cases, though it is not clear how often this
    actually happened. (It would only affect threaded processes and would
    probably be pretty benign, but it WAS a bug.)
    Reviewed by: peter
* - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. | jhb | 2004-07-16 | 1 | -1/+2
    - Move pr_addr and pr_ticks out of struct uprof (which is per-process)
      and directly into struct thread as td_profil_addr and td_profil_ticks,
      as these variables are really per-thread. (They are used to defer an
      addupc_intr() that was too "hard" until ast().)
* Update for the KDB framework: | marcel | 2004-07-10 | 1 | -4/+2
    o Call kdb_backtrace() instead of backtrace().
* - Move contents of sched_add() into a sched_add_internal() function that takes an argument to specify if it should preempt or not. | jhb | 2004-07-08 | 1 | -5/+11
    Don't preempt when sched_add_internal() is called from kseq_idled() or
    kseq_assign(), as in those cases we are about to call mi_switch() anyways.
    Also, doing so during the first context switch on an AP leads to a NULL
    pointer deref because curthread is NULL.
    - Reenable preemption for ULE.
    Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net
* Temporarily disable preemption in SCHED_ULE due to reported panics and hangs caused by recent preemption changes. | rwatson | 2004-07-06 | 1 | -0/+2
    This change appears to remove the panic that I was running into, but at
    the cost of increasing ithread scheduling latency, and as such is a
    temporary band-aid until jhb has a chance to resolve the ule<->preemption
    interaction that is the source of the problem. If it doesn't fix the
    problem for others -- sorry!
* Add NULL arg to mi_switch() call to stop kernel compiles from breaking. | phk | 2004-07-03 | 1 | -1/+1
* Fix SCHED_ULE build on SMP. | bmilekic | 2004-07-03 | 1 | -1/+1
    The previous revision (1.110) introduced a KSE_CAN_MIGRATE() invocation
    with one argument missing (class). Either this is a genuine omission or
    it crept in from JHB's repo where he may have modified it. If it's the
    latter, then it may require more attention. For now fix the make depend.
* Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: | jhb | 2004-07-02 | 1 | -1/+10
    - Add a function maybe_preempt() that is called from sched_add() to
      determine if a thread about to be added to a run queue should be
      preempted to directly. If it is not safe to preempt or if the new
      thread does not have a high enough priority, then the function returns
      false and sched_add() adds the thread to the run queue. If the thread
      should be preempted to but the current thread is in a nested critical
      section, then the flag TDF_OWEPREEMPT is set and the thread is added to
      the run queue. Otherwise, mi_switch() is called immediately and the
      thread is never added to the run queue since it is switched to
      directly. When exiting an outermost critical section, if TDF_OWEPREEMPT
      is set, then clear it and call mi_switch() to perform the deferred
      preemption.
    - Remove explicit preemption from ithread_schedule() as calling
      setrunqueue() now does all the correct work. This also removes the
      do_switch argument from ithread_schedule().
    - Do not use the manual preemption code in mtx_unlock if the architecture
      supports native preemption.
    - Don't call mi_switch() in a loop during shutdown to give ithreads a
      chance to run if the architecture supports native preemption since the
      ithreads will just preempt DELAY().
    - Don't call mi_switch() from the page zeroing idle thread for
      architectures that support native preemption as it is unnecessary.
    - Native preemption is enabled on the same archs that supported ithread
      preemption, namely alpha, i386, and amd64.
    This change should largely be a NOP for the default case as committed
    except that we will do fewer context switches in a few cases and will
    avoid the run queues completely when preempting.
    Approved by: scottl (with his re@ hat)
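The maybe_preempt() decision described in the first bullet boils down to a three-way choice: enqueue, defer, or switch now. Here is a compact stand-alone sketch of that logic; the function and names are illustrative (not the kernel implementation), and it assumes FreeBSD's convention that a lower numeric priority value means higher priority.

    #include <stdio.h>

    enum preempt_action { ENQUEUE, DEFER_PREEMPT, SWITCH_NOW };

    /*
     * Decide what to do with a newly runnable thread of priority 'newpri'
     * when the current thread has priority 'curpri' and critical-section
     * nesting depth 'critnest'.
     */
    static enum preempt_action
    maybe_preempt_demo(int newpri, int curpri, int critnest)
    {
        if (newpri >= curpri)
            return (ENQUEUE);        /* not high enough priority: just queue it */
        if (critnest > 0)
            return (DEFER_PREEMPT);  /* would set TDF_OWEPREEMPT and queue it */
        return (SWITCH_NOW);         /* mi_switch() directly to the new thread */
    }

    int
    main(void)
    {
        printf("%d\n", maybe_preempt_demo(100, 50, 0));  /* ENQUEUE */
        printf("%d\n", maybe_preempt_demo(10, 50, 2));   /* DEFER_PREEMPT */
        printf("%d\n", maybe_preempt_demo(10, 50, 0));   /* SWITCH_NOW */
        return (0);
    }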
* - Change mi_switch() and sched_switch() to accept an optional thread to switch to. | jhb | 2004-07-02 | 1 | -5/+9
    If a non-NULL thread pointer is passed in, then the CPU will switch to
    that thread directly rather than calling choosethread() to pick a thread
    to switch to.
    - Make sched_switch() aware of idle threads and know to do
      TD_SET_CAN_RUN() instead of sticking them on the run queue, rather than
      requiring all callers of mi_switch() to know to do this if they can be
      called from an idlethread.
    - Move constants for arguments to mi_switch() and thread_single() out of
      the middle of the function prototypes and up above into their own
      section.
* Add the sysctl node 'kern.sched.name' that has the name of the scheduler currently in use. | scottl | 2004-06-21 | 1 | -0/+5
    Move the 4bsd kern.quantum node to kern.sched.quantum for consistency.
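In FreeBSD kernel code a read-only string sysctl like this is typically declared with the SYSCTL_NODE() and SYSCTL_STRING() macros. The snippet below is a sketch of what such a declaration plausibly looks like in sched_ule.c of that era; the exact flags and description strings are assumptions, and it only builds as part of a kernel, not as a stand-alone program.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    /* Create the kern.sched node and hang a read-only name string off of it. */
    SYSCTL_NODE(_kern, OID_AUTO, sched, CTLFLAG_RW, 0, "Scheduler");
    SYSCTL_STRING(_kern_sched, OID_AUTO, name, CTLFLAG_RD, "ule", 0,
        "Scheduler name");

From userland the value is then readable with `sysctl kern.sched.name`.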
* Nice is a property of a process as a whole. | julian | 2004-06-16 | 1 | -24/+30
    I mistakenly moved it to the ksegroup when breaking up the process
    structure. Put it back in the proc structure.
* - Run sched_balance() and sched_balance_groups() from hardclock via sched_clock() rather than using callouts. | jeff | 2004-06-02 | 1 | -38/+21
    This means we no longer have to take the load of the callout thread into
    consideration while balancing and should make the balancing decisions
    simpler and more accurate.
    Tested on: x86/UP, amd64/SMP
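Driving the balancer from the per-tick scheduler clock instead of a callout means each tick simply decrements a countdown and runs the balance routine when it reaches zero. A self-contained sketch of that pattern follows; the names and the HZ value are illustrative, not the kernel's.

    #include <stdio.h>

    #define HZ 1000                 /* clock ticks per second (illustrative) */

    static int balance_ticks = HZ;  /* ticks left until the next rebalance */

    static void
    sched_balance_demo(void)
    {
        printf("rebalancing run queues\n");
        balance_ticks = HZ;         /* re-arm for roughly one second from now */
    }

    /* Called once per tick from the scheduler clock, in place of a callout. */
    static void
    sched_clock_demo(void)
    {
        if (--balance_ticks <= 0)
            sched_balance_demo();
    }

    int
    main(void)
    {
        for (int i = 0; i < 3 * HZ; i++)    /* simulate three seconds of ticks */
            sched_clock_demo();
        return (0);
    }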
* There was a thread on "unusually high load averages" when running under sched_ule in January 2004. | obrien | 2004-04-22 | 1 | -2/+2
    Looking at this, "pagezero" is (one of) the culprit(s). We had no
    provision for processes with P_NOLOAD set. With pagezero not running at
    PRI_ITHD, kseq_load_{add,rem} counted pagezero as another normal process,
    thus the "expected plus one" load reported in the above thread.
    Submitted by: Nikos Ntarmos <ntarmos@ceid.upatras.gr>
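The fix amounts to skipping load accounting for threads whose process carries the P_NOLOAD flag. A minimal stand-alone sketch of that check is below; the flag value, struct, and field names are illustrative rather than the actual kernel definitions.

    #include <stdio.h>

    #define P_NOLOAD 0x4000   /* illustrative: don't count toward load average */

    struct proc_demo { int p_flag; };
    struct kseq_demo { int ksq_load; };

    /* Only threads of ordinary processes contribute to the reported load. */
    static void
    kseq_load_add(struct kseq_demo *kseq, struct proc_demo *p)
    {
        if ((p->p_flag & P_NOLOAD) == 0)
            kseq->ksq_load++;
    }

    int
    main(void)
    {
        struct kseq_demo kseq = { 0 };
        struct proc_demo pagezero = { P_NOLOAD };
        struct proc_demo shell = { 0 };

        kseq_load_add(&kseq, &pagezero);        /* ignored */
        kseq_load_add(&kseq, &shell);           /* counted */
        printf("load = %d\n", kseq.ksq_load);   /* prints 1, not 2 */
        return (0);
    }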
* Spell "switches" a more conventional way.cognet2004-04-091-1/+1
|
* - Use the proper constant in sched_interact_update(). | jeff | 2004-04-04 | 1 | -1/+1
    Previously, SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was
    needed. This was causing the interactivity scaler to lose history at a
    more dramatic rate than intended.
* Change the type of the various CPU masks to cpumask_t. | marcel | 2004-03-27 | 1 | -4/+4
    Note that as long as there are still explicit uses of int, whether in
    types or in function names (such as atomic_set_int() in sched_ule.c), we
    can not change cpumask_t to be anything other than u_int. See also the
    commit log for sys/sys/types.h, revision 1.84.
* Give a more reasonable CPU time to the threads which are using scheduler activations (i.e., applications using libpthread). | obrien | 2004-03-21 | 1 | -6/+3
    This is because SCHED_ULE sometimes puts P_SA processes into ksq_next
    unnecessarily, which doesn't give a fair amount of CPU time to processes
    that are using scheduler-activation-based threads when other
    (semi-)CPU-intensive, non-P_SA processes are running. Further work will
    no doubt be done by jeffr at a later date.
    Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
    Reviewed by: rwatson, freebsd-current@
* Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: | jhb | 2004-02-27 | 1 | -2/+2
    - Sleep queues attempt to merge some of the benefits of both sleep queues
      and condition variables. Having sleep queues in a hash table avoids
      having to allocate a queue head for each wait channel. Thus, struct cv
      has shrunk down to just a single char * pointer now. However, the hash
      table does not hold threads directly, but queue heads. This means that
      once you have located a queue in the hash bucket, you no longer have to
      walk the rest of the hash chain looking for threads. Instead, you have
      a list of all the threads sleeping on that wait channel.
    - Outside of the sleepq code and the sleep/cv code the kernel no longer
      differentiates between cv's and sleep/wakeup. For example, calls to
      abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
      Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
      cv_waitq_remove() have been replaced with calls to sleepq_remove().
    - The sched_sleep() function no longer accepts a priority argument as
      sleeps no longer inherently bump the priority. Instead, this is solely
      a property of msleep(), which explicitly calls sched_prio() before
      blocking.
    - The TDF_ONSLEEPQ flag has been dropped as it was never used. The
      associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
      dropped and replaced with a single explicit clearing of td_wchan.
      TD_SET_ONSLEEPQ() would really have only made sense if it had taken the
      wait channel and message as arguments anyway. Now that that only
      happens in one place, a macro would be overkill.
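The hash-of-queue-heads layout described in the first bullet can be sketched in plain C: the table is keyed by the wait channel pointer, each bucket chains queue heads, and each queue head carries the list of threads sleeping on that channel. All names below are illustrative and not the kernel's sleepq implementation.

    #include <stdint.h>
    #include <stdio.h>

    #define SLEEPQ_HASH_SIZE 128

    struct thread_demo {
        int                 tid;
        struct thread_demo *next;       /* next thread on the same wait channel */
    };

    struct sleepq_demo {
        const void         *wchan;      /* wait channel this queue belongs to */
        struct thread_demo *threads;    /* all threads sleeping on wchan */
        struct sleepq_demo *hash_next;  /* next queue head in the same bucket */
    };

    static struct sleepq_demo *buckets[SLEEPQ_HASH_SIZE];

    static unsigned
    sleepq_hash(const void *wchan)
    {
        return ((unsigned)(((uintptr_t)wchan >> 8) % SLEEPQ_HASH_SIZE));
    }

    /* Find the queue head for a wait channel; only queue heads are chained,
     * so the lookup never walks over threads sleeping on other channels. */
    static struct sleepq_demo *
    sleepq_lookup_demo(const void *wchan)
    {
        struct sleepq_demo *sq;

        for (sq = buckets[sleepq_hash(wchan)]; sq != NULL; sq = sq->hash_next)
            if (sq->wchan == wchan)
                return (sq);
        return (NULL);
    }

    int
    main(void)
    {
        int chan;                              /* any address serves as a channel */
        struct thread_demo td = { 1, NULL };
        struct sleepq_demo sq = { &chan, &td, NULL };

        buckets[sleepq_hash(&chan)] = &sq;     /* "sleep": publish the queue head */
        printf("found queue: %s\n", sleepq_lookup_demo(&chan) ? "yes" : "no");
        return (0);
    }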
* - Allow interactive tasks to use the maximum time-slice. | jeff | 2004-02-01 | 1 | -1/+1
    This is not as detrimental as I thought it would be in the case of
    massive process storms from a shell, and it makes regular desktop usage
    noticeably better.
* - Add a new member to struct kseq called ksq_sysload. | jeff | 2004-02-01 | 1 | -3/+27
    This is intended to track the load for the sched_load() function. In the
    SMP case this member is not defined because it would be redundant with
    the ksg_load member, which already tracks the non-ithd load.
    - For sched_load() in the UP case simply return ksq_sysload. In the SMP
      case traverse the list of kseq groups and sum up their ksg_load fields.
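A stand-alone sketch of the UP/SMP split described above: on UP the function just returns a single counter, while on SMP it sums the per-group load fields. The structure names and the group count are illustrative, not the kernel's.

    #include <stdio.h>

    #define SMP 1            /* toggle to 0 to see the UP variant */
    #define KSEQ_GROUPS 2    /* illustrative number of cpu groups */

    struct ksg_demo  { int ksg_load; };      /* per-group load (SMP) */
    struct kseq_demo { int ksq_sysload; };   /* system load (UP only) */

    #if SMP
    static struct ksg_demo groups[KSEQ_GROUPS] = { { 3 }, { 1 } };
    #else
    static struct kseq_demo kseq = { 4 };
    #endif

    static int
    sched_load_demo(void)
    {
    #if SMP
        int total = 0;

        /* Walk the kseq groups and sum their load fields. */
        for (int i = 0; i < KSEQ_GROUPS; i++)
            total += groups[i].ksg_load;
        return (total);
    #else
        /* A single run queue: the counter already holds the answer. */
        return (kseq.ksq_sysload);
    #endif
    }

    int
    main(void)
    {
        printf("sched_load() = %d\n", sched_load_demo());
        return (0);
    }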
* - sched_strict has been dead for a long time now. Get rid of it. | jeff | 2004-01-25 | 1 | -3/+0
* - Clean up KASSERTS. | jeff | 2004-01-25 | 1 | -4/+4
* - Add a flags parameter to mi_switch(). | jeff | 2004-01-25 | 1 | -2/+1
    The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is
    set in mi_switch() and properly adjust the rusage statistics. This is to
    simplify the large number of users of this interface which were
    previously all required to adjust the proper counter prior to calling
    mi_switch(). This also facilitates more switch and locking optimizations.
    - Change all callers of mi_switch() to pass the appropriate parameter and
      remove direct references to the process statistics.
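The flag mechanism lets one place update the voluntary/involuntary context-switch counters instead of every caller doing it by hand. Below is a minimal sketch of that idea; the flag values and struct names are illustrative, not the kernel definitions.

    #include <assert.h>
    #include <stdio.h>

    #define SW_VOL   0x01   /* voluntary switch, e.g. the thread is sleeping */
    #define SW_INVOL 0x02   /* involuntary switch, e.g. preempted */

    struct rusage_demo {
        long ru_nvcsw;      /* voluntary context switches */
        long ru_nivcsw;     /* involuntary context switches */
    };

    /* mi_switch() bumps the right counter itself, so callers only pass a flag. */
    static void
    mi_switch_demo(int flags, struct rusage_demo *ru)
    {
        assert((flags & (SW_VOL | SW_INVOL)) != 0);
        if (flags & SW_VOL)
            ru->ru_nvcsw++;
        else
            ru->ru_nivcsw++;
        /* ... pick the next thread and switch to it ... */
    }

    int
    main(void)
    {
        struct rusage_demo ru = { 0, 0 };

        mi_switch_demo(SW_VOL, &ru);     /* e.g. called from sleep */
        mi_switch_demo(SW_INVOL, &ru);   /* e.g. called on preemption */
        printf("voluntary=%ld involuntary=%ld\n", ru.ru_nvcsw, ru.ru_nivcsw);
        return (0);
    }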
* - Make our transfer decisions based on load and not transferable load. | jeff | 2003-12-20 | 1 | -7/+1
    A cpu could have been bogged down with non-transferable load and still
    not have migrated a new thread to an idle cpu. This required some
    benchmarking and tuning to get right, as the comment above it suggests.
* - Enable ithread migration on x86. | jeff | 2003-12-20 | 1 | -0/+10
    This is done to work around a bug in the IO APIC on Xeons that prevents
    round-robin interrupt assignment from working.
* - In kseq_transfer() return if smp has not been started. | jeff | 2003-12-20 | 1 | -9/+14
    - In sched_add(), do the idle check prior to the transfer check so that
      we don't try to transfer load from an idle cpu. This fixes panics
      caused by IPIs on UP machines running SMP kernels.
    Reported/Debugged by: seanc
* - Running interactive tasks with the minimum time-slice is fine for vi and sh, but not so great for mozilla, X, etc. | jeff | 2003-12-20 | 1 | -1/+2
    Add a fixed define for the slice size granted to interactive KSEs.
* - Assign the ke_cpu field in kseq_notify() so that all of our callers do not have to do it. | jeff | 2003-12-14 | 1 | -4/+2
    - Set the ke_runq to NULL in sched_add() before calling kseq_notify().
      Otherwise we may panic in sched_add() if INVARIANTS is on.
* - Now that we have kseq groups, balance them separately. | jeff | 2003-12-12 | 1 | -47/+130
    - The new sched_balance_groups() function does intra-group balancing
      while sched_balance() balances the available groups.
    - Pick a random time between 0 ticks and hz * 2 ticks to restart each
      balancing process. Each balancer has its own timeout.
    - Pick a random place in the list of groups to start the search for
      lowest and highest group loads. This prevents us from preferring a
      group based on numeric position.
    - Use a nasty hack to stop us from preferring cpu 0. The problem is that
      softclock always runs on cpu 0, so it always has a little extra load.
      We ignore this load in the balancer for now. In the future softclock
      should run on a random cpu and these hacks can go away.
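Staggering the balancers with a random restart interval is a one-liner once a tick-based countdown exists. The sketch below shows the pattern; the kernel would typically use arc4random() for this, but rand() keeps the example self-contained and runnable, and all names are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define HZ 1000    /* illustrative ticks per second */

    static int balance_ticks;        /* countdown for sched_balance() */
    static int group_balance_ticks;  /* countdown for sched_balance_groups() */

    /* Restart a balancer at a random point within the next 2 * hz ticks, so
     * the two balancers do not fire in lockstep. */
    static int
    random_interval(void)
    {
        return (rand() % (2 * HZ));
    }

    int
    main(void)
    {
        srand((unsigned)time(NULL));
        balance_ticks = random_interval();
        group_balance_ticks = random_interval();
        printf("next balance in %d ticks, next group balance in %d ticks\n",
            balance_ticks, group_balance_ticks);
        return (0);
    }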
* - Don't let the pctcpu rate limiter throttle us if we have recorded over SCHED_CPU_TICKS ticks. | jeff | 2003-12-11 | 1 | -1/+2
    This was allowing processes to display (1/SCHED_CPU_TIME * 100)% more cpu
    than they had used.
* - In sched_switch(), if a thread has been assigned, don't touch the runqueues or load. | jeff | 2003-12-11 | 1 | -15/+21
    These things have already been taken care of in sched_bind(), which
    should be the only place that we're switching in an assigned thread.
* - Add support for CPU groups to ule. | jeff | 2003-12-11 | 1 | -116/+263
    All SMT cores on the same physical cpu are added to a group.
    - Don't place a cpu into the kseq_idle bitmask until all cpus in that
      group have idled.
    - Prefer idle groups over idle group members in the new kseq_transfer()
      function. In this way we will prefer to balance load across full cores
      rather than add further load to a partial core.
    - Before a cpu goes idle, check the other group members for threads.
      Since SMT cpus may freely share threads, this is cheap.
    - SMT cores may be individually pinned and bound to now. This contrasts
      with the old mechanism where binding or pinning would have allowed a
      thread to run on any available cpu.
    - Remove some unnecessary logic from sched_switch(). Priority propagation
      should be properly taken care of in sched_prio() now.
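The "whole group must idle" rule can be sketched with a per-group idle mask: a cpu only sets the group's bit in the global idle mask once every SMT sibling has idled. The masks and field names below are illustrative, not the kernel's.

    #include <stdio.h>

    struct ksg_demo {
        int ksg_cpus;       /* number of cpus (SMT siblings) in this group */
        int ksg_idlemask;   /* bit per idle sibling within the group */
        int ksg_mask;       /* mask of all siblings, e.g. 0x3 for two threads */
    };

    static int kseq_idle;   /* global mask: one bit per *group* that is fully idle */

    /* Called when sibling 'cpu_bit' of group 'grp_bit' runs out of work. */
    static void
    cpu_idled(struct ksg_demo *ksg, int cpu_bit, int grp_bit)
    {
        ksg->ksg_idlemask |= cpu_bit;
        if (ksg->ksg_idlemask == ksg->ksg_mask)   /* every sibling is idle */
            kseq_idle |= grp_bit;                 /* advertise the whole group */
    }

    int
    main(void)
    {
        struct ksg_demo grp = { 2, 0, 0x3 };      /* one core, two SMT threads */

        cpu_idled(&grp, 0x1, 0x1);
        printf("after one sibling: kseq_idle = %#x\n", kseq_idle);    /* 0 */
        cpu_idled(&grp, 0x2, 0x1);
        printf("after both siblings: kseq_idle = %#x\n", kseq_idle);  /* 0x1 */
        return (0);
    }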
* rqb_bits[] may be an int64_t (e.g. on alpha, and recently on amd64). | peter | 2003-12-07 | 1 | -1/+1
    Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad
    things happen(TM). This is why beast.freebsd.org panicked with ULE.
    Reviewed by: jeff
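This class of bug is easy to demonstrate in isolation: shifting an int constant by 33 or more bits is undefined behaviour and typically sets the wrong bit, while widening the operand to the width of the destination first works. A small stand-alone demonstration:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int bit = 33;
        uint64_t word = 0;

        /* Wrong: "1" is an int, so 1 << 33 is undefined behaviour; on x86 the
         * shift count is effectively taken modulo 32 and you get bit 1 set. */
        /* word |= 1 << bit; */

        /* Right: widen the constant to the width of the bitmap word first. */
        word |= (uint64_t)1 << bit;

        printf("word = 0x%016llx\n", (unsigned long long)word);
        return (0);
    }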
* Fix all users of mp_maxid to use the same semantics, namely: | jhb | 2003-12-03 | 1 | -1/+1
    1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
    2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
    Approved by: re (scottl)
    Tested on: i386, amd64, alpha
* - Mark ksq_assigned as volatile so that when this code is used without sched_lock we can be sure that we'll pick up the new value. | jeff | 2003-11-17 | 1 | -3/+3
* - Remove long dead code. rslices hasn't been used in some time and neither has sched_pickcpu(). | jeff | 2003-11-17 | 1 | -52/+4
* - Introduce kseq_runq_{add,rem}(), which are used to insert and remove kses from the run queues. | jeff | 2003-11-15 | 1 | -61/+83
    Also, on SMP, we track the transferable count here. Threads are
    transferable only as long as they are on the run queue.
    - Previously, we adjusted our load balancing based on the transferable
      count minus the number of actual cpus. This was done to account for the
      threads which were likely to be running. All of this logic is simpler
      now that transferable accounts for only those threads which can
      actually be taken. Updated various places in sched_add() and
      kseq_balance() to account for this.
    - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
      really doing. The load is accounted for separately from the runq
      because the load is accounted for even as the thread is running.
    - Fix a bug in sched_class() where we weren't properly using the
      PRI_BASE() version of the kg_pri_class.
    - Add a large comment that describes the impact of a seemingly simple
      conditional in sched_add().
    - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE()
      prior to checking kseq_idle. This reduces the frequency of access for
      kseq_idle, which is a shared resource.
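Pairing run-queue insertion with transferable-count maintenance keeps the counter in lockstep with queue membership. A minimal sketch of the two helpers follows; the KSE_CAN_MIGRATE() stand-in, struct, and field names are illustrative, not the kernel code.

    #include <stdio.h>

    /* Illustrative stand-in: bit 0 of the flags marks a pinned/bound thread. */
    #define KSE_CAN_MIGRATE(flags)  (((flags) & 0x1) == 0)

    struct kseq_demo {
        int ksq_length;        /* threads currently on this cpu's run queues */
        int ksq_transferable;  /* subset of those that another cpu may steal */
    };

    static void
    kseq_runq_add(struct kseq_demo *kseq, int ke_flags)
    {
        kseq->ksq_length++;
        if (KSE_CAN_MIGRATE(ke_flags))
            kseq->ksq_transferable++;
    }

    static void
    kseq_runq_rem(struct kseq_demo *kseq, int ke_flags)
    {
        kseq->ksq_length--;
        if (KSE_CAN_MIGRATE(ke_flags))
            kseq->ksq_transferable--;
    }

    int
    main(void)
    {
        struct kseq_demo kseq = { 0, 0 };

        kseq_runq_add(&kseq, 0x0);   /* migratable thread */
        kseq_runq_add(&kseq, 0x1);   /* pinned thread */
        printf("on runq=%d transferable=%d\n", kseq.ksq_length,
            kseq.ksq_transferable);             /* 2 and 1 */
        kseq_runq_rem(&kseq, 0x0);   /* the migratable one starts running */
        printf("on runq=%d transferable=%d\n", kseq.ksq_length,
            kseq.ksq_transferable);             /* 1 and 0 */
        return (0);
    }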