path: root/sys/kern/kern_synch.c
Commit message | Author | Age | Files | Lines
...
* Commit 2/14 of sched_lock decomposition. | jeff | 2007-06-04 | 1 | -7/+7
  - Adapt sleepqueues to the new thread_lock() mechanism.
  - Delay assigning the sleep queue spinlock as the thread lock until after we've
    checked for signals.  It is illegal for a thread to return in mi_switch() with
    any lock assigned to td_lock other than the scheduler locks.
  - Change sleepq_catch_signals() to do the switch if necessary to simplify the callers.
  - Simplify timeout handling now that locking a sleeping thread has the side-effect
    of locking the sleepqueue.  Some previous races are no longer possible.
  Tested by:      kris, current@
  Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* Do proper "locking" for missing vmmeters part. | attilio | 2007-06-04 | 1 | -1/+1
  Now, we assume no more sched_lock protection for some of them and use the
  distributed loads method for vmmeter (distributed through CPUs).
  Reviewed by: alc, bde
  Approved by: jeff (mentor)
* - Move rusage from being per-process in struct pstats to per-thread in | jeff | 2007-06-01 | 1 | -27/+5
  td_ru.  This removes the requirement for per-process synchronization in
  statclock() and mi_switch().  This was previously supported by sched_lock,
  which is going away.  All modifications to rusage are now done in the context
  of the owning thread.  Reads proceed without locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the exiting
    thread's rusage is not lost.
  - Provide a new routine, rufetch(), to fetch an aggregate of all rusage
    structures from all threads in a process.  This routine must be used in any
    place requiring a rusage from a process prior to its exit.  The exited
    process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a thread
    exits.  Tick statistics are kept in the thread and protected by sched_lock
    until it exits.
  Initial patch by: attilio
  Reviewed by:      attilio, bde (some objections), arch (mostly silent)
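  An illustrative sketch of the consumer pattern described in the entry above, assuming
  the rufetch(struct proc *, struct rusage *) interface it names; the reporting helper
  and its header set are hypothetical simplifications, not code from the commit:

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/proc.h>
      #include <sys/resourcevar.h>

      /*
       * Hypothetical helper: report a process's aggregate context-switch counts.
       * rufetch() sums the per-thread rusage structures (plus the contribution of
       * already-exited threads), so no per-process stats lock is needed here.
       */
      static void
      report_switches(struct proc *p)
      {
              struct rusage ru;

              rufetch(p, &ru);
              printf("pid %d: %ld voluntary, %ld involuntary context switches\n",
                  p->p_pid, ru.ru_nvcsw, ru.ru_nivcsw);
      }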
* Revert VMCNT_* operations introduction. | attilio | 2007-05-31 | 1 | -1/+1
  Probably, a general approach is not the best solution here, so we should solve
  the sched_lock protection problems separately.
  Requested by: alc
  Approved by:  jeff (mentor)
* - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating | jeff | 2007-05-18 | 1 | -1/+1
  vmcnts.  This can be used to abstract away pcpu details but also changes to use
  atomics for all counters now.  This means sched lock is no longer responsible
  for protecting counts in the switch routines.
  Contributed by: Attilio Rao <attilio@FreeBSD.org>
* Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by | jhb | 2007-05-08 | 1 | -1/+7
  1) adding the thread to the sleepq via sleepq_add() before dropping the lock, and
  2) dropping the sleepq lock around calls to lc_unlock() for sleepable locks
     (i.e. locks that use sleepq's in their implementation).
* Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, | jhb | 2007-03-21 | 1 | -3/+3
  rwlocks, and sx locks to 'lock_object'.
* Allow threads to atomically release rw and sx locks while waiting for an | jhb | 2007-03-09 | 1 | -28/+33
  event.  Locking primitives that support this (mtx, rw, and sx) now each include
  their own foo_sleep() routine.
  - Rename msleep() to _sleep() and change its 'struct mtx' object to a
    'struct lock_object' pointer.  _sleep() uses the recently added lc_unlock()
    and lc_lock() function pointers for the lock class of the specified lock to
    release the lock while the thread is suspended.
  - Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks (rw_sleep()),
    and sx locks (sx_sleep()).  msleep() still exists and is now identical to
    mtx_sleep(), but it is deprecated.
  - Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
  - Rewrite much of sleep.9 to not be msleep(9) centric.
  - Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS' section.
  - Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will warn
    if you try to pass a NULL wait channel.  The functions already have a KASSERT
    to that effect.
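  A minimal editorial sketch of the wrapper usage described above: waiting for a
  condition while an sx lock is atomically dropped and reacquired around the sleep.
  The softc structure, its fields, the wait channel, and the wmesg are hypothetical,
  and error handling is simplified:

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/lock.h>
      #include <sys/sx.h>

      struct mysoftc {
              struct sx       sc_lock;
              int             sc_ready;
      };

      /* Sleep until sc_ready is set, dropping sc_lock while asleep. */
      static int
      wait_for_ready(struct mysoftc *sc)
      {
              int error = 0;

              sx_xlock(&sc->sc_lock);
              while (sc->sc_ready == 0 && error == 0)
                      error = sx_sleep(&sc->sc_ready, &sc->sc_lock, PCATCH,
                          "mywait", 0);
              sx_xunlock(&sc->sc_lock);
              return (error);
      }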
* Instead of doing comparisons using the pcpu area to see if | julian | 2007-03-08 | 1 | -1/+1
  a thread is an idle thread, just see if it has the IDLETD flag set.  That flag
  will probably move to the pflags word as it's permanent and never changes for
  the life of the system, so it doesn't need locking.
* Further system call comment cleanup: | rwatson | 2007-03-05 | 1 | -1/+1
  - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
  - Remove extra blank lines in some cases.
  - Add extra blank lines in some cases.
  - Remove no-op comments consisting solely of the function name, the word
    "syscall", or the system call name.
  - Add punctuation.
  - Re-wrap some comments.
* Print tid's rather than thread pointers in KTR_PROC traces. | jhb | 2007-02-27 | 1 | -8/+8
* Add a new kernel sleep function pause(9).  pause(9) is for places that | jhb | 2007-02-23 | 1 | -1/+21
  want an equivalent of DELAY(9) that sleeps instead of spins.  It accepts a wmesg
  and a timeout and is not interrupted by signals.  It uses a private wait channel
  that should never be woken up by wakeup(9) or wakeup_one(9).
  Glanced at by: phk
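  An editorial before/after sketch of the intended use; the delay length and the
  wmesg string are chosen arbitrarily:

      /* Before: busy-wait for roughly 100 ms, burning the CPU. */
      DELAY(100000);

      /*
       * After: sleep for roughly 100 ms.  "drvwait" is an arbitrary wmesg and
       * hz is the number of clock ticks per second.
       */
      pause("drvwait", hz / 10);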
* - Fix schedgraph output with KSE threads.  Call thread_switchout() after | jeff | 2007-01-03 | 1 | -4/+8
  calling CTR() so we don't confuse a new kse thread with a real preemption.
* Add second sleep queue so that sx and lockmgr can have separate sleep | kmacy | 2006-12-16 | 1 | -4/+4
  queues for shared and exclusive acquisitions.
  Submitted by: Attilio Rao
  Approved by:  jhb
* Only grab the sched_lock if we actually need to modify the thread priority. | phk | 2006-11-30 | 1 | -4/+5
  During a buildworld only 2/3 of the calls to msleep actually changed the priority.
* Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *', | pjd | 2006-11-16 | 1 | -2/+3
  which allows it to be used with different kinds of locks.  For example, it
  allows implementing Solaris condition variables, which will be used in the ZFS
  port on top of sx(9) locks.
  Reviewed by: jhb
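  The shape of the interface change, paraphrased from the entry above; the argument
  names and the exact prototype at this revision are reconstructed from memory rather
  than verified against the tree:

      /* Before: the interlock had to be a mutex. */
      void    sleepq_add(void *wchan, struct mtx *lock, const char *wmesg,
                  int flags);

      /*
       * After: any lock class can be passed via a pointer to its embedded
       * struct lock_object, which is what a condition variable built on
       * sx(9) locks needs.
       */
      void    sleepq_add(void *wchan, struct lock_object *lock, const char *wmesg,
                  int flags);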
* Adjust assertions to allow for magical properties of the 'lbolt' wait | jhb | 2006-11-15 | 1 | -3/+3
  channel for tsleep():
  - Allow tsleep() on &lbolt without Giant with a timeout of 0 since &lbolt has
    an implied timeout.
  - If &lbolt is used with msleep(), pass NULL to sleepq_add() for the lock object.
    Unlike other sleepq channels, &lbolt doesn't have an associated owning lock.
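  For context, the classic once-a-second polling idiom that these relaxed assertions
  are meant to accommodate (editorial sketch; work_available() is hypothetical):

      /*
       * &lbolt is a kernel wait channel that is woken up once per second,
       * so a timeout of 0 cannot sleep forever here.
       */
      while (!work_available())
              tsleep(&lbolt, PPAUSE, "lbolt", 0);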
* Make KSE a kernel option, turned on by default in all GENERIC | jb | 2006-10-26 | 1 | -0/+2
  kernel configs except sun4v (which doesn't process signals properly with KSE).
  Reviewed by: davidxu@
* Use scheduler API sched_relinquish() to implement yield() syscall. | davidxu | 2006-06-15 | 1 | -8/+2
* In the case of reentering the debugger due to an attempt to perform a | jhb | 2006-06-03 | 1 | -10/+10
  context switch while in the debugger, reenter the debugger sooner before
  performing any statistics updates.
* Change msleep() and tsleep() to not alter the calling thread's priority | jhb | 2006-04-17 | 1 | -3/+5
  if the specified priority is zero.  This avoids a race where the calling thread
  could read a snapshot of its current priority, then a different thread could
  change the first thread's priority, then the original thread would call
  sched_prio() inside msleep(), undoing the change made by the second thread.
  I used a priority of zero as no thread that calls msleep() or tsleep() should
  be specifying a priority of zero anyway.
  The various places that passed 'curthread->td_priority' or some variant as the
  priority now pass 0.
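  An editorial before/after sketch of the calling-convention change; 'sc' and its
  members are hypothetical:

      /* Before: racy -- re-asserts a possibly stale snapshot of the priority. */
      msleep(&sc->sc_event, &sc->sc_mtx, curthread->td_priority | PCATCH,
          "event", 0);

      /* After: a priority of 0 leaves the thread's priority alone. */
      msleep(&sc->sc_event, &sc->sc_mtx, PCATCH, "event", 0);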
* Fix a sleep queue race for KSE threads. | davidxu | 2006-02-23 | 1 | -18/+4
  Reviewed by: jhb
* Fixup some comments.  Mutexes have been locked, not entered, for several | jhb | 2006-02-22 | 1 | -5/+5
  years now, and msleep blocks threads rather than processes.
* Fix a long standing race between sleep queue and thread | davidxu | 2006-02-15 | 1 | -8/+2
  suspension code.  When a thread A is going to sleep, it calls
  sleepq_catch_signals() to detect any pending signals or thread suspension
  request.  If nothing happens, it returns without holding the process lock or
  scheduler lock.  This opens a race window which allows thread B to come in and
  do process suspension work; however, since A is still in the running state,
  thread B can do nothing to A.  Thread A continues and puts itself into the
  actually sleeping state, but B has never seen it, and it sits there forever
  until B is woken up by other threads sometime later (this can be a very long
  delay or never happen).  Fix this bug by forcing sleepq_catch_signals() to
  return with the scheduler lock held.
  Fix sleepq_abort() by passing it an interrupted code; previously, it worked as
  wakeup_one(), and the interruption could not be identified correctly by the
  sleep queue code when the sleeping thread was resumed.
  Let thread_suspend_check() return EINTR or ERESTART, so the sleep queue no
  longer has to use SIGSTOP as a hack to build a return value.
  Reviewed by: jhb
  MFC after:   1 week
* CPU time accounting speedup (step 2) | phk | 2006-02-11 | 1 | -0/+6
  Keep accounting time (in per-cpu) cputicks and the statistics counts in the
  thread and summarize into struct proc when at context switch.
  Don't reach across CPUs in calcru().
  Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick
  hardware (like TSC on power managed machines).
  Don't enforce monotonicity (at least for now) in calcru.  While the calibrated
  cpu_tickrate ramps up it may not be true.
  Use 27MHz counter on i386/Geode.
  Use TSC on amd64 & i386 if present.
  Use tick counter on sparc64.
* Modify the way we account for CPU time spent (step 1) | phk | 2006-02-07 | 1 | -5/+4
  Keep track of time spent by the cpu in various contexts in units of "cputicks"
  and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody
  wants to inspect the numbers.
  For now "cputicks" are still derived from the current timecounter and therefore
  things should by definition remain sensible also on SMP machines.  (The main
  reason for this first milestone commit is to verify that hypothesis.)
  On slower machines, the avoided multiplications to normalize timestamps at
  every context switch come out as a 5-7% better score on the unixbench/context1
  microbenchmark.  On more modern hardware no change in performance is seen.
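  A heavily simplified, editorial paraphrase of the accounting idea: accumulate raw
  ticks cheaply at context switch and convert to real-world units only when read.
  The field and helper names follow later FreeBSD code from memory and may not match
  this exact revision:

      /* At context switch: accumulate raw, cheap cputicks only. */
      uint64_t new_switchtime = cpu_ticks();
      p->p_rux.rux_runtime += new_switchtime - PCPU_GET(switchtime);
      PCPU_SET(switchtime, new_switchtime);

      /*
       * Only when somebody asks (e.g. calcru()) is the accumulated total
       * scaled to real-world units.
       */
      uint64_t runtime_usec = cputick2usec(p->p_rux.rux_runtime);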
* patch(1) and I aren't friends today.  Axe a duplicate copy of | jhb | 2005-12-29 | 1 | -82/+0
  the msleep_spin() function definition.
  Spotted by: pjd
* Add a new function msleep_spin() which is a slightly stripped down version | jhb | 2005-12-29 | 1 | -0/+164
  of msleep().  msleep_spin() doesn't support changing the priority of the thread
  while it is asleep nor does it support interruptible sleeps (PCATCH) or the
  PDROP flag.  It does support timeouts however.  It differs from msleep() in
  that the passed in mutex is a spin mutex.  This means one can use msleep_spin()
  and wakeup() with a spin mutex similar to msleep() and wakeup() with a regular
  mutex.  Note that the spin mutex in question needs to come before sched_lock
  and the sleepq locks in lock order.
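  An editorial sketch of pairing msleep_spin() with wakeup() under a spin mutex;
  the softc, its fields, and the wmesg are hypothetical, and the timeout return
  value is ignored for brevity:

      /* Waiter: note there is no priority argument and no PCATCH/PDROP support. */
      mtx_lock_spin(&sc->sc_spinlock);
      while (sc->sc_done == 0)
              (void)msleep_spin(&sc->sc_done, &sc->sc_spinlock, "spinwt", hz);
      mtx_unlock_spin(&sc->sc_spinlock);

      /* Waker: */
      mtx_lock_spin(&sc->sc_spinlock);
      sc->sc_done = 1;
      wakeup(&sc->sc_done);
      mtx_unlock_spin(&sc->sc_spinlock);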
* When checking to see if a process has exceeded its time limit, flag the | jhb | 2005-11-28 | 1 | -2/+2
  process as over the limit when its time is >= to the limit rather than > the
  limit.  Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit and
  p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit yet.
  However, having the fraction exactly equal to 0 is rather rare, and it is not
  worth the overhead to handle that edge case.  With just the > comparison, the
  process would have to exceed its limit by almost a second before it was killed.
  PR:           kern/83192
  Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com
  Reviewed by:  bde
  MFC after:    1 week
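  The comparison change, paraphrased; the surrounding check is sketched from the
  field names quoted in the entry above, not copied from the source:

      /* Before: a process sitting exactly at the limit was not yet flagged. */
      if (p->p_rux.rux_runtime.sec > p->p_pcpulimit)
              killproc(p, "exceeded maximum CPU limit");

      /* After: flag it once the whole seconds reach the limit. */
      if (p->p_rux.rux_runtime.sec >= p->p_pcpulimit)
              killproc(p, "exceeded maximum CPU limit");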
* Use low level constructs borrowed from interrupt threads to wait for | ups | 2005-05-23 | 1 | -4/+3
  work in proc0.
  Remove the TDP_WAKEPROC0 workaround.
* Sprinkle some volatile magic and rearrange things a bit to avoid race | ups | 2005-04-08 | 1 | -1/+1
  conditions in critical_exit now that it no longer blocks interrupts.
  Reviewed by: jhb
* Don't recursively panic when we call mi_switch() in a critical section, | jhb | 2005-03-31 | 1 | -1/+1
  even though calling mi_switch() after a panic is likely a bug anyway as the
  recursive panic only serves to make things worse.
* Stop explicitly touching td_base_pri outside of the scheduler and simply | jhb | 2004-12-30 | 1 | -2/+0
  set a thread's priority via sched_prio() when that is the desired action.
  The schedulers will start managing td_base_pri internally shortly.
* - Define KTR points for KTR_SCHED. | jeff | 2004-12-26 | 1 | -0/+17
* Unlock mutex if PDROP was set by caller. | davidxu | 2004-11-27 | 1 | -0/+2
* If a process needs to be swapped in, wake up the swapper from within | scottl | 2004-10-16 | 1 | -5/+5
  critical_exit as the process is getting scheduled to run.  This is suboptimal
  but for now avoids the LOR between the scheduler and the sleepq systems.
  This is a 5.3 candidate.
  Submitted by: davidxu
  MFC after:    3 days
* Refine the turnstile and sleep queue interfaces just a bit: | jhb | 2004-10-12 | 1 | -3/+4
  - Add a new _lock() call to each API that locks the associated chain lock for
    a lock_object pointer or wait channel.  The _lookup() functions now require
    that the chain lock be locked via _lock() when they are called.
  - Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup the
    associated queue structure internally via _lookup() rather than accepting a
    pointer from the caller.  For turnstiles, this means that the actual lookup
    of the turnstile in the hash table is only done when the thread actually
    blocks rather than being done on each loop iteration in _mtx_lock_sleep().
    For sleep queues, this means that sleepq_lookup() is no longer used outside
    of the sleep queue code except to implement an assertion in cv_destroy().
  - Change sleepq_broadcast() and sleepq_signal() to require that the chain lock
    is already held when they are called.  For condition variables, this lets the
    cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
    while testing the waiters count.  This means that the waiters count internal
    to condition variables is no longer protected by the interlock mutex, and
    cv_broadcast() and cv_signal() now no longer require that the interlock be
    held when they are called.  This lets consumers of condition variables drop
    the lock before waking other threads, which can result in fewer context
    switches.
  MFC after: 1 month
* Rework how we store process times in the kernel such that we always store | jhb | 2004-10-05 | 1 | -3/+3
  the raw values including for child process statistics and only compute the
  system and user timevals on demand.
  - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer
    if they are going to use the result.
  - Add a kern_getrusage() function for the ABI syscalls to use so that they
    don't have to play stackgap games to call getrusage().
  - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times
    it needs rather than calling getrusage() twice with associated stackgap, etc.
  - Add a new rusage_ext structure to store raw time stats such as tick counts
    for user, system, and interrupt time as well as a bintime of the total
    runtime.  A new p_rux field in struct proc replaces the same inline fields
    from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
    field in struct proc contains the "raw" child time usage statistics.  ruadd()
    has been changed to handle adding the associated rusage_ext structures as
    well as the values in rusage.  Effectively, the values in rusage_ext replace
    the ru_utime and ru_stime values in struct rusage.  These two fields in
    struct rusage are no longer used in the kernel.
  - calcru() has been split into a static worker function calcru1() that
    calculates appropriate timevals for user and system time as well as updating
    the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
    copy of the process' p_rux structure to compute the timevals after updating
    the runtime appropriately if any of the threads in that process are currently
    executing.  It also now only locks sched_lock internally while doing the
    rux_runtime fixup.  calcru() now only requires the caller to hold the proc
    lock and calcru1() only requires the proc lock internally.  calcru() also no
    longer allows callers to ask for an interrupt timeval since none of them
    actually did.
  - calcru() now correctly handles threads executing on other CPUs.
  - A new calccru() function computes the child system and user timevals by
    calling calcru1() on p_crux.  Note that this means that any code that wants
    child times must now call this function rather than reading from p_cru
    directly.  This function also requires the proc lock.
  - This finishes the locking for rusage and friends so some of the Giant locks
    in exit1() and kern_wait() are now gone.
  - The locking in ttyinfo() has been tweaked so that a shared lock of the
    proctree lock is used to protect the process group rather than the process
    group lock.  By holding this lock until the end of the function we now ensure
    that the process/thread that we pick to dump info about will no longer vanish
    while we are trying to output its info to the console.
  Submitted by: bde (mostly)
  MFC after:    1 month
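  An editorial sketch of the resulting calling convention for calcru() and calccru()
  as described above; the locking shown is the minimum the entry says is required,
  and the snippet is not copied from the source:

      struct timeval user, sys, cuser, csys;

      PROC_LOCK(p);
      calcru(p, &user, &sys);      /* current process: proc lock now suffices */
      calccru(p, &cuser, &csys);   /* reaped children, computed from p_crux */
      PROC_UNLOCK(p);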
* clean up thread runq accounting a bit. | julian | 2004-09-16 | 1 | -1/+1
  MFC after: 3 days
* Add some code to allow threads to nominate a sibling to run if they are | julian | 2004-09-10 | 1 | -1/+1
  going to sleep.
  MFC after: 1 week
* Refactor a bunch of scheduler code to give basically the same behaviour | julian | 2004-09-05 | 1 | -3/+3
  but with slightly cleaned up interfaces.
  The KSE structure has become the same as the "per thread scheduler private
  data" structure.  In order to not make the diffs too great one is #defined as
  the other at this time.
  The KSE (or td_sched) structure is now allocated per thread and has no
  allocation code of its own.
  Concurrency for a KSEGRP is now kept track of via a simple pair of counters
  rather than using KSE structures as tokens.
  Since the KSE structure is different in each scheduler, kern_switch.c is now
  included at the end of each scheduler.  Nothing outside the scheduler knows the
  contents of the KSE (aka td_sched) structure.
  The fields in the ksegrp structure that are to do with the scheduler's queueing
  mechanisms are now moved to the kg_sched structure (per ksegrp scheduler
  private data structure).  In other words, how the scheduler queues and keeps
  track of threads is no-one's business except the scheduler's.  This should
  allow people to write experimental schedulers with completely different
  internal structuring.
  A scheduler call sched_set_concurrency(kg, N) has been added that notifies the
  scheduler that no more than N threads from that ksegrp should be allowed to be
  concurrently scheduled.  This is also used to enforce 'fairness' at this time
  so that a ksegrp with 10000 threads cannot swamp the run queue and force out a
  process with 1 thread, since the current code will not set the concurrency
  above NCPU, and both schedulers will not allow more than that many onto the
  system run queue at a time.  Each scheduler should eventually develop their own
  methods to do this now that they are effectively separated.
  Rejig libthr's kernel interface to follow the same code paths as libkse for
  scope system threads.  This has slightly hurt libthr's performance but I will
  work to recover as much of it as I can.
  Thread exit code has been cleaned up greatly.  exit and exec code now
  transitions a process back to 'standard non-threaded mode' before taking the
  next step.
  Reviewed by: scottl, peter
  MFC after:   1 week
* Now that the return value semantics of cv's for multithreaded processes | jhb | 2004-08-19 | 1 | -28/+16
  have been unified with those of msleep(9), further refine the sleepq interface
  and consolidate some duplicated code:
  - Move the pre-sleep checks for threaded processes into a thread_sleep_check()
    function in kern_thread.c.
  - Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
    Specifically, if a thread is awakened by something other than a signal while
    checking for signals before going to sleep, clear TDF_SINTR in
    sleepq_catch_signals().  This removes a sched_lock lock/unlock combo in that
    edge case during an interruptible sleep.  Also, fix sleepq_check_signals() to
    properly handle the condition if TDF_SINTR is clear rather than requiring the
    callers of the sleepq API to notice this edge case and call a non-_sig
    variant of sleepq_wait().
  - Clarify the flags arguments to sleepq_add(), sleepq_signal() and
    sleepq_broadcast() by creating an explicit submask for sleepq types.  Also,
    add an explicit SLEEPQ_MSLEEP type rather than a magic number of 0.  Also,
    add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and move the
    setting of TDF_SINTR to sleepq_add() if this flag is set rather than
    sleepq_catch_signals().  Note that it is the caller's responsibility to
    ensure that sleepq_catch_signals() is called if and only if this flag is
    passed to the preceding sleepq_add().  Note that this also removes a
    sched_lock lock/unlock pair from sleepq_catch_signals().  It also ensures
    that for an interruptible sleep, TDF_SINTR is always set when
    TD_ON_SLEEPQ() is true.
* Increase the amount of data exported by KTR in the KTR_RUNQ setting. | julian | 2004-08-09 | 1 | -4/+4
  This extra data is needed to really follow what is going on in the threaded case.
* Work around a possible deadlock on SMP due to a spin lock LOR by disabling | jhb | 2004-08-04 | 1 | -0/+6
  the immediate awakening of proc0 (scheduler kproc, controls swapping processes
  in and out).  The scheduler process periodically awakens already, so this will
  not result in processes not being swapped in; there will just be more latency
  in between a thread being made runnable and the scheduler waking up to swap the
  affected process back in.
* Use P_SINGLE_EXIT to check the single-threading case; P_WEXIT is not for that | davidxu | 2004-07-28 | 1 | -1/+1
  purpose.
* - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags | jhb | 2004-07-16 | 1 | -1/+2
  since they are only accessed by curthread and thus do not need any locking.
  - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and
    directly into struct thread as td_profil_addr and td_profil_ticks as these
    variables are really per-thread.  (They are used to defer an addupc_intr()
    that was too "hard" until ast()).
* Update for the KDB framework: | marcel | 2004-07-10 | 1 | -9/+5
  o Make debugging code conditional upon KDB instead of DDB.
  o Call kdb_enter() instead of Debugger().
  o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().
  kern_mutex.c:
  o Replace checks for db_active with checks for kdb_active and make them
    unconditional.
  kern_shutdown.c:
  o s/DDB_UNATTENDED/KDB_UNATTENDED/g
  o s/DDB_TRACE/KDB_TRACE/g
  o Save the TID of the thread doing the kernel dump so the debugger knows which
    thread to select as the current when debugging the kernel core file.
  o Clear kdb_active instead of db_active and do so unconditionally.
  o Remove backtrace() implementation.
  kern_synch.c:
  o Call kdb_reenter() instead of db_error().
* Implement preemption of kernel threads natively in the scheduler rather | jhb | 2004-07-02 | 1 | -1/+4
  than as one-off hacks in various other parts of the kernel:
  - Add a function maybe_preempt() that is called from sched_add() to determine
    if a thread about to be added to a run queue should be preempted to directly.
    If it is not safe to preempt or if the new thread does not have a high enough
    priority, then the function returns false and sched_add() adds the thread to
    the run queue.  If the thread should be preempted to but the current thread
    is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the
    thread is added to the run queue.  Otherwise, mi_switch() is called
    immediately and the thread is never added to the run queue since it is
    switched to directly.  When exiting an outermost critical section, if
    TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the
    deferred preemption.
  - Remove explicit preemption from ithread_schedule() as calling setrunqueue()
    now does all the correct work.  This also removes the do_switch argument
    from ithread_schedule().
  - Do not use the manual preemption code in mtx_unlock if the architecture
    supports native preemption.
  - Don't call mi_switch() in a loop during shutdown to give ithreads a chance
    to run if the architecture supports native preemption since the ithreads
    will just preempt DELAY().
  - Don't call mi_switch() from the page zeroing idle thread for architectures
    that support native preemption as it is unnecessary.
  - Native preemption is enabled on the same archs that supported ithread
    preemption, namely alpha, i386, and amd64.
  This change should largely be a NOP for the default case as committed except
  that we will do fewer context switches in a few cases and will avoid the run
  queues completely when preempting.
  Approved by: scottl (with his re@ hat)
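  For context, an editorial sketch of how the deferred-preemption flag described
  above interacts with critical sections; the per-CPU work is hypothetical:

      critical_enter();
      /*
       * If a higher-priority thread becomes runnable here, maybe_preempt()
       * cannot switch immediately; TDF_OWEPREEMPT is noted instead.
       */
      update_pcpu_state();        /* hypothetical non-sleeping work */
      critical_exit();            /* owed preemption happens here via mi_switch() */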
* - Change mi_switch() and sched_switch() to accept an optional thread to | jhb | 2004-07-02 | 1 | -3/+3
  switch to.  If a non-NULL thread pointer is passed in, then the CPU will switch
  to that thread directly rather than calling choosethread() to pick a thread to
  switch to.
  - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN()
    instead of sticking them on the run queue rather than requiring all callers
    of mi_switch() to know to do this if they can be called from an idlethread.
  - Move constants for arguments to mi_switch() and thread_single() out of the
    middle of the function prototypes and up above into their own section.
* Remove the signal_caught argument from sleepq_timedwait() as it was | jhb | 2004-06-28 | 1 | -1/+1
  effectively always zero.