path: root/sys/kern/kern_synch.c
Commit log (most recent first). Each entry ends with (author, date; files changed, lines -removed/+added).
* Add new msleep(9) flag PBDY that shall be specified together with
  PCATCH, to indicate that the thread shall not be stopped upon receipt of
  SIGSTOP until it reaches the kernel->usermode boundary. Also change
  thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary
  unconditionally.
  Tested by:    pho
  Reviewed by:  jhb
  Approved by:  re (kensmith)
  (kib, 2009-07-14; 1 file, -0/+2)
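  A minimal usage sketch; the softc, its mutex, the wait channel, and the
  wmesg are hypothetical, and only the PCATCH | PBDY combination is what
  the commit describes:

      /* Interruptible sleep whose SIGSTOP handling is deferred to the
       * kernel->usermode boundary; "sc" and its lock are made-up names. */
      error = msleep(&sc->sc_event, &sc->sc_mtx, PCATCH | PBDY, "pbdywt", hz);
      if (error == EINTR || error == ERESTART)
              return (error);         /* a signal arrived during the sleep */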
* When wakeup(9) is going to notify the swapper, assert that the wait
  channel is not equal to &proc0. It should not be, since proc0's stack is
  not swappable and kick_proc0() is wakeup(&proc0).
  Reviewed by:  jhb
  Approved by:  re (kensmith)
  (kib, 2009-07-14; 1 file, -1/+4)
* Remove even more unneeded variable assignments.
  kern_time.c:
  - Unused variable `p'.
  kern_thr.c:
  - Variable `error' is always caught immediately, so no reason to
    initialize it. There is no way that error != 0 at the end of
    create_thread().
  kern_sig.c:
  - Unused variable `code'.
  kern_synch.c:
  - `rval' is always assigned in all different cases.
  kern_rwlock.c:
  - `v' is always overwritten with RW_UNLOCKED further on.
  kern_malloc.c:
  - `size' is always initialized with the proper value before being used.
  kern_exit.c:
  - `error' is always caught and returned immediately. abort2() never
    returns a non-zero value.
  kern_exec.c:
  - `len' is always assigned inside the if-statement right below it.
  tty_info.c:
  - `td' is always overwritten by FOREACH_THREAD_IN_PROC().
  Found by:  LLVM's scan-build
  (ed, 2009-02-26; 1 file, -1/+0)
* - Implement generic macros for producing KTR records that are compatible
    with src/tools/sched/schedgraph.py. This allows developers to quickly
    create a graphical view of ktr data for any resource in the system.
  - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly
    identifying records associated with a thread or cpu.
  - Reimplement the KTR_SCHED traces using the new generic facility.
  Obtained from:   attilio
  Discussed with:  jhb
  Sponsored by:    Nokia
  (jeff, 2009-01-17; 1 file, -13/+14)
* - Forward port flush of page table updates on context switch or userret
  - Forward port vfork XEN hack
  (kmacy, 2008-10-19; 1 file, -0/+9)
* - Don't do a WITNESS_SAVE() on the interlock if it is Giant in the
    condition variable wait routines. DROP_GIANT() already manages that
    state in the Giant interlock case.
  - Assert that Giant is held when it is passed as a sleep interlock.
  (jhb, 2008-09-25; 1 file, -0/+2)
* Remove the now unused `lbolt' variable from the kernel.
  We used to have a single wait channel inside the kernel which could be
  used by threads that just wanted to sleep for some time (the next
  second). The old TTY layer was the only piece of code that still used
  lbolt, because I already removed the use of lbolt from the NFS clients
  and the VFS syncer.
  Approved by:  philip
  (ed, 2008-08-20; 1 file, -15/+3)
* Permit Giant to be passed as the explicit interlock either to
  msleep/mtx_sleep or the various cv_*wait*() routines. Currently, the
  "unlock" behavior of PDROP and cv_wait_unlock() with Giant is not
  permitted, as it would be confusing since Giant is fully unrecursed and
  unlocked during a thread sleep.
  This is handy for subsystems which wish to allow unlocked drivers to
  continue to use Giant, such as CAM, the new TTY layer, and the new USB
  stack. CAM currently uses a hack that I told Scott to use because I
  really didn't want to permit this behavior, and the TTY and USB patches
  both carry local changes to permit this.
  MFC after:  2 weeks
  (jhb, 2008-08-07; 1 file, -2/+6)
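  Roughly what this permits, as a sketch; the wait channel and timeout are
  invented, and PDROP/cv_wait_unlock() remain disallowed with Giant:

      /* A Giant-locked driver may now sleep with Giant as the interlock;
       * Giant is fully unrecursed and released while asleep, then
       * re-acquired before mtx_sleep() returns. */
      mtx_assert(&Giant, MA_OWNED);
      error = mtx_sleep(&sc->sc_ready, &Giant, PRIBIO, "gsleep", 5 * hz);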
* If a thread that is swapped out is made runnable, then the setrunnable()
  routine wakes up proc0 so that proc0 can swap the thread back in.
  Historically, this has been done by waking up proc0 directly from
  setrunnable() itself via a wakeup(). When waking up a sleeping thread
  that was swapped out (the usual case when waking proc0 since only
  sleeping threads are eligible to be swapped out), this resulted in a bit
  of recursion (e.g. wakeup() -> setrunnable() -> wakeup()).
  With sleep queues having separate locks in 6.x and later, this caused a
  spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An
  attempt was made to fix this in 7.0 by making the proc0 wakeup use the
  ithread mechanism for doing the wakeup. However, this required grabbing
  proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere
  in the kernel (e.g. waiting for disk I/O), then this degenerated into
  the same LOR since the thread lock would be some other sleepq lock.
  Fix this by deferring the wakeup of the swapper until after the sleepq
  lock held by the upper layer has been locked. The setrunnable() routine
  now returns a boolean value to indicate whether or not proc0 needs to be
  woken up. The end result is that consumers of the sleepq API such as
  *sleep/wakeup, condition variables, sx locks, and lockmgr, have to
  wakeup proc0 if they get a non-zero return value from sleepq_abort(),
  sleepq_broadcast(), or sleepq_signal().
  Discussed with:  jeff
  Glanced at by:   sam
  Tested by:       Jurgen Weber  jurgen - ish com au
  MFC after:       2 weeks
  (jhb, 2008-08-05; 1 file, -15/+17)
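  The resulting pattern in the wakeup path looks roughly like this; a
  simplified sketch of the idea, not the exact committed code:

      void
      wakeup(void *ident)
      {
              int wakeup_swapper;

              sleepq_lock(ident);
              wakeup_swapper = sleepq_broadcast(ident, SLEEPQ_SLEEP, 0, 0);
              sleepq_release(ident);
              /* Wake the swapper only after the sleepq lock is dropped. */
              if (wakeup_swapper)
                      kick_proc0();
      }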
* - Make SCHED_STATS more generic by adding a wrapper to create the
    variables and sysctl nodes.
  - In reset walk the children of kern_sched_stats and reset the counters
    via the oid_arg1 pointer. This allows us to add arbitrary counters to
    the tree and still reset them properly.
  - Define a set of switch types to be passed with flags to mi_switch().
    These types are named SWT_*. These types correspond to SCHED_STATS
    counters and are automatically handled in this way.
  - Make the new SWT_ types more specific than the older switch stats.
    There are now stats for idle switches, remote idle wakeups, remote
    preemption ithreads idling, etc.
  - Add switch statistics for ULE's pickcpu algorithm. These stats include
    how much migration there is, how often affinity was successful, how
    often threads were migrated to the local cpu on wakeup, etc.
  Sponsored by:  Nokia
  (jeff, 2008-04-17; 1 file, -1/+5)
* Consistently use ANSI C declarations for all functions in kern_synch.c.
  (rwatson, 2008-03-16; 1 file, -19/+7)
* In keeping with style(9)'s recommendations on macros, use a ';'
  after each SYSINIT() macro invocation. This makes a number of
  lightweight C parsers much happier with the FreeBSD kernel source,
  including cflow's prcc and lxr.
  MFC after:       1 month
  Discussed with:  imp, rink
  (rwatson, 2008-03-16; 1 file, -1/+2)
* Remove kernel support for M:N threading.
  While the KSE project was quite successful in bringing threading to
  FreeBSD, the M:N approach taken by the kse library was never developed
  to its full potential. Backwards compatibility will be provided via
  libmap.conf for dynamically linked binaries and static binaries will be
  broken.
  (jeff, 2008-03-12; 1 file, -12/+2)
* - Pass the priority argument from *sleep() into sleepq and down into
    sched_sleep(). This removes extra thread_lock() acquisition and allows
    the scheduler to decide what to do with the static boost.
  - Change the priority arguments to cv_* to match sleepq/msleep/etc.
    where 0 means no priority change. Catch -1 in cv_broadcastpri() and
    convert it to 0 for now.
  - Set a flag when sleeping in a way that is compatible with swapping
    since direct priority comparisons are meaningless now.
  - Add a sysctl to ule, kern.sched.static_boost, that defaults to on
    which controls the boost behavior. Turning it off gives better
    performance in some workloads but needs more investigation.
  - While we're modifying sleepq, change signal and broadcast to both
    return with the lock held as the lock was held on enter.
  Reviewed by:  jhb, peter
  (jeff, 2008-03-12; 1 file, -19/+10)
* - Handle kdb switch panics outside of mi_switch() to remove some
    instructions from the common path and make the code more clear.
    Whether this has any impact on performance may depend on optimization
    levels.
  Sponsored by:  Nokia
  (jeff, 2008-03-10; 1 file, -6/+11)
* Don't zero td_runtime when billing thread CPU usage to the process;
  maintain a separate td_incruntime to hold unbilled CPU usage for the
  thread that has the previous properties of td_runtime.
  When thread information is requested using the thread monitoring
  sysctls, export thread td_runtime instead of process rusage runtime in
  kinfo_proc.
  This restores the display of individual ithread and other kernel thread
  CPU usage since inception in ps -H and top -SH, as well as for libthr
  user threads, valuable debugging information lost with the move to try
  kthreads since they are no longer independent processes.
  There is universal agreement that we should rewrite the process and
  thread export sysctls, but this commit gets things going a bit better in
  the meantime. Likewise, there are reservations about the continued
  validity of statclock given the speed of modern processors.
  Reviewed by:  attilio, emaste, jhb, julian
  (rwatson, 2008-01-10; 1 file, -2/+4)
* A bunch more files that should probably print out a thread name
  instead of a process name.
  (julian, 2007-11-14; 1 file, -4/+4)
* Generally we are interested in what thread did something, as opposed to
  what process. Since threads by default have the name of the process
  unless overwritten with more useful information, just print the thread
  name instead.
  (julian, 2007-11-14; 1 file, -5/+5)
* - Restore historical yield() behavior by manually lowering priority and
    switching.
  Approved by:  re
  (jeff, 2007-10-08; 1 file, -3/+6)
* - Move all of the PS_ flags into either p_flag or td_flags.
  - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK
    or previously the sched_lock. These bugs have existed for some time.
  - Allow swapout to try each thread in a process individually and then
    swapin the whole process if any of these fail. This allows us to move
    most scheduler related swap flags into td_flags.
  - Keep ki_sflag for backwards compat but change all in source tools to
    use the new and more correct location of P_INMEM.
  Reported by:  pho
  Reviewed by:  attilio, kib
  Approved by:  re (kensmith)
  (jeff, 2007-09-17; 1 file, -11/+5)
* Commit 2/14 of sched_lock decomposition.
  - Adapt sleepqueues to the new thread_lock() mechanism.
  - Delay assigning the sleep queue spinlock as the thread lock until
    after we've checked for signals. It is illegal for a thread to return
    in mi_switch() with any lock assigned to td_lock other than the
    scheduler locks.
  - Change sleepq_catch_signals() to do the switch if necessary to
    simplify the callers.
  - Simplify timeout handling now that locking a sleeping thread has the
    side-effect of locking the sleepqueue. Some previous races are no
    longer possible.
  Tested by:       kris, current@
  Tested on:       i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with:  kris, attilio, kmacy, jhb, julian, bde (small parts each)
  (jeff, 2007-06-04; 1 file, -7/+7)
* Do proper "locking" for missing vmmeters part.
  Now, we assume no more sched_lock protection for some of them and use
  the distributed loads method for vmmeter (distributed through CPUs).
  Reviewed by:  alc, bde
  Approved by:  jeff (mentor)
  (attilio, 2007-06-04; 1 file, -1/+1)
* - Move rusage from being per-process in struct pstats to per-thread in
    td_ru. This removes the requirement for per-process synchronization
    in statclock() and mi_switch(). This was previously supported by
    sched_lock which is going away. All modifications to rusage are now
    done in the context of the owning thread. Reads proceed without locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the
    exiting thread's rusage is not lost.
  - Provide a new routine, rufetch(), to fetch an aggregate of all rusage
    structures from all threads in a process. This routine must be used
    in any place requiring a rusage from a process prior to its exit. The
    exited process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a
    thread exits. Tick statistics are kept in the thread and protected by
    sched_lock until it exits.
  Initial patch by:  attilio
  Reviewed by:       attilio, bde (some objections), arch (mostly silent)
  (jeff, 2007-06-01; 1 file, -27/+5)
* Revert VMCNT_* operations introduction.
  Probably, a general approach is not the better solution here, so we
  should solve the sched_lock protection problems separately.
  Requested by:  alc
  Approved by:   jeff (mentor)
  (attilio, 2007-05-31; 1 file, -1/+1)
* - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
    vmcnts. This can be used to abstract away pcpu details but also
    changes to use atomics for all counters now. This means sched lock is
    no longer responsible for protecting counts in the switch routines.
  Contributed by:  Attilio Rao <attilio@FreeBSD.org>
  (jeff, 2007-05-18; 1 file, -1/+1)
* Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by
  1) adding the thread to the sleepq via sleepq_add() before dropping the
  lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
  sleepable locks (i.e. locks that use sleepq's in their implementation).
  (jhb, 2007-05-08; 1 file, -1/+7)
* Rename the 'mtx_object', 'rw_object', and 'sx_object' members of
  mutexes, rwlocks, and sx locks to 'lock_object'.
  (jhb, 2007-03-21; 1 file, -3/+3)
* Allow threads to atomically release rw and sx locks while waiting for an
  event. Locking primitives that support this (mtx, rw, and sx) now each
  include their own foo_sleep() routine.
  - Rename msleep() to _sleep() and change its 'struct mtx' object to a
    'struct lock_object' pointer. _sleep() uses the recently added
    lc_unlock() and lc_lock() function pointers for the lock class of the
    specified lock to release the lock while the thread is suspended.
  - Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
    (rw_sleep()), and sx locks (sx_sleep()). msleep() still exists and is
    now identical to mtx_sleep(), but it is deprecated.
  - Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
  - Rewrite much of sleep.9 to not be msleep(9) centric.
  - Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
    section.
  - Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler
    will warn if you try to pass a NULL wait channel. The functions
    already have a KASSERT to that effect.
  (jhb, 2007-03-09; 1 file, -28/+33)
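  For example, a subsystem protected by an sx lock can now sleep without a
  separate mutex; a sketch in which the lock, flag, and wmesg are
  hypothetical:

      sx_xlock(&sc_lock);
      while (!sc_ready)
              /* The sx lock is released during the sleep and re-acquired
               * before sx_sleep() returns; priority 0 = no change. */
              sx_sleep(&sc_ready, &sc_lock, 0, "scwait", 0);
      /* sc_ready is true and sc_lock is held again here. */
      sx_xunlock(&sc_lock);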
* Instead of doing comparisons using the pcpu area to see if
  a thread is an idle thread, just see if it has the IDLETD flag set. That
  flag will probably move to the pflags word as it's permanent and never
  changes for the life of the system, so it doesn't need locking.
  (julian, 2007-03-08; 1 file, -1/+1)
* Further system call comment cleanup:
  - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
  - Remove extra blank lines in some cases.
  - Add extra blank lines in some cases.
  - Remove no-op comments consisting solely of the function name, the word
    "syscall", or the system call name.
  - Add punctuation.
  - Re-wrap some comments.
  (rwatson, 2007-03-05; 1 file, -1/+1)
* Print tid's rather than thread pointers in KTR_PROC traces.
  (jhb, 2007-02-27; 1 file, -8/+8)
* Add a new kernel sleep function pause(9). pause(9) is for places that
  want an equivalent of DELAY(9) that sleeps instead of spins. It accepts
  a wmesg and a timeout and is not interrupted by signals. It uses a
  private wait channel that should never be woken up by wakeup(9) or
  wakeup_one(9).
  Glanced at by:  phk
  (jhb, 2007-02-23; 1 file, -1/+21)
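  A usage sketch; the wmesg and delay are arbitrary:

      /* Back off for about 100ms without spinning and without needing a
       * wait channel; pause() is not interrupted by signals and is not
       * ended early by a stray wakeup(). */
      pause("backoff", hz / 10);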
* - Fix schedgraph output with KSE threads. Call thread_switchout() after
    calling CTR() so we don't confuse a new kse thread with a real
    preemption.
  (jeff, 2007-01-03; 1 file, -4/+8)
* Add second sleep queue so that sx and lockmgr can have separate sleep
  queues for shared and exclusive acquisitions.
  Submitted by:  Attilio Rao
  Approved by:   jhb
  (kmacy, 2006-12-16; 1 file, -4/+4)
* Only grab the sched_lock if we actually need to modify the thread
  priority. During a buildworld only 2/3 of the calls to msleep actually
  changed the priority.
  (phk, 2006-11-30; 1 file, -4/+5)
* Change the sleepq_add(9) argument from 'struct mtx *' to
  'struct lock_object *', which allows it to be used with different kinds
  of locks. For example, it allows implementing Solaris condition
  variables, which will be used in the ZFS port on top of sx(9) locks.
  Reviewed by:  jhb
  (pjd, 2006-11-16; 1 file, -2/+3)
* Adjust assertions to allow for magical properties of the 'lbolt' wait
  channel for tsleep():
  - Allow tsleep() on &lbolt without Giant with a timeout of 0 since
    &lbolt has an implied timeout.
  - If &lbolt is used with msleep(), pass NULL to sleepq_add() for the
    lock object. Unlike other sleepq channels, &lbolt doesn't have an
    associated owning lock.
  (jhb, 2006-11-15; 1 file, -3/+3)
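  Before lbolt's later removal (see the 2008-08-20 entry above), the
  special-cased idiom looked roughly like this; a sketch, with an
  arbitrary priority:

      /* Sleep until roughly the next second tick; &lbolt carries an
       * implied timeout, so timo may be 0 and Giant need not be held. */
      tsleep(&lbolt, PPAUSE, "lbolt", 0);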
* Make KSE a kernel option, turned on by default in all GENERIC
  kernel configs except sun4v (which doesn't process signals properly
  with KSE).
  Reviewed by:  davidxu@
  (jb, 2006-10-26; 1 file, -0/+2)
* Use scheduler API sched_relinquish() to implement the yield() syscall.
  (davidxu, 2006-06-15; 1 file, -8/+2)
* In the case of reentering the debugger due to an attempt to perform a
  context switch while in the debugger, reenter the debugger sooner before
  performing any statistics updates.
  (jhb, 2006-06-03; 1 file, -10/+10)
* Change msleep() and tsleep() to not alter the calling thread's priority
  if the specified priority is zero. This avoids a race where the calling
  thread could read a snapshot of its current priority, then a different
  thread could change the first thread's priority, then the original
  thread would call sched_prio() inside msleep(), undoing the change made
  by the second thread. I used a priority of zero as no thread that calls
  msleep() or tsleep() should be specifying a priority of zero anyway.
  The various places that passed 'curthread->td_priority' or some variant
  as the priority now pass 0.
  (jhb, 2006-04-17; 1 file, -3/+5)
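  The two cases now look like this; a sketch with invented channel and
  lock names:

      /* Leave the thread's priority untouched while it sleeps. */
      msleep(&sc->sc_done, &sc->sc_mtx, 0, "keeppri", hz);

      /* Explicitly boost to PRIBIO for the duration of the sleep. */
      msleep(&sc->sc_done, &sc->sc_mtx, PRIBIO, "bioprio", hz);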
* Fix a sleep queue race for KSE thread.
  Reviewed by:  jhb
  (davidxu, 2006-02-23; 1 file, -18/+4)
* Fix up some comments. Mutexes have been locked, not entered, for several
  years now, and msleep blocks threads rather than processes.
  (jhb, 2006-02-22; 1 file, -5/+5)
* Fix a long-standing race between the sleep queue and thread suspension
  code. When a thread A is going to sleep, it calls
  sleepq_catch_signals() to detect any pending signals or thread
  suspension request; if nothing happens, it returns without holding the
  process lock or scheduler lock. This opens a race window which allows a
  thread B to come in and do process suspension work; however, since A is
  still in the running state, thread B can do nothing to A. Thread A
  continues and puts itself into the actually-sleeping state, but B has
  never seen it, and it sits there forever until B is woken up by other
  threads sometime later (this can be a very long delay or never happen).
  Fix this bug by forcing sleepq_catch_signals to return with the
  scheduler lock held.
  Fix sleepq_abort() by passing it an interrupted code; previously, it
  worked as wakeup_one(), and the interruption could not be identified
  correctly by the sleep queue code when the sleeping thread was resumed.
  Let thread_suspend_check() return EINTR or ERESTART, so the sleep queue
  no longer has to use SIGSTOP as a hack to build a return value.
  Reviewed by:  jhb
  MFC after:    1 week
  (davidxu, 2006-02-15; 1 file, -8/+2)
* CPU time accounting speedup (step 2)
  Keep accounting time (in per-cpu) cputicks and the statistics counts in
  the thread and summarize into struct proc when at context switch.
  Don't reach across CPUs in calcru().
  Add code to calibrate the top speed of cpu_tickrate() for variable
  cpu_tick hardware (like TSC on power managed machines).
  Don't enforce monotonicity (at least for now) in calcru. While the
  calibrated cpu_tickrate ramps up it may not be true.
  Use 27MHz counter on i386/Geode.
  Use TSC on amd64 & i386 if present.
  Use tick counter on sparc64.
  (phk, 2006-02-11; 1 file, -0/+6)
* Modify the way we account for CPU time spent (step 1)
  Keep track of time spent by the cpu in various contexts in units of
  "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only
  when somebody wants to inspect the numbers.
  For now "cputicks" are still derived from the current timecounter and
  therefore things should by definition remain sensible also on SMP
  machines. (The main reason for this first milestone commit is to verify
  that hypothesis.)
  On slower machines, the avoided multiplications to normalize timestamps
  at every context switch come out as a 5-7% better score on the
  unixbench/context1 microbenchmark. On more modern hardware no change in
  performance is seen.
  (phk, 2006-02-07; 1 file, -5/+4)
* patch(1) and I aren't friends today. Axe a duplicate copy of
  the msleep_spin() function definition.
  Spotted by:  pjd
  (jhb, 2005-12-29; 1 file, -82/+0)
* Add a new function msleep_spin() which is a slightly stripped down
  version of msleep(). msleep_spin() doesn't support changing the priority
  of the thread while it is asleep nor does it support interruptible
  sleeps (PCATCH) or the PDROP flag. It does support timeouts however.
  It differs from msleep() in that the passed in mutex is a spin mutex.
  This means one can use msleep_spin() and wakeup() with a spin mutex
  similar to msleep() and wakeup() with a regular mutex. Note that the
  spin mutex in question needs to come before sched_lock and the sleepq
  locks in lock order.
  (jhb, 2005-12-29; 1 file, -0/+164)
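  A usage sketch; the spin mutex (created with MTX_SPIN), the flag, and
  the names are hypothetical:

      /* Waiter: the spin mutex is dropped during the sleep and
       * re-acquired before msleep_spin() returns; no priority change and
       * no PCATCH are possible. */
      mtx_lock_spin(&ev_mtx);
      while (!ev_done)
              msleep_spin(&ev_done, &ev_mtx, "evwait", hz);
      mtx_unlock_spin(&ev_mtx);

      /* Waker, e.g. from a fast interrupt handler: */
      mtx_lock_spin(&ev_mtx);
      ev_done = 1;
      wakeup(&ev_done);
      mtx_unlock_spin(&ev_mtx);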
* When checking to see if a process has exceeded its time limit, flag the
  process as over the limit when its time is >= to the limit rather than >
  the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit
  and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the
  limit yet. However, having the fraction exactly equal to 0 is rather
  rare, and it is not worth the overhead to handle that edge case. With
  just the > comparison, the process would have to exceed its limit by
  almost a second before it was killed.
  PR:            kern/83192
  Submitted by:  Maciej Zawadzinski mzawadzinski at gmail dot com
  Reviewed by:   bde
  MFC after:     1 week
  (jhb, 2005-11-28; 1 file, -2/+2)
* Use low level constructs borrowed from interrupt threads to wait for
  work in proc0. Remove the TDP_WAKEPROC0 workaround.
  (ups, 2005-05-23; 1 file, -4/+3)