summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_synch.c
Commit message (Collapse)AuthorAgeFilesLines
* Fix some minor inaccuracies introduced in r243251.bjk2013-01-051-3/+3
| | | | | | | | Also correct the comment in kern_synch.c which was the source of the problematic text. Reviewed by: kib (previous version) Approved by: hrs (mentor)
* Implement the DTrace sched provider. This implementation aims to berstone2012-05-151-0/+17
| | | | | | | | | | | | | | | | | | | | | | compatible with the sched provider implemented by Solaris and its open- source derivatives. Full documentation of the sched provider can be found on Oracle's DTrace wiki pages. Note that for compatibility with scripts originally written for Solaris, serveral probes are defined that will never fire. These probes are defined to fire when Solaris-specific features perform certain actions. As these features are not present in FreeBSD, the probes can never fire. Also, I have added a two probes that are not defined in Solaris, lend-pri and load-change. These probes have been added to make it possible to collect schedgraph data with DTrace. Finally, a few probes are defined in Solaris to take a cpuinfo_t * argument. As it was not immediately clear to me how to translate that to FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *. Sponsored by: Sandvine Incorporated MFC after: 2 weeks
* Include the associated wait channel message for context switch ktracejhb2012-04-201-4/+4
| | | | | | | records. kdump supports both the old and new messages. Submitted by: Andrey Zonov andrey zonov org MFC after: 1 week
* panic: add a switch and infrastructure for stopping other CPUs in SMP caseavg2011-12-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historical behavior of letting other CPUs merily go on is a default for time being. The new behavior can be switched on via kern.stop_scheduler_on_panic tunable and sysctl. Stopping of the CPUs has (at least) the following benefits: - more of the system state at panic time is preserved intact - threads and interrupts do not interfere with dumping of the system state Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is set. That thread might call code that is also used in normal context and that code might use locks to prevent concurrent execution of certain parts. Those locks might be held by the stopped threads and would never be released. To work around this issue, it was decided that instead of explicit checks for panic context, we would rather put those checks inside the locking primitives. This change has substantial portions written and re-written by attilio and kib at various times. Other changes are heavily based on the ideas and patches submitted by jhb and mdf. bde has provided many insights into the details and history of the current code. The new behavior may cause problems for systems that use a USB keyboard for interfacing with system console. This is because of some unusual locking patterns in the ukbd code which have to be used because on one hand ukbd is below syscons, but on the other hand it has to interface with other usb code that uses regular mutexes/Giant for its concurrency protection. Dumping to USB-connected disks may also be affected. PR: amd64/139614 (at least) In cooperation with: attilio, jhb, kib, mdf Discussed with: arch@, bde Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn, Steven Hartland <killing@multiplay.co.uk>, glebius, Andrew Boyer <aboyer@averesystems.com> (various versions of the patch) MFC after: 3 months (or never)
* Make sure the description of pause() ishselasky2011-12-031-1/+2
| | | | | | | | equivalent to its implementation. No code change. Suggested by: Bruce Evans MFC after: 3 days
* Given that the typical usage of pause() is pause("zzz", hz / N), where N canhselasky2011-11-201-1/+1
| | | | | | | be greater than hz in some cases, simply ignore a timeout value of zero. Suggested by: Bruce Evans MFC after: 1 week
* Minor style change:hselasky2011-11-201-10/+9
| | | | | | | | Simplify the description of pause() and shorten the KASSERT message in pause. Also add a clamp for the timo argument in the non-KASSERT case. Suggested by: Bruce Evans MFC after: 1 week
* Simplify the usb_pause_mtx() function by factoring out the generic partshselasky2011-11-191-5/+23
| | | | | | | | | to the kernel's pause() function. The pause() function can now be used when cold != 0. Also assert that the timeout in system ticks must be positive. Suggested by: Bruce Evans MFC after: 1 week
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-1/+1
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Simplify a stale assertion. We have not called mi_switch() from a nestedjhb2011-05-241-3/+1
| | | | | | critical section during a preemption for several years. MFC after: 1 week
* Use a name instead of a magic number for kern_yield(9) when the prioritymdf2011-05-131-1/+3
| | | | | | | | should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
* Based on discussions on the svn-src mailing list, rework r218195:mdf2011-02-081-2/+33
| | | | | | | | | | | | | | | | | | | | | | - entirely eliminate some calls to uio_yeild() as being unnecessary, such as in a sysctl handler. - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h - add a slightly more generic kern_yield() that can replace the functionality of uio_yield(). - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching. - fix a logic inversion bug in vlrureclaim(), pointed out by bde@. - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.
* Only change the priority of timeshare threads to PRI_MAX_TIMESHAREjhb2011-01-061-1/+2
| | | | | | | when yield() is called. Specifically, leave the priority of real time and idle threads unchanged. MFC after: 2 weeks
* Add new msleep(9) flag PBDY that shall be specified together withkib2009-07-141-0/+2
| | | | | | | | | | | | PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
* When wakeup(9) is going to notify swapper, assert that wait channel is notkib2009-07-141-1/+4
| | | | | | | | equal to &proc0. It shall be not, since proc0 stack is not swappable, and kick_proc0() is wakeup(&proc0). Reviewed by: jhb Approved by: re (kensmith)
* Remove even more unneeded variable assignments.ed2009-02-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build
* - Implement generic macros for producing KTR records that are compatiblejeff2009-01-171-13/+14
| | | | | | | | | | | | with src/tools/sched/schedgraph.py. This allows developers to quickly create a graphical view of ktr data for any resource in the system. - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly identifying records associated with a thread or cpu. - Reimplement the KTR_SCHED traces using the new generic facility. Obtained from: attilio Discussed with: jhb Sponsored by: Nokia
* - Forward port flush of page table updates on context switch or userretkmacy2008-10-191-0/+9
| | | | - Forward port vfork XEN hack
* - Don't do a WITNESS_SAVE() on the interlock if it is Giant in the conditionjhb2008-09-251-0/+2
| | | | | | variable wait routines. DROP_GIANT() already manages that state in the Giant interlock case. - Assert that Giant is held when it is passed as a sleep interlock.
* Remove the now unused `lbolt' variable from the kernel.ed2008-08-201-15/+3
| | | | | | | | | | We used to have a single wait channel inside the kernel which could be used by threads that just wanted to sleep for some time (the next second). The old TTY layer was the only piece of code that still used lbolt, because I already removed the use of lbolt from the NFS clients and the VFS syncer. Approved by: philip
* Permit Giant to be passed as the explicit interlock either tojhb2008-08-071-2/+6
| | | | | | | | | | | | | | | msleep/mtx_sleep or the various cv_*wait*() routines. Currently, the "unlock" behavior of PDROP and cv_wait_unlock() with Giant is not permitted as it is will be confusing since Giant is fully unrecursed and unlocked during a thread sleep. This is handy for subsystems which wish to allow unlocked drivers to continue to use Giant such as CAM, the new TTY layer, and the new USB stack. CAM currently uses a hack that I told Scott to use because I really didn't want to permit this behavior, and the TTY and USB patches both have various patches to permit this. MFC after: 2 weeks
* If a thread that is swapped out is made runnable, then the setrunnable()jhb2008-08-051-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
* - Make SCHED_STATS more generic by adding a wrapper to create thejeff2008-04-171-1/+5
| | | | | | | | | | | | | | | | | | variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia
* Consistently use ANSI C declarationsfor all functions in kern_synch.c.rwatson2008-03-161-19/+7
|
* In keeping with style(9)'s recommendations on macros, use a ';'rwatson2008-03-161-1/+2
| | | | | | | | | after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
* Remove kernel support for M:N threading.jeff2008-03-121-12/+2
| | | | | | | | While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
* - Pass the priority argument from *sleep() into sleepq and down intojeff2008-03-121-19/+10
| | | | | | | | | | | | | | | | | sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_* to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter
* - Handle kdb switch panics outside of mi_switch() to remove some instructionsjeff2008-03-101-6/+11
| | | | | | | from the common path and make the code more clear. Whether this has any impact on performance may depend on optimization levels. Sponsored by: Nokia
* Don't zero td_runtime when billing thread CPU usage to the process;rwatson2008-01-101-2/+4
| | | | | | | | | | | | | | | | | | | | | maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime. When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc. This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well for libthr user threads, valuable debugging information lost with the move to try kthreads since they are no longer independent processes. There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the mean time. Likewise, there are resevations about the continued validity of statclock given the speed of modern processors. Reviewed by: attilio, emaste, jhb, julian
* A bunch more files that should probably print out a thread namejulian2007-11-141-4/+4
| | | | instead of a process name.
* generally we are interested in what thread did something asjulian2007-11-141-5/+5
| | | | | | opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.
* - Restore historical yield() behavior by manually lowering priority andjeff2007-10-081-3/+6
| | | | | | switching. Approved by: re
* - Move all of the PS_ flags into either p_flag or td_flags.jeff2007-09-171-11/+5
| | | | | | | | | | | | | | - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
* Commit 2/14 of sched_lock decomposition.jeff2007-06-041-7/+7
| | | | | | | | | | | | | | | | | - Adapt sleepqueues to the new thread_lock() mechanism. - Delay assigning the sleep queue spinlock as the thread lock until after we've checked for signals. It is illegal for a thread to return in mi_switch() with any lock assigned to td_lock other than the scheduler locks. - Change sleepq_catch_signals() to do the switch if necessary to simplify the callers. - Simplify timeout handling now that locking a sleeping thread has the side-effect of locking the sleepqueue. Some previous races are no longer possible. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* Do proper "locking" for missing vmmeters part.attilio2007-06-041-1/+1
| | | | | | | | Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)
* - Move rusage from being per-process in struct pstats to per-thread injeff2007-06-011-27/+5
| | | | | | | | | | | | | | | | | | | td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Revert VMCNT_* operations introduction.attilio2007-05-311-1/+1
| | | | | | | | Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
* - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulatingjeff2007-05-181-1/+1
| | | | | | | | vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
* Fix a potential LOR with sx_sleep() and cv_wait() with sx locks byjhb2007-05-081-1/+7
| | | | | | 1) adding the thread to the sleepq via sleepq_add() before dropping the lock, and 2) dropping the sleepq lock around calls to lc_unlock() for sleepable locks (i.e. locks that use sleepq's in their implementation).
* Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,jhb2007-03-211-3/+3
| | | | rwlocks, and sx locks to 'lock_object'.
* Allow threads to atomically release rw and sx locks while waiting for anjhb2007-03-091-28/+33
| | | | | | | | | | | | | | | | | | | event. Locking primitives that support this (mtx, rw, and sx) now each include their own foo_sleep() routine. - Rename msleep() to _sleep() and change it's 'struct mtx' object to a 'struct lock_object' pointer. _sleep() uses the recently added lc_unlock() and lc_lock() function pointers for the lock class of the specified lock to release the lock while the thread is suspended. - Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks (rw_sleep()), and sx locks (sx_sleep()). msleep() still exists and is now identical to mtx_sleep(), but it is deprecated. - Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP. - Rewrite much of sleep.9 to not be msleep(9) centric. - Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS' section. - Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will warn if you try to pass a NULL wait channel. The functions already have a KASSERT to that effect.
* Instead of doing comparisons using the pcpu area to see ifjulian2007-03-081-1/+1
| | | | | | | a thread is an idle thread, just see if it has the IDLETD flag set. That flag will probably move to the pflags word as it's permenent and never chenges for the life of the system so it doesn't need locking.
* Further system call comment cleanup:rwatson2007-03-051-1/+1
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Print tid's rather than thread pointers in KTR_PROC traces.jhb2007-02-271-8/+8
|
* Add a new kernel sleep function pause(9). pause(9) is for places thatjhb2007-02-231-1/+21
| | | | | | | | want an equivalent of DELAY(9) that sleeps instead of spins. It accepts a wmesg and a timeout and is not interrupted by signals. It uses a private wait channel that should never be woken up by wakeup(9) or wakeup_one(9). Glanced at by: phk
* - Fix schedgraph output with KSE threads. Call thread_switchout() afterjeff2007-01-031-4/+8
| | | | calling CTR() so we don't confuse a new kse thread with a real preemption.
* Add second sleep queue so that sx and lockmgr can have separate sleepkmacy2006-12-161-4/+4
| | | | | | | queues for shared and exclusive acquisitions Submitted by: Attilio Rao Approved by: jhb
* Only grab the sched_lock if we actually need to modify the thread priority.phk2006-11-301-4/+5
| | | | | During a buildworld only 2/3 of the calls to msleep actually changed the priority.
* Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *',pjd2006-11-161-2/+3
| | | | | | | | which allows to use it with different kinds of locks. For example it allows to implement Solaris conditions variables which will be used in ZFS port on top of sx(9) locks. Reviewed by: jhb
* Adjust assertions to allow for magical properties of the 'lbolt' waitjhb2006-11-151-3/+3
| | | | | | | | | channel for tsleep(): - Allow tsleep() on &lbolt without Giant with a timeout 0 since &lbolt has an implied timeout. - If &lbolt is used with msleep() pass NULL to sleepq_add() for the lock object. Unlike other sleepq channels, &lbolt doesn't have an associated owning lock.
OpenPOWER on IntegriCloud