path: root/sys/kern/subr_sleepqueue.c
* MFC r313733: sleepq_catch_signals: do thread suspension before signal check  (badger, 2017-03-16, 1 file, -35/+43)

    Since locks are dropped when a thread suspends, it's possible for
    another thread to deliver a signal to the suspended thread. If the
    thread awakens from suspension without checking for signals, it may go
    to sleep despite having a pending signal that should wake it up.
    Therefore the suspension check is done first, so any signals sent while
    suspended will be caught in the subsequent signal check.

    Sponsored by:    Dell EMC
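An illustrative sketch of the new ordering (not the committed code; locking is elided and check_for_signal() is a hypothetical stand-in for the real cursig()-based logic):

static int check_for_signal(struct thread *td);	/* hypothetical helper */

static int
catch_signals_ordered(struct thread *td)
{
	int error;

	/* Step 1: suspension check; locks are dropped while parked, so
	 * another thread may deliver a signal during this window. */
	if (td->td_flags & TDF_NEEDSUSPCHK) {
		error = thread_suspend_check(1);
		if (error != 0)
			return (error);
	}
	/* Step 2: signal check; it now sees anything sent in step 1. */
	if (td->td_flags & TDF_NEEDSIGCHK)
		return (check_for_signal(td));
	return (0);
}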
* MFC r314626: Fix grammar in some comments in subr_sleepqueue.c  (vangyzen, 2017-03-10, 1 file, -7/+7)

    While I'm here, remove trailing whitespace.
* MFC r303426: Rewrite subr_sleepqueue.c's use of callouts to not depend on the specifics of the callout KPI.  (kib, 2016-08-27, 1 file, -62/+49)
* MFC r296320: Adjust the _callout_stop_safe() return value for subr_sleepqueue.c's needs when a migrating callout was blocked, but a running one was not.  (kib, 2016-03-15, 1 file, -1/+2)

    PR:    200992
* MFC 261517,261520: Convert the license on files where I am the sole copyright holder to 2-clause BSD licenses.  (jhb, 2014-02-18, 1 file, -3/+0)
* Partially revert r195702.  (jhb, 2013-03-18, 1 file, -4/+2)

    Deferring stops is now implemented via a set of calls to toggle
    TDF_SBDRY rather than passing PBDRY to individual sleep calls.
    - Remove the stop_allowed parameters from cursig() and issignal().
      issignal() checks TDF_SBDRY directly.
    - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
* Make kern_nanosleep() and pause_sbt() use per-CPU sleep queues.  (mav, 2013-03-12, 1 file, -7/+5)

    This removes significant sleep queue lock congestion on multithreaded
    microbenchmarks, making them scale to multiple CPUs almost linearly.
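For reference, a hedged usage example of pause_sbt(9); the wmesg, the 10ms interval, and the zero precision/flags are arbitrary choices:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/time.h>

/* Sleep for roughly 10ms. After this change, concurrent callers land
 * on per-CPU sleep queues instead of contending on one queue lock. */
static void
short_delay(void)
{
	pause_sbt("delay", 10 * SBT_1MS, 0, 0);
}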
* MFcalloutng: Convert sleepqueue(9) bits to the new callout KPI.  (davide, 2013-03-04, 1 file, -2/+4)

    Take advantage of the possibility to run callbacks directly from hw
    interrupt context.

    Sponsored by:    Google Summer of Code 2012, iXsystems inc.
    Tested by:    flo, marius, ian, markj, Fabian Keil
* Replace the TDP_NOSLEEPING flag with a counter so that the THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest.  (jhb, 2013-03-01, 1 file, -2/+2)

    Reviewed by:    attilio
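A hedged illustration of what the counter buys: the macro pairs can now nest, which a single on/off flag could not express.

	/* Outer no-sleep region. */
	THREAD_NO_SLEEPING();		/* counter: 0 -> 1 */

	/* Inner region, e.g. established by a called helper. With the
	 * old single TDP_NOSLEEPING flag, the inner pair would have
	 * silently re-allowed sleeping in the outer region. */
	THREAD_NO_SLEEPING();		/* counter: 1 -> 2 */
	THREAD_SLEEPING_OK();		/* counter: 2 -> 1 */

	THREAD_SLEEPING_OK();		/* counter: 1 -> 0, sleeping legal */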
* Rework the handling of stop signals in the NFS client.  (jhb, 2013-02-06, 1 file, -4/+2)

    The changes in 195702, 195703, and 195821 prevented a thread from
    suspending while holding locks inside of NFS by forcing the thread to
    fail sleeps with EINTR or ERESTART but defer the thread suspension to
    the user boundary. However, this had the effect that stopping a process
    during an NFS request could abort the request and trigger EINTR errors
    that were visible to userland processes (previously the thread would
    have suspended and completed the request once it was resumed).

    This change instead effectively masks stop signals while in the NFS
    client. It uses the existing TDF_SBDRY flag to effect this since
    SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on
    individual sleeps, the NFS client now sets the TDF_SBDRY flag around
    each NFS request and stop signals are masked for all sleeps during that
    region (the previous change missed sleeps in lockmgr locks).

    The end result is that stop signals sent to threads performing an NFS
    request are completely ignored until after the NFS request has finished
    processing and the thread prepares to return to userland. This restores
    the behavior of stop signals being transparent to userland processes
    while still preventing threads from suspending while holding NFS locks.

    Reviewed by:    kib
    MFC after:    1 month
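A hedged sketch of the masking pattern this describes; the request body is elided, and later FreeBSD wraps this toggle in dedicated helpers:

	struct thread *td = curthread;

	/* Enter the region: every sleep below ignores stop signals. */
	thread_lock(td);
	td->td_flags |= TDF_SBDRY;
	thread_unlock(td);

	/* ... perform the NFS request, including lockmgr sleeps ... */

	/* Leave the region: stops are processed at the user boundary. */
	thread_lock(td);
	td->td_flags &= ~TDF_SBDRY;
	thread_unlock(td);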
* Tweak the panic message for sleeping from threads with TDP_NOSLEEPING on.  (attilio, 2012-09-12, 1 file, -1/+2)

    The current message has no information on the thread and wchan
    involved, which may be useful in cases where dumps have mangled dwarf
    information.

    Reported by:    kib
    Reviewed by:    bde, jhb, kib
    MFC after:    1 week
* Implement the DTrace sched provider.  (rstone, 2012-05-15, 1 file, -0/+8)

    This implementation aims to be compatible with the sched provider
    implemented by Solaris and its open-source derivatives. Full
    documentation of the sched provider can be found on Oracle's DTrace
    wiki pages.

    Note that for compatibility with scripts originally written for
    Solaris, several probes are defined that will never fire. These probes
    are defined to fire when Solaris-specific features perform certain
    actions. As these features are not present in FreeBSD, the probes can
    never fire.

    Also, I have added two probes that are not defined in Solaris,
    lend-pri and load-change. These probes have been added to make it
    possible to collect schedgraph data with DTrace.

    Finally, a few probes are defined in Solaris to take a cpuinfo_t *
    argument. As it was not immediately clear to me how to translate that
    to FreeBSD, currently those probes are passed NULL in place of a
    cpuinfo_t *.

    Sponsored by:    Sandvine Incorporated
    MFC after:    2 weeks
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.  (ed, 2011-11-07, 1 file, -2/+2)

    The SYSCTL_NODE macro defines a list that stores all child-elements of
    that node. If there's no SYSCTL_DECL macro anywhere else, there's no
    reason why it shouldn't be static.
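A hedged example of the resulting idiom; the node and variable names here are invented:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

static int example_value;

/* The node is used only in this file, so the child list it defines
 * can be static; children still reach it via _debug_example. */
static SYSCTL_NODE(_debug, OID_AUTO, example, CTLFLAG_RW, 0,
    "example node private to this file");
SYSCTL_INT(_debug_example, OID_AUTO, value, CTLFLAG_RW,
    &example_value, 0, "a tunable under the static node");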
* Explicitly wire the user buffer rather than doing it implicitly in sbuf_new_for_sysctl(9).  (mdf, 2011-01-27, 1 file, -0/+3)

    This allows using an sbuf with a SYSCTL_OUT drain for extremely large
    amounts of data where the caller knows that appropriate references are
    held, and sleeping is not an issue.

    Inspired by:    rwatson
* Rework realtime priority support:  (jhb, 2011-01-14, 1 file, -1/+2)

    - Move the realtime priority range up above kernel sleep priorities
      and just below interrupt thread priorities.
    - Contract the interrupt and kernel sleep priority ranges a bit so
      that the timesharing priority band can be increased. The new
      timeshare range is now slightly larger than the old realtime +
      timeshare ranges.
    - Change the ULE scheduler to no longer use realtime priorities for
      interactive threads. Instead, the larger timeshare range is now
      split into separate subranges for interactive and non-interactive
      ("batch") threads. The end result is that interactive threads and
      non-interactive threads still use the same priority ranges as
      before, but realtime threads now have a separate, dedicated priority
      range.
    - Do not modify the priority of non-timeshare threads in sched_sleep()
      or via cv_broadcastpri(). Realtime and idle priority threads will no
      longer have their priorities affected by sleeping in the kernel.

    Reviewed by:    jeff
* Re-add r212370 now that the LOR in powerpc64 has been resolved:  (mdf, 2010-09-16, 1 file, -11/+3)

    Add a drain function for struct sysctl_req, and use it for a variety
    of handlers, some of which had to do awkward things to get a large
    enough SBUF_FIXEDLEN buffer.

    Note that some sysctl handlers were explicitly outputting a trailing
    NUL byte. This behaviour was preserved, though it should not be
    necessary.

    Reviewed by:    phk (original patch)
* Revert r212370, as it causes a LOR on powerpc.  (mdf, 2010-09-13, 1 file, -3/+11)

    powerpc does a few unexpected things in copyout(9) and so wiring the
    user buffer is not sufficient to perform a copyout(9) while holding a
    random mutex.

    Requested by:    nwhitehorn
* Add a drain function for struct sysctl_req, and use it for a variety of handlers.  (mdf, 2010-09-09, 1 file, -11/+3)

    Some of these handlers had to do awkward things to get a large enough
    FIXEDLEN buffer.

    Note that some sysctl handlers were explicitly outputting a trailing
    NUL byte. This behaviour was preserved, though it should not be
    necessary.

    Reviewed by:    phk
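A hedged sketch of a sysctl handler built on the drain; the row count and size hint are arbitrary, and the explicit sysctl_wire_old_buffer() call reflects the 2011-01-27 entry above:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/sbuf.h>
#include <sys/sysctl.h>

static int
sysctl_big_output(SYSCTL_HANDLER_ARGS)
{
	struct sbuf sb;
	int error, i;

	/* Wire the user buffer up front (see the 2011-01-27 entry). */
	error = sysctl_wire_old_buffer(req, 0);
	if (error != 0)
		return (error);
	/* 128 is only an initial hint; output is drained as it grows,
	 * so no fixed-size buffer has to be guessed in advance. */
	sbuf_new_for_sysctl(&sb, NULL, 128, req);
	for (i = 0; i < 10000; i++)
		sbuf_printf(&sb, "row %d\n", i);
	error = sbuf_finish(&sb);
	sbuf_delete(&sb);
	return (error);
}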
* Make sure the thread lock is locked.  (davidxu, 2010-08-20, 1 file, -0/+1)
* If a thread set TDP_WAKEUP for itself, clear the flag and return EINTR immediately.  (davidxu, 2010-08-20, 1 file, -0/+7)

    This is used to implement reliable pthread cancellation.
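A hedged sketch of the early-return check this adds near the top of sleepq_catch_signals() (a fragment; td is the current thread in the enclosing function):

	/* A thread that posted TDP_WAKEUP to itself bails out of the
	 * sleep immediately, as if interrupted by a signal. */
	if (td->td_pflags & TDP_WAKEUP) {
		td->td_pflags &= ~TDP_WAKEUP;
		return (EINTR);
	}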
* Update comment for the tdsignal() -> tdsendsignal() rename.  (jhb, 2010-06-30, 1 file, -1/+1)

    Forgot to include this in 209592.
* Introduce a new kernel thread called the "deadlock resolver".  (attilio, 2010-01-09, 1 file, -3/+25)

    While the name is pretentious, a good explanation of its targets is
    given in this 17-month-old presentation e-mail:
    http://lists.freebsd.org/pipermail/freebsd-arch/2008-August/008452.html

    In order to implement it, the sq_type in sleepqueues is mandatory and
    not only compiled along with the INVARIANTS option. Additionally, a
    new sleepqueue function, sleepq_type(), is added, returning the type
    of the sleepqueue linked to a wchan.

    Three new sysctls are added in order to configure the thread:
    debug.deadlkres.slptime_threshold
    debug.deadlkres.blktime_threshold
    debug.deadlkres.sleepfreq
    representing the thresholds for sleep and block time that will lead
    to a deadlock match (when exceeded), while sleepfreq represents the
    number of seconds between two consecutive runs of the thread.

    In order to enable the deadlock resolver thread, recompile your kernel
    with the option DEADLKRES.

    Reviewed by:    jeff
    Tested by:    pho, Giovanni Trematerra
    Sponsored by:    Nokia Incorporated, Sandvine Incorporated
    MFC after:    2 weeks
* In the current code, threads performing an interruptible sleep will leave the waiters flag on even if the waiter queue is empty.  (attilio, 2009-12-12, 1 file, -4/+31)

    This affects both sxlock, via the sx_{s,x}lock_sig() interface, and
    plain lockmgr, and forces the owner to do a wakeup even if the waiter
    queue is empty. That operation may lead to a deadlock in the case of
    doing a fake wakeup on the "preferred" (based on the wakeup algorithm)
    queue while the other queue has real waiters on it, because nobody is
    going to wake up the 2nd queue waiters and they will sleep
    indefinitely.

    A similar bug is present for lockmgr in the case the waiters are
    sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue
    is not empty, the waiters won't progress after being awakened but will
    just fail, still not taking care of the 2nd queue waiters (as the lock
    owner doing the wakeup would instead expect).

    In order to fix this bug in a cheap way (without adding too much
    locking and complicating the semantics too much), add a sleepqueue
    interface which reports the actual number of waiters on a specified
    queue of a waitchannel (sleepq_sleepcnt()) and use it in order to
    determine if the exclusive waiters (or shared waiters) are actually
    present on the lockmgr (or sx) before giving them precedence in the
    wakeup algorithm.

    This fix alone, however, doesn't solve the LK_SLEEPFAIL bug. In order
    to cope with it, add tracking of how many exclusive LK_SLEEPFAIL
    waiters a lockmgr has, and if all the waiters on the exclusive waiters
    queue are LK_SLEEPFAIL, just wake both queues.

    The sleepq_sleepcnt() introduction and ABI breakage require a
    __FreeBSD_version bump.

    Reported by:    avg, kib, pho
    Reviewed by:    kib
    Tested by:    pho
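A hedged sketch of how a wakeup path might consult the new interface; lk is a hypothetical lockmgr lock, and SQ_EXCLUSIVE_QUEUE/SQ_SHARED_QUEUE are lockmgr's sleepqueue indices:

	int queue;

	/* Prefer the exclusive queue only if it really has sleepers. */
	sleepq_lock(&lk->lock_object);
	if (sleepq_sleepcnt(&lk->lock_object, SQ_EXCLUSIVE_QUEUE) != 0)
		queue = SQ_EXCLUSIVE_QUEUE;
	else
		queue = SQ_SHARED_QUEUE;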
* Add a new msleep(9) flag, PBDY, that shall be specified together with PCATCH.  (kib, 2009-07-14, 1 file, -4/+8)

    It indicates that the thread shall not be stopped upon receipt of
    SIGSTOP until it reaches the kernel->usermode boundary. Also change
    thread_single(SINGLE_NO_EXIT) to only stop threads at the user
    boundary unconditionally.

    Tested by:    pho
    Reviewed by:    jhb
    Approved by:    re (kensmith)
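A hedged usage example; sc is a hypothetical softc, and note that PBDY was later removed in favor of TDF_SBDRY toggling (see the 2013-03-18 entry above):

	/* Interruptible sleep whose SIGSTOP handling is deferred to the
	 * user boundary. */
	error = msleep(&sc->state, &sc->mtx, PSOCK | PCATCH | PBDY,
	    "scwait", 0);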
* Revision 184199 had not been fully reverted; add the missing piece.  (davidxu, 2008-12-01, 1 file, -0/+4)

    Reported by:    phk
* Revert rev 184216 and 184199.  (davidxu, 2008-11-05, 1 file, -8/+8)

    Due to the way thread_lock works, they may cause a lockup.

    Noticed by:    peter, jhb
* Don't bother calling setrunnable() and clearing the sleeping flag in sleepq_resume_thread() if the thread isn't asleep.  (jhb, 2008-11-04, 1 file, -9/+12)
* Partly revert revision 184199, because TDF_NEEDSIGCHK is persistent when a thread is in kernel mode.  (davidxu, 2008-10-24, 1 file, -10/+5)

    It can cause an infinite loop; now unlock the process lock after
    acquiring the sleep queue lock and thread lock to avoid the problem.
    This means TDF_NEEDSIGCHK and TDF_NEEDSUSPCHK must be set with the
    process lock and thread lock being held at the same time.
* Actually, for signal and thread suspension, an extra process spin lock is unnecessary; the normal process lock and thread lock are enough.  (davidxu, 2008-10-23, 1 file, -12/+13)

    The spin lock is still needed for process and thread exit to mimic the
    single sched_lock.
* Make ddb command registration dynamic so modules can extend the command set (only so long as the module is present).  (sam, 2008-09-15, 1 file, -1/+1)

    o add db_command_register and db_command_unregister to add and remove
      commands, respectively
    o replace linker sets with SYSINIT's (and SYSUINIT's) that register
      commands
    o expose 3 list heads: db_cmd_table, db_show_table, and
      db_show_all_table for registering top-level commands, show operands,
      and show all operands, respectively

    While here also:
    o sort command lists
    o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
      for existing commands
    o add "show all trace" as an alias for "show alltrace"
    o add "show all locks" as an alias for "show alllocks"

    Submitted by:    Guillaume Ballet <gballet@gmail.com> (original version)
    Reviewed by:    jhb
    MFC after:    1 month
* Close a race in sleepq_broadcast() where the sleepq could be reused after it had been assigned to the last sleeping thread.  (jhb, 2008-09-08, 1 file, -3/+2)

    That thread might have started running on another CPU and have reused
    that sleep queue. Fix it by just walking the thread queue using
    TAILQ_FOREACH_SAFE() rather than a while loop.

    PR:    amd64/124200
    Discovered by:    tegge
    Tested by:    benjsc
    MFC after:    1 week
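A hedged sketch of the fixed traversal, close to but not verbatim the committed code (sq, queue, and pri come from the enclosing sleepq_broadcast()):

	struct thread *td, *tdn;
	int wakeup_swapper = 0;

	/* Safe variant: the sleepq may be handed to (and reused by) the
	 * last thread we resume, so fetch the next pointer before each
	 * wakeup rather than re-reading the list head. */
	TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
		thread_lock(td);
		wakeup_swapper |= sleepq_resume_thread(sq, td, pri);
		thread_unlock(td);
	}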
* If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in.  (jhb, 2008-08-05, 1 file, -18/+49)

    Historically, this has been done by waking up proc0 directly from
    setrunnable() itself via a wakeup(). When waking up a sleeping thread
    that was swapped out (the usual case when waking proc0 since only
    sleeping threads are eligible to be swapped out), this resulted in a
    bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

    With sleep queues having separate locks in 6.x and later, this caused
    a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq
    lock). An attempt was made to fix this in 7.0 by making the proc0
    wakeup use the ithread mechanism for doing the wakeup. However, this
    required grabbing proc0's thread lock to perform the wakeup. If proc0
    was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then
    this degenerated into the same LOR since the thread lock would be some
    other sleepq lock.

    Fix this by deferring the wakeup of the swapper until after the sleepq
    lock held by the upper layer has been released. The setrunnable()
    routine now returns a boolean value to indicate whether or not proc0
    needs to be woken up. The end result is that consumers of the sleepq
    API such as *sleep/wakeup, condition variables, sx locks, and lockmgr
    have to wake up proc0 if they get a non-zero return value from
    sleepq_abort(), sleepq_broadcast(), or sleepq_signal().

    Discussed with:    jeff
    Glanced at by:    sam
    Tested by:    Jurgen Weber  jurgen - ish com au
    MFC after:    2 weeks
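A hedged sketch of the consumer-side contract, modeled on what wakeup(9) does:

	int wakeup_swapper;

	sleepq_lock(wchan);
	wakeup_swapper = sleepq_broadcast(wchan, SLEEPQ_SLEEP, 0, 0);
	sleepq_release(wchan);
	/* Only now, with no sleepq lock held, is it safe to wake proc0. */
	if (wakeup_swapper)
		kick_proc0();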
* Really fix this.  (jhb, 2008-07-28, 1 file, -2/+1)
* Properly check if td_name is empty and if it is, print the process name instead of an empty thread name.  (pjd, 2008-07-28, 1 file, -2/+2)

    Reviewed by:    jhb
* - Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes.  (jeff, 2008-04-17, 1 file, -4/+2)

    - In reset, walk the children of kern_sched_stats and reset the
      counters via the oid_arg1 pointer. This allows us to add arbitrary
      counters to the tree and still reset them properly.
    - Define a set of switch types to be passed with flags to mi_switch().
      These types are named SWT_*. These types correspond to SCHED_STATS
      counters and are automatically handled in this way.
    - Make the new SWT_ types more specific than the older switch stats.
      There are now stats for idle switches, remote idle wakeups, remote
      preemption, ithreads idling, etc.
    - Add switch statistics for ULE's pickcpu algorithm. These stats
      include how much migration there is, how often affinity was
      successful, how often threads were migrated to the local cpu on
      wakeup, etc.

    Sponsored by:    Nokia
* - Convert two timeout users to the new callout_reset_curcpu() api.  (jeff, 2008-04-02, 1 file, -1/+1)

    Sponsored by:    Nokia
* - Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs to enter thread_suspend_check().  (jeff, 2008-03-21, 1 file, -1/+1)

    - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
      thread_suspend_check() to ast() rather than userret().
    - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
      that we don't miss a suspend request. If this is set, use the
      expensive signal path.
    - Set NEEDSUSPCHK when creating a new thread in thr in case the
      creating thread is due to be suspended as well but has not yet.

    Reviewed by:    davidxu (Authored original patch)
* - At the top of sleepq_catch_signals(), lock the thread and check TDF_NEEDSIGCHK before doing the very expensive cursig() and related locking.  (jeff, 2008-03-19, 1 file, -4/+12)

    NEEDSIGCHK is updated whenever our signal mask changes or when a
    signal is delivered and should be sufficient to avoid the more
    expensive tests. This eliminates another source of PROC_LOCK
    contention in multithreaded programs.
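A hedged sketch of the fast path, including the TDF_NEEDSUSPCHK bit folded in by the 2008-03-21 entry above (a fragment; td, wchan, and pri come from the enclosing function):

	thread_lock(td);
	if ((td->td_flags & (TDF_NEEDSIGCHK | TDF_NEEDSUSPCHK)) == 0) {
		/* Nothing pending: skip PROC_LOCK and cursig() entirely
		 * and just go to sleep. */
		sleepq_switch(wchan, pri);
		return (0);
	}
	thread_unlock(td);
	/* Slow path: take the proc lock and run the full cursig() check. */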
* - Add a facility similar to LOCK_PROFILING under SLEEPQUEUE_PROFILING.  (jeff, 2008-03-19, 1 file, -1/+159)

    Keep a simple (wmesg, count) tuple in a hash to keep track of how many
    times we sleep at each wait message. We hash on message and not
    channel. No line number information is given as typically wait
    messages are not used in more than one place. Identical strings
    defined at different addresses will show up with separate counters.
    - Use debug.sleepq.enable to enable, .reset to reset, and .stats to
      dump stats.
    - Do an unsynchronized check in sleepq_switch() prior to switching
      before calling sleepq_profile(), which uses a global lock to
      synchronize the hash. Only sleeps which actually cause a context
      switch are counted.
* PR 117603  (jeff, 2008-03-13, 1 file, -2/+5)

    - Close a sleepqueue signal race by interlocking with the per-process
      spinlock. This was mistakenly omitted from the thread_lock patch and
      has been a race since.

    MFC after:    1 week
    PR:    bin/117603
    Reported by:    Danny Braniss <danny@cs.huji.ac.il>
* Remove kernel support for M:N threading.  (jeff, 2008-03-12, 1 file, -10/+2)

    While the KSE project was quite successful in bringing threading to
    FreeBSD, the M:N approach taken by the kse library was never developed
    to its full potential. Backwards compatibility will be provided via
    libmap.conf for dynamically linked binaries and static binaries will
    be broken.
* - Pass the priority argument from *sleep() into sleepq and down into sched_sleep().  (jeff, 2008-03-12, 1 file, -27/+23)

    This removes an extra thread_lock() acquisition and allows the
    scheduler to decide what to do with the static boost.
    - Change the priority arguments to cv_* to match sleepq/msleep/etc.
      where 0 means no priority change. Catch -1 in cv_broadcastpri() and
      convert it to 0 for now.
    - Set a flag when sleeping in a way that is compatible with swapping
      since direct priority comparisons are meaningless now.
    - Add a sysctl to ule, kern.sched.static_boost, that defaults to on,
      which controls the boost behavior. Turning it off gives better
      performance in some workloads but needs more investigation.
    - While we're modifying sleepq, change signal and broadcast to both
      return with the lock held, as the lock was held on enter.

    Reviewed by:    jhb, peter
* Mark sleepqueue chain spin mutexes as recursable, since the sleepq code now recurses on them in sleepq_broadcast() and sleepq_signal() when resuming threads that are fully asleep.  (jhb, 2008-02-13, 1 file, -1/+1)

    MFC after:    1 week
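For context, a hedged sketch of the matching per-chain initialization in init_sleepqueues():

	/* MTX_RECURSE permits sleepq_broadcast()/sleepq_signal() to
	 * relock a chain they already hold while resuming fully-asleep
	 * threads. */
	mtx_init(&sc->sc_lock, "sleepq chain", NULL, MTX_SPIN | MTX_RECURSE);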
* - Add THREAD_LOCKPTR_ASSERT() to assert that the thread's lock points at the provided lock or &blocked_lock.  (jeff, 2008-02-07, 1 file, -1/+1)

    The thread may be temporarily assigned to the blocked_lock by the
    scheduler, so a direct comparison can not always be made.
    - Use THREAD_LOCKPTR_ASSERT() in the primary consumers of the
      scheduling interfaces. The schedulers themselves still use more
      explicit asserts.

    Sponsored by:    Nokia
* Fix a bug where a thread that hit the race where the sleep timeout fires while the thread does not hold the thread lock would stop blocking for subsequent interruptible sleeps and would always immediately fail the sleep with EWOULDBLOCK instead (even sleeps that didn't have a timeout).  (jhb, 2008-01-25, 1 file, -2/+1)

    Some background:
    - KSE has a facility for allowing one thread to interrupt another
      thread. During this process, the target thread aborts any
      interruptible sleeps much as if the target thread had a pending
      signal. Once the target thread acknowledges the interrupt, normal
      sleep handling resumes. KSE manages this via the TDF_INTERRUPTED
      flag. Specifically, it sets the flag when it sends an interrupt to
      another thread and clears it when the interrupt is acknowledged.
      (Note that this is purely a software interrupt sort of thing and
      has no relation to hardware interrupts or kernel interrupt
      threads.)
    - The old code for handling the sleep timeout race handled the race
      by setting the TDF_INTERRUPT flag and faking a KSE-style thread
      interrupt to the thread in the process of going to sleep. It
      probably should have just checked the TDF_TIMEOUT flag in
      sleepq_catch_signals() instead.
    - The bug was that the sleepq code would set TDF_INTERRUPT but it was
      never cleared. The sleepq code couldn't safely clear it in case
      there actually was a real KSE thread interrupt pending for the
      target thread (in fact, the sleepq timeout actually stomped on said
      pending interrupt). Thus, any future interruptible sleeps
      (*sleep(.. PCATCH ..) or cv_*wait_sig()) would see the
      TDF_INTERRUPT flag set and immediately fail with EWOULDBLOCK. The
      flag could be cleared if the thread belonged to a KSE process and
      another thread posted an interrupt to the original thread. However,
      in the more common case of a non-KSE process, the thread would
      pretty much stop sleeping.
    - Fix the bug by just setting TDF_TIMEOUT in the sleepq timeout code
      and not messing with TDF_INTERRUPT and td_intrval. With yesterday's
      fix to fix sleepq_switch() to check TDF_TIMEOUT, this is now
      sufficient.

    MFC after:    3 days
* Fix a race in the sleepqueue timeout code that resulted in sleeps not being properly cancelled by a timeout.  (jhb, 2008-01-25, 1 file, -4/+25)

    In general there is a race between the sleepq timeout handler firing
    while the thread is still in the process of going to sleep. In 6.x
    with sched_lock, the race was largely protected by sched_lock. The
    only place it was "exposed" and had to be handled was while checking
    for any pending signals in sleepq_catch_signals().

    With the thread lock changes, the thread lock is dropped in between
    sleepq_add() and sleepq_*wait*(), opening up a new window for this
    race. Thus, if the timeout fired while the sleeping thread was in
    between sleepq_add() and sleepq_*wait*(), the thread would be marked
    as timed out, but the thread would not be dequeued and sleepq_switch()
    would still block the thread until it was awakened via some other
    means. In the case of pause(9) where there is no other wakeup, the
    thread would never be awakened.

    Fix this by teaching sleepq_switch() to check if the thread has had
    its sleep canceled before blocking, by checking the TDF_TIMEOUT flag
    and aborting the sleep and dequeueing the thread if it is set.

    MFC after:    3 days
    Reported by:    dwhite, peter
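A hedged sketch of the check taught to sleepq_switch(); locking details are elided, and sq, sc, td, and wchan come from the enclosing function:

	/* The timeout already fired while we were between sleepq_add()
	 * and blocking: dequeue ourselves and abort the sleep instead of
	 * blocking with no wakeup left. */
	if (td->td_flags & TDF_TIMEOUT) {
		MPASS(TD_ON_SLEEPQ(td));
		sq = sleepq_lookup(wchan);
		sleepq_resume_thread(sq, td, 0);
		mtx_unlock_spin(&sc->sc_lock);
		return;
	}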
* A bunch more files that should probably print out a thread name instead of a process name.  (julian, 2007-11-14, 1 file, -1/+1)
* Generally we are interested in what thread did something, as opposed to what process.  (julian, 2007-11-14, 1 file, -5/+5)

    Since threads by default have the name of the process unless
    overwritten with more useful information, just print the thread name
    instead.
* subr_sleepqueue.c is missing thread locking, which leads to dangerous races for some struct thread members.  (attilio, 2007-09-13, 1 file, -0/+2)

    More specifically, this bug seems responsible for some memory dumping
    problems people were experiencing. Fix this by adding correct thread
    locking.

    Tested by:    rwatson
    Submitted by:    tegge
    Approved by:    jeff
    Approved by:    re
* - Include opt_sched.h for SCHED_STATS.  (jeff, 2007-06-12, 1 file, -0/+1)