path: root/sys/kern/kern_synch.c
Commit message (author, date, files, lines -removed/+added)
* MFC r308228: (kib, 2016-11-09, 1 file, -7/+1)
  Remove remnants of the recursive sleep support.
* MFC r300489: (hselasky, 2016-06-03, 1 file, -1/+1)
  Use DELAY() instead of _sleep() when SCHEDULER_STOPPED() is set inside pause_sbt(). This allows pause() to continue working during a panic() which is not invoking KDB. This is useful when debugging graphics drivers using the LinuxKPI.
* MFC r283735: (kib, 2015-06-05, 1 file, -4/+2)
  Remove several write-only variables.
* MFC r263710, r273377, r273378, r273423 and r273455: (hselasky, 2014-10-27, 1 file, -2/+1)
  - De-vnet hash sizes and hash masks.
  - Fix multiple issues related to arguments passed to SYSCTL macros.
  Sponsored by: Mellanox Technologies
* vt(4): Merge several bug fixes and improvements (dumbbell, 2014-09-18, 1 file, -1/+1)
  SVN revisions in this MFC: 269779 270705 270706 271180 271250 271253 271682 271684
  Detailed commit list:
  r269779: fbd: Fix a bug where vt_fb_attach() success would be considered a failure
    vt_fb_attach() currently always returns 0, but it could return a code defined in errno.h. However, it doesn't return a CN_* code. So checking its return value against CN_DEAD (which is 0) is incorrect, and in this case, a success becomes a failure. The consequence was unimportant, because the caller (drm_fb_helper.c) would only log an error message in this case. The console would still work.
    Approved by: nwhitehorn
  r270705: vt(4): Add cngrab() and cnungrab() callbacks
    They are used when a panic occurs or when entering a DDB session, for instance. cngrab() forces a vt-switch to the console window, no matter if the original window is another terminal or an X session. However, cnungrab() doesn't currently vt-switch back to the original window.
  r270706: drm: Don't "taskqueue" vt-switch if under DDB/panic situation
    If DDB is active, we can't use a taskqueue thread to switch away from the X window, because this thread can't run.
    Reviewed by: ray@
    Approved by: ray@
  r271180: vt_vga: vd_setpixel_t and vd_drawrect_t are noop in text mode
  r271250: vt(4): Change the terminal and buffer sizes, even without a font
    This fixes a bug where scroll lock would not work for tty #0 when using vt_vga's textmode. The reason was that this window is created with a static 256x100 buffer, larger than the real size of 80x25. Now, in vt_change_font() and vt_compute_drawable_area(), we still perform operations even if the window has no font loaded (this is the case in textmode, where vw->vw_font == NULL). One of these operations resizes the buffer accordingly. In vt_compute_drawable_area(), we take the terminal size as is (i.e. 80x25) for the drawable area. The font argument to vt_set_border() is removed (it was never used) and the code now uses the computed drawable area instead of redoing its own calculation.
    Reported by: Harald Schmalzbauer <h.schmalzbauer_omnilan.de>
    Tested by: Harald Schmalzbauer <h.schmalzbauer_omnilan.de>
  r271253: pause_sbt(): Take the cold path (i.e. use DELAY()) if KDB is active
    This fixes a panic in the i915 driver when one uses debug.kdb.enter=1 under vt(4).
    PR: 193269
    Reported by: emaste@
    Submitted by: avg@
  r271682: vt(4): Fix a LOR which occurs during a call to vt_upgrade()
    Reported by: kib@
    Review: https://reviews.freebsd.org/D785
    Reviewed by: ray@
    Approved by: ray@
  r271684: vt(4): Use vt_fb_drawrect() and vt_fb_setpixel() in all vt_fb-derivative
    Review: https://reviews.freebsd.org/D789
    Reviewed by: nwhitehorn
    Approved by: nwhitehorn
  Approved by: re (gjb)
* MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE (avg, 2014-01-17, 1 file, -6/+6)
* MFC r258648: use saner calculations in should_yield (avg, 2014-01-16, 1 file, -1/+1)
* Make load average sampling asynchronous to hardclock ticks. (mav, 2013-09-24, 1 file, -2/+2)
  This improves measurement of load caused by time-related events still using hardclock. For example, without this change dummynet, scheduling events each hardclock tick, was always miscounted as load of 1. There is still aliasing with events delayed by the new precision mechanism, but it probably can't be avoided without moving this sampling from using callout to some lower-level code or handling it in some other special way.
  Reviewed by: davide
  Approved by: re (marius)
* Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. (davide, 2013-09-20, 1 file, -1/+2)
  With the current lock classes KPI it was really difficult because there was no way to pass an rmtracker object to the lock/unlock routines. In order to accomplish the task, modify the aforementioned functions so that they can return (or pass as argument) a uintptr_t, which in the rm case is used to hold a pointer to struct rm_priotracker for the current thread. As an added bonus, this fixes rm_sleep() in the rm shared case, which can now communicate the priotracker structure between lc_unlock()/lc_lock().
  Suggested by: jhb
  Reviewed by: jhb
  Approved by: re (delphij)
* Simplify pause_sbt() logic. (hselasky, 2013-08-30, 1 file, -7/+7)
  Don't call DELAY() if the remainder is less than or equal to zero.
* Don't call sleepinit() from proc0_init(), make it a SYSINIT instead. (cognet, 2013-08-09, 1 file, -2/+8)
  vmem needs the sleepq locks to be initialized when freeing kva, so we want it called as early as possible.
* should_yield: protect from td_swvoltick being uninitialized or too stale (avg, 2013-07-09, 1 file, -1/+1)
  The distance between ticks and td_swvoltick should be calculated as an unsigned number. Previously we could end up comparing a negative number with hogticks, in which case should_yield() would give an incorrect answer. We should probably ensure that td_swvoltick is properly initialized.
  Sponsored by: HybridCluster
  MFC after: 5 days
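The signed-versus-unsigned point in the should_yield entry above can be illustrated with a minimal user-space sketch. The function names and plain int parameters below are hypothetical stand-ins for the kernel's ticks, td_swvoltick and hogticks; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Signed distance: if swvoltick is stale or uninitialized and happens to be
 * ahead of ticks, the distance is negative and the test is never true.
 */
bool
should_yield_signed(int ticks, int swvoltick, int hogticks)
{
	assert(hogticks > 0);
	return (ticks - swvoltick >= hogticks);
}

/*
 * Unsigned distance: the subtraction is done in unsigned arithmetic, so a
 * "negative" distance wraps to a very large value and the test is true.
 */
bool
should_yield_unsigned(int ticks, int swvoltick, int hogticks)
{
	assert(hogticks > 0);
	return ((unsigned int)ticks - (unsigned int)swvoltick >=
	    (unsigned int)hogticks);
}
```

With a stale value such as swvoltick = 1000 and ticks = 10, the signed variant never asks the thread to yield, while the unsigned variant does.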
* Correct the comment above the _sleep() function, which still mentions 'timo' instead of 'sbintime_t'. (davide, 2013-06-28, 1 file, -1/+1)
  Reported by: kan
* Partially revert r195702. (jhb, 2013-03-18, 1 file, -2/+0)
  Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls.
  - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly.
  - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
* Make kern_nanosleep() and pause_sbt() use per-CPU sleep queues. (mav, 2013-03-12, 1 file, -3/+4)
  This removes significant sleep queue lock congestion on multithreaded microbenchmarks, making them scale to multiple CPUs almost linearly.
* MFcalloutng: (davide, 2013-03-04, 1 file, -26/+29)
  Introduce sbt variants of msleep(), msleep_spin(), pause(), tsleep() in the KPI, allowing the timeout to be specified in 'sbintime_t' rather than ticks.
  Sponsored by: Google Summer of Code 2012, iXsystems inc.
  Tested by: flo, marius, ian, markj, Fabian Keil
* MFcalloutng (r244355): (davide, 2013-03-04, 1 file, -2/+3)
  Make loadavg calculation callout direct. There are several reasons for it:
  - it is very simple and is not worth a context switch to SWI;
  - since SWI is no longer used here, we can remove a twelve-year-old hack excluding this SWI from the loadavg statistics;
  - it fixes a problem when the eventtimer (HPET) shares an interrupt with some other device, and that interrupt thread is counted as a permanent loadavg of 1; loadavg is now accounted before that interrupt thread is scheduled.
  Sponsored by: Google Summer of Code 2012, iXsystems inc.
  Tested by: flo, marius, ian, Fabian Keil, markj
* Fix some minor inaccuracies introduced in r243251. (bjk, 2013-01-05, 1 file, -3/+3)
  Also correct the comment in kern_synch.c which was the source of the problematic text.
  Reviewed by: kib (previous version)
  Approved by: hrs (mentor)
* Implement the DTrace sched provider. (rstone, 2012-05-15, 1 file, -0/+17)
  This implementation aims to be compatible with the sched provider implemented by Solaris and its open-source derivatives. Full documentation of the sched provider can be found on Oracle's DTrace wiki pages.
  Note that for compatibility with scripts originally written for Solaris, several probes are defined that will never fire. These probes are defined to fire when Solaris-specific features perform certain actions. As these features are not present in FreeBSD, the probes can never fire.
  Also, I have added two probes that are not defined in Solaris, lend-pri and load-change. These probes have been added to make it possible to collect schedgraph data with DTrace.
  Finally, a few probes are defined in Solaris to take a cpuinfo_t * argument. As it was not immediately clear to me how to translate that to FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *.
  Sponsored by: Sandvine Incorporated
  MFC after: 2 weeks
* Include the associated wait channel message for context switch ktrace records. (jhb, 2012-04-20, 1 file, -4/+4)
  kdump supports both the old and new messages.
  Submitted by: Andrey Zonov andrey zonov org
  MFC after: 1 week
* panic: add a switch and infrastructure for stopping other CPUs in SMP case (avg, 2011-12-11, 1 file, -2/+4)
  The historical behavior of letting other CPUs merrily go on is the default for the time being. The new behavior can be switched on via the kern.stop_scheduler_on_panic tunable and sysctl.
  Stopping of the CPUs has (at least) the following benefits:
  - more of the system state at panic time is preserved intact
  - threads and interrupts do not interfere with dumping of the system state
  Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is set. That thread might call code that is also used in normal context and that code might use locks to prevent concurrent execution of certain parts. Those locks might be held by the stopped threads and would never be released. To work around this issue, it was decided that instead of explicit checks for panic context, we would rather put those checks inside the locking primitives.
  This change has substantial portions written and re-written by attilio and kib at various times. Other changes are heavily based on the ideas and patches submitted by jhb and mdf. bde has provided many insights into the details and history of the current code.
  The new behavior may cause problems for systems that use a USB keyboard for interfacing with the system console. This is because of some unusual locking patterns in the ukbd code which have to be used because on one hand ukbd is below syscons, but on the other hand it has to interface with other usb code that uses regular mutexes/Giant for its concurrency protection. Dumping to USB-connected disks may also be affected.
  PR: amd64/139614 (at least)
  In cooperation with: attilio, jhb, kib, mdf
  Discussed with: arch@, bde
  Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn, Steven Hartland <killing@multiplay.co.uk>, glebius, Andrew Boyer <aboyer@averesystems.com> (various versions of the patch)
  MFC after: 3 months (or never)
* Make sure the description of pause() is equivalent to its implementation. (hselasky, 2011-12-03, 1 file, -1/+2)
  No code change.
  Suggested by: Bruce Evans
  MFC after: 3 days
* Given that the typical usage of pause() is pause("zzz", hz / N), where N can be greater than hz in some cases, simply ignore a timeout value of zero. (hselasky, 2011-11-20, 1 file, -1/+1)
  Suggested by: Bruce Evans
  MFC after: 1 week
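The zero-timeout case described in the entry above comes from integer division: hz / N truncates to 0 whenever N is greater than hz. The following sketch shows one way such a value could be made harmless. It assumes that a non-positive tick count is raised to one tick, which is an illustration of the idea rather than a statement of what pause() does; both helper names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel function: raise a non-positive tick
 * count to one tick so that the caller always sleeps for a valid duration.
 */
int
clamp_pause_ticks(int timo)
{
	if (timo < 1)
		timo = 1;
	return (timo);
}

/* Tick count produced by a caller that writes pause("zzz", hz / n). */
int
pause_ticks_for(int hz, int n)
{
	assert(hz > 0 && n > 0);
	/* hz / n truncates toward zero, so n > hz gives 0 before clamping. */
	return (clamp_pause_ticks(hz / n));
}
```

For example, with hz = 100 and n = 1000 the division gives 0, and the clamp turns that into a one-tick sleep instead of an invalid timeout.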
* Minor style change: (hselasky, 2011-11-20, 1 file, -10/+9)
  Simplify the description of pause() and shorten the KASSERT message in pause(). Also add a clamp for the timo argument in the non-KASSERT case.
  Suggested by: Bruce Evans
  MFC after: 1 week
* Simplify the usb_pause_mtx() function by factoring out the generic parts to the kernel's pause() function. (hselasky, 2011-11-19, 1 file, -5/+23)
  The pause() function can now be used when cold != 0. Also assert that the timeout in system ticks must be positive.
  Suggested by: Bruce Evans
  MFC after: 1 week
* In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. (kmacy, 2011-09-16, 1 file, -1/+1)
  It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal to kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
  Reviewed by: rwatson
  Approved by: re (bz)
* Simplify a stale assertion. (jhb, 2011-05-24, 1 file, -3/+1)
  We have not called mi_switch() from a nested critical section during a preemption for several years.
  MFC after: 1 week
* Use a name instead of a magic number for kern_yield(9) when the priority should not change. (mdf, 2011-05-13, 1 file, -1/+3)
  Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here.
  Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
* Based on discussions on the svn-src mailing list, rework r218195: (mdf, 2011-02-08, 1 file, -2/+33)
  - entirely eliminate some calls to uio_yield() as being unnecessary, such as in a sysctl handler.
  - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h
  - add a slightly more generic kern_yield() that can replace the functionality of uio_yield().
  - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching.
  - fix a logic inversion bug in vlrureclaim(), pointed out by bde@.
  - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.
* Only change the priority of timeshare threads to PRI_MAX_TIMESHARE when yield() is called. (jhb, 2011-01-06, 1 file, -1/+2)
  Specifically, leave the priority of real time and idle threads unchanged.
  MFC after: 2 weeks
* Add new msleep(9) flag PBDRY that shall be specified together with PCATCH, to indicate that the thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. (kib, 2009-07-14, 1 file, -0/+2)
  Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally.
  Tested by: pho
  Reviewed by: jhb
  Approved by: re (kensmith)
* When wakeup(9) is going to notify swapper, assert that wait channel is not equal to &proc0. (kib, 2009-07-14, 1 file, -1/+4)
  It shall not be, since the proc0 stack is not swappable, and kick_proc0() is wakeup(&proc0).
  Reviewed by: jhb
  Approved by: re (kensmith)
* Remove even more unneeded variable assignments. (ed, 2009-02-26, 1 file, -1/+0)
  kern_time.c:
  - Unused variable `p'.
  kern_thr.c:
  - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread().
  kern_sig.c:
  - Unused variable `code'.
  kern_synch.c:
  - `rval' is always assigned in all different cases.
  kern_rwlock.c:
  - `v' is always overwritten with RW_UNLOCKED further on.
  kern_malloc.c:
  - `size' is always initialized with the proper value before being used.
  kern_exit.c:
  - `error' is always caught and returned immediately. abort2() never returns a non-zero value.
  kern_exec.c:
  - `len' is always assigned inside the if-statement right below it.
  tty_info.c:
  - `td' is always overwritten by FOREACH_THREAD_IN_PROC().
  Found by: LLVM's scan-build
* (jeff, 2009-01-17, 1 file, -13/+14)
  - Implement generic macros for producing KTR records that are compatible with src/tools/sched/schedgraph.py. This allows developers to quickly create a graphical view of ktr data for any resource in the system.
  - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly identifying records associated with a thread or cpu.
  - Reimplement the KTR_SCHED traces using the new generic facility.
  Obtained from: attilio
  Discussed with: jhb
  Sponsored by: Nokia
* (kmacy, 2008-10-19, 1 file, -0/+9)
  - Forward port flush of page table updates on context switch or userret
  - Forward port vfork XEN hack
* (jhb, 2008-09-25, 1 file, -0/+2)
  - Don't do a WITNESS_SAVE() on the interlock if it is Giant in the condition variable wait routines. DROP_GIANT() already manages that state in the Giant interlock case.
  - Assert that Giant is held when it is passed as a sleep interlock.
* Remove the now unused `lbolt' variable from the kernel. (ed, 2008-08-20, 1 file, -15/+3)
  We used to have a single wait channel inside the kernel which could be used by threads that just wanted to sleep for some time (the next second). The old TTY layer was the only piece of code that still used lbolt, because I already removed the use of lbolt from the NFS clients and the VFS syncer.
  Approved by: philip
* Permit Giant to be passed as the explicit interlock either to msleep/mtx_sleep or the various cv_*wait*() routines. (jhb, 2008-08-07, 1 file, -2/+6)
  Currently, the "unlock" behavior of PDROP and cv_wait_unlock() with Giant is not permitted, as it would be confusing since Giant is fully unrecursed and unlocked during a thread sleep.
  This is handy for subsystems which wish to allow unlocked drivers to continue to use Giant, such as CAM, the new TTY layer, and the new USB stack. CAM currently uses a hack that I told Scott to use because I really didn't want to permit this behavior, and the TTY and USB patches both have various patches to permit this.
  MFC after: 2 weeks
* If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. (jhb, 2008-08-05, 1 file, -15/+17)
  Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()).
  With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock.
  Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been released. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wake up proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal().
  Discussed with: jeff
  Glanced at by: sam
  Tested by: Jurgen Weber jurgen - ish com au
  MFC after: 2 weeks
* (jeff, 2008-04-17, 1 file, -1/+5)
  - Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes.
  - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly.
  - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way.
  - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption, ithreads idling, etc.
  - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc.
  Sponsored by: Nokia
* Consistently use ANSI C declarations for all functions in kern_synch.c. (rwatson, 2008-03-16, 1 file, -19/+7)
* In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. (rwatson, 2008-03-16, 1 file, -1/+2)
  This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
  MFC after: 1 month
  Discussed with: imp, rink
* Remove kernel support for M:N threading. (jeff, 2008-03-12, 1 file, -12/+2)
  While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
* (jeff, 2008-03-12, 1 file, -19/+10)
  - Pass the priority argument from *sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost.
  - Change the priority arguments to cv_* to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now.
  - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now.
  - Add a sysctl to ule, kern.sched.static_boost, that defaults to on, which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation.
  - While we're modifying sleepq, change signal and broadcast to both return with the lock held, as the lock was held on entry.
  Reviewed by: jhb, peter
* (jeff, 2008-03-10, 1 file, -6/+11)
  - Handle kdb switch panics outside of mi_switch() to remove some instructions from the common path and make the code more clear. Whether this has any impact on performance may depend on optimization levels.
  Sponsored by: Nokia
* Don't zero td_runtime when billing thread CPU usage to the process; maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime. (rwatson, 2008-01-10, 1 file, -2/+4)
  When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc.
  This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well as for libthr user threads: valuable debugging information lost with the move to true kthreads, since they are no longer independent processes.
  There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the meantime. Likewise, there are reservations about the continued validity of statclock given the speed of modern processors.
  Reviewed by: attilio, emaste, jhb, julian
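The split between a lifetime counter and an unbilled counter described in the entry above can be sketched in user space as follows. The struct and function names are hypothetical simplifications; only the field roles (a total that is never zeroed, and an incremental amount that is reset on billing) are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-thread CPU accounting. */
struct thread_acct {
	uint64_t runtime;	/* total CPU time since thread creation */
	uint64_t incruntime;	/* CPU time not yet billed to the process */
};

/* Hypothetical, simplified stand-in for per-process CPU accounting. */
struct proc_acct {
	uint64_t runtime;	/* CPU time billed so far by its threads */
};

/* Charge a slice of CPU time to the thread, e.g. at a context switch. */
void
thread_charge(struct thread_acct *td, uint64_t delta)
{
	assert(td != NULL);
	td->runtime += delta;
	td->incruntime += delta;
}

/*
 * Bill the thread's unbilled time to its process. Only the incremental
 * counter is reset; the lifetime total stays available for per-thread
 * reporting.
 */
void
thread_bill(struct proc_acct *p, struct thread_acct *td)
{
	assert(p != NULL && td != NULL);
	p->runtime += td->incruntime;
	td->incruntime = 0;
}
```

Keeping two counters means the process total can be updated incrementally without losing the per-thread figure that tools such as ps -H and top -SH display.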
* A bunch more files that should probably print out a thread name instead of a process name. (julian, 2007-11-14, 1 file, -4/+4)
* Generally we are interested in what thread did something as opposed to what process. (julian, 2007-11-14, 1 file, -5/+5)
  Since threads by default have the name of the process unless overwritten with more useful information, just print the thread name instead.
* (jeff, 2007-10-08, 1 file, -3/+6)
  - Restore historical yield() behavior by manually lowering priority and switching.
  Approved by: re
* (jeff, 2007-09-17, 1 file, -11/+5)
  - Move all of the PS_ flags into either p_flag or td_flags.
  - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time.
  - Allow swapout to try each thread in a process individually and then swap in the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags.
  - Keep ki_sflag for backwards compat but change all in-source tools to use the new and more correct location of P_INMEM.
  Reported by: pho
  Reviewed by: attilio, kib
  Approved by: re (kensmith)