summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_sx.c
Commit message (Collapse)AuthorAgeFilesLines
* Fixup r240424: On entering KDB backends, the hijacked thread to runattilio2012-12-221-4/+5
| | | | | | | | | | | | | interrupt context can still be idlethread. At that point, without the panic condition, it can still happen that idlethread then will try to acquire some locks to carry on some operations. Skip the idlethread check on block/sleep lock operations when KDB is active. Reported by: jh Tested by: jh MFC after: 1 week
* Remove all the checks on curthread != NULL with the exception of some MDattilio2012-09-131-5/+0
| | | | | | | | | | | trap checks (eg. printtrap()). Generally this check is not needed anymore, as there is not a legitimate case where curthread != NULL, after pcpu 0 area has been properly initialized. Reviewed by: bde, jhb MFC after: 1 week
* Improve check coverage about idle threads.attilio2012-09-121-0/+13
| | | | | | | | | | | | Idle threads are not allowed to acquire any lock but spinlocks. Deny any attempt to do so by panicing at the locking operation when INVARIANTS is on. Then, remove the check on blocking on a turnstile. The check in sleepqueues is left because they are not allowed to use tsleep() either which could happen still. Reviewed by: bde, jhb, kib MFC after: 1 week
* Add software PMC support.fabient2012-03-281-0/+12
| | | | | | | | | | | | | New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month
* put sys/systm.h at its proper place or add it if missingavg2011-12-121-1/+1
| | | | | | | Reported by: lstewart, tinderbox Pointyhat to: avg, attilio MFC after: 1 week MFC with: r228430
* panic: add a switch and infrastructure for stopping other CPUs in SMP caseavg2011-12-111-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historical behavior of letting other CPUs merily go on is a default for time being. The new behavior can be switched on via kern.stop_scheduler_on_panic tunable and sysctl. Stopping of the CPUs has (at least) the following benefits: - more of the system state at panic time is preserved intact - threads and interrupts do not interfere with dumping of the system state Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is set. That thread might call code that is also used in normal context and that code might use locks to prevent concurrent execution of certain parts. Those locks might be held by the stopped threads and would never be released. To work around this issue, it was decided that instead of explicit checks for panic context, we would rather put those checks inside the locking primitives. This change has substantial portions written and re-written by attilio and kib at various times. Other changes are heavily based on the ideas and patches submitted by jhb and mdf. bde has provided many insights into the details and history of the current code. The new behavior may cause problems for systems that use a USB keyboard for interfacing with system console. This is because of some unusual locking patterns in the ukbd code which have to be used because on one hand ukbd is below syscons, but on the other hand it has to interface with other usb code that uses regular mutexes/Giant for its concurrency protection. Dumping to USB-connected disks may also be affected. PR: amd64/139614 (at least) In cooperation with: attilio, jhb, kib, mdf Discussed with: arch@, bde Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn, Steven Hartland <killing@multiplay.co.uk>, glebius, Andrew Boyer <aboyer@averesystems.com> (various versions of the patch) MFC after: 3 months (or never)
* Introduce the same mutex-wise fix in r227758 for sx locks.attilio2011-11-211-4/+4
| | | | | | | | | | | | | | | | | | | | | The functions that offer file and line specifications are: - sx_assert_ - sx_downgrade_ - sx_slock_ - sx_slock_sig_ - sx_sunlock_ - sx_try_slock_ - sx_try_xlock_ - sx_try_upgrade_ - sx_unlock_ - sx_xlock_ - sx_xlock_sig_ - sx_xunlock_ Now vm_map locking is fully converted and can avoid to know specifics about locking procedures. Reviewed by: kib MFC after: 1 month
* Constify arguments for locking KPIs where possible.pjd2011-11-161-11/+11
| | | | | | | This enables locking consumers to pass their own structures around as const and be able to assert locks embedded into those structures. Reviewed by: ed, kib, jhb
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-071-1/+1
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* - Merge changes to the base system to support OFED. These includejeff2011-03-211-1/+1
| | | | | a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
* sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.mdf2011-01-121-2/+2
| | | | Commit the kernel changes.
* Remove unneeded includes of <sys/linker_set.h>. Other headers that usejhb2011-01-111-1/+0
| | | | | | it internally contain nested includes. Reviewed by: bde
* Fix a sign bug that caused adaptive spinning in sx_xlock() to not workjhb2010-06-081-1/+1
| | | | | | | | properly. Among other things it did not drop Giant while spinning leading to livelocks. Reviewed by: rookie, kib, jmallett MFC after: 3 days
* In current code, threads performing an interruptible sleep (on bothattilio2009-12-121-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will leave the waiters flag on forcing the owner to do a wakeup even when if the waiter queue is empty. That operation may lead to a deadlock in the case of doing a fake wakeup on the "preferred" (based on the wakeup algorithm) queue while the other queue has real waiters on it, because nobody is going to wakeup the 2nd queue waiters and they will sleep indefinitively. A similar bug, is present, for lockmgr in the case the waiters are sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue is not empty, the waiters won't progress after being awake but they will just fail, still not taking care of the 2nd queue waiters (as instead the lock owned doing the wakeup would expect). In order to fix this bug in a cheap way (without adding too much locking and complicating too much the semantic) add a sleepqueue interface which does report the actual number of waiters on a specified queue of a waitchannel (sleepq_sleepcnt()) and use it in order to determine if the exclusive waiters (or shared waiters) are actually present on the lockmgr (or sx) before to give them precedence in the wakeup algorithm. This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters a lockmgr has and if all the waiters on the exclusive waiters queue are LK_SLEEPFAIL just wake both queues. The sleepq_sleepcnt() introduction and ABI breakage require __FreeBSD_version bumping. Reported by: avg, kib, pho Reviewed by: kib Tested by: pho
* When releasing a read/shared lock we need to use a write memory barrierattilio2009-09-301-4/+4
| | | | | | | | | | in order to avoid, on architectures which doesn't have strong ordered writes, CPU instructions reordering. Diagnosed by: fabio Reviewed by: jhb Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Fix some bugs related to adaptive spinning:attilio2009-09-021-1/+1
| | | | | | | | | | | | | | | | | | | In the lockmgr support: - GIANT_RESTORE() is just called when the sleep finishes, so the current code can ends up into a giant unlock problem. Fix it by appropriately call GIANT_RESTORE() when needed. Note that this is not exactly ideal because for any interation of the adaptive spinning we drop and restore Giant, but the overhead should be not a factor. - In the lock held in exclusive mode case, after the adaptive spinning is brought to completition, we should just retry to acquire the lock instead to fallthrough. Fix that. - Fix a style nit In the sx support: - Call GIANT_SAVE() before than looping. This saves some overhead because in the current code GIANT_SAVE() is called several times. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* * Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check toattilio2009-08-171-2/+3
| | | | | | | | | | | | | | | a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)
* Add a new macro to test that a variable could be loaded atomically.bz2009-08-141-0/+2
| | | | | | | | | | | | | | | | | | | Check that the given variable is at most uintptr_t in size and that it is aligned. Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment -- however, the function of ALIGN() is to guarantee alignment, and therefore may lead to stronger alignment enforcement than necessary for types that are smaller than sizeof(uintptr_t). Add checks to mtx, rw and sx locks init functions to detect possible breakage. This was used during debugging of the problem fixed with r196118 where a pointer was on an un-aligned address in the dpcpu area. In collaboration with: rwatson Reviewed by: rwatson Approved by: re (kib)
* Handle lock recursion differenty by always checking against LO_RECURSABLEattilio2009-06-021-6/+8
| | | | | | instead the lock own flag itself. Tested by: pho
* The patch for r193011 was partially rejected when applied, complete it.attilio2009-05-291-2/+4
|
* Reverse the logic for ADAPTIVE_SX option and enable it by default.attilio2009-05-291-21/+50
| | | | | | | | | | | | | | | | | | | | | | Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additively implements adaptive spininning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should be dropped probabilly. This change has made been necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (ie. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho
* Add the OpenSolaris dtrace lockstat provider. The lockstat providersson2009-05-261-12/+82
| | | | | | | | | | adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)
* - Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks.jeff2009-03-151-2/+7
|
* Teach WITNESS about the interlocks used with lockmgr. This removes a bunchjhb2008-09-101-2/+2
| | | | | | | | of spurious witness warnings since lockmgr grew witness support. Before this, every time you passed an interlock to a lockmgr lock WITNESS treated it as a LOR. Reviewed by: attilio
* If a thread that is swapped out is made runnable, then the setrunnable()jhb2008-08-051-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
* - Embed the recursion counter for any locking primitive directly in theattilio2008-05-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | lock_object, using an unified field called lo_data. - Replace lo_type usage with the w_name usage and at init time pass the lock "type" directly to witness_init() from the parent lock init function. Handle delayed initialization before than witness_initialize() is called through the witness_pendhelp structure. - Axe out LO_ENROLLPEND as it is not really needed. The case where the mutex init delayed wants to be destroyed can't happen because witness_destroy() checks for witness_cold and panic in case. - In enroll(), if we cannot allocate a new object from the freelist, notify that to userspace through a printf(). - Modify the depart function in order to return nothing as in the current CVS version it always returns true and adjust callers accordingly. - Fix the witness_addgraph() argument name prototype. - Remove unuseful code from itismychild(). This commit leads to a shrinked struct lock_object and so smaller locks, in particular on amd64 where 2 uintptr_t (16 bytes per-primitive) are gained. Reviewed by: jhb
* - Pass the priority argument from *sleep() into sleepq and down intojeff2008-03-121-9/+10
| | | | | | | | | | | | | | | | | sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_* to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter
* - Re-implement lock profiling in such a way that it no longer breaksjeff2007-12-151-25/+9
| | | | | | | | | | | | | | | | | | | | | | the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia
* Expand lock class with the "virtual" function lc_assert which will offerattilio2007-11-181-0/+9
| | | | | | | | | an unified way for all the lock primitives to express lock assertions. Currenty, lockmgrs and rmlocks don't have assertions, so just panic in that case. This will be a base for more callout improvements. Ok'ed by: jhb, jeff
* generally we are interested in what thread did something asjulian2007-11-141-1/+1
| | | | | | opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.
* Fix sx_try_slock(), so it only fails when there is an exclusive owner.pjd2007-10-021-9/+12
| | | | | | | | | | | | | | Before that fix, it was possible for the function to fail if number of sharers changes between 'x = sx->sx_lock' step and atomic_cmpset_acq_ptr() call. This fixes ZFS problem when ZFS returns strange EIO errors under load. In ZFS there is a code that depends on the fact that sx_try_slock() can only fail if there is an exclusive owner. Discussed with: attilio Reviewed by: jhb Approved by: re (kensmith)
* Fix some problems with lock_profiling in sx locks:attilio2007-07-061-22/+32
| | | | | | | | | | | | | | | | | - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - Disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - Use homogeneous names for automatic variables in hard functions regarding lock_profiling - Style fixes - Add a CTASSERT for some flags building Discussed with: kmacy, kris Approved by: jeff (mentor) Approved by: re
* Add functions sx_xlock_sig() and sx_slock_sig().attilio2007-05-311-26/+62
| | | | | | | | | | | | | | | | These functions are intended to do the same actions of sx_xlock() and sx_slock() but with the difference to perform an interruptible sleep, so that sleep can be interrupted by external events. In order to support these new featueres, some code renstruction is needed, but external API won't be affected at all. Note: use "void" cast for "int" returning functions in order to avoid tools like Coverity prevents to whine. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)
* style(9) fixes for sx locks.attilio2007-05-291-2/+2
| | | | Approved by: jeff (mentor)
* Add a small fix for lock profiling in sx locks.attilio2007-05-291-1/+1
| | | | | | | | | | "0" cannot be a correct value since when the function is entered at least one shared holder must be present and since we want the last one "1" is the correct value. Note that lock_profiling for sx locks is far from being perfect. Expect further fixes for that. Approved by: jeff (mentor)
* Rename the macros for assertion flags passed to sx_assert() from SX_* tojhb2007-05-191-19/+19
| | | | | | | SA_* to match mutexes and rwlocks. The old flags still exist for backwards compatiblity. Requested by: attilio
* Expose sx_xholder() as a public macro. It returns a pointer to the threadjhb2007-05-191-8/+0
| | | | | | | that holds the current exclusive lock, or NULL if no thread holds an exclusive lock. Requested by: pjd
* Oops, didn't include SX_ADAPTIVESPIN in the list of valid flags for thejhb2007-05-191-1/+1
| | | | | | assert in sx_init_flags(). Submitted by: attilio
* Add a new SX_RECURSE flag to make support for recursive exclusive locksjhb2007-05-191-2/+8
| | | | | | | conditional. By default, sx(9) locks are back to not supporting recursive exclusive locks. Submitted by: attilio
* Fix a comment.jhb2007-05-181-2/+2
|
* Move lock_profile_object_{init,destroy}() into lock_{init,destroy}().jhb2007-05-181-2/+0
|
* Add destroyed cookie values for sx locks and rwlocks as well as extrajhb2007-05-081-1/+21
| | | | | KASSERTs so that any lock operations on a destroyed lock will panic or hang.
* fix typokmacy2007-04-041-1/+1
|
* style fixes and make sure that the lock is treated as released in the ↵kmacy2007-04-041-4/+5
| | | | | | sharers == 0 case not that this is somewhat racy because a new sharer can come in while we're updating stats
* Fixes to sx for newsx - fix recursed case and move out of inlinekmacy2007-04-031-8/+31
| | | | Submitted by: Attilio Rao <attilio@freebsd.org>
* Optimize sx locks to use simple atomic operations for the common cases ofjhb2007-03-311-230/+682
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | obtaining and releasing shared and exclusive locks. The algorithms for manipulating the lock cookie are very similar to that rwlocks. This patch also adds support for exclusive locks using the same algorithm as mutexes. A new sx_init_flags() function has been added so that optional flags can be specified to alter a given locks behavior. The flags include SX_DUPOK, SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature to the similar flags for mutexes. Adaptive spinning on select locks may be enabled by enabling the ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN flag via sx_init_flags() will adaptively spin. The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock() are now performed inline in non-debug kernels. As a result, <sys/sx.h> now requires <sys/lock.h> to be included prior to <sys/sx.h>. The new kernel option SX_NOINLINE can be used to disable the aforementioned inlining in non-debug kernels. The size of struct sx has changed, so the kernel ABI is probably greatly disturbed. MFC after: 1 month Submitted by: attilio Tested by: kris, pjd
* Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,jhb2007-03-211-41/+41
| | | | rwlocks, and sx locks to 'lock_object'.
* Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.jhb2007-03-091-0/+33
| | | | | | | | | | | | | These functions are intended to be used to drop a lock and then reacquire it when doing an sleep such as msleep(9). Both functions accept a 'struct lock_object *' as their first parameter. The 'lc_unlock' function returns an integer that is then passed as the second paramter to the subsequent 'lc_lock' function. This can be used to communicate state. For example, sx locks and rwlocks use this to indicate if the lock was share/read locked vs exclusive/write locked. Currently, spin mutexes and lockmgr locks do not provide working lc_lock and lc_unlock functions.
* Use C99-style struct member initialization for lock classes.jhb2007-03-091-3/+3
|
* lock stats updates need to be protected by the lockkmacy2007-03-021-24/+3
|
OpenPOWER on IntegriCloud