path: root/sys/kern/kern_sx.c
Commit log (message, author, date, files/lines changed):
* - Merge changes to the base system to support OFED. (jeff, 2011-03-21; 1 file, -1/+1)
  These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
  and other miscellaneous small features.
* sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. (mdf, 2011-01-12; 1 file, -2/+2)
  Commit the kernel changes.
* Remove unneeded includes of <sys/linker_set.h>. (jhb, 2011-01-11; 1 file, -1/+0)
  Other headers that use it internally contain nested includes.
  Reviewed by: bde
* Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. (jhb, 2010-06-08; 1 file, -1/+1)
  Among other things it did not drop Giant while spinning, leading to livelocks.
  Reviewed by: rookie, kib, jmallett
  MFC after: 3 days
* In current code, threads performing an interruptible sleep (on both sxlocks, via the sx_{s,x}lock_sig() interface, or plain lockmgr) will leave the waiters flag on, forcing the owner to do a wakeup even if the waiters queue is empty. (attilio, 2009-12-12; 1 file, -1/+5)
  That operation may lead to a deadlock when a fake wakeup is done on the
  "preferred" queue (based on the wakeup algorithm) while the other queue has
  real waiters on it, because nobody is going to wake up the second queue's
  waiters and they will sleep indefinitely.
  A similar bug is present for lockmgr when the waiters are sleeping with
  LK_SLEEPFAIL.  In that case, even if the waiters queue is not empty, the
  waiters won't make progress after being awakened; they will just fail,
  still not taking care of the second queue's waiters (as the lock owner
  doing the wakeup would instead expect).
  In order to fix this bug cheaply (without adding too much locking or
  complicating the semantics too much), add a sleepqueue interface which
  reports the actual number of waiters on a specified queue of a wait channel
  (sleepq_sleepcnt()) and use it to determine whether the exclusive waiters
  (or shared waiters) are actually present on the lockmgr (or sx) lock before
  giving them precedence in the wakeup algorithm.
  This fix alone, however, doesn't solve the LK_SLEEPFAIL bug.  In order to
  cope with it, track how many exclusive LK_SLEEPFAIL waiters a lockmgr has,
  and if all the waiters on the exclusive queue are LK_SLEEPFAIL, just wake
  both queues.
  The sleepq_sleepcnt() introduction and ABI breakage require a
  __FreeBSD_version bump.
  Reported by: avg, kib, pho
  Reviewed by: kib
  Tested by: pho
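  A minimal sketch of the idea behind the fix, assuming the sleepqueue(9) and
  sx(9) KPI described above; the queue constants mirror the private defines in
  kern_sx.c, and the surrounding slow-path logic is omitted:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>
    #include <sys/sleepqueue.h>

    /* These mirror the private queue indices used inside kern_sx.c. */
    #define SQ_EXCLUSIVE_QUEUE  0
    #define SQ_SHARED_QUEUE     1

    /*
     * Before preferring the shared queue in the exclusive-unlock slow path,
     * ask the sleepqueue how many threads are really sleeping on it.
     */
    static int
    choose_wakeup_queue(struct sx *sx)
    {
            if ((sx->sx_lock & SX_LOCK_SHARED_WAITERS) != 0 &&
                sleepq_sleepcnt(&sx->lock_object, SQ_SHARED_QUEUE) != 0)
                    return (SQ_SHARED_QUEUE);
            return (SQ_EXCLUSIVE_QUEUE);
    }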
* When releasing a read/shared lock we need to use a write memory barrier in order to avoid CPU instruction reordering on architectures that don't have strongly ordered writes. (attilio, 2009-09-30; 1 file, -4/+4)
  Diagnosed by: fabio
  Reviewed by: jhb
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
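  A hedged sketch of what the change amounts to in the shared-unlock fast
  path: the compare-and-swap that drops a sharer carries release semantics
  (atomic_cmpset_rel_ptr() rather than plain atomic_cmpset_ptr()), so writes
  made under the lock cannot be reordered past the unlock on weakly ordered
  CPUs.  This is an illustration, not the committed diff:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>
    #include <machine/atomic.h>

    static void
    sunlock_fast_path(struct sx *sx)
    {
            uintptr_t x;

            x = sx->sx_lock;
            /* Drop one sharer with a releasing CAS; the "rel" is the point. */
            if ((x & SX_LOCK_EXCLUSIVE_WAITERS) == 0 &&
                atomic_cmpset_rel_ptr(&sx->sx_lock, x, x - SX_ONE_SHARER))
                    return;
            /* Contended case: fall back to the slow path (omitted here). */
    }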
* Fix some bugs related to adaptive spinning: (attilio, 2009-09-02; 1 file, -1/+1)
  In the lockmgr support:
  - GIANT_RESTORE() is only called when the sleep finishes, so the current
    code can end up with a Giant unlock problem.  Fix it by calling
    GIANT_RESTORE() when needed.  Note that this is not exactly ideal because
    Giant is dropped and restored for every iteration of the adaptive spin,
    but the overhead should not be a factor.
  - In the case where the lock is held in exclusive mode, after the adaptive
    spinning completes we should retry acquiring the lock instead of falling
    through.  Fix that.
  - Fix a style nit.
  In the sx support:
  - Call GIANT_SAVE() before looping.  This saves some overhead because the
    current code calls GIANT_SAVE() several times.
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* * Change the scope of ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching-specific operation check; consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). (attilio, 2009-08-17; 1 file, -2/+3)
  * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by directly checking
    alignment on the word boundary, for all the given architectures.  That's
    a bit too strict for some common cases, but it assures safety.
  * Add a comment explaining the scope of the macro.
  * Add a new stub in the lockmgr-specific implementation.
  Tested by: marcel (initial version), marius
  Reviewed by: rwatson, jhb (comment-specific review)
  Approved by: re (kib)
* Add a new macro to test that a variable could be loaded atomically. (bz, 2009-08-14; 1 file, -0/+2)
  Check that the given variable is at most uintptr_t in size and that it is
  aligned.
  Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment --
  however, the function of ALIGN() is to guarantee alignment, and therefore
  may lead to stronger alignment enforcement than necessary for types that
  are smaller than sizeof(uintptr_t).
  Add checks to the mtx, rw and sx lock init functions to detect possible
  breakage.  This was used during debugging of the problem fixed with r196118
  where a pointer was on an unaligned address in the dpcpu area.
  In collaboration with: rwatson
  Reviewed by: rwatson
  Approved by: re (kib)
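  A hedged reconstruction of the idea (the committed macro, later renamed
  ASSERT_ATOMIC_LOAD_PTR(), differs in detail): assert that the variable is
  no wider than a pointer and suitably aligned, and call the assertion from
  the lock init paths:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    /* Illustrative only; not the exact macro added by the commit. */
    #define ASSERT_ATOMIC_LOAD_SKETCH(var, msg)                         \
            KASSERT(sizeof(var) <= sizeof(uintptr_t) &&                 \
                ((uintptr_t)&(var) & (sizeof(uintptr_t) - 1)) == 0, msg)

    static void
    sx_init_check_sketch(struct sx *sx)
    {
            /* The commit adds a check like this to mtx, rw and sx init. */
            ASSERT_ATOMIC_LOAD_SKETCH(sx->sx_lock,
                ("%s: sx_lock not aligned for %p", __func__, sx));
    }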
* Handle lock recursion differently by always checking against LO_RECURSABLE instead of the lock's own flag. (attilio, 2009-06-02; 1 file, -6/+8)
  Tested by: pho
* The patch for r193011 was partially rejected when applied, complete it. (attilio, 2009-05-29; 1 file, -2/+4)
* Reverse the logic for the ADAPTIVE_SX option and enable it by default. (attilio, 2009-05-29; 1 file, -21/+50)
  Introduce for this operation the reverse NO_ADAPTIVE_SX option.  The
  SX_ADAPTIVESPIN flag passed to sx_init_flags(9) is suppressed, and the new
  flag offering the reversed logic, SX_NOADAPTIVE, is added.
  Additionally, implement adaptive spinning for sx locks held in shared mode.
  The spinning limit can be tuned through sysctls until the code reaches
  release, after which they should probably be dropped.
  This change has been made necessary by recent benchmarks where it improves
  concurrency of workloads in the presence of high contention (i.e. ZFS).
  KPI breakage is documented by a __FreeBSD_version bump, manpage and UPDATING
  updates.
  Requested by: jeff, kmacy
  Reviewed by: jeff
  Tested by: pho
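  A hedged sketch of the basic adaptive-spinning idea on sx: spin while an
  exclusive owner is running on another CPU instead of sleeping.  The
  shared-mode case added by this commit spins on the sharer count instead
  (there is no single owner to watch) and, like the sysctl-tunable limits and
  profiling hooks, is omitted here:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/proc.h>
    #include <sys/sx.h>
    #include <machine/cpu.h>

    static int
    spin_while_owner_running(struct sx *sx)
    {
            uintptr_t x;
            struct thread *owner;

            x = sx->sx_lock;
            if ((x & SX_LOCK_SHARED) != 0)
                    return (0);     /* No exclusive owner to spin on. */
            owner = (struct thread *)SX_OWNER(x);
            /* Spin as long as the owner keeps running and the lock word
             * has not changed; then go back to the normal slow path. */
            while (TD_IS_RUNNING(owner) && sx->sx_lock == x)
                    cpu_spinwait();
            return (1);
    }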
* Add the OpenSolaris dtrace lockstat provider. (sson, 2009-05-26; 1 file, -12/+82)
  The lockstat provider adds probes for mutexes, reader/writer and
  shared/exclusive locks to gather contention statistics and other locking
  information for dtrace scripts, the lockstat(1M) command and other potential
  consumers.
  Reviewed by: attilio, jhb, jb
  Approved by: gnn (mentor)
* - Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks. (jeff, 2009-03-15; 1 file, -2/+7)
* Teach WITNESS about the interlocks used with lockmgr. (jhb, 2008-09-10; 1 file, -2/+2)
  This removes a bunch of spurious witness warnings since lockmgr grew witness
  support.  Before this, every time you passed an interlock to a lockmgr lock,
  WITNESS treated it as a LOR.
  Reviewed by: attilio
* If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. (jhb, 2008-08-05; 1 file, -6/+17)
  Historically, this has been done by waking up proc0 directly from
  setrunnable() itself via a wakeup().  When waking up a sleeping thread that
  was swapped out (the usual case when waking proc0, since only sleeping
  threads are eligible to be swapped out), this resulted in a bit of recursion
  (e.g. wakeup() -> setrunnable() -> wakeup()).
  With sleep queues having separate locks in 6.x and later, this caused a spin
  lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).  An attempt
  was made to fix this in 7.0 by making the proc0 wakeup use the ithread
  mechanism for doing the wakeup.  However, this required grabbing proc0's
  thread lock to perform the wakeup.  If proc0 was asleep elsewhere in the
  kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR
  since the thread lock would be some other sleepq lock.
  Fix this by deferring the wakeup of the swapper until after the sleepq lock
  held by the upper layer has been locked.  The setrunnable() routine now
  returns a boolean value to indicate whether or not proc0 needs to be woken
  up.  The end result is that consumers of the sleepq API such as
  *sleep/wakeup, condition variables, sx locks, and lockmgr have to wake up
  proc0 if they get a non-zero return value from sleepq_abort(),
  sleepq_broadcast(), or sleepq_signal().
  Discussed with: jeff
  Glanced at by: sam
  Tested by: Jurgen Weber jurgen - ish com au
  MFC after: 2 weeks
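  A hedged sketch of the resulting consumer pattern (simplified; in the real
  sx slow path the sleepqueue chain lock is typically already held at this
  point): record the return value of the broadcast, drop the sleepqueue lock,
  and only then kick the swapper:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/proc.h>
    #include <sys/sx.h>
    #include <sys/sleepqueue.h>

    static void
    wakeup_sx_waiters(struct sx *sx, int queue, int pri)
    {
            int wakeup_swapper;

            sleepq_lock(&sx->lock_object);
            /* sleepq_broadcast() now reports whether proc0 must be woken. */
            wakeup_swapper = sleepq_broadcast(&sx->lock_object, SLEEPQ_SX,
                pri, queue);
            sleepq_release(&sx->lock_object);

            /* Wake the swapper only after the sleepqueue lock is dropped. */
            if (wakeup_swapper)
                    kick_proc0();
    }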
* - Embed the recursion counter for any locking primitive directly in the lock_object, using a unified field called lo_data. (attilio, 2008-05-15; 1 file, -0/+1)
  - Replace lo_type usage with w_name usage, and at init time pass the lock
    "type" directly to witness_init() from the parent lock init function.
    Handle delayed initialization before witness_initialize() is called
    through the witness_pendhelp structure.
  - Axe LO_ENROLLPEND as it is not really needed.  The case where a
    delayed-init mutex wants to be destroyed can't happen because
    witness_destroy() checks for witness_cold and panics in that case.
  - In enroll(), if we cannot allocate a new object from the freelist, notify
    userspace through a printf().
  - Modify the depart function to return nothing, as in the current CVS
    version it always returns true, and adjust callers accordingly.
  - Fix the witness_addgraph() argument name prototype.
  - Remove useless code from itismychild().
  This commit leads to a shrunken struct lock_object and so smaller locks, in
  particular on amd64 where 2 uintptr_t (16 bytes per primitive) are gained.
  Reviewed by: jhb
* - Pass the priority argument from *sleep() into sleepq and down into sched_sleep(). (jeff, 2008-03-12; 1 file, -9/+10)
    This removes an extra thread_lock() acquisition and allows the scheduler
    to decide what to do with the static boost.
  - Change the priority arguments to cv_* to match sleepq/msleep/etc., where 0
    means no priority change.  Catch -1 in cv_broadcastpri() and convert it to
    0 for now.
  - Set a flag when sleeping in a way that is compatible with swapping, since
    direct priority comparisons are meaningless now.
  - Add a sysctl to ule, kern.sched.static_boost, that defaults to on and
    controls the boost behavior.  Turning it off gives better performance in
    some workloads but needs more investigation.
  - While we're modifying sleepq, change signal and broadcast to both return
    with the lock held, as the lock was held on enter.
  Reviewed by: jhb, peter
* - Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. (jeff, 2007-12-15; 1 file, -25/+9)
    There is no longer an embedded lock_profile_object in each lock.  Instead
    a list of lock_profile_objects is kept per-thread for each lock it may
    own.  The cnt_hold statistic is now always 0 to facilitate this.
  - Support shared locking by tracking individual lock instances and
    statistics in the per-thread per-instance lock_profile_object.
  - Make the lock profiling hash table a per-cpu singly linked list with a
    per-cpu static lock_prof allocator.  This removes the need for an array of
    spinlocks and reduces cache contention between cores.
  - Use a separate hash for spinlocks and other locks so that only a
    critical_enter() is required, and not a spinlock_enter(), to modify the
    per-cpu tables.
  - Count time spent spinning in the lock statistics.
  - Remove the LOCK_PROFILE_SHARED option as it is always supported now.
  - Specifically drop and release the scheduler locks in both schedulers since
    we track owners now.
  In collaboration with: Kip Macy
  Sponsored by: Nokia
* Expand lock class with the "virtual" function lc_assert, which will offer a unified way for all the lock primitives to express lock assertions. (attilio, 2007-11-18; 1 file, -0/+9)
  Currently, lockmgrs and rmlocks don't have assertions, so just panic in that
  case.  This will be a base for more callout improvements.
  Ok'ed by: jhb, jeff
* Generally we are interested in what thread did something, as opposed to what process. (julian, 2007-11-14; 1 file, -1/+1)
  Since threads by default have the name of the process unless overwritten
  with more useful information, just print the thread name instead.
* Fix sx_try_slock(), so it only fails when there is an exclusive owner. (pjd, 2007-10-02; 1 file, -9/+12)
  Before this fix, it was possible for the function to fail if the number of
  sharers changed between the 'x = sx->sx_lock' step and the
  atomic_cmpset_acq_ptr() call.
  This fixes a ZFS problem where ZFS returns strange EIO errors under load.
  There is code in ZFS that depends on the fact that sx_try_slock() can only
  fail if there is an exclusive owner.
  Discussed with: attilio
  Reviewed by: jhb
  Approved by: re (kensmith)
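  A hedged sketch of the fixed behaviour (assertions, WITNESS and
  lock-profiling hooks from the real _sx_try_slock() are omitted): the CAS is
  retried whenever it fails merely because the sharer count moved, and the
  function gives up only when an exclusive owner is seen:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>
    #include <machine/atomic.h>

    static int
    try_slock_sketch(struct sx *sx)
    {
            uintptr_t x;

            for (;;) {
                    x = sx->sx_lock;
                    if ((x & SX_LOCK_SHARED) == 0)
                            return (0);     /* Exclusive owner: only failure. */
                    if (atomic_cmpset_acq_ptr(&sx->sx_lock, x,
                        x + SX_ONE_SHARER))
                            return (1);     /* Got a shared reference. */
                    /* Sharer count changed under us; simply retry. */
            }
    }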
* Fix some problems with lock_profiling in sx locks: (attilio, 2007-07-06; 1 file, -22/+32)
  - Adjust the semantics of the lock_profiling stubs in the hard functions in
    order to be more accurate and trustworthy.
  - Disable shared paths for lock_profiling.  Currently, lock_profiling has a
    subtle race which makes results coming from shared paths not completely
    trustworthy.  A macro stub (LOCK_PROFILING_SHARED) can be used to
    re-enable these paths, but it is currently intended for development use
    only.
  - Use homogeneous names for automatic variables in the hard functions
    regarding lock_profiling.
  - Style fixes.
  - Add a CTASSERT for some flags building.
  Discussed with: kmacy, kris
  Approved by: jeff (mentor)
  Approved by: re
* Add functions sx_xlock_sig() and sx_slock_sig(). (attilio, 2007-05-31; 1 file, -26/+62)
  These functions are intended to do the same actions as sx_xlock() and
  sx_slock(), but with the difference that they perform an interruptible
  sleep, so that the sleep can be interrupted by external events.
  In order to support these new features, some code restructuring is needed,
  but the external API won't be affected at all.
  Note: use a "void" cast for "int"-returning functions in order to keep
  tools like Coverity Prevent from whining.
  Requested by: rwatson
  Tested by: rwatson
  Reviewed by: jhb
  Approved by: jeff (mentor)
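  A hedged usage sketch: unlike sx_xlock(), the _sig variants return an error
  when the sleep is interrupted by a signal, so callers must be prepared to
  bail out.  The softc and field names here are hypothetical:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    struct example_softc {
            struct sx       sc_lock;
            int             sc_busy;
    };

    static int
    example_op(struct example_softc *sc)
    {
            int error;

            error = sx_xlock_sig(&sc->sc_lock);
            if (error != 0)
                    return (error);         /* Interrupted by a signal. */
            sc->sc_busy = 1;
            sx_xunlock(&sc->sc_lock);
            return (0);
    }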
* style(9) fixes for sx locks. (attilio, 2007-05-29; 1 file, -2/+2)
  Approved by: jeff (mentor)
* Add a small fix for lock profiling in sx locks. (attilio, 2007-05-29; 1 file, -1/+1)
  "0" cannot be a correct value: when the function is entered at least one
  shared holder must be present, and since we want the last one, "1" is the
  correct value.
  Note that lock_profiling for sx locks is far from perfect.  Expect further
  fixes for that.
  Approved by: jeff (mentor)
* Rename the macros for assertion flags passed to sx_assert() from SX_* to SA_* to match mutexes and rwlocks. (jhb, 2007-05-19; 1 file, -19/+19)
  The old flags still exist for backwards compatibility.
  Requested by: attilio
* Expose sx_xholder() as a public macro. (jhb, 2007-05-19; 1 file, -8/+0)
  It returns a pointer to the thread that holds the current exclusive lock,
  or NULL if no thread holds an exclusive lock.
  Requested by: pjd
* Oops, didn't include SX_ADAPTIVESPIN in the list of valid flags for the assert in sx_init_flags(). (jhb, 2007-05-19; 1 file, -1/+1)
  Submitted by: attilio
* Add a new SX_RECURSE flag to make support for recursive exclusive locks conditional. (jhb, 2007-05-19; 1 file, -2/+8)
  By default, sx(9) locks are back to not supporting recursive exclusive
  locks.
  Submitted by: attilio
* Fix a comment. (jhb, 2007-05-18; 1 file, -2/+2)
* Move lock_profile_object_{init,destroy}() into lock_{init,destroy}(). (jhb, 2007-05-18; 1 file, -2/+0)
* Add destroyed cookie values for sx locks and rwlocks as well as extra KASSERTs so that any lock operations on a destroyed lock will panic or hang. (jhb, 2007-05-08; 1 file, -1/+21)
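  A hedged sketch of the "destroyed cookie" idea: sx_destroy() leaves a
  recognizable poison value in the lock word, and the lock entry points
  assert against it so that use-after-destroy fails loudly.  The real code
  spreads such KASSERTs across all sx operations:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    static void
    destroy_sketch(struct sx *sx)
    {
            KASSERT(sx->sx_lock == SX_LOCK_UNLOCKED,
                ("sx_destroy() of a held lock %p", sx));
            sx->sx_lock = SX_LOCK_DESTROYED;        /* Poison the lock word. */
    }

    static void
    xlock_entry_check(struct sx *sx)
    {
            KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
                ("sx_xlock() of destroyed sx @ %p", sx));
    }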
* fix typo (kmacy, 2007-04-04; 1 file, -1/+1)
* Style fixes, and make sure that the lock is treated as released in the sharers == 0 case. (kmacy, 2007-04-04; 1 file, -4/+5)
  Note that this is somewhat racy because a new sharer can come in while
  we're updating stats.
* Fixes to sx for newsx - fix recursed case and move out of inline. (kmacy, 2007-04-03; 1 file, -8/+31)
  Submitted by: Attilio Rao <attilio@freebsd.org>
* Optimize sx locks to use simple atomic operations for the common cases of obtaining and releasing shared and exclusive locks. (jhb, 2007-03-31; 1 file, -230/+682)
  The algorithms for manipulating the lock cookie are very similar to those
  of rwlocks.  This patch also adds support for exclusive locks using the
  same algorithm as mutexes.
  A new sx_init_flags() function has been added so that optional flags can be
  specified to alter a given lock's behavior.  The flags include SX_DUPOK,
  SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET, which are all identical in nature
  to the similar flags for mutexes.
  Adaptive spinning on select locks may be enabled by enabling the
  ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
  flag via sx_init_flags() will adaptively spin.
  The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
  are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
  requires <sys/lock.h> to be included prior to <sys/sx.h>.
  The new kernel option SX_NOINLINE can be used to disable the aforementioned
  inlining in non-debug kernels.
  The size of struct sx has changed, so the kernel ABI is probably greatly
  disturbed.
  MFC after: 1 month
  Submitted by: attilio
  Tested by: kris, pjd
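  A hedged sketch of the single-word "lock cookie" fast paths described
  above: an unowned lock holds SX_LOCK_UNLOCKED, an exclusive owner stores
  its thread pointer (plus flag bits), and sharers are counted in the upper
  bits.  The real inlines in <sys/sx.h> also handle WITNESS, profiling and
  the direct hand-off to the hard functions, all omitted here:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/proc.h>
    #include <sys/sx.h>
    #include <machine/atomic.h>

    static void
    xlock_fast_path(struct sx *sx)
    {
            uintptr_t tid = (uintptr_t)curthread;

            /* Uncontested acquire: swing UNLOCKED -> curthread in one CAS. */
            if (!atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid))
                    sx_xlock(sx);   /* Contested: take the ordinary path. */
    }

    static void
    xunlock_fast_path(struct sx *sx)
    {
            uintptr_t tid = (uintptr_t)curthread;

            /* Uncontested release: swing curthread -> UNLOCKED in one CAS. */
            if (!atomic_cmpset_rel_ptr(&sx->sx_lock, tid, SX_LOCK_UNLOCKED))
                    sx_xunlock(sx); /* Waiters present: ordinary path. */
    }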
* Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'. (jhb, 2007-03-21; 1 file, -41/+41)
* Add two new function pointers, 'lc_lock' and 'lc_unlock', to lock classes. (jhb, 2007-03-09; 1 file, -0/+33)
  These functions are intended to be used to drop a lock and then reacquire
  it when doing a sleep such as msleep(9).  Both functions accept a
  'struct lock_object *' as their first parameter.  The 'lc_unlock' function
  returns an integer that is then passed as the second parameter to the
  subsequent 'lc_lock' function.  This can be used to communicate state; for
  example, sx locks and rwlocks use this to indicate whether the lock was
  share/read locked vs exclusive/write locked.
  Currently, spin mutexes and lockmgr locks do not provide working lc_lock
  and lc_unlock functions.
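  A hedged sketch of how the sx class can back the two hooks (simplified from
  the kern_sx.c implementations; assertions and later signature changes are
  omitted): lc_unlock reports whether the lock was held exclusively, and
  lc_lock uses that value to restore the same mode after the sleep:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/proc.h>
    #include <sys/sx.h>

    static int
    sx_class_unlock(struct lock_object *lock)
    {
            struct sx *sx = (struct sx *)lock;

            if (sx_xlocked(sx)) {
                    sx_xunlock(sx);
                    return (1);     /* Was exclusive/write locked. */
            }
            sx_sunlock(sx);
            return (0);             /* Was shared/read locked. */
    }

    static void
    sx_class_lock(struct lock_object *lock, int how)
    {
            struct sx *sx = (struct sx *)lock;

            if (how)
                    sx_xlock(sx);
            else
                    sx_slock(sx);
    }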
* Use C99-style struct member initialization for lock classes. (jhb, 2007-03-09; 1 file, -3/+3)
* Lock stats updates need to be protected by the lock. (kmacy, 2007-03-02; 1 file, -24/+3)
* Evidently I've overestimated gcc's ability to peek inside inline functions and optimize away unused stack values. (kmacy, 2007-03-01; 1 file, -2/+8)
  The 48 bytes that the lock_profile_object adds to the stack evidently have
  a measurable performance impact on certain workloads.
* Further improvements to LOCK_PROFILING: (kmacy, 2007-02-27; 1 file, -5/+20)
  - Fix missing initialization in kern_rwlock.c causing bogus times to be
    collected.
  - Move updates to the lock hash to after the lock is released for spin
    mutexes, sleep mutexes, and sx locks.
  - Add new kernel build option LOCK_PROFILE_FAST - only update lock
    profiling statistics when an acquisition is contended.  This reduces the
    overhead of LOCK_PROFILING to increasing system time by 20%-25%, which on
    "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
    of wall-clock time.  Contrast this to enabling lock profiling without
    LOCK_PROFILE_FAST, where I see a 5x-6x slowdown in wall-clock time.
* General LOCK_PROFILING cleanup. (kmacy, 2007-02-26; 1 file, -10/+6)
  - Only collect timestamps when a lock is contested - this reduces the
    overhead of collecting profiles from 20x to 5x.
  - Remove an unused function from subr_lock.c.
  - Generalize cnt_hold and cnt_lock statistics to be kept for all locks.
  - NOTE: rwlock profiling generates invalid statistics (and most likely
    always has); someone familiar with that should review.
* Track lock class name in a way that doesn't break WITNESS. (kmacy, 2006-11-13; 1 file, -1/+1)
* MUTEX_PROFILING has been generalized to LOCK_PROFILING. (kmacy, 2006-11-11; 1 file, -1/+21)
  We now profile wait (time waited to acquire) and hold times for *all*
  kernel locks.  If the architecture has a system-synchronized TSC, the
  profiling code will use that, thereby minimizing profiling overhead.  Large
  chunks of profiling code have been moved out of line; the overhead measured
  on the T1 when it is compiled in but not enabled is < 1%.
  Approved by: scottl (standing in for mentor rwatson)
  Reviewed by: des and jhb
* Add a new 'show sleepchain' ddb command similar to 'show lockchain', except that it operates on lockmgr and sx locks. (jhb, 2006-08-15; 1 file, -1/+54)
  This can be useful for tracking down vnode deadlocks in VFS, for example.
  Note that this command is a bit more fragile than 'show lockchain', as we
  have to poke around at the wait channel of a thread to see if it points to
  either a struct lock or a condition variable inside of a struct sx.  If
  td_wchan points to something unmapped, then this command will terminate
  early due to a fault, but no harm will be done.
* Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is a count of all non-spin locks, not just lockmgr locks. (jhb, 2006-07-27; 1 file, -0/+6)
  This can give us a much cheaper way to see if we have any locks held (such
  as when returning to userland via userret()) without requiring WITNESS.
  MFC after: 1 week
* Add a new file (kern/subr_lock.c) for holding code related to struct lock_obj objects: (jhb, 2006-01-17; 1 file, -17/+4)
  - Add new lock_init() and lock_destroy() functions to set up and tear down
    lock_object objects, including KTR logging and registering with WITNESS.
  - Move all the handling of LO_INITIALIZED out of witness and the various
    lock init functions into lock_init() and lock_destroy().
  - Remove the constants for static indices into the lock_classes[] array and
    change the code outside of subr_lock.c to use LOCK_CLASS to compare
    against a known lock class.
  - Move the 'show lock' ddb function and lock_classes[] array out of
    kern_mutex.c over to subr_lock.c.
* Trim another pointer from struct lock_object (and thus from struct mtx and struct sx). (jhb, 2006-01-06; 1 file, -2/+2)
  Instead of storing a direct pointer to our lock_class struct in
  lock_object, reserve 4 bits in the lo_flags field to serve as an index into
  a global lock_classes array that contains pointers to the lock classes.
  Only debugging code such as WITNESS or INVARIANTS checks and KTR logging
  need to access the lock_class member, so this shouldn't add any overhead to
  production kernels.  It might add some slight overhead to kernels using
  those debug options, however.
  As with the previous set of changes to lock_object, this is going to
  completely obliterate the kernel ABI, so be sure to recompile all your
  modules.
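  A hedged sketch of the encoding described above, with illustrative (not the
  actual) shift/mask constants: a 4-bit field in lo_flags indexes a global
  table of lock class pointers, so lock_object no longer carries a lock_class
  pointer of its own:

    #include <sys/param.h>
    #include <sys/lock.h>

    /* Hypothetical constants; the real ones live in <sys/lock.h>. */
    #define LO_CLASSSHIFT_SKETCH    28
    #define LO_CLASSMASK_SKETCH     (0xfU << LO_CLASSSHIFT_SKETCH)

    /* Global table of lock classes; the 4-bit index selects an entry. */
    extern struct lock_class *lock_classes[];

    static struct lock_class *
    lock_class_of(struct lock_object *lo)
    {
            u_int idx;

            idx = (lo->lo_flags & LO_CLASSMASK_SKETCH) >>
                LO_CLASSSHIFT_SKETCH;
            return (lock_classes[idx]);
    }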