path: root/sys/kern/kern_mutex.c
Commit message; each entry ends with [Author, Age, Files, Lines changed].
...
* Mark the thread pointer used during an adaptive spin volatile so that the
  compiler doesn't decide to cache td_state. Caching the state would cause
  the spinning thread to not notice when the owning thread stopped executing
  (if it was preempted, for example), which could result in livelock.
  [jhb, 2006-04-14, 1 file, -1/+1]
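  A minimal sketch of the loop in question (mtx_owner(), TD_IS_RUNNING() and
  cpu_spinwait() are real kernel macros; the surrounding slow-path logic is
  simplified away):

    volatile struct thread *owner;  /* volatile: re-read td_state each pass */

    owner = mtx_owner(m);
    while (mtx_owner(m) == owner && TD_IS_RUNNING(owner))
        cpu_spinwait();  /* owner still on a CPU; spin rather than block */

  Without the volatile qualifier, the compiler may hoist the td_state load
  out of the loop, so a preempted owner would never be noticed.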
* - Add support for having both a shared and exclusive queue of threads in
    each turnstile. Also, allow the owner thread pointer of a turnstile to
    be NULL. This is needed for the upcoming reader/writer lock
    implementation.
  - Add a new ddb command 'show turnstile' that will look up the turnstile
    associated with the given lock argument and display useful information
    like the list of threads blocked on each queue, etc. If there isn't an
    active turnstile for a lock at the specified address, the function
    checks whether the address itself points at an active turnstile and, if
    so, displays info about it.
  - Adjust the mutex code to handle the turnstile API changes.
  Tested on: i386 (all), alpha, amd64, sparc64 (1 and 3)
  [jhb, 2006-01-27, 1 file, -5/+6]
* Whitespace fix.
  [jhb, 2006-01-24, 1 file, -1/+1]
* Add a new file (kern/subr_lock.c) for holding code related to struct
  lock_object objects:
  - Add new lock_init() and lock_destroy() functions to set up and tear
    down lock_object objects, including KTR logging and registering with
    WITNESS.
  - Move all the handling of LO_INITIALIZED out of witness and the various
    lock init functions into lock_init() and lock_destroy().
  - Remove the constants for static indices into the lock_classes[] array
    and change the code outside of subr_lock.c to use LOCK_CLASS to compare
    against a known lock class.
  - Move the 'show lock' ddb function and lock_classes[] array out of
    kern_mutex.c over to subr_lock.c.
  [jhb, 2006-01-17, 1 file, -56/+28]
* Initialize thread0.td_contested in init_turnstiles() rather than
  mutex_init(), as it is used by the turnstile code and is not
  mutex-specific.
  [jhb, 2006-01-17, 1 file, -3/+0]
* If destroying a spinlock, make sure that it is exited properly.
  Submitted by: jhb
  MFC after: 3 days
  [scottl, 2006-01-08, 1 file, -0/+4]
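  A sketch of the fix's intent, using macro and member names from the
  subr_lock.c cleanup above (the exact code of this revision differed):

    if (mtx_owned(m)) {
        /* Do the non-mutex part of mtx_unlock_spin(). */
        if (LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin)
            spinlock_exit();  /* balance the spinlock_enter() */
    }
    lock_destroy(&m->lock_object);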
* Revert an untested local change that crept in with the lo_class changes
  and subsequently broke the build. This change was supposed to fix the
  case where doing a mtx_destroy() of a spin mutex while you hold it fails.
  If it had been tested I would just leave it in, but it hasn't been tested
  yet, so it will have to wait until later.
  [jhb, 2006-01-07, 1 file, -4/+0]
* Trying to fix compilation bustage introduced in rev 1.160 by converting
  a missing lo_class to LO_CLASSINDEX().
  [avatar, 2006-01-07, 1 file, -1/+1]
* Trim another pointer from struct lock_object (and thus from struct mtx
  and struct sx). Instead of storing a direct pointer to our lock_class
  struct in lock_object, reserve 4 bits in the lo_flags field to serve as
  an index into a global lock_classes array that contains pointers to the
  lock classes. Only debugging code such as WITNESS or INVARIANTS checks
  and KTR logging needs to access the lock_class member, so this shouldn't
  add any overhead to production kernels. It might add some slight overhead
  to kernels using those debug options, however. As with the previous set
  of changes to lock_object, this is going to completely obliterate the
  kernel ABI, so be sure to recompile all your modules.
  [jhb, 2006-01-06, 1 file, -15/+28]
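  A plausible sketch of the encoding (the shift and mask values here are
  illustrative, not necessarily the ones the commit chose):

    #define LO_CLASSSHIFT   16
    #define LO_CLASSMASK    (0xf << LO_CLASSSHIFT)  /* 4 bits of lo_flags */
    #define LO_CLASSINDEX(lo) \
            (((lo)->lo_flags & LO_CLASSMASK) >> LO_CLASSSHIFT)

    struct lock_class *lock_classes[] = {
        &lock_class_mtx_sleep,
        &lock_class_mtx_spin,
        &lock_class_sx,
    };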
* Add a new 'show lock' command to ddb. If the argument has a valid lock
  class, then it displays various information about the lock and calls a
  new function pointer in lock_class (lc_ddb_show) to dump class-specific
  information about the lock as well (such as the owner of a mutex or
  xlock'ed sx lock). This is easier than staring at hex dumps of locks to
  figure out who owns the lock, etc. Note that extending lock_class doesn't
  affect the ABI for any kernel modules, as the only code that deals with
  lock_class structures directly is kern_mutex.c, kern_sx.c, and witness.
  MFC after: 1 week
  [jhb, 2005-12-13, 1 file, -2/+73]
* Move the initialization of the devmtx into the mutex_init() function
  called during early init before cninit().
  Tested on: i386, alpha, sparc64
  Reviewed by: phk, imp
  Reported by: Divacky Roman xdivac02 at stud dot fit dot vutbr dot cz
  MFC after: 1 week
  [jhb, 2005-10-18, 1 file, -0/+3]
* - Add an assertion to panic if one tries to call mtx_trylock() on a spin
    mutex.
  - Don't panic if a spin lock is held too long inside _mtx_lock_spin() if
    panicstr is set (meaning that we are already in a panic). Just keep
    spinning forever instead.
  [jhb, 2005-09-02, 1 file, -1/+4]
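  A condensed sketch of the panicstr check described above (the real
  _mtx_lock_spin() loop has more structure; the spin count is
  illustrative):

    int i = 0;

    while (m->mtx_lock != MTX_UNOWNED) {
        if (i++ < 10000000) {
            cpu_spinwait();
            continue;
        }
        if (panicstr != NULL)
            continue;  /* already panicking: spin forever, don't re-panic */
        panic("spin lock held too long");
    }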
* Ignore mutex asserts when we're dumping as well. This allows me
  to panic a system from DDB when INVARIANTS is compiled into the kernel
  on a scsi system.
  [ps, 2005-07-30, 1 file, -1/+2]
* Convert the atomic_ptr() operations over to operating on uintptr_t
  variables rather than void * variables. This makes it easier and simpler
  to get asm constraints and volatile keywords correct.
  MFC after: 3 days
  Tested on: i386, alpha, sparc64
  Compiled on: ia64, powerpc, amd64
  Kernel toolchain busted on: arm
  [jhb, 2005-07-15, 1 file, -10/+9]
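  The shape of the mutex fast path this gives (atomic_cmpset_acq_ptr() is
  the real primitive; the surrounding function is simplified):

    uintptr_t tid = (uintptr_t)curthread;  /* owner encoded as an integer */

    /* Try MTX_UNOWNED -> tid with acquire semantics. */
    if (atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED, tid))
        return;  /* uncontested acquire */
    /* ...otherwise fall through to the contested slow path. */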
* Add an additional newline to the debug.mutex.prof.stats header, so that
  column names are printed exactly above the columns.
  [glebius, 2005-04-08, 1 file, -1/+1]
* Divorce critical sections from spinlocks. Critical sections as denoted by
  critical_enter() and critical_exit() are now solely a mechanism for
  deferring kernel preemptions. They no longer have any effect on
  interrupts. This means that standalone critical sections are now very
  cheap, as they are simply unlocked integer increments and decrements for
  the common case.

  Spin mutexes now use a separate KPI implemented in MD code:
  spinlock_enter() and spinlock_exit(). This KPI is responsible for
  providing whatever MD guarantees are needed to ensure that a thread
  holding a spin lock won't be preempted by any other code that will try to
  lock the same lock. For now all archs continue to block interrupts in a
  "spinlock section" as they did formerly in all critical sections.

  Note that I've also taken this opportunity to push a few things into MD
  code rather than MI. For example, critical_fork_exit() no longer exists.
  Instead, MD code ensures that new threads have the correct state when
  they are created. Also, we no longer try to fix up the idle threads for
  APs in MI code. Instead, each arch sets the initial curthread and adjusts
  the state of the idle thread it borrows in order to perform the initial
  context switch.

  This change is largely a big NOP, but the cleaner separation it provides
  will allow for more efficient alternative locking schemes in other parts
  of the kernel (bare critical sections rather than per-CPU spin mutexes
  for per-CPU data, for example).
  Reviewed by: grehan, cognet, arch@, others
  Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
  [jhb, 2005-04-04, 1 file, -2/+2]
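  A sketch of the two KPIs side by side (td_critnest is the real per-thread
  nesting counter; the comments describe behavior at the time, not a
  guarantee):

    /* Bare critical section: cheap, only defers preemption. */
    critical_enter();   /* effectively curthread->td_critnest++ */
    /* ... touch strictly per-CPU state ... */
    critical_exit();    /* may perform a deferred preemption */

    /* Spinlock section: what spin mutexes are built on. */
    spinlock_enter();   /* MD; for now also blocks interrupts */
    /* ... touch state shared with interrupt handlers ... */
    spinlock_exit();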
* Rework the optimization for spinlocks on UP to be slightly less drastic
  and turn it back on. Specifically, the actual changes are now less
  intrusive in that the _get_spin_lock() and _rel_spin_lock() macros now
  have their contents changed for UP vs SMP kernels, which centralizes the
  changes. Also, UP kernels do not use _mtx_lock_spin() and no longer
  include it. The UP versions of the spin lock functions do not use any
  atomic operations, but simple compares and stores, which allow
  mtx_owned() to still work for spin locks while removing the overhead of
  atomic operations.
  Tested on: i386, alpha
  [jhb, 2005-01-05, 1 file, -8/+2]
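  A sketch of the UP-only macro contents (recursion handling elided; this
  revision predates spinlock_enter()/spinlock_exit(), introduced by the
  critical-section divorce above, so critical sections are used here):

    #define _get_spin_lock(m, tid, opts, file, line) do {   \
        critical_enter();                                   \
        KASSERT((m)->mtx_lock == MTX_UNOWNED,               \
            ("spin lock recursed"));                        \
        (m)->mtx_lock = (uintptr_t)(tid);                   \
    } while (0)

    #define _rel_spin_lock(m) do {                          \
        (m)->mtx_lock = MTX_UNOWNED;                        \
        critical_exit();                                    \
    } while (0)

  Because mtx_lock still records the owning thread with a plain store,
  mtx_owned() keeps working even though no atomic operation is performed.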
* Refine the turnstile and sleep queue interfaces just a bit:
  - Add a new _lock() call to each API that locks the associated chain lock
    for a lock_object pointer or wait channel. The _lookup() functions now
    require that the chain lock be locked via _lock() when they are called.
  - Change sleepq_add(), turnstile_wait() and turnstile_claim() to look up
    the associated queue structure internally via _lookup() rather than
    accepting a pointer from the caller. For turnstiles, this means that
    the actual lookup of the turnstile in the hash table is only done when
    the thread actually blocks rather than being done on each loop
    iteration in _mtx_lock_sleep(). For sleep queues, this means that
    sleepq_lookup() is no longer used outside of the sleep queue code
    except to implement an assertion in cv_destroy().
  - Change sleepq_broadcast() and sleepq_signal() to require that the chain
    lock is already held. For condition variables, this lets the
    cv_broadcast() and cv_signal() functions lock the sleep queue chain
    lock while testing the waiters count. This means that the waiters count
    internal to condition variables is no longer protected by the interlock
    mutex, and cv_broadcast() and cv_signal() no longer require that the
    interlock be held when they are called. This lets consumers of
    condition variables drop the lock before waking other threads, which
    can result in fewer context switches.
  MFC after: 1 month
  [jhb, 2004-10-12, 1 file, -5/+4]
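  A sketch of the resulting protocol, modeled on wakeup() (the sleepq
  signatures shown follow later FreeBSD and are approximate for 2004):

    sleepq_lock(wchan);                           /* take the chain lock */
    sleepq_broadcast(wchan, SLEEPQ_SLEEP, 0, 0);  /* wake with it held */
    sleepq_release(wchan);                        /* drop the chain lock */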
* Force MUTEX_WAKE_ALL.
  A race condition in single-thread wakeup may break priority inheritance.
  Tested by: pho
  Reviewed by: jhb, julian
  Approved by: sam (mentor)
  MFC: ASAP
  [ups, 2004-10-12, 1 file, -0/+9]
* Turn PREEMPTION into a kernel option. Make sure that it's defined if
  FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
  enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
  is a possible MT5 candidate.
  [scottl, 2004-09-02, 1 file, -0/+1]
* add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of
  the mutex profiling buffers. Document them in the man page and in NOTES.
  Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.
  [jmg, 2004-08-19, 1 file, -0/+10]
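  A plausible sketch of the compile-time guard (the default values are
  illustrative):

    #ifndef MPROF_BUFFERS
    #define MPROF_BUFFERS   1000
    #endif
    #ifndef MPROF_HASH_SIZE
    #define MPROF_HASH_SIZE 1009
    #endif
    #if MPROF_HASH_SIZE <= MPROF_BUFFERS
    #error MPROF_HASH_SIZE must be larger than MPROF_BUFFERS
    #endif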
* Cache the value of curthread in the _get_sleep_lock() and
  _get_spin_lock() macros and pass the value to the associated _mtx_*()
  functions to avoid more curthread dereferences in the function
  implementations. This provided a very modest perf improvement in some
  benchmarks.
  Suggested by: rwatson
  Tested by: scottl
  [jhb, 2004-08-04, 1 file, -4/+5]
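  A sketch of the idea (argument lists approximate for this revision):

    #define _get_sleep_lock(m, tid, opts, file, line) do {            \
        struct thread *_tid = (tid);  /* evaluate curthread once */   \
        if (!_obtain_lock((m), (_tid)))                               \
            _mtx_lock_sleep((m), (_tid), (opts), (file), (line));     \
    } while (0)

    /* call site, expanded from mtx_lock(): */
    _get_sleep_lock(m, curthread, 0, LOCK_FILE, LOCK_LINE);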
* Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
  being defined, define and use a new MD macro, cpu_spinwait(). It only
  expands to something on i386 and amd64, so the compiled code should be
  identical.
  Name of the macro found by: jhb
  Reviewed by: jhb
  [mux, 2004-08-03, 1 file, -15/+5]
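  A sketch of the MD definitions (the i386 expansion matches the commit's
  description; ia32_pause() issues the PAUSE instruction):

    #if defined(__i386__) || defined(__amd64__)
    #define cpu_spinwait()  ia32_pause()
    #else
    #define cpu_spinwait()  /* nothing */
    #endif

    /* typical use in a spin loop: */
    while (m->mtx_lock != MTX_UNOWNED)
        cpu_spinwait();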
* Add "options ADAPTIVE_GIANT" which causes Giant to also be treated inrwatson2004-07-271-0/+4
| | | | | | | | | | an adaptive fashion when adaptive mutexes are enabled. The theory behind non-adaptive Giant is that Giant will be held for long periods of time, and therefore spinning waiting on it is wasteful. However, in MySQL benchmarks which are relatively Giant-free, running Giant adaptive makes an observable difference on SMP (5% transaction rate improvement). As such, make adaptive behavior on Giant an option so it can be more widely benchmarked.
* #ifdef __i386__ -> __i386__ || __amd64__
  [peter, 2004-07-20, 1 file, -5/+5]
* Now we have the NO_ADAPTIVE_MUTEXES option, so use it here too.
  Missed by: scottl
  [pjd, 2004-07-18, 1 file, -1/+1]
* Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to
  NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for
  quite some time, and has been extensively tested on i386 and sparc64. It
  shows measurable performance gains in many circumstances, and few
  negative effects. It would be nice in the future if adaptive mutexes
  actually went to sleep after a certain amount of spinning, but that will
  require quite a bit more testing.
  [scottl, 2004-07-18, 1 file, -3/+3]
* Update for the KDB framework:
  o Make debugging code conditional upon KDB instead of DDB.
  o Call kdb_enter() instead of Debugger().
  o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

  kern_mutex.c:
  o Replace checks for db_active with checks for kdb_active and make them
    unconditional.

  kern_shutdown.c:
  o s/DDB_UNATTENDED/KDB_UNATTENDED/g
  o s/DDB_TRACE/KDB_TRACE/g
  o Save the TID of the thread doing the kernel dump so the debugger knows
    which thread to select as the current when debugging the kernel core
    file.
  o Clear kdb_active instead of db_active and do so unconditionally.
  o Remove backtrace() implementation.

  kern_synch.c:
  o Call kdb_reenter() instead of db_error().
  [marcel, 2004-07-10, 1 file, -5/+2]
* Implement preemption of kernel threads natively in the scheduler rather
  than as one-off hacks in various other parts of the kernel:
  - Add a function maybe_preempt() that is called from sched_add() to
    determine if a thread about to be added to a run queue should be
    preempted to directly. If it is not safe to preempt or if the new
    thread does not have a high enough priority, then the function returns
    false and sched_add() adds the thread to the run queue. If the thread
    should be preempted to but the current thread is in a nested critical
    section, then the flag TDF_OWEPREEMPT is set and the thread is added to
    the run queue. Otherwise, mi_switch() is called immediately and the
    thread is never added to the run queue since it is switched to
    directly. When exiting an outermost critical section, if TDF_OWEPREEMPT
    is set, then clear it and call mi_switch() to perform the deferred
    preemption.
  - Remove explicit preemption from ithread_schedule() as calling
    setrunqueue() now does all the correct work. This also removes the
    do_switch argument from ithread_schedule().
  - Do not use the manual preemption code in mtx_unlock if the architecture
    supports native preemption.
  - Don't call mi_switch() in a loop during shutdown to give ithreads a
    chance to run if the architecture supports native preemption since the
    ithreads will just preempt DELAY().
  - Don't call mi_switch() from the page zeroing idle thread for
    architectures that support native preemption as it is unnecessary.
  - Native preemption is enabled on the same archs that supported ithread
    preemption, namely alpha, i386, and amd64.

  This change should largely be a NOP for the default case as committed
  except that we will do fewer context switches in a few cases and will
  avoid the run queues completely when preempting.
  Approved by: scottl (with his re@ hat)
  [jhb, 2004-07-02, 1 file, -0/+6]
* - Change mi_switch() and sched_switch() to accept an optional thread to
    switch to. If a non-NULL thread pointer is passed in, then the CPU will
    switch to that thread directly rather than calling choosethread() to
    pick a thread to switch to.
  - Make sched_switch() aware of idle threads and know to do
    TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
    requiring all callers of mi_switch() to know to do this if they can be
    called from an idlethread.
  - Move constants for arguments to mi_switch() and thread_single() out of
    the middle of the function prototypes and up above into their own
    section.
  [jhb, 2004-07-02, 1 file, -1/+1]
* Add a new kernel option MUTEX_WAKE_ALL that changes the mutex unlock code
  to awaken all waiters when a contested mutex is released instead of just
  the highest priority waiter. If the various threads are awakened in
  sequence, then each thread may acquire and release the lock in question
  without contention, resulting in fewer expensive unlock and lock
  operations. The old behavior of waking just the highest priority waiter
  is still used if this option is not specified. Making the algorithm
  conditional on a kernel option will allow us to benchmark both cases
  later and determine which one should be used by default.
  Requested by: tanimura-san
  [jhb, 2004-04-06, 1 file, -0/+10]
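  A sketch of the policy switch in the unlock slow path (the turnstile
  calls are shown without the queue arguments later versions added):

    #ifdef MUTEX_WAKE_ALL
        turnstile_broadcast(ts);  /* wake every waiter */
    #else
        turnstile_signal(ts);     /* wake only the highest-priority waiter */
    #endif
        turnstile_unpend(ts);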
* Add a reset sysctl for mutex profiling: zeros all of the mutex profiling
  buffers and hash table. This makes it a lot easier to do multiple
  profiling runs without rebooting or performing gratuitous arithmetic. The
  sysctl is named debug.mutex.prof.reset.
  Reviewed by: jake
  [rwatson, 2004-01-28, 1 file, -0/+27]
* Rework witness_lock() to make it slightly more useful and flexible:
  - witness_lock() is split into two pieces: witness_checkorder() and
    witness_lock(). witness_checkorder() determines if acquiring a
    specified lock at the time it is called would result in a lock order
    violation. It optionally adds a new lock order relationship as well.
    witness_lock() updates witness's data structures to assume that a lock
    has been acquired by sticking a new lock instance in the appropriate
    lock instance list.
  - The mutex and sx lock functions now call witness_checkorder() prior to
    trying to acquire a lock and continue to call witness_lock() after the
    acquire is completed. This will let witness catch a deadlock before it
    happens rather than trying to do so after the threads have deadlocked
    (i.e. never actually report it).
  - A new function witness_defineorder() has been added that adds a lock
    order between two locks at runtime without having to acquire the locks.
    If the lock order cannot be added it will return an error. This
    function is available to programmers via the WITNESS_DEFINEORDER()
    macro, which accepts either two mutexes or two sx locks as its
    arguments.
  - A few simple wrapper macros were added to allow developers to call
    witness_checkorder() anywhere as a way of enforcing locking assertions
    in code that might acquire a certain lock in some situations. The
    macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an
    appropriate lock as the sole argument.
  - The code to remove a lock instance from a lock list in witness_unlock()
    was unnested by using a goto to vastly improve the readability of this
    function.
  [jhb, 2004-01-28, 1 file, -0/+4]
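  A sketch of the resulting acquire sequence (member names follow the later
  struct lock_object layout; argument lists are approximate):

    /* 1. Check, and optionally record, the order before acquiring. */
    WITNESS_CHECKORDER(&m->lock_object, LOP_NEWORDER | LOP_EXCLUSIVE,
        file, line);
    /* 2. Actually acquire the lock. */
    __mtx_lock(m, curthread, opts, file, line);
    /* 3. Record the held instance in the thread's lock list. */
    WITNESS_LOCK(&m->lock_object, LOP_EXCLUSIVE, file, line);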
* - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
    SW_INVOL. Assert that one of these is set in mi_switch() and properly
    adjust the rusage statistics. This is to simplify the large number of
    users of this interface which were previously all required to adjust
    the proper counter prior to calling mi_switch(). This also facilitates
    more switch and locking optimizations.
  - Change all callers of mi_switch() to pass the appropriate parameter and
    remove direct references to the process statistics.
  [jeff, 2004-01-25, 1 file, -2/+1]
* Add some basic support for measuring sleep mutex contention to the mutex
  profiling code. As with existing mutex profiling, measurement is done
  with respect to mtx_lock() instances in the code, as opposed to specific
  mutexes. In particular, measure two things:

  (1) Lock contention. How often did this mtx_lock() call get made and
      have to sleep (or almost sleep) waiting for the lock. This helps
      identify the "victims" of contention.
  (2) Hold contention. How often, while the lock was held by a thread as
      a result of this mtx_lock(), did another thread try to acquire the
      same mutex. This helps identify the causes of contention.

  I'm currently exploring adding measurement of "time waited for the
  lock", but the current implementation has proven useful to me so far, so
  I figured I'd commit it so others could try it out. Note that this
  increases the size of mutexes when MUTEX_PROFILING is enabled, so you
  might find you need to further bump UMA_BOOT_PAGES. Fixes welcome.
  The once over: des, others
  [rwatson, 2004-01-25, 1 file, -5/+33]
* - Allow mtx_trylock() to recurse on a recursive mutex. Attempts to
    recurse on a non-recursive mutex will fail, but will not trigger any
    assertions.
  - Add an assertion to mtx_lock() that one never recurses on a
    non-recursive mutex. This is mostly useful for the non-WITNESS case.
  Requested by: deischen, julian, others (1)
  [jhb, 2004-01-05, 1 file, -5/+11]
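  A short example of the semantics described above (mtx_init() and friends
  are the standard mutex(9) API):

    struct mtx rec_mtx, plain_mtx;

    mtx_init(&rec_mtx, "rec", NULL, MTX_DEF | MTX_RECURSE);
    mtx_init(&plain_mtx, "plain", NULL, MTX_DEF);

    mtx_lock(&rec_mtx);
    if (mtx_trylock(&rec_mtx))       /* succeeds: the lock recurses */
        mtx_unlock(&rec_mtx);
    mtx_unlock(&rec_mtx);

    mtx_lock(&plain_mtx);
    if (mtx_trylock(&plain_mtx) == 0)
        ;                            /* fails quietly: no assertion fires */
    mtx_unlock(&plain_mtx);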
* Add an implementation of turnstiles and change the sleep mutex code to
  use turnstiles to implement blocking instead of implementing a thread
  queue directly. These turnstiles are somewhat similar to those used in
  Solaris 7 as described in Solaris Internals, but are also different.

  Turnstiles do not come out of a fixed-size pool. Rather, each thread is
  assigned a turnstile when it is created that it frees when it is
  destroyed. When a thread blocks on a lock, it donates its turnstile to
  that lock to serve as a queue of blocked threads. The queue associated
  with a given lock is found by a lookup in a simple hash table. The
  turnstile itself is protected by a lock associated with its entry in the
  hash table. This means that sched_lock is no longer needed to contest on
  a mutex. Instead, sched_lock is only used when manipulating run queues or
  thread priorities. Turnstiles also implement priority propagation
  inherently.

  Currently turnstiles only support mutexes. Eventually, however,
  turnstiles may grow two queues to support a non-sleepable reader/writer
  lock implementation. For more details, see the comments in
  sys/turnstile.h and kern/subr_turnstile.c.

  The two primary advantages of the turnstile code are: 1) the size of
  struct mutex shrinks by four pointers as it no longer stores the thread
  queue linkages directly, and 2) less contention on sched_lock in SMP
  systems, including the ability for multiple CPUs to contend on different
  locks simultaneously (not that this last detail is necessarily that much
  of a big win).

  Note that 1) means that this commit is a kernel ABI breaker, so don't mix
  old modules with a new kernel and vice versa.
  Tested on: i386 SMP, sparc64 SMP, alpha SMP
  [jhb, 2003-11-11, 1 file, -225/+39]
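  A sketch of the hash lookup described above, modeled on
  kern/subr_turnstile.c (the constants are illustrative):

    #define TC_TABLESIZE    128   /* must be a power of two */
    #define TC_MASK         (TC_TABLESIZE - 1)
    #define TC_SHIFT        8
    #define TC_HASH(lock)   (((uintptr_t)(lock) >> TC_SHIFT) & TC_MASK)
    #define TC_LOOKUP(lock) (&turnstile_chains[TC_HASH(lock)])

    struct turnstile_chain {
        LIST_HEAD(, turnstile) tc_turnstiles;  /* donated turnstiles */
        struct mtx tc_lock;                    /* protects this chain */
    };

    static struct turnstile_chain turnstile_chains[TC_TABLESIZE];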
* If a spin lock is held for too long and WITNESS is enabled, then call
  witness_display_spinlock() to see if we can find out where the current
  owner of the spin lock last acquired the lock.
  [jhb, 2003-07-31, 1 file, -3/+9]
* When complaining about a sleeping thread owning a mutex, display the
  thread's pid to make debugging easier for people who don't want to have
  to use the intended tool for these panics (witness).
  Indirectly prodded by: kris
  [jhb, 2003-07-30, 1 file, -1/+3]
* - Add comments about the maintenance of the per-thread list of contested
    locks held by each thread.
  - Fix a bug in the original BSD/OS code where a contested lock was not
    properly handed off from the old thread to the new thread when a
    contested lock with more than one blocked thread was transferred from
    one thread to another.
  - Don't use an atomic operation to write the MTX_CONTESTED value to
    mtx_lock in the aforementioned special case. The memory barriers and
    exclusion provided by sched_lock are sufficient.
  Spotted by: alc (2)
  [jhb, 2003-07-02, 1 file, -4/+9]
* Use __FBSDID().
  [obrien, 2003-06-11, 1 file, -1/+3]
* Add "" around mutex name to make message less confusing.phk2003-05-311-1/+1
|
* Use TD_IS_RUNNING() instead of thread_running() in the adaptive mutex
  code.
  [jhb, 2003-04-17, 1 file, -7/+2]
* Move the _oncpu entry from the KSE to the thread.
  The entry in the KSE still exists, but its purpose will change a bit when
  we add the ability to lock a KSE to a cpu.
  [julian, 2003-04-10, 1 file, -1/+2]
* Remove unused mtx_lock_giant(), mtx_unlock_giant(), related globals and
  sysctls.
  [tjr, 2003-03-23, 1 file, -43/+0]
* Including <sys/stdint.h> is (almost?) universally only to be able to use
  %j in printfs, so put a nested include in <sys/systm.h> where the printf
  prototype lives and save everybody else the trouble.
  [phk, 2003-03-18, 1 file, -1/+0]
* Axe the useless MTX_SLEEPABLE flag. Mutexes are not sleepable locks.
  Nothing used this flag, and WITNESS would have panic'd during mtx_init()
  if anything had.
  [jhb, 2003-03-11, 1 file, -3/+1]
* Remove safety belt: it is now ok to do a mtx_trylock() on a mutex you
  already own. The mtx_trylock() will fail, however. Enhance the comment at
  the top of the try lock function to explain this.
  Requested by: jlemon and his evil netisr locking
  [jhb, 2003-03-04, 1 file, -5/+4]
* Miscellaneous cleanups to _mtx_lock_sleep():
  - Declare some local variables at the top of the function instead of in a
    nested block.
  - Use mtx_owned() instead of masking off bits from mtx_lock manually.
  - Read the value of mtx_lock into 'v' as a separate line rather than
    inside an if statement for clarity. This code is hairy enough as it is.
  [jhb, 2003-03-04, 1 file, -4/+6]
* Properly assert that mtx_trylock() is not called on a mutex we already
  own. Previously the KASSERT would only trigger if we successfully
  acquired a lock that we already held. However, _obtain_lock() fails to
  acquire locks that we already hold, so the KASSERT was never checked in
  the case it was supposed to fail.
  [jhb, 2003-03-04, 1 file, -8/+4]