path: root/sys/kern/subr_witness.c
Commit log, most recent first; each entry ends with (author, date, files changed, -removed/+added lines).
* Remove a bogus assertion.  (jhb, 2004-02-03, 1 file, -1/+0)
    Noticed by:    bde
    Pointy hat to: jhb
* - Assert that witness_cold is not true in enroll().  (jhb, 2004-02-02, 1 file, -1/+2)
    - Only check witness_watch once in enroll().
    Reported by: ru (2)
* Rework witness_lock() to make it slightly more useful and flexible.  (jhb, 2004-01-28, 1 file, -108/+180)
    - witness_lock() is split into two pieces: witness_checkorder() and
      witness_lock().  witness_checkorder() determines if acquiring a
      specified lock at the time it is called would result in a lock order
      violation.  It optionally adds a new lock order relationship as well.
      witness_lock() updates witness's data structures to assume that a lock
      has been acquired by sticking a new lock instance in the appropriate
      lock instance list.
    - The mutex and sx lock functions now call witness_checkorder() prior to
      trying to acquire a lock and continue to call witness_lock() after the
      acquire is completed.  This will let witness catch a deadlock before it
      happens rather than trying to do so after the threads have deadlocked
      (i.e. never actually report it).
    - A new function witness_defineorder() has been added that adds a lock
      order between two locks at runtime without having to acquire the locks.
      If the lock order cannot be added it will return an error.  This
      function is available to programmers via the WITNESS_DEFINEORDER()
      macro, which accepts either two mutexes or two sx locks as its
      arguments.
    - A few simple wrapper macros were added to allow developers to call
      witness_checkorder() anywhere as a way of enforcing locking assertions
      in code that might acquire a certain lock in some situations.  The
      macros are witness_check_{mutex,shared_sx,exclusive_sx} and take an
      appropriate lock as the sole argument.
    - The code to remove a lock instance from a lock list in witness_unlock()
      was unnested by using a goto to vastly improve the readability of this
      function.
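
As an illustration of the interfaces named above, here is a minimal sketch of
how a subsystem might use WITNESS_DEFINEORDER() and one of the
witness_check_* wrappers.  The lock names and functions are invented for the
example, and it assumes the macro passes through witness_defineorder()'s
error return.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static struct mtx foo_mtx;      /* hypothetical locks for the example */
    static struct mtx bar_mtx;

    static void
    foo_init(void)
    {
        mtx_init(&foo_mtx, "foo", NULL, MTX_DEF);
        mtx_init(&bar_mtx, "bar", NULL, MTX_DEF);

        /*
         * Record that foo_mtx is always acquired before bar_mtx without
         * actually acquiring either lock; non-zero means the order could
         * not be added.
         */
        if (WITNESS_DEFINEORDER(&foo_mtx, &bar_mtx) != 0)
            printf("foo: could not define lock order\n");
    }

    static void
    foo_assert_order(void)
    {
        /*
         * Assert that acquiring bar_mtx here would not produce a lock
         * order violation, even on paths that never actually take it.
         */
        witness_check_mutex(&bar_mtx);
    }
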
* Register the uart(4)'s spin lock with witness(4).  (ru, 2004-01-25, 1 file, -0/+1)
* Fix a major faux pas of mine.  (markm, 2003-11-20, 1 file, -0/+2)
    I was causing 2 very bad things to happen in interrupt context:
    1) sleep locks, and 2) malloc/free calls.
    1) is fixed by using spin locks instead.
    2) is fixed by preallocating a FIFO (implemented with a STAILQ) and using
       elements from this FIFO instead.  This turns out to be rather fast.
    OK'ed by:     re (scottl)
    Thanks to:    peter, jhb, rwatson, jake
    Apologies to: *
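
For readers unfamiliar with the pattern, here is a generic sketch of a
preallocated, STAILQ-backed FIFO of the kind described above.  All names and
sizes are invented, and the spin lock that would protect the lists in real
kernel code is omitted for brevity.

    #include <sys/types.h>
    #include <sys/queue.h>

    #define POOLSIZE 64                     /* illustrative pool size */

    struct entry {
        int data;
        STAILQ_ENTRY(entry) link;
    };

    static struct entry pool[POOLSIZE];     /* preallocated at init time */
    static STAILQ_HEAD(, entry) freelist = STAILQ_HEAD_INITIALIZER(freelist);
    static STAILQ_HEAD(, entry) fifo = STAILQ_HEAD_INITIALIZER(fifo);

    static void
    fifo_init(void)
    {
        size_t i;

        for (i = 0; i < POOLSIZE; i++)
            STAILQ_INSERT_TAIL(&freelist, &pool[i], link);
    }

    /* Safe to call from interrupt context: no malloc(), just list moves. */
    static int
    fifo_put(int data)
    {
        struct entry *e;

        e = STAILQ_FIRST(&freelist);
        if (e == NULL)
            return (-1);                    /* pool exhausted; drop it */
        STAILQ_REMOVE_HEAD(&freelist, link);
        e->data = data;
        STAILQ_INSERT_TAIL(&fifo, e, link);
        return (0);
    }
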
* Initial landing of SMP support for FreeBSD/amd64.  (peter, 2003-11-17, 1 file, -1/+1)
    - This is heavily derived from John Baldwin's apic/pci cleanup on i386.
    - I have completely rewritten or drastically cleaned up some other parts
      (in particular, bootstrap).
    - This is still a WIP.  It seems that there are some highly bogus bioses
      on nVidia nForce3-150 boards.  I can't stress how broken these boards
      are.  I have a workaround in mind, but right now the Asus SK8N is
      broken.  The Gigabyte K8NPro (nVidia based) is also mind-numbingly
      hosed.
    - Most of my testing has been with SCHED_ULE.  SCHED_4BSD works.
    - The apic and acpi components are 'standard'.
    - If you have an nVidia nForce3-150 board, you are stuck with 'device
      atpic' in addition, because they somehow managed to forget to connect
      the 8254 timer to the apic, even though it's in the same silicon!  ARGH!
      This directly violates the ACPI spec.
* Localized the cy driver's locking.  (bde, 2003-11-16, 1 file, -3/+0)
* Add an implementation of turnstiles and change the sleep mutex code to use
  turnstiles to implement blocking instead of implementing a thread queue
  directly.  (jhb, 2003-11-11, 1 file, -3/+3)
    These turnstiles are somewhat similar to those used in Solaris 7 as
    described in Solaris Internals but are also different.
    Turnstiles do not come out of a fixed-size pool.  Rather, each thread is
    assigned a turnstile when it is created that it frees when it is
    destroyed.  When a thread blocks on a lock, it donates its turnstile to
    that lock to serve as a queue of blocked threads.  The queue associated
    with a given lock is found by a lookup in a simple hash table.  The
    turnstile itself is protected by a lock associated with its entry in the
    hash table.  This means that sched_lock is no longer needed to contest on
    a mutex.  Instead, sched_lock is only used when manipulating run queues
    or thread priorities.  Turnstiles also implement priority propagation
    inherently.
    Currently turnstiles only support mutexes.  Eventually, however,
    turnstiles may grow two queues to support a non-sleepable reader/writer
    lock implementation.  For more details, see the comments in
    sys/turnstile.h and kern/subr_turnstile.c.
    The two primary advantages of the turnstile code are: 1) the size of
    struct mutex shrinks by four pointers as it no longer stores the thread
    queue linkages directly, and 2) less contention on sched_lock in SMP
    systems, including the ability for multiple CPUs to contend on different
    locks simultaneously (not that this last detail is necessarily that much
    of a big win).  Note that 1) means that this commit is a kernel ABI
    breaker, so don't mix old modules with a new kernel and vice versa.
    Tested on: i386 SMP, sparc64 SMP, alpha SMP
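
The "simple hash table" mentioned above can be pictured roughly as follows;
the constants and field names in this sketch are illustrative rather than
copied from sys/turnstile.h.

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    struct turnstile;                       /* opaque here */

    /* Hash a lock address to one of a fixed number of chains. */
    #define TC_TABLESIZE    128
    #define TC_MASK         (TC_TABLESIZE - 1)
    #define TC_SHIFT        8
    #define TC_HASH(lock)   ((((uintptr_t)(lock)) >> TC_SHIFT) & TC_MASK)

    struct turnstile_chain {
        LIST_HEAD(, turnstile) tc_turnstiles;   /* turnstiles hashed here */
        struct mtx tc_lock;                     /* protects this chain only */
    };

    static struct turnstile_chain turnstile_chains[TC_TABLESIZE];

Blocking on a mutex then only needs the chain lock for that one hash bucket,
which is why sched_lock is no longer required to contest a mutex.
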
* Update spin lock order list for new i386 interrupt and SMP code.  (jhb, 2003-11-03, 1 file, -3/+2)
* Change all SYSCTLs which are readonly and have a related TUNABLE from
  CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful
  error messages.  (silby, 2003-10-21, 1 file, -1/+1)
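
A minimal example of the flag in question; the name and value below are
invented, and the declaration is paired with a TUNABLE_INT() as was customary
at the time.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static int example_limit = 16;

    /* Read-only via sysctl(8), but settable at boot from loader.conf. */
    TUNABLE_INT("kern.example_limit", &example_limit);
    SYSCTL_INT(_kern, OID_AUTO, example_limit, CTLFLAG_RDTUN,
        &example_limit, 0, "Example limit, settable only as a tunable");

With CTLFLAG_RDTUN, a runtime attempt to set the value lets sysctl(8) tell
the user it is a boot-time tunable rather than just failing with a bare
permission error.
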
* add fast swi taskqueue spinlock to the order_list so witness doesn't
  complain.  (sam, 2003-09-06, 1 file, -0/+1)
    Submitted by: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
* Insert cosmetic spaces.  (jhb, 2003-08-04, 1 file, -2/+2)
    Reported by: kris
* Add a new function to look for a spinlock's instance when it is held by
  another thread.  (jhb, 2003-07-31, 1 file, -0/+21)
    We use the td_oncpu member of the other thread to locate its associated
    CPU and then search that CPU's list of spin locks contained in its
    per-CPU data.  This is not always safe and may in fact panic or just not
    work, but it is useful in at least one case.
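
Conceptually the lookup walks the other CPU's per-CPU spin lock list, along
the following lines.  This is a sketch only: the function name is invented
and the lock_instance/lock_list_entry definitions are simplified mirrors of
witness's private structures in subr_witness.c.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/proc.h>
    #include <sys/pcpu.h>

    /* Simplified mirrors of witness's private structures. */
    struct lock_instance {
        struct lock_object *li_lock;
        /* ... acquisition file/line, flags ... */
    };
    struct lock_list_entry {
        struct lock_list_entry *ll_next;
        struct lock_instance ll_children[32];
        u_int ll_count;
    };

    static struct lock_instance *
    find_spin_instance(struct thread *td, struct lock_object *lock)
    {
        struct pcpu *pc;
        struct lock_list_entry *lle;
        u_int i;

        if (td->td_oncpu == NOCPU)
            return (NULL);      /* not running anywhere: holds no spin locks */
        pc = pcpu_find(td->td_oncpu);
        /*
         * Walking another CPU's list without stopping that CPU is what
         * makes this "not always safe", as the commit message notes.
         */
        for (lle = pc->pc_spinlocks; lle != NULL; lle = lle->ll_next)
            for (i = 0; i < lle->ll_count; i++)
                if (lle->ll_children[i].li_lock == lock)
                    return (&lle->ll_children[i]);
        return (NULL);
    }
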
* unifdef -DLAZY_SWITCH and start to tidy up the associated glue.  (peter, 2003-07-10, 1 file, -4/+1)
* Use __FBSDID().  (obrien, 2003-06-11, 1 file, -1/+3)
* Remove return after panic.  (phk, 2003-05-31, 1 file, -1/+0)
    Found by: FlexeLint
* Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to
  witness.  (peter, 2003-05-31, 1 file, -1/+1)
    Approved by: re (safe amd64 support)
* Move the _oncpu entry from the KSE to the thread.  (julian, 2003-04-10, 1 file, -1/+1)
    The entry in the KSE still exists, but its purpose will change a bit when
    we add the ability to lock a KSE to a cpu.
* o In struct prison, add an allprison linked list of prisons (protected by
    allprison_mtx), a unique prison/jail identifier field, two path fields
    (pr_path for reporting and pr_root vnode instance) to store the chroot()
    point of each jail.  (mike, 2003-04-09, 1 file, -0/+1)
    o Add jail_attach(2) to allow a process to bind to an existing jail.
    o Add change_root() to perform the chroot operation on a specified vnode.
    o Generalize change_dir() to accept a vnode, and move namei() calls to
      callers of change_dir().
    o Add a new sysctl (security.jail.list) which is a group of struct
      xprison instances that represent a snapshot of active jails.
    Reviewed by: rwatson, tjr
* Commit a partial lazy thread switch mechanism for i386.  (peter, 2003-04-02, 1 file, -0/+6)
    It isn't as lazy as it could be and can do with some more cleanup.
    Currently it's under options LAZY_SWITCH.  What this does is avoid %cr3
    reloads for short context switches that do not involve another user
    process, ie: we can take an interrupt, switch to a kthread and return to
    the user without explicitly flushing the tlb.  However, this isn't as
    exciting as it could be, the interrupt overhead is still high and too
    much blocks on Giant still.  There are some debug sysctls, for stats and
    for an on/off switch.
    The main problem with doing this has been "what if the process that
    you're running on exits while we're borrowing its address space?" - in
    this case we use an IPI to give it a kick when we're about to reclaim
    the pmap.
    It's not compiled in unless you add the LAZY_SWITCH option.  I want to
    fix a few more things and get some more feedback before turning it on by
    default.
    This is NOT a replacement for Bosko's lazy interrupt stuff.  This was
    more meant for the kthread case, while his was for interrupts.  Mine
    helps a little for interrupts, but his helps a lot more.
    The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
    pseudo-option for years, I just added a bunch of stuff to it.
    One non-trivial change was to select a new thread before calling
    cpu_switch() in the first place.  This allows us to catch the silly case
    of doing a cpu_switch() to the current process.  This happens
    uncomfortably often.  This simplifies a bit of the asm code in
    cpu_switch (no longer have to call choosethread() in the middle).  This
    has been implemented on i386 and (thanks to jake) sparc64.  The others
    will come soon.  This is actually separate from the lazy switch stuff.
    Glanced at by: jake, jhb
* - Remove witness_dead and just use witness_watch instead.  (jhb, 2003-03-24, 1 file, -21/+51)
      If witness_watch is set to 0, it now has the same effect as setting
      witness_dead used to have.
    - Added a sysctl handler that allows root to change witness_watch from a
      non-zero value to zero to disable witness at runtime.  Note that you
      can't turn witness back on once it is off.  You can only turn it off as
      a one-way switch.
    - Added a comment describing the possible values of witness_watch.
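
A sketch of what such a one-way sysctl handler looks like; the handler below
is a simplified illustration, not the exact code that was committed, and the
witness_watch declaration stands in for the real variable in subr_witness.c.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static int witness_watch = 1;   /* stand-in for the real variable */

    static int
    sysctl_debug_witness_watch(SYSCTL_HANDLER_ARGS)
    {
        int error, value;

        value = witness_watch;
        error = sysctl_handle_int(oidp, &value, 0, req);
        if (error != 0 || req->newptr == NULL)
            return (error);
        if (value == witness_watch)
            return (0);
        if (value != 0)
            return (EINVAL);        /* off is a one-way switch */
        witness_watch = 0;
        return (0);
    }
    SYSCTL_PROC(_debug, OID_AUTO, witness_watch, CTLTYPE_INT | CTLFLAG_RW,
        NULL, 0, sysctl_debug_witness_watch, "I",
        "witness is watching lock operations");
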
* Trim an extra blank line that snuck into the last commit.  (jhb, 2003-03-11, 1 file, -1/+0)
* - Change witness_displaydescendants() to accept the indentation level as a
    parameter instead of using the level of a given witness.  (jhb, 2003-03-11, 1 file, -21/+25)
      When recursing, pass an indent level of indent + 1.
    - Make use of the information witness_levelall() provides in
      witness_display_list() to use an O(n) algorithm instead of an O(n^2)
      algo to decide which witnesses to display hierarchies from.  Basically,
      we only display a hierarchy for witnesses with a level of 0.
    - Add a new per-witness flag that is reset at the start of
      witness_display() for all witnesses and is set the first time a witness
      is displayed in witness_displaydescendants().  If a witness is
      encountered more than once in the lock order tree (which happens
      often), witness_displaydescendants() marks the later occurrences with
      the string "(already displayed)" and doesn't display the subtree under
      that witness.  This avoids duplicating large amounts of the lock order
      tree in the 'show witness' output in DDB.
    All these changes serve to make 'show witness' a lot more readable and
    useful than it was previously.
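
The effect of the indent parameter and the new "already displayed" flag can
be sketched like this; the structure layout and names are simplified for
illustration and are not the real witness structures.

    struct witness {
        const char *w_name;
        int w_displayed;                /* cleared before each display pass */
        int w_nchildren;
        struct witness *w_children[8];  /* illustrative fixed fan-out */
    };

    static void
    display_descendants(void (*prnt)(const char *, ...), struct witness *w,
        int indent)
    {
        int i;

        for (i = 0; i < indent; i++)
            prnt("  ");
        prnt("%s", w->w_name);
        if (w->w_displayed) {
            prnt(" (already displayed)\n");     /* don't repeat the subtree */
            return;
        }
        prnt("\n");
        w->w_displayed = 1;
        for (i = 0; i < w->w_nchildren; i++)
            display_descendants(prnt, w->w_children[i], indent + 1);
    }
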
* - Split the itismychild() function into two functions: insertchild() adds
    a witness to the child list of a parent witness.  (jhb, 2003-03-11, 1 file, -42/+144)
      rebalancetree() runs through the entire tree removing direct
      descendants of witnesses who already have said child witness as an
      indirect descendant through another direct descendant.  itismychild()
      now calls insertchild() followed by rebalancetree() and no longer needs
      the evil hack of having a static recursed variable.
    - Add a function reparentchildren() that adds all the direct descendants
      of one witness as direct descendants of another witness.
    - Change the return value of itismychild() and similar functions so that
      they return 0 in the case of failure due to lack of resources instead
      of 1.  This makes the return value more intuitive.
    - Check the return value of itismychild() when defining the static lock
      order in witness_initialize().
    - Don't try to setup a lock instance in witness_lock() if itismychild()
      fails.  Witness is hosed anyways so no need to do any more witness
      related activity at that point.  It also makes the code flow easier to
      understand.
    - Add a new depart() function as the opposite of enroll().  When the
      reference count of a witness drops to 0 in witness_destroy(), this
      function is called on that witness.  First, it runs through the lock
      order tree using reparentchildren() to reparent direct descendants of
      the departing witness to each of the witness' parents in the tree.
      Next, it releases its own child list and other associated resources.
      Finally it calls rebalancetree() to rebalance the lock order tree.
    - Sort function prototypes into something closer to alphabetical order.
    As a result of these changes, there should no longer be 'dead' witnesses
    in the order tree, and repeatedly loading and unloading a module should
    no longer exhaust witness of its internal resources.
    Inspired by: gallatin
* Trim useless "../" leading strings from filenames passed into witness.  (jhb, 2003-03-11, 1 file, -0/+18)
* Adjust style of #ifdef's and #endif's to be more consistent and in line
  with recent additions to style(9).  (jhb, 2003-03-11, 1 file, -7/+7)
* Do the lock order check skip for the LOP_TRYLOCK case after the check for
  recursing on a lock instead of before.  (jhb, 2003-03-11, 1 file, -8/+8)
    This fixes a bug where WITNESS could get a little confused if you did an
    sx_tryslock() on a sx lock that you already had an slock on.  WITNESS
    would still function correctly but it could result in weirdness in the
    output of 'show locks'.  This also makes it possible for mtx_trylock()
    to recurse on a lock.
* Now that we have WITNESS_WARN(), we only call witness_list() from the ddb
  'show locks' command.  (jhb, 2003-03-10, 1 file, -41/+29)
    Thus, move witness_list() to the #ifdef DDB section and remove extra
    checks for calling this function outside of DDB.  Also, witness_list()
    now returns void instead of returning an int.
    Reported by: Steve Ames <steve@energistic.com>
    Prodded by:  davidxu
* Oops, fix the double faults people were seeing with the recent changes to
  witness.  (jhb, 2003-03-06, 1 file, -1/+1)
    Sleepable locks such as sx locks always come before all mutexes including
    Giant.  However, the static lock order list placed Giant before the
    proctree and allproc sx locks.  This resulted in witness creating a cycle
    in its lock order "tree" (real trees don't have cycles) leading to
    infinite recursion and eventually a double fault.  To fix, put Giant
    after sx locks in the lock order list.
* Bah, fix a bogon in the last commit: get the sense of a compare test right
  so that we allow a sleepable lock to be acquired with Giant held rather
  than allowing a sleepable lock to be acquired with anything but Giant
  held.  (jhb, 2003-03-04, 1 file, -1/+1)
* A small overhaul of witness:  (jhb, 2003-03-04, 1 file, -56/+112)
    - Add a comment about special lock order rules and Giant near the top of
      subr_witness.c.  Specifically, this documents and explains the real
      lock order relationship between Giant and sleepable locks (i.e.
      lockmgr locks and sx locks).  Basically, Giant can be safely acquired
      either before or after sleepable locks and the case of Giant before a
      sleepable lock is exempted as a special case.
    - Add a new static function 'witness_list_lock()' that displays a single
      line of information about a struct lock_instance.  This is used to
      make the output of witness messages more consistent and reduce some
      code duplication.
    - Fixup a few comments in witness_lock().
    - Properly handle the Giant-before-sleepable-lock lock order exception
      in a more general fashion and remove the no longer needed LI_SLEPT
      flag.
    - Break up the last condition before assuming a reversal a bit to try
      and make the logic less confusing in witness_lock().
    - Axe WITNESS_SLEEP() now that LI_SLEPT is no longer needed and replace
      it with a more general WITNESS_WARN() macro/function combination.
      WITNESS_WARN() allows you to output a customized message out to the
      console along with a list of held locks.  It will optionally drop into
      the debugger as well.  You can exempt a single lock from the check by
      passing it in as the second argument.  You can also use flags to
      specify if Giant should be exempt from the check, if all sleepable
      locks should be exempt from the check, and if witness should panic if
      any non-exempt locks are found.
    - Make the witness_list() function static.  Other areas of the kernel
      should use the new WITNESS_WARN() instead.
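
A typical call to the new macro might look like the following.  The flag
names follow the commit's description of the Giant and sleepable-lock
exemptions, but the exact spellings should be checked against sys/lock.h;
the surrounding function is invented.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static void
    example_prepare_to_sleep(void)
    {
        /*
         * Warn (listing any held locks) if anything other than Giant or a
         * sleepable lock is held at this point; no single lock is exempted
         * via the second argument in this example.
         */
        WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
            "example_prepare_to_sleep");
    }
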
* Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE.  (peter, 2003-02-18, 1 file, -0/+3)
    This has been #if'ed out for a while.  Complete the deed and tidy up
    some other bits.
    We need to be able to call this stuff from outer edges of interrupt
    handlers for devices that have the ISR bits in pci config space.  Making
    the bios code mpsafe was just too hairy.  We had also stubbed it out
    some time ago due to there simply being too much brokenness in too many
    systems.  This adds a leaf lock so that it is safe to use
    pci_read_config() and pci_write_config() from interrupt handlers.  We
    still will use pcibios to do interrupt routing if there is no acpi..
    [yes, I tested this]
    Briefly glanced at by: imp
* - Split the struct kse into struct upcall and struct kse.  (jeff, 2003-02-17, 1 file, -1/+1)
      struct kse will soon be visible only to schedulers.  This greatly
      simplifies much of the KSE code.
    Submitted by: davidxu
* Add a 'debug.witness_trace' sysctl (and tunable) when DDB is present.  (peter, 2003-02-13, 1 file, -2/+16)
    This causes LOR and could-sleep messages to come with a stack trace.
* Reversion of commit by Davidxu plus fixes since applied.  (julian, 2003-02-01, 1 file, -1/+1)
    I'm not convinced there is anything major wrong with the patch but
    them's the rules..  I am using my "David's mentor" hat to revert this as
    he's offline for a while.
* Move UPCALL related data structure out of kse, introduce a new data
  structure called kse_upcall to manage UPCALL.  (davidxu, 2003-01-26, 1 file, -1/+1)
    All KSE binding and loaning code are gone.
    A thread owning an upcall can collect all completed syscall contexts in
    its ksegrp, turn itself into UPCALL mode, and take those contexts back
    to userland.  Any thread without an upcall structure has to export its
    contexts and exit at the user boundary.  Any thread running in user mode
    owns an upcall structure; when it enters the kernel, if the kse
    mailbox's current thread pointer is not NULL, then when the thread is
    blocked in the kernel, a new UPCALL thread is created and the upcall
    structure is transferred to the new UPCALL thread.  If the kse mailbox's
    current thread pointer is NULL, then when a thread is blocked in the
    kernel, no UPCALL thread will be created.
    Each upcall always has an owner thread.  Userland can remove an upcall
    by calling kse_exit; when all upcalls in a ksegrp are removed, the group
    is automatically shut down.  An upcall owner thread also exits when the
    process is in exiting state.  When an owner thread exits, the upcall it
    owns is also removed.
    KSE is a pure scheduler entity.  It represents a virtual cpu.  When a
    thread is running, it always has a KSE associated with it.  The
    scheduler is free to assign a KSE to a thread according to thread
    priority; if thread priority is changed, a KSE can be moved from one
    thread to another.
    When a ksegrp is created, there are always N KSEs created in the group,
    where N is the number of physical cpus in the current system.  This
    makes it possible that even if a userland UTS is single-CPU safe,
    threads in the kernel can still execute on different cpus in parallel.
    Userland calls kse_create to add more upcall structures into a ksegrp to
    increase concurrency in userland itself; the kernel is not restricted by
    the number of upcalls userland provides.
    The code hasn't been tested under SMP by the author due to lack of
    hardware.
    Reviewed by: julian
* Oops, add zstty to the witness order list.  (jake, 2003-01-09, 1 file, -0/+1)
    Noticed by: benno
* - Add a spin lock to single thread cache invalidation and tlb flush ipis,
    which allows ipis to be sent outside of Giant.  (jake, 2002-12-22, 1 file, -0/+3)
    - Remove the ap boot mutex, which is unused.
* Enforce correct ordering of the filedesc structure and pipe mutex, because
  WITNESS can get the order wrong if it guesses based on first use.  (kris, 2002-12-22, 1 file, -0/+2)
    Reviewed by: jhb, alfred
* Correct an assertion in the code to traverse the list of locks to find an
  earlier acquired lock with the same witness as the lock currently being
  acquired.  (jhb, 2002-11-11, 1 file, -1/+1)
    If we had released several earlier acquired locks after acquiring enough
    locks to require another lock_list_entry bucket in the lock list, then
    subsequent lock_list_entry buckets could contain only one lock instance,
    in which case i would be zero.
    Reported by: Joel M. Baldwin <qumqats@outel.org>
* Catch up with the removal of the vm page buckets spin mutex.  (alc, 2002-11-02, 1 file, -1/+0)
* #unifdef the code for checking blessed lock collisions until we need it.  (phk, 2002-10-20, 1 file, -0/+13)
    Spotted by: DARPA & NAI Labs.
* Register the machine check private state spinlock on ia64.  (peter, 2002-10-12, 1 file, -0/+3)
* Be consistent about "static" functions: if the function is marked static
  in its prototype, mark it static at the definition too.  (phk, 2002-09-28, 1 file, -1/+1)
    Inspired by: FlexeLint warning #512
* - Tell witness about ALQ's spin lock.  (jeff, 2002-09-22, 1 file, -0/+1)
* Make this driver work a whole lot better.  (jake, 2002-09-08, 1 file, -0/+1)
    - Get the initial mode from the prom settings and don't clobber the mode
      on open.
    - Copy output into an internal ring buffer instead of accessing the tty
      outq directly in the interrupt handler.  This fixes a problem where
      garbage would show up in the output stream.
    - Reset the console port completely and reprogram all the parameters
      before enabling it.  This fixes seemingly random hangs on startup when
      using a fast interrupt handler.
    - Add minimal locking in place of spls.
    - Remove dead code and minor cleanups.
* Add WITNESS_FILE() and WITNESS_LINE(), which allow users of witness to
  print out the file and line from the lock object.  (iedowse, 2002-08-26, 1 file, -0/+22)
    These will be used shortly by CTR() calls in the mutex code.
    Reviewed by: jhb, jake
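
A hypothetical trace point using the new macros might look like this.  The
example assumes the macros take a pointer to the lock's struct lock_object,
as the commit's wording suggests; the mtx_object member name matches that
era's struct mtx layout, and the function itself is invented.

    #include <sys/param.h>
    #include <sys/ktr.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static void
    trace_lock_site(struct mtx *m)
    {
        /* Log where this mutex was last acquired, per witness's records. */
        CTR3(KTR_LOCK, "%s last acquired at %s:%d", m->mtx_object.lo_name,
            WITNESS_FILE(&m->mtx_object), WITNESS_LINE(&m->mtx_object));
    }
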
* Silence compiler warnings when DDB is not defined.  (mp, 2002-07-15, 1 file, -3/+7)
    PR:           36002
    Submitted by: Yoshikazu GOTO <goto@snowy.to>
* Revive backed out pmap related changes from Feb 2002.  (peter, 2002-07-12, 1 file, -0/+3)
    The highlights are:
    - It actually works this time, honest!
    - Fine grained TLB shootdowns for SMP on i386.  IPI's are very
      expensive, so try and optimize things where possible.
    - Introduce ranged shootdowns that can be done as a single IPI.
    - PG_G support for i386.
    - Specific-cpu targeted shootdowns.  For example, there is no sense in
      globally purging the TLB cache for where we are stealing a page from
      the local unshared process on the local cpu.  Use pm_active to track
      this.
    - Add some instrumentation for the tlb shootdown code.
    - Rip out SMP code from <machine/cpufunc.h>.
    - Try and fix some very bogus PG_G and PG_PS interactions that were bad
      enough to cause vm86 bios calls to break.  vm86 depended on our
      existing bugs and this was the cause of the VESA panics last time.
    - Fix the silly one-line error that caused the 'panic: bad pte' last
      time.
    - Fix a couple of other silly one-line errors that should have caused
      more pain than they did.
    Some more work is needed:
    - pmap_{zero,copy}_page[_idle].  These can be done without IPI's if we
      have a hook in cpu_switch.
    - The IPI handlers need some cleanup.  I have a bogus %ds load that can
      be avoided.
    - APTD handling is rather bogus and appears to be a large source of
      global TLB IPI shootdowns for no really good reason.
    I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
    I expect to see a bigger difference when there is significant pageout
    activity or the system otherwise has memory shortages.
    I have backed out a few optimizations that I had been using over the
    last few days in order to be a little more conservative.  I'll revisit
    these again over the next few days as the dust settles.
    New option: DISABLE_PG_G - In case I missed something.
* o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free
    queue lock (revision 1.33 of vm/vm_page.c removed them).  (alc, 2002-07-04, 1 file, -0/+1)
    o Make the free queue lock a spin lock because it's sometimes acquired
      inside of a critical section.