summaryrefslogtreecommitdiffstats
path: root/sys/kern/subr_trap.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Make KSE a kernel option, turned on by default in all GENERICjb2006-10-261-0/+12
| | | | | | | kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.hrwatson2006-10-221-1/+2
| | | | | | | | | | | | | begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
* kern_intr.c:bde2006-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | - Count (scheduling of) software interrupts (SWIs) as SWIs, not as hardware interrupts. - Don't count (scheduling of) delayed SWIs as interrupts at all, since in the delayed case it is expected that there are many more scheduling calls than handling calls. Perhaps all interrupts should be counted only when they are handled, but it is only counts of delayed SWIs that shouldn never be combined with the other counts. subr_trap.c: - Count (handling of) Asynchronous System Traps (ASTs) as traps, not as software interrupts. Before these changes, the counter for SWIs only counted ASTs, and SWIs weren't counted separately, but a subcounter for ASTs alone is less needed than for most other exception sources. 4.4BSD-Lite uses the counters for similar things (actually matching their names) on its main arches (hp300, ..., !i386) where more of the exceptions are in hardware.
* Test before modifying p_sflag to avoid unconditionally cache linedavidxu2006-02-101-2/+4
| | | | ping-pong on SMP.
* Simplify system time accounting for profiling.phk2006-02-081-10/+4
| | | | | | | | | | Rename struct thread's td_sticks to td_pticks, we will need the other name for more appropriately named use shortly. Reduce it from uint64_t to u_int. Clear td_pticks whenever we enter the kernel instead of recording its value as reference for userret(). Use the absolute value of td->pticks in userret() and eliminate third argument.
* Modify the way we account for CPU time spent (step 1)phk2006-02-071-1/+1
| | | | | | | | | | | | | | | | Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
* Moderate rewrite of kernel ktrace code to attempt to generally improverwatson2005-11-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
* 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, mostdavidxu2005-10-141-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
* - Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here.jeff2005-03-281-3/+0
| | | | Sponsored by: Isilon Systems, Inc.
* - The td_locks check is currently broken with snapshots and possiblyjeff2005-03-251-0/+3
| | | | | | | some case in unmount. Disable the KASSERT until these problems can be diagnosed. Sponsored by: Isilon Systems, Inc.
* - Fail an assert if we attempt to return with any lockmgr locks held injeff2005-03-241-0/+2
| | | | | | userret(). Sponsored by: Isilon Systems, Inc.
* Whitespace fix.jhb2004-12-301-0/+1
|
* - Run sched_userret() after thread_userret(). Before, sched_userret() wouldjeff2004-12-261-5/+4
| | | | | | | | | lower the priority of the returning thread to a user priority before calling into thread_userret() which would call wakeup() which in turn would cause the returning thread to eventually context switch rather than completing its slice. Allowing this thread to complete its slice first yields a 15% performance improvement in super-smack on my dual opteron with 4BSD.
* Add a new per-thread private flag: TDP_GEOM.phk2004-10-231-0/+7
| | | | | | | | | | | | | | This flag gets set whenever the thread posts an event on the GEOM event queue, and if the flag is set when the thread is prepared to return to userland from the kernel, g_waitidle() will be called to make sure that the posted events have completed. This can replace an insufficient number of g_waitidle() calls in various other places, and has the advantage of being failsafe: Any system call which does a VOP_OPEN()/VOP_CLOSE will now correctly wait for any geom events it posted as part of spoils or tastes. Assert that topology and Giant is not held in g_waitidle().
* Rework how we store process times in the kernel such that we always storejhb2004-10-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
* Don't try to protect td_sticks with sched_lock. It doesn't need it as itjhb2004-09-231-3/+1
| | | | is only accessed by curthread.
* Various small style fixes.jhb2004-09-221-1/+2
|
* Remove an unneeded argument..julian2004-08-311-1/+1
| | | | | | | | | The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument. Having both proc and thread as an argumen tjust gives an opportunity for them to get out sync. MFC after: 3 days
* Remove sched_free_thread() which was only usedjulian2004-08-311-3/+0
| | | | | | | | in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days
* Call thread_user_enter for M:N thread, ast() should be treated as anotherdavidxu2004-08-081-0/+2
| | | | entrance of kernel.
* - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflagsjhb2004-07-161-5/+5
| | | | | | | | | since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).
* - Change mi_switch() and sched_switch() to accept an optional thread tojhb2004-07-021-1/+1
| | | | | | | | | | | | | switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
* Tidy up uprof locking. Mostly the fields are protected by both the procjhb2004-07-021-8/+6
| | | | | | | lock and sched_lock so they can be read with either lock held. Document the locking as well. The one remaining bogosity is that pr_addr and pr_ticks should be per-thread but profiling of multithreaded apps is currently undefined.
* Remove unused variable.julian2004-03-311-2/+0
|
* Push Giant down a little further:peter2004-03-131-2/+1
| | | | | | | | | | | | | | | - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
* Put "failed to set signal flags properly for ast()" check underrwatson2004-03-051-1/+1
| | | | | | | | | | DIAGNOSTIC instead of INVARIANTS. INVARIANTS is intended for tests that don't substantially change code flow or behavior (passive), but this test required locking both the proc lock and scheduler lock in order to execute. It also appears to be a very advisory diagnostic as opposed to an invariant violation. Following discussion with: bde
* Locking for the per-process resource limits structure.jhb2004-02-041-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
* - Add a flags parameter to mi_switch. The value of flags may be SW_VOL orjeff2004-01-251-2/+1
| | | | | | | | | | SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.
* Log involuntary context switches correctly.peter2003-09-051-2/+2
|
* kse.h is not needed for these files.davidxu2003-08-051-1/+0
|
* When ktracing context switches, make sure we record involuntary switches.peter2003-07-311-0/+14
| | | | | Otherwise, when we get a evicted from the cpu, there is no record of it. This is not a default ktrace flag.
* o Change kse_thr_interrupt to allow send a signal to a specified thread,davidxu2003-06-281-15/+2
| | | | | | | | | | | | | | | | | or unblock a thread in kernel, and allow UTS to specify whether syscall should be restarted. o Add ability for UTS to monitor signal comes in and removed from process, the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with this flag set to wait for above signal event. o For SA based thread, kernel masks all signal in its signal mask, let UTS to use kse_thr_interrupt interrupt a thread, and install a signal frame in userland for the thread. o Add a tm_syncsig in thread mailbox, when a hardware trap occurs, it is used to deliver synchronous signal to userland, and upcall is schedule, so UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)
* 1. Add code to support bound thread. when blocked, a bound thread neverdavidxu2003-06-151-1/+1
| | | | | | | schedules an upcall. Signal delivering to a bound thread is same as non-threaded process. This is intended to be used by libpthread to implement PTHREAD_SCOPE_SYSTEM thread. 2. Simplify kse_release() a bit, remove sleep loop.
* Rename P_THREADED to P_SA. P_SA means a process is using schedulerdavidxu2003-06-151-2/+2
| | | | activations.
* Use __FBSDID().obrien2003-06-111-1/+3
|
* - Merge struct procsig with struct sigacts.jhb2003-05-131-0/+2
| | | | | | | | | | | | | | | | | - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
* The signotify() sanity check in userret() doesn't need Giant anymore.jhb2003-04-231-2/+0
|
* - Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag andjhb2003-04-221-6/+3
| | | | | | | rename them appropriately. Protect both flags with both the proc lock and the sched_lock. - Protect p_profthreads with the proc lock. - Remove Giant from profil(2).
* Tweak locking in the PS_XCPU handler to hold the sched_lock while readingjhb2003-04-171-4/+5
| | | | p_runtime.
* - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread withjeff2003-03-311-4/+5
| | | | | | | a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
* - Change trapsignal() to accept a thread and not a proc.jeff2003-03-311-1/+1
| | | | | | | - Change all consumers to pass in a thread. Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.
* Fix signal delivering bug for threaded process.davidxu2003-03-111-2/+8
|
* Replace calls to WITNESS_SLEEP() and witness_list() with equivalent callsjhb2003-03-041-4/+1
| | | | to WITNESS_WARN().
* Change the process flags P_KSES to be P_THREADED.julian2003-02-271-2/+2
| | | | This is just a cosmetic change but I've been meaning to do it for about a year.
* - Add a new function, thread_signal_add(), that is called from postsig tojeff2003-02-171-1/+8
| | | | | | | | add a signal to a mailbox's pending set. - Add a new function, thread_signal_upcall(), this causes the current thread to upcall so that we can deliver pending signals. Reviewed by: mini
* Move a bunch of flags from the KSE to the thread.julian2003-02-171-9/+8
| | | | | | | | I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
* - Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back intojeff2003-02-171-2/+2
| | | | | | | the proc. These counters are only examined through calcru. Submitted by: davidxu Tested on: x86, alpha, UP/SMP
* Reversion of commit by Davidxu plus fixes since applied.julian2003-02-011-32/+24
| | | | | | | | I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
* Use a local variable to store the number of ticks that elapsed intjr2003-01-311-2/+3
| | | | | kernel mode instead of (unintentionally) using the global `ticks'. This error completely broke profiling.
* Move UPCALL related data structure out of kse, introduce a newdavidxu2003-01-261-24/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
OpenPOWER on IntegriCloud