path: root/sys/kern/kern_exit.c
Commit message  (author, date, files changed, lines -deleted/+added)
...
* Move the ruadd() in kern_exit() to save our final stats in our child stats  (jhb, 2006-02-21, 1 file, -2/+3)
    even further down in exit1() so that it includes the runtime and tick
    counts from the final time slice for the dying thread.

    Reviewed by: phk
* CPU time accounting speedup (step 2)  (phk, 2006-02-11, 1 file, -0/+3)
    Keep accounting time (in per-cpu cputicks) and the statistics counts
    in the thread, and summarize into struct proc at context switch.

    Don't reach across CPUs in calcru().

    Add code to calibrate the top speed of cpu_tickrate() for variable
    cpu_tick hardware (like the TSC on power-managed machines).

    Don't enforce monotonicity (at least for now) in calcru; while the
    calibrated cpu_tickrate ramps up, it may not hold.

    Use the 27MHz counter on i386/Geode, the TSC on amd64 & i386 if
    present, and the tick counter on sparc64.
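    A rough idea of the scheme this describes, as a hedged sketch in
    plain C (struct and function names are illustrative, not the
    kernel's): per-thread raw ticks are folded into the per-process
    total at context switch, so calcru() never reads another CPU's
    counters.

        #include <stdint.h>

        struct thread_acct { uint64_t incr; };    /* per-thread raw ticks */
        struct proc_acct   { uint64_t runtime; }; /* per-process summary  */

        /* Called at context switch: only the CPU that owns the thread
         * touches its counter, so no cross-CPU reads are needed. */
        static void
        fold_at_switch(struct proc_acct *p, struct thread_acct *td)
        {
                p->runtime += td->incr;
                td->incr = 0;
        }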
* Modify the way we account for CPU time spent (step 1)  (phk, 2006-02-07, 1 file, -4/+3)
    Keep track of time spent by the CPU in various contexts in units of
    "cputicks" and scale to real-world clock_t only when somebody wants
    to inspect the numbers.

    For now "cputicks" are still derived from the current timecounter,
    and therefore things should by definition remain sensible also on
    SMP machines.  (The main reason for this first milestone commit is
    to verify that hypothesis.)

    On slower machines, the avoided multiplications to normalize
    timestamps at every context switch come out as a 5-7% better score
    on the unixbench/context1 microbenchmark.  On more modern hardware
    no change in performance is seen.
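    The deferred-scaling idea, as a hedged sketch with illustrative
    names (not the kernel code): accumulate raw ticks cheaply at every
    switch, and pay for the conversion to wall-clock units only when
    someone inspects the numbers.

        #include <stdint.h>

        /* Cheap per-switch path: one subtraction and one addition. */
        static void
        charge(uint64_t *acct, uint64_t tick_in, uint64_t tick_out)
        {
                *acct += tick_out - tick_in;
        }

        /* Expensive path, run only on inspection (cf. calcru()):
         * scale raw ticks to microseconds given the tick rate in Hz.
         * Assumes tickrate >= 1 MHz, as for a TSC. */
        static uint64_t
        ticks_to_usec(uint64_t ticks, uint64_t tickrate)
        {
                return (ticks / (tickrate / 1000000));
        }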
* - Move the wakeup() for exiting kthreads out of exit1() and into  (jhb, 2006-02-06, 1 file, -6/+0)
      kthread_exit() as that is cleaner and less obscured.  It also does
      the wakeup sooner.
    - Add some comments to kthread_exit().
* Audit the pid being requested in wait4().  (wsalamon, 2006-02-06, 1 file, -0/+2)
    Obtained from: TrustedBSD Project
    Approved by: rwatson (mentor)
* On process exit, audit the return value of the process, and commit the  (rwatson, 2006-02-05, 1 file, -0/+11)
    record immediately, as this system call never returns.

    Obtained from: TrustedBSD Project
* Add a comment.  (jhb, 2006-02-03, 1 file, -0/+3)
* Hook up audit to fork() and exit() events.  These changes manage the  (rwatson, 2006-02-02, 1 file, -0/+5)
    audit state on processes, not the auditing of these events.

    Much work by: wsalamon
    Obtained from: TrustedBSD Project
* Hopefully fix the "calcru: runtime went backwards from ..." problem by  (ups, 2006-01-23, 1 file, -2/+3)
    keeping the resource values locked (where needed) while we use them
    for calculations.

    MFC after: 3 days
* Regenerate sysent with new abort2 system call.  (phk, 2005-12-23, 1 file, -0/+83)
    Implement abort2(const char *reason, int narg, void **args);

    Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>
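    A minimal userland sketch built from the signature quoted above
    (the helper and its arguments are illustrative): abort2() never
    returns, recording the reason string and up to 16 pointer-sized
    arguments for post-mortem inspection.

        #include <stdlib.h>     /* abort2() is declared here on FreeBSD */

        static void
        die_with_context(void *state, void *cfg)
        {
                void *args[2] = { state, cfg };

                /* Kills the process abnormally, logging the reason
                 * string and the argument pointers. */
                abort2("invariant violated", 2, args);
        }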
* Register itimers_event_hook as a kernel event handler, so I don't  (davidxu, 2005-12-09, 1 file, -3/+0)
    have to duplicate code to call it in exec() and exit1().
* Moderate rewrite of kernel ktrace code to attempt to generally improve  (rwatson, 2005-11-13, 1 file, -1/+3)
    reliability when tracing fast-moving processes or writing traces to
    slow file systems by avoiding unbounded queueing and dropped records.
    Record loss was previously possible when the global pool of records
    became depleted as a result of record generation outstripping record
    commit, which occurred quickly in many common situations.

    These changes partially restore the 4.x model of committing ktrace
    records at the point of trace generation (synchronous), but maintain
    the 5.x deferred record commit behavior (asynchronous) for situations
    where entering VFS and sleeping is not possible (i.e., in the
    scheduler).  Records are now queued per-process as opposed to
    globally, with processes responsible for committing records from
    their own context as required.

    - Eliminate the ktrace worker thread and global record queue, as
      they are no longer used.  Keep the global free record list, as
      records are still used.
    - Add a per-process record queue, which will hold any asynchronously
      generated records, such as from context switches.  This replaces
      the global queue as the place to submit asynchronous records to.
    - When a record is committed asynchronously, simply queue it to the
      process.
    - When a record is committed synchronously, first drain any pending
      per-process records in order to maintain ordering as best we can.
      Currently ordering between competing threads is provided via a
      global ktrace_sx, but a per-process flag or lock may be desirable
      in the future.
    - When a process returns to user space following a system call,
      trap, signal delivery, etc., flush any pending records.
    - When a process exits, flush any pending records.
    - Assert on process tear-down that there are no pending records.
    - Slightly abstract the notion of being "in ktrace", which is used
      to prevent the recursive generation of records, as well as
      generating traces for ktrace events.

    Future work here might look at changing the set of events marked for
    synchronous and asynchronous record generation, re-balancing queue
    depth, timeliness of commit to disk, and so on, i.e., performing a
    drain every (n) records.

    MFC after: 1 month
    Discussed with: jhb
    Requested by: Marc Olzheim <marcolz at stack dot nl>
* Giant clean up for exit(2)  (csjp, 2005-11-08, 1 file, -7/+7)
    - Change the unconditional acquisition of Giant to only pick up
      Giant if the vnode for the controlling tty resides on a
      non-mpsafe file system.
    - Pick up Giant around executable vnode reference counting
      operations only if the executable resides on a non-mpsafe file
      system.
    - If this process is being traced, pick up Giant for trace file
      reference count operations only if it resides on a non-mpsafe
      file system.

    Discussed with: jhb
    Tested by: kris
* Add support for queueing SIGCHLD same as other UNIX systems did.  (davidxu, 2005-11-08, 1 file, -3/+25)
    For each child process whose status has been changed, a SIGCHLD
    instance is queued; if the signal is still pending and the process
    changed status several times, the signal information is updated to
    reflect the latest process status.  If wait() returns because the
    status of a child process is available, the pending SIGCHLD signal
    associated with that child process is discarded.  Any other pending
    SIGCHLD signals remain pending.

    The signal information is allocated at the same time the proc
    structure is allocated; if the process signal queue is fully filled
    or there is a memory shortage, the kernel can still send the signal
    to the process.

    There is a boot-time tunable kern.sigqueue.queue_sigchild which
    controls the behavior: setting it to zero disables the SIGCHLD
    queueing feature.  The tunable will be removed if the feature proves
    stable enough.

    Tested on: i386 (SMP and UP)
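    What the queueing means to a userland consumer, as a hedged sketch
    (helper names are illustrative): with SA_SIGINFO, each delivered
    SIGCHLD carries siginfo identifying the child, rather than several
    status changes collapsing into one bare signal.

        #include <sys/types.h>
        #include <signal.h>

        static volatile pid_t last_child;

        static void
        chld_handler(int sig, siginfo_t *si, void *uap)
        {
                (void)sig; (void)uap;
                last_child = si->si_pid;  /* which child changed status */
        }

        static int
        install_chld_handler(void)
        {
                struct sigaction sa;

                sa.sa_sigaction = chld_handler;
                sa.sa_flags = SA_SIGINFO;
                sigemptyset(&sa.sa_mask);
                return (sigaction(SIGCHLD, &sa, NULL));
        }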
* Push down Giant into fdfree() and remove it from two of the callers.  (jhb, 2005-11-01, 1 file, -1/+1)
    Other callers such as some rfork() cases weren't locking Giant
    anyway.

    Reviewed by: csjp
    MFC after: 1 week
* - Fix leak of struct nlminfo on process exit.  (glebius, 2005-10-26, 1 file, -0/+9)
    - Fix malloc type collision, which made the above problem difficult
      to understand.

    Reported by: Vladimir Sharun <sharun ukr.net>
* Make p_itimers a pointer, so that sys/proc.h does not need to include  (davidxu, 2005-10-23, 1 file, -0/+1)
    sys/timers.h.
* Implement POSIX timers.  Currently only the CLOCK_REALTIME and  (davidxu, 2005-10-23, 1 file, -2/+4)
    CLOCK_MONOTONIC clocks are supported.  I plan to merge the XSI timer
    ITIMER_REAL and the other two CPU timers into the new code; three
    slots are currently available for the XSI timers.

    The SIGEV_THREAD notification type is not supported yet because our
    sigevent struct lacks two member fields:
        sigev_notify_function
        sigev_notify_attributes
    I have found that sigevent is used in AIO, so I won't add the two
    members unless the AIO code is adjusted.
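    Minimal use of the interface this enables, as a hedged sketch
    (helper name illustrative): a one-shot CLOCK_REALTIME timer with
    SIGEV_SIGNAL notification, SIGEV_THREAD being the type noted above
    as unsupported.

        #include <signal.h>
        #include <string.h>
        #include <time.h>

        static int
        arm_one_shot(timer_t *tp)
        {
                struct sigevent ev;
                struct itimerspec its;

                memset(&ev, 0, sizeof(ev));
                memset(&its, 0, sizeof(its));
                ev.sigev_notify = SIGEV_SIGNAL;
                ev.sigev_signo = SIGALRM;
                if (timer_create(CLOCK_REALTIME, &ev, tp) == -1)
                        return (-1);
                its.it_value.tv_sec = 1;   /* fire once, in one second */
                return (timer_settime(*tp, 0, &its, NULL));
        }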
* 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *; most  (davidxu, 2005-10-14, 1 file, -2/+2)
    changes in MD code are trivial.  Before this change, trapsignal and
    sendsig used discrete parameters; now they use member fields of the
    ksiginfo_t structure.  For sendsig, this change allows us to pass
    POSIX realtime signal values to user code.

    2. Remove cpu_thread_siginfo; it is no longer needed because we now
    always generate ksiginfo_t data and feed it to libpthread.

    3. Add p_sigqueue to the proc structure to hold shared signals which
    were blocked by all threads in the proc.

    4. Add td_sigqueue to the thread structure to hold all signals
    delivered to the thread.

    5. i386 and amd64 now return the POSIX standard si_code; other
    arches will be fixed.

    6. In this sigqueue implementation, the pending signal set is kept
    as before; an extra siginfo list holds additional siginfo_t data
    for signals.  Kernel code using psignal() still behaves as before;
    it won't fail even under memory pressure.  The only exception is
    when deleting a signal: we should call sigqueue_delete to remove
    the signal from the sigqueue, not SIGDELSET.  Currently no kernel
    code delivers a signal with additional data, so the kernel should
    be as stable as before.  A ksiginfo can carry more information;
    for example, a signal may be allowed through but its siginfo data
    thrown away if memory is not enough.

    SIGKILL and SIGSTOP have a fast path in sigqueue_add because they
    cannot be caught or masked.

    The sigqueue() syscall allows user code to queue a signal to a
    target process; if resources are unavailable, EAGAIN is returned
    as the specification says.

    Just before a thread exits, signal queue memory is freed by
    sigqueue_flush.  Currently all signals are allowed to be queued,
    not only realtime signals.

    Earlier patch reviewed by: jhb, deischen
    Tested on: i386, amd64
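    The userland side of the sigqueue() syscall described above, as a
    hedged sketch (helper name illustrative): queue a realtime signal
    carrying a value; EAGAIN signals queue exhaustion as noted.

        #include <sys/types.h>
        #include <signal.h>

        static int
        notify_worker(pid_t pid, int token)
        {
                union sigval sv;

                sv.sival_int = token;   /* delivered in si_value */
                return (sigqueue(pid, SIGRTMIN, sv));
        }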
* Add witness warnings to panic if a thread tries to exit while holding any  (jhb, 2005-09-02, 1 file, -0/+3)
    locks.

    Requested by: jeff
    MFC after: 3 days
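    Roughly what such a warning looks like, as a hedged kernel-side
    sketch (the wrapper is illustrative; the real check sits in
    exit1()): WITNESS_WARN() with WARN_PANIC panics if the calling
    thread still holds any witness-tracked lock.

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/lock.h>
        #include <sys/proc.h>

        static void
        assert_no_locks_at_exit(struct proc *p)
        {
                /* Panics (under WITNESS) if any tracked lock is held. */
                WITNESS_WARN(WARN_PANIC, NULL, "process (pid %d) exiting",
                    p->p_pid);
        }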
* - Slightly reorder the events around the setting of PRS_ZOMBIE to be less  (jhb, 2005-07-18, 1 file, -8/+20)
      hokey and much more readable, and expand the comment to explain
      why it is the way that it is.
    - Close a race where one CPU could free the process belonging to a
      thread on another CPU that hasn't quite finished exiting yet but
      is beyond the point of setting the process state as PRS_ZOMBIE.

    Reported and tested by: ps (2)
    MFC after: 3 days
* Use low level constructs borrowed from interrupt threads to wait for  (ups, 2005-05-23, 1 file, -9/+1)
    work in proc0.  Remove the TDP_WAKEPROC0 workaround.
* Only check the signal event; the single-threading event shouldn't be reported.  (davidxu, 2005-05-05, 1 file, -1/+2)
* Wake up swapper process if needed.  (davidxu, 2005-04-23, 1 file, -0/+9)
    PR: kern/78474
    Submitted by: Sam Lawrance <boris at brooknet dot com dot au>
* Clear P_STATCHILD earlier to avoid unnecessary retrying.  (davidxu, 2005-04-19, 1 file, -0/+5)
* Fix a race condition between kern_wait() and thread_stopped().  (davidxu, 2005-04-19, 1 file, -1/+5)
    The problem is in kern_wait(): the parent process steps through the
    children list, and once a child process is skipped, the parent
    still sleeps in msleep() even if that child is later stopped.  The
    race happens if the parent has masked SIGCHLD.

    Submitted by: Peter Edwards <peadar.edwards at gmail dot com>
    MFC after: 4 days
* Introduce p_canwait() and MAC Framework and MAC Policy entry points  (rwatson, 2005-04-18, 1 file, -0/+4)
    mac_check_proc_wait(), which control the ability to wait4() specific
    processes.  This permits MAC policies to limit information flow from
    children that have changed label, although it has to be handled
    carefully due to common programming expectations regarding the
    behavior of wait4().  The cr_seeotheruids() check in p_canwait() is
    #if 0'd for this reason.

    The mac_stub and mac_test policies are updated to reflect these new
    entry points.

    Sponsored by: SPAWAR, SPARTA
    Obtained from: TrustedBSD Project
* - A lock is required before calling VOP_REVOKE.  Our reference protects us  (jeff, 2005-03-13, 1 file, -1/+2)
      from accessing another vnode, so a naked VOP_LOCK is sufficient.

    Sponsored by: Isilon Systems, Inc.
* In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying  (phk, 2005-01-29, 1 file, -0/+7)
    a process' return to userspace if it had pending GEOM events.

    We need to have the same check in the exit pass to catch the case
    where a GEOM-related file descriptor is not explicitly closed by
    the process.

    Bumped into by: people using dd(1) to build releases, nanobsd etc.
* In kern_wait(), let the compiler copy the rusage structure rather than  (rwatson, 2005-01-08, 1 file, -1/+1)
    an explicit bcopy() -- it probably does a better job.
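    The two copy styles being compared, side by side (a sketch; the
    helper is illustrative):

        #include <sys/types.h>
        #include <sys/resource.h>
        #include <strings.h>

        static void
        copy_rusage(struct rusage *dst, const struct rusage *src)
        {
                *dst = *src;                  /* compiler-generated copy */
                /* bcopy(src, dst, sizeof(*dst));  the old explicit call */
        }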
* /* -> /*- for copyright notices, minor format tweaks as necessary  (imp, 2005-01-06, 1 file, -1/+1)
* - Move the function prototypes for kern_setrlimit() and kern_wait() to  (jhb, 2005-01-05, 1 file, -0/+1)
      sys/syscallsubr.h where all the other kern_foo() prototypes live.
    - Re-sort kern_execve() while I'm there.
* Don't include sys/user.h merely for its side-effect of recursively  (das, 2004-11-27, 1 file, -1/+0)
    including other headers.
* Remove the P_STOPPED_TRACE bit if the debugger dies without a chance to  (davidxu, 2004-10-23, 1 file, -1/+1)
    detach the debugged process.
* Rework how we store process times in the kernel such that we always store  (jhb, 2004-10-05, 1 file, -19/+17)
    the raw values, including for child process statistics, and only
    compute the system and user timevals on demand.

    - Fix the various kern_wait() syscall wrappers to only pass in a
      rusage pointer if they are going to use the result.
    - Add a kern_getrusage() function for the ABI syscalls to use so
      that they don't have to play stackgap games to call getrusage().
    - Fix the svr4_sys_times() syscall to just call calcru() to
      calculate the times it needs rather than calling getrusage()
      twice with associated stackgap, etc.
    - Add a new rusage_ext structure to store raw time stats such as
      tick counts for user, system, and interrupt time as well as a
      bintime of the total runtime.  A new p_rux field in struct proc
      replaces the same inline fields from struct proc (i.e.,
      p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux field in
      struct proc contains the "raw" child time usage statistics.
      ruadd() has been changed to handle adding the associated
      rusage_ext structures as well as the values in rusage.
      Effectively, the values in rusage_ext replace the ru_utime and
      ru_stime values in struct rusage.  These two fields in struct
      rusage are no longer used in the kernel.
    - calcru() has been split into a static worker function calcru1()
      that calculates appropriate timevals for user and system time as
      well as updating the rux_[isu]u fields of a passed in rusage_ext
      structure.  calcru() uses a copy of the process' p_rux structure
      to compute the timevals after updating the runtime appropriately
      if any of the threads in that process are currently executing.
      It also now only locks sched_lock internally while doing the
      rux_runtime fixup.  calcru() now only requires the caller to
      hold the proc lock and calcru1() only requires the proc lock
      internally.  calcru() also no longer allows callers to ask for
      an interrupt timeval since none of them actually did.
    - calcru() now correctly handles threads executing on other CPUs.
    - A new calccru() function computes the child system and user
      timevals by calling calcru1() on p_crux.  Note that this means
      that any code that wants child times must now call this function
      rather than reading from p_cru directly.  This function also
      requires the proc lock.
    - This finishes the locking for rusage and friends, so some of the
      Giant locks in exit1() and kern_wait() are now gone.
    - The locking in ttyinfo() has been tweaked so that a shared lock
      of the proctree lock is used to protect the process group rather
      than the process group lock.  By holding this lock until the end
      of the function we now ensure that the process/thread that we
      pick to dump info about will no longer vanish while we are trying
      to output its info to the console.

    Submitted by: bde (mostly)
    MFC after: 1 month
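    The on-demand computation, as a hedged sketch under stated
    assumptions (names illustrative; the real calcru1() also handles
    interrupt time separately and works from bintimes): given raw
    per-context tick counts and the total runtime, apportion user and
    system time proportionally.

        #include <stdint.h>

        static void
        apportion(uint64_t uticks, uint64_t sticks, uint64_t iticks,
            uint64_t runtime_usec, uint64_t *user_usec, uint64_t *sys_usec)
        {
                uint64_t total = uticks + sticks + iticks;

                if (total == 0)
                        total = 1;      /* avoid division by zero */
                *user_usec = runtime_usec * uticks / total;
                *sys_usec = runtime_usec * sticks / total;
        }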
* Some more whitespace, style, and comment fixes.  (jhb, 2004-09-24, 1 file, -14/+16)
    Submitted by: bde (mostly)
* A modest collection of various and sundry style, spelling, and whitespace  (jhb, 2004-09-24, 1 file, -11/+11)
    fixes.

    Submitted by: bde (mostly)
* Various small style fixes.  (jhb, 2004-09-22, 1 file, -3/+5)
* Refactor a bunch of scheduler code to give basically the same behaviour  (julian, 2004-09-05, 1 file, -17/+2)
    but with slightly cleaned up interfaces.

    The KSE structure has become the same as the "per thread scheduler
    private data" structure.  In order to not make the diffs too great,
    one is #defined as the other at this time.

    The KSE (or td_sched) structure is now allocated per thread and has
    no allocation code of its own.

    Concurrency for a KSEGRP is now kept track of via a simple pair of
    counters rather than using KSE structures as tokens.

    Since the KSE structure is different in each scheduler, kern_switch.c
    is now included at the end of each scheduler.  Nothing outside the
    scheduler knows the contents of the KSE (aka td_sched) structure.

    The fields in the ksegrp structure that are to do with the
    scheduler's queueing mechanisms are now moved to the kg_sched
    structure (the per-ksegrp scheduler private data structure).  In
    other words, how the scheduler queues and keeps track of threads is
    no one's business except the scheduler's.  This should allow people
    to write experimental schedulers with completely different internal
    structuring.

    A scheduler call sched_set_concurrency(kg, N) has been added that
    notifies the scheduler that no more than N threads from that ksegrp
    should be allowed to be concurrently scheduled.  This is also used
    to enforce 'fairness' at this time, so that a ksegrp with 10000
    threads cannot swamp the run queue and force out a process with 1
    thread, since the current code will not set the concurrency above
    NCPU, and both schedulers will not allow more than that many onto
    the system run queue at a time.  Each scheduler should eventually
    develop its own methods to do this now that they are effectively
    separated.

    Rejig libthr's kernel interface to follow the same code paths as
    linkse for scope system threads.  This has slightly hurt libthr's
    performance but I will work to recover as much of it as I can.

    Thread exit code has been cleaned up greatly.  exit and exec code
    now transition a process back to 'standard non-threaded mode'
    before taking the next step.

    Reviewed by: scottl, peter
    MFC after: 1 week
* Add locking to the kqueue subsystem.  This also makes the kqueue subsystem  (jmg, 2004-08-15, 1 file, -4/+9)
    a more complete subsystem, and removes the knowledge of how things
    are implemented from the drivers.  Include locking around filter
    ops, so a module like aio will know when not to be unloaded if
    there are outstanding knotes using its filter ops.

    Currently, it uses MTX_DUPOK even though it is not always safe to
    acquire duplicate locks.  Witness currently doesn't support the
    ability to discover if a dup lock is ok (in some cases).

    Reviewed by: green, rwatson (both earlier versions)
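    For context, the consumer-visible API this locking work sits under,
    as a minimal hedged sketch (helper name illustrative): register one
    read filter and block for an event.

        #include <sys/types.h>
        #include <sys/event.h>
        #include <sys/time.h>
        #include <unistd.h>

        static int
        wait_readable(int fd)
        {
                struct kevent change, event;
                int kq, rv;

                if ((kq = kqueue()) == -1)
                        return (-1);
                EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
                rv = kevent(kq, &change, 1, &event, 1, NULL);
                close(kq);
                return (rv == -1 ? -1 : 0);
        }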
* Giant is no longer required by vm_waitproc() and vmspace_exitfree().  (alc, 2004-07-30, 1 file, -2/+0)
    Eliminate its acquisition and release around vm_waitproc() in
    kern_wait().
* - Use atomic ops for updating the vmspace's refcnt and exitingcnt.  (alc, 2004-07-27, 1 file, -3/+7)
    - Push down Giant into shmexit().  (Giant is acquired only if the
      vmspace contains shm segments.)
    - Eliminate the acquisition of Giant from proc_rwmem().
    - Reduce the scope of Giant in exit1(), uncovering the destruction
      of the address space.
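    The refcount technique in isolation, as a hedged sketch using
    FreeBSD kernel atomics (struct and helpers illustrative, not the
    vmspace code):

        #include <sys/types.h>
        #include <machine/atomic.h>

        struct object {
                volatile u_int refcnt;
        };

        static void
        object_hold(struct object *o)
        {
                atomic_add_int(&o->refcnt, 1);   /* no mutex needed */
        }

        static int
        object_drop(struct object *o)
        {
                /* Non-zero when the last reference was dropped. */
                return (atomic_fetchadd_int(&o->refcnt, -1) == 1);
        }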
* When calling scheduler entrypoints for creating new threads and processes,  (julian, 2004-07-18, 1 file, -1/+1)
    specify "us" as the thread, not the process/ksegrp/kse.  You can
    always find the others from the thread, but the converse is not
    true.  Theoretically this would lead to runtime being allocated to
    the wrong entity in some cases, though it is not clear how often
    this actually happened.  (It would only affect threaded processes
    and would probably be pretty benign, but it WAS a bug.)

    Reviewed by: peter
* Add code to support debugging threaded processes.  (davidxu, 2004-07-13, 1 file, -1/+1)
    1. Add tm_lwpid to kse_thr_mailbox to indicate which kernel thread
       the current user thread is running on.  Add tm_dflags to
       kse_thr_mailbox; these flags are written by the debugger and
       tell the UTS and kernel what should be done when the process is
       being debugged.  Currently there are two flags, TMDF_SSTEP and
       TMDF_DONOTRUNUSER.  TMDF_SSTEP tells the kernel to turn on
       single stepping, or turn it off if the flag is not set.
       TMDF_DONOTRUNUSER tells the kernel to schedule an upcall
       whenever possible; to the UTS it means do not run the user
       thread until the debugger clears it.  This behaviour is
       necessary because gdb wants to resume only one thread when the
       thread's pc is at a breakpoint and the thread needs to go
       forward; in order to avoid other threads sneaking past the
       breakpoints, it removes the breakpoint and lets only one thread
       go.  Also, add km_lwp to kse_mailbox; the lwp id is copied to
       kse_thr_mailbox at context switch time when the process is not
       being debugged, so when the process is attached, the debugger
       can map a kernel thread to a user thread.
    2. Add p_xthread to the proc structure and td_xsig to the thread
       structure.  p_xthread is used by a thread when it wants to
       report an event to the debugger; every thread can set the
       pointer, and in particular when it is used in ptracestop, the
       last thread reporting an event wins the race.  Every thread has
       a td_xsig to exchange signals with the debugger; a thread uses
       the TDF_XSIG flag to indicate it is reporting a signal to the
       debugger, and if the flag is not cleared, the thread will keep
       retrying until it is cleared by the debugger.  p_xthread may be
       used by the debugger to indicate the CURRENT thread.  The
       p_xstat is still in the proc structure to keep wait() working;
       in future we may just use td_xsig.
    3. Add the TDF_DBSUSPEND flag, used by the debugger to suspend a
       thread.  When the process stops, the debugger can set the flag
       for a thread, and the thread will check the flag in
       thread_suspend_check and enter a loop until it is cleared by
       the debugger, the process is detached, or the process is
       exiting.  The flag is also checked in ptracestop, so the
       debugger can temporarily suspend a thread even if the thread
       wants to exchange a signal.
    4. Currently, in ptrace, we always resume all threads, but if a
       thread has the TDF_DBSUSPEND flag set by the debugger, it won't
       run.

    Encouraged by: marcel, julian, deischen
* Push down the acquisition and release of the page queues lock into  (alc, 2004-07-13, 1 file, -2/+0)
    pmap_remove_pages().  (The implementation of pmap_remove_pages() is
    optional.  If pmap_remove_pages() is unimplemented, the acquisition
    and release of the page queues lock is unnecessary.)

    Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
* Implement the PT_LWPINFO request.  This request can be used by the  (marcel, 2004-07-12, 1 file, -0/+1)
    tracing process to obtain information about the LWP that caused the
    traced process to stop.  Debuggers can use this information to
    select the thread currently running on the LWP as the current
    thread.

    The request has been made compatible with NetBSD as much as
    possible.  This implementation differs from NetBSD in the following
    ways:
    1. The data argument is allowed to be smaller than the size of the
       ptrace_lwpinfo structure known to the kernel, but not 0.  This
       is opposite to what NetBSD allows.  The reason for this is that
       we can extend the structure without affecting older binaries.
    2. On NetBSD the tracing process is to set the pl_lwpid field to
       the Id of the LWP it wants information of.  We don't do that.
       Our ptrace interface allows passing the LWP Id instead of the
       PID.  The tracing process is to set the PID to the LWP Id it
       wants information of.
    3. When the PID is actually the PID of the tracing process, this
       request returns the information about the LWP that caused the
       process to stop.  This was the whole purpose of the request in
       the first place.  When the traced process has exited, this
       request will return the LWP Id 0, indicating that the process
       state is not the result of an event specific to an LWP.
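    Using the new request from a debugger, as a hedged sketch (the
    wrapper is illustrative): pass the stopped process' PID and a
    buffer no larger than the kernel's structure, per difference 1
    above.

        #include <sys/types.h>
        #include <sys/ptrace.h>

        static int
        query_stopped_lwp(pid_t pid, struct ptrace_lwpinfo *pl)
        {
                /* pl->pl_lwpid identifies the LWP that caused the stop. */
                return (ptrace(PT_LWPINFO, pid, (caddr_t)pl,
                    (int)sizeof(*pl)));
        }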
* (1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit()  (bde, 2004-06-21, 1 file, -7/+1)
    from exit1().  sched_exit() must be called unconditionally from
    exit1().  It was called almost unconditionally because the only
    exception, init, exits only on system shutdown, if at all.

    (2) Removed the comment that presumed to know what sched_exit()
    does.  sched_exit() does different things for the ULE case.  The
    call became essential when it started doing load average stuff,
    but its caller should not know that.

    (3) Didn't fix bugs caused by bitrot in the condition.  The
    condition was last correct in rev. 1.208 when it was in wait1().
    There p was spelled curthread->td_proc and was for the waiting
    parent; now p is for the exiting child.  The condition was to
    avoid lowering init's priority.  It should be in sched_exit()
    itself.  Lowering of priorities is broken in other ways in at
    least the 4BSD scheduler, and doing it for init causes less
    noticeable problems than doing it for shells.

    Noticed by: julian (1)
* Update p_runtime on exit.  This fixes calcru() on zombies, and prepares  (bde, 2004-06-21, 1 file, -2/+9)
    for not calling calcru() on exit.  calcru() on a zombie can happen
    if ttyinfo() (^T) picks one.

    PR: 52490
* Add a comment to reflect that we should retry if thread_single() fails.  (davidxu, 2004-06-18, 1 file, -0/+4)
* Remove a bogus panic.  It is possible that more than one thread will  (davidxu, 2004-06-18, 1 file, -1/+2)
    be suspended in thread_suspend_check; after they are resumed, all
    threads will call thread_single, but only one can succeed, and the
    others should retry and will exit in thread_suspend_check.