path: root/sys/kern/kern_exit.c
Commit message (Author, Age, Files, Lines)
* A little infrastructure, preceding some upcoming changes (julian, 2003-02-08, 1, -1/+1)
    to the profiling and statistics code.
    Submitted by: DavidXu@
    Reviewed by: peter@
* Reversion of commit by Davidxu plus fixes since applied. (julian, 2003-02-01, 1, -4/+14)
    I'm not convinced there is anything major wrong with the patch but them's
    the rules.. I am using my "David's mentor" hat to revert this as he's
    offline for a while.
* Move UPCALL related data structure out of kse, introduce a new (davidxu, 2003-01-26, 1, -14/+4)
    data structure called kse_upcall to manage upcalls. All KSE binding and
    loaning code is gone.

    A thread that owns an upcall can collect all completed syscall contexts in
    its ksegrp, turn itself into UPCALL mode, and take those contexts back to
    userland. Any thread without an upcall structure has to export its context
    and exit at the user boundary.

    Any thread running in user mode owns an upcall structure. When it enters
    the kernel and the kse mailbox's current thread pointer is not NULL, a new
    UPCALL thread is created when the thread blocks in the kernel, and the
    upcall structure is transferred to the new UPCALL thread. If the kse
    mailbox's current thread pointer is NULL, no UPCALL thread is created when
    a thread blocks in the kernel.

    Each upcall always has an owner thread. Userland can remove an upcall by
    calling kse_exit; when all upcalls in a ksegrp are removed, the group is
    automatically shut down. An upcall owner thread also exits when the
    process is in the exiting state; when an owner thread exits, the upcall it
    owns is removed as well.

    KSE is a pure scheduler entity: it represents a virtual CPU. When a thread
    is running, it always has a KSE associated with it. The scheduler is free
    to assign a KSE to a thread according to thread priority; if the priority
    changes, the KSE can be moved from one thread to another.

    When a ksegrp is created, N KSEs are created in the group, where N is the
    number of physical CPUs in the current system. This makes it possible for
    threads in the kernel to execute on different CPUs in parallel even if the
    userland UTS is only single-CPU safe. Userland calls kse_create to add
    more upcall structures to a ksegrp to increase concurrency in userland
    itself; the kernel is not restricted by the number of upcalls userland
    provides.

    The code hasn't been tested under SMP by the author due to lack of
    hardware.

    Reviewed by: julian
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. (alfred, 2003-01-21, 1, -1/+1)
    Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* It is possible for an active aio to prevent shared memory from being (dillon, 2003-01-13, 1, -2/+1)
    dereferenced when a process exits due to the vmspace ref-count being
    bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead
    of a process and call it in vmspace_dofree(). This way if it is missed in
    exit1()'s early-resource-free it will still be caught when the zombie is
    reaped.

    Also fix a potential race in shmexit_myhook() by NULLing out
    vmspace->vm_shm prior to calling shm_delete_mapping() and free().

    MFC after: 7 days
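
The fix described above relies on detaching the pointer before tearing the
object down. Below is a minimal, self-contained C sketch of that pattern;
the types and names (shm_vmspace, sketch_shmexit, and so on) are invented
for illustration and are not the kernel's shmexit()/shm_delete_mapping()
code.

    #include <stddef.h>
    #include <stdlib.h>

    /* Invented stand-ins for the kernel structures. */
    struct shm_state { int nsegs; };
    struct shm_vmspace { struct shm_state *vm_shm; };

    void
    sketch_shm_delete_mapping(struct shm_state *st)
    {
        st->nsegs = 0;          /* placeholder for dropping segment refs */
    }

    /*
     * Detach vm_shm from the vmspace before tearing it down, so a concurrent
     * reader of vm_shm sees NULL rather than a pointer to freed memory.
     */
    void
    sketch_shmexit(struct shm_vmspace *vm)
    {
        struct shm_state *st = vm->vm_shm;

        vm->vm_shm = NULL;
        if (st != NULL) {
            sketch_shm_delete_mapping(st);
            free(st);
        }
    }
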
* Fix a refcount race with the vmspace structure. In order to prevent (dillon, 2002-12-15, 1, -1/+8)
    resource starvation we clean up as much of the vmspace structure as we can
    when the last process using it exits. The rest of the structure is cleaned
    up when it is reaped. But since exit1() decrements the ref count it is
    possible for a double-free to occur if someone else, such as the process
    swapout code, references and then dereferences the structure.
    Additionally, the final cleanup of the structure should not occur until
    the last process referencing it is reaped.

    This commit solves the problem by introducing a secondary reference count,
    called 'vm_exitingcnt'. The normal reference count is decremented on exit
    and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the
    process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the
    structure is freed for real.

    MFC after: 3 weeks
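
A minimal userland sketch of the two-counter scheme described above; the
names (sketch_vmspace, vmspace_sketch_exit, vmspace_sketch_reap) are
invented, and the real kernel code differs in detail.

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * vm_refcnt counts live users; vm_exitingcnt counts processes that have
     * exited but have not yet been reaped.  The structure is freed only when
     * both counters reach zero.
     */
    struct sketch_vmspace {
        int vm_refcnt;          /* live references */
        int vm_exitingcnt;      /* exited, but not yet reaped */
    };

    /* exit1() path: move one reference from "live" to "exiting". */
    void
    vmspace_sketch_exit(struct sketch_vmspace *vm)
    {
        vm->vm_exitingcnt++;
        if (--vm->vm_refcnt == 0)
            printf("last live user gone: partial cleanup happens here\n");
    }

    /* wait1() path: the zombie is reaped, drop the "exiting" reference. */
    void
    vmspace_sketch_reap(struct sketch_vmspace *vm)
    {
        if (--vm->vm_exitingcnt == 0 && vm->vm_refcnt == 0) {
            printf("both counts zero: free the structure for real\n");
            free(vm);
        }
    }

    int
    main(void)
    {
        struct sketch_vmspace *vm = calloc(1, sizeof(*vm));

        vm->vm_refcnt = 1;
        vmspace_sketch_exit(vm);    /* the last process exits */
        vmspace_sketch_reap(vm);    /* later, its zombie is reaped */
        return (0);
    }
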
* Unbreak the KSE code. Keep track of zombie threads using the per-CPU storage (julian, 2002-12-10, 1, -24/+6)
    during the context switch. Rearrange thread cleanups to avoid problems
    with Giant. Clean threads when freed or when recycled.
    Approved by: re (jhb)
* Acquire and release the page queues lock around pmap_remove_pages() because (alc, 2002-11-25, 1, -0/+3)
    it updates several of vm_page's fields.
* Introduce p_label, extensible security label storage for the MAC framework (rwatson, 2002-11-20, 1, -0/+5)
    in struct proc. While the process label is actually stored in the struct
    ucred pointed to by p_ucred, there is a need for transient storage that
    may be used when asynchronous (deferred) updates need to be performed on
    the "real" label for locking reasons. Unlike other label storage, this
    label has no locking semantics, relying on policies to provide their own
    protection for the label contents, meaning that a policy leaf mutex may
    be used, avoiding lock order issues. This permits policies that act based
    on historical process behavior (such as audit policies, the MAC Framework
    port of LOMAC, etc.) to update process properties even when many existing
    locks are held, without violating the lock order. No currently committed
    policies implement use of this label storage.

    Approved by: re
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
* - Add a new global mutex 'ppeers_lock' to protect the p_peers list of (jhb, 2002-10-15, 1, -6/+6)
    processes forked with RFTHREAD.
    - Use a goto to a label for common code when exiting from fork1() in case
    of an error.
    - Move the RFTHREAD linkage setup code later in fork since the ppeers_lock
    cannot be locked while holding a proc lock. Handle the race of a task
    leader exiting and killing its peers while a peer is forking a new child.
    In that case, go ahead and let the peer process proceed normally as the
    parent is about to kill it. However, the task leader may have already
    gone to sleep to wait for the peers to die, so the new child process may
    not receive a SIGKILL from the task leader. Rather than try to destruct
    the new child process, just go ahead and send it a SIGKILL directly and
    add it to the p_peers list. This ensures that the task leader will wait
    until both the peer process doing the fork() and the new child process
    have received their KILL signals and exited.

    Discussed with: truckman (earlier versions)
* - Create a new scheduler api that is defined in sys/sched.h (jeff, 2002-10-12, 1, -12/+5)
    - Begin moving scheduler specific functionality into sched_4bsd.c
    - Replace direct manipulation of scheduler data with hooks provided by the
    new api.
    - Remove KSE specific state modifications and single runq assumptions from
    kern_switch.c

    Reviewed by: -arch
* Round out the facility for a 'bound' thread to loan out its KSE (julian, 2002-10-09, 1, -4/+11)
    in specific situations. The owner thread must be blocked, and the borrower
    can not proceed back to user space with the borrowed KSE. The borrower
    will return the KSE on the next context switch where the owner wants it
    back. This removes a lot of possible race conditions and deadlocks. It is
    conceivable that the borrower should inherit the priority of the owner
    too; that's another discussion and would be simple to do.

    Also, as part of this, the "preallocated spare thread" is attached to the
    thread doing a syscall rather than the KSE. This removes the need to lock
    the scheduler when we want to access it, as it's now "at hand".

    DDB now shows a lot more info for threaded processes, though it may need
    some optimisation to squeeze it all back into 80 chars again.
    (possible JKH project)

    Upcalls are now "bound" threads, but "KSE Lending" now means that other
    completing syscalls can be completed using that KSE before the upcall
    finally makes it back to the UTS. (Getting threads OUT OF THE KERNEL is
    one of the highest priorities in the KSE system.) The upcall when it
    happens will present all the completed syscalls to the KSE for selection.
* Whitespace fix only (julian, 2002-10-02, 1, -3/+3)
* Back out kernel support for reliable signal queues. (jmallett, 2002-10-01, 1, -8/+1)
    Requested by: rwatson, phk, and many others
* First half of implementation of ksiginfo, signal queues, and such. This (jmallett, 2002-09-30, 1, -1/+8)
    gets signals operating based on a TailQ, and is good enough to run X11,
    GNOME, and do job control. There are some intricate parts which could be
    more refined to match the sigset_t versions, but those require further
    evaluation of directions in which our signal system can expand and
    contract to fit our needs.

    After this has been in the tree for a while, I will make in-kernel API
    changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
    more robustly, such that we can actually pass information with our
    (queued) signals to the userland. That will also result in using a struct
    ksiginfo pointer, rather than a signal number, in a lot of kern_sig.c, to
    refer to an individual pending signal queue member, but right now there
    is no defined behaviour for such.

    CODAFS is unfinished in this regard because the logic is unclear in some
    places.

    Sponsored by: New Gold Technology
    Reviewed by: bde, tjr, jake [an older version, logic similar]
* Use the fields in the sysentvec and in the vm map header in place of the (jake, 2002-09-21, 1, -4/+4)
    constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
    This is mainly so that they can be variable even for the native abi,
    based on different machine types. Get stack protections from the
    sysentvec too. This makes it trivial to map the stack non-executable for
    certain abis, on machines that support it.
* Add kernel support needed for the KSE-aware libpthread: (mini, 2002-09-16, 1, -2/+0)
    - Use ucontext_t's to store KSE thread state.
    - Synthesize state for the UTS upon each upcall, rather than saving and
    copying a trapframe.
    - Deliver signals to KSE-aware processes via upcall.
    - Rename kse mailbox structure fields to be more BSD-like.
    - Store the UTS's stack in struct proc in a stack_t.

    Reviewed by: bde, deischen, julian
    Approved by: -arch
* Allocate KSEs and KSEGRPs separately and remove them from the proc structure. (julian, 2002-09-15, 1, -2/+2)
    The next step is to allow more than one to be allocated per process. This
    would give multi-processor threads (when the rest of the infrastructure
    is in place).

    While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
    are diverging more than they should; corrective action needed soon.
* Use UMA as a complex object allocator. (julian, 2002-09-06, 1, -0/+2)
    The process allocator now caches and hands out complete process
    structures *including substructures*, i.e. it gets the process structure
    with the first thread (and soon KSE) already allocated and attached, all
    in one hit.

    For the average non-threaded program (non-KSE that is) the allocated
    thread and its stack remain attached to the process, even when the
    process is unused and in the process cache. This saves having to allocate
    and attach it later, effectively bringing us (hopefully) close to the
    efficiency of pre-KSE systems where these were a single structure.

    Reviewed by: davidxu@freebsd.org, peter@freebsd.org
* s/SGNL/SIG/ (davidxu, 2002-09-05, 1, -1/+1)
    s/SNGL/SINGLE/
    s/SNGLE/SINGLE/
    Fix abbreviations for the P_STOPPED_* etc. flags; in the original code
    they were inconsistent and difficult to distinguish from each other.
    Approved by: julian (mentor)
* Revert previous revision which accidentally snuck in with another commit. (jhb, 2002-08-01, 1, -1/+1)
    It just removed a comment that doesn't make sense to me personally.
* If we fail to write to a vnode during a ktrace write, then we drop all (jhb, 2002-08-01, 1, -1/+1)
    other references to that vnode as a trace vnode in other processes as
    well as in any pending requests on the todo list. Thus, it is possible
    for a ktrace request structure to have a NULL ktr_vp when it is destroyed
    in ktr_freerequest(). We shouldn't call vrele() on the vnode in that case.
    Reported by: bde
* Part 1 of KSE-III (julian, 2002-06-29, 1, -9/+88)
    The ability to schedule multiple threads per process (on one cpu) by
    making ALL system calls optionally asynchronous.
    To come: ia64 and power-pc patches, patches for gdb, test program (in
    tools).

    Reviewed by: Almost everyone who counts (at various times: peter, jhb,
    matt, alfred, mini, bernd, and a cast of thousands)

    NOTE: this is still Beta code, and contains lots of debugging stuff.
    Expect slight instability in signals.
* More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. (alfred, 2002-06-29, 1, -16/+15)
* Add an MD callout like cpu_exit, but which is called after sched_lock is (jake, 2002-06-24, 1, -0/+1)
    obtained, when all other scheduling activity is suspended. This is needed
    on sparc64 to deactivate the vmspace of the exiting process on all cpus.
    Otherwise if another unrelated process gets the exact same vmspace
    structure allocated to it (same address), its address space will not be
    activated properly. This seems to fix some spontaneous signal 11 problems
    with smp on sparc64.
* Properly lock accesses to p_tracep and p_traceflag. Also make a few (jhb, 2002-06-07, 1, -0/+5)
    ktrace-only things #ifdef KTRACE that were not before.
* Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag (mike, 2002-06-01, 1, -1/+17)
    (P_CONTINUED) is set when a stopped process receives a SIGCONT and
    cleared after it has notified a parent process that has requested
    notification via waitpid(2) with WCONTINUED specified in its options
    operand. The status value can be checked with the new WIFCONTINUED()
    macro.
    Reviewed by: jake
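
WCONTINUED and WIFCONTINUED() are the user-visible side of this change; the
small program below shows one way to exercise them (the fork/SIGSTOP
scaffolding is just for demonstration and is not part of the commit).

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        pid_t pid = fork();
        int status;

        if (pid == 0) {             /* child: stop, run a bit, then exit */
            raise(SIGSTOP);
            sleep(2);
            _exit(0);
        }
        waitpid(pid, &status, WUNTRACED);       /* wait for the stop */
        if (WIFSTOPPED(status))
            kill(pid, SIGCONT);                 /* continue the child */
        /* The new option reports the SIGCONT to the waiting parent. */
        if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
            printf("child %d was continued\n", (int)pid);
        waitpid(pid, &status, 0);               /* reap the final exit */
        return (0);
    }
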
* Whitespace: trim a trailing tab. (jhb, 2002-05-23, 1, -1/+1)
* Make funsetown() take a 'struct sigio **' so that the locking can (alfred, 2002-05-06, 1, -4/+0)
    be done internally.

    Ensure that no one can fsetown() to a dying process/pgrp. We need to
    check the process for P_WEXIT to see if it's exiting. Process groups are
    already safe because there is no such thing as a pgrp zombie, therefore
    the proctree lock completely protects the pgrp from having sigio
    structures associated with it after it runs funsetownlst.

    Add sigio lock to witness list under proctree and allproc, but over proc
    and pgrp.

    Seigo Tanimura helped with this.
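
The point of the 'struct sigio **' signature is that the callee can detach
the caller's pointer under its own lock. Here is a hedged userland analogy;
the names (fake_sigio, fake_funsetown) are invented and a pthread mutex
stands in for the kernel sigio lock.

    #include <pthread.h>
    #include <stdlib.h>

    struct fake_sigio { int sio_signal; };      /* invented stand-in */

    static pthread_mutex_t fake_sigio_lock = PTHREAD_MUTEX_INITIALIZER;

    /*
     * Taking a pointer-to-pointer lets this function lock, detach the
     * caller's reference, and free it, without the caller having to know
     * the locking rules.
     */
    void
    fake_funsetown(struct fake_sigio **sigiop)
    {
        struct fake_sigio *sigio;

        pthread_mutex_lock(&fake_sigio_lock);
        sigio = *sigiop;
        *sigiop = NULL;                 /* detach under the lock */
        pthread_mutex_unlock(&fake_sigio_lock);
        free(sigio);                    /* free(NULL) is a no-op */
    }
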
* When checking to see if the init process calls exit1(), compare p to the (jhb, 2002-05-06, 1, -1/+1)
    initproc proc pointer instead of checking to see if the pid is 1.
    Submitted by: bde
* Style fixes in local variable declarations. (jhb, 2002-05-06, 1, -9/+10)
    Submitted by: bde
* - Style fixes in some comments. (jhb, 2002-05-06, 1, -9/+10)
    - Whitespace nit.
    - Sort some includes.
    Submitted by: bde (mostly)
* style(9): 'if' and 'while' need a space after them. (alfred, 2002-05-04, 1, -3/+3)
* Fix the lock order reversal between the sigio lock and a process/pgrp lock in (tanimura, 2002-05-03, 1, -0/+2)
    funsetownlst() by locking the sigio lock across funsetownlst().
* - Reorder a few things so that when we lock the process at the end of (jhb, 2002-05-02, 1, -107/+118)
    exit1() we don't have to release it until we acquire sched_lock to call
    cpu_throw().
    - Since we can switch at any time due to preemption or a lock release
    prior to acquiring sched_lock, don't update switchtime and switchticks
    until the very end of exit1() after we have acquired sched_lock.
    - Interlock the proctree_lock and proc lock in wait1() and exit1() to
    avoid lost wakeups when a parent blocks waiting for a child to exit at
    the bottom of wait1(). In exit1() the proc lock interlocked with
    proctree_lock (and released after acquiring sched_lock) is that of the
    parent process.
    - In wait1() use an exclusive lock of the proctree lock while we are
    looking for a process to harvest. This allows us to completely remove
    all references to the process once we've found one (i.e., disconnect it
    from pgrp's, session's, zombproc list, and its parent's children list)
    "atomically" without needing to worry about a lock upgrade.
    - We don't need sched_lock to test if p_stat is SZOMB or SSTOP when
    holding the proc lock, since the proc lock is always held when p_stat is
    set to SZOMB or SSTOP.
    - Protect nprocs with an xlock of the allproc_lock.
* Avoid the user-visible effect of setting SA_NOCLDWAIT when the (iedowse, 2002-04-27, 1, -3/+3)
    SIGCHLD handler is SIG_IGN. This is a reimplementation of the problematic
    revision 1.131 of kern_exit.c. To avoid accessing process UPAGES, we set
    a new procsig flag when the SIGCHLD handler is SIG_IGN and use that
    instead.
* - Lock proctree_lock instead of pgrpsess_lock. (jhb, 2002-04-16, 1, -7/+7)
    - Exclusively lock proctree_lock while calling leavepgrp().
* We don't need Giant to read the pgrp ID since the proc lock has protected (jhb, 2002-04-09, 1, -5/+3)
    p_pgrp since the pgrp locking went in. We also don't need it to check for
    invalid values in the options argument to wait1(), so push Giant down
    slightly.
* Close some holes with p->p_args by NULL'ing out the p->p_args pointer (alfred, 2002-03-31, 1, -1/+4)
    while holding the proc lock, and by holding the pargs structure when
    accessing it from outside of the owner.
    Submitted by: Jonathan Mini <mini@haikugeek.com>
* Make the reference counting of 'struct pargs' SMP safe. (alfred, 2002-03-27, 1, -2/+1)
    There are still some locations where the PROC lock should be held in
    order to prevent inconsistent views from outside (like the proc->p_fd fix
    for kern/vfs_syscalls.c:checkdirs()); those can be fixed later.
    Submitted by: Jonathan Mini <mini@haikugeek.com>
* Remove references to vm_zone.h and switch over to the new uma API. (jeff, 2002-03-20, 1, -2/+2)
    Also, remove maxsockets. If you look carefully you'll notice that the old
    zone allocator never honored this anyway.
* Remove __P. (alfred, 2002-03-19, 1, -1/+1)
* Do not lock the pgrpsess_lock exclusively across ttywait(). (tanimura, 2002-03-11, 1, -0/+2)
    Spotted by: David Wolfskill <david@catwhisker.org>
    Investigated by: rwatson
* Lock struct pgrp, session and sigio. (tanimura, 2002-02-23, 1, -17/+47)
    New locks are:
    - pgrpsess_lock which locks the whole pgrps and sessions,
    - pg_mtx which protects the pgrp members, and
    - s_mtx which protects the session members.
    Please refer to sys/proc.h for the coverage of these locks.

    Changes on the pgrp/session interface:
    - pgfind() needs the pgrpsess_lock held.
    - The caller of enterpgrp() is responsible to allocate a new pgrp and
    session.
    - Call enterthispgrp() in order to enter an existing pgrp.
    - pgsignal() requires a pgrp lock held.

    Reviewed by: jhb, alfred
    Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running
    -current)
* Convert p->p_runtime and PCPU(switchtime) to bintime format. (phk, 2002-02-22, 1, -1/+1)
* Fix a race with free'ing vmspaces at process exit when vmspaces are (alfred, 2002-02-05, 1, -3/+4)
    shared. Also introduce vm_endcopy instead of using pointer tricks when
    initializing new vmspaces.

    The race occurred because of how the reference was utilized:
    test vmspace reference, possibly block, decrement reference.
    When sharing a vmspace between multiple processes it was possible for two
    processes exiting at the same time to test the reference count, possibly
    block, and neither one free it because they wouldn't see the other's
    update.

    Submitted by: green
* Release text vnode in exit() rather than wait(). Occasionally (dwmalone, 2002-01-05, 1, -8/+8)
    filesystem problems could prevent the release from completing and this
    could result in init being blocked indefinitely.

    This was looked over by Matt ages ago.
    Approved by: dillon
* Change the preemption code for software interrupt thread schedules and (jhb, 2002-01-05, 1, -2/+2)
    mutex releases to not require flags for the cases when preemption is not
    allowed:

    The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
    switching to a higher priority thread on mutex release and swi schedule,
    respectively, when that switch is not safe. Now that the critical section
    API maintains a per-thread nesting count, the kernel can easily check
    whether or not it should switch without relying on flags from the
    programmer. This fixes a few bugs in that all current callers of
    swi_sched() used SWI_NOSWITCH, when in fact only the ones called from
    fast interrupt handlers and the swi_sched of softclock needed this flag.
    Note that to ensure that swi_sched()'s in clock and fast interrupt
    handlers do not switch, these handlers have to be explicitly wrapped in
    critical_enter/exit pairs. Presently, just wrapping the handlers is
    sufficient, but in the future with the fully preemptive kernel, the
    interrupt must be EOI'd before critical_exit() is called.
    (critical_exit() can switch due to a deferred preemption in a fully
    preemptive kernel.)

    I've tested the changes to the interrupt code on i386 and alpha. I have
    not tested ia64, but the interrupt code is almost identical to the alpha
    code, so I expect it will work fine. PowerPC and ARM do not yet have
    interrupt code in the tree so they shouldn't be broken. Sparc64 is
    broken, but that's been ok'd by jake and tmm who will be fixing the
    interrupt code for sparc64 shortly.

    Reviewed by: peter
    Tested on: i386, alpha
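
A hypothetical kernel-side sketch of the wrapping described above; the
handler and cookie names are invented, the header list is approximate for
that era, and only critical_enter()/critical_exit()/swi_sched() are meant
to be the real interfaces.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/interrupt.h>

    static void *example_swi_cookie;    /* obtained elsewhere, e.g. swi_add() */

    /* A fast interrupt handler that defers its work to a software interrupt. */
    static void
    example_fast_handler(void *arg)
    {
        (void)arg;
        /* ... acknowledge the hardware here ... */
        critical_enter();
        swi_sched(example_swi_cookie, 0);   /* no switch until critical_exit */
        critical_exit();
    }
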
* Eliminate semexit_hook using at_exit(9) and rm_at_exit(9). (alc, 2001-12-30, 1, -5/+0)
    Reviewed by: alfred
* Make AIO a loadable module. (alfred, 2001-12-29, 1, -3/+0)
    Remove the explicit call to aio_proc_rundown() from exit1(); instead AIO
    will use at_exit(9).

    Add functions at_exec(9) and rm_at_exec(9), which function nearly the
    same as at_exit(9) and rm_at_exit(9); these functions are called on
    behalf of modules at the time of execve(2), after the image activator
    has run.

    Use a modified version of tegge's suggestion via at_exec(9) to close an
    exploitable race in AIO.

    Fix SYSCALL_MODULE_HELPER such that it's architecturally neutral; the
    problem was that one had to pass it a parameter indicating the number of
    arguments, which was actually the number of "int". Fix it by using an
    inline version of the AS macro against the syscall arguments. (AS should
    be available globally, but we'll get to that later.)

    Add a primitive system for dynamically adding kqueue ops; it's really not
    as sophisticated as it should be, but I'll discuss with jlemon when he's
    around.
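
A sketch of how a module might use the at_exit(9) hook mentioned above; the
my_subsys_* names are invented, and the exact prototypes of at_exit() and
rm_at_exit() are an assumption based on the at_exit(9) manual page of that
era rather than a copy of the AIO code.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/proc.h>

    /* Called for every exiting process once registered. */
    static void
    my_subsys_proc_rundown(struct proc *p)
    {
        (void)p;        /* tear down any per-process state kept for p */
    }

    static int
    my_subsys_load(void)
    {
        /* at_exit(9) is assumed to return 0 on success. */
        return (at_exit(my_subsys_proc_rundown));
    }

    static void
    my_subsys_unload(void)
    {
        (void)rm_at_exit(my_subsys_proc_rundown);
    }
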