path: root/sys/kern/kern_exit.c
Commit log (fields per entry: commit message, author, date, files changed, lines removed/added)
* kib, 2008-12-05 (1 file, -0/+2)
  Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent's struct proc until the corresponding child releases the vmspace. Each sleep is interlocked with the proc mutex of the child, which triggers an assertion in sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock.
  Silence the assertion by using a condition variable allocated in the child. Broadcast the variable event on exec() and exit().
  Since the struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec().
  Reported and tested by: ganbold
  Suggested and reviewed by: jhb
  MFC after: 2 weeks
* bz, 2008-11-29 (1 file, -0/+5)
  MFp4: Bring in updated jail support from bz_jail branch.
  This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking, ...
  SCTP support was updated and supports IPv6 in jails as well.
  Cpuset support permits jails to be bound to specific processor sets after creation.
  Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future.
  DDB 'show jails' command was added to aid debugging.
  Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities.
  Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years.
  Bump __FreeBSD_version for the aforementioned and in-kernel changes.
  Special thanks to:
  - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches.
  - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support.
  - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages.
  - John Baldwin (jhb) for his help.
  - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels.
  - My employer, CK Software GmbH, for the support so I could work on this.
  Reviewed by: (see above)
  MFC after: 3 months (this is just so that I get the mail)
  X-MFC Before: 7.2-RELEASE if possible
* davidxu, 2008-10-15 (1 file, -0/+4)
  Move per-thread userland debugging flags into a separate field; this eliminates some locking problems, e.g., a thread lock is needed but cannot be used at that time. Only the process lock is needed now for the new field.
* davidxu, 2008-08-29 (1 file, -6/+6)
  Don't remove queued SIGCHLD if options contain WNOWAIT, so other threads can still be notified by the signal.
* kib, 2008-08-26 (1 file, -2/+14)
  Implement WNOWAIT flag for wait4(2). It specifies that the process whose status is returned shall be kept in the waitable state. Add WSTOPPED as an alias for WUNTRACED.
  Submitted by: Jukka Ukkonen <jau at iki fi>
  PR: standards/116221
  MFC after: 2 weeks
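The WNOWAIT behaviour described in this entry (status is returned, the process stays waitable) can be sketched from userland. This is only an illustration under stated assumptions, not the kernel change: it assumes a POSIX system and uses waitid(2), which POSIX defines with WNOWAIT, rather than wait4(2); peek_then_reap() is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code`, peek at its
 * status with WNOWAIT (child stays waitable), then reap it normally.
 * Returns 1 if both calls report the same exit status, 0 if they
 * disagree, and -1 on a system call error.
 */
int peek_then_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: exit immediately */

    siginfo_t info;
    memset(&info, 0, sizeof(info));

    /* First wait: WNOWAIT leaves the zombie in place. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    if (info.si_pid != pid || info.si_code != CLD_EXITED ||
        info.si_status != code)
        return 0;

    /* Second wait: succeeds only because the child was not reaped above. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == code) ? 1 : 0;
}
```

If the first call had consumed the status, the second waitpid() would fail with ECHILD; the helper returning 1 shows the child was kept waitable.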
* ed, 2008-08-20 (1 file, -34/+28)
  Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
  The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following:
  - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver.
  - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTYs from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTYs that are created on the fly.
  - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters.
  Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING.
  Obtained from: //depot/projects/mpsafetty/...
  Approved by: philip (ex-mentor)
  Discussed: on the lists, at BSDCan, at the DevSummit
  Sponsored by: Snow B.V., the Netherlands
  dcons(4) fixed by: kan
* jb, 2008-05-24 (1 file, -0/+30)
  Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.
* phk, 2008-03-22 (1 file, -6/+8)
  In abort2(2): Accept a NULL arg pointer if nargs == 0
* jeff, 2008-03-19 (1 file, -2/+0)
  - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock.
  - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
* kris, 2008-02-17 (1 file, -6/+1)
  Switch from conditionally dropping Giant in exit1() to asserting it is not held, which appears to be always true.
* attilio, 2008-01-13 (1 file, -1/+1)
  VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unneeded extra argument and explicitly pass curthread to lower-layer functions when necessary.
  The KPI is broken by this change, which should affect several ports, so a version bump and manpage update will be committed later.
  Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* julian, 2007-10-26 (1 file, -4/+3)
  Introduce a way to make pure kernel threads.
  kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process.
  kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it.
  All other old kthread_xxx() calls return, but act on (struct thread *) instead of (struct proc *). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link.
  - Fix top to show kernel threads by their thread name in -SH mode.
  - Add a tdnam formatting option to ps to show thread names.
  - Make all idle threads actual kthreads and put them into their own idled process.
  - Make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons).
  - Rename proc 0 to be 'kernel'; its swapper thread is now 'swapper'.
  Man page fixes to follow.
* rwatson, 2007-10-24 (1 file, -1/+1)
  Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
    mac_<object>_<method/action>
    mac_<object>_check_<method/action>
  The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
  All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.
  Sponsored by: SPARTA (original patches against Mac OS X)
  Obtained from: TrustedBSD Project, Apple Computer
* jhb, 2007-06-13 (1 file, -18/+24)
  Improve the ktrace locking somewhat to reduce overhead:
  - Depessimize userret() in kernels where KTRACE is enabled by doing an unlocked check of the per-process queue of pending events before acquiring any locks. Previously ktr_userret() unconditionally acquired the global ktrace_sx lock on every return to userland for every thread, even if ktrace wasn't enabled for the thread.
  - Optimize the locking in exit() to first perform an unlocked read of p_traceflag to see if ktrace is enabled and only acquire locks and tear down ktrace if the test succeeds. Also, explicitly disable tracing before draining any pending events so the pending events actually get written out. The unlocked read is safe because proc lock is acquired earlier after single-threading so p_traceflag can't change between then and this check (well, it can currently due to a bug in ktrace I will fix next, but that race existed prior to this change as well).
  Reviewed by: rwatson
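The "unlocked check before acquiring any locks" idea in this entry can be sketched in portable C11. This is a toy model under stated assumptions, not the kernel code: struct trace_state, its fields, and drain_pending() are hypothetical names, and an atomic_flag spinlock stands in for the real lock.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for per-process trace state (not the kernel layout). */
struct trace_state {
    atomic_flag lock;        /* toy spinlock standing in for the real lock */
    atomic_int  pending;     /* number of queued trace records */
    int lock_acquisitions;   /* instrumentation for this example only */
};

void trace_state_init(struct trace_state *ts)
{
    atomic_flag_clear(&ts->lock);
    atomic_init(&ts->pending, 0);
    ts->lock_acquisitions = 0;
}

/* Drain queued records; returns how many were drained. */
int drain_pending(struct trace_state *ts)
{
    /* Fast path: an unlocked read; the common "nothing queued" case
     * never touches the lock. */
    if (atomic_load(&ts->pending) == 0)
        return 0;

    /* Slow path: take the lock, then consume whatever is queued. */
    while (atomic_flag_test_and_set(&ts->lock))
        ;                    /* spin until the flag was previously clear */
    ts->lock_acquisitions++;
    int drained = atomic_exchange(&ts->pending, 0);
    atomic_flag_clear(&ts->lock);
    return drained;
}
```

The counter makes the saving visible: calls that find nothing pending leave lock_acquisitions unchanged.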
* attilio, 2007-06-09 (1 file, -6/+4)
  rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically, changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc, which wraps both calls so they are performed atomically.
  Reviewed by: jeff
  Approved by: jeff (mentor)
* attilio, 2007-06-09 (1 file, -31/+8)
  The current rusage code shows peculiar problems:
  - Unsafeness on ruadd() in thread_exit()
  - Unatomicity of thread_exit() in the exit1() operations
  This patch addresses these problems by allocating p_fd as part of the process and modifying the way it is accessed.
  A small chunk of this patch resolves a race about p_state in kern_wait(), since we have to be sure about the zombifying process.
  Submitted by: jeff
  Approved by: jeff (mentor)
* rwatson, 2007-06-07 (1 file, -3/+0)
  Move per-process audit state from a pointer in the proc structure to embedded storage in struct ucred. This allows audit state to be cached with the thread, avoiding locking operations with each system call, and makes it available in asynchronous execution contexts, such as deep in the network stack or VFS.
  Reviewed by: csjp
  Approved by: re (kensmith)
  Obtained from: TrustedBSD Project
* jeff, 2007-06-05 (1 file, -10/+12)
  Commit 14/14 of sched_lock decomposition.
  - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization.
  - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization.
  Tested by: kris, current@
  Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* jeff, 2007-06-01 (1 file, -9/+17)
  - Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost.
  - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
  Initial patch by: attilio
  Reviewed by: attilio, bde (some objections), arch (mostly silent)
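The per-thread accounting scheme in this entry (each thread keeps its own counters, an exiting thread folds its counters into the process, and a fetch routine sums everything on demand) can be sketched as a small userland model. All names here (toy_rusage, toy_proc, toy_rufetch, toy_thread_exit) are hypothetical and the structures are deliberately simplified; locking is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified usage counters; the real struct rusage has many more fields. */
struct toy_rusage {
    uint64_t user_ticks;
    uint64_t sys_ticks;
};

struct toy_proc {
    struct toy_rusage  exited_ru;   /* usage folded in from exited threads */
    struct toy_rusage *thread_ru;   /* one entry per live thread */
    size_t             nthreads;
};

/* Add src into dst, field by field. */
void toy_ruadd(struct toy_rusage *dst, const struct toy_rusage *src)
{
    dst->user_ticks += src->user_ticks;
    dst->sys_ticks  += src->sys_ticks;
}

/* On-demand aggregate: exited threads' usage plus every live thread. */
struct toy_rusage toy_rufetch(const struct toy_proc *p)
{
    struct toy_rusage total = p->exited_ru;
    for (size_t i = 0; i < p->nthreads; i++)
        toy_ruadd(&total, &p->thread_ru[i]);
    return total;
}

/* A thread exits: fold its usage into the process so it is not lost. */
void toy_thread_exit(struct toy_proc *p, size_t idx)
{
    toy_ruadd(&p->exited_ru, &p->thread_ru[idx]);
    p->thread_ru[idx].user_ticks = 0;
    p->thread_ru[idx].sys_ticks = 0;
}
```

In this model the aggregate returned by toy_rufetch() is the same before and after a thread exits, which is the property the commit message describes as "the exiting thread's rusage is not lost".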
* jhb, 2007-05-14 (1 file, -16/+10)
  Move cpu_exit() earlier in exit1() to close a race between SIGCHLD/kevent(2) notification of process termination and wait(). Now we no longer drop locks between sending the notification and marking the process as a zombie. Previously, if another process attempted to do a wait() with W_NOHANG after receiving a SIGCHLD or kevent and locked the process while the exiting thread was in cpu_exit(), then wait() would fail to find the process, which is quite astonishing to the process calling wait().
  MFC after: 3 days
* jhb, 2007-03-21 (1 file, -1/+1)
  Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.
* rwatson, 2007-03-05 (1 file, -5/+4)
  Further system call comment cleanup:
  - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
  - Remove extra blank lines in some cases.
  - Add extra blank lines in some cases.
  - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name.
  - Add punctuation.
  - Re-wrap some comments.
* rwatson, 2007-03-04 (1 file, -9/+0)
  Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly.
  Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
* davidxu, 2006-10-25 (1 file, -4/+3)
  Move the sigqueue_take() call into proc_reparent(); this fixes bugs where proc_reparent() is called but sigqueue_take() is forgotten.
* davidxu, 2006-10-24 (1 file, -2/+2)
  Protect the sigqueue_take() call with the child process's lock; this fixes a potential race with ptrace 'attach', which changes the parent of the child process.
* rwatson, 2006-10-22 (1 file, -1/+1)
  Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
  This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
  Obtained from: TrustedBSD Project
  Sponsored by: SPARTA
* davidxu, 2006-10-21 (1 file, -2/+8)
  Since revision 1.333 of kern_sig.c no longer uses P_WEXIT, the change opened a race window which can cause a memory leak in the signal queue. Here we free memory for the signal queue when the process state is set to PRS_ZOMBIE.
* csjp, 2006-09-13 (1 file, -2/+2)
  Back out one of the Giant removals from revision 1.272. Giant was not here to protect the vnode; it was present to synchronize access to TTY session information between exit(2) and the TTY code. While we are here, note that Giant is required for TTY protection.
  Clue from: bde
  Discussed with: jhb
  MFC after: 1 week
* tegge, 2006-05-29 (1 file, -29/+2)
  Close race between vmspace_exitfree() and exit1() and races between vmspace_exitfree() and vmspace_free() which could result in the same vmspace being freed twice.
  Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier.
  Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice.
  Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().
  Reviewed by: alc
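The rule "avoid changing the refcount from 0 to 1" is a general pattern: a reference may only be taken while the count is still nonzero, because a count of zero means teardown may already be in progress. A minimal sketch with C11 atomics, assuming a plain integer counter; acquire_ref_if_live() is a hypothetical name and this is not the body of vmspace_acquire_ref().

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to take a reference. Succeeds only if the count is currently
 * greater than zero; never resurrects an object whose count hit zero.
 */
bool acquire_ref_if_live(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);
    while (old > 0) {
        /* On failure, `old` is reloaded with the current value and the
         * loop re-checks that the object is still live. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets false must treat the object as gone, in the same way a lookup of a process that is already exiting would.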
* csjp, 2006-04-10 (1 file, -2/+0)
  Kill the last Giant acquisition in the exit(2) code. This Giant acquisition doesn't appear to be protecting anything. Most consumers of funsetownlst(9) do not appear to be picking up Giant anywhere.
  This was originally a part of my Giant exit(2) clean up revision 1.272 but I thought it was a good idea to leave it out until we were able to analyze it better.
  Tested by: kris
  MFC after: 3 weeks
* peter, 2006-04-03 (1 file, -2/+1)
  Remove the unused sva and eva arguments from pmap_remove_pages().
* davidxu, 2006-03-14 (1 file, -14/+0)
  1. Count the last time slice; this is intended to fix the "calcru: runtime went backwards" bug for threaded processes.
  2. Add comment about possible logical problem with scheduler.
  MFC after: 3 days
* jhb, 2006-02-22 (1 file, -5/+22)
  Close some races between procfs/ptrace and exit(2):
  - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero.
  - Change proc_rwmem() to require that the process being read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it.
  - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist.
  - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simulated via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held.
  - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while they mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD.
  Tested by: ps (1 mostly, but some bits of 2-4 as well)
  MFC after: 1 week
* jhb, 2006-02-21 (1 file, -2/+3)
  Move the ruadd() in kern_exit() to save our final stats in our child stats even further down in exit1() so that it includes the runtime and tick counts from the final time slice for the dying thread.
  Reviewed by: phk
* phk, 2006-02-11 (1 file, -0/+3)
  CPU time accounting speedup (step 2)
  Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc at context switch.
  Don't reach across CPUs in calcru().
  Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines).
  Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true.
  Use 27MHz counter on i386/Geode.
  Use TSC on amd64 & i386 if present.
  Use tick counter on sparc64.
* phk, 2006-02-07 (1 file, -4/+3)
  Modify the way we account for CPU time spent (step 1)
  Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers.
  For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.)
  On slower machines, the avoided multiplications to normalize timestamps at every context switch come out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
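The idea of accumulating raw ticks and scaling only on inspection can be sketched as a single conversion function. This is an illustrative sketch, not the kernel's routine: ticks_to_usec() and its parameters are hypothetical names, and the tick rate is assumed to be known in Hz.

```c
#include <stdint.h>

/*
 * Convert an accumulated tick count to microseconds, given the tick
 * rate in Hz. Whole seconds and the remainder are scaled separately so
 * that large tick counts do not overflow the intermediate product.
 * A rate of 0 is treated as "unknown" and yields 0.
 */
uint64_t ticks_to_usec(uint64_t ticks, uint64_t tick_rate_hz)
{
    if (tick_rate_hz == 0)
        return 0;
    uint64_t whole = ticks / tick_rate_hz;   /* full seconds */
    uint64_t rem   = ticks % tick_rate_hz;   /* leftover ticks (< 1 s) */
    return whole * UINT64_C(1000000) +
           (rem * UINT64_C(1000000)) / tick_rate_hz;
}
```

The hot path (a context switch) then only needs an addition of raw ticks; the division happens when a reader asks for the value.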
* jhb, 2006-02-06 (1 file, -6/+0)
  - Move the wakeup() for exiting kthreads out of exit1() and into kthread_exit() as that is cleaner and less obscured. It also does the wakeup sooner.
  - Add some comments to kthread_exit().
* wsalamon, 2006-02-06 (1 file, -0/+2)
  Audit the pid being requested in wait4().
  Obtained from: TrustedBSD Project
  Approved by: rwatson (mentor)
* rwatson, 2006-02-05 (1 file, -0/+11)
  On process exit, audit the return value of the process, and commit the record immediately, as this system call never returns.
  Obtained from: TrustedBSD Project
* jhb, 2006-02-03 (1 file, -0/+3)
  Add a comment.
* rwatson, 2006-02-02 (1 file, -0/+5)
  Hook up audit to fork() and exit() events. These changes manage the audit state on processes, not auditing of these events.
  Much work by: wsalamon
  Obtained from: TrustedBSD Project
* ups, 2006-01-23 (1 file, -2/+3)
  Hopefully fix the "calcru: runtime went backwards from ..." problem by keeping the resource values locked (where needed) while we use them for calculations.
  MFC after: 3 days
* phk, 2005-12-23 (1 file, -0/+83)
  Regenerate sysent with new abort2 system call.
  Implement abort2(const char *reason, int narg, void **args);
  Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>
* davidxu, 2005-12-09 (1 file, -3/+0)
  Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().
* rwatson, 2005-11-13 (1 file, -1/+3)
  Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueing and dropped records. Record loss was previously possible when the global pool of records became depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations.
  These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required.
  - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used.
  - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to.
  - When a record is committed asynchronously, simply queue it to the process.
  - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future.
  - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records.
  - When a process exits, flush any pending records.
  - Assert on process tear-down that there are no pending records.
  - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events.
  Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records.
  MFC after: 1 month
  Discussed with: jhb
  Requested by: Marc Olzheim <marcolz at stack dot nl>
* csjp, 2005-11-08 (1 file, -7/+7)
  Giant clean up for exit(2)
  - Change unconditional acquisition of Giant to only pick up Giant if the vnode for the controlling tty resides on a non-mpsafe file system.
  - Pick up Giant around executable vnode reference counting operations only if the executable resides on a non-mpsafe file system.
  - If this process is being traced, pick up Giant for trace file reference count operations only if it resides on a non-mpsafe file system.
  Discussed with: jhb
  Tested by: kris
* davidxu, 2005-11-08 (1 file, -3/+25)
  Add support for queueing SIGCHLD, as other UNIX systems do.
  For each child process whose status has changed, a SIGCHLD instance is queued. If the signal is still pending and the process changes status several times, the signal information is updated to reflect the latest process status. If wait() returns because the status of a child process is available, the pending SIGCHLD signal associated with that child process is discarded. Any other pending SIGCHLD signals remain pending.
  The signal information is allocated at the same time the proc structure is allocated, so if the process signal queue is full or there is a memory shortage, the signal can still be sent to the process.
  There is a boot-time tunable kern.sigqueue.queue_sigchild which controls the behavior; setting it to zero disables the SIGCHLD queueing feature. The tunable will be removed once the feature has proved stable enough.
  Tested on: i386 (SMP and UP)
* jhb, 2005-11-01 (1 file, -1/+1)
  Push down Giant into fdfree() and remove it from two of the callers. Other callers such as some rfork() cases weren't locking Giant anyway.
  Reviewed by: csjp
  MFC after: 1 week
* glebius, 2005-10-26 (1 file, -0/+9)
  - Fix leak of struct nlminfo on process exit.
  - Fix malloc type collision, that made the above problem difficult to understand.
  Reported by: Vladimir Sharun <sharun ukr.net>
* davidxu, 2005-10-23 (1 file, -0/+1)
  Make p_itimers a pointer, so file sys/proc.h does not need to include sys/timers.h.