summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_exit.c
Commit message (Collapse)AuthorAgeFilesLines
* To facillitate an upcoming Linuxulator merging partiallydchagin2016-01-091-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MFC r275121 (by kib). Only merge the syntax changes from r275121, PROC_*LOCK() macros still lock the same proc spinlock. The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). Discussed with: kib
* MFC r288336: save some bytes by using more concise SDT_PROBE<n>avg2015-10-231-1/+1
|
* MFC r289026:kib2015-10-161-3/+1
| | | | | | Enforce the maxproc limitation before allocating struct proc. In collaboration with: pho
* MFC 283281,283282,283562,283647,283836,284000,286158:jhb2015-09-091-14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various fixes to orphan handling which also fix issues with following forks. 283281: Always set p_oppid when attaching to an existing process via procfs tracing. This matches the behavior of ptrace(PT_ATTACH). Also, the procfs detach request assumes p_oppid is always set. 283282: Only reparent a traced process to its old parent if the tracing process is not the old parent. Otherwise, proc_reap() will leave the zombie in place resulting in the process' status being returned twice to its parent. Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by this change. 283562: Do not allow a process to reap an orphan (a child currently being traced by another process such as a debugger). The parent process does need to check for matching orphan pids to avoid returning ECHILD if an orphan has exited, but it should not return the exited status for the child until after the debugger has detached from the orphan process either explicitly or implicitly via wait(). Add two tests for for this case: one where the debugger is the direct child (thus the parent has a non-empty children list) and one where the debugger is not a direct child (so the only "child" of the parent is the orphan). 283647: Tweak the description of when waitpid() doesn't return any status for a non-blocking wait to avoid the word "empty". 283836: Consistently only use one end of the pipe in the parent and debugger processes and do not rely on EOF due to a close() in the debugger. 284000: Add a CHILD_REQUIRE macro similar to ATF_REQUIRE for use in child processes of the main test process. 286158: Clear P_TRACED before reparenting a detached process back to its original parent. Otherwise the debugee will be set as an orphan of the debugger. Add tests for tracing forks via PT_FOLLOW_FORK.
* MFC r282213:trasz2015-06-211-3/+5
| | | | | | | | | | | | | | | | | | Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). MFC r282901: Build GENERIC with RACCT/RCTL support by default. Note that it still needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf. Note those two are MFC-ed together, because the latter one changes the name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED. Should have committed the renaming separately... Relnotes: yes Sponsored by: The FreeBSD Foundation
* MFC 283546:jhb2015-06-131-0/+9
| | | | Add KTR tracing for some MI ptrace events.
* MFC r283600:kib2015-06-101-0/+6
| | | | | | | | Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup while owning vnode lock. On MFC, for KBI stability, td_su member was moved to the end of the struct thread.
* MFC r283735:kib2015-06-051-2/+0
| | | | Remove several write-only variables.
* MFC r279390:kib2015-03-211-0/+2
| | | | Change umtx_lock to be the sleepable mutex.
* Merge r263233 from HEAD to stable/10:rwatson2015-03-191-1/+1
| | | | | | | | | Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. Sponsored by: Google, Inc.
* Merge reaper facility.kib2015-01-051-15/+53
| | | | | | | | | | | | | | | | | | | | | MFC r270443 (by mjg): Properly reparent traced processes when the tracer dies. MFC r273452 (by mjg): Plug unnecessary PRS_NEW check in kern_procctl. MFC 275800: Add a facility for non-init process to declare itself the reaper of the orphaned descendants. MFC r275821: Add missed break. MFC r275846 (by mckusick): Add some additional clarification and fix a few gammer nits. MFC r275847 (by bdrewery): Bump Dd for r275846.
* MFC r275745:kib2014-12-271-1/+1
| | | | | | | | | | Add facility to stop all userspace processes. MFC r275753: Fix gcc build. MFC r275820: Add missed break.
* MFC r275615:kib2014-12-151-9/+16
| | | | | When process is exiting, check for suspension regardless of multithreaded status of the process.
* MFC r270993 (by mjg):kib2014-09-111-2/+6
| | | | | | Fix up proc_realparent to always return correct process. Approved by: re (delphij)
* MFC r269656:kib2014-08-211-8/+45
| | | | | | | | | | Implement and use proc_realparent(9). MFC r270024 (by markj): Correct the order of arguments passed to LIST_INSERT_AFTER(). For merge, the p_treeflag member of struct proc was moved to the end of the structure, to keep KBI intact.
* MFC r268514:mjg2014-08-171-6/+3
| | | | | | Eliminate plim and vtmp local vars in exit1. No functional changes.
* MFC r259407:mjg2014-08-171-2/+0
| | | | | | | proc exit: don't take PROC_LOCK while freeing rlimits Code wishing to check rlimits of some process should check whether it is exiting first, which current consumers do.
* MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINEavg2014-01-171-1/+1
|
* MFC r258281: Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.jilles2014-01-011-4/+7
| | | | | | | | | | | | | Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002
* Specify SDT probe argument types in the probe definition itself rather thanmarkj2013-08-151-2/+1
| | | | | | | | | using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks
* Remove cr_prison NULL check from proc_to_reap.mjg2013-07-221-2/+1
| | | | | | Userspace processes always have a prison. MFC after: 2 weeks
* Merge Capsicum overhaul:pjd2013-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
* When vforked child is traced, the debugging events are not generatedkib2013-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks
* The case of pid == WAIT_MYPGRP for the kern_wait() is already handledkib2013-01-301-7/+7
| | | | | | | | | | | | in kern_wait6(), which is called by kern_wait(). Remove the redundand check, introduced in r243136, and add a comment noting this, to make the code less confusing. The blank lines are added to properly delineate the scope of the preceeding comments. Noted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 week
* Protect the p->p_pgrp dereference with the process lock.kib2013-01-061-0/+2
| | | | MFC after: 3 days
* Restore the proper handling of the pid 0 for waitpid(2).kib2012-11-161-4/+9
| | | | | | | Fix the style around. Reported and reviewed by: bde (previous version) MFC after: 28 days
* Style fixes for r242958.kib2012-11-161-8/+6
| | | | | Reported and reviewed by: bde MFC after: 28 days
* Add the wait6(2) system call. It takes POSIX waitid()-like processkib2012-11-131-37/+268
| | | | | | | | | | | | | | | | | | | | | designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-221-3/+0
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* Ignore stop and continue signals sent to an exiting process. Stop signalsjhb2012-09-131-0/+8
| | | | | | | | | | | set p_xstat to the signal that triggered the stop, but p_xstat is also used to hold the exit status of an exiting process. Without this change, a stop signal that arrived after a process was marked P_WEXIT but before it was marked a zombie would overwrite the exit status with the stop signal number. Reviewed by: kib MFC after: 1 week
* A few whitespace and comment fixes.jhb2012-09-071-3/+3
|
* When process exists, not only the children shall be reparented tokib2012-04-021-0/+16
| | | | | | | | init, but also the orphans shall be removed from the orphan list, because the list header is destroyed. Reported and tested by: pho MFC after: 3 days
* Add helper function to remove the process from the orphans list andkib2012-04-021-8/+14
| | | | | | | use it instead of inlined code. Tested by: pho MFC after: 3 days
* Add an assert for proctree_lock to proc_to_reap().jh2012-03-141-0/+2
| | | | | Discussed with: kib MFC after: 1 week
* Lock the process around manipulations with p_flag.kib2012-03-131-0/+2
| | | | | Reported and reviewed by: jh MFC after: 3 days
* Restore the return statement erronously removed in the r232048.kib2012-02-241-0/+1
| | | | | | Submitted by: cognet Pointy hat to: kib (reuse the one I already got today) MFC after: 13 days
* Allow the parent to gather the exit status of the children reparentedkib2012-02-231-35/+86
| | | | | | | | | | to the debugger. When reparenting for debugging, keep the child in the new orphan list of old parent. When looping over the children in kern_wait(), iterate over both children list and orphan list to search for the process by pid. Submitted by: Dmitry Mikulin <dmitrym juniper.net> MFC after: 2 weeks
* Fix long-standing thinko regarding maxproc accounting. Basically,trasz2011-09-171-16/+4
| | | | | | | | | | we were accounting the newly created process to its parent instead of the child itself. This caused problems later, when the child changed its credentials - the per-uid, per-jail etc counters were not properly updated, because the maxproc counter in the child process was 0. Approved by: re (kib)
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-7/+7
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Add experimental support for process descriptorsjonathan2011-08-181-30/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
* All the racct_*() calls need to happen with the proc locked. Fixing thistrasz2011-07-061-0/+6
| | | | | | won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
* We should not return ECHILD when debugging a child and the parent does aobrien2011-06-141-3/+8
| | | | | | | "wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the child does not wish to report status at this time. Reviewed by: jhb
* Remove stale M_ZOMBIE malloc type.pluknet2011-04-141-3/+0
| | | | | | This type is unused since embedding p_ru into struct proc. MFC after: 1 week
* Some callers of proc_reparent() already have the parent process locked.kib2011-04-101-2/+6
| | | | | | Detect the situation and avoid process lock recursion. Reported by: Fabian Keil <freebsd-listen fabiankeil de>
* Enable accounting for RACCT_NPROC and RACCT_NTHR.trasz2011-03-311-0/+8
| | | | | Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
* Add racct. It's an API to keep per-process, per-jail, per-loginclasstrasz2011-03-291-0/+6
| | | | | | | | | and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
* By using the 32-bit Linux version of Sun's Java Development Kit 1.6netchild2010-11-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | on FreeBSD (amd64), invocations of "javac" (or "java") eventually end with the output of "Killed" and exit code 137. This is caused by: 1. After calling exec() in multithreaded linux program threads are not destroyed and continue running. They get killed after program being executed finishes. 2. linux_exit_group doesn't return correct exit code when called not from group leader. Which happens regularly using sun jvm. The submitters fix this in a similar way to how NetBSD handles this. I took the PRs away from dchagin, who seems to be out of touch of this since a while (no response from him). The patches committed here are from [2], with some little modifications from me to the style. PR: 141439 [1], 144194 [2] Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk Reviewed by: rdivacky (in april 2010) MFC after: 5 days
* - When disabling ktracing on a process, free any pending requests thatjhb2010-10-211-31/+1
| | | | | | | | | | | | | | | may be left. This fixes a memory leak that can occur when tracing is disabled on a process via disabling tracing of a specific file (or if an I/O error occurs with the tracefile) if the process's next system call is exit(). The trace disabling code clears p_traceflag, so exit1() doesn't do any KTRACE-related cleanup leading to the leak. I chose to make the free'ing of pending records synchronous rather than patching exit1(). - Move KTRACE-specific logic out of kern_(exec|exit|fork).c and into kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a result. MFC after: 1 month
* Create a global thread hash table to speed up thread lookup, usedavidxu2010-10-091-0/+2
| | | | | | | | | | rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian
* Add an extra comment to the SDT probes definition. This allows us to getrpaulo2010-08-221-1/+1
| | | | | | | | | use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]
OpenPOWER on IntegriCloud