summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_proc.c
Commit message (Collapse)AuthorAgeFilesLines
* Use a loop instead of a goto in sysctl_kern_proc_kstack().markj2016-04-171-6/+3
| | | | MFC after: 3 days
* The struct thread td_estcpu member is only used by the 4BSD scheduler.kib2016-04-171-2/+2
| | | | | | | | | | | | | | | | | | Move it to the struct td_sched for 4BSD, removing always present field, otherwise unused for ULE. New scheduler method sched_estcpu() returns the estimation for kinfo_proc consumption. As before, it always returns 0 for ULE. Remove sched_tick() scheduler method, unused both by 4BSD and ULE. Update locking comment for the 4BSD struct td_sched, copying it from the same comment for ULE. Spell MAXPRI as PRI_MAX_TIMESHARE in the 4BSD comment. Based on some notes from, and reviewed by: bde Sponsored by: The FreeBSD Foundation
* kern: for pointers replace 0 with NULL.pfg2016-04-151-1/+1
| | | | | | These are mostly cosmetical, no functional change. Found with devel/coccinelle.
* Rename P_KTHREAD struct proc p_flag to P_KPROC.kib2016-02-091-2/+1
| | | | | | | | I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL() definition. Suggested by: jhb Sponsored by: The FreeBSD Foundation
* Remove the assert which outlived its usefulness, and, by default,kib2016-02-081-12/+7
| | | | | | | | | | | | | | | | disable compilation of the code which made it possible to call stop_all_proc() from usermode at all. Move the comment to the preamble of stop_all_proc() and reword it to give overview of the function intent. proc0 has P_HADTHREADS flag set due to kthread_add(), but no P_KTHREAD, which triggered the assert, which does not serve a purpose now. Reported by: Oliver Pinter Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* session: avoid proctree lock on proc exit when possiblemjg2016-01-201-0/+73
| | | | | | We can get away with the common case with only proc lock held. Reviewed by: kib
* session: tidy up fixjobcmjg2016-01-201-13/+7
| | | | | | | This stops abusing the 'p' pointer for iteration over children processes and gets rid of useless locking around PRS_ZOMBIE check. Suggested by: kib
* proc: fix a race which could result in dereference of bad p_pgrp pointer on forkmjg2015-12-181-0/+1
| | | | | | | | | | | | | | During fork p_starcopy - p_endcopy area of a process is populated with bcopy with only proc lock held. Another forking thread can find such a process and proceed to access p_pgrp included in said area. Fix the problem by moving the field outside. It is being properly assigned later. Reviewed by: kib Diagnosed by: kib Tested by: Fabian Keil <freebsd-listen fabiankeil.de> MFC after: 10 days
* Fix style issues around existing SDT probes.markj2015-12-161-16/+13
| | | | | | | | | - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week
* Add helper functions proc_readmem() and proc_writemem().markj2015-12-071-60/+27
| | | | | | | | | | | | | These helper functions can be used to read in or write a buffer from or to an arbitrary process' address space. Without them, this can only be done using proc_rwmem(), which requires the caller to fill out a uio. This is onerous and results in code duplication; the new functions provide a simpler interface which is sufficient for most existing callers of proc_rwmem(). This change also adds a manual page for proc_rwmem() and the new functions. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D4245
* Export various helper variables describing the layout and size ofjhb2015-11-121-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773
* Fix core corruption caused by race in note_procstat_vmmapcem2015-10-061-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix is spiritually similar to r287442 and was discovered thanks to the KASSERT added in that revision. NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' vm map via vn_fullpath. As vnodes may move during coredump, this is racy. We do not remove the race, only prevent it from causing coredump corruption. - Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption and truncation, even if names change, at the cost of up to PATH_MAX bytes per mapped object. The new sysctl is documented in core.5. - Fix note_procstat_vmmap to self-limit in the second pass. This addresses corruption, at the cost of sometimes producing a truncated result. - Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste) to grok the new zero padding. Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3824
* save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBEavg2015-09-281-6/+6
| | | | | | | | | SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days
* When a process group leader exits, all of the processes in the group arejhb2015-09-161-1/+1
| | | | | | | | | | | | | | sent SIGHUP and SIGCONT if any of the processes are stopped. Currently this behavior is triggered for any type of process stop including ptrace() stops and transient stops for single threading during exit() and execve(). Thus, if a debugger is attached to a process in a group when the leader exits, the entire group can be HUPed. Instead, only send the signals if a process in the group is stopped due to SIGSTOP. PR: 201149 Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3681
* Add stack_save_td_running(), a function to trace the kernel stack of amarkj2015-09-111-4/+7
| | | | | | | | | | | | | | | | | running thread. It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode. This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop. Reviewed by: bdrewery, jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3256
* The si_status field of the siginfo_t, provided by the waitid(2) andkib2015-07-181-3/+4
| | | | | | | | | | | | | | | | SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation
* Implement lockless resource limits.mjg2015-06-101-1/+1
| | | | | | | | | | Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
* Provide vnode in memory map info for files on tmpfsvangyzen2015-06-021-2/+18
| | | | | | | | | | | | | | | | When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)
* Make setproctitle(3) work in Capsicum capability mode. This makestrasz2015-04-271-1/+1
| | | | | | | | | | | ctld(8) child processes to indicate initiator address and name in their titles, similar to what iscsid(8) child processes do. PR: 181352 Differential Revision: https://reviews.freebsd.org/D2363 Reviewed by: rwatson@, mjg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
* The sysctls that return process argv and envv return binary data, so clearian2015-03-221-0/+2
| | | | | | the SBUF_INCLUDENUL flag. Pointed out by: tijl@
* proc: use MTX_NEW flag in proc_initmjg2015-03-211-6/+5
| | | | | | | | | | This allows us to get rid of bzero which was added specifically to make mtx_init on p_mtx reliable. This also fixes a potential problem where mtx_init on other mutexes could trip over on unitialized memory and fire an assertion. Reviewed by: kib
* Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctlian2015-03-141-0/+3
| | | | | | | | | | | | | | | strings returned to userland include the nulterm byte. Some uses of sbuf_new_for_sysctl() write binary data rather than strings; clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in those cases. (Note that the sbuf code still automatically adds a nulterm byte in sbuf_finish(), but since it's not included in the length it won't get copied to userland along with the binary data.) Remove explicit adding of a nulterm byte in a couple places now that it gets done automatically by the sbuf drain code. PR: 195668
* Fix gcc build.kib2014-12-141-1/+2
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 13 days
* Add facility to stop all userspace processes. The supposed use of thekib2014-12-131-0/+138
| | | | | | | | | | | | | | | | | | | | | feature is to quisce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Disable recursion for the process spinlock.kib2014-12-011-1/+1
| | | | | | | Tested by: pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month
* The process spin lock currently has the following distinct uses:kib2014-11-261-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Update the ULE scheduler + thread and kinfo structs to use int for cpuidadrian2014-10-181-0/+24
| | | | | | | | | | | rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.
* On error, sbuf_bcat() returns -1. Some callers returned this -1 tokib2014-10-051-8/+14
| | | | | | | | | | | | | the upper layers, which interpret it as errno value, which happens to be ERESTART. The result was spurious restarts of the sysctls in loop, e.g. kern.proc.proc, instead of returning ENOMEM to caller. Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers expecting errno. In collaboration with: pho Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week
* Plug a hypothetical use after free in sysctl kern.proc.groups.mjg2014-09-041-2/+2
| | | | MFC after: 1 week
* Fix dereference after NULL check.glebius2014-09-031-3/+4
| | | | | CID: 1234607 Sponsored by: Nginx, Inc.
* Return real parent pid in kinfo (used by e.g. ps)mjg2014-08-281-4/+13
| | | | | | | | | | | Add a separate field which exports tracer pid and add a new keyword ("tracer") for ps to display it. This is a follow up to r270444. Reviewed by: kib MFC after: 1 week Relnotes: yes
* Correct the problems with the ptrace(2) making the debuggee an orphan.kib2014-08-071-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | One problem is inferior(9) looping due to the process tree becoming a graph instead of tree if the parent is traced by child. Another issue is due to the use of p_oppid to restore the original parent/child relationship, because real parent could already exited and its pid reused (noted by mjg). Add the function proc_realparent(9), which calculates the parent for given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head element of the p_orphan list and than stepping back to its container to find the parent process. If the parent has already exited, the init(8) is returned. Move the P_ORPHAN and the new helper flag from the p_flag* to new p_treeflag field of struct proc, which is protected by proctree lock instead of proc lock, since the orphans relationship is managed under the proctree_lock already. The remaining uses of p_oppid in ptrace(PT_DETACH) and process reapping are replaced by proc_realparent(9). Phabric: D417 Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Simplify the expression, by removing redundand calculation.kib2014-07-291-1/+1
| | | | | Noted by: "O'Connor, Daniel" <Daniel.O'Connor@emc.com> MFC after: 3 days
* Followup to r268466.kib2014-07-151-28/+62
| | | | | | | | | | | | | | | | - Move the code to calculate resident count into separate function. It reduces the indent level and makes the operation of vmmap_skip_res_cnt tunable more clear. - Optimize the calculation of the resident page count for map entry. Skip directly to the next lowest available index and page among the whole shadow chain. - Restore the use of pmap_incore(9), only to verify that current mapping is indeed superpage. - Note the issue with the invalid pages. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Change the calculation of the kinfo_vmentry field kve_private_residentkib2014-07-151-1/+1
| | | | | | | | to reflect its name. Noted and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Unconditionally initialize addr to handle the case of changed mapkib2014-07-101-0/+1
| | | | | | | | timestamp while the map is unlocked. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 6 days
* Current code in sysctl proc.vmmap, which intent is to calculate thekib2014-07-091-34/+51
| | | | | | | | | | | | | | | | | | amount of resident pages, in fact calculates the amount of installed pte entries in the region. Resident pages which were not soft-faulted yet are not counted. Calculate the amount of resident pages by looking in the objects chain backing the region. Add a knob to disable the residency calculation at all. For large sparce regions, either previous or updated algorithm runs for too long time, while several introspection tools do not need the (advisory) RSS value at all. PR: kern/188911 Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Expose OBJT_MGTDEVICE VM objects used for GEM/TTM with drm2 as anjhb2014-02-111-0/+3
| | | | | | | explicit object type. Reviewed by: kib MFC after: 1 week
* Add an kinfo sysctl to retrieve signal trampoline location for thekib2013-11-261-0/+58
| | | | | | | | | | | given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on as-need basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINEavg2013-11-261-6/+6
| | | | | | | | In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debuggingattilio2013-11-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
* Extend the support for exempting processes from being killed when swap isjhb2013-09-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
* Add the ability to display the default FIB number for a process to thewill2013-08-261-0/+2
| | | | | | | | | | | | | | | | | | ps(1) utility, e.g. "ps -O fib". bin/ps/keyword.c: Add the "fib" keyword and default its column name to "FIB". bin/ps/ps.1: Add "fib" as a supported keyword. sys/compat/freebsd32/freebsd32.h: sys/kern/kern_proc.c: sys/sys/user.h: Add the default fib number for a process (p->p_fibnum) to the user land accessible process data of struct kinfo_proc. Submitted by: Oliver Fromme <olli@fromme.com>, gibbs
* Specify SDT probe argument types in the probe definition itself rather thanmarkj2013-08-151-27/+12
| | | | | | | | | using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks
* Similarly to proc_getargv() and proc_getenvv(), export proc_getauxv()trociny2013-04-141-18/+29
| | | | | | to be able to reuse the code. MFC after: 3 weeks
* Re-factor the code to provide kern_proc_filedesc_out(), kern_proc_out(),trociny2013-04-141-52/+86
| | | | | | | | | | | and kern_proc_vmmap_out() functions to output process kinfo structures to sbuf, to make the code reusable. The functions are going to be used in the coredump routine to store procstat info in the core program header notes. Reviewed by: kib MFC after: 3 weeks
* Switch some "low-hanging fruit" to acquire read lock on vmobjectsattilio2013-04-081-10/+10
| | | | | | | | rather than write locks. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-10/+10
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* Look for zombie process only if we were given process id.pjd2012-11-251-5/+6
| | | | | | Reviewed by: kib MFC after: 2 weeks X-MFC-after-or-with: 243142
OpenPOWER on IntegriCloud