path: root/sys/kern/kern_proc.c
Commit message | Author | Age | Files | Lines
* If the process ID specified is invalid, the system call returns ESRCH.
  [kevlo, 2008-09-04, 1 file, -2/+2]
* Integrate the new MPSAFE TTY layer into the FreeBSD operating system.

  The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following:

  - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation were built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver.
  - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTYs from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTYs that are created on the fly.
  - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is used on both TTY device nodes and PTY masters.

  Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING.

  Obtained from: //depot/projects/mpsafetty/...
  Approved by: philip (ex-mentor)
  Discussed: on the lists, at BSDCan, at the DevSummit
  Sponsored by: Snow B.V., the Netherlands
  dcons(4) fixed by: kan
  [ed, 2008-08-20, 1 file, -28/+31]
* Call pargs_drop() unconditionally in do_execve(); the function correctly handles the NULL argument. Make pargs_free() static.
  MFC after: 1 week
  [kib, 2008-07-25, 1 file, -1/+2]
* Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.
  [jb, 2008-05-24, 1 file, -0/+37]
* - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock.
  - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
  [jeff, 2008-03-19, 1 file, -13/+5]
* Remove kernel support for M:N threading.

  While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries; static binaries will be broken.
  [jeff, 2008-03-12, 1 file, -2/+2]
* Don't zero td_runtime when billing thread CPU usage to the process; maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime.

  When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc.

  This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well as for libthr user threads, valuable debugging information lost with the move to try kthreads since they are no longer independent processes.

  There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the meantime. Likewise, there are reservations about the continued validity of statclock given the speed of modern processors.

  Reviewed by: attilio, emaste, jhb, julian
  [rwatson, 2008-01-10, 1 file, -7/+13]
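The split between td_runtime and td_incruntime described in the entry above can be illustrated with a small user-space sketch. Only the two field names come from the commit message; the struct layouts, the p_runtime field, and both function names are assumptions made for illustration and are not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct thread / struct proc. */
struct thread_sketch {
	uint64_t td_runtime;	/* lifetime CPU usage, never zeroed */
	uint64_t td_incruntime;	/* usage not yet billed to the process */
};

struct proc_sketch {
	uint64_t p_runtime;	/* assumed per-process total */
};

/* Charge newly consumed CPU time to the thread. */
static void
thread_account_sketch(struct thread_sketch *td, uint64_t delta)
{
	assert(td != NULL);
	td->td_runtime += delta;
	td->td_incruntime += delta;
}

/* Bill the unbilled part to the process; td_runtime is left intact. */
static void
thread_bill_sketch(struct proc_sketch *p, struct thread_sketch *td)
{
	assert(p != NULL && td != NULL);
	p->p_runtime += td->td_incruntime;
	td->td_incruntime = 0;
}
```

With this arrangement a per-thread report can read td_runtime at any time, while the process total only ever absorbs each increment once.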
* vn_lock() is currently only used with 'curthread' passed as argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This change makes the code cleaner and, in particular, removes an annoying dependency, helping the next lockmgr() cleanup. The KPI, obviously, changes.

  The manpage and FreeBSD_version will be updated through further commits.

  As a side note, the next commits will address a similar cleanup of the VFS methods, in particular vop_lock1 and vop_unlock.

  Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
  [attilio, 2008-01-10, 1 file, -2/+1]
* Return ESRCH when a kernel stack is queried on a process in execve(): p_candebug() will return EAGAIN which, if the other process never leaves execve(), will result in the sysctl spinning and never returning to userspace. Processes should always eventually leave execve(), but spinning in kernel while we wait is bad for countless reasons, and particularly harmful if execve() itself is deadlocked. Possibly we should return another error, or return a marker indicating the thread is in execve() so it can be reported that way in userspace.

  Reported by: kris
  [rwatson, 2007-12-27, 1 file, -1/+2]
* Check for P_WEXIT before PHOLD() on a process in kstack and vm query sysctls, as PHOLD() asserts !P_WEXIT.

  Reported by: Michael Plass <mfp49_freebsd at plass-family dot net>
  [rwatson, 2007-12-09, 1 file, -0/+8]
* Add another new sysctl in support of the forthcoming procstat(1) to support its -k argument:

  kern.proc.kstack - dump the kernel stack of a process, if debugging is permitted.

  This sysctl is present if either "options DDB" or "options STACK" is compiled into the kernel. Having support for tracing the kernel stacks of processes from user space makes it much easier to debug (or understand) specific wmesg's while avoiding the need to enter DDB in order to determine the path by which a process came to be blocked on a particular wait channel or lock.
  [rwatson, 2007-12-02, 1 file, -0/+106]
* Add two new sysctls in support of the forthcoming procstat(1) to support its -f and -v arguments:

  kern.proc.filedesc - dump file descriptor information for a process, if debugging is permitted, including socket addresses, open flags, file offsets, file paths, etc.

  kern.proc.vmmap - dump virtual memory mapping information for a process, if debugging is permitted, including layout and information on underlying objects, such as the type of object and path.

  These provide a superset of the information historically available through the now-deprecated procfs(4), and are intended to be exported in an ABI-robust form.
  [rwatson, 2007-12-02, 1 file, -1/+159]
* Test that p_textvp is non-NULL before dereferencing, as no executable vnode is set for kernel processes.

  Reported by: Skip Ford <skip at menantico dot com>
  MFC after: 3 days
  [rwatson, 2007-11-20, 1 file, -0/+5]
* Add event handlers for:
  - process_ctor, dtor, init and fini
  - thread_ctor, dtor, init and fini

  This makes it possible to add on additional things during construction/destruction of threads and processes.

  Reviewed by: rwatson
  [rrs, 2007-11-15, 1 file, -2/+6]
* Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when kmem_alloc_nofault() failed to allocate address space. Both functions now return an error instead of panicking or dereferencing NULL.

  As a consequence, vmspace_exec() and vmspace_unshare() return the errno int. A struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done.

  The kernel stack for the thread is now set up in thread_alloc(), which itself may return NULL. Also, allocation of the first process thread is performed in fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup(), called from fork1(), and proc_linkup0(), which is used to set up the kernel process (formerly known as swapper).

  In collaboration with: Peter Holm
  Reviewed by: jhb
  [kib, 2007-11-05, 1 file, -15/+13]
* - Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult.
  - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal.

  Approved by: re
  [jeff, 2007-09-21, 1 file, -2/+3]
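The "subtract ticks and divide by hz" computation from the entry above might look like the following sketch. The function name and signature are hypothetical; only the idea of storing a tick value and converting the difference to seconds comes from the commit message. The cast to unsigned is one common way to keep the subtraction well defined across a single counter wrap, and is an assumption rather than a statement about what the kernel does.

```c
#include <assert.h>

/*
 * Hypothetical helper: whole seconds elapsed between a recorded tick
 * value (for example a saved p_swtick) and the current tick counter.
 */
static unsigned int
ticks_elapsed_seconds(int now_ticks, int then_ticks, int hz)
{
	unsigned int delta;

	assert(hz > 0);
	/* Unsigned arithmetic wraps modulo 2^N, so one wrap is harmless. */
	delta = (unsigned int)now_ticks - (unsigned int)then_ticks;
	return (delta / (unsigned int)hz);
}
```

For example, with hz set to 1000, a recorded value of 2000 and a current value of 5000 give 3 seconds.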
* - Move all of the PS_ flags into either p_flag or td_flags.
  - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time.
  - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags.
  - Keep ki_sflag for backwards compat but change all in-source tools to use the new and more correct location of P_INMEM.

  Reported by: pho
  Reviewed by: attilio, kib
  Approved by: re (kensmith)
  [jeff, 2007-09-17, 1 file, -2/+5]
* rufetch and calcru sometimes should be called atomically together. This patch fixes the places where they should be called atomically by changing their locking requirements (both assume the per-proc spinlock is held) and introducing rufetchcalc, which wraps both calls so that they are performed atomically.

  Reviewed by: jeff
  Approved by: jeff (mentor)
  [attilio, 2007-06-09, 1 file, -0/+2]
* Commit 14/14 of sched_lock decomposition.
  - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization.
  - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization.

  Tested by: kris, current@
  Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
  [jeff, 2007-06-05, 1 file, -12/+16]
* - Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost.
  - Provide a new routine, rufetch(), to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.

  Initial patch by: attilio
  Reviewed by: attilio, bde (some objections), arch (mostly silent)
  [jeff, 2007-06-01, 1 file, -1/+1]
* Stop setting ki_ocomm (thread name) to the proc name by default, as nothing in the base system relies on this any longer.
  [emaste, 2007-03-23, 1 file, -8/+1]
* Threading cleanup, part 2 of several.

  Make part of John Birrell's KSE patch permanent. Specifically, remove:
  - Any reference to the ksegrp structure. This feature was never fully utilised and made things overly complicated.
  - All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it.

  Also: since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable.

  The ULE scheduler compiles again but I have no idea if it works.

  The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit.

  Tested by David Xu, and Dan Eischen using libthr and libpthread.
  [julian, 2006-12-06, 1 file, -38/+1]
* Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE).

  Reviewed by: davidxu@
  [jb, 2006-10-26, 1 file, -2/+25]
* Remove duplicated $FreeBSD$.
  [pjd, 2006-09-30, 1 file, -1/+0]
* Move Giant up even further since P_CONTROLT isn't really fully locked yet (p_flag is, but P_CONTROLT isn't really).

  Submitted by: jhb
  [mbr, 2006-09-27, 1 file, -1/+1]
* Protect enterpgrp() against another tty/proc race case until the tty locking work has been fixed.

  MFC after: 1 week
  [mbr, 2006-09-23, 1 file, -0/+3]
* Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty code is still under the Giant lock, but the session/pgrp release code just used proctree_locks. This explains why moving the proctree_lock in sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems. This should also fix some race panics with revoked ttys.

  Reviewed by: jhb
  MFC after: 1 week
  [mbr, 2006-09-19, 1 file, -0/+6]
* CPU time accounting speedup (step 2)

  Keep accounting time (in per-cpu cputicks) and the statistics counts in the thread and summarize into struct proc at context switch.

  Don't reach across CPUs in calcru().

  Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines).

  Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true.

  Use 27MHz counter on i386/Geode.
  Use TSC on amd64 & i386 if present.
  Use tick counter on sparc64.
  [phk, 2006-02-11, 1 file, -1/+1]
* Modify the way we account for CPU time spent (step 1)

  Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers.

  For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.)

  On slower machines, the avoided multiplications to normalize timestamps at every context switch come out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
  [phk, 2006-02-07, 1 file, -3/+1]
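The idea of accumulating raw cputicks and scaling only on inspection, as described in the entry above, can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, microseconds are chosen as the target unit for simplicity, and the overflow handling is one possible approach rather than the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical conversion, done only when a consumer asks for the
 * numbers. Splitting into whole seconds plus a remainder avoids
 * overflowing ticks * 1000000 for large tick counts; it still assumes
 * tick_rate is well below 2^64 / 1000000.
 */
static uint64_t
cputicks_to_usec_sketch(uint64_t ticks, uint64_t tick_rate)
{
	uint64_t secs, rem;

	assert(tick_rate > 0);
	secs = ticks / tick_rate;
	rem = ticks % tick_rate;
	return (secs * 1000000u + (rem * 1000000u) / tick_rate);
}
```

The hot path (context switch) then only adds integers, and the multiplication and division are paid once per query.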
* Return the thread name in the kinfo_proc structure. Also correct the comment describing what the value is.
  [julian, 2006-01-18, 1 file, -0/+7]
* Since p_cansee will end up dereferencing p_ucred, don't check for p_ucred equal to NULL several times later. p_ucred "should probably not" be NULL if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact that we don't see frequent crashes here. Remove the checks after p_cansee and add a KASSERT right before it.

  Found by: Coverity Prevent (tm)

  Also trim one nearby trailing space.
  [jmallett, 2006-01-17, 1 file, -9/+7]
* Add code to report zombie state.

  PR: threads/91044
  MFC after: 3 days
  [davidxu, 2005-12-29, 1 file, -0/+2]
* Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueing and dropped records. Record loss was previously possible when the global pool of records became depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations.

  These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required.

  - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used.
  - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to.
  - When a record is committed asynchronously, simply queue it to the process.
  - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future.
  - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records.
  - When a process exits, flush any pending records.
  - Assert on process tear-down that there are no pending records.
  - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events.

  Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on, e.g., performing a drain every (n) records.

  MFC after: 1 month
  Discussed with: jhb
  Requested by: Marc Olzheim <marcolz at stack dot nl>
  [rwatson, 2005-11-13, 1 file, -0/+1]
* Add support for queueing SIGCHLD, as other UNIX systems do.

  For each child process whose status has changed, a SIGCHLD instance is queued. If the signal is still pending and the process changes status several times, the signal information is updated to reflect the latest process status. If wait() returns because the status of a child process is available, the pending SIGCHLD signal associated with that child process is discarded. Any other pending SIGCHLD signals remain pending.

  The signal information is allocated at the same time the proc structure is allocated, so if the process signal queue is completely full or there is a memory shortage, the signal can still be sent to the process.

  There is a boot-time tunable, kern.sigqueue.queue_sigchild, which controls the behavior; setting it to zero disables the SIGCHLD queueing feature. The tunable will be removed once the feature has proved stable enough.

  Tested on: i386 (SMP and UP)
  [davidxu, 2005-11-08, 1 file, -0/+4]
* Document in #ifdef notnow code the actions that proc_fini would need to take if struct procs were actually freed.
  [jhb, 2005-10-24, 1 file, -0/+9]
* Always wire the sysctl output buffer in sysctl_kern_proc() before calling sysctl_out_proc(). -- fix from jhb

  Move the code in fill_kinfo_thread() that gathers data from struct proc into the new function fill_kinfo_proc_only().

  Change all callers of fill_kinfo_thread() to call both fill_kinfo_proc_only() and fill_kinfo_thread(). When gathering data from a multi-threaded process, fill_kinfo_proc_only() only needs to be called once.

  Grab sched_lock before accessing the process thread list or calling fill_kinfo_thread().

  PR: kern/84684
  MFC after: 3 days
  [truckman, 2005-10-02, 1 file, -95/+113]
* Use the refcount API to implement reference counts on process argument structures rather than using a global mutex to protect the reference counts.

  Tested on: i386, alpha, sparc64
  [jhb, 2005-09-27, 1 file, -11/+4]
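The move from a mutex-protected counter to atomic reference counting, as in the entry above, can be sketched in portable C11. This is not the kernel's refcount(9) API or its pargs code: every name below is hypothetical, and C11 atomics stand in for the kernel primitives purely to show the pattern. The NULL tolerance in the drop routine mirrors the behavior another log entry attributes to pargs_drop().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared process-arguments structure. */
struct args_sketch {
	atomic_uint ar_ref;	/* reference count; no lock required */
	size_t ar_length;
};

/* Allocate with one reference held by the caller. */
static struct args_sketch *
args_alloc_sketch(size_t len)
{
	struct args_sketch *pa = malloc(sizeof(*pa));

	if (pa == NULL)
		return (NULL);
	atomic_init(&pa->ar_ref, 1);
	pa->ar_length = len;
	return (pa);
}

/* Take an additional reference; the caller must already hold one. */
static void
args_hold_sketch(struct args_sketch *pa)
{
	unsigned int old;

	if (pa == NULL)
		return;
	old = atomic_fetch_add_explicit(&pa->ar_ref, 1, memory_order_relaxed);
	assert(old > 0);
}

/* Drop a reference; returns true if this call freed the structure. */
static bool
args_drop_sketch(struct args_sketch *pa)
{
	if (pa == NULL)
		return (false);
	if (atomic_fetch_sub_explicit(&pa->ar_ref, 1,
	    memory_order_acq_rel) == 1) {
		free(pa);
		return (true);
	}
	return (false);
}
```

Because each increment and decrement is a single atomic operation, holders of different structures no longer contend on one global lock.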
* Add a sysctl that returns the full path of a process' text file. This information is needed by things like `gdb -p' and Sun's javac, and previously it could only be obtained via procfs.

  PR: (none given)
  [das, 2005-04-18, 1 file, -0/+45]
* Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any effect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case.

  Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch.

  This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example).

  Reviewed by: grehan, cognet, arch@, others
  Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
  [jhb, 2005-04-04, 1 file, -1/+0]
* Add ki_jid field to the kinfo_proc structure and store jail ID there.

  Reviewed by: gad
  MFC after: 3 days
  [pjd, 2005-03-20, 1 file, -1/+5]
* In strange circumstances we may end up being the last reference to a session in tprintf(). SESSRELE() needs to properly dispose of the session's mutex.

  Add sessrele() which does the proper cleanup and have SESSRELE() call it. Use SESSRELE also in pgdelete().

  Found by: Coverity (ID:526)
  [phk, 2005-03-17, 1 file, -10/+17]
* Function jailed() looks into the ucred structure, so be sure ucred is not NULL.

  Reviewed by: rwatson
  MFC after: 1 week
  [pjd, 2005-03-12, 1 file, -4/+4]
* Clean up a bit.

  Reviewed by: rwatson
  MFC after: 1 week
  [pjd, 2005-03-12, 1 file, -11/+12]
* Make a bunch of SYSCTL_NODEs static.
  [phk, 2005-02-10, 1 file, -22/+23]
* /* -> /*- for copyright notices, minor format tweaks as necessary
  [imp, 2005-01-06, 1 file, -1/+1]
* Axe a.out core dump support. Neither older gdb binaries nor current bfd sources understand the present format.
  [das, 2004-11-27, 1 file, -13/+0]
* Remove local definitions of RANGEOF() and use __rangeof() instead. Also remove a few bogus casts.
  [das, 2004-11-20, 1 file, -5/+2]
* Malloc p_stats instead of putting it in the U area. We should consider simply embedding it in struct proc.

  Reviewed by: arch@
  [das, 2004-11-20, 1 file, -5/+44]
* Remove duplicate line.
  [julian, 2004-10-10, 1 file, -1/+0]
* Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand.

  - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result.
  - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage().
  - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc.
  - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel.
  - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did.
  - calcru() now correctly handles threads executing on other CPUs.
  - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock.
  - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone.
  - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console.

  Submitted by: bde (mostly)
  MFC after: 1 month
  [jhb, 2004-10-05, 1 file, -13/+12]
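The "store raw values, compute timevals on demand" approach from the entry above can be sketched as follows. The commit message does not spell out the arithmetic, so the proportional split below is an assumption about how a calcru1()-style worker might derive user and system time from a total runtime and raw tick counts. The struct, its field names (loosely modelled on the rux_ prefix), and the function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw statistics, stored as-is until someone asks. */
struct rusage_ext_sketch {
	uint64_t rux_runtime_usec;	/* total runtime (assumed usec) */
	uint64_t rux_uticks;		/* statclock ticks in user mode */
	uint64_t rux_sticks;		/* ... in system mode */
	uint64_t rux_iticks;		/* ... in interrupt context */
};

struct times_sketch {
	uint64_t user_usec;
	uint64_t sys_usec;
};

/*
 * Assumed on-demand computation: split the total runtime in proportion
 * to the tick counts. Interrupt ticks take part in the total but are
 * not reported. Assumes runtime * ticks fits in 64 bits.
 */
static struct times_sketch
calcru_sketch(const struct rusage_ext_sketch *rux)
{
	struct times_sketch out = { 0, 0 };
	uint64_t total;

	assert(rux != NULL);
	total = rux->rux_uticks + rux->rux_sticks + rux->rux_iticks;
	if (total == 0)
		return (out);	/* nothing sampled yet */
	out.user_usec = rux->rux_runtime_usec * rux->rux_uticks / total;
	out.sys_usec = rux->rux_runtime_usec * rux->rux_sticks / total;
	return (out);
}
```

Keeping only raw counters in the hot path means the division happens once per query instead of at every accounting update.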