path: root/sys/kern/kern_proc.c
Commit message | Author | Age | Files | Lines
...
* Commit 14/14 of sched_lock decomposition. | jeff | 2007-06-05 | 1 | -12/+16
  - Use thread_lock() rather than sched_lock for per-thread scheduling
    synchronization.
  - Use the per-process spinlock rather than the sched_lock for per-process
    scheduling synchronization.

  Tested by: kris, current@
  Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* - Move rusage from being per-process in struct pstats to per-thread in | jeff | 2007-06-01 | 1 | -1/+1
    td_ru. This removes the requirement for per-process synchronization in
    statclock() and mi_switch(). This was previously supported by
    sched_lock, which is going away. All modifications to rusage are now
    done in the context of the owning thread. Reads proceed without locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the
    exiting thread's rusage is not lost.
  - Provide a new routine, rufetch(), to fetch an aggregate of all rusage
    structures from all threads in a process. This routine must be used in
    any place requiring a rusage from a process prior to its exit. The
    exited process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a thread
    exits. Tick statistics are kept in the thread and protected by
    sched_lock until it exits.

  Initial patch by: attilio
  Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Stop setting ki_ocomm (thread name) to the proc name by default, as nothing | emaste | 2007-03-23 | 1 | -8/+1
  in the base system relies on this any longer.
* Threading cleanup.. part 2 of several. | julian | 2006-12-06 | 1 | -38/+1
  Make part of John Birrell's KSE patch permanent.
  Specifically, remove:
  - Any reference of the ksegrp structure. This feature was never fully
    utilised and made things overly complicated.
  - All code in the scheduler that tried to make threaded programs fair to
    unthreaded programs. Libpthread processes will already do this to some
    extent and libthr processes already disable it.

  Also:
  Since this makes such a big change to the scheduler(s), take the
  opportunity to rename some structures and elements that had to be moved
  anyhow. This makes the code a lot more readable.

  The ULE scheduler compiles again but I have no idea if it works.

  The 4bsd scheduler still requires a little cleaning, and some functions
  that now do ALMOST nothing will go away, but I thought I'd do that as a
  separate commit.

  Tested by David Xu, and Dan Eischen using libthr and libpthread.
* Make KSE a kernel option, turned on by default in all GENERIC | jb | 2006-10-26 | 1 | -2/+25
  kernel configs except sun4v (which doesn't process signals properly
  with KSE).

  Reviewed by: davidxu@
* Remove duplicated $FreeBSD$. | pjd | 2006-09-30 | 1 | -1/+0
* Move Giant up even further since P_CONTROLT isn't really fully locked | mbr | 2006-09-27 | 1 | -1/+1
  yet (p_flag is, but P_CONTROLT isn't really).

  Submitted by: jhb
* Protect enterpgrp() against another tty/proc race case until the tty locking | mbr | 2006-09-23 | 1 | -0/+3
  work has been fixed.

  MFC after: 1 week
* Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty | mbr | 2006-09-19 | 1 | -0/+6
  code is still under giant lock, but the session/pgrp release code just
  used proctree_locks. This explains why moving the proctree_lock in
  sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems.

  This should also fix some race panics with revoked ttys.

  Reviewed by: jhb
  MFC after: 1 week
* CPU time accounting speedup (step 2) | phk | 2006-02-11 | 1 | -1/+1
  Keep accounting time (in per-cpu cputicks) and the statistics counts in
  the thread and summarize into struct proc at context switch.

  Don't reach across CPUs in calcru().

  Add code to calibrate the top speed of cpu_tickrate() for variable
  cpu_tick hardware (like TSC on power managed machines).

  Don't enforce monotonicity (at least for now) in calcru. While the
  calibrated cpu_tickrate ramps up it may not be true.

  Use 27MHz counter on i386/Geode.
  Use TSC on amd64 & i386 if present.
  Use tick counter on sparc64.
* Modify the way we account for CPU time spent (step 1) | phk | 2006-02-07 | 1 | -3/+1
  Keep track of time spent by the cpu in various contexts in units of
  "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only
  when somebody wants to inspect the numbers.

  For now "cputicks" are still derived from the current timecounter and
  therefore things should by definition remain sensible also on SMP
  machines. (The main reason for this first milestone commit is to verify
  that hypothesis.)

  On slower machines, avoiding the multiplications that normalized
  timestamps at every context switch comes out as a 5-7% better score on
  the unixbench/context1 microbenchmark. On more modern hardware no change
  in performance is seen.
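The idea in the commit above (accumulate raw ticks on the hot path, scale only when someone reads the numbers) can be sketched in userland C. This is a minimal illustration, not the kernel code: `struct cpu_account`, `account_switch()` and `account_usec()` are hypothetical names invented for this sketch, and the tick rate is assumed to be a fixed value in Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: raw tick total kept per accounted context. */
struct cpu_account {
	uint64_t ticks;		/* raw ticks accumulated at context switches */
};

/* Hot path: one subtraction and one addition, no scaling. */
static inline void
account_switch(struct cpu_account *acct, uint64_t start, uint64_t end)
{
	acct->ticks += end - start;
}

/*
 * Cold path: convert the raw total to microseconds on demand.
 * Whole seconds and the remainder are scaled separately so the
 * intermediate product stays small.
 */
static inline uint64_t
account_usec(const struct cpu_account *acct, uint64_t tickrate_hz)
{
	assert(tickrate_hz != 0);
	return ((acct->ticks / tickrate_hz) * 1000000 +
	    (acct->ticks % tickrate_hz) * 1000000 / tickrate_hz);
}
```

The design point is that the division and multiplication run once per query instead of once per context switch.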
* Return the thread name in the kinfo_proc structure. | julian | 2006-01-18 | 1 | -0/+7
  Also correct the comment describing what the value is.
* Since p_cansee will end up dereferencing p_ucred, don't check for p_ucred | jmallett | 2006-01-17 | 1 | -9/+7
  equal to NULL several times later. p_ucred "should probably not" be NULL
  if the process isn't PRS_NEW anyway. This is strongly reinforced by the
  fact that we don't see frequent crashes here. Remove the checks after
  p_cansee and add a KASSERT right before it.

  Found by: Coverity Prevent (tm)

  Also trim one nearby trailing space.
* Add code to report zombie state. | davidxu | 2005-12-29 | 1 | -0/+2
  PR: threads/91044
  MFC after: 3 days
* Moderate rewrite of kernel ktrace code to attempt to generally improve | rwatson | 2005-11-13 | 1 | -0/+1
  reliability when tracing fast-moving processes or writing traces to
  slow file systems by avoiding unbounded queueing and dropped records.
  Record loss was previously possible when the global pool of records
  became depleted as a result of record generation outstripping record
  commit, which occurred quickly in many common situations.

  These changes partially restore the 4.x model of committing ktrace
  records at the point of trace generation (synchronous), but maintain
  the 5.x deferred record commit behavior (asynchronous) for situations
  where entering VFS and sleeping is not possible (i.e., in the
  scheduler). Records are now queued per-process as opposed to globally,
  with processes responsible for committing records from their own
  context as required.

  - Eliminate the ktrace worker thread and global record queue, as they
    are no longer used. Keep the global free record list, as records are
    still used.
  - Add a per-process record queue, which will hold any asynchronously
    generated records, such as from context switches. This replaces the
    global queue as the place to submit asynchronous records to.
  - When a record is committed asynchronously, simply queue it to the
    process.
  - When a record is committed synchronously, first drain any pending
    per-process records in order to maintain ordering as best we can.
    Currently ordering between competing threads is provided via a global
    ktrace_sx, but a per-process flag or lock may be desirable in the
    future.
  - When a process returns to user space following a system call, trap,
    signal delivery, etc, flush any pending records.
  - When a process exits, flush any pending records.
  - Assert on process tear-down that there are no pending records.
  - Slightly abstract the notion of being "in ktrace", which is used to
    prevent the recursive generation of records, as well as generating
    traces for ktrace events.

  Future work here might look at changing the set of events marked for
  synchronous and asynchronous record generation, re-balancing queue
  depth, timeliness of commit to disk, and so on. I.e., performing a
  drain every (n) records.

  MFC after: 1 month
  Discussed with: jhb
  Requested by: Marc Olzheim <marcolz at stack dot nl>
* Add support for queueing SIGCHLD same as other UNIX systems did. | davidxu | 2005-11-08 | 1 | -0/+4
  For each child process whose status has been changed, a SIGCHLD instance
  is queued. If the signal is still pending and the process changes status
  several times, the signal information is updated to reflect the latest
  process status. If wait() returns because the status of a child process
  is available, the pending SIGCHLD signal associated with the child
  process is discarded. Any other pending SIGCHLD signals remain pending.

  The signal information is allocated at the same time the proc structure
  is allocated, so if the process signal queue is full or there is a
  memory shortage, the signal can still be sent to the process.

  There is a boot-time tunable, kern.sigqueue.queue_sigchild, which
  controls the behavior; setting it to zero disables the SIGCHLD queueing
  feature. The tunable will be removed once the feature has proved stable
  enough.

  Tested on: i386 (SMP and UP)
* Document in #ifdef notnow code the actions that proc_fini would need to | jhb | 2005-10-24 | 1 | -0/+9
  take if struct procs were actually freed.
* Always wire the sysctl output buffer in sysctl_kern_proc() before | truckman | 2005-10-02 | 1 | -95/+113
  calling sysctl_out_proc(). -- fix from jhb

  Move the code in fill_kinfo_thread() that gathers data from struct proc
  into the new function fill_kinfo_proc_only().

  Change all callers of fill_kinfo_thread() to call both
  fill_kinfo_proc_only() and fill_kinfo_thread(). When gathering data
  from a multi-threaded process, fill_kinfo_proc_only() only needs to be
  called once.

  Grab sched_lock before accessing the process thread list or calling
  fill_kinfo_thread().

  PR: kern/84684
  MFC after: 3 days
* Use the refcount API to implement reference counts on process argument | jhb | 2005-09-27 | 1 | -11/+4
  structures rather than using a global mutex to protect the reference
  counts.

  Tested on: i386, alpha, sparc64
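To illustrate why an atomic counter can replace a global mutex here, the following is a minimal userland sketch using C11 atomics. It is not the kernel's refcount(9) implementation (which provides refcount_init(), refcount_acquire() and refcount_release()); `struct shared_args`, `args_init()`, `args_hold()` and `args_drop()` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared, reference-counted structure. */
struct shared_args {
	atomic_uint refs;	/* number of live references */
	/* payload omitted in this sketch */
};

/* A new object starts with one reference, owned by its creator. */
static inline void
args_init(struct shared_args *a)
{
	atomic_init(&a->refs, 1);
}

/* Take an additional reference; the caller must already hold one. */
static inline void
args_hold(struct shared_args *a)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&a->refs, 1, memory_order_relaxed);
	assert(old > 0);
	(void)old;
}

/*
 * Drop a reference. Returns true when the caller released the last
 * reference and is therefore responsible for freeing the object.
 */
static inline bool
args_drop(struct shared_args *a)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&a->refs, 1, memory_order_acq_rel);
	assert(old > 0);
	return (old == 1);
}
```

Because each increment and decrement is a single atomic operation on the object itself, unrelated objects no longer contend on one shared lock.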
* Add a sysctl that returns the full path of a process' text file. | das | 2005-04-18 | 1 | -0/+45
  This information is needed by things like `gdb -p' and Sun's javac, and
  previously it could only be obtained via procfs.
* Divorce critical sections from spinlocks. Critical sections as denoted by | jhb | 2005-04-04 | 1 | -1/+0
  critical_enter() and critical_exit() are now solely a mechanism for
  deferring kernel preemptions. They no longer have any effect on
  interrupts. This means that standalone critical sections are now very
  cheap as they are simply unlocked integer increments and decrements for
  the common case.

  Spin mutexes now use a separate KPI implemented in MD code:
  spinlock_enter() and spinlock_exit(). This KPI is responsible for
  providing whatever MD guarantees are needed to ensure that a thread
  holding a spin lock won't be preempted by any other code that will try
  to lock the same lock. For now all archs continue to block interrupts in
  a "spinlock section" as they did formerly in all critical sections.

  Note that I've also taken this opportunity to push a few things into MD
  code rather than MI. For example, critical_fork_exit() no longer exists.
  Instead, MD code ensures that new threads have the correct state when
  they are created. Also, we no longer try to fixup the idlethreads for
  APs in MI code. Instead, each arch sets the initial curthread and
  adjusts the state of the idle thread it borrows in order to perform the
  initial context switch.

  This change is largely a big NOP, but the cleaner separation it provides
  will allow for more efficient alternative locking schemes in other parts
  of the kernel (bare critical sections rather than per-CPU spin mutexes
  for per-CPU data for example).

  Reviewed by: grehan, cognet, arch@, others
  Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
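The "unlocked integer increments and decrements" described above can be modelled in a few lines of userland C. This is only a toy model under stated assumptions, not the kernel's critical_enter()/critical_exit(): `struct thread_model`, `crit_enter()`, `crit_exit()` and `request_preempt()` are hypothetical names, and a "context switch" is simulated by bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state for the model. */
struct thread_model {
	int	critnest;	/* critical section nesting depth */
	bool	owe_preempt;	/* preemption was requested while nested */
	int	switches;	/* simulated context switches performed */
};

/* Entering a critical section is just an increment. */
static inline void
crit_enter(struct thread_model *td)
{
	td->critnest++;
}

/* A preemption request is deferred while inside a critical section. */
static inline void
request_preempt(struct thread_model *td)
{
	if (td->critnest > 0)
		td->owe_preempt = true;
	else
		td->switches++;
}

/* Leaving the outermost section performs any deferred preemption. */
static inline void
crit_exit(struct thread_model *td)
{
	assert(td->critnest > 0);
	if (--td->critnest == 0 && td->owe_preempt) {
		td->owe_preempt = false;
		td->switches++;
	}
}
```

Note that nothing in the model touches interrupt state, which mirrors the separation the commit describes: interrupt blocking belongs to the spinlock path, not to critical sections.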
* Add ki_jid field to the kinfo_proc structure and store jail ID there. | pjd | 2005-03-20 | 1 | -1/+5
  Reviewed by: gad
  MFC after: 3 days
* In strange circumstances we may end up being the last reference to a | phk | 2005-03-17 | 1 | -10/+17
  session in tprintf(). SESSRELE() needs to properly dispose of the
  session's mutex.

  Add sessrele() which does the proper cleanup and have SESSRELE() call
  it. Use SESSRELE also in pgdelete().

  Found by: Coverity (ID:526)
* Function jailed() looks into ucred structure, so be sure ucred is not NULL. | pjd | 2005-03-12 | 1 | -4/+4
  Reviewed by: rwatson
  MFC after: 1 week
* Clean up a bit. | pjd | 2005-03-12 | 1 | -11/+12
  Reviewed by: rwatson
  MFC after: 1 week
* Make a bunch of SYSCTL_NODEs static. | phk | 2005-02-10 | 1 | -22/+23
* /* -> /*- for copyright notices, minor format tweaks as necessary | imp | 2005-01-06 | 1 | -1/+1
* Axe a.out core dump support. Neither older gdb binaries nor current | das | 2004-11-27 | 1 | -13/+0
  bfd sources understand the present format.
* Remove local definitions of RANGEOF() and use __rangeof() instead. | das | 2004-11-20 | 1 | -5/+2
  Also remove a few bogus casts.
* Malloc p_stats instead of putting it in the U area. We should consider | das | 2004-11-20 | 1 | -5/+44
  simply embedding it in struct proc.

  Reviewed by: arch@
* Remove duplicate line. | julian | 2004-10-10 | 1 | -1/+0
* Rework how we store process times in the kernel such that we always store | jhb | 2004-10-05 | 1 | -13/+12
  the raw values including for child process statistics and only compute
  the system and user timevals on demand.

  - Fix the various kern_wait() syscall wrappers to only pass in a rusage
    pointer if they are going to use the result.
  - Add a kern_getrusage() function for the ABI syscalls to use so that
    they don't have to play stackgap games to call getrusage().
  - Fix the svr4_sys_times() syscall to just call calcru() to calculate
    the times it needs rather than calling getrusage() twice with
    associated stackgap, etc.
  - Add a new rusage_ext structure to store raw time stats such as tick
    counts for user, system, and interrupt time as well as a bintime of
    the total runtime. A new p_rux field in struct proc replaces the same
    inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and
    p_runtime). A new p_crux field in struct proc contains the "raw" child
    time usage statistics. ruadd() has been changed to handle adding the
    associated rusage_ext structures as well as the values in rusage.
    Effectively, the values in rusage_ext replace the ru_utime and
    ru_stime values in struct rusage. These two fields in struct rusage
    are no longer used in the kernel.
  - calcru() has been split into a static worker function calcru1() that
    calculates appropriate timevals for user and system time as well as
    updating the rux_[isu]u fields of a passed in rusage_ext structure.
    calcru() uses a copy of the process' p_rux structure to compute the
    timevals after updating the runtime appropriately if any of the
    threads in that process are currently executing. It also now only
    locks sched_lock internally while doing the rux_runtime fixup.
    calcru() now only requires the caller to hold the proc lock and
    calcru1() only requires the proc lock internally. calcru() also no
    longer allows callers to ask for an interrupt timeval since none of
    them actually did.
  - calcru() now correctly handles threads executing on other CPUs.
  - A new calccru() function computes the child system and user timevals
    by calling calcru1() on p_crux. Note that this means that any code
    that wants child times must now call this function rather than reading
    from p_cru directly. This function also requires the proc lock.
  - This finishes the locking for rusage and friends so some of the Giant
    locks in exit1() and kern_wait() are now gone.
  - The locking in ttyinfo() has been tweaked so that a shared lock of the
    proctree lock is used to protect the process group rather than the
    process group lock. By holding this lock until the end of the function
    we now ensure that the process/thread that we pick to dump info about
    will no longer vanish while we are trying to output its info to the
    console.

  Submitted by: bde (mostly)
  MFC after: 1 month
* The zone from which proc structures are allocated is marked | das | 2004-09-19 | 1 | -16/+5
  UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
  never be called. Move an assertion from proc_fini() to proc_dtor() and
  garbage-collect the rest of the unreachable code. I have retained
  vm_proc_dispose(), since I consider its disuse a bug.
* Refactor a bunch of scheduler code to give basically the same behaviour | julian | 2004-09-05 | 1 | -28/+10
  but with slightly cleaned up interfaces.

  The KSE structure has become the same as the "per thread scheduler
  private data" structure. In order to not make the diffs too great one is
  #defined as the other at this time.

  The KSE (or td_sched) structure is now allocated per thread and has no
  allocation code of its own.

  Concurrency for a KSEGRP is now kept track of via a simple pair of
  counters rather than using KSE structures as tokens.

  Since the KSE structure is different in each scheduler, kern_switch.c is
  now included at the end of each scheduler. Nothing outside the scheduler
  knows the contents of the KSE (aka td_sched) structure.

  The fields in the ksegrp structure that are to do with the scheduler's
  queueing mechanisms are now moved to the kg_sched structure (per ksegrp
  scheduler private data structure). In other words how the scheduler
  queues and keeps track of threads is no-one's business except the
  scheduler's. This should allow people to write experimental schedulers
  with completely different internal structuring.

  A scheduler call sched_set_concurrency(kg, N) has been added that
  notifies the scheduler that no more than N threads from that ksegrp
  should be allowed to be concurrently scheduled. This is also used to
  enforce 'fairness' at this time so that a ksegrp with 10000 threads can
  not swamp the run queue and force out a process with 1 thread, since the
  current code will not set the concurrency above NCPU, and both
  schedulers will not allow more than that many onto the system run queue
  at a time. Each scheduler should eventually develop their own methods to
  do this now that they are effectively separated.

  Rejig libthr's kernel interface to follow the same code paths as libkse
  for scope system threads. This has slightly hurt libthr's performance
  but I will work to recover as much of it as I can.

  Thread exit code has been cleaned up greatly. exit and exec code now
  transitions a process back to 'standard non-threaded mode' before taking
  the next step.

  Reviewed by: scottl, peter
  MFC after: 1 week
* Cause pfind() not to return processes in the PRS_NEW state. As a result, | rwatson | 2004-08-14 | 1 | -1/+8
  threads consuming the result of pfind() will not need to check for a
  NULL credential pointer or other signs of an incompletely created
  process. However, this also means that pfind() cannot be used to test
  for the existence or find such a process. Annotate pfind() to indicate
  that this is the case. A review of current consumers seems to indicate
  that this is not a problem for any of them. This closes a number of race
  conditions that could result in NULL pointer dereferences and related
  failure modes. Other related races continue to exist, especially during
  iteration of the allproc list without due caution.

  Discussed with: tjr, green
* Remove typos on KASSERT messages. | julian | 2004-08-09 | 1 | -3/+3
* * Add a "how" argument to uma_zone constructors and initialization functions | green | 2004-08-02 | 1 | -6/+8
    so that they know whether the allocation is supposed to be able to
    sleep or not.
  * Allow uma_zone constructors and initialization functions to return
    either success or error. Almost all of the ones in the tree currently
    return success unconditionally, but mbuf is a notable exception: the
    packet zone constructor wants to be able to fail if it cannot
    suballocate an mbuf cluster, and the mbuf allocators want to be able
    to fail in general in a MAC kernel if the MAC mbuf initializer fails.
    This fixes the panics people are seeing when they run out of memory
    for mbuf clusters.
  * Allow debug.nosleepwithlocks on WITNESS to be disabled, without
    changing the default.

  Both bmilekic and jeff have reviewed the changes made to make failable
  zone allocations work.
* Fill some information about zombie processes as well. | pjd | 2004-07-29 | 1 | -7/+6
  Before this change every zombie process was reported as an owner of
  PID 0 in ps(1) output.

  Reviewed by: julian
* Fill in the values for the ki_tid and ki_numthreads which have been | gad | 2004-06-20 | 1 | -0/+2
  added to kproc_info.

  PR: bin/65803 (a tiny part...)
  Submitted by: Cyrille Lefevre
* Add a call to calcru() to update the kproc_info fields of ki_rusage.ru_utime | gad | 2004-06-20 | 1 | -0/+2
  and ki_rusage.ru_stime. This greatly improves the accuracy of those
  fields.

  Suggested by: bde
* Fill in some new fields of 'struct kinfo_proc', namely ki_childstime, | gad | 2004-06-19 | 1 | -4/+22
  ki_childutime, and ki_emul. Also uses the timevaladd() routine to
  correct the calculation of ki_childtime. That will correct the value
  returned when ki_childtime.tv_usec > 1,000,000.

  This also implements a new KERN_PROC_GID option for kvm_getprocs().
  (there will be a similar update to lib/libkvm/kvm_proc.c)

  Submitted by: Cyrille Lefevre
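The tv_usec overflow mentioned above arises when two timevals are summed field by field without carrying. A minimal userland sketch of a normalizing add follows; `tv_add()` is a hypothetical helper written for this illustration (it is not the kernel's timevaladd() source), and it assumes both inputs are already normalized.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Add *src into *dst. Both inputs are assumed to be normalized
 * (0 <= tv_usec < 1000000), so at most one carry into tv_sec is
 * needed to keep the result normalized.
 */
static inline void
tv_add(struct timeval *dst, const struct timeval *src)
{
	assert(dst->tv_usec >= 0 && dst->tv_usec < 1000000);
	assert(src->tv_usec >= 0 && src->tv_usec < 1000000);

	dst->tv_sec += src->tv_sec;
	dst->tv_usec += src->tv_usec;
	if (dst->tv_usec >= 1000000) {
		dst->tv_sec++;
		dst->tv_usec -= 1000000;
	}
}
```

For example, adding 2.600000 s to 1.700000 s yields 4.300000 s rather than 3 s plus 1300000 microseconds.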
* Second half of the dev_t cleanup. | phk | 2004-06-17 | 1 | -2/+2
  The big lines are:
    NODEV -> NULL
    NOUDEV -> NODEV
    udev_t -> dev_t
    udev2dev() -> findcdev()

  Various minor adjustments including handling of userland access to
  kernel space struct cdev etc.
* Nice, is a property of a process as a whole.. | julian | 2004-06-16 | 1 | -1/+1
  I mistakenly moved it to the ksegroup when breaking up the process
  structure. Put it back in the proc structure.
* Reference count struct tty. | phk | 2004-06-09 | 1 | -1/+3
  Add two new functions: ttyref() and ttyrel().

  ttymalloc() creates a struct tty with a reference count of one. When
  ttyrel() sees the count go to zero, struct tty is freed.

  Hold references for open ttys and for ttys which are controlling
  terminal for sessions.

  Until drivers start using ttyrel(), this commit will make no difference.
* Fix a race in destruction of sessions. | phk | 2004-06-09 | 1 | -2/+3
* Implement the new KERN_PROC_RGID option, and also implement the | gad | 2004-05-22 | 1 | -0/+28
  KERN_PROC_SESSION option which had been previously defined but never
  implemented.

  PR: bin/65803 (a very tiny piece of the PR)
  Submitted by: Cyrille Lefevre
* Remove advertising clause from University of California Regent's license, | imp | 2004-04-05 | 1 | -4/+0
  per letter dated July 22, 1999.

  Approved by: core
* Remove ps_argsopen check. It was bogus in the past and was corrected | pjd | 2004-04-01 | 1 | -5/+0
  not quite well by me - if kern.ps_argsopen was set to 0, users weren't
  permitted to see arguments of even their own processes. But
  kern.ps_argsopen is going away, so just remove this check and leave
  security checks for p_cansee() function.
* Fix information leakage. | pjd | 2004-03-17 | 1 | -1/+6
  Without this fix it is possible to cheat policies like:
  - sysctl security.bsd.see_other_[gu]ids=0,
  - mac_seeotheruids(4),
  - jail(2)
  and get full processes list with their arguments.

  This problem exists from revision 1.62 of kern_proc.c when it was
  introduced.

  Reviewed by: nectar, rwatson.
* Split the mlock() kernel code into two parts, mlock(), which unpacks | truckman | 2004-02-26 | 1 | -1/+3
  the syscall arguments and does the suser() permission check, and
  kern_mlock(), which does the resource limit checking and calls
  vm_map_wire(). Split munlock() in a similar way.

  Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

  Replace calls to vslock() and vsunlock() in the sysctl code with calls
  to kern_mlock() and kern_munlock() so that the sysctl code will obey the
  wired memory limits.

  Nuke the vslock() and vsunlock() implementations, which are no longer
  used.

  Add a member to struct sysctl_req to track the amount of memory that is
  wired to handle the request.

  Modify sysctl_wire_old_buffer() to return an error if its call to
  kern_mlock() fails. Only wire the minimum of the length specified in the
  sysctl request and the length specified in its argument list. It is
  recommended that sysctl handlers that use sysctl_wire_old_buffer()
  should specify reasonable estimates for the amount of data they want to
  return so that only the minimum amount of memory is wired no matter what
  length has been specified by the request.

  Modify the callers of sysctl_wire_old_buffer() to look for the error
  return.

  Modify sysctl_old_user to obey the wired buffer length and clean up its
  implementation.

  Reviewed by: bms