summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_exit.c
Commit message (Collapse)AuthorAgeFilesLines
* Handle the case where we truss an SUGID program -- in particular, we needsef2000-01-101-0/+1
| | | | | | | | to wake up any processes waiting via PIOCWAIT on process exit, and truss needs to be more aware that a process may actually disappear while it's waiting. Reviewed by: Paul Saab <ps@yahoo-inc.com>
* Scheduler fixes equivalent to the ones logged in the following NetBSDbde1999-11-281-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit to kern_synch.c: ---------------------------- revision 1.55 date: 1999/02/23 02:56:03; author: ross; state: Exp; lines: +39 -10 Scheduler bug fixes and reorganization * fix the ancient nice(1) bug, where nice +20 processes incorrectly steal 10 - 20% of the CPU, (or even more depending on load average) * provide a new schedclk() mechanism at a new clock at schedhz, so high platform hz values don't cause nice +0 processes to look like they are niced * change the algorithm slightly, and reorganize the code a lot * fix percent-CPU calculation bugs, and eliminate some no-op code === nice bug === Correctly divide the scheduler queues between niced and compute-bound processes. The current nice weight of two (sort of, see `algorithm change' below) neatly divides the USRPRI queues in half; this should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op, and it was done after decay_cpu() which can only _reduce_ the value. It has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can scheduler-penalize themselves onto the same queue as nice +20 processes. (Or even a higher one.) === New schedclk() mechansism === Some platforms should be cutting down stathz before hitting the scheduler, since the scheduler algorithm only works right in the vicinity of 64 Hz. Rather than prescale hz, then scale back and forth by 4 every time p_estcpu is touched (each occurance an abstraction violation), use p_estcpu without scaling and require schedhz to be generated directly at the right frequency. Use a default stathz (well, actually, profhz) / 4, so nothing changes unless a platform defines schedhz and a new clock. Define these for alpha, where hz==1024, and nice was totally broke. === Algorithm change === The nice value used to be added to the exponentially-decayed scheduler history value p_estcpu, in _addition_ to be incorporated directly (with greater wieght) into the priority calculation. At first glance, it appears to be a pointless increase of 1/8 the nice effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that because it will ramp up linearly but be decayed only exponentially, thus converging to an additional .75 nice for a loadaverage of one. I killed this, it makes the behavior hard to control, almost impossible to analyze, and the effect (~~nothing at for the first second, then somewhat increased niceness after three seconds or more, depending on load average) pointless. === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation. Collect scheduler functionality. Try to put each abstraction in just one place. ---------------------------- The details are a little different in FreeBSD: === nice bug === Fixing this is the main point of this commit. We use essentially the same clipping rule as NetBSD (our limit on p_estcpu differs by a scale factor). However, clipping at all is fundamentally bad. It gives free CPU the hoggiest hogs once they reach the limit, and reaching the limit is normal for long-running hogs. This will be fixed later. === New schedclk() mechanism === We don't use the NetBSD schedclk() (now schedclock()) mechanism. We require (real)stathz to be about 128 and scale by an extra factor of 2 compared with NetBSD's statclock(). We scale p_estcpu instead of scaling the clock. This is more accurate and flexible. === Algorithm change === Same change. === Other bugs === The p_pctcpu bug was fixed long ago. We don't try as hard to abstract functionality yet. Related changes: the new limit on p_estcpu must be exported to kern_exit.c for clipping in wait1(). Agreed with by: dufault
* s/p_cred->pc_ucred/p_ucred/gphk1999-11-211-1/+1
|
* The at_exit and at_fork functions currently use a 'roll your own'phk1999-11-191-30/+27
| | | | | | | | | | | | | linked list to store the callbak routines. The patch converts the lists to queue(3) TAILQs, making the code slightly clearer and ensuring that callbacks are executed in FIFO order. Man page also updated as necesary. (discontinued use of M_TEMP malloc type while here anyway /phk) Submitted by: Jake Burkholder jake@checker.org PR: 14912
* Introduce commandline caching in the kernel.phk1999-11-161-0/+6
| | | | | | | | | | | This fixes some nasty procfs problems for SMP, makes ps(1) run much faster, and makes ps(1) even less dependent on /proc which will aid chroot and jails alike. To disable this facility and revert to previous behaviour: sysctl -w kern.ps_arg_cache_limit=0 For full details see the current@FreeBSD.org mail-archives.
* This is a partial commit of the patch from PR 14914:phk1999-11-161-3/+3
| | | | | | | | | | | | | Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
* Add a per-signal flag to mark handlers registered with osigaction, so weluoqi1999-10-111-2/+2
| | | | | | | | | | | | | | | can provide the correct context to each signal handler. Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK as we did before the linuxthreads support merge (submitted by bde). Move ps_sigstk from to p_sigacts to the main proc structure since signal stack should not be shared among threads. Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag. Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag. Reviewed by: marcel, jdp, bde
* Clean up some cruft. We don't run <= 4.3 binaries on hp300 or luna68kpeter1999-10-111-21/+0
| | | | arches using owait(2).
* sigset_t change (part 2 of 5)marcel1999-09-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Add sysctl variables for the Linuxulator. These reside under `compat.linux' asmarcel1999-08-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | discussed on current. The following variables are defined (for now): osname (defaults to "Linux") Allow users to change the name of the OS as returned by uname(2), specially added for all those Linux Netscape users and statistics maniacs :-) We now have what we all wanted! osrelease (defaults to "2.2.5") Allow users to change the version of the OS as returned by uname(2). Since -current supports glibc2.1 now, change the default to 2.2.5 (was 2.0.36). oss_version (defaults to 198144 [0x030600]) This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I can commit now that we have the MIB. The default version number is the lowest version possible with the current 'encoding'. A note about imprisoned processes (see jail(2)): These variables are copy-on-write (as suggested by phk). This means that imprisoned processes will use the system wide value unless it is written/set by the process. From that moment on, a copy local to the prison will be used. A note about the implementation: I choose to add a single pointer to struct prison, because I didn't like the idea of changing struct prison every time I come up with a new variable. As a side effect, the extra storage is only needed when a variable is set from within the prison. This also minimizes kernel bloat when the Linuxulator is not used; both compiled in or as a module. Reviewed by: bde (first version only) and phk
* From the submitter:msmith1999-06-071-15/+12
| | | | | | | | | | | | | | | | | - this causes POSIX locking to use the thread group leader (p->p_leader) as the locking thread for all advisory locks. In non-kernel-threaded code p->p_leader == p, so this will have no effect. This results in (more) correct POSIX threaded flock-ing semantics. It also prevents the leader from exiting before any of the children. (so that p->p_leader will never be stale) in exit1(). We have been running this patch for over a month now in our lab under load and at customer sites. Submitted by: John Plevyak <jplevyak@inktomi.com>
* This Implements the mumbled about "Jail" feature.phk1999-04-281-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
* Enable vmspace sharing on SMP. Major changes are,luoqi1999-04-281-2/+2
| | | | | | | | | | | | | | | | | - %fs register is added to trapframe and saved/restored upon kernel entry/exit. - Per-cpu pages are no longer mapped at the same virtual address. - Each cpu now has a separate gdt selector table. A new segment selector is added to point to per-cpu pages, per-cpu global variables are now accessed through this new selector (%fs). The selectors in gdt table are rearranged for cache line optimization. - fask_vfork is now on as default for both UP and SMP. - Some aio code cleanup. Reviewed by: Alan Cox <alc@cs.rice.edu> John Dyson <dyson@iquest.net> Julian Elischer <julian@whistel.com> Bruce Evans <bde@zeta.org.au> David Greenman <dg@root.com>
* Well folks, this is it - The second stage of the removal for build supportpeter1999-04-171-2/+2
| | | | for LKM's..
* Fixed runtime accounting. The time since the previous context switchbde1999-03-111-1/+10
| | | | | | | | was discarded on every call to calcru(). Hacking on the `switchtime' global for a related fix in rev.1.38 of kern_resource.c was too fragile and broke when p_switchtime went away. PR: 10402
* Fix thread/process tracking and differentiation for Linux threads emulation.julian1999-03-021-3/+14
| | | | | | Submitted by: Richard Seaman, Jr." <dick@tar.com> Also clean some compiler warnings in surrounding code.
* Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). Thisluoqi1999-02-191-2/+2
| | | | | | | is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
* Added comments about non-staticization so it doesn't get un-done nextnewton1999-01-311-1/+2
| | | | | | time someone goes on a staticization binge. Suggested by: eivind
* Unstaticized routines which are needed by the svr4 KLD and the streamsnewton1999-01-301-2/+2
| | | | garbage needed to support SysVR4 networking.
* Enable Linux threads support by default.julian1999-01-261-20/+4
| | | | | | | | | This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
* Changes to the LINUX_THREADS support to only allocate extra memory forjulian1999-01-071-17/+26
| | | | | | | | | | | | | | | | | | | | | | shared signal handling when there is shared signal handling being used. This removes the main objection to making the shared signal handling a standard ability in rfork() and friends and 'unconditionalising' this code. (i.e. the allocation of an extra 328 bytes per process). Signal handling information remains in the U area until such a time as it's reference count would be incremented to > 1. At that point a new struct is malloc'd and maintained in KVM so that it can be shared between the processes (threads) using it. A function to check the reference count and move the struct back to the U area when it drops back to 1 is also supplied. Signal information is therefore now swapable for all processes that are not sharing that information with other processes. THis should addres the concerns raised by Garrett and others. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
* Reviewed by: Luoqi Chen, Jordan Hubbardjulian1998-12-191-1/+39
| | | | | | | | | | | | Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-) Code to allow Linux Threads to run under FreeBSD. By default not enabled This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garret) This is not yet a 'real' option but will be within some number of hours.
* Installed the second patch attached to kern/7899 with some changes suggestedtruckman1998-11-111-1/+7
| | | | | | | | | | | | | | | | by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments. This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR. Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w. PR: kern/7899 Reviewed by: bde, elvind
* add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()peter1998-11-101-1/+2
|
* Moved limit frobbing (and the resulting limcopy()) that occurs fordg1998-06-051-7/+1
| | | | | | accounting to the accounting function so that this isn't needlessly done for some process exits. Reviewed by: bde,phk
* Make a kernel version of the timer* functions called timerval* to bephk1998-04-061-2/+2
| | | | | | more consistent. OK'ed by: bde
* VM level code cleanups.dyson1998-01-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
* Make COMPAT_43 and COMPAT_SUNOS new-style options.eivind1997-12-161-1/+2
|
* Use at_exit() to invoke procfs_exit() instead of calling it directly.sef1997-12-081-12/+1
| | | | | | | | Note that an unload facility should be used to call rm_at_exit() (if procfs is being loaded as an LKM and is subsequently removed), but it was non-obvious how to do this in the VFS framework. Reviewed by: Julian Elischer
* Surround the call to procfs_exit() by #ifdef PROCFS/#endif -- much to mysef1997-12-071-1/+5
| | | | | | surprise, procfs actually is optional, and some people truly do generate kernels without it. Wow. I built a kernel without 'options PROCFS' and it compiled and linked.
* Changes to allow event-based process monitoring and control.sef1997-12-061-1/+11
|
* Avoid passing a `retval' to wait1()bde1997-11-201-13/+10
| | | | | | | Disallow wait options that are not a combination of the standard POSIX options WUNTRACED and WNOHANG, as is required by POSIX. BSD doesn't have any extensions here, but the code was `#ifdef notyet' for some reason.
* Move the "retval" (3rd) parameter from all syscall functions and putphk1997-11-061-10/+7
| | | | | | | | | | | | it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /*ARGSUSED*/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
* Last major round (Unless Bruce thinks of somthing :-) of malloc changes.phk1997-10-121-3/+3
| | | | | | | | Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
* Distribute and statizice a lot of the malloc M_* types.phk1997-10-111-1/+3
| | | | Substantial input from: bde
* init_main.c subr_autoconf.c:gibbs1997-09-211-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for "interrupt driven configuration hooks". A component of the kernel can register a hook, most likely during auto-configuration, and receive a callback once interrupt services are available. This callback will occur before the root and dump devices are configured, so the configuration task can affect the selection of those two devices or complete any tasks that need to be performed prior to launching init. System boot is posponed so long as a hook is registered. The hook owner is responsible for removing the hook once their task is complete or the system boot can continue. kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c: Change the interface and implementation for the kernel callout service. The new implemntaion is based on the work of Adam M. Costello and George Varghese, published in a technical report entitled "Redesigning the BSD Callout and Timer Facilities". The interface used in FreeBSD is a little different than the one outlined in the paper. The new function prototypes are: struct callout_handle timeout(void (*func)(void *), void *arg, int ticks); void untimeout(void (*func)(void *), void *arg, struct callout_handle handle); If a client wishes to remove a timeout, it must store the callout_handle returned by timeout and pass it to untimeout. The new implementation gives 0(1) insert and removal of callouts making this interface scale well even for applications that keep 100s of callouts outstanding. See the updated timeout.9 man page for more details.
* Implement SA_NOCLDWAIT.joerg1997-09-131-2/+16
| | | | | | | | | | | | | | | | | | | | The implementation is done (unlike what i've originally been contemplating) by reparenting kids of processes that have the appropriate bit set to PID 1, and let PID 1 handle the zombie. This is far less problematical than what would seem to be ``doing it right'', for a number of reasons. Of our currently shipping PID-1-intended programs, 50 % fail the above assumption. ;-) (Read this: sysinstall doesn't do it right. This is no problem as long as no program called by sysinstall actually uses SA_NOCLDWAIT.) ToDo: . clarify the correct SA_* flag inheritance, compared to other systems, . decide whether the compat cruft (osigvec(9)) should deal with new system additions or not, . merge OpenBSD's SA_SIGINFO implementation. ;) Reviewed by: bde
* Removed unused #includes.bde1997-09-021-10/+1
|
* Fixed some gratuitous ANSIisms.bde1997-08-261-5/+5
|
* #include <machine/limits.h> explicitly in the few places that it is required.bde1997-08-211-1/+2
|
* Clean up some lint associated with the AIO code.dyson1997-07-171-1/+2
|
* This is an upgrade so that the kernel supports the AIO calls fromdyson1997-07-061-1/+3
| | | | | | | | | | | POSIX.4. Additionally, there is some initial code that supports LIO. This code supports AIO/LIO for all types of file descriptors, with few if any restrictions. There will be a followup very soon that will support significantly more efficient operation for VCHR type files (raw.) This code is also dependent on some kernel features that don't work under SMP yet. After I commit the changes to the kernel to support proper address space sharing on SMP, this code will also work under SMP.
* Modifications to existing files to support the initial AIO/LIO anddyson1997-06-161-1/+32
| | | | kernel based threading support.
* Remove cruft relating to p_selbits and p_selbits_sizephk1997-05-221-7/+1
|
* The biggie: Get rid of the UPAGES from the top of the per-process addresspeter1997-04-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | space. (!) Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode - create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit. Context switch the tss_esp0 pointer in the common tss. This is a lot simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset. The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS). Simplity the pmap code to manage process contexts, we no longer have to double map the UPAGES, this simplifies and should measuably speed up fork(). The following parts came from John Dyson: Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out. Move the upages object (upobj) from the vmspace to the proc structure. Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
* Don't include <sys/ioctl.h> in the kernel. Stage 1: don't includebde1997-03-241-2/+1
| | | | | it when it is not used. In most cases, the reasons for including it went away when the special ioctl headers became self-sufficient.
* Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are notpeter1997-02-221-1/+1
| | | | ready for it yet.
* This is the kernel Lite/2 commit. There are some requisite userlanddyson1997-02-101-3/+3
| | | | | | | | | | | | | | | changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
* Copy process resource settings before modifying.davidn1997-01-211-0/+5
| | | | Candidate for 2.2.
OpenPOWER on IntegriCloud