summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_resource.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Rework the support for ABIs to override resource limits (used by 32-bitjhb2007-05-141-6/+4
| | | | | | | | | | | | | | | | | | | processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't ovewrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week
* Further system call comment cleanup:rwatson2007-03-051-3/+0
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Remove 'MPSAFE' annotations from the comments above most system calls: allrwatson2007-03-041-25/+0
| | | | | | | | system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
* Close race conditions between fork() and [sg]etpriority()'sdelphij2007-02-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PRIO_USER case, possibly also other places that deferences p_ucred. In the past, we insert a new process into the allproc list right after PID allocation, and release the allproc_lock sx. Because most content in new proc's structure is not yet initialized, this could lead to undefined result if we do not handle PRS_NEW with care. The problem with PRS_NEW state is that it does not provide fine grained information about how much initialization is done for a new process. By defination, after PRIO_USER setpriority(), all processes that belongs to given user should have their nice value set to the specified value. Therefore, if p_{start,end}copy section was done for a PRS_NEW process, we can not safely ignore it because p_nice is in this area. On the other hand, we should be careful on PRS_NEW processes because we do not allow non-root users to lower their nice values, and without a successful copy of the copy section, we can get stale values that is inherted from the uninitialized area of the process structure. This commit tries to close the race condition by grabbing proc mutex *before* we release allproc_lock xlock, and do copy as well as zero immediately after the allproc_lock xunlock. This guarantees that the new process would have its p_copy and p_zero sections, as well as user credential informaion initialized. In getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW process, we just skip the process in question, because it does not affect the final result of the call, as the p_nice value would be copied from its parent, and we will see it during allproc traverse. Other potential solutions are still under evaluation. Discussed with: davidxu, jhb, rwatson PR: kern/108071 MFC after: 2 weeks
* Use priv_check(9) instead of suser(9) for checking the privilege torwatson2007-02-191-1/+1
| | | | | | set real-time priority on a thread. It looks like this suser(9) call was introduced after my first pass through replacing superuser checks with named privilege checks.
* Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.delphij2007-01-171-1/+1
|
* Threading cleanup.. part 2 of several.julian2006-12-061-89/+0
| | | | | | | | | | | | | | | | | | | | | | Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
* Use scheduler API sched_user_prio() to adjust thread's userland priority,davidxu2006-11-201-12/+15
| | | | | use td_base_user_prio to get real userland priority since POSIX priority mutex may adjust td_user_pri which is an effective priority.
* Sweep kernel replacing suser(9) calls with priv(9) calls, assigningrwatson2006-11-061-3/+5
| | | | | | | | | | | | | specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
* Make KSE a kernel option, turned on by default in all GENERICjb2006-10-261-0/+86
| | | | | | | kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
* Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparamdavidxu2006-09-211-0/+95
| | | | | with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.
* Commit the results of the typo hunt by Darren Pilgrim.yar2006-08-041-1/+1
| | | | | | | | | | This change affects documentation and comments only, no real code involved. PR: misc/101245 Submitted by: Darren Pilgrim <darren pilgrim bitfreak org> Tested by: md5(1) MFC after: 1 week
* Go over calcru and friends once more.phk2006-03-111-47/+48
| | | | | Reintroduce the monotonicity for the normal case and make the two special cases behave in what is belived to be the most sensible fasion.
* Add slop to "backwards" cpu accounting messages, 3 usec or 1% whicheverphk2006-03-091-1/+5
| | | | | | | | | | | | | triggers. This should eliminate all the trivial messages which result from minor increases in cpu_tick frequency. Machines which don't du cpu clock fiddling shouldn't issue "backwards" messages now. Laptops and other machines where the initial estimate of cputicks may be waaaay off will still issue warnings.
* Various style and comment fixes.jhb2006-02-221-8/+7
| | | | Submitted by: bde
* Split calcru() back into a calcru1() function shared with calccru() andjhb2006-02-211-10/+33
| | | | | | | | | | a calcru() wrapper that passes a local rusage_ext on the stack that is a snapshot to do the calculations on. Now we can pass p->p_crux to calcru1() in calccru() again which fixes the issues with runtime going backwards messages when dead processes are harvested by init. Reviewed by: phk Tested by: Stefan Ehmann shoesoft at gmx dot net
* CPU time accounting speedup (step 2)phk2006-02-111-68/+45
| | | | | | | | | | | | | | | | | | | Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64
* Modify the way we account for CPU time spent (step 1)phk2006-02-071-9/+12
| | | | | | | | | | | | | | | | Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
* Back out changes made in rev. 1.151.ups2006-01-251-1/+1
| | | | | | They were bogus. Cluebat applied by: jhb@
* Hopefully fix the "calcru: runtime went backwards from ..." problem byups2006-01-231-1/+1
| | | | | | | keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days
* Calling setrlimit from 32bit apps could potentially increase certainps2005-11-021-0/+7
| | | | | | | limits beyond what should be capiable in a 32bit process, so we must fixup the limits. Reviewed by: jhb
* Use the reference count API to manage the reference counts for processjhb2005-09-271-11/+4
| | | | | | | limit structures rather than using pool mutexes to protect the reference counts. Tested on: i386, alpha, sparc64
* Giant is no longer required in kern_setrlimit(); remove its acquisition andalc2005-06-011-2/+0
| | | | | | release. Reviewed by: jhb
* Stop explicitly touching td_base_pri outside of the scheduler and simplyjhb2004-12-301-1/+0
| | | | | set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.
* Rework how we store process times in the kernel such that we always storejhb2004-10-051-79/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
* A modest collection of various and sundry style, spelling, and whitespacejhb2004-09-241-38/+33
| | | | | | fixes. Submitted by: bde (mostly)
* Various small style fixes.jhb2004-09-221-2/+4
|
* Push UIDINFO_UNLOCK() slightly earlier in chgsbize(), as it's notrwatson2004-08-061-2/+2
| | | | | needed if we print the local variable version of the limit rather than the shared version.
* Remove spl's from kern_resource.c.rwatson2004-08-041-4/+0
|
* Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This iscperciva2004-07-261-1/+1
| | | | | | | | | | | somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
* Turned off the "calcru: negative time" warning for certain SMP casesbde2004-06-211-12/+34
| | | | | | | | | | | where it is known to detect a problem but the problem is not very easy to fix. The warning became very common recently after a call to calcru() was added to fill_kinfo_thread(). Another (much older) cause of "negative times" (actually non-monotonic times) was fixed in rev.1.237 of kern_exit.c. Print separate messages for non-monotonic and negative times.
* Nice, is a property of a process as a whole..julian2004-06-161-34/+10
| | | | | I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
* Deorbit COMPAT_SUNOS.phk2004-06-111-2/+2
| | | | | We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
* Fix rtprio() to do sensible things when called from threaded processes.julian2004-05-081-4/+45
| | | | | | | | | | | It's not quite correct from a posix Point Of view, but it is a lot better than what was there before. This will be revisited later when we decide what form our priority extensions will take. Posix doesn't specify how a system scope thread can change its priority so you need to add non-standard extensions to be able to do it.. For now make this slightly non standard to allow it to be done. Submitted by: Dan Eischen originally, changed by myself.
* Remove a comment that complains about the lack of %qd, to justifymux2004-04-101-3/+2
| | | | | | truncating a rlim_t to a long. We have %qd since some time now. However, the correct format to use here is %jd and a cast to intmax_t, so do this.
* Remove advertising clause from University of California Regent's license,imp2004-04-051-4/+0
| | | | | | per letter dated July 22, 1999. Approved by: core
* Argh! Fix a bogon. lim_cur() was returning the hard (max) limit ratherjhb2004-02-111-1/+1
| | | | | | than the soft (cur) limit. Submitted by: bde
* - Convert the plimit lock to a pool mutex lock.jhb2004-02-061-3/+3
| | | | | | - Hide struct plimit from userland. Submitted by: bde (2)
* - Correct the translation of old rlimit values to properly handle the oldjhb2004-02-061-21/+28
| | | | | | | | | | | | | RLIM_INFINITY case for ogetrlimit(). - Use %jd and intmax_t to output negative time in usec in calcru(). - Rework getrusage() to make a copy of the rusage struct into a local variable while holding Giant and then do the copyout from the local variable to avoid having to have the original process rusage struct locked while doing the copyout (which would not be safe). This also includes a few style fixes from Bruce to getrusage(). Submitted by: bde (1, parts of 3) Suggested by: bde (2)
* A few more style fixes from Bruce including a few I missed last time.jhb2004-02-061-18/+12
| | | | Submitted by: bde
* - A lot of style and whitespace fixes.jhb2004-02-051-60/+53
| | | | | | - Update a few comments regarding locking notes. Submitted by: bde (1, mostly)
* Locking for the per-process resource limits structure.jhb2004-02-041-64/+159
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
* - Don't set td_priority directly here, use sched_prio().jeff2003-10-271-1/+1
|
* Extend the mutex pool implementation to permit the creation and use oftruckman2003-07-131-1/+1
| | | | | | | | | | | | | | | | multiple mutex pools with different options and sizes. Mutex pools can be created with either the default sleep mutexes or with spin mutexes. A dynamically created mutex pool can now be destroyed if it is no longer needed. Create two pools by default, one that matches the existing pool that uses the MTX_NOWITNESS option that should be used for building higher level locks, and a new pool with witness checking enabled. Modify the users of the existing mutex pool to use the appropriate pool in the new implementation. Reviewed by: jhb
* Use __FBSDID().obrien2003-06-111-1/+3
|
* Remove Giant from [gs]etpriority().jhb2003-04-231-6/+0
|
* Lock both the proc lock and sched_lock when calling sched_nice sincejhb2003-04-221-0/+2
| | | | | | kg_nice is now protected by both. Being protected by both means that other places in the kernel that want to read kg_nice only need one of the two locks.
* Add a couple of sched_lock asserts.jhb2003-04-181-0/+2
|
* - Adjust sched hooks for fork and exec to take processes as arguments insteadjeff2003-04-111-1/+1
| | | | | | | | | | of ksegs since they primarily operation on processes. - KSEs take ticks so pass the kse through sched_clock(). - Add a sched_class() routine that adjusts a ksegrp pri class. - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp} that will be used to tell the scheduler about new instances of these structures within the same process. These will be used by THR and KSE. - Change sched_4bsd to reflect this API update.
* Back out previous. The locking here needs a rethink.tjr2003-03-131-15/+3
|
OpenPOWER on IntegriCloud