summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_resource.c
Commit message (Collapse)AuthorAgeFilesLines
* Implement global and per-uid accounting of the anonymous memory. Addkib2009-06-231-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)
* Don't rearm callout if the process is exiting, it may leak a calloutdavidxu2008-10-241-1/+2
| | | | | because callout_drain() only waits for running callout, but not disable it if it is rearmed.
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-231-1/+1
| | | | MFC after: 3 months
* Fix a small typo in a comment in calcru1().ed2008-09-051-1/+1
| | | | | | The word "happene" should read "happened". Submitted by: Jille Timmermans <jille quis cx>
* Integrate the new MPSAFE TTY layer to the FreeBSD operating system.ed2008-08-201-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
* Remove extra uihold() call that accidentally sneak in during perforcepjd2008-03-191-1/+0
| | | | change @125544.
* - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice fromjeff2008-03-191-16/+4
| | | | | | | requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
* Whitespace cleanups.pjd2008-03-161-7/+7
|
* - Use wait-free method to manage ui_sbsize and ui_proccnt fields in thepjd2008-03-161-58/+48
| | | | | | | | | | uidinfo structure. This entirely removes contention observed on the ui_mtxp mutex (as it is now gone). - Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just need to read-lock it. Reviewed by: jhb, jeff, kris & others Tested by: kris
* Style fixes.pjd2008-03-161-11/+7
|
* Fix information leak. We can find PIDs of running processes from withinpjd2008-03-161-1/+2
| | | | | | | | | a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and checking the return value: 0 means that the process exists and -1 that it doesn't exist. Reviewed by: rwatson MFC after: 1 week
* Remove kernel support for M:N threading.jeff2008-03-121-2/+0
| | | | | | | | While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
* Don't zero td_runtime when billing thread CPU usage to the process;rwatson2008-01-101-3/+3
| | | | | | | | | | | | | | | | | | | | | maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime. When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc. This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well for libthr user threads, valuable debugging information lost with the move to try kthreads since they are no longer independent processes. There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the mean time. Likewise, there are resevations about the continued validity of statclock given the speed of modern processors. Reviewed by: attilio, emaste, jhb, julian
* Fix LOR of thread lock and umtx's priority propagation mutex duedavidxu2007-12-111-1/+8
| | | | | | to the reworking of scheduler lock. MFC: after 3 days
* - Use ruxagg() in calcru() to make sure we have current tick informationjeff2007-07-171-0/+8
| | | | | | | from all threads. Discussed with: bde, attilio Approved by: re
* Fix a couple of issues with the stack limit for 32-bit processes on 64-bitjhb2007-07-121-8/+12
| | | | | | | | | | | | | kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)
* Remove the restriction that rtprio(2) cannot be used to set the realtimerwatson2007-06-141-17/+8
| | | | | | | or idle priority of another process owned by the same user. This means that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly via p_cansched(9) or directly to set realtime/idle privilege, rather than directly affecting target process authorization.
* Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); inrwatson2007-06-121-2/+1
| | | | | | | | | | | | | | | some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
* rufetch and calcru sometimes should be called atomically together.attilio2007-06-091-13/+21
| | | | | | | | | | This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)
* The current rusage code show peculiar problems:attilio2007-06-091-6/+3
| | | | | | | | | | | | | | - Unsafeness on ruadd() in thread_exit() - Unatomicity of thread_exiit() in the exit1() operations This patch addresses these problems allocating p_fd as part of the process and modifying the way it is accessed. A small chunk of this patch, resolves a race about p_state in kern_wait(), since we have to be sure about the zombif-ing process. Submitted by: jeff Approved by: jeff (mentor)
* Commit 14/14 of sched_lock decomposition.jeff2007-06-051-24/+33
| | | | | | | | | | | - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* - Move rusage from being per-process in struct pstats to per-thread injeff2007-06-011-22/+103
| | | | | | | | | | | | | | | | | | | td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Universally adopt most conventional spelling of acquire.rwatson2007-05-271-1/+1
|
* Rework the support for ABIs to override resource limits (used by 32-bitjhb2007-05-141-6/+4
| | | | | | | | | | | | | | | | | | | processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't ovewrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week
* Further system call comment cleanup:rwatson2007-03-051-3/+0
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Remove 'MPSAFE' annotations from the comments above most system calls: allrwatson2007-03-041-25/+0
| | | | | | | | system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
* Close race conditions between fork() and [sg]etpriority()'sdelphij2007-02-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PRIO_USER case, possibly also other places that deferences p_ucred. In the past, we insert a new process into the allproc list right after PID allocation, and release the allproc_lock sx. Because most content in new proc's structure is not yet initialized, this could lead to undefined result if we do not handle PRS_NEW with care. The problem with PRS_NEW state is that it does not provide fine grained information about how much initialization is done for a new process. By defination, after PRIO_USER setpriority(), all processes that belongs to given user should have their nice value set to the specified value. Therefore, if p_{start,end}copy section was done for a PRS_NEW process, we can not safely ignore it because p_nice is in this area. On the other hand, we should be careful on PRS_NEW processes because we do not allow non-root users to lower their nice values, and without a successful copy of the copy section, we can get stale values that is inherted from the uninitialized area of the process structure. This commit tries to close the race condition by grabbing proc mutex *before* we release allproc_lock xlock, and do copy as well as zero immediately after the allproc_lock xunlock. This guarantees that the new process would have its p_copy and p_zero sections, as well as user credential informaion initialized. In getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW process, we just skip the process in question, because it does not affect the final result of the call, as the p_nice value would be copied from its parent, and we will see it during allproc traverse. Other potential solutions are still under evaluation. Discussed with: davidxu, jhb, rwatson PR: kern/108071 MFC after: 2 weeks
* Use priv_check(9) instead of suser(9) for checking the privilege torwatson2007-02-191-1/+1
| | | | | | set real-time priority on a thread. It looks like this suser(9) call was introduced after my first pass through replacing superuser checks with named privilege checks.
* Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.delphij2007-01-171-1/+1
|
* Threading cleanup.. part 2 of several.julian2006-12-061-89/+0
| | | | | | | | | | | | | | | | | | | | | | Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
* Use scheduler API sched_user_prio() to adjust thread's userland priority,davidxu2006-11-201-12/+15
| | | | | use td_base_user_prio to get real userland priority since POSIX priority mutex may adjust td_user_pri which is an effective priority.
* Sweep kernel replacing suser(9) calls with priv(9) calls, assigningrwatson2006-11-061-3/+5
| | | | | | | | | | | | | specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
* Make KSE a kernel option, turned on by default in all GENERICjb2006-10-261-0/+86
| | | | | | | kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
* Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparamdavidxu2006-09-211-0/+95
| | | | | with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.
* Commit the results of the typo hunt by Darren Pilgrim.yar2006-08-041-1/+1
| | | | | | | | | | This change affects documentation and comments only, no real code involved. PR: misc/101245 Submitted by: Darren Pilgrim <darren pilgrim bitfreak org> Tested by: md5(1) MFC after: 1 week
* Go over calcru and friends once more.phk2006-03-111-47/+48
| | | | | Reintroduce the monotonicity for the normal case and make the two special cases behave in what is belived to be the most sensible fasion.
* Add slop to "backwards" cpu accounting messages, 3 usec or 1% whicheverphk2006-03-091-1/+5
| | | | | | | | | | | | | triggers. This should eliminate all the trivial messages which result from minor increases in cpu_tick frequency. Machines which don't du cpu clock fiddling shouldn't issue "backwards" messages now. Laptops and other machines where the initial estimate of cputicks may be waaaay off will still issue warnings.
* Various style and comment fixes.jhb2006-02-221-8/+7
| | | | Submitted by: bde
* Split calcru() back into a calcru1() function shared with calccru() andjhb2006-02-211-10/+33
| | | | | | | | | | a calcru() wrapper that passes a local rusage_ext on the stack that is a snapshot to do the calculations on. Now we can pass p->p_crux to calcru1() in calccru() again which fixes the issues with runtime going backwards messages when dead processes are harvested by init. Reviewed by: phk Tested by: Stefan Ehmann shoesoft at gmx dot net
* CPU time accounting speedup (step 2)phk2006-02-111-68/+45
| | | | | | | | | | | | | | | | | | | Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64
* Modify the way we account for CPU time spent (step 1)phk2006-02-071-9/+12
| | | | | | | | | | | | | | | | Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
* Back out changes made in rev. 1.151.ups2006-01-251-1/+1
| | | | | | They were bogus. Cluebat applied by: jhb@
* Hopefully fix the "calcru: runtime went backwards from ..." problem byups2006-01-231-1/+1
| | | | | | | keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days
* Calling setrlimit from 32bit apps could potentially increase certainps2005-11-021-0/+7
| | | | | | | limits beyond what should be capiable in a 32bit process, so we must fixup the limits. Reviewed by: jhb
* Use the reference count API to manage the reference counts for processjhb2005-09-271-11/+4
| | | | | | | limit structures rather than using pool mutexes to protect the reference counts. Tested on: i386, alpha, sparc64
* Giant is no longer required in kern_setrlimit(); remove its acquisition andalc2005-06-011-2/+0
| | | | | | release. Reviewed by: jhb
* Stop explicitly touching td_base_pri outside of the scheduler and simplyjhb2004-12-301-1/+0
| | | | | set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.
* Rework how we store process times in the kernel such that we always storejhb2004-10-051-79/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
* A modest collection of various and sundry style, spelling, and whitespacejhb2004-09-241-38/+33
| | | | | | fixes. Submitted by: bde (mostly)
* Various small style fixes.jhb2004-09-221-2/+4
|
OpenPOWER on IntegriCloud