summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_fork.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Don't lock each of the processes while looking for a pid. The allproc andjhb2006-08-011-5/+1
| | | | proctree locks that we already hold provide sufficient protection.
* - Use suser_cred(9) instead of checking cr_ruid directly.pjd2006-06-271-7/+10
| | | | | | | | | - For privileged processes safe two mutex operations. We may want to consider if this is good idea to use SUSER_ALLOWJAIL here, but for now I didn't wanted to change the original behaviour. Reviewed by: rwatson
* Fix a race between file operations and rfork(RFCFDG) by parkingdavidxu2006-03-151-0/+17
| | | | | | | | all other threads at user boundary, the race can crash kernel under stress testing. Reviewed by: jhb MFC after: 3 days
* Simplify system time accounting for profiling.phk2006-02-081-1/+1
| | | | | | | | | | Rename struct thread's td_sticks to td_pticks, we will need the other name for more appropriately named use shortly. Reduce it from uint64_t to u_int. Clear td_pticks whenever we enter the kernel instead of recording its value as reference for userret(). Use the absolute value of td->pticks in userret() and eliminate third argument.
* We don't need the proc lock to check P_KTHREAD on curthread since it isjhb2006-02-061-3/+0
| | | | only set before the kthread starts executing and is never cleared.
* Audit the args to rfork(), and the child PID for all fork system calls.wsalamon2006-02-061-0/+2
| | | | | Obtained from: TrustedBSD Project Approved by: rwatson (mentor)
* Hook up audit to fork() and exit() events. These changes manage therwatson2006-02-021-1/+11
| | | | | | | audit state on processes, not auditing of these events. Much work by: wsalamon Obtained from: TrustedBSD Project
* Moderate rewrite of kernel ktrace code to attempt to generally improverwatson2005-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
* Fix the recent panics/LORs/hangs created by my kqueue commit by:ssouhlal2005-07-011-1/+1
| | | | | | | | | | | | | | | | | - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
* Inherit signal mask for child process in fork1(), RELENG_4 and otherdavidxu2005-04-201-0/+1
| | | | | | | *BSD have this behaviour, also it is required by POSIX. PR: kern/80130 Submitted by: Kostik Belousov konstantin.belousov at zoral dot com dot ua
* Divorce critical sections from spinlocks. Critical sections as denoted byjhb2005-04-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any affect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
* /* -> /*- for copyright notices, minor format tweaks as necessaryimp2005-01-061-1/+1
|
* Add new function fdunshare() which encapsulates the necessary light magicphk2004-12-141-12/+2
| | | | | | | | for ensuring that a process' filedesc is not shared with anybody. Use it in the two places which previously had private implmentations. This collects all fd_refcnt handling in kern_descrip.c
* Don't include sys/user.h merely for its side-effect of recursivelydas2004-11-271-1/+1
| | | | including other headers.
* Remove local definitions of RANGEOF() and use __rangeof() instead.das2004-11-201-9/+6
| | | | Also remove a few bogus casts.
* Malloc p_stats instead of putting it in the U area. We should considerdas2004-11-201-1/+3
| | | | | | simply embedding it in struct proc. Reviewed by: arch@
* Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.phk2004-11-131-5/+5
| | | | | | | | Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
* Use more intuitive pointer for fdinit() and fdcopy().phk2004-11-081-5/+3
| | | | Change fdcopy() to take unlocked filedesc.
* Allow fdinit() to be called with a NULL fdp argument so we can usephk2004-11-071-4/+0
| | | | | | it when setting up init. Make fdinit() lock the fdp argument as needed.
* Back out rev 1.240; it is unnecessary. In particular,das2004-10-061-8/+3
| | | | | | | p1 == curthread, so _PHOLD(p1) will not have to block to swap in p1. Noticed by: jhb
* Avoid calling _PHOLD(p1) with p2's lock held, since _PHOLD()das2004-10-011-3/+8
| | | | | may block to swap in p1. Instead, call _PHOLD earlier, at a point where the only lock held happens to be p1's.
* make some of these conditions apply equally to both threading systems.julian2004-09-131-3/+3
|
* Refactor a bunch of scheduler code to give basically the same behaviourjulian2004-09-051-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
* Push Giant deep into vm_forkproc(), acquiring it only if the process hasalc2004-09-031-16/+12
| | | | | mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).
* Give setrunqueue() and sched_add() more of a clue as tojulian2004-09-011-1/+1
| | | | | | where they are coming from and what is expected from them. MFC after: 2 days
* Remove sched_free_thread() which was only usedjulian2004-08-311-3/+0
| | | | | | | | in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days
* Add locking to the kqueue subsystem. This also makes the kqueue subsystemjmg2004-08-151-1/+2
| | | | | | | | | | | | | a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
* Increase the amount of data exported by KTR in the KTR_RUNQ setting.julian2004-08-091-2/+2
| | | | | This extra data is needed to really follow what is going on in the threaded case.
* Move the schedlock owner state update following the contextbmilekic2004-07-271-12/+14
| | | | | | | | | switch in fork_exit() to before anything else is done (but keep schedlock for the deadthread check). This means one less nasty bug if ever in the future whatever might have been called before the update played with schedlock or critical sections. Discussed with: tjr
* In revision 1.228, I accidentally broke the "total number of processes incperciva2004-07-261-1/+2
| | | | | | | | | | | | the system" resource limit code: When checking if the caller has superuser privileges, we should be checking the *real* user, not the *effective* user. (In general, resource limiting is done based on the real user, in order to avoid resource-exhaustion-by-setuid-program attacks.) Now that a SUSER_RUID flag to suser_cred exists, use it here to return this code to its correct behaviour. Pointed out by: rwatson
* When calling scheduler entrypoints for creating new threads and processes,julian2004-07-181-1/+1
| | | | | | | | | | | specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theorotically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happenned. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter
* fix compilation.phk2004-07-131-1/+1
|
* Replace "uid != 0" with "suser(td->td_ucred) != 0" when checking if we'vecperciva2004-07-131-1/+2
| | | | | hit the maximum number of processes. The last ten processes are reserved for the *non-jailed* superuser.
* Allocate TIDs in thread_init() and deallocate them in thread_fini().marcel2004-06-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | The overhead of unconditionally allocating TIDs (and likewise, unconditionally deallocating them), is amortized across multiple thread creations by the way UMA makes it possible to have type-stable storage. Previously the cost was kept down by having threads created as part of a fork operation use the process' PID as the TID. While this had some nice properties, it also introduced complexity in the way TIDs were allocated. Most importantly, by using the type-stable storage that UMA gives us this was also unnecessary. This change affects how core dumps are created and in particular how the PRSTATUS notes are dumped. Since we don't have a thread with a TID equalling the PID, we now need a different way to preserve the old and previous behavior. We do this by having the given thread (i.e. the thread passed to the core dump code in td) dump it's state first and fill in pr_pid with the actual PID. All other threads will have pr_pid contain their TIDs. The upshot of all this is that the debugger will now likely select the right LWP (=TID) as the initial thread. Credits to: julian@ for spotting how we can utilize UMA. Thanks to: all who provided julian@ with test results.
* Remove advertising clause from University of California Regent's license,imp2004-04-051-4/+0
| | | | | | per letter dated July 22, 1999. Approved by: core
* Assign thread IDs to kernel threads. The purpose of the thread ID (tid)marcel2004-04-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | is twofold: 1. When a 1:1 or M:N threaded process dumps core, we need to put the register state of each of its kernel threads in the core file. This can only be done by differentiating the pid field in the respective note. For this we need the tid. 2. When thread support is present for remote debugging the kernel with gdb(1), threads need to be identified by an integer due to limitations in the remote protocol. This requires having a tid. To minimize the impact of having thread IDs, threads that are created as part of a fork (i.e. the initial thread in a process) will inherit the process ID (i.e. tid=pid). Subsequent threads will have IDs larger than PID_MAX to avoid interference with the pid allocation algorithm. The assignment of tids is handled by thread_new_tid(). The thread ID allocation algorithm has been written with 3 assumptions in mind: 1. IDs need to be created as fast a possible, 2. Reuse of IDs may happen instantaneously, 3. Someone else will write a better algorithm.
* Make the process_exit eventhandler run without Giant. Add Giant hookspeter2004-03-141-2/+0
| | | | | | in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.
* Move the process_fork event out from under Giant. This one is easy,peter2004-03-141-1/+3
| | | | since there are no consumers in the tree. Document this.
* Push Giant down a little further:peter2004-03-131-3/+0
| | | | | | | | | | | | | | | - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
* make sure we had the filedesc lock when calling fdinit when RFCFDG is setjmg2004-03-101-0/+4
| | | | | | | on call to rfork. Submitted by: Brian Buchanan Semi-Reviewed by: rwatson
* Move a vref call outside of proc locks and Giant. By virtue of the factpeter2004-03-081-5/+4
| | | | | | | that we (p1) are currently running, we hold a reference on p_textvp which means the vnode cannot go away. p2 cannot run yet (and hence cannot exit) so this should be safe to do at this point. As a bonus, it removes a block of under-Giant code that was there to support the vref.
* Giant is not required for vm_thread_new_altkstack().alc2004-03-071-4/+1
|
* Add a missing part of jhb's previous commit. It looks like he had apeter2004-03-061-1/+5
| | | | | | | patch chunk rejected that he missed. This would manifest as a lock assertion panic at boot (Giant not locked in kern_fork.c). Obtained from: jhb
* - Grab a share lock of the proctree lock while looking for a pid due to thejhb2004-03-051-13/+25
| | | | | | | | process group and session dereferences. Also, check that p_pgrp and p_sesssion are NULL before dereferencing them. - Push down Giant in fork1(). Requested by: peter
* Fixed some style bugs (mainly English usage errors in comments).bde2004-03-041-9/+10
|
* Split the mlock() kernel code into two parts, mlock(), which unpackstruckman2004-02-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms
* Always set a process' state to normal when it is fully constructed injhb2004-02-051-5/+9
| | | | | fork1() rather than only doing it for the RFSTOPPED case and then having to fix it up in other places later on.
* Locking for the per-process resource limits structure.jhb2004-02-041-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
* When aborting fork() due to a failure, if using MAC, make sure to cleanrwatson2004-01-251-0/+3
| | | | | | | up the p_label field. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
* Reduce gratuitous includes: don't include jail.h if it's not needed.rwatson2004-01-211-1/+0
| | | | | | | Presumably, at some point, you had to include jail.h if you included proc.h, but that is no longer required. Result of: self injury involving adding something to struct prison
OpenPOWER on IntegriCloud