summaryrefslogtreecommitdiffstats
path: root/sys/kern/init_main.c
Commit message (Collapse)AuthorAgeFilesLines
...
* SI_ORDER_THIRD + 2, not SI_ORDER_FOURTH + 2.rwatson2006-09-261-1/+1
| | | | | MFC after: 3 days Submitted by: mlaier
* Add "FreeBSD" trademark statement to copyright section of boot messages.rwatson2006-09-251-3/+4
| | | | | MFC after: 3 days Approved by: core, board at FreeBSDFoundation dot org
* Initialize kg_base_user_pri.davidxu2006-08-251-0/+1
|
* The VERBOSE_SYSINIT stuff sees the DDB define a lot better if we includebenno2006-05-141-0/+1
| | | | | | | opt_ddb.h. Spotted by: benno Pointy hat to: benno
* Add a new kernel config option, VERBOSE_SYSINIT.benno2006-05-121-0/+45
| | | | | | | | | | | | | | When porting FreeBSD to a new platform, one of the more useful things to do is get mi_startup() to let you know which SYSINIT it's up to. Most people tend to whack a printf in the SYSINIT loop to print the address of the function it's about to call. Going one better, jhb made a version that uses DDB to look up the name of the function and print that instead. This version is essentially his with the addition of some ifdeffery to make it optional and to allow it to work (although using only the function address, not the symbol) if you forgot to enable DDB. All the cool bits by: jhb Approved by: scottl, rink, cognet, imp
* Modify the way we account for CPU time spent (step 1)phk2006-02-071-3/+2
| | | | | | | | | | | | | | | | Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
* rwlock expects the struct thread to be aligned on 8 bytes, so make surecognet2006-02-061-1/+1
| | | | thread0 is.
* Hook up audit to the initial process creation events (proc0, proc1).rwatson2006-02-021-0/+9
| | | | | Much help from: wsalamon Obtained from: TrustedBSD Project
* Moderate rewrite of kernel ktrace code to attempt to generally improverwatson2005-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
* Remove mac_create_root_mount() and mpo_create_root_mount(), whichrwatson2005-09-191-4/+0
| | | | | | | | | | | | | | | | | provided access to the root file system before the start of the init process. This was used briefly by SEBSD before it knew about preloading data in the loader, and using that method to gain access to data earlier results in fewer inconsistencies in the approach. Policy modules still have access to the root file system creation event through the mac_create_mount() entry point. Removed now, and will be removed from RELENG_6, in order to gain third party policy dependencies on the entry point for the lifetime of the 6.x branch. MFC after: 3 days Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com> Sponsored by: SPARTA
* Fix system shutdown timeout handling by again supporting longer runningrse2005-09-151-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | shutdown procedures (which have a duration of more than 120 seconds). We have two user-space affecting shutdown timeouts: a "soft" one in /etc/rc.shutdown and a "hard" one in init(8). The first one can be configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults to 30 seconds. The second one was originally (in 1998) intended to be configured via sysctl(8) variable "kern.shutdown_timeout" and defaults to 120 seconds. Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999 (as it obviously is actually not used within the kernel itself) and hence was intentionally but misleadingly removed in revision 1.107 from init_main.c. Kernel sysctl(8) variables are certainly a wrong way to control user-space processes in general, but in this particular case the sysctl(8) variable should have remained as it supports init(8), which isn't passed command line flags (which in turn could have been set via /etc/rc.conf), etc. As there is already a similar "kern.init_path" sysctl(8) variable which directly affects init(8), resurrect the init(8) shutdown timeout under sysctl(8) variable "kern.init_shutdown_timeout". But this time document it as being intentionally unused within the kernel and used by init(8). Also document it in the manpages init(8) and rc.conf(5). Reviewed by: phk MFC after: 2 weeks
* Fix the recent panics/LORs/hangs created by my kqueue commit by:ssouhlal2005-07-011-1/+1
| | | | | | | | | | | | | | | | | - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
* Add /rescue/init to the default init_path, before /stand/sysinstall.des2005-02-171-1/+1
| | | | MFC after: 2 weeks
* /* -> /*- for copyright notices, minor format tweaks as necessaryimp2005-01-061-1/+1
|
* The remaining part of nmount/omount/rootfs mount changes. I cannot sensiblyphk2004-12-071-29/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).
* Don't include sys/user.h merely for its side-effect of recursivelydas2004-11-271-1/+0
| | | | including other headers.
* Malloc p_stats instead of putting it in the U area. We should considerdas2004-11-201-6/+2
| | | | | | simply embedding it in struct proc. Reviewed by: arch@
* Allow fdinit() to be called with a NULL fdp argument so we can usephk2004-11-071-14/+3
| | | | | | it when setting up init. Make fdinit() lock the fdp argument as needed.
* Rework how we store process times in the kernel such that we always storejhb2004-10-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
* Refactor a bunch of scheduler code to give basically the same behaviourjulian2004-09-051-21/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
* Give setrunqueue() and sched_add() more of a clue as tojulian2004-09-011-1/+1
| | | | | | where they are coming from and what is expected from them. MFC after: 2 days
* Add locking to the kqueue subsystem. This also makes the kqueue subsystemjmg2004-08-151-0/+1
| | | | | | | | | | | | | a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
* Remove global variable rootdevs and rootvp, they are unused as such.phk2004-07-281-1/+0
| | | | | | | | Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.
* Make VFS_ROOT() and vflush() take a thread argument.alfred2004-07-121-1/+1
| | | | | | This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
* Nice, is a property of a process as a whole..julian2004-06-161-1/+1
| | | | | I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
* Loudly announce WITNESS and DIAGNOSTIC options and warn about reducedphk2004-02-291-0/+14
| | | | performance.
* Locking for the per-process resource limits structure.jhb2004-02-041-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
* KASSERT() that initproc->p_pid is 1. Very bad things happen if init'srwatson2004-01-161-0/+1
| | | | | | | pid isn't 1, and it can actually occur if kthread_create() is called before SUB_SI_CREATE_INIT without RFHIGHPID. Discussed with: jhb
* New file descriptor allocation code, derived from similar code introduceddes2004-01-151-0/+2
| | | | | | | | | | | in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup(). Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).
* Remove the global variable 'cmask', which was used to initialize therwatson2003-10-021-3/+1
| | | | | | | | | fd_cmask field in the file descriptor structure for the first process indirectly from CMASK, and when an fd structure is initialized before being filled in, and instead just use CMASK. This appears to be an artifact left over from the initial integration of quotas into BSD. Suggested by: peter
* Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bitpeter2003-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
* Change instances of callout_init that specify MPSAFE behaviour tosam2003-08-191-2/+2
| | | | | use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.
* Revert stuff which accidentally ended up in the previous commit.phk2003-07-221-5/+0
|
* Don't attempt to inline large functions mb_alloc() and mb_free(),phk2003-07-221-0/+5
| | | | | | it more than doubles the text size of this file. GCC has wisely ignored us on this previously
* Use __FBSDID().obrien2003-06-111-1/+3
|
* Add tracking of process leaders sharing a file descriptor table andtegge2003-06-021-0/+1
| | | | | | | allow a file descriptor table to be shared between multiple process leaders. PR: 50923
* - Merge struct procsig with struct sigacts.jhb2003-05-131-7/+4
| | | | | | | | | | | | | | | | | - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
* Instead of recording the Unix time in a process when it starts, record thedes2003-05-011-1/+1
| | | | | | | uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.
* Made vmspace0 non-static. Its useful to be able to identify a vmspace asjake2003-04-131-1/+1
| | | | the kernel vmspace.
* Move the _oncpu entry from the KSE to the thread.julian2003-04-101-1/+1
| | | | | The entry in the KSE still exists but it's purpose will change a bit when we add the ability to lock a KSE to a cpu.
* Use the proc lock to protect p_realtimer instead of Giant, and obtaintjr2003-02-171-1/+1
| | | | | | sched_lock around accesses to p_stats->p_timer[] to avoid a potential race with hardclock. getitimer(), setitimer() and the realitexpire() callout are now Giant-free.
* - Split the struct kse into struct upcall and struct kse. struct kse willjeff2003-02-171-1/+0
| | | | | | | soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu
* It seems the extra precautions are no longer needed.des2003-02-131-2/+0
|
* Correct grammatical error in previous commit.des2003-02-041-1/+1
|
* Extra precautions before trying to start init(8).des2003-02-041-0/+2
|
* Reversion of commit by Davidxu plus fixes since applied.julian2003-02-011-0/+1
| | | | | | | | I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
* NODEVFS cleanup: remove #ifdefs.phk2003-01-301-2/+0
|
* Move UPCALL related data structure out of kse, introduce a newdavidxu2003-01-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
* Originally when DEVFS was added, a global variable "devfs_present"phk2003-01-191-15/+15
| | | | | | | | | | | | was used to control code which were conditional on DEVFS' precense since this avoided the need for large-scale source pollution with #include "opt_geom.h" Now that we approach making DEVFS standard, replace these tests with an #ifdef to facilitate mechanical removal once DEVFS becomes non-optional. No functional change by this commit.
* Improve the way that an elf image activator for an alternate word size isjake2003-01-041-5/+0
| | | | | | | | | | | included in the kernel. Include imgact_elf.c in conf/files, instead of both imgact_elf32.c and imgact_elf64.c, which will use the default word size for an architecture as defined in machine/elf.h. Architectures that wish to build an additional image activator for an alternate word size can include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which allows it to be dependent on MD options instead of solely on architecture. Glanced at by: peter
OpenPOWER on IntegriCloud