path: root/sys/kern/init_main.c
Commit message [author, date, files changed, lines -removed/+added]
* MFC r283382:
  In preparation for switching the linuxulator to use the native 1:1
  threads, add a hook for cleaning up thread resources before the thread dies.
  [dchagin, 2016-01-09, 1 file, -0/+1]
* To facilitate an upcoming Linuxulator merge, partially MFC r275121 (by kib).
  Only merge the syntax changes from r275121; the PROC_*LOCK() macros still
  lock the same proc spinlock.

  The process spin lock currently has the following distinct uses:
  - Thread lifetime cycle, in particular, counting of the threads in the
    process, and interlocking with the process mutex and thread lock. The
    main reason for this is that turnstile locks come after thread locks, so
    you cannot, e.g., unlock a blockable mutex (think process mutex) while
    owning a thread lock.
  - Virtual and profiling itimers, since timer activation is done from the
    clock interrupt context. Replace the p_slock by p_itimmtx and
    PROC_ITIMLOCK().
  - Profiling code (profil(2)), for a similar reason. Replace the p_slock by
    p_profmtx and PROC_PROFLOCK().
  - Resource usage accounting. The need for the spinlock there is subtle; my
    understanding is that the spinlock blocks context switching for the
    current thread, which prevents td_runtime and similar fields from
    changing (updates are done at mi_switch()). Replace the p_slock by
    p_statmtx and PROC_STATLOCK().

  Discussed with: kib
  [dchagin, 2016-01-09, 1 file, -2/+2]
* MFC: r287183, r287264, r287265
  Export kern.features.invariants when the kernel is compiled with invariants.
  [imp, 2015-09-03, 1 file, -0/+4]
* Fix r281843 mis-merge.
  Reported by: Thomas Mueller tmueller at sysgo com
  [pluknet, 2015-04-22, 1 file, -1/+1]
* MFC revisions 277693,278335,280382-280385,280923-280926,280931,
  280933-280939,280974-280976,281002,281009,281081,281176-281180,
  281271,281275,281616 (described in brief below):
  r277693: Font fix (des)
  r278335: Revert that
  r280382: Whitespace, comments, and copyright update
  r280383: Prevent inadvertent bootlock condition
  r280384: Increase max password length from 16 to 255 chars
  r280385: Add missing variable hints to loader.conf(5) defaults
  r280923: Whitespace
  r280924: Comments
  r280925: Optimize bootmsg to use fg/bg/me from screen.4th
  r280926: Whitespace and cleanup
  r280931: Comments
  r280933: Move beastie to logo-*.4th; brands to brand-*.4th
  r280934: Add remainder of supported ANSI escape sequences
  r280935: Securely overwrite (zero) user input after password checks
  r280936: Use equals for ASCII double frames
  r280937: Solve dreaded "dictionary full" issue
  r280938: Add "GELI Passphrase:" prompt to boot loader
  r280939: Revert that (premature commit)
  r280974: Use fg/b/me from screen.4th instead of literals
  r280975: Eliminate literal escape sequences from *.4th
  r280976: Use ^[[m mode-ending versus ^[[37m
  r281002: Install newly added brand-*.4th and logo-*.4th files (jkim)
  r281009: Revert .PATH changes to fix mips build (jkim)
  r281081: Make sure forth manpages are only installed once (bapt)
  r281176: Back to previous mode-endings based on feedback
  r281177: Back to previous mode-endings based on feedback
  r281178: Back to previous mode-endings based on feedback
  r281179: Back to previous mode-endings based on feedback
  r281180: Eliminate literal escape sequences from *.rc
  r281271: Fix a bootlock condition if loader_version is set
           NB: Commit message of r281271 has a typo, s/_logo/_version/
  r281275: Re-do proper mode-endings
  r281616: Add "GELI Passphrase:" prompt to boot loader

  Relnotes: Added "GELI Passphrase:" prompt to boot loader
  [dteske, 2015-04-22, 1 file, -0/+3]
* MFC r279361, r279395, r279396:
  Allow the kern.osrelease and kern.osreldate sysctl values to be set in a
  jail's creation parameters. This allows the kernel version to be reliably
  spoofed within the jail whether examined directly with sysctl or
  indirectly with the uname -r and -K options.

  Export the new osreldate and osrelease jail parms in jail_get(2).

  Fix line wrap.
  [ian, 2015-03-25, 1 file, -1/+1]
* Merge reaper facility.
  MFC r270443 (by mjg): Properly reparent traced processes when the tracer
  dies.
  MFC r273452 (by mjg): Plug unnecessary PRS_NEW check in kern_procctl.
  MFC 275800: Add a facility for a non-init process to declare itself the
  reaper of the orphaned descendants.
  MFC r275821: Add missed break.
  MFC r275846 (by mckusick): Add some additional clarification and fix a few
  grammar nits.
  MFC r275847 (by bdrewery): Bump Dd for r275846.
  [kib, 2015-01-05, 1 file, -1/+6]
* Extend the support for exempting processes from being killed when swap is
  exhausted.
  - Add a new protect(1) command that can be used to set or revoke
    protection from arbitrary processes. Similar to ktrace, it can apply a
    change to all existing descendants of a process as well as future
    descendants.
  - Add a new procctl(2) system call that provides a generic interface for
    control operations on processes (as opposed to the debugger-specific
    operations provided by ptrace(2)). procctl(2) uses a combination of
    idtype_t and an id to identify the set of processes on which to operate,
    similar to wait6().
  - Add a PROC_SPROTECT control operation to manage the protection status of
    a set of processes. MADV_PROTECT still works for backwards compatibility.
  - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to
    kinfo_proc), the first bit of which is used to track if P_PROTECT should
    be inherited by new child processes.

  Reviewed by: kib, jilles (earlier version)
  Approved by: re (delphij)
  MFC after: 1 month
  [jhb, 2013-09-19, 1 file, -0/+1]
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping
  use an address in the first 2GB of the process's address space. This flag
  should have the same semantics as the same flag on Linux.

  To facilitate this, add a new parameter to vm_map_find() that specifies an
  optional maximum virtual address. While here, fix several callers of
  vm_map_find() to use a VMFS_* constant for the findspace argument instead
  of TRUE and FALSE.

  Reviewed by: alc
  Approved by: re (kib)
  [jhb, 2013-09-09, 1 file, -2/+2]
* Don't call sleepinit() from proc0_init(), make it a SYSINIT instead.
  vmem needs the sleepq locks to be initialized when free'ing kva, so we
  want it called as early as possible.
  [cognet, 2013-08-09, 1 file, -4/+0]
* Replace kernel virtual address space allocation with vmem. This provides
  transparent layering and better fragmentation.
  - Normalize functions that allocate memory to use kmem_*
  - Those that allocate address space are named kva_*
  - Those that operate on maps are named kmap_*
  - Implement recursive allocation handling for kmem_arena in vmem.

  Reviewed by: alc
  Tested by: pho
  Sponsored by: EMC / Isilon Storage Division
  [jeff, 2013-08-07, 1 file, -5/+0]
* rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
  Also directly call swapper() at the end of mi_startup instead of relying
  on swapper being the last thing in sysinits order.

  Rationale:
  - "RUN_SCHEDULER" was misleading, scheduling already takes place at that
    stage
  - "scheduler" was misleading, the function swaps in the swapped out
    processes
  - another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
    invoked depending on its relative order with scheduler; this was not
    obvious and the bug actually used to exist

  Reviewed by: kib (earlier version)
  MFC after: 14 days
  [avg, 2013-07-24, 1 file, -6/+9]
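  The ordering hazard in the rationale above can be illustrated with a small
  userland sketch. It is not the kernel's SYSINIT machinery: the struct, the
  function names, and the numeric keys are all hypothetical. It only shows
  why calling the final step explicitly after the loop is more robust than
  registering it as "the last entry" and hoping nothing sorts after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a SYSINIT record; not the kernel's struct sysinit. */
struct sysinit_entry {
    unsigned subsystem;     /* coarse ordering key (SI_SUB_* analogue) */
    unsigned order;         /* fine ordering key (SI_ORDER_* analogue) */
    void (*func)(void);
};

static int call_log[8];     /* records which functions ran, in order */
static int call_count;

static void record(int id) { if (call_count < 8) call_log[call_count++] = id; }
static void init_a(void) { record(1); }
static void init_b(void) { record(2); }
static void init_last(void) { record(3); }
static void final_step(void) { record(99); }   /* plays the role of swapper() */

/* Order entries by subsystem first, then by order within a subsystem. */
static int cmp_entry(const void *pa, const void *pb)
{
    const struct sysinit_entry *a = pa, *b = pb;

    if (a->subsystem != b->subsystem)
        return (a->subsystem < b->subsystem) ? -1 : 1;
    if (a->order != b->order)
        return (a->order < b->order) ? -1 : 1;
    return 0;
}

/*
 * Sort the table, run every entry, then call the final step directly.
 * Because the final step is not itself a table entry, an entry registered
 * at the very last subsystem/order still gets to run before it.
 */
static void run_sysinits(struct sysinit_entry *tab, size_t n, void (*last)(void))
{
    qsort(tab, n, sizeof(tab[0]), cmp_entry);
    for (size_t i = 0; i < n; i++)
        tab[i].func();
    last();
}
```

  If the final step never returns (as the commit implies for swapper()), any
  table entry sorted after it would be silently skipped; keeping it out of
  the table removes that possibility.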
* MFP4 change 210763
  Allow boothowto and bootverbose to be set via kernel options, which is
  useful on architectures that are unable to rely on a boot loader to pass
  configuration variables to the kernel.

  Submitted by: rwatson
  [brooks, 2013-04-03, 1 file, -2/+9]
* print compiler version in the kernel banner
  And provide kernel compiler version as a sysctl as well. This is useful
  while we have gcc and clang cohabitation. This could be even more useful
  when we have support for external toolchains.

  In cooperation with: mjg
  MFC after: 13 days
  [avg, 2013-02-02, 1 file, -0/+1]
* Fix a race between kern_setitimer() and realitexpire(), where the callout
  is started before kern_setitimer() acquires the process mutex, but loses a
  race, and kern_setitimer() gets the process mutex before the callout.
  Then, assuming that the newly specified struct itimerval has it_interval
  zero but it_value non-zero, the callout, after it starts executing again,
  clears p->p_realtimer.it_value, but kern_setitimer() has already
  rescheduled the callout.

  As a result of the race, p_realtimer is zero and yet the callout is
  rescheduled. Then, in exit1(), the exit code sees that it_value is zero
  and does not even try to stop the callout. This allows the struct proc to
  be reused, and eventually the armed callout is re-initialized. The
  consequence is a corrupted callwheel tailq.

  Use the process mutex to interlock the callout start, which fixes the race.

  Reported and tested by: pho
  Reviewed by: jhb
  MFC after: 2 weeks
  [kib, 2012-12-04, 1 file, -1/+1]
* Fix grammar.
  Submitted by: jh
  MFC after: 1 week
  [kib, 2012-08-16, 1 file, -1/+1]
* Add a sysctl kern.pid_max, which limits the maximum pid the system is
  allowed to allocate, and a corresponding tunable with the same name. Note
  that existing processes with higher pids are left intact.

  MFC after: 1 week
  [kib, 2012-08-15, 1 file, -0/+1]
* Extend VERBOSE_SYSINIT to also print out the name of variables passed to
  SYSINIT routines if they can be resolved via symbol lookup in DDB. To
  avoid false positives, only honor a name if the symbol resolves exactly to
  the pointer value (no offset).

  MFC after: 1 week
  [jhb, 2012-06-01, 1 file, -9/+28]
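  The "no offset" rule above can be sketched in plain C. This is an
  illustration only, not the DDB symbol API: the table layout and the names
  sym_sketch and name_if_exact are hypothetical. A nearest-symbol search
  normally yields "symbol + offset"; the sketch reports a name only when
  that offset is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: a name and the address where it starts. */
struct sym_sketch {
    const char *name;
    uintptr_t addr;
};

/*
 * Find the closest symbol at or below addr, then honor it only if addr is
 * exactly the symbol's start. A pointer into the middle of an object would
 * resolve to "name+offset", which is treated as a false positive here.
 */
static const char *name_if_exact(const struct sym_sketch *tab, size_t n,
    uintptr_t addr)
{
    const struct sym_sketch *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].addr <= addr && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    }
    if (best == NULL || addr - best->addr != 0)
        return NULL;
    return best->name;
}
```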
* TDF_* flags should be used with td_flags field and TDP_* flags should be
  used with td_pflags field. Correct two places where it was not the case.

  Discussed with: kib
  MFC after: 1 week
  [pjd, 2012-01-22, 1 file, -1/+2]
* Remove the long deprecated ``/stand/sysinstall'' from the init_path.
  It can be put back using the INIT_PATH config option or init_path loader
  variable, if still needed (which I doubt).

  MFC after: 1 week
  [pluknet, 2011-10-27, 1 file, -1/+1]
* In order to maximize the re-usability of kernel code in user space this
  patch modifies makesyscalls.sh to prefix all of the non-compatibility
  calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry
  points and all places in the code that use them. It also fixes an
  additional name space collision between the kernel function psignal and
  the libc function of the same name by renaming the kernel psignal to
  kern_psignal(). By introducing this change now we will ease future MFCs
  that change syscalls.

  Reviewed by: rwatson
  Approved by: re (bz)
  [kmacy, 2011-09-16, 1 file, -1/+1]
* Add experimental support for process descriptors
  A "process descriptor" file descriptor is used to manage processes without
  using the PID namespace. This is required for Capsicum's Capability Mode,
  where the PID namespace is unavailable.

  New system calls pdfork(2) and pdkill(2) offer the functional equivalents
  of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
  process for debugging purposes. The currently-unimplemented pdwait(2)
  will, in the future, allow querying rusage/exit status. In the interim,
  poll(2) may be used to check (and wait for) process termination.

  When a process is referenced by a process descriptor, it does not issue
  SIGCHLD to the parent, making it suitable for use in libraries, a common
  scenario when using library compartmentalisation from within large
  applications (such as web browsers). Some observers may note a similarity
  to Mach task ports; process descriptors provide a subset of this
  behaviour, but in a UNIX style.

  This feature is enabled by "options PROCDESC", but as with several other
  Capsicum kernel features, is not enabled by default in GENERIC 9.0.

  Reviewed by: jhb, kib
  Approved by: re (kib), mentor (rwatson)
  Sponsored by: Google Inc
  [jonathan, 2011-08-18, 1 file, -1/+2]
* Enable accounting for RACCT_NPROC and RACCT_NTHR.
  Sponsored by: The FreeBSD Foundation
  Reviewed by: kib (earlier version)
  [trasz, 2011-03-31, 1 file, -0/+3]
* Add racct. It's an API to keep per-process, per-jail, and per-loginclass
  resource accounting information, to be used by the new resource limits
  code. It's connected to the build, but the code that actually calls the
  new functions will come later.

  Sponsored by: The FreeBSD Foundation
  Reviewed by: kib (earlier version)
  [trasz, 2011-03-29, 1 file, -0/+4]
* Extend struct sysvec with new method sv_schedtail, which is used for an
  explicit process at fork trampoline path instead of
  eventhandler(schedtail) invocation for each child process.
  Remove the eventhandler(schedtail) code and change the linux ABI to use
  the newly added sysvec method.
  While here, replace the explicit comparison of the module sysentvec
  structure with the newly created process sysentvec to detect the linux ABI.

  Discussed with: kib
  MFC after: 2 weeks
  [dchagin, 2011-03-08, 1 file, -0/+1]
* Add two new system calls, setloginclass(2) and getloginclass(2). This
  makes it possible for the kernel to track the login class the process is
  assigned to, which is required for RCTL. This change also makes
  setusercontext(3) call setloginclass(2) and makes it possible to retrieve
  the current login class using id(1).

  Reviewed by: kib (as part of a larger patch)
  [trasz, 2011-03-05, 1 file, -0/+2]
* - Properly initialize the base priority (td_base_pri) of thread0 to PVM
    to match the desired priority in td_priority. Otherwise the first time
    thread0 used a borrowed priority it would drop down to PUSER instead of
    PVM.
  - Explicitly initialize the starting priority of new kprocs to PVM to
    avoid inheriting some random priority from thread0.

  MFC after: 2 weeks
  [jhb, 2011-01-06, 1 file, -1/+1]
* MFp4:
  It is possible for a lower priority thread to lend its priority to a
  higher priority thread. The old code ignored this, but the lending should
  always be recorded. Add the field td_lend_user_pri to fix the problem; if
  a thread does not have a borrowed priority, its value is PRI_MAX.

  MFC after: 1 week
  [davidxu, 2010-12-09, 1 file, -0/+1]
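  A minimal sketch of the bookkeeping described above, under stated
  assumptions: numerically lower values mean higher priority, the sentinel
  is 255, and the effective user priority is the better of the base and the
  lent value. The struct and function names are hypothetical and this is
  not the scheduler's actual code.

```c
#include <assert.h>

#define SKETCH_PRI_MAX 255      /* assumed sentinel: "no borrowed priority" */

struct thread_sketch {
    int base_user_pri;          /* the thread's own user priority */
    int lend_user_pri;          /* lent priority, SKETCH_PRI_MAX if none */
};

/* Always record the lent priority, even when it is worse than the base. */
static void lend_user_pri(struct thread_sketch *td, int pri)
{
    td->lend_user_pri = pri;
}

/* Drop the loan by restoring the sentinel. */
static void unlend_user_pri(struct thread_sketch *td)
{
    td->lend_user_pri = SKETCH_PRI_MAX;
}

/* Effective priority: whichever of base and lent is numerically lower. */
static int effective_user_pri(const struct thread_sketch *td)
{
    return (td->lend_user_pri < td->base_user_pri) ?
        td->lend_user_pri : td->base_user_pri;
}
```

  With the sentinel set to the worst possible priority, a thread with no
  loan needs no special case: the comparison simply picks the base value.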
* Set bootverbose directly in mi_startup() rather than via a SYSINIT. This
  ensures 'bootverbose' is in a valid state for all SYSINITs.

  Reported by: avg
  MFC after: 1 week
  [jhb, 2010-10-28, 1 file, -9/+3]
* - Insert thread0 into the correct thread hash link list.
  - In thr_exit() and kthread_exit(), only remove the thread from the hash
    if it can directly exit, otherwise let exit1() do it.
  - In thread_suspend_check(), fix the cleanup code when the thread needs to
    exit.
  This change seems to have fixed the "Bad link elm " panic found by Peter
  Holm.

  Stress testing: pho
  [davidxu, 2010-10-17, 1 file, -1/+1]
* Create a global thread hash table to speed up thread lookup, use rwlock to
  protect the table. In the old code, thread lookup is done with the process
  lock held; to find a thread, the kernel has to iterate through the process
  and thread lists, which is quite inefficient.
  With this change, tests show that in the extreme case performance is
  dramatically improved.

  Earlier patch was reviewed by: jhb, julian
  [davidxu, 2010-10-09, 1 file, -0/+1]
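  The lookup structure described above can be sketched as a chained hash
  keyed by thread id. Everything here is hypothetical (names, bucket count,
  hash function), and the reader/writer lock that the commit mentions is
  deliberately left out to keep the sketch short; a real table would take
  the lock shared in the lookup and exclusive in insert and remove.

```c
#include <assert.h>
#include <stddef.h>

#define TID_BUCKETS 64          /* assumed size; must be a power of two */

struct thr_sketch {
    int tid;
    struct thr_sketch *hash_next;   /* chain within one bucket */
};

static struct thr_sketch *tid_table[TID_BUCKETS];

static unsigned tid_bucket(int tid)
{
    return (unsigned)tid & (TID_BUCKETS - 1);
}

/* Push the thread onto the front of its bucket's chain. */
static void tid_insert(struct thr_sketch *t)
{
    unsigned b = tid_bucket(t->tid);

    t->hash_next = tid_table[b];
    tid_table[b] = t;
}

/* Walk one bucket instead of every process and every thread. */
static struct thr_sketch *tid_lookup(int tid)
{
    for (struct thr_sketch *t = tid_table[tid_bucket(tid)]; t != NULL;
        t = t->hash_next) {
        if (t->tid == tid)
            return t;
    }
    return NULL;
}

/* Unlink the thread from its chain if it is present. */
static void tid_remove(struct thr_sketch *t)
{
    struct thr_sketch **pp = &tid_table[tid_bucket(t->tid)];

    while (*pp != NULL && *pp != t)
        pp = &(*pp)->hash_next;
    if (*pp == t)
        *pp = t->hash_next;
}
```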
* Add descriptions to a handful of sysctl nodes.
  PR: kern/148580
  Submitted by: Galimov Albert <wtfcrap mail.ru>
  MFC after: 1 week
  [gavin, 2010-08-09, 1 file, -3/+6]
* Remove spurious '/*-' marks and fix some other style problems.
  Submitted by: bde@
  [trasz, 2010-07-22, 1 file, -4/+3]
* Revert r210225 - turns out I was wrong; the "/*-" is not a license-only
  thing; it's also used to indicate that the comment should not be
  automatically rewrapped.

  Explained by: cperciva@
  [trasz, 2010-07-18, 1 file, -1/+1]
* The "/*-" comment marker is supposed to denote copyrights. Remove
  non-copyright occurrences from sys/sys/ and sys/kern/.
  [trasz, 2010-07-18, 1 file, -1/+1]
* Reorganize syscall entry and leave handling.
  Extend struct sysvec with three new elements:
  - sv_fetch_syscall_args - the method to fetch syscall arguments from
    usermode into struct syscall_args. The structure is machine-dependent
    (this might be reconsidered after all architectures are converted).
  - sv_set_syscall_retval - the method to set a return value for usermode
    from the syscall. It is a generalization of cpu_set_syscall_retval(9)
    to allow ABIs to override the way to set a return value.
  - sv_syscallnames - the table of syscall names.

  Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the
  call to cpu_set_syscall_retval().

  The new functions syscallenter(9) and syscallret(9) are provided that use
  sv_*syscall* pointers and contain the common repeated code from the
  syscall() implementations for the architecture-specific syscall trap
  handlers.

  Syscallenter() fetches arguments, calls the syscall implementation from
  the ABI sysent table, and sets up the return frame. The end of syscall
  bookkeeping is done by syscallret().

  Take advantage of the single place for MI syscall handling code and
  implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
  PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is
  stopped at the syscall entry or return point respectively. The EXEC flag
  augments SCX and notifies the debugger that the process address space was
  changed by one of the exec(2)-family syscalls.

  The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed
  to use syscallenter()/syscallret(). MIPS and arm are not converted and use
  the mostly unchanged syscall() implementation.

  Reviewed by: jhb, marcel, marius, nwhitehorn, stas
  Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
             stas (mips)
  MFC after: 1 month
  [kib, 2010-05-23, 1 file, -1/+20]
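  The division of labour described above (per-ABI hooks for fetching
  arguments and setting the return value, plus one shared entry path) can be
  sketched with a table of function pointers. This is a toy model under
  stated assumptions: the frame is just an array of longs, each call takes
  one argument, and every name ending in _sketch or _simple is hypothetical.
  It does not reproduce the real struct sysentvec or syscallenter().

```c
#include <assert.h>
#include <stddef.h>

typedef long (*handler_sketch)(long arg);

/* Per-ABI hooks, loosely modelled on the three new sysvec elements. */
struct abi_sketch {
    int  (*fetch_args)(const long *frame, int *code, long *arg);
    void (*set_retval)(long *frame, long value);
    const handler_sketch *table;    /* call number -> implementation */
    int nhandlers;
};

/* One toy ABI: frame[0] is the call number, frame[1] the argument. */
static int fetch_simple(const long *frame, int *code, long *arg)
{
    *code = (int)frame[0];
    *arg = frame[1];
    return 0;
}

/* The same toy ABI returns its result in frame[2]. */
static void retval_simple(long *frame, long value)
{
    frame[2] = value;
}

static long sys_double(long a) { return 2 * a; }
static long sys_negate(long a) { return -a; }

static const handler_sketch simple_table[] = { sys_double, sys_negate };

static const struct abi_sketch simple_abi = {
    .fetch_args = fetch_simple,
    .set_retval = retval_simple,
    .table = simple_table,
    .nhandlers = 2,
};

/*
 * Shared entry path: fetch arguments through the ABI hook, dispatch via the
 * ABI's table, store the result through the ABI hook. Returns 0 on success
 * and -1 for a failed fetch or an out-of-range call number.
 */
static int syscall_enter_sketch(const struct abi_sketch *abi, long *frame)
{
    int code;
    long arg;

    if (abi->fetch_args(frame, &code, &arg) != 0)
        return -1;
    if (code < 0 || code >= abi->nhandlers)
        return -1;
    abi->set_retval(frame, abi->table[code](arg));
    return 0;
}
```

  Supporting a second ABI in this model means supplying another abi_sketch
  with its own hooks; the shared entry function stays unchanged, which is
  the point of moving the common code to one place.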
* Initialize the virtual memory-related resource limits in a single place.
  Previously, one of these limits was initialized in two places to a
  different value in each place. Moreover, because an unsigned int was used
  to represent the amount of pageable physical memory, some of these limits
  were incorrectly initialized on 64-bit architectures. (Currently, this
  error is masked by login.conf's default settings.)

  Make vm_thread_swapin() and vm_thread_swapout() static.

  Submitted by: bde (an earlier version)
  Reviewed by: kib
  [alc, 2010-04-11, 1 file, -5/+12]
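  The class of error described above, a memory size held in an unsigned int
  on a 64-bit machine, can be shown in a few lines. The exact expression the
  kernel used is not given in the log, so the pages-times-page-size
  calculation, the 4096-byte page, and the function names below are
  assumptions for illustration. The sketch also assumes a 32-bit unsigned
  int, which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Arithmetic done in unsigned int: the product wraps modulo 2^32. */
static unsigned int limit_bytes_narrow(unsigned int pages)
{
    return pages * SKETCH_PAGE_SIZE;
}

/* Widening one operand first keeps the full 64-bit product. */
static uint64_t limit_bytes_wide(unsigned int pages)
{
    return (uint64_t)pages * SKETCH_PAGE_SIZE;
}
```

  For 2^21 pages (8 GB with 4 KB pages) the narrow version yields 0 while
  the wide version yields 2^33, so a limit derived from the narrow value
  would be badly wrong on a machine with that much memory.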
* Make _vm_map_init() the one place where the vm map's pmap field is
  initialized.

  Reviewed by: kib
  [alc, 2010-04-03, 1 file, -3/+2]
* Random number generator initialization cleanup:
  - Introduce new SI_SUB_RANDOM point in boot sequence to make it clear from
    where one may start using random(9). It should be as early as possible,
    so place it just after SI_SUB_CPU where we have some randomness on most
    platforms via get_cyclecount().
  - Move stack protector initialization to be after SI_SUB_RANDOM as before
    this point we have no randomness at all. This fixes stack protector to
    actually protect stack with some random guard value instead of a
    well-known one.

  Note that this patch doesn't try to address arc4random(9) issues. With
  current code, it will be implicitly seeded by stack protector and hence
  will get the same entropy as random(9). It will be securely reseeded once
  /dev/random is fed by some entropy from userland.

  Submitted by: Maxim Dounin <mdounin@mdounin.ru>
  MFC after: 3 days
  [ru, 2009-10-20, 1 file, -0/+13]
* Add a mitigation feature that will prevent user mappings at virtual
  address 0, limiting the ability to convert a kernel NULL pointer
  dereference into a privilege escalation attack.

  If the sysctl is set to 0, a newly started process will not be able to map
  anything in the address range of the first page (0 to PAGE_SIZE). This is
  the default. Already running processes are not affected by this.

  You can either change the sysctl or the tunable from loader in case you
  need to map at a virtual address of 0, for example when running any of the
  extinct species of a set of a.out binaries, vm86 emulation, etc. In that
  case set security.bsd.map_at_zero="1".

  Supersedes: r197537
  In collaboration with: jhb, kib, alc
  [bz, 2009-10-02, 1 file, -0/+5]
* print machine in kernel boot version string
  Discussed with: gavin, kib, jhb
  PR: kern/126926
  MFC after: 2 weeks
  [avg, 2009-10-01, 1 file, -1/+14]
* print_caddr_t: drop incorrect __unused attribute from parameter
  seems like a purely cosmetic change

  Reviewed by: jhb, kib
  MFC after: 1 week
  [avg, 2009-09-30, 1 file, -1/+1]
* Remove the interim vimage containers, struct vimage and struct procg, and
  the ioctl-based interface that supported them.

  Approved by: re (kib), bz (mentor)
  [jamie, 2009-07-17, 1 file, -7/+0]
* Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
  vnode interlock to protect the knote fields [1]. The locking assumes that
  shared vnode lock is held, thus we get exclusive access to knote either by
  exclusive vnode lock protection, or by shared vnode lock + vnode
  interlock.

  Do not use kl_locked() method to assert either lock ownership or the fact
  that curthread does not own the lock. For shared locks, ownership is not
  recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not
  owned by curthread, causing false positives in kqueue subsystem assertions
  about knlist lock.

  Remove kl_locked method from knlist lock vector, and add two separate
  assertion methods kl_assert_locked and kl_assert_unlocked, that are
  supposed to use proper asserts. Change knlist_init accordingly.

  Add convenience function knlist_init_mtx to reduce number of arguments for
  typical knlist initialization.

  Submitted by: jhb [1]
  Noted by: jhb [2]
  Reviewed by: jhb
  Tested by: rnoland
  [kib, 2009-06-10, 1 file, -1/+1]
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
  and used in a large number of files, but also because an increasing number
  of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
  MAC-aware code without the associated opt_mac.h include.

  Discussed with: pjd
  [rwatson, 2009-06-05, 1 file, -1/+0]
* Add hierarchical jails. A jail may further virtualize its environment by
  creating a child jail, which is visible to that jail and to any parent
  jails. Child jails may be restricted more than their parents, but never
  less. Jail names reflect this hierarchy, being MIB-style dot-separated
  strings.

  Every thread now points to a jail, the default being prison0, which
  contains information about the physical system. Prison0's root directory
  is the same as rootvnode; its hostname is the same as the global hostname,
  and its securelevel replaces the global securelevel. Note that the
  variable "securelevel" has actually gone away, which should not cause any
  problems for code that properly uses securelevel_gt() and
  securelevel_ge().

  Some jail-related permissions that were kept in global variables and set
  via sysctls are now per-jail settings. The sysctls still exist for
  backward compatibility, used only by the now-deprecated jail(2) system
  call.

  Approved by: bz (mentor)
  [jamie, 2009-05-27, 1 file, -1/+3]
* Introduce a new virtualization container, provisionally named vprocg, to
  hold virtualized instances of hostname and domainname, as well as a new
  top-level virtualization struct vimage, which holds pointers to struct
  vnet and struct vprocg. Struct vprocg is likely to be replaced in the near
  future with a new jail management API import.

  As a consequence of this change, change struct ucred to point to a struct
  vimage, instead of directly pointing to a vnet.

  Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
  branch.

  Permit kldload / kldunload operations to be executed only from the default
  vimage context.

  This change should have no functional impact on nooptions VIMAGE kernel
  builds.

  Reviewed by: bz
  Approved by: julian (mentor)
  [zec, 2009-05-08, 1 file, -1/+4]
* Change the curvnet variable from a global const struct vnet *, previously
  always pointing to the default vnet context, to a dynamically changing
  thread-local one. The curvnet context should be set on entry to networking
  code via CURVNET_SET() macros, and reverted to the previous state via
  CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly
  discouraged.

  This change should have no functional impact on nooptions VIMAGE kernel
  builds, where CURVNET_* macros expand to whitespace.

  The curthread->td_vnet (aka curvnet) variable's purpose is to be an
  indicator of the vnet context in which the current network-related
  operation takes place, in case we cannot deduce the current vnet context
  from any other source, such as by looking at mbuf's
  m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far
  curvnet has turned out to be an invaluable consistency checking aid: it
  helps to catch cases when sockets, ifnets or any other vnet-aware
  structures may have leaked from one vnet to another.

  The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a
  result of an empirical iterative process, with the aim of reducing
  recursions on CURVNET_SET() to a minimum, while still reducing the scope
  of CURVNET_SET() to networking-only operations - the alternative would be
  calling CURVNET_SET() on each system call entry. In general, curvnet has
  to be set in three typical cases: when processing socket-related requests
  from userspace or from within the kernel; when processing inbound traffic
  flowing from device drivers to upper layers of the networking stack; and
  when executing timer-driven networking functions.

  This change also introduces a DDB subcommand to show the list of all vnet
  instances.

  Approved by: julian (mentor)
  [zec, 2009-05-05, 1 file, -0/+4]
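  The set/restore discipline described above can be sketched as a pair of
  macros that save the previous context in a local variable. This is an
  illustration only: the SKETCH_* macros and all other names are
  hypothetical, and a plain global stands in for the thread-local
  curthread->td_vnet to keep the example portable. It is not the kernel's
  CURVNET_SET()/CURVNET_RESTORE() definition.

```c
#include <assert.h>
#include <stddef.h>

struct vnet_sketch {
    int id;
};

/* Stand-in for the per-thread current-vnet pointer. */
static struct vnet_sketch *cur_vnet_sketch;

/* Save the previous context in a local, then switch to the new one. */
#define SKETCH_VNET_SET(v)                                          \
    struct vnet_sketch *saved_vnet_sketch = cur_vnet_sketch;        \
    cur_vnet_sketch = (v)

/* Switch back to whatever was current when SET ran in this scope. */
#define SKETCH_VNET_RESTORE()                                       \
    cur_vnet_sketch = saved_vnet_sketch

/* A "networking" operation that runs in the context it is handed. */
static int inner_op(struct vnet_sketch *v)
{
    SKETCH_VNET_SET(v);
    int seen = cur_vnet_sketch->id;
    SKETCH_VNET_RESTORE();
    return seen;
}

/*
 * Recursion: the outer context is set, an inner operation switches to
 * another context, and on return the outer context is current again.
 */
static int outer_op(struct vnet_sketch *outer, struct vnet_sketch *inner)
{
    SKETCH_VNET_SET(outer);
    int inner_seen = inner_op(inner);
    int after = cur_vnet_sketch->id;
    SKETCH_VNET_RESTORE();
    return inner_seen * 100 + after;
}
```

  Because each SET keeps its saved pointer in the enclosing function's
  scope, nested uses unwind correctly as long as every SET is paired with a
  RESTORE on the same path.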
* Rename three MAC entry points from _proc_ to _cred_ to reflect the fact
  that they operate directly on credentials: mac_proc_create_swapper(),
  mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies.

  Obtained from: TrustedBSD Project
  [rwatson, 2008-10-28, 1 file, -2/+2]
* Change the static struct sysentvec and struct Elf_Brandinfo initializers
  to the C99 style. At least, it is easier to read sysent definitions that
  way, and search for the actual instances of sigcode etc.

  Explicitly initialize sysentvec.sv_maxssiz that was missed in most
  sysvecs.

  No objection from: jhb
  MFC after: 1 month
  [kib, 2008-09-24, 1 file, -26/+27]
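  The readability argument above can be seen by comparing a positional
  initializer with a C99 designated one. The struct below is a simplified,
  hypothetical stand-in: the field names echo the sv_ prefix from the commit
  (including sv_maxssiz), but the field set and types are chosen for
  illustration and do not match the real struct sysentvec.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a sysentvec-like structure. */
struct sysvec_sketch {
    int         sv_size;
    const char *sv_name;
    long        sv_maxssiz;
    int         sv_flags;
};

/*
 * Positional style: values must follow declaration order, and nothing at
 * the use site says which field a bare 0 belongs to.
 */
static const struct sysvec_sketch positional = { 512, "sketch ABI", 0, 0 };

/*
 * C99 designated style: fields are named, may appear in any order, and any
 * field left out is zero-initialized. A forgotten field is easy to spot
 * because its name is simply absent.
 */
static const struct sysvec_sketch designated = {
    .sv_name    = "sketch ABI",
    .sv_size    = 512,
    .sv_maxssiz = 1L << 20,
};
```

  Searching the source for ".sv_maxssiz" then finds every place the field is
  set, which is the kind of search the commit message has in mind.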