summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_sysctl.c
Commit message (Collapse)AuthorAgeFilesLines
* Update comments to reflect r246689.marius2013-02-111-2/+8
|
* Make SYSCTL_{LONG,QUAD,ULONG,UQUAD}(9) work as advertised and also handlemarius2013-02-111-9/+17
| | | | | | | constant values. Reviewed by: kib MFC after: 3 days
* Handle spurious page faults that may occur in no-fault sections of thealc2012-03-221-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | kernel. When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel. In collaboration with: kib Tested by: gibbs (an earlier version) I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem. MFC after: 1 week
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-1/+1
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, whichrwatson2011-07-171-6/+36
| | | | | | | | | | | | | | | | | | | may be jointly referenced via the mask CTLFLAG_CAPRW. Sysctls with these flags are available in Capsicum's capability mode; other sysctl nodes are not. Flag several useful sysctls as available in capability mode, such as memory layout sysctls required by the run-time linker and malloc(3). Also expose access to randomness and available kernel features. A few sysctls are enabled to support name->MIB conversion; these may leak information to capability mode by virtue of providing resolution on names not flagged for access in capability mode. This is, generally, not a huge problem, but might be something to resolve in the future. Flag these cases with XXX comments. Submitted by: jonathan Sponsored by: Google, Inc.
* Use a name instead of a magic number for kern_yield(9) when the prioritymdf2011-05-131-1/+1
| | | | | | | | should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
* - Merge changes to the base system to support OFED. These includejeff2011-03-211-4/+26
| | | | | a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
* Based on discussions on the svn-src mailing list, rework r218195:mdf2011-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | - entirely eliminate some calls to uio_yeild() as being unnecessary, such as in a sysctl handler. - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h - add a slightly more generic kern_yield() that can replace the functionality of uio_yield(). - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching. - fix a logic inversion bug in vlrureclaim(), pointed out by bde@. - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.
* Explicitly wire the user buffer rather than doing it implicitly inmdf2011-01-271-4/+2
| | | | | | | | sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain for extremely large amounts of data where the caller knows that appropriate references are held, and sleeping is not an issue. Inspired by: rwatson
* Remove the CTLFLAG_NOLOCK as it seems to be both unused andmdf2011-01-261-5/+3
| | | | | | | | | | | | unfunctional. Wiring the user buffer has only been done explicitly since r101422. Mark the kern.disks sysctl as MPSAFE since it is and it seems to have been mis-using the NOLOCK flag. Partially break the KPI (but not the KBI) for the sysctl_req 'lock' field since this member should be private and the "REQ_LOCKED" state seems meaningless now.
* Introduce signed and unsigned version of CTLTYPE_QUAD, renamingmdf2011-01-191-3/+6
| | | | existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not needmdf2011-01-181-1/+2
| | | | to rely on the format string.
* Fix uninitialized variable warning that shows on Tinderbox but not mymdf2010-11-291-1/+1
| | | | | | setup. (??) Submitted by: Michael Butler <imb at protected-networks dot net>
* Do not hold the sysctl lock across a call to the handler. This fixes amdf2010-11-291-27/+67
| | | | | | | | | | general LOR issue where the sysctl lock had no good place in the hierarchy. One specific instance is #284 on http://sources.zabbadoz.net/freebsd/lor.html . Reviewed by: jhb MFC after: 1 month X-MFC-note: split oid_refcnt field for oid_running to preserve KBI
* Slightly modify the logic in sysctl_find_oid to reduce the indentation.mdf2010-11-291-19/+22
| | | | | | There should be no functional change. MFC after: 3 days
* Use the SYSCTL_CHILDREN macro in kern_sysctl.c to help de-obfuscate themdf2010-11-291-7/+6
| | | | | | code. MFC after: 3 days
* Re-add r212370 now that the LOR in powerpc64 has been resolved:mdf2010-09-161-0/+28
| | | | | | | | | | | | Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)
* Revert r212370, as it causes a LOR on powerpc. powerpc does a fewmdf2010-09-131-28/+0
| | | | | | | unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn
* Add a drain function for struct sysctl_req, and use it for a variety ofmdf2010-09-091-0/+28
| | | | | | | | | | handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk
* Make it possible to change the vnet sysctl variables on jailsbz2009-08-131-2/+10
| | | | | | | | | with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+0
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* sysctl_msec_to_ticks is used with both virtualized andbz2009-07-211-5/+0
| | | | | | | | | | | | | | | non-vrtiualized sysctls so we cannot used one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two over places to use the new macro. Reviewed by: rwatson Approved by: re (kib)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-141-97/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* Add explicit includes for jail.h to the files that need them andbz2009-06-171-0/+1
| | | | remove the "hidden" one from vimage.h.
* Get vnets from creds instead of threads where they're available, and fromjamie2009-06-151-1/+1
| | | | | | | passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Remove do-nothing code that was required to dirty the old buffer on Alpha.des2009-05-151-12/+1
| | | | | Coverity ID: 838 Approved by: jhb, alc
* Revert r192094. The revision caused problems for sysctl(3) consumerskib2009-05-151-5/+3
| | | | | | | | | | | | | that expect that oldlen is filled with required buffer length even when supplied buffer is too short and returned error is ENOMEM. Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when remaining buffer space is too short for the current kinfo_file structure. Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition to keep existing interface for the sysctl, though. Reported by: ed, Florian Smeets <flo kasimir com> Tested by: pho
* - Use a separate sx lock to try to limit the number of concurrent userlandjhb2009-05-141-7/+16
| | | | | | | | | sysctl requests to avoid wiring too much user memory. Only grab this lock if the user's old buffer is larger than a page as a tradeoff to allow more concurrency for common small requests. - Just use a shared lock on the sysctl tree for user sysctl requests now. MFC after: 1 week
* Do not advance req->oldidx when sysctl_old_user returning ankib2009-05-141-3/+5
| | | | | | | | | | | | | error due to copyout failure or short buffer. The later breaks the usermode iterators of the sysctl results that pack arbitrary number of variable-sized structures. Iterator expects that kernel filled exactly oldlen bytes, and tries to interpret half-filled or garbage structure at the end of the buffer. In particular, kinfo_getfile(3) segfaulted. Reported and tested by: pho MFC after: 3 weeks
* Permit buiding kernels with options VIMAGE, restricted to only a singlezec2009-04-301-2/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_* macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)
* Add a new type of KTRACE record for sysctl(3) invocations. It uses thejhb2009-03-111-1/+10
| | | | | | | | internal sysctl_sysctl_name() handler to map the MIB array to a string name and logs this name in the trace log. This can be useful to see exactly which sysctls a thread is invoking. MFC after: 1 month
* - Remove a recently added comment from kernel_sysctlbyname() that isn'tjhb2009-03-101-9/+1
| | | | | | | | | needed. - Move the release of the sysctl sx lock after the vsunlock() in userland_sysctl() to restore the original memlock behavior of minimizing the amount of memory wired to handle sysctl requests. MFC after: 1 week
* Expand the scope of the sysctllock sx lock to protect the sysctl tree itself.jhb2009-02-061-21/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in 1.1 of kern_sysctl.c the sysctl() routine wired the "old" userland buffer for most sysctls (everything except kern.vnode.*). I think to prevent issues with wiring too much memory it used a 'memlock' to serialize all sysctl(2) invocations, meaning that only one user buffer could be wired at a time. In 5.0 the 'memlock' was converted to an sx lock and renamed to 'sysctl lock'. However, it still only served the purpose of serializing sysctls to avoid wiring too much memory and didn't actually protect the sysctl tree as its name suggested. These changes expand the lock to actually protect the tree. Later on in 5.0, sysctl was changed to not wire buffers for requests by default (sysctl_handle_opaque() will still wire buffers larger than a single page, however). As a result, user buffers are no longer wired as often. However, many sysctl handlers still wire user buffers, so it is still desirable to serialize userland sysctl requests. Kernel sysctl requests are allowed to run in parallel, however. - Expose sysctl_lock()/sysctl_unlock() routines to exclusively lock the sysctl tree for a few places outside of kern_sysctl.c that manipulate the sysctl tree directly including the kernel linker and vfs_register(). - sysctl_register() and sysctl_unregister() require the caller to lock the sysctl lock using sysctl_lock() and sysctl_unlock(). The rest of the public sysctl API manage the locking internally. - Add a locked variant of sysctl_remove_oid() for internal use so that external uses of the API do not need to be aware of locking requirements. - The kernel linker no longer needs Giant when manipulating the sysctl tree. - Add a missing break to the loop in vfs_register() so that we stop looking at the sysctl MIB once we have changed it. MFC after: 1 month
* Mark most often used sysctl's as MPSAFE.ed2009-01-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | After running a `make buildkernel', I noticed most of the Giant locks in sysctl are only caused by a very small amount of sysctl's: - sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like sysctl.oidfmt. - kern.ident, kern.osrelease, kern.version, etc. These are just constant strings. - kern.arandom, used by the stack protector. It is already protected by arc4_mtx. I also saw the following sysctl's show up. Not as often as the ones above, but still quite often: - security.jail.jailed. Also mark security.jail.list as MPSAFE. They don't need locking or already use allprison_lock. - kern.devname, used by devname(3), ttyname(3), etc. This seems to reduce Giant locking inside sysctl by ~75% in my primitive test setup.
* Add a flag to tag individual sysctl leaf nodes as MPSAFE and thus notjhb2009-01-231-4/+4
| | | | | | needing Giant. Submitted by: csjp (an older version)
* Don't clobber sysctl_root()'s error number.ed2009-01-011-2/+5
| | | | | | | When sysctl() is being called with a buffer that is too small, it will return ENOMEM. Unfortunately the changes I made the other day sets the error number to 0, because it just returns the error number of the copyout(). Revert this part of the change.
* Fix compilation. Also move ogetkerninfo() to kern_xxx.c.ed2008-12-291-211/+0
| | | | | | | It seems I forgot to remove `int error' from a single piece of code. I'm also moving ogetkerninfo() to kern_xxx.c, because it belongs to the class of compat system information system calls, not the generic sysctl code.
* Push down Giant inside sysctl. Also add some more assertions to the code.ed2008-12-291-19/+23
| | | | | | | | | | | | | | In the existing code we didn't really enforce that callers hold Giant before calling userland_sysctl(), even though there is no guarantee it is safe. Fix this by just placing Giant locks around the call to the oid handler. This also means we only pick up Giant for a very short period of time. Maybe we should add MPSAFE flags to sysctl or phase it out all together. I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root() and name2oid() are called with the sysctl lock held. Reviewed by: Jille Timmermans <jille quis cx>
* Uio_yield() already does DROP_GIANT/PICKUP_GIANT, no need to repeat thiskib2008-12-121-2/+0
| | | | | | around the call. Noted by: bde
* The userland_sysctl() function retries sysctl_root() until returnedkib2008-12-121-2/+8
| | | | | | | | | | | | | | | | | | | error is not EAGAIN. Several sysctls that inspect another process use p_candebug() for checking access right for the curproc. p_candebug() returns EAGAIN for some reasons, in particular, for the process doing exec() now. If execing process tries to lock Giant, we get a livelock, because sysctl handlers are covered by Giant, and often do not sleep. Break the livelock by dropping Giant and allowing other threads to execute in the EAGAIN loop. Also, do not return EAGAIN from p_candebug() when process is executing, use more appropriate EBUSY error [1]. Reported and tested by: pho Suggested by: rwatson [1] Reviewed by: rwatson, des MFC after: 1 week
* Merge more of currently non-functional (i.e. resolving tozec2008-11-261-0/+3
| | | | | | | | | | | | | | | | | whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Add sysctl_rename_oid() to support device_set_unit() usage. Otherwise,peter2007-11-301-0/+19
| | | | | when unit numbers are changed, the sysctl devinfo tree gets out of sync and duplicate trees are attempted to be attached with the original name.
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* In userland_sysctl(), call useracc() with the actual newlen value to berwatson2007-09-021-1/+1
| | | | | | | | | | | | | used, rather than the one passed via 'req', which may not reflect a rewrite. This call to useracc() is redundant to validation performed by later copyin()/copyout() calls, so there isn't a security issue here, but this could technically lead to excessive validation of addresses if the length in newlen is shorter than req.newlen. Approved by: re (kensmith) Reviewed by: jhb Submitted by: Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru> Sponsored by: Google Summer of Code 2007
* Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); inrwatson2007-06-121-2/+1
| | | | | | | | | | | | | | | some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
* Add a function for exporting 64 bit types.dwmalone2007-06-041-0/+25
|
* Further system call comment cleanup:rwatson2007-03-051-2/+1
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Remove 'MPSAFE' annotations from the comments above most system calls: allrwatson2007-03-041-6/+0
| | | | | | | | system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
* Sweep kernel replacing suser(9) calls with priv(9) calls, assigningrwatson2006-11-061-6/+5
| | | | | | | | | | | | | specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
OpenPOWER on IntegriCloud