summaryrefslogtreecommitdiffstats
path: root/sys/kern
Commit message (Collapse)AuthorAgeFilesLines
* Remove unused taskqueue_find() function.pjd2009-08-181-43/+0
| | | | | Reviewed by: dfr Approved by: re (kib)
* * Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check toattilio2009-08-174-6/+12
| | | | | | | | | | | | | | | a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)
* Remove OpenSolaris taskq port (it performs very poorly in our kernel) andpjd2009-08-171-0/+20
| | | | | | | | replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue. Approved by: re (kib)
* Because taskqueue_run() can drop tq_mutex, we need to check if thepjd2009-08-171-0/+7
| | | | | | | TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a wakeup. Approved by: re (kib)
* Fix small style regression introduced by the MPSAFE newbus code.ed2009-08-161-1/+1
| | | | Approved by: re (rwatson)
* Rather than fix questionable ifnet list locking in the implementation ofrwatson2009-08-151-49/+0
| | | | | | | | the kern.polling.enable sysctl, remove the sysctl. It has been deprecated since FreeBSD 6 in favour of per-ifnet polling flags. Reviewed by: luigi Approved by: re (kib)
* Add a new macro to test that a variable could be loaded atomically.bz2009-08-143-0/+6
| | | | | | | | | | | | | | | | | | | Check that the given variable is at most uintptr_t in size and that it is aligned. Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment -- however, the function of ALIGN() is to guarantee alignment, and therefore may lead to stronger alignment enforcement than necessary for types that are smaller than sizeof(uintptr_t). Add checks to mtx, rw and sx locks init functions to detect possible breakage. This was used during debugging of the problem fixed with r196118 where a pointer was on an un-aligned address in the dpcpu area. In collaboration with: rwatson Reviewed by: rwatson Approved by: re (kib)
* Correctly handle unlock for !MAKEENTRY case, after successfull attempt ofkib2009-08-141-1/+2
| | | | | | | | lock upgrade cache shall be unlocked from write. Reported by: Lucius Windschuh <lwindschuh googlemail com> Reviewed by: kan Approved by: re (rwatson)
* * Completely Remove the option STOP_NMI from the kernel. This optionattilio2009-08-133-10/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)
* Make it possible to change the vnet sysctl variables on jailsbz2009-08-132-2/+33
| | | | | | | | | with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)
* Make the kernel compile without IP networking by movingbz2009-08-121-1/+2
| | | | | | a variable under a proper #ifdef. Approved by: re (rwatson)
* Add ddb show dpcpu_off command to ease dpcpu memory debugging.bz2009-08-121-0/+12
| | | | | | | | | | While show pcpu prints pc_dynamic this also prints the original memory address as well as the maths. Once dpcpu goes NUMA this is considered to help debugging as well. Reviewed by: rwatson Approved by: re
* Stop uuidgen(2) from crashing in vimage kerenels.julian2009-08-021-0/+4
| | | | | | | make curvnet valid when needed. Reviewed by: bz@ Approved by: re (kib)
* Make the newbus subsystem Giant free by adding the new newbus sxlock.attilio2009-08-021-7/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newbus lock is responsible for protecting newbus internIal structures, device states and devclass flags. It is necessary to hold it when all such datas are accessed. For the other operations, softc locking should ensure enough protection to avoid races. Newbus lock is automatically held when virtual operations on the device and bus are invoked when loading the driver or when the suspend/resume take place. For other 'spourious' operations trying to access/modify the newbus topology, newbus lock needs to be automatically acquired and dropped. For the moment Giant is also acquired in some key point (modules subsystem) in order to avoid problems before the 8.0 release as module handlers could make assumptions about it. This Giant locking should go just after the release happens. Please keep in mind that the public interface can be expanded in order to provide more support, if there are really necessities at some point and also some bugs could arise as long as the patch needs a bit of further testing. Bump __FreeBSD_version in order to reflect the newbus lock introduction. Reviewed by: ed, hps, jhb, imp, mav, scottl No answer by: ariff, thompsa, yongari Tested by: pho, G. Trematerra <giovanni dot trematerra at gmail dot com>, Brandon Gooch <jamesbrandongooch at gmail dot com> Sponsored by: Yahoo! Incorporated Approved by: re (ksmith)
* Fix two bugs related to TTY input:ed2009-08-022-1/+11
| | | | | | | | | | - fix write() on pseudo-terminal masters to return the amount of bytes passed to the TTY, not the amount of bytes read from user. - fix ttydisc_rint_bypass() to set the high watermark when it cannot write all input, just like ttydisc_rint() itself. Approved by: re (kib)
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-0111-196/+16
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2),jamie2009-07-311-3/+4
| | | | | | | | instead of whatever the parent/system has (which is generally 0). This mirrors the old-style default used for jail(2) in conjunction with the security.jail.enforce_statfs sysctl. Approved by: re (kib), bz (mentor)
* Fix some LORs between vnode locks and filedescriptor table locks.jhb2009-07-313-13/+6
| | | | | | | | | | - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week
* Remove a LOR, where the the sleepable allprison_lock was being obtainedjamie2009-07-301-309/+187
| | | | | | | | | | | | | | | | | | | | in prison_equal_ip4/6 while an inp mutex was held. Locking allprison_lock can be avoided by making a restriction on the IP addresses associated with jails: Don't allow the "ip4" and "ip6" parameters to be changed after a jail is created. Setting the "ip4.addr" and "ip6.addr" parameters is allowed, but only if the jail was already created with either ip4/6=new or ip4/6=disable. With this restriction, the prison flags in question (PR_IP4_USER and PR_IP6_USER) become read-only and can be checked without locking. This also allows the simplification of a messy code path that was needed to handle an existing prison gaining an IP address list. PR: kern/136899 Reported by: Dirk Meyer Approved by: re (kib), bz (mentor)
* Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnetjamie2009-07-291-11/+98
| | | | | | | | jails have their own IP stack and don't have access to the parent IP addresses anyway. Note that a virtual network stack forms a break between prisons with regard to the list of allowed IP addresses. Approved by: re (kib), bz (mentor)
* Change the default value of the "ip4" and "ip6" jail parameters tojamie2009-07-291-27/+7
| | | | | | | | | "disable", which only allows access to the parent/physical system's IP addresses when specifically directed. Change the default value of "host" to "new", and don't copy the parent host values, to insulate jails from the parent hostname et al. Approved by: re (kib), bz (mentor)
* Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and insteadrwatson2009-07-292-3/+3
| | | | | | | | | | | | provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() to capture path information for audit records. This allows us to move the definitions of ARG_* out of the public audit header file, as they are an implementation detail of our current kernel-internal audit record, which may change. Approved by: re (kensmith) Obtained from: TrustedBSD Project MFC after: 1 month
* Rework vnode argument auditing to follow the same structure, in orderrwatson2009-07-283-13/+13
| | | | | | | | | | to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call. Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 month
* Audit file descriptors passed to fooat(2) system calls, which are usedrwatson2009-07-281-1/+6
| | | | | | | | | | | | | | | instead of the root/current working directory as the starting point for lookups. Up to two such descriptors can be audited. Add audit record BSM encoding for fooat(2). Note: due to an error in the OpenBSM 1.1p1 configuration file, a further change is required to that file in order to fix openat(2) auditing. Approved by: re (kib) Reviewed by: rdivacky (fooat(2) portions) Obtained from: TrustedBSD Project MFC after: 1 month
* Somewhere along the line accept sockets stopped honoring thejulian2009-07-281-0/+1
| | | | | | | | FIB selected for them. Fix this. Reviewed by: ambrisko Approved by: re (kib) MFC after: 3 days
* Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctlsbz2009-07-261-55/+0
| | | | | | | | | | | | | | | | | | | | (ifconfig ifN (-)vnet <jname|jid>) work correctly. Move vi_if_move to if.c and split it up into two functions(*), one for each ioctl. In the reclaim case, correctly set the vnet before calling if_vmove. Instead of silently allowing a move of an interface from the current vnet to the current vnet, return an error. (*) There is some duplicate interface name checking before actually moving the interface between network stacks without locking and thus race prone. Ideally if_vmove will correctly and automagically handle these in the future. Suggested by: rwatson (*) Approved by: re (kib)
* Some jail parameters (in particular, "ip4" and "ip6" for IP addressjamie2009-07-251-35/+85
| | | | | | | | | restrictions) were found to be inadequately described by a boolean. Define a new parameter type with three values (disable, new, inherit) to handle these and future cases. Approved by: re (kib), bz (mentor) Discussed with: rwatson
* Introduce a new sysctl process mib, kern.proc.groups which adds thebrooks2009-07-241-0/+40
| | | | | | | | | | | | ability to retrieve the group list of each process. Modify procstat's -s option to query this mib when the kinfo_proc reports that the field has been truncated. If the mib does not exist, fall back to the truncated list. Reviewed by: rwatson Approved by: re (kib) MFC after: 2 weeks
* Revert the changes to struct kinfo_proc in r194498. Instead, fillbrooks2009-07-241-3/+9
| | | | | | | | | | in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags (all bits currently unused) to indicate overflow with the new flag KI_CRF_GRP_OVERFLOW. This fixes procstat -s. Approved by: re (kib)
* Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar tojhb2009-07-241-0/+6
| | | | | | | | | | | a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
* Introduce and use a sysinit-based initialization scheme for virtualrwatson2009-07-232-241/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)
* sysctl_msec_to_ticks is used with both virtualized andbz2009-07-211-5/+0
| | | | | | | | | | | | | | | non-vrtiualized sysctls so we cannot used one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two over places to use the new macro. Reviewed by: rwatson Approved by: re (kib)
* Improve the printf message when a module failed to load. This gives therpaulo2009-07-211-2/+2
| | | | | | | user some clue about the possibility of a __FreeBSD_version mismatch. Discussed with: rwatson, jhb Approved by: re (kib)
* Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace ifrwatson2009-07-201-4/+4
| | | | | | | | _WANT_VNET is defined. This way we don't need separate definitions in libkvm. Reviewed by: bz Approved by: re (vimage blanket)
* When buffer write is failed, it is wrong for brelse() to invalidatekib2009-07-191-1/+2
| | | | | | | | | | portion of the page that was written. Among other problems, this page might be picked up by pagedaemon, with failed assertion in vm_pageout_flush() about validity of the page. Reported and tested by: pho Approved by: re (kensmith) MFC after: 3 weeks
* Normalize field naming for struct vnet, fix two debugging printfs thatrwatson2009-07-192-8/+8
| | | | | | | print them. Reviewed by: bz Approved by: re (kensmith, kib)
* Reimplement and/or implement vnet list locking by replacing a mostlyrwatson2009-07-191-21/+27
| | | | | | | | | | | | | | | | | | | | | | unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)
* Remove the interim vimage containers, struct vimage and struct procg,jamie2009-07-177-397/+24
| | | | | | and the ioctl-based interface that supported them. Approved by: re (kib), bz (mentor)
* r195699 introduced an assertion regarding when progbits data in kernelrwatson2009-07-151-3/+0
| | | | | | | | | modules was present, which turns out to be false in some situations. Back out the assertion. Reported by: Luiz Otavio O Souza <lists.br at gmail.com>, Florian Smeets <flo at kasimir.com> Approved by: re (kensmith) (implicit)
* Add new msleep(9) flag PBDY that shall be specified together withkib2009-07-145-26/+48
| | | | | | | | | | | | PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
* Move the repeated code to calculate the number of the threads in thekib2009-07-141-18/+19
| | | | | | | | | process that still need to be suspended or exited from thread_single into the new function calc_remaining(). Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
* When wakeup(9) is going to notify swapper, assert that wait channel is notkib2009-07-141-1/+4
| | | | | | | | equal to &proc0. It shall be not, since proc0 stack is not swappable, and kick_proc0() is wakeup(&proc0). Reviewed by: jhb Approved by: re (kensmith)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-148-226/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* Add support to the virtual memory system for configuring machine-alc2009-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)
* The control terminal revocation at the session leader exit does notkib2009-07-091-3/+4
| | | | | | | | | | correctly checks for reclaimed vnode, possibly calling VOP_REVOKE for such vnode. If the terminal is already revoked, or devfs mount was forcibly unmounted, the revocation of doomed ctty vnode causes panic. Reported and tested by: lstewart Approved by: re (kensmith) MFC after: 2 weeks
* Remove crcopy call from seteuid now that it calls crcopysafe.jamie2009-07-081-1/+0
| | | | | Reviewed by: brooks Approved by: re (kib), bz (mentor)
* Regenerate after lpathconf(2) addition.trasz2009-07-083-2/+25
| | | | Approved by: re (kib)
* There is an optimization in chmod(1), that makes it not to call chmod(2)trasz2009-07-082-4/+24
| | | | | | | | | | | | | if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't. This change adds lpathconf(3) to make it possible to solve that problem in a clean way. Reviewed by: rwatson (earlier version) Approved by: re (kib)
* Fix regressions in return events of poll() on TTYs.ed2009-07-082-11/+8
| | | | | | | | | As pointed out, POLLHUP should be generated, even if it hasn't been specified on input. It is also not allowed to return both POLLOUT and POLLHUP at the same time. Reported by: jilles Approved by: re (kib)
* Increase HZ_VM from 10 to 100. While 10 hz saves cpu timesilby2009-07-081-1/+1
| | | | | | | | under VM environments, it's too slow for FreeBSD to work properly. For example, ping at 10hz pings about every 600ms instead of about every second. Approved by: re (kib)
OpenPOWER on IntegriCloud