path: root/sys/kern
* Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). [jhb, 2007-05-14, 3 files, -16/+5]
  Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including when a child process forks and execs a 64-bit process. To fix this, don't overwrite the resource limits during exec(). Instead, sv_fixlimits() is replaced with a different function, sv_fixlimit(), which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit.
  MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected).
  MFC after: 1 week
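The per-limit sanitizing idea can be sketched roughly as follows. This is a toy model, not the kernel's sv_fixlimit(): the names `struct limit`, `ABI32_LIMIT_MAX`, and `fixlimit32()` are hypothetical, and the 32-bit ceiling is an assumed value. The point it illustrates is that the stored limit is never modified; only the view handed to the 32-bit caller is clamped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel resource limit pair. */
struct limit {
    uint64_t cur;   /* soft limit */
    uint64_t max;   /* hard limit */
};

/* Assumed ceiling for what a 32-bit ABI can represent (illustrative). */
#define ABI32_LIMIT_MAX ((uint64_t)UINT32_MAX)

/*
 * Return a sanitized copy of one limit for a 32-bit caller.
 * The stored limit is left untouched, so a 64-bit child that
 * inherits it still sees the full value.
 */
static struct limit
fixlimit32(const struct limit *stored)
{
    struct limit view = *stored;

    if (view.cur > ABI32_LIMIT_MAX)
        view.cur = ABI32_LIMIT_MAX;
    if (view.max > ABI32_LIMIT_MAX)
        view.max = ABI32_LIMIT_MAX;
    /* Clamping can only lower a value, never raise it. */
    assert(view.cur <= stored->cur && view.max <= stored->max);
    return view;
}
```

Because the clamp is applied at query time rather than at exec() time, nothing needs to be undone when a 64-bit child is later created.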
* Move cpu_exit() earlier in exit1() to close a race between SIGCHLD/kevent(2) notification of process termination and wait(). [jhb, 2007-05-14, 1 file, -16/+10]
  We no longer drop locks between sending the notification and marking the process as a zombie. Previously, if another process attempted a wait() with WNOHANG after receiving a SIGCHLD or kevent and locked the process while the exiting thread was in cpu_exit(), then wait() would fail to find the process, which is quite astonishing to the process calling wait().
  MFC after: 3 days
* Update entries for building tags. [mckusick, 2007-05-13, 1 file, -5/+9]
* Improve INCLUDE_CONFIG_FILE support. [wkoszek, 2007-05-12, 1 file, -0/+33]
  This change makes the full configuration of a running kernel available through sysctl:
      sysctl -b kern.conftxt
  The same configuration is also contained within the kernel image. It can be obtained with:
      config -x <kernelfile>
  The current functionality lets you quickly recover the kernel configuration by redirecting the output of the commands above to a file and starting the kernel build procedure. "include" statements are also honored, which means options and devices from included files are also included.
  Please note that comments from configuration files are not preserved by default. To preserve them, use the -C flag for config(8). This brings in the configuration file and the included files literally; however, redirection to a file no longer works directly.
  This commit was followed by a discussion on freebsd-current@. For more details, see:
      http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069994.html
      http://lists.freebsd.org/pipermail/freebsd-current/2007-May/071844.html
  Development of this patch took place in Perforce, hierarchy: //depot/user/wkoszek/wkoszek_kconftxt/
  Support from: freebsd-current@ (links above)
  Reviewed by: imp@
  Approved by: imp@
* Make the TCP timer callout obtain Giant if the network stack is marked as non-mpsafe. [andre, 2007-05-11, 1 file, -2/+11]
  This change is to be removed when all protocols are mp-safe.
* Remove one more stale comment regarding unpcb type-safety. [rwatson, 2007-05-11, 1 file, -4/+0]
* Clarify and update quite a few comments to reflect locking optimizations, the addition of unpcb refcounts, and bug fixes. [rwatson, 2007-05-11, 1 file, -38/+21]
  Some of these fixes are appropriate for MFC.
  MFC after: 3 days
* Add destroyed cookie values for sx locks and rwlocks as well as extra KASSERTs so that any lock operations on a destroyed lock will panic or hang. [jhb, 2007-05-08, 2 files, -2/+38]
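The destroyed-cookie technique can be illustrated with a small user-space model. Everything here is hypothetical (`struct toy_lock`, the `TOY_*` values, and the helper names are not the kernel's sx or rwlock definitions), and plain assert() stands in for KASSERT. Destroying the lock stamps a sentinel into its state word, and every later operation checks for that sentinel first.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_UNLOCKED  ((uintptr_t)0)
#define TOY_LOCKED    ((uintptr_t)1)
#define TOY_DESTROYED ((uintptr_t)0xdead)   /* hypothetical cookie value */

struct toy_lock {
    uintptr_t state;
};

static void
toy_init(struct toy_lock *l)
{
    l->state = TOY_UNLOCKED;
}

static void
toy_acquire(struct toy_lock *l)
{
    /* Trips if the lock was already destroyed. */
    assert(l->state != TOY_DESTROYED);
    /* Single-threaded toy: no contention handling. */
    assert(l->state == TOY_UNLOCKED);
    l->state = TOY_LOCKED;
}

static void
toy_release(struct toy_lock *l)
{
    assert(l->state == TOY_LOCKED);
    l->state = TOY_UNLOCKED;
}

static void
toy_destroy(struct toy_lock *l)
{
    /* Only an unlocked lock may be destroyed. */
    assert(l->state == TOY_UNLOCKED);
    l->state = TOY_DESTROYED;
}

static int
toy_is_destroyed(const struct toy_lock *l)
{
    return l->state == TOY_DESTROYED;
}
```

Calling toy_acquire() after toy_destroy() aborts at the first assertion, which is the user-space analogue of the panic described in the commit.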
* Teach 'show lock' to properly handle a destroyed mutex. [jhb, 2007-05-08, 1 file, -1/+5]
* Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by: [jhb, 2007-05-08, 2 files, -6/+32]
  1) adding the thread to the sleepq via sleepq_add() before dropping the lock, and
  2) dropping the sleepq lock around calls to lc_unlock() for sleepable locks (i.e. locks that use sleepqs in their implementation).
* Add missing socket buffer unlock before returning to userland. [yongari, 2007-05-08, 1 file, -1/+1]
  Reviewed by: rwatson
* Bring in the remaining bits to make interrupt filtering work: [piso, 2007-05-06, 1 file, -6/+570]
  o push much of the i386 and amd64 MD interrupt handling code (intr_machdep.c::intr_execute_handlers()) into MI code (kern_intr.c::ithread_loop())
  o move filter handling to kern_intr.c::intr_filter_loop()
  o factor out the code necessary to mask and ack an interrupt event (intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()), and make them part of 'struct intr_event', passing them as arguments to kern_intr.c::intr_event_create()
  o spawn a private ithread per handler (struct intr_handler::ih_thread) with filter and ithread functions
  Approved by: re (implicit?)
* Don't acquire Giant unconditionally. [wkoszek, 2007-05-06, 1 file, -14/+20]
  Reviewed by: rwatson
* Mark the file descriptor table entries that have VOP_OPEN being performed for them as UF_OPENING. [kib, 2007-05-04, 2 files, -3/+19]
  Disable closing of those entries. This should fix the crashes caused by devfs_open() (and fifo_open()) dereferencing struct file * by index while the file descriptor is closed by a parallel thread.
  Idea by: tegge
  Reviewed by: tegge (previous version of patch)
  Tested by: Peter Holm
  Approved by: re (kensmith)
  MFC after: 3 weeks
* sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. [rwatson, 2007-05-03, 4 files, -92/+84]
  This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing.
  This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization.
  While here, fix two historic bugs:
  (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovered by Isilon).
  (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam).
  SCTP portion of this patch submitted by rrs.
* Remove unneeded include files. [alc, 2007-05-01, 1 file, -2/+0]
* Complete removal of the restriction on overlaps in rman_manage_region: remove comment and man page verbiage. [jmg, 2007-04-28, 1 file, -4/+0]
  Document return values for rman_init and rman_manage_region.
  MFC after: 1 week
* Avoid a lot of code duplication by using kern_open() to open /dev/null in fdcheckstd() instead of a stripped down version of kern_open()'s code. [jhb, 2007-04-26, 1 file, -45/+9]
  MFC after: 1 week
  Reviewed by: cperciva
* Allow dounmount() to proceed even for a doomed coveredvp. [kib, 2007-04-26, 1 file, -3/+1]
  In dounmount(), before or while vn_lock(coveredvp) is called, the coveredvp vnode may become VI_DOOMED due to one of the following:
  - another thread finished the unmount and vput()ed it, and the vnode was chosen for recycling while vn_lock() slept;
  - forced unmount of the coveredvp->v_mount fs.
  In the first case, the next check for a changed v_mountedhere or mnt_gen counter would be successful. In the second case, the unmount shall be allowed.
  Submitted by: sobomax
  MFC after: 2 weeks
* Disable nesting of BOP_BDFLUSH(). [kib, 2007-04-24, 1 file, -2/+4]
  The VOP_FSYNC() call in bdwrite() could result in bdwrite() being reentered, thus causing infinite recursion.
  Reported and tested by: Peter Holm
  Reviewed by: tegge
  MFC after: 2 weeks
* Correct typo. [pjd, 2007-04-23, 1 file, -1/+1]
* Remove MAC Framework access control check entry points made redundant with the introduction of priv(9) and MAC Framework entry points for privilege checking/granting. [rwatson, 2007-04-22, 2 files, -19/+0]
  These entry points exactly aligned with privileges and provided no additional security context:
  - mac_check_sysarch_ioperm()
  - mac_check_kld_unload()
  - mac_check_settime()
  - mac_check_system_nfsd()
  Add mpo_priv_check() implementations to Biba and LOMAC policies, which, for each privilege, determine if they can be granted to processes considered unprivileged by those two policies. These mostly, but not entirely, align with the set of privileges granted in jails.
  Obtained from: TrustedBSD Project
* Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. [sepotvin, 2007-04-21, 1 file, -0/+12]
  Useful when using ZFS to make sure that vm.kmem_size will be at least 256 MB (for example) without forcing a particular value via vm.kmem_size.
  Approved by: njl (mentor)
  Reviewed by: alc
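The difference between a minimum and a forced value can be sketched as below. This is an illustrative model only: `pick_kmem_size()` is a hypothetical helper, zero is assumed to mean "tunable not set", and the assumption that a forced size takes precedence over the minimum is mine rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Choose a kmem size from an auto-computed value and two optional
 * tunables. A value of 0 means "not set" (assumption of this sketch).
 */
static uint64_t
pick_kmem_size(uint64_t autosized, uint64_t size_min, uint64_t size_forced)
{
    uint64_t size = autosized;

    /* The minimum only raises the size; larger auto-sized values survive. */
    if (size_min != 0 && size < size_min)
        size = size_min;
    /* A forced size replaces whatever was computed (assumed precedence). */
    if (size_forced != 0)
        size = size_forced;

    assert(size_forced != 0 || size >= size_min);
    return size;
}
```

With a 256 MB minimum, an auto-sized 128 MB becomes 256 MB while an auto-sized 512 MB is left alone, which is the behaviour a forced value cannot provide.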
* Don't reinvent vm_page_grab(). [pjd, 2007-04-20, 1 file, -23/+3]
  Reviewed by: ups
* Schedule the ithread on the same cpu as the interrupt. [kmacy, 2007-04-20, 1 file, -2/+1]
  Tested by: kmacy
  Submitted by: jeffr
* Fix witness(4) warnings about mutex use. [jkoshy, 2007-04-19, 1 file, -0/+10]
  Group mutexes used in hwpmc(4) into 3 "types" in the sense of witness(4):
  - leaf spin mutexes: only one of these should be held at a time, so these mutexes are specified as belonging to a single witness type "pmc-leaf".
  - `struct pmc_owner' descriptors are protected by a spin mutex of witness type "pmc-owner-proc". Since we call wakeup_one() while holding these mutexes, the witness type of these mutexes needs to dominate that of "sleepq chain" mutexes.
  - logger threads use a sleep mutex, of type "pmc-sleep".
  Submitted by: wkoszek (earlier patch)
* Fix a bug in sendfile(2) with files larger than the page size and nbytes=0. [pjd, 2007-04-19, 1 file, -2/+2]
  When nbytes=0, sendfile(2) should use the file size. Because of the bug, it was sending half of a file. The bug is that the 'off' variable can't be used for the size calculation, because it changes inside the loop, so we should use uap->offset instead.
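The arithmetic behind "sending half of a file" can be reproduced with a small model. This is not the kernel's sendfile loop: `bytes_sent_model()` and its parameters are hypothetical, and the loop is simplified to fixed-size chunks. It only shows why subtracting a cursor that advances inside the loop, in addition to the running byte count, counts every sent byte twice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a "send until end of file" loop.
 * use_start_offset != 0: remaining size is computed from the fixed
 *                        starting offset (the corrected calculation).
 * use_start_offset == 0: remaining size is computed from the moving
 *                        cursor 'off' (the buggy calculation).
 */
static int64_t
bytes_sent_model(int64_t file_size, int64_t start_offset, int64_t chunk,
    int use_start_offset)
{
    int64_t off = start_offset;   /* advances as data is sent */
    int64_t sent = 0;             /* bytes sent so far */

    assert(chunk > 0 && start_offset >= 0 && file_size >= start_offset);

    for (;;) {
        int64_t base = use_start_offset ? start_offset : off;
        int64_t remaining = file_size - base - sent;

        if (remaining <= 0)
            break;
        int64_t n = remaining < chunk ? remaining : chunk;
        off += n;
        sent += n;
    }
    return sent;
}
```

For a 16384-byte file sent in 4096-byte chunks from offset 0, the corrected calculation sends all 16384 bytes, while the moving-cursor variant stops after 8192.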
* Bump the interrupt storm detection counter to 1000. [njl, 2007-04-19, 1 file, -4/+4]
  My slow fileserver gets a bogus "irq storm detected" message when periodic daily kicks off at 3 am and disconnects the disk. Change the print logic to print once per second while the storm is occurring instead of only once. Otherwise, it appeared that something else was causing the errors each night at 3 am, since the print only occurred the first time.
  Reviewed by: jhb
  MFC after: 1 week
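A once-per-second message policy of the kind described above might look like the following sketch. The names `struct storm_state`, `STORM_THRESHOLD`, and `storm_should_print()` are hypothetical, time is passed in as whole seconds by the caller, and the kernel's actual rate-limiting mechanism may differ.

```c
#include <assert.h>

#define STORM_THRESHOLD 1000u   /* matches the counter value in the commit */

struct storm_state {
    unsigned count;     /* back-to-back interrupts seen, saturating */
    long last_print;    /* second of the last message, -1 if none yet */
};

/*
 * Record one interrupt at time now_sec and report whether a storm
 * message should be printed: at most once per second while the
 * counter is at or above the threshold.
 */
static int
storm_should_print(struct storm_state *s, long now_sec)
{
    assert(now_sec >= 0);

    if (s->count < STORM_THRESHOLD && ++s->count < STORM_THRESHOLD)
        return 0;               /* not a storm yet */
    if (s->last_print == now_sec)
        return 0;               /* already reported this second */
    s->last_print = now_sec;
    return 1;
}
```

A caller would reset `count` to zero whenever the interrupt source goes quiet; that part is omitted here.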
* Export vfs_mount_alloc() as it is used in ZFS. [pjd, 2007-04-17, 1 file, -3/+1]
* Add DDB commands for resource managers: [jhb, 2007-04-16, 1 file, -0/+50]
  - Add a 'show rman <rm>' DDB command to dump the resources in a resource manager similar to 'devinfo -u'.
  - Add a 'show allrman' DDB command that effectively does 'show rman' on all resource managers in the system.
* Remove now invalid check from m_sanity. [kmacy, 2007-04-14, 1 file, -10/+5]
  Panic on m_sanity check failure with INVARIANTS.
* Fix jails and jail-friendly file systems handling: [pjd, 2007-04-13, 3 files, -5/+25]
  - We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
  - Move security checks to vfs_suser() and deny unmounting and updating for jailed root from different jails, etc.
  OK'ed by: rwatson
* When we are running low on vnodes, there is currently no way to ask other subsystems to release some vnodes. [pjd, 2007-04-13, 1 file, -0/+1]
  Implement backpressure based on the vfs_lowvnodes event (similar to vm_lowmem for memory).
* Remove now-obsolete comment regarding mqueue privileges in jail. [rwatson, 2007-04-11, 1 file, -4/+0]
* Allow PRIV_NETINET_REUSEPORT in jail. [rwatson, 2007-04-10, 1 file, -1/+3]
* Do allow POSIX mqueue unlink privilege inside a jail, as we allow all other POSIX mqueue privileges inside a jail. [rwatson, 2007-04-10, 1 file, -1/+2]
* Minor style cleanups (mostly removal of trailing whitespace). [pjd, 2007-04-10, 1 file, -22/+22]
* Correct typos. [pjd, 2007-04-10, 1 file, -1/+1]
* Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec if a race was lost. [njl, 2007-04-09, 1 file, -5/+11]
  We're still single-threaded at this point, but just be safe for the future.
* Clean up the root mount and mount wait code. [njl, 2007-04-09, 1 file, -15/+10]
  No mutexes are needed here since a spurious wakeup() is the only possible outcome and this is fine in the BSD programming model.
* Add kern.hostuuid sysctl, which will be used to keep the host's UUID. [pjd, 2007-04-09, 1 file, -0/+3]
  Reviewed by: mlaier, rink, brooks, rwatson
* Add root_mounted() function that returns true if the root file system is already mounted. [pjd, 2007-04-08, 1 file, -0/+14]
* prison_free() can be called with a mutex held. [pjd, 2007-04-08, 1 file, -11/+16]
  This wasn't a problem until I converted the allprison_mtx mutex to the allprison_lock sx lock. To fix this LOR, move prison removal to prison_complete() entirely. To ensure that no one will reference this prison while it is being removed from the list, skip prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to be greater than 0 in prison_hold().
  Reported by: kris
  OK'ed by: rwatson
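The "skip entries with a zero reference count" rule can be sketched with a toy linked list. The types and functions below (`struct toy_prison`, `toy_prison_find()`, `toy_prison_hold()`) are hypothetical simplifications; locking is left out entirely, whereas the kernel performs these steps under allprison_lock and the prison mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a jail record. */
struct toy_prison {
    int id;
    int ref;                    /* 0 means teardown is in progress */
    struct toy_prison *next;
};

/* Look up a prison by id, ignoring entries that are being torn down. */
static struct toy_prison *
toy_prison_find(struct toy_prison *head, int id)
{
    for (struct toy_prison *p = head; p != NULL; p = p->next) {
        if (p->ref == 0)
            continue;           /* still on the list, but already dying */
        if (p->id == id)
            return p;
    }
    return NULL;
}

/* Take a reference; only valid on a prison that is still alive. */
static void
toy_prison_hold(struct toy_prison *p)
{
    assert(p->ref > 0);
    p->ref++;
}
```

The lookup rule and the assertion complement each other: a dying entry can never be returned by the finder, so a correct caller can never trip the assertion in the hold path.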
* Only use prison mutex to protect the fields that need to be protected by it. [pjd, 2007-04-08, 1 file, -2/+2]
* pr_list is protected by the allprison_lock. [pjd, 2007-04-08, 1 file, -1/+1]
* Remove XXX comment that changes to file fields should be protected with the file lock rather than the filedesc lock: I fixed this in the last revision. [rwatson, 2007-04-06, 1 file, -5/+0]
  Spotted by: kris
* allprison mutex was converted to sx(9) lock. [pjd, 2007-04-05, 1 file, -1/+1]
* Implement functionality I called 'jail services'. [pjd, 2007-04-05, 1 file, -27/+244]
  It may be used by external modules to attach some data to a jail's in-kernel structure.
  - Change allprison_mtx mutex to allprison_sx sx(9) lock. We will need to call external functions while holding this lock, which may want to allocate memory. Make use of the fact that this is a shared-exclusive lock and use the shared version when possible.
  - Implement the following functions:
    prison_service_register() - registers a service that wants to be notified when a jail is created and destroyed
    prison_service_deregister() - deregisters a service
    prison_service_data_add() - adds service-specific data to the jail structure
    prison_service_data_get() - takes service-specific data from the jail structure
    prison_service_data_del() - removes service-specific data from the jail structure
  Reviewed by: rwatson
* Make prison_find() globally accessible. [pjd, 2007-04-05, 1 file, -2/+1]
* Implement SEEK_DATA and SEEK_HOLE extensions to lseek(2) as found in OpenSolaris. [pjd, 2007-04-05, 1 file, -0/+7]
  For more information please refer to: http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data
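From user space, the two whence values are typically used in alternation to walk the data extents of a file. The sketch below assumes a platform whose headers define SEEK_DATA and SEEK_HOLE (on glibc this needs _GNU_SOURCE); `count_data_bytes()` and `demo_data_bytes()` are hypothetical helpers, and the temporary file path is an arbitrary choice for the demonstration.

```c
#define _GNU_SOURCE     /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sum the sizes of all data extents in fd by alternating SEEK_DATA
 * and SEEK_HOLE. Returns -1 on an unexpected error. Note that this
 * moves the file offset of fd.
 */
static off_t
count_data_bytes(int fd)
{
    off_t pos = 0, total = 0;

    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == -1)         /* ENXIO: no data at or after pos */
            return errno == ENXIO ? total : -1;
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == -1)
            return -1;
        assert(hole > data);    /* end of file counts as a hole */
        total += hole - data;
        pos = hole;
    }
}

/* Demo: write 4096 bytes to a temporary file and measure its data. */
static off_t
demo_data_bytes(void)
{
    char path[] = "/tmp/seekdemoXXXXXX";
    char buf[4096];
    off_t n;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        n = -1;
    else
        n = count_data_bytes(fd);
    close(fd);
    unlink(path);
    return n;
}
```

On a file system without real hole tracking, the whole file is usually reported as one data extent followed by the implicit hole at end of file, so the loop still terminates.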