path: root/sys/kern
Commit message (Collapse)AuthorAgeFilesLines
* Better fix than my previous commit:cognet2003-11-142-8/+9
| | | | | | | | | | | in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting. Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week
* Fix a number of style(9) bugs introduced in r1.113 by me.kan2003-11-141-47/+46
| | | | Suggested by: bde
* - regen.jeff2003-11-142-3/+3
|
* - Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implementsjeff2003-11-141-1/+1
| | | | | | parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.
* Various minor details:phk2003-11-131-8/+17
| | | | | | | | Give the HZ/overflow check a 10% margin. Eliminate bogus newline. If timecounters have equal quality, prefer higher frequency. Some inspiration from: bde
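The tie-break rule mentioned above (equal quality, prefer higher frequency) can be sketched as a small comparison helper. This is only an illustration: `struct tc_sketch` and `tc_prefer()` are hypothetical names, not the kernel's timecounter structure or selection code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a timecounter descriptor. */
struct tc_sketch {
	const char *name;
	int quality;		/* higher is better */
	uint64_t frequency;	/* counter frequency in Hz */
};

/*
 * Return the preferred of two candidates: higher quality wins, and on a
 * quality tie the higher frequency wins.  On a full tie the current
 * choice is kept.
 */
const struct tc_sketch *
tc_prefer(const struct tc_sketch *cur, const struct tc_sketch *cand)
{
	assert(cur != NULL && cand != NULL);
	if (cand->quality > cur->quality)
		return (cand);
	if (cand->quality == cur->quality && cand->frequency > cur->frequency)
		return (cand);
	return (cur);
}
```

Keeping the current choice on a full tie is an assumption of the sketch; it avoids switching counters when nothing is gained.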
* - Close a race where a thread on another CPU could release a contested lockjhb2003-11-121-4/+12
| | | | | | | | | | | | and empty its turnstile while the blocking threads still pointed to the turnstile. If the thread on the first CPU blocked on a lock owned by one of the threads blocked on the turnstile just woken up, then the first CPU could try to manipulate a bogus thread queue in the turnstile during priority propagation. - Update locking notes for ts_owner and always clear ts_owner, not just under INVARIANTS. Tested by: sam (1)
* At the request of several developers, restore the DIAGNOSTIC codemckusick2003-11-121-0/+28
| | | | | | | | | | deleted in 1.81. Increase the initial timeout limit to 2ms to eliminate spurious messages of excessive timeouts in the NFS client code. Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk> Requested by: Mike Silbersack <silby@silby.com> Requested by: Sam Leffler <sam@errno.com>
* Mark __mac_get_pid() as MPSAFE in the comment, as it runs withoutrwatson2003-11-121-37/+15
| | | | | | | | | | | Giant and is also MPSAFE. Push Giant further down into __mac_get_fd() and __mac_set_fd(), grabbing it only for constrained regions dealing with VFS, and dropping it entirely for operations related to labeling of pipes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* MNAMELEN is back to an int again after Kirk's statfs commitpeter2003-11-121-1/+1
| | | | | kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1
* Fix a typo in a comment.jhb2003-11-121-1/+1
| | | | Submitted by: das
* Replace B_PHYS conditional assignment to bio_offset with KASSERT checkphk2003-11-121-2/+7
| | | | to see that the originating code already did it right.
* Update the five files derived from /sys/kern/syscalls.mastermckusick2003-11-122-18/+18
| | | | | | | | | after the additions made for the new statfs structure (version 1.157). These must be updated in a separate checkin after syscalls.master has been checked in so that they reflect its new CVS identity. As these are purely derived files, it is not clear to me why they are under CVS at all. I presume that it has something to do with having `make world' operate properly.
* Update the statfs structure with 64-bit fields to allowmckusick2003-11-125-38/+606
| | | | | | | | | | | | | | | | | accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.
* Modify the MAC Framework so that instead of embedding a (struct label)rwatson2003-11-122-75/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* 1. Consolidate mount struct allocation/destruction into a common code inkan2003-11-121-429/+171
| | | | | | | | | | | | | | | | | | | | | | | vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in complexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson
* Add an implementation of turnstiles and change the sleep mutex code to usejhb2003-11-115-977/+510
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | turnstiles to implement blocking instead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as a queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queues to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP
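The donation scheme described in this message (each thread owns one turnstile, the first blocker's turnstile becomes the lock's queue, and the queue is found by hashing the lock address) can be sketched in a few lines. This is a single-threaded illustration only: all names are hypothetical, the table size is arbitrary, and it leaves out the per-chain locks, priority ordering and propagation, and the wakeup path. Parking later donations on a spare list is an assumption of the sketch, not something the message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_HASH_SIZE 16		/* assumed table size for the sketch */

struct thread_sketch;

struct turnstile_sketch {
	const void *lock;			/* lock this turnstile queues for */
	struct turnstile_sketch *chain_next;	/* next entry in the hash chain */
	struct turnstile_sketch *spare;		/* list of later donations */
	struct thread_sketch *waiters;		/* blocked threads (unordered here) */
};

struct thread_sketch {
	struct turnstile_sketch *ts;	/* owned turnstile, NULL while donated */
	struct thread_sketch *wait_next;
};

static struct turnstile_sketch *ts_chains[TS_HASH_SIZE];

static unsigned int
ts_hash(const void *lock)
{
	return ((unsigned int)(((uintptr_t)lock >> 4) % TS_HASH_SIZE));
}

/* Find the turnstile currently serving a lock, or NULL if nobody blocks. */
struct turnstile_sketch *
ts_lookup(const void *lock)
{
	struct turnstile_sketch *ts;

	for (ts = ts_chains[ts_hash(lock)]; ts != NULL; ts = ts->chain_next)
		if (ts->lock == lock)
			return (ts);
	return (NULL);
}

/* Block td on lock: donate td's turnstile and enqueue td as a waiter. */
struct turnstile_sketch *
ts_block(struct thread_sketch *td, const void *lock)
{
	struct turnstile_sketch *ts, *mine;

	assert(td != NULL && td->ts != NULL);
	mine = td->ts;
	td->ts = NULL;
	ts = ts_lookup(lock);
	if (ts == NULL) {
		/* First blocker: the donated turnstile becomes the queue. */
		ts = mine;
		ts->lock = lock;
		ts->spare = NULL;
		ts->waiters = NULL;
		ts->chain_next = ts_chains[ts_hash(lock)];
		ts_chains[ts_hash(lock)] = ts;
	} else {
		/* Later blocker: park the donated turnstile as a spare. */
		mine->spare = ts->spare;
		ts->spare = mine;
	}
	td->wait_next = ts->waiters;
	ts->waiters = td;
	return (ts);
}
```

Because every blocked thread contributes exactly one turnstile, there is always one available per waiter to hand back on wakeup, which is why no fixed-size pool is needed.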
* Bound the number of iterations a thread can perform insidejkoshy2003-11-111-6/+8
| | | | | | | | | ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC only if we encountered an out-of-memory condition when trying to increase the pool size. Reviewed by: jhb, bde (style)
* Have utrace(2) return ENOMEM if malloc() fails. Document this errorjkoshy2003-11-111-1/+1
| | | | | | return in its manual page. Reviewed by: jhb
* - Revision 1.469 of vfs_subr.c resulted in the buf's b_object field beingalc2003-11-111-14/+7
| | | | | | consistently initialized. Consequently, a number of conditionals that checked the validity of b_object before passing it to VM_OBJECT_LOCK() and VM_OBJECT_UNLOCK() are no longer needed.
* Whitespace sync to MAC branch, expand comment at the head of the file.rwatson2003-11-111-3/+9
|
* Fix a bug where the taskqueue kproc was being parented by initalfred2003-11-101-1/+1
| | | | | | | | | | | | | because RFNOWAIT was being passed to kproc_create. The result was that shutdown took quite a bit longer because this errant "child" would not respond to termination signals from init at system shutdown. RFNOWAIT disassociates itself from the caller by attaching to init as a parent proc. We could have had the taskqueue proc listen for SIGKILL, but being able to SIGKILL a potentially critical system process doesn't seem like a good idea.
* When there are no free sem_undo structs available in semu_alloc(), onlytjr2003-11-101-3/+4
| | | | | | | | | | | | | free one sem_undo with un_cnt == 0 instead of all of them. This is a temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so that it doesn't cause cycles in semu_list when removing multiple adjacent items. It might be easier to just use (doubly-linked) LISTs here instead of complicated SLIST code to achieve O(1) removals. This bug manifested itself as a complete lockup under heavy semaphore use by multiple processes with the SEM_UNDO flag set. PR: 58984
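A minimal sketch of the "free only one" workaround on a singly-linked list, assuming a pared-down record type. `struct undo_sketch` and `undo_reclaim_one()` are hypothetical names; the real code uses the queue(3) SLIST macros and struct sem_undo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down undo record; only the fields the sketch needs. */
struct undo_sketch {
	struct undo_sketch *next;
	int un_cnt;		/* number of active undo entries */
};

/*
 * Unlink and return the first record with un_cnt == 0, or NULL if none.
 * Walking with a pointer to the previous link makes the unlink a single
 * store, and stopping after one removal sidesteps the case of removing
 * multiple adjacent items that the commit message describes.
 */
struct undo_sketch *
undo_reclaim_one(struct undo_sketch **headp)
{
	struct undo_sketch **linkp, *u;

	assert(headp != NULL);
	for (linkp = headp; (u = *linkp) != NULL; linkp = &u->next) {
		if (u->un_cnt == 0) {
			*linkp = u->next;
			u->next = NULL;
			return (u);
		}
	}
	return (NULL);
}
```

A caller that needs more than one free record can call the helper again; each call restarts from the head, so no stale previous-link pointer is carried across removals.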
* Change the clear_ret argument of get_mcontext() to be a flags argument.marcel2003-11-091-2/+2
| | | | | | | | | | Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().
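The bit layout described above can be written down as follows. Only the layout (bit 0 for clear_ret, bits 1 to 3 reserved for MI code, the rest for MD code) comes from the commit message; the macro and function names are hypothetical and are not taken from the tree.

```c
#include <assert.h>

/* Hypothetical names; only the bit layout comes from the commit message. */
#define MC_FLAG_CLEAR_RET	0x1u	/* bit 0: the old clear_ret argument */
#define MC_FLAG_MI_RESERVED	0xeu	/* bits 1-3: reserved for MI code */
#define MC_FLAG_MI_MASK		(MC_FLAG_CLEAR_RET | MC_FLAG_MI_RESERVED)
#define MC_FLAG_MD_MASK		(~MC_FLAG_MI_MASK)	/* remaining bits: MD code */

/* Translate an old-style 0/1 clear_ret argument into the flags word. */
unsigned int
mc_flags_from_clear_ret(int clear_ret)
{
	assert(clear_ret == 0 || clear_ret == 1);
	return (clear_ret ? MC_FLAG_CLEAR_RET : 0u);
}

/* Nonzero if the caller asked for the return values to be cleared. */
int
mc_wants_clear_ret(unsigned int flags)
{
	return ((flags & MC_FLAG_CLEAR_RET) != 0);
}
```

Since existing callers passed only 0 or 1, their argument values already match bit 0 of the flags word, so they keep working unchanged.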
* Quick fix for scaling of statclock ticks in the SMP case. As explainedbde2003-11-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | in the log message for kern_sched.c 1.83 (which should have been repo-copied to preserve history for this file), the (4BSD) scheduler algorithm only works right if stathz is nearly 128 Hz. The old commit log said 64 Hz; the scheduler actually wants nearly 16 Hz but there was a scale factor of 4 to give the requirement of 64 Hz, and rev.1.83 changed the scale factor so that the requirement became 128 Hz. The change of the scale factor was incomplete in the SMP case. Then scheduling ticks are provided by smp_ncpu CPUs, and the scheduler cannot tell the difference between this and 1 CPU providing scheduling ticks smp_ncpu times faster, so we need another scale factor of smp_ncpu or an algorithm change. This quick fix uses the scale factor without even trying to optimize the runtime divisions required for this as is done for the other scale factor. The main algorithmic problem is the clamp on the scheduling tick counts. This was 295; it is now approximately 295 * smp_ncpu. When the limit is reached, threads get free timeslices and scheduling becomes very unfair to the threads that don't hit the limit. The limit can be reached and maintained in the worst case if the load average is larger than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with 2 CPUs before this change), so there are algorithmic problems even for a load average of 1. Fortunately, the worst case isn't common enough for the problem to be very noticeable (it is mainly for niced CPU hogs competing with less nice CPU hogs).
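The two thresholds quoted in this message can be reproduced from the formula it gives. The sketch below assumes, as a reading of the message rather than something taken from the code, that the effective stathz is 128 Hz times the number of CPUs, and that the clamp was 295 before the change and 295 * smp_ncpu after it. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Load average above which the tick clamp can be reached and maintained,
 * using the formula quoted in the commit message:
 *     (limit / effective_stathz - 1) / 2
 */
double
clamp_load_threshold(double limit, double effective_stathz)
{
	assert(effective_stathz > 0.0);
	return ((limit / effective_stathz - 1.0) / 2.0);
}
```

With 2 CPUs, `clamp_load_threshold(295.0, 256.0)` gives roughly 0.076, matching the "0.08 before" figure, and `clamp_load_threshold(590.0, 256.0)` gives roughly 0.652, matching the "0.65 now" figure.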
* - Implement selwakeuppri() which allows raising the priority of atanimura2003-11-0911-17/+45
| | | | | | | | | | | | | thread being woken up. The thread woken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current
* o add a flags parameter to netisr_register that is used to specifysam2003-11-081-2/+2
| | | | | | | | | | | | | | | | whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation
* Return a reasonable number for top or ps to display for M:N thread,davidxu2003-11-081-0/+2
| | | | | | | since there is no direct association between an M:N thread and a kse. Sometimes a thread does not have a kse; in that case, return a pctcpu from its last kse. It is not perfect, but gives a good number to display.
* Regen.jhb2003-11-072-7/+7
|
* Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe.jhb2003-11-071-5/+5
| | | | The parts of these calls that are not yet MP safe acquire Giant explicitly.
* Slight whitespace consistency improvement:rwatson2003-11-073-4/+4
| | | | | Trim trailing whitespace. Remove unmatched " " before ")".
* - Somehow I botched my last commit. Add an extra ( to fix things up. I'mjeff2003-11-061-1/+1
| | | | | | still not sure how this happened. Reported by: ps
* - Delay the allocation of memory for the pipe mutex until we need it.alc2003-11-061-5/+1
| | | | | This avoids the need to free said memory in various error cases along the way.
* - Simplify pipespace() by eliminating the explicit creation of vm objects.alc2003-11-061-10/+2
| | | | | | Instead, let the vm objects be lazily instantiated at fault time. This results in the allocation of fewer vm objects and vm map entries due to aggregation in the vm system.
* Remove the flags argument from mac_externalize_*_label(), as it's notrwatson2003-11-061-6/+6
| | | | | | | passed into policies or used internally to the MAC Framework. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* - Remove the local definition of sched_pin and unpin. They are provided injeff2003-11-061-17/+3
| | | | | sched.h now. - Respect the td pin count.
* o make debug_mpsafenet globally visiblesam2003-11-051-10/+0
| | | | | | | | o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation
* Minor style(9) nitimp2003-11-051-8/+8
|
* - It's ok if sched_runnable() has races in it, we don't need the sched_lockjeff2003-11-051-3/+4
| | | | here unless we have something on the assigned queue.
* Remove mntvnode_mtx and replace it with per-mountpoint mutex.kan2003-11-053-33/+35
| | | | | | | | | | Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
* Back out the following revisions:fjoe2003-11-051-18/+21
| | | | | | | | | | | | | | | | | | 1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h That change was intended to support vmware3, but the wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with the X server and the X server is a native application. The patch worked because the check for wantrem was not valid (wantrem and SHMSEG_REMOVED were never checked for SHMSEG_ALLOCATED segments). Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which, when set to 1, allows shm_find_segment_by_shmid() and shm_find_segment_by_shmidx() to return removed segments. MFC after: 1 week
* Get rid of DIAGNOSTIC that gives false positives on slow CPUs.mckusick2003-11-041-28/+0
|
* - Add initial support for pinning and binding.jeff2003-11-041-2/+53
|
* Allow the bufdaemon and update daemon processes to skip themckusick2003-11-041-4/+8
| | | | | | | waitrunningbufspace() calls so that they are always able to proceed and clean up buffer space. Submitted by: Brian Fundakowski Feldman <green@freebsd.org>
* disable MPSAFE network drivers; we aren't ready yetsam2003-11-041-1/+1
|
* I believe kbyanc@ really meant this in rev 1.58.cognet2003-11-041-2/+2
| | | | | | | | Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered. MFC after: 1 week
* Do not attempt to report proc event if NOTE_EXIT has already been received.cognet2003-11-041-0/+7
| | | | | | | | This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen. PR: kern/58258
* Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead,jhb2003-11-031-7/+11
| | | | | | | let the MD code choose whether or not to implement such a policy. The new i386 interrupt code allows multiple FAST handlers for a given source for example. However, the code does not allow FAST and non-FAST handlers to be mixed.
* Update spin lock order list for new i386 interrupt and SMP code.jhb2003-11-031-3/+2
|
* Unlock pipe mutex when failing MAC pipe ioctl access control check.rwatson2003-11-031-1/+3
| | | | | Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* - Remove kseq_find(), we no longer scan other cpu's run queues when we gojeff2003-11-031-66/+17
| | | | | | | | | | | | | idle. They figure out that we're idle fast enough that the cache pollution introduced by scanning their run queue is more expensive than waiting a little longer. - Add kseq_setidle() to mark us as being idle. Use this in place of kseq_find(). - Remove kseq_load_highest(), kseq_find() was the only consumer of this interface. kseq_balance() has its own customized version that finds the lowest and highest loads simultaneously. Continuously told that this would be faster by: terry