path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Add instrumentation which tells us how much work softclock() does | phk | 2003-06-04 | 1 | -2/+26
    per invocation.
* Implementations of extattr_list_fd(), extattr_list_file(), and | rwatson | 2003-06-04 | 2 | -0/+286
    extattr_list_link() system calls, which return a list of extended
    attributes defined for a vnode referenced by a file descriptor or path
    name. Currently, we just invoke VOP_GETEXTATTR() since it will convert
    a request for an empty name into a query for a name list, which was the
    old (more hackish) API. At some point in the near future, we'll push
    the distinction between get and list down to the vnode operation layer,
    but this provides access to the new API for applications in the short
    term.
    Pointed out by: Dominic Giampaolo <dbg@apple.com>
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
* Regen from syscalls.master:1.149, addition of extended attribute | rwatson | 2003-06-04 | 2 | -2/+8
    list system calls for fd, file, link.
* Add system calls to explicitly list extended attributes on a | rwatson | 2003-06-04 | 1 | -0/+6
    file/directory/link, rather than using a less explicit hack on the
    extattr retrieval API:

        extattr_list_fd()
        extattr_list_file()
        extattr_list_link()

    The existing API was counter-intuitive, and poorly documented. The
    prototypes for these system calls are identical to extattr_get_*(),
    but without a specific attribute name to leave NULL.
    Pointed out by: Dominic Giampaolo <dbg@apple.com>
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
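
For illustration, a short userland sketch of the new listing calls. The prototype follows the description above (extattr_get_file() minus the attribute name); the length-prefixed layout of the returned buffer and the helper name list_user_attrs() are assumptions made for this example.

	#include <sys/types.h>
	#include <sys/extattr.h>
	#include <stdio.h>

	/* Print every extended attribute name in the "user" namespace. */
	static int
	list_user_attrs(const char *path)
	{
		char buf[1024];
		ssize_t nbytes, i;

		nbytes = extattr_list_file(path, EXTATTR_NAMESPACE_USER,
		    buf, sizeof(buf));
		if (nbytes < 0)
			return (-1);
		/* Each entry: one length byte, then that many name bytes. */
		for (i = 0; i < nbytes; i += 1 + (unsigned char)buf[i])
			printf("%.*s\n", (int)(unsigned char)buf[i], &buf[i + 1]);
		return (0);
	}
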
* Assert the vnode lock when returning successfully from vn_open_cred(). | rwatson | 2003-06-04 | 1 | -0/+1
* Remove un-needed code. | julian | 2003-06-04 | 2 | -98/+54
    Don't copyin() data we are about to overwrite.
    Add a flag to tell userland that KSE is officially "DONE" with the
    mailbox and has gone away.
    Obtained from: davidxu@
* Fix a potential bucket leak where when freeing to an empty bucket | bmilekic | 2003-06-03 | 1 | -57/+29
    we failed to put the bucket back into the general cache/container.

    Also, fix a bad assumption. There was a KASSERT() that aimed to
    guarantee that whenever the pcpu container's mc_starved was > 0,
    whatever bucket we were freeing to was an empty bucket, assuming it
    belonged to the pcpu container cache. However, there is at least one
    case where this is not true anymore; consider:
    1) All containers empty, next thread to try to alloc will touch a pcpu
       container, notice it's empty, and increment the pcpu container's
       mc_starved.
    2) Some other thread frees an mbuf belonging to a bucket in the general
       cache/container. Then it frees another mbuf belonging to the same
       bucket (still in gen container).
    3) Some third thread tries to allocate an mbuf from the pcpu container
       and, since empty, grabs one mbuf now available in the general cache
       and moves the non-empty bucket from which it took 1 mbuf, and to
       which the thread in (2) freed, over to the pcpu container.
    4) A final thread tries to free an mbuf belonging to the NON-EMPTY
       bucket mentioned in (2) and (3) and, since the pcpu container's
       mc_starved is > 0 but the bucket is obviously non-empty, it trips on
       the KASSERT.
    This meant that one could potentially get a panic in some cases when
    out of mbufs and clusters. The problem could be mitigated by commenting
    out some cv_signal() calls, but I'm assuming that was pure coincidence
    and this is the correct fix.
* - Remove the blocked pointer from the umtx structure. | jeff | 2003-06-03 | 1 | -171/+163
    - Use a hash of umtx queues to queue blocked threads. We hash on pid
      and the virtual address of the umtx structure. This eliminates cases
      where we previously held a lock across a casuptr call.
    Reviewed by: jhb (quickly)
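
A minimal sketch of the hashing idea described above; the chain count, the mixing expression, and the names (UMTX_CHAINS, umtx_chains, umtx_queue) are assumptions for illustration, not the exact kern_umtx.c code.

	#include <sys/types.h>
	#include <sys/queue.h>
	#include <stdint.h>

	#define	UMTX_CHAINS	128	/* assumed power-of-two chain count */

	struct umtx_q;			/* blocked-thread queue entry (opaque here) */
	static TAILQ_HEAD(umtx_head, umtx_q) umtx_chains[UMTX_CHAINS];

	/* Pick the queue for a given process and user-space umtx address. */
	static struct umtx_head *
	umtx_queue(pid_t pid, void *uaddr)
	{
		uintptr_t key;

		key = (uintptr_t)uaddr + (uintptr_t)pid;
		return (&umtx_chains[(key >> 3) % UMTX_CHAINS]);
	}
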
* Add tracking of process leaders sharing a file descriptor table and | tegge | 2003-06-02 | 3 | -19/+226
    allow a file descriptor table to be shared between multiple process
    leaders.
    PR: 50923
* Remove the ia64 hackery in threadinit() that was needed to work around | marcel | 2003-06-01 | 2 | -28/+0
    the lameness of the kstack code. The EPC overhaul de-lame-ified the
    kstack code by removing the need for contigmalloc(). We can now
    allocate stacks using malloc(). We probably want to make the stacks
    swappable as well so that we can make it MI. But that's another story.
* Attempt to further comment and clarify System V IPC logic: document | rwatson | 2003-05-31 | 1 | -9/+24
    why certain exceptions are made, note an inconsistency between FreeBSD
    and some other implementations regarding IPC_M, and let suser()
    generate our EPERM rather than forcing it ourselves. Remove a carriage
    return that crept in in the last commit.
    Reviewed by: gordon
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
* Attempt to marginally de-obfuscate sections of the System V IPC access | rwatson | 2003-05-31 | 1 | -2/+7
    control logic.
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
* Add "" around mutex name to make message less confusing. | phk | 2003-05-31 | 2 | -2/+2
* Remove unused variable(s). | phk | 2003-05-31 | 6 | -20/+2
    Found by: FlexeLint
* Remove return after panic. | phk | 2003-05-31 | 1 | -2/+0
    Found by: FlexeLint
* Remove needless return. | phk | 2003-05-31 | 1 | -1/+0
    Found by: FlexeLint
* Add a couple of XXX comments where the intent is not clear. | phk | 2003-05-31 | 1 | -0/+2
    Found by: FlexeLint
* Remove unused variable(s). | phk | 2003-05-31 | 1 | -5/+2
    Remove break after goto.
    Found by: FlexeLint
* Remove return after panic. | phk | 2003-05-31 | 1 | -1/+0
    Found by: FlexeLint
* Remove unused variable and now unbalanced call to splbio(). | phk | 2003-05-31 | 1 | -2/+0
    Found by: FlexeLint
* Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on | marcel | 2003-05-31 | 1 | -5/+4
    the stack to be changed in a way incompatible with elf32_map_insert(),
    where we used data_buf without initializing it when the partial mapping
    resulted in a misaligned image (typical when the page size implied by
    the image is not the same as the page size in use by the kernel). Since
    data_buf is passed by reference to vm_map_find(), the compiler cannot
    warn about it.
    While here, move all local variables to the top of the function.
* "break" rather than fall through to a break in the default clause. | phk | 2003-05-31 | 1 | -0/+1
    Found by: FlexeLint
* Introduce {be,le}_uuid_{enc,dec}() functions for explicitly encoding | phk | 2003-05-31 | 1 | -0/+80
    and decoding UUIDs in big endian and little endian binary format.
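
As a sketch of what an explicit big-endian encoder of this kind typically looks like: the field order follows the DCE struct uuid from sys/uuid.h, while the function name and the use of the sys/endian.h byte-order helpers are assumptions for illustration rather than the literal kern_uuid.c code.

	#include <sys/types.h>
	#include <sys/endian.h>
	#include <sys/uuid.h>

	static void
	sketch_be_uuid_enc(void *buf, const struct uuid *uuid)
	{
		uint8_t *p = buf;
		int i;

		be32enc(p, uuid->time_low);
		be16enc(p + 4, uuid->time_mid);
		be16enc(p + 6, uuid->time_hi_and_version);
		p[8] = uuid->clock_seq_hi_and_reserved;
		p[9] = uuid->clock_seq_low;
		for (i = 0; i < _UUID_NODE_LEN; i++)
			p[10 + i] = uuid->node[i];	/* node bytes copied as-is */
	}
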
* The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent | phk | 2003-05-31 | 2 | -8/+4
    deadlocks with vnode backed md(4) devices because md now uses a kthread
    to run the bio requests instead of doing it directly from the bio down
    path.
* Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to | peter | 2003-05-31 | 1 | -1/+1
    witness.
    Approved by: re (safe amd64 support)
* When loading a module that contains a sysctl which is already compiled | mux | 2003-05-29 | 1 | -1/+24
    in the kernel, the sysctl_register() call would fail, as expected.
    However, when unloading this module again, the kernel would then panic
    in sysctl_unregister(). Print an error message instead.
    Submitted by: Nicolai Petri <nicolai@catpipe.net>
    Reviewed by: imp
    Approved by: re@ (jhb)
* Add an INVARIANTS-only check to make sure Giant is held if mbuf | dwmalone | 2003-05-29 | 1 | -0/+2
    allocation is attempted with M_TRYWAIT.
    Reviewed by: bmilekic
    Approved by: re (scottl)
* Grab Giant in sendit rather than kern_sendit because sockargs may | dwmalone | 2003-05-29 | 1 | -4/+6
    allocate mbufs with M_TRYWAIT, which may require Giant.
    Reviewed by: bmilekic
    Approved by: re (scottl)
* In cluster_wbuild(), initialise b_iocmd to BIO_WRITE before calling | iedowse | 2003-05-28 | 1 | -1/+3
    buf_start() to avoid triggering a panic in softdep_disk_io_initiation()
    if b_iocmd happened to be BIO_READ. The later initialisation of b_iocmd
    in cluster_wbuild() could probably be moved to before the buf_start()
    call, but this patch keeps the change as simple as possible.

    This is reported to fix occasional "softdep_disk_io_initiation: read"
    panics, especially on NFS servers.
    Reported by: Nick Hilliard <nick@netability.ie>
    Tested by: Nick Hilliard <nick@netability.ie>
    Approved by: re (rwatson)
* Copy the va_list in sbuf_vprintf() before passing it to vsnprintf(), | peter | 2003-05-25 | 1 | -1/+4
    because we could fail due to a small buffer and loop and rerun. If this
    happens, then the vsnprintf() will have already taken the arguments off
    the va_list. For i386 and others, this doesn't matter because the
    va_list type is passed as a copy. But on powerpc and amd64, this is
    fatal because the va_list is a reference to an external structure that
    keeps the vararg state due to the more complicated argument passing
    system.

    On amd64, arguments can be passed as follows: the first 6 int/pointer
    type arguments go in registers, the rest go on the memory stack. Float
    and double are similar, except using SSE registers. long double (80 bit
    precision) is similar except using the x87 stack. Where the 'next
    argument' comes from depends on how many have been processed so far and
    what type it is. For amd64, gcc keeps this state somewhere that is
    referenced by the va_list.

    I found a description that showed the va_copy was required here:
    http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9
    The single unix spec doesn't mention va_copy() at all.

    Anyway, the problem was that the sysctl kern.geom.conf* nodes would
    panic due to walking off the end of the va_arg lists in vsnprintf. A
    better fix would be to have sbuf_vprintf() use a single pass and call
    kvprintf() with a callback function that stored the results and grew
    the buffer as needed.
    Approved by: re (scottl)
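
The pattern at issue, reduced to a small userland sketch: every retry must format from a fresh va_copy of the caller's va_list, because each vsnprintf() pass consumes argument state on platforms where va_list references external state. The name format_retry() is made up, and sbuf_vprintf() itself grows an sbuf rather than a malloc'd buffer.

	#include <stdarg.h>
	#include <stdio.h>
	#include <stdlib.h>

	static char *
	format_retry(const char *fmt, va_list ap)
	{
		va_list ap_copy;
		char *buf = NULL, *nbuf;
		size_t size = 64;
		int len;

		for (;;) {
			if ((nbuf = realloc(buf, size)) == NULL) {
				free(buf);
				return (NULL);
			}
			buf = nbuf;
			va_copy(ap_copy, ap);	/* fresh argument state per pass */
			len = vsnprintf(buf, size, fmt, ap_copy);
			va_end(ap_copy);
			if (len >= 0 && (size_t)len < size)
				return (buf);	/* output fit; done */
			size = (len >= 0) ? (size_t)len + 1 : size * 2;
		}
	}
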
* - Create a new lock, umtx_lock, for use instead of the proc lock for | jeff | 2003-05-25 | 1 | -6/+13
      protecting the umtx queues. We can't use the proc lock because we
      need to hold the lock across calls to casuptr, which can fault.
    Approved by: re
* - Reset the free ent to NULL if we have consumed the last free entry. This | jeff | 2003-05-25 | 1 | -0/+2
      fixes a problem where we would overwrite old data if we ran out of
      free entries.
    Submitted by: sam
    Approved by: re (scottl)
* Make the maximum number of vnodes a function of both the physical memory | alc | 2003-05-23 | 1 | -1/+10
    size and the kernel's heap size, specifically, vm_kmem_size. This
    function allows a maximum of 40% of the vm_kmem_size to be used for
    vnodes and vm objects. This is a conservative bound based upon recent
    problem reports. (In other words, a slight increase in this percentage
    may be safe.)

    Finally, machines with less than ~3GB of RAM should be unaffected by
    this change, i.e., the maximum number of vnodes should remain the same.
    If necessary, machines with 3GB or more of RAM can increase the maximum
    number of vnodes by increasing vm_kmem_size.
    Desired by: scottl
    Tested by: jake
    Approved by: re (rwatson,scottl)
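
Roughly, the bound described above amounts to taking the smaller of a physical-memory-based estimate and 40% of vm_kmem_size divided by the per-vnode cost. The sketch below uses made-up structure sizes and a made-up physical-memory ratio; only the shape of the calculation is the point, not the real vfs_subr.c constants.

	#include <stddef.h>

	struct vnode_sketch { char pad[168]; };		/* stand-in size, not the real struct */
	struct vm_object_sketch { char pad[136]; };	/* stand-in size */

	static unsigned long
	sketch_maxvnodes(unsigned long physmem_bytes, unsigned long vm_kmem_size)
	{
		unsigned long from_physmem, from_kmem;

		from_physmem = physmem_bytes / (64 * 1024);	/* assumed ratio */
		/* At most 40% of the kernel heap for vnodes plus their vm objects. */
		from_kmem = (vm_kmem_size / 10 * 4) /
		    (sizeof(struct vnode_sketch) + sizeof(struct vm_object_sketch));
		return (from_physmem < from_kmem ? from_physmem : from_kmem);
	}
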
* When we are spilling threads out of the run queue during panic, make sure we | julian | 2003-05-21 | 1 | -3/+6
    keep the thread state variable consistent with its real state, i.e.
    don't say it's on the run queue when it isn't.

    Also clarify the associated comment.
    Turns a double panic back to a single panic :-/
    Approved by: re@ (jhb)
* Revamp of the syscall path, exception and context handling. The | marcel | 2003-05-16 | 4 | -5/+5
    prime objectives are:
    o Implement a syscall path based on the epc instruction (see
      sys/ia64/ia64/syscall.s).
    o Revisit the places where we need to save and restore registers and
      define those contexts in terms of the register sets (see
      sys/ia64/include/_regset.h).

    Secondary objectives:
    o Remove the requirement to use contigmalloc for kernel stacks.
    o Better handling of the high FP registers for SMP systems.
    o Switch to the new cpu_switch() and cpu_throw() semantics.
    o Add a good unwinder to reconstruct contexts for the rare cases we
      need to (see sys/contrib/ia64/libuwx).

    Many files are affected by this change. Functionally it boils down to:
    o The EPC syscall doesn't preserve registers it does not need to
      preserve and places the arguments differently on the stack. This
      affects libc and truss.
    o The address of the kernel page directory (kptdir) had to be
      unstaticized for use by the nested TLB fault handler. The name has
      been changed to ia64_kptdir to avoid conflicts. The renaming affects
      libkvm.
    o The trapframe only contains the special registers and the scratch
      registers. For syscalls using the EPC syscall path no scratch
      registers are saved. This affects all places where the trapframe is
      accessed. Most notably the unaligned access handler, the signal
      delivery code and the debugger.
    o Context switching only partly saves the special registers and the
      preserved registers. This affects cpu_switch() and triggered the move
      to the new semantics, which additionally affects cpu_throw().
    o The high FP registers are either in the PCB or on some CPU. Context
      switching for them is done lazily. This affects trap().
    o The mcontext has room for all registers, but not all of them have to
      be defined in all cases. This mostly affects signal delivery code
      now. The *context syscalls are as of yet still unimplemented.

    Many details went into the removal of the requirement to use
    contigmalloc for kernel stacks. The details are mostly CPU specific and
    limited to exception_save() and exception_restore(). The few places
    where we create, destroy or switch stacks were mostly simplified by not
    having to construct physical addresses and additionally saving the
    virtual addresses for later use.

    Besides more efficient context saving and restoring, which of course
    yields a noticeable speedup, this also fixes the dreaded SMP bootup
    problem as a side-effect. The details of which are still not fully
    understood.

    This change includes all the necessary backward compatibility code to
    have it handle older userland binaries that use the break instruction
    for syscalls. Support for break-based syscalls has been pessimized in
    favor of a clean implementation. Due to the overall better performance
    of the kernel, this will still be noticed as an improvement if it's
    noticed at all.
    Approved by: re@ (jhb)
* Detect that a vnode has been reclaimed while vflush() was waiting to lock | truckman | 2003-05-16 | 1 | -0/+11
    the vnode and restart the loop. Vflush() is vulnerable since it does
    not hold a reference to the vnode and it holds no other locks while
    waiting for the vnode lock. The vnode will no longer be on the list
    when the loop is restarted.
    Approved by: re (rwatson)
* Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and | obrien | 2003-05-16 | 1 | -9/+10
    PT_DETACH ptrace(2) requests from functioning as advertised in the
    manual page. As described in kern/35175, the PT_DETACH request will,
    under certain circumstances, pass an unwanted signal on to the traced
    process upon detaching from it. The PT_CONTINUE request will sometimes
    fail if you make it pass a signal that has "properties" that differ
    from the properties of the signal that originally caused the traced
    process to be stopped. Since PT_KILL is nothing more than PT_CONTINUE
    with SIGKILL, it is broken too. In the PT_KILL case, this leads to an
    unkillable process.
    PR: 44011
    Submitted by: Mark Kettenis <kettenis@chello.nl>
    Approved by: re (jhb)
* VOP_PATHCONF() requires a vnode lock; this patch adds locking to | rwatson | 2003-05-15 | 1 | -0/+2
    fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in
    pathconf() already.
    Approved by: re (jhb)
    Pointed out by: DEBUG_VFS_LOCKS
* Make the mb_alloc low-watermark sysctl-tunable read-only and make | bmilekic | 2003-05-15 | 1 | -2/+5
    netstat(1) not display it for now because its effects are not yet
    completely implemented and we're about to cut 5.2-RELEASE. This is
    temporary.
    Approved by: re (scottl, rwatson)
* p_sigignore moved into struct sigacts. Move one which was missed. | ps | 2003-05-14 | 1 | -1/+1
    Approved by: re (scottl)
* - Merge struct procsig with struct sigacts. | jhb | 2003-05-13 | 12 | -156/+216
    - Move struct sigacts out of the u-area and malloc() it using the
      M_SUBPROC malloc bucket.
    - Add a small sigacts_*() API for managing sigacts structures:
      sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and
      sigacts_shared().
    - Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
    - Add a mutex to struct sigacts that protects all the members of the
      struct.
    - Add sigacts locking.
    - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
      that sigacts is locked.
    - Several in-kernel functions such as psignal(), tdsignal(),
      trapsignal(), and thread_stopped() are now MP safe.
    Reviewed by: arch@
    Approved by: re (rwatson)
* In setitimer(2), if the it_value of the new itimer value is clear, then | jhb | 2003-05-13 | 1 | -3/+4
    don't add the current time to it, but leave it as clear so that when
    the timer is disabled, the it_value is always clear.
    Reviewed by: bde
    Approved by: re (rwatson)
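
In terms of the userland itimerval types, the behaviour is the one sketched below; arm_timer_sketch() is a made-up helper for illustration, not the kernel code. The current time is added only when it_value is actually set.

	#include <sys/time.h>

	static void
	arm_timer_sketch(struct itimerval *aitv, const struct timeval *now)
	{
		if (timerisset(&aitv->it_value)) {
			/* Arming: convert it_value to an absolute expiry time. */
			timeradd(&aitv->it_value, now, &aitv->it_value);
		}
		/* Disarming: it_value stays zeroed, so reads report it as clear. */
	}
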
* Optimize the use of splay in gbincore(). During a "make buildworld" the | alc | 2003-05-13 | 1 | -7/+22
    desired buffer is found at one of the roots more than 60% of the time.
    Thus, checking both roots before performing either splay eliminates
    unnecessary splays on the first tree splayed.
    Approved by: re (jhb)
* Bail out if there were not two loadable sections. Add XXX comment about | phk | 2003-05-12 | 2 | -0/+16
    one other issue.
    Approved by: re/rwatson
* Remove bogus locking from DDB's "show lockedvnods" command: using | rwatson | 2003-05-12 | 1 | -11/+7
    synchronization primitives from inside DDB is generally a bad idea, and
    in this case it frequently results in panics due to DDB commands being
    executed from the sio fast interrupt context on a serial console.
    Replace the locking with a note that a lack of locking means that DDB
    may see inconsistent views of the mount and vnode lists, which could
    also result in a panic. More frequently, though, this avoids a panic
    than causes it.
    Discussed with (ages ago): bde
    Approved by: re (scottl)
* Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC. | phk | 2003-05-12 | 1 | -4/+3
    Approved by: re/rwatson
* Make m_freem() just use m_free() instead of duplicating the code. The | bmilekic | 2003-05-10 | 1 | -32/+2
    reason for the duplication was that m_freem() was meant to eventually
    be optimized to hold the lock of the cache being freed to as long as
    possible across frees, but the difficulty of implementing said
    optimization right now is too high, given that in some cases (see MAC
    and non-cluster external buffers), we need to call into other
    subsystems, something not permissible when the cache lock is held.

    This change minimizes code duplication while keeping at least the
    atomic mbuf+cluster free optimization.
    Suggested by: luigi
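
The resulting m_freem() is essentially the classic walk-the-chain loop; a sketch (not the verbatim mbuf-allocator code), relying only on m_free() releasing one mbuf and returning the next in the chain.

	struct mbuf;					/* opaque for this sketch */
	struct mbuf	*m_free(struct mbuf *m);	/* frees one mbuf, returns the next */

	void
	sketch_m_freem(struct mbuf *m)
	{
		while (m != NULL)
			m = m_free(m);
	}
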
* Remove Giant from kern_sigsuspend() and osigsuspend() as these should now | jhb | 2003-05-09 | 1 | -10/+2
    be MP safe.
    Approved by: re (scottl)
* Rename MAC_MAX_POLICIES to MAC_MAX_SLOTS, since the variables and | rwatson | 2003-05-08 | 1 | -8/+8
    constants in question refer to the number of label slots, not the
    maximum number of policies that may be loaded. This should reduce
    confusion regarding an element in the MAC sysctl MIB, as well as make
    it more clear what the effect of changing the compile-time constants
    is.
    Approved by: re (jhb)
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
* Clean up locking for the MAC Framework: | rwatson | 2003-05-07 | 1 | -78/+180
    (1) Accept that we're now going to use mutexes, so don't attempt to
        avoid treating them as mutexes. This cleans up locking accessor
        function names some.
    (2) Rename variables to _mtx, _cv, _count, simplifying the naming.
    (3) Add a new form of the _busy() primitive that conditionally makes
        the list busy: if there are entries on the list, bump the busy
        count. If there are no entries, don't bump the busy count. Return a
        boolean indicating whether or not the busy count was bumped.
    (4) Break mac_policy_list into two lists: one with the same name
        holding dynamic policies, and a new list, mac_static_policy_list,
        which holds policies loaded before mac_late and without the unload
        flag set. The static list may be accessed without holding the busy
        count, since it can't change at run-time.
    (5) In general, prefer making the list busy conditionally, meaning we
        pay only one mutex lock per entry point if all modules are on the
        static list, rather than two (since we don't have to lower the busy
        count when we're done with the framework). For systems running just
        Biba or MLS, this will halve the mutex accesses in the network
        stack, and may offer substantial performance benefits.
    (6) Lay the groundwork for a dynamic-free kernel option which
        eliminates all locking associated with dynamically loaded or
        unloaded policies, for pre-configured systems requiring maximum
        performance but less run-time flexibility.
    These changes have been running for a few weeks on MAC development
    branch systems.
    Approved by: re (jhb)
    Obtained from: TrustedBSD Project
    Sponsored by: DARPA, Network Associates Laboratories
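
A minimal sketch of the conditional-busy primitive from item (3); the variable names and the omission of the condition-variable handling are simplifications for illustration, not the exact framework code.

	#include <sys/param.h>
	#include <sys/lock.h>
	#include <sys/mutex.h>
	#include <sys/queue.h>

	struct mac_policy_conf;			/* dynamic policy entry (opaque here) */
	static LIST_HEAD(, mac_policy_conf) mac_policy_list;	/* dynamic policies only */
	static struct mtx mac_policy_mtx;	/* assumed initialized at boot */
	static int mac_policy_busy;

	/* Bump the busy count only if there are dynamic policies to protect. */
	static int
	sketch_policy_list_conditional_busy(void)
	{
		int ret;

		mtx_lock(&mac_policy_mtx);
		if (!LIST_EMPTY(&mac_policy_list)) {
			mac_policy_busy++;
			ret = 1;
		} else
			ret = 0;
		mtx_unlock(&mac_policy_mtx);
		return (ret);
	}
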