path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Lock the vm object when performing vm_page_grab().
  (alc, 2003-06-08, 1 file, -2/+2)
* - When a new thread is added to a kseq the load is incremented prior to adding it to the nice tables. Therefore, in kseq_add_nice, we should keep in mind that the load will be 1 if we are the only thread, and not 0.
  - Assert that the sched lock is held in all the appropriate places.
  - Increase the scope of the sched lock in sched_pctcpu_update().
  - Hold the sched lock in sched_runnable(). It is not held by the caller.
  (jeff, 2003-06-08, 1 file, -5/+20)
* Improve the root-dev prompt facility for printing devices which could possibly be a root filesystem.
  (phk, 2003-06-07, 1 file, -11/+2)
* thread_signal_add is now called with ps_mtx held; unlock it before calling copyin.
  (davidxu, 2003-06-06, 2 files, -6/+10)
* If a system call comes in requesting to retrieve an attribute named "", temporarily map it to a call to extattr_list_vp() to provide compatibility for older applications using the "" API to retrieve EA lists. Use VOP_LISTEXTATTR() to support extattr_list_vp() rather than VOP_GETEXTATTR(..., "", ...).

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-06-05, 2 files, -2/+26)
* Add vop_listextattr(), similar to vop_getextattr() but without a specific attribute name. It will have the same semantics as the older vop_getextattr() "retrieve the names" hack, returning a buffer with ASCII nul-separated names.

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-06-05, 1 file, -0/+12)
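The "buffer with ASCII nul-separated names" described above could be walked by a caller roughly as follows. This is only a sketch under that stated layout: `split_nul_names` is a hypothetical helper, not part of the kernel or libc, and the attribute names in the usage note are made up.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a buffer of NUL-separated ASCII names and, if
 * `names` is non-NULL, store up to `max` pointers to them.
 * Hypothetical helper for illustration only.
 */
size_t
split_nul_names(const char *buf, size_t len, const char **names, size_t max)
{
	size_t count = 0;
	size_t off = 0;

	while (off < len) {
		/* Find the terminator of the current name inside the buffer. */
		const char *end = memchr(buf + off, '\0', len - off);

		if (end == NULL)
			break;		/* unterminated tail: ignore it */
		if (names != NULL && count < max)
			names[count] = buf + off;
		count++;
		off = (size_t)(end - buf) + 1;
	}
	return (count);
}
```

For a 15-byte buffer holding `user.a\0user.bb\0`, the helper reports two names and the second pointer refers to `user.bb`.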
* Change the second (and last) argument of cpu_set_upcall(). Previously we were passing in a void* representing the PCB of the parent thread. Now we pass a pointer to the parent thread itself.

  The prime reason for this change is to allow cpu_set_upcall() to copy (parts of) the trapframe instead of having it done in MI code in each caller of cpu_set_upcall(). Copying the trapframe cannot always be done with a simple bcopy() or may not always be optimal that way. On ia64 specifically the trapframe contains information that is specific to an entry into the kernel and can only be used by the corresponding exit from the kernel. A trapframe copied verbatim from another frame is in most cases useless without some additional normalization.

  Note that this change removes the assignment to td->td_frame in some implementations of cpu_set_upcall(). The assignment is redundant. A previous call to cpu_thread_setup() already did the exact same assignment. An added benefit of removing the redundant assignment is that we can now change td_pcb without nasty side-effects.

  This change officially marks the ability on ia64 for 1:1 threading.

  Not tested on: amd64, powerpc
  Compile & boot tested on: alpha, sparc64
  Functionally tested on: i386, ia64
  (marcel, 2003-06-04, 3 files, -6/+3)
* Add instrumentation which tells us how much work softclock() does per invocation.
  (phk, 2003-06-04, 1 file, -2/+26)
* Implementations of extattr_list_fd(), extattr_list_file(), and extattr_list_link() system calls, which return a list of extended attributes defined for a vnode referenced by a file descriptor or path name. Currently, we just invoke VOP_GETEXTATTR() since it will convert a request for an empty name into a query for a name list, which was the old (more hackish) API. At some point in the near future, we'll push the distinction between get and list down to the vnode operation layer, but this provides access to the new API for applications in the short term.

  Pointed out by: Dominic Giampaolo <dbg@apple.com>
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-06-04, 2 files, -0/+286)
* Regen from syscalls.master:1.149, addition of extended attribute list system calls for fd, file, link.
  (rwatson, 2003-06-04, 2 files, -2/+8)
* Add system calls to explicitly list extended attributes on a file/directory/link, rather than using a less explicit hack on the extattr retrieval API:

      extattr_list_fd()
      extattr_list_file()
      extattr_list_link()

  The existing API was counter-intuitive, and poorly documented. The prototypes for these system calls are identical to extattr_get_*(), but without a specific attribute name to leave NULL.

  Pointed out by: Dominic Giampaolo <dbg@apple.com>
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-06-04, 1 file, -0/+6)
* Assert the vnode lock when returning successfully from vn_open_cred().
  (rwatson, 2003-06-04, 1 file, -0/+1)
* Remove un-needed code. Don't copyin() data we are about to overwrite. Add a flag to tell userland that KSE is officially "DONE" with the mailbox and has gone away.

  Obtained from: davidxu@
  (julian, 2003-06-04, 2 files, -98/+54)
* Fix a potential bucket leak where, when freeing to an empty bucket, we failed to put the bucket back into the general cache/container.

  Also, fix a bad assumption. There was a KASSERT() that aimed to guarantee that whenever the pcpu container's mc_starved was > 0, the bucket we were freeing to was an empty bucket, assuming it belonged to the pcpu container cache. However, there is at least one case where this is not true anymore; consider:

  1) All containers empty, next thread to try to alloc will touch a pcpu container, notice it's empty, and increment the pcpu container's mc_starved.
  2) Some other thread frees an mbuf belonging to a bucket in the general cache/container. Then it frees another mbuf belonging to the same bucket (still in gen container).
  3) Some third thread tries to allocate an mbuf from the pcpu container and, since it is empty, grabs one mbuf now available in the general cache and moves the non-empty bucket from which it took 1 mbuf (and to which the thread in (2) freed) to the pcpu container.
  4) A final thread tries to free an mbuf belonging to the NON-EMPTY bucket mentioned in (2) and (3) and, since the pcpu container's mc_starved is > 0, but the bucket is obviously non-empty, it trips on the KASSERT.

  This meant that one could potentially get a panic in some cases when out of mbufs and clusters. The problem could be mitigated by commenting out some cv_signal() calls, but I'm assuming that was pure coincidence and this is the correct fix.
  (bmilekic, 2003-06-03, 1 file, -57/+29)
* - Remove the blocked pointer from the umtx structure.
  - Use a hash of umtx queues to queue blocked threads. We hash on pid and the virtual address of the umtx structure. This eliminates cases where we previously held a lock across a casuptr call.

  Reviewed by: jhb (quickly)
  (jeff, 2003-06-03, 1 file, -171/+163)
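Hashing "on pid and the virtual address of the umtx structure" can be pictured with a small sketch like the one below. Everything in it is an assumption made for illustration: the table size, the mixing constant, and the function name are not taken from the commit, which does not show its hash.

```c
#include <stdint.h>

/* Assumed number of queue chains; a power of two so we can mask. */
#define UMTX_QUEUES_SKETCH 128

/*
 * Pick a queue chain from a (pid, user address) pair.
 * Hypothetical mixing function, not the one used in the kernel.
 */
unsigned
umtx_hash_sketch(int32_t pid, uintptr_t uaddr)
{
	/* Drop low alignment bits of the address, then fold in the pid. */
	uintptr_t h = (uaddr >> 3) ^ (uintptr_t)(uint32_t)pid;

	/* Multiplicative scrambling so nearby addresses spread out. */
	h *= 2654435761u;
	return ((unsigned)((h >> 16) & (UMTX_QUEUES_SKETCH - 1)));
}
```

Because the key includes the pid, two processes that place a umtx at the same virtual address generally hash under different keys, and the chain lock can be dropped before touching user memory.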
* Add tracking of process leaders sharing a file descriptor table and allow a file descriptor table to be shared between multiple process leaders.

  PR: 50923
  (tegge, 2003-06-02, 3 files, -19/+226)
* Remove the ia64 hackery in threadinit() that was needed to work around the lameness of the kstack code. The EPC overhaul de-lame-ified the kstack code by removing the need for contigmalloc(). We can now allocate stacks using malloc(). We probably want to make the stacks swappable as well so that we can make it MI. But that's another story.
  (marcel, 2003-06-01, 2 files, -28/+0)
* Attempt to further comment and clarify System V IPC logic: document why certain exceptions are made, note an inconsistency between FreeBSD and some other implementations regarding IPC_M, and let suser() generate our EPERM rather than forcing it ourselves. Remove a carriage return that crept in in the last commit.

  Reviewed by: gordon
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-05-31, 1 file, -9/+24)
* Attempt to marginally de-obfuscate sections of the System V IPC access control logic.

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-05-31, 1 file, -2/+7)
* Add "" around mutex name to make message less confusing.
  (phk, 2003-05-31, 2 files, -2/+2)
* Remove unused variable(s).

  Found by: FlexeLint
  (phk, 2003-05-31, 6 files, -20/+2)
* Remove return after panic.

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -2/+0)
* Remove needless return.

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -1/+0)
* Add a couple of XXX comments where the intent is not clear.

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -0/+2)
* Remove unused variable(s).
  Remove break after goto.

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -5/+2)
* Remove return after panic.

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -1/+0)
* Remove unused variable and now unbalanced call to splbio().

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -2/+0)
* Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on the stack to be changed in a way incompatible with elf32_map_insert(), where we used data_buf without initializing it when the partial mapping resulted in a misaligned image (typical when the page size implied by the image is not the same as the page size in use by the kernel). Since data_buf is passed by reference to vm_map_find(), the compiler cannot warn about it.

  While here, move all local variables to the top of the function.
  (marcel, 2003-05-31, 1 file, -5/+4)
* "break" rather than fall through to a break in the default clause.

  Found by: FlexeLint
  (phk, 2003-05-31, 1 file, -0/+1)
* Introduce {be,le}_uuid_{enc,dec}() functions for explicitly encoding and decoding UUIDs in big endian and little endian binary format.
  (phk, 2003-05-31, 1 file, -0/+80)
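To make "explicitly encoding ... in big endian and little endian binary format" concrete, here is a minimal encode-only sketch. It assumes the usual DCE field layout for a UUID; `struct uuid_sketch` and the `*_sketch` function names are stand-ins chosen for this example, not the kernel's definitions, and only the multi-byte fields are assumed to change with byte order.

```c
#include <stdint.h>
#include <string.h>

/* Assumed DCE-style field layout; a stand-in for the kernel's struct. */
struct uuid_sketch {
	uint32_t time_low;
	uint16_t time_mid;
	uint16_t time_hi_and_version;
	uint8_t  clock_seq_hi_and_reserved;
	uint8_t  clock_seq_low;
	uint8_t  node[6];
};

static void
put16(uint8_t *p, uint16_t v, int big)
{
	p[big ? 0 : 1] = (uint8_t)(v >> 8);
	p[big ? 1 : 0] = (uint8_t)(v & 0xff);
}

static void
put32(uint8_t *p, uint32_t v, int big)
{
	put16(p + (big ? 0 : 2), (uint16_t)(v >> 16), big);
	put16(p + (big ? 2 : 0), (uint16_t)(v & 0xffff), big);
}

/* Serialize into 16 bytes; only the multi-byte fields depend on `big`. */
static void
uuid_enc_sketch(uint8_t out[16], const struct uuid_sketch *u, int big)
{
	put32(out, u->time_low, big);
	put16(out + 4, u->time_mid, big);
	put16(out + 6, u->time_hi_and_version, big);
	out[8] = u->clock_seq_hi_and_reserved;
	out[9] = u->clock_seq_low;
	memcpy(out + 10, u->node, sizeof(u->node));
}

void
be_uuid_enc_sketch(uint8_t out[16], const struct uuid_sketch *u)
{
	uuid_enc_sketch(out, u, 1);
}

void
le_uuid_enc_sketch(uint8_t out[16], const struct uuid_sketch *u)
{
	uuid_enc_sketch(out, u, 0);
}
```

Writing the bytes one at a time keeps the result independent of the host's own byte order, which is the point of having explicit big-endian and little-endian variants.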
* The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.
  (phk, 2003-05-31, 2 files, -8/+4)
* Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to witness.

  Approved by: re (safe amd64 support)
  (peter, 2003-05-31, 1 file, -1/+1)
* When loading a module that contains a sysctl which is already compiled in the kernel, the sysctl_register() call would fail, as expected. However, when unloading this module again, the kernel would then panic in sysctl_unregister(). Print an error message instead.

  Submitted by: Nicolai Petri <nicolai@catpipe.net>
  Reviewed by: imp
  Approved by: re@ (jhb)
  (mux, 2003-05-29, 1 file, -1/+24)
* Add an INVARIANTS-only check to make sure Giant is held if mbuf allocation is attempted with M_TRYWAIT.

  Reviewed by: bmilekic
  Approved by: re (scottl)
  (dwmalone, 2003-05-29, 1 file, -0/+2)
* Grab Giant in sendit rather than kern_sendit because sockargs may allocate mbufs with M_TRYWAIT, which may require Giant.

  Reviewed by: bmilekic
  Approved by: re (scottl)
  (dwmalone, 2003-05-29, 1 file, -4/+6)
* In cluster_wbuild(), initialise b_iocmd to BIO_WRITE before calling buf_start() to avoid triggering a panic in softdep_disk_io_initiation() if b_iocmd happened to be BIO_READ. The later initialisation of b_iocmd in cluster_wbuild() could probably be moved to before the buf_start() call, but this patch keeps the change as simple as possible.

  This is reported to fix occasional "softdep_disk_io_initiation: read" panics, especially on NFS servers.

  Reported by: Nick Hilliard <nick@netability.ie>
  Tested by: Nick Hilliard <nick@netability.ie>
  Approved by: re (rwatson)
  (iedowse, 2003-05-28, 1 file, -1/+3)
* Copy the va_list in sbuf_vprintf() before passing it to vsnprintf(), because we could fail due to a small buffer and loop and rerun. If this happens, then the vsnprintf() will have already taken the arguments off the va_list. For i386 and others, this doesn't matter because the va_list type is passed as a copy. But on powerpc and amd64, this is fatal because the va_list is a reference to an external structure that keeps the vararg state due to the more complicated argument passing system.

  On amd64, arguments can be passed as follows: First 6 int/pointer type arguments go in registers, the rest go on the memory stack. Float and double are similar, except using SSE registers. long double (80 bit precision) are similar except using the x87 stack. Where the 'next argument' comes from depends on how many have been processed so far and what type it is. For amd64, gcc keeps this state somewhere that is referenced by the va_list.

  I found a description that showed the va_copy was required here:
  http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9
  The single unix spec doesn't mention va_copy() at all.

  Anyway, the problem was that the sysctl kern.geom.conf* nodes would panic due to walking off the end of the va_arg lists in vsnprintf. A better fix would be to have sbuf_vprintf() use a single pass and call kvprintf() with a callback function that stored the results and grew the buffer as needed.

  Approved by: re (scottl)
  (peter, 2003-05-25, 1 file, -1/+4)
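The retry pattern this commit describes can be shown in userland with standard C99 calls. The sketch below is not the sbuf code; `vformat_retry` and `format_retry` are hypothetical names, and the tiny starting size is chosen only to force a second pass.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a heap buffer, growing it and retrying until the output
 * fits. Returns a malloc()ed string, or NULL on failure.
 */
char *
vformat_retry(const char *fmt, va_list ap)
{
	size_t size = 8;	/* deliberately small to force a retry */
	char *buf = NULL;

	for (;;) {
		char *nbuf = realloc(buf, size);
		va_list ap_copy;
		int n;

		if (nbuf == NULL) {
			free(buf);
			return (NULL);
		}
		buf = nbuf;

		/* Consume a copy so `ap` is still intact for the next pass. */
		va_copy(ap_copy, ap);
		n = vsnprintf(buf, size, fmt, ap_copy);
		va_end(ap_copy);

		if (n < 0) {
			free(buf);
			return (NULL);
		}
		if ((size_t)n < size)
			return (buf);
		size = (size_t)n + 1;	/* exact size needed, plus NUL */
	}
}

/* Variadic front end for the helper above. */
char *
format_retry(const char *fmt, ...)
{
	va_list ap;
	char *s;

	va_start(ap, fmt);
	s = vformat_retry(fmt, ap);
	va_end(ap);
	return (s);
}
```

Without the `va_copy()`, the second `vsnprintf()` call would read from an argument list that the first call had already advanced on ABIs where `va_list` refers to shared state.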
* - Create a new lock, umtx_lock, for use instead of the proc lock for protecting the umtx queues. We can't use the proc lock because we need to hold the lock across calls to casuptr, which can fault.

  Approved by: re
  (jeff, 2003-05-25, 1 file, -6/+13)
* - Reset the free ent to NULL if we have consumed the last free entry. This fixes a problem where we would overwrite old data if we ran out of free entries.

  Submitted by: sam
  Approved by: re (scottl)
  (jeff, 2003-05-25, 1 file, -0/+2)
* Make the maximum number of vnodes a function of both the physical memory size and the kernel's heap size, specifically, vm_kmem_size. This function allows a maximum of 40% of the vm_kmem_size to be used for vnodes and vm objects. This is a conservative bound based upon recent problem reports. (In other words, a slight increase in this percentage may be safe.)

  Finally, machines with less than ~3GB of RAM should be unaffected by this change, i.e., the maximum number of vnodes should remain the same. If necessary, machines with 3GB or more of RAM can increase the maximum number of vnodes by increasing vm_kmem_size.

  Desired by: scottl
  Tested by: jake
  Approved by: re (rwatson,scottl)
  (alc, 2003-05-23, 1 file, -1/+10)
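The shape of such a bound, the smaller of a physical-memory-based limit and a 40%-of-heap limit, can be sketched as below. The function name and its parameters are assumptions for illustration: the commit does not show the formula, so the physical-memory limit and the per-vnode byte cost are passed in rather than computed.

```c
#include <stdint.h>

/*
 * Return the smaller of a limit derived from physical memory and a
 * limit that keeps vnodes plus their vm objects within 40% of the
 * kernel heap. `per_vnode_bytes` is the assumed combined size of one
 * vnode and one vm object. Hypothetical helper, not kernel code.
 */
uint64_t
max_vnodes_sketch(uint64_t phys_limit, uint64_t kmem_size,
    uint64_t per_vnode_bytes)
{
	uint64_t kmem_limit;

	if (per_vnode_bytes == 0)
		return (phys_limit);	/* nothing to bound against */

	/* 40% of the heap, divided by the cost of one vnode. */
	kmem_limit = (kmem_size / 5 * 2) / per_vnode_bytes;
	return (phys_limit < kmem_limit ? phys_limit : kmem_limit);
}
```

With a 200 MB heap and an assumed 512 bytes per vnode, the heap-based limit is 163840; a physical-memory limit of 100000 is then unaffected, while a limit of 500000 is clamped, which mirrors the commit's note that only large-memory machines see a change.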
* When we are spilling threads out of the run queue during panic, make sure we keep the thread state variable consistent with its real state, i.e. don't say it's on the run queue when it isn't.

  Also clarify the associated comment.

  Turns a double panic back to a single panic :-/

  Approved by: re@ (jhb)
  (julian, 2003-05-21, 1 file, -3/+6)
* Revamp of the syscall path, exception and context handling. The prime objectives are:
  o Implement a syscall path based on the epc instruction (see sys/ia64/ia64/syscall.s).
  o Revisit the places where we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h).

  Secondary objectives:
  o Remove the requirement to use contigmalloc for kernel stacks.
  o Better handling of the high FP registers for SMP systems.
  o Switch to the new cpu_switch() and cpu_throw() semantics.
  o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx).

  Many files are affected by this change. Functionally it boils down to:
  o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss.
  o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm.
  o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger.
  o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw().
  o The high FP registers are either in the PCB or on some CPU. Context switching for them is done lazily. This affects trap().
  o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented.

  Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use.

  Besides more efficient context saving and restoring, which of course yields a noticeable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood.

  This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be noticed as an improvement if it's noticed at all.

  Approved by: re@ (jhb)
  (marcel, 2003-05-16, 4 files, -5/+5)
* Detect that a vnode has been reclaimed while vflush() was waiting to lock the vnode and restart the loop. Vflush() is vulnerable since it does not hold a reference to the vnode and it holds no other locks while waiting for the vnode lock. The vnode will no longer be on the list when the loop is restarted.

  Approved by: re (rwatson)
  (truckman, 2003-05-16, 1 file, -0/+11)
* Fix a long-standing bug that prevents the PT_CONTINUE, PT_KILL and PT_DETACH ptrace(2) requests from functioning as advertised in the manual page. As described in kern/35175, the PT_DETACH request will, under certain circumstances, pass an unwanted signal on to the traced process upon detaching from it. The PT_CONTINUE request will sometimes fail if you make it pass a signal that has "properties" that differ from the properties of the signal that originally caused the traced process to be stopped. Since PT_KILL is nothing more than PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this leads to an unkillable process.

  PR: 44011
  Submitted by: Mark Kettenis <kettenis@chello.nl>
  Approved by: re (jhb)
  (obrien, 2003-05-16, 1 file, -9/+10)
* VOP_PATHCONF() requires a vnode lock; this patch adds locking to fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in pathconf() already.

  Approved by: re (jhb)
  Pointed out by: DEBUG_VFS_LOCKS
  (rwatson, 2003-05-15, 1 file, -0/+2)
* Make the mb_alloc low-watermark sysctl-tunable read-only and make netstat(1) not display it for now because its effects are not yet completely implemented and we're about to cut 5.2-RELEASE. This is temporary.

  Approved by: re (scottl, rwatson)
  (bmilekic, 2003-05-15, 1 file, -2/+5)
* p_sigignore moved into struct sigacts. Move one which was missed.

  Approved by: re (scottl)
  (ps, 2003-05-14, 1 file, -1/+1)
* - Merge struct procsig with struct sigacts.
  - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket.
  - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
  - Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
  - Add a mutex to struct sigacts that protects all the members of the struct.
  - Add sigacts locking.
  - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked.
  - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe.

  Reviewed by: arch@
  Approved by: re (rwatson)
  (jhb, 2003-05-13, 12 files, -156/+216)
* In setitimer(2), if the it_value of the new itimer value is clear, then don't add the current time to it, but leave it as clear so that when the timer is disabled, the it_value is always clear.

  Reviewed by: bde
  Approved by: re (rwatson)
  (jhb, 2003-05-13, 1 file, -3/+4)
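The rule in this commit, convert a relative it_value to an absolute expiry only when it is non-zero, might look like this in isolation. It is a userland sketch with a hypothetical function name, not the kernel's setitimer() code, and it assumes both inputs are already normalized (tv_usec below one million).

```c
#include <sys/time.h>

/*
 * Turn a relative timer value into an absolute expiry time.
 * A cleared value means "timer disabled" and is returned unchanged.
 */
struct timeval
itimer_expiry_sketch(struct timeval it_value, struct timeval now)
{
	if (it_value.tv_sec == 0 && it_value.tv_usec == 0)
		return (it_value);	/* disabled: keep it cleared */

	it_value.tv_sec += now.tv_sec;
	it_value.tv_usec += now.tv_usec;
	if (it_value.tv_usec >= 1000000) {
		/* Carry the overflowed microseconds into seconds. */
		it_value.tv_sec++;
		it_value.tv_usec -= 1000000;
	}
	return (it_value);
}
```

Keeping the disabled case cleared means later code can test for a zero it_value to decide whether a timer is armed, instead of seeing a stale "now" timestamp.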
* Optimize the use of splay in gbincore(). During a "make buildworld" the desired buffer is found at one of the roots more than 60% of the time. Thus, checking both roots before performing either splay eliminates unnecessary splays on the first tree splayed.

  Approved by: re (jhb)
  (alc, 2003-05-13, 1 file, -7/+22)