path: root/sys/kern
Commit message  [author, date, files changed, -lines removed/+lines added]
* Introduce the KDB debugger frontend. The frontend provides a framework
  in which multiple (presumably different) debugger backends can be
  configured and which provides basic services to those backends.
  Besides providing services to backends, it also serves as the single
  point of contact for any and all code that wants to make use of the
  debugger functions, such as entering the debugger or handling of the
  alternate break sequence. For this purpose, the frontend has been made
  non-optional. All debugger requests are forwarded or handed over to the
  current backend, if applicable.
  Selection of the current backend is done by the debug.kdb.current
  sysctl. A list of configured backends can be obtained with the
  debug.kdb.available sysctl. One can enter the debugger by writing to
  the debug.kdb.enter sysctl.
  [marcel, 2004-07-10, 1 file, -0/+384]
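  The three sysctls named above are the userland-visible interface added
  here; a minimal sketch of inspecting them from a userland C program with
  sysctlbyname(3), with buffer sizing and error handling kept deliberately
  simple:

	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <stdio.h>

	int
	main(void)
	{
		char buf[128];
		size_t len;

		/* List the debugger backends compiled into the kernel. */
		len = sizeof(buf);
		if (sysctlbyname("debug.kdb.available", buf, &len, NULL, 0) == 0)
			printf("available backends: %s\n", buf);

		/* Show which backend is currently selected. */
		len = sizeof(buf);
		if (sysctlbyname("debug.kdb.current", buf, &len, NULL, 0) == 0)
			printf("current backend: %s\n", buf);
		return (0);
	}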
* Clean up and wash struct iovec and struct uio handling.
  Add copyiniov() which copies a struct iovec array in from userland into
  a malloc'ed struct iovec. Caller frees.
  Change uiofromiov() to malloc the uio (caller frees) and name it
  copyinuio() which is more appropriate.
  Add cloneuio() which returns a malloc'ed copy. Caller frees.
  Use them throughout.
  [phk, 2004-07-10, 6 files, -353/+181]
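  A sketch of the usage pattern this describes (the exact copyinuio()
  signature and the M_IOV malloc type are assumptions; the point from the
  commit is that the callee allocates the uio and the caller frees it):

	static int
	readv_sketch(struct thread *td, struct iovec *uiovp, u_int iovcnt)
	{
		struct uio *auio;
		int error;

		/* Copy the userland iovec array in; auio is malloc'ed. */
		error = copyinuio(uiovp, iovcnt, &auio);
		if (error != 0)
			return (error);
		/* ... perform the I/O described by auio ... */
		free(auio, M_IOV);	/* caller frees */
		return (0);
	}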
* Now that socket buffer locks are being asserted at higher code blocks
  in soreceive(), remove some leaf assertions that are redundant.
  [rwatson, 2004-07-10, 1 file, -4/+1]
* Assert socket buffer lock at strategic points between sections of code
  in soreceive() to confirm we've moved from block to block properly
  maintaining locking invariants.
  [rwatson, 2004-07-10, 1 file, -0/+5]
* Check the lock lists to see if they are empty directly rather than
  assigning a pointer to the list and then dereferencing the pointer as a
  second step. When the first spin lock is acquired, curthread is not in
  a critical section, so it may be preempted and would end up using
  another CPU's lock list instead of its own. When this code was in
  witness_lock() this sequence was safe, as curthread was already in a
  critical section since witness_lock() is called after the lock is
  acquired.
  Tested by: Daniel Lang dl at leo.org
  [jhb, 2004-07-09, 1 file, -9/+21]
* Cosmetic adjustment to previous commit: name the second argument to
  sbuf_bcat() and sbuf_bcpy() "buf" rather than "data".
  [des, 2004-07-09, 1 file, -4/+4]
* Have sbuf_bcat() and sbuf_bcpy() take a const void * instead of a
  const char *, since callers are likely to pass in pointers to all kinds
  of structs and whatnot.
  [des, 2004-07-09, 1 file, -3/+5]
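  The resulting prototypes, as the two sbuf commits above describe them
  (a sketch; the size_t length parameter is assumed):

	int	sbuf_bcat(struct sbuf *s, const void *buf, size_t len);
	int	sbuf_bcpy(struct sbuf *s, const void *buf, size_t len);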
* Eliminate struct shm_handle. It is an unnecessary level of indirection
  to a vm_object.
  [alc, 2004-07-09, 1 file, -24/+12]
* Remove spl()'s from do_sendfile().
  [rwatson, 2004-07-09, 1 file, -6/+1]
* - Move contents of sched_add() into a sched_add_internal() function
    that takes an argument to specify if it should preempt or not. Don't
    preempt when sched_add_internal() is called from kseq_idled() or
    kseq_assign(), as in those cases we are about to call mi_switch()
    anyway. Also, doing so during the first context switch on an AP leads
    to a NULL pointer deref because curthread is NULL.
  - Reenable preemption for ULE.
  Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net
  [jhb, 2004-07-08, 1 file, -5/+11]
* fixup sysctl by fsid node
  [alfred, 2004-07-08, 1 file, -2/+2]
* style(9)
  [alfred, 2004-07-07, 1 file, -1/+12]
* do the vfsstd thing instead of messing up our VFS_SYSCTL macro.
  [alfred, 2004-07-07, 2 files, -0/+12]
* Fix bug introduced in rev 1.434: when avoiding the zeroing of
  "bogus_page" when it appears in a buf, be sure to advance the pointers
  into the data for successive pages.
  The bug caused file corruption when read(2)ing from a "hole" in a file
  where a previous page of the read block had already been faulted in:
  fsx tripped up on this pretty quickly. The particular access pattern is
  probably pretty unusual, so other applications probably wouldn't have
  had problems, but you'd never know.
  Reviewed by: alc@
  [peadar, 2004-07-06, 1 file, -3/+3]
* Use vfs_suser() where appropriate.
  [alfred, 2004-07-06, 1 file, -11/+7]
* Introduce vfs_suser(), used to test if a user should have special privs
  for a mount.
  [alfred, 2004-07-06, 1 file, -0/+16]
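  A sketch of the intended call pattern (the argument order of vfs_suser()
  is an assumption based on the description above):

	static int
	some_vfs_op_sketch(struct mount *mp, struct thread *td)
	{
		int error;

		/* Refuse unless the caller has special privileges for mp. */
		error = vfs_suser(mp, td);
		if (error != 0)
			return (error);
		/* ... privileged work on the mount ... */
		return (0);
	}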
* NFS mobility PHASE I, II & III (phases IV and V pending):
  Rebind the client socket when we experience a timeout. This fixes the
  case where our IP changes for some reason.
  Signal a VFS event when NFS transitions from up to down and vice versa.
  Add a placeholder vfs_sysctl where we will put status reporting shortly.
  Also: make down NFS mounts return EIO instead of EINTR when there is a
  soft timeout or forced unmount in progress.
  [alfred, 2004-07-06, 2 files, -3/+3]
* Temporarily disable preemption in SCHED_ULE due to reported panics and
  hangs due to recent preemption changes. This change appears to remove
  the panic that I was running into, but at the cost of increasing
  ithread scheduling latency, and as such is a temporary band-aid until
  jhb has a chance to resolve the ule<->preemption interaction that is
  the source of the problem. If it doesn't fix the problem for others --
  sorry!
  [rwatson, 2004-07-06, 1 file, -0/+2]
* Unconditionally set last_work_seen while in the SYNCER_RUNNING state
  so that last_work_seen has a reasonable value at the transition to the
  SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened to be
  zero at the time.
  Initialize last_work_seen to zero as a safety measure in case the
  syncer never ran in the SYNCER_RUNNING state.
  Tested by: phk
  [truckman, 2004-07-05, 1 file, -5/+4]
* Drop the socket buffer lock around a call to m_copym() with M_TRYWAIT.
  A subset of locking changes to soreceive() in the queue for merging.
  Bumped into by: Willem Jan Withagen <wjw@withagen.nl>
  [rwatson, 2004-07-05, 1 file, -1/+4]
* Rework syncer termination code:
  - Speed up the syncer when shutting down by sleeping for a shorter
    period of time instead of cranking up rushjob and using the normal
    one second sleep.
  - Skip empty worklist slots when shutting down to avoid lengthy
    intervals of inactivity.
  - Give I/O more time to complete between steps by not speeding the
    syncer quite as much.
  - Terminate the syncer after one full pass through the worklist plus
    one second with the worklist containing nothing but syncer vnodes.
  - Print an indication of shutdown progress to the console.
  - Add a sysctl, vfs.worklist_len, to allow the size of the syncer
    worklist to be monitored.
  [truckman, 2004-07-05, 1 file, -33/+79]
* Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE.
  [phk, 2004-07-04, 1 file, -0/+1]
* Pass the operation in with the fsidctl.
  Remove some fsidctls that we will not be using.
  Correct prototypes for fs sysctls.
  [alfred, 2004-07-04, 1 file, -2/+8]
* Make the last commit handle non-phk root devices better.
  [phk, 2004-07-04, 2 files, -2/+4]
* Consistently use __inline instead of __inline__ as the former is an
  empty macro in <sys/cdefs.h> for compilers without support for inline.
  [stefanf, 2004-07-04, 1 file, -3/+3]
* Blocksize for I/O should be a property of the vnode and not found by
  groping around in the vnode's surroundings when we allocate a block.
  Assign a blocksize when we create a vnode, and yell a warning (and
  ignore it) if we got the wrong size. Please email all such warnings
  to me.
  [phk, 2004-07-04, 2 files, -0/+6]
* Introduce a new kevent filter, EVFILT_FS, that will be used to signal
  generic filesystem events to userspace. Currently only mount and
  unmount of filesystems are signalled. Soon to be added: up/down status
  of NFS.
  Introduce a sysctl node used to route requests to/from filesystems
  based on filesystem ids.
  Introduce a new vfsop, vfs_sysctl(mp, req), that is used as the
  callback/entrypoint by the sysctl code to change individual
  filesystems.
  [alfred, 2004-07-04, 3 files, -0/+66]
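  A minimal userland sketch of consuming these events with kqueue(2); the
  use of ident 0 and an empty fflags mask when registering is an
  assumption about how a filter-wide (non-descriptor) filter is attached:

	#include <sys/types.h>
	#include <sys/event.h>
	#include <sys/time.h>
	#include <err.h>
	#include <stdio.h>

	int
	main(void)
	{
		struct kevent ev;
		int kq;

		if ((kq = kqueue()) == -1)
			err(1, "kqueue");

		/* Register interest in filesystem (mount/unmount) events. */
		EV_SET(&ev, 0, EVFILT_FS, EV_ADD | EV_CLEAR, 0, 0, NULL);
		if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
			err(1, "kevent register");

		for (;;) {
			if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
				err(1, "kevent wait");
			/* fflags carries filter-specific event detail. */
			printf("filesystem event, fflags=0x%x\n", ev.fflags);
		}
	}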
* Revision 1.496 would not boot on my system due to
    ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) ->
        insmntque(vp, mp = NULL) -> KASSERT -> panic
  Make getnewvnode() only call insmntque() if the mountpoint parameter is
  not NULL.
  [alfred, 2004-07-04, 1 file, -1/+2]
* When we traverse the vnodes on a mountpoint we need to look out for
  our cached 'next vnode' being removed from this mountpoint. If we find
  that it was recycled, we restart our traversal from the start of the
  list.
  Code to do that is in all local disk filesystems (and a few other
  places) and looks roughly like this:

	MNT_ILOCK(mp);
  loop:
	for (vp = TAILQ_FIRST(&mp...);
	    (vp = nvp) != NULL;
	    nvp = TAILQ_NEXT(vp,...)) {
		if (vp->v_mount != mp)
			goto loop;
		MNT_IUNLOCK(mp);
		...
		MNT_ILOCK(mp);
	}
	MNT_IUNLOCK(mp);

  The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

  (Take a moment and try to spot the locking error before you read on.)

  On an SMP system, one CPU could have removed nvp from our mountlist
  but not yet gotten to assign a new value to vp->v_mount while another
  CPU simultaneously gets to the top of the traversal loop, where it
  finds that (vp->v_mount != mp) is not true despite the fact that the
  vnode has indeed been removed from our mountpoint.

  Fix: introduce the macro MNT_VNODE_FOREACH() to traverse the list of
  vnodes on a mountpoint while taking into account that vnodes may be
  removed from the list as we go. This saves approx 65 lines of
  duplicated code.
  Split the insmntque() which potentially moves a vnode from one mount
  point to another into delmntque() and insmntque() which do just what
  the names say.
  Fix delmntque() to set vp->v_mount to NULL while holding the
  mountpoint lock.
  [phk, 2004-07-04, 3 files, -47/+63]
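  A sketch of what the traversal looks like once converted to the new
  macro (the argument order of MNT_VNODE_FOREACH() is assumed from the
  open-coded loops it replaces):

	struct vnode *vp, *nvp;

	MNT_ILOCK(mp);
	MNT_VNODE_FOREACH(vp, mp, nvp) {
		/* vp is still on mp's vnode list at this point. */
		MNT_IUNLOCK(mp);
		/* ... per-vnode work ... */
		MNT_ILOCK(mp);
	}
	MNT_IUNLOCK(mp);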
* Remove stale comment.
  [phk, 2004-07-03, 1 file, -1/+0]
* Add NULL arg to mi_switch() call to stop kernel compiles from breaking.
  [phk, 2004-07-03, 1 file, -1/+1]
* Add a NULL param to an mi_switch() that I missed.
  Reported by: Jung-uk Kim jkim at niksun dot com
  [jhb, 2004-07-03, 1 file, -1/+1]
* Fix SCHED_ULE build on SMP. The previous revision (1.110) introduced a
  KSE_CAN_MIGRATE() invocation with one argument missing (class). Either
  this is a genuine oversight or it crept in from JHB's repo, where he
  may have modified it. If it's the latter then it may require more
  attention. For now, fix the make depend.
  [bmilekic, 2004-07-03, 1 file, -1/+1]
* Unbreak build for the !PREEMPTION case: don't define variables that
  aren't used in that case.
  [marcel, 2004-07-03, 1 file, -0/+2]
* Implement preemption of kernel threads natively in the scheduler
  rather than as one-off hacks in various other parts of the kernel:
  - Add a function maybe_preempt() that is called from sched_add() to
    determine if a thread about to be added to a run queue should be
    preempted to directly. If it is not safe to preempt or if the new
    thread does not have a high enough priority, then the function
    returns false and sched_add() adds the thread to the run queue. If
    the thread should be preempted to but the current thread is in a
    nested critical section, then the flag TDF_OWEPREEMPT is set and the
    thread is added to the run queue. Otherwise, mi_switch() is called
    immediately and the thread is never added to the run queue since it
    is switched to directly. When exiting an outermost critical section,
    if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to
    perform the deferred preemption.
  - Remove explicit preemption from ithread_schedule() as calling
    setrunqueue() now does all the correct work. This also removes the
    do_switch argument from ithread_schedule().
  - Do not use the manual preemption code in mtx_unlock if the
    architecture supports native preemption.
  - Don't call mi_switch() in a loop during shutdown to give ithreads a
    chance to run if the architecture supports native preemption since
    the ithreads will just preempt DELAY().
  - Don't call mi_switch() from the page zeroing idle thread for
    architectures that support native preemption as it is unnecessary.
  - Native preemption is enabled on the same archs that supported ithread
    preemption, namely alpha, i386, and amd64.
  This change should largely be a NOP for the default case as committed
  except that we will do fewer context switches in a few cases and will
  avoid the run queues completely when preempting.
  Approved by: scottl (with his re@ hat)
  [jhb, 2004-07-02, 7 files, -36/+151]
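  A sketch of the decision flow in the first bullet above (maybe_preempt(),
  mi_switch() and TDF_OWEPREEMPT come from the commit text; the runq_add()
  tail is illustrative of "adds the thread to the run queue"):

	static void
	sched_add_sketch(struct thread *td)
	{
		/*
		 * maybe_preempt() returns nonzero only when it has already
		 * switched directly to td; it may instead set TDF_OWEPREEMPT
		 * and return zero if we are in a nested critical section.
		 * In the nonzero case td must not also be queued.
		 */
		if (maybe_preempt(td))
			return;
		runq_add(&runq, td->td_kse);	/* queue as before */
	}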
* - Change mi_switch() and sched_switch() to accept an optional thread
    to switch to. If a non-NULL thread pointer is passed in, then the CPU
    will switch to that thread directly rather than calling
    choosethread() to pick a thread to switch to.
  - Make sched_switch() aware of idle threads and know to do
    TD_SET_CAN_RUN() instead of sticking them on the run queue, rather
    than requiring all callers of mi_switch() to know to do this if they
    can be called from an idlethread.
  - Move constants for arguments to mi_switch() and thread_single() out
    of the middle of the function prototypes and up above into their own
    section.
  [jhb, 2004-07-02, 13 files, -30/+38]
* Allow ptrace to deal with lwpid.
  Reviewed by: marcel
  [davidxu, 2004-07-02, 1 file, -6/+36]
* We allocate an array of pointers to the global file table while not
  holding the filelist_lock. This means the filelist can change size
  while allocating. Detect this race and retry the allocation.
  [alfred, 2004-07-02, 1 file, -1/+12]
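  A sketch of the allocate/recheck/retry pattern described above. The
  filelist_lock name comes from the commit text; the counter name, the
  locking primitive, and the M_TEMP malloc type are illustrative
  assumptions:

	struct file **fps;
	int n;

	for (;;) {
		n = nfiles;		/* snapshot the current size */
		fps = malloc(n * sizeof(*fps), M_TEMP, M_WAITOK);
		sx_slock(&filelist_lock);
		if (nfiles <= n)
			break;		/* snapshot is still big enough */
		/* The file list grew while malloc() slept; undo and retry. */
		sx_sunlock(&filelist_lock);
		free(fps, M_TEMP);
	}
	/* ... copy file pointers into fps, then sx_sunlock(&filelist_lock) ... */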
* Tidy up uprof locking. Mostly the fields are protected by both the
  proc lock and sched_lock so they can be read with either lock held.
  Document the locking as well.
  The one remaining bogosity is that pr_addr and pr_ticks should be
  per-thread but profiling of multithreaded apps is currently undefined.
  [jhb, 2004-07-02, 2 files, -18/+24]
* - Assert that any process that has statclock called on it has both a
    stats structure and a vmspace as this should always be true rather
    than checking the always true condition in an if statement.
  - Remove never-false check: if ((ru = &pstats->p_ru) != NULL)
  - Remove pstats variable that is only used once and inline its one use
    instead.
  [jhb, 2004-07-02, 1 file, -11/+10]
* Change the thread ID (thr_id_t) used for 1:1 threading from being a
  pointer to the corresponding struct thread to the thread ID (lwpid_t)
  assigned to that thread.
  The primary reason for this change is that libthr now internally uses
  the same ID as the debugger and the kernel when referring to a kernel
  thread. This allows us to implement the support for debugging without
  additional translations and/or mappings.
  To preserve the ABI, the 1:1 threading syscalls, including the umtx
  locking API, have not been changed to work on a lwpid_t. Instead the
  1:1 threading syscalls operate on long and the umtx locking API has
  not been changed except for the contested bit. Previously this was the
  least significant bit. Now it's the most significant bit. Since the
  contested bit should not be tested by userland, this change is not
  expected to be visible. Just to be sure, UMTX_CONTESTED has been
  removed from <sys/umtx.h>.
  Reviewed by: mtm@
  ABI preservation tested on: i386, ia64
  [marcel, 2004-07-02, 3 files, -20/+25]
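  The contested-bit move described above, in an illustrative sketch (the
  macro names here are made up; per the commit, UMTX_CONTESTED itself was
  removed from <sys/umtx.h>):

	#include <limits.h>

	/* Before: contested flag in the least significant bit of the word. */
	#define	UMTX_CONTESTED_OLD	0x1L
	/* After: contested flag in the most significant bit; LONG_MIN is a
	 * long value with only its sign (top) bit set. */
	#define	UMTX_CONTESTED_NEW	LONG_MIN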
* Regen.
  [marcel, 2004-07-02, 2 files, -2/+2]
* When shutting down the syncer kernel thread, first tell it to run
  faster and iterate over its work list a few times in an attempt to
  empty the work list before the syncer terminates. This leaves fewer
  dirty blocks to be written at the "syncing disks" stage and keeps the
  "giving up on N buffers" problem from being triggered by the presence
  of a large soft updates work list at system shutdown time. The
  downside is that the syncer takes noticeably longer to terminate.
  Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com>
  Approved by: mckusick
  [truckman, 2004-07-01, 1 file, -6/+68]
* Add ability to set start/end for rman.
  [imp, 2004-07-01, 1 file, -0/+12]
* Trim a few things from the dmesg output and stick them under
  bootverbose to cut down on the clutter including PCI interrupt
  routing, MTRR, pcibios, etc.
  Discussed with: USENIX Cabal
  [jhb, 2004-07-01, 1 file, -2/+3]
* Hide struct resource and struct rman. You must define
  __RMAN_RESOURCE_VISIBLE to see inside these now.
  Reviewed by: dfr, njl (not njr)
  [imp, 2004-06-30, 2 files, -0/+2]
* Include more information about the device in the devadded and
  devremoved events. This reduces the races around these events. We now
  include the pnp info in both. This lets one do more interesting things
  with devd on device insertion.
  Submitted by: Bernd Walter
  [imp, 2004-06-30, 1 file, -2/+38]
* Oops, this didn't make it into my submit before I committed: Defer
  creation of the sysctl tree for the turnstile profiling stats until a
  SI_SUB_LOCK sysinit. Doing it in init_turnstiles() is too early as it
  is called before mi_startup().
  [jhb, 2004-06-29, 1 file, -7/+19]
* Wrap long line.
  [peter, 2004-06-29, 1 file, -1/+3]
* Add two new kernel options to allow rudimentary profiling of the
  internal hash tables used in the sleep queue and turnstile code. Each
  option adds a sysctl tree under debug containing the maximum depth of
  any bucket in the hash table as well as a separate node for each
  bucket (or chain) containing the current depth and maximum depth for
  that bucket.
  [jhb, 2004-06-29, 2 files, -3/+89]