path: root/sys/kern
Commit message [Author, Age, Files, Lines]
* You always spot the typos after you have committed.. Start sentence with a Cap.
  [julian, 2004-07-19, 1 file, -1/+1]
* Allow the user who calls doadump() from the kernel debugger to not get a page fault if he has not defined a dump device. Panic can often not do a dump as it can hang forever in some cases. The original PR was for amd64 only. This is a generalised version of that change.
  PR: amd64/67712
  Submitted by: wjw@withagen.nl <Willen Jan Withagen>
  [julian, 2004-07-19, 1 file, -2/+11]
* Reimplement contigmalloc(9) with an algorithm which stands a greatly-improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time.

  The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the system's free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated.

  The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked.

  There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes.

  A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation.

  When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well.

  This invalidates the BUGS section of the contigmalloc(9) manpage.
  [green, 2004-07-19, 1 file, -27/+47]
* When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theoretically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happened. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..)
  Reviewed by: peter
  [julian, 2004-07-18, 7 files, -37/+40]
* Now we have NO_ADAPTIVE_MUTEXES option, so use it here too.
  Missed by: scottl
  [pjd, 2004-07-18, 1 file, -1/+1]
* After maintaining previous behaviour in writing out the core notes, it's time now to break with the past: do not write the PID in the first note.

  Rationale:
  1. [impact of the breakage] Process IDs in core files serve no immediate purpose to the debugger itself. They are only useful to relate a core file to a process. This can provide context to the person looking at the core file, provided one keeps track of this. Overall, not having the PID in the core file is only in very rare occasions unfortunate.
  2. [reason of the breakage] Having one PRSTATUS note contain the PID, while all others contain the LWPID of the corresponding kernel thread creates an irregularity for the debugger that cannot easily be worked around. This is caused by libthread_db correlating user thread IDs to kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs.

  Update comments accordingly.
  [marcel, 2004-07-18, 1 file, -8/+5]
* The recent changes to control message passing broke some things that get certain types of control messages (ping6 and rtsol are examples). This gets the new code closer to working:

  1) Collect control mbufs for processing in the controlp == NULL case, so that they can be freed by externalize.
  2) Loop over the list of control mbufs, as the externalize function may not know how to deal with chains.
  3) In the case where there is no externalize function, remember to add the control mbuf to the controlp list so that it will be returned.
  4) After adding stuff to the controlp list, walk to the end of the list of stuff that was added, in case we added a chain.

  This code can be further improved, but this is enough to get most things working again.
  Reviewed by: rwatson
  [dwmalone, 2004-07-18, 1 file, -12/+16]
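Point 4 of the commit above (walking to the end of whatever was just added) can be illustrated with a small sketch. This is not the kernel code: `struct sketch_mbuf` and `sketch_append_chain()` are hypothetical stand-ins, and only the `m_next` linkage is modelled.

```c
#include <stddef.h>

/* Hypothetical minimal stand-in for an mbuf linked through m_next. */
struct sketch_mbuf {
	struct sketch_mbuf *m_next;
	int tag;
};

/*
 * Link "chain" at *tailp, then walk to the end of what was added.
 * The returned pointer addresses the final (NULL) m_next slot, so the
 * next append lands after the whole chain rather than overwriting it.
 */
struct sketch_mbuf **
sketch_append_chain(struct sketch_mbuf **tailp, struct sketch_mbuf *chain)
{
	*tailp = chain;
	while (*tailp != NULL)
		tailp = &(*tailp)->m_next;
	return (tailp);
}
```

Without the walk, a second append after a two-element chain would replace the chain's second element instead of following it.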
* Add doxygen doc comments for most of newbus and the BUS interface.
  [dfr, 2004-07-18, 2 files, -152/+1241]
* Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for quite some time, and has been extensively tested on i386 and sparc64. It shows measurable performance gains in many circumstances, and few negative effects. It would be nice in the future if adaptive mutexes actually went to sleep after a certain amount of spinning, but that will require quite a bit more testing.
  [scottl, 2004-07-18, 1 file, -3/+3]
* Remove GIANT_REQUIRED from vmapbuf().
  [alc, 2004-07-18, 1 file, -2/+0]
* Drop Giant and acquire the UNIX domain socket subsystem lock a bit earlier in unp_connect() so that vp->v_socket can't change between our copying its value to a local variable and later use of that variable. This may have been responsible for a panic during shutdown that I experienced, involving the simultaneous closing of a listen socket by rpcbind and a new connection being made to rpcbind by mountd.
  [rwatson, 2004-07-18, 1 file, -4/+4]
* Fix typo.
  [davidxu, 2004-07-17, 1 file, -1/+1]
* Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.
  [dwmalone, 2004-07-17, 1 file, -34/+86]
* - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking.
  - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).
  [jhb, 2004-07-16, 7 files, -19/+25]
* Whitespace fix.
  [jhb, 2004-07-16, 1 file, -1/+1]
* Improve readability a bit by changing some code at the end of a function that did: if (foo) return else blah to just do the simpler if (!foo) blah instead.
  [jhb, 2004-07-16, 1 file, -6/+2]
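The rewrite described in this commit can be sketched as two equivalent functions. The names `foo`, `blah()`, and the `blah_calls` counter are placeholders taken from or invented for the commit message's pseudocode, not the code that was actually changed.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the side effect called "blah". */
int blah_calls;

void
blah(void)
{
	blah_calls++;
}

/* Original shape: early return in one branch, work in the else branch. */
void
tail_before(bool foo)
{
	if (foo)
		return;
	else
		blah();
}

/* Simplified shape: a single negated test guards the work. */
void
tail_after(bool foo)
{
	if (!foo)
		blah();
}
```

The equivalence only holds because the branch sits at the end of the function; with further statements after it, the early return would skip them and the two forms would differ.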
* Add a SUSER_RUID flag to suser_cred. This flag indicates that we want to check if the *real* user is the superuser (vs. the normal behaviour, which checks the effective user).
  Reviewed by: rwatson
  [cperciva, 2004-07-16, 1 file, -3/+2]
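As a rough illustration of the real-versus-effective distinction this flag selects, the following sketch uses a reduced credential structure. The `sketch_` names and the flag's numeric value are assumptions made for the example; the real suser_cred() performs further checks (jail handling, for instance) that are omitted here.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical flag value; the kernel's actual constant may differ. */
#define SKETCH_SUSER_RUID 0x2

/* Hypothetical, reduced credential holding only the two UIDs of interest. */
struct sketch_cred {
	uid_t cr_uid;	/* effective user ID */
	uid_t cr_ruid;	/* real user ID */
};

/* Returns 0 when the selected UID is the superuser, EPERM otherwise. */
int
sketch_suser_cred(const struct sketch_cred *cred, int flags)
{
	uid_t uid;

	uid = (flags & SKETCH_SUSER_RUID) ? cred->cr_ruid : cred->cr_uid;
	return (uid == 0 ? 0 : EPERM);
}
```

A process running a setuid-root binary on behalf of an ordinary user passes the default check but fails the one made with the real-UID flag.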
* When entering soclose(), assert that SS_NOFDREF is not already set.
  [rwatson, 2004-07-16, 1 file, -0/+2]
* Preparation commit for the tty cleanups that will follow in the near future: rename ttyopen() -> tty_open() and ttyclose() -> tty_close(). We need the ttyopen() and ttyclose() for the new generic cdevsw functions for tty devices in order to have consistent naming.
  [phk, 2004-07-15, 5 files, -10/+10]
* Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events. A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".
  [phk, 2004-07-15, 5 files, -2/+12]
* Cleanup shutdown output.
  [alfred, 2004-07-15, 2 files, -7/+4]
* Tidy up system shutdown.
  [alfred, 2004-07-15, 2 files, -6/+24]
* Disable SIGIO for now, leave a comment as to why it's busted and hard to fix.
  [alfred, 2004-07-15, 1 file, -0/+20]
* Clean up the output on reboot by keeping completion messages on the same line as the announcement. Someone should probably update the "buffers remaining" message since we now no longer should have any buffers remaining at that point.
  [njl, 2004-07-15, 1 file, -2/+2]
* A module with no modevent function gets modevent_nop() as default. Until now the function has just returned zero for any event, but that is downright wrong for MOD_UNLOAD and not very useful for any future events we add where it may be crucial to be able to tell if the event was unhandled or successful.

  Change the function to return as follows:
    MOD_LOAD      -> 0
    MOD_UNLOAD    -> EBUSY
    anything else -> EOPNOTSUPP
  [phk, 2004-07-14, 1 file, -1/+9]
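The return-value table in this commit maps directly onto a switch statement. The sketch below assumes a hypothetical `sketch_modevent` enum in place of the kernel's event constants; only the mapping itself comes from the commit message.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's module event constants. */
enum sketch_modevent {
	SK_MOD_LOAD,
	SK_MOD_UNLOAD,
	SK_MOD_SHUTDOWN,
	SK_MOD_QUIESCE
};

/* Default handler for a module that supplies no event function. */
int
sketch_modevent_nop(enum sketch_modevent what)
{
	switch (what) {
	case SK_MOD_LOAD:
		return (0);		/* loading trivially succeeds */
	case SK_MOD_UNLOAD:
		return (EBUSY);		/* cannot promise a safe unload */
	default:
		return (EOPNOTSUPP);	/* event was not handled */
	}
}
```

Returning EOPNOTSUPP rather than 0 for everything else lets a caller tell "unhandled" apart from "handled successfully", which is the distinction the commit message calls crucial for future events.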
* In addition to the real user ID check, do an explicit jail check to ensure that the caller is not prison root. The intention is to fix file descriptor creation so that prison root can not use the last remaining file descriptors. This privilege should be reserved for non-jailed root users.
  Approved by: bmilekic (mentor)
  [csjp, 2004-07-14, 1 file, -2/+3]
* Make FIOASYNC, FIOSETOWN and FIOGETOWN work on kqueues.
  [alfred, 2004-07-14, 1 file, -2/+29]
* Set TDF_NEEDRESCHED when a higher priority thread is scheduled in sched_add() rather than just doing it in sched_wakeup(). The old ithread preemption code used to set NEEDRESCHED unconditionally if it didn't preempt which masked this bug in SCHED_4BSD.
  Noticed by: jake
  Reported by: kensmith, marcel
  [jhb, 2004-07-13, 1 file, -1/+1]
* Give kldunload a -f(orce) argument.

  Add a MOD_QUIESCE event for modules. This should return error (EBUSY) if the module is in use.

  MOD_UNLOAD should now only fail if it is impossible (as opposed to inconvenient) to unload the module. Valid reasons are memory references into the module which cannot be tracked down and eliminated.

  When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is not given, MOD_QUIESCE failing will also prevent the unload.

  For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as success.

  Document that modules should return EOPNOTSUPP for unknown events.
  [phk, 2004-07-13, 7 files, -18/+53]
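The unload decision described in this commit can be condensed into a short sketch. It models only the ordering of the two events and the handling of their results as stated in the message; `struct sketch_module` and `sketch_kldunload()` are hypothetical, and the real code path dispatches events to the module rather than reading stored results.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of what a module returns for each event. */
struct sketch_module {
	int quiesce_result;	/* return value for MOD_QUIESCE */
	int unload_result;	/* return value for MOD_UNLOAD */
};

/* Returns 0 if the module would be unloaded, or the error that stopped it. */
int
sketch_kldunload(const struct sketch_module *m, bool force)
{
	int error;

	error = m->quiesce_result;
	if (error == EOPNOTSUPP)	/* old module: treat as success */
		error = 0;
	if (error != 0 && !force)	/* busy, and the user did not insist */
		return (error);
	return (m->unload_result);	/* a MOD_UNLOAD failure is always final */
}
```

Forcing therefore overrides "inconvenient" (MOD_QUIESCE) but never "impossible" (MOD_UNLOAD).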
* Add kldunloadf() system call. Stay tuned for following commit messages.
  [phk, 2004-07-13, 1 file, -0/+1]
* Fix compilation.
  [phk, 2004-07-13, 1 file, -1/+1]
* Replace "uid != 0" with "suser(td->td_ucred) != 0" when checking if we've hit the maximum number of processes. The last ten processes are reserved for the *non-jailed* superuser.
  [cperciva, 2004-07-13, 1 file, -1/+2]
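The reservation rule can be sketched as a predicate. Only the "last ten processes" figure and the non-jailed superuser condition come from the commit message; the function name, parameters, and exact boundary comparisons are assumptions for illustration.

```c
#include <stdbool.h>

/* Slots held back for the non-jailed superuser, per the commit message. */
#define SKETCH_RESERVED_PROCS 10

/*
 * Decide whether a new process may be created. "nonjailed_root" stands
 * in for the result of the kernel's superuser check on the caller.
 */
bool
sketch_may_fork(int nprocs, int maxproc, bool nonjailed_root)
{
	if (nprocs >= maxproc)
		return (false);		/* hard limit applies to everyone */
	if (nprocs >= maxproc - SKETCH_RESERVED_PROCS && !nonjailed_root)
		return (false);		/* reserved slots are root-only */
	return (true);
}
```

Comparing a bare UID against 0 would also admit root inside a jail; routing the decision through the superuser check is what excludes prison root from the reserved slots.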
* Add code to support debugging threaded process.

  1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread the current user thread is running on. Add tm_dflags into kse_thr_mailbox; the flags are written by the debugger and tell the UTS and kernel what should be done when the process is being debugged. Currently there are two flags, TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP tells the kernel to turn on single stepping, or to turn it off if the flag is not set. TMDF_DONOTRUNUSER tells the kernel to schedule an upcall whenever possible; to the UTS, it means do not run the user thread until the debugger clears the flag. This behaviour is necessary because gdb wants to resume only one thread when that thread's pc is at a breakpoint and the thread needs to go forward: to keep other threads from sneaking past the breakpoint, gdb needs to remove the breakpoint and wants only one thread to go. Also, add km_lwp to kse_mailbox; the lwp id is copied to kse_thr_mailbox at context switch time when the process is not being debugged, so when the process is attached, the debugger can map kernel thread to user thread.

  2. Add p_xthread to proc structure and td_xsig to thread structure. p_xthread is used by a thread when it wants to report an event to the debugger. Every thread can set the pointer; in particular, when it is used in ptracestop, the last thread reporting an event wins the race. Every thread has a td_xsig to exchange a signal with the debugger. A thread uses the TDF_XSIG flag to indicate it is reporting a signal to the debugger; if the flag is not cleared, the thread will keep retrying until it is cleared by the debugger. p_xthread may be used by the debugger to indicate the CURRENT thread. The p_xstat is still in proc structure to keep wait() working; in future, we may just use td_xsig.

  3. Add TDF_DBSUSPEND flag, which is used by the debugger to suspend a thread. When the process stops, the debugger can set the flag for a thread; the thread will check the flag in thread_suspend_check and enter a loop until the flag is cleared by the debugger, the process is detached, or the process is exiting. The flag is also checked in ptracestop, so the debugger can temporarily suspend a thread even if the thread wants to exchange a signal.

  4. Currently, in ptrace, we always resume all threads, but if a thread already has the TDF_DBSUSPEND flag set by the debugger, it won't run.

  Encouraged by: marcel, julian, deischen
  [davidxu, 2004-07-13, 1 file, -75/+153]
* Implement following commands: PT_CLEARSTEP, PT_SETSTEP, PT_SUSPEND, PT_RESUME, PT_GETNUMLWPS, PT_GETLWPLIST.
  [davidxu, 2004-07-13, 1 file, -10/+109]
* Add code to support debugging threaded process.

  1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread the current user thread is running on. Add tm_dflags into kse_thr_mailbox; the flags are written by the debugger and tell the UTS and kernel what should be done when the process is being debugged. Currently there are two flags, TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP tells the kernel to turn on single stepping, or to turn it off if the flag is not set. TMDF_DONOTRUNUSER tells the kernel to schedule an upcall whenever possible; to the UTS, it means do not run the user thread until the debugger clears the flag. This behaviour is necessary because gdb wants to resume only one thread when that thread's pc is at a breakpoint and the thread needs to go forward: to keep other threads from sneaking past the breakpoint, gdb needs to remove the breakpoint and wants only one thread to go. Also, add km_lwp to kse_mailbox; the lwp id is copied to kse_thr_mailbox at context switch time when the process is not being debugged, so when the process is attached, the debugger can map kernel thread to user thread.

  2. Add p_xthread to proc structure and td_xsig to thread structure. p_xthread is used by a thread when it wants to report an event to the debugger. Every thread can set the pointer; in particular, when it is used in ptracestop, the last thread reporting an event wins the race. Every thread has a td_xsig to exchange a signal with the debugger. A thread uses the TDF_XSIG flag to indicate it is reporting a signal to the debugger; if the flag is not cleared, the thread will keep retrying until it is cleared by the debugger. p_xthread may be used by the debugger to indicate the CURRENT thread. The p_xstat is still in proc structure to keep wait() working; in future, we may just use td_xsig.

  3. Add TDF_DBSUSPEND flag, which is used by the debugger to suspend a thread. When the process stops, the debugger can set the flag for a thread; the thread will check the flag in thread_suspend_check and enter a loop until the flag is cleared by the debugger, the process is detached, or the process is exiting. The flag is also checked in ptracestop, so the debugger can temporarily suspend a thread even if the thread wants to exchange a signal.

  4. Currently, in ptrace, we always resume all threads, but if a thread already has the TDF_DBSUSPEND flag set by the debugger, it won't run.

  Encouraged by: marcel, julian, deischen
  [davidxu, 2004-07-13, 3 files, -46/+86]
* Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.)

  Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
  [alc, 2004-07-13, 2 files, -4/+0]
* Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a better name. I have a kern_[sg]etsockopt which I plan to commit shortly, but the arguments to these functions will be quite different from so_setsockopt.
  Approved by: alfred
  [dwmalone, 2004-07-12, 1 file, -1/+1]
* Writers must hold both sched_lock and the process lock; therefore, readers need only obtain the process lock.
  [mtm, 2004-07-12, 2 files, -10/+5]
* Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
  [alfred, 2004-07-12, 8 files, -14/+15]
* Change kse_switchin to accept kse_thr_mailbox pointer, the syscall will be used heavily in debugging KSE threads. This breaks libpthread on IA64, but because libpthread was not in 5.2.1 release, I would like to change it so we needn't introduce another syscall.
  [davidxu, 2004-07-12, 2 files, -11/+22]
* Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts. Tune the timeout from 5 seconds to 12 seconds. Provide a sysctl to show how many reconnects the NFS client has done.
  Seems to fix IPv6 from: kuriyama
  [alfred, 2004-07-12, 1 file, -0/+19]
* Implement the PT_LWPINFO request. This request can be used by the tracing process to obtain information about the LWP that caused the traced process to stop. Debuggers can use this information to select the thread currently running on the LWP as the current thread.

  The request has been made compatible with NetBSD for as much as possible. This implementation differs from NetBSD in the following ways:
  1. The data argument is allowed to be smaller than the size of the ptrace_lwpinfo structure known to the kernel, but not 0. This is opposite to what NetBSD allows. The reason for this is that we can extend the structure without affecting older binaries.
  2. On NetBSD the tracing process is to set the pl_lwpid field to the Id of the LWP it wants information of. We don't do that. Our ptrace interface allows passing the LWP Id instead of the PID. The tracing process is to set the PID to the LWP Id it wants information of.
  3. When the PID is actually the PID of the tracing process, this request returns the information about the LWP that caused the process to stop. This was the whole purpose of the request in the first place.

  When the traced process has exited, this request will return the LWP Id 0, indicating that the process state is not the result of an event specific to a LWP.
  [marcel, 2004-07-12, 3 files, -0/+23]
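Point 1 above (a caller-supplied size that may be smaller than the kernel's structure, but not 0) is what lets the structure grow without breaking older binaries. A minimal sketch of that size check follows, assuming a reduced `struct sketch_lwpinfo` and a hypothetical `sketch_copy_lwpinfo()` helper; the rejection of negative sizes is likewise an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the kernel's ptrace_lwpinfo. */
struct sketch_lwpinfo {
	int pl_lwpid;	/* LWP that caused the stop */
	int pl_event;	/* field an older binary might not know about */
};

/*
 * Copy at most "data" bytes of the kernel's view to the caller.
 * A caller built against an older, shorter layout passes a smaller
 * size and simply receives the leading fields it knows about.
 */
int
sketch_copy_lwpinfo(void *dst, int data, const struct sketch_lwpinfo *src)
{
	if (data <= 0 || (size_t)data > sizeof(*src))
		return (EINVAL);
	memcpy(dst, src, (size_t)data);
	return (0);
}
```

This scheme relies on new fields only ever being appended, so that any prefix of the structure remains a valid older layout.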
* Dump the actual bad values when this assertion is tripped.
  [alfred, 2004-07-12, 1 file, -1/+3]
* Make kdb_dbbe_select() available as an interface function. This allows changing the backend from outside the KDB frontend. For example from within a backend. Rewrite kdb_sysctl_current to make use of this function as well.
  [marcel, 2004-07-12, 1 file, -9/+20]
* Use sockbuf_pushsync() to synchronize stack and socket buffer state in soreceive() after removing an MT_SONAME mbuf from the head of the socket buffer.

  When processing MT_CONTROL mbufs in soreceive(), first remove all of the MT_CONTROL mbufs from the head of the socket buffer to a local mbuf chain, then feed them into dom_externalize() as a set, which both avoids thrashing the socket buffer lock when handling multiple control mbufs, and also avoids races with other threads acting on the socket buffer when the socket buffer mutex is released to enter the externalize code. Existing races that might occur if the protocol externalize method blocked during processing have also been closed.

  Now that we synchronize socket buffer and stack state following modifications to the socket buffer, turn the manual synchronization that previously followed control mbuf processing into a set of assertions. This can eventually be removed.

  The soreceive() code is now substantially more MPSAFE.
  [rwatson, 2004-07-11, 1 file, -34/+47]
* Add sockbuf_pushsync(), an inline function that, following a change to the head of the mbuf chains in a socket buffer, re-synchronizes the cache pointers used to optimize socket buffer appends. This will be used by soreceive() before dropping socket buffer mutexes to make sure a consistent version of the socket buffer is visible to other threads.

  While here, update copyright to account for substantial rewrite of much socket code required for fine-grained locking.
  [rwatson, 2004-07-11, 1 file, -0/+38]
* Better descriptions of the cdev malloc class and mutex.
  [phk, 2004-07-11, 1 file, -2/+2]
* Add additional annotations to soreceive(), documenting the effects of locking on 'nextrecord' and concerns regarding potentially inconsistent or stale use of socket buffer or stack fields if they aren't carefully synchronized whenever the socket buffer mutex is released. Document that the high-level sblock() prevents races against other readers on the socket.

  Also document the 'type' logic as to how soreceive() guarantees that it will only return one of normal data or inline out-of-band data.
  [rwatson, 2004-07-11, 1 file, -1/+35]
* Expand and rewrite documentation using doxygen markup so that we can generate funky web pages from it.
  [dfr, 2004-07-11, 1 file, -49/+206]
* Fix braino: Make sure there is a current backend before we return its name in the debug.kdb.current sysctl. All other dereferences are properly guarded, but this one was overlooked.
  Reported by: Morten Rodal (morten at rodal dot no)
  [marcel, 2004-07-11, 1 file, -2/+5]