FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message (Collapse)	Author	Age	Files	Lines
*	Last step to make mq_notify conform to POSIX standard, If the process	davidxu	2005-11-30	1	-0/+2
\| \| \| \| \|	has successfully attached a notification request to the message queue via a queue descriptor, file closing should remove the attachment.
*	Moderate rewrite of kernel ktrace code to attempt to generally improve	rwatson	2005-11-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
*	Add support for queueing SIGCHLD same as other UNIX systems did.	davidxu	2005-11-08	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For each child process whose status has been changed, a SIGCHLD instance is queued, if the signal is stilling pending, and process changed status several times, signal information is updated to reflect latest process status. If wait() returns because the status of a child process is available, pending SIGCHLD signal associated with the child process is discarded. Any other pending SIGCHLD signals remain pending. The signal information is allocated at the same time when proc structure is allocated, if process signal queue is fully filled or there is a memory shortage, it can still send the signal to process. There is a booting time tunable kern.sigqueue.queue_sigchild which can control the behavior, setting it to zero disables the SIGCHLD queueing feature, the tunable will be removed if the function is proved that it is stable enough. Tested on: i386 (SMP and UP)
*	Add thread_find() function to search a thread by lwpid.	davidxu	2005-11-03	1	-0/+1
\|
*	Reorganize the interrupt handling code a bit to make a few things cleaner	jhb	2005-10-25	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and increase flexibility to allow various different approaches to be tried in the future. - Split struct ithd up into two pieces. struct intr_event holds the list of interrupt handlers associated with interrupt sources. struct intr_thread contains the data relative to an interrupt thread. Currently we still provide a 1:1 relationship of events to threads with the exception that events only have an associated thread if there is at least one threaded interrupt handler attached to the event. This means that on x86 we no longer have 4 bazillion interrupt threads with no handlers. It also means that interrupt events with only INTR_FAST handlers no longer have an associated thread either. - Renamed struct intrhand to struct intr_handler to follow the struct intr_foo naming convention. This did require renaming the powerpc MD struct intr_handler to struct ppc_intr_handler. - INTR_FAST no longer implies INTR_EXCL on all architectures except for powerpc. This means that multiple INTR_FAST handlers can attach to the same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach to the same interrupt. Sharing INTR_FAST handlers may not always be desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun either. Drivers can always still use INTR_EXCL to ask for an interrupt exclusively. The way this sharing works is that when an interrupt comes in, all the INTR_FAST handlers are executed first, and if any threaded handlers exist, the interrupt thread is scheduled afterwards. This type of layout also makes it possible to investigate using interrupt filters ala OS X where the filter determines whether or not its companion threaded handler should run. - Aside from the INTR_FAST changes above, the impact on MD interrupt code is mostly just 's/ithread/intr_event/'. - A new MI ddb command 'show intrs' walks the list of interrupt events dumping their state. It also has a '/v' verbose switch which dumps info about all of the handlers attached to each event. - We currently don't destroy an interrupt thread when the last threaded handler is removed because it would suck for things like ppbus(8)'s braindead behavior. The code is present, though, it is just under #if 0 for now. - Move the code to actually execute the threaded handlers for an interrrupt event into a separate function so that ithread_loop() becomes more readable. Previously this code was all in the middle of ithread_loop() and indented halfway across the screen. - Made struct intr_thread private to kern_intr.c and replaced td_ithd with a thread private flag TDP_ITHREAD. - In statclock, check curthread against idlethread directly rather than curthread's proc against idlethread's proc. (Not really related to intr changes) Tested on: alpha, amd64, i386, sparc64 Tested on: arm, ia64 (older version of patch by cognet and marcel)
*	Make p_itimers as a pointer, so file sys/proc.h does not need to include	davidxu	2005-10-23	1	-2/+1
\| \| \| \|	sys/timers.h.
*	Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC	davidxu	2005-10-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.
*	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most	davidxu	2005-10-14	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
*	Un-staticize runningbufwakeup() and staticize updateproc.	truckman	2005-09-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>
*	Use the refcount API to implement reference counts on process argument	jhb	2005-09-27	1	-5/+0
\| \| \| \| \| \| \|	structures rather than using a global mutex to protect the reference counts. Tested on: i386, alpha, sparc64
*	- Add a new simple facility for marking the current thread as being in a	jhb	2005-09-15	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	state where sleeping on a sleep queue is not allowed. The facility doesn't support recursion but uses a simple private per-thread flag (TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is set and INVARIANTS is enabled. - Use this new facility to replace the g_xup and g_xdown mutexes that were (ab)used to achieve similar behavior. - Disallow sleeping in interrupt threads when invoking interrupt handlers. MFC after: 1 week Reviewed by: phk
*	Remove public declarations of variables that were forgotten when they were	obrien	2005-08-10	1	-1/+0
\| \| \| \|	made static.
*	Validate if the value written into {FS,GS}.base is a canonical	davidxu	2005-07-10	1	-1/+1
\| \| \| \| \| \| \| \| \|	address, writting non-canonical address can cause kernel a panic, by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring only canonical values get written to the registers. Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com > Approved by: re (scottl)
*	Use the proc mtx to prevent simultaneous changes to p_aioinfo.	alc	2005-05-30	1	-1/+1
\|
*	Remove thread_upcall_check, it was used to avoid race bug in earlier	davidxu	2005-05-27	1	-1/+0
\| \| \| \| \|	day's sleep queue code, today the bug no longer exists. please see 04/25/2004 freebsd-threads@ mailing list archive.
*	Use low level constructs borrowed from interrupt threads to wait for	ups	2005-05-23	1	-1/+2
\| \| \| \| \|	work in proc0. Remove the TDP_WAKEPROC0 workaround.
*	Change cpu_set_kse_upcall to more generic style, so we can reuse it	davidxu	2005-04-23	1	-1/+3
\| \| \| \| \| \| \|	in other codes. Add cpu_set_user_tls, use it to tweak user register and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall, but since cpu_set_kse_upcall is also used by M:N threads which may not need this feature, so I wrote a separated cpu_set_user_tls.
*	Fix a race condition between kern_wait() and thread_stopped().	davidxu	2005-04-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Problem is in kern_wait(), parent process steps through children list, once a child process is skipped, and later even if the child is stopped, parent process still sleeps in msleep(), the race happens if parent masked SIGCHLD. Submitted by : Peter Edwards peadar.edwards at gmail dot com MFC after : 4 days
*	Bring a working snapshot of hwpmc(4), its associated libraries, userland ↵	jkoshy	2005-04-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)
*	Introduce p_canwait() and MAC Framework and MAC Policy entry points	rwatson	2005-04-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mac_check_proc_wait(), which control the ability to wait4() specific processes. This permits MAC policies to limit information flow from children that have changed label, although has to be handled carefully due to common programming expectations regarding the behavior of wait4(). The cr_seeotheruids() check in p_canwait() is #if 0'd for this reason. The mac_stub and mac_test policies are updated to reflect these new entry points. Sponsored by: SPAWAR, SPARTA Obtained from: TrustedBSD Project
*	Sprinkle some volatile magic and rearrange things a bit to avoid race	ups	2005-04-08	1	-2/+4
\| \| \| \| \| \|	conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb
*	Bring back the WITNESS_WARN() check to _STOPEVENT() as all the callers have	jhb	2005-03-31	1	-0/+2
\| \| \| \|	been fixed for quite a while now.
*	When the softupdates worklist gets too long, threads that attempt to	das	2005-03-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	add more work are forced to process two worklist items first. However, processing an item may generate additional work, causing the unlucky thread to recursively process the worklist. Add a per-thread flag to detect this situation and avoid the recursion. This should fix the stack overflows that could occur while removing large directory trees. Tested by: kris Reviewed by: mckusick
*	In stange circumstances we may end up being the last reference to a	phk	2005-03-17	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \|	session in tprintf(). SESSRELE() needs to properly dispose of the sessions mutex. Add sessrele() which does the proper cleanup and have SESSRELE() call it. Use SESSRELE also in pgdelete(). Found by: Coverity (ID:526)
*	Allocate umtx_q from heap instead of stack, this avoids	davidxu	2005-03-05	1	-1/+1
\| \| \| \|	page fault panic in kernel under heavy swapping.
*	The td_waitset is pointing to a stack address when thread is waiting	davidxu	2005-03-04	1	-1/+0
\| \| \| \| \| \|	for a signal, because kernel stack is swappable, this causes page fault in kernel under heavy swapping case. Fix this bug by eliminating unneeded code.
*	Backout previous change (disabling of security checks for signals delivered	sobomax	2005-02-13	1	-4/+2
\| \| \| \| \| \|	in emulation layers), since it appears to be too broad. Requested by: rwatson
*	Split out kill(2) syscall service routine into user-level and kernel part, the	sobomax	2005-02-13	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not. Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning that in native mode and such protection can cause misbehaviour. As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application. Requested (sorta) by: rwatson MFC after: 2 weeks
*	Bit 0 of td_flags is now used by the priority borrowing flag, so remove	jhb	2005-01-06	1	-1/+0
\| \| \| \|	the unused placeholder constant.
*	Rework the interface between priority propagation (lending) and the	jhb	2004-12-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides where the thread can go back to normal mode (if it's normal priority is high enough to satisfy the pending lock requests) or it it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another therad's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64
*	1. make umtx sharable between processes, the way is two or more processes	davidxu	2004-12-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	call mmap() to create a shared space, and then initialize umtx on it, after that, each thread in different processes can use the umtx same as threads in same process. 2. introduce a new syscall _umtx_op to support timed lock and condition variable semantics. also, orignal umtx_lock and umtx_unlock inline functions now are reimplemented by using _umtx_op, the _umtx_op can use arbitrary id not just a thread id.
*	- Garbage collect several unused members of struct kse and struce ksegrp.	jeff	2004-12-14	1	-2/+0
\| \| \| \|	As best as I can tell, some of these were never used.
*	Remove the p_uarea and p_upages_obj fields from struct proc.	das	2004-11-20	1	-2/+3
\| \| \| \| \| \|	Prototype new routines to allocate, copy, and free pstats. Reviewed by: arch@
*	Add a new per-thread private flag: TDP_GEOM.	phk	2004-10-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This flag gets set whenever the thread posts an event on the GEOM event queue, and if the flag is set when the thread is prepared to return to userland from the kernel, g_waitidle() will be called to make sure that the posted events have completed. This can replace an insufficient number of g_waitidle() calls in various other places, and has the advantage of being failsafe: Any system call which does a VOP_OPEN()/VOP_CLOSE will now correctly wait for any geom events it posted as part of spoils or tastes. Assert that topology and Giant is not held in g_waitidle().
*	If a process needs to be swapped in, wakeup the swapper from within	scottl	2004-10-16	1	-1/+1
\| \| \| \| \| \| \| \| \|	critical_exit as the process is getting scheduled to run. This is subotimal but for now avoid the LOR between the scheduler and the sleepq systems. This is a 5.3 candidate. Submitted by: davidxu MFC After: 3 days
*	In original kern_execve() code, at the start of the function, it forces	davidxu	2004-10-06	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	all other threads to suicide, problem is execve() could be failed, and a failed execve() would change threaded process to unthreaded, this side effect is unexpected. The new code introduces a new single threading mode SINGLE_BOUNDARY, in the mode, all threads should suspend themself at user boundary except the singler. we can not use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful, suspending other threads at unknown point and later resuming them from there and forcing them to exit at user boundary may cause the process to start from a dirty state. If execve() is successful, current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at user boundary, otherwise, other threads will be resumed and their interrupted syscall will be restarted. Reviewed by: julian
*	When preempting a thread, put it back on the HEAD of its run queue.	julian	2004-10-05	1	-0/+2
\| \| \| \| \| \|	(Only really implemented in 4bsd) MFC after: 4 days
*	Break out to a separate function, the code to revert a multithreaded	julian	2004-10-05	1	-0/+1
\| \| \| \| \| \|	process back to officially being a non-threaded program. MFC after: 4 days
*	Rework how we store process times in the kernel such that we always store	jhb	2004-10-05	1	-7/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
*	Forward declare struct kaioinfo to un-void a pointer in struct proc.	alfred	2004-09-30	1	-1/+2
\|
*	A modest collection of various and sundry style, spelling, and whitespace	jhb	2004-09-24	1	-60/+62
\| \| \| \| \| \|	fixes. Submitted by: bde (mostly)
*	Update locking notes on several fields to reflect locking already in the	jhb	2004-09-23	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \|	tree: - td_standin is (k + a) as it is only touched by either curthread or when a thread is being created. - td_upcall is (k + j) - td_sticks is (k) rather than the earlier (j) note. - td_uuticks and td_usticks are both (k). - td_intrval is (j) - Neither kg_nextupcall or kg_upquantum seem to be locked and that seems to be on purpose, so mark those as (n).
*	Tidy a comment.	jhb	2004-09-13	1	-1/+1
\|
*	Revert the previous round of changes to td_pinned. The scheduler isn't	scottl	2004-09-11	1	-0/+1
\| \| \| \| \| \| \| \|	fully initialed when the pmap layer tries to call sched_pini() early in the boot and results in an quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian
*	Make up my mind if cpu pinning is stored in the thread structure or the	julian	2004-09-10	1	-1/+0
\| \| \| \| \| \| \| \| \|	scheduler specific extension to it. Put it in the extension as the implimentation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days
*	Refactor a bunch of scheduler code to give basically the same behaviour	julian	2004-09-05	1	-93/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
*	Give setrunqueue() and sched_add() more of a clue as to	julian	2004-09-01	1	-1/+7
\| \| \| \| \| \|	where they are coming from and what is expected from them. MFC after: 2 days
*	Note free bit.. as per the flags word.	julian	2004-08-31	1	-0/+1
\| \| \| \|	MFC after: 2 days
*	Remove TDP_USTATCLOCK, we no longer need it because we now always	davidxu	2004-08-31	1	-1/+0
\| \| \| \| \| \| \| \| \|	update tick count for userland in thread_userret. This change also removes a "no upcall owned" panic because fuword() schedules an upcall under heavily loaded, and code assumes there is no upcall can occur. Reported and Tested by: Peter Holm <peter@holm.cc>
*	Remove an unneeded argument..	julian	2004-08-31	1	-1/+1
\| \| \| \| \| \| \| \| \|	The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument. Having both proc and thread as an argumen tjust gives an opportunity for them to get out sync. MFC after: 3 days