author | archie <archie@FreeBSD.org> | 2002-10-02 18:01:51 +0000
---|---|---
committer | archie <archie@FreeBSD.org> | 2002-10-02 18:01:51 +0000
commit | f43114e2a8ad080fb49d9d5d3f2326231572fc56 |
tree | 4319a1f9d364823baec8bcd4de8d04d018a6db1a /lib/libc/sys/kse.2 |
parent | f5105eda7e9777bceb2fcf689986b7ece035cfd1 |
Add a man page for the KSE system calls.
Reviewed by: julian, ru
Diffstat (limited to 'lib/libc/sys/kse.2')
-rw-r--r-- | lib/libc/sys/kse.2 | 585 |
1 files changed, 585 insertions, 0 deletions
diff --git a/lib/libc/sys/kse.2 b/lib/libc/sys/kse.2
new file mode 100644
index 0000000..0b1ebbe
--- /dev/null
+++ b/lib/libc/sys/kse.2
@@ -0,0 +1,585 @@
+.\" Copyright (c) 2002 Packet Design, LLC.
+.\" All rights reserved.
+.\"
+.\" Subject to the following obligations and disclaimer of warranty,
+.\" use and redistribution of this software, in source or object code
+.\" forms, with or without modifications are expressly permitted by
+.\" Packet Design; provided, however, that:
+.\"
+.\" (i) Any and all reproductions of the source or object code
+.\" must include the copyright notice above and the following
+.\" disclaimer of warranties; and
+.\" (ii) No rights are granted, in any manner or form, to use
+.\" Packet Design trademarks, including the mark "PACKET DESIGN"
+.\" on advertising, endorsements, or otherwise except as such
+.\" appears in the above copyright notice or in the software.
+.\"
+.\" THIS SOFTWARE IS BEING PROVIDED BY PACKET DESIGN "AS IS", AND
+.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, PACKET DESIGN MAKES NO
+.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING
+.\" THIS SOFTWARE, INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED
+.\" WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
+.\" OR NON-INFRINGEMENT. PACKET DESIGN DOES NOT WARRANT, GUARANTEE,
+.\" OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS
+.\" OF THE USE OF THIS SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY,
+.\" RELIABILITY OR OTHERWISE. IN NO EVENT SHALL PACKET DESIGN BE
+.\" LIABLE FOR ANY DAMAGES RESULTING FROM OR ARISING OUT OF ANY USE
+.\" OF THIS SOFTWARE, INCLUDING WITHOUT LIMITATION, ANY DIRECT,
+.\" INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE, OR CONSEQUENTIAL
+.\" DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, LOSS OF
+.\" USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY THEORY OF
+.\" LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+.\" THE USE OF THIS SOFTWARE, EVEN IF PACKET DESIGN IS ADVISED OF
+.\" THE POSSIBILITY OF SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd September 10, 2002
+.Dt KSE 2
+.Os
+.Sh NAME
+.Nm kse
+.Nd "kernel support for user threads"
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In sys/types.h
+.In sys/kse.h
+.Ft int
+.Fn kse_create "struct kse_mailbox *mbx" "int newgroup"
+.Ft int
+.Fn kse_exit void
+.Ft int
+.Fn kse_release void
+.Ft int
+.Fn kse_wakeup "struct kse_mailbox *mbx"
+.Ft int
+.Fn kse_thr_interrupt "struct kse_thr_mailbox *tmbx"
+.Sh DESCRIPTION
+These functions implement kernel support for multi-threaded processes.
+.\"
+.Ss Overview
+.\"
+Traditionally, user threading has been implemented in one of two ways:
+either all threads are managed in user space and the kernel is unaware
+of any threading (also known as
+.Dq "N to 1" ) ,
+or else separate processes sharing
+a common memory space are created for each thread (also known as
+.Dq "N to N" ) .
+These approaches have advantages and disadvantages:
+.Bl -column "- Cannot utilize multiple CPUs" "+ Can utilize multiple CPUs"
+.It Sy "User threading" Ta Sy "Kernel threading"
+.It "+ Lightweight" Ta "- Heavyweight"
+.It "+ User controls scheduling" Ta "- Kernel controls scheduling"
+.It "- Syscalls must be wrapped" Ta "+ No syscall wrapping required"
+.It "- Cannot utilize multiple CPUs" Ta "+ Can utilize multiple CPUs"
+.El
+.Pp
+The KSE system is a
+hybrid approach that achieves the advantages of both the user and kernel
+threading approaches.
+The underlying philosophy of the KSE system is to give kernel support
+for user threading without taking away any of the user threading library's
+ability to make scheduling decisions.
+A kernel-to-user upcall mechanism is used to pass control to the user
+threading library whenever a scheduling decision needs to be made.
+Arbitrarily many user threads are multiplexed onto a fixed number of
+virtual CPUs supplied by the kernel.
+This can be thought of as an
+.Dq "N to M"
+threading scheme.
+.Pp
+Some general implications of this approach include:
+.Bl -bullet
+.It
+The user process can run multiple threads simultaneously on multi-processor
+machines.
+The kernel grants the process virtual CPUs to schedule as it
+wishes; these may run concurrently on real CPUs.
+.It
+All operations that block in the kernel become asynchronous, allowing
+the user process to schedule another thread when any thread blocks.
+.It
+Multiple thread schedulers within the same process are possible, and they
+may operate independently of each other.
+.El
+.\"
+.Ss Definitions
+.\"
+KSE allows a user process to have multiple
+.Sy threads
+of execution in existence at the same time, some of which may be blocked
+in the kernel while others may be executing or blocked in user space.
+A
+.Sy "kernel scheduling entity"
+(KSE) is a
+.Dq "virtual CPU"
+granted to the process for the purpose of executing threads.
+A thread that is currently executing is always associated with
+exactly one KSE, whether executing in user space or in the kernel.
+The KSE is said to be
+.Sy assigned
+to the thread.
+.Pp
+The KSE becomes
+.Sy unassigned ,
+and the associated thread is suspended, when the KSE has an associated
+.Sy mailbox
+(see below) and any of the following occurs:
+.Bl -bullet
+.It
+The thread invokes a blocking system call.
+.It
+The thread makes any other demand of the kernel that cannot be immediately
+satisfied, e.g., touches a page of memory that needs to be fetched from disk,
+causing a page fault.
+.It
+Another thread that was previously blocked in the kernel completes its
+work in the kernel (or is
+.Sy interrupted )
+and becomes ready to return to user space.
+.It
+A signal is delivered to the process, and this KSE is chosen to deliver it.
+.El
+.Pp
+In other words, as soon as there is a scheduling decision to be made,
+the KSE becomes unassigned, because the kernel does not presume to know
+how the process' other runnable threads should be scheduled.
+Unassigned KSEs always return to user space as soon as possible via
+the
+.Sy upcall
+mechanism (described below), allowing the user process to decide how
+that KSE should be utilized next.
+KSEs always complete as much work as possible in the kernel before
+becoming unassigned.
+.Pp
+A
+.Sy "KSE group"
+is a collection of KSEs that are scheduled uniformly and which share
+access to the same pool of threads, which are associated with the KSE group.
+A KSE group is the smallest entity to which a kernel scheduling
+priority may be assigned.
+For the purposes of process scheduling and accounting, each
+KSE group
+counts the same as a traditional unthreaded process.
+Individual KSEs within a KSE group are effectively indistinguishable,
+and any KSE in a KSE group may be assigned by the kernel to any runnable
+thread associated with that KSE group.
+In practice, the kernel attempts to preserve the affinity between threads
+and actual CPUs to optimize cache behavior, but this is invisible to the
+user process.
+.Pp
+Each KSE has a unique
+.Sy "KSE mailbox"
+supplied by the user process.
+A mailbox consists of a control structure containing a pointer to an
+.Sy "upcall function"
+and a user stack.
+The KSE invokes this function whenever it becomes unassigned.
+Before each upcall, the kernel updates this structure with information
+about threads that have become runnable and signals that have been
+delivered.
+Upcalls may be temporarily blocked by the user thread scheduling code
+during critical sections.
+.Pp
+Each user thread has a unique
+.Sy "thread mailbox"
+as well.
+Threads are referred to using pointers to these mailboxes when communicating
+between the kernel and the user thread scheduler.
+Each KSE's mailbox contains a pointer to the mailbox of the user thread
+that the KSE is currently executing.
+This pointer is saved when the thread blocks in the kernel.
+.Pp
+Whenever a thread blocked in the kernel is ready to return to user space,
+it is added to the KSE group's list of
+.Sy completed
+threads.
+This list is presented to the user code at the next upcall as a linked list
+of thread mailboxes.
+.\"
+.Ss Managing KSEs
+.\"
+To become multi-threaded, a process must first invoke
+.Fn kse_create .
+.Fn kse_create
+creates a new KSE (except for the very first invocation; see below).
+The KSE will be associated with the mailbox pointed to by
+.Fa mbx .
+If
+.Fa newgroup
+is non-zero, a new KSE group is also created containing the KSE.
+Otherwise, the new KSE is added to the current KSE group.
+Newly created KSEs are initially unassigned; therefore,
+they will upcall immediately.
+.Pp
+Each process initially has a single KSE in a single KSE group executing
+a single user thread.
+Since the KSE does not have an associated mailbox, it must remain assigned
+to the thread and does not perform any upcalls.
+The result is the traditional, unthreaded mode of operation.
+Therefore, as a special case, the first call to
+.Fn kse_create
+by this initial thread with
+.Fa newgroup
+equal to zero does not create a new KSE; instead, it simply associates the
+current KSE with the supplied KSE mailbox, and no immediate upcall results.
+However, the upcall will be invoked the next time the thread blocks.
+.Pp
+The kernel does not allow more KSEs to exist in a KSE group than the
+number of physical CPUs in the system (this number is available as the
+.Xr sysctl 3
+variable
+.Va hw.ncpu ) .
+Having more KSEs than CPUs would not add any value to the user process,
+as the additional KSEs would just compete with each other for access to
+the real CPUs.
+Since the extra KSEs would always be side-lined, the result
+to the application would be exactly the same as having fewer KSEs.
+There may, however, be arbitrarily many user threads, and it is up to the
+user thread scheduler to handle mapping the application's user threads
+onto the available KSEs.
+.Pp
+.Fn kse_exit
+causes the KSE assigned to the currently running thread to be destroyed.
+If this KSE is the last one in the KSE group, there must be no remaining
+threads associated with the KSE group blocked in the kernel.
+Except in the special case described below, this function does not return.
+.Pp
+As a special case, if the last remaining KSE in the last remaining KSE group
+invokes this function, then the KSE is not destroyed;
+instead, the KSE just loses the association with its mailbox and
+.Fn kse_exit
+returns normally.
+This returns the process to its original, unthreaded state.
+.Pp
+.Fn kse_release
+is used to
+.Dq park
+the KSE assigned to the currently running thread when it is not needed,
+e.g., when there are more available KSEs than runnable user threads.
+The KSE remains unassigned but does not upcall until there is a new reason to
+do so, e.g., a previously blocked thread becomes runnable.
+If successful,
+.Fn kse_release
+does not return.
+.Pp
+.Fn kse_wakeup
+is the opposite of
+.Fn kse_release .
+It causes the KSE associated with the mailbox pointed to by
+.Fa mbx
+to be woken up, causing it to upcall.
+If the KSE has already woken up for another reason, this function has no
+effect.
+The
+.Fa mbx
+may be
+.Dv NULL
+to specify
+.Dq "any KSE in the current KSE group" .
+.Pp
+.Fn kse_thr_interrupt
+is used to interrupt a thread.
+The thread must either be blocked in the kernel or assigned to a KSE
+(i.e., executing).
+The thread is then marked as interrupted.
+As soon as the thread invokes an interruptible system call (or immediately
+for threads already blocked in one), the thread will be made runnable again,
+even though the kernel operation may not have completed.
+The effect on the interrupted system call is the same as if it had been
+interrupted by a signal; typically this means an error is returned with
+.Va errno
+set to
+.Er EINTR .
+.\"
+.Ss Signals
+.\"
+When a process has at least one KSE with an associated mailbox,
+signals are no longer delivered on the process stack.
+Instead, signals are delivered via upcalls.
+Multiple signals may be delivered with one upcall.
+.Pp
+If there are multiple KSE groups in the process, which KSE group is
+chosen to deliver the signal is indeterminate.
+However, once a signal has been delivered to a specific KSE group,
+that KSE group then takes ownership of signal delivery and all subsequent
+signals are delivered via that KSE group.
+When this KSE group is destroyed, a new KSE group is chosen as needed.
+.\"
+.Ss KSE Mailboxes
+.\"
+Each KSE has a unique mailbox for user-kernel communication:
+.Bd -literal
+/* Upcall function type */
+typedef void kse_func_t(struct kse_mailbox *);
+
+/* KSE mailbox */
+struct kse_mailbox {
+    struct kse_thr_mailbox  *km_curthread;  /* Current thread */
+    struct kse_thr_mailbox  *km_completed;  /* Completed threads */
+    sigset_t                km_sigscaught;  /* Caught signals */
+    unsigned int            km_flags;       /* KSE flags */
+    kse_func_t              *km_func;       /* UTS function */
+    stack_t                 km_stack;       /* UTS context */
+    void                    *km_udata;      /* For use by the UTS */
+};
+.Ed
+.Pp
+.Va km_udata
+is an opaque pointer ignored by the kernel.
+.Pp
+.Va km_func
+points to the KSE's upcall function;
+it will be invoked using
+.Va km_stack ,
+which must remain valid for the lifetime of the KSE.
+.Pp
+.Va km_curthread
+always points to the thread that is currently assigned to this KSE if any,
+or
+.Dv NULL
+otherwise.
+This field is modified by both the kernel and the user process as follows.
+.Pp
+When
+.Va km_curthread
+is not
+.Dv NULL ,
+it is assumed to be pointing at the mailbox for the currently executing
+thread, and the KSE may be unassigned, e.g., if the thread blocks in the
+kernel.
+The kernel will then save the contents of
+.Va km_curthread
+with the blocked thread, set
+.Va km_curthread
+to
+.Dv NULL ,
+and upcall to invoke
+.Fn km_func .
+.Pp
+When
+.Va km_curthread
+is
+.Dv NULL ,
+the kernel will never perform any upcalls with this KSE; in other words,
+the KSE remains assigned to the thread even if it blocks.
+.Va km_curthread
+must be
+.Dv NULL
+while the KSE is executing critical user thread scheduler
+code that would be disrupted by an intervening upcall;
+in particular, while
+.Fn km_func
+itself is executing.
+.Pp
+Before invoking
+.Fn km_func
+in any upcall, the kernel always sets
+.Va km_curthread
+to
+.Dv NULL .
+Once the user thread scheduler has chosen a new thread to run,
+it should point
+.Va km_curthread
+at the thread's mailbox, re-enabling upcalls, and then resume the thread.
+.Em Note :
+modification of
+.Va km_curthread
+by the user thread scheduler must be atomic to avoid the race condition
+where the kernel saves a partially modified value.
+.Pp
+.Va km_completed
+points to a linked list of user threads that have completed their work
+in the kernel since the last upcall.
+The user thread scheduler should put these threads back into its
+own runnable queue.
+Each thread in a KSE group that completes is guaranteed to be
+linked into exactly one KSE's
+.Va km_completed
+list; which KSE in the group, however, is indeterminate.
+Furthermore, the thread will appear in only one upcall.
+.Pp
+.Va km_sigscaught
+contains the set of signals caught by this process since the previous
+upcall to any KSE in the process.
+As long as one or more KSEs with an associated mailbox exist in
+the user process, signals are delivered this way rather than the
+traditional way.
+.Pp
+.Va km_flags
+may contain any of the following bits OR'ed together:
+.Bl -tag -width indent
+.It \&
+(No flags are defined yet.)
+.El
+.\"
+.Ss Thread Mailboxes
+.\"
+Each user thread must have associated with it a unique
+.Vt "struct kse_thr_mailbox" :
+.Bd -literal
+/* Thread mailbox */
+struct kse_thr_mailbox {
+    ucontext_t              tm_context;     /* User thread context */
+    unsigned int            tm_flags;       /* Thread flags */
+    struct kse_thr_mailbox  *tm_next;       /* Next thread in list */
+    void                    *tm_udata;      /* For use by the UTS */
+};
+.Ed
+.Pp
+.Va tm_udata
+is an opaque pointer ignored by the kernel.
+.Pp
+.Va tm_context
+stores the context for the thread when the thread is blocked in user space.
+This field is updated by the kernel before a completed thread is returned
+to the user thread scheduler via
+.Va km_completed .
+.Pp
+.Va tm_next
+links the
+.Va km_completed
+threads together when returned by the kernel with an upcall.
+The end of the list is marked with a
+.Dv NULL
+pointer.
+.Pp
+.Va tm_flags
+may contain any of the following bits OR'ed together:
+.Bl -tag -width indent
+.It \&
+(No flags are defined yet.)
+.El
+.Sh RETURN VALUES
+.Fn kse_create ,
+.Fn kse_wakeup ,
+and
+.Fn kse_thr_interrupt
+return zero if successful.
+.Fn kse_exit
+(except when invoked by the last remaining KSE, as described above)
+and
+.Fn kse_release
+do not return if successful.
+.Pp
+All of these functions return a non-zero error code in case of an error.
+.Pp
+.Em Note :
+error codes are returned directly rather than via
+.Va errno .
+.Sh ERRORS
+.Fn kse_create
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ENXIO
+There are already as many KSEs in the KSE group as hardware processors.
+.It Bq Er EAGAIN
+The system-imposed limit on the total number of KSE groups under
+execution would be exceeded.
+The limit is given by the
+.Xr sysctl 3
+MIB variable
+.Dv KERN_MAXPROC .
+(The limit is actually ten less than this
+except for the super user.)
+.It Bq Er EAGAIN
+The user is not the super user, and the system-imposed limit on the total
+number of KSE groups under execution by a single user would be exceeded.
+The limit is given by the
+.Xr sysctl 3
+MIB variable
+.Dv KERN_MAXPROCPERUID .
+.It Bq Er EAGAIN
+The user is not the super user, and the soft resource limit corresponding
+to the resource parameter
+.Dv RLIMIT_NPROC
+would be exceeded (see
+.Xr getrlimit 2 ) .
+.It Bq Er EFAULT
+.Fa mbx
+points to an address which is not a valid part of the process address space.
+.El
+.Pp
+.Fn kse_exit
+will fail if:
+.Bl -tag -width Er
+.It Bq Er EDEADLK
+The current KSE is the last in its KSE group and there are still one or more
+threads associated with the KSE group blocked in the kernel.
+.It Bq Er ESRCH
+The current KSE has no associated mailbox, i.e., the process is operating
+in traditional, unthreaded mode (in this case use
+.Xr exit 2
+to exit the process).
+.El
+.Pp
+.Fn kse_release
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ESRCH
+The current KSE has no associated mailbox, i.e., the process is operating in
+traditional, unthreaded mode.
+.El
+.Pp
+.Fn kse_wakeup
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ESRCH
+.Fa mbx
+is not
+.Dv NULL
+and the mailbox pointed to by
+.Fa mbx
+is not associated with any KSE in the process.
+.It Bq Er ESRCH
+.Fa mbx
+is
+.Dv NULL
+and the current KSE has no associated mailbox, i.e., the process is operating
+in traditional, unthreaded mode.
+.El
+.Pp
+.Fn kse_thr_interrupt
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ESRCH
+The thread corresponding to
+.Fa tmbx
+is neither currently assigned to any KSE in the process nor blocked in the
+kernel.
+.El
+.Sh SEE ALSO
+.Xr rfork 2 ,
+.Xr pthread 3 ,
+.Xr ucontext 3
+.Rs
+.%A "Thomas E. Anderson"
+.%A "Brian N. Bershad"
+.%A "Edward D. Lazowska"
+.%A "Henry M. Levy"
+.%J "ACM Transactions on Computer Systems"
+.%N Issue 1
+.%V Volume 10
+.%D February 1992
+.%I ACM Press
+.%P pp. 53-79
+.%T "Scheduler activations: effective kernel support for the user-level management of parallelism"
+.Re
+.Sh HISTORY
+The KSE function calls first appeared in
+.Fx 5.0 .
+.Sh AUTHORS
+KSE was originally implemented by
+.An -nosplit
+.An "Julian Elischer" Aq julian@FreeBSD.org ,
+with additional contributions by
+.An "Jonathan Mini" Aq mini@FreeBSD.org ,
+.An "Daniel Eischen" Aq deischen@FreeBSD.org ,
+and
+.An "David Xu" Aq davidxu@FreeBSD.org .
+.Pp
+This manual page was written by
+.An "Archie Cobbs" Aq archie@FreeBSD.org .
+.Sh BUGS
+The KSE code is
+.Ud .
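The Managing KSEs section limits each KSE group to `hw.ncpu` KSEs and recommends parking surplus KSEs with `kse_release()`. The sketch below shows one way a user thread scheduler (UTS) might size a group under that rule. It is not taken from any real thread library. `cpu_count()` and `kses_wanted()` are hypothetical names, and the sketch never calls `kse_create()`. On FreeBSD it reads the `hw.ncpu` sysctl. On other systems it substitutes `sysconf(_SC_NPROCESSORS_ONLN)`, which is a widely available extension rather than part of POSIX.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Number of CPUs, never less than 1. On FreeBSD this reads hw.ncpu;
 * elsewhere it falls back to sysconf(_SC_NPROCESSORS_ONLN).
 */
static long
cpu_count(void)
{
#ifdef __FreeBSD__
    int ncpu;
    size_t len = sizeof(ncpu);

    if (sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0) == 0 && ncpu > 0)
        return (ncpu);
#endif
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return (n > 0 ? n : 1);
}

/*
 * Hypothetical sizing rule: at least one KSE, no more than one per
 * runnable thread (extra KSEs would only be parked), and never more
 * than ncpu (kse_create() would fail with ENXIO beyond that).
 */
static int
kses_wanted(int runnable_threads, long ncpu)
{
    assert(ncpu > 0);
    if (runnable_threads < 1)
        return (1);
    return (runnable_threads < ncpu ? runnable_threads : (int)ncpu);
}
```

For example, with 4 CPUs and 10 runnable threads the sketch requests 4 KSEs. With 3 runnable threads it requests 3, so no KSE would need to be parked.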
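The KSE Mailboxes section requires two things of `km_curthread`: it must be `NULL` while scheduler code runs that an upcall would disrupt, and the UTS must update it atomically. The sketch below models both requirements. It uses C11 `<stdatomic.h>` as a portable way to express a single, untorn pointer store, which is an assumption of this example and not something the text prescribes. `struct kse_mbx`, `struct thr_mbx` and the `uts_*` helpers are hypothetical, reduced stand-ins. The real `km_curthread` in `<sys/kse.h>` is a plain pointer, and no upcall actually happens here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kse_thr_mailbox; only an id is kept. */
struct thr_mbx {
    int id;
};

/* Hypothetical, reduced KSE mailbox: only km_curthread, declared _Atomic. */
struct kse_mbx {
    _Atomic(struct thr_mbx *) km_curthread;
};

/* Per the text, upcalls can happen only while km_curthread is non-NULL. */
static int
upcalls_enabled(struct kse_mbx *km)
{
    return (atomic_load(&km->km_curthread) != NULL);
}

/*
 * Enter a UTS critical section: clear km_curthread so that blocking in
 * the kernel cannot trigger an upcall. Returns the previous value.
 */
static struct thr_mbx *
uts_critical_enter(struct kse_mbx *km)
{
    return (atomic_exchange(&km->km_curthread, NULL));
}

/*
 * Publish the chosen thread's mailbox with one atomic store, which
 * re-enables upcalls; a real UTS would then resume that thread.
 */
static void
uts_switch_to(struct kse_mbx *km, struct thr_mbx *next)
{
    atomic_store(&km->km_curthread, next);
}
```

`atomic_exchange()` clears the field and returns its previous value in one step. A single `atomic_store()` likewise ensures the kernel can never observe a half-written pointer.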
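`km_completed` is a `NULL`-terminated list of thread mailboxes linked through `tm_next`, and the text says the UTS should return these threads to its own run queue. The sketch below shows that step. It assumes a hypothetical FIFO run queue (`struct runq`) and a reduced mailbox (`struct thr_mbx`) that keeps only `tm_next` and `tm_udata`. Once the kernel has handed the list over, the sketch reuses `tm_next` as the run-queue link. That reuse is a design choice of this example, not part of the documented interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, reduced stand-in for struct kse_thr_mailbox; the real
 * structure also has tm_context and tm_flags.
 */
struct thr_mbx {
    struct thr_mbx *tm_next;    /* next completed thread; NULL ends the list */
    void *tm_udata;             /* opaque to the kernel, owned by the UTS */
};

/* Hypothetical UTS run queue: a FIFO linked through tm_next. */
struct runq {
    struct thr_mbx *head;
    struct thr_mbx *tail;
};

/*
 * Append every mailbox on a km_completed-style list to the tail of the
 * run queue, preserving order. Returns the number of threads moved.
 */
static int
drain_completed(struct runq *rq, struct thr_mbx *completed)
{
    int n = 0;

    while (completed != NULL) {
        /* Save the kernel's link before tm_next is reused by the queue. */
        struct thr_mbx *next = completed->tm_next;

        completed->tm_next = NULL;
        if (rq->tail == NULL)
            rq->head = completed;
        else
            rq->tail->tm_next = completed;
        rq->tail = completed;
        completed = next;
        n++;
    }
    return (n);
}
```

An upcall handler would call something like `drain_completed()` while `km_curthread` is still `NULL`, before choosing the next thread to run.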
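`km_sigscaught` reports, as a `sigset_t`, the signals caught since the previous upcall, and a single upcall may carry several of them. The sketch below shows how an upcall handler might fan these out to the UTS. It assumes a hypothetical per-signal hook type (`uts_sig_fn`) and an example hook (`record_signal`). It scans only signal numbers 1 through 31 (`UTS_SIGMAX`), an illustrative bound rather than the platform's actual maximum.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Hypothetical scan bound: covers the classic signals 1..31 only. */
#define UTS_SIGMAX 32

/* Hypothetical UTS hook, called once per caught signal. */
typedef void uts_sig_fn(int signo, void *arg);

/*
 * Call fn for every signal that is a member of a km_sigscaught-style
 * set. Returns the number of signals dispatched.
 */
static int
uts_dispatch_signals(const sigset_t *caught, uts_sig_fn *fn, void *arg)
{
    int n = 0;

    for (int signo = 1; signo < UTS_SIGMAX; signo++) {
        if (sigismember(caught, signo) == 1) {
            fn(signo, arg);
            n++;
        }
    }
    return (n);
}

/* Example hook: record each dispatched signal as a bit in a mask. */
static void
record_signal(int signo, void *arg)
{
    *(unsigned long *)arg |= 1UL << signo;
}
```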
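RETURN VALUES states that these calls return the error number itself rather than setting `errno`, which differs from the usual -1/`errno` convention. The sketch below shows error checking under the documented convention. `fake_kse_release()` is a hypothetical stub that fails with `ESRCH`, as `kse_release()` is documented to fail in an unthreaded process, so the example runs without KSE support. `kse_check()` is likewise a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for kse_release() in an unthreaded process.
 * Since kse_release() does not return on success, any return is an
 * error, and the documented convention returns the code directly.
 */
static int
fake_kse_release(void)
{
    return (ESRCH);
}

/*
 * Zero means success; any other value *is* the error number, so it is
 * passed straight to strerror() instead of consulting errno.
 * Returns rc unchanged.
 */
static int
kse_check(const char *what, int rc)
{
    if (rc != 0)
        fprintf(stderr, "%s: %s\n", what, strerror(rc));
    return (rc);
}
```

Under this convention, a check written as `if (kse_release() == -1) perror(...)` would never fire, because the failure arrives as `ESRCH` rather than -1.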