author archie <archie@FreeBSD.org> 2002-10-02 18:01:51 +0000
committer archie <archie@FreeBSD.org> 2002-10-02 18:01:51 +0000
commit f43114e2a8ad080fb49d9d5d3f2326231572fc56
tree 4319a1f9d364823baec8bcd4de8d04d018a6db1a /lib/libc/sys/kse.2
parent f5105eda7e9777bceb2fcf689986b7ece035cfd1
Add a man page for the KSE system calls.
Reviewed by: julian, ru
Diffstat (limited to 'lib/libc/sys/kse.2')
-rw-r--r-- lib/libc/sys/kse.2 585
1 files changed, 585 insertions, 0 deletions
diff --git a/lib/libc/sys/kse.2 b/lib/libc/sys/kse.2
new file mode 100644
index 0000000..0b1ebbe
--- /dev/null
+++ b/lib/libc/sys/kse.2
@@ -0,0 +1,585 @@
+.\" Copyright (c) 2002 Packet Design, LLC.
+.\" All rights reserved.
+.\"
+.\" Subject to the following obligations and disclaimer of warranty,
+.\" use and redistribution of this software, in source or object code
+.\" forms, with or without modifications are expressly permitted by
+.\" Packet Design; provided, however, that:
+.\"
+.\" (i) Any and all reproductions of the source or object code
+.\" must include the copyright notice above and the following
+.\" disclaimer of warranties; and
+.\" (ii) No rights are granted, in any manner or form, to use
+.\" Packet Design trademarks, including the mark "PACKET DESIGN"
+.\" on advertising, endorsements, or otherwise except as such
+.\" appears in the above copyright notice or in the software.
+.\"
+.\" THIS SOFTWARE IS BEING PROVIDED BY PACKET DESIGN "AS IS", AND
+.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, PACKET DESIGN MAKES NO
+.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING
+.\" THIS SOFTWARE, INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED
+.\" WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
+.\" OR NON-INFRINGEMENT. PACKET DESIGN DOES NOT WARRANT, GUARANTEE,
+.\" OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS
+.\" OF THE USE OF THIS SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY,
+.\" RELIABILITY OR OTHERWISE. IN NO EVENT SHALL PACKET DESIGN BE
+.\" LIABLE FOR ANY DAMAGES RESULTING FROM OR ARISING OUT OF ANY USE
+.\" OF THIS SOFTWARE, INCLUDING WITHOUT LIMITATION, ANY DIRECT,
+.\" INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE, OR CONSEQUENTIAL
+.\" DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, LOSS OF
+.\" USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY THEORY OF
+.\" LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+.\" THE USE OF THIS SOFTWARE, EVEN IF PACKET DESIGN IS ADVISED OF
+.\" THE POSSIBILITY OF SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd September 10, 2002
+.Dt KSE 2
+.Os
+.Sh NAME
+.Nm kse
+.Nd "kernel support for user threads"
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In sys/types.h
+.In sys/kse.h
+.Ft int
+.Fn kse_create "struct kse_mailbox *mbx" "int newgroup"
+.Ft int
+.Fn kse_exit void
+.Ft int
+.Fn kse_release void
+.Ft int
+.Fn kse_wakeup "struct kse_mailbox *mbx"
+.Ft int
+.Fn kse_thr_interrupt "struct kse_thr_mailbox *tmbx"
+.Sh DESCRIPTION
+These functions implement kernel support for multi-threaded processes.
+.\"
+.Ss Overview
+.\"
+Traditionally, user threading has been implemented in one of two ways:
+either all threads are managed in user space and the kernel is unaware
+of any threading (also known as
+.Dq "N to 1" ) ,
+or else separate processes sharing
+a common memory space are created for each thread (also known as
+.Dq "N to N" ) .
+These approaches have advantages and disadvantages:
+.Bl -column "- Cannot utilize multiple CPUs" "+ Can utilize multiple CPUs"
+.It Sy "User threading" Ta Sy "Kernel threading"
+.It "+ Lightweight" Ta "- Heavyweight"
+.It "+ User controls scheduling" Ta "- Kernel controls scheduling"
+.It "- Syscalls must be wrapped" Ta "+ No syscall wrapping required"
+.It "- Cannot utilize multiple CPUs" Ta "+ Can utilize multiple CPUs"
+.El
+.Pp
+The KSE system is a
+hybrid approach that achieves the advantages of both the user and kernel
+threading approaches.
+The underlying philosophy of the KSE system is to give kernel support
+for user threading without taking away any of the user threading library's
+ability to make scheduling decisions.
+A kernel-to-user upcall mechanism is used to pass control to the user
+threading library whenever a scheduling decision needs to be made.
+Arbitrarily many user threads are multiplexed onto a fixed number of
+virtual CPUs supplied by the kernel.
+This can be thought of as an
+.Dq "N to M"
+threading scheme.
+.Pp
+Some general implications of this approach include:
+.Bl -bullet
+.It
+The user process can run multiple threads simultaneously on multi-processor
+machines.
+The kernel grants the process virtual CPUs to schedule as it
+wishes; these may run concurrently on real CPUs.
+.It
+All operations that block in the kernel become asynchronous, allowing
+the user process to schedule another thread when any thread blocks.
+.It
+Multiple thread schedulers within the same process are possible, and they
+may operate independently of each other.
+.El
+.\"
+.Ss Definitions
+.\"
+KSE allows a user process to have multiple
+.Sy threads
+of execution in existence at the same time, some of which may be blocked
+in the kernel while others may be executing or blocked in user space.
+A
+.Sy "kernel scheduling entity"
+(KSE) is a
+.Dq "virtual CPU"
+granted to the process for the purpose of executing threads.
+A thread that is currently executing is always associated with
+exactly one KSE, whether executing in user space or in the kernel.
+The KSE is said to be
+.Sy assigned
+to the thread.
+.Pp
+The KSE becomes
+.Sy unassigned ,
+and the associated thread is suspended, when the KSE has an associated
+.Sy mailbox
+(see below) and any of the following occurs:
+.Bl -bullet
+.It
+The thread invokes a blocking system call.
+.It
+The thread makes any other demand of the kernel that cannot be immediately
+satisfied, e.g., touches a page of memory that needs to be fetched from disk,
+causing a page fault.
+.It
+Another thread that was previously blocked in the kernel completes its
+work in the kernel (or is
+.Sy interrupted )
+and becomes ready to return to user space.
+.It
+A signal is delivered to the process, and this KSE is chosen to deliver it.
+.El
+.Pp
+In other words, as soon as there is a scheduling decision to be made,
+the KSE becomes unassigned, because the kernel does not presume to know
+how the process' other runnable threads should be scheduled.
+Unassigned KSEs always return to user space as soon as possible via
+the
+.Sy upcall
+mechanism (described below), allowing the user process to decide how
+that KSE should be utilized next.
+KSEs always complete as much work as possible in the kernel before
+becoming unassigned.
+.Pp
+A
+.Sy "KSE group"
+is a collection of KSEs that are scheduled uniformly and which share
+access to the same pool of threads, which are associated with the KSE group.
+A KSE group is the smallest entity to which a kernel scheduling
+priority may be assigned.
+For the purposes of process scheduling and accounting, each
+KSE group
+counts the same as a traditional unthreaded process.
+Individual KSEs within a KSE group are effectively indistinguishable,
+and any KSE in a KSE group may be assigned by the kernel to any runnable
+thread associated with that KSE group.
+In practice, the kernel attempts to preserve the affinity between threads
+and actual CPUs to optimize cache behavior, but this is invisible to the
+user process.
+.Pp
+Each KSE has a unique
+.Sy "KSE mailbox"
+supplied by the user process.
+A mailbox consists of a control structure containing a pointer to an
+.Sy "upcall function"
+and a user stack.
+The KSE invokes this function whenever it becomes unassigned.
+Before each upcall, the kernel updates this structure with information
+about threads that have become runnable and signals that have been delivered.
+Upcalls may be temporarily blocked by the user thread scheduler (UTS)
+during critical sections (see
+.Va km_curthread
+below).
+.Pp
+Each user thread has a unique
+.Sy "thread mailbox"
+as well.
+Threads are referred to using pointers to these mailboxes when communicating
+between the kernel and the user thread scheduler.
+Each KSE's mailbox contains a pointer to the mailbox of the user thread
+that the KSE is currently executing.
+This pointer is saved when the thread blocks in the kernel.
+.Pp
+Whenever a thread blocked in the kernel is ready to return to user space,
+it is added to the KSE group's list of
+.Sy completed
+threads.
+This list is presented to the user code at the next upcall as a linked list
+of thread mailboxes.
+.\"
+.Ss Managing KSEs
+.\"
+To become multi-threaded, a process must first invoke
+.Fn kse_create .
+.Fn kse_create
+creates a new KSE (except for the very first invocation; see below).
+The KSE will be associated with the mailbox pointed to by
+.Fa mbx .
+If
+.Fa newgroup
+is non-zero, a new KSE group is also created containing the KSE.
+Otherwise, the new KSE is added to the current KSE group.
+Newly created KSEs are initially unassigned; therefore,
+they will upcall immediately.
+.Pp
+Each process initially has a single KSE in a single KSE group executing
+a single user thread.
+Since the KSE does not have an associated mailbox, it must remain assigned
+to the thread and does not perform any upcalls.
+The result is the traditional, unthreaded mode of operation.
+Therefore, as a special case, the first call to
+.Fn kse_create
+by this initial thread with
+.Fa newgroup
+equal to zero does not create a new KSE; instead, it simply associates the
+current KSE with the supplied KSE mailbox, and no immediate upcall results.
+However, the upcall will be invoked the next time the thread blocks.
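The following sketch shows how a thread library might fill in a KSE mailbox before the first kse_create() call. Because kse_create() no longer exists on current systems, the structures are hypothetical mirrors of the layout documented under KSE Mailboxes (the thread mailbox is trimmed to two fields), and prepare_kse_mailbox() is an illustrative name. Pointing km_curthread at the calling thread's own mailbox is an assumption based on the rule that the kernel only upcalls while km_curthread is non-NULL.

```c
#define _XOPEN_SOURCE 700 /* exposes stack_t in <signal.h> */
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirrors of the documented structures; real code would
 * include <sys/kse.h> on a system that provides it. */
struct thr_mbx_sketch {
    struct thr_mbx_sketch *tm_next;
    void *tm_udata;
};

struct kse_mbx_sketch;
typedef void upcall_fn_sketch(struct kse_mbx_sketch *);

struct kse_mbx_sketch {
    struct thr_mbx_sketch *km_curthread; /* current thread */
    struct thr_mbx_sketch *km_completed; /* completed threads */
    sigset_t km_sigscaught;              /* caught signals */
    unsigned int km_flags;               /* KSE flags */
    upcall_fn_sketch *km_func;           /* upcall function */
    stack_t km_stack;                    /* upcall stack */
    void *km_udata;                      /* for use by the UTS */
};

/* Prepare mbx for a hypothetical first kse_create(mbx, 0) call made by
 * the thread whose mailbox is `self`. The upcall stack is heap-allocated
 * because it must stay valid for the KSE's lifetime; the caller owns it.
 * Returns 0 on success, -1 if the stack cannot be allocated. */
int prepare_kse_mailbox(struct kse_mbx_sketch *mbx, upcall_fn_sketch *func,
                        size_t stack_size, struct thr_mbx_sketch *self)
{
    assert(mbx != NULL && stack_size > 0);
    void *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    memset(mbx, 0, sizeof(*mbx));
    sigemptyset(&mbx->km_sigscaught);
    mbx->km_func = func;
    mbx->km_stack.ss_sp = stack;
    mbx->km_stack.ss_size = stack_size;
    mbx->km_stack.ss_flags = 0;
    /* A non-NULL km_curthread lets the kernel upcall the next time this
     * thread blocks; with NULL the KSE would simply stay assigned. */
    mbx->km_curthread = self;
    return 0;
}
```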
+.Pp
+The kernel does not allow more KSEs to exist in a KSE group than the
+number of physical CPUs in the system (this number is available as the
+.Xr sysctl 3
+variable
+.Va hw.ncpu ) .
+Having more KSEs than CPUs would not add any value to the user process,
+as the additional KSEs would just compete with each other for access to
+the real CPUs.
+Since no more KSEs than CPUs can run at any one time, the result
+to the application would be the same as having fewer KSEs.
+There may, however, be arbitrarily many user threads, and it is up to the
+user thread scheduler to handle mapping the application's user threads
+onto the available KSEs.
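A thread library can size each KSE group from the CPU count. This sketch uses sysconf(_SC_NPROCESSORS_ONLN), a widely available but non-POSIX query, as a portable stand-in for the hw.ncpu sysctl; online_cpus() and kses_for_group() are hypothetical helpers, and the lower bound of one KSE per group is an assumption of the sketch.

```c
#include <assert.h>
#include <unistd.h>

/* Stand-in for the hw.ncpu sysctl: processors currently online as
 * reported by sysconf(); falls back to 1 if the query fails. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: how many KSEs to request for one KSE group.
 * kse_create() fails with ENXIO once a group has one KSE per CPU, so
 * the request is clamped to the range [1, ncpu]. */
int kses_for_group(int wanted, long ncpu)
{
    assert(ncpu >= 1);
    if (wanted < 1)
        return 1;
    return wanted > ncpu ? (int)ncpu : wanted;
}
```

Extra user threads beyond this count are not lost; they simply wait in the scheduler's run queue for a free KSE.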
+.Pp
+.Fn kse_exit
+causes the KSE assigned to the currently running thread to be destroyed.
+If this KSE is the last one in the KSE group, there must be no remaining
+threads associated with the KSE group blocked in the kernel.
+If successful, this function does not return, except in the special case below.
+.Pp
+As a special case, if the last remaining KSE in the last remaining KSE group
+invokes this function, then the KSE is not destroyed;
+instead, the KSE just loses the association with its mailbox and
+.Fn kse_exit
+returns normally.
+This returns the process to its original, unthreaded state.
+.Pp
+.Fn kse_release
+is used to
+.Dq park
+the KSE assigned to the currently running thread when it is not needed,
+e.g., when there are more available KSEs than runnable user threads.
+The KSE remains unassigned but does not upcall until there is a new reason to
+do so, e.g., a previously blocked thread becomes runnable.
+If successful,
+.Fn kse_release
+does not return.
+.Pp
+.Fn kse_wakeup
+is the opposite of
+.Fn kse_release .
+It causes the KSE associated with the mailbox pointed to by
+.Fa mbx
+to be woken up, causing it to upcall.
+If the KSE has already woken up for another reason, this function has no
+effect.
+The
+.Fa mbx
+may be
+.Dv NULL
+to specify
+.Dq "any KSE in the current KSE group" .
+.Pp
+.Fn kse_thr_interrupt
+is used to interrupt a user thread.
+The thread must either be blocked in the kernel or assigned to a KSE
+(i.e., executing).
+The thread is then marked as interrupted.
+As soon as the thread invokes an interruptible system call (or immediately
+for threads already blocked in one), the thread will be made runnable again,
+even though the kernel operation may not have completed.
+The effect on the interrupted system call is the same as if it had been
+interrupted by a signal; typically this means an error is returned with
+.Va errno
+set to
+.Er EINTR .
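Because kse_thr_interrupt() is no longer available, the sketch below uses an ordinary signal as a stand-in, relying on the statement above that the effect is the same as a signal interruption. A SIGALRM handler installed without SA_RESTART interrupts a read() that is blocked on an empty pipe, and the call fails with errno set to EINTR. blocked_read_errno() is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig)
{
    (void)sig; /* the handler only needs to run to interrupt read() */
}

/* Block in read() on an empty pipe until SIGALRM arrives, then return
 * the errno value read() failed with (0 if it somehow succeeded). */
int blocked_read_errno(void)
{
    int fds[2];
    char c;
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: an interrupted read() fails */
    if (sigaction(SIGALRM, &sa, NULL) == -1 || pipe(fds) == -1)
        return errno;

    alarm(1);
    ssize_t n = read(fds[0], &c, 1); /* write end stays open: blocks */
    int err = (n == -1) ? errno : 0;
    alarm(0);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A thread library would typically treat EINTR from an interrupted thread as a cue to check why it was interrupted (for example, cancellation) rather than blindly retrying.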
+.\"
+.Ss Signals
+.\"
+When a process has at least one KSE with an associated mailbox,
+signals are no longer delivered on the process stack.
+Instead, signals are delivered via upcalls.
+Multiple signals may be delivered with one upcall.
+.Pp
+If there are multiple KSE groups in the process, which KSE group is
+chosen to deliver the signal is indeterminate.
+However, once a signal has been delivered to a specific KSE group,
+that KSE group then takes ownership of signal delivery and all subsequent
+signals are delivered via that KSE group.
+When this KSE group is destroyed, a new KSE group is chosen as needed.
+.\"
+.Ss KSE Mailboxes
+.\"
+Each KSE has a unique mailbox for user-kernel communication:
+.Bd -literal
+/* Upcall function type */
+typedef void kse_func_t(struct kse_mailbox *);
+
+/* KSE mailbox */
+struct kse_mailbox {
+ struct kse_thr_mailbox *km_curthread; /* Current thread */
+ struct kse_thr_mailbox *km_completed; /* Completed threads */
+ sigset_t km_sigscaught; /* Caught signals */
+ unsigned int km_flags; /* KSE flags */
+ kse_func_t *km_func; /* UTS function */
+ stack_t km_stack; /* UTS context */
+ void *km_udata; /* For use by the UTS */
+};
+.Ed
+.Pp
+.Va km_udata
+is an opaque pointer ignored by the kernel.
+.Pp
+.Va km_func
+points to the KSE's upcall function;
+it will be invoked using
+.Va km_stack ,
+which must remain valid for the lifetime of the KSE.
+.Pp
+.Va km_curthread
+always points to the thread that is currently assigned to this KSE if any,
+or
+.Dv NULL
+otherwise.
+This field is modified by both the kernel and the user process as follows.
+.Pp
+When
+.Va km_curthread
+is not
+.Dv NULL ,
+it is assumed to be pointing at the mailbox for the currently executing
+thread, and the KSE may be unassigned, e.g., if the thread blocks in the
+kernel.
+The kernel will then save the contents of
+.Va km_curthread
+with the blocked thread, set
+.Va km_curthread
+to
+.Dv NULL ,
+and upcall to invoke
+.Fn km_func .
+.Pp
+When
+.Va km_curthread
+is
+.Dv NULL ,
+the kernel will never perform any upcalls with this KSE; in other words,
+the KSE remains assigned to the thread even if it blocks.
+.Va km_curthread
+must be
+.Dv NULL
+while the KSE is executing critical user thread scheduler
+code that would be disrupted by an intervening upcall;
+in particular, while
+.Fn km_func
+itself is executing.
+.Pp
+Before invoking
+.Fn km_func
+in any upcall, the kernel always sets
+.Va km_curthread
+to
+.Dv NULL .
+Once the user thread scheduler has chosen a new thread to run,
+it should point
+.Va km_curthread
+at the thread's mailbox, re-enabling upcalls, and then resume the thread.
+.Em Note :
+modification of
+.Va km_curthread
+by the user thread scheduler must be atomic to avoid the race condition
+where the kernel saves a partially modified value.
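A minimal sketch of the last step of an upcall, assuming the scheduler has already chosen the next thread (or found none). The mirror structure declares km_curthread as a C11 _Atomic pointer so the update is a single indivisible store; the real header declares a plain pointer, so this, like the names finish_upcall() and enum upcall_action, is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct thr_mbx_sketch {
    void *tm_udata;
};

/* Hypothetical mirror of the one mailbox field this step touches. */
struct kse_mbx_sketch {
    _Atomic(struct thr_mbx_sketch *) km_curthread;
};

enum upcall_action { RESUME_THREAD, RELEASE_KSE };

void mbx_sketch_init(struct kse_mbx_sketch *mbx)
{
    atomic_init(&mbx->km_curthread, NULL);
}

/* Decide what this KSE does once the upcall's scheduling work is done. */
enum upcall_action finish_upcall(struct kse_mbx_sketch *mbx,
                                 struct thr_mbx_sketch *next)
{
    /* The kernel clears km_curthread before every upcall, so upcalls
     * stay disabled while the scheduler code runs. */
    assert(atomic_load(&mbx->km_curthread) == NULL);
    if (next == NULL)
        return RELEASE_KSE; /* nothing runnable: caller would kse_release() */
    /* Re-enable upcalls by publishing the chosen thread in one store;
     * the caller would then resume `next` from its saved context. */
    atomic_store_explicit(&mbx->km_curthread, next, memory_order_release);
    return RESUME_THREAD;
}
```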
+.Pp
+.Va km_completed
+points to a linked list of user threads that have completed their work
+in the kernel since the last upcall.
+The user thread scheduler should put these threads back into its
+own runnable queue.
+Each thread in a KSE group that completes is guaranteed to be
+linked into exactly one KSE's
+.Va km_completed
+list; which KSE in the group, however, is indeterminate.
+Furthermore, the thread will appear in only one upcall.
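The sketch below moves a km_completed list onto a FIFO run queue. It assumes the scheduler reuses tm_next as its own run-queue link while a thread is in user space, which is why each thread's successor is read before the thread is re-linked; struct run_queue and requeue_completed() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct thr_mbx_sketch {
    struct thr_mbx_sketch *tm_next; /* NULL-terminated link */
    void *tm_udata;
};

/* Hypothetical FIFO run queue kept by the user thread scheduler. */
struct run_queue {
    struct thr_mbx_sketch *head, *tail;
};

/* Append every thread on a km_completed list to the run queue tail.
 * tm_next is saved before enqueueing because enqueueing overwrites it.
 * Returns the number of threads moved. */
int requeue_completed(struct run_queue *rq, struct thr_mbx_sketch *completed)
{
    assert(rq != NULL);
    int moved = 0;
    while (completed != NULL) {
        struct thr_mbx_sketch *next = completed->tm_next;
        completed->tm_next = NULL;
        if (rq->tail != NULL)
            rq->tail->tm_next = completed;
        else
            rq->head = completed;
        rq->tail = completed;
        completed = next;
        moved++;
    }
    return moved;
}
```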
+.Pp
+.Va km_sigscaught
+contains the set of signals caught by this process since the previous
+upcall to any KSE in the process.
+As long as there exists one or more KSEs with an associated mailbox in
+the user process, signals are delivered this way rather than the
+traditional way.
+.Pp
+.Va km_flags
+may contain any of the following bits OR'ed together:
+.Bl -tag -width indent
+.It \&
+(No flags are defined yet.)
+.El
+.\"
+.Ss Thread Mailboxes
+.\"
+Each user thread must have associated with it a unique
+.Vt "struct kse_thr_mailbox" :
+.Bd -literal
+/* Thread mailbox */
+struct kse_thr_mailbox {
+ ucontext_t tm_context; /* User thread context */
+ unsigned int tm_flags; /* Thread flags */
+ struct kse_thr_mailbox *tm_next; /* Next thread in list */
+ void *tm_udata; /* For use by the UTS */
+};
+.Ed
+.Pp
+.Va tm_udata
+is an opaque pointer ignored by the kernel.
+.Pp
+.Va tm_context
+stores the context for the thread when the thread is blocked in user space.
+This field is updated by the kernel before a completed thread is returned
+to the user thread scheduler via
+.Va km_completed .
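Resuming a thread means restoring the context stored in tm_context. This sketch does that with getcontext(), makecontext() and swapcontext() from ucontext(3), which POSIX marks obsolescent but FreeBSD and glibc still provide. The one-field mailbox and run_thread_once() are hypothetical; when the new thread's function returns, control follows uc_link back to the scheduler context.

```c
#define _XOPEN_SOURCE 700 /* some systems require this for <ucontext.h> */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical thread mailbox holding only the documented context. */
struct thr_mbx_sketch {
    ucontext_t tm_context;
};

static ucontext_t scheduler_ctx; /* where a finished thread returns */
static int thread_runs;          /* times thread_body has executed */

static void thread_body(void)
{
    thread_runs++; /* the "user thread" does its work here */
}                  /* returning follows uc_link to scheduler_ctx */

/* Build a context for a new thread and switch to it, as a scheduler
 * would when resuming a thread. Returns thread_runs, or -1 on error. */
int run_thread_once(size_t stack_size)
{
    struct thr_mbx_sketch t;
    void *stack = malloc(stack_size);
    if (stack == NULL || getcontext(&t.tm_context) == -1) {
        free(stack);
        return -1;
    }
    t.tm_context.uc_stack.ss_sp = stack;
    t.tm_context.uc_stack.ss_size = stack_size;
    t.tm_context.uc_link = &scheduler_ctx;
    makecontext(&t.tm_context, thread_body, 0);
    /* Save the scheduler's context and run the thread until it ends. */
    if (swapcontext(&scheduler_ctx, &t.tm_context) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return thread_runs;
}
```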
+.Pp
+.Va tm_next
+links the
+.Va km_completed
+threads together when returned by the kernel with an upcall.
+The end of the list is marked with a
+.Dv NULL
+pointer.
+.Pp
+.Va tm_flags
+may contain any of the following bits OR'ed together:
+.Bl -tag -width indent
+.It \&
+(No flags are defined yet.)
+.El
+.Sh RETURN VALUES
+.Fn kse_create , Fn kse_wakeup ,
+and
+.Fn kse_thr_interrupt
+return zero if successful.
+.Fn kse_exit
+and
+.Fn kse_release
+do not return if successful.
+.Pp
+All of these functions return a non-zero error code in case of an error.
+.Pp
+.Em Note :
+error codes are returned directly rather than via
+.Va errno .
+.Sh ERRORS
+.Fn kse_create
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ENXIO
+There are already as many KSEs in the KSE group as hardware processors.
+.It Bq Er EAGAIN
+The system-imposed limit on the total number of KSE groups under
+execution would be exceeded.
+The limit is given by the
+.Xr sysctl 3
+MIB variable
+.Dv KERN_MAXPROC .
+(The limit is actually ten less than this
+except for the super user.)
+.It Bq Er EAGAIN
+The user is not the super user, and the system-imposed limit on the total
+number of KSE groups under execution by a single user would be exceeded.
+The limit is given by the
+.Xr sysctl 3
+MIB variable
+.Dv KERN_MAXPROCPERUID .
+.It Bq Er EAGAIN
+The user is not the super user, and the soft resource limit corresponding
+to the resource parameter
+.Dv RLIMIT_NPROC
+would be exceeded (see
+.Xr getrlimit 2 ) .
+.It Bq Er EFAULT
+.Fa mbx
+points to an address which is not a valid part of the process address space.
+.El
+.Pp
+.Fn kse_exit
+will fail if:
+.Bl -tag -width Er
+.It Bq Er EDEADLK
+The current KSE is the last in its KSE group and there are still one or more
+threads associated with the KSE group blocked in the kernel.
+.It Bq Er ESRCH
+The current KSE has no associated mailbox, i.e., the process is operating
+in traditional, unthreaded mode (in this case use
+.Xr exit 2
+to exit the process).
+.El
+.Pp
+.Fn kse_release
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ESRCH
+The current KSE has no associated mailbox, i.e., the process is operating in
+traditional, unthreaded mode.
+.El
+.Pp
+.Fn kse_wakeup
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ESRCH
+.Fa mbx
+is not
+.Dv NULL
+and the mailbox pointed to by
+.Fa mbx
+is not associated with any KSE in the process.
+.It Bq Er ESRCH
+.Fa mbx
+is
+.Dv NULL
+and the current KSE has no associated mailbox, i.e., the process is operating
+in traditional, unthreaded mode.
+.El
+.Pp
+.Fn kse_thr_interrupt
+will fail if:
+.Bl -tag -width Er
+.It Bq Er ESRCH
+The thread corresponding to
+.Fa tmbx
+is neither currently assigned to any KSE in the process nor blocked in the
+kernel.
+.El
+.Sh SEE ALSO
+.Xr rfork 2 ,
+.Xr pthread 3 ,
+.Xr ucontext 3
+.Rs
+.%A "Thomas E. Anderson"
+.%A "Brian N. Bershad"
+.%A "Edward D. Lazowska"
+.%A "Henry M. Levy"
+.%J "ACM Transactions on Computer Systems"
+.%N Issue 1
+.%V Volume 10
+.%D February 1992
+.%I ACM Press
+.%P pp. 53-79
+.%T "Scheduler activations: effective kernel support for the user-level management of parallelism"
+.Re
+.Sh HISTORY
+The KSE function calls first appeared in
+.Fx 5.0 .
+.Sh AUTHORS
+KSE was originally implemented by
+.An -nosplit
+.An "Julian Elischer" Aq julian@FreeBSD.org ,
+with additional contributions by
+.An "Jonathan Mini" Aq mini@FreeBSD.org ,
+.An "Daniel Eischen" Aq deischen@FreeBSD.org ,
+and
+.An "David Xu" Aq davidxu@FreeBSD.org .
+.Pp
+This manual page was written by
+.An "Archie Cobbs" Aq archie@FreeBSD.org .
+.Sh BUGS
+The KSE code is
+.Ud .