.\" Copyright (c) 2000-2001 John H. Baldwin <jhb@FreeBSD.org>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE DEVELOPERS ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
.\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
.\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd November 3, 2000
.Dt SCHEDULER 9
.Os
.Sh NAME
.Nm curpriority_cmp ,
.Nm maybe_resched ,
.Nm propagate_priority ,
.Nm resetpriority ,
.Nm roundrobin ,
.Nm roundrobin_interval ,
.Nm sched_setup ,
.Nm schedclock ,
.Nm schedcpu ,
.Nm setrunnable ,
.Nm updatepri
.Nd perform round-robin scheduling of runnable processes
.Sh SYNOPSIS
.In sys/param.h
.In sys/proc.h
.Ft int
.Fn curpriority_cmp "struct proc *p"
.Ft void
.Fn maybe_resched "struct thread *td"
.Ft void
.Fn propagate_priority "struct proc *p"
.Ft void
.Fn resetpriority "struct ksegrp *kg"
.Ft void
.Fn roundrobin "void *arg"
.Ft int
.Fn roundrobin_interval "void"
.Ft void
.Fn sched_setup "void *dummy"
.Ft void
.Fn schedclock "struct thread *td"
.Ft void
.Fn schedcpu "void *arg"
.Ft void
.Fn setrunnable "struct thread *td"
.Ft void
.Fn updatepri "struct thread *td"
.Sh DESCRIPTION
Each process has three different priorities stored in
.Vt "struct proc" :
.Va p_usrpri ,
.Va p_nativepri ,
and
.Va p_priority .
.Pp
The
.Va p_usrpri
member is the user priority of the process, calculated from the process's
estimated CPU time and nice level.
.Pp
The
.Va p_nativepri
member is the saved priority used by
.Fn propagate_priority .
When a process obtains a mutex, its priority is saved in
.Va p_nativepri .
While it holds the mutex, the process's priority may be bumped by another
process that blocks on the mutex.
When the process releases the mutex, its priority is restored to the
priority saved in
.Va p_nativepri .
.Pp
The
.Va p_priority
member is the actual priority of the process and is used to determine what
.Xr runqueue 9
it runs on, for example.
.Pp
The
.Fn curpriority_cmp
function compares the cached priority of the currently running process with
process
.Fa p .
If the currently running process has a higher priority, then it will return
a value less than zero.
If the current process has a lower priority, then it will return a value
greater than zero.
If the current process has the same priority as
.Fa p ,
then
.Fn curpriority_cmp
will return zero.
The cached priority of the currently running process is updated when a process
resumes from
.Xr tsleep 9
or returns to userland in
.Fn userret
and is stored in the private variable
.Va curpriority .
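.Pp
As an illustration only, the sign convention can be sketched in userland C.
The name
.Fn sk_prio_cmp
is hypothetical and not part of the kernel.
The sketch assumes the usual BSD convention that a numerically lower value
denotes a higher priority, and it ignores any scheduling class or run queue
granularity that the real function may take into account.
.Bd -literal -offset indent
/*
 * Hypothetical sketch: compare the cached priority of the running
 * process (cur) with that of another process (other).
 * Assumes a lower number means a higher priority.
 */
static int
sk_prio_cmp(int cur, int other)
{
	/* Negative: cur is higher priority; positive: other is. */
	return (cur - other);
}
.Ed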
.Pp
The
.Fn maybe_resched
function compares the priorities of the current thread and
.Fa td .
If
.Fa td
has a higher priority than the current thread, then a context switch is
needed, and
.Dv KEF_NEEDRESCHED
is set.
.Pp
The
.Fn propagate_priority
function looks at the process that owns the mutex that
.Fa p
is blocked on.
That process's priority is bumped to the priority of
.Fa p
if needed.
If the process is currently running, then the function returns.
If the process is on a
.Xr runqueue 9 ,
then the process is moved to the appropriate
.Xr runqueue 9
for its new priority.
If the process is blocked on a mutex, its position in the list of
processes blocked on the mutex in question is updated to reflect its new
priority.
Then, the function repeats the procedure using the process that owns the
mutex just encountered.
Note that a process's priority is only bumped to the priority of the
original process
.Fa p ,
not to the priority of the previously encountered process.
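.Pp
The walk along the chain of mutex owners can be sketched in userland C as
follows.
This is a simplified model, not the kernel code: the types
.Vt "struct sk_proc"
and
.Vt "struct sk_mutex"
and the function
.Fn sk_propagate
are hypothetical, a numerically lower value is assumed to mean a higher
priority, and locking as well as the reordering of run queues and mutex wait
lists is left out.
.Bd -literal -offset indent
#include <stddef.h>

struct sk_mutex;

/* Hypothetical process: a lower value means a higher priority. */
struct sk_proc {
	int priority;
	int running;			/* nonzero if on a CPU */
	struct sk_mutex *blocked_on;	/* awaited mutex, or NULL */
};

struct sk_mutex {
	struct sk_proc *owner;		/* current holder, or NULL */
};

static void
sk_propagate(struct sk_proc *p)
{
	int pri = p->priority;		/* fixed for the whole walk */
	struct sk_mutex *m = p->blocked_on;

	while (m != NULL && m->owner != NULL) {
		struct sk_proc *owner = m->owner;

		/* Bump the owner only if it is less urgent than p. */
		if (owner->priority > pri)
			owner->priority = pri;
		if (owner->running)
			return;
		/* Queue and wait list reordering is omitted here. */
		m = owner->blocked_on;
	}
}
.Ed
.Pp
Because
.Va pri
is captured once, every owner along the chain is raised to the priority of the
original process and never to that of an intermediate one.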
.Pp
The
.Fn resetpriority
function recomputes the user priority of the ksegrp
.Fa kg
(stored in
.Va kg_user_pri )
and calls
.Fn maybe_resched
to force a reschedule of each thread in the group if needed.
.Pp
The
.Fn roundrobin
function is used as a
.Xr timeout 9
function to force a reschedule every
.Va sched_quantum
ticks.
.Pp
The
.Fn roundrobin_interval
function returns the number of clock ticks between reschedules triggered by
.Fn roundrobin ,
which is the current value of
.Va sched_quantum .
.Pp
The
.Fn sched_setup
function is a
.Xr SYSINIT 9
that is called to start the callout-driven scheduler functions.
It calls the
.Fn roundrobin
and
.Fn schedcpu
functions for the first time.
After the initial call, each function rearms itself by registering its
callout again before it returns.
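.Pp
The self-rearming pattern can be sketched in userland C with a one-slot timer.
Everything here is hypothetical and for illustration only: the names
.Fn sk_roundrobin ,
.Fn sk_sched_setup ,
and
.Fn sk_run ,
the
.Vt "struct sk_callout"
type, and the quantum of 10 ticks are assumptions, and the real functions use
the kernel callout facility instead.
.Bd -literal -offset indent
#include <stddef.h>

/* Hypothetical one-slot timer standing in for the callout wheel. */
struct sk_callout {
	void (*func)(void *);
	void *arg;
	int due;		/* tick at which func fires */
};

static struct sk_callout sk_rr;
static int sk_now;		/* current tick */
static int sk_quantum = 10;	/* ticks between reschedules */
static int sk_resched_count;	/* stand-in for forced reschedules */

static void
sk_roundrobin(void *arg)
{
	sk_resched_count++;
	/* Rearm before returning so the event keeps recurring. */
	sk_rr.func = sk_roundrobin;
	sk_rr.arg = arg;
	sk_rr.due = sk_now + sk_quantum;
}

/* Make the first call by hand; later calls come from the timer. */
static void
sk_sched_setup(void)
{
	sk_roundrobin(NULL);
}

/* Advance the clock, firing the callout whenever it comes due. */
static void
sk_run(int ticks)
{
	int end = sk_now + ticks;

	while (sk_now < end) {
		sk_now++;
		if (sk_rr.func != NULL && sk_rr.due == sk_now)
			sk_rr.func(sk_rr.arg);
	}
}
.Ed
.Pp
In this model, calling
.Fn sk_sched_setup
and then advancing the clock by 100 ticks yields eleven reschedules: the
initial call plus one at each of ticks 10 through 100.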
.Pp
The
.Fn schedclock
function is called by
.Fn statclock
to adjust the priority of the currently running thread's ksegrp.
It updates the group's estimated CPU time and then adjusts the priority via
.Fn resetpriority .
.Pp
The
.Fn schedcpu
function updates all process priorities.
First, it updates statistics that track how long processes have been in various
process states.
Secondly, it updates the estimated CPU time of each process such
that about 90% of the CPU usage is forgotten in 5 * load average seconds.
For example, if the load average is 2.00,
then at least 90% of the estimated CPU time for the process should be based
on the amount of CPU time the process has had in the last 10 seconds.
It then recomputes the priority of the process and moves it to the
appropriate
.Xr runqueue 9
if necessary.
Thirdly, it updates the %CPU estimate used by utilities such as
.Xr ps 1
and
.Xr top 1
so that 95% of the CPU usage is forgotten in 60 seconds.
Once all process priorities have been updated,
.Fn schedcpu
calls
.Fn vmmeter
to update various other statistics including the load average.
Finally, it schedules itself to run again in
.Va hz
clock ticks.
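.Pp
The 90% figure can be checked with a small userland C sketch.
It assumes the classic 4.4BSD decay filter, in which the estimate is
multiplied once per second by (2 * load) / (2 * load + 1); whether this
kernel uses exactly that filter is an assumption.
The names
.Fn sk_decay_step
and
.Fn sk_remaining
are hypothetical, floating point stands in for the fixed-point arithmetic a
kernel would use, and any nice level adjustment is omitted.
.Bd -literal -offset indent
/*
 * Hypothetical sketch of one per-second decay step.
 * Assumes the classic 4.4BSD filter; a kernel would use
 * fixed-point arithmetic rather than doubles.
 */
static double
sk_decay_step(double estcpu, double loadavg)
{
	return ((2.0 * loadavg) / (2.0 * loadavg + 1.0) * estcpu);
}

/* Fraction of an initial estimate that survives after n steps. */
static double
sk_remaining(double loadavg, int n)
{
	double est = 1.0;
	int i;

	for (i = 0; i < n; i++)
		est = sk_decay_step(est, loadavg);
	return (est);
}
.Ed
.Pp
With a load average of 2.00 the factor is 0.8, so ten steps leave roughly 11%
of the original estimate, in line with the figure of about 90% forgotten in
10 seconds.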
.Pp
The
.Fn setrunnable
function is used to change a process's state to be runnable.
The process is placed on a
.Xr runqueue 9
if needed, and the swapper process is woken up and told to swap the process in
if the process is swapped out.
If the process has been asleep for at least one run of
.Fn schedcpu ,
then
.Fn updatepri
is used to adjust the priority of the process.
.Pp
The
.Fn updatepri
function is used to adjust the priority of a process that has been asleep.
It retroactively decays the estimated CPU time of the process for each
.Fn schedcpu
event that the process was asleep.
Finally, it calls
.Fn resetpriority
to adjust the priority of the process.
.Sh SEE ALSO
.Xr mi_switch 9 ,
.Xr runqueue 9 ,
.Xr sleepqueue 9 ,
.Xr tsleep 9
.Sh BUGS
The
.Va curpriority
variable really should be per-CPU.
In addition,
.Fn maybe_resched
should compare the priority of
.Fa td
with that of each CPU, and then send an IPI to the processor with the lowest
priority to trigger a reschedule if needed.
.Pp
Priority propagation is broken and is thus disabled by default.
The
.Va p_nativepri
variable is only updated if a process does not obtain a sleep mutex on the
first try.
Also, if a process obtains more than one sleep mutex in this manner, and
had its priority bumped in between, then
.Va p_nativepri
will be clobbered.