.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 15, 2011
.Dt CC 9
.Os
.Sh NAME
.Nm cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CC_VAR
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CC_VAR "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
char name[TCP_CA_NAME_MAX];
int (*mod_init) (void);
int (*mod_destroy) (void);
int (*cb_init) (struct cc_var *ccv);
void (*cb_destroy) (struct cc_var *ccv);
void (*conn_init) (struct cc_var *ccv);
void (*ack_received) (struct cc_var *ccv, uint16_t type);
void (*cong_signal) (struct cc_var *ccv, uint32_t type);
void (*post_recovery) (struct cc_var *ccv);
void (*after_idle) (struct cc_var *ccv);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection set up to be aborted, terminating the connection as a
result.
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data we have already received
an ACK for.
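.Pp
As a rough illustration of this dispatch, the following userland sketch updates
hypothetical private state according to the ACK type.
The reduced
.Vt struct cc_var ,
the
.Vt struct example_state
type, the
.Fn example_ack_received
name and the numeric values given to CC_ACK and CC_DUPACK are assumptions made
so that the sketch stands alone; a real module takes its definitions from
.In netinet/cc.h .
.Bd -literal -offset indent
#include <stdint.h>

/* Placeholder values; the real definitions live in netinet/cc.h. */
#define CC_ACK    0x0001
#define CC_DUPACK 0x0002

/* Reduced stand-in for struct cc_var with only the fields used here. */
struct cc_var {
    void *cc_data;
    int bytes_this_ack;
};

/* Hypothetical private state for an imaginary "example" algorithm. */
struct example_state {
    uint32_t acked_bytes;   /* new bytes acknowledged so far */
    uint32_t dupacks;       /* duplicate ACKs since the last new ACK */
};

static void
example_ack_received(struct cc_var *ccv, uint16_t type)
{
    struct example_state *st = ccv->cc_data;

    switch (type) {
    case CC_ACK:
        /* New data was acknowledged: count it, reset the run. */
        st->acked_bytes += (uint32_t)ccv->bytes_this_ack;
        st->dupacks = 0;
        break;
    case CC_DUPACK:
        st->dupacks++;
        break;
    default:
        /* Ignore ACK types this sketch does not know about. */
        break;
    }
}
.Ed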
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion notification
(RFC3168).
CC_RTO is reported when the retransmission time out timer fires.
CC_RTO_ERR is reported if the retransmission time out timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (currently N=3, as per
RFC5681).
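.Pp
One possible reaction to each congestion event type is sketched below in
userland C.
It is loosely modelled on a halve-on-loss scheme and is not taken from any
existing module.
The
.Vt struct example_state
fields, the
.Fn example_half
helper and the numeric values of the CC_* constants are assumptions made for
illustration only.
.Bd -literal -offset indent
#include <stdint.h>

/* Placeholder values; the real definitions live in netinet/cc.h. */
#define CC_ECN     0x00000001
#define CC_RTO     0x00000002
#define CC_RTO_ERR 0x00000004
#define CC_NDUPACK 0x00000100

/* Reduced stand-in for struct cc_var. */
struct cc_var {
    void *cc_data;
};

/* Hypothetical private state, kept here so the sketch stands alone. */
struct example_state {
    uint32_t mss;
    uint32_t cwnd;
    uint32_t ssthresh;
    uint32_t prev_cwnd;      /* saved so a spurious RTO can be undone */
    uint32_t prev_ssthresh;
};

/* Half of cwnd, but never less than two segments. */
static uint32_t
example_half(uint32_t cwnd, uint32_t mss)
{
    uint32_t half = cwnd / 2;

    return (half < 2 * mss ? 2 * mss : half);
}

static void
example_cong_signal(struct cc_var *ccv, uint32_t type)
{
    struct example_state *st = ccv->cc_data;

    switch (type) {
    case CC_ECN:
    case CC_NDUPACK:
        /* Multiplicative decrease: halve the window. */
        st->ssthresh = example_half(st->cwnd, st->mss);
        st->cwnd = st->ssthresh;
        break;
    case CC_RTO:
        /* Remember the old values, then restart from one segment. */
        st->prev_cwnd = st->cwnd;
        st->prev_ssthresh = st->ssthresh;
        st->ssthresh = example_half(st->cwnd, st->mss);
        st->cwnd = st->mss;
        break;
    case CC_RTO_ERR:
        /* The timeout was spurious: undo the CC_RTO reaction. */
        st->cwnd = st->prev_cwnd;
        st->ssthresh = st->prev_ssthresh;
        break;
    default:
        break;
    }
}
.Ed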
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers.
Using the C99 designated initialiser feature to set fields is encouraged.
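.Pp
A minimal sketch of such an instantiation follows, written as userland C so
that it stands alone.
The
.Vt struct cc_algo
layout is copied from this page, while the reduced
.Vt struct cc_var ,
the value given to TCP_CA_NAME_MAX and every name beginning with
.Dq example
are assumptions made for illustration.
The
.Fn example_after_idle
helper only mimics how a caller can skip a hook which was left as NULL.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

#define TCP_CA_NAME_MAX 16  /* stand-in; see netinet/tcp.h */

/* Reduced stand-in for struct cc_var. */
struct cc_var {
    int bytes_this_ack;
};

/* Layout copied from the description of struct cc_algo. */
struct cc_algo {
    char name[TCP_CA_NAME_MAX];
    int  (*mod_init)(void);
    int  (*mod_destroy)(void);
    int  (*cb_init)(struct cc_var *ccv);
    void (*cb_destroy)(struct cc_var *ccv);
    void (*conn_init)(struct cc_var *ccv);
    void (*ack_received)(struct cc_var *ccv, uint16_t type);
    void (*cong_signal)(struct cc_var *ccv, uint32_t type);
    void (*post_recovery)(struct cc_var *ccv);
    void (*after_idle)(struct cc_var *ccv);
};

static int example_acks;    /* counts calls, for illustration only */

static void
example_ack_received(struct cc_var *ccv, uint16_t type)
{
    (void)ccv;
    (void)type;
    example_acks++;
}

/* Members which are not named here are initialised to NULL. */
static struct cc_algo example_cc_algo = {
    .name = "example",
    .ack_received = example_ack_received,
};

/* Mimics a caller which skips hooks that were left as NULL. */
static void
example_after_idle(struct cc_algo *algo, struct cc_var *ccv)
{
    if (algo->after_idle != NULL)
        algo->after_idle(ccv);
}
.Ed
.Pp
In a kernel module, a structure like this would then be handed to
.Fn DECLARE_CC_MODULE
together with the module's name.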
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
void *cc_data;
int bytes_this_ack;
tcp_seq curack;
uint32_t flags;
int type;
union ccv_container {
struct tcpcb *tcp;
struct sctp_nets *sctp;
} ccvc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable, and so the
.Fn CC_VAR
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccv_container
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field lets algorithms that require additional per-connection state attach a
pointer to dynamically allocated memory.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
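.Pp
The allocation pattern can be sketched in userland C as follows.
Here
.Xr calloc 3
and
.Xr free 3
stand in for the kernel allocator, and the reduced
.Vt struct cc_var ,
the
.Vt struct example_state
type and the
.Fn example_cb_init
and
.Fn example_cb_destroy
names are assumptions made for illustration.
.Bd -literal -offset indent
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for struct cc_var. */
struct cc_var {
    void *cc_data;
};

/* Hypothetical per-connection private state. */
struct example_state {
    uint32_t rounds;
};

static int
example_cb_init(struct cc_var *ccv)
{
    struct example_state *st;

    /* A kernel module would call malloc(9) here instead. */
    st = calloc(1, sizeof(*st));
    if (st == NULL)
        return (ENOMEM);    /* non-zero aborts connection set up */
    ccv->cc_data = st;
    return (0);
}

static void
example_cb_destroy(struct cc_var *ccv)
{
    /* Release what cb_init attached and drop the stale pointer. */
    free(ccv->cc_data);
    ccv->cc_data = NULL;
}
.Ed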
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has determined that a
window's worth of bytes has been sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
Algorithms should treat the absence of this flag as a cue not to grow the
congestion window, which avoids accumulating a large difference between the
congestion window and send window.
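.Pp
The following userland sketch shows one way an
.Va ack_received
hook might honour both flags during congestion avoidance.
The flag and ACK type values, the reduced
.Vt struct cc_var
and the
.Vt struct example_state
type are assumptions made so that the sketch stands alone; slow start and
recovery handling are deliberately left out.
.Bd -literal -offset indent
#include <stdint.h>

/* Placeholder values; the real definitions live in netinet/cc.h. */
#define CCF_ABC_SENTAWND 0x0001
#define CCF_CWND_LIMITED 0x0002
#define CC_ACK           0x0001

/* Reduced stand-in for struct cc_var. */
struct cc_var {
    void *cc_data;
    int bytes_this_ack;
    uint32_t flags;
};

/* Hypothetical private state, kept here so the sketch stands alone. */
struct example_state {
    uint32_t mss;
    uint32_t cwnd;
};

static void
example_ack_received(struct cc_var *ccv, uint16_t type)
{
    struct example_state *st = ccv->cc_data;

    if (type != CC_ACK)
        return;
    /* Not limited by cwnd: growing it would only widen the gap. */
    if ((ccv->flags & CCF_CWND_LIMITED) == 0)
        return;
    /* Grow by one segment per window's worth of bytes sent. */
    if (ccv->flags & CCF_ABC_SENTAWND) {
        st->cwnd += st->mss;
        /* The module clears the flag once the signal is consumed. */
        ccv->flags &= ~(uint32_t)CCF_ABC_SENTAWND;
    }
}
.Ed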
.Sh SEE ALSO
.Xr cc 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University of
Technology's Centre for Advanced Internet Architectures, Melbourne, Australia,
which was made possible in part by a grant from the Cisco University Research
Program Fund at Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq lstewart@FreeBSD.org ,
.An James Healy Aq jimmy@deefa.com
and
.An David Hayes Aq david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq david.hayes@ieee.org
and
.An Lawrence Stewart Aq lstewart@FreeBSD.org .