.\"	$FreeBSD$
.\"
.NC "The Design of the ARGO Transport Entity"
.sh 1 "Protocol Hooks"
.pp
The design of the AOS kernel IPC support to some
extent mandates the
design of protocols. 
Each protocol must provide the following 
protocol hooks, which are procedures called through a
protocol switch table
(an array of type \fIprotosw\fR, as described in
Chapter Five).
.ip "pr_input()" 5
Called when data are to be passed up from a lower layer.
.ip "pr_output()" 5
Called when data are to be passed down from a higher layer.
.ip "pr_init()" 5
Called when the system is brought up.
.ip "pr_fasttimo()" 5
Called every 200 milliseconds by the clock functional unit.
.ip "pr_slowtimo()" 5
Called every 500 milliseconds by the clock functional unit.
.ip "pr_drain()" 5
This is meant to be called when buffer space is low.
Each protocol is expected to provide this routine to free
non-critical buffer space.
This is not yet called anywhere.
.ip "pr_ctlinput()" 5
Used for exchanging information between
protocols, such as notifying a transport protocol of changes
in routing or configuration information.
.ip "pr_ctloutput()" 5
Supports the protocol-dependent 
\fIgetsockopt()\fR
and 
\fIsetsockopt()\fR
options.
.ip "pr_usrreq()" 5
Called by the socket code to pass along a \*(lquser request\*(rq -
in other words a service primitive.
This call is also used for other protocol functions.
The functions served by the \fIpr_usrreq()\fR routine are:
.ip "     PRU_ATTACH" 10
Creates a protocol control block and attaches it to a given socket.
Called as a result of a \fIsocket()\fR system call.
.ip "     PRU_DISCONNECT" 10
Called as a result of a 
\fIclose()\fR system call.
Initiates disconnection.
.ip "     PRU_DETACH" 10
Disassociates a protocol control block from a socket and recycles
the buffer space used for the protocol control block.
Called after PRU_DISCONNECT.
.ip "     PRU_SHUTDOWN" 10
Called as a result of a 
\fIshutdown()\fR system call.
If the protocol supports the notion of half-open connections,
this closes the connection in one direction or both directions,
depending on the arguments passed to
\fIshutdown\fR.
.ip "     PRU_BIND" 10
Gives an address to a socket.
Called as a result of a 
\fIbind()\fR system call, and also
when a
socket without a bound address is used.
In the latter case, an unused transport suffix is located and
bound to the socket.
.ip "     PRU_LISTEN" 10
Called as a result of a 
\fIlisten()\fR system call.
Marks the socket as willing to queue incoming connection
requests.
.ip "     PRU_CONNECT" 10
Called as a result of a 
\fIconnect()\fR system call.
Initiates a connection request.
.ip "     PRU_ACCEPT" 10
Called as a result of an 
\fIaccept()\fR system call.
Dequeues a pending connection request, or blocks waiting for
a connection request to arrive.
In the latter case, it marks the socket as willing to accept
connections.
.ip "     PRU_RCVD" 10
The protocol module is expected to have put incoming data
into the socket's receive buffer, \fIso_rcv\fR.
When a receive primitive is used
(\fIrecv(), recvmsg(), recvfrom(),
read(), readv(), \fRand 
\fIrecvv()\fR system calls)
the socket code module copies data from the
\fIso_rcv\fR to the user's
address space.
The protocol module may arrange to be informed each time the socket code
does this, in which case the socket code calls \fIpr_usrreq\fR(PRU_RCVD)
after the data were copied to the user.
.ip "     PRU_SEND" 10
This performs the protocol-dependent part of a send primitive
(\fIsend(), sendmsg(), sendto(), write(), writev(), 
\fRand \fIsendv()\fR system calls).
The socket code 
(procedures \fIsendit()\fR and \fIsosend()\fR)
moves outgoing data from the user's
address space into a chain of \fImbufs\fR.
The socket code takes as much data from the user as it
determines will fit into the outgoing socket buffer, so_snd. 
It passes this much data in the form of an mbuf chain to the protocol
via \fIpr_usrreq\fR(PRU_SEND).
If there are more data than 
the so_snd can accommodate,
the socket code, which is running on behalf of a user process,
puts the user process to sleep.
The protocol module is expected to wake up the user process when
more room appears in so_snd.
.ip "     PRU_ABORT" 10
Called when a socket is closed and that socket
is accepting connections and has
queued pending
connection requests or
partially open connections.
.ip "     PRU_CONTROL" 10
Called as a result of an 
\fIioctl()\fR system call.
.ip "     PRU_SENSE" 10
Called as a result of an 
\fIfstat()\fR system call.
.ip "     PRU_RCVOOB" 10
Performs the work of receiving \*(lqout-of-band\*(rq data.
The socket module has already allocated an mbuf into which
the protocol module is expected to put the incoming 
\*(lqout-of-band\*(rq data.
The socket code will then move the data from this mbuf
to the user's address space.
.ip "     PRU_SENDOOB" 10
Performs the work of sending \*(lqout-of-band\*(rq data.
The socket module has already moved the data
from the user's address space into a chain of mbufs,
which it now passes to the protocol module.
.ip "     PRU_SOCKADDR" 10
Supports the system call
\fIgetsockname()\fR.
Puts the socket's bound address into an mbuf.
.ip "     PRU_PEERADDR" 10
Supports the system call
\fIgetpeername\fR().
Puts the peer's address into an mbuf.
.ip "     PRU_CONNECT2" 10
This is used in the Unix domain to support pipes.
It is not generally supported by transport protocols.
.ip "     PRU_FASTTIMO, PRU_SLOWTIMO" 10
These are superfluous.
None of the transport protocols uses them.
.ip "     PRU_PROTORCV, PRU_PROTOSEND" 10
None of the transport protocols uses these.
.ip "     PRU_SENDEOT" 10
This was added to support TP.
This indicates that the end of the data sent in this
send primitive should
be marked by the protocol as the end of the TSDU.
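.pp
The hooks listed above are reached through function pointers in the switch table.
The following is a minimal sketch of that dispatch, not the AOS declaration:
the structure protosw_sketch, the demo routines, and the numeric request codes
are all hypothetical, and only three of the hooks are shown.

```c
#include <stddef.h>

/* Illustrative request codes; the real PRU_* values are not assumed here. */
enum pru_req { PRU_ATTACH, PRU_DETACH, PRU_CONNECT };

/* Hypothetical, cut-down protocol switch entry. */
struct protosw_sketch {
    void (*pr_init)(void);
    void (*pr_slowtimo)(void);
    int  (*pr_usrreq)(enum pru_req req, void *arg);
};

static int demo_slow_ticks;   /* how many slow timeouts the demo protocol saw */
static int demo_attached;     /* set while a pcb is attached */

static void demo_slowtimo(void) { demo_slow_ticks++; }

static int demo_usrreq(enum pru_req req, void *arg)
{
    (void)arg;
    switch (req) {
    case PRU_ATTACH: demo_attached = 1; return 0;
    case PRU_DETACH: demo_attached = 0; return 0;
    default:         return -1;   /* request not supported by this protocol */
    }
}

static struct protosw_sketch sketch_sw[] = {
    { NULL, demo_slowtimo, demo_usrreq },
    { NULL, NULL, NULL },          /* a protocol that provides no hooks */
};

/* What the clock would do every 500 ms: call each slow timeout hook present. */
static void run_slowtimo(void)
{
    size_t i;

    for (i = 0; i < sizeof sketch_sw / sizeof sketch_sw[0]; i++)
        if (sketch_sw[i].pr_slowtimo != NULL)
            sketch_sw[i].pr_slowtimo();
}
```

.pp
The protocol-independent code never names a protocol routine directly;
it only indexes the table and calls whatever pointer it finds there.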
.sh 1 "The Interface Between the Transport Entity and Lower Layers"
.pp
The transport layer may run over a network layer such as IP
or the ISO connectionless network layer,
or it may run over a multi-purpose layer such as the service
provided by X.25.
X.25 is viewed as a network layer when
TP runs over X.25, and as a 
subnetwork layer 
when IP is running over X.25.
The software interface between data link and network layers differs
considerably from the software interface between transport and network
layers in AOS.
For this reason some modification of the transport-to-lower-layer
interface is necessary to support the suite of protocols included in 
ARGO.
.pp
In AOS it is assumed that the transport layer will run over one
and only one network layer, and therefore it may call the
network layer output procedure directly.
In order to allow TP to run over a set of lower layers,
all domain-specific functions have been put into a set of routines
that are called indirectly through a domain-specific switch table.
The primary reason for this is that the transport and network
layers share information, mostly information pertaining to addresses.
The protocol control blocks for different network layers
differ, so the transport layer cannot just directly
access the network layer's pcb.
Similarly, a network layer may not directly access the transport
pcb because a multitude of transport protocols can run over each
of the network protocols.
.pp
To permit different network-layer protocol control blocks to coexist
under one transport layer, all transport-dependent control
information was put into a transport-specific protocol control block.
A new field, \fIso_tpcb\fR,
was added to the \fIsocket\fR structure to hold a pointer to
the transport-layer protocol control block. 
The existing
field \fCso_pcb\fR is used for the network layer pcb.
.pp
The following structure was added to allow domain-specific
functions to be called indirectly.
All these functions operate on a network-layer pcb.
.pp
.(b
\fC
.TS
tab(+);
l s s s.
struct nl_protosw {
.T&
l l l l.
+int+nlp_afamily;+/* address family */
+int+(*nlp_putnetaddr)();+/* puts addrs in pcb */
+int+(*nlp_getnetaddr)();+/* gets addrs from pcb */
+int+(*nlp_putsufx)();+/* transp suffix -> pcb */
+int+(*nlp_getsufx)();+/* gets t-suffix */
+int+(*nlp_recycle_suffix)();+/* zeroes suffix */
+int+(*nlp_mtu)();+/* get maximum
+++transmission unit size */
+int+(*nlp_pcbbind)();+/* bind to pcb */
+int+(*nlp_pcbconn)();+/* connect */
+int+(*nlp_pcbdisc)();+/* disconnect */
+int+(*nlp_pcbdetach)();+/* detach pcb */
+int+(*nlp_pcballoc)();+/* allocate a pcb */
+int+(*nlp_output)();+/* emit packet */
+int+(*nlp_dgoutput)();+/* emit datagram */
+caddr_t+nlp_pcblist;+/* list of pcbs 
+++for management 
+++of connections */
};
.TE
\fR
.)b
.lp
The switch is based on the address family chosen when the
\fIsocket()\fR system call is made prior to connection establishment.
This unfortunately ties the address family to the domain,
but the only alternative is to add an argument to the \fIsocket()\fR
system call to let the user specify the desired network layer.
In the case of a connection oriented environment with no multi-homing,
it would be possible to determine which network layer is to be
used
from routing
information, but to do this requires unrealistic assumptions
about the environment.
For these reasons, linking the address family to the network
layer protocol is seen as the least of the evils.
The transport suffixes are kept in the network layer's pcb
as well as in the transport layer because 
full transport address pairs are used to identify a connection
in the Internet domain.
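.pp
As a rough illustration of selecting a network layer by address family,
the sketch below searches a table of cut-down switch entries and calls the
MTU routine indirectly.
The address family constants, the MTU values, and every name ending in
_sketch or _SK are assumptions made for the example.

```c
#include <stddef.h>

#define AF_INET_SK 2   /* assumed values, for illustration only */
#define AF_ISO_SK  7

/* Hypothetical subset of struct nl_protosw. */
struct nl_protosw_sketch {
    int nlp_afamily;              /* address family */
    int (*nlp_mtu)(void *npcb);   /* get maximum transmission unit size */
};

/* Stand-ins for the domain-specific routines; the sizes are made up. */
static int in_mtu_sketch(void *npcb)  { (void)npcb; return 1500; }
static int iso_mtu_sketch(void *npcb) { (void)npcb; return 512; }

static struct nl_protosw_sketch nl_sw[] = {
    { AF_INET_SK, in_mtu_sketch },
    { AF_ISO_SK,  iso_mtu_sketch },
};

/* Find the switch entry for the address family given to socket(). */
static struct nl_protosw_sketch *nl_lookup(int af)
{
    size_t i;

    for (i = 0; i < sizeof nl_sw / sizeof nl_sw[0]; i++)
        if (nl_sw[i].nlp_afamily == af)
            return &nl_sw[i];
    return NULL;   /* no network layer registered for this family */
}
```

.pp
Once the entry has been found at socket creation, the transport layer can keep
the pointer and make every later call through it without knowing which
network layer is underneath.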
.sh 1 "The Architecture of the Transport Protocol Entity"
.pp
A set of protocol hooks is required
by the AOS IPC architecture.
These hooks are used by the protocol-independent parts of the kernel
to gain entry to protocol-specific code.
The protocol code can be entered in one of the following ways:
.ip "1) " 5
at boot time, when autoconfiguration
initializes each protocol through
the 
\fIpr_init()\fR
hook,
.ip "2) " 5
from above, either
a user program making a system call, through
the \fIpr_usrreq()\fR or \fIpr_ctloutput()\fR hooks, or
from a higher layer protocol using the
\fIpr_output()\fR hook,
.ip "3) " 5
from below, a device interrupt servicing an incoming packet
through the \fIpr_input()\fR  and \fIpr_ctlinput()\fR hooks, and
.ip "4) " 5
from a clock interrupt through the \fIpr_slowtimo()\fR
or the
\fIpr_fasttimo()\fR hook.
.\" FIGURE
.so figs/trans_flow.nr
.\".so figs/trans_flow.grn
.pp
The protocol code can be divided into
the following modules, which are described in more detail below.
.CF
shows the flow of data and control 
among these modules.
.in +5
.ip "Timers and References:" 5
The code executed on behalf of \fIpr_slowtimo()\fR.
The fast timeout is not used by TP.
.ip "Driver:" 5
This is the finite state machine for TP.
.ip "Input:     " 5
This is the module that decodes incoming packets,
identifies or creates the pcb for which 
the packet is destined, and creates an "event" to
pass to the driver.
.ip "Output:" 5
This is the module that creates a packet header of a given type
with fields containing 
values that are appropriate to the connection
on which the packet is being sent, appends data if necessary,
and hands a packet
to the lower layer, according to the transport-to-lower-layer
interface.
.ip "Send:      " 5
This module packetizes data from the outbound
socket buffer, \fIso_snd\fR,
handles retransmissions of packetized data, and
drops packetized data from the retransmission queue.
.ip "Receive:" 5
This module reorders packets if necessary,
depacketizes data, passes it to the socket code module,
and determines when acknowledgments should be sent.
.in -5
.sh 1 "Timers and References"
.pp
TP identifies sockets by \fIreference numbers\fR, or
\fIreferences\fR,
which are \*(lqfrozen\*(rq (may not be reassigned)
until some locally defined time after
a connection is broken and its protocol control block
is discarded.
An array of \fIreference blocks\fR is maintained by TP.
The reference number of a reference block is its
offset in the array.
When a reference block is in use it contains 
a pointer to the pcb for the socket to which the
reference applies.
.pp
The system clock calls the \fIpr_slowtimo()\fR and 
\fIpr_fasttimo()\fR hooks for each protocol in the protocol switch table
every 500 and 200 milliseconds, respectively.
Each protocol handles its own timers its own way.
The timers in TP take two forms
- those that typically are cancelled and
those that usually expire.
The latter form may have more than one instantiation at any given
time.
The former may not.
The two are implemented slightly
differently for the sake of performance.
.pp
The timers that normally expire 
are kept in a queue, their values all relative
to the value of the preceding timer.
Thus all timer values are decremented by a single
operation on the value of the first timer.
The timer is represented by the Ecallout structure:
.(b
\fC
.TS
tab(+);
l s s s.
struct Ecallout {
.T&
l l l l.
+int+c_time;+/* incremental time */
+int+c_func;+/* function to call */
+u_int+c_arg1;+/* argument to routine */
+u_int+c_arg2;+/* argument to routine */
+int+c_arg3;+/* argument to routine */
+struct Ecallout+*c_next;
};
.TE
\fR
.)b
.lp
When an Ecallout structure migrates to the head
of the E timer list, and its \fIc_time\fR
field is decremented to zero, 
the function stored in \fIc_func\fR is
called, with \fIc_arg1, c_arg2\fR, and \fIc_arg3\fR
as arguments.
Setting and cancelling these timers
are accomplished by a linear search and one
insertion or deletion from the timer queue.
This queue is linked to the 
reference block associated with a communication endpoint.
This form is used for the reference timer
and for the retransmission timers for data TPDUs.
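.pp
The relative-time queue described above can be sketched as follows.
This is an illustration under simplifying assumptions, not the ARGO code:
the structure is reduced to one argument, and the names ecallout_sketch,
e_insert, e_tick, and record are hypothetical.

```c
#include <stddef.h>

struct ecallout_sketch {
    int c_time;                       /* ticks relative to the preceding entry */
    void (*c_func)(int);              /* function to call on expiry */
    int c_arg;                        /* argument to routine */
    struct ecallout_sketch *c_next;
};

static int fired[8], nfired;          /* demo callback records expiry order */
static void record(int arg) { fired[nfired++] = arg; }

/* Linear search, then one insertion: e expires `ticks` ticks from now. */
static void e_insert(struct ecallout_sketch **head,
                     struct ecallout_sketch *e, int ticks)
{
    struct ecallout_sketch **pp = head;

    while (*pp != NULL && (*pp)->c_time <= ticks) {
        ticks -= (*pp)->c_time;       /* make ticks relative to this entry */
        pp = &(*pp)->c_next;
    }
    e->c_time = ticks;
    e->c_next = *pp;
    if (*pp != NULL)
        (*pp)->c_time -= ticks;       /* successor keeps its absolute time */
    *pp = e;
}

/* One clock tick: decrement the head only, then call every expired entry. */
static void e_tick(struct ecallout_sketch **head)
{
    if (*head == NULL)
        return;
    (*head)->c_time--;
    while (*head != NULL && (*head)->c_time <= 0) {
        struct ecallout_sketch *e = *head;

        *head = e->c_next;
        e->c_func(e->c_arg);
    }
}
```

.pp
Because each entry stores only the difference from its predecessor,
a tick costs one decrement no matter how many timers are queued.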
.pp
The second form of timer, the type that
typically is cancelled, is used for several
timers - the inactivity timer, the sendack timer,
and the retransmission
timer for all types of TPDUs except data TPDUs.
.(b
\fC
.TS
tab(+);
l s s s.
struct Ccallout {
.T&
l l l l.
+int+c_time;+/* incremental time */
+int+c_active;+/* this timer is active? */
};
.TE
\fR
.)b
.lp
All of these timers are stored
directly
in the reference block.
These timers are decremented in one linear scan of
the reference blocks.
Cancelling, setting, and both
cancelling and resetting one of these timers is accomplished by a
single assignment to an array element.
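.pp
A minimal sketch of the second form of timer follows.
It assumes a fixed number of timers per reference block and counts
expirations instead of raising events;
the sizes and the names refblock_sketch, c_set, c_cancel, and c_tick
are hypothetical.

```c
#define N_REFS_SK    4   /* hypothetical number of reference blocks */
#define N_CTIMERS_SK 3   /* hypothetical number of C timers per block */

struct ccallout_sketch {
    int c_time;          /* ticks remaining */
    int c_active;        /* this timer is active? */
};

struct refblock_sketch {
    struct ccallout_sketch ctimers[N_CTIMERS_SK];
};

static struct refblock_sketch refs[N_REFS_SK];
static int c_expired;    /* stands in for delivering a timeout event */

/* Setting or resetting a timer is one assignment to an array element. */
static void c_set(int ref, int which, int ticks)
{
    refs[ref].ctimers[which] = (struct ccallout_sketch){ ticks, 1 };
}

/* Cancelling is likewise a single store. */
static void c_cancel(int ref, int which)
{
    refs[ref].ctimers[which].c_active = 0;
}

/* One linear scan of the reference blocks per slow timeout. */
static void c_tick(void)
{
    int r, t;

    for (r = 0; r < N_REFS_SK; r++)
        for (t = 0; t < N_CTIMERS_SK; t++) {
            struct ccallout_sketch *c = &refs[r].ctimers[t];

            if (c->c_active && --c->c_time <= 0) {
                c->c_active = 0;
                c_expired++;
            }
        }
}
```

.pp
The scan touches every timer on every tick, which is the price paid for making
set and cancel, the common operations for this kind of timer, nearly free.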
.sh 1 "Driver"
.pp
This is the finite state machine for TP.
A connection is managed by the finite state machine (fsm).
All events that pertain to a connection cause the
finite state machine driver to be called.
The driver takes two arguments - the pcb for the connection
and an event structure.
The event structure contains a field that discriminates
the different types of events, and a union of 
structures that are specific to the event types.
The driver evaluates a set of predicates based on the current
state of the finite state machine (which is kept in the pcb) and the event type.
The result of the predicate evaluation determines
a set of actions to take and a state transition.
The driver takes the actions and if they complete
without errors, the driver makes the state transition.
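.pp
The shape of such a driver can be sketched by hand as shown below.
The states, event types, and fields are invented for the example and are not
taken from the TP transition file; the real driver is generated rather than
written this way.

```c
/* Hypothetical states and event types. */
enum tp_state_sk  { ST_CLOSED, ST_CRSENT, ST_OPEN };
enum tp_evtype_sk { EV_CONNECT_REQ, EV_CC_TPDU, EV_DT_TPDU };

struct tp_event_sk {
    enum tp_evtype_sk ev_type;            /* discriminates the union below */
    union {
        struct { int credit; } cc;        /* connection confirm */
        struct { unsigned seq; int eot; } dt;   /* data */
    } ev_u;
};

struct tp_pcb_sk {
    enum tp_state_sk state;               /* current state is kept in the pcb */
    int peer_credit;
};

/* Returns 0 on success, -1 if no transition applies or an action fails. */
static int tp_driver_sk(struct tp_pcb_sk *pcb, const struct tp_event_sk *ev)
{
    enum tp_state_sk next;

    /* Predicates on (state, event type) select the actions and next state. */
    if (pcb->state == ST_CLOSED && ev->ev_type == EV_CONNECT_REQ) {
        next = ST_CRSENT;
    } else if (pcb->state == ST_CRSENT && ev->ev_type == EV_CC_TPDU) {
        if (ev->ev_u.cc.credit < 0)
            return -1;                    /* action failed: state unchanged */
        pcb->peer_credit = ev->ev_u.cc.credit;
        next = ST_OPEN;
    } else if (pcb->state == ST_OPEN && ev->ev_type == EV_DT_TPDU) {
        next = ST_OPEN;
    } else {
        return -1;                        /* no transition for this pair */
    }
    pcb->state = next;                    /* only after the actions succeed */
    return 0;
}
```

.pp
Note that the state is assigned last, so a failed action leaves the
connection in the state it was in when the event arrived.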
.pp
The states, event types, predicates, actions, and state transitions are all
specified as a \fIxebec transition file\fR.
\fIXebec\fR is a utility that takes a human-readable description
of a finite state machine
and produces a set of tables and C source code for the driver.
The driver procedure is called \fItp_driver()\fR.
It is located in a file generated by xebec, 
\fCtp_driver.c\fR.
For more details about xebec, see the manual page \fIxebec(1)\fR.
.pp
The transition file for TP is \fCtp.trans\fR,
and it is a good place to begin a perusal of the TP
source code.
.sh 1 "Input"
.pp
This is the module that decodes an incoming packet,
locates or creates the pcb for which 
the packet is destined, and creates an event to
pass to the driver.
The network layer passes a packet up to the appropriate
transport layer by indirectly calling a transport input
routine through the protocol switch table for the network
domain.
There is one protocol switch entry for TP for each domain in which
TP will run (Internet, ISO).
In the Internet domain, the protocol switch field \fIpr_input()\fR
takes the value \fItpip_input()\fR.
This procedure accepts a packet from IP, with the IP header
still intact.
It extracts the network addresses from the IP header,
strips the IP header, and calls the domain-independent
input procedure for TP,
\fItp_input()\fR.
\fITp_input()\fR
decodes a TPDU.
The multitude of options, the variable-length
nature of the options, the semantics of the
options, and the possible combinations of concatenated
TPDUs make this a 
complex procedure.
It is sensitive to changes, and from 
the point of view of software maintenance, it is a
potential hazard.
Because it is in the 
critical path of TP, however, some compromise
was made between maintainability and efficiency.
Multiple copies of sections of code were avoided as much as
possible,
not for the sake of saving space, but rather for the sake
of maintainability.
Ironically,
this detracts somewhat from the readability of the code.
.pp
Once a TPDU has been decoded and a pcb has been
identified for the TPDU,
the appropriate fields of the TPDU
are extracted and their values are placed in
an event structure.
Finally, \fItp_driver()\fR is called with
the event structure and the pcb as parameters.
.sh 1 "Output"
.pp
This module creates a TPDU header of a given type
with field values that are appropriate to the connection
on which the TPDU is being sent, appends data if necessary,
and hands a TPDU
to the lower layer according to the transport-to-lower-layer
interface.
Whenever a TPDU is to be sent to the peer or prospective peer,
the function \fItp_emit()\fR
is called, passing as arguments the pcb, a TPDU type, and several miscellaneous
other type-specific arguments, possibly including some data.
The data are in the form of an mbuf chain.
\fITp_emit()\fR prepends to the data an mbuf containing a TP header,
fills in the fields of the header according to the parameters
given, performs the checksum if appropriate, and
calls a domain-specific output routine.
For the Internet domain, this output routine is
\fItpip_output()\fR, which takes
as arguments the mbuf chain representing the TPDU,
and a network level pcb.
Some protocol errors cannot be associated with 
a connection 
but require that TP issue
an ER TPDU or a DR TPDU. 
When these errors occur the routine
\fItp_error_emit()\fR is called.
This procedure creates the appropriate type of TPDU
and passes it to a domain-dependent routine for transmitting datagrams.
In the Internet domain,
\fItpip_output_dg()\fR is called.
This takes as arguments an mbuf chain representing the TPDU,
a source network address, and a destination network address.
.sh 1 "Send"
.\" FIGURE
.so figs/mbufsnd.nr
.\".so figs/mbufsnd.grn
.pp
This module packetizes data from the outbound
socket buffer, \fIso_snd\fR,
handles retransmissions of packetized data, and
drops packetized data from the retransmission queue.
The major routine in this module is \fItp_send()\fR, which
takes a range of sequence numbers as arguments.
For each sequence number in the range,
it packetizes an appropriate amount
of outbound data, and places the resulting TPDU on 
a retransmission control queue subject to the
constraints imposed by the rules of expedited data,
maximum packet sizes, and end-of-TSDU markers.
.pp
The most complicating factor is that of managing
expedited data.
A normal datum may not be sent (for its first time) before the
acknowledgment of any expedited datum
that was received from the user after the 
normal datum was received. 
In order to enforce this rule,
each TPDU must be marked in some way
so that it will be known which expedited datum
must be delivered and acknowledged by the peer before this TPDU may be transmitted
for the first time.
Markers are placed in \fIso_snd\fR 
when an
outgoing expedited datum arrives from the user. 
A marker is an mbuf structure with an \fIm_len\fR
of zero, but with the data area nevertheless containing
the sequence number of an expedited data TPDU.
The \fIm_type\fR of a marker is a new type, MT_XPD.
.pp
\fITp_send()\fR stops packetizing data when it encounters a marker
for an unacknowledged expedited datum.
If it encounters a marker for an expedited TPDU that has already
been acknowledged, the marker is jettisoned.
.CF
illustrates the structure of the sending socket buffer used
for normal data.
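.pp
The treatment of markers can be illustrated with a simplified buffer walk.
In this sketch the marker carries its expedited sequence number in a separate
field, m_xpdseq, where the text above describes it as stored in the data area;
sequence numbers are assumed not to wrap, and all names ending in _sk or _SK
are hypothetical.

```c
#include <stddef.h>

enum { MT_DATA_SK, MT_XPD_SK };       /* stand-ins for the mbuf types */

struct mbuf_sk {
    int m_type;
    int m_len;                        /* zero for a marker */
    unsigned m_xpdseq;                /* marker only: expedited TPDU sequence # */
    struct mbuf_sk *m_next;
};

/*
 * Count the octets of normal data that may be packetized now.
 * xpd_next_unacked is the sequence number of the oldest expedited TPDU
 * not yet acknowledged; markers below it are stale and are unlinked.
 */
static int sendable_octets(struct mbuf_sk **sb, unsigned xpd_next_unacked)
{
    int octets = 0;
    struct mbuf_sk **pp = sb;

    while (*pp != NULL) {
        struct mbuf_sk *m = *pp;

        if (m->m_type == MT_XPD_SK) {
            if (m->m_xpdseq >= xpd_next_unacked)
                break;                /* unacknowledged: stop packetizing */
            *pp = m->m_next;          /* acknowledged: jettison the marker */
            continue;
        }
        octets += m->m_len;
        pp = &m->m_next;
    }
    return octets;
}
```

.pp
Data queued behind an unacknowledged marker stays in the buffer until the
acknowledgment for that expedited datum arrives and the walk is repeated.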
.pp
When \fItp_send()\fR moves data from mbufs on \fIso_snd\fR to the retransmission
control queue, it needs to know
how many octets of data can be placed in each TPDU.
The appropriate amount depends on, among other things,
the maximum transmission unit of the network layer
on the route the packet will take.
To determine the maximum transmission unit,
TP queries the network layer through
the domain-dependent switch table's field, \fInlp_mtu\fR.
In the Internet domain, this resolves to \fItp_inmtu()\fR.
The header sizes for the network and transport layers
also affect the amount of data that can go into a packet,
and these sizes depend on the connection's characteristics.
.pp
Once the maximum amount of data per TPDU is determined,
\fItp_send()\fR can pull this amount off the \fIso_snd\fR queue to form
a TPDU,
assign a TPDU sequence number,
and place the new TPDU on the 
retransmission control queue.
The retransmission control queue is a list of mbuf chains.
Each mbuf chain represents one TPDU, preceded by an
\fIrtc structure\fR:
.(b
\fC
.TS
tab(+);
l s s s.
struct tp_rtc {
.T&
l l l l.
+struct tp_rtc+*tprt_next;+/* next rtc struct in list */
+SeqNum+tprt_seq;+/* seq # of this TPDU */
+int+tprt_eot;+/* end of TSDU? */
+int+tprt_octets;+/* # octets in this TPDU */
+struct mbuf+*tprt_data;+/* ptr to the octets of data */
.\"/* Performance measurment info: */
.\"int	tprt_window;	/* in which call to tp_send() was
.\"			  * this TPDU formed? 
.\"			  */
.\"struct timeval	tprt_sess_time;	/* time session received the 
.\"			* majority of the data for this packet on send;
.\"			* on recv, this is the time it's given to session 
.\"			*/
.\"struct timeval	tprt_net_time;	/* time first copy was given to net layer
.\"			* on send; on receive it's the time received from
.\"			* the network 
.\"			*/
};
.TE
\fR
.)b
.lp
Once TPDUs are on the retransmission control queue,
they are retransmitted or dropped by the actions
of timers.
The procedure \fItp_sbdrop()\fR
removes the TPDUs from the retransmission queue.
It takes a sequence number as an argument and drops
all TPDUs up to and including the TPDU with that sequence number.
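.pp
A sketch of dropping from the head of the retransmission control queue is
given below.
It assumes the queue is ordered by sequence number and that sequence numbers
do not wrap; the structure rtc_sk and the function rtc_drop_upto are
hypothetical, and the freeing of mbufs is omitted.

```c
#include <stddef.h>

/* Hypothetical subset of struct tp_rtc. */
struct rtc_sk {
    struct rtc_sk *tprt_next;   /* next rtc struct in list */
    unsigned tprt_seq;          /* seq # of this TPDU */
    int tprt_octets;            /* # octets in this TPDU */
};

/*
 * Unlink every TPDU up to and including sequence number seq.
 * Returns the number of TPDUs dropped.
 */
static int rtc_drop_upto(struct rtc_sk **head, unsigned seq)
{
    int dropped = 0;

    while (*head != NULL && (*head)->tprt_seq <= seq) {
        *head = (*head)->tprt_next;   /* a kernel would also free the data */
        dropped++;
    }
    return dropped;
}
```

.pp
A real implementation must compare sequence numbers modulo the sequence space,
which this sketch leaves out.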
.pp
When an AK TPDU arrives, the values from
its credit and sequence number fields
are passed to \fItp_goodack()\fR, which
determines whether or not the AK brought any news with it,
and therefore whether TP can send more data
or expedited data.
If this AK acknowledges something heretofore unacknowledged,
\fItp_goodack()\fR drops the appropriate TPDU(s) from the retransmission
control list, computes the smoothed average round trip time
and standard deviation of the round trip time, 
and updates
the retransmission timer based on these statistics.
It sets a flag in the pcb if the TP entity is obliged to
send the flow control confirmation parameter on its next
AK TPDU.
\fITp_goodack()\fR returns true if the AK brought some news with it,
either with respect to a change in credit or with respect to
new acknowledgments.
.pp
The function \fItp_goodXack()\fR is called when an XAK TPDU
arrives.
It takes the XAK sequence number as an argument and
determines if the XAK acknowledges the last XPD TPDU sent.
If so, it drops the expedited data from the outgoing
expedited data buffer.
By its definition in the TP specification,
the expedited data stream has a window
of size 1,
that is, 
only one expedited datum (packet) can be buffered
at a time.
\fITp_goodXack()\fR returns true if the XAK acknowledged
the last XPD TPDU sent and the data were dropped,
and it returns false if the acknowledgment caused no action to be taken.
.\" NEXT FIGURE
.so figs/mbufrcv.nr
.\".so figs/mbufrcv.grn
.sh 1 "Receive"
.pp
This module reorders incoming TPDUs if necessary,
depacketizes data, passes it to the socket code module,
and determines when acknowledgments should be sent.
The function 
\fItp_stash()\fR
takes a DT TPDU as an argument, and if the TPDU is not in
sequence, it saves the TPDU in a \fItp_rtc\fR structure in
a list, with the TPDUs
kept in order.
When the next expected TPDU arrives, the
list of out-of-order TPDUs is scanned for 
more TPDUs in sequence, updating
a field in the pcb, \fItp_rcvnxt\fR, which
always contains the sequence
number of 
the next expected TPDU.
If an acknowledgment is to be generated
at any time, the value of tp_rcvnxt goes into the
\fIYR-TU-NR\fR\** field of the acknowledgment TPDU.
.(f
\** 
This is the name used in ISO 8073 for the field
which indicates the sequence number of the next expected DT TPDU.
.)f
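.pp
The reordering step can be sketched as follows.
For brevity the out-of-order TPDUs are tracked in a small array of flags
rather than in a list of \fItp_rtc\fR structures, the window size is fixed,
and sequence numbers are assumed not to wrap;
the names rcv_sk, stash_sk, and WIN_SK are hypothetical.

```c
#define WIN_SK 8   /* hypothetical receive window, in TPDUs */

struct rcv_sk {
    unsigned rcvnxt;     /* sequence number of the next expected TPDU */
    int held[WIN_SK];    /* held[s % WIN_SK] != 0: TPDU s is stashed */
};

/* Returns the number of TPDUs delivered in sequence by this arrival. */
static int stash_sk(struct rcv_sk *r, unsigned seq)
{
    int delivered = 0;

    if (seq < r->rcvnxt || seq >= r->rcvnxt + WIN_SK)
        return 0;                        /* duplicate or outside the window */
    if (seq != r->rcvnxt) {
        r->held[seq % WIN_SK] = 1;       /* out of order: keep for later */
        return 0;
    }
    r->rcvnxt++;                         /* the expected TPDU itself */
    delivered++;
    while (r->held[r->rcvnxt % WIN_SK]) {    /* pull in stashed successors */
        r->held[r->rcvnxt % WIN_SK] = 0;
        r->rcvnxt++;
        delivered++;
    }
    return delivered;
}
```

.pp
After each call rcvnxt holds the value that would be placed in the
acknowledgment, whether or not anything was delivered.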
.pp
\fITp_stash()\fR returns true if an acknowledgment needs to be generated
immediately, and false otherwise.
The acknowledgment strategy is therefore implemented in this routine.
Acknowledgments may be generated for one or more of several reasons,
listed below.
\fITp_stash()\fR increments a counter for each of these reasons
for which an acknowledgment is generated, and a counter for TPDUs
that are not acknowledged immediately.
.ip "ACK_STRAT_EACH" 5
The acknowledgment strategy in use calls for acknowledging each 
data packet with an AK TPDU.
.ip "ACK_STRAT_FULLWIN" 5
The acknowledgment strategy in use calls for acknowledging 
upon receiving the DT TPDU that represents the upper window
edge of the last advertised window.
.ip "ACK_DUP" 5
A duplicate data TPDU was received.
.ip "ACK_REORDER" 5
A DT TPDU arrived in the window but out of order.
.ip "ACK_EOT" 5
A DT TPDU arrived, and it had the end-of-TSDU flag set.
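.pp
One way to picture the decision is as a set of reason bits
computed for each arriving DT TPDU.
The sketch below is illustrative only:
the reason names come from the list above, but their numeric values,
the \fIenum ack_strategy\fR type, the \fIstruct dt_event\fR type and the
function \fIack_reasons()\fR are assumptions made for this example.
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Reason names are from the text; the bit values are assumed. */
enum {
    ACK_STRAT_EACH    = 0x01,
    ACK_STRAT_FULLWIN = 0x02,
    ACK_DUP           = 0x04,
    ACK_REORDER       = 0x08,
    ACK_EOT           = 0x10
};

/* Hypothetical encoding of the strategy chosen by the user. */
enum ack_strategy { STRAT_EACH, STRAT_FULLWIN };

/* Hypothetical summary of what is known about one arriving DT TPDU. */
struct dt_event {
    int duplicate;       /* sequence number was already received */
    int out_of_order;    /* in the window but not the next expected */
    int eot;             /* end-of-TSDU flag is set */
    int at_window_edge;  /* upper edge of the last advertised window */
};

/*
 * Returns the set of reasons for acknowledging immediately.
 * A result of zero means the acknowledgment may be delayed.
 */
static int ack_reasons(enum ack_strategy strat, const struct dt_event *ev)
{
    int why = 0;

    assert(ev != NULL);
    if (ev->duplicate)
        why |= ACK_DUP;
    if (ev->out_of_order)
        why |= ACK_REORDER;
    if (ev->eot)
        why |= ACK_EOT;
    if (strat == STRAT_EACH)
        why |= ACK_STRAT_EACH;
    if (strat == STRAT_FULLWIN && ev->at_window_edge)
        why |= ACK_STRAT_FULLWIN;
    return why;
}
```
.fi
.lp
A caller could increment one counter per bit set in the result,
and a separate counter when the result is zero.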
.pp
Upon receipt of a DT TPDU that is in order, and upon reordering
DT TPDUs, 
\fItp_stash()\fR
places the TSDUs into the socket's receive
buffer, \fIso->so_rcv\fR, in mbuf chains, with
TSDUs delimited by mbufs of the \fIm_type\fR MT_EOT,
which is a new type in the ARGO kernel.
.CF
illustrates the structure of the receiving socket buffer used
for normal data.
.pp
A separate socket buffer, \fItpcb->tp_Xrcv\fR,
is used for
buffering expedited data.
Only one expedited data packet may reside in this buffer at a time
because the TP standard limits the size of the window on expedited flow
to be 1.
This means the data structures are straightforward;
there is no need to distinguish between separate TSDUs in this socket buffer.
.pp
Credit is determined 
by dividing the total amount of available
space in the receive buffer
by the negotiated maximum TPDU size.
TP can often offer a larger credit than this if it uses
an average of the measured actual TPDU sizes.
This strategy was once an option in the ARGO kernel,
but it was removed because unless the actual TPDU size
is constant, it leads to reneging of credit,
retransmissions, and decreased performance.
It does not work well when there is any fluctuation in the sizes
of TPDUs and it carries the penalty of lengthening the critical path
of the TP entity.
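.pp
The credit computation described above amounts to an integer division.
The following is a minimal sketch;
the function name \fIcredit()\fR and its parameters are hypothetical,
and the kernel's own bookkeeping of buffer space is not shown.
.nf
```c
#include <assert.h>

/*
 * Credit offered to the peer: the number of TPDUs of the negotiated
 * maximum size that fit in the free space of the receive buffer.
 */
static unsigned credit(unsigned long space, unsigned long max_tpdu)
{
    assert(max_tpdu > 0);
    return (unsigned)(space / max_tpdu);
}
```
.fi
.lp
With 4000 bytes free and a negotiated maximum TPDU size of 1024,
this yields a credit of 3, even if the peer is actually sending
much smaller TPDUs.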
.sh 1 "Major Data Structures and Types"
.pp
In addition to the types commonly used in the kernel,
such as 
.(b
\fC
.TS
tab(+);
l l l l.
 +typedef+unsigned char+u_char;
 +typedef+unsigned int+u_int;
 +typedef+unsigned short+u_short;
.TE
\fR
.)b
TP uses the following types:
.(b
\fC
.TS
tab(+);
l l l l.
 +typedef+unsigned int+SeqNum;
 +typedef+unsigned short+RefNum;
 +typedef+int+ProtoHook;
.TE
\fR
.)b
.pp
Sequence numbers can be either 7 or 31 bits.
An unsigned integer is used in all cases, and the proper type
of arithmetic is performed with bit masks.
Reference numbers are 16 bits.
ProtoHook is the type of the procedures that are in switch
tables.
Although these procedures return no value,
they are declared \fIint\fR rather than \fIvoid\fR
to be consistent with the rest of the kernel.
.pp
The following structures are fundamental
types used throughout TP,
in addition to those already described in the 
section,
"The Design of the Transport Entity".
.(b
\fC
.TS
tab(+);
l s s s.
struct tp_ref {
.T&
l l l l.
+u_char+tpr_state;+/* REF_FROZEN...*/
+struct Ccallout+tpr_callout[N_CTIMERS];+/* C timers */
+struct Ecallout+tpr_calltodo;+/* E timers list */
+struct tp_pcb+*tpr_pcb;+/* --> PCB */
};
.TE
\fR
.)b
.lp
The reference structure is logically a part of the protocol
control block and it is linked to a pcb, but it may outlive
a pcb.
When a connection is dissolved, the pcb may be recycled
but the reference structure must remain until the reference
timer goes off.
The field \fItpr_state\fR takes the values
REF_FROZEN (a reference timer is ticking),
REF_OPEN (in use, has timers and an associated pcb),
REF_OPENING (has a pcb but no timers), and
REF_FREE (free to reallocate).
.pp
The TP protocol control block is too large to fit into
one mbuf structure so it comprises two structures
linked together, the 
\fItp_pcb\fR structure and the
\fItp_pcb_aux\fR structure.
The \fItp_pcb_aux\fR structure contains
items that are used less frequently than those in
the former structure, since each access to these
items requires a second pointer dereference.
.(b
\fC
.TS
tab(+);
l s s s.
struct tp_pcb_aux {
.T&
l l l s.
 +struct sockbuf+tpa_Xsnd;+/* for expedited data */
+struct sockbuf+tpa_Xrcv;+/* for expedited data */
+u_char +tpa_vers;+/* protocol version */
+u_char +tpa_peer_acktime;+/* to compute DT TPDU
+++retrans timer value */
+SeqNum+tpa_Xsndnxt;+/* seq # of
+++next XPD to send */
+SeqNum+tpa_Xuna;+/* seq # of 
+++unacked XPD */
+SeqNum+tpa_Xrcvnxt;+/* next XPD seq #
+++expect to recv */
+/* addressing */
+u_short+tpa_domain;+/* domain AF_ISO,...*/
+u_short+tpa_fsuffixlen;+/* foreign suffix */
+u_char+tpa_fsuffix[MAX_TSAP_SEL_LEN];+
+u_short+tpa_lsuffixlen;+/* local suffix */
+u_char+tpa_lsuffix[MAX_TSAP_SEL_LEN];+
.T&
l s s s.
 +/* AK subsequencing */
.T&
l l l s.
 +u_short+tpa_s_subseq;+/* next subseq to send */
+u_short+tpa_r_subseq;+/* highest recv subseq */
};
.TE
\fR
.)b
.pp
The major portion of the protocol control block is in the
\fItp_pcb\fR structure:
.(b
\fC
.TS
tab(%);
l s s s.
struct tp_pcb {
.\" *************************************** 
.T&
l l l l.
.\" The next line sets the spacing for the table: 1+3 17+3 17+3 13+3
 %                 %                 %
.\"456789 123456789- 123456789 123456-789 123456789 1234567890
.\"
 %struct tp_ref%*tp_refp;%    
.T&
l l l s.
%%/* reference structure */%
.\" *************************************** 
.T&
l l l l.
 %struct tp_pcb_aux%*tp_aux;% 
.T&
l l l s.
 %%/*rest of tpcb (auxiliary struct)*/%
.\" *************************************** 
.T&
l l l l.
 %caddr_t%tp_npcb;%/* to ll pcb */
%struct nl_protosw%*tp_nlproto;%
.T&
l l l s.
 % %/* domain-dependent routines */%
.\" *************************************** 
.T&
l l l l.
 %struct socket%*tp_sock;%/* back ptr */
.\" *************************************** 
.T&
l s s s.

/* local and foreign reference numbers: */
.T&
l l l l.
 %RefNum%tp_lref;% 
%RefNum%tp_fref;%
.\" *************************************** 
.T&
l s s s.
.\"456789 123456789 123456789 123456789 123456789 1234567890

/* Stuff for sequence space arithmetic: 
 * Maintaining 2 sequence spaces is a pain so we set these
 * values once at connection establishment time. Sequence
 * number arithmetic is a set of macros which uses these.
 * Sequence numbers are stored as 32 bits.
 * tp_seqmask tells which of the 32 bits is used.
 * tp_seqbit  is the lsb that is not used.  When set,
 *   it indicates wraparound has occurred.
 * tp_seqhalf is the value that is half the sequence space.
 *   (or half plus one).
 */
.T&
l l l l.
%u_int%tp_seqmask;%/* mask */
%u_int%tp_seqbit;%/* wraparound */
%u_int%tp_seqhalf;%/* half space */
.\" *************************************** 
.T&
l s s s.

/* flags:  values are defined in tp_user.h.
 * Here we keep such info as which options 
 * are in use: checksum, extended format,
 * flow control in class 2, etc.
 * See tp(4p) man page.
 */
.\" *************************************** 
.T&
l l l l.
 %u_short%tp_state;%/* fsm */
%short%tp_retrans;%
.T&
l l l s.
 % % /* # times to retransmit */% 
.\" *************************************** 
.T&
l s s s.

/* credit & sequencing info for SENDING: */
.T&
l l l s.
 %u_short%tp_fcredit;%
 % %/* remote real window */%
 %u_short%tp_cong_win;%
 % %/* remote congestion window */%
.\" *************************************** 
%SeqNum%tp_snduna;%
.T&
l l l s.
 % %/* seq # of lowest unacked DT */% 
.\" *************************************** 
.T&
l l l l.
 %struct tp_rtc    %*tp_snduna_rtc;% 
.T&
l l l s.
 % %/* ptr to mbufs containing lowest% 
%% * unacked TPDUs sent so far%
%% */%
.\" *************************************** 
.T&
l l l l.
 %SeqNum%tp_sndhiwat;% 
.T&
l l l s.
 % %/* highest DT sent yet */% 
.\" *************************************** 
.T&
l l l l.
 %struct tp_rtc%*tp_sndhiwat_rtc;% 
.T&
l l l s.
 % %/* ptr to mbufs containing the last% 
%% * DT sent - this is the last item %
%% * on the list that starts%
%% * at tp_snduna_rtc%
%% */%
.\" *************************************** 
.T&
l l l l.
 %int %tp_Nwindow;%/* for perf. measmt */
.\" *************************************** 
.T&
l s s s.

/* credit & sequencing info for RECEIVING: */
.\" *************************************** 
.T&
l l l s.
 %SeqNum%tp_sent_lcdt;%
 %%/* cdt according to last AK sent */%
 %SeqNum%tp_sent_uwe;% 
 % %/* upper window edge, according to% 
%% * the last AK sent %
%% */%
 %SeqNum%tp_sent_rcvnxt;% 
 % %/* rcvnxt, according to% 
%% * the last AK sent%
%% */%
.\" *************************************** 
.T&
l l l l.
 %short%tp_lcredit;%/* local */
.\" *************************************** 
.T&
l l l l.
 %SeqNum%tp_rcvnxt;% 
.T&
l l l s.
 % %/* next DT seq# we expect to recv */% 
.\" *************************************** 
.T&
l l l l.
 %struct tp_rtc%*tp_rcvnxt_rtc;% 
.T&
l l l s.
 % %/* ptr to mbufs containing unacked % 
%% * DTs received out of order, and %
%% * which we haven't acknowledged%
%% */%
.\" *************************************** 
.TE
.TS
tab(%);
l s s s.
/* Items kept in the aux structure: */

.\" *************************************** 
.T&
l s s l.
#define  tp_vers%tp_aux->tpa_vers
#define  tp_peer_acktime%tp_aux->tpa_peer_acktime
#define  tp_Xsnd%tp_aux->tpa_Xsnd
#define  tp_Xrcv%tp_aux->tpa_Xrcv
#define  tp_Xrcvnxt%tp_aux->tpa_Xrcvnxt
#define  tp_Xsndnxt%tp_aux->tpa_Xsndnxt
#define  tp_Xuna%tp_aux->tpa_Xuna
#define  tp_domain%tp_aux->tpa_domain
#define  tp_fsuffixlen%tp_aux->tpa_fsuffixlen
#define  tp_fsuffix%tp_aux->tpa_fsuffix
#define  tp_lsuffixlen%tp_aux->tpa_lsuffixlen
#define  tp_lsuffix%tp_aux->tpa_lsuffix
#define  tp_s_subseq%tp_aux->tpa_s_subseq
#define  tp_r_subseq%tp_aux->tpa_r_subseq
.\" *************************************** 
.T&
l s s s.
 % % % 
/* parameters per-connection controllable by user: */
.\" *************************************** 
.T&
l l l l.
 %struct%tp_conn_param%_tp_param; 
 % % %
.\" *************************************** 
.T&
l s s l.
#define  tp_Nretrans%_tp_param.p_Nretrans
#define  tp_dr_ticks%_tp_param.p_dr_ticks
#define  tp_cc_ticks%_tp_param.p_cc_ticks
#define  tp_dt_ticks%_tp_param.p_dt_ticks
#define  tp_xpd_ticks%_tp_param.p_x_ticks
#define  tp_cr_ticks%_tp_param.p_cr_ticks
#define  tp_keepalive_ticks%_tp_param.p_keepalive_ticks
#define  tp_sendack_ticks%_tp_param.p_sendack_ticks
#define  tp_refer_ticks%_tp_param.p_ref_ticks
#define  tp_inact_ticks%_tp_param.p_inact_ticks
#define  tp_xtd_format%_tp_param.p_xtd_format
#define  tp_xpd_service%_tp_param.p_xpd_service
#define  tp_ack_strat%_tp_param.p_ack_strat
#define  tp_rx_strat%_tp_param.p_rx_strat
#define  tp_use_checksum%_tp_param.p_use_checksum
#define  tp_tpdusize%_tp_param.p_tpdusize
#define  tp_class%_tp_param.p_class
#define  tp_winsize%_tp_param.p_winsize
#define  tp_netservice%_tp_param.p_netservice
#define  tp_no_disc_indications%_tp_param.p_no_disc_indications
#define  tp_dont_change_params%_tp_param.p_dont_change_params
.\" *************************************** 
.TE
.\" *************************************** 
.\" *************************************** 
.\" *************************************** 
.TS
tab(%);
l l l l.
.\" The next line sets the spacing for the table: 1+3 17+3 17+3 13+3
.\"456789 123456789- 123456789 123456-789 123456789 1234567890
.\"
.T&
l l l s.
 %%/* log2(the negotiated max size) */% 
.T&
l l l l.
 %int%tp_l_tpdusize;%/* # bytes */
.\" *************************************** 
 %struct timeval%tp_rtt;% 
.T&
l l l s.
 % %/* smoothed avg round-trip time */% 
 %struct timeval%tp_rtv;% 
 % %/* std deviation of round-trip time */% 
%struct timeval%tp_rttemit[ TP_RTT_NUM + 1 ];%
%%/* times that the last TP_RTT_NUM %
%% * DT_TPDUs were transmitted %
%% */%
.\" *************************************** 
 %unsigned % % 
%  tp_sendfcc:1,%/* shall next ack %
% %include flow control conf. param? */%
.\" *************************************** 
.T&
l l l s.
 %  tp_trace:1,%/* is this pcb being traced?% 
%% * (not used yet) %
%% */%
.\" *************************************** 
%  tp_perf_on:1,%/* statistics being kept? */% 
.\" *************************************** 
%  tp_reneged:1,%/* have we reneged on credit%
%% * since the last AK TPDU was sent? %
%% */%
%  tp_decbit:4,%/* congestion experienced? */%
%  tp_flags:8,%/* see #defines below */%
.\" *************************************** 
%  tp_unused:16;%%
.T&
l s s l.
#define  TPF_XPD_PRESENT%TPFLAG_XPD_PRESENT
#define  TPF_NLQOS_PDN%TPFLAG_NLQOS_PDN
#define  TPF_PEER_ON_SAMENET%TPFLAG_PEER_ON_SAMENET
%%%
.\" *************************************** 
.T&
l l l l.
 %struct tp_pmeas%*tp_p_meas;% 
.T&
l l l s.
 % %/* ptr to mbuf to hold the perf.% 
%% * statistics structure %
%% */%
.\" *************************************** 
};
.TE
\fR
.\"
.\" end of tpcb structure (thank you)
.\"
.)b
.fi
.sh 1 "Sequence Number Arithmetic"
.pp
Sequence numbers in TP can be either 7 bits 
(\*(lqnormal format\*(rq)
or 31 bits
(\*(lqextended format\*(rq).
Sequence numbers are unsigned integers,
regardless of their format.
Three fields are kept in the pcb to manage the sequence
number arithmetic:
.(b
\fC
.TS
tab(+);
l l l l.
 +u_int+tp_seqmask;+/* mask for seq space */
 +u_int+tp_seqbit;+/* bit for seq # wraparound */
 +u_int+tp_seqhalf;+/* half the seq space */
.TE
\fR
.)b
.lp
\fITp_seqmask\fR 
is a bit mask indicating which bits are legitimate 
for a sequence number of either format.
It takes the value 0x7f if 7-bit sequence numbers are in use,
and 0x7fffffff if 31-bit sequence numbers are in use.
\fITp_seqbit\fR 
is the bit that becomes set when a sequence number wraps around
while being incremented.
Its value is 0x80 for normal format, 0x80000000 for extended format.
\fITp_seqhalf\fR 
takes the value which is in the middle of the sequence space,
0x40 for normal format,
and
0x40000000 for extended format.
.(b
.nf
The macro 
.fi
\fC
.TS
tab(+);
l l l l.
     SEQ(tpcb, x)
.TE
\fR
.)b
.lp
extracts a sequence number from the location
in which it is stored.
.pp
The macros
.(b
\fC
.TS
tab(+);
l l s s l.
 +SEQ_GT(tpcb, seq, t)+is seq > t?
 +SEQ_GEQ(tpcb, seq, t)+is seq >= t?
 +SEQ_LT(tpcb, seq, t)+is seq < t?
 +SEQ_LEQ(tpcb, seq, t)+is seq <= t?
 +SEQ_INC(tpcb, seq)+seq\+\+
 +SEQ_DEC(tpcb, seq)+seq--
 +SEQ_SUB(tpcb, seq, amt)+seq -= amt
 +SEQ_ADD(tpcb, seq, amt)+seq \+= amt
.TE
\fR
.)b
.lp
perform the indicated comparisons and arithmetic
on their arguments.
.pp
An example of how these macros
are used is as follows.
To determine if a sequence
number \fIseq\fR is in a receive window
bounded by
\fIlwe\fR and \fIuwe\fR,
we define the
macro
.(b
\fC
.TS
tab(+);
l l.
#define+IN_RWINDOW(tpcb, seq, lwe, uwe)\\
+( SEQ_GEQ(tpcb, seq, lwe) && SEQ_LT(tpcb, seq, uwe) )
.TE
\fR
.)b
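.pp
To make the arithmetic concrete, the comparisons can be written as
small functions over the three pcb fields.
This is a sketch under stated assumptions and not the text of the
ARGO macros: \fIstruct seqspace\fR and the lower-case function names
are hypothetical, and the comparison rule used here
(a difference of less than half the sequence space means "ahead")
is one common way to compare numbers in a circular space.
.nf
```c
#include <assert.h>

/* Hypothetical holder for the three pcb fields named in the text. */
struct seqspace {
    unsigned seqmask;   /* 0x7f or 0x7fffffff */
    unsigned seqbit;    /* 0x80 or 0x80000000 */
    unsigned seqhalf;   /* 0x40 or 0x40000000 */
};

/* Extract a sequence number from its stored form. */
static unsigned seq_val(const struct seqspace *s, unsigned x)
{
    return x & s->seqmask;
}

/* Is a > b in the circular sequence space? */
static int seq_gt(const struct seqspace *s, unsigned a, unsigned b)
{
    unsigned d = (a - b) & s->seqmask;

    return d != 0 && d < s->seqhalf;
}

/* Is a >= b in the circular sequence space? */
static int seq_geq(const struct seqspace *s, unsigned a, unsigned b)
{
    return ((a - b) & s->seqmask) < s->seqhalf;
}

/* Is a < b in the circular sequence space? */
static int seq_lt(const struct seqspace *s, unsigned a, unsigned b)
{
    return seq_gt(s, b, a);
}

/* Increment, clearing the wraparound bit if it becomes set. */
static unsigned seq_inc(const struct seqspace *s, unsigned a)
{
    assert(a <= s->seqmask);
    a++;
    if (a & s->seqbit)
        a &= s->seqmask;
    return a;
}

/* Is seq inside the window that starts at lwe and ends before uwe? */
static int in_rwindow(const struct seqspace *s, unsigned seq,
                      unsigned lwe, unsigned uwe)
{
    return seq_geq(s, seq, lwe) && seq_lt(s, seq, uwe);
}
```
.fi
.lp
With normal format (mask 0x7f) and a window from 126 up to but not
including 4, sequence number 1 is inside the window and 5 is outside,
even though the window spans the wraparound from 127 to 0.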
.sh 1 "TP Implementation Options"
.pp
The transport protocol specification leaves several
things to the discretion of the implementor,
some of which may affect the performance
of individual connections and
aggregate performance.
Wherever different strategies are likely to favor
the performance of
individual connections to the detriment of aggregate performance
or vice versa, the
various strategies are under the control of options via the
\fIgetsockopt()\fR and
\fIsetsockopt()\fR system calls (see the manual pages
\fIgetsockopt(2)\fR,
\fIsetsockopt(2)\fR  
and
\fItp(4p)\fR  
for details).
In some cases the preferred strategies differ for the different
subnetworks, so the strategies chosen will be determined
by the subnetwork in use.
.sh 2 "TPDU size"
.pp
The limitation of the maximum TPDU size to a power of two is
unfortunate in the LAN environment.
For example, if the maximum NSDU size is around 1500, as in the case of an
Ethernet,
using a maximum TPDU size of 1024 reduces
the possible throughput by approximately 30%.
TP negotiates a maximum TPDU size of 2048 and
generates TPDUs of size around 1500.
Obviously this works well only when the peer is known to be 
using the same scheme (so that the peer
doesn't send TPDUs of size 2048 and cause its
network layer to fragment the TPDUs).
This is likely to be the case in a LAN where
all protocol entities are under the same administrative
control.
The maximum TPDU size negotiated is under the control of the user,
so
it is possible to prevent this scheme from being used
by default
when the peer is not on the same LAN, by
setting the \fItp.tpdusize\fR parameter in the ARGO directory service
file to
something less than the network's maximum transmission
unit.
.\"***********************************************************
.sh 2 "Congestion Window Strategy"
.pp
The congestion window strategy from the
DoD Internet 
was adapted for use with TP.
The strategy is intended to minimize the 
adverse effect
of transport's retransmission on an
already congested network.
.pp
A TP entity keeps two notions of the peer's window:
the real window, which is that advertised by the peer
in AK TPDUs, and the congestion window, which is a locally
controlled window.
TP uses the smaller of the two windows when transmitting.
The congestion window starts small, which keeps a
new connection from overloading the network with a sudden
burst of packets
immediately after connection establishment.
This is called \fIslow start\fR. 
For each successful acknowledgment received, the congestion
window grows by one, until eventually the real window
is the one in use.
If a retransmission timer expires, the congestion window
is reset to size one.
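.pp
The behavior of the two windows can be sketched in a few lines.
This is an illustration, not the ARGO code:
\fIstruct sendwin\fR and the three functions are hypothetical,
the field names merely echo \fItp_fcredit\fR and \fItp_cong_win\fR,
and capping the congestion window at the real window is an assumption
about what "until eventually the real window is the one in use" means.
.nf
```c
#include <assert.h>

/* Hypothetical per-connection send window state. */
struct sendwin {
    unsigned fcredit;    /* real window, as advertised by the peer */
    unsigned cong_win;   /* congestion window, locally controlled */
};

/* The window actually used when transmitting: the smaller of the two. */
static unsigned usable_window(const struct sendwin *w)
{
    return w->cong_win < w->fcredit ? w->cong_win : w->fcredit;
}

/* A successful acknowledgment opens the congestion window by one. */
static void on_ack(struct sendwin *w)
{
    assert(w->cong_win >= 1);
    if (w->cong_win < w->fcredit)
        w->cong_win++;
}

/* Expiry of a retransmission timer resets the congestion window. */
static void on_retrans_timeout(struct sendwin *w)
{
    w->cong_win = 1;
}
```
.fi
.lp
Starting from a congestion window of 1 and a real window of 8,
three acknowledgments allow 4 TPDUs to be outstanding;
a retransmission timeout brings the usable window back to 1.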
.pp
The congestion window strategy is used for class 4 unless
the transport user requests that it not be used.
The slow start strategy is used for traffic over a PDN
unless
the transport user requests that it not be used.
Slow start is not used for traffic over a LAN unless
its use is requested by the transport user.
.\"***********************************************************
.sh 2 "Retransmission strategies"
.pp
A retransmission timer is invoked for each set of DT TPDUs
sent in one send operation (call to \fItp_send()\fR).
This set of packets is called the \fIsend window\fR for the purpose
of this discussion.
.pp
The number of TPDUs 
in a send window
depends on the remote credit and the amount of data
in the local send buffers.
When a retransmission timer goes off, the lower
window edge 
is reevaluated but the upper window edge is not reevaluated.
.pp
There are several retransmission strategies implemented in
ARGO TP.
The choice of strategies is the user's, and is made with the 
\fIsetsockopt()\fR system call.
The strategies are summarized here:
.ip "Retransmit LWE TPDU only:" 5
Only the TPDU representing the new lower window edge 
is retransmitted.
This is the default retransmission strategy.
.ip "Retransmit whole send window:" 5
Retransmission begins with the new lower window edge
and continues up to the old upper window edge.
.pp
The value of the data retransmission timer
adapts to the average round trip time and the standard deviation of
the round trip time.
A round trip time is the time that passes between
the moment of a packet's first transmission and 
the moment it is first acknowledged.
The average round trip time
is kept by the sending side of TP, using
a formula for 
smoothing the average:
.(b
\fC
.TS
tab(+);
l l l l.
#define+TP_RTT_ALPHA+3
#define+TP_RTV_ALPHA+2
+++
#define+SMOOTH(alpha, old, new) \\
+(((new-old) >> alpha ) \+ (old) )
.TE
\fR
.)b
.lp
The times included in the average are chosen as follows.
The time of 
each packet's initial transmission is kept (for the last
\fIN\fR packets, where \fIN\fR is a defined constant).
When an AK TPDU arrives, ARGO TP subtracts the initial transmission
time for the lowest unacknowledged sequence number that was
acknowledged by this AK TPDU from the current time,
and applies the resulting time to the average.
Hence, not all packets are included in this average,
which is as it should be since
the purpose of this measurement is 
to find a good value for the retransmission timer.
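.pp
As a worked example of the smoothing formula, the sketch below
restates it as a function.
The function \fIsmooth()\fR is hypothetical and differs from the
macro in one respect: it divides by a power of two instead of
shifting, so that a sample smaller than the old average is handled
the same way by every C compiler (the result of right-shifting a
negative value is implementation-defined in C).
For negative differences the division rounds toward zero,
which may differ by one unit from an arithmetic shift.
.nf
```c
#include <assert.h>

#define TP_RTT_ALPHA 3
#define TP_RTV_ALPHA 2

/*
 * Move the old average toward the new sample by 1/2^alpha of the
 * difference between them.
 */
static long smooth(int alpha, long old, long sample)
{
    assert(alpha >= 0 && alpha < 16);
    return old + (sample - old) / (1L << alpha);
}
```
.fi
.lp
With an old average of 100 ticks and a new measurement of 180 ticks,
\fIsmooth(TP_RTT_ALPHA, 100, 180)\fR gives 110:
the average moves one eighth of the way toward the new sample.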
.pp
Each time part of a window is retransmitted,
the retransmission timer for that window is increased.
This does not affect the retransmission timers for other windows.
.\"***********************************************************
.sh 2 "Acknowledgment strategies"
.pp
The transport protocol specification
requires acknowledgments to be sent immediately
upon receipt
of  CC TPDUs (in class 4), XPD TPDUs, and DT TPDUs containing an
EOT marker, and at other times as required for flow control;
otherwise, acknowledgments may be delayed.
In addition to the times when an acknowledgment is required,
ARGO TP transmits an AK TPDU whenever the user receives some data,
thereby increasing the size of the window.
For those times when
immediate acknowledgment is optional,
ARGO TP offers two acknowledgment strategies:
.ip "     Acknowledge each TPDU" 10
Upon receipt of a DT TPDU an AK TPDU is sent.
.ip "     Acknowledge full window" 10
Acknowledgment is issued
upon receipt of enough data to
consume the last advertised credit.
.pp
The latter strategy
requires a timer to trigger an acknowledgment
in case the peer doesn't send the entire window 
quickly.
This timer is called the
\fIsendack timer\fR.
The upper bound on the value of this timer 
is called the \fIlocal acknowledgment time\fR.
The local acknowledgment time may be "advertised" to the 
peer during connection establishment, and the
peer may choose to use this value to
adjust its retransmission timers.
The ARGO TP entity advertises its local acknowledgment time
on a CR TPDU, but it is not 
constrained by 
the remote acknowledgment time, should the peer 
advertise it.
Instead,
ARGO TP adapts its sendack timer
to the behavior of the connection.
.pp
Under the assumption that the round trip time is
often 
symmetric,
and lacking 
a method to measure
the round trip time in the other direction,
ARGO TP uses the measured average round trip time
to adjust the sendack timer.
.pp
The choice of strategies is made with the
\fIsetsockopt()\fR system call.
The default strategy is
to
delay acknowledgments until the most recently advertised window is filled.