usr.sbin/xntpd/doc/README.kern



                A Kernel Model for Precision Timekeeping

                          Revised 3 April 1994

Note: This memorandum is a substantial revision of RFC-1589, "A Kernel
Model for Precision Timekeeping," March, 1994. It includes several
changes to the daemon and user interfaces, as well as a new feature
which disciplines the CPU clock oscillator in both time and frequency to
a source of precision time signals. This memorandum is included in the
distributions for the SunOS, Ultrix and OSF/1 kernels and in the NTP
Version 3 distribution (xntp3.v.tar.Z) as the file README.kern, where v
is the version identifier. Availability of the kernel distributions,
which involve licensed code, will be announced separately. The NTP
Version 3 distribution can be obtained via anonymous ftp from
louie.udel.edu in the directory pub/ntp. In order to utilize all
features of this distribution, the NTP version identifier should be 3q
or later.

Overview

This memorandum describes an engineering model which implements a
precision time-of-day function for a generic operating system. The model
is based on the principles of disciplined oscillators and phase-lock
loops (PLL) and frequency-lock loops (FLL) often found in the
engineering literature. It has been implemented in the Unix kernels for
several workstations, including those made by Sun Microsystems and
Digital Equipment. The model changes the way the system clock is
adjusted in time and frequency, and provides mechanisms to
discipline its frequency to an external precision timing source. The
model incorporates a generic system-call interface for use with the
Network Time Protocol (NTP) or similar time synchronization protocol.
The NTP Version 3 daemon xntpd operates with this model to provide
synchronization limited in principle only by the accuracy and stability
of the external timing source.

This memorandum does not obsolete or update any RFC. It does not propose
a standard protocol, specification or algorithm. It is intended to
provoke comment, refinement and implementations for kernels not
considered herein. While a working knowledge of NTP is not required for
an understanding of the design principles or implementation of the
model, it may be helpful in understanding how the model behaves in a
fully functional timekeeping system. The architecture and design of NTP
are described in [MIL91], while the current NTP Version 3 protocol
specification is given in RFC-1305 [MIL92a] and a subset of the
protocol, the Simple Network Time Protocol (SNTP), is given in RFC-1361
[MIL92c].

The model has been implemented in the Unix kernels for three Sun
Microsystems and Digital Equipment workstations. In addition, for the
Digital machines the model provides improved precision to one
microsecond (us). Since these specific implementations involve
modifications to licensed code, they cannot be provided directly.
Inquiries should be directed to the manufacturer's representatives.
However, the engineering model for these implementations, including a
simulator with code segments almost identical to the implementations,
but not involving licensed code, is available via anonymous FTP from
host louie.udel.edu in the directory pub/ntp and compressed tar archive
kernel.tar.Z. The NTP Version 3 distribution can be obtained via
anonymous ftp from the same host and directory in the compressed tar
archive xntp3.3q.tar.Z, where the version number shown as 3.3q may be
adjusted for new versions as they occur.

1. Introduction

This memorandum describes a model and programming interface for generic
operating system software that manages the system clock and timer
functions. The model provides improved accuracy and stability for most
computers using the Network Time Protocol (NTP) or similar time
synchronization protocol. This memorandum describes the design
principles and implementations of the model, while related technical
reports discuss the design approach, engineering analysis and
performance evaluation of the model as implemented in Unix kernels for
modern workstations. The NTP Version 3 daemon xntpd operates with these
implementations to provide improved accuracy and stability, together
with diminished overhead in the operating system and network. In
addition, the model supports the use of external timing sources, such as
precision pulse-per-second (PPS) signals and the industry standard IRIG
timing signals. The NTP daemon automatically detects the presence of the
new features and utilizes them when available.

There are three prototype implementations of the model presented in this
memorandum, one each for the Sun Microsystems SPARCstation with the
SunOS 4.1.x kernel, Digital Equipment DECstation 5000 with the Ultrix
4.x kernel and Digital Equipment 3000 AXP Alpha with the OSF/1 V1.x
kernel. In addition, for the DECstation 5000/240 and 3000 AXP Alpha
machines, a special feature provides improved precision to 1 us (stock
Sun kernels already provide this precision). Other than improving the
system clock accuracy, stability and precision, these implementations do
not change the operation of existing Unix system calls which manage the
system clock, such as gettimeofday(), settimeofday() and adjtime();
however, if the new features are in use, the operations of
gettimeofday() and adjtime() can be controlled instead by new system
calls ntp_gettime() and ntp_adjtime() as described below.

A detailed description of the variables and algorithms that operate upon
them is given in the hope that similar functionality can be incorporated
in Unix kernels for other machines. The algorithms involve only minor
changes to the system clock and interval timer routines and include
interfaces for application programs to learn the system clock status and
certain statistics of the time synchronization process. Detailed
installation instructions are given in the specific README files included
in the kernel distributions.

In this memorandum, NTP Version 3 and the Unix implementation xntp3 are
used as an example application of the new system calls for use by a
synchronization daemon. In principle, these system calls can be used by
other protocols and implementations as well. Even in cases where the
local time is maintained by periodic exchanges of messages at relatively
long intervals, such as using the NIST Automated Computer Time Service
[LEV89], the ability to precisely adjust the system clock frequency
simplifies the synchronization procedures and allows the telephone call
frequency to be considerably reduced.

2. Design Approach

While not strictly necessary for an understanding or implementation of
the model, it may be helpful to briefly describe how NTP operates to
control the system clock in a client computer. As described in [MIL91],
the NTP protocol exchanges timestamps with one or more peers sharing a
synchronization subnet to calculate the time offsets between peer clocks
and the local clock. These offsets are processed by several algorithms
which refine and combine the offsets to produce an ensemble average,
which is then used to adjust the local clock time and frequency. The
manner in which the local clock is adjusted represents the main topic of
this memorandum. The goal of the enterprise is the most accurate and
stable system clock possible with the available computer hardware and
kernel software.

In order to understand how the new model works, it is useful to review
how most Unix kernels maintain the system clock. In the Unix design a
hardware counter interrupts the kernel at a fixed rate: 100 Hz in the
SunOS kernel, 256 Hz in the Ultrix kernel and 1024 Hz in the OSF/1
kernel. Since the Ultrix timer interval (reciprocal of the rate) does
not evenly divide one second in microseconds, the kernel adds 64 us once
each second, so the timescale consists of 255 advances of 3906 us plus
one of 3970 us. Similarly, the OSF/1 kernel adds 576 us once each
second, so its timescale consists of 1023 advances of 976 us plus one of
1552 us.
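
The once-per-second correction described above follows from integer
division of one second by the timer rate. The following is a minimal
sketch in C of that arithmetic, not code taken from any of the kernels;
the function names tick_usec() and tick_fixup_usec() are hypothetical.

```c
#include <assert.h>

/* Nominal tick length in whole microseconds for a timer at hz. */
long tick_usec(long hz)
{
    assert(hz > 0);
    return 1000000L / hz;
}

/*
 * Microseconds lost each second to truncation of the tick length.
 * The kernel adds this amount in one lump once each second.
 */
long tick_fixup_usec(long hz)
{
    assert(hz > 0);
    return 1000000L % hz;
}
```

For the 256-Hz Ultrix timer this yields a tick of 3906 us and a fixup of
64 us, and for the 1024-Hz OSF/1 timer a tick of 976 us and a fixup of
576 us, in agreement with the figures above. At 100 Hz the fixup is
zero.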

2.1. Mechanisms to Adjust Time and Frequency

In most Unix kernels it is possible to slew the system clock to a new
offset relative to the current time by using the adjtime() system call.
To do this the clock frequency is changed by adding or subtracting a
fixed amount (tickadj) at each timer interrupt (tick) for a calculated
number of timer interrupts. Since this calculation involves dividing the
requested offset by tickadj, it is possible to slew to a new offset with
a precision only of tickadj, which is usually in the neighborhood of 5
us, but sometimes much larger. This results in a roundoff error which
can accumulate to an unacceptable degree, so that special provisions
must be made in the clock adjustment procedures of the synchronization
daemon.
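
The roundoff error can be illustrated with a simplified sketch in C. It
assumes the slew proceeds in whole steps of tickadj and that any
remainder is simply not applied; the structure slew_plan and the
function plan_slew() are hypothetical names, and real adjtime()
implementations differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Outcome of slewing offset_us in steps of tickadj_us per tick. */
struct slew_plan {
    long ticks;        /* timer interrupts over which the slew runs */
    long residual_us;  /* part of the offset that is never applied */
};

struct slew_plan plan_slew(long offset_us, long tickadj_us)
{
    struct slew_plan p;

    assert(tickadj_us > 0);
    p.ticks = labs(offset_us) / tickadj_us;
    /* In C99 the remainder takes the sign of the dividend. */
    p.residual_us = offset_us % tickadj_us;
    return p;
}
```

With tickadj at 5 us, a requested offset of 1003 us runs for 200 ticks
and leaves 3 us unapplied. It is this residual that the daemon must
carry forward to the next adjustment.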

In order to implement a frequency discipline function, it is necessary
to provide time offset adjustments to the kernel at regular adjustment
intervals using the adjtime() system call. In order to reduce the system
clock jitter to the regime consistent with the model, it is necessary
that the adjustment interval be relatively small, in the neighborhood of
1 s. However, the Unix adjtime() implementation requires each offset
adjustment to complete before another one can be begun, which means that
large adjustments must be amortized over possibly many adjustment
intervals. The requirement to implement the adjustment interval and
compensate for roundoff error considerably complicates the synchronizing
daemon implementation.

In the new model this scheme is replaced by another that represents the
system clock as a multiple-word, precision-time variable in order to
provide very precise clock adjustments. At each timer interrupt a
precisely calibrated quantity is added to the kernel time variable and
overflows propagated as required. The quantity is computed as in the NTP
local clock model described in [MIL92b], which operates as an adaptive-
parameter, first-order, type-II phase-lock loop (PLL). In principle,
this PLL design can provide precision control of the system clock
oscillator within 1 us and frequency to within parts in 10^11. While
precisions of this order are surely well beyond the capabilities of the
CPU clock oscillator used in typical workstations, they are appropriate
using precision external oscillators, as described below.
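
The idea of a multiple-word time variable with overflow propagation can
be sketched in C as follows. This is an illustration only and not the
representation used in the kernels: the structure ptime, the choice of
picoseconds for the fractional word and the function ptime_tick() are
all assumptions made for the example.

```c
#include <assert.h>

#define USEC_PER_SEC  1000000L
#define FRAC_PER_USEC 1000000L  /* assumption: frac counts picoseconds */

struct ptime {
    long sec;   /* seconds */
    long usec;  /* microseconds, 0..999999 */
    long frac;  /* fraction of a microsecond, 0..999999 ps */
};

/*
 * Advance the clock by one tick of tick_us microseconds plus a signed
 * sub-microsecond adjustment adj_frac, propagating carries upward.
 */
void ptime_tick(struct ptime *t, long tick_us, long adj_frac)
{
    assert(tick_us >= 1);
    assert(adj_frac > -FRAC_PER_USEC && adj_frac < FRAC_PER_USEC);

    t->frac += adj_frac;
    if (t->frac >= FRAC_PER_USEC) {     /* carry into microseconds */
        t->frac -= FRAC_PER_USEC;
        t->usec++;
    } else if (t->frac < 0) {           /* borrow from microseconds */
        t->frac += FRAC_PER_USEC;
        t->usec--;
    }

    t->usec += tick_us;
    while (t->usec >= USEC_PER_SEC) {   /* carry into seconds */
        t->usec -= USEC_PER_SEC;
        t->sec++;
    }
}
```

Because the adjustment is accumulated in the fractional word, a
correction much smaller than 1 us per tick is not lost to roundoff, as
it would be if only whole microseconds were kept.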

The PLL design is identical to the one originally implemented in NTP and
described in [MIL92b]. In the original design the software daemon
simulates the PLL using the adjtime() system call; however, the daemon
implementation is considerably complicated by the considerations
described above. The modified kernel routines implement the PLL in the
kernel using precision time and frequency representations, so that these
complications are avoided. A new system call ntp_adjtime() is called
only as each new time update is determined, which in NTP occurs at
intervals from 16 s to 1024 s. In addition, doing frequency
compensation in the kernel means that the system clock runs true even if
the daemon were to cease operation or the network paths to the primary
synchronization source fail.

In the new model the new ntp_adjtime() operates in a way similar to the
original adjtime() system call, but does so independently of adjtime(),
which continues to operate in its traditional fashion. When used with
NTP, it is the design intent that settimeofday() or adjtime() be used
only for system clock adjustments greater than +-128 ms, although the
dynamic range of the new model is much larger at +-512 ms. It has been
the Internet experience that the need to change the system clock in
increments greater than +-128 ms is extremely rare and is usually
associated with a hardware or software malfunction or system reboot.

The easiest way to set the time is with the settimeofday() system call;
however, this can under some conditions cause the clock to jump
backwards. If this cannot be tolerated, adjtime() can be used to slew
the clock to the new value without running backward or affecting the
frequency discipline process. Once the system clock has been set within
+-128 ms, the ntp_adjtime() system call is used to provide periodic
updates including the time offset, maximum error, estimated error and
PLL time constant. With NTP the update interval and time constant depend
on the measured delay and dispersion; however, the scheme is quite
forgiving and neither moderate loss of updates nor variations in the
update interval are serious.
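
The choice among the three mechanisms can be summarized in a short C
sketch. The 128-ms threshold is taken from the text; the enumeration,
the function choose_method() and the convention that a negative offset
means the clock must be set back are assumptions for illustration, not
the logic of xntpd.

```c
#include <assert.h>
#include <stdlib.h>

#define STEP_THRESHOLD_US 128000L  /* +-128 ms, from the text */

enum clock_method {
    METHOD_NTP_ADJTIME,   /* periodic update to the kernel PLL */
    METHOD_SETTIMEOFDAY,  /* step the clock to the new value */
    METHOD_ADJTIME        /* slew the clock; it never runs backward */
};

/*
 * offset_us is the correction to apply; negative means the clock is
 * ahead (assumption). backward_step_ok is 1 if a backward jump is
 * tolerable and 0 otherwise.
 */
enum clock_method choose_method(long offset_us, int backward_step_ok)
{
    assert(backward_step_ok == 0 || backward_step_ok == 1);

    if (labs(offset_us) <= STEP_THRESHOLD_US)
        return METHOD_NTP_ADJTIME;
    if (offset_us < 0 && !backward_step_ok)
        return METHOD_ADJTIME;
    return METHOD_SETTIMEOFDAY;
}
```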

2.2. Daemon and Application Interface

Unix application programs can read the system clock using the
gettimeofday() system call, which returns only the system time and
timezone data. For some applications it is useful to know the maximum
error of the reported time due to all causes, including clock reading
errors, oscillator frequency errors and accumulated latencies on the
path to the primary synchronization source. However, in the new model
the PLL adjusts the system clock to compensate for its intrinsic
frequency error, so that the time error expected in normal operation
will usually be much less than the maximum error. The programming
interface includes a new system call ntp_gettime(), which returns the
system time, as well as the maximum error and estimated error. This
interface is intended to support applications that need such things,
including distributed file systems, multimedia teleconferencing and
other real-time applications. The programming interface also includes a
new system call ntp_adjtime(), which can be used to read and write
kernel variables for time and frequency adjustment, PLL time constant,
leap-second warning and related data.

In addition, the kernel adjusts the indicated maximum error to grow by
an amount equal to the maximum oscillator frequency tolerance times the
elapsed time since the last update. The default engineering parameters
have been optimized for update intervals on the order of 64 s. As shown
in [MIL93], this is near the optimum interval for NTP used with ordinary
room-temperature quartz oscillators. For other intervals the PLL time
constant can be adjusted to optimize the dynamic response over intervals
of 16-1024 s. Normally, this is automatically done by NTP. In any case,
if updates are suspended, the PLL coasts at the frequency last
determined, which usually results in errors increasing only to a few
tens of milliseconds over a day using typical modern workstations.
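
The growth of the maximum error between updates can be written as a
one-line calculation. The sketch below is in C and relies on the fact
that a tolerance in parts per million multiplied by an interval in
seconds gives microseconds. The function name maxerror_now() is
hypothetical, and the tolerance of 100 ppm used in the note that
follows is an example value only.

```c
#include <assert.h>

/*
 * Maximum error in microseconds, elapsed_s seconds after an update
 * that reported maxerror_us, for an oscillator whose frequency
 * tolerance is tolerance_ppm.
 */
long maxerror_now(long maxerror_us, long tolerance_ppm, long elapsed_s)
{
    assert(tolerance_ppm >= 0);
    assert(elapsed_s >= 0);
    return maxerror_us + tolerance_ppm * elapsed_s;
}
```

For example, with a tolerance of 100 ppm, a maximum error of 5000 us at
the last update grows to 11400 us after 64 s without further updates.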

While any synchronization daemon can in principle be modified to use the
new system calls, the most likely users are those of the NTP Version 3
daemon xntpd. The xntpd code determines whether the new system calls are
implemented and automatically reconfigures as required. When
implemented, the daemon reads the frequency offset from a system file
and provides it and the initial time constant via ntp_adjtime(). In
subsequent calls to ntp_adjtime(), only the time offset and time
constant are affected. The daemon reads the frequency from the kernel
using ntp_adjtime() at intervals of about one hour and writes it to a
system file. This information is recovered when the daemon is restarted
after reboot, for example, so the sometimes extensive training period to
learn the frequency separately for each oscillator can be avoided.

2.3. Precision Clocks for DECstation 5000/240 and 3000 AXP Alpha

The stock microtime() routine in the Ultrix kernel for Digital Equipment
MIPS-based workstations returns system time to the precision of the
timer interrupt interval, which is in the 1-4 ms range. However, in the
DECstation 5000/240 and possibly other machines of that family, there is
an undocumented IOASIC hardware register that counts system bus cycles
at a rate of 25 MHz. The new microtime() routine for the Ultrix kernel
uses this register to interpolate system time between timer interrupts.
This results in a precision of 1 us for all time values obtained via the
gettimeofday() and ntp_gettime() system calls. For the Digital Equipment
3000 AXP Alpha, the architecture provides a hardware Process Cycle
Counter and a machine instruction (rpcc) to read it. This counter
operates at the fundamental frequency of the CPU clock or some
submultiple of it, 133.333 MHz for the 3000/400 for example. The new
microtime() routine for the OSF/1 kernel uses this counter in the same
fashion as the Ultrix routine. Support for this feature is conditionally
compiled in the kernel only if the MICRO option is used in the kernel
configuration file.
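
The interpolation between timer interrupts can be sketched in C as
shown below. This is not the microtime() routine from either kernel. It
assumes a free-running 32-bit counter that is sampled at each timer
interrupt, and the function name interp_usec() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds elapsed since the last timer interrupt, given the
 * counter value now, the value saved at the interrupt and the
 * counter rate in Hz.
 */
uint32_t interp_usec(uint32_t count_now, uint32_t count_at_tick,
                     uint32_t counter_hz)
{
    uint32_t delta;

    assert(counter_hz > 0);
    /* Unsigned subtraction remains correct if the counter wrapped. */
    delta = count_now - count_at_tick;
    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)delta * 1000000u) / counter_hz);
}
```

The result is added to the kernel time variable as read at the last
interrupt; the kernel time variable itself is not changed. At 25 MHz,
25 counts correspond to 1 us.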

In both the Ultrix and OSF/1 kernels the gettimeofday() and
ntp_gettime() system calls use the new microtime() routine, which returns
the interpolated value to 1-us resolution, but does not change the
kernel time variable. Therefore, other routines that access the kernel
time variable directly and do not call any of gettimeofday(),
ntp_gettime() or microtime() will continue their present behavior. The
microtime() feature is independent of other features described here and
is operative even if the kernel PLL or new system calls have not been
implemented.

The SunOS kernel already includes a system clock with 1-us resolution;
so, in principle, no microtime() routine is necessary. An existing
kernel routine uniqtime() implements this function, but it is coded in
the C language and is rather slow at 42-85 us per call on a SPARCstation
IPC. A replacement microtime() routine coded in assembler language is
available in the NTP Version 3 distribution and is much faster at about
3 us per call. Note that, as explained later, this routine should be
called at an interrupt priority level not greater than that of the timer
interrupt routine. Otherwise, it is possible to miss a tick increment,
with the result that the time returned can be late by one tick. This
priority condition always holds for gettimeofday() and ntp_gettime(),
but might not hold in other cases, such as when using the PPS signal
described later in this memorandum.

2.4. External Time and Frequency Discipline

The overall accuracy of a time synchronization subnet with respect to
Coordinated Universal Time (UTC) depends on the accuracy and stability
of the primary synchronization source, usually a radio or satellite
receiver, and the CPU clock oscillator of the primary server. As
discussed in [MIL93], the traditional interface using an ASCII serial
timecode and RS232 port precludes the full accuracy of most radio
clocks. In addition, the poor frequency stability of typical CPU clock
oscillators limits the accuracy, whether or not precision time sources
are available. There are, however, several ways in which the system
clock accuracy and stability can be improved to the degree limited only
by the accuracy and stability of the synchronization source and the
jitter of the interface and operating system.

Many radio clocks produce special signals that can be used by external
equipment to precisely synchronize time and frequency. Most produce a
pulse-per-second (PPS) signal that can be read via a modem-control lead
of a serial port and some produce a special IRIG signal that can be read
directly by a bus peripheral, such as the KSI/Odetics TPRO IRIG SBus
interface, or indirectly via the audio codec of some workstations, as
described in [MIL93]. In the NTP Version 3 daemon xntpd, the PPS signal
can be used to augment the less precise ASCII serial timecode to improve
accuracy to the order of a few tens of microseconds. Support is also
included in the NTP distribution for the TPRO interface, as well as the
audio codec; however, the latter requires a modified kernel audio driver
contained in the compressed tar archive bsd_audio.tar.Z in the same host
and directory as the NTP Version 3 distribution mentioned previously.

2.4.1. PPS Signal

The most convenient way to interface a PPS signal to a computer is
usually with a serial port and RS232-compatible signal; however, the PPS
signal produced by most radio clocks and laboratory instruments is
usually a TTL pulse signal. Therefore, some kind of level
converter/pulse generator is necessary to adapt the PPS signal to a
serial port. An example design, including schematic and printed-circuit
board artwork, is in the compressed tar archive gadget.tar.Z in the same
host and directory as the NTP Version 3 distribution mentioned
previously. There are several ways the PPS signal can be used in
conjunction with the NTP Version 3 daemon xntpd, as described in [MIL93]
and in the documentation included in the distribution.

The NTP Version 3 distribution includes a special ppsclock module for
the SunOS 4.1.x kernel that captures the PPS signal presented via a
modem-control lead of a serial port. Normally, the ppsclock module
produces a timestamp at each transition of the PPS signal and provides
it to the synchronization daemon for integration with the serial ASCII
timecode, also produced by the radio clock. With the conventional PLL
implementation in either the daemon or the kernel as described in
[MIL93], the accuracy of this scheme is limited by the intrinsic
stability of the CPU clock oscillator to a millisecond or two, depending
on environmental temperature variations.

The ppsclock module has been modified to also call a new kernel
routine hardpps() once each second. In addition, the Ultrix 4.3 kernel
has been modified to provide similar functionality. The hardpps()
routine compares the timestamp with a sample of the CPU clock oscillator
in order to discipline the oscillator to the time and frequency of the
PPS signal. Using this method, the time accuracy is improved to
typically 20 us or less and the frequency stability to a few parts in
10^8, which is about two orders of magnitude better than the
undisciplined oscillator. The new feature is conditionally compiled in
the code described below only if the PPS_SYNC option is used in the
kernel configuration file.

When using the PPS signal to adjust the time, there is a problem with
some kernels which is very difficult to fix. The serial port interrupt
routine often operates at an interrupt priority level above the timer
interrupt routine. Thus, as explained below, it is possible that a tick
increment can be missed and the time returned late by one tick. It may
happen that, if the CPU clock oscillator frequency is close to the PPS
oscillator frequency (less than a few ppm), this condition can persist
for two or more successive PPS interrupts. A useful workaround in the
code is to use a glitch detector and median filter to process the PPS
sample offsets. The glitch detector suppresses offset bursts that
exceed half the tick interval and last fewer than 30 successive PPS
interrupts. The median filter ranks the offsets in a moving window of
three samples and uses the median as the output and the difference
between the other two as a dispersion measure.
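
The two stages can be sketched in C as follows. The limits of half a
tick and 30 interrupts come from the text; everything else, including
the names glitch_state, glitch_accept(), pps_sample and median3(), is
hypothetical, and the actual hardpps() code may differ in how it tracks
a burst.

```c
#include <assert.h>

#define GLITCH_LIMIT 30  /* successive PPS interrupts, from the text */

struct glitch_state {
    int run;  /* length of the current burst of large offsets */
};

/* Return 1 if the sample is to be used, 0 if suppressed as a glitch. */
int glitch_accept(struct glitch_state *g, long offset_us, long tick_us)
{
    long mag = offset_us < 0 ? -offset_us : offset_us;

    assert(tick_us > 0);
    if (mag <= tick_us / 2) {
        g->run = 0;              /* ordinary sample */
        return 1;
    }
    if (++g->run < GLITCH_LIMIT)
        return 0;                /* short burst: suppress */
    g->run = 0;                  /* persistent: accept as real */
    return 1;
}

struct pps_sample {
    long median_us;      /* filter output */
    long dispersion_us;  /* difference between the other two */
};

/* Rank three offsets and report the median and the dispersion. */
struct pps_sample median3(long a, long b, long c)
{
    struct pps_sample out;
    long t;

    if (a > b) { t = a; a = b; b = t; }
    if (b > c) { t = b; b = c; c = t; }
    if (a > b) { t = a; a = b; b = t; }
    out.median_us = b;
    out.dispersion_us = c - a;
    return out;
}
```

A single outlier in the window never becomes the output of median3(),
but it does enlarge the dispersion, which indicates how far the output
can be trusted.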

2.4.2. External Clocks

It is possible to replace the system clock function with an external bus
peripheral. The TPRO device mentioned previously can be used to provide
IRIG-synchronized time with a precision of 1 us. A driver for this
device tprotime.c and header file tpro.h are included in the
kernel.tar.Z distribution mentioned previously. Using this device, the
system clock is read directly from the interface; however, the device
does not record the year, so special provisions have been made to obtain
the year from the kernel time variable and initialize the driver
accordingly. Support for this feature is conditionally compiled in the
kernel only if the EXT_CLOCK and TPRO options are used in the kernel
configuration file.

While the system clock function is provided directly by the microtime()
routine in the driver, the kernel time variable must be disciplined as
well, since not all system timing functions use the microtime() routine.
This is done by measuring the time difference between the microtime()
clock and kernel time variable and using it to adjust the kernel PLL as
if the adjustment were provided by an external peer and NTP.

A good deal of error checking is done in the TPRO driver, since the
system clock is vulnerable to a misbehaving radio clock, IRIG signal
source, interface cables and TPRO device itself. Unfortunately, there is
no practical way to utilize the extensive diversity and redundancy
capabilities available in the NTP synchronization daemon. In order to
avoid disruptions that might occur if the TPRO time is far different
from the kernel time variable, the latter is used instead of the former
if the difference between the two exceeds 1000 s; presumably in that
case operator intervention is required.
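
The fallback rule in the preceding paragraph amounts to a simple
comparison, sketched here in C. The function name select_time_sec() is
hypothetical and the sketch compares whole seconds only; the driver
itself performs considerably more checking.

```c
#include <assert.h>

#define MAX_DIFF_S 1000L  /* from the text */

/*
 * Return the time to use as the system clock: the TPRO time, unless
 * it differs from the kernel time variable by more than 1000 s.
 * Both arguments are in seconds since the epoch.
 */
long select_time_sec(long tpro_sec, long kernel_sec)
{
    long diff = tpro_sec - kernel_sec;

    assert(tpro_sec >= 0 && kernel_sec >= 0);
    if (diff < 0)
        diff = -diff;
    return diff > MAX_DIFF_S ? kernel_sec : tpro_sec;
}
```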

2.4.3. External Oscillators

Even if a source of PPS or IRIG signals is not available, it is still
possible to improve the stability of the system clock through the use of
a specialized bus peripheral. In order to explore the benefits of such
an approach, a special SBus peripheral called HIGHBALL has been
constructed. The device includes a pair of 32-bit hardware counters in
Unix timeval format, together with a precision, oven-controlled quartz
oscillator with a stability of a few parts in 10^9. A driver for this
device hightime.c and header file high.h are included in the
kernel.tar.Z distribution mentioned previously. Support for this feature
is conditionally compiled in the kernel only if the EXT_CLOCK and
HIGHBALL options are used in the kernel configuration file.

Unlike the external clock case, where the system clock function is
provided directly by the microtime() routine in the driver, the HIGHBALL
counter offsets with respect to UTC must be provided first. This is done
using the ordinary kernel PLL, but controlling the counter offsets
directly, rather than the kernel time variable. At first, this might
seem to defeat the purpose of the design, since the jitter and wander of
the synchronization source will affect the counter offsets and thus the
accuracy of the time. However, the jitter is much reduced by the PLL and
the wander is small, especially if using a radio clock or another
primary server disciplined in the same way. In practice, the scheme
works to reduce the incidental wander to a few parts in 10^8, or about
the same as using the PPS signal.

As in the previous case, the kernel time variable must be disciplined as
well, since not all system timing functions use the microtime() routine.
However, the kernel PLL cannot be used for this, since it is already in
use providing offsets for the HIGHBALL counters. Therefore, a special
correction is calculated from the difference between the microtime()
clock and the kernel time variable and used to adjust the kernel time
variable at the next timer interrupt. This somewhat roundabout approach
is necessary in order that the adjustment does not cause the kernel time
variable to jump backwards and possibly lose or duplicate a timer event.

2.5. Other Features

It is a design feature of the NTP architecture that the system clocks in
a synchronization subnet are to read the same or nearly the same values
before, during and after a leap-second event, as declared by national
standards bodies. The new model is designed to implement the leap event
upon command by an ntp_adjtime() argument. The intricate and sometimes
arcane details of the model and implementation are discussed in [MIL92b]
and [MIL93]. Further details are given in the technical summary later in
this memorandum.

3. Technical Summary

In order to more fully understand the workings of the model, a stand-
alone simulator kern.c and header file timex.h are included in the
kernel.tar.Z distribution mentioned previously. In addition, an example
kernel module kern_ntptime.c which implements the ntp_gettime() and
ntp_adjtime() system calls is included. Neither of these programs
incorporates licensed code. Since the distribution is somewhat large, due
to copious comments and ornamentation, it is impractical to include a
listing of these programs in this memorandum. In any case, implementors
may choose to snip portions of the simulator for use in new kernel
designs; but, due to formatting conventions, this would be difficult if
included in this memorandum.

The kern.c program is an implementation of an adaptive-parameter, first-
order, type-II phase-lock loop. The system clock is implemented using a
set of variables and algorithms defined in the simulator and driven by
explicit offsets generated by the main() routine in the program. The
algorithms include code fragments almost identical to those in the
machine-specific kernel implementations and operate in the same way, but
the operations can be understood separately from any licensed source
code into which these fragments may be integrated. The code fragments
themselves are not derived from any licensed code. The following
discussion assumes that the simulator code is available for inspection.

3.1. PLL Simulation

The simulator operates in conformance with the analytical model
described in [MIL92b]. The main() program operates as a driver for the
fragments hardupdate(), hardclock(), second_overflow(), hardpps() and
microtime(), although not all functions implemented in these fragments
are simulated. The program simulates the PLL at each timer interrupt and
prints a summary of critical program variables at each time update.

There are four defined options in the kernel configuration file
specific to each implementation. The PPS_SYNC option provides support
for a pulse-per-second (PPS) signal, which is used to discipline the
frequency of the CPU clock oscillator. The EXT_CLOCK option provides
support for an external kernel-readable clock, such as the KSI/Odetics
TPRO IRIG interface or HIGHBALL precision oscillator, both for the SBus.
The TPRO option provides support for the former, while the HIGHBALL
option provides support for the latter. External clocks are implemented
as the microtime() clock driver, with the specific source code selected
by the kernel configuration file.

The PPS signal is carefully monitored for error conditions which can
affect accuracy, stability and reliability. The time_status kernel
variable contains bits that both control the use of the PPS signal and
reveal its operational status. The function of each bit is described in
a later section of this memo.

3.1.1. The hardupdate() Fragment

The hardupdate() fragment is called by ntp_adjtime() as each update is
computed to adjust the system clock phase and frequency. Note that the
time constant is in units of powers of two, so that multiplies can be
done by simple shifts. The phase variable is computed as the offset
divided by the time constant, but clamped to a maximum (for robustness).
Then, the time since the last update is computed and clamped to a
maximum and to zero if initializing. The offset is multiplied (sorry
about the ugly multiply) by the result and divided by the square of the
time constant and then added to the frequency variable. Note that all
shifts are assumed to be positive and that a shift of a signed quantity
to the right requires a little dance.
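
The clamping and the signed right shift mentioned above can be sketched
as follows. This is an illustration only, not the distributed code: the
helper names clamp_long() and shift_right_signed() are hypothetical, and
MAXPHASE is assumed to be 512000 us to match the +-512 ms range quoted
in this memorandum.

```c
/* Illustrative helpers; the names and MAXPHASE value are assumptions. */
#define MAXPHASE 512000L        /* assumed maximum phase error (us) */

/* Clamp value to the symmetric range [-limit, +limit]. */
long clamp_long(long value, long limit)
{
    if (value > limit)
        return limit;
    if (value < -limit)
        return -limit;
    return value;
}

/*
 * Shift a signed quantity right by a non-negative bit count. The
 * result of >> on a negative value is implementation-defined in C,
 * so the magnitude is shifted and the sign restored afterwards.
 * (Not valid for LONG_MIN, whose negation overflows.)
 */
long shift_right_signed(long value, int bits)
{
    if (value < 0)
        return -(-value >> bits);
    return value >> bits;
}
```

With these helpers, an offset of 600000 us would be clamped to MAXPHASE
before scaling, and a negative product would be divided by a power of
two without relying on the compiler's treatment of negative shifts.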

The STA_PLL and STA_PPSTIME status bits, which are set by the
ntp_adjtime() system call, serve to enable or inhibit the kernel PLL and
PPS time-discipline functions. The STA_PPSSIGNAL status bit is set by
the hardpps() code fragment when the PPS signal is present and operating
within nominal bounds. Time discipline from the PPS signal operates only
if both the STA_PPSTIME and STA_PPSSIGNAL bits are set; otherwise, the
discipline operates from the offset given in the ntp_adjtime() system
call. In the intended mode of operation, the synchronization daemon sets
STA_PLL to enable the PLL when first initialized, then sets STA_PPSTIME
when reliable synchronization to within +-128 ms has been achieved with
either a radio clock or external peer. The daemon can detect and
indicate this condition for monitoring purposes by noting that both
STA_PPSTIME and STA_PPSSIGNAL are set.

With the defines given in the program and header files, the maximum time
offset is determined by the size in bits of the long type (32 or 64)
less the SHIFT_UPDATE scale factor (12) or at least 20 bits (signed).
The scale factor is chosen so that there is no loss of significance in
later steps, which may involve a right shift up to SHIFT_UPDATE bits.
This results in a time adjustment range over +-512 ms. Since
time_constant must be greater than or equal to zero, the maximum
frequency offset is determined by the SHIFT_USEC scale factor (16) or at
least 16 bits (signed). This results in a frequency adjustment range
over +-31,500 ppm.
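
As a quick check of the time offset headroom, the following sketch
computes the largest offset that survives the SHIFT_UPDATE left shift
in a signed long of a given width. The function name is hypothetical;
only the SHIFT_UPDATE value comes from the text above.

```c
#define SHIFT_UPDATE 12         /* scale factor quoted in the text */

/*
 * Largest time offset (us) that still fits in a signed long of
 * long_bits bits after a left shift by SHIFT_UPDATE. One bit is
 * reserved for the sign.
 */
long long max_scaled_offset_us(int long_bits)
{
    return (1LL << (long_bits - 1 - SHIFT_UPDATE)) - 1;
}
```

For a 32-bit long this gives 524287 us, which covers the +-512 ms
adjustment range with a small margin.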

In the addition step, the value of offset * mtemp is not greater than
MAXPHASE * MAXSEC = 31 bits (signed), which will not overflow a long add
on a 32-bit machine. There could be a loss of precision due to the right
shift of up to 12 bits, since time_constant is bounded at 6. This
results in a net worst-case frequency resolution of about 0.063 ppm,
which is not significant for most quartz oscillators. The worst case
could be realized only if the NTP peer misbehaves according to the
protocol specification.

The time_offset value is clamped upon entry. The time_phase variable is
an accumulator, so is clamped to the tolerance on every call. This helps
to damp transients before the oscillator frequency has been stabilized,
as well as to satisfy the correctness assertions if the time
synchronization protocol or implementation misbehaves.

3.1.2. The hardclock() Fragment

The hardclock() fragment is inserted in the hardware timer interrupt
routine at the point the system clock is to be incremented by the value
of tick. Prior to this fragment the time_update variable has been
initialized to the tick increment plus the value computed by the
adjtime() system call in the stock Unix kernel, normally plus/minus the
tickadj value, which is usually in the order of 5 us. The time_phase
variable, which represents the instantaneous phase of the system clock,
is advanced by time_adj, which is calculated in the second_overflow()
fragment described below. If the value of time_phase exceeds 1 us in
scaled units, time_update is increased by the (signed) excess and
time_phase retains the residue.
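
The phase accumulator described above might look like the following
sketch. The structure, the function name and the SHIFT_SCALE value of
23 fraction bits are assumptions made for illustration; the simulator
source is the authority on the actual scaling.

```c
#define SHIFT_SCALE 23                  /* assumed fraction bits */
#define FINEUSEC (1L << SHIFT_SCALE)    /* 1 us in scaled units */

struct phase_state {
    long time_phase;        /* residual phase (scaled us) */
    long time_adj;          /* per-tick adjustment (scaled us) */
};

/*
 * Advance the phase accumulator by one tick and return the whole
 * microseconds (signed) to be added to time_update. The residue of
 * less than 1 us stays in time_phase.
 */
long phase_advance(struct phase_state *s)
{
    long whole = 0;

    s->time_phase += s->time_adj;
    if (s->time_phase <= -FINEUSEC) {
        whole = -s->time_phase >> SHIFT_SCALE;
        s->time_phase += whole << SHIFT_SCALE;
        return -whole;
    }
    if (s->time_phase >= FINEUSEC) {
        whole = s->time_phase >> SHIFT_SCALE;
        s->time_phase -= whole << SHIFT_SCALE;
    }
    return whole;
}
```

With time_adj equal to 1.5 us in scaled units, the first tick yields
1 us and leaves 0.5 us in time_phase; the second tick yields 2 us and
leaves no residue.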

In those cases where a PPS signal is connected by a serial port
operating at an interrupt priority level greater than the timer
interrupt, special consideration should be given to the location of the
hardclock() fragment in the timer interrupt routine. The system clock
should be advanced as early in the routine as possible, preferably
before the hardware timer interrupt flag is cleared. This reduces or
eliminates the possibility that the microtime() routine may latch the
time after the flag is cleared, but before the system clock is advanced,
which results in a returned time late by one tick.

Except in the case of an external oscillator such as the HIGHBALL
interface, the hardclock() fragment advances the system clock by the
value of tick plus time_update. In the case of an external oscillator,
the system clock is obtained directly from the interface and time_update
is used to discipline that interface instead. However, the
system clock must still be disciplined as explained previously, so the
value of clock_cpu computed by the second_overflow() fragment is used
instead.

3.1.3. The second_overflow() Fragment

The second_overflow() fragment is inserted at the point where the
microseconds field of the system time variable is being checked for
overflow. Upon overflow the maximum error time_maxerror is increased by
time_tolerance to reflect the maximum time offset due to oscillator
frequency error. Then, the increment time_adj to advance the kernel time
variable is calculated from the (scaled) time_offset and time_freq
variables updated at the last call to the hardupdate() fragment.

The phase adjustment is calculated as a (signed) fraction of the
time_offset remaining, where the fraction is added to time_adj, then
subtracted from time_offset. This technique provides a rapid convergence
when offsets are high, together with good resolution when offsets are
low. The frequency adjustment is the sum of the (scaled) time_freq
variable, an adjustment fixtick necessary when the tick interval does
not evenly divide one second, and the PPS frequency adjustment pps_freq
(if configured).
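
The phase adjustment step can be sketched as below. The function name
is hypothetical, the shift count (assumed here to depend on the time
constant) is simply passed in, and the rescaling of the result into
time_adj units is omitted.

```c
/*
 * Remove a 1/2^shift fraction of the remaining offset and return it
 * as the phase adjustment for the coming second. Negative offsets
 * are handled on the magnitude so the shift is well defined.
 */
long take_phase_fraction(long *time_offset, int shift)
{
    long part;

    if (*time_offset < 0) {
        part = -*time_offset >> shift;
        *time_offset += part;
        return -part;
    }
    part = *time_offset >> shift;
    *time_offset -= part;
    return part;
}
```

Starting from an offset of 1024 with a shift of 2, successive calls
return 256, 192, 144 and so on, which shows the large early steps and
the fine resolution as the remaining offset shrinks.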

The scheme of approximating exact multiply/divide operations with shifts
produces good results, except when an exact calculation is required,
such as when the PPS signal is being used to discipline the CPU clock
oscillator frequency as described below. As long as the actual
oscillator frequency is a power of two in Hz, no correction is required.
However, in the SunOS kernel the clock frequency is 100 Hz, which
results in an error factor of 0.78. In this case the code increases
time_adj by a factor of 1.25, which results in an overall error less
than three percent.
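
The arithmetic behind these figures can be checked with a short
sketch. The function names are hypothetical, SHIFT_HZ is assumed to be
7 (128 being the power of two nearest 100), and time_adj is assumed
non-negative so that the shift is well defined.

```c
#define SHIFT_HZ 7      /* assumed: log2 of 128, nearest to 100 Hz */

/* Apply the 1.25 correction with a shift: x + x/4. */
long correct_for_100hz(long time_adj)
{
    return time_adj + (time_adj >> 2);
}

/* Ratio of the effective gain to the ideal gain for a given hz. */
double gain_ratio(int hz, int corrected)
{
    double ratio = (double)hz / (double)(1 << SHIFT_HZ);

    return corrected ? ratio * 1.25 : ratio;
}
```

For hz = 100 the uncorrected ratio is 100/128 = 0.78125, and the
corrected ratio is 0.9765625, a residual error of about 2.3 percent.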

On rollover of the day, the leap-second state machine described below
determines whether a second is to be inserted or deleted in the
timescale. The microtime() routine ensures that the reported time is
always monotonically increasing.

3.1.4. The hardpps() Fragment

The hardpps() fragment is operative only if the PPS_SYNC option is
specified in the kernel configuration file. It is called from the serial
port driver or equivalent interface at the on-time transition of the PPS
signal. The code operates as a first-order, type-I, frequency-lock loop
(FLL) controlled by the difference between the frequency represented by
the pps_freq variable and the frequency of the hardware clock
oscillator. It also provides offsets to the hardupdate() fragment in
order to discipline the system clock time.

In order to avoid calling the microtime() routine more than once for
each PPS transition, the interface requires the calling program to
capture the system time and hardware counter contents at the on-time
transition of the PPS signal and provide a pointer to the timestamp
(Unix timeval) and counter contents as arguments to the hardpps() call.
The hardware counter contents are determined by saving the microseconds
field of the system time, calling the microtime() routine, and
subtracting the saved value. If a microseconds overflow has occurred
during the process, the resulting microseconds value will be negative,
in which case the caller adds 1000000 to normalize the microseconds
field.
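
A minimal sketch of the normalization step performed by the caller
follows. The function and parameter names are hypothetical; both
arguments are assumed to be microseconds fields in the range 0 to
999999.

```c
#define USEC_PER_SEC 1000000L

/*
 * Return the elapsed microseconds between the saved microseconds
 * field and a later microtime() reading, both in [0, 999999].
 */
long counter_usec(long saved_usec, long microtime_usec)
{
    long usec = microtime_usec - saved_usec;

    if (usec < 0)
        usec += USEC_PER_SEC;   /* seconds field rolled over */
    return usec;
}
```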

In order to avoid large jitter when the PPS interrupt occurs during the
timer interrupt routine before the system clock is advanced, a glitch
detector is used. The detector latches when an offset exceeds a
threshold tick/2 and stays latched until either a subsequent offset is
less than the threshold or a specified interval MAXGLITCH (30 s) has
elapsed. As long as the detector remains latched, it outputs the offset
immediately preceding the latch, rather than the one received.
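
The glitch detector as described in this paragraph can be sketched as
follows. This is a reading of the prose, not the distributed code: the
structure and function names are hypothetical, and the function is
assumed to be called once per PPS pulse, so the latch counter counts
seconds.

```c
#define MAXGLITCH 30    /* max seconds to stay latched (from the text) */

struct glitch_state {
    long last_good;         /* offset preceding the latch (us) */
    int latched;            /* seconds spent latched so far */
};

/*
 * Return the offset to pass on. While the offset magnitude exceeds
 * tick/2 and the latch has not timed out, the last good offset is
 * returned instead of the one received.
 */
long glitch_filter(struct glitch_state *g, long offset, long tick)
{
    long magnitude = offset < 0 ? -offset : offset;

    if (magnitude > tick / 2 && g->latched < MAXGLITCH) {
        g->latched++;
        return g->last_good;
    }
    g->latched = 0;
    g->last_good = offset;
    return offset;
}
```

Once MAXGLITCH successive samples have been suppressed, the next
sample is accepted even if it still exceeds the threshold, so a
genuine step in the PPS offset is eventually followed.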

A three-stage median filter is used to suppress jitter less than the
glitch threshold. The median sample drives the PLL, while the difference
between the other two samples represents the time dispersion. Time
dispersion samples are averaged and used as a jitter estimate. If this
estimate exceeds a threshold MAXTIME/2 (100 us), an error bit
STA_PPSJITTER is raised in the status word.
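
A three-stage median filter of this kind reduces to two small
functions, sketched here with hypothetical names. The averaging of the
dispersion samples is not shown.

```c
/* Median of three samples. */
long median3(long a, long b, long c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return b;
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return a;
    return c;
}

/* Difference between the other two samples, i.e. max minus min. */
long spread3(long a, long b, long c)
{
    long lo = a, hi = a;

    if (b < lo)
        lo = b;
    if (c < lo)
        lo = c;
    if (b > hi)
        hi = b;
    if (c > hi)
        hi = c;
    return hi - lo;
}
```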

The frequency of the hardware oscillator is determined from the
difference in hardware counter readings at the beginning and end of the
calibration interval divided by the duration of the interval. However,
the oscillator frequency tolerance, as much as 100 ppm, may cause the
difference to exceed the tick value, creating an ambiguity. In order to
avoid this ambiguity, the hardware counter value at the beginning of the
interval is increased by the current pps_freq value once each second,
but computed modulo the tick value. At the end of the interval, the
difference between this value and the value computed from the hardware
counter is the control signal for the FLL.

Control signal samples which exceed the frequency tolerance MAXFREQ (100
ppm) are discarded, as well as samples resulting from excessive interval
duration jitter. In these cases an error bit STA_PPSERROR is raised in
the status word. Surviving samples are then processed by a three-stage
median filter. The median sample drives the FLL, while the difference
between the other two samples represents the frequency dispersion.
Frequency dispersion samples are averaged and used as a stability
estimate. If this estimate is below a threshold MAXFREQ/4 (25 ppm), the
median sample is used to correct the oscillator frequency pps_freq with
a weight expressed as a shift PPS_AVG (2).

Initially, an approximate value for the oscillator frequency is not
known, so the duration of the calibration interval must be kept small to
avoid overflowing the tick. The time difference at the end of the
calibration interval is measured. If greater than tick/4, the interval
is reduced by half. If less than this fraction for four successive
calibration intervals, the interval is doubled. This design
automatically adapts to nominal jitter in the PPS signal, as well as the
value of tick. The duration of the calibration interval is set by the
pps_shift variable as a shift in powers of two. The minimum value
PPS_SHIFT (2) is chosen so that with the highest CPU oscillator
frequency 1024 Hz and frequency tolerance 100 ppm the tick will not
overflow. The maximum value PPS_SHIFTMAX (8) is chosen such that the
maximum averaging time is about 1000 s as determined by measurements of
Allan variance [MIL93].
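
The interval adaptation rule can be sketched as follows. The structure,
the function name and the GOOD_RUN constant are hypothetical; the
PPS_SHIFT and PPS_SHIFTMAX values are those given in the text.

```c
#define PPS_SHIFT    2  /* minimum interval shift (from the text) */
#define PPS_SHIFTMAX 8  /* maximum interval shift (from the text) */
#define GOOD_RUN     4  /* in-bound intervals needed before doubling */

struct calib_state {
    int pps_shift;          /* interval is 2^pps_shift seconds */
    int good_count;         /* successive in-bound intervals */
};

/* Adjust the calibration interval after one interval has ended. */
void calib_adapt(struct calib_state *c, long diff_usec, long tick)
{
    long magnitude = diff_usec < 0 ? -diff_usec : diff_usec;

    if (magnitude > tick / 4) {
        c->good_count = 0;
        if (c->pps_shift > PPS_SHIFT)
            c->pps_shift--;         /* halve the interval */
        return;
    }
    if (++c->good_count >= GOOD_RUN) {
        c->good_count = 0;
        if (c->pps_shift < PPS_SHIFTMAX)
            c->pps_shift++;         /* double the interval */
    }
}
```

The interval therefore ranges from 4 s to 256 s, shrinking at once when
the time difference is too large and growing only after a run of four
well-behaved intervals.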

Should the PPS signal fail, the current frequency estimate pps_freq
continues to be used, so the nominal frequency remains correct subject
only to the instability of the undisciplined oscillator. The procedure
to save and restore the frequency estimate works as follows. When
setting the frequency from a file, the time_freq value is set as the
file value minus the pps_freq value; when retrieving the frequency, the
two values are added before saving in the file. This scheme provides a
seamless interface should the PPS signal fail or the kernel
configuration change. Note that the frequency discipline is active
whether or not the synchronization daemon is active. Since all Unix
systems take some time after reboot to build a running system, usually
by that time the discipline process has already settled down and the
initial transients due to frequency discipline have damped out.

3.1.5. External Clock Interface

The external clock driver interface is implemented with two routines,
microtime(), which returns the current clock time, and clock_set(),
which furnishes the apparent system time derived from the kernel time
variable. The latter routine is called only when the clock is set using
the settimeofday() system call, but can be called from within the
driver, such as when the year rolls over, for example.

In the stock SunOS kernel and modified Ultrix and OSF/1 kernels, the
microtime() routine returns the kernel time variable plus an
interpolation between timer interrupts based on the contents of a
hardware counter. In the case of an external clock, such as described
above, the system clock is read directly from the hardware clock
registers. Examples of external clock drivers are in the tprotime.c and
hightime.c routines included in the kernel.tar.Z distribution.

The external clock routines return a status code which indicates whether
the clock is operating correctly and the nature of the problem, if not.
The return code is interpreted by the ntp_gettime() system call, which
transitions the status state machine to the TIME_ERROR state if an error
code is returned. This is the only error checking implemented for the
external clock in the present version of the code.

The simulator has been used to check the PLL operation over the design
envelope of +-512 ms in time error and +-100 ppm in frequency error.
This confirms that no overflows occur and that the loop initially
converges in about 15 minutes for timer interrupt rates from 50 Hz to
1024 Hz. The loop has a normal overshoot of a few percent and a final
convergence time of several hours, depending on the initial time and
frequency error.

3.2. Leap Seconds

It does not seem generally useful in the user application interface to
provide additional details private to the kernel and synchronization
protocol, such as stratum, reference identifier, reference timestamp and
so forth. It would in principle be possible for the application to
independently evaluate the quality of time and project into the future
how long this time might be "valid." However, to do that properly would
duplicate the functionality of the synchronization protocol and require
knowledge of many mundane details of the platform architecture, such as
the subnet configuration, reachability status and related variables. For
the curious, the ntp_adjtime() system call can be used to reveal some of
these mysteries.

However, the user application may need to know whether a leap second is
scheduled, since this might affect interval calculations spanning the
event. A leap-warning condition is determined by the synchronization
protocol (if remotely synchronized), by the timecode receiver (if
available), or by the operator (if awake). This condition is set by the
synchronization daemon on the day the leap second is to occur (30 June
or 31 December, as announced) by specifying in an ntp_adjtime() system
call a status bit of either STA_DEL, if a second is to be deleted, or
STA_INS, if a second is to be inserted. Note that, on all occasions
since the inception of the leap-second scheme, there has never been a
deletion, nor is there likely to be one in future. If the bit is
STA_DEL, the kernel adds one second to the system time immediately
following second 23:59:58 and resets the clock state to TIME_WAIT. If
the bit is STA_INS, the kernel subtracts one second from the system time
immediately following second 23:59:59 and resets the clock state to
TIME_OOP, in effect causing system time to repeat second 59. Immediately
following the repeated second, the kernel resets the clock state to
TIME_WAIT.

Following the leap operations, the clock remains in the TIME_WAIT state
until both the STA_DEL and STA_INS status bits are reset. This provides
both an unambiguous indication that a leap recently occurred, as well as
time for the daemon or operator to clear the warning condition.

Depending upon the system call implementation, the reported time during
a leap second may repeat (with the TIME_OOP return code set to advertise
that fact) or be monotonically adjusted until system time "catches up"
to reported time. With the latter scheme the reported time will be
correct before and shortly after the leap second (depending on the
number of microtime() calls during the leap second), but freeze or
slowly advance during the leap second itself. However, most programs
will probably use the ctime() library routine to convert from timeval
(seconds, microseconds) format to tm format (seconds, minutes,...). If
this routine is modified to use the ntp_gettime() system call and
inspect the return code, it could simply report the leap second as
second 60.
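
The suggested library behavior might be sketched as follows. The
function name is hypothetical, the TIME_OOP value is the one defined in
the timex.h header file, and tv_sec is assumed to be non-negative.

```c
#define TIME_OOP 3      /* leap second in progress (from timex.h) */

/*
 * Return the seconds field (0-60) for a timestamp, given the result
 * code returned along with it. The repeated second is reported as
 * second 60.
 */
int seconds_field(long tv_sec, int result_code)
{
    if (result_code == TIME_OOP)
        return 60;
    return (int)(tv_sec % 60);
}
```

A conversion routine built this way would show 23:59:59 followed by
23:59:60, although the underlying seconds count repeats.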

3.3. Clock Status State Machine

The various options possible with the system clock model described in
this memorandum require a careful examination of the state transitions,
status indications and recovery procedures should a crucial signal or
interface fail. In this section is presented a prototype state machine
designed to support leap second insertion and deletion, as well as
reveal various kinds of errors in the synchronization process. The
states of this machine are decoded as follows:

     TIME_OK        If a PPS signal or external clock is present, it is
                    working properly and the system clock is derived
                    from it. If not, the synchronization daemon is
                    working properly and the system clock is
                    synchronized to a radio clock or one or more peers.

     TIME_INS       An insertion of one second in the system clock has
                    been declared following the last second of the
                    current day, but has not yet been executed.

     TIME_DEL       A deletion of the last second of the current day has
                    been declared, but not yet executed.

     TIME_OOP       An insertion of one second in the system clock has
                    been declared following the last second of the
                    current day. The second is in progress, but not yet
                    completed. Library conversion routines should
                    interpret this second as 23:59:60.

     TIME_WAIT      The scheduled leap event has occurred, but the
                    STA_DEL and STA_INS status bits have not yet been
                    cleared.

     TIME_ERROR     Either (a) the synchronization daemon has declared
                    the protocol is not working properly, (b) all
                    sources of outside synchronization have been lost or
                    (c) a PPS signal or external clock is present, but
                    not working properly.

In all states the system clock is derived from either a PPS signal or
external clock, if present, or the kernel time variable, if not. If a
PPS error condition is recognized, the PPS signal is disabled and
ntp_adjtime() updates are used instead. If an external clock error
condition is recognized, the external clock is disabled and the kernel
time variable is used instead.

The state machine makes a transition once each second at an instant
where the microseconds field of the kernel time variable overflows and
one second is added to the seconds field. However, this condition is
checked when the timer overflows, which may not coincide with the actual
seconds increment. This may lead to some interesting anomalies, such as
a status indication of a leap second in progress (TIME_OOP) when the
leap second has already expired. This ambiguity is unavoidable, unless
the timer interrupt is made synchronous with the system clock.

The following state transitions are executed automatically by the kernel
at rollover of the microseconds field:

     any state -> TIME_ERROR  This transition occurs when an error
                         condition is recognized and continues as long
                         as the condition persists. The error indication
                         overrides the normal state indication, but does
                         not affect the actual clock state. Therefore,
                         when the condition is cleared, the normal state
                         indication resumes.

     TIME_OK->TIME_DEL   This transition occurs if the STA_DEL bit is
                         set in the status word.

     TIME_OK->TIME_INS   This transition occurs if the STA_INS bit is
                         set in the status word.

     TIME_INS->TIME_OOP  This transition occurs immediately following
                         second 86,400 of the current day when an
                         insert-second event has been declared.

     TIME_OOP->TIME_WAIT This transition occurs immediately following
                         second 86,401 of the current day; that is, one
                         second after entry to the TIME_OOP state.

     TIME_DEL->TIME_WAIT This transition occurs immediately following
                         second 86,399 of the current day when a delete-
                         second event has been declared.

     TIME_WAIT->TIME_OK  This transition occurs when the STA_DEL and
                         STA_INS bits are cleared by an ntp_adjtime()
                         call.

The following table summarizes the actions just before, during and just
after a leap-second event. Each line in the table shows the UTC and NTP
times at the beginning of the second. The left column shows the behavior
when no leap event is to occur. In the middle column the state machine
is in TIME_INS at the end of UTC second 23:59:59 and the NTP time has
just reached 400. The NTP time is set back one second to 399 and the
machine enters TIME_OOP. At the end of the repeated second the machine
enters TIME_WAIT and the UTC and NTP times are again in correspondence.
In the right column the state machine is in TIME_DEL at the end of UTC
second 23:59:58 and the NTP time has just reached 399. The NTP time is
incremented, the machine enters TIME_WAIT and both UTC and NTP times are
again in correspondence.

              No Leap       Leap Insert    Leap Delete
              UTC NTP         UTC NTP        UTC NTP
         ---------------------------------------------
         23:59:58|398    23:59:58|398   23:59:58|398
                 |               |              |
         23:59:59|399    23:59:59|399   00:00:00|400
                 |               |              |
         00:00:00|400    23:59:60|399   00:00:01|401
                 |               |              |
         00:00:01|401    00:00:00|400   00:00:02|402
                 |               |              |
         00:00:02|402    00:00:01|401   00:00:03|403
                 |               |              |

To determine local midnight without fuss, the kernel code simply finds
the residue of the time.tv_sec (or time.tv_sec + 1) value mod 86,400,
but this requires a messy divide. Probably a better way to do this is to
initialize an auxiliary counter in the settimeofday() routine using an
ugly divide and increment the counter at the same time the time.tv_sec
is incremented in the timer interrupt routine. For future embellishment.

4. Programming Model and Interfaces

This section describes the programming model for the synchronization
daemon and user application programs. The ideas are based on suggestions
from Jeff Mogul and Philip Gladstone and a similar interface designed by
the latter. It is important to point out that the functionality of the
original Unix adjtime() system call is preserved, so that the modified
kernel will work as the unmodified one, should the new features not be
in use. In this case the ntp_adjtime() system call can still be used to
read and write kernel variables that might be used by a synchronization
daemon other than NTP, for example.

The kernel routines use the clock state variable time_state, which
records whether the clock is synchronized, waiting for a leap second,
etc. The value of this variable is returned as the result code by both
the ntp_gettime() and ntp_adjtime() system calls. It is set implicitly
by the STA_DEL and STA_INS status bits, as described previously. Values
presently defined in the timex.h header file are as follows:

     TIME_OK        0         no leap second warning
     TIME_INS       1         insert leap second warning
     TIME_DEL       2         delete leap second warning
     TIME_OOP       3         leap second in progress
     TIME_WAIT      4         leap second has occurred
     TIME_ERROR     5         clock not synchronized

In case of a negative result code, the kernel has intercepted an invalid
address or (in case of the ntp_adjtime() system call), a superuser
violation.

4.1. The ntp_gettime() System Call

The syntax and semantics of the ntp_gettime() call are given in the
following fragment of the timex.h header file. This file is identical,
except for the SHIFT_HZ define, in the SunOS, Ultrix and OSF/1 kernel
distributions. (The SHIFT_HZ define represents the logarithm to the base
2 of the clock oscillator frequency specific to each system type.) Note
that the timex.h file includes the syscall.h system header file, which must
be modified to define the SYS_ntp_gettime system call specific to each
system type. The kernel distributions include directions on how to do
this.

     /*
      * This header file defines the Network Time Protocol (NTP)
      * interfaces for user and daemon application programs. These are
      * implemented using private system calls and data structures and
      * require specific kernel support.
      *
      * NAME
      *   ntp_gettime - NTP user application interface
      *
      * SYNOPSIS
      *   #include <sys/timex.h>
      *
      *   int syscall(SYS_ntp_gettime, tptr)
      *
      *   int SYS_ntp_gettime defined in syscall.h header file
      *   struct ntptimeval *tptr  pointer to ntptimeval structure
      *
      * NTP user interface - used to read kernel clock values
      * Note: maximum error = NTP synch distance = dispersion + delay / 2
      * estimated error = NTP dispersion.
      */
     struct ntptimeval {
          struct timeval time;     /* current time (ro) */
          long maxerror;           /* maximum error (us) (ro) */
          long esterror;           /* estimated error (us) (ro) */
     };

The ntp_gettime() system call returns three read-only (ro) values in the
ntptimeval structure: the current time in Unix timeval format plus the
maximum and estimated errors in microseconds. While the 32-bit long data
type limits the error quantities to something more than an hour, in
practice this is not significant, since the protocol itself will declare
an unsynchronized condition well below that limit. In the NTP Version 3
specification, if the protocol computes either of these values in excess
of 16 seconds, they are clamped to that value and the system clock
declared unsynchronized.

Following is a detailed description of the ntptimeval structure members.

struct timeval time (ro)

     This member is the current system time expressed as a Unix timeval
     structure. The timeval structure consists of two 32-bit words; the
     first is the number of seconds past 1 January 1970 assuming no
     intervening leap-second insertions or deletions, while the second
     is the number of microseconds within the second.

long maxerror (ro)

     This member is the value of the time_maxerror kernel variable,
     which represents the maximum error of the indicated time relative
     to the primary synchronization source, in microseconds. For NTP,
     the value is initialized by an ntp_adjtime() call to the
     synchronization distance, which is equal to the root dispersion
     plus one-half the root delay. It is increased by a small amount
     (time_tolerance) each second to reflect the maximum clock frequency
     error. This variable is provided by an ntp_adjtime() system call and
     modified by the kernel, but is otherwise not used by the kernel.

long esterror (ro)

     This member is the value of the time_esterror kernel variable,
     which represents the expected error of the indicated time relative
     to the primary synchronization source, in microseconds. For NTP,
     the value is determined as the root dispersion, which represents
     the best estimate of the actual error of the system clock based on
     its past behavior, together with observations of multiple clocks
     within the peer group. This variable is provided by an
     ntp_adjtime() system call, but is otherwise not used by the kernel.

4.2. The ntp_adjtime() System Call

The syntax and semantics of the ntp_adjtime() call are given in the
following fragment of the timex.h header file. Note that, as in the
ntp_gettime() system call, the syscall.h system header file must be
modified to define the SYS_ntp_adjtime system call specific to each
system type. In the fragment, rw = read/write, ro = read-only, wo =
write-only.

     /*
      * NAME
      *   ntp_adjtime - NTP daemon application interface
      *
      * SYNOPSIS
      *   #include <sys/timex.h>
      *
      *   int syscall(SYS_ntp_adjtime, tptr)
      *
      *   int SYS_ntp_adjtime defined in syscall.h header file
      *   struct timex *tptr       pointer to timex structure
      *
      * NTP daemon interface - used to discipline kernel clock
      * oscillator
      */
     struct timex {
          unsigned int mode;       /* mode selector (wo) */
          long offset;             /* time offset (us) (rw) */
          long frequency;          /* frequency offset (scaled ppm) (rw)
                                    */
          long maxerror;           /* maximum error (us) (rw) */
          long esterror;           /* estimated error (us) (rw) */
          int status;              /* clock status bits (rw) */
          long constant;           /* pll time constant (rw) */
          long precision;          /* clock precision (us) (ro) */
          long tolerance;          /* clock frequency tolerance (scaled
                                    * ppm) (ro) */
          /*
           * The following read-only structure members are implemented
           * only if the PPS signal discipline is configured in the
           * kernel.
           */
          long ppsfreq;            /* pps frequency (scaled ppm) (ro) */
          long jitter;             /* pps jitter (us) (ro) */
          int shift;               /* interval duration (s) (shift) (ro)
                                    */
          long stabil;             /* pps stability (scaled ppm) (ro) */
          long jitcnt;             /* jitter limit exceeded (ro) */
          long calcnt;             /* calibration intervals (ro) */
          long errcnt;             /* calibration errors (ro) */
          long stbcnt;             /* stability limit exceeded (ro) */
     };

The ntp_adjtime() system call is used to read and write certain time-
related kernel variables summarized below. Writing these variables can
only be done in superuser mode. To write a variable, the mode structure
member is set with one or more bits, one of which is assigned each of
the following variables in turn. The current values for all variables
are returned in any case; therefore, a mode argument of zero means to
return these values without changing anything.

Following is a description of the timex structure members.

mode (wo)

     This is a bit-coded variable selecting one or more structure
     members, with one bit assigned each member. If a bit is set, the
     value of the associated member variable is copied to the
     corresponding kernel variable; if not, the member is ignored. The
     bits are assigned as given in the following, with the variable name
     indicated in parens. Note that the precision, tolerance and PPS
     variables are determined by the kernel and cannot be changed by
     ntp_adjtime().

     MOD_OFFSET     0x0001    time offset (offset)
     MOD_FREQUENCY  0x0002    frequency offset (frequency)
     MOD_MAXERROR   0x0004    maximum time error (maxerror)
     MOD_ESTERROR   0x0008    estimated time error (esterror)
     MOD_STATUS     0x0010    clock status (status)
     MOD_TIMECONST  0x0020    pll time constant (constant)
     MOD_CLKB       0x4000    set clock B
     MOD_CLKA       0x8000    set clock A

     Note that the MOD_CLKA and MOD_CLKB bits are intended for those
     systems where more than one hardware clock is available for backup,
     such as in Tandem Non-Stop computers. Presumably, in such cases
     each clock would have its own oscillator and require a separate PLL
     for each. Refinements to this model are for further study.

offset (rw)

     If selected, this member specifies the time adjustment, in
     microseconds. The absolute value must be less than MAXPHASE
     (128000) microseconds defined in the timex.h header file. On
     return, this member contains the residual offset remaining between
     a previously specified offset and the current system time, in
     microseconds.
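
     A caller might prepare an offset update along the following
     lines. This is a sketch only: the timex_req structure is a
     cut-down, hypothetical stand-in for the timex structure holding
     just the two members used here, and make_offset_request() is not
     part of any real interface. The MOD_OFFSET and MAXPHASE values
     are those given in this document.

```c
#include <stdlib.h>

#define MOD_OFFSET 0x0001   /* time offset selector bit */
#define MAXPHASE   128000L  /* offset limit (us) */

/* Hypothetical subset of struct timex, for illustration only. */
struct timex_req {
    unsigned int mode;      /* mode selector */
    long offset;            /* time offset (us) */
};

/*
 * Fill in a request that selects only the offset member.
 * Returns 0 on success, or -1 if the absolute value of the offset
 * is not less than MAXPHASE, in which case req is left unchanged.
 */
int
make_offset_request(struct timex_req *req, long offset_us)
{
    if (labs(offset_us) >= MAXPHASE)
        return -1;
    req->mode = MOD_OFFSET;
    req->offset = offset_us;
    return 0;
}
```

     A request with mode set to zero would select no members and so
     would only read back the current values.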

frequency (rw)

     If selected, this member replaces the value of the time_freq kernel
     variable. The value is in ppm, with the integer part in the high
     order 16 bits and fraction in the low order 16 bits. The absolute
     value must be less than MAXFREQ (100) ppm defined in the timex.h
     header file.

     The time_freq variable represents the frequency offset of the CPU
     clock oscillator. It is recalculated each time the system clock is
     updated via the offset member of the timex structure. It
     is usually set from a value stored in a file when the
     synchronization daemon is first started. The current value is
     usually retrieved via this member and written to the file about
     once per hour.
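
     The scaled ppm format used by this member (and by the tolerance,
     ppsfreq and stabil members) is a fixed-point number with 16
     fraction bits. The conversion helpers below are a minimal sketch
     of how a daemon might translate between that format and a
     floating-point ppm value, for instance when saving the frequency
     to a file. The function names and the SHIFT_USEC constant are
     assumptions made for the example.

```c
/* fraction bits in a scaled-ppm value (assumed name) */
#define SHIFT_USEC 16

/* Convert ppm to scaled ppm, rounding to the nearest unit. */
long
ppm_to_scaled(double ppm)
{
    double v = ppm * (double)(1L << SHIFT_USEC);

    return (long)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Convert scaled ppm back to ppm. */
double
scaled_to_ppm(long scaled)
{
    return (double)scaled / (double)(1L << SHIFT_USEC);
}
```

     In this format 1 ppm is represented as 65536 and a value of 32768
     represents 0.5 ppm.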

maxerror (rw)

     If selected, this member replaces the value of the time_maxerror
     kernel variable, in microseconds. This is the same variable as in
     the ntp_gettime() system call.

esterror (rw)

     If selected, this member replaces the value of the time_esterror
     kernel variable, in microseconds. This is the same variable as in
     the ntp_gettime() system call.

status (rw)

     If selected, this member replaces the value of the time_status
     kernel variable. This variable controls the state machine used to
     insert or delete leap seconds and shows the status of the
     timekeeping system, PPS signal and external oscillator, if
     configured.

     STA_PLL        0x0001    enable PLL updates (r/w)
     STA_PPSFREQ    0x0002    enable PPS freq discipline (r/w)
     STA_PPSTIME    0x0004    enable PPS time discipline (r/w)
     STA_INS        0x0010    insert leap (r/w)
     STA_DEL        0x0020    delete leap (r/w)
     STA_UNSYNC     0x0040    clock unsynchronized (r/w)
     STA_PPSSIGNAL  0x0100    PPS signal present (r)
     STA_PPSJITTER  0x0200    PPS signal jitter exceeded (r)
     STA_PPSWANDER  0x0400    PPS signal wander exceeded (r)
     STA_PPSERROR   0x0800    PPS signal calibration error (r)
     STA_CLOCKERR   0x1000    clock hardware fault (r)

     The interpretation of these bits is as follows:

     STA_PLL        set/cleared by the caller to enable PLL updates

     STA_PPSFREQ    set/cleared by the caller to enable PPS frequency
                    discipline

     STA_PPSTIME    set/cleared by the caller to enable PPS time
                    discipline

     STA_INS        set by the caller to insert a leap second at the end
                    of the current day; cleared by the caller after the
                    event

     STA_DEL        set by the caller to delete a leap second at the end
                    of the current day; cleared by the caller after the
                    event

     STA_UNSYNC     set/cleared by the caller to indicate clock
                    unsynchronized (e.g., when no peers are reachable)

     STA_PPSSIGNAL  set/cleared by the hardpps() fragment to indicate
                    PPS signal present

     STA_PPSJITTER  set/cleared by the hardpps() fragment to indicate
                    PPS signal jitter exceeded

     STA_PPSWANDER  set/cleared by the hardpps() fragment to indicate
                    PPS signal wander exceeded

     STA_PPSERROR   set/cleared by the hardpps() fragment to indicate
                    PPS signal calibration error

     STA_CLOCKERR   set/cleared by the external hardware clock driver to
                    indicate hardware fault

     An error condition is raised when (a) either STA_UNSYNC or
     STA_CLOCKERR is set (loss of synchronization), (b) STA_PPSFREQ or
     STA_PPSTIME is set and STA_PPSSIGNAL is clear (loss of PPS signal),
     (c) STA_PPSTIME and STA_PPSJITTER are both set (jitter exceeded),
     (d) STA_PPSFREQ is set and either STA_PPSWANDER or STA_PPSERROR is
     set (wander exceeded). An error condition results in a system call
     return code of TIME_ERROR.
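
     The four error conditions can be expressed as a predicate over
     the status bits. The sketch below is written from the description
     above rather than taken from the kernel sources, and the function
     name status_is_error() is hypothetical. A nonzero result
     corresponds to the cases in which the system call would return
     TIME_ERROR.

```c
#define STA_PLL        0x0001   /* enable PLL updates */
#define STA_PPSFREQ    0x0002   /* enable PPS freq discipline */
#define STA_PPSTIME    0x0004   /* enable PPS time discipline */
#define STA_INS        0x0010   /* insert leap */
#define STA_DEL        0x0020   /* delete leap */
#define STA_UNSYNC     0x0040   /* clock unsynchronized */
#define STA_PPSSIGNAL  0x0100   /* PPS signal present */
#define STA_PPSJITTER  0x0200   /* PPS signal jitter exceeded */
#define STA_PPSWANDER  0x0400   /* PPS signal wander exceeded */
#define STA_PPSERROR   0x0800   /* PPS signal calibration error */
#define STA_CLOCKERR   0x1000   /* clock hardware fault */

/* Return 1 if the status word describes an error condition, else 0. */
int
status_is_error(int status)
{
    /* (a) loss of synchronization */
    if (status & (STA_UNSYNC | STA_CLOCKERR))
        return 1;

    /* (b) PPS discipline enabled but no PPS signal */
    if ((status & (STA_PPSFREQ | STA_PPSTIME)) &&
        !(status & STA_PPSSIGNAL))
        return 1;

    /* (c) PPS time discipline enabled and jitter exceeded */
    if ((status & STA_PPSTIME) && (status & STA_PPSJITTER))
        return 1;

    /* (d) PPS frequency discipline enabled and wander or
     *     calibration error reported */
    if ((status & STA_PPSFREQ) &&
        (status & (STA_PPSWANDER | STA_PPSERROR)))
        return 1;

    return 0;
}
```

     Note that, as described, the jitter and wander bits matter only
     when the corresponding PPS discipline is enabled.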

constant (rw)

     If selected, this member replaces the value of the time_constant
     kernel variable. The value must be between zero and MAXTC (6)
     defined in the timex.h header file.

     The time_constant variable determines the bandwidth or "stiffness"
     of the PLL. The value is used as a shift between zero and MAXTC
     (6), with the effective PLL time constant equal to a multiple of (1
     << time_constant), in seconds. For room-temperature quartz
     oscillators, the recommended default value is 2, which corresponds
     to a PLL time constant of about 900 s and a maximum update interval
     of about 64 s. The maximum update interval scales directly with the
     time constant, so that at the maximum time constant of 6, the
     update interval can be as large as 1024 s.

     Values of time_constant between zero and 2 can be used if quick
     convergence is necessary; values between 2 and 6 can be used to
     reduce network load, but at a modest cost in accuracy. Values above
     6 are appropriate only if a precision external oscillator is
     present.
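
     The relation between the time constant and the maximum update
     interval can be sketched as below. The factor of 16 is inferred
     from the two data points given above (64 s at a time constant of
     2 and 1024 s at 6) and is an assumption, as is the function name;
     neither is quoted from the kernel sources.

```c
#define MAXTC 6   /* maximum time constant (shift) */

/*
 * Return the maximum update interval in seconds for a given
 * time constant, or -1 if the time constant is out of range.
 */
long
max_update_interval_s(int time_constant)
{
    if (time_constant < 0 || time_constant > MAXTC)
        return -1;
    return 16L << time_constant;
}
```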

precision (ro)

     This is the current value of the time_precision kernel variable in
     microseconds.

     The time_precision variable represents the maximum error in reading
     the system clock, in microseconds. It is usually based on the
     number of microseconds between timer interrupts (tick), 10000 us
     for the SunOS kernel, 3906 us for the Ultrix kernel, 976 us for the
     OSF/1 kernel. However, in cases where the time can be interpolated
     between timer interrupts with microsecond resolution, such as in
     the stock SunOS kernel and modified Ultrix and OSF/1 kernels, the
     precision is specified as 1 us. In cases where a PPS signal or
     external oscillator is available, the precision can depend on the
     operating condition of the signal or oscillator. This variable is
     determined by the kernel for use by the synchronization daemon, but
     is otherwise not used by the kernel.
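
     The tick values quoted above follow from the timer interrupt
     rate. The sketch below assumes rates of 100 Hz, 256 Hz and 1024
     Hz for the three kernels; these rates are inferred from the
     quoted tick values and are not stated in this document. The
     function names are hypothetical.

```c
/* Microseconds between timer interrupts, truncated to an integer. */
long
tick_us(long hz)
{
    return 1000000L / hz;
}

/*
 * Precision in microseconds: 1 us if the kernel can interpolate
 * between timer interrupts, otherwise one tick.
 */
long
clock_precision_us(long hz, int interpolates)
{
    return interpolates ? 1L : tick_us(hz);
}
```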

tolerance (ro)

     This is the current value of the time_tolerance kernel variable.
     The value is in ppm, with the integer part in the high order 16
     bits and fraction in the low order 16 bits.

     The time_tolerance variable represents the maximum frequency error
     in ppm of the particular CPU clock oscillator and is a property of
     the hardware; however, in principle it could change as a result of
     the presence of external discipline signals, for instance.

     The recommended value for time_tolerance, MAXFREQ (200 ppm), is
     appropriate for room-temperature quartz oscillators used in typical
     workstations. However, it can change due to the operating condition
     of the PPS signal and/or external oscillator. With either the PPS
     signal or external oscillator, the recommended value for MAXFREQ is
     100 ppm.

The following members are defined only if the PPS_SYNC option is
specified in the kernel configuration file. These members are useful
primarily as a monitoring and evaluation tool. These variables can be
written only by the kernel.

ppsfreq (ro)

     This is the current value of the pps_freq kernel variable, which is
     the CPU clock oscillator frequency offset relative to the PPS
     discipline signal. The value is in ppm, with the integer part in
     the high order 16 bits and fraction in the low order 16 bits.

jitter (ro)

     This is the current value of the pps_jitter kernel variable, which
     is the average PPS time dispersion measured by the time-offset
     median filter, in microseconds.

shift (ro)

     This is the current value of the pps_shift kernel variable, which
     determines the duration of the calibration interval as the value of
     1 << pps_shift, in seconds.

stabil (ro)

     This is the current value of the pps_stabil kernel variable, which
     is the average PPS frequency dispersion measured by the frequency-
     offset median filter. The value is in ppm, with the integer part in
     the high order 16 bits and fraction in the low order 16 bits.

jitcnt (ro)

     This is the current value of the pps_jitcnt kernel variable, which
     counts the number of PPS signals where the average jitter exceeds
     the threshold MAXTIME (200 us).

calcnt (ro)

     This is the current value of the pps_calcnt kernel variable, which
     counts the number of frequency calibration intervals. The duration
     of these intervals can range from 4 to 256 seconds, as determined
     by the pps_shift kernel variable.
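
     The calibration interval follows directly from the shift value.
     In the sketch below the shift limits of 2 and 8 are inferred from
     the stated range of 4 to 256 seconds; the constant and function
     names are hypothetical.

```c
#define PPS_SHIFT_MIN 2   /* 1 << 2 = 4 s   (inferred, assumed name) */
#define PPS_SHIFT_MAX 8   /* 1 << 8 = 256 s (inferred, assumed name) */

/*
 * Return the calibration interval in seconds for a given shift,
 * or -1 if the shift is outside the inferred range.
 */
long
calibration_interval_s(int shift)
{
    if (shift < PPS_SHIFT_MIN || shift > PPS_SHIFT_MAX)
        return -1;
    return 1L << shift;
}
```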

errcnt (ro)

     This is the current value of the pps_errcnt kernel variable, which
     counts the number of frequency calibration cycles where (a) the
     apparent frequency offset is greater than MAXFREQ (100 ppm) or (b)
     the interval jitter exceeds tick * 2.

stbcnt (ro)

     This is the current value of the pps_stbcnt kernel variable, which
     counts the number of calibration intervals where the average
     stability exceeds the threshold MAXFREQ / 4 (25 ppm).
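
     The tests behind the errcnt and stbcnt counters might look like
     the following. This is a sketch written from the descriptions
     above, not the kernel code: the function names are hypothetical,
     MAXFREQ is taken as 100 ppm in scaled format as quoted for these
     counters, and the comparisons are assumed to be strict.

```c
#include <stdlib.h>

#define SHIFT_USEC 16                    /* fraction bits (assumed name) */
#define MAXFREQ    (100L << SHIFT_USEC)  /* 100 ppm in scaled ppm */

/*
 * Calibration error: (a) the apparent frequency offset exceeds
 * MAXFREQ, or (b) the interval jitter exceeds tick * 2.
 */
int
calibration_error(long freq_scaled, long jitter_us, long tick_us)
{
    return labs(freq_scaled) > MAXFREQ || jitter_us > tick_us * 2;
}

/* Stability limit exceeded: average stability above MAXFREQ / 4. */
int
stability_exceeded(long stabil_scaled)
{
    return stabil_scaled > MAXFREQ / 4;
}
```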

7. References

[MIL91] Mills, D.L. Internet time synchronization: the Network Time
Protocol, IEEE Trans. Communications COM-39, 10 (October 1991),
1482-1493. Also in: Yang, Z., and T.A. Marsland (Eds.). Global
States and Time in Distributed Systems, IEEE Press, Los Alamitos,
CA, 91-102.

[MIL92a] Mills, D.L. Network Time Protocol (Version 3) specification,
implementation and analysis, RFC 1305, University of Delaware, March
1992, 113 pp.

[MIL92b] Mills, D.L. Modelling and analysis of computer network clocks,
Electrical Engineering Department Report 92-5-2, University of Delaware,
May 1992, 29 pp.

[MIL92c] Mills, D.L. Simple Network Time Protocol (SNTP), RFC 1361,
University of Delaware, August 1992, 10 pp.

[MIL93] Mills, D.L. Precision synchronization of computer network clocks,
Electrical Engineering Department Report 93-11-1, University of
Delaware, November 1993, 66 pp.

[LEV89] Levine, J., M. Weiss, D. Davis, D. Allan, and D. Sullivan. The
NIST automated computer time service. J. Research National Institute of
Standards and Technology 94, 5 (September-October 1989), 311-321.

David L. Mills <mills@udel.edu>
Electrical Engineering Department
University of Delaware
Newark, DE 19716
302 831 8247 fax 302 831 4316
3 April 1994