1 files changed, 346 insertions, 0 deletions
diff --git a/usr.sbin/xntpd/doc/README.magic b/usr.sbin/xntpd/doc/README.magic
new file mode 100644
index 0000000..f473a92
--- /dev/null
+++ b/usr.sbin/xntpd/doc/README.magic
@@ -0,0 +1,346 @@
+                 Magic Tricks for Precision Timekeeping
+
+                       Revised 19 September 1993
+
+Note: This information file is included in the NTP Version 3
+distribution (xntp3.tar.Z) as the file README.magic. This distribution
+can be obtained via anonymous ftp from louie.udel.edu in the directory
+pub/ntp.
+
+1. Introduction
+
+It most cases it is possible using NTP to synchronize a number of hosts
+on an Ethernet or moderately loaded T1 network to a radio clock within a
+few tens of milliseconds with no particular care in selecting the radio
+clock or configuring the servers on the network. This may be adequate
+for the majority of applications; however, modern workstations and high
+speed networks can do much better than that, generally to within some
+fraction of a millisecond, by using special care in the design of the
+hardware and software interfaces.
+
+The timekeeping accuracy of a NTP-synchronized host depends on two
+quantities: the delay due to hardware and software processing and the
+accumulated jitter due to such things as clock reading precision and
+varying latencies in hardware and software queuing. Processing delays
+directly affect the timekeeping accuracy, unless minimized by systematic
+analysis and adjustment. Jitter, on the other hand, can be essentially
+removed, as long as the statistical properties are unbiased, by the low-
+pass filtering of the phase-lock loop incorporated in the NTP local
+clock model.
+
+This note discusses issues in the connection of external time sources
+such as radio clocks and related timing signals to a primary (stratum-1)
+NTP time server. Of principal concern are various techniques that can be
+utilized to improve the accuracy and precision of the time accuracy and
+frequency stability. Radio clocks are most often connected to a time
+server using a serial asynchronous port. Much of the discussion in this
+memorandum has to do with ways in which the delay incurred in this type
+of connection can be controlled and ways in which the jitter due to
+various causes can be minimized.
+
+However, there are ways other than serial ports to connect a radio
+clock, including special purpose hardware devices for some
+architectures, and even unusual applications of existing interface
+devices, such as the audio codec provided in some systems. Many of these
+methods can yield accuracies as good as any attainable with a serial
+port. For those radio clocks equipped with an IRIG-B signal output, for
+example, a hardware device is available for the Sun SPARCstation; see
+the xntpd.8 manual page in the doc directory of the NTP Version 3
+distribution for further information. In addition, it is possible to
+decode the IRIG-B signal using the audio codec included in the Sun
+SPARCstation and a special kernel driver described in the irig.txt file
+in the doc directory of the NTP Version 3 distribution. These devices
+will not be discussed further in this memorandum.
+
+2. Connection via Serial Port
+
+Most radio clocks produce an ASCII timecode with a precision only to the
+millisecond. This results in a maximum peak-to-peak (p-p) jitter in the
+clock readings of one millisecond. However, assuming the read requests
+are statistically independent of the clock update times, the reading
+error is uniformly distributed over the millisecond, so that the average
+over a large number of readings will make the clock appear 0.5 ms late.
+To compensate for this, it is only necessary to add 0.5 ms to its
+reading before further processing by the NTP algorithms.
+
+Radio clocks are usually connected to the host computer using a serial
+port operating at a typical speed of 9600 baud. The on-time reference
+epoch for the timecode is usually the start bit of a designated
+character, usually <CR>, which is part of the timecode. The UART chip
+implementing the serial port most often has a sample clock of eight to
+16 times the basic baud rate. Assuming the sample clock starts midway in
+the start bit and continues to midway in the first stop bit, this
+creates a processing delay of 10.5 baud times, or about 1.1 ms, relative
+to the start bit of the character. The jitter contribution is usually no
+more than a couple of sample-clock periods, or about 26 usec p-p. This
+is small compared to the clock reading jitter and can be ignored. Thus,
+the UART delay can be considered constant, so the hardware contribution
+to the total mean delay budget is 0.5 + 1.1 = 1.6 ms.
+
+In some kernel serial port drivers, in particular, the Sun zs driver,
+an intentional delay is introduce in input character processing when the
+first character is received after an idle period. A batch of characters
+is passed to the calling program when either (a) a timeout in the
+neighborhood of 10 ms expires or (b) an input buffer fills up. The
+intent in this design is to reduce the interrupt load on the processor
+by batching the characters where possible. Obviously, this can cause
+severe problems for precision timekeeping. It is possible to patch the
+zs driver to eliminate the jitter due to this cause; contact the author
+for further details. However, there is a better solution which will be
+described later in this note. The problem does not appear to be present
+in the Serial/Parallel Controller (SPC) for the SBus, which contains
+eight serial asynchronous ports along with a parallel port. The
+measurements referred to below were made using this controller.
+
+Good timekeeping depends strongly on the means available to capture an
+accurate sample of the local clock or timestamp at the instant the stop
+bit of the on-time character is found; therefore, the code path delay
+between the character interrupt routine and the first place a timestamp
+can be captured is very important, since on some systems such as Sun
+SPARCstations, this path can be astonishingly long. The Sun scheduling
+mechanisms involve both a hardware interrupt queue and a software
+interrupt queue. Entries are made on the hardware queue as the interrupt
+is signalled and generally with the lowest latency, estimated at 20-30
+microseconds (usec) for a SPARC 4/65 IPC. Then, after minimal
+processing, an entry is made on the software queue for later processing
+in order of software interrupt priority. Finally, the software interrupt
+unblocks the NTP daemon which calculates the current local clock offset
+and introduces corrections as required.
+
+Opportunities exist to capture timestamps at the hardware interrupt
+time, software interrupt time and at the time the NTP daemon is
+activated, but these involve various degrees of kernel trespass and
+hardware gimmicks. To gain some idea of the severity of the errors
+introduced at each of these stages, measurements were made using a Sun
+4/65 IPC and a test setup that results in an error between the host
+clock and a precision time source (calibrated cesium clock) no greater
+than 0.1 ms. The total delay from the on-time epoch to when the NTP
+daemon is activated was measured at 8.3 ms in an otherwise idle system,
+but increased on rare occasion to over 25 ms under load, even when the
+NTP daemon was operated at the highest available software priority
+level. Since 1.6 ms of the total delay is due to the hardware, the
+remaining 6.7 ms represents the total code path delay accounting for all
+software processing from the hardware interrupt to the NTP daemon.
+
+It is commonly observed that the latency variations (jitter) in typical
+real-time applications scale as the processing delay. In the case above,
+the ratio of the maximum observed delay (25 ms) to the baseline code
+path delay (8.3 ms) is about three. It is natural to expect that this
+ratio remain the same or less as the code path between the hardware
+interrupt and where the timestamp is captured is reduced. However, in
+general this requires trespass on kernel facilities and/or making use of
+features not common to all or even most Unix implementations. In order
+to assess the cost and benefits of increasingly more aggressive insult
+to the hardware and software of the system, it is useful to construct a
+budget of the code path delay at each of the timestamp opportunity
+times. For instance, on Unix systems which include support for the SIGIO
+facility, it is possible to intervene at the time the software interrupt
+is serviced. The NTP daemon code uses this facility, when available, to
+capture a timestamp and save it along with the data in a buffer for
+later processing. This reduces the total code path delay from 6.7 ms to
+3.5 ms on an otherwise idle system. This reduction applies to all input
+processing, including network interfaces and serial ports.
+
+3. The CLK Mode
+
+By far the best place to capture the timestamp is right in the kernel
+interrupt routine, but this gerally requires intruding in the code
+itself, which can be intricate and architecture dependent. The next best
+place is in some routine close to the interrupt routine on the code
+path. There are two ways to do this, depending on the ancestry of the
+Unix operating system variant. Older systems based primarily on the
+original Unix 4.3bsd support what is called a line discipline module,
+which is a hunk of code with more-or-less well defined interface
+specifications that can get in the way, so to speak, of the code path
+between the interrupt routine and the remainder of the serial port
+processing. Newer systems based on System V STREAMS can do the same
+thing using what is called a streams module. Both approaches are
+supported in the NTP Version 3 distribution, as described in the README
+files in the kernel directory of the distribution. In either case,
+header and source files have to be copied to the kernel build tree and
+certain tables in the kernel have to be modified. In neither case,
+however, are kernel sources required. In order to take advantage of
+this, the clock driver must include code to activate the feature and
+extract the timestamp. At present, this support is included in the clock
+drivers for the Spectracom WWVB clock (WWVB define), the PSTI/Traconex
+WWV/WWVH clock (PST define) and a special one-pulse-per-second (pps)
+signal (PPSCLK define) described later. If justified, support can be
+easily added to most other clock drivers as well. For future reference,
+these modules operating with supported drivers will be called the CLK
+support.
+
+The CLK line discipline and STREAMS modules operate in the same way.
+They look for a designated character, usually <CR>, and stuff a Unix
+timestamp in the data stream following that character whenever it is
+found. Eventually, the data arrive at the particular clock driver
+configured in the NTP Version 3 distribution. The driver then uses the
+timestamp as a precise reference epoch, subject to the earlier
+processing delays and jitter budget, for future reference. In order to
+gain some insight as to the effectiveness of this approach, measurements
+were made using the same test setup described above. The total delay
+from the on-time epoch to the instant when the timestamp is captured was
+measured at 3.5 ms. Thus, the code path delay is this value less the
+hardware delay 3.5 - 1.6 = 1.9 ms.
+
+While the improvement in accuracy in the baseline case is significant,
+there is another factor, at least in Sun systems, that makes it even
+more worthwhile. When processing the code path up to the CLK module, the
+priority is apparently higher than for processing beyond it. In case of
+heavy CPU activity, this can lead to relatively long tails in the
+processing delays for the driver, which of course are avoided by
+capturing the timestamp early in the code path.
+
+4. The PPSCLK Mode
+
+Many timing receivers can produce a 1-pps signal of considerably better
+precision than the ASCII timecode. Using this signal, it is possible to
+avoid the 1-ms p-p jitter and 1.6 ms hardware timecode adjustment
+entirely. However, a device is required to interface this signal to the
+hardware and operating system. In general, this requires some sort of
+level converter and pulse generator that can turn the 1-pps signal on-
+time transition into a valid character. An example of such a device is
+described in the gadget directory of the NTP Version 3 distribution.
+Although many different circuit designs could be used as well, this
+particular device generates a single 26-usec start bit for each 1-pps
+signal on-time transition. This appears to the UART operating at 38.4K
+baud as an ASCII DEL (hex FF).
+
+Now, assuming a serial port can be dedicated to this purpose, a source
+of 1-pps character interrupts is available and can be used to provide a
+precision reference. The NTP Version 3 daemon can be configured to
+utilize this feature by specifying the PPSCLK define, which requires the
+CLK module and gadget box described above. The character resulting from
+each 1-pps signal on-time transition is intercepted by the CLK module
+and a timestamp is inserted in the data stream. An interrupt is created
+for the device driver, which reads the timestamp and discards the DEL
+character. Since the timestamp is captured at the on-time transition,
+the seconds-fraction portion is the offset between the local clock and
+the on-time epoch less the UART delay of 273 usec at 38.4K baud. If the
+local clock is within +-0.5 second of this epoch, as determined by other
+means, the local clock correction is taken as the offset itself, if
+between zero and 0.5 s, and the offset minus one second, if between 0.5
+and 1.0 s. In the NTP daemon the resulting correction is first processed
+by a multi-stage median/trimmed mean filter to remove residual jitter
+and then processed by the usual NTP algorithms.
+
+The baseline delay between the on-time transition and the timestamp
+capture was measured at 400+-10 usec on an otherwise idle test system.
+As the UART delay at 38.4K baud is about 270 usec, the difference, 130
+usec, must be due to the hardware interrupt latency plus the time to
+call the microtime() routine which actually reads the system clock and
+microsecond counter. For these measurements the assembly-coded version
+of this routine described in the ppsclock directory of the NTP Version 3
+distribution was used. This routine reduces the time to read the system
+clock from 42-85 usec with the native Sun C-coded routine to about 3
+usec using the microtime() assembly-coded routine and can be ignored.
+Thus, the 130 usec must be accounted for in interrupt service, register
+window, context switching, streams operations and measurement
+uncertainty, which is probably not unreasonable. The reason for the
+difference between the this figure and the previously calculated value
+of 1.9 ms for the CLK module and serial ASCII timecode is probably due
+to the fact that all STREAMS modules other than the CLK module were
+removed, since the serial port is not used for ordinary ASCII data.
+
+An interesting feature of this approach is that the 1-pps signal is not
+necessarily associated with any particular radio clock and, indeed,
+there may be no such clock at all. Some precision timekeeping equipment,
+such as cesium clocks, VLF receivers and LORAN-C timing receivers
+produce only a precision 1-pps signal and rely on other mechanisms to
+resolve the second of the day and day of the year. It is possible for an
+NTP-synchronized host to derive the latter information using other NTP
+peers, presumably properly synchronized within +-0.5 second, and to
+remove residual jitter using the 1-pps signal. This makes it quite
+practical to deliver precision time to local clients when the subnet
+paths to remote primary servers are heavily congested. In extreme cases
+like this, it has been found useful to increase the tracking aperture
+from +-128 ms to as high as +-512 ms.
+
+In the current implementation the radio timecode and 1-pps signal are
+separately processed. The timecode capture and CLK support, if provided
+by the radio driver, operate the same way whether or not the PPSCLK
+support is enabled. If the local clock is reliably synchronized within
++-0.5 s and the 1-pps signal has been valid for some number of seconds,
+its offset rather than whatever synchronization source has been selected
+is used instead. However, while a this procedure delivers a new offset
+estimate every second, the local clock is updated only as each valid
+update is computed for the peer selected as the source of
+synchronization.
+
+However, there is a hazard to the use of the 1-pps signal in this way if
+the radio generating the 1-pps signal misbehaves or loses
+synchronization with its transmitter. In such a case the radio might
+indicate the error, but the system has no way to associate the error
+with the 1-pps signal. To deal with this problem the prefer parameter
+described in the xntpd.8 man page in the doc directory of the NTP
+Version 3 distribution can be used both to cause the clock selection
+algorithm to choose a preferred peer, all other things being equal, as
+well as associate the error indications in such a way that the 1-pps
+signal will be disregarded if the peer stops providing valid updates,
+such as would occur in an error condition. The prefer parameter can be
+used in other situations as well when preference is to be given a
+particular source of synchronization.
+
+5. The PPS Mode
+
+For the ultimate accuracy and lowest jitter, it would be best to
+eliminate the UART and capture the 1-pps on-time transition directly
+using an appropriate interface. This is in fact possible using a
+modified serial port driver and data lead in the serial port interface
+cable. In this scheme, described in detail in the ppsclock directory of
+the NTP Version 3 distribution, the 1-pps source is connected via the
+previously described gadget box to the carrier-detect lead of a serial
+port. Happily, this can be the same port used for a radio clock, for
+example, or another unrelated serial device. The scheme, referred to
+subsequently as the PPS mode, is specific to the SunOS 4.1.x kernel and
+requires a special STREAMS module. Instructions on how to build the
+kernel are also included in that directory.
+
+Except for special-purpose interface modules, such as the KSI/Odetics
+TPRO IRIG-B decoder and the modified audio driver for the IRIG-B signal
+mentioned previously, the PPS mode provides the most accurate and
+precise timestamp available. There is essentially no latency and the
+timestamp is captured within 20-30 usec of the on-time epoch.
+
+The PPS mode requires the PPSPPS define and one of the radio clock
+serial ports to be selected as the PPS interface. This is the port which
+handles the 1-pps signal; however, the signal path has nothing to do
+with the ordinary serial data path; the two signals are not related,
+other than by the need to activate the PPS mode and pass the file
+descriptor to a common processing routine. Thus, for the port to be
+selected for the PPS function, the define for the associated radio clock
+needs to have a PPS suffix. In case of multiple radio clocks on a single
+time server, the PPS suffix is necessary on only one of them; more than
+one PPS suffix would be an error.
+
+The PPS mode works just like the CLK mode in the treatment of the prefer
+parameter and indicated peer errors. As in the CLK mode, only the offset
+within the second is used and only when the offset is less than +-0.5 s.
+However, the precision of the clock adjustments is usually so fine that
+the error budget is dominated by the inherent short-term stability of
+typical computer local clock oscillators. Therefore, it is advisable to
+reduce the poll interval for the preferred peer from the default 64 s to
+something less, like 16 s. This is done using the minpoll and maxpoll
+parameters of the peer or server command associated with the clock.
+These parameters take as arguments a power of 2, in seconds, which
+becomes the poll interval and, indirectly, affects the bandwidth of the
+tracking loop.
+
+6. Results and Conclusions
+
+It is clear from the above that substantial improvements in timekeeping
+accuracy are possible with varying degrees of hardware and software
+intrusion. While the ultimate accuracy depends on the jitter and wander
+characteristics of the computer local oscillator, it is possible to
+reduce jitter to a negligible degree simply by processing with the NTP
+phase-lock loop and local clock algorithms. The residual jitter using
+the PPS mode on a Sun4 IPC is typically in the 40-100 usec range, while
+the wander is rarely more than twice that under typical environmental
+room conditions.
+
+David L. Mills <mills@udel.edu>
+Electrical Engineering Department
+University of Delaware
+Newark, DE 19716
+302 831 8247 fax 302 831 4316
+
+25 August 1993