summaryrefslogtreecommitdiffstats
path: root/share/doc/papers/timecounter/timecounter.ms
diff options
context:
space:
mode:
authorphk <phk@FreeBSD.org>2003-06-15 18:49:46 +0000
committerphk <phk@FreeBSD.org>2003-06-15 18:49:46 +0000
commit69776c10822e68c24101edbb82aec156257044a9 (patch)
tree9326df3513ab712f98602a769930cda4d4984c09 /share/doc/papers/timecounter/timecounter.ms
parent9be0707ac8c67dcef1185ecaba5868f138f4e2ad (diff)
downloadFreeBSD-src-69776c10822e68c24101edbb82aec156257044a9.zip
FreeBSD-src-69776c10822e68c24101edbb82aec156257044a9.tar.gz
Add my timecounter paper from EuroBSDcon2002
Diffstat (limited to 'share/doc/papers/timecounter/timecounter.ms')
-rw-r--r--share/doc/papers/timecounter/timecounter.ms1076
1 files changed, 1076 insertions, 0 deletions
diff --git a/share/doc/papers/timecounter/timecounter.ms b/share/doc/papers/timecounter/timecounter.ms
new file mode 100644
index 0000000..5084e50
--- /dev/null
+++ b/share/doc/papers/timecounter/timecounter.ms
@@ -0,0 +1,1076 @@
+.EQ
+delim øø
+.EN
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $FreeBSD$
+.\"
+.if n .ND
+.TI
+Timecounters: Efficient and precise timekeeping in SMP kernels.
+.AA
+.A "Poul-Henning Kamp" "The FreeBSD Project"
+.AB
+.PP
+The FreeBSD timecounters are an architecture-independent implementation
+of a binary timescale using whatever hardware support is at hand
+for tracking time. The binary timescale converts using simple
+multiplication to canonical timescales based on micro- or nano-seconds
+and can interface seamlessly to the NTP PLL/FLL facilities for clock
+synchronisation. Timecounters are implemented using lock-less
+stable-storage based primitives which scale efficiently in SMP
+systems. The math and implementation behind timecounters will
+be detailed as well as the mechanisms used for synchronisation. \**
+.AE
+.FS
+This paper was presented at the EuroBSDcon 2002 conference in Amsterdam.
+.FE
+.SH
+Introduction
+.PP
+Despite digging around for it, I have not been able to positively
+identify the first computer which knew the time of day.
+The feature probably arrived either from the commercial side
+so service centres could bill computer cycles to customers or from
+the technical side so computers could timestamp external events,
+but I have not been able to conclusively nail the first implementation down.
+.LP
+But there is no doubt that it happened very early in the development
+of computers
+and if systems like the ``SAGE'' [SAGE] did not know what time
+it was I would be amazed.
+.LP
+On the other hand, it took a long time for a real time clock to
+become a standard feature:
+.LP
+The ``Apple ]['' computer
+had neither in hardware or software any notion what time it was.
+.LP
+The original ``IBM PC'' did know what time it was, provided you typed
+it in when you booted it, but it forgot when you turned it off.
+.LP
+One of the ``advanced technologies'' in the ``IBM PC/AT'' was a battery
+backed CMOS chip which kept track of time even when the computer
+was powered off.
+.LP
+Today we expect our computers to know the time, and with network
+protocols like NTP we will usually find that they do, give and
+take some milliseconds.
+.LP
+This article is about the code in the FreeBSD kernel which keeps
+track of time.
+.SH
+Time and timescale basics
+.PP
+Despite the fact that time is the physical quantity (or maybe entity
+?) about which we know the least, it is at the same time [sic!] what we
+can measure with the highest precision of all physical quantities.
+.LP
+The current crop of atomic clocks will neither gain nor loose a
+second in the next couple hundred million years, provided we
+stick to the preventative maintenance schedules. This is a feat
+roughly in line with to knowing the circumference of the Earth
+with one micrometer precision, in real time.
+.LP
+While it is possible to measure time by means other than oscillations,
+for instance transport or consumption of a substance at a well-known
+rate, such designs play no practical role in time measurement because
+their performance is significantly inferior to oscillation based
+designs.
+.LP
+In other words, it is pretty fair to say that all relevant
+timekeeping is based on oscillating phenomena:
+.TS
+center;
+l l.
+sun-dial Earths rotation about its axis.
+calendar Ditto + Earths orbit around the sun.
+clockwork Mechanical oscillation of pendulum.
+crystals Mechanical resonance in quartz.
+atomic Quantum-state transitions in atoms.
+.TE
+.LP
+We can therefore with good fidelity define ``a clock'' to be the
+combination of an oscillator and a counting mechanism:
+.LP
+.PSPIC fig3.eps
+.LP
+The standard second is currently defined as
+.QP
+The duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom.
+.LP
+and we have frequency standards which are able to mark a sequence of
+such seconds
+with an error less than ø2 cdot 10 sup{-15}ø [DMK2001] with commercially
+available products doing better than ø1 cdot 10 sup{-14}ø [AG2002].
+.LP
+Unlike other physical units with a conventionally defined origo,
+longitude for instance, the ephemeral nature of time prevents us
+from putting a stake in the ground, so to speak, and measure from
+there. For measuring time we have to rely on ``dead reckoning'',
+just like the navigators before Harrison built his clocks [RGO2002]:
+We have to tally how far we went from our reference point, keeping a
+running total at all times, and use that as our estimated position.
+.LP
+The upshot of this is, that we cannot define a timescale by any
+other means than some other timescale(s).
+.LP
+``Relative time'' is a time interval between two events, and for this
+we only need to agree on the rate of the oscillator.
+.LP
+``Absolute time'' consists of a well defined point in time and the
+time interval since then, this is a bit more tricky.
+.LP
+The Internationally agreed upon TAI and the UTC timescales
+starts at (from a physics point of view) arbitrary points in time
+and progresses in integral intervals of the standard second, with the
+difference being that UTC does tricks to the counting to stay roughly
+in sync with Earths rotation \**.
+.FS
+The first atomic based definition actually operated in a different way:
+each year would have its own value determined for the frequency of the
+caesium resonance, selected so as to match the revolution rate of the
+Earth. This resulted in time-intervals being very unwieldy business,
+and more and more scientists realized that that the caesium resonance
+was many times more stable than the angular momentum of the Earth.
+Eventually the new leap-second method were introduced in 1972.
+It is interesting to note that the autumn leaves falling on the
+northern hemisphere affects the angular momentum enough to change
+the Earths rotational rate measurably.
+.FE
+.LP
+TAI is defined as a sequence of standard seconds (the first timescale),
+counted from January 1st 1958 (the second timescale).
+.LP
+UTC is defined basically the same way, but every so often a leap-second
+is inserted (or theoretically deleted) to keep UTC synchronised
+with Earths rotation.
+.LP
+Both the implementation of these two, and a few others speciality
+timescales are the result of the
+combined efforts of several hundred atomic frequency standards in
+various laboratories and institutions throughout the world, all
+reporting to the BIPM in Paris who calculate the ``paper clock'' which
+TAI and UTC really are using a carefully designed weighting algorithm \**.
+.FS
+The majority of these clocks are model 5071A from Agilent (the test
+and measurement company formerly known as ``Hewlett-Packard'') which
+count for as much as 85% of the combined weight.
+A fact the company deservedly is proud of.
+The majority of the remaining weight is assigned to a handful of big
+custom-design units like the PTB2 and NIST7.
+.FE
+.LP
+Leap seconds are typically announced six to nine months in advance,
+based on precise observations of median transit times of stars and VLBI
+radio astronomy of very distant quasars.
+.LP
+The perceived wisdom of leap-seconds have been gradually decreasing
+in recent years, as devices and products with built-in calendar
+functionality becomes more and more common and people realize that
+user input or software upgrades are necessary to instruct the
+calendar functionality about upcoming leap seconds.
+.SH
+UNIX timescales
+.PP
+UNIX systems use a timescale which pretends to be UTC, but defined
+as the count of standard seconds since 00:00:00 01-01-1970 UTC,
+ignoring the leap-seconds. This definition has never been perceived
+as wise.
+.LP
+Ignoring leap seconds means that unless some trickery is performed
+when a leap second happens on the UTC scale, UNIX clocks would be
+one second off. Another implication is that the length of a
+time interval calculated on UNIX time_t variables, can be up to 22
+(and counting) seconds wrong relative to the same time interval
+measured on the UTC timescale.
+.LP
+Recent efforts have tried to make the NTP protocol make up for this
+deficiency by transmitting the UTC-TAI offset as part of the protocol.
+[MILLS2000A]
+.LP
+Fractional seconds are represented two ways in UNIX, ``timeval'' and
+``timespec''. Both of these formats are two-component structures
+which record the number of seconds, and the number of microseconds
+or nanoseconds respectively.
+.LP
+This unfortunate definition makes arithmetic on these two formats
+quite expensive to perform in terms of computer instructions:
+.DS
+.ps -1
+/* Subtract timeval from timespec */
+t3.tv_sec = t1.tv_sec - t2.tv_sec;
+t3.tv_nsec = t1.tv_nsec -
+ t2.tv_usec * 1000;
+if (t3.tv_nsec >= 1000000000) {
+ t3.tv_sec++;
+ t3.tv_nsec -= 1000000000;
+} else if (t3.tv_nsec < 0) {
+ t3.tv_sec--;
+ t3.tv_nsec += 1000000000;
+}
+.ps +1
+.DE
+.LP
+While nanoseconds will probably be enough for most timestamping
+tasks faced by UNIX computers for a number of years, it is an
+increasingly uncomfortable situation that CPU clock periods and
+instruction timings are already not representable in the standard
+time formats available on UNIX for consumer grade hardware,
+and the first POSIX mandated API, \fCclock_getres(3)\fP has
+already effectively reached end of life as a result of this.
+.LP
+Hopefully the various standards bodies will address this issue
+better in the future.
+.SH
+Precision, Stability and Resolution
+.PP
+Three very important terms in timekeeping are ``precision'',
+``stability'' and ``resolution''.
+While the three words may seem to describe somewhat the
+same property in most uses, their use in timekeeping covers three
+very distinct and well defined properties of a clock.
+.LP
+Resolution in clocks is simply a matter of the step-size of the
+counter or in other words: the rate at which it steps.
+A counter running on a 1 MHz frequency will have a resolution
+of 1 microsecond.
+.LP
+Precision talks about how close to the intended rate the clock runs,
+stability about how much the rate varies and resolution about the
+size of the smallest timeinterval we can measure.
+.LP
+From a quality point of view, Stability is a much more
+valuable property than precision, this is probably best explained
+using a graphic illustration of the difference between the two
+concepts:
+.LP
+.PSPIC fig1.eps
+.LP
+In the top row we have instability, the bullet holes are spread over
+a large fraction of the target area.
+In the bottom row, the bullets all hit in a very small area.
+.LP
+On the left side, we have lack of precision, the holes obviously are
+not centred on the target, a systematic offset exists.
+In the right side we have precision, the bullets are centred on
+the target \**.
+.FS
+We cannot easily get resolution into this analogy, the obvious
+representation as the diameter of the bullet-hole is not correct,
+it would have to be the grid or other pattern of locations where
+the bullet could possibly penetrate the target material, but this
+gets too quantum-mechanical-oid to serve the instructional purpose.
+.FE
+.LP
+Transposing these four targets to actual clocks, the situation
+could look like the following plots:
+.LP
+.PSPIC fig2.eps
+.LP
+On the x-axis we have time and on the y-axis how wrong the clock
+was at a given point in time.
+.LP
+The reason atomic standards are such a big deal in timekeeping is
+that they are incredibly stable: they are able to generate an oscillation
+where the period varies by roughly a millionth of a billonth of a
+second in long term measurements.
+.LP
+They are in fact not nearly as precise as they are stable, but as
+one can see from the graphic above, a stable clock which is not
+precise can be easily corrected for the offset and thus calibrated
+is as good as any clock.
+.LP
+This lack of precision is not necessarily a flaw in these kinds of
+devices, once you get into the ø10 cdot 10 sup{-15}ø territory
+things like the blackbody spectrum at the particular absolute
+temperature of the clocks hardware and general relativistic
+effects mostly dependent on the altitude above earths center
+has to be corrected for \**.
+.FS
+This particularly becomes an issue with space-based atomic standards
+as those found on the ``Navstar'' GPS satellites.
+.FE
+.SH
+Design goals of timecounters
+.PP
+After this brief description of the major features of the local
+landscape, we can look at the design goals of timecounters in detail:
+.LP
+.I "Provide timestamps in timeval and timespec formats,"
+.IP
+This is obviously the basic task we have to solve, but as was noted
+earlier, this is in no way the performance requirement.
+.LP
+.I "on both the ``uptime'' and the POSIX timescales,"
+.IP
+The ``uptime'' timescale is convenient for time intervals which are
+not anchored in UTC time: the run time of processes, the access
+time of disks and similar.
+.IP
+The uptime timescale counts seconds starting from when the system
+is booted. The POSIX/UTC timescale is implemented by adding an
+estimate of the POSIX time when the system booted to the uptime
+timescale.
+.LP
+.I "using whatever hardware we have available at the time,"
+.IP
+Which in a subtle way also implies ``be able to switch from one
+piece of hardware to another on the fly'' since we may not know
+right up front what hardware we have access to and which is
+preferable to use.
+.LP
+.I "while supporting time the NTP PLL/FLL discipline code,"
+.IP
+The NTP kernel PLL/FLL code allows the local clock and timescale
+to be synchronised or syntonised to an external timescale either
+via network packets or hardware connection. This also implies
+that the rate and phase of the timescale must be manoeuvrable
+with sufficient resolution.
+.LP
+.I "and providing support for the RFC 2783 PPS API,"
+.IP
+This is mainly for the benefit of the NTPD daemons communication
+with external clock or frequency hardware, but it has many other
+interesting uses as well [PHK2001].
+.LP
+.I "in a SMP efficient way."
+.IP
+Timestamps are used many places in the kernel and often at pretty
+high rate so it is important that the timekeeping facility
+does not become a point of CPU or lock contention.
+.SH
+Timecounter timestamp format.
+.PP
+Choosing the fundamental timestamp format for the timecounters is
+mostly a question of the resolution and steer-ability requirements.
+.LP
+There are two basic options on contemporary hardware: use a 32 bit
+integer for the fractional part of seconds, or use a 64 bit which
+is computationally more expensive.
+.LP
+The question therefore reduced to the somewhat simpler: can we get
+away with using only 32 bit ?
+.LP
+Since 32 bits fractional seconds have a resolution of slightly
+better than quarter of a nanosecond (.2328 nsec) it can obviously
+be converted to nanosecond resolution struct timespec timestamps
+with no loss of precision, but unfortunately not with pure 32 bit
+arithmetic as that would result in unacceptable rounding errors.
+.LP
+But timecounters also need to represent the clock period of the
+chosen hardware and this hardware might be the GHz range CPU-clock.
+The list of clock frequencies we could support with 32 bits are:
+.TS
+center;
+l l n l.
+ø2 sup{32} / 1ø ø=ø 4.294 GHz
+ø2 sup{32} / 2ø ø=ø 2.147 GHz
+ø2 sup{32} / 3ø ø=ø 1.432 GHz
+\&...
+ø2 sup{32} / (2 sup{32}-1)ø ø=ø 1.000 Hz
+.TE
+We can immediately see that 32 bit is insufficient to faithfully
+represent clock frequencies even in the low GHz area, much less in
+the range of frequencies which have already been vapourwared by
+both IBM, Intel and AMD.
+QED: 32 bit fractions are not enough.
+.LP
+With 64 bit fractions the same table looks like:
+.TS
+center;
+l l r l.
+ø2 sup{64} / 1ø ø=ø ø 18.45 cdot 10 sup{9}ø GHz
+ø2 sup{64} / 2ø ø=ø ø 9.223 cdot 10 sup{9}ø GHz
+\&...
+ø2 sup{64} / 2 sup{32}ø ø=ø 4.294 GHz
+\&...
+ø2 sup{64} / (2 sup{64}-1)ø ø=ø 1.000 Hz
+.TE
+And the resolution in the 4 GHz frequency range is approximately one Hz.
+.LP
+The following format have therefore been chosen as the basic format
+for timecounters operations:
+.DS
+.ps -1
+struct bintime {
+ time_t sec;
+ uint64_t frac;
+};
+.ps +1
+.DE
+Notice that the format will adapt to any size of time_t variable,
+keeping timecounters safely out of the ``We SHALL prepare for the
+Y2.038K problem'' war zone.
+.LP
+One beauty of the bintime format, compared to the timeval and
+timespec formats is that it is a binary number, not a pseudo-decimal
+number. If compilers and standards allowed, the representation
+would have been ``int128_t'' or at least ``int96_t'', but since this
+is currently not possible, we have to express the simple concept
+of multiword addition in the C language which has no concept of a
+``carry bit''.
+.LP
+To add two bintime values, the code therefore looks like this \**:
+.FS
+If the reader suspects the '>' is a typo, further study is suggested.
+.FE
+.LP
+.DS
+.ps -1
+uint64_t u;
+
+u = bt1->frac;
+bt3->frac = bt1->frac + bt2->frac;
+bt3->sec = bt1->sec + bt2->sec;
+if (u > bt3->frac)
+ bt3->sec += 1;
+.ps +1
+.DE
+.LP
+An important property of the bintime format is that it can be
+converted to and from timeval and timespec formats with simple
+multiplication and shift operations as shown in these two
+actual code fragments:
+.DS
+.ps -1
+void
+bintime2timespec(struct bintime *bt,
+ struct timespec *ts)
+{
+
+ ts->tv_sec = bt->sec;
+ ts->tv_nsec =
+ ((uint64_t)1000000000 *
+ (uint32_t)(bt->frac >> 32)) >> 32;
+}
+.ps +1
+.DE
+.DS
+.ps -1
+void
+timespec2bintime(struct timespec *ts,
+ struct bintime *bt)
+{
+
+ bt->sec = ts->tv_sec;
+ /* 18446744073 =
+ int(2^64 / 1000000000) */
+ bt->frac = ts->tv_nsec *
+ (uint64_t)18446744073LL;
+}
+.ps +1
+.DE
+.LP
+.SH
+How timecounters work
+.PP
+To produce a current timestamp the timecounter code
+reads the hardware counter, subtracts a reference
+count to find the number of steps the counter has
+progressed since the reference timestamp.
+This number of steps is multiplied with a factor
+derived from the counters frequency, taking into account
+any corrections from the NTP PLL/FLL and this product
+is added to the reference timestamp to get a timestamp.
+.LP
+This timestamp is on the ``uptime'' time scale, so if
+UNIX/UTC time is requested, the estimated time of boot is
+added to the timestamp and finally it is scaled to the
+timeval or timespec if that is the desired format.
+.LP
+A fairly large number of functions are provided to produce
+timestamps, depending on the desired timescale and output
+format:
+.TS
+center;
+l r r.
+Desired uptime UTC/POSIX
+Format timescale timescale
+_
+bintime binuptime() bintime()
+timespec nanouptime() nanotime()
+timeval microuptime() microtime()
+.TE
+.LP
+Some applications need to timestamp events, but are not
+particular picky about the precision.
+In many cases a precision of tenths or hundreds of
+seconds is sufficient.
+.LP
+A very typical case is UNIX file timestamps:
+There is little point in spending computational resources getting an
+exact nanosecond timestamp, when the data is written to
+a mechanical device which has several milliseconds of unpredictable
+delay before the operation is completed.
+.LP
+Therefore a complementary shadow family of timestamping functions
+with the prefix ``get'' have been added.
+.LP
+These functions return the reference
+timestamp from the current timehands structure without going to the
+hardware to determine how much time has elapsed since then.
+These timestamps are known to be correct to within rate at which
+the periodic update runs, which in practice means 1 to 10 milliseconds.
+.SH
+Timecounter math
+.LP
+The delta-count operation is straightforward subtraction, but we
+need to logically AND the result with a bit-mask with the same number
+(or less) bits as the counter implements,
+to prevent higher order bits from getting set when the counter rolls over:
+.DS
+.ce
+.EQ
+Delta Count = (Count sub{now} - Count sub{ref}) ~ BITAND ~ mask
+.EN
+.DE
+The scaling step is straightforward.
+.DS
+.ce
+.EQ
+T sub{now} = Delta Count cdot R sub{counter} + T sub{ref}
+.EN
+.DE
+The scaling factor øR sub{counter}ø will be described below.
+.LP
+At regular intervals, scheduled by \fChardclock()\fP, a housekeeping
+routine is run which does the following:
+.LP
+A timestamp with associated hardware counter reading is elevated
+to be the new reference timecount:
+.DS
+
+.ce
+.EQ
+Delta Count = (Count sub{now} - Count sub{ref}) ~ BITAND ~ mask
+.EN
+
+.ce
+.EQ
+T sub{now} = Delta Count cdot R sub{counter}
+.EN
+
+.ce
+.EQ
+Count sub{ref} = Count sub{now}
+.EN
+
+.ce
+.EQ
+T sub{ref} = T sub{now}
+.EN
+.DE
+.LP
+If a new second has started, the NTP processing routines are called
+and the correction they return and the counters frequency is used
+to calculate the new scaling factor øR sub{counter}ø:
+.DS
+.ce
+.EQ
+R sub{counter} = {2 sup{64} over Freq sub{counter}} cdot ( 1 + R sub{NTP} )
+.EN
+.DE
+Since we only have access to 64 bit arithmetic, dividing something
+into ø2 sup{64}ø is a problem, so in the name of code clarity
+and efficiency, we sacrifice the low order bit and instead calculate:
+.DS
+.ce
+.EQ
+R sub{counter} = 2 cdot {2 sup{63} over Freq sub{counter}} cdot ( 1 + R sub{NTP} )
+.EN
+.DE
+The øR sub{NTP}ø correct factor arrives as the signed number of
+nanoseconds (with 32 bit binary fractions) to adjust per second.
+This quasi-decimal number is a bit of a square peg in our round binary
+hole, and a conversion factor is needed.
+Ideally we want to multiply this factor by:
+.DS
+.ce
+.EQ
+2 sup {64} over {10 sup{9} cdot 2 sup{32}} = 4.294967296
+.EN
+.DE
+This is not a nice number to work with.
+Fortunately, the precision of this correction is not critical, we are
+within an factor of a million of the ø10 sup{-15}ø performance level
+of state of the art atomic clocks, so we can use an approximation
+on this term without anybody noticing.
+.LP
+Deciding which fraction to use as approximation needs to carefully
+consider any possible overflows that could happen.
+In this case the correction may be as large as \(+- 5000 PPM which
+leaves us room to multiply with about 850 in a multiply-before-divide
+setting.
+Unfortunately, there are no good fractions which multiply with less
+than 850 and at the same time divide by a power of two, which is
+desirable since it can be implemented as a binary shift instead of
+an expensive full division.
+.LP
+A divide-before-multiply approximation necessarily results in a loss
+of lower order bits, but in this case dividing by 512 and multiplying
+by 2199 gives a good approximation where the lower order bit loss is
+not a concern:
+.DE
+.EQ
+2199 over 512 = 4.294921875
+.EN
+.DE
+The resulting error is an systematic under compensation of 10.6PPM
+of the requested change, or ø1.06 cdot 10 sup -14ø per nanosecond
+of correction.
+This is perfectly acceptable.
+.LP
+Putting it all together, including the one bit we put on the alter for the
+Goddess of code clarity, the formula looks like this:
+.DS
+.ce
+.EQ
+R sub{counter} = 2 cdot {{2 sup{63} + 2199 cdot {R sub{NTP}} over 1024} over Freq sub{counter}}
+.EN
+.DE
+Presented here in slightly unorthodox format to show the component arithmetic
+operations as they are carried out in the code.
+.SH
+Frequency of the periodic update
+.PP
+The hardware counter should have a long enough
+period, ie, number of distinct counter values divided by
+frequency, to not roll over before our periodic update function
+has had a chance to update the reference timestamp data.
+.LP
+The periodic update function is called from \fChardclock()\fP which
+runs at a rate which is controlled by the kernel parameter
+.I HZ .
+.LP
+By default HZ is 100 which means that only hardware with a period
+longer than 10 msec is usable.
+If HZ is configured higher than 1000, an internal divider is
+activated to keep the timecounter periodic update running
+no more often than 2000 times per second.
+.LP
+Let us take an example:
+At HZ=100 a 16 bit counter can run no faster than:
+.DS
+.ce
+.EQ
+2 sup{16} cdot {100 Hz} = 6.5536 MHz
+.EN
+.DE
+Similarly, if the counter runs at 10MHz, the minimum HZ is
+.DS
+.ce
+.EQ
+{10 MHz} over {2 sup{16}} = 152.6 Hz
+.EN
+.DE
+.LP
+Some amount of margin is of course always advisable,
+and a factor two is considered prudent.
+.LP
+.SH
+Locking, lack of ...
+.PP
+Provided our hardware can be read atomically, that our arithmetic
+has enough bits to not roll over and that our clock frequency is
+perfectly, or at least sufficiently, stable, we could avoid the
+periodic update function, and consequently disregard the entire
+issue of locking.
+We are seldom that lucky in practice.
+.LP
+The straightforward way of dealing with meta data updates is to
+put a lock of some kind on the data and grab hold of that before
+doing anything.
+This would however be a very heavy-handed approach. First of
+all, the updates are infrequent compared to simple references,
+second it is not important which particular state of meta data
+a consumer gets hold of, as long as it is consistent: as long
+as the øCount sub{ref}ø and øT sub{ref}ø are a matching pair,
+and not old enough to cause an ambiguity with hardware counter
+rollover, a valid timestamp can be derived from them.
+.LP
+A pseudo-stable-storage with generation count method has been
+chosen instead.
+A ring of ten ``timehands'' data structures are used to hold the
+state of the timecounter system, the periodic update function
+updates the next structure with the new reference data and
+scaling factor and makes it the current timehands.
+.LP
+The beauty of this arrangement lies in the fact that even though
+a particular ``timehands'' data structure has been bumped from being
+the ``currents state'' by its successor, it still contains valid data
+for some amount of time into the future.
+.LP
+Therefore, a process which has started the timestamping process but
+suffered an interrupt which resulted in the above periodic processing
+can continue unaware of this afterwards and not suffer corruption
+or miscalculation even though it holds no locks on the shared
+meta-data.
+.PSPIC fig4.eps
+.LP
+This scheme has an inherent risk that a process may be de-scheduled for
+so long time that it will not manage to complete the timestamping
+process before the entire ring of timehands have been recycled.
+This case is covered by each timehand having a private generation number
+which is temporarily set to zero during the periodic processing, to
+mark inconsistent data, and incremented to one more than the
+previous value when the update has finished and the timehands
+is again consistent.
+.LP
+The timestamping code will grab a copy of this generation number and
+compare this copy to the generation in the timehands after completion
+and if they differ it will restart the timestamping calculation.
+.DS
+.ps -1
+do {
+ th = timehands;
+ gen = th->th_generation;
+ /* calculate timestamp */
+} while (gen == 0 ||
+ gen != th->th_generation);
+.ps +1
+.DE
+.LP
+Each hardware device supporting timecounting is represented by a
+small data structure called a timecounter, which documents the
+frequency, the number of bits implemented by the counter and a method
+function to read the counter.
+.LP
+Part of the state in the timehands structure is a pointer to the
+relevant timecounter structure, this makes it possible to change
+to a one piece of hardware to another ``on the fly'' by updating
+the current timehands pointer in a manner similar to the periodic
+update function.
+.LP
+In practice this can be done with sysctl(8):
+.DS
+.ps -1
+sysctl kern.timecounter.hardware=TSC
+.ps +1
+.DE
+.LP
+at any time while the system is running.
+.SH
+Suitable hardware
+.PP
+A closer look on ``suitable hardware'' is warranted
+at this point.
+It is obvious from the above description that the ideal hardware
+for timecounting is a wide binary counter running at a constant
+high frequency
+and atomically readable by all CPUs in the system with a fast
+instruction(-sequence).
+.LP
+When looking at the hardware support on the PC platform, one
+is somewhat tempted to sigh deeply and mutter ``so much for theory'',
+because none of the above parameters seems to have been on the
+drawing board together yet.
+.LP
+All IBM PC derivatives contain a device more or less compatible
+with the venerable Intel i8254 chip.
+This device contains 3 counters of 16 bits each,
+one of which is wired so it can interrupt the CPU when the
+programmable terminal count is reached.
+.LP
+The problem with this device is that it only has 8bit bus-width,
+so reading a 16 bit timestamp takes 3 I/O operations: one to latch
+the count in an internal register, and two to read the high and
+low parts of that register respectively.
+.LP
+Obviously, on multi-CPU systems this cannot be done without some
+kind of locking mechanism preventing the other CPUs from trying
+to do the same thing at the same time.
+.LP
+Less obviously we find it is even worse than that:
+Since a low priority kernel thread
+might be reading a timestamp when an interrupt comes in, and since
+the interrupt thread might also attempt to generate a timestamp,
+we need to totally block interrupts out while doing those three
+I/O instructions.
+.LP
+And just to make life even more complicated, FreeBSD uses the same
+counter to provide the periodic interrupts which schedule the
+\fChardclock()\fP routine, so in addition the code has to deal with the
+fact that the counter does not count down from a power of two and
+that an interrupt is generated right after the reloading of the
+counter when it reaches zero.
+.LP
+Ohh, and did I mention that the interrupt rate for hardclock() will
+be set to a higher frequency if profiling is active ? \**
+.FS
+I will not even mention the fact that it can be set also to ridiculous
+high frequencies in order to be able to use the binary driven ``beep''
+speaker in the PC in a PCM fashion to output ``real sounds''.
+.FE
+.LP
+It hopefully doesn't ever get more complicated than that, but it
+shows, in its own bizarre and twisted way, just how little help the
+timecounter code needs from the actual hardware.
+.LP
+The next kind of hardware support to materialise was the ``CPU clock
+counter'' called ``TSC'' in official data-sheets.
+This is basically a on-CPU counter, which counts at the rate
+of the CPU clock.
+.LP
+Unfortunately, the electrical power needed to run a CPU is pretty
+precisely proportional with the clock frequency for the
+prevailing CMOS chip technology, so
+the advent of computers powered by batteries prompted technologies
+like APM, ACPI, SpeedStep and others which varies or throttles the
+CPU clock to match computing demand in order to minimise the power
+consumption \**.
+.FS
+This technology also found ways into stationary computers from
+two different vectors.
+The first vector was technical: Cheaper cooling solutions can be used
+for the CPU if they are employed resulting in cheaper commodity
+hardware.
+The second vector was political: For reasons beyond reason, energy
+conservation became an issue with personal computers, despite the fact
+that practically all north American households contains 4 to 5 household
+items which through inefficient designs waste more power than a
+personal computer use.
+.FE
+.LP
+Another wiggle for the TSC is that it is not usable on multi-CPU
+systems because the counter is implemented inside the CPU and
+not readable from other CPUs in the system.
+.LP
+The counters on different CPUs are not guaranteed
+to run syntonously (ie: show the same count at the same time).
+For some architectures like the DEC/alpha architecture they do not even
+run synchronously (ie: at the same rate) because the CPU clock frequency
+is generated by a small SAW device on the chip which is very sensitive
+to temperature changes.
+.LP
+The ACPI specification finally brings some light:
+it postulates the existence of a 24 or 32 bit
+counter running at a standardised constant frequency and
+specifically notes that this is intended to be used for timekeeping.
+.LP
+The frequency chosen, 3.5795454... MHz\**
+.FS
+The reason for this odd-ball frequency has to be sought in the ghastly
+colours offered by the original IBM PC Color Graphics Adapter: It
+delivered NTSC format output and therefore introduced the NTSC colour
+sync frequency into personal computers.
+.FE
+ is not quite as high as one
+could have wished for, but it is certainly a big improvement over
+the i8254 hardware in terms of access path.
+.LP
+But trust it to Murphys Law: The majority of implementations so far
+have failed to provide latching suitable to avoid meta-stability
+problems, and several readings from the counter is necessary to
+get a reliable timestamp.
+In difference from the i8254 mentioned above, we do not need to
+any locking while doing so, since each individual read is atomic.
+.LP
+An initialization routine tries to test if the ACPI counter is properly
+latched by examining the width of a histogram over read delta-values.
+.LP
+Other architectures are similarly equipped with means for timekeeping,
+but generally more carefully thought out compared to the haphazard
+developments of the IBM PC architecture.
+.LP
+One final important wiggle of all this, is that it may not be possible
+to determine which piece of hardware is best suited for clock
+use until well into or even after the bootstrap process.
+.LP
+One example of this is the Loran-C receiver designed by Prof. Dave Mills
+[MILLS1992]
+which is unsuitable as timecounter until the daemon program which
+implements the software-half of the receiver has properly initialised
+and locked onto a Loran-C signal.
+.SH
+Ideal timecounter hardware
+.LP
+As proof of concept, a sort of an existentialist protest against
+the sorry state describe above, the author undertook a project to
+prove that it is possible to do better than that, since none of
+the standard hardware offered a way to fully validate the timecounter
+design.
+.LP
+Using a COTS product, ``HOT1'', from Virtual Computers Corporation
+[VCC2002] containing a FPGA chip on a PCI form factor card, a 26
+bit timecounter running at 100MHz was successfully implemented.
+.LP
+.PSPIC fig5.eps
+.LP
+.LP
+In order to show that timestamping does not necessarily have to
+be done using unpredictable and uncalibratable interrupts, an
+array of latches were implemented as well, which allow up to 10
+external signals to latch the reading of the counter when
+an external PPS signal transitions from logic high to logic
+low or vice versa.
+.LP
+Using this setup, a standard 133 MHz Pentium based PC is able to
+timestamp the PPS output of the Motorola UT+ GPS receiver with
+a precision of \(+- 10 nanoseconds \(+- one count which in practice
+averages out to roughly \(+- 15 nanoseconds\**:
+.FS
+The reason the plot does not show a very distinct 10 nanosecond
+quantization is that the GPS receiver produces the PPS signal from
+a clock with a roughly 55 nanosecond period and then predicts in
+the serial data stream how many nanoseconds this will be offset
+from the ideal time.
+This plot shows the timestamps corrected for this ``negative
+sawtooth correction''.
+.FE
+.LP
+.PSPIC gps.ps
+.LP
+It shold be noted that the author is no hardware wizard and
+a number of issues in the implementation results in less than
+ideal noise performance.
+.LP
+Now compare this to ``ideal'' timecounter to the normal setup
+where the PPS signal is used
+to trigger an interrupt via the DCD pin on a serial port, and
+the interrupt handler calls \fCnanotime()\fP to timestamp
+the external event \**:
+.FS
+In both cases, the computers clock frequency controlled
+with a Rubidium Frequency standard.
+The average quality of crystals used for computers would
+totally obscure the curves due to their temperature coefficient.
+.FE
+.LP
+.PSPIC intr.ps
+.LP
+It is painfully obvious that the interrupt latency is the
+dominant noise factor in PPS timestamping in the second case.
+The asymetric distribution of the noise in the second plot
+also more or less entirely invalidates the design assumption
+in the NTP PLL/FLL kernel code that timestamps are dominated
+by gaussian noise with few spikes.
+.SH
+Status and availability
+.PP
+The timecounter code has been developed and used in FreeBSD
+for a number of years and has now reached maturity.
+The source-code is located almost entirely in the kernel source file
+kern_tc.c, with a few necessary adaptations in code which
+interfaces to it, primarily the NTP PLL/FLL code.
+.LP
+The code runs on all FreeBSD platforms including i386, alpha,
+PC98, sparc64, ia64 and s/390 and contains no wordsize or
+endianess issues not specifically handled in the sourcecode.
+.LP
+The timecounter implementation is distributed under the ``BSD''
+open source license or the even more free ``Beer-ware'' license.
+.LP
+While the ability to accurately model and compensate for
+inaccuracies typical of atomic frequency standards are not
+catering to the larger userbase, but this ability and precision
+of the code guarntees solid support for the widespread deployment
+of NTP as a time synchronization protocol, without rounding
+or accumulative errors.
+.LP
+Adding support for new hardware and platforms have been
+done several times by other developers without any input from the
+author, so this particular aspect of timecounters design
+seems to work very well.
+.SH
+Future work
+.PP
+At this point in time, no specific plans exist for further
+development of the timecounters code.
+.LP
+Various micro-optimizations, mostly to compensate for inadequate
+compiler optimization could be contemplated, but the author
+resists these on the basis that they significantly decrease
+the readability of the source code.
+.SH
+Acknowledgements
+.PP
+.EQ
+delim ññ
+.EN
+The author would like to thank:
+.LP
+Bruce Evans
+for his invaluable assistance
+in taming the evil i8254 timecounter, as well as the enthusiastic
+resistance he has provided throughout.
+.PP
+Professor Dave Mills of University of Delaware for his work on
+NTP, for lending out the neglected twin Loran-C receiver and for
+picking up the glove when timecounters made it clear
+that the old ``microkernel'' NTP timekeeping code were not up to snuff
+[MILLS2000B].
+.PP
+Tom Van Baak for helping out, despite the best efforts of the National
+Danish Posts center for Customs and Dues to prevent it.
+.PP
+Corby Dawson for helping with the care and feeding for caesium standards.
+.PP
+The staff at the NELS Loran-C control station in Bø, Norway for providing
+information about step-changes.
+.PP
+The staff at NELS Loran-C station Eiðe, Faeroe
+Islands for permission to tour their installation.
+.PP
+The FreeBSD users for putting up with ``micro uptime went backwards''.
+.SH
+References
+.LP
+[AG2002]
+Published specifications for Agilent model 5071A Primary Frequency
+Standard on
+.br
+http://www.agilent.com
+.LP
+[DMK2001]
+"Accuracy Evaluation of a Cesium Fountain Primary Frequency Standard at NIST."
+D. M. Meekhof, S. R. Jefferts, M. Stephanovic, and T. E. Parker
+IEEE Transactions on instrumentation and measurement, VOL. 50, NO. 2,
+APRIL 2001.
+.LP
+[PHK2001]
+"Monitoring Natural Gas Usage"
+Poul-Henning Kamp
+http://phk.freebsd.dk/Gasdims/
+.LP
+[MILLS1992]
+"A computer-controlled LORAN-C receiver for precision timekeeping."
+Mills, D.L.
+Electrical Engineering Department Report 92-3-1, University of Delaware, March 1992, 63 pp.
+.LP
+[MILLS2000A]
+Levine, J., and D. Mills. "Using the Network Time Protocol to transmit International Atomic Time (TAI)". Proc. Precision Time and Time Interval (PTTI) Applications and Planning Meeting (Reston VA, November 2000), 431-439.
+.LP
+[MILLS2000B]
+"The nanokernel."
+Mills, D.L., and P.-H. Kamp.
+Proc. Precision Time and Time Interval (PTTI) Applications and Planning Meeting (Reston VA, November 2000), 423-430.
+.LP
+[RGO2002]
+For an introduction to Harrison and his clocks, see for
+instance
+.br
+http://www.rog.nmm.ac.uk/museum/harrison/
+.br
+or for
+a more detailed and possibly better researched account: Dava
+Sobels excellent book, "Longitude: The True Story of a Lone
+Genius Who Solved the Greatest Scientific Problem of His
+Time" Penguin USA (Paper); ISBN: 0140258795.
+.LP
+[SAGE]
+This ``gee-wiz'' kind of article in Dr. Jobbs Journal is a goot place to
+start:
+.br
+http://www.ddj.com/documents/s=1493/ddj0001hc/0085a.htm
+.LP
+[VCC2002]
+Please consult Virtual Computer Corporations homepage:
+.br
+http://www.vcc.com
OpenPOWER on IntegriCloud