1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
|
.\" Copyright (c) 1985 Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" from: @(#)ieee.3 6.4 (Berkeley) 5/6/91
.\" $FreeBSD$
.\"
.Dd January 26, 2005
.Dt IEEE 3
.Os
.Sh NAME
.Nm ieee
.Nd IEEE standard 754 for floating-point arithmetic
.Sh DESCRIPTION
The IEEE Standard 754 for Binary Floating-Point Arithmetic
defines representations of floating-point numbers and abstract
properties of arithmetic operations relating to precision,
rounding, and exceptional cases, as described below.
.Ss IEEE STANDARD 754 Floating-Point Arithmetic
Radix: Binary.
.Pp
Overflow and underflow:
.Bd -ragged -offset indent -compact
Overflow goes by default to a signed \*(If.
Underflow is
.Em gradual .
.Ed
.Pp
Zero is represented ambiguously as +0 or \-0.
.Bd -ragged -offset indent -compact
Its sign transforms correctly through multiplication or
division, and is preserved by addition of zeros
with like signs; but x\-x yields +0 for every
finite x.
The only operations that reveal zero's
sign are division by zero and
.Fn copysign x \(+-0 .
In particular, comparison (x > y, x \(>= y, etc.)\&
cannot be affected by the sign of zero; but if
finite x = y then \*(If = 1/(x\-y) \(!= \-1/(y\-x) = \-\*(If.
.Ed
.Pp
Infinity is signed.
.Bd -ragged -offset indent -compact
It persists when added to itself
or to any finite number.
Its sign transforms
correctly through multiplication and division, and
(finite)/\(+-\*(If\0=\0\(+-0
(nonzero)/0 = \(+-\*(If.
But
\*(If\-\*(If, \*(If\(**0 and \*(If/\*(If
are, like 0/0 and sqrt(\-3),
invalid operations that produce \*(Na. ...
.Ed
.Pp
Reserved operands (\*(Nas):
.Bd -ragged -offset indent -compact
An \*(Na is
.Em ( N Ns ot Em a N Ns umber ) .
Some \*(Nas, called Signaling \*(Nas, trap any floating-point operation
performed upon them; they are used to mark missing
or uninitialized values, or nonexistent elements
of arrays.
The rest are Quiet \*(Nas; they are
the default results of Invalid Operations, and
propagate through subsequent arithmetic operations.
If x \(!= x then x is \*(Na; every other predicate
(x > y, x = y, x < y, ...) is FALSE if \*(Na is involved.
.Ed
.Pp
Rounding:
.Bd -ragged -offset indent -compact
Every algebraic operation (+, \-, \(**, /,
\(sr)
is rounded by default to within half an
.Em ulp ,
and when the rounding error is exactly half an
.Em ulp
then
the rounded value's least significant bit is zero.
(An
.Em ulp
is one
.Em U Ns nit
in the
.Em L Ns ast
.Em P Ns lace . )
This kind of rounding is usually the best kind,
sometimes provably so; for instance, for every
x = 1.0, 2.0, 3.0, 4.0, ..., 2.0**52, we find
(x/3.0)\(**3.0 == x and (x/10.0)\(**10.0 == x and ...
despite that both the quotients and the products
have been rounded.
Only rounding like IEEE 754 can do that.
But no single kind of rounding can be
proved best for every circumstance, so IEEE 754
provides rounding towards zero or towards
+\*(If or towards \-\*(If
at the programmer's option.
.Ed
.Pp
Exceptions:
.Bd -ragged -offset indent -compact
IEEE 754 recognizes five kinds of floating-point exceptions,
listed below in declining order of probable importance.
.Bl -column -offset indent "Invalid Operation" "Gradual Underflow"
.Em "Exception Default Result"
Invalid Operation \*(Na, or FALSE
Overflow \(+-\*(If
Divide by Zero \(+-\*(If
Underflow Gradual Underflow
Inexact Rounded value
.El
.Pp
NOTE: An Exception is not an Error unless handled
badly.
What makes a class of exceptions exceptional
is that no single default response can be satisfactory
in every instance.
On the other hand, if a default
response will serve most instances satisfactorily,
the unsatisfactory instances cannot justify aborting
computation every time the exception occurs.
.Ed
.Ss Data Formats
Single-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt float
.Pp
Wordsize: 32 bits.
.Pp
Precision: 24 significant bits,
roughly like 7 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive single-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
5.9e\-08 < 0.5**24 < (x'\-x)/x \(<= 0.5**23 < 1.2e\-07.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
Range: Overflow threshold = 2.0**128 = 3.4e38
Underflow threshold = 0.5**126 = 1.2e\-38
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**149 = 1.4e\-45.
.Ed
.Ed
.Pp
Double-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt double
.Bd -ragged -offset indent -compact
On some architectures,
.Vt long double
is the same as
.Vt double .
.Ed
.Pp
Wordsize: 64 bits.
.Pp
Precision: 53 significant bits,
roughly like 16 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive double-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
1.1e\-16 < 0.5**53 < (x'\-x)/x \(<= 0.5**52 < 2.3e\-16.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
Range: Overflow threshold = 2.0**1024 = 1.8e308
Underflow threshold = 0.5**1022 = 2.2e\-308
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**1074 = 4.9e\-324.
.Ed
.Ed
.Pp
Extended-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt long double
(when supported by the hardware)
.Pp
Wordsize: 96 bits.
.Pp
Precision: 64 significant bits,
roughly like 19 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive extended-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
1.0e\-19 < 0.5**63 < (x'\-x)/x \(<= 0.5**62 < 2.2e\-19.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
Range: Overflow threshold = 2.0**16384 = 1.2e4932
Underflow threshold = 0.5**16382 = 3.4e\-4932
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**16445 = 5.7e\-4953.
.Ed
.Ed
.Pp
Quad-extended-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt long double
(when supported by the hardware)
.Pp
Wordsize: 128 bits.
.Pp
Precision: 113 significant bits,
roughly like 34 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive quad-extended-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
9.6e\-35 < 0.5**113 < (x'\-x)/x \(<= 0.5**112 < 2.0e\-34.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
Range: Overflow threshold = 2.0**16384 = 1.2e4932
Underflow threshold = 0.5**16382 = 3.4e\-4932
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**16494 = 6.5e\-4966.
.Ed
.Ed
.Ss Additional Information Regarding Exceptions
For each kind of floating-point exception, IEEE 754
provides a Flag that is raised each time its exception
is signaled, and stays raised until the program resets
it.
Programs may also test, save and restore a flag.
Thus, IEEE 754 provides three ways by which programs
may cope with exceptions for which the default result
might be unsatisfactory:
.Bl -enum
.It
Test for a condition that might cause an exception
later, and branch to avoid the exception.
.It
Test a flag to see whether an exception has occurred
since the program last reset its flag.
.It
Test a result to see whether it is a value that only
an exception could have produced.
.Pp
CAUTION: The only reliable ways to discover
whether Underflow has occurred are to test whether
products or quotients lie closer to zero than the
underflow threshold, or to test the Underflow
flag.
(Sums and differences cannot underflow in
IEEE 754; if x \(!= y then x\-y is correct to
full precision and certainly nonzero regardless of
how tiny it may be.)
Products and quotients that
underflow gradually can lose accuracy gradually
without vanishing, so comparing them with zero
(as one might on a VAX) will not reveal the loss.
Fortunately, if a gradually underflowed value is
destined to be added to something bigger than the
underflow threshold, as is almost always the case,
digits lost to gradual underflow will not be missed
because they would have been rounded off anyway.
So gradual underflows are usually
.Em provably
ignorable.
The same cannot be said of underflows flushed to 0.
.El
.Pp
At the option of an implementor conforming to IEEE 754,
other ways to cope with exceptions may be provided:
.Bl -enum
.It
ABORT.
This mechanism classifies an exception in
advance as an incident to be handled by means
traditionally associated with error-handling
statements like "ON ERROR GO TO ...".
Different
languages offer different forms of this statement,
but most share the following characteristics:
.Bl -dash
.It
No means is provided to substitute a value for
the offending operation's result and resume
computation from what may be the middle of an
expression.
An exceptional result is abandoned.
.It
In a subprogram that lacks an error-handling
statement, an exception causes the subprogram to
abort within whatever program called it, and so
on back up the chain of calling subprograms until
an error-handling statement is encountered or the
whole task is aborted and memory is dumped.
.El
.It
STOP.
This mechanism, requiring an interactive
debugging environment, is more for the programmer
than the program.
It classifies an exception in
advance as a symptom of a programmer's error; the
exception suspends execution as near as it can to
the offending operation so that the programmer can
look around to see how it happened.
Quite often
the first several exceptions turn out to be quite
unexceptionable, so the programmer ought ideally
to be able to resume execution after each one as if
execution had not been stopped.
.It
\&... Other ways lie beyond the scope of this document.
.El
.Pp
Ideally, each
elementary function should act as if it were indivisible, or
atomic, in the sense that ...
.Bl -enum
.It
No exception should be signaled that is not deserved by
the data supplied to that function.
.It
Any exception signaled should be identified with that
function rather than with one of its subroutines.
.It
The internal behavior of an atomic function should not
be disrupted when a calling program changes from
one to another of the five or so ways of handling
exceptions listed above, although the definition
of the function may be correlated intentionally
with exception handling.
.El
.Pp
The functions in
.Nm libm
are only approximately atomic.
They signal no inappropriate exception except possibly ...
.Bl -tag -width indent -offset indent -compact
.It Xo
Over/Underflow
.Xc
when a result, if properly computed, might have lain barely within range, and
.It Xo
Inexact in
.Fn cabs ,
.Fn cbrt ,
.Fn hypot ,
.Fn log10
and
.Fn pow
.Xc
when it happens to be exact, thanks to fortuitous cancellation of errors.
.El
Otherwise, ...
.Bl -tag -width indent -offset indent -compact
.It Xo
Invalid Operation is signaled only when
.Xc
any result but \*(Na would probably be misleading.
.It Xo
Overflow is signaled only when
.Xc
the exact result would be finite but beyond the overflow threshold.
.It Xo
Divide-by-Zero is signaled only when
.Xc
a function takes exactly infinite values at finite operands.
.It Xo
Underflow is signaled only when
.Xc
the exact result would be nonzero but tinier than the underflow threshold.
.It Xo
Inexact is signaled only when
.Xc
greater range or precision would be needed to represent the exact result.
.El
.Sh SEE ALSO
.Xr fenv 3 ,
.Xr ieee_test 3 ,
.Xr math 3
.Pp
An explanation of IEEE 754 and its proposed extension p854
was published in the IEEE magazine MICRO in August 1984 under
the title "A Proposed Radix- and Word-length-independent
Standard for Floating-point Arithmetic" by
.An "W. J. Cody"
et al.
The manuals for Pascal, C and BASIC on the Apple Macintosh
document the features of IEEE 754 pretty well.
Articles in the IEEE magazine COMPUTER vol.\& 14 no.\& 3 (Mar.\&
1981), and in the ACM SIGNUM Newsletter Special Issue of
Oct.\& 1979, may be helpful although they pertain to
superseded drafts of the standard.
.Sh STANDARDS
.St -ieee754
|