The new functions have been added to the mathfuncs template class,
and are named vml_sin_chebyshev_single, vml_sin_chebyshev_double,
vml_cos_chebyshev_single, and vml_cos_chebyshev_double. The
corresponding sin and cos member functions in the vector template
structs have been updated to call into the new implementations.

These functions use minimax Chebyshev polynomial approximations
optimized for floating point. They have good relative error
distributions for IEEE-754 floating-point numbers, because the
highest-contributing coefficient is chosen to map exactly to a
32-bit or 64-bit IEEE number for the _single and _double function
variants respectively.
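
As a rough illustration of how a polynomial approximation of this
kind is evaluated, the sketch below runs Horner's scheme with FMAs
over an odd polynomial for sin. It is not the code from this change:
sin_poly_sketch is a hypothetical name, the input is assumed to be
already reduced to roughly [-pi/4, pi/4], and plain Taylor
coefficients stand in for the minimax coefficients, which are not
reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, NOT vml_sin_chebyshev_single.
// Evaluates sin(x) ~= x + x^3 * (c3 + x^2 * (c5 + x^2 * (c7 + x^2 * c9)))
// using one FMA per coefficient (Horner's scheme).
static float sin_poly_sketch(float x)
{
    // Assumption: the caller has already reduced x to about [-pi/4, pi/4].
    assert(std::fabs(x) <= 0.79f);

    // Placeholder Taylor coefficients; a minimax fit would use slightly
    // different values tuned to the reduced interval.
    const float c3 = -1.0f / 6.0f;
    const float c5 =  1.0f / 120.0f;
    const float c7 = -1.0f / 5040.0f;
    const float c9 =  1.0f / 362880.0f;

    const float x2 = x * x;

    float p = c9;
    p = std::fma(p, x2, c7);
    p = std::fma(p, x2, c5);
    p = std::fma(p, x2, c3);

    // Final step: x + x^3 * p. Keeping x as the addend means the leading
    // term of the result is x itself, exactly.
    return std::fma(x * x2, p, x);
}
```

Each step depends on the previous one, so the latency of a single
chain is roughly the FMA latency times the number of coefficients.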

The _single variants produce approximately 30 bits of precision
in the mantissa, and the _double variants around 60 bits, which
is more than enough to produce accurate values.

The vml_tan function hasn't been updated, so it still calls both
sin and cos and relies on the compiler to factor out common code.
It's possible to implement a sincos function using these
polynomials that interleaves the FMAs. Since the FMA instructions
in the sin and cos paths have no dependencies on one another, one
of the paths is computed essentially for free on x86-64 platforms
due to instruction-level parallelism. Alternatively, tan can be
implemented in terms of a specifically optimized Chebyshev
rational function with good performance and properties.
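
The interleaving idea can be sketched as follows. This is only an
illustration under stated assumptions: sincos_result,
sincos_poly_sketch and tan_poly_sketch are hypothetical names, the
argument is assumed to be already reduced to roughly [-pi/4, pi/4],
Taylor coefficients stand in for the real minimax coefficients, and
computing tan as sin / cos is an assumption about how the two
results would be combined.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for the sketch.
struct sincos_result { double s; double c; };

// Hypothetical helper: evaluates the sin and cos polynomials side by side.
// The two Horner chains share x2 but never read each other's accumulator,
// so the FMAs of one chain can issue while the other chain's are in flight.
static sincos_result sincos_poly_sketch(double x)
{
    // Assumption: x has already been reduced to about [-pi/4, pi/4].
    assert(std::fabs(x) <= 0.79);

    const double x2 = x * x;

    // Placeholder Taylor coefficients, highest order first.
    double s = 1.0 / 6227020800.0;   // sin chain: 1/13!
    double c = 1.0 / 479001600.0;    // cos chain: 1/12!

    s = std::fma(s, x2, -1.0 / 39916800.0);  c = std::fma(c, x2, -1.0 / 3628800.0);
    s = std::fma(s, x2,  1.0 / 362880.0);    c = std::fma(c, x2,  1.0 / 40320.0);
    s = std::fma(s, x2, -1.0 / 5040.0);      c = std::fma(c, x2, -1.0 / 720.0);
    s = std::fma(s, x2,  1.0 / 120.0);       c = std::fma(c, x2,  1.0 / 24.0);
    s = std::fma(s, x2, -1.0 / 6.0);         c = std::fma(c, x2, -0.5);

    // sin(x) ~= x + x^3 * s, cos(x) ~= 1 + x^2 * c
    return { std::fma(x * x2, s, x), std::fma(x2, c, 1.0) };
}

// Hypothetical tan built on the shared evaluation (assumed sin / cos form).
static double tan_poly_sketch(double x)
{
    const sincos_result r = sincos_poly_sketch(x);
    return r.s / r.c;
}
```

Writing the two chains in one function makes the sharing of x2
explicit rather than leaving it to the compiler to discover after
inlining separate sin and cos calls.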