path: root/lib/msun/src/k_sinf.c
Commit message | Author | Age | Files | Lines
* Use ISO C99 style inline semantics in msun. | ed | 2009-06-03 | 1 | -3/+3
  Because we use ISO C99 nowadays, we can just get rid of enforcing GNU89-style inlining.
* Use __gnu89_inline so that these files will compile with newer versions | das | 2009-01-13 | 1 | -1/+1
  of gcc, where the meaning of 'inline' was changed to match C99.

  Noticed by: rdivacky
* s/rcsid/__FBSDID/ | das | 2008-02-22 | 1 | -3/+2
* Rearranged the polynomial evaluation to reduce dependencies, as in | bde | 2005-11-30 | 1 | -5/+7
  k_tanf.c but with different details.

  The polynomial is odd with degree 13 for tanf() and odd with degree 9 for sinf(), so the details are not very different for sinf() -- the term with the x**11 and x**13 coefficients goes away, and (mysteriously) it helps to do the evaluation of w = z*z early although moving it later was a key optimization for tanf(). The details are different but simpler for cosf() because the polynomial is even and of lower degree.

  On Athlons, for uniformly distributed args in [-2pi, 2pi], this gives an optimization of about 4 cycles (10%) in most cases (13% for sinf() on AXP, but 0% for cosf() with gcc-3.3 -O1 on AXP). The best case (sinf() with gcc-3.4 -O1 -fcaller-saves on A64) now takes 33-39 cycles (was 37-45 cycles). Hardware sinf takes 74-129 cycles.

  Despite being fine tuned for Athlons, the optimization is even larger on some other arches (about 15% on ia64 (pluto2) and 20% on alpha (beast) with gcc -O2 -fomit-frame-pointer).
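The rearrangement described in this entry can be illustrated with a minimal sketch. It assumes Taylor coefficients as stand-ins for the minimax coefficients (which are not quoted in this log), and the names `sin_poly_horner` and `sin_poly_split` are hypothetical. The first function evaluates the degree 9 odd polynomial as one serial Horner chain; the second computes w = z*z early and splits the sum into two shorter chains that do not depend on each other.

```c
/*
 * Stand-in coefficients: plain Taylor terms of sin(x), NOT the tuned
 * minimax coefficients used by the real kernel.
 */
static const double
    S1 = -1.0 / 6.0,        /* x**3 term */
    S2 =  1.0 / 120.0,      /* x**5 term */
    S3 = -1.0 / 5040.0,     /* x**7 term */
    S4 =  1.0 / 362880.0;   /* x**9 term */

/* Serial Horner form: every multiply-add waits for the previous one. */
static double
sin_poly_horner(double x)
{
    double z = x * x;

    return x + x * z * (S1 + z * (S2 + z * (S3 + z * S4)));
}

/*
 * Split form: w = z*z is computed early, and the (S3, S4) pair is
 * evaluated independently of the (S1, S2) pair, so the two halves can
 * overlap in the pipeline before being combined.
 */
static double
sin_poly_split(double x)
{
    double z = x * x;
    double w = z * z;
    double r = S3 + z * S4;
    double s = z * x;

    return (x + s * (S1 + z * S2)) + s * w * r;
}
```

Both functions compute the same polynomial x + S1*x**3 + S2*x**5 + S3*x**7 + S4*x**9 and differ only in rounding details. Whether the split form is actually faster depends on the compiler and CPU, as the cycle counts in the entry suggest.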
* Use only double precision for "kernel" cosf and sinf (except for | bde | 2005-11-28 | 1 | -13/+11
  returning float). The functions are renamed from __kernel_{cos,sin}f() to __kernel_{cos,sin}df() so that misuses of them will cause link errors and not crashes.

  This version is an almost-routine translation with no special optimizations for accuracy or efficiency. The not-quite-routine part is that in __kernel_cosf(), regenerating the minimax polynomial with double precision coefficients gives a coefficient for the x**2 term that is not quite -0.5, so the literal 0.5 in the code and the related `hz' variable need to be modified; also, the special code for reducing the error in 1.0-x**2*0.5 is no longer needed, so it is convenient to adjust all the logic for the x**2 term a little. Note that without extra precision, it would be very bad to use a coefficient of other than -0.5 for the x**2 term -- the old version depends on multiplication by -0.5 being infinitely precise so as not to need even more special code for reducing the error in 1-x**2*0.5.

  This gives an unimportant increase in accuracy, from ~0.8 to ~0.501 ulps. Almost all of the error is from the final rounding step, since the choice of the minimax polynomials so that their contribution to the error is a bit less than 0.5 ulps just happens to give contributions that are significantly less (~.001 ulps).

  On Athlons, for uniformly distributed args in [-2pi, 2pi], this gives overall speed increases in the 10-20% range, despite giving a speed decrease of typically 19% (from 31 cycles up to 37) for sinf() on args in [-pi/4, pi/4].
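A rough sketch of the idea in this entry, under stated assumptions: the coefficients are Taylor stand-ins rather than the regenerated minimax ones, and `sin_kernel_f` and `sin_kernel_df` are hypothetical names. The all-float version rounds after every operation; the double version carries extra precision through the whole evaluation and rounds to float only once, at the return.

```c
/* All arithmetic in float: each multiply and add rounds to 24 bits. */
static float
sin_kernel_f(float x)
{
    const float s1 = -1.0f / 6, s2 = 1.0f / 120,
                s3 = -1.0f / 5040, s4 = 1.0f / 362880;
    float z = x * x;

    return x + x * z * (s1 + z * (s2 + z * (s3 + z * s4)));
}

/*
 * Arithmetic in double, float only for the result: the intermediate
 * rounding errors are far below a float ulp, so nearly all of the
 * error comes from the single final conversion.
 */
static float
sin_kernel_df(double x)
{
    const double s1 = -1.0 / 6, s2 = 1.0 / 120,
                 s3 = -1.0 / 5040, s4 = 1.0 / 362880;
    double z = x * x;

    return (float)(x + x * z * (s1 + z * (s2 + z * (s3 + z * s4))));
}
```

Taking a double argument also means the caller can pass a reduced argument that does not fit in a float, which is one reason a double-argument kernel is convenient here.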
* Mess up the "kernel" float trig function .c files with ifdefs so that | bde | 2005-11-21 | 1 | -0/+5
  they can be #included in other .c files to give inline functions, and use them to inline the functions in most callers (not in e_lgammaf_r.c). __kernel_tanf() is too large and complicated for gcc to inline very well.

  On Athlons, this gives a speed increase under favourable pipeline conditions of about 10% overall (larger for AXP, smaller for A64). E.g., on AXP, sinf() on uniformly distributed args in [-2Pi, 2Pi] now takes 30-56 cycles; it used to take 45-61 cycles; hardware fsin takes 65-129.
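The ifdef arrangement might look roughly like the sketch below. Everything named here (`INLINE_KERNEL_EXAMPLE`, `kernel_example`, `caller_example`, and the file name in the comments) is hypothetical, and the two files are collapsed into one listing so that it compiles on its own; in practice the middle section would live in the kernel's .c file and be pulled in with #include.

```c
/*
 * Caller side: define the macro before including the kernel source so
 * that the kernel is compiled as a static inline function in this file.
 */
#define INLINE_KERNEL_EXAMPLE

/* ---- begin: what a hypothetical k_example.c would contain ---- */
#ifdef INLINE_KERNEL_EXAMPLE
static inline
#endif
float
kernel_example(double x)
{
    double z = x * x;

    /* Placeholder body: two-term approximation, only for illustration. */
    return (float)(x - x * z / 6.0);
}
/* ---- end of the hypothetical k_example.c ---- */

/* The caller now gets an inlinable copy instead of an external call. */
float
caller_example(float x)
{
    return kernel_example((double)x);
}
```

When the macro is not defined, the same source still builds as an ordinary external function, so callers that do not include it can keep linking against it.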
* Tweaked the minimax polynomial and improved its comments. | bde | 2005-11-12 | 1 | -5/+5
* Use fairly optimal minimax polynomials for __kernel_cosf() and | bde | 2005-10-28 | 1 | -8/+8
  __kernel_sinf(). The old ones were the double-precision polynomials with coefficients truncated to float. Truncation is not a good way to convert minimax polynomials to lower precision.

  Optimize for efficiency and use the lowest-degree polynomials that give a relative error of less than 1 ulp -- degree 8 instead of 14 for cosf and degree 9 instead of 13 for sinf. For sinf, the degree 8 polynomial happens to be 6 times more accurate than the old degree 14 one, but this only gives a tiny amount of extra accuracy in results -- we just need to use a degree high enough to give a polynomial whose relative accuracy in infinite precision (but with float coefficients) is a small fraction of a float ulp (fdlibm generally uses 1/32 for the small fraction, and the fraction for our degree 8 polynomial is about 1/600).

  The maximum relative errors for cosf() and sinf() are now 0.7719 ulps and 0.7969 ulps, respectively.
* Moved the optimization for tiny x from __kernel_{cos,sin}[f](x) to | bde | 2005-10-24 | 1 | -5/+1
  {cos,sin}[f](x) so that x doesn't need to be reclassified in the "kernel" functions to determine if it is tiny (it still needs to be reclassified in the cosine case for other reasons that will go away).

  This optimization is quite large for exponentially distributed x, since x is tiny for almost half of the domain, but it is a pessimization for uniformly distributed x since it takes a little time for all cases but rarely applies. Arg reduction on exponentially distributed x rarely gives a tiny x unless the reduction is null, so it is best to only do the optimization if the initial x is tiny, which is what this commit arranges. The immediate result is an average optimization of 1.4% relative to the previous version in a case that doesn't favour the optimization (double cos(x) on all float x) and a large pessimization for the relatively unimportant cases of lgamma[f][_r](x) on tiny, negative, exponentially distributed x. The optimization should be recovered for lgamma*() as part of fixing lgamma*()'s low-quality arg reduction.

  Fixed various wrong constants for the cutoff for "tiny". For cosine, the cutoff is when x**2/2! == {FLT or DBL}_EPSILON/2. We round down to an integral power of 2 (and for cos() reduce the power by another 1) because the exact cutoff doesn't matter and would take more work to determine. For sine, the exact cutoff is larger due to the ratio of terms being x**2/3! instead of x**2/2!, but we use the same cutoff as for cosine. We now use a cutoff of 2**-27 for double precision and 2**-12 for single precision. 2**-27 was used in all cases but was misspelled 2**27 in comments. Wrong and sloppy cutoffs just cause missed optimizations (provided the rounding mode is to nearest -- other modes just aren't supported).
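The single-precision cutoff can be checked with a small sketch. It assumes round-to-nearest, uses a two-term Taylor placeholder instead of the real polynomial, and the names `TINY_F`, `cos_poly_sketch` and `cosf_sketch` are hypothetical. Below 2**-12 the term x**2/2 is smaller than 2**-25, which is at most half the spacing of floats just under 1, so the full evaluation rounds to 1 anyway and the shortcut loses nothing.

```c
#define TINY_F 0x1p-12f     /* 2**-12, the single-precision cutoff */

/* Placeholder "full" evaluation: 1 - x**2/2 + x**4/24, rounded to float. */
static float
cos_poly_sketch(float x)
{
    double z = (double)x * x;

    return (float)(1.0 - z * (0.5 - z * (1.0 / 24)));
}

/* Wrapper with the tiny-x shortcut applied to the initial argument. */
static float
cosf_sketch(float x)
{
    float ax = x < 0 ? -x : x;

    if (ax < TINY_F)
        return 1.0f;        /* the polynomial would round to 1 as well */
    return cos_poly_sketch(x);
}
```

A cutoff that is too small only sends more arguments through the polynomial, which matches the entry's remark that sloppy cutoffs just cause missed optimizations.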
* Fix formatting, this is hard to explain, so I'll show one example. | alfred | 2002-05-28 | 1 | -1/+2
    - float ynf(int n, float x) /* wrapper ynf */
    +float
    +ynf(int n, float x) /* wrapper ynf */

  This is because the __STDC__ stuff was indented.

  Reviewed by: md5
* Assume __STDC__, remove non-__STDC__ code. | alfred | 2002-05-28 | 1 | -9/+0
  Reviewed by: md5
* $Id$ -> $FreeBSD$ | peter | 1999-08-28 | 1 | -1/+1
* Revert $FreeBSD$ to $Id$ | peter | 1997-02-22 | 1 | -1/+1
* Make the long-awaited change from $Id$ to $FreeBSD$ | jkh | 1997-01-14 | 1 | -1/+1
  This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.

  Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
* Remove trailing whitespace. | rgrimes | 1995-05-30 | 1 | -4/+4
* J.T. Conklin's latest version of the Sun math library. | jkh | 1994-08-19 | 1 | -0/+54
  -- Begin comments from J.T. Conklin:

  The most significant improvement is the addition of "float" versions of the math functions that take float arguments, return floats, and do all operations in floating point. This doesn't help (performance) much on the i386, but they are still nice to have.

  The float versions were originally done by Cygnus' Ian Taylor when fdlibm was integrated into the libm we support for embedded systems. I gave Ian a copy of my libm as a starting point since I had already fixed a lot of bugs & problems in Sun's original code. After he was done, I cleaned it up a bit and integrated the changes back into my libm.

  -- End comments

  Reviewed by: jkh
  Submitted by: jtc