author | bde <bde@FreeBSD.org> | 2005-11-30 11:51:17 +0000 |
---|---|---|
committer | bde <bde@FreeBSD.org> | 2005-11-30 11:51:17 +0000 |
commit | 8cc821405ab48cc12a0b4d3b452a47e816d978cf (patch) | |
tree | 48dcd46ed879e591abfa46bfaf080ccbf0697479 /lib | |
parent | a016ed505ca0d48d2b90c0dfd2e561d317c4919b (diff) | |
Rearranged the polynomial evaluation to reduce dependencies, as in
k_tanf.c but with different details.
The polynomial is odd with degree 13 for tanf() and odd with degree
9 for sinf(), so the details are not very different for sinf() -- the
term with the x**11 and x**13 coefficients goes away, and (mysteriously)
it helps to do the evaluation of w = z*z early although moving it later
was a key optimization for tanf(). The details are different but simpler
for cosf() because the polynomial is even and of lower degree.
On Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
an optimization of about 4 cycles (10%) in most cases (13% for sinf()
on AXP, but 0% for cosf() with gcc-3.3 -O1 on AXP). The best case
(sinf() with gcc-3.4 -O1 -fcaller-saves on A64) now takes 33-39 cycles
(was 37-45 cycles). Hardware sinf takes 74-129 cycles. Despite
being fine tuned for Athlons, the optimization is even larger on
some other arches (about 15% on ia64 (pluto2) and 20% on alpha (beast)
with gcc -O2 -fomit-frame-pointer).
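The rearrangement described in this message can be sketched in isolation. The block below is a minimal illustration, not the committed code: the coefficients are plain Taylor-series values chosen for readability (an assumption; the real k_sinf.c and k_cosf.c use tuned constants), and the function names `sin_horner`, `sin_split`, `cos_horner`, `cos_split` and `max_diff` are hypothetical. In the nested form each multiply-add waits on the previous one; in the split form, w = z*z, r, s and S1+z*S2 depend only on z (and x), so a superscalar CPU can overlap them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative Taylor coefficients (assumption): these are NOT the
 * tuned constants used in k_sinf.c and k_cosf.c.
 */
static const double S1 = -1.0 / 6, S2 = 1.0 / 120,
    S3 = -1.0 / 5040, S4 = 1.0 / 362880;
static const double one = 1.0, C0 = -1.0 / 2, C1 = 1.0 / 24,
    C2 = -1.0 / 720, C3 = 1.0 / 40320;

/* Old sin form: one long chain, every step needs the previous result. */
static double
sin_horner(double x)
{
	double z = x * x;
	double v = z * x;
	double r = S2 + z * (S3 + z * S4);

	return x + v * (S1 + z * r);
}

/* New sin form: w, r, s and (S1 + z*S2) are independent once z is known. */
static double
sin_split(double x)
{
	double z = x * x;
	double w = z * z;
	double r = S3 + z * S4;
	double s = z * x;

	return (x + s * (S1 + z * S2)) + s * w * r;
}

/* Old cos form. */
static double
cos_horner(double x)
{
	double z = x * x;
	double r = z * (C1 + z * (C2 + z * C3));

	return (one + z * C0) + z * r;
}

/* New cos form: (one + z*C0), w*C1 and r can be evaluated in parallel. */
static double
cos_split(double x)
{
	double z = x * x;
	double w = z * z;
	double r = C2 + z * C3;

	return ((one + z * C0) + w * C1) + (w * z) * r;
}

/* Largest absolute difference between f and g on 201 points in [-pi/4, pi/4]. */
static double
max_diff(double (*f)(double), double (*g)(double))
{
	const double quarter_pi = 0.7853981633974483;
	double worst = 0.0;
	int i;

	assert(f != NULL && g != NULL);
	for (i = -100; i <= 100; i++) {
		double x = i * (quarter_pi / 100);
		double d = f(x) - g(x);

		if (d < 0)
			d = -d;
		if (d > worst)
			worst = d;
	}
	return worst;
}
```

Calling `max_diff(sin_horner, sin_split)` and `max_diff(cos_horner, cos_split)` should return values at the level of floating-point rounding, since the two forms of each function are algebraically identical and differ only in evaluation order. The sketch says nothing about timing; the cycle counts above come from the commit author's own measurements.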
Diffstat (limited to 'lib')
-rw-r--r-- | lib/msun/src/k_cosf.c | 10 |
-rw-r--r-- | lib/msun/src/k_sinf.c | 12 |
2 files changed, 13 insertions, 9 deletions
diff --git a/lib/msun/src/k_cosf.c b/lib/msun/src/k_cosf.c
index a3bf520..6a0afe9 100644
--- a/lib/msun/src/k_cosf.c
+++ b/lib/msun/src/k_cosf.c
@@ -37,9 +37,11 @@ extern inline
 float
 __kernel_cosdf(double x)
 {
-	double z,r;
+	double r, w, z;
 
-	z = x*x;
-	r = z*(C1+z*(C2+z*C3));
-	return (one+z*C0) + z*r;
+	/* Try to optimize for parallel evaluation as in k_tanf.c. */
+	z = x*x;
+	w = z*z;
+	r = C2+z*C3;
+	return ((one+z*C0) + w*C1) + (w*z)*r;
 }
diff --git a/lib/msun/src/k_sinf.c b/lib/msun/src/k_sinf.c
index 7009f82..79f32a1 100644
--- a/lib/msun/src/k_sinf.c
+++ b/lib/msun/src/k_sinf.c
@@ -36,10 +36,12 @@ extern inline
 float
 __kernel_sindf(double x)
 {
-	double z,r,v;
+	double r, s, w, z;
 
-	z = x*x;
-	v = z*x;
-	r = S2+z*(S3+z*S4);
-	return x+v*(S1+z*r);
+	/* Try to optimize for parallel evaluation as in k_tanf.c. */
+	z = x*x;
+	w = z*z;
+	r = S3+z*S4;
+	s = z*x;
+	return (x + s*(S1+z*S2)) + s*w*r;
 }