author bde <bde@FreeBSD.org> 2005-11-19 02:38:27 +0000
committer bde <bde@FreeBSD.org> 2005-11-19 02:38:27 +0000
commit 558fb238b170a8cea89a078ae31ee7cf11dc49a6
tree 06f77ae5b835dd0091c1d286f4c63858859a2f15 /lib/libpthread
parent 666e602c465a2f1c8965ed92a47086e3d5e98ecf

Moved all the optimizations for |x| <= 9pi/2 from
__ieee754_rem_pio2f() to its 3 callers and manually inlined them. On Athlons, with favourable compiler flags and optimizations and favourable pipeline conditions, this gives a speedup of 30-40 cycles for cosf(), sinf() and tanf() on the range pi/4 < |x| <= 9pi/4, so these functions are now significantly faster than the hardware trig functions in many cases. E.g., in a benchmark with uniformly distributed x in [-2pi, 2pi], A64 hardware fcos took 72-129 cycles and cosf() took 37-55 cycles. Out-of-order execution is needed to get both of these times. The optimizations in this commit apparently work more by removing 1 serialization point than by reducing latency.
Diffstat (limited to 'lib/libpthread')
0 files changed, 0 insertions, 0 deletions