| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
arch-specific updates for powerpcspe, mips and riscv, for which support
has not been merged yet.
Bump __FreeBSD_version for the addition of cacoshl, cacosl, casinhl,
casinl, catanl, catanhl, sincos, sincosf, and sincosl.
MFC r305382 (by bde):
Add asm versions of fmod(), fmodf() and fmodl() on amd64. Add asm
versions of fmodf() amd fmodl() on i387.
fmod is similar to remainder, and the C versions are 3 to 9 times
slower than the asm versions on x86 for both, but we had the strange
mixture of all 6 variants of remainder in asm and only 1 of 6
variants of fmod in asm.
MFC r305384 (by bde):
Disconnect the "optimized" asm variants of cos(), sin() and tan() from
the build on i386. Leave them in the source tree for regression tests.
The asm functions were always much less accurate (by a factor of more
than 10**18 in the worst case). They were faster on old CPUs. But
with each new generation of CPUs they get relatively slower. The
double precision C version's average advantage is about a factor of 2
on Haswell.
The asm functions were already intentionally avoided in float and long
double precision on i386 and in all precisions on amd64. Float
precision and amd64 give larger advantages to the C version. The long
double precision C code and compilers' understanding of long double
precision are not so good, so the i387 is still slightly faster for
long double precision, except for the unimportant subcase of huge args
where the sub-optimal C code now somehow beats the i387 by about a
factor of 2.
MFC r305385 (by bde):
Oops, the previous i386 version of e_fmodf.S and e_fmodl.S was
actually the amd64 version.
MFC r306409 (by emaste):
libm: fix some unused variable (rcsid) and dangling else warnings
s_{fabs,fmax,logb,scalb}{,f,l}.c may be built elsewhere with a higher
WARNS setting.
Reviewed by: ed
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8061
MFC r306410 (by emaste):
libm: simplify i387 subdir logic with make's :S substitution
MFC r306527 (by emaste):
libm: remove unused variables for LDBL_MANT_DIG != 113
Sponsored by: The FreeBSD Foundation
MFC r306709 (by emaste):
libm: remove unused variables
Sponsored by: The FreeBSD Foundation
MFC r307066 (by br):
Don't use fmaxl/fminl on platforms with no long double support,
use fmax/fmin instead.
This fixes fmaxmin test failure on MIPS64.
Reviewed by: emaste
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D8216
MFC r308172 (by emaste):
libm: add braces around initialization of subobjects
This cleans up a warning when building libm at higher WARNS levels and
makes the intent more clear. By the C standard the values are assigned
to subobject members in order so this change introduces no functional
change. (6.7.9 20)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8333
MFC r313761 (by mmokhi):
Add casinl() cacosl() catanl() casinhl() cacoshl() catanhl() APIs to msun
to improve C11 conformance.
PR: 216850 216851 216852 216856 216857 216858
Submitted by: mmokhi
Reported by: sgk@troutmask.apl.washington.edu
Reviewed by: bde, mat, theraven
Approved by: bde (src committer), mat (mentor)
Differential Revision: https://reviews.freebsd.org/D9491
MFC r313863 (by mmokhi):
Fix building of r313761 on platforms that
`long double` is alias of `double` (MIPS, etc)
PR: 216850 216851 216852 216856 216857 216858
Reported by: emsate
Reviewed by: bde emaste hselasky
Approved by: bde emaste hselasky
Differential Revision: https://reviews.freebsd.org/D9491
MFC r313864 (by mmokhi):
Add documentations related to new APIs of r313761
PR: 216850 216851 216852 216856 216857 216858
Submitted by: sgk@troutmask.apl.washington.edu
Reported by: sgk@troutmask.apl.washington.edu
Reviewed by: bde emaste hselasky
Approved by: bde emaste hselasky
Differential Revision: https://reviews.freebsd.org/D9491
MFC r314950 (by ngie):
Don't expect :test_large_inputs to fail with i386 anymore
Recent changes (maybe a side-effect of the ATF-ification in r314649)
invalidate the failure expectation.
PR: 205446
Sponsored by: Dell EMC Isilon
MFC r317349 (by pfg):
msun: Remove trailing space in Sunsoft copyright statement.
Submittedby: kargl
MFC r319047 (by mmel):
Implement sincos, sincosf, and sincosl.
The primary benefit of these functions is that argument
reduction is done once instead of twice in independent
calls to sin() and cos().
* lib/msun/Makefile:
. Add s_sincos[fl].c to the build.
. Add sincos.3 documentation.
. Add appropriate MLINKS.
* lib/msun/Symbol.map:
. Expose sincos[fl] symbols in dynamic libm.so.
* lib/msun/man/sincos.3:
. Documentation for sincos[fl].
* lib/msun/src/k_sincos.h:
. Kernel for sincos() function. This merges the individual kernels
for sin() and cos(). The merger offered an opportunity to re-arrange
the individual kernels for better performance.
* lib/msun/src/k_sincosf.h:
. Kernel for sincosf() function. This merges the individual kernels
for sinf() and cosf(). The merger offered an opportunity to re-arrange
the individual kernels for better performance.
* lib/msun/src/k_sincosl.h:
. Kernel for sincosl() function. This merges the individual kernels
for sinl() and cosl(). The merger offered an opportunity to re-arrange
the individual kernels for better performance.
* lib/msun/src/math.h:
. Add prototytpes for sincos[fl]().
* lib/msun/src/math_private.h:
. Add RETURNV macros. This is needed to reset fpsetprec on I386
hardware for a function with type void.
* lib/msun/src/s_sincos.c:
. Implementation of sincos() where sin() and cos() were merged into
one routine and possibly re-arranged for better performance.
* lib/msun/src/s_sincosf.c:
. Implementation of sincosf() where sinf() and cosf() were merged into
one routine and possibly re-arranged for better performance.
* lib/msun/src/s_sincosl.c:
. Implementation of sincosl() where sinl() and cosl() were merged into
one routine and possibly re-arranged for better performance.
PR: 215977, 218300
Submitted by: Steven G. Kargl <sgk@troutmask.apl.washington.edu>
Differential Revision: https://reviews.freebsd.org/D10765
MFC r321457 (by ngie):
Mark :reduction as an expected failure
It fails with clang 5.0+.
PR: 220989
Reported by: Jenkins
MFC r322418 (by rlibby):
lib/msun: avoid referring to broken LDBL_MAX
LDBL_MAX is broken on i386:
https://lists.freebsd.org/pipermail/freebsd-numerics/2012-September/000288.html
Gcc has produced +Infinity for LDBL_MAX on i386 and amd64 with -m32
for some time, and newer versions of gcc are now warning that the
"floating constant exceeds range of 'long double'". Avoid this by
referring to half the value of LDBL_MAX instead.
Reviewed by: bde
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
MFC r322435 (by rlibby):
Revert r322418, LDBL_MAX_EXP unsuitable for macro pasting on some arches
Either need a different way to spell HALF_LDBL_MAX, or a different way
to spell LDBL_MAX_EXP, or a different approach.
Reported by: ian
MFC r322921 (by ngie):
Revert r321457
It doesn't fail after ^/head@r322855 (the releng_50 clang merge).
PR: 220989
(cherry picked from commit 68896d08c13fabea18690d4c5369de8a5e24ba82)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C11 standard introduced a set of macros (CMPLX, CMPLXF, CMPLXL) that
can be used to construct complex numbers from a pair of real and
imaginary numbers. Unfortunately, they require some compiler support,
which is why we only define them for Clang and GCC>=4.7.
The cpack() function in libm performs the same task as CMPLX(), but
cannot be used to generate compile-time constants. This means that all
invocations of cpack() can safely be replaced by C11's CMPLX(). To keep
the code building with GCC 4.2, provide copies of CMPLX() that can at
least be used to generate run-time complex numbers.
This makes it easier to build some of the functions outside of libm.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lgamma(x) = -log(x) - log(1+x) + x*(1-g) + x**2*P(x) with g = 0.57...
being the Euler constant and P(x) a polynomial. Substitution of small
into the RHS shows that the last 3 terms are negligible in comparison to
the leading term. The choice of 3 may be conservative.
The value large=2**(p+3) is detemined from Stirling's approximation
lgamma(x) = x*(log(x)-1) - log(x)/2 + log(2*pi)/2 + P(1/x)/x
Again, substitution of large into the RHS reveals the last 3 terms
are negligible in comparison to the leading term.
Move the x=+-0 special case into the |x|<small block.
In the ld80 and ld128 implementaion, use fdlibm compatible comparisons
involving ix, lx, and llx. This replaces several floating point
comparisons (some involving fabsl()) and also fixes the special cases
x=1 and x=2.
While here
. Remove unnecessary parentheses.
. Fix/improve comments due to the above changes.
. Fix nearby whitespace.
* src/e_lgamma_r.c:
. Sort declaration.
. Remove unneeded explicit cast for type conversion.
. Replace a double literal constant by an integer literal constant.
* src/e_lgammaf_r.c:
. Sort declaration.
* ld128/e_lgammal_r.c:
. Replace a long double literal constant by a double literal constant.
* ld80/e_lgammal_r.c:
. Remove unused '#include float.h'
. Replace a long double literal constant by a double literal constant.
Requested by: bde
|
|
|
|
|
|
| |
set signgamp = -1.
Submitted by: enh at google dot com (e_lgamma[f]_r.c)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
. Hook e_lgammal[_r].c to the build.
. Create man page links for lgammal[-r].3.
* Symbol.map:
. Sort lgammal to its rightful place.
. Add FBSD_1.4 section for the new lgamal_r symbol.
* ld128/e_lgammal_r.c:
. 128-bit implementataion of lgammal_r().
* ld80/e_lgammal_r.c:
. Intel 80-bit format implementation of lgammal_r().
* src/e_lgamma.c:
. Expose lgammal as a weak reference to lgamma for platforms
where long double is mapped to double.
* src/e_lgamma_r.c:
. Use integer literal constants instead of real literal constants.
Let compiler(s) do the job of conversion to the appropriate type.
. Expose lgammal_r as a weak reference to lgamma_r for platforms
where long double is mapped to double.
* src/e_lgammaf_r.c:
. Fixed the Cygnus Support conversion of e_lgamma_r.c to float.
This includes the generation of new polynomial and rational
approximations with fewer terms. For each approximation, include
a comment on an estimate of the accuracy over the relevant domain.
. Use integer literal constants instead of real literal constants.
Let compiler(s) do the job of conversion to the appropriate type.
This allows the removal of several explicit casts of double values
to float.
* src/e_lgammal.c:
. Wrapper for lgammal() about lgammal_r().
* src/imprecise.c:
. Remove the lgamma.
* src/math.h:
. Add a prototype for lgammal_r().
* man/lgamma.3:
. Document the new functions.
Reviewed by: bde
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
. Add s_erfl.c to building libm.
. Add MLINKS for erfl.3 and erfcl.3.
* Symbol.map:
. Move erfl and erfcl to their proper location.
* ld128/s_erfl.c:
. Implementations of erfl and erfcl in the IEEE 754 128-bit format.
* ld80/s_erfl.c:
. Implementations of erfl and erfcl in the Intel 80-bit format.
* man/erf.3:
. Document the new functions.
. While here, remove an incomplete sentence.
* src/imprecise.c:
. Remove the stupidity of mapping erfl and erfcl to erf and erfc.
* src/math.h:
. Move the declarations of erfl and erfcl to their proper place.
* src/s_erf.c:
. For architectures where double and long double are the same
floating point format, use weak references to map erfl to
erf and ercl to erfc.
Reviewed by: bde (many earlier versions)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* ld128/k_expl.h:
. Split out a computational kernel,__k_expl(x, &hi, &lo, &k) from expl(x).
x must be finite and not tiny or huge. The kernel returns hi and lo
values for extra precision and an exponent k for a 2**k scale factor.
. Define additional kernels k_hexpl() and hexpl() that include a 1/2
scaling and are used by the hyperbolic functions.
* ld80/s_expl.c:
* ld128/s_expl.c:
. Use the __k_expl() kernel.
Obtained from: bde
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
as a fairly faithful implementation of the algorithm found in
PTP Tang, "Table-driven implementation of the Expm1 function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
211-222 (1992).
Over the last 18-24 months, the code has under gone significant
optimization and testing.
Reviewed by: bde
Obtained from: bde (most of the optimizations)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Use integral numerical constants, and let the compiler do the
conversion to long double.
ld128/s_expl.c:
* Use integral numerical constants, and let the compiler do the
conversion to long double.
* Use the ENTERI/RETURNI macros, which are no-ops on ld128. This
however makes the ld80 and ld128 identical.
Reviewed by: bde (as part of larger diff)
|
|
|
|
|
|
| |
on a literal constant.
Obtained from: bde
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.man.
* Fix an off-by-one for small arguments |x| < 0x1p-65.
ld128/s_expl.c:
* In the special case x = -Inf or -NaN, use a micro-optimization
to eliminate the need to access u.xbits.manh and u.xbits.manl.
* Fix an off-by-one for small arguments |x| < 0x1p-114.
Obtained from: bde
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Update the evaluation of the polynomial. This allows the removal
of the now unused variables t23 and t45.
ld128/s_expl.c:
* Update the evaluation of the polynomial and the intermediate
result t. This update allows several numerical constants to be
written as double rather than long double constants. Update
the constants as appropriate.
Obtained from: bde
|
|
|
|
| |
my previous commit.
|
|
|
|
| |
an interval instead of a midpoint.
|
|
|
|
|
|
|
| |
and use macros to access the e component of the unions. This allows
the portions of the code in ld80 to be identical to the ld128 code.
Obtained from: bde
|
|
|
|
|
|
|
| |
Use the macroi as a micro-optimization to convert a subtraction and
division to a shift.
Obtained from: bde
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The names now coincide with the name used in PTP Tang's paper.
* Rename the variable from s to tbl to better reflect that
this is a table, and to be consistent with the naming scheme
in s_exp2l.c
Reviewed by: bde (as part of larger diff)
|
|
|
|
| |
Reviewed by: bde (as part of larger diff)
|
|
|
|
|
|
|
|
|
|
|
| |
* Update Copyright years to include 2013.
ld128/s_expl.c:
* Correct and update Copyright years. This code originated from
the ld80 version, so it should reflect the same time period.
Reviewed by: bde (as part of larger diff)
|
|
|
|
| |
Submitted by: bde
|
|
|
|
|
| |
Approved by: das (implicit)
Reported by: jh
|
|
|
|
|
|
|
|
| |
* Use ENTERI/RETURNI to allow the use of FP_PE on i386 target.
Reviewed by: das (and bde a long time ago)
Approved by: das (mentor)
Obtained from: bde (polynomial coefficients)
|
|
|
|
|
|
|
|
|
|
| |
are workarounds for various symptoms of the problem described in clang
bugs 3929, 8100, 8241, 10409, and 12958.
The regression tests did their job: they failed, someone brought it
up on the mailing lists, and then the issue got ignored for 6 months.
Oops. There may still be some regressions for functions we don't have
test coverage for yet.
|
|
|
|
|
|
|
|
|
|
| |
table and the requirement on trailing zero bits.
* Remove the __aligned() compiler directives as these were found
to have a negative effect on the produced code.
Submitted by: bde
Approved by: das (mentor)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
. Change the API for the LD80C by removing the explicit passing
of the sign bit. The sign can be determined from the last
parameter of the macro.
. On i386, load long double by bit manipulations to work around
at least a gcc compiler issue. On non-i386 ld80 architectures,
use a simple assignment.
* ld80/s_expl.c:
. Update the only consumer of LD80C.
Submitted by: bde
Approved by: das (mentor)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
. Fix the threshold for expl(x) where |x| is small.
. Also update the previously incorrect comment to match the
new threshold.
* ld128/s_expl.c:
. Re-order logic in exceptional cases to match the logic used in
other long double functions.
. Fix the threshold for expl(x) where is |x| is small.
. Also update the previously incorrect comment to match the
new threshold.
Submitted by: bde
Approved by: das (mentor)
|
|
|
|
| |
Approved by: das (mentor, implicit)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
. Guard a comment from reformatting by indent(1).
. Re-order variables in declarations to alphabetical order.
. Remove a banal comment.
* ld128/s_expl.c:
. Add a comment to point to ld80/s_expl.c for implementation details.
. Move the #define of INTERVAL to reduce the diff with ld80/s_expl.c.
. twom10000 does not need to be volatile, so move its declaration.
. Re-order variables in declarations to alphabetical order.
. Add a comment that describes the argument reduction.
. Remove the same banal comment found in ld80/s_expl.c.
Reviewed by: bde
Approved by: das (mentor)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also, update the comment to describe the choice of using
a high and low decomposition of 2^(i/INTERNVAL) for
0 <= i <= INTERVAL in preparation for an implementation of
expm1l.
* Move the #define of INTERVAL above the comment, because the
comment refers to INTERVAL.
Reviewed by: bde
Approved by: das (mentor)
|
|
|
|
|
| |
Submitted by: bde
Approved by: das (pre-approved)
|
|
|
|
|
|
|
|
|
|
|
| |
compatibility with the INTERVALS macro used in the soon-to-be-commmitted
expm1l() and someday-to-be-committed log*l() functions.
Add a comment into ld128/s_expl.c noting at gcc issue that was
deleted when rewriting ld80/e_expl.c as ld128/s_expl.c.
Requested by: bde
Approved by: das (mentor)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
. Remove a few #ifdefs that should have been removed in the initial
commit.
. Sort fpmath.h to its rightful place.
* ld128/s_expl.c:
. Replace EXPMASK with its actual value.
. Sort fpmath.h to its rightful place.
Requested by: bde
Approved by: das (mentor)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
format. These implementations are based on
PTP Tang, "Table-driven implementation of the exponential function
in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 15,
144-157 (1989).
PR: standards/152415
Submitted by: kargl
Reviewed by: bde, das
Approved by: das (mentor)
|
|
|
|
| |
Approved by: philip (mentor)
|
|
|
|
|
| |
Reviewed by: das
Approved by: das (mentor)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on i386-class hardware for sinl and cosl. The hand-rolled argument
reduction have been replaced by e_rem_pio2l() implementations. To
preserve history the following commands have been executed:
svn cp src/e_rem_pio2.c ld80/e_rem_pio2l.h
mv ${HOME}/bde/ld80/e_rem_pio2l.c ld80/e_rem_pio2l.h
svn cp src/e_rem_pio2.c ld128/e_rem_pio2l.h
mv ${HOME}/bde/ld128/e_rem_pio2l.c ld128/e_rem_pio2l.h
The ld80 version has been tested by bde, das, and kargl over the
last few years (bde, das) and few months (kargl). An older ld128
version was tested by das. The committed version has only been
compiled tested via 'make universe'.
Approved by: das (mentor)
Obtained from: bde
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
at compile time regardless of the dynamic precision, and there's
no way to disable this misfeature at compile time. Hence, it's
impossible to generate the appropriate tables of constants for the
long double inverse trig functions in a straightforward way on i386;
this change hacks around the problem by encoding the underlying bits
in the table.
Note that these functions won't pass the regression test on i386,
even with the FPU set to extended precision, because the regression
test is similarly damaged by gcc. However, the tests all pass when
compiled with a modified version of gcc.
Reported by: bde
|
|
|
|
|
|
|
| |
and cargl().
Reviewed by: bde
sparc64 testing resources from: remko
|
|
|
|
|
| |
on !(amd64 || i386). It gave slightly worse than double precision in some
cases. tanl() now passes tests of 2^24 values on ia64.
|
| |
|
|
|
|
|
|
|
| |
Bruce for putting lots of effort into these; getting them right isn't
easy, and they went through many iterations.
Submitted by: Steve Kargl <sgk@apl.washington.edu> with revisions from bde
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has the side effect of confusing gcc-4.2.1's optimizer into more
often doing the right thing. When it does the wrong thing here, it
seems to be mainly making too many copies of x with dependency chains.
This effect is tiny on amd64, but in some cases on i386 it is enormous.
E.g., on i386 (A64) with -O1, the current version of exp2() should
take about 50 cycles, but took 83 cycles before this change and 66
cycles after this change. exp2f() with -O1 only speeded up from 51
to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in
either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with
-O1 slowed down from 155 cycles to 123 for some args; this is unimportant
since the i386 exp2l() is a fake; the wrong thing for it seems to
involve branch misprediction.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
exponent bits of the reduced result, construct 2**k (hopefully in
parallel with the construction of the reduced result) and multiply by
it. This tends to be much faster if the construction of 2**k is
actually in parallel, and might be faster even with no parallelism
since adjustment of the exponent requires a read-modify-wrtite at an
unfortunate time for pipelines.
In some cases involving exp2* on amd64 (A64), this change saves about
40 cycles or 30%. I think it is inherently only about 12 cycles faster
in these cases and the rest of the speedup is from partly-accidentally
avoiding compiler pessimizations (the construction of 2**k is now
manually scheduled for good results, and -O2 doesn't always mess this
up). In most cases on amd64 (A64) and i386 (A64) the speedup is about
20 cycles. The worst case that I found is expf on ia64 where this
change is a pessimization of about 10 cycles or 5%. The manual
scheduling for plain exp[f] is harder and not as tuned.
This change ld128/s_exp2l.c has not been tested.
|
|
|
|
|
|
|
|
| |
long doubles (i386, amd64, ia64) and one for machines with 128-bit
long doubles (sparc64). Other platforms use the double version.
I've only done runtime testing on i386.
Thanks to bde@ for helpful discussions and bugfixes.
|
|
|
|
|
|
|
|
|
|
|
|
| |
my original implementation made both use the same code. Unfortunately,
this meant libm depended on a vendor header at compile time and previously-
unexposed vendor bits in libc at runtime.
Hence, I just wrote my own version of the relevant vendor routine. As it
turns out, mine has a factor of 8 fewer of lines of code, and is a bit more
readable anyway. The strtod() and *scanf() routines still use vendor code.
Reviewed by: bde
|
|
adds two new directories in msun: ld80 and ld128. These are for
long double functions specific to the 80-bit long double format
used on x86-derived architectures, and the 128-bit format used on
sparc64, respectively.
|