summaryrefslogtreecommitdiffstats
path: root/lib/msun/ld80
Commit message (Collapse)AuthorAgeFilesLines
* Rename cpack*() to CMPLX*().ed2014-12-161-1/+1
| | | | | | | | | | | | | | | The C11 standard introduced a set of macros (CMPLX, CMPLXF, CMPLXL) that can be used to construct complex numbers from a pair of real and imaginary numbers. Unfortunately, they require some compiler support, which is why we only define them for Clang and GCC>=4.7. The cpack() function in libm performs the same task as CMPLX(), but cannot be used to generate compile-time constants. This means that all invocations of cpack() can safely be replaced by C11's CMPLX(). To keep the code building with GCC 4.2, provide copies of CMPLX() that can at least be used to generate run-time complex numbers. This makes it easier to build some of the functions outside of libm.
* The value small=2**-(p+3), where p is the precision, can be determine fromkargl2014-10-091-38/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lgamma(x) = -log(x) - log(1+x) + x*(1-g) + x**2*P(x) with g = 0.57... being the Euler constant and P(x) a polynomial. Substitution of small into the RHS shows that the last 3 terms are negligible in comparison to the leading term. The choice of 3 may be conservative. The value large=2**(p+3) is detemined from Stirling's approximation lgamma(x) = x*(log(x)-1) - log(x)/2 + log(2*pi)/2 + P(1/x)/x Again, substitution of large into the RHS reveals the last 3 terms are negligible in comparison to the leading term. Move the x=+-0 special case into the |x|<small block. In the ld80 and ld128 implementaion, use fdlibm compatible comparisons involving ix, lx, and llx. This replaces several floating point comparisons (some involving fabsl()) and also fixes the special cases x=1 and x=2. While here . Remove unnecessary parentheses. . Fix/improve comments due to the above changes. . Fix nearby whitespace. * src/e_lgamma_r.c: . Sort declaration. . Remove unneeded explicit cast for type conversion. . Replace a double literal constant by an integer literal constant. * src/e_lgammaf_r.c: . Sort declaration. * ld128/e_lgammal_r.c: . Replace a long double literal constant by a double literal constant. * ld80/e_lgammal_r.c: . Remove unused '#include float.h' . Replace a long double literal constant by a double literal constant. Requested by: bde
* For targets that have a signed zero, lgamma_r(-0, &signgamp) shouldkargl2014-09-171-1/+5
| | | | | | set signgamp = -1. Submitted by: enh at google dot com (e_lgamma[f]_r.c)
* * Makefile:kargl2014-09-151-0/+345
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | . Hook e_lgammal[_r].c to the build. . Create man page links for lgammal[-r].3. * Symbol.map: . Sort lgammal to its rightful place. . Add FBSD_1.4 section for the new lgamal_r symbol. * ld128/e_lgammal_r.c: . 128-bit implementataion of lgammal_r(). * ld80/e_lgammal_r.c: . Intel 80-bit format implementation of lgammal_r(). * src/e_lgamma.c: . Expose lgammal as a weak reference to lgamma for platforms where long double is mapped to double. * src/e_lgamma_r.c: . Use integer literal constants instead of real literal constants. Let compiler(s) do the job of conversion to the appropriate type. . Expose lgammal_r as a weak reference to lgamma_r for platforms where long double is mapped to double. * src/e_lgammaf_r.c: . Fixed the Cygnus Support conversion of e_lgamma_r.c to float. This includes the generation of new polynomial and rational approximations with fewer terms. For each approximation, include a comment on an estimate of the accuracy over the relevant domain. . Use integer literal constants instead of real literal constants. Let compiler(s) do the job of conversion to the appropriate type. This allows the removal of several explicit casts of double values to float. * src/e_lgammal.c: . Wrapper for lgammal() about lgammal_r(). * src/imprecise.c: . Remove the lgamma. * src/math.h: . Add a prototype for lgammal_r(). * man/lgamma.3: . Document the new functions. Reviewed by: bde
* * Makefile:kargl2014-07-131-0/+337
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | . Add s_erfl.c to building libm. . Add MLINKS for erfl.3 and erfcl.3. * Symbol.map: . Move erfl and erfcl to their proper location. * ld128/s_erfl.c: . Implementations of erfl and erfcl in the IEEE 754 128-bit format. * ld80/s_erfl.c: . Implementations of erfl and erfcl in the Intel 80-bit format. * man/erf.3: . Document the new functions. . While here, remove an incomplete sentence. * src/imprecise.c: . Remove the stupidity of mapping erfl and erfcl to erf and erfc. * src/math.h: . Move the declarations of erfl and erfcl to their proper place. * src/s_erf.c: . For architectures where double and long double are the same floating point format, use weak references to map erfl to erf and ercl to erfc. Reviewed by: bde (many earlier versions)
* * ld80/k_expl.h:kargl2013-12-302-230/+350
| | | | | | | | | | | | | | | * ld128/k_expl.h: . Split out a computational kernel,__k_expl(x, &hi, &lo, &k) from expl(x). x must be finite and not tiny or huge. The kernel returns hi and lo values for extra precision and an exponent k for a 2**k scale factor. . Define additional kernels k_hexpl() and hexpl() that include a 1/2 scaling and are used by the hyperbolic functions. * ld80/s_expl.c: * ld128/s_expl.c: . Use the __k_expl() kernel. Obtained from: bde
* ld80 and ld128 implementations of expm1l(). This code started lifekargl2013-06-031-0/+165
| | | | | | | | | | | | | | as a fairly faithful implementation of the algorithm found in PTP Tang, "Table-driven implementation of the Expm1 function in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18, 211-222 (1992). Over the last 18-24 months, the code has under gone significant optimization and testing. Reviewed by: bde Obtained from: bde (most of the optimizations)
* ld80/s_expl.c:kargl2013-06-031-2/+2
| | | | | | | | | | | | | | * Use integral numerical constants, and let the compiler do the conversion to long double. ld128/s_expl.c: * Use integral numerical constants, and let the compiler do the conversion to long double. * Use the ENTERI/RETURNI macros, which are no-ops on ld128. This however makes the ld80 and ld128 identical. Reviewed by: bde (as part of larger diff)
* Micro-optimization: move the unary mius operator to operatekargl2013-06-031-2/+2
| | | | | | on a literal constant. Obtained from: bde
* ld80/s_expl.c:kargl2013-06-031-7/+5
| | | | | | | | | | | | | | | | * In the special case x = -Inf or -NaN, use a micro-optimization to eliminate the need to access u.xbits.man. * Fix an off-by-one for small arguments |x| < 0x1p-65. ld128/s_expl.c: * In the special case x = -Inf or -NaN, use a micro-optimization to eliminate the need to access u.xbits.manh and u.xbits.manl. * Fix an off-by-one for small arguments |x| < 0x1p-114. Obtained from: bde
* ld80/s_expl.c:kargl2013-06-031-6/+4
| | | | | | | | | | | | | | * Update the evaluation of the polynomial. This allows the removal of the now unused variables t23 and t45. ld128/s_expl.c: * Update the evaluation of the polynomial and the intermediate result t. This update allows several numerical constants to be written as double rather than long double constants. Update the constants as appropriate. Obtained from: bde
* Rename a few P2, P3, ... coefficients to A2, A3, ... missed inkargl2013-06-031-3/+3
| | | | my previous commit.
* Update a comment to reflect that we are using an endpoint ofkargl2013-06-031-1/+1
| | | | an interval instead of a midpoint.
* Add a u suffix to the IEEEl2bits unions o_threshold and u_threshold,kargl2013-06-031-4/+7
| | | | | | | and use macros to access the e component of the unions. This allows the portions of the code in ld80 to be identical to the ld128 code. Obtained from: bde
* Introduce the macro LOG2_INTERVAL, which is log2(number of intervals).kargl2013-06-031-1/+3
| | | | | | | Use the macroi as a micro-optimization to convert a subtraction and division to a shift. Obtained from: bde
* Whitespace.kargl2013-06-031-3/+3
|
* * Rename the polynomial coefficients from P2, P3, ... to A2, A3, ....kargl2013-06-031-10/+9
| | | | | | | | | | The names now coincide with the name used in PTP Tang's paper. * Rename the variable from s to tbl to better reflect that this is a table, and to be consistent with the naming scheme in s_exp2l.c Reviewed by: bde (as part of larger diff)
* * Style(9). Start non-Copyright fancy formatted comments with /**.kargl2013-06-031-1/+1
| | | | Reviewed by: bde (as part of larger diff)
* ld80/s_expl.c:kargl2013-06-031-1/+1
| | | | | | | | | | | * Update Copyright years to include 2013. ld128/s_expl.c: * Correct and update Copyright years. This code originated from the ld80 version, so it should reflect the same time period. Reviewed by: bde (as part of larger diff)
* Add logl, log2l, log10l, and log1pl.das2013-06-031-0/+717
| | | | Submitted by: bde
* Style(9)kargl2013-05-271-1/+1
| | | | | Approved by: das (implicit) Reported by: jh
* * Update polynomial coefficients.kargl2013-05-271-33/+30
| | | | | | | | * Use ENTERI/RETURNI to allow the use of FP_PE on i386 target. Reviewed by: das (and bde a long time ago) Approved by: das (mentor) Obtained from: bde (polynomial coefficients)
* Fix some regressions caused by the switch from gcc to clang. The fixesdas2013-05-271-6/+3
| | | | | | | | | | are workarounds for various symptoms of the problem described in clang bugs 3929, 8100, 8241, 10409, and 12958. The regression tests did their job: they failed, someone brought it up on the mailing lists, and then the issue got ignored for 6 months. Oops. There may still be some regressions for functions we don't have test coverage for yet.
* * Update the comment that explains the choice of values in thekargl2012-10-131-6/+7
| | | | | | | | | | table and the requirement on trailing zero bits. * Remove the __aligned() compiler directives as these were found to have a negative effect on the produced code. Submitted by: bde Approved by: das (mentor)
* * src/math_private.h:kargl2012-09-291-2/+2
| | | | | | | | | | | | | | | . Change the API for the LD80C by removing the explicit passing of the sign bit. The sign can be determined from the last parameter of the macro. . On i386, load long double by bit manipulations to work around at least a gcc compiler issue. On non-i386 ld80 architectures, use a simple assignment. * ld80/s_expl.c: . Update the only consumer of LD80C. Submitted by: bde Approved by: das (mentor)
* * ld80/s_expl.c:kargl2012-09-231-1/+1
| | | | | | | | | | | | | | | | . Fix the threshold for expl(x) where |x| is small. . Also update the previously incorrect comment to match the new threshold. * ld128/s_expl.c: . Re-order logic in exceptional cases to match the logic used in other long double functions. . Fix the threshold for expl(x) where is |x| is small. . Also update the previously incorrect comment to match the new threshold. Submitted by: bde Approved by: das (mentor)
* Fix whitespace issue.kargl2012-09-231-1/+1
| | | | Approved by: das (mentor, implicit)
* * ld80/s_expl.c:kargl2012-09-231-4/+4
| | | | | | | | | | | | | | | | | . Guard a comment from reformatting by indent(1). . Re-order variables in declarations to alphabetical order. . Remove a banal comment. * ld128/s_expl.c: . Add a comment to point to ld80/s_expl.c for implementation details. . Move the #define of INTERVAL to reduce the diff with ld80/s_expl.c. . twom10000 does not need to be volatile, so move its declaration. . Re-order variables in declarations to alphabetical order. . Add a comment that describes the argument reduction. . Remove the same banal comment found in ld80/s_expl.c. Reviewed by: bde Approved by: das (mentor)
* * Update the lookup table to use 53-bit high and low values.kargl2012-09-231-131/+134
| | | | | | | | | | | | | Also, update the comment to describe the choice of using a high and low decomposition of 2^(i/INTERNVAL) for 0 <= i <= INTERVAL in preparation for an implementation of expm1l. * Move the #define of INTERVAL above the comment, because the comment refers to INTERVAL. Reviewed by: bde Approved by: das (mentor)
* Whitespace.kargl2012-07-301-1/+1
| | | | | Submitted by: bde Approved by: das (pre-approved)
* Replace the macro name NUM with INTERVALS. This change provideskargl2012-07-261-11/+11
| | | | | | | | | | | compatibility with the INTERVALS macro used in the soon-to-be-commmitted expm1l() and someday-to-be-committed log*l() functions. Add a comment into ld128/s_expl.c noting at gcc issue that was deleted when rewriting ld80/e_expl.c as ld128/s_expl.c. Requested by: bde Approved by: das (mentor)
* * ld80/expl.c:kargl2012-07-261-5/+1
| | | | | | | | | | | | | . Remove a few #ifdefs that should have been removed in the initial commit. . Sort fpmath.h to its rightful place. * ld128/s_expl.c: . Replace EXPMASK with its actual value. . Sort fpmath.h to its rightful place. Requested by: bde Approved by: das (mentor)
* Compute the exponential of x for Intel 80-bit format and IEEE 128-bitkargl2012-07-231-0/+304
| | | | | | | | | | | | | format. These implementations are based on PTP Tang, "Table-driven implementation of the exponential function in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 15, 144-157 (1989). PR: standards/152415 Submitted by: kargl Reviewed by: bde, das Approved by: das (mentor)
* Fix clang warnings.benl2011-06-181-1/+1
| | | | Approved by: philip (mentor)
* Clean up the unneeded cpp macro INLINE_REM_PIO2L.kargl2011-05-301-4/+1
| | | | | Reviewed by: das Approved by: das (mentor)
* Improve the accuracy from a max ULP of ~2000 to max ULP < 0.79kargl2011-04-291-0/+152
| | | | | | | | | | | | | | | | | | | | on i386-class hardware for sinl and cosl. The hand-rolled argument reduction have been replaced by e_rem_pio2l() implementations. To preserve history the following commands have been executed: svn cp src/e_rem_pio2.c ld80/e_rem_pio2l.h mv ${HOME}/bde/ld80/e_rem_pio2l.c ld80/e_rem_pio2l.h svn cp src/e_rem_pio2.c ld128/e_rem_pio2l.h mv ${HOME}/bde/ld128/e_rem_pio2l.c ld128/e_rem_pio2l.h The ld80 version has been tested by bde, das, and kargl over the last few years (bde, das) and few months (kargl). An older ld128 version was tested by das. The committed version has only been compiled tested via 'make universe'. Approved by: das (mentor) Obtained from: bde
* On i386, gcc truncates long double constants to double precisiondas2008-08-021-4/+17
| | | | | | | | | | | | | | | | at compile time regardless of the dynamic precision, and there's no way to disable this misfeature at compile time. Hence, it's impossible to generate the appropriate tables of constants for the long double inverse trig functions in a straightforward way on i386; this change hacks around the problem by encoding the underlying bits in the table. Note that these functions won't pass the regression test on i386, even with the FPU set to extended precision, because the regression test is similarly damaged by gcc. However, the tests all pass when compiled with a modified version of gcc. Reported by: bde
* Add implementations of acosl(), asinl(), atanl(), atan2l(),das2008-07-312-0/+183
| | | | | | | and cargl(). Reviewed by: bde sparc64 testing resources from: remko
* 2 long double constants were missing L suffixes. This helped break tanl()bde2008-02-181-2/+2
| | | | | on !(amd64 || i386). It gave slightly worse than double precision in some cases. tanl() now passes tests of 2^24 values on ia64.
* Fix a typo which broke k_tanl.c on !(amd64 || i386).bde2008-02-181-1/+1
|
* Add kernel functions for 80-bit long doubles. Many thanks to Steve anddas2008-02-173-0/+264
| | | | | | | Bruce for putting lots of effort into these; getting them right isn't easy, and they went through many iterations. Submitted by: Steve Kargl <sgk@apl.washington.edu> with revisions from bde
* Fix exp2*(x) on signaling NaNs by returning x+x as usual.bde2008-02-131-1/+1
| | | | | | | | | | | | | | | This has the side effect of confusing gcc-4.2.1's optimizer into more often doing the right thing. When it does the wrong thing here, it seems to be mainly making too many copies of x with dependency chains. This effect is tiny on amd64, but in some cases on i386 it is enormous. E.g., on i386 (A64) with -O1, the current version of exp2() should take about 50 cycles, but took 83 cycles before this change and 66 cycles after this change. exp2f() with -O1 only speeded up from 51 to 47 cycles. (exp2f() should take about 40 cycles, on an Athlon in either i386 or amd64 mode, and now takes 42 on amd64). exp2l() with -O1 slowed down from 155 cycles to 123 for some args; this is unimportant since the i386 exp2l() is a fake; the wrong thing for it seems to involve branch misprediction.
* Use a better method of scaling by 2**k. Instead of adding to thebde2008-02-071-11/+14
| | | | | | | | | | | | | | | | | | | | | exponent bits of the reduced result, construct 2**k (hopefully in parallel with the construction of the reduced result) and multiply by it. This tends to be much faster if the construction of 2**k is actually in parallel, and might be faster even with no parallelism since adjustment of the exponent requires a read-modify-wrtite at an unfortunate time for pipelines. In some cases involving exp2* on amd64 (A64), this change saves about 40 cycles or 30%. I think it is inherently only about 12 cycles faster in these cases and the rest of the speedup is from partly-accidentally avoiding compiler pessimizations (the construction of 2**k is now manually scheduled for good results, and -O2 doesn't always mess this up). In most cases on amd64 (A64) and i386 (A64) the speedup is about 20 cycles. The worst case that I found is expf on ia64 where this change is a pessimization of about 10 cycles or 5%. The manual scheduling for plain exp[f] is harder and not as tuned. This change ld128/s_exp2l.c has not been tested.
* Implement exp2l(). There is one version for machines with 80-bitdas2008-01-181-0/+291
| | | | | | | | long doubles (i386, amd64, ia64) and one for machines with 128-bit long doubles (sparc64). Other platforms use the double version. I've only done runtime testing on i386. Thanks to bde@ for helpful discussions and bugfixes.
* Since nan() is supposed to work the same as strtod("nan(...)", NULL),das2007-12-181-10/+9
| | | | | | | | | | | | my original implementation made both use the same code. Unfortunately, this meant libm depended on a vendor header at compile time and previously- unexposed vendor bits in libc at runtime. Hence, I just wrote my own version of the relevant vendor routine. As it turns out, mine has a factor of 8 fewer of lines of code, and is a bit more readable anyway. The strtod() and *scanf() routines still use vendor code. Reviewed by: bde
* Implement and document nan(), nanf(), and nanl(). This commitdas2007-12-161-0/+47
adds two new directories in msun: ld80 and ld128. These are for long double functions specific to the 80-bit long double format used on x86-derived architectures, and the 128-bit format used on sparc64, respectively.
OpenPOWER on IntegriCloud