Commit message (Author, Age, Files, Lines)
...
* Shorten floatprops<T> to FP (Erik Schnetter, 2013-02-14, 1, -4/+4)
|
* Rename "pseudovectors" to "libm"-vectors (Erik Schnetter, 2013-02-14, 1, -1/+1)
|
* Move definition of all and any into class (Erik Schnetter, 2013-02-14, 1, -20/+12)
|
* Fold Chebyshev versions of sin and cos back into the main versions (Erik Schnetter, 2013-02-14, 6, -207/+46)
|
* Use SSE 3 and SSE 4.1 only when available (Erik Schnetter, 2013-02-14, 2, -8/+72)
|
* Introduce VML_NODEBUG (Erik Schnetter, 2013-02-14, 2, -1/+9)
|
* Remove unused functions (Erik Schnetter, 2013-02-14, 1, -13/+0)
|
* Improve pow() (Erik Schnetter, 2013-02-14, 1, -1/+2)
|
* Correct ceil() and floor() (Erik Schnetter, 2013-02-14, 1, -4/+6)
|
* Provide scalbn with scalar int argument (Erik Schnetter, 2013-02-14, 6, -0/+26)
|
* Test conversions and rounding also for 0 (Erik Schnetter, 2013-02-14, 1, -6/+13)
|
* Correct vector types used in math functions (Erik Schnetter, 2013-02-14, 1, -2/+2)
|
* Merged in upcaste/vecmathlib (pull request #2) (Erik Schnetter, 2013-02-11, 8, -17/+165)
|\
| |     Re-generate pull request
| |
| * Merged eschnett/vecmathlib into master (Erik Schnetter, 2013-02-11, 1, -0/+64)
| |\
| |/
|/|
* | Add beginning of documentation (Erik Schnetter, 2013-02-11, 1, -0/+64)
| |
| * Added optimized versions of sin and cos. (Jesse W. Towner, 2013-02-11, 8, -17/+165)
|/
|     The new functions have been added to the mathfuncs template class,
|     and are named vml_sin_chebyshev_single, vml_sin_chebyshev_double,
|     vml_cos_chebyshev_single, and vml_cos_chebyshev_double. The
|     corresponding sin and cos member functions in the vector template
|     structs have been updated to call into the new implementations.
|
|     These functions use float-optimized minimaxed Chebyshev polynomial
|     approximations. They have good relative error distributions for
|     IEEE-754 floating point numbers, as the highest contributing
|     coefficient is selected to map precisely to either a 32-bit or a
|     64-bit IEEE number for the _single and _double function variants
|     respectively. The _single variants produce approximately 30 bits of
|     precision in the mantissa, and the _double variants produce around
|     60 bits, which is more than enough to produce accurate values.
|
|     The vml_tan function hasn't been updated, so it calls both sin and
|     cos as it used to, and thus relies on the compiler to factor out
|     common code. It's possible to implement a sincos function using
|     these polynomials that interleaves the fmas. Since the fma
|     instructions in the sin and cos paths don't have any dependencies on
|     one another, one of the paths is computed essentially for free on
|     x86-64 platforms due to instruction parallelism. Alternatively, tan
|     can be implemented in terms of a specifically optimized Chebyshev
|     rational function with good performance and properties.
|
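The interleaved sincos idea from the commit message above can be sketched roughly as follows. This is an illustration only, not code from vecmathlib: the name `sincos_sketch` is hypothetical, plain Taylor coefficients stand in for the minimaxed Chebyshev coefficients the commit describes, and no range reduction is done, so the sketch is only meant for |x| <= pi/4.

```cpp
#include <cmath>
#include <utility>

// Hypothetical helper (not part of vecmathlib): returns {sin(x), cos(x)}
// for |x| <= pi/4. Taylor coefficients are used here as a stand-in for
// the minimaxed Chebyshev coefficients mentioned in the commit message.
inline std::pair<double, double> sincos_sketch(double x) {
  const double x2 = x * x;

  // Two independent Horner chains in x2. Neither chain reads the other's
  // result, so the fma operations of both can overlap in the pipeline.
  double s = -1.0 / 39916800.0;  // -1/11!
  double c = 1.0 / 479001600.0;  //  1/12!

  s = std::fma(s, x2, 1.0 / 362880.0);   // +1/9!
  c = std::fma(c, x2, -1.0 / 3628800.0); // -1/10!
  s = std::fma(s, x2, -1.0 / 5040.0);    // -1/7!
  c = std::fma(c, x2, 1.0 / 40320.0);    // +1/8!
  s = std::fma(s, x2, 1.0 / 120.0);      // +1/5!
  c = std::fma(c, x2, -1.0 / 720.0);     // -1/6!
  s = std::fma(s, x2, -1.0 / 6.0);       // -1/3!
  c = std::fma(c, x2, 1.0 / 24.0);       // +1/4!
  s = std::fma(s, x2, 1.0);              // sin(x)/x polynomial complete
  c = std::fma(c, x2, -0.5);             // -1/2!
  c = std::fma(c, x2, 1.0);              // cos(x) polynomial complete

  return {s * x, c};
}
```

A caller would obtain both values from one call, for example `auto sc = sincos_sketch(0.5);`, and could form a tangent as `sc.first / sc.second`. A production version would additionally need range reduction and properly fitted coefficients.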
* Remove unused file (Erik Schnetter, 2013-02-06, 1, -0/+0)
|
* Find out good clang options for using a modern STL (Erik Schnetter, 2013-02-06, 3, -107/+44)
|
|     Remove clang-specific work-arounds. Switch to building with clang by
|     default.
|
* Avoid more gcc extensions (Erik Schnetter, 2013-02-05, 2, -16/+24)
|
* Avoid gcc extension (Erik Schnetter, 2013-02-05, 1, -4/+4)
|
* Output timings for looping example (Erik Schnetter, 2013-02-05, 1, -9/+116)
|
* Remove superfluous instructions (Erik Schnetter, 2013-02-05, 1, -4/+0)
|
* Correct termination condition in mask class (Erik Schnetter, 2013-02-05, 1, -5/+8)
|
* Add lots of work-arounds for missing clang++ functionality (Erik Schnetter, 2013-02-05, 3, -45/+111)
|
* Add small comment (Erik Schnetter, 2013-02-05, 1, -0/+1)
|
* Activate #defines handling IEEE compliance (Erik Schnetter, 2013-02-05, 1, -4/+4)
|
* Provide missing shift operators in pseudovec classes (Erik Schnetter, 2013-02-05, 1, -1/+27)
|
* Improve scalbn() (Erik Schnetter, 2013-02-05, 1, -1/+1)
|
* Update clang build instructions (Erik Schnetter, 2013-02-05, 1, -1/+6)
|
* Handle rint() overflow correctly (Erik Schnetter, 2013-02-05, 1, -2/+18)
|
* Use correct version of rint() (Erik Schnetter, 2013-02-05, 1, -6/+6)
|
* Omit unused header file (Erik Schnetter, 2013-02-05, 1, -1/+0)
|
* Correct clang instructions (doesn't work yet). Add Intel instructions (works). (Erik Schnetter, 2013-02-05, 1, -2/+6)
|
* Don't use constexpr; Intel compiler doesn't handle it well (Erik Schnetter, 2013-02-05, 1, -44/+47)
|
* Comment out unused variable (Erik Schnetter, 2013-02-05, 1, -1/+1)
|
* Provide intvec_t::iota() (Erik Schnetter, 2013-02-04, 6, -0/+11)
|
* Add example that uses a manually vectorised loop (Erik Schnetter, 2013-02-04, 1, -0/+107)
|
* Provide memory access functions (Erik Schnetter, 2013-02-04, 11, -0/+858)
|
* Add flag VML_DEBUG (Erik Schnetter, 2013-02-04, 1, -4/+16)
|
|     Also define VML_ASSERT for error-checking
|
* White space change (Erik Schnetter, 2013-02-04, 1, -0/+2)
|
* Update build instructions (Erik Schnetter, 2013-02-04, 3, -8/+14)
|
* Improve indentation (Erik Schnetter, 2013-02-03, 1, -27/+38)
|
* Use pseudovec template (Erik Schnetter, 2013-02-03, 2, -63/+26)
|
* Add new template {bool|int|real}pseudovec that scalarizes all operations (Erik Schnetter, 2013-02-03, 2, -0/+1104)
|
* Whitespace change (Erik Schnetter, 2013-02-03, 1, -0/+2)
|
* Use memcpy instead of a union to re-interpret values (Erik Schnetter, 2013-02-03, 1, -8/+21)
|
* Correct indentation (Erik Schnetter, 2013-02-03, 1, -8/+8)
|
* Describe how to build with make and with ninja, respectively (Erik Schnetter, 2013-02-03, 1, -2/+3)
|
* Build with make instead of ninja by default (Erik Schnetter, 2013-02-03, 1, -1/+6)
|
* Check for non-finite number instead of nan after running benchmarks (Erik Schnetter, 2013-02-01, 1, -2/+2)
|