author | Ganesh Ajjanagadde <gajjanagadde@gmail.com> | 2015-10-10 21:58:47 -0400 |
---|---|---|
committer | Michael Niedermayer <michael@niedermayer.cc> | 2015-10-11 04:08:41 +0200 |
commit | 971d12b7f9d7be3ca8eb98e6c04ed521f83cbd3c (patch) | |
tree | 68b3c2a368a21f02fc06dde5ab5f75d3d7b44296 /COPYING.LGPLv2.1 | |
parent | 1e7e4f13f95227d79bc8ab9a2167f02f7a3e063f (diff) | |
avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
This uses Stein's binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
to get a roughly 4x speedup over Euclidean GCD on standard architectures
with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
At the moment, the compiler intrinsic is used on GCC and Clang due to
its easy availability.
Quick note regarding overflow: yes, subtractions on int64_t can overflow,
but the llabs takes care of that. The llabs itself is also guaranteed to be
safe, with no annoying INT64_MIN business: INT64_MIN, being a power of 2 in
magnitude, is shifted down before being passed to llabs.
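For illustration, the algorithm and the overflow argument above can be
sketched roughly as follows. This is a minimal sketch, not the code from the
patch: the names sketch_gcd and sketch_ctzll are hypothetical, the loop is
done in unsigned arithmetic as a simplification, and right-shifting a negative
value is assumed to be an arithmetic shift (as it is on GCC and Clang).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: count trailing zeros of a nonzero 64-bit value. */
static int sketch_ctzll(uint64_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(v);          /* compiler intrinsic */
#else
    int n = 0;                          /* slow portable fallback */
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
#endif
}

/* Hypothetical sketch of Stein's binary GCD on int64_t inputs. */
static int64_t sketch_gcd(int64_t a, int64_t b)
{
    int za, zb, k;
    uint64_t u, v;

    if (a == 0)
        return b;
    if (b == 0)
        return a;

    za = sketch_ctzll((uint64_t)a);
    zb = sketch_ctzll((uint64_t)b);
    k  = za < zb ? za : zb;             /* shared power of two */

    /* After the shift both values are odd, so neither can be INT64_MIN
     * and llabs cannot overflow. Assumes an arithmetic right shift. */
    u = (uint64_t)llabs(a >> za);
    v = (uint64_t)llabs(b >> zb);

    while (u != v) {
        if (u > v) {                    /* keep v as the larger value */
            uint64_t t = u;
            u = v;
            v = t;
        }
        v -= u;                         /* odd - odd is even and nonzero */
        v >>= sketch_ctzll(v);          /* strip factors of two again */
    }
    return (int64_t)(u << k);
}
```

For example, sketch_gcd(48, 18) strips four and one trailing zero bits
respectively, reduces 3 and 9 to 3, and restores the shared factor of 2 to
return 6.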
The binary GCD needs ff_ctzll, an extension of ff_ctz to long long (int64_t). On
GCC, this is provided by a built-in. On Microsoft compilers, there is a
BitScanForward64 analog of BitScanForward that should work, but I can't confirm it.
Apparently it is not available on 32-bit builds, so this may or may not
work correctly. On Intel compilers, the documentation lists only an
intrinsic for _bit_scan_forward. People have posted on forums
regarding _bit_scan_forward64, but Intel's documentation is often
woeful. Again, I don't have that compiler, so I can't test.
As such, to be safe, for now only the GCC/Clang intrinsic is added; all other
compilers use a plain C version based on the de Bruijn method of Leiserson et al:
http://supertech.csail.mit.edu/papers/debruijn.pdf.
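As a rough illustration of the de Bruijn method, the sketch below counts
trailing zeros of a nonzero 64-bit value without any intrinsic. The names are
hypothetical and the lookup table is filled at run time only to keep the
example short; it is not the code from the patch, which may well hard-code the
table. The constant is a commonly used 64-bit de Bruijn sequence, chosen here
as an assumption.

```c
#include <stdint.h>

/* A 64-bit de Bruijn sequence B(2, 6): each 6-bit window is distinct.
 * The choice of this particular constant is an assumption of the sketch. */
#define SKETCH_DEBRUIJN64 UINT64_C(0x022FDD63CC95386D)

static uint8_t sketch_ctz_table[64];

/* Fill the table: shifting the constant left by i leaves a top-6-bit
 * pattern that is unique to i, so it can serve as a perfect hash. */
static void sketch_ctz_init(void)
{
    for (int i = 0; i < 64; i++)
        sketch_ctz_table[(SKETCH_DEBRUIJN64 << i) >> 58] = (uint8_t)i;
}

/* Count trailing zeros of a nonzero value; call sketch_ctz_init() first. */
static int sketch_ctzll_debruijn(uint64_t v)
{
    uint64_t lowest = v & (~v + 1);     /* isolate the lowest set bit */

    /* Multiplying by a power of two is a left shift of the constant,
     * so the top 6 bits of the product index the table. */
    return sketch_ctz_table[(lowest * SKETCH_DEBRUIJN64) >> 58];
}
```

The multiply, shift and table lookup replace a bit-by-bit loop, which is
consistent with the fallback landing between the intrinsic and the Euclidean
version in the timings.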
Tested with FATE. Sample benchmark (x86-64, GCC 5.2.0, Haswell)
with a START_TIMER and STOP_TIMER in libavutil/rational.c, followed by a
make fate.
aac-am00_88.err:
builtin:
714 decicycles in av_gcd, 4095 runs, 1 skips
de-bruijn:
1440 decicycles in av_gcd, 4096 runs, 0 skips
previous:
2889 decicycles in av_gcd, 4096 runs, 0 skips
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>