| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
|
|
|
|
| |
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
|
|\
| |
| |
| |
| |
| |
| | |
* commit '40d949677335a564f769823f4afdb7e7a3da8d6b':
dca: use defines for subband related constants
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|\ \
| |/
| |
| |
| |
| |
| | |
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| |
| | |
They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.
|
|\ \
| |/
| |
| |
| |
| |
| | |
* commit 'aebf07075f4244caf591a3af71e5872fe314e87b':
dca: change the core to work with integer coefficients.
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The DCA core decoder converts integer coefficients read from the
bitstream to floats just after reading them (along with dequantization).
All the other steps of the audio reconstruction are done with floats
which makes the output for the DTS lossless extension (XLL)
actually lossy.
This patch changes the DCA core to work with integer coefficients
until QMF. At this point the integer coefficients are converted to floats.
The coefficients for the LFE channel (lfe_data) are not touched.
This is the first step for the really lossless XLL decoding.
|
|\ \
| |/
| |
| |
| |
| |
| | |
* commit 'c33c1fa8af2b2e82418a06901b6ad17b3d61b73e':
arm64: convert dcadsp neon asm from arm
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
~2% faster dts decoding overall.
cortex-a57 cortex-a53
dca_decode_hf_c: 474.8 1659.9
dca_decode_hf_neon: 225.2 301.1
dca_lfe_fir0_c: 913.2 1537.7
dca_lfe_fir0_neon: 286.8 451.9
dca_lfe_fir1_c: 848.7 1711.5
dca_lfe_fir1_neon: 387.1 506.4
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '45ff7c93dd84a254cc96acc589e5ac3d7bd16bce':
dca: K&R formatting cosmetics
Conflicts:
libavcodec/dca_parser.c
libavcodec/dcadec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '4cb6964244fd6c099383d8b7e99731e72cc844b9':
dcadec: simplify decoding of VQ high frequencies
Conflicts:
configure
libavcodec/dcadec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '57b1eb9f75b04571063ddec316e290c216c114ac':
dcadsp: scan coefficients linearly in dca_lfe_fir
Conflicts:
libavcodec/dcadsp.c
See: 9ae8e23188fc2e533eea74757c9060557941d3d9
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
This change is inspired by x86 asm where it frees a register.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '87ec849fe9acba075c843e67bcd01f256f481a18':
dcadec: remove scaling in lfe_interpolation_fir
Conflicts:
libavcodec/dcadec.c
libavcodec/dcadsp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
| |
| |
| |
| |
| | |
This change is inspired by x86 asm, where this frees a register.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\ \
| |/
| |
| |
| |
| |
| | |
* commit '5b59a9fc6152169599561f04b4f66370edda5c9c':
x86: dcadsp: implement int8x8_fmul_int32
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\ \
| |/
| |
| |
| |
| |
| | |
* commit '800ffab48a7844dd5dc0a33b8f6b8e5ed718cf2e':
dcadsp: Add a new method, qmf_32_subbands
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
This does most of the work formerly carried out by
the static function qmf_32_subbands() in dcadec.c.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '6fee1b90ce3bf4fbdfde7016e0890057c9000487':
avcodec: Add av_cold attributes to init functions missing them
Conflicts:
libavcodec/aacpsy.c
libavcodec/atrac3.c
libavcodec/dvdsubdec.c
libavcodec/ffv1.c
libavcodec/ffv1enc.c
libavcodec/h261enc.c
libavcodec/h264_parser.c
libavcodec/h264dsp.c
libavcodec/h264pred.c
libavcodec/libschroedingerenc.c
libavcodec/libxvid_rc.c
libavcodec/mpeg12.c
libavcodec/mpeg12enc.c
libavcodec/proresdsp.c
libavcodec/rangecoder.c
libavcodec/videodsp.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
|
| |
| |
| |
| | |
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|/
|
|
|
|
|
|
|
| |
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 80ba1ddb58b5923b9f36a6acd542affc4ca722eb)
|
|
|
|
| |
Originally committed as revision 22863 to svn://svn.ffmpeg.org/ffmpeg/trunk
|
|
This enables SIMD optimisations of this function.
Originally committed as revision 22861 to svn://svn.ffmpeg.org/ffmpeg/trunk
|