| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| | |
* commit '3ccec334b8502701e72ef13bed25913c3578022e':
sbrdsp: Move a misplaced #endif directive to the right spot
Merged-by: James Almer <jamrial@gmail.com>
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
Add fixed poind code.
Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| |
| | |
Move the existing code to a new template file.
Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|\ \
| |/
| |
| |
| |
| |
| | |
* commit 'd961a79eb07a8911540a0bd356d68ae0cf93c6a1':
sbrdsp: move #if to disable all educational code
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| | |
Avoids a warning of the unused function 'autocorrelate'.
|
| | |
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '4a7af92cc80ced8498626401ed21f25ffe6740c8':
sbrdsp: Unroll and use integer operations
sbrdsp: Unroll sbr_autocorrelate_c
x86: sbrdsp: Implement SSE2 qmf_deint_bfly
Conflicts:
libavcodec/sbrdsp.c
libavcodec/x86/sbrdsp.asm
libavcodec/x86/sbrdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch can be controversial, by assuming floats are IEEE-754 and
particular behaviour of the FPU will get in the way.
Timing on Arrandale and Win32 (thus, x87 FPU is used in the reference).
sbr_qmf_pre_shuffle_c: 115 to 76
sbr_neg_odd_64_c: 84 to 55
sbr_qmf_post_shuffle_c: 112 to 83
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
| |
| |
| |
| |
| | |
1410 cycles to 1148 on Arrandale/Win64
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch can be controversial, by assuming floats are IEEE-754 and
particular behaviour of the FPU will get in the way.
Timing on Arrandale and Win32 (thus, x87 FPU is used in the reference).
sbr_qmf_pre_shuffle_c: 115 to 76
sbr_neg_odd_64_c: 84 to 55
sbr_qmf_post_shuffle_c: 112 to 83
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
1410 cycles to 1148 on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|/
|
|
|
| |
Signed-off-by: Mirjana Vulin <mvulin@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
| |
Rename the called dsp init functions to *_init_x86.
|
|
|
|
|
|
|
|
| |
The length is even, so some unrolling can be performed. Timings are for x86:
- 32bits: 102c -> 82c
- 64bits: 82c -> 69c
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
|
|
| |
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|