path: root/libavcodec/x86/hevc_mc.asm
Commit message | Author | Age | Files | Lines
* x86: hevc_mc: fewer xmm regs used in epel h/v | Christophe Gisquet | 2015-02-17 | 1 | -6/+12
  11 xmm regs seem to be required only for avx2.
  Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: save 1 gpr in epel filter loading | Christophe Gisquet | 2015-02-16 | 1 | -36/+35
  The 3*stride value stored in r3src can be loaded much later, so use r3src instead of a dedicated gpr when possible.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc: remove a parameter to WP internals | Christophe Gisquet | 2015-02-14 | 1 | -15/+20
  The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to get the value in bytes).
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: optimize AVX2 mc functions | James Almer | 2015-02-12 | 1 | -20/+12
  Before
    40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
  After
    37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips
  Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
  Signed-off-by: James Almer <jamrial@gmail.com>
* x86: hevc_mc: remove lea in EPEL_LOAD | Christophe Gisquet | 2015-02-08 | 1 | -12/+7
  The second parameter to the macro is always an immediate address, so no lea is needed.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: fewer gpr autoloads for _v filters | Christophe Gisquet | 2015-02-08 | 1 | -6/+12
  In that case, only my needs to be loaded; mx/r3src is not used.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc/hevc_mc: fix comments | Christophe Gisquet | 2015-02-07 | 1 | -7/+5
  The width parameter is now completely at the back, and actually never used. This helps in understanding the actual parameter list.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc: share more constant through defines | Christophe Gisquet | 2015-02-07 | 1 | -7/+14
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: lavc: share more constants | Christophe Gisquet | 2015-02-06 | 1 | -7/+7
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: use aligned loads | Mickaël Raulet | 2015-02-06 | 1 | -3/+3
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc: use CLIPW macro when possible | Mickaël Raulet | 2015-02-06 | 1 | -8/+4
  Conflicts:
    libavcodec/x86/hevc_mc.asm
  Reviewed-by: James Almer <jamrial@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: add AVX2 optimizations | Pierre Edouard Lepere | 2015-02-06 | 1 | -147/+434
  before
    33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
    38138 decicycles in luma_bi_2, 523427 runs, 861 skips
    13490 decicycles in luma_uni, 516138 runs, 8150 skips
  after
    20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
    24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
    10397 decicycles in luma_uni, 515715 runs, 8573 skips
  Conflicts:
    libavcodec/x86/hevc_mc.asm
    libavcodec/x86/hevcdsp_init.c
  Reviewed-by: James Almer <jamrial@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/x86/hevc_mc: fix sse register counts | Michael Niedermayer | 2014-12-11 | 1 | -14/+14
  These fix failures of --enable-xmm-clobber-test.
  It would be better to change the code to use fewer registers, but until someone does, the used register count must not be too small.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER | Michael Niedermayer | 2014-12-10 | 1 | -5/+0
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc: get rid off packusdw for ssse3 compatibility | Mickaël Raulet | 2014-10-04 | 1 | -2/+4
  cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2
  Fixes out of array access
  Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit
  Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: correct unneeded use of SSE4 code | Christophe Gisquet | 2014-08-24 | 1 | -1/+1
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevcdsp: use compilation-time-fixed constant | Christophe Gisquet | 2014-08-22 | 1 | -2/+2
  The stride for some buffers is known.
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* hevcdsp: remove more instances of compile-time-fixed parameters | Christophe Gisquet | 2014-08-22 | 1 | -23/+19
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* hevcdsp: remove compilation-time-fixed parameter | Christophe Gisquet | 2014-08-22 | 1 | -8/+8
  The dststride parameter is always MAX_PB_SIZE.
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: assume 2nd source stride is 64 | Christophe Gisquet | 2014-08-22 | 1 | -15/+21
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12} | James Almer | 2014-08-04 | 1 | -1/+34
  Signed-off-by: James Almer <jamrial@gmail.com>
  Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc_mc: remove an unnecessary pxor | James Almer | 2014-08-04 | 1 | -2/+1
  Signed-off-by: James Almer <jamrial@gmail.com>
  Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: fix register count usage | Christophe Gisquet | 2014-07-29 | 1 | -12/+12
  A macro was using a fixed register, causing too many GPRs to be declared as used.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: load less data in epel filters | Christophe Gisquet | 2014-07-27 | 1 | -7/+15
  Before:
    5679 decicycles in epel_bi, 2059976 runs, 37176 skips
    3468 decicycles in epel_uni, 1040886 runs, 7690 skips
  After:
    5323 decicycles in epel_bi, 2059493 runs, 37659 skips
    3262 decicycles in epel_uni, 1040871 runs, 7705 skips
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: replace one lea by add | Christophe Gisquet | 2014-07-27 | 1 | -1/+1
  Should have been in 036f11bdb565.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: replace simple leas by adds | Christophe Gisquet | 2014-07-26 | 1 | -60/+60
  lea is detrimental for those simple cases. No impact overall to the change though.
  Before:
    15017 decicycles in q, 1016152 runs, 32424 skips
    15382 decicycles in q_bi, 1013673 runs, 34903 skips
    3713 decicycles in e, 2074534 runs, 22618 skips
    3901 decicycles in e_bi, 2065509 runs, 31643 skips
    7852 decicycles in q_uni, 520165 runs, 4123 skips
    2398 decicycles in e_uni, 1043339 runs, 5237 skips
  After:
    14898 decicycles in q, 1016295 runs, 32281 skips
    15119 decicycles in q_bi, 1015392 runs, 33184 skips
    3682 decicycles in e, 2073224 runs, 23928 skips
    3720 decicycles in e_bi, 2065043 runs, 32109 skips
    7643 decicycles in q_uni, 520280 runs, 4008 skips
    2363 decicycles in e_uni, 1043780 runs, 4796 skips
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/hevc: add 12bits support for MC | Mickaël Raulet | 2014-07-26 | 1 | -5/+58
  cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: remove unneeded shift | Christophe Gisquet | 2014-06-01 | 1 | -0/+10
  The immediate value may be 0.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: better register allocation | Christophe Gisquet | 2014-05-28 | 1 | -31/+48
  The xmm reg count was incorrect, and manually loading the gprs furthermore noticeably reduces the number needed.
  The modified functions are used in weighted prediction, so only a few samples like WP_* exhibit a change. For this one and Win64 (some widths removed because of too few occurrences):
  WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w
             16    32
    before: 2194  3872
    after:  2119  3767
  WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w
             16    32    64
    before: 2819  4960  9396
    after:  2617  4788  9150
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: hevc_mc: specify coefficients registers | Christophe Gisquet | 2014-05-18 | 1 | -27/+32
  By default, macro EPEL_FILTER loads the coefficients unconditionally into m14/m15. This forces an unneeded higher register count. Reduce that count by making them parameters of EPEL_FILTER.
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* hevcdsp: correctly indicate that hevc_put_hevc_bi_epel_h uses 9 GPRs | Hendrik Leppkes | 2014-05-12 | 1 | -1/+1
  Fixes FATE on Windows.
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* HEVC : added assembly MC functions | plepere | 2014-05-06 | 1 | -0/+1256
  pretty print x86
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
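
Several entries above ("assume 2nd source stride is 64", "remove compilation-time-fixed parameter", "remove a parameter to WP internals") rest on the same idea: the internal buffer has a fixed row stride of MAX_PB_SIZE, so the stride need not be passed as an argument. The C sketch below only illustrates that idea and is not taken from the FFmpeg sources. The value 64 is inferred from the log, the 2-byte sample type is an assumption based on "times 2 to get the value in bytes", and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value, inferred from the entry "assume 2nd source stride is 64". */
#define MAX_PB_SIZE 64

/* Hypothetical helper: byte stride of the internal buffer.
 * Assumes 2-byte samples, hence MAX_PB_SIZE * 2 bytes per row. */
static inline ptrdiff_t internal_stride_bytes(void)
{
    return (ptrdiff_t)MAX_PB_SIZE * (ptrdiff_t)sizeof(int16_t);
}

/* Hypothetical helper: address of row y in the internal buffer.
 * No stride argument is needed because the stride is a compile-time constant. */
static inline const int16_t *internal_row(const int16_t *buf, int y)
{
    return buf + (ptrdiff_t)y * MAX_PB_SIZE;
}
```

With the stride fixed at compile time, a caller passes one argument fewer, which in assembly frees the general-purpose register that would otherwise hold it.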