path: root/libavcodec/mips/hevc_mc_biw_msa.c
Commit message | Author | Age | Files | Lines
* avutil/mips: refine msa macros CLIP_*. | gxw | 2019-08-13 | 1 | -28/+28

  Changing details as following:
  1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in source vector.
  2. Refine the implementation of macro 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
     Performance of VP8 decoding has speed up about 1.1%(from 7.03x to 7.11x).
     Performance of H264 decoding has speed up about 0.5%(from 4.35x to 4.37x).
     Performance of Theora decoding has speed up about 0.7%(from 5.79x to 5.83x).
  3. Remove redundant macro 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255' instead, because there are no difference in the effect of this two macros.

  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
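  Item 3 above states that the plain and the 'MAX_SATU' clip macros have the same effect. A minimal scalar sketch of why that can hold is shown below. It assumes that "MAX_SATU" means "signed max with 0, then unsigned saturation to 8 bits"; the commit message does not spell this out, and the helper names are hypothetical, not FFmpeg or MSA identifiers.

```c
#include <stdint.h>

/* Reference clamp: explicit comparison against both bounds. */
static int16_t clip_0_255_minmax(int16_t x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return x;
}

/* Scalar model of "signed max with 0, then unsigned saturate to 8 bits".
 * Hypothetical helper; an assumption about what the macro name implies. */
static int16_t clip_0_255_max_satu(int16_t x)
{
    uint16_t u = (uint16_t)(x < 0 ? 0 : x);   /* max(x, 0) */
    return (int16_t)(u > 0xFF ? 0xFF : u);    /* saturate to 8 unsigned bits */
}

/* Counts inputs on which the two models disagree over the full int16 range. */
static int count_clip_mismatches(void)
{
    int mismatches = 0;
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        if (clip_0_255_minmax((int16_t)v) != clip_0_255_max_satu((int16_t)v))
            mismatches++;
    return mismatches;
}
```

  In this model the two forms agree on every 16-bit input, which is consistent with the message's claim that one of the macros was redundant.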
* avutil/mips: refactor msa load and store macros. | Shiyou Yin | 2019-07-19 | 1 | -61/+67

  Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}.
  The old macros are difficult to use because they don't follow the same parameter passing rules.

  Changing details as following:
  1. remove LD4x4_SH.
  2. replace ST2x4_UB with ST_H4.
  3. replace ST4x2_UB with ST_W2.
  4. replace ST4x4_UB with ST_W4.
  5. replace ST4x8_UB with ST_W8.
  6. replace ST6x4_UB with ST_W2 and ST_H2.
  7. replace ST8x1_UB with ST_D1.
  8. replace ST8x2_UB with ST_D2.
  9. replace ST8x4_UB with ST_D4.
  10. replace ST8x8_UB with ST_D8.
  11. replace ST12x4_UB with ST_D4 and ST_W4.

  Examples of new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
  ST_H4 store four half-word elements in vector 'in' to pdst with stride.

  About the macro name:
  1) 'ST' means store operation.
  2) 'H/W/D' means type of vector element is 'half-word/word/double-word'.
  3) Number '1/2/4/8' means how many elements will be stored.

  About the macro parameter:
  1) 'in0, in1...' 128-bits vector.
  2) 'idx0, idx1...' elements index.
  3) 'pdst' destination pointer to store to
  4) 'stride' stride of each store operation.

  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
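  The ST_H4 description above can be modelled in portable C as a rough sketch. The type 'vec128_h' and the function 'st_h4_model' are hypothetical stand-ins, not the macro's actual implementation, and the sketch assumes the stride is measured in bytes, which the message does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 128-bit vector viewed as 8 half-words. */
typedef struct { uint16_t h[8]; } vec128_h;

/* Stores the half-word elements at idx0..idx3 of 'in' to pdst,
 * pdst + stride, pdst + 2 * stride and pdst + 3 * stride.
 * Assumption: 'stride' is a byte offset between consecutive stores. */
static void st_h4_model(vec128_h in, int idx0, int idx1, int idx2, int idx3,
                        uint8_t *pdst, ptrdiff_t stride)
{
    const int idx[4] = { idx0, idx1, idx2, idx3 };

    for (int i = 0; i < 4; i++) {
        assert(idx[i] >= 0 && idx[i] < 8);   /* 8 half-words per 128 bits */
        /* memcpy avoids unaligned-access and aliasing problems. */
        memcpy(pdst + i * stride, &in.h[idx[i]], sizeof(uint16_t));
    }
}
```

  With this shape, a call such as st_h4_model(v, 0, 2, 4, 6, dst, stride) writes one two-byte element per row for four rows, which matches the naming rule quoted above: store, half-word, four elements.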
* avcodec/mips: Improve hevc bi wgt 4 tap hv mc msa functions | Kaustubh Raste | 2017-11-08 | 1 | -524/+872

  Use global mask buffer for appropriate mask load.
  Use immediate unsigned saturation for clip to max saving one vector register.

  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve hevc bi wgt 4 tap hz and vt mc msa functions | Kaustubh Raste | 2017-11-04 | 1 | -340/+247

  Use global mask buffer for appropriate mask load.

  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve hevc bi weighted copy, hz and vt mc msa functions | Kaustubh Raste | 2017-11-01 | 1 | -777/+793

  Pack the data to half word before clipping.
  Use immediate unsigned saturation for clip to max saving one vector register.

  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve hevc bi weighted hv mc msa functions | Kaustubh Raste | 2017-10-25 | 1 | -252/+454

  Use immediate unsigned saturation for clip to max saving one vector register.

  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* mips/hevcdsp: fix string concatenation on macros | James Almer | 2015-07-30 | 1 | -2/+2

  Needed for old compilers like GCC 4.2

  Tested by trac user brad.
  Fixes ticket #4745

  Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
  Signed-off-by: James Almer <jamrial@gmail.com>
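  The message above does not show the two changed lines. Assuming that "string concatenation on macros" refers to preprocessor token pasting with '##', the sketch below shows a well-formed use in which every paste yields a valid identifier, as the C standard requires. The macro and function names are hypothetical and are not taken from the FFmpeg source.

```c
/* Hypothetical macro, for illustration only: builds a function name
 * from its arguments using the '##' token-pasting operator. */
#define DEFINE_SCALE(DIR, WIDTH)                         \
static int scale_##DIR##WIDTH(int x)                     \
{                                                        \
    /* WIDTH is also usable as an ordinary expression */ \
    return x * (WIDTH);                                  \
}

DEFINE_SCALE(h, 4)   /* defines scale_h4() */
DEFINE_SCALE(v, 8)   /* defines scale_v8() */
```

  A paste that does not produce a single valid token has undefined behavior in C, so compilers may differ in whether they accept it. That is one possible reason a fix of this kind would be needed for an older compiler.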
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for HEVC biw mc functions | Shivraj Patil | 2015-06-03 | 1 | -0/+5572

  This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC biw mc functions (qpel as well as epel) in new file hevc_mc_biw_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h

  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>