Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h | Diego Biurrun | 2013-05-12 | 21 | -20/+20 |
| | | | | The header is no longer MMX-specific. | ||||
* | x86: dsputil: Split inline assembly from init code | Diego Biurrun | 2013-05-12 | 4 | -727/+759 |
| | | | | Also remove some pointless comments. | ||||
* | x86: dsputil: Refactor pixels16 wrapper functions with a macro | Diego Biurrun | 2013-05-12 | 8 | -143/+41 |
| | |||||
* | configure: Rename cmov processor capability to i686 | Diego Biurrun | 2013-05-12 | 1 | -4/+2 |
| | | | | | The goal is to make the capability slightly more general and have it cover the availability of the nopl instruction in addition to cmov. | ||||
* | x86: sbrdsp: implement SSE2 qmf_pre_shuffle | Christophe Gisquet | 2013-05-10 | 2 | -0/+32 |
| | | | | | | | From 253 to 51 cycles on Arrandale and Win64. 44 cycles on SandyBridge. Signed-off-by: Anton Khirnov <anton@khirnov.net> | ||||
* | x86: dsputil: Remove unused argument from QPEL_OP macro | Diego Biurrun | 2013-05-08 | 1 | -4/+4 |
| | |||||
* | x86: dsputil: Move TRANSPOSE4 macro to the only place it is used | Diego Biurrun | 2013-05-08 | 2 | -11/+11 |
| | |||||
* | x86: dsputil: Move constant declarations into separate header | Diego Biurrun | 2013-05-08 | 6 | -24/+57 |
| | |||||
* | x86: dsputil: Group all assembly constants together in constants.c | Diego Biurrun | 2013-05-08 | 2 | -15/+11 |
| | |||||
* | x86: dsputil: Move ff_pd assembly constants to the only place they are used | Diego Biurrun | 2013-05-08 | 3 | -13/+11 |
| | |||||
* | x86: dsputil: Remove unused ff_pb_3F constant | Diego Biurrun | 2013-05-07 | 2 | -2/+0 |
| | |||||
* | x86: dsputil: Remove unused MOVQ_BONE macro | Diego Biurrun | 2013-05-07 | 2 | -9/+0 |
| | |||||
* | x86: dsputil: Move rv40-specific functions where they belong | Diego Biurrun | 2013-05-07 | 3 | -26/+27 |
| | |||||
* | x86: dsputil hpeldsp: Move shared template functions into separate object | Diego Biurrun | 2013-05-07 | 7 | -26/+69 |
| | |||||
* | x86: rnd_template: Eliminate pointless OP_AVG macro indirection | Diego Biurrun | 2013-05-07 | 4 | -12/+8 |
| | |||||
* | x86: hpeldsp: Move avg_pixels8_x2_mmx() out of hpeldsp_rnd_template.c | Diego Biurrun | 2013-05-06 | 5 | -25/+58 |
| | | | | | The function is only instantiated once, so there is no point in keeping it in a template file. | ||||
* | x86: hpeldsp: Only compile MMX hpeldsp code if MMX is enabled | Diego Biurrun | 2013-05-06 | 1 | -2/+2 |
| | |||||
* | x86: More specific ifdefs for dsputil/hpeldsp init functions | Diego Biurrun | 2013-05-06 | 2 | -16/+16 |
| | |||||
* | avcodec: Add av_cold attributes to init functions missing them | Diego Biurrun | 2013-05-04 | 2 | -2/+4 |
| | |||||
* | silly typo fixes | Diego Biurrun | 2013-05-03 | 1 | -1/+1 |
| | |||||
* | x86: sbrdsp: Implement SSE2 qmf_deint_bfly | Christophe Gisquet | 2013-05-03 | 2 | -0/+33 |
| | | | | | | | | | | Sandybridge: 47 cycles. Having a loop counter is a 7 cycle gain. Unrolling is another 7 cycle gain. Working in reverse scan is another 6 cycles. Signed-off-by: Diego Biurrun <diego@biurrun.de> | ||||
* | x86: dsputil: Move cavs and vc1-specific functions where they belong | Diego Biurrun | 2013-05-02 | 4 | -40/+35 |
| | |||||
* | x86: dsputil: Move avg_pixels16_mmx() out of rnd_template.c | Diego Biurrun | 2013-05-02 | 5 | -24/+29 |
| | | | | | The function does not do any rounding, so there is no point in keeping it in a round template file. | ||||
* | x86: dsputil: Move avg_pixels8_mmx() out of rnd_template.c | Diego Biurrun | 2013-05-02 | 5 | -23/+25 |
| | | | | | The function is only instantiated once, so there is no point in keeping it in a template file. | ||||
* | x86: Move duplicated put_pixels{8|16}_mmx functions into their own file | Diego Biurrun | 2013-05-02 | 5 | -134/+109 |
| | |||||
* | x86: Drop unnecessary ff_ name prefixes from static functions | Diego Biurrun | 2013-04-30 | 5 | -53/+60 |
| | |||||
* | mpegaudiodsp: More consistent names for ppc/x86 optimization files | Diego Biurrun | 2013-04-30 | 2 | -1/+1 |
| | |||||
* | x86: dsputil: Remove a set of pointless #ifs around function declarations | Diego Biurrun | 2013-04-30 | 1 | -2/+0 |
| | |||||
* | x86: dsputil: cosmetics: Group ff_{avg|put}_pixels16_mmxext() declarations | Diego Biurrun | 2013-04-30 | 1 | -28/+14 |
| | |||||
* | x86: hpeldsp: Remove unused macro definitions | Diego Biurrun | 2013-04-29 | 1 | -7/+0 |
| | |||||
* | x86: ac3dsp: Remove 3dnow version of ff_ac3_extract_exponents | Diego Biurrun | 2013-04-26 | 2 | -37/+0 |
| | | | | | | | The function requires increasing the fuzz factor for the ac3/eac3 encode tests and even so makes fate fail. It only provides a slight encoding speedup for legacy CPUs that do not support SSE2. Thus its benefit is not worth the trouble it creates and fixing it would be a waste of time. | ||||
* | x86: Rename dsputil_rnd_template.c to rnd_template.c | Martin Storsjö | 2013-04-25 | 3 | -2/+2 |
| | | | | | | | This makes it less confusing when this template is shared both by dsputil and by hpeldsp. Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | x86: Get rid of duplication between *_rnd_template.c | Martin Storsjö | 2013-04-23 | 2 | -197/+5 |
| | | | | Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | x86: Factorize duplicated inline assembly snippets | Martin Storsjö | 2013-04-23 | 3 | -130/+76 |
| | | | | Signed-off-by: Diego Biurrun <diego@biurrun.de> | ||||
* | x86: Move some conditional code around to avoid unused variable warnings | Diego Biurrun | 2013-04-22 | 3 | -17/+15 |
| | |||||
* | x86: cavs: Refactor duplicate dspfunc macro | Diego Biurrun | 2013-04-22 | 1 | -22/+14 |
| | |||||
* | x86: cavs: Put mmx-specific code into its own init function | Diego Biurrun | 2013-04-22 | 3 | -15/+31 |
| | | | | | Before, this code was labeled as mmxext and enabled both for the 3dnow and the mmxext case. | ||||
* | x86: Remove some duplicate function declarations | Diego Biurrun | 2013-04-22 | 2 | -7/+0 |
| | |||||
* | x86: Remove unused inline asm instruction defines | Martin Storsjö | 2013-04-20 | 1 | -3/+0 |
| | | | | Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | x86: hpeldsp: Move half-pel assembly from dsputil to hpeldsp | Ronald S. Bultje | 2013-04-19 | 8 | -657/+957 |
| | | | | Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | vp3: Use full transpose for all IDCTs | Ronald S. Bultje | 2013-04-15 | 2 | -43/+82 |
| | | | | | | | | | | | This way, the special IDCT permutations are no longer needed. This is similar to how H264 does it, and removes the dsputil dependency imposed by the scantable code. Also remove the unused type == 0 cases from the plain C version of the idct. Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | x86: Move constants to the only place where they are used | Ronald S. Bultje | 2013-04-15 | 3 | -9/+4 |
| | | | | Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | x86: dsputil: Move some ifdefs to avoid unused variable warnings | Diego Biurrun | 2013-04-12 | 1 | -2/+2 |
| | |||||
* | x86: dsputil: cosmetics: Remove two pointless variable indirections | Diego Biurrun | 2013-04-12 | 1 | -4/+2 |
| | |||||
* | x86: dsputil: Refactor some ff_{avg|put}_pixels function declarations | Diego Biurrun | 2013-04-12 | 3 | -15/+6 |
| | |||||
* | x86: dsputil: Move ff_h263_*_loop_filter declarations to a more suitable place | Diego Biurrun | 2013-04-12 | 1 | -5/+3 |
| | |||||
* | x86: h264qpel: int --> ptrdiff_t for some line_size parameters | Diego Biurrun | 2013-04-12 | 2 | -6/+9 |
| | |||||
* | Move misplaced file author information where it belongs | Diego Biurrun | 2013-04-11 | 2 | -4/+4 |
| | |||||
* | dsputil: Make dsputil selectable | Ronald S. Bultje | 2013-04-10 | 7 | -49/+60 |
| | | | | Signed-off-by: Martin Storsjö <martin@martin.st> | ||||
* | h264: Integrate clear_blocks calls with IDCT | Ronald S. Bultje | 2013-04-10 | 2 | -30/+131 |
| | | | | | | | | | The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Martin Storsjö <martin@martin.st> |