summaryrefslogtreecommitdiffstats
path: root/libavcodec/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-293-1/+81
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: 4xm: fix invalid array indexing rv34dsp: factorize a multiplication in the noround inverse transform rv40: perform bitwise checks in loop filter rv34: remove inline keyword from rv34_decode_block(). rv40: change a logical test into a bitwise one. rv34: remove constant parameter rv40: don't always do the full prev_type search dsputil x86: revert a test back to its previous value rv34dsp x86: implement MMX2 inverse transform Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * dsputil x86: revert a test back to its previous valueChristophe GISQUET2012-04-281-1/+1
| | | | | | | | | | | | Commit 356ee8d caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * rv34dsp x86: implement MMX2 inverse transformChristophe Gisquet2012-04-282-0/+80
| | | | | | | | | | | | 141 cycles down to 51. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * h264: new assembly version of get_cabac for x86_64 with PICRoland Scheidegger2012-04-282-23/+120
| | | | | | | | | | | | | | | | | | | | This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * h264: use one table instead of several for cabac functionsRoland Scheidegger2012-04-282-14/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * h264: (trivial) remove unneeded macro argument in x86/cabac.hRoland Scheidegger2012-04-281-3/+3
| | | | | | | | Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | h264: new assembly version of get_cabac for x86_64 with PICRoland Scheidegger2012-04-282-24/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the c version (more so than the generated assembly seems to suggest) just in get_cabac, I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big, I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32bit... Now that only one table is used, there's some chance even darwin as compiles this (apparently the label arithmetic used previously doesn't work if it involves symbols defined in a different file, thanks to Ronald S. Bultje for helping me with this). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | h264: use one table instead of several for cabac functionsRoland Scheidegger2012-04-282-14/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | h264: (trivial) remove unneeded macro argument in x86/cabac.hRoland Scheidegger2012-04-281-3/+3
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | lowres2 support.Michael Niedermayer2012-04-221-1/+1
| | | | | | | | | | | | | | | | | | The new lowres support is limited to decoders where lowres decoding is possible in high quality. I was not able to measure any speed difference, but if one is found the 2-3 lines that might affect speed can be made compile time conditional Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-221-1/+1
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: ARM: allow runtime masking of CPU features dsputil: remove unused functions mov: Treat keyframe indexes as 1-origin if starting at non-zero. mov: Take stps entries into consideration also about key_off. Remove lowres video decoding Conflicts: ffmpeg.c ffplay.c libavcodec/arm/vp8dsp_init_arm.c libavcodec/libopenjpegdec.c libavcodec/mjpegdec.c libavcodec/mpegvideo.c libavcodec/utils.c libavformat/mov.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * Remove lowres video decodingMans Rullgard2012-04-211-1/+1
| | | | | | | | | | | | | | This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-211-7/+0
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: avcodec: remove AVCodecContext.dsp_mask avconv: fix a segfault when default encoder for a format doesn't exist. utvideo: general cosmetics aac: Handle HE-AACv2 when sniffing a channel order. movenc: Support high sample rates in isomedia formats by setting the sample rate field in stsd to 0. xxan: Remove write-only variable in xan_decode_frame_type0(). ivi_common: Initialize a variable at declaration in ff_ivi_decode_blocks(). Conflicts: ffmpeg.c libavcodec/utvideo.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * avcodec: remove AVCodecContext.dsp_maskMans Rullgard2012-04-211-7/+0
| | | | | | | | | | | | | | | | This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Revert "h264: assembly version of get_cabac for x86_64 with PIC (v4)"Michael Niedermayer2012-04-212-124/+24
| | | | | | | | | | | | This broke compilation on darwin, revert until a better solution is found. This reverts commit a812b599b504b39a8021827da89d5e23fb361cc9.
* | h264: assembly version of get_cabac for x86_64 with PIC (v4)Roland Scheidegger2012-04-212-24/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the c version (more so than the generated assembly seems to suggest) just in get_cabac, I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big, I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32bit... v2: incorporated feedback from Loren Merritt to avoid rip-relative movs for every table, and got rid of unnecessary @GOTPCREL. v3: apply similar fixes to the the decode_significance functions, and use same macro arguments for non-pic case. v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect the c code to be faster otherwise since both cmov and sbb suck hard on a Prescott, even can't construct the mask with a 64bit shift as that's just as terrible - it's quite difficult to find usable instructions on that chip...). This is tested to work but not on a P4, in theory it _should_ be fast there. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-171-4/+4
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: indeo3: add parens around some macro arguments h264: use proper PROLOGUE statement for a function using 8 registers. doc: Update sample Vim config with suitable (function) indentation settings. dv: Merge dvquant.h into dvdata.c where all other DV tables reside. dv: Move static tables only used in one place to where they are used. graphparser: set next to NULL on an entry extracted from inputs list doc/filters: update documentation. avconv: flush decoders immediately after an EOF. avconv: send EOF to vsrc_buffer. avconv: reindent. Conflicts: doc/filters.texi ffmpeg.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * h264: use proper PROLOGUE statement for a function using 8 registers.Ronald S. Bultje2012-04-161-4/+4
| | | | | | | | Fixes crashes when using biweight on win64.
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-141-1/+1
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: vsrc_buffer: fix check from 7ae7c41. libxvid: Reorder functions to avoid forward declarations; make functions static. libxvid: drop some pointless dead code wmal: vertical alignment cosmetics wmal: Warn about missing bitstream splicing feature and ask for sample. wmal: Skip seekable_frame_in_packet. wmal: Drop unused variable num_possible_block_size. avfiltergraph: make the AVFilterInOut alloc/free API public graphparser: allow specifying sws flags in the graph description. graphparser: fix the order of connecting unlabeled links. graphparser: add avfilter_graph_parse2(). vsrc_buffer: allow using a NULL buffer to signal EOF. swscale: handle last pixel if lines have an odd width. qdm2: fix a dubious pointer cast WMAL: Do not try to read rawpcm coefficients if bits is invalid mov: Fix detecting there is no sync sample. tiffdec: K&R cosmetics avf: has_duration does not check the global one dsputil: fix optimized emu_edge function on Win64. Conflicts: doc/APIchanges libavcodec/libxvid_rc.c libavcodec/libxvidff.c libavcodec/tiff.c libavcodec/wmalosslessdec.c libavfilter/avfiltergraph.h libavfilter/graphparser.c libavfilter/version.h libavfilter/vsrc_buffer.c libswscale/output.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * dsputil: fix optimized emu_edge function on Win64.Ronald S. Bultje2012-04-131-1/+1
| | | | | | | | | | | | | | | | Recent register allocation changes (x86inc.asm update) changed the register order and thus opcodes for the inner loops. One of them became >128bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-132-6/+8
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: swscale: K&R formatting cosmetics (part II) tiffdec: Add a malloc check and refactor another. faxcompr: Check malloc results and unify return path configure: escape colons in values written to config.fate ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE matroska: Fix leaking memory allocated for laces. pthread: Fix crash due to fctx->delaying not being cleared. vp3: Assert on invalid filter_limit values. h264: fix 10bit biweight functions after recent x86inc.asm fixes. ffv1: Fix size mismatch in encode_line. movenc: Remove a dead initialization git-howto: Explain how to avoid Windows line endings in git checkouts. build: Move all arch OBJS declarations into arch subdirectory Makefiles. Conflicts: configure libavcodec/vp3.c libavformat/matroskadec.c libavutil/Makefile libswscale/Makefile libswscale/swscale.c libswscale/swscale_internal.h libswscale/utils.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSEJustin Ruggles2012-04-121-2/+4
| | | | | | | | | | | | Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * h264: fix 10bit biweight functions after recent x86inc.asm fixes.Ronald S. Bultje2012-04-121-4/+4
| | | | | | | | | | This should have been updated in the x86inc.asm update, but was accidently forgotten.
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-1312-228/+204
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: libxvid: remove disabled code qdm2: make a table static const qdm2: simplify bitstream reader setup for some subpacket types qdm2: use get_bits_left() build: Consistently handle conditional compilation for all optimization OBJS. avpacket, bfi, bgmc, rawenc: K&R prettyprinting cosmetics msrle: convert MS RLE decoding function to bytestream2. x86inc improvements for 64-bit Conflicts: common.mak libavcodec/avpacket.c libavcodec/bfi.c libavcodec/msrledec.c libavcodec/qdm2.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * build: Consistently handle conditional compilation for all optimization OBJS.Diego Biurrun2012-04-121-3/+2
| |
| * x86inc improvements for 64-bitHenrik Gramner2012-04-1110-221/+198
| | | | | | | | | | | | | | | | | | | | | | | | Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-102-69/+68
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: (22 commits) rv40dsp x86: use only one register, for both increment and loop counter rv40dsp: implement prescaled versions for biweight. avconv: use default channel layouts when they are unknown avconv: parse channel layout string nutdec: K&R formatting cosmetics vda: Signal 4 byte NAL headers to the decoder regardless of what's in the extradata mem: Consistently return NULL for av_malloc(0) vf_overlay: implement poll_frame() vf_scale: support named constants for sws flags. lavc doxy: add all installed headers to doxy groups. lavc doxy: add avfft to the main lavc group. lavc doxy: add remaining avcodec.h functions to a misc doxygen group. lavc doxy: add AVPicture functions to a doxy group. lavc doxy: add resampling functions to a doxy group. lavc doxy: replace \ with / lavc doxy: add encoding functions to a doxy group. lavc doxy: add decoding functions to a doxy group. lavc doxy: fix formatting of AV_PKT_DATA_{PARAM_CHANGE,H263_MB_INFO} lavc doxy: add AVPacket-related stuff to a separate doxy group. lavc doxy: add core functions/definitions to a doxy group. ... Conflicts: ffmpeg.c libavcodec/avcodec.h libavcodec/vda.c libavcodec/x86/rv40dsp.asm libavfilter/vf_scale.c libavformat/nutdec.c libavutil/mem.c tests/ref/acodec/pcm_s24daud Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * rv40dsp x86: use only one register, for both increment and loop counterChristophe GISQUET2012-04-101-23/+20
| | | | | | | | | | | | Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * rv40dsp: implement prescaled versions for biweight.Christophe GISQUET2012-04-102-49/+51
| | | | | | | | | | | | | | | | | | | | Quite often, the original weights are multiple of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-04-052-12/+10
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: h264: Factorize declaration of mb_sizes array. vsrc_buffer: when no frame is available, return an error instead of segfaulting. configure: add dl to frei0r extralibs. dsputil x86: use SSE float instruction instead of SSE2 integer equivalent dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype vp8dsp x86: perform rounding shift with a single instruction fate: add BMP tests. swscale: handle complete dimensions for monoblack/white. aacenc: Mark deinterleave_input_samples argument as const. vf_unsharp: Mark readonly variable as const. h264: fix 4:2:2 PCM-macroblocks decoding Conflicts: configure libavcodec/h264.h libavcodec/x86/dsputil_mmx.c libavfilter/vf_unsharp.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * dsputil x86: use SSE float instruction instead of SSE2 integer equivalentChristophe GISQUET2012-04-042-2/+2
| | | | | | | | | | | | All the more required since the users are pure SSE functions. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * dsputil x86: remove deprecated parameter from scalarproduct_int16 prototypeChristophe GISQUET2012-04-041-2/+2
| | | | | | | | Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| * vp8dsp x86: perform rounding shift with a single instructionChristophe GISQUET2012-04-041-10/+8
| | | | | | | | Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* | dsputil_mmx: fix scalarproduct prototypesMichael Niedermayer2012-04-011-2/+2
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-03-292-63/+77
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: asf: only set index_read if the index contained entries. cabac: add overread protection to BRANCHLESS_GET_CABAC(). cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC(). cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE(). cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC(). h264: add overread protection to get_cabac_bypass_sign_x86(). h264: reindent get_cabac_bypass_sign_x86(). h264: use struct offsets in get_cabac_bypass_sign_x86(). h264: fix overreads in cabac reader. wmall: fix seeking. lagarith: fix buffer overreads. dvdec: drop unnecessary dv_tablegen.h #include build: fix doc generation errors in parallel builds Replace memset(0) by zero initializations. faandct: Remove FAAN_POSTSCALE define and related code. dvenc: print allowed profiles if the video doesn't conform to any of them. avcodec_encode_{audio,video}: only reallocate output packet when it has non-zero size. FATE: add a test for vp8 with changing frame size. fate: add kgv1 fate test. oggdec: calculate correct timestamps in Ogg/FLAC Conflicts: libavcodec/4xm.c libavcodec/cook.c libavcodec/dvdata.c libavcodec/dvdsubdec.c libavcodec/lagarith.c libavcodec/lagarithrac.c libavcodec/utils.c tests/fate/video.mak Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * cabac: add overread protection to BRANCHLESS_GET_CABAC().Ronald S. Bultje2012-03-282-11/+22
| | | | | | | | Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
| * cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().Ronald S. Bultje2012-03-281-12/+12
| |
| * cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().Ronald S. Bultje2012-03-281-3/+3
| |
| * cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().Ronald S. Bultje2012-03-282-20/+22
| |
| * h264: add overread protection to get_cabac_bypass_sign_x86().Ronald S. Bultje2012-03-281-3/+5
| |
| * h264: reindent get_cabac_bypass_sign_x86().Ronald S. Bultje2012-03-281-22/+22
| |
| * h264: use struct offsets in get_cabac_bypass_sign_x86().Ronald S. Bultje2012-03-281-8/+11
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-03-261-54/+46
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: build: ppc: drop stray leftover backslash build: Only clean the architecture subdirectory we build for. build: drop some unnecessary dependencies from the H.264 parser build: prettyprinting cosmetics libavutil: Remove pointless rational test program. libavutil: Remove broken and pointless lzo test program. lavf doxy: expand AVStream.codec doxy. lavf doxy: improve AVStream.time_base doxy. lavf doxy: add some basic documentation about reading from the demuxer. lavf doxy: document passing options to demuxers. lavf doxy: clarify that an AVPacket contains encoded data. mpegtsenc: allow user triggered PES packet flushing APIchanges: mark the place where 0.7 was cut. APIchanges: mark the place where 0.8 was cut. APIchanges: fill in missing dates and hashes. smacker: convert palette and header reading to bytestream2. alac: convert extradata reading to bytestream2. Conflicts: doc/APIchanges libavcodec/smacker.c libavcodec/x86/Makefile libavfilter/Makefile libavutil/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * build: prettyprinting cosmeticsDiego Biurrun2012-03-261-47/+40
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-03-252-1988/+2295
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: x86: dsputil: prettyprint gcc inline asm x86: K&R prettyprinting cosmetics for dsputil_mmx.c x86: conditionally compile H.264 QPEL optimizations dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks. Ignore generated files below doc/. dpcm: convert to bytestream2. interplayvideo: convert to bytestream2. movenc: Merge if statements h264: fix memleak in error path. pthread: Immediately release all frames in ff_thread_flush() h264: Add check for invalid chroma_format_idc utvideo: port header reading to bytestream2. Conflicts: .gitignore configure libavcodec/h264_ps.c libavcodec/interplayvideo.c libavcodec/pthread.c libavcodec/x86/dsputil_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: dsputil: prettyprint gcc inline asmDiego Biurrun2012-03-251-1295/+1310
| |
| * x86: K&R prettyprinting cosmetics for dsputil_mmx.cDiego Biurrun2012-03-251-773/+1049
| |
| * x86: conditionally compile H.264 QPEL optimizationsDiego Biurrun2012-03-252-6/+14
| |
| * dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks.Diego Biurrun2012-03-251-4/+12
| | | | | | | | This makes them safe to use in non-fully braced if-blocks and similar.
* | Fix linking without yasm.Carl Eugen Hoyos2012-03-241-4/+6
| |
OpenPOWER on IntegriCloud