ffmpeg-streaming - Raptor Engineering's fork of FFmpeg with streaming enhancements https://git.ffmpeg.org/ffmpeg.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	v210enc: Add SIMD optimised 8-bit and 10-bit encoders	Kieran Kunhya	2014-11-26	1	-0/+5
\| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avutil/lls: Make unchanged function arguments const	Michael Niedermayer	2014-09-28	1	-3/+3
\| \| \| \| \|	Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avutil/x86/cpu: fix cpuid sub-leaf selection	lvqcl	2014-09-27	1	-1/+1
\| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags	Henrik Gramner	2014-09-05	1	-19/+22
\| \| \| \| \| \|	Previously there was a limit of two cpuflags. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86inc: Make ym# behave the same way as xm#	Henrik Gramner	2014-09-05	1	-4/+4
\| \| \| \| \| \|	This makes more sense for future implementations of templates with zmm registers. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86inc: free up variable name "n" in global namespace	Loren Merritt	2014-09-05	1	-9/+9
\| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avutil/pixelutils: faster pixelutils_sad_16x16	Clément Bœsch	2014-08-23	1	-5/+11
\| \| \| \| \| \|	501 to 439 decicycles. See 45c7f3997ea11c3d1007b2126b1c0049a8c27105.
*	avutil/pixelutils: faster pixelutils_sad_[au]_16x16	Clément Bœsch	2014-08-23	1	-5/+9
\| \| \| \| \| \| \| \| \| \|	~560 → ~500 decicycles. This follows the comments from Michael in https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html. Using 2 registers for the accumulator didn't help. On the other hand, some re-ordering between the movs and psadbw allowed going from ~538 to ~500.
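For context on what the two pixelutils commits above are speeding up: a 16x16 SAD is the sum of absolute differences between two 16x16 blocks of 8-bit pixels. The scalar C sketch below only illustrates that computation; the function name and signature are assumptions made for this example and are not taken from the libavutil source. On SSE2, `psadbw` produces partial sums of absolute differences for 8 bytes per 64-bit lane, which is why the assembly versions can handle a full 16-pixel row per instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference for a 16x16 sum of absolute differences.
 * Hypothetical name and signature, used only for illustration. */
static int sad_16x16_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        /* One row of 16 pixels; this is the work a single 16-byte psadbw covers. */
        for (int x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}
```

A scalar version like this is mainly useful as a correctness reference when comparing against an optimised implementation.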
*	drop LLS1, rename LLS2 to LLS	Michael Niedermayer	2014-08-09	2	-9/+9
\| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	avutil: add pixelutils API	Clément Bœsch	2014-08-05	4	-0/+243
\|
*	x86/hevc_deblock: improve 8bit transpose store macros	James Almer	2014-08-03	1	-0/+9
\| \| \| \| \| \| \|	Up to four instructions less depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	x86/hevc_idct: replace old and unused idct functions	James Almer	2014-07-26	1	-1/+3
\|	Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
\|	Benchmarks on an Intel Core i5-4200U (cycles):
\|	idct8x8_dc: SSE2 22, MMXEXT 26, C 57
\|	idct16x16_dc: AVX2 27, SSE2 32, C 249
\|	idct32x32_dc: AVX2 62, SSE2 126, C 1375
\|	Signed-off-by: James Almer <jamrial@gmail.com>
\|	Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
\|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
*	Merge commit '79793f833784121d574454af4871866576c0749d'	Michael Niedermayer	2014-07-01	2	-2/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	* commit '79793f833784121d574454af4871866576c0749d': Update Fiona's name in copyright statements. Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	Update Fiona's name in copyright statements.	Diego Biurrun	2014-07-01	2	-2/+2
\| \|
* \|	x86util: add and use RSHIFT/LSHIFT macros	Christophe Gisquet	2014-06-15	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/float_dsp: add missing femms	James Almer	2014-06-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was lost during the port. Should fix fate on 3dnowext machines. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/float_dsp: port vector_fmul_window to yasm	James Almer	2014-06-08	2	-73/+63
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/vp9: initial AVX2 intra_pred	James Almer	2014-06-08	1	-0/+1
\| \|	tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz:
\| \|	1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
\| \|	439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
\| \|	3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
\| \|	2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
\| \|	1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
\| \|	717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
\| \|	2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
\| \|	2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
\| \|	3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
\| \|	2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
\| \|	1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
\| \|	922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
\| \|	Signed-off-by: James Almer <jamrial@gmail.com>
\| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: hpeldsp: better factorization	Christophe Gisquet	2014-05-29	1	-1/+9
\| \| \| \| \| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}	James Almer	2014-05-28	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	inline asm: fix arrays as named constraints.	Matt Oliver	2014-05-07	1	-0/+6
\| \| \| \| \| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/float_dsp: remove duplicated code from vector_dmul_scalar	James Almer	2014-04-19	1	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the xm# and ym# aliases as they remain in sync with m# after a SWAP. No actual changes to the assembly. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: move horizontal add macros to x86util	James Almer	2014-04-17	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/float_dsp: unroll loop in vector_fmac_scalar	James Almer	2014-04-16	1	-18/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	~6% faster SSE2 performance. AVX/FMA3 are unaffected. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/float_dsp: use SWAP in vector_fmac_scalar Win64	James Almer	2014-04-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mova is unnecessary Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/cpu: check for OS support before enabling AVX2	James Almer	2014-03-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	AV_CPU_FLAG_AVX is enabled at this point only if there's OS support. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
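The reasoning in the commit above can be sketched as follows: if the AVX flag is only ever set when the OS saves the YMM state, then gating AVX2 on that flag (rather than on the raw CPUID bit alone) makes AVX2 inherit the OS check. The flag values, function name, and parameters below are hypothetical and exist only for this illustration; they are not the definitions from libavutil/cpu.h or libavutil/x86/cpu.c.

```c
/* Hypothetical flag bits for this sketch only. */
#define SKETCH_FLAG_AVX  (1u << 0)
#define SKETCH_FLAG_AVX2 (1u << 1)

/* Inputs model what CPUID and the OS report; all names are assumptions. */
static unsigned sketch_cpu_flags(int cpuid_has_avx, int cpuid_has_avx2,
                                 int os_saves_ymm)
{
    unsigned flags = 0;

    /* AVX is reported only when the OS preserves the YMM registers. */
    if (cpuid_has_avx && os_saves_ymm)
        flags |= SKETCH_FLAG_AVX;

    /* AVX2 is gated on the AVX flag, so it is never set without OS support. */
    if (cpuid_has_avx2 && (flags & SKETCH_FLAG_AVX))
        flags |= SKETCH_FLAG_AVX2;

    return flags;
}
```

With this ordering, a CPU that advertises AVX2 under an OS that does not save the YMM state ends up with neither flag set.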
* \|	Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported.	Matt Oliver	2014-03-18	1	-1/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is part of the patch-set for Intel C inline asm on Windows support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3	James Almer	2014-03-13	2	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	~7% faster than AVX Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	avutil/timer: Fix units for x86 after c708b5403346255ea5adc776645616cc7c61f078	Michael Niedermayer	2014-03-09	1	-0/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: Move XOP emulation to x86util	James Almer	2014-02-24	2	-19/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer	2014-02-23	1	-5/+4
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* qatar/master: x86: add detection for Bit Manipulation Instruction sets Conflicts: libavutil/x86/cpu.c See: 0bc3de19ffe296254f214dc7615e624d8e401bcb Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: add detection for Bit Manipulation Instruction sets	James Almer	2014-02-23	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com>
* \|	Merge commit '1b932eb1508f550fac9e911923a0383efda53aa3'	Michael Niedermayer	2014-02-23	1	-1/+1
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '1b932eb1508f550fac9e911923a0383efda53aa3': x86: add detection for FMA3 instruction set Conflicts: configure libavutil/cpu.h libavutil/x86/cpu.c See: a2af8eddab75f1eac712411e4dde89823c0845e8 Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: add detection for FMA3 instruction set	James Almer	2014-02-23	2	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com>
\| *	x86: add missing XOP checks and macros	James Almer	2014-02-23	1	-0/+3
\| \| \| \| \| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
\| *	x86: float dsp: unroll SSE versions	Christophe Gisquet	2014-02-20	1	-16/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vector_fmul and vector_fmac_scalar are guaranteed to be able to process batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit. 299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by: Janne Grunau <janne-libav@jannau.net>
\| *	x86inc: Speed up assembling with Yasm	Loren Merritt	2014-01-26	1	-23/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Work around Yasm's inefficiency with handling large numbers of variables in the global scope. Signed-off-by: Diego Biurrun <diego@biurrun.de>
* \|	x86: add detection for Bit Manipulation Instruction sets	James Almer	2014-02-22	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: add detection for FMA3 instruction set	James Almer	2014-02-22	2	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: float dsp: unroll SSE versions	Christophe Gisquet	2014-02-15	1	-16/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vector_fmul and vector_fmac_scalar are guaranteed to be able to process batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit. 299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
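The unrolling described above can be illustrated in plain C. This is a sketch, not the yasm code from the commit: the function names are made up for this example, and the only property borrowed from the commit message is that the length is a multiple of 16. The operation itself is a multiply-accumulate with a scalar, `dst[i] += src[i] * mul`.

```c
/* Straightforward loop: one element per iteration. Hypothetical name. */
static void fmac_scalar_plain(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Unrolled variant: each outer iteration handles a batch of 16 floats,
 * i.e. four 4-float SSE registers' worth of data instead of two.
 * Assumes len is a multiple of 16, as the commit message states. */
static void fmac_scalar_batch16(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i += 16) {
        for (int j = 0; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Both functions compute the same result; the point of the larger batch is fewer loop-counter updates and branches per element processed.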
* \|	x86inc: Extend FMA_INSTR functionality	James Almer	2014-02-13	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support the cases where the first and last operand of the XOP instruction are the same. Also add vpmacsdql emulation. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86: add missing XOP checks and macros	James Almer	2014-02-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	x86inc: speed up compilation with yasm	Loren Merritt	2014-01-18	1	-23/+23
\| \| \| \| \| \| \| \| \| \|	Work around yasm's inefficiency with handling large numbers of variables in the global scope.
* \|	rename new lls code to lls2 to avoid conflict with the old which has a different ABI	Michael Niedermayer	2013-11-17	2	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also remove the failed attempt at a compatibility layer; the code simply cannot work. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	avutil: rename lls to lls2	Michael Niedermayer	2013-11-17	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4'	Michael Niedermayer	2013-10-26	1	-2/+3
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4': libavutil: x86: Add AVX2 capable CPU detection. Conflicts: libavutil/cpu.c libavutil/cpu.h libavutil/x86/cpu.c See: 865b70bc5d1cf37ec6d6cb729a69dda2cca28bd5 Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	libavutil: x86: Add AVX2 capable CPU detection.	Kieran Kunhya	2013-10-25	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Patch based on x264's AVX2 detection Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* \|	Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection	Kieran Kunhya	2013-10-26	2	-0/+10
\| \| \| \| \| \| \| \|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* \|	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer	2013-10-14	1	-0/+11
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* qatar/master: x86: more AVX2 framework Merged-by: Michael Niedermayer <michaelni@gmx.at>
\| *	x86: more AVX2 framework	Jason Garrett-Glaser	2013-10-14	1	-0/+11
\| \| \| \| \| \| \| \|	Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>