author | James Darnley <james.darnley@gmail.com> | 2013-03-16 21:42:24 +0100 |
---|---|---|
committer | Michael Niedermayer <michaelni@gmx.at> | 2013-03-16 22:32:54 +0100 |
commit | 0a5814c9ba23f510fd8218c6677cc9b878d542c6 (patch) | |
tree | 1fef9a2be506803474b1b3d50e4a1f36bb71a251 /ffserver.c | |
parent | 17e7b495013de644dc49e61673846d6c0c1bde47 (diff) | |
yadif: x86 assembly for 9 to 14-bit samples
These smaller samples do not need to be unpacked to double words,
which allows the code to process more pixels per iteration (still 2 in
MMX but 6 in SSE2). It also avoids emulating the double word
instructions that are missing on older instruction sets.
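The lane arithmetic behind the paragraph above can be sketched as follows. The register widths (64-bit MMX, 128-bit SSE2) are architectural facts. The headroom test, which checks whether the sum of two samples fits in a signed 16-bit word, is only an assumed stand-in for the filter's intermediate arithmetic and is not taken from the patch. The helper names `lanes` and `pair_sum_fits_signed_word` are hypothetical.

```python
# Sketch only: not the yadif code. Register widths are architectural facts;
# the "two-sample sum" test is an assumption used to illustrate headroom.

REGISTER_BITS = {"mmx": 64, "sse2": 128}


def lanes(register_bits: int, lane_bits: int) -> int:
    """Number of packed integer lanes of lane_bits that fit in a register."""
    return register_bits // lane_bits


def pair_sum_fits_signed_word(sample_bits: int) -> bool:
    """True if the sum of two unsigned samples fits in a signed 16-bit lane."""
    max_sample = (1 << sample_bits) - 1
    return 2 * max_sample <= (1 << 15) - 1


if __name__ == "__main__":
    for isa, bits in REGISTER_BITS.items():
        print(f"{isa}: {lanes(bits, 16)} word lanes, {lanes(bits, 32)} double word lanes")
    for depth in (9, 10, 12, 14, 15, 16):
        print(f"{depth}-bit samples, pair sum fits a signed word: {pair_sum_fits_signed_word(depth)}")
```

Staying in words doubles the raw lane count compared with double words. The per-iteration pixel counts quoted in the commit message (2 and 6) are lower than the raw word lane counts (4 and 8); the message does not say why, so the sketch does not try to model that.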
As with the previous code for 16-bit samples, this has been tested on
an Athlon64 and a Core2Quad.
Athlon64:
1809275 decicycles in C, 32718 runs, 50 skips
911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster
495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
Core2Quad:
921363 decicycles in C, 32756 runs, 12 skips
486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster
293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster
284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster
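The "x faster" figures above appear to be the C timing divided by the timing of each assembly version, rounded to one decimal place. A minimal sketch that recomputes them from the decicycle counts listed above (the dictionary layout and the `speedups` helper are hypothetical, introduced only for illustration):

```python
# Decicycle counts copied from the benchmark lines in the commit message.
TIMINGS = {
    "Athlon64": {"C": 1809275, "mmx": 911675, "sse2": 495284},
    "Core2Quad": {"C": 921363, "mmx": 486537, "sse2": 293296, "ssse3": 284910},
}


def speedups(timings: dict) -> dict:
    """Return the C baseline divided by each other timing, rounded to 0.1."""
    baseline = timings["C"]
    return {
        name: round(baseline / cycles, 1)
        for name, cycles in timings.items()
        if name != "C"
    }


if __name__ == "__main__":
    for cpu, timings in TIMINGS.items():
        for name, factor in speedups(timings).items():
            print(f"{cpu} {name}: {factor:.1f}x faster than C")
```

The recomputed ratios agree with the figures quoted in the message for both machines.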
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>