| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Tested using a simple command (apply edge enhance):
./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
-vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
-an -vframes 1000 -f null /dev/null
The fps increases from 151 to 270 on my local machine.
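For readers unfamiliar with the filter string above, the per-pixel work can be sketched roughly as follows. This is a minimal illustration, not the filter's source: it assumes each plane gets a 3x3 matrix, and it reads the trailing `5:1:1:1:0:128:128:128` as per-plane multipliers followed by per-plane biases. The helper names `clip_u8` and `convolve3x3_pixel` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an integer to the 8-bit range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Apply a 3x3 kernel to the interior pixel (x, y) of an 8-bit plane.
 * Hypothetical helper: the sum of products is scaled by rdiv, offset by
 * bias, rounded, and clamped. Border handling is left to the caller.
 */
static uint8_t convolve3x3_pixel(const uint8_t *plane, int stride, int x, int y,
                                 const int kernel[9], float rdiv, float bias)
{
    int sum = 0;

    assert(x >= 1 && y >= 1); /* caller must stay off the border */
    for (int ky = -1; ky <= 1; ky++)
        for (int kx = -1; kx <= 1; kx++)
            sum += kernel[(ky + 1) * 3 + (kx + 1)] *
                   plane[(y + ky) * stride + (x + kx)];

    return clip_u8((int)(sum * rdiv + bias + 0.5f));
}
```

With the matrix `0 0 0 -1 1 0 0 0 0`, the sum reduces to the centre pixel minus its left neighbour, so under these assumptions the first plane is scaled by 5 and the remaining planes are re-centred around 128.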
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
| |
Fixes compilation on x86_32
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The horizontal pass gets ~2x performance with the patch
when run single-threaded.
Tested overall performance using these commands (AVX2 enabled):
./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
For single thread, the fps improves from 43 to 60, about 40%.
For multi-thread, the fps improves from 110 to 130, about 20%.
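The rounded percentages above can be checked with a short calculation. The function name `improvement_percent` is hypothetical and only illustrates the arithmetic.

```c
#include <assert.h>

/* Relative improvement in percent when going from `before` to `after` fps. */
static double improvement_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Calling `improvement_percent(43, 60)` gives roughly 39.5 and `improvement_percent(110, 130)` roughly 18.2, consistent with the quoted "about 40%" and "about 20%".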
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
|
| |
|
|
|
|
|
|
| |
Fixes compilation with old yasm versions.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
fcmul_add_c: 1228.8
fcmul_add_sse3: 334.3
fcmul_add_avx: 186.3
Tested on a Core i5 4460 @ 3.2GHz
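As context for the numbers above, a scalar complex multiply-accumulate of the kind the function name suggests might look like the sketch below. This is an assumption, not the FFmpeg source: it presumes interleaved (re, im) floats, and the name `complex_mul_add` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * sum += t * c for `len` complex numbers stored as interleaved (re, im)
 * floats. Illustrative only; the real routine may differ in layout and
 * in how it handles the tail of the buffer.
 */
static void complex_mul_add(float *sum, const float *t, const float *c,
                            ptrdiff_t len)
{
    assert(len >= 0);
    for (ptrdiff_t n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        sum[2 * n]     += tre * cre - tim * cim; /* real part */
        sum[2 * n + 1] += tre * cim + tim * cre; /* imaginary part */
    }
}
```

A loop like this has no dependency between iterations, which is what makes it a natural candidate for SSE3 and AVX versions.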
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
ff_fcmul_add_sse3() is now identical to the C version.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
|
|
|
|
| |
After adding field type management to the common yadif logic, we can
remove the duplicate copy of that logic from bwdif.
|
|
|
|
|
|
|
|
|
| |
frame
Also add SIMD which works on lines because it is faster than calculating it on
8x8 blocks using pixelutils.
Signed-off-by: Marton Balint <cus@passwd.hu>
|
|
|
|
|
|
| |
They are yet to be supported.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
|
|
|
| |
Specifically for the yuv444, yuv422 and yuv420 formats, when the main stream has no alpha and the alpha
is straight.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
| |
|
|
|
|
|
|
|
|
| |
grainextract
grainmerge
average
extremity
negation
|
|
|
|
| |
duplication between the 8-bit and 16-bit versions
|
|
|
|
| |
difference for SSE and AVX2 (x86_64)
|
| |
|
| |
|
|
|
|
|
|
| |
func except divide
and optimize average, grainextract, multiply, screen, grainmerge
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Blend function speedups on x86_64 Core i5 4460:
ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none
C: 447548411 decicycles in Blend, 2048 runs, 0 skips
SSSE3: 130020087 decicycles in Blend, 2048 runs, 0 skips
AVX2: 128508221 decicycles in Blend, 2048 runs, 0 skips
ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none
C: 228932745 decicycles in Blend, 2048 runs, 0 skips
SSE4: 123357781 decicycles in Blend, 2048 runs, 0 skips
AVX2: 121215353 decicycles in Blend, 2048 runs, 0 skips
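For orientation, a blend of two 8-bit lines can be sketched as a fixed-point weighted average. This is a generic illustration under stated assumptions, not the filter's code: the 8-bit weight scale (weights summing to 256) and the name `blend_line8` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Blend two 8-bit lines: dst = a * (256 - weight_b) + b * weight_b,
 * rounded and scaled back to 8 bits. The weight scale is an assumption.
 */
static void blend_line8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        int width, int weight_b)
{
    const int weight_a = 256 - weight_b;

    assert(weight_b >= 0 && weight_b <= 256);
    for (int x = 0; x < width; x++)
        dst[x] = (uint8_t)((a[x] * weight_a + b[x] * weight_b + 128) >> 8);
}
```

Each output pixel depends only on the two input pixels at the same position, so the loop maps directly onto SIMD lanes.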
Signed-off-by: Marton Balint <cus@passwd.hu>
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commits 1a5865b6dcc97754a1d7eedc130fb58237d2a715 and
8fb1d63d919286971b8e6afad372730d6d6f25c8.
They made fate interlace tests fail when AVX2 was used.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
based on patch by Paul B Mahol
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
arg for ff_interlace_init_x86
|
|
|
|
|
|
| |
unaligned data in low_pass complex
related to ticket 6491
|
|
|
|
| |
ticket 6491
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
|
|
|
| |
Fixes building with yasm
Tested-by: stevenliu
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
| |
|
|
|
|
| |
version
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
x86util: Port all macros to cpuflags
See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77
Merged-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
None of them are specific to the YASM assembler.
|
| |
| |
| |
| |
| | |
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
|
| |
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The complex vertical low-pass filter slightly over-sharpens the picture. This becomes visible when several transcodings are cascaded and the error compounds, e.g. over several generations of HD->SD, SD->HD.
To prevent this behaviour, the destination pixel must not exceed the source pixel when the average of the pixels above and below is less than the source pixel, and vice versa.
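The rule can be sketched per pixel as follows. This is a minimal illustration of the clamp described here, not the patch itself: `filtered` stands for whatever the low-pass filter produced, and the name `clamp_oversharpen` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Limit the filtered value so it never overshoots the source pixel in the
 * direction away from its vertical neighbours. Comparing above + below
 * with 2 * src avoids computing the average explicitly.
 */
static uint8_t clamp_oversharpen(uint8_t src, uint8_t above, uint8_t below,
                                 uint8_t filtered)
{
    const int neighbours = above + below;
    const int src2       = 2 * src;

    if (neighbours < src2 && filtered > src)
        return src; /* neighbours darker: result must not exceed src */
    if (neighbours > src2 && filtered < src)
        return src; /* neighbours brighter: result must not fall below src */
    return filtered;
}
```

When the neighbours average exactly to the source value, the sketch leaves the filtered result untouched.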
Tested and approved in a visual transcoding cascade test by video professionals.
SSIM/PSNR test with the first generation of an HD->SD file as a reference against the 6th generation (3 x SD->HD, HD->SD):
Results without the patch:
SSIM Y:0.956508 (13.615881) U:0.991601 (20.757750) V:0.993004 (21.551382) All:0.974405 (15.918463)
PSNR y:31.838009 u:48.424280 v:48.962711 average:34.759466 min:31.699297 max:40.857847
Results with the patch:
SSIM Y:0.970051 (15.236232) U:0.991883 (20.905857) V:0.993174 (21.658049) All:0.981290 (17.279202)
PSNR y:34.412108 u:48.504454 v:48.969496 average:37.264644 min:34.310637 max:42.373392
Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| | |
grainextract
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|