| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Some formats use longer names than 12.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
| |
In libswcale/tests/swcale.c, the function fileTest() calls sscanf in
an argument of "%12s" on character srcStr[] and dstStr[], which are
only 12 bytes. So, if the input string is 12 characters, a
terminating null byte can be written past the end of these arrays.
This bug was found by cppcheck.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
| |
Lauri had asked me what the semi planar formats were and that reminded
me that we could add it to pixdesc_query so we know exactly what the
list is.
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.
Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
-s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw
./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
-s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw
32-bit mul, power8 only
2x speedup for hScale8To19_vsx (x86 SSE2 is 2.37):
30896 UNITS in hscale, 8192 runs, 0 skips
63956 UNITS in hscale, 8192 runs, 0 skips
2.06 for hScale16To15_vsx:
30531 UNITS in hscale, 8192 runs, 0 skips
63161 UNITS in hscale, 8192 runs, 0 skips
|
| |
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
-s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw
2.26 speedup (x86 SSE2 is 2.32):
23772 UNITS in hscale, 4096 runs, 0 skips
53862 UNITS in hscale, 4096 runs, 0 skips
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw
4.27 speedup for hyscale_fast:
24796 UNITS in hyscale_fast, 4096 runs, 0 skips
5797 UNITS in hyscale_fast, 4096 runs, 0 skips
4.48 speedup for hcscale_fast:
19911 UNITS in hcscale_fast, 4095 runs, 1 skips
4437 UNITS in hcscale_fast, 4096 runs, 0 skips
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
~2x speedup:
rgb24
24431 UNITS in yuv2packed2, 16384 runs, 0 skips
13783 UNITS in yuv2packed2, 16383 runs, 1 skips
bgr24
24396 UNITS in yuv2packed2, 16384 runs, 0 skips
14059 UNITS in yuv2packed2, 16384 runs, 0 skips
rgba
26815 UNITS in yuv2packed2, 16383 runs, 1 skips
12797 UNITS in yuv2packed2, 16383 runs, 1 skips
bgra
27060 UNITS in yuv2packed2, 16384 runs, 0 skips
13138 UNITS in yuv2packed2, 16384 runs, 0 skips
argb
26998 UNITS in yuv2packed2, 16384 runs, 0 skips
12728 UNITS in yuv2packed2, 16381 runs, 3 skips
bgra
26651 UNITS in yuv2packed2, 16384 runs, 0 skips
13124 UNITS in yuv2packed2, 16384 runs, 0 skips
This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
~6.4x speedup:
rgb24
214278 UNITS in yuv2packedX, 16384 runs, 0 skips
33249 UNITS in yuv2packedX, 16384 runs, 0 skips
bgr24
214616 UNITS in yuv2packedX, 16384 runs, 0 skips
33233 UNITS in yuv2packedX, 16384 runs, 0 skips
rgba
214517 UNITS in yuv2packedX, 16384 runs, 0 skips
33271 UNITS in yuv2packedX, 16384 runs, 0 skips
bgra
214973 UNITS in yuv2packedX, 16384 runs, 0 skips
33397 UNITS in yuv2packedX, 16384 runs, 0 skips
argb
214613 UNITS in yuv2packedX, 16384 runs, 0 skips
33310 UNITS in yuv2packedX, 16384 runs, 0 skips
bgra
214637 UNITS in yuv2packedX, 16384 runs, 0 skips
33330 UNITS in yuv2packedX, 16384 runs, 0 skips
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
~4x speedup:
rgb24
52763 UNITS in yuv2packed2, 16384 runs, 0 skips
13453 UNITS in yuv2packed2, 16384 runs, 0 skips
bgr24
53144 UNITS in yuv2packed2, 16384 runs, 0 skips
13616 UNITS in yuv2packed2, 16384 runs, 0 skips
rgba
52796 UNITS in yuv2packed2, 16384 runs, 0 skips
12904 UNITS in yuv2packed2, 16384 runs, 0 skips
bgra
52732 UNITS in yuv2packed2, 16384 runs, 0 skips
13262 UNITS in yuv2packed2, 16384 runs, 0 skips
argb
52661 UNITS in yuv2packed2, 16384 runs, 0 skips
12879 UNITS in yuv2packed2, 16384 runs, 0 skips
bgra
52662 UNITS in yuv2packed2, 16384 runs, 0 skips
12932 UNITS in yuv2packed2, 16384 runs, 0 skips
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
1.8-2.3x speedup:
rgb24
18192 UNITS in yuv2packed1, 32767 runs, 1 skips
9983 UNITS in yuv2packed1, 32760 runs, 8 skips
bgr24
18665 UNITS in yuv2packed1, 32766 runs, 2 skips
9925 UNITS in yuv2packed1, 32763 runs, 5 skips
rgba
20239 UNITS in yuv2packed1, 32767 runs, 1 skips
8794 UNITS in yuv2packed1, 32759 runs, 9 skips
bgra
20354 UNITS in yuv2packed1, 32768 runs, 0 skips
8770 UNITS in yuv2packed1, 32761 runs, 7 skips
argb
20185 UNITS in yuv2packed1, 32768 runs, 0 skips
8761 UNITS in yuv2packed1, 32761 runs, 7 skips
bgra
20360 UNITS in yuv2packed1, 32766 runs, 2 skips
8759 UNITS in yuv2packed1, 32764 runs, 4 skips
This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
7.2x speedup:
yuyv422
126354 UNITS in yuv2packedX, 16384 runs, 0 skips
16383 UNITS in yuv2packedX, 16382 runs, 2 skips
yvyu422
117669 UNITS in yuv2packedX, 16384 runs, 0 skips
16271 UNITS in yuv2packedX, 16379 runs, 5 skips
uyvy422
117310 UNITS in yuv2packedX, 16384 runs, 0 skips
16226 UNITS in yuv2packedX, 16382 runs, 2 skips
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
5.1x speedup:
yuyv422
19339 UNITS in yuv2packed2, 16384 runs, 0 skips
3718 UNITS in yuv2packed2, 16383 runs, 1 skips
yvyu422
19438 UNITS in yuv2packed2, 16384 runs, 0 skips
3800 UNITS in yuv2packed2, 16380 runs, 4 skips
uyvy422
19128 UNITS in yuv2packed2, 16384 runs, 0 skips
3721 UNITS in yuv2packed2, 16380 runs, 4 skips
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
-s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
15.3x speedup:
yuyv422
14513 UNITS in yuv2packed1, 32768 runs, 0 skips
949 UNITS in yuv2packed1, 32767 runs, 1 skips
yvyu422
14516 UNITS in yuv2packed1, 32767 runs, 1 skips
943 UNITS in yuv2packed1, 32767 runs, 1 skips
uyvy422
14530 UNITS in yuv2packed1, 32767 runs, 1 skips
941 UNITS in yuv2packed1, 32766 runs, 2 skips
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
| |
2-multiple, transition of nv12 to u/v planes is not completed.
Signed-off-by: Dong, Jerry <jerry.dong@intel.com>
Signed-off-by: Decai Lin <decai.lin@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
-s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
This uses 32-bit mul, so POWER8 only.
The following output formats get about 4.5x speedup:
rgb24
39980 UNITS in yuv2packed1, 32768 runs, 0 skips
8774 UNITS in yuv2packed1, 32768 runs, 0 skips
bgr24
40069 UNITS in yuv2packed1, 32768 runs, 0 skips
8772 UNITS in yuv2packed1, 32766 runs, 2 skips
rgba
39759 UNITS in yuv2packed1, 32768 runs, 0 skips
8681 UNITS in yuv2packed1, 32767 runs, 1 skips
bgra
39729 UNITS in yuv2packed1, 32768 runs, 0 skips
8696 UNITS in yuv2packed1, 32766 runs, 2 skips
argb
39766 UNITS in yuv2packed1, 32768 runs, 0 skips
8672 UNITS in yuv2packed1, 32766 runs, 2 skips
bgra
39784 UNITS in yuv2packed1, 32768 runs, 0 skips
8659 UNITS in yuv2packed1, 32767 runs, 1 skips
|
|
|
|
| |
In this function, the exact same clamping happens both in the if and unconditionally.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -
9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.
Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.
With TIMER_REPORT skips disabled:
yuv420p9le
12412 UNITS in planarX, 131072 runs, 0 skips
73136 UNITS in planarX, 131072 runs, 0 skips
yuv420p9be
12481 UNITS in planarX, 131072 runs, 0 skips
73410 UNITS in planarX, 131072 runs, 0 skips
yuv420p10le
12322 UNITS in planarX, 131072 runs, 0 skips
72546 UNITS in planarX, 131072 runs, 0 skips
yuv420p10be
12291 UNITS in planarX, 131072 runs, 0 skips
72935 UNITS in planarX, 131072 runs, 0 skips
yuv420p12le
12316 UNITS in planarX, 131072 runs, 0 skips
72708 UNITS in planarX, 131072 runs, 0 skips
yuv420p12be
12319 UNITS in planarX, 131072 runs, 0 skips
72577 UNITS in planarX, 131072 runs, 0 skips
yuv420p14le
12259 UNITS in planarX, 131072 runs, 0 skips
72516 UNITS in planarX, 131072 runs, 0 skips
yuv420p14be
12440 UNITS in planarX, 131072 runs, 0 skips
72962 UNITS in planarX, 131072 runs, 0 skips
yuv420p16le
10548 UNITS in planarX, 131072 runs, 0 skips
73429 UNITS in planarX, 131072 runs, 0 skips
yuv420p16be
10634 UNITS in planarX, 131072 runs, 0 skips
150959 UNITS in planarX, 131072 runs, 0 skips
Signed-off-by: Lauri Kasanen <cand@gmx.com>
|
|
|
|
|
|
|
| |
ff_yuv2rgb_c_init_tables()
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function wouldn't benefit from VSX instructions, so I put it
under altivec.
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -
3743 UNITS in planar1, 65495 runs, 41 skips
-cpuflags 0
23511 UNITS in planar1, 65530 runs, 6 skips
grayf32be
4647 UNITS in planar1, 65449 runs, 87 skips
-cpuflags 0
28608 UNITS in planar1, 65530 runs, 6 skips
The native speedup is 6.28133, and the bswapping one 6.15623.
Fate passes, each format tested with an image to video conversion.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -
2120 UNITS in planar1, 65393 runs, 143 skips
-cpuflags 0
19157 UNITS in planar1, 65512 runs, 24 skips
9.03632 speedup, 16be similarly.
Fate passes, each format tested with an image to video conversion.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
-f null -vframes 100 -v error -nostats -
Speedups:
yuv2plane1_9BE_vsx 11.2042
yuv2plane1_9LE_vsx 11.156
yuv2plane1_10BE_vsx 9.89428
yuv2plane1_10LE_vsx 10.3637
yuv2plane1_12BE_vsx 9.71923
yuv2plane1_12LE_vsx 11.0404
yuv2plane1_14BE_vsx 10.1763
yuv2plane1_14LE_vsx 11.2728
Fate passes, each format tested with an image to video conversion.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
| |
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -
1158 UNITS in planar1, 65528 runs, 8 skips
-cpuflags 0
19082 UNITS in planar1, 65533 runs, 3 skips
16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.
Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
is used for packed and planar format
|
| |
|
| |
|
| |
|
|
|
|
| |
suggested by Carl Eugen Hoyos
|
|
|
|
| |
inline asm version
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Currently float are converted to 16b uint in input part
using src depth (32 bits) in hScale16To19 and hScale16to15,
make an invalid shift for the data
So shift the value when using float input
like 16 bpc uint.
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
| |
Fixes the following warnings:
In file included from libswscale/rgb2rgb.c:128:0:
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3210_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3012_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_1230_c' defined but not used
|
|
|
|
| |
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
|
|
| |
and checkasm test
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PSEUDOPAL pixel formats are not paletted, but carried a palette with the
intention of allowing code to treat unpaletted formats as paletted. The
palette simply mapped the byte values to the resulting RGB values,
making it some sort of LUT for RGB conversion.
It was used for 1 byte formats only: RGB4_BYTE, BGR4_BYTE, RGB8, BGR8,
GRAY8. The first 4 are awfully obscure, used only by some ancient bitmap
formats. The last one, GRAY8, is more common, but its treatment is
grossly incorrect. It considers full range GRAY8 only, so GRAY8 coming
from typical Y video planes was not mapped to the correct RGB values.
This cannot be fixed, because AVFrame.color_range can be freely changed
at runtime, and there is nothing to ensure the pseudo palette is
updated.
Also, nothing actually used the PSEUDOPAL palette data, except xwdenc
(trivially changed in the previous commit). All other code had to treat
it as a special case, just to ignore or to propagate palette data.
In conclusion, this was just a very strange old mechnaism that has no
real justification to exist anymore (although it may have been nice and
useful in the past). Now it's an artifact that makes the API harder to
use: API users who allocate their own pixel data have to be aware that
they need to allocate the palette, or FFmpeg will crash on them in
_some_ situations. On top of this, there was no API to allocate the
pseuo palette outside of av_frame_get_buffer().
This patch not only deprecates AV_PIX_FMT_FLAG_PSEUDOPAL, but also makes
the pseudo palette optional. Nothing accesses it anymore, though if it's
set, it's propagated. It's still allocated and initialized for
compatibility with API users that rely on this feature. But new API
users do not need to allocate it. This was an explicit goal of this
patch.
Most changes replace AV_PIX_FMT_FLAG_PSEUDOPAL with FF_PSEUDOPAL. I
first tried #ifdefing all code, but it was a mess. The FF_PSEUDOPAL
macro reduces the mess, and still allows defining FF_API_PSEUDOPAL to 0.
Passes FATE with FF_API_PSEUDOPAL enabled and disabled. In addition,
FATE passes with FF_API_PSEUDOPAL set to 1, but with allocation
functions manually changed to not allocating a palette.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vanilla clang supports altmacro since clang 5.0, and thus doesn't
require gas-preprocessor for building the arm assembly any longer.
However, the built-in assembler doesn't support .dn directives.
This readds checks that were removed in d7320ca3ed10f0d, when
the last usage of .dn directives within libav were removed.
Alternatively, the assembly could be rewritten to not use the
.dn directive, making it available to clang users.
Signed-off-by: Martin Storsjö <martin@martin.st>
|