Using AVX Intrinsics in gstreamer plugins

rayrapetyan · October 26, 2023, 4:24pm

I was looking at GStreamer pipeline flame charts and found the video_converter_matrix8_table function from the videoconvert plugin consuming a significant amount of CPU (~5% of total).
The function itself is a very simple matrix computation, and I believe it can be further improved using AVX instructions. I was unable to find any use of AVX within the project, and I’m curious whether this is due to a specific policy or whether there are reasons against using SIMD instructions, possibly related to portability concerns.

slomo · October 27, 2023, 6:16am

There’s no policy for that. As long as there’s runtime CPU feature detection, SIMD assembly optimizations wouldn’t be a problem at all. You can find some e.g. in the audio converter for resampling.
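
To make "runtime CPU feature detection" concrete, here is a minimal sketch of the usual pattern: keep a portable scalar implementation, build the SIMD variant separately, and pick one through a function pointer at startup. It assumes GCC or Clang on x86 and uses the compiler builtin `__builtin_cpu_supports` as a stand-in for ORC's detection. The names `add_sat_scalar`, `add_sat_avx2` and `pick_add_sat` are made up for illustration and are not GStreamer or ORC API.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Hypothetical kernel signature: dst[i] = saturate(dst[i] + src[i]). */
typedef void (*add_sat_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback, always available. */
static void add_sat_scalar(uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    unsigned s = (unsigned)dst[i] + src[i];
    dst[i] = s > 255 ? 255 : (uint8_t)s;
  }
}

#ifdef HAVE_X86_DISPATCH
/* The target attribute lets this one function use AVX2 even when the
 * rest of the file is built for a baseline CPU. */
static __attribute__((target("avx2")))
void add_sat_avx2(uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i = 0;
  for (; i + 32 <= n; i += 32) {
    __m256i a = _mm256_loadu_si256((const __m256i *)(dst + i));
    __m256i b = _mm256_loadu_si256((const __m256i *)(src + i));
    _mm256_storeu_si256((__m256i *)(dst + i), _mm256_adds_epu8(a, b));
  }
  /* Fewer than 32 bytes left: finish with the scalar code. */
  add_sat_scalar(dst + i, src + i, n - i);
}
#endif

/* Decide once, at runtime, which implementation the CPU can execute. */
static add_sat_fn pick_add_sat(void)
{
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2"))
    return add_sat_avx2;
#endif
  return add_sat_scalar;
}
```

A caller would store the result of `pick_add_sat()` once during initialisation and call through the pointer afterwards, so the detection cost is not paid per buffer.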

In most other places we use ORC, which is a simple data processing language that is then JIT-compiled to SIMD assembly. This is used in the video converter, for example, but the language is not powerful enough to express matrix multiplication.

For the function you mention here, specifically, I think it would be great to have SIMD optimizations. Not even necessarily AVX, SSE would be a big improvement too already. If you want to give it a try, please just go ahead. For runtime CPU feature detection you could still use ORC there, just like it is done for the resampler.
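
To give an idea of what an SSE version of such a matrix step could look like, here is a rough SSE2 sketch. It does not reproduce the actual video_converter_matrix8_table code. It assumes packed 4-byte pixels whose first byte (e.g. alpha) is left untouched, a 3x3 matrix of 16-bit fixed-point coefficients with 8 fractional bits, and a per-row offset. `ColorMatrix`, `MATRIX_SHIFT` and the `matrix8_*` functions are hypothetical names. Runtime dispatch is left out to keep it short; the SSE2 path is simply selected at compile time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define MATRIX_SHIFT 8 /* assumed: coefficients carry 8 fractional bits */

/* Hypothetical parameters: out[k] = (sum_j m[k][j] * in[j] + offset[k]) >> 8 */
typedef struct {
  int16_t m[3][3];
  int32_t offset[3];
} ColorMatrix;

/* Scalar reference: channels 1..3 of each pixel are transformed in place. */
static void matrix8_scalar(const ColorMatrix *cm, uint8_t *p, size_t n_pixels)
{
  for (size_t i = 0; i < n_pixels; i++, p += 4) {
    int32_t c1 = p[1], c2 = p[2], c3 = p[3];
    for (int k = 0; k < 3; k++) {
      int32_t acc = cm->m[k][0] * c1 + cm->m[k][1] * c2 +
                    cm->m[k][2] * c3 + cm->offset[k];
      int32_t v = acc < 0 ? 0 : acc >> MATRIX_SHIFT;
      p[k + 1] = v > 255 ? 255 : (uint8_t)v;
    }
  }
}

#if defined(__SSE2__)
/* 16-bit pair (lo, hi) repeated in every 32-bit lane. */
static inline __m128i pair16(int16_t lo, int16_t hi)
{
  return _mm_set_epi16(hi, lo, hi, lo, hi, lo, hi, lo);
}

/* One matrix row for 4 pixels; returns 4 x int32 clamped to 0..255. */
static inline __m128i row_sse2(__m128i c13, __m128i c2,
                               const int16_t m[3], int32_t offset)
{
  const __m128i z = _mm_setzero_si128();
  /* madd multiplies 16-bit pairs and sums them: c1*m0 + c3*m2, then c2*m1 */
  __m128i acc = _mm_madd_epi16(c13, pair16(m[0], m[2]));
  acc = _mm_add_epi32(acc, _mm_madd_epi16(c2, pair16(m[1], 0)));
  acc = _mm_add_epi32(acc, _mm_set1_epi32(offset));
  acc = _mm_srai_epi32(acc, MATRIX_SHIFT);
  /* Saturating packs clamp to 0..255, unpacks widen back to 32-bit lanes. */
  acc = _mm_packus_epi16(_mm_packs_epi32(acc, z), z);
  return _mm_unpacklo_epi16(_mm_unpacklo_epi8(acc, z), z);
}

static void matrix8_sse2(const ColorMatrix *cm, uint8_t *p, size_t n_pixels)
{
  const __m128i mask13 = _mm_set1_epi32(0x00ff00ff);
  const __m128i mask8 = _mm_set1_epi32(0x000000ff);
  size_t i = 0;
  for (; i + 4 <= n_pixels; i += 4) {
    /* Each 32-bit lane holds one pixel: c0 | c1 << 8 | c2 << 16 | c3 << 24 */
    __m128i v = _mm_loadu_si128((const __m128i *)(p + 4 * i));
    __m128i c13 = _mm_and_si128(_mm_srli_epi32(v, 8), mask13); /* (c1, c3) */
    __m128i c2 = _mm_and_si128(_mm_srli_epi32(v, 16), mask8);  /* (c2, 0) */
    __m128i out = _mm_and_si128(v, mask8);                     /* keep c0 */
    out = _mm_or_si128(out, _mm_slli_epi32(
        row_sse2(c13, c2, cm->m[0], cm->offset[0]), 8));
    out = _mm_or_si128(out, _mm_slli_epi32(
        row_sse2(c13, c2, cm->m[1], cm->offset[1]), 16));
    out = _mm_or_si128(out, _mm_slli_epi32(
        row_sse2(c13, c2, cm->m[2], cm->offset[2]), 24));
    _mm_storeu_si128((__m128i *)(p + 4 * i), out);
  }
  matrix8_scalar(cm, p + 4 * i, n_pixels - i); /* leftover pixels */
}
#endif

/* Compile-time selection only; a plugin would choose at runtime instead. */
static void matrix8(const ColorMatrix *cm, uint8_t *p, size_t n_pixels)
{
#if defined(__SSE2__)
  matrix8_sse2(cm, p, n_pixels);
#else
  matrix8_scalar(cm, p, n_pixels);
#endif
}
```

The sketch handles four pixels per iteration and avoids any SSSE3/SSE4.1 instructions, so it stays within plain SSE2. Keeping the scalar version around is useful both as the fallback and as a reference to test the SIMD path against.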

FWIW, ORC currently only has SSE support (on x86) but AVX support is on the way.