Performance problem with i.MX6 and Xorg

Hi,

I am using a board based on the i.MX6Q, and I want to play H.264 video streams with low CPU usage by using hardware acceleration. The GPU driver is etnaviv and the VPU driver is CODA. With H.264 hardware decoding through V4L2 and kmssink, I get moderate performance (15% CPU usage for a reference test source):
gst-launch-1.0 rtspsrc location="rtsp://192.9.202.100/udpstream_ch1_stream1" latency=100 ! rtph264depay ! h264parse ! v4l2h264dec ! kmssink connector-id=66

However, I need to display multiple videos on top of an Xorg application, and I cannot find a way to do that with acceptable performance. Xorg is configured with the modesetting driver and Glamor for acceleration.
1- glimagesink: CPU usage = 54%
gst-launch-1.0 rtspsrc location="rtsp://192.9.202.100/udpstream_ch1_stream1" latency=100 ! rtph264depay ! h264parse ! v4l2h264dec ! glimagesink

To use xvimagesink and ximagesink, I have to add hardware-assisted video conversion (because of the colorspace, I think); otherwise the pipeline does not work:
2- xvimagesink: CPU usage = 37%
/usr/bin/gst-launch-1.0 -v filesrc location=201411_blender_big_buck_bunny_24fps_360p_h264-baseline.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! v4l2h264dec ! v4l2convert output-io-mode=dmabuf-import ! xvimagesink

3- ximagesink: CPU usage = 40%
gst-launch-1.0 rtspsrc location="rtsp://192.9.202.100/udpstream_ch1_stream1" latency=100 ! rtph264depay ! h264parse ! v4l2h264dec ! v4l2convert output-io-mode=dmabuf-import ! ximagesink handle-events=false
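
For anyone who wants to reproduce CPU figures like the ones above, here is a minimal sketch of one way to sample the CPU usage of a running pipeline from /proc. It is only an illustration and not necessarily how the numbers in this post were obtained. The function names `cpu_ticks` and `cpu_percent` are made up for this example.

```shell
#!/bin/sh
# Sketch: average CPU usage of one process over an interval, read from /proc.
# cpu_ticks and cpu_percent are hypothetical helper names, not existing tools.

# Print utime + stime (in clock ticks) for the given PID.
cpu_ticks() {
    # Drop "pid (comm) " first, because comm may contain spaces.
    # After that, utime and stime (fields 14 and 15 of /proc/PID/stat)
    # are fields 12 and 13.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# Print the average CPU usage in percent of one core.
# $1 = PID, $2 = sampling time in seconds (default 5).
cpu_percent() {
    pid="$1"
    secs="${2:-5}"
    hz=$(getconf CLK_TCK)
    t0=$(cpu_ticks "$pid")
    sleep "$secs"
    t1=$(cpu_ticks "$pid")
    awk -v d="$((t1 - t0))" -v hz="$hz" -v s="$secs" \
        'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
}
```

Usage would be something like `cpu_percent "$(pidof gst-launch-1.0)" 10`. The result is relative to a single core, so on the quad-core i.MX6Q it can exceed 100. With the X11 sinks, part of the cost may be spent in the X server rather than in gst-launch-1.0, so it is worth sampling the Xorg process as well.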

I am using kernel 5.18, because hardware-assisted video conversion does not work in later versions (another problem I am working on…). However, with a 6.1-rc kernel the results are similar:

  • Xorg glimagesink performance is very far from kmssink rendering
  • kmssink performance is better (10%)
  • ximagesink and xvimagesink cannot be used because hardware-assisted video conversion is missing

It seems that with Xorg the video is not rendered directly from the VPU to the GPU, and too much of the work is done in software. Do you know how to achieve better performance with my configuration? Or do you know how to investigate this?

And a final thought: the etnaviv DRM device has a hardware overlay plane. I am able to use it with kmssink (another plane on the same connector-id), but not once Xorg is started. Would it be possible to render directly into this plane to improve performance with Xorg?
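
To make the overlay idea concrete, here is a rough sketch of the kind of kmssink invocation I mean, using kmssink's `plane-id` property. The connector ID is the one from the pipeline above. The plane ID 99 is only a placeholder, not the real value on my board, and `overlay_cmd` is a made-up helper name. The script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a gst-launch command that targets a specific KMS plane.
# CONNECTOR_ID is taken from the kmssink pipeline earlier in this post.
# PLANE_ID is a placeholder: look up the real overlay plane ID on the board
# (for example with libdrm's modetest utility) before using it.
CONNECTOR_ID=66
PLANE_ID=99   # hypothetical overlay plane ID

# overlay_cmd is a hypothetical helper. $1 = RTSP URL.
# It prints the command instead of executing it.
overlay_cmd() {
    printf 'gst-launch-1.0 rtspsrc location="%s" latency=100 ! rtph264depay ! h264parse ! v4l2h264dec ! kmssink connector-id=%s plane-id=%s\n' \
        "$1" "$CONNECTOR_ID" "$PLANE_ID"
}

overlay_cmd "rtsp://192.9.202.100/udpstream_ch1_stream1"
```

The printed line can be pasted into a shell to run it. This works for me only while Xorg is not running, and that limitation is what I would like to get around.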

Best regards,
Cedric