High video decoding delay with hardware decoders (va, vaapi & msdk)


I want to decode a live video stream as fast as possible. It is encoded in H.265/H.264 and does not contain any B-frames. For decoding, I tried the vaapi, va and msdk plugins.

An example pipeline looks like:

appsrc → h265parse → vah265dec → vapostproc → appsink

I push new frames into the pipeline with gst_app_src_push_buffer and get the decoded ones via the new_sample callback. The problem is the high decoding latency: for 1024x768@30fps it's around 75 ms, and for 1024x768@60fps it's around 37 ms.

The interesting thing is that it seems to keep at least two frames in the pipeline, so it doesn’t seem to be a performance issue but rather some kind of buffering. To verify this, I tried a program that does a similar thing but uses ffmpeg, also with vaapi as the backend. It took around 1.2 ms to decode a frame, so the hardware seems to be fine. I then traced it with latency(flags=element+pipeline+reported) (https://pastebin.com/sQj2jvxL), which also indicates, if I interpret it correctly, that decoding should be significantly faster: the sum of h265parse, vah265dec and vapostproc always seems to be smaller than 2 ms.
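
The "at least two frames" observation can be checked with a little arithmetic: dividing the measured latency by the frame period gives the delay expressed in frames. The sketch below is only an illustration of that calculation; the function name frames_in_flight is hypothetical and not part of any GStreamer API.

```c
/* Express a measured latency as a number of frame periods.
 * latency_ms: measured end-to-end decode latency in milliseconds
 * fps:        stream frame rate in frames per second */
double frames_in_flight(double latency_ms, double fps)
{
    double frame_period_ms = 1000.0 / fps;  /* 33.3 ms at 30 fps, 16.7 ms at 60 fps */
    return latency_ms / frame_period_ms;
}
```

With the numbers from the post, 75 ms at 30 fps is about 2.25 frame periods and 37 ms at 60 fps is about 2.22. The delay stays roughly constant when measured in frames rather than in milliseconds, which is consistent with frames being held back somewhere rather than with slow decoding.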

So now I'm asking myself: what am I missing?

Unless you set the alignment field in the caps, the parser will assume that the input stream is not AU-aligned in the case of byte-stream format. The parser might then buffer one frame in order to detect the AU boundary.

Setting video/x-h265,stream-format=byte-stream,alignment=au caps on your appsrc will get rid of the one-frame latency introduced by the parser, if that's the case.
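
A minimal sketch of how those caps might be set from C, assuming the application already holds a GstAppSrc pointer and that every buffer it pushes contains exactly one complete access unit (otherwise declaring alignment=au would be incorrect). The helper names make_au_aligned_caps and configure_appsrc are hypothetical; the GStreamer calls themselves come from the core and app libraries.

```c
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Build caps declaring a byte-stream with one access unit per buffer.
 * media_type would be "video/x-h265" or "video/x-h264". */
GstCaps *make_au_aligned_caps(const gchar *media_type)
{
    return gst_caps_new_simple(media_type,
                               "stream-format", G_TYPE_STRING, "byte-stream",
                               "alignment", G_TYPE_STRING, "au",
                               NULL);
}

/* Apply the caps to an existing appsrc before the first buffer is pushed. */
void configure_appsrc(GstAppSrc *appsrc)
{
    GstCaps *caps = make_au_aligned_caps("video/x-h265");

    /* gst_app_src_set_caps() keeps its own copy, so our reference is released. */
    gst_app_src_set_caps(appsrc, caps);
    gst_caps_unref(caps);
}
```

With the alignment declared up front, the parser no longer has to wait for the start of the next access unit before it can pass the current one downstream.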

Wow, thank you very much. If only the solutions were always so simple! Now my decoding time is under 2.5 ms.