High video decoding delay with hardware decoders (va, vaapi & msdk)


I want to decode a live video stream with as little latency as possible. It is encoded in H.265/H.264 and does not contain any B-frames. For decoding, I tried the vaapi, va and msdk plugins.

An example pipeline looks like:

appsrc → h265parse → vah265dec → vapostproc → appsink

I push new frames into the pipeline with gst_app_src_push_buffer and receive the decoded ones via the new_sample callback. The problem is the high decoding latency: for 1024x768@30fps it's around 75 ms, and for 1024x768@60fps it's around 37 ms.

The interesting thing is that it seems to keep at least two frames in the pipeline, so it doesn't seem to be a performance issue but rather some kind of buffering. To verify this, I tried a program that does a similar thing but uses ffmpeg, also with vaapi as the backend; it took around 1.2 ms to decode a frame, so the hardware seems to be fine. I then traced the pipeline with latency(flags=element+pipeline+reported) (https://pastebin.com/sQj2jvxL), which also indicates, if I interpret it correctly, that decoding should be significantly faster: the sum of h265parse, vah265dec and vapostproc always seems to be smaller than 2 ms.
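
As a rough check of the "at least two frames" estimate, the measured latency can be divided by the frame interval. The sketch below is only illustrative; the function name latency_in_frames is hypothetical and not part of GStreamer.

```c
/* Express a measured latency as a number of frame intervals.
 * One frame interval is 1000 / fps milliseconds, so
 * latency_ms / (1000 / fps) == latency_ms * fps / 1000. */
double latency_in_frames(double latency_ms, double fps)
{
    return latency_ms * fps / 1000.0;
}
```

With the figures above, latency_in_frames(75.0, 30.0) gives 2.25 and latency_in_frames(37.0, 60.0) gives about 2.2, so in both cases the delay is a little over two frame intervals regardless of frame rate, which points to buffering rather than slow decoding.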

So what am I missing?

Unless you set the alignment field in the caps, the parser will assume that a byte-stream input is not AU-aligned (AU = access unit). The parser may then buffer one frame in order to detect the AU boundary.

Setting the caps video/x-h265,stream-format=byte-stream,alignment=au on your appsrc will get rid of the one-frame latency introduced by the parser, if that is the cause.

Wow, thank you very much! If only solutions were always this simple. Now my decoding time is under 2.5 ms.