I am working on an application that involves muxing audio and video streams for subsequent RTMP or SRT streaming. Both input sources are live and timestamped in NTP time, and both are received via the same rtpbin. However, I am facing a specific issue that I'd like to resolve.
In my GStreamer pipeline, I have the following scenario:
(1) The video source is received an arbitrary number of seconds after the audio source
(2) The video source has approximately 2 s higher latency than the audio
If audio has then been received for, say, 10 s before the video arrives, the output also consists of 10 s of audio-only before audio and video are interleaved. I understand this may be intended design, so as not to lose any data; however, I would like the application to discard frames until I have synced A/V that can be muxed and then streamed, so that we send interleaved A/V frames from the start.
Why do I want this? I've observed playback and transcoding issues downstream that I suspect are related to this initial buffer accumulation. For instance, some decoders require A/V packets to be received interleaved immediately, or else they may skip decoding the audio track. Another issue I suspect is related: during transcoding (on some other receiver platforms), the A/V offset of the first received frames can cause a sync offset between audio and video in the transcoded file.
How would one go about solving this problem? I.e. delaying the muxing process until there is input on both pads, preferably with synchronised buffers right from the first packet.
I have not yet found any properties on the muxers (flvmux/mpegtsmux in this case) that seem to do the trick, or any other elements that can help. Currently, I have a somewhat hacky solution that throws away buffers around the muxer until data is being received on both pads, but that does not feel like a smart way to go.
Any help and ideas on how to achieve this is greatly appreciated.
Thanks
Thanks @tpm for your suggestion.
I see that the element only accepts raw formats on its caps, which would not be the best fit here, since I'd like to avoid an extra layer of encoding/decoding. But it sounds like it could do the trick. If you have other ideas, please let me know. I also tried the streamsynchronizer element in the pipeline before muxing, but had issues with not getting any data out of that element; I'm not sure whether that element would help with this problem anyway.
If you have an application, you could simply use a buffer pad probe that drops all buffers until all streams have seen data (optionally with minimum-timestamp coordination).
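A minimal sketch of that gating logic, modeled in plain Python so it runs standalone (the `ProbeGate` class and its `decide` method are illustrative names, not GStreamer API; in a real application, each sink pad's `Gst.PadProbeType.BUFFER` probe callback would call into something like this and map "DROP" to `Gst.PadProbeReturn.DROP` and "PASS" to `Gst.PadProbeReturn.PASS`, or `REMOVE` to detach the probe once the gate is open):

```python
class ProbeGate:
    """Drops buffers on every stream until all expected streams have data.

    Hypothetical helper shared by the pad probes on the muxer's sink pads;
    names and structure are an assumption, not an existing GStreamer element.
    """

    def __init__(self, streams):
        self.waiting = set(streams)  # streams that have not produced a buffer yet
        self.open = False

    def decide(self, stream, pts):
        """Called from each pad probe with the stream name and buffer PTS.

        Returns 'DROP' while at least one stream is still missing,
        'PASS' for every buffer once all streams have arrived.
        """
        if not self.open:
            self.waiting.discard(stream)
            if self.waiting:
                return "DROP"   # still missing at least one stream
            self.open = True    # last stream has arrived: open the gate
        return "PASS"


gate = ProbeGate(["audio", "video"])
print(gate.decide("audio", 0))      # audio alone -> DROP
print(gate.decide("audio", 20))     # still no video -> DROP
print(gate.decide("video", 2000))   # video arrives -> PASS (gate opens)
print(gate.decide("audio", 40))     # both seen -> PASS
```

Once the gate is open you can return `Gst.PadProbeReturn.REMOVE` from the probe instead, so the probes detach themselves and stop costing you a callback per buffer.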
That is what I am doing at the moment: using pad probes and dropping buffers until we detect flow on both sink pads. It does feel a bit wrong to do this manually, though. After doing so, I also ended up throwing away buffers after the muxer to ensure that A/V are "synced" from their respective first frames; it seemed like some buffer accumulation was still present in the pipeline once the gates were opened (canal-lock analogy). This also leads to another side effect: not knowing whether the first output frame will be a keyframe…
I was hoping there would be either some other elements that could be used, or strategies to avoid my issue. But if writing my own element or manually dropping buffers is the way forward here, then that will bring peace of mind. Maybe some timestamp coordination can help with the side effects, I guess, as you mentioned.
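For reference, the timestamp coordination I have in mind would look roughly like this (again a plain-Python model of the probe decision, with hypothetical names; in GStreamer the keyframe check would be the absence of `Gst.BufferFlags.DELTA_UNIT` on the video buffer). It also addresses the keyframe side effect: the gate only opens on a video keyframe, and then drops any buffers timestamped before that keyframe so both streams start roughly aligned:

```python
class SyncedGate:
    """Drops buffers until (1) every stream has produced data and (2) the
    video stream delivers a keyframe; afterwards it passes only buffers at
    or after the keyframe's timestamp, so audio and video start aligned.

    Illustrative sketch, not an existing GStreamer element or API.
    """

    def __init__(self, streams, video_stream="video"):
        self.waiting = set(streams)  # streams with no buffer seen yet
        self.video = video_stream
        self.start_pts = None        # PTS of the opening video keyframe

    def decide(self, stream, pts, is_keyframe=False):
        self.waiting.discard(stream)
        if self.start_pts is None:
            # Gate stays closed until all streams are present AND the
            # current buffer is a video keyframe.
            if not self.waiting and stream == self.video and is_keyframe:
                self.start_pts = pts  # first passed buffer is a keyframe
                return "PASS"
            return "DROP"
        # Gate open: drop stragglers timestamped before the start point.
        return "PASS" if pts >= self.start_pts else "DROP"


gate = SyncedGate(["audio", "video"])
print(gate.decide("audio", 0))                       # DROP (no video yet)
print(gate.decide("video", 100, is_keyframe=False))  # DROP (delta frame)
print(gate.decide("audio", 50))                      # DROP (no keyframe yet)
print(gate.decide("video", 200, is_keyframe=True))   # PASS (opens at pts 200)
print(gate.decide("audio", 150))                     # DROP (before start)
print(gate.decide("audio", 210))                     # PASS
```

One caveat I'd expect: if the encoder's keyframe interval is long, this can discard several seconds of video at startup, so it may be worth forcing a keyframe upstream (e.g. via a force-key-unit event) when the gate is about to open.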