Introduction and context
Hi everyone, I will try to explain my problem as clearly as possible. Thank you to everyone who finds the time to help me.
Context
For a project at work I have to develop a program which receives multiple live SRT streams of the same scene as input, processes them in real time with a custom in-house element, and then outputs multiple NDI streams.
The cameras are placed in the same physical location and all send their SRT stream over the Internet to another location, where our machine resides. Our machine is a server with plenty of processing power.
Primary requirement for our implementation
The must-have requirement is that our processing pipeline preserves the synchronization between frames of the input SRT streams; in particular, we must keep the frame-by-frame synchronization information.
Stream Specification
The SRT streams are composed of an H.264 video track and an audio track.
Our pipeline
So our task is to keep the streams synchronized after our processing. Our complete pipeline for each of the SRT streams is the following (a rough sketch of such a pipeline follows the list):
- receive the SRT stream
- decode the video stream
- process the video with our plugin (already implemented)
- passthrough the audio
- reassemble the audio and the processed video stream
- output an NDI stream
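To make the discussion concrete, here is a rough sketch of what I imagine one such per-stream pipeline could look like, written in Python around Gst.parse_launch. It assumes the SRT stream carries MPEG-TS with AAC audio; ourprocessor is a placeholder name for our in-house GPU element, avdec_h264 is just a stand-in for whatever (possibly hardware) decoder we end up using, and the NDI elements (ndisinkcombiner, ndisink) are the ones from gst-plugins-rs, so the exact names may differ. This is a sketch of the topology, not a tested pipeline.

```python
#!/usr/bin/env python3
# Sketch of one per-stream pipeline: SRT in -> decode -> custom processing -> NDI out.
# "ourprocessor" is a placeholder for our in-house element; the NDI elements
# come from gst-plugins-rs and are assumed here, not verified.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

PIPELINE = """
srtsrc uri=srt://0.0.0.0:7001?mode=listener ! tsdemux name=demux

ndisinkcombiner name=combiner ! ndisink ndi-name="cam-1-processed"

demux. ! queue ! h264parse name=parser ! avdec_h264 ! videoconvert
       ! ourprocessor name=processor
       ! videoconvert ! combiner.video

demux. ! queue ! aacparse ! avdec_aac ! audioconvert ! audioresample
       ! combiner.audio
"""

pipeline = Gst.parse_launch(PIPELINE)
pipeline.set_state(Gst.State.PLAYING)

loop = GLib.MainLoop()
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)
```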
This project faces multiple challenges, the most relevant being:
- How do we even insert synchronization information into the source SRT streams?
- Supposing we can embed synchronization timestamps in the sources, how can we maintain the synchronization while processing the decoded frames on the GPU?
I will try to outline what we have devised so far and what we are still missing, walking through the decisions that brought us to where we are now.
Challenge 1: Video synchronization strategies
While this is not strictly our problem but rather that of whoever sends us the video streams, it becomes ours once we consider that the synchronization strategy determines what we can do to maintain it.
After many days of research on the web, it appears that the common way of synchronizing video (or audio/subtitle tracks) is by using timecode. For the most part we found articles describing what timecode is in video production and suggesting timecode generators to sync multi-camera setups in order to ease post-production. The only references to remote production and synchronization of SRT/RTMP/RTP streams are in descriptions of commercial software; see the links in the following section:
Commercial software mentioning timecode
- Synchronization of multiple cameras: problem and solution
- Synchronize media workflows with live stream content using timecode | Live Stream API | Google Cloud
- BytePlus | Business growth through superior technology
- Softvelum news: Nimble Streamer, Larix Broadcaster and more: SEI metadata NTP time sync support in Nimble Streamer
- Synchronizing streams by NTP-based timecodes
- Time synchronization in Larix Broadcaster
- Putting timecodes in your outputs - MediaConvert
- https://www.reddit.com/r/VIDEOENGINEERING/comments/nzqr48/comment/h1rsu59/, which in turn recommends professional hardware (Makito X4 Video Encoder | Haivision)
Judging by these resources, it seems that synchronizing several cameras whose streams are transmitted over a network requires two steps:
- the clocks of the cameras must be synchronized, ideally with dedicated devices such as timecode generators connected to each camera, or with NTP if hardware generators are not available
- the timecode of each frame must be transmitted with the video stream
So, assuming the clocks are synchronized, we need timecodes for every frame.
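Just to fix ideas on what "timecodes for every frame" means in practice, here is a tiny sketch of how a per-frame timecode could be derived from an NTP-synchronized wall clock, assuming a constant integer frame rate (25 fps here is only an example; drop-frame rates such as 29.97 would need the usual drop-frame handling, which I am ignoring):

```python
import time

FPS = 25  # assumed constant, integer frame rate

def wallclock_to_timecode(unix_ts: float, fps: int = FPS):
    """Map an (NTP-synchronized) wall-clock timestamp to (hh, mm, ss, ff).

    Cameras sharing a synchronized clock and the same frame rate produce
    the same timecode for frames captured within the same 1/fps interval.
    """
    secs = int(unix_ts)
    frame = int((unix_ts - secs) * fps)  # frame index within the second
    t = time.gmtime(secs)                # one agreed time base (UTC here)
    return t.tm_hour, t.tm_min, t.tm_sec, frame

print(wallclock_to_timecode(time.time()))
```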
As we are dealing with H.264 streams, I found that timecode can be embedded in SEI messages, or we could use the more general VITC timecode. Let us consider only timecode in SEI messages for the sake of this conversation, even if VITC might still be a viable option should the primary approach fail.
Suppose we decide to use Unregistered User Data SEI messages, in which we embed timecodes as 4-byte payloads (one byte each for hours, minutes, seconds and frame number).
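For reference, the H.264 spec requires a user data unregistered SEI message to start with a 16-byte UUID, followed by arbitrary payload bytes, so our 4-byte timecode would sit right after a UUID agreed upon with the sender. A small sketch of packing/unpacking such a payload (the UUID below is made up for the example):

```python
import struct
import uuid

# Arbitrary UUID agreed upon between sender and receiver to recognize
# "our" SEI messages among other unregistered user data (made up here).
TIMECODE_SEI_UUID = uuid.UUID("d2f89a5b-3c41-4e8a-9f10-6b7c0d1e2f30")

def pack_timecode_payload(hh: int, mm: int, ss: int, ff: int) -> bytes:
    """16-byte UUID + 4 bytes: hours, minutes, seconds, frame number."""
    return TIMECODE_SEI_UUID.bytes + struct.pack("4B", hh, mm, ss, ff)

def unpack_timecode_payload(payload: bytes):
    """Return (hh, mm, ss, ff) if this payload carries our UUID, else None."""
    if len(payload) < 20 or payload[:16] != TIMECODE_SEI_UUID.bytes:
        return None
    return struct.unpack("4B", payload[16:20])

payload = pack_timecode_payload(13, 37, 42, 12)
print(unpack_timecode_payload(payload))  # (13, 37, 42, 12)
```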
Question 1
Is this approach for synchronizing videos the correct one? Is there something else commonly used that I did not mention?
Challenge 2: Keep timecode synchronized between input and output
Now, supposing that we use SEI messages to store the timecode information, our pipeline should read the SRT stream, decode the frames while annotating them with their timecode, process the decoded frames with our algorithm on the GPU, and then reapply the original timecode to each frame in the NDI output.
I found some references to SEI and timecode in the GStreamer docs, but I have not been able to grasp how this process works: extracting the timecode from a frame, decoding the frame, processing it with our algorithm on the GPU, re-encoding the frame, and finally putting back the timecode extracted at the start of the pipeline.
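My current, untested mental model, pieced together from the Discourse topics listed below, is based on pad probes: one probe before the decoder parses the timecode out of the still-encoded frame and stashes it keyed by the buffer PTS (assuming the decoder and our element preserve the PTS), and one probe after our GPU element looks the timecode up again and attaches it to the outgoing frame. The extract_timecode_from_sei() helper is hypothetical and is exactly the piece I do not know how to implement properly (scanning the byte stream myself, or reading a GstMeta if the parser already exposes one). The element names refer to the pipeline sketch above.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# buffer PTS -> (hh, mm, ss, ff); filled before decoding, consumed after processing
pending_timecodes = {}

def extract_timecode_from_sei(encoded_frame: bytes):
    """Hypothetical helper: scan the encoded H.264 access unit for our
    user data unregistered SEI and return (hh, mm, ss, ff), or None."""
    ...  # to be implemented (see the appsink experiment further down)

def on_parsed_buffer(pad, info):
    """Probe on h264parse's src pad: runs on still-encoded frames."""
    buf = info.get_buffer()
    ok, mapinfo = buf.map(Gst.MapFlags.READ)
    if ok:
        tc = extract_timecode_from_sei(bytes(mapinfo.data))
        buf.unmap(mapinfo)
        if tc is not None:
            pending_timecodes[buf.pts] = tc
    return Gst.PadProbeReturn.OK

def on_processed_buffer(pad, info):
    """Probe after our GPU element: the PTS is assumed unchanged, so we can
    look the timecode up again and attach it to the outgoing frame
    (e.g. as a meta, or mapped onto the NDI frame's timecode field)."""
    buf = info.get_buffer()
    tc = pending_timecodes.pop(buf.pts, None)
    if tc is not None:
        pass  # attach tc to the outgoing frame here
    return Gst.PadProbeReturn.OK

# Wiring, using the element names from the pipeline sketch above:
# pipeline.get_by_name("parser").get_static_pad("src").add_probe(
#     Gst.PadProbeType.BUFFER, on_parsed_buffer)
# pipeline.get_by_name("processor").get_static_pad("src").add_probe(
#     Gst.PadProbeType.BUFFER, on_processed_buffer)
```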
Some of the resources I consulted are:
Also, I searched here in the Discourse, and the relevant topics are:
Relevant discourse topics
- https://discourse.gstreamer.org/t/send-frame-number-over-udp/1035/2: here @slomo suggests the idea of using SEI timecodes
- https://discourse.gstreamer.org/t/associating-additional-data-with-a-frame-webrtc/198: here there are some other suggestions about SEI timecodes and how to handle them. As far as I can tell, I should implement a custom element to process the timecodes
- https://discourse.gstreamer.org/t/inserting-sei-metadata-in-h264/840: here the asker is using a function to insert metadata, which gave me the idea of using some video filter to accomplish my task (I should research this further)
- https://discourse.gstreamer.org/t/plugin-to-modify-sei-data/1118: this user is trying to implement a similar feature, but only to modify SEI data in place
- https://discourse.gstreamer.org/t/no-python-gstreamer-api-extract-h264-sei/1175/2: here the asker is trying to extract SEI with Python, and @ndufresne answers to “create a pipeline with a parser and an appsink”, which I do not fully understand (my current interpretation is sketched right after this list). I have actually been able to extract some SEI metadata with pyav, which is the second suggestion.
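For completeness, this is how I currently read the "parser and appsink" suggestion: force byte-stream, AU-aligned output out of h264parse, pull the access units from an appsink, and scan them for SEI NAL units (type 6) ourselves. This is a file-based experiment on a raw H.264 elementary stream (input.h264 is a placeholder), not the live SRT pipeline, and it only locates the SEI NAL units without parsing their payload:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Parse an H.264 elementary stream and hand every access unit to an appsink
# (byte-stream format, so we can look for Annex B start codes ourselves).
pipeline = Gst.parse_launch(
    "filesrc location=input.h264 ! h264parse "
    "! video/x-h264,stream-format=byte-stream,alignment=au "
    "! appsink name=sink sync=false"
)
sink = pipeline.get_by_name("sink")
pipeline.set_state(Gst.State.PLAYING)

while True:
    sample = sink.emit("pull-sample")  # returns None at EOS
    if sample is None:
        break
    buf = sample.get_buffer()
    ok, mapinfo = buf.map(Gst.MapFlags.READ)
    if not ok:
        continue
    data = bytes(mapinfo.data)
    buf.unmap(mapinfo)
    # Walk Annex B start codes and report NAL unit types; type 6 is SEI.
    pos = data.find(b"\x00\x00\x01")
    while pos != -1:
        nal_type = data[pos + 3] & 0x1F
        if nal_type == 6:
            print(f"PTS {buf.pts}: SEI NAL at offset {pos}")
        pos = data.find(b"\x00\x00\x01", pos + 3)

pipeline.set_state(Gst.State.NULL)
```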
Question 2
How can I implement this logic of reading the timecode of a frame in the H.264 input stream, holding on to it somewhere, decoding the frame and processing it on the GPU, and then reattaching the timecode to the frame?
What I obtained so far
I was able to insert timecodes in SEI messages with AWS MediaConvert, following this guide: Putting timecodes in your outputs - MediaConvert. I could insert them both as timecode SEI messages and as User Data Unregistered SEI messages.
With FFmpeg, too, I could insert metadata (but only on the first frame) with the h264_metadata bitstream filter for H.264, and read it back (actually, printing it to the screen rather than reading it programmatically) with FFmpeg's showinfo filter or with ffprobe -show_frames.
With GStreamer I tried to generate the timecode in SEI messages with the following pipeline, but I received many warnings:
gst-launch-1.0 -e videotestsrc ! timecodestamper ! x264enc ! h264parse update-timecode=1 ! matroskamux ! filesink location=timecodestamper_out.mkv
The warnings look like this:
WARNING: from element /GstPipeline:pipeline0/GstH264Parse:h264parse0: Element doesn't implement handling of this stream. Please file a bug.
Additional debug info:
../gst/videoparsers/gsth264parse.c(2938): gst_h264_parse_create_pic_timing_sei (): /GstPipeline:pipeline0/GstH264Parse:h264parse0:
timecode update was requested but VUI doesn't support timecode
Moreover, the produced file does not seem to contain any timecode information.