Background
I’m building an iPhone app that captures synchronized RGB and depth frames. After encoding and decoding the RGB frames, I need to re-synchronize them with the corresponding depth frames.
Currently, I’m using AVCaptureDataOutputSynchronizer to capture RGB and depth frames with wall-clock timestamps. The depth frames are zipped raw with the wall-clock timestamp embedded in each frame. RGB frames, on the other hand, are encoded and written with AVAssetWriter. This is where the problem starts: once the RGB frames are encoded, they lose all connection to their wall-clock timestamps, so after decoding I can no longer synchronize them perfectly with the depth frames.
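For context, the capture side stamps each synchronized pair with a wall-clock time roughly like this (a simplified sketch; it assumes the capture timestamps are on the host clock, and error handling is omitted):

```swift
import AVFoundation
import CoreMedia

final class SyncCaptureDelegate: NSObject, AVCaptureDataOutputSynchronizerDelegate {
    let videoOutput: AVCaptureVideoDataOutput
    let depthOutput: AVCaptureDepthDataOutput

    // Offset between the host (mach) clock and the wall clock, captured once.
    // Assumption: the capture timestamps are on the host clock.
    private let hostToWallOffset =
        Date().timeIntervalSince1970 - CMClockGetTime(CMClockGetHostTimeClock()).seconds

    // Hands off a synchronized pair plus both wall-clock times (seconds since 1970).
    var onSynchronizedPair: ((CMSampleBuffer, Double, AVDepthData, Double) -> Void)?

    init(videoOutput: AVCaptureVideoDataOutput, depthOutput: AVCaptureDepthDataOutput) {
        self.videoOutput = videoOutput
        self.depthOutput = depthOutput
        super.init()
    }

    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                                didOutput collection: AVCaptureSynchronizedDataCollection) {
        guard
            let video = collection.synchronizedData(for: videoOutput) as? AVCaptureSynchronizedSampleBufferData,
            let depth = collection.synchronizedData(for: depthOutput) as? AVCaptureSynchronizedDepthData,
            !video.sampleBufferWasDropped, !depth.depthDataWasDropped
        else { return }

        let rgbPTS = CMSampleBufferGetPresentationTimeStamp(video.sampleBuffer)
        let rgbWallClock = rgbPTS.seconds + hostToWallOffset
        let depthWallClock = depth.timestamp.seconds + hostToWallOffset

        // The depth map is zipped raw with depthWallClock embedded; the RGB sample
        // buffer goes to AVAssetWriter together with both wall-clock times.
        onSynchronizedPair?(video.sampleBuffer, rgbWallClock, depth.depthData, depthWallClock)
    }
}
```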
To work around this, I write a timed metadata track alongside the video. Each metadata entry is written at the same time as the corresponding RGB frame and includes both RGB and depth wall-clock timestamps. Later, I locate the metadata with the closest PTS to each RGB frame and use that to realign the frames.
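Concretely, the metadata track is written with an AVAssetWriterInputMetadataAdaptor, roughly like the following (simplified sketch; the identifier string and the comma-separated payload are placeholders for whatever format you actually use):

```swift
import AVFoundation
import CoreMedia

// Hypothetical identifier for the timed metadata track ("mdta/" + reverse-DNS key).
let timestampIdentifier = "mdta/com.example.sync-timestamps"

// Build the metadata input + adaptor; the input still has to be added to the AVAssetWriter.
func makeMetadataAdaptor() -> AVAssetWriterInputMetadataAdaptor {
    // Describe the metadata stream so AVAssetWriter can create a timed metadata track.
    let spec: [String: Any] = [
        kCMMetadataFormatDescriptionMetadataSpecificationKey_Identifier as String: timestampIdentifier,
        kCMMetadataFormatDescriptionMetadataSpecificationKey_DataType as String:
            kCMMetadataBaseDataType_UTF8 as String
    ]
    var desc: CMMetadataFormatDescription?
    CMMetadataFormatDescriptionCreateWithMetadataSpecifications(
        allocator: kCFAllocatorDefault,
        metadataType: kCMMetadataFormatType_Boxed,
        metadataSpecifications: [spec] as CFArray,
        formatDescriptionOut: &desc)

    let input = AVAssetWriterInput(mediaType: .metadata, outputSettings: nil, sourceFormatHint: desc)
    input.expectsMediaDataInRealTime = true
    return AVAssetWriterInputMetadataAdaptor(assetWriterInput: input)
}

// Called once per RGB frame: anchor both wall-clock times at the frame's PTS.
func appendTimestamps(to adaptor: AVAssetWriterInputMetadataAdaptor,
                      rgbPTS: CMTime, rgbWallClock: Double, depthWallClock: Double) {
    let item = AVMutableMetadataItem()
    item.identifier = AVMetadataIdentifier(rawValue: timestampIdentifier)
    item.dataType = kCMMetadataBaseDataType_UTF8 as String
    item.value = "\(rgbWallClock),\(depthWallClock)" as NSString

    // Invalid duration: each entry simply marks the instant of its RGB frame.
    let group = AVTimedMetadataGroup(items: [item],
                                     timeRange: CMTimeRange(start: rgbPTS, duration: .invalid))
    if !adaptor.append(group) {
        // Handle the failure (check the writer's status) in real code.
    }
}
```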
While this approach mostly works, I still see a drift of 1–3 frames at random times, which is unacceptable for my application.
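For completeness, the realignment step itself is just a nearest-PTS lookup over the entries read back from the metadata track (e.g. with AVAssetReaderOutputMetadataAdaptor), roughly:

```swift
import CoreMedia

// One entry per metadata group read back from the timed metadata track.
// The field names are placeholders for however the wall-clock pair is encoded.
struct SyncEntry {
    let pts: CMTime            // PTS of the metadata group (== RGB frame PTS at write time)
    let rgbWallClock: Double
    let depthWallClock: Double
}

// Pick the metadata entry whose PTS is closest to a decoded RGB frame's PTS.
func closestEntry(to framePTS: CMTime, in entries: [SyncEntry]) -> SyncEntry? {
    entries.min {
        abs(($0.pts - framePTS).seconds) < abs(($1.pts - framePTS).seconds)
    }
}
```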
Is SEI with GStreamer my best option?
From what I understand, the only reliable way to retain precise synchronization is to embed the wall-clock timestamp into the video frame itself—something like an SEI message. Unfortunately:
- AVFoundation doesn’t support SEI messages,
- FFmpeg isn’t supported on iOS,
- GStreamer is(!), but its iOS build relies on the GStreamer Bad Plug-ins.
My questions, then, are:
- Is GStreamer (and the “bad” plugins) a good option for my case? The app needs to be distributed.
- Should I consider something else?
Other ideas I’ve considered:
- Bake the timestamps into the alpha channel
  - But the alpha channel will be compressed along with everything else, so the timestamps could get corrupted, right?
- Somehow use HDR metadata, since the encoder seems to preserve it
  - I don't know how this works, or whether it would work here at all.
I’m quite new to both media encoding and GStreamer, and I’ve been working on this issue for a while. Any advice or alternative approaches would be deeply appreciated!