I am a maintainer of the GPLv3 AirPlay server UxPlay.
Apple devices in screen-mirroring mode send separate encrypted streams of AAC-ELD audio and H.264/H.265 video, which we decrypt.
Each stream carries timestamps (in different formats for audio and video), and there is a mechanism for translating an audio timestamp into its video-timestamp equivalent. Audio arrives ahead of video and is held in a buffer until the corresponding video frame arrives. We render using separate GStreamer pipelines, injecting each stream through an appsrc; a sketch of the per-frame hand-off follows below.
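For reference, the hand-off is roughly equivalent to this minimal sketch (illustrative names, not UxPlay's actual code):

```c
/* Hand one decrypted frame to a pipeline's appsrc with its timestamp.
 * 'pts' is the presentation time on the pipeline clock (for audio, the
 * timestamp after translation to the video timeline), in nanoseconds. */
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

static GstFlowReturn
push_frame (GstAppSrc *src, const guint8 *data, gsize len, GstClockTime pts)
{
  GstBuffer *buf = gst_buffer_new_allocate (NULL, len, NULL);
  gst_buffer_fill (buf, 0, data, len);
  GST_BUFFER_PTS (buf) = pts;
  return gst_app_src_push_buffer (src, buf);  /* takes ownership of buf */
}
```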
Now there is a request to offer an option to output an RTP stream of muxed audio and video (instead of rendering), I suppose as MPEG-4 packets.
This use case seems to differ from the examples we found. Any advice or hints on how to do this muxing would be very welcome.
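To make the question concrete, this is the kind of single pipeline I imagine replacing the two rendering pipelines: both appsrcs feed an mpegtsmux, whose MPEG-TS output is payloaded with rtpmp2tpay and sent through a udpsink. This is only a sketch of my guess, not working code; the caps, host, and port are placeholders, and I do not know whether mpegtsmux accepts AAC-ELD at all (ADTS framing does not cover the ELD object type, as far as I can tell), which is part of what I am asking.

```c
/* Hypothetical option: mux both streams into MPEG-TS, payload as RTP.
 * Elements, caps, host, and port are placeholders for illustration. */
#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  GError *err = NULL;
  GstElement *pipeline = gst_parse_launch (
      "mpegtsmux name=mux ! rtpmp2tpay ! udpsink host=127.0.0.1 port=5600 "
      "appsrc name=vsrc is-live=true format=time "
      "caps=video/x-h264,stream-format=byte-stream,alignment=au "
      "! h264parse ! mux. "
      "appsrc name=asrc is-live=true format=time "
      "caps=audio/mpeg,mpegversion=4,stream-format=raw "
      "! aacparse ! mux. ",  /* whether the mux takes AAC-ELD is the open question */
      &err);
  if (pipeline == NULL) {
    g_printerr ("parse error: %s\n", err->message);
    return 1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* ... push decrypted buffers into vsrc/asrc exactly as in the rendering
   * case; mpegtsmux interleaves them by their (translated) PTS ... */

  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
  if (msg != NULL)
    gst_message_unref (msg);
  gst_object_unref (bus);
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}
```

If MPEG-TS is the wrong container for this, pointers to a better mux, or to payloading the two elementary streams separately (rtph264pay / rtpmp4gpay) and leaving synchronization to the receiver, would also be appreciated.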