I am a maintainer of the GPLv3 AirPlay server UxPlay.
Apple devices in screen-mirroring mode send separate encrypted streams of AAC-ELD audio and H.264/H.265 video, which we decrypt.
Each stream carries timestamps (in different formats for audio and video), and there is a mechanism for translating an audio timestamp into its video-timestamp equivalent. Audio arrives ahead of video and is held in a buffer until the corresponding video frame arrives. We render using separate GStreamer pipelines, injecting each stream through an appsrc; a sketch of the per-frame hand-off follows below.
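For reference, the hand-off is roughly equivalent to this minimal sketch (illustrative names, not UxPlay's actual code):

```c
/* Hand one decrypted frame to a pipeline's appsrc with its timestamp.
 * 'pts' is the presentation time on the pipeline clock (for audio, the
 * timestamp after translation to the video timeline), in nanoseconds. */
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

static GstFlowReturn
push_frame (GstAppSrc *src, const guint8 *data, gsize len, GstClockTime pts)
{
  GstBuffer *buf = gst_buffer_new_allocate (NULL, len, NULL);
  gst_buffer_fill (buf, 0, data, len);
  GST_BUFFER_PTS (buf) = pts;
  return gst_app_src_push_buffer (src, buf);  /* takes ownership of buf */
}
```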
Now there is a request to offer an option to output an RTP stream of muxed audio and video (instead of rendering), I suppose as MPEG-4 packets.
This use case seems to differ from the examples we found. Any advice or hints on how to do this muxing would be very welcome.
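To make the question concrete, this is the kind of single pipeline I imagine replacing the two rendering pipelines: both appsrcs feed an mpegtsmux, whose MPEG-TS output is payloaded with rtpmp2tpay and sent through a udpsink. This is only a sketch of my guess, not working code; the caps, host, and port are placeholders, and I do not know whether mpegtsmux accepts AAC-ELD at all (ADTS framing does not cover the ELD object type, as far as I can tell), which is part of what I am asking.

```c
/* Hypothetical option: mux both streams into MPEG-TS, payload as RTP.
 * Elements, caps, host, and port are placeholders for illustration. */
#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  GError *err = NULL;
  GstElement *pipeline = gst_parse_launch (
      "mpegtsmux name=mux ! rtpmp2tpay ! udpsink host=127.0.0.1 port=5600 "
      "appsrc name=vsrc is-live=true format=time "
      "caps=video/x-h264,stream-format=byte-stream,alignment=au "
      "! h264parse ! mux. "
      "appsrc name=asrc is-live=true format=time "
      "caps=audio/mpeg,mpegversion=4,stream-format=raw "
      "! aacparse ! mux. ",  /* whether the mux takes AAC-ELD is the open question */
      &err);
  if (pipeline == NULL) {
    g_printerr ("parse error: %s\n", err->message);
    return 1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* ... push decrypted buffers into vsrc/asrc exactly as in the rendering
   * case; mpegtsmux interleaves them by their (translated) PTS ... */

  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
  if (msg != NULL)
    gst_message_unref (msg);
  gst_object_unref (bus);
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}
```

If MPEG-TS is the wrong container for this, pointers to a better mux, or to payloading the two elementary streams separately (rtph264pay / rtpmp4gpay) and leaving synchronization to the receiver, would also be appreciated.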