I'm working on a system that takes an RTSP stream and re-transmits it over WebRTC. I pretty much have everything working, except it's not as smooth as I'd like when it comes to handling media that isn't present. For reference, this is using GStreamer-Sharp with GStreamer 1.24.2 on Windows.
To give an idea: some of the RTSP streams will have video and audio, but others will only have video. I want something that handles both situations. Currently, I'm doing this:
- Query the stream using Discoverer in order to determine if an audio stream exists.
- If an audio stream exists, use this:
rtspsrc location=rtsp://192.168.1.38:8554/test name=src
src. !
queue max-size-bytes=20971520 name=video_source_queue max-size-time=0 !
parsebin !
identity sync=true !
tee name=video_tee !
queue name=video_fake_queue !
fakesink
src. !
application/x-rtp,media=audio !
queue max-size-bytes=20971520 max-size-time=0 name=audio_source_queue !
parsebin !
identity sync=true !
decodebin !
audioconvert !
audioresample !
opusenc !
tee name=audio_tee !
queue name=audio_fake_queue !
fakesink
- Otherwise, if it's video-only, use this:
rtspsrc location=rtsp://192.168.1.38:8554/test name=src
src. !
queue max-size-bytes=20971520 name=video_source_queue max-size-time=0 !
parsebin !
identity sync=true !
tee name=video_tee !
queue name=video_fake_queue !
fakesink
- And for each WebRTC client, if an audio stream exists, use this:
queue name=video_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream !
rtph264pay name=video_pay aggregate-mode=zero-latency config-interval=-1 timestamp-offset=0 !
webrtcbin name=webrtc bundle-policy=max-bundle
queue name=audio_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream !
rtpopuspay name=audio_pay !
webrtc.
- Otherwise, if it's video-only:
queue name=video_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream !
rtph264pay name=video_pay aggregate-mode=zero-latency config-interval=-1 timestamp-offset=0 !
webrtcbin name=webrtc bundle-policy=max-bundle
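For reference, attaching one of those per-client video branches to video_tee from C# looks roughly like this. It's a simplified sketch: `pipeline` is my top-level Gst.Pipeline, and I'm assuming that `Gst.Parse.BinFromDescription` with ghost pads enabled exposes the queue's unlinked sink pad as a ghost pad named "sink" (exact behavior may differ in your binding version):

```csharp
// Build the per-client video branch from the same description as above.
// ghostUnlinkedPads=true exposes the queue's unlinked sink pad as a
// ghost pad on the bin.
Gst.Bin clientBin = (Gst.Bin)Gst.Parse.BinFromDescription(
    "queue name=video_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream ! " +
    "rtph264pay name=video_pay aggregate-mode=zero-latency config-interval=-1 timestamp-offset=0 ! " +
    "webrtcbin name=webrtc bundle-policy=max-bundle",
    true);

pipeline.Add(clientBin);

// Ask the tee for a new src pad and link it to the client branch.
Gst.Element videoTee = pipeline.GetByName("video_tee");
Gst.Pad teeSrc = videoTee.GetRequestPad("src_%u");
teeSrc.Link(clientBin.GetStaticPad("sink"));

// Bring the new branch up to the running pipeline's state.
clientBin.SyncStateWithParent();
```

The audio branch is attached the same way from audio_tee when it exists.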
And in my C# code, I handle all of the pad creation and linking. What I don't like about this solution is having to use the Discoverer: it probes the stream in a separate step before the pipeline starts, which adds a bit of delay that I'd rather not have. Is there any clean way to avoid that? The only thing that comes to mind is to start with the video-only pipeline in every case, then parse the RTSP messages from rtspsrc, and if I detect audio, add the audio part of the pipeline dynamically.
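Following up on that last idea: rtspsrc emits an "on-sdp" signal carrying the parsed SDP before any pads are created, so the audio check could happen there instead of in a separate Discoverer pass. Here's a rough sketch of what I mean (the GLib signal marshalling and the Gst.Sdp binding names are assumptions on my part):

```csharp
// Start every pipeline video-only, then inspect the SDP that rtspsrc
// receives from the server. Each m= line describes one stream.
Gst.Element src = pipeline.GetByName("src");
src.Connect("on-sdp", (object o, GLib.SignalArgs args) =>
{
    var sdp = (Gst.Sdp.SDPMessage)args.Args[0];
    bool hasAudio = false;
    for (uint i = 0; i < sdp.MediasLen; i++)
    {
        // The media type is "video", "audio", "application", ...
        if (sdp.GetMedia(i).Media == "audio")
            hasAudio = true;
    }
    if (hasAudio)
    {
        // Add the audio branch (parsebin ! decodebin ! audioconvert !
        // audioresample ! opusenc ! tee ! ...) here, before rtspsrc's
        // audio pad shows up in pad-added.
    }
});
```

rtspsrc's "select-stream" signal, which fires once per stream with its caps, might also serve the same purpose.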