RTSP/WebRTC: graceful handling of a missing audio stream

I’m working on a system that ingests an RTSP stream and re-transmits it over WebRTC. I pretty much have everything working, except that handling media that isn’t present is not as smooth as I’d like. For reference, this uses GStreamer-Sharp with GStreamer 1.24.2 on Windows.

To give an idea: some of the RTSP streams have both video and audio, while others have only video. I want something that handles both situations. Currently, I’m doing this:

  1. Query the stream using Discoverer in order to determine if an audio stream exists.
  2. If an audio stream exists, use this:
rtspsrc location=rtsp://192.168.1.38:8554/test name=src
  src. ! 
  queue max-size-bytes=20971520 name=video_source_queue max-size-time=0 ! 
  parsebin ! 
  identity sync=true ! 
  tee name=video_tee ! 
  queue name=video_fake_queue ! 
  fakesink
  src. ! 
  application/x-rtp,media=audio ! 
  queue max-size-bytes=20971520 max-size-time=0 name=audio_source_queue ! 
  parsebin ! 
  identity sync=true ! 
  decodebin ! 
  audioconvert ! 
  audioresample ! 
  opusenc ! 
  tee name=audio_tee ! 
  queue name=audio_fake_queue ! 
  fakesink

else, if it’s only video, use this:

rtspsrc location=rtsp://192.168.1.38:8554/test name=src
  src. ! 
  queue max-size-bytes=20971520 name=video_source_queue max-size-time=0 ! 
  parsebin ! 
  identity sync=true ! 
  tee name=video_tee ! 
  queue name=video_fake_queue ! 
  fakesink
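For reference, the Discoverer probe in step 1 looks roughly like this in GStreamer-Sharp. This is only a sketch: the accessor names (`DiscoverUri`, `AudioStreams`) are assumptions about how the binding exposes the GstDiscoverer API, so adjust to your binding version.

```csharp
using System;
using System.Linq;
using Gst;
using Gst.PbUtils;

// Sketch of step 1: probe the RTSP URL with Discoverer before choosing
// a pipeline. DiscoverUri blocks for up to the timeout, which is the
// startup delay I'd like to avoid. Accessor names are assumptions
// about the GStreamer-Sharp binding.
static class StreamProbe
{
    public static bool HasAudio(string rtspUri)
    {
        // Discoverer timeout is in nanoseconds: 5 seconds here.
        var discoverer = new Discoverer(5 * Gst.Constants.SECOND);
        DiscovererInfo info = discoverer.DiscoverUri(rtspUri);
        return info.AudioStreams.Any();
    }
}
```

The result of `HasAudio` then selects between the audio+video and video-only pipeline strings above.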
  3. And for each WebRTC client, if an audio stream exists, use this:
queue name=video_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream !
  rtph264pay name=video_pay aggregate-mode=zero-latency config-interval=-1 timestamp-offset=0 ! 
  webrtcbin name=webrtc bundle-policy=max-bundle
  queue name=audio_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream !
  rtpopuspay name=audio_pay !
  webrtc.

else, if it’s only video:

queue name=video_queue max-size-bytes=20971520 max-size-time=0 leaky=downstream !
  rtph264pay name=video_pay aggregate-mode=zero-latency config-interval=-1 timestamp-offset=0 !
  webrtcbin name=webrtc bundle-policy=max-bundle
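The per-client hookup done in C# is roughly the following sketch: request a new src pad on the shared tee and link it into the client’s bin. Element names match the pipelines above; `clientBin` is assumed to already contain the queue, payloader, and webrtcbin, and `GetRequestPad` may be named `RequestPadSimple` in newer bindings.

```csharp
using System;
using Gst;

// Sketch: attach one WebRTC client's branch to the shared video tee.
// Names match the pipelines above; clientBin is assumed to already
// hold the queue, payloader, and webrtcbin for this client.
static class ClientAttach
{
    public static void Attach(Pipeline pipeline, Bin clientBin)
    {
        var tee = pipeline.GetByName("video_tee");
        var queue = clientBin.GetByName("video_queue");

        pipeline.Add(clientBin);
        clientBin.SyncStateWithParent();

        // Request a fresh src pad on the tee and link it into the client.
        Pad teeSrc = tee.GetRequestPad("src_%u");
        Pad queueSink = queue.GetStaticPad("sink");
        if (teeSrc.Link(queueSink) != PadLinkReturn.Ok)
            throw new Exception("failed to link tee to client queue");
    }
}
```

Detaching a client is the mirror image: unlink, release the tee pad with `ReleaseRequestPad`, and remove the bin.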

All of the pad creation and linking is handled in my C# code. What I don’t like about this solution is having to use the Discoverer: it adds a bit of startup delay that I’d rather avoid. Is there a clean way around it? The only thing that comes to mind is to start with the video-only pipelines, then parse the RTSP messages from rtspsrc and, if I detect audio, add the audio part of the pipeline on the fly.
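That Discoverer-free idea should be workable: rtspsrc emits an `on-sdp` signal carrying the DESCRIBE response before it creates any source pads, so the audio branch can be added as soon as an audio media section shows up in the SDP. Below is a minimal, pure-C# check over the SDP text; the signal hookup itself (GStreamer-Sharp’s `Connect("on-sdp", ...)` handing you a `Gst.Sdp.SDPMessage`, whose text `AsText()` returns) is left out and named here only as an assumption about the binding.

```csharp
using System;

// Sketch: decide whether the audio branch is needed from the raw SDP
// text (e.g. from an "on-sdp" handler on rtspsrc). An RTSP DESCRIBE
// answer declares each stream with an "m=" line, so audio is present
// iff there is an "m=audio" media section.
static class SdpProbe
{
    public static bool DeclaresAudio(string sdpText)
    {
        foreach (var line in sdpText.Split('\n'))
            if (line.TrimEnd('\r').StartsWith("m=audio "))
                return true;
        return false;
    }
}
```

Alternatively, rtspsrc’s `select-stream` signal hands you each stream’s caps (including `media=audio`) and lets you accept or reject streams individually, which exposes the same information without parsing text.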