WebRTC vs. UDP/RTP: Streaming Audio Samples to a Remote Computer

Hi everyone,

I have a scenario where I receive audio samples from OpenAI Realtime that I want to play back on a remote computer. Initially, I tried using webrtcsink, but I couldn’t get it to work.

In theory, it should be possible to use a pipeline such as:

appsrc caps="audio/x-raw,channels=2,rate=24000,format=F32LE,layout=interleaved" ! webrtcsink

and then:

webrtcsrc ! alsasink sync=false

However, this didn’t work as expected. Interestingly, replacing appsrc with audiotestsrc works fine.
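For context, the samples are pushed into appsrc roughly like this (a simplified sketch, not my exact code; the callback name is illustrative):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    'appsrc name=src is-live=true format=time '
    'caps="audio/x-raw,channels=2,rate=24000,format=F32LE,layout=interleaved" '
    '! webrtcsink'
)
appsrc = pipeline.get_by_name("src")
pipeline.set_state(Gst.State.PLAYING)

def on_openai_audio(chunk: bytes):
    # Called for each raw F32LE chunk received from OpenAI Realtime.
    # Buffers are pushed as they arrive, without explicit timestamps.
    buf = Gst.Buffer.new_wrapped(chunk)
    appsrc.emit("push-buffer", buf)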

Can the WebRTC pipeline work with an appsrc that doesn’t push data continuously?

I managed to get the pipeline flowing by combining appsrc ! audiomixer with audiotestsrc wave=silence ! audiomixer, but the received audio wasn’t rendered properly. Could this be due to improper timestamping of the audio samples from OpenAI?
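The combined pipeline looked roughly like this (a sketch from memory; appsrc properties omitted):

appsrc ! audiomixer name=mix ! webrtcsink audiotestsrc is-live=true wave=silence ! mix.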

For now, I’ve switched to a simple udpsrc/sink setup:

gst-launch-1.0 appsrc ! audioconvert ! audioresample ! opusenc ! rtpopuspay ! udpsink host=127.0.0.1

gst-launch-1.0 udpsrc ! "application/x-rtp,media=audio,clock-rate=48000,encoding-name=OPUS,payload=96" ! rtpjitterbuffer ! rtpopusdepay ! queue ! opusdec ! audioconvert ! audioresample ! alsasink sync=false

While this works, I’d prefer using WebRTC since it’s already being used to stream microphone input from the remote computer to OpenAI Realtime.

I initially tried using rtpsrc/sink, but it didn’t work, even with audiotestsrc. On the receiver side, I encountered these warnings:

0:00:02.326823961 110052 0x7f7f20000d30 WARN rtpsource rtpsource.c:1134:calculate_jitter: cannot get clock-rate for pt 96
0:00:02.326839070 110052 0x7f7f20000d30 WARN rtpjitterbuffer gstrtpjitterbuffer.c:3754:gst_rtp_jitter_buffer_chain:<rtpjitterbuffer0> No clock-rate in caps!, dropping buffer

I used the following commands:

gst-launch-1.0 audiotestsrc ! audioconvert ! audioresample ! opusenc ! rtpopuspay ! rtpsink address=127.0.0.1

gst-launch-1.0 rtpsrc ! "application/x-rtp,media=audio,encoding-name=OPUS,payload=96,clock-rate=48000" ! rtpjitterbuffer ! rtpopusdepay ! queue ! opusdec ! audioconvert ! audioresample ! autoaudiosink

If anyone has experience with this or suggestions for improving the setup, I’d love to hear your thoughts!

Thanks in advance!

Hi @FabPoll

That’s a lot of questions in one! :slight_smile: It would probably be better to ask them one at a time on the forum.

OpenAI most likely doesn’t provide timestamps that match the clock of the webrtcsink pipeline. webrtcsink performs synchronisation internally, so buffers whose timestamps don’t progress consistently against the pipeline clock won’t be rendered at the right time. You’d need to adjust the timestamps accordingly, and possibly implement a skew-compensation mechanism in case OpenAI delivers audio slightly faster or slower than real time.
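As a minimal sketch of one way to do that, assuming you control the code pushing into appsrc and the appsrc is configured with is-live=true format=time (names here are illustrative): derive each buffer’s PTS from a running sample counter, so timestamps advance with the audio content rather than with wall-clock arrival.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

RATE = 24000
BYTES_PER_FRAME = 2 * 4  # 2 channels x 4 bytes per F32LE sample
frames_pushed = 0        # running frame counter driving the timestamps

def push_chunk(appsrc, data: bytes):
    global frames_pushed
    n_frames = len(data) // BYTES_PER_FRAME
    buf = Gst.Buffer.new_wrapped(data)
    # PTS/duration follow the sample count, not the arrival time
    buf.pts = Gst.util_uint64_scale(frames_pushed, Gst.SECOND, RATE)
    buf.duration = Gst.util_uint64_scale(n_frames, Gst.SECOND, RATE)
    frames_pushed += n_frames
    appsrc.emit("push-buffer", buf)

A full skew-compensation mechanism would additionally compare frames_pushed against the pipeline clock and drop or insert samples when the two drift apart.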

As for the rtpsrc pipeline, try setting the caps property of rtpsrc instead of using a capsfilter after it. The capsfilter sits downstream of the element’s internal jitterbuffer, which is why the jitterbuffer never learns the clock-rate; setting the property gives it the caps up front:

gst-launch-1.0 rtpsrc caps="application/x-rtp,media=audio,encoding-name=OPUS,payload=96,clock-rate=48000" ! ...