A/V Sync Issue in WebRTC Pipeline

I’m currently working on a real-time avatar rendering system where a DL-based frame generator takes an audio file as input and produces video frames at a fixed rate (e.g., 25 FPS). These frames, along with the original audio, are streamed over WebRTC for real-time playback.
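For context, the video path looks roughly like the sketch below: generated frames are pushed into an appsrc with explicit PTS/duration, encoded, and fed into webrtcbin. The element names, caps, and the push_frame helper are illustrative, not my exact code:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

FPS = 25
FRAME_DURATION = Gst.SECOND // FPS  # 40 ms per frame at 25 FPS

# Illustrative pipeline; my real one has more elements around webrtcbin
pipeline = Gst.parse_launch(
    "appsrc name=vsrc is-live=true format=time "
    "caps=video/x-raw,format=RGB,width=512,height=512,framerate=25/1 "
    "! videoconvert ! vp8enc deadline=1 ! rtpvp8pay "
    "! application/x-rtp,media=video,encoding-name=VP8,payload=96 "
    "! webrtcbin name=webrtc"
)
vsrc = pipeline.get_by_name("vsrc")

frame_index = 0

def push_frame(rgb_bytes):
    """Wrap one generated frame in a Gst.Buffer with explicit PTS/duration."""
    global frame_index
    buf = Gst.Buffer.new_wrapped(rgb_bytes)
    buf.pts = frame_index * FRAME_DURATION
    buf.duration = FRAME_DURATION
    frame_index += 1
    vsrc.emit("push-buffer", buf)
```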

Once the audio-to-frame generation process is complete, I switch the audio source to a silent stream (audiotestsrc is-live=true wave=silence) to continue pushing audio downstream. Simultaneously, I loop a predefined set of idle avatar frames to maintain the video output.
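Concretely, the switch is done along these lines, here sketched with an input-selector flipping from the generated-audio branch to the silent one (the pad names and the switch_to_silence helper are illustrative, not my exact implementation):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Two audio branches feeding one input-selector: the real (generated) audio
# on sink_0 and the live silent source on sink_1
pipeline = Gst.parse_launch(
    "input-selector name=sel ! audioconvert ! audioresample "
    "! opusenc ! rtpopuspay "
    "! application/x-rtp,media=audio,encoding-name=OPUS,payload=97 "
    "! webrtcbin name=webrtc "
    "appsrc name=asrc is-live=true format=time "
    "caps=audio/x-raw,format=S16LE,rate=48000,channels=1,layout=interleaved "
    "! sel. "
    "audiotestsrc is-live=true wave=silence "
    "! audio/x-raw,format=S16LE,rate=48000,channels=1,layout=interleaved "
    "! sel."
)

def switch_to_silence():
    """Flip the selector to the silent branch once generation completes."""
    sel = pipeline.get_by_name("sel")
    # sink_1 is the audiotestsrc branch in the launch string above
    silent_pad = sel.get_static_pad("sink_1")
    sel.set_property("active-pad", silent_pad)
```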

The issue arises during this transition: audio and video drift noticeably out of sync between the idle frames and the silent audio stream. I have tried managing buffer PTS and duration manually, but that has not resolved the discrepancy. The pipeline seems to experience jitter or timing inconsistencies when switching audio sources, even with what I believe are correct timestamps.
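My manual timestamping attempt is roughly the following: stamping buffers on the new branch from the pipeline's running time so the timeline continues across the switch instead of restarting at zero (sketch only; stamp_idle_buffer is a hypothetical helper and pipeline is as above):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

FPS = 25
FRAME_DURATION = Gst.SECOND // FPS

def stamp_idle_buffer(buf, pipeline):
    """Set PTS to the current running time and a fixed per-frame duration."""
    clock = pipeline.get_clock()
    # running-time = absolute clock time minus the pipeline's base time,
    # so the post-switch buffers continue the pre-switch timeline
    buf.pts = clock.get_time() - pipeline.get_base_time()
    buf.duration = FRAME_DURATION
    return buf
```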

I’m looking for guidance on how to handle seamless transitions between real audio and silence in a live WebRTC pipeline while maintaining A/V sync. Any suggestions on pipeline design, clock synchronization, or proper timestamping strategy would be greatly appreciated.