How to represent "empty" caps, or silence

Hi, I am writing a demuxer for a custom file format. Its output is RTP in various encodings, and the encoding can change on the fly on each pad.

Under some specific conditions, I want to signal that the next X seconds are silence, with no media. I could use a GAP event, but a GAP event does not change the caps, so the last used caps stay on the pad even though they no longer apply. Also, the GAP event does not carry key-value pairs, so I cannot represent, e.g., the ID of the client who turned off their media.

Is there an existing mechanism that covers this use case?

It would look something like User A video (with caps X, userId=A) → no video for 1 minute → User B video (with caps Y, userId=B). My downstream element would change the decoder each time the video caps change, or insert filler frames during silence.
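
To make the sequence concrete, here is a rough plain-Python model of what I mean. It is not GStreamer code: `Media`, `Silence` and `plan_downstream` are names I made up for illustration, and the 30-second media durations are arbitrary. It only shows the behaviour I would like from the downstream element, assuming silence could be signalled as its own item carrying a user ID.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Union


@dataclass
class Media:
    """A stretch of media; caps and user_id stand in for the pad's caps fields."""
    caps: str
    user_id: str
    seconds: float


@dataclass
class Silence:
    """A stretch with no media, tagged with the user who turned media off."""
    user_id: str
    seconds: float


def plan_downstream(items: Iterable[Union[Media, Silence]]) -> List[str]:
    """Return the actions a downstream element would take, in order."""
    actions: List[str] = []
    current_caps: Optional[str] = None
    for item in items:
        if isinstance(item, Silence):
            # No media applies here, so the previous caps are dropped.
            current_caps = None
            actions.append(f"filler {item.seconds:g}s (user {item.user_id})")
            continue
        if item.caps != current_caps:
            # Caps differ from what is active (or nothing is active): new decoder.
            current_caps = item.caps
            actions.append(f"switch decoder to {item.caps} (user {item.user_id})")
        actions.append(f"decode {item.seconds:g}s")
    return actions


# Hypothetical timeline matching the example above.
timeline = [
    Media(caps="X", user_id="A", seconds=30),
    Silence(user_id="A", seconds=60),
    Media(caps="Y", user_id="B", seconds=30),
]
for action in plan_downstream(timeline):
    print(action)
```

The point of the model is that silence resets the active caps, so the media after it always triggers a decoder switch, even if it happens to use the same caps as before the silence.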

