mp4mux wants proper DTS/PTS on the input buffers, but that’s not always available when the stream comes from RTP. Theoretically it would be h264parse’s job to fix that up, but in practice it doesn’t do that (yet), so there’s a separate h264timestamper element which should hopefully reconstruct and set PTS/DTS correctly. So give that a try perhaps.
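To show where h264timestamper would go, here is a small Python helper that just builds a pipeline description string; it doesn’t need GStreamer installed to run. It assumes H.264 arriving as RTP over UDP. The port, the payload type 96, the output filename and the `build_pipeline` helper itself are all made up for illustration. h264timestamper is only in newer GStreamer releases (1.22 and later, in gst-plugins-bad), so check that your version has it.

```python
def build_pipeline(port: int, location: str) -> str:
    """Build a gst-launch-1.0 style description: RTP/H.264 in, MP4 file out.

    Hypothetical helper; port, payload type and filename are assumptions.
    """
    # Caps for the incoming RTP stream (payload type 96 is an assumption).
    caps = ("application/x-rtp,media=video,encoding-name=H264,"
            "clock-rate=90000,payload=96")
    elements = [
        f'udpsrc port={port} caps="{caps}"',
        "rtph264depay",     # strip RTP, output H.264
        "h264parse",        # parse into access units, but may leave DTS unset
        "h264timestamper",  # reconstruct PTS/DTS before the muxer sees them
        "mp4mux",
        f"filesink location={location}",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    print(build_pipeline(5000, "out.mp4"))
```

The printed string can be passed to `Gst.parse_launch()`, or pasted after `gst-launch-1.0 -e`. The `-e` flag sends EOS on Ctrl+C so mp4mux can finalize the file.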
The other question is what happens when there are network constraints. Is it just that you get frame drops and that causes it, or does the sender perhaps switch to a different video resolution/framerate?