Some IP camera produces HUGE latency over time via RTSP

We discovered strange issue with one particular IP camera module: if we leave RTSP streams running for quite some time (for example for couple of hours, or overnight), it MAY produce very high video latency, like in terms of seconds, even if we use latency=100 in our pipeline.

Our pipeline currently looks like this (used inside Qt application):

rtspsrc location=rtsp://...protocols=tcp latency=100 buffer-mode=slave max-ts-offset=0 ! queue max-size-buffers=0 ! rtph264depay wait-for-keyframe=1 ! h264parse ! vah264dec compliance=3 ! glupload ! glcolorconvert ! glvideoflip name=glvideoflip ! qmlglsink name=qmlglsink

Strangely, if we have multiple instances of such pipeline in our application, not ALL of these instances produce such huge latency! It seems that some random RTSP streams “goes crazy” in time. SOMETIMES it has big latency even from the beginning of the stream!

I remember that some time ago I introduced max-ts-offset=0 as a workaround that allegedly helped to avoid slowly, constantly growing latency, that could be measured in 10 minutes or so. But now it seems it helps in most cases for that camera, but still latency becomes huge in time, for unknown reasons…

Moreover, we discovered that this issues is not only with GSTreamer, but with other FOSS players…

Here for example we see gst-launch-1.0 (top left, using same pipeline but with glimagesink), mpv (top right), and ffplay (bottom right), and if left overnight, in this particular case, it’s ffplay that produced huge ~3s latency:

At the beginning, all these maintained <300ms latency.

Now since this issue is not only with GStreamer, this might be bug in camera firmware? Or maybe in some FOSS software used both by GStreamer and ffpaly? Live555?

Anyway, is there a way to debug where this huge lag “collects”? Is it within rtspsrc, or maybe queue? Could I monitor and detect this latency using C API, and maybe react to it, in result creating at least workaround?

Thanks!

I have update our software to print current-level-buffers, current-level-bytes and current-level-time of the single queue we have in our pipeline:

rtspsrc location=rtsp://…protocols=tcp latency=100 buffer-mode=slave max-ts-offset=0 ! queue max-size-buffers=0 ! rtph264depay wait-for-keyframe=1 ! h264parse ! vah264dec compliance=3 ! glupload ! glcolorconvert ! glvideoflip name=glvideoflip ! qmlglsink name=qmlglsink

Here’s what I see on some weaker Celeron J1900 computer:

It displays streams of the same camera, and queue37 shows current-level-time between 400-600ms.

Normally I see about 40-80ms on more powerful machine.

Why would this queue grow? Hardware decoder stalled and not dropped?

That single queue is responsible for a bit too many things (depayloading, parsing, decoding, uploading to gl, conversion in gl, flipping in gl, waiting for display time). All of that in a single thread (it’s the main point of using a queue, it starts a dedicated streaming thread).

I would recommend you split up your pipeline to have have a queue just after the decoder.

Regarding late/dropped frams, since you are using a live video sink, the QoS system will kick in and decoders/converters/… will automatically kick in to catch up as quickly as possible.

I would recommend you run your pipeline with GST_DEBUG=2 to see if there are any error/warning messages that would indicate that frames really arrive late.

As far as I understand, you don’t seem to have issues with GStreamer, but with other implementations ?

I do have issues with GSTreamer, but later discovered random huge latency with ffplay too.

I would recommend you split up your pipeline to have have a queue just after the decoder.

Idea is to use as less latency possible, so maybe because of that I’ve used only one queue, but thanks for suggestion, will try. Though for MANY other IP camera modules there’s no such issues.

Here’s latest experiment: I’ve set queue max-size-time=1 instead, and now that queue holds only one frame (~40ms at 25fps), so no longer grows but… I STILL GET 900ms LATENCY measuring glass-to-class (using screenshot ant timer):

What a heck :expressionless:

AND IT’S ONLY FOR THAT PARTICULAR MODULE. Other’s like Dahua, Hikvision, Axis, various less-known chinese brands works just fine…

To reiterate:

  1. There’s no such random huge latency issues with MANY other IP camera modules.
  2. This latency issue happens RANDOMLY. I might leave multiple instances on multiple machines overnight and not reproduce it.
    2.b Sometimes only one GStreamer pipeline lags while other two of the same camera works fine in the same application (we support multiple video slots).

Here’s some GST_DEBUG=4 output:

0:00:07.357204693 3453534 0x7efca4002060 INFO               GST_EVENT gstevent.c:1687:gst_event_new_reconfigure: creating reconfigure event
0:00:07.357220398 3453534 0x7efca4002060 INFO               GST_EVENT gstevent.c:918:gst_event_new_caps: creating caps event application/x-rtcp, ssrc=(uint)1758363959
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 37.187862ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 36.636688ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 36.037617ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 37.655727ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 39.592674ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 39.899487ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 39.996662ms
0:00:08.871996844 3453534 0x7efc94040920 INFO               GST_EVENT gstevent.c:918:gst_event_new_caps: creating caps event application/x-rtcp
0:00:08.872095909 3453534 0x7efc94040920 INFO               GST_EVENT gstevent.c:998:gst_event_new_segment: creating segment event time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 39.999985ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.000000ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.000000ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 28, current-level-time: 40.000000ms
0:00:09.634637891 3453534 0x7efc94040a40 INFO               GST_EVENT gstevent.c:918:gst_event_new_caps: creating caps event application/x-rtcp
0:00:09.634685525 3453534 0x7efc94040a40 INFO               GST_EVENT gstevent.c:998:gst_event_new_segment: creating segment event time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.000000ms
0:00:09.745097322 3453534 0x7efc940409e0 INFO               GST_EVENT gstevent.c:918:gst_event_new_caps: creating caps event application/x-rtcp
0:00:09.745136915 3453534 0x7efc940409e0 INFO               GST_EVENT gstevent.c:998:gst_event_new_segment: creating segment event time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.000000ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.000000ms

<...>

TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.004978ms
0:00:12.379029553 3453534 0x7efcb4002180 WARN            videodecoder gstvideodecoder.c:3152:gst_video_decoder_prepare_finish_frame:<vah264dec1> decreasing timestamp (0:00:05.515782607 < 0:00:05.516059995)
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.004745ms
TRACE: Queue element queue1 has current-level-buffers: 1, current-level-bytes: 1456, current-level-time: 40.004559ms

Maybe decreasing timestamp message says something?

@bilboed Could it be that camera somehow sometimes “lies” about it’s timestamps or something like that? Some RTSP streams due to some bug produces incorrect timing and our application “lags”?

I have made visual example. I have returned to queue max-size-buffers=0.

Top-left is our application using GStreamer via qmlglsink with pipeline shown previously…

Right is applications output, where we can see current-level-time: ... printed.

Bottom-left is some measurement utility I use.

  1. At the start of the video we see that current-level-time is around 80ms.
  2. Then I re-start RTSP stream (by switching from-and-back to the same video layout).
  3. Now see that current-level-time is 600ms (for whatever reason).
  4. Re-start stream again.
  5. Now current-level-time is just ~80ms again, as before.