Tips for Minimizing Latency in Video Streaming Over WiFi

I’m streaming raw video frames in BGR format using GStreamer in C++. See the sender (C++) and the receiver (terminal command) below:

Sender (C++)

// Define pipeline
std::string pipeline = "appsrc name=mysource ! queue ! video/x-raw,format=BGR,width=1280,height=720 ! queue ! rtpvrawpay ! udpsink auto-multicast=0 host=Receiver_IP port=5004”;

// Configure appsrc
g_object_set(G_OBJECT(appsrc_), "caps", gst_caps_new_simple("video/x-raw", "format", G_TYPE_STRING, "BGR", "width", G_TYPE_INT, width_, "height", G_TYPE_INT, height_, "framerate", GST_TYPE_FRACTION, 5, 1, nullptr), nullptr);

Receiver (terminal)

$ gst-launch-1.0 udpsrc address=<Receiver_IP> port=5004 caps="application/x-rtp, media=(string)video, encoding-name=(string)RAW, sampling=(string)BGR, width=(string)1280, height=(string)720, framerate=(string)5/1, depth=(string)8, payload=(int)96" ! queue ! rtpvrawdepay ! videoconvert ! autovideosink

I’ve noticed that when I run both the sender (written in C++) and the receiver (in the terminal) on my laptop (13th Gen Intel i7 processor with 32 GB RAM), I experience almost no latency (essentially real-time). However, when I move the sender and receiver to different devices and connect them over WiFi (a local network), I observe a significant latency of approximately 6-7 seconds. BTW, the sender is a small device with an x86 processor (dual-core Intel) and 4 GB RAM, with standard WiFi capabilities. While streaming the video frames, the CPU usage of both cores on the sender machine reaches 80-90%, and htop shows multiple GStreamer instances running.

To reduce this huge latency to a reasonable value, the first thing that comes to mind is to downscale the video frames. What other tips and tricks would you suggest to minimize latency in this environment?
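For example (a rough sketch; frame stands for the cv::Mat I push into appsrc, and 640x360 is just an example size), I could resize each frame with OpenCV before pushing it and shrink the caps on both ends to match:

// Hypothetical downscale step before pushing the frame into appsrc
cv::Mat small_frame;
cv::resize(frame, small_frame, cv::Size(640, 360));  // roughly a quarter of the 1280x720 pixels
// The appsrc caps and the receiver caps would then need width=640,height=360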

Thanks for your suggestions

At first I’d suggest checking whether both ends have the same time.
Ideally have them synchronized to a common NTP server.
You could also try changing the WiFi network channel; there are some free tools for analyzing WiFi channels.
You may also try a tool such as Wireshark for further insight into the network traffic.
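For example, you could query a public NTP server from both machines and compare the reported offsets (assuming ntpdate or a similar tool is installed; this is just a quick check, not a full sync setup):

$ ntpdate -q pool.ntp.org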

@Honey_Patouceul

Thank you very much for the suggestions.

I verified that both machines have the same time.

Furthermore, I started using a better machine equipped with an 11th Gen Intel i7 (2.80 GHz × 8 CPU) and 32 GB RAM. This machine provides a WiFi hotspot. For now, the client machine (laptop) is kept right next to it. I think the new machine is powerful enough and can therefore handle image/video compression for lower latency. Below is the code snippet used with GStreamer:

// Note: the base class that declares H264FrameReceiveCallback() (the camera SDK callback interface) is omitted here.
class H264FrameReceiver {

public:
  H264FrameReceiver() {
    // Build the sender pipeline: raw BGR frames from appsrc, payloaded as RTP raw video, sent over UDP
    std::string pipeline = fmt::format("appsrc name=mysource ! queue ! video/x-raw,format=BGR,width={},height={} ! queue ! rtpvrawpay ! udpsink auto-multicast=0 host={} port=5004", width_, height_, receiver_ip_);
    pipeline_ = gst_parse_launch(pipeline.c_str(), nullptr);
    appsrc_ = gst_bin_get_by_name(GST_BIN(pipeline_), "mysource");
    g_object_set(G_OBJECT(appsrc_), "caps", gst_caps_new_simple("video/x-raw", "format", G_TYPE_STRING, "BGR", "width", G_TYPE_INT, width_, "height", G_TYPE_INT, height_, "framerate", GST_TYPE_FRACTION, fps_, 1, nullptr), nullptr);
    g_object_set(G_OBJECT(appsrc_), "block", TRUE, nullptr);
    gst_element_set_state(pipeline_, GST_STATE_PLAYING);
  }

  void H264FrameReceiveCallback(const uint8_t *h264_data, size_t size, int64_t timestamp) override {
    // Some code here to decode the H264 packet and convert it to an OpenCV image (cv::Mat opencv_img)

    GstBuffer *buffer = gst_buffer_new_allocate(nullptr, opencv_img.total() * opencv_img.elemSize(), nullptr);

    GstMapInfo map;
    gst_buffer_map(buffer, &map, GST_MAP_WRITE);
    std::memcpy(map.data, opencv_img.data, opencv_img.total() * opencv_img.elemSize());
    gst_buffer_unmap(buffer, &map);

    GstFlowReturn ret = gst_app_src_push_buffer(GST_APP_SRC(appsrc_), buffer);
    if (ret != GST_FLOW_OK) {
      std::cerr << "Failed to push buffer to appsrc: " << ret << std::endl;
    }
  }

private:
  // Members (initialization omitted in this snippet)
  GstElement *pipeline_ = nullptr;
  GstElement *appsrc_ = nullptr;
  int width_, height_, fps_;
  std::string receiver_ip_;
};
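Since this machine should be powerful enough for encoding, I am also considering a compressed variant of the sender pipeline along these lines (an untested sketch using the standard x264enc / rtph264pay elements; the bitrate value is just a guess):

// Hypothetical low-latency H.264 sender pipeline (not what I currently run)
std::string h264_pipeline = fmt::format(
    "appsrc name=mysource is-live=true format=time ! queue ! "
    "video/x-raw,format=BGR,width={},height={} ! videoconvert ! "
    "x264enc tune=zerolatency speed-preset=ultrafast bitrate=2000 key-int-max=30 ! "
    "rtph264pay config-interval=1 pt=96 ! "
    "udpsink host={} port=5004 sync=false", width_, height_, receiver_ip_);

The receiver would then need a matching depayloader and decoder (e.g. rtph264depay ! avdec_h264 ! videoconvert) instead of rtpvrawdepay.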

On the other hand, please note that this camera provides a mobile app that connects to the camera over its own WiFi and shows a 4K stream in real time. Here I am not even using HD! Furthermore, what surprises me is that mjpg-streamer shows lower latency (2-3 seconds) than the 6-7 seconds I get with GStreamer.
I am targeting a latency of under 1 second. I appreciate your expert advice on how to achieve this. Thanks again

I guess your pipeline on the receiver does not use hardware (GPU) acceleration.

videoconvert
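If hardware acceleration is not an option, a software-only tweak you could try (just a guess on my side, reusing your caps) is letting videoconvert use several threads and disabling sink synchronization:

$ gst-launch-1.0 udpsrc address=<Receiver_IP> port=5004 caps="application/x-rtp, media=(string)video, encoding-name=(string)RAW, sampling=(string)BGR, width=(string)1280, height=(string)720, depth=(string)8, payload=(int)96" ! queue ! rtpvrawdepay ! videoconvert n-threads=4 ! autovideosink sync=false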

First be sure that I’m not an expert. There are a few gurus here, but I’m just a stupid dog hanging around, providing weird advice for free on my personal time.

You may add chrono probes around this for checking decoding time.
Also, do you set the buffer PTS before mapping?

It might use a different encoding (MJPG? is that better than RTP/JPG?) and/or a transport protocol that may be better configured for your case, or else…
2-3s latency seems sub-optimal to me… You may adjust receiver VLC settings.

Also, to be sure, how do you measure latency?
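For instance, a chrono probe around the decode step could be as simple as this sketch (needs <chrono> and <iostream>):

// Hypothetical timing probe around the decode/convert step
auto t0 = std::chrono::steady_clock::now();
// ... decode the H264 packet and convert it to the OpenCV image ...
auto t1 = std::chrono::steady_clock::now();
std::cout << "decode+convert took "
          << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
          << " ms" << std::endl;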

@Joe

This is receiver side:

$ gst-launch-1.0 udpsrc address=10.42.0.1 port=5004 caps="application/x-rtp, media=(string)video, encoding-name=(string)RAW, sampling=(string)BGR, width=(string)width_, height=(string)height_, framerate=(string)fps_/1, depth=(string)8, payload=(int)96" ! queue ! rtpvrawdepay ! videoconvert ! autovideosink

@Honey_Patouceul

Thanks for your valuable time. I appreciate it!

You may add chrono probes around this for checking decoding time.

This makes sense. Thank you very much, I will do it. What I did instead was display the OpenCV image using imshow and disable all the GStreamer code that sends the buffer. With this setup, I could see a real-time image in the OpenCV window. But when I enabled the GStreamer code, the display was lagging (it looked like frames were being dropped).

Also do you set buffer PTS before mapping ?

Sorry, I think I haven’t! BTW, what is PTS? Can you please explain a bit or share a code snippet?

It might use a different encoding (MJPG? is that better than RTP/JPG?)

It is difficult to conclude which one is better, but I can share the code snippet for mjpg-streamer. Please see below:

// MJPEGStreamer comes from a third-party MJPEG-over-HTTP library (streamer setup and frame capture are omitted here)
std::vector<int> params = {cv::IMWRITE_JPEG_QUALITY, 90};
MJPEGStreamer streamer;

cv::Mat frame;

// http://localhost:8080/bgr
std::vector<uchar> buff_bgr;
cv::imencode(".jpg", frame, buff_bgr, params);
streamer.publish("/bgr", std::string(buff_bgr.begin(), buff_bgr.end()));

Now, simply use VLC or a browser to access the stream over HTTP. The latency is minimal in this case (compared to GStreamer, as mentioned in the first post).

and/or a transport protocol that may be better configured for your case, or else…

Can you please elaborate more so that I can check them on my setup?

You may adjust receiver VLC settings.

VLC is taking some time, but the browser is performing better.
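If I understand the option correctly, I will also try lowering VLC's network cache (in milliseconds) from the command line, pointing it at the /bgr endpoint from the snippet above:

$ vlc --network-caching=200 http://<sender_ip>:8080/bgr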

Also, to be sure, how do you measure latency ?

Currently the sender and receiver are kept next to each other, so I wave my hand in front of the camera and estimate the latency from when the motion shows up on the receiver side. I am looking for a more rigorous method though. Let me know if there is one!
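One idea I may try as a rough measurement (frame stands for the cv::Mat being pushed): stamp the current time onto each frame with cv::putText just before sending, then compare that number on the receiver's display against a clock shown next to the sender:

// Hypothetical: overlay the send time (milliseconds since epoch) on the frame before pushing it
auto now_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
    std::chrono::system_clock::now().time_since_epoch()).count();
cv::putText(frame, std::to_string(now_ms), cv::Point(30, 50),
            cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2);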

I am expecting GStreamer to outperform all other streamers, so I am not ready to migrate to mjpg-streamer.

Thanks

You may learn more from reading: GstBuffer.

@Honey_Patouceul

Sorry for my delayed response. I tried setting the buffer PTS before mapping but got the following runtime error:

(streamer_gst:41760): GStreamer-CRITICAL **: 20:52:19.968: gst_clock_get_time: assertion 'GST_IS_CLOCK (clock)' failed
(streamer_gst:41760): GStreamer-CRITICAL **: 20:52:19.996: gst_segment_to_running_time: assertion 'segment->format == format' failed

Below is the code snippet:

// Create a new buffer
GstBuffer *buffer = gst_buffer_new_allocate(nullptr, img.total() * img.elemSize(), nullptr);

// Add a timestamp to the buffer
GstClockTime timestamp = gst_clock_get_time(gst_element_get_clock(pipeline_));
GST_BUFFER_PTS(buffer) = timestamp;

GstMapInfo map;
gst_buffer_map(buffer, &map, GST_MAP_WRITE);

// Copy frame data to buffer
std::memcpy(map.data, img.data, img.total() * img.elemSize());

// Unmap buffer
gst_buffer_unmap(buffer, &map);

// Push buffer to appsrc
GstFlowReturn ret = gst_app_src_push_buffer(GST_APP_SRC(appsrc_), buffer);
if (ret != GST_FLOW_OK) {
  std::cerr << "Failed to push buffer to appsrc: " << ret << std::endl;
}

Next, I added chrono probes and realised that decoding the H264 packet and converting it to an OpenCV image takes the most time. Furthermore, I also noticed that mjpg-streamer takes more time than GStreamer on the sending side, but on the receiver side GStreamer seems slower.

I suspect that GStreamer is queuing somewhere. For example, with mjpg-streamer I can simply use my Chrome browser to receive and view the stream. However, when I use VLC to view the MJPG stream, VLC looks slower, perhaps because it is buffering.

BTW, can you tell me the proper way to set buffer PTS?

Thanks

Once you get a reliable framerate for decoding, you may (a rough sketch putting these points together follows after the list):

  • be sure to have a queue after appsrc in your pipeline
  • set your appsrc to GST_APP_STREAM_TYPE_STREAM before playing
  • have static/global GstClockTime timestamp and integer frame_num initialized to 0
  • have feed_function with monotonic rate such as (for 30 fps):
    • allocate GST buffer
    • set:
        GST_BUFFER_PTS(buffer) = timestamp;
        GST_BUFFER_DTS(buffer) = timestamp; // not sure...
        GST_BUFFER_OFFSET(buffer) = frame_num++;
        GST_BUFFER_DURATION(buffer) = ((double)1/30) * GST_SECOND;
    
    • Map buffer, copy/process, unmap
    • timestamp += 33333333; //ns
    • return G_SOURCE_CONTINUE;
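For example, a rough sketch of such a feed function body, reusing appsrc_ and opencv_img from your earlier snippets (so those names are just assumptions on my side):

// Static running timestamp (ns) and frame counter, both starting at 0
static GstClockTime timestamp = 0;
static guint64 frame_num = 0;

// appsrc should also be configured for time format, e.g.:
// g_object_set(G_OBJECT(appsrc_), "format", GST_FORMAT_TIME, nullptr);

GstBuffer *buffer = gst_buffer_new_allocate(nullptr, opencv_img.total() * opencv_img.elemSize(), nullptr);

// Timestamp the buffer ourselves instead of querying the pipeline clock
GST_BUFFER_PTS(buffer)      = timestamp;
GST_BUFFER_DTS(buffer)      = timestamp;
GST_BUFFER_OFFSET(buffer)   = frame_num++;
GST_BUFFER_DURATION(buffer) = GST_SECOND / 30;  // ~33.3 ms per frame at 30 fps

GstMapInfo map;
gst_buffer_map(buffer, &map, GST_MAP_WRITE);
std::memcpy(map.data, opencv_img.data, opencv_img.total() * opencv_img.elemSize());
gst_buffer_unmap(buffer, &map);

timestamp += GST_BUFFER_DURATION(buffer);

GstFlowReturn ret = gst_app_src_push_buffer(GST_APP_SRC(appsrc_), buffer);
if (ret != GST_FLOW_OK) {
  std::cerr << "Failed to push buffer to appsrc: " << ret << std::endl;
}
// If this runs in a GLib timeout/idle callback, return G_SOURCE_CONTINUE at the end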

This is just advice from a not so skilled hobbyist. (From an old experiment here if it can help, but most of this is experimental, out of date and probably mostly unrelated to your case)
Someone better skilled may correct this and/or better advise.