Efficient pipeline for streaming BGR frames to local network via UDP

I am receiving H264 frames of a stream (720x360@30fps) from a camera connected to the PC via USB. I am converting these frames to BGR frames supported by OpenCV. After the conversion, the frames are sent over the local network via UDP using GStreamer in C++. A code snippet is shown below:

class VideoServer {
public:
  VideoServer() {
    std::string pipeline = "appsrc ! video/x-raw,format=BGR ! queue ! videoconvert ! x264enc bitrate=5000 ! "
                           "mpegtsmux alignment=7 ! rndbuffersize max=1316 min=1316 ! udpsink port=5005";

    writer = std::make_unique<cv::VideoWriter>(pipeline, cv::CAP_GSTREAMER, 0, fps_, cv::Size(width_, height_));
  }
  void VideoDataCallack(const uint8_t *data, size_t size, int64_t timestamp) override {
    std::cout << "At timestamp: " << timestamp << " video data of size " << size << " received." << std::endl;
    // convert h264_frame to opencv_frame
    writer->write(opencv_frame);
  }

private:
  std::unique_ptr<cv::VideoWriter> writer;
  // Stream parameters (720x360@30fps from the camera)
  double fps_ = 30;
  int width_ = 720, height_ = 360;
};

My goal is to stream these frames with minimal latency. Therefore, can someone point me to an efficient pipeline suitable for the above code snippet?

Why are you encoding to H.264 and muxing into an MPEG-TS container?

Is that because of specific receiver requirements, or are you in control of both sender and receiver on the local machine?

The easiest way to send raw BGR video frames over the local network via UDP would probably be to just send them in raw form as-is, payloaded as RTP, which you can do with the rtpvrawpay element (going straight into udpsink).

If your local interface supports a higher-than-usual MTU (that is, more than the usual 1200-1500 bytes, such as 16kB or 64kB), then you can configure the payloader with the mtu property to get higher throughput with fewer packets.

Because raw video data rates can be quite high (well, maybe not with 360p@30fps, but with higher resolutions at least), you may also want to tweak the maximum allowed kernel-side send/receive buffer sizes (see sysctl net.core.rmem_max and net.core.wmem_max), which are quite small by default and may lead to packet loss if too small. Once that’s done you can set the buffer-size property on udpsink etc. to make the buffer for this element larger.
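
For instance, a minimal sketch (my own, not from the post above) of an OpenCV writer pipeline that payloads raw BGR frames with rtpvrawpay and applies the mtu and buffer-size tweaks could look like the following; the host, port, mtu and buffer-size values are placeholders to adjust for your setup:

#include <opencv2/opencv.hpp>

int main() {
  const int width = 720, height = 360;
  const double fps = 30.0;

  // Raw BGR frames payloaded as RTP; mtu only helps if the interface supports jumbo frames,
  // and buffer-size only helps once net.core.wmem_max has been raised.
  // 127.0.0.1 is just for local testing; use the receiver's LAN IP otherwise.
  std::string pipeline =
      "appsrc ! video/x-raw,format=BGR ! queue ! "
      "rtpvrawpay mtu=16384 ! "
      "udpsink host=127.0.0.1 port=5005 buffer-size=1000000";

  cv::VideoWriter writer(pipeline, cv::CAP_GSTREAMER, 0, fps, cv::Size(width, height));
  if (!writer.isOpened()) return 1;

  cv::Mat frame(height, width, CV_8UC3, cv::Scalar(0, 128, 255));  // dummy BGR frame
  for (int i = 0; i < 300; ++i) writer.write(frame);               // ~10 s at 30 fps
  return 0;
}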

In addition to what @tpm mentioned above, for more details you may have a look at the RTSP example in this post:

An RTSP server may not be required for your case, but it just makes things easier to try out. The example shows some ways of doing RTP video streaming (to localhost in the example) with gstreamer.

I’d suggest that you try these, adjust to your resolution/framerate, and see what best suits your case.

Once done, you may just have to replace videotestsrc with your video source.

At this point you may say whether you want to process frames with opencv or whether you were just using it for format conversion. If your intent is just to stream the camera, and if your USB camera is UVC, then a pure gstreamer pipeline using v4l2src may be more efficient on Linux.

If you want to use OpenCV for processing, it may not be the best for latency, though you would use a similar pipeline in the writer: appsrc (opencv app pushing BGR frames) → videoconvert → encoding_if_any → RTP pay → udpsink

@tpm

Thanks a lot for so many suggestions.

Let me acknowledge and answer your concerns one by one.

Why are you encoding to H.264 and muxing into an MPEG-TS container?
Is that because of specific receiver requirements, or are you in control of both sender and receiver on the local machine?

Yes, you are right. I have control of both sender and receiver. The important thing is that the sender and receiver are connected via WiFi on a local network.

The easiest way to send raw BGR video frames over the local network via UDP would probably be to just send them in raw form as-is, payloaded as RTP, which you can do with the rtpvrawpay element (going straight into udpsink).

Yeah, it seems straightforward. Does the following pipeline do the same?

std::string pipeline = "appsrc ! video/x-raw,format=BGR ! queue ! rtpvrawpay ! udpsink port=5005";

Can you please correct it, in case of any discrepancy?

If your local interface supports a higher-than-usual MTU (that is, more than the usual 1200-1500 bytes, such as 16kB or 64kB), then you can configure the payloader with the mtu property to get higher throughput with fewer packets.

Let me check the MTU of the NIC on this machine. Thanks a lot for this tip!

you may also want to tweak the maximum allowed kernel-side send/receive buffer sizes (see sysctl net.core.rmem_max and net.core.wmem_max), which are quite small by default and may lead to packet loss if too small. Once that’s done you can set the buffer-size property on udpsink etc. to make the buffer for this element larger.

Let me check it out and get back to you with actual values.
BTW, my goal is to stream these frames with minimal latency.

Thank you again.

@Honey_Patouceul

Thank you so much for additional feedback.

Let me acknowledge and answer your concerns one by one.

An RTSP server may not be required for your case, but it just makes things easier to try out. The example shows some ways of doing RTP video streaming (to localhost in the example) with gstreamer.

Although I have full control over the sender and receiver, the receiver is connected via WiFi on the local network. So, the receiver is not on localhost.

I’d suggest that you try these, adjust to your resolution/framerate, and see what best suits your case.

Let me try it on my setup.

At this point you may say whether you want to process frames with opencv or whether you were just using it for format conversion. If your intent is just to stream the camera, and if your USB camera is UVC, then a pure gstreamer pipeline using v4l2src may be more efficient on Linux.

I totally agree with your suggestion. However, the problem is that the camera manufacturer has not provided complete information (maybe because it is proprietary). The limitation is that I get the H264 frame inside a callback in C++. There is no /dev/video* for this device and it seems to be using /dev/usb/bus/* on Linux.

If you want to use OpenCV for processing, it may not be the best for latency, though you would use a similar pipeline in the writer: appsrc (opencv app pushing BGR frames) → videoconvert → encoding_if_any → RTP pay → udpsink

For now, OpenCV works, though it is a workaround. Basically, I am receiving an H264 frame that I convert into an OpenCV BGR image. A code snippet is posted in the question above. Can you please provide the pipeline accordingly?

Thanks for your time and kind suggestions.

That was mostly for the case where you’re using the localhost interface; if it’s WiFi you’ll likely have a standard MTU.

For your case I’d suggest that you try these various RTP streaming cases. I’ve disabled multicast as you’re using a WiFi network.

1. RTP/JPG
Sender

gst-launch-1.0 videotestsrc ! video/x-raw,format=BGR,width=640,height=480 ! queue ! videoconvert ! video/x-raw,format=I420 ! jpegenc ! rtpjpegpay ! udpsink auto-multicast=0 host=<Receiver_IP> port=5004 

Receiver

gst-launch-1.0 udpsrc address=<Receiver_IP> port=5004 ! application/x-rtp,media=video,clock-rate=90000,encoding-name=JPEG,payload=26 ! rtpjitterbuffer latency=300 ! rtpjpegdepay ! jpegdec ! videoconvert ! xvimagesink

2. RTP/H264
Sender

gst-launch-1.0 videotestsrc ! video/x-raw,format=BGR,width=640,height=480 ! queue ! videoconvert ! x264enc tune=zerolatency insert-vui=1 key-int-max=30 ! h264parse ! rtph264pay ! udpsink auto-multicast=0 host=<Receiver_IP> port=5004 

Receiver

gst-launch-1.0 udpsrc address=<Receiver_IP> port=5004 ! application/x-rtp,encoding-name=H264 ! rtpjitterbuffer latency=300 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! xvimagesink

3. RTP/RAW
Sender

# To do once
sudo sysctl -w net.core.wmem_max=10000000

gst-launch-1.0 videotestsrc ! video/x-raw,format=BGR,width=640,height=480 ! queue ! rtpvrawpay ! udpsink auto-multicast=0 host=<Receiver_IP> port=5004 buffer-size=1000000

Receiver

# To do once
sudo sysctl -w net.core.rmem_max=10000000

gst-launch-1.0 udpsrc address=<Receiver_IP> port=5004 buffer-size=1000000 ! 'application/x-rtp,media=video,clock-rate=90000,encoding-name=RAW,sampling=BGR,depth=(string)8, width=(string)640,height=(string)480,colorimetry=(string)SMPTE240M,payload=(int)96,a-framerate=30' ! rtpjitterbuffer latency=300 ! rtpvrawdepay ! videoconvert ! xvimagesink

4. With OpenCv
When you’ve got something that works for your case, from opencv you would use the same sender pipeline, replacing videotestsrc with appsrc ! queue and adjusting the resolution for your VideoWriter.
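
As a rough illustration (my own sketch, not from the posts above; 127.0.0.1 and the 640x480@30fps values are placeholders to adjust), the RTP/H264 sender of case 2 driven from an OpenCV VideoWriter could look like this:

#include <opencv2/opencv.hpp>

int main() {
  const int width = 640, height = 480;
  const double fps = 30.0;

  // Same sender as case 2, with videotestsrc replaced by appsrc ! queue.
  // 127.0.0.1 is for local testing; use the receiver's LAN IP otherwise.
  std::string pipeline =
      "appsrc ! queue ! videoconvert ! "
      "x264enc tune=zerolatency insert-vui=1 key-int-max=30 ! h264parse ! "
      "rtph264pay ! udpsink auto-multicast=0 host=127.0.0.1 port=5004";

  cv::VideoWriter writer(pipeline, cv::CAP_GSTREAMER, 0, fps, cv::Size(width, height));
  if (!writer.isOpened()) return 1;

  cv::Mat frame(height, width, CV_8UC3);
  for (int i = 0; i < 300; ++i) {
    cv::randu(frame, cv::Scalar::all(0), cv::Scalar::all(255));  // dummy BGR content
    writer.write(frame);
  }
  return 0;
}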

5. Without OpenCv - appsrc
That being said, your best option would be to directly stream the H264 frames from the camera without decoding. You may have a look at the gstreamer tutorials for building an appsrc application where you would put your H264 frames into gst buffers; then it would just have to launch:

appsrc ! queue ! h264parse ! rtph264pay ! udpsink auto-multicast=0 host=<Receiver_IP> port=5004 
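
A rough sketch of such an application (my own, with a few assumptions: the camera callback delivers byte-stream/Annex-B H264 access units, and 127.0.0.1/5004 are placeholders for the receiver address) could look like this:

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <cstdint>
#include <cstddef>

static GstElement *pipeline = nullptr;
static GstElement *appsrc = nullptr;

void InitPipeline() {
  gst_init(nullptr, nullptr);
  // Option 5 pipeline built with gst_parse_launch; 127.0.0.1 is only for local testing,
  // replace it with the receiver's LAN IP.
  pipeline = gst_parse_launch(
      "appsrc name=src ! queue ! h264parse ! rtph264pay ! "
      "udpsink auto-multicast=0 host=127.0.0.1 port=5004",
      nullptr);
  appsrc = gst_bin_get_by_name(GST_BIN(pipeline), "src");

  // Tell appsrc what the callback pushes: live byte-stream H264, timestamped on arrival.
  GstCaps *caps = gst_caps_from_string("video/x-h264,stream-format=byte-stream,alignment=au");
  g_object_set(appsrc, "caps", caps, "is-live", TRUE, "do-timestamp", TRUE,
               "format", GST_FORMAT_TIME, NULL);
  gst_caps_unref(caps);

  gst_element_set_state(pipeline, GST_STATE_PLAYING);
}

// Call this from the camera SDK callback with one encoded H264 access unit.
void PushH264Frame(const uint8_t *data, size_t size) {
  GstBuffer *buf = gst_buffer_new_allocate(nullptr, size, nullptr);
  gst_buffer_fill(buf, 0, data, size);                // copy the encoded frame
  gst_app_src_push_buffer(GST_APP_SRC(appsrc), buf);  // pipeline takes ownership of buf
}

void StopPipeline() {
  gst_app_src_end_of_stream(GST_APP_SRC(appsrc));
  gst_element_set_state(pipeline, GST_STATE_NULL);
  gst_object_unref(appsrc);
  gst_object_unref(pipeline);
}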

Indeed it is! Thank you

I tried the above cases and found “3. RTP/RAW” to be the best one. However, I noticed a latency of 6-7 seconds even within localhost. My PC is an Intel i7 with 16 GB RAM running Ubuntu 22.04.

BTW, I have set net.core.wmem_max and net.core.rmem_max as per your instructions.

Let me show you the updated code snippet below:

class CameraStreamer {
public:
  CameraStreamer() : fps_(10), width_(2304 / 2), height_(1152 / 2) {
    std::string pipeline = fmt::format("appsrc ! queue ! video/x-raw,format=BGR,width={},height={} ! queue ! "
                                       "rtpvrawpay ! udpsink auto-multicast=0 host=127.0.0.1 port=5004 "
                                       "buffer-size=1000000", width_, height_);
    writer_ = std::make_unique<cv::VideoWriter>(pipeline, cv::CAP_GSTREAMER, 0, fps_, cv::Size(width_, height_));
    // Find the H264 decoder and initialize the AV frame, packet, etc.
  }

  ~CameraStreamer() { /* Dispose AV frame, packet etc */ }

  void H264Callack(const uint8_t *data, size_t size) override {
    packet_->data = const_cast<uint8_t *>(data);
    packet_->size = size;

    if (avcodec_send_packet(codec_ctx_, packet_) == 0) {
      while (avcodec_receive_frame(codec_ctx_, frame_) == 0) {
        cv::Mat yuv = AVFrametoYUV(frame_);
        cv::Mat bgr;
        cv::cvtColor(yuv, bgr, cv::COLOR_YUV420p2BGR);
        cv::Mat img = UndistortImage(bgr);

        writer_->write(img); // write data to gstreamer
        cv::imshow("Live Video", img);
        cv::waitKey(1);
      }
    }
  }
private:
  std::unique_ptr<cv::VideoWriter> writer_;
  // FFmpeg decoder state set up in the constructor
  AVCodecContext *codec_ctx_ = nullptr;
  AVPacket *packet_ = nullptr;
  AVFrame *frame_ = nullptr;
  double fps_;
  int width_, height_;
};

With this code, I can see the image stream in (almost) real time in the OpenCV display. To do that, I have to comment out writer_->write(img). I mean to say that if I am not writing any data to GStreamer, I can see the live stream. However, the moment I start writing to GStreamer by enabling writer_->write(img);, the OpenCV live stream becomes slow (2-3 seconds latency). Finally, when I open the receiver from a terminal, I can see the stream with 6-7 seconds of latency.

Any workarounds to minimize this latency? Please keep in mind that the H264 data is received inside a callback function, which is why I planned to use OpenCV’s VideoWriter.

I look forward to your expert advice. Thank you again.

Thank you for your help @tpm and @Honey_Patouceul. I managed to resolve the issue by removing OpenCV’s VideoWriter and using GStreamer directly instead. I’m really enjoying working with GStreamer! I still feel like I’m looking at the iceberg from the outside, so I appreciate your guidance more than ever.

There are a few more issues to address, so I’ve created a new post to keep things organized. Could you please take a look at my latest post here?

I’m looking forward to your suggestions. Thanks again!

6-7 s of latency to localhost seems unexpected (even more so as your best case).
Is that the result of the exact sender command from my post above, using the simulated 640x480 video source, with the receiver command on the same machine (<Receiver_IP> = 127.0.0.1)?
If yes, there may be an issue in your setup. A network misconfiguration might be a cause, though I couldn’t accurately advise due to the huge number of possibilities.

Try transmitting data outside of gstreamer and check the latency for confirmation.
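
For example, a quick check outside of gstreamer (my own hypothetical sketch; the port is arbitrary, and sender and receiver are assumed to run on the same Linux machine so that steady_clock timestamps are comparable) could send timestamped UDP packets with plain sockets and print the one-way delay:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <thread>

int main(int argc, char **argv) {
  const uint16_t port = 5006;  // arbitrary test port
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = inet_addr("127.0.0.1");

  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (argc > 1 && std::string(argv[1]) == "sender") {
    // Sender: one timestamped packet every ~33 ms, roughly 30 fps pacing.
    for (int i = 0; i < 100; ++i) {
      int64_t now = std::chrono::duration_cast<std::chrono::microseconds>(
          std::chrono::steady_clock::now().time_since_epoch()).count();
      sendto(sock, &now, sizeof(now), 0, (sockaddr *)&addr, sizeof(addr));
      std::this_thread::sleep_for(std::chrono::milliseconds(33));
    }
  } else {
    // Receiver: print the time between sending and receiving each packet.
    bind(sock, (sockaddr *)&addr, sizeof(addr));
    int64_t sent = 0;
    while (recv(sock, &sent, sizeof(sent), 0) > 0) {
      int64_t now = std::chrono::duration_cast<std::chrono::microseconds>(
          std::chrono::steady_clock::now().time_since_epoch()).count();
      std::cout << "one-way delay: " << (now - sent) << " us" << std::endl;
    }
  }
  close(sock);
  return 0;
}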

Using an embedded device (12-core aarch64) with Ubuntu 22.04, the posted case works fine (without tweaking the MTU); your platform’s hardware should be able to perform even better.