Overlaying RTSP feeds and streaming to web

• Jetson Xavier NX
• DeepStream 6.3
• GStreamer 1.16.3
• Issue Type: Question

Hello Community,

With the hardware and specs I’ve listed, what would be an efficient way to overlay, say, half a dozen RTSP feeds with simple graphics (circles, text, boxes) and then stream them to the web? I’d prefer a method with fairly low latency (a constant delay, preferably under 5 seconds). If GStreamer isn’t an ideal approach, I’m open to other suggestions. I’d be especially grateful if there are an example or two you can point me to, thanks.

[Note: My Jetson device is tied to the above-mentioned GStreamer version, and an upgrade isn’t officially recommended; users on the NVIDIA forums have reported that upgrading brings complications, such as necessitating a device reflash.]

For your Xavier NX platform, I think your best bet for performance would be the NVIDIA DeepStream GStreamer plugin nvdsosd.
Sorry I can’t help much further with putting your metadata into the NvDsBatchMeta structure, but if you can provide test code it would be easier to try it out and offer some advice.
Someone more skilled with DeepStream may be able to advise better.
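
For reference, here is a minimal sketch of how a box could be injected through NvDsDisplayMeta with a pad probe on nvdsosd’s sink pad, adapted from the DeepStream Python samples (it assumes the pyds bindings are installed; the coordinates are placeholders):

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import pyds

def osd_sink_pad_probe(pad, info, u_data):
    # Retrieve the batch metadata attached to this buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # Acquire display meta and attach one rectangle to this frame
        display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
        display_meta.num_rects = 1
        rect = display_meta.rect_params[0]
        rect.left, rect.top = 100, 100        # placeholder position
        rect.width, rect.height = 200, 200    # placeholder size
        rect.border_width = 2
        rect.border_color.set(1.0, 0.0, 0.0, 1.0)  # red, RGBA
        pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attach to the nvdsosd element's sink pad, e.g.:
# osd.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_probe, 0)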

Thanks
Here’s a minimal example to demonstrate my overall aim:

import cv2
from flask import Flask, Response
from imutils.video import VideoStream
import queue
from threading import Thread

app = Flask(__name__)

# Queue for data communication
msg_queue = queue.Queue()

# Colours, dimensions and other settings
RED = (0, 0, 255)
src_width = 1920
src_height = 1080

# RTSP urls
src_url = "rtsp://{feedpath}"

def populate_queue(msg_queue=msg_queue):
    # Here I maintain a queue of centroid coordinates received from another program.
    # This function is not strictly needed for this minimal example; I only include
    # it to paint a picture of the overall structure of the program.

    while True:
        try:
            # Store centroid coordinates as dictionary in the message queue
            msg_queue.put( {sample x, y} )

        except KeyboardInterrupt:
            print("quitting")
            break

        except Exception as e:
            print(f"Error saving coordinates: {e}")

def process_frames():
    src_stream = VideoStream(src=src_url).start()

    while True:
        frame = src_stream.read()

        if not msg_queue.empty():
            msg_dict = msg_queue.get()
            # Extract saved centroid coordinates. Here I'll use an example
            x = (0.08 - 0.09 / 2) * src_width
            y = (0.57 - 0.29 / 2) * src_height
            x_end = x + 0.09 * src_width
            y_end = y + 0.29 * src_height

            print(f"{x} and {y}")
            
            # Draw RED box on the video
            cv2.rectangle(frame, (int(x), int(y)), (int(x_end), int(y_end)), RED, 2)

        # Encode the frame as JPEG
        _, jpeg = cv2.imencode('.jpg', frame)
        frame_bytes = jpeg.tobytes()

        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame_bytes + b'\r\n\r\n')
    
    src_stream.stop()

@app.route('/video_feed')
def video_feed():
    return Response(process_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')  

def main():
    queueing_thread.start()

    app.run(host='0.0.0.0', port=5560, debug=True)

    queueing_thread.join()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    queueing_thread = Thread(target=populate_queue, args=())
    # Daemonize the thread to exit when the main program exits
    queueing_thread.daemon = True
    main()

If OpenCV, Flask, and the other required packages are installed correctly, the output should be accessible at http://127.0.0.1:5560/video_feed

Looking into your case, I installed Flask and imutils, though the script fails to run at:

msg_queue.put( {sample x, y} )

# or
msg_queue.put( {sample, x, y} )

since sample is not defined. Please fix that so it runs, and I may be able to offer further advice.

Thank you. Well, that specific function is not needed for a minimal working example; rather, my purpose in including it was to paint a more complete picture of the program’s structure and show that coordinates directing where the boxes are drawn are continuously received from another program. The comment I included at the start of that function explaining this may have been easy to miss; apologies.

To illustrate my objective without relying on external inputs, I’ve updated the code to generate coordinate values that simulate the movement of a box from the left of the screen to the right, within a loop:

import cv2
from flask import Flask, Response
from imutils.video import VideoStream
import queue
from threading import Thread
import time

app = Flask(__name__)

# Queue for data communication
msg_queue = queue.Queue()

# Colours, dimensions and other settings
RED = (0, 0, 255)
src_width = 1920
src_height = 1080

def populate_queue(msg_queue=msg_queue):
    while True:
        for x in range(0, src_width, 10):  # Increment x coordinate by 10 pixels
            msg_queue.put({"x": x, "y": src_height // 2})  # Place y at the center
            time.sleep(0.1)

def process_frames():
    src_stream = VideoStream(src=src_url).start()

    while True:
        frame = src_stream.read()

        if not msg_queue.empty():
            msg_dict = msg_queue.get()
            x = msg_dict["x"]
            y = msg_dict["y"]
            x_end = x + 100  # Adjust the width of the box
            y_end = y + 100  # Adjust the height of the box

            # Draw red box on the video
            cv2.rectangle(frame, (int(x), int(y)), (int(x_end), int(y_end)), RED, 2)

        # Encode the frame as JPEG
        _, jpeg = cv2.imencode('.jpg', frame)
        frame_bytes = jpeg.tobytes()

        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame_bytes + b'\r\n\r\n')
    
    src_stream.stop()

@app.route('/video_feed')
def video_feed():
    return Response(process_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')  

def main():
    queueing_thread.start()

    app.run(host='0.0.0.0', port=5560, debug=True)

    queueing_thread.join()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    src_url = "rtsp://{your rtsp path}"
    queueing_thread = Thread(target=populate_queue, args=())
    # Daemonize the thread to exit when the main program exits
    queueing_thread.daemon = True
    main()

This particular approach is inadequate. For one, the video playback doesn’t transition smoothly between frames, as you’ll quickly see when there’s a moving subject in your camera’s view. You’ll also notice that whether the sleep time for populating the queue is 0.1 s or 0.01 s, the box doesn’t appear to advance any faster across the screen (this may be due to some limitation I haven’t considered). I’m hoping I can get help or suggestions on these forums, whether GStreamer is an appropriate solution or whether to go another direction entirely.
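
One variation I’ve considered but not yet verified is draining the queue on every frame, so the box always uses the newest position rather than the oldest:

# Sketch: consume the backlog and keep only the most recent position
msg_dict = None
while not msg_queue.empty():
    msg_dict = msg_queue.get()
if msg_dict is not None:
    x = msg_dict["x"]
    y = msg_dict["y"]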

Thanks again!

This time I was able to test your case further.

1. I think that the main issue you’re facing is the management of the queue. I’m not so familiar with Python and its threading/queue modules, but simply adapting your code to generate the positions inside the process_frames loop should convince you:

import cv2
from flask import Flask, Response
from imutils.video import VideoStream
import time

app = Flask(__name__)

# Colours, dimensions and other settings
RED = (0, 0, 255)   # BGR

src_width = 1920
src_height = 1080

xstep = 5
xpos = range(0, src_width, xstep)
xnum = int(src_width / xstep)
ypos = src_height / 2


def process_frames():
    src_stream = VideoStream(src=src_url).start()

    while True:
        frame = src_stream.read()
        xposIdx = process_frames.frame_nb % xnum
        x = xpos[xposIdx]
        y = ypos
        x_end = x + 100  # Adjust the width of the box
        y_end = y + 100  # Adjust the height of the box

        # Draw red box on the video
        cv2.rectangle(frame, (int(x), int(y)), (int(x_end), int(y_end)), RED, 2)
            
        # Encode the frame as JPEG
        _, jpeg = cv2.imencode('.jpg', frame)
        frame_bytes = jpeg.tobytes()

        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame_bytes + b'\r\n\r\n')
        process_frames.frame_nb = process_frames.frame_nb + 1
        #cv2.waitKey(1)
    src_stream.stop()
process_frames.frame_nb = 0

@app.route('/video_feed')
def video_feed():
    return Response(process_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')  


def main():
    app.run(host='127.0.0.1', port=5560, debug=True)
    cv2.destroyAllWindows()

if __name__ == '__main__':
    src_url = "rtsp://127.0.0.1:8554/test"
    main()

And you may use one of the following pipelines for testing from localhost:

gst-launch-1.0 souphttpsrc is-live=1 location=http://127.0.0.1:5560/video_feed ! queue ! multipartdemux single-stream=1 ! image/jpeg ! jpegdec ! videoconvert ! autovideosink

# Or using NVDEC
gst-launch-1.0 souphttpsrc is-live=1 location=http://127.0.0.1:5560/video_feed ! queue ! multipartdemux single-stream=1 ! image/jpeg ! nvv4l2decoder mjpeg=1 ! nvvidconv ! autovideosink

Part of the synchronization issue may also come from imutils.

2. Using the GStreamer backend allows much more regular frame acquisition and processing. Also try:

import cv2
from flask import Flask, Response

# Check that your opencv build for this python version has gstreamer videoio support
import re
print('GStreamer support: %s' % re.search(r'GStreamer\:\s+(.*)', cv2.getBuildInformation()).group(1))


app = Flask(__name__)

# Colours, dimensions and other settings
RED = (0, 0, 255)   # BGR

def process_frames():
    gst_cap_str = 'uridecodebin uri=' + src_url + ' ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! queue ! appsink drop=1'
    print(gst_cap_str)
    cap = cv2.VideoCapture(gst_cap_str, cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        print('Failed to open capture. Exiting')
        quit()
    fps = float(cap.get(cv2.CAP_PROP_FPS))
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    size = (w, h)
    print(f"Size: {(size)}, FPS: {fps}")

    xstep = 5
    xpos = range(0, w, xstep)
    xnum = int(w / xstep)
    ypos = h / 2

    while True:
        ret, frame = cap.read()
        if not ret:
            print('Failed to read from camera, aborting')
            break
        xposIdx = process_frames.frame_nb % xnum
        x = xpos[xposIdx]
        y = ypos
        x_end = x + 100  # Adjust the width of the box
        y_end = y + 100  # Adjust the height of the box

        # Draw red box on the video
        cv2.rectangle(frame, (int(x), int(y)), (int(x_end), int(y_end)), RED, 2)
            
        # Encode the frame as JPEG
        _, jpeg = cv2.imencode('.jpg', frame)
        frame_bytes = jpeg.tobytes()

        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame_bytes + b'\r\n\r\n')
        process_frames.frame_nb = process_frames.frame_nb + 1
        #cv2.waitKey(1)
    cap.release()
process_frames.frame_nb = 0

@app.route('/video_feed')
def video_feed():
    return Response(process_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')  


def main():
    app.run(host='127.0.0.1', port=5560, debug=True)
    cv2.destroyAllWindows()

if __name__ == '__main__':
    src_url = "rtsp://127.0.0.1:8554/test"
    main()

You can safely ignore the warnings about the GStreamer backend being unable to query duration and compute position.
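
If MJPEG over HTTP itself becomes the latency bottleneck, one alternative sketch (untested in your case; element and property names as provided by L4T’s GStreamer plugins, host/port being placeholders, and reusing cap, fps and size from the capture above) would be writing the annotated frames back out through cv2.VideoWriter with the hardware H.264 encoder:

# Sketch: send composed BGR frames as H.264/RTP using the HW encoder
gst_out_str = ('appsrc ! video/x-raw,format=BGR ! videoconvert ! '
               'video/x-raw,format=BGRx ! nvvidconv ! '
               'nvv4l2h264enc insert-sps-pps=true ! h264parse ! '
               'rtph264pay config-interval=1 ! udpsink host=127.0.0.1 port=5000')
writer = cv2.VideoWriter(gst_out_str, cv2.CAP_GSTREAMER, 0, fps, size)
if writer.isOpened():
    writer.write(frame)  # call once per composed frame instead of yielding JPEG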


Thanks for the approaches you’ve suggested!
I’ve tested your code, and it definitely improves things.
I guess removing the thread altogether has something to do with it; it seems the reliance on threading is one thing that severely limits the throughput of my application. Unfortunately, I require a thread to be constantly listening for positioning and other info from another program.

It seems my Jetson’s resources aren’t enough to render the feed as well as I’d have hoped, because when I access the feed on a more powerful workstation, using both the gst-launch pipeline and the browser, it produces a more consistent stream. It still has some skips and stalls, but it handles much better (running $ htop shows CPU usage of all programs is low and memory usage is moderate, which I believe would also be low were it not for everything else running on the workstation).

I began looking into incorporating a WSGI server to better support threading, as well as doing the overlaying on the web side instead; using a front-end framework to draw the graphics should be less computationally expensive than OpenCV.
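
For instance, a minimal sketch of the direction I have in mind: the server exposes the latest position as JSON (the /positions route here is hypothetical), and the browser polls it and draws boxes on a canvas layered over the video:

from flask import Flask, jsonify

app = Flask(__name__)
latest_pos = {"x": 0, "y": 0}  # updated by the thread listening to the other program

@app.route('/positions')
def positions():
    # The front-end polls this and draws the overlay client-side
    return jsonify(latest_pos)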

Nice to see you’ve moved forward.

I didn’t try to further debug your case, but my feeling is that imutils delivers batches of images rather than the synchronous flow that GStreamer provides.
As a result, the queue receiving the box positions was frequently full and reset, so by the time a batch of frames arrived the positions had been lost.

Maybe you can try having a queue of frames and a queue of positions, each fed by a different thread, and another thread for composing; see the sketch below. First try giving the queues big sizes; if that works, you can later try decreasing them.
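
A rough sketch of that structure (queue sizes and names are just illustrative; get_next_position stands in for your external program’s feed):

import queue
from threading import Thread
import cv2

frame_queue = queue.Queue(maxsize=100)  # start big, shrink later if it works
pos_queue = queue.Queue(maxsize=100)

def grab_frames(cap):
    # One thread only reads frames
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_queue.put(frame)

def receive_positions(get_next_position):
    # Another thread only receives positions
    while True:
        pos_queue.put(get_next_position())

def compose():
    # A third thread pairs each frame with a position and draws
    while True:
        frame = frame_queue.get()
        x, y = pos_queue.get()
        cv2.rectangle(frame, (int(x), int(y)), (int(x) + 100, int(y) + 100), (0, 0, 255), 2)
        # ... then encode/yield as in the examples above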

Though, this may just be a simulation case. In real life, wouldn’t you first get the frame, then run processing on it to obtain the box attributes, then compose? Or do you get the box attributes from previous frames? At what depth? You may size the queues with that in mind.

It seems my Jetson’s resources aren’t enough to render the feed as well as I’d have hoped

Be sure you’re using a 6-core NVP model (15W or 20W) and boosting the clocks with the jetson_clocks script.
I also think you may get very different performance from a cheap SD card versus a good NVMe SSD for L4T.
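
For example (mode IDs vary across JetPack releases, so check /etc/nvpmodel.conf or query the current mode first):

sudo nvpmodel -q        # query the current power mode
sudo nvpmodel -m <id>   # select a 6-core (15W or 20W) mode by its ID
sudo jetson_clocks      # lock clocks to maximum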

About the power options, that’s exactly my setup. My Jetson is equipped with a 500 GB NVMe drive, it’s set to that same 6-core power mode, and I’ve run the jetson_clocks script. It might be the way I’m processing the frames with OpenCV, as you’ve pointed out; from the resources I’ve scoured, I thought the way I handled them was efficient according to convention, and I didn’t find anything more substantial. The gap in this space leads me to believe that, for the long term, it suits my case to use a more traditional option such as a web front-end to serve as the UI.
Thank you for your time and efforts!