I am sending a video stream using the following GStreamer command line:
gst-launch-1.0 v4l2src device=/dev/video0 ! "image/jpeg, width=1280, height=720, framerate=30/1" ! jpegdec ! videoconvert ! videoscale ! "video/x-raw, width=1760, height=990, pixel-aspect-ratio=1/1" ! jpegenc ! multipartmux ! tcpserversink port=5000 host=127.0.0.1
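For reference, the same sender pipeline can also be launched from Python (a minimal sketch, assuming GStreamer 1.0 with the PyGObject bindings; the gst-launch line above is what I actually run):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# Same elements and caps as the gst-launch command above
sender = Gst.parse_launch(
    "v4l2src device=/dev/video0 ! image/jpeg,width=1280,height=720,framerate=30/1 ! "
    "jpegdec ! videoconvert ! videoscale ! "
    "video/x-raw,width=1760,height=990,pixel-aspect-ratio=1/1 ! "
    "jpegenc ! multipartmux ! tcpserversink port=5000 host=127.0.0.1"
)
sender.set_state(Gst.State.PLAYING)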
Everything seems to work well, except that on the receiving end I get an error when I try to reshape the frame from a one-dimensional array into a three-dimensional array with NumPy (the failing line is marked in the class below; a standalone repro follows the traceback at the end):
import sys
import threading
from queue import Queue

import numpy as np

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib


class TcpMultipartFetcher:
    """
    Connects to a TCP multipart stream at `host:port` and decodes frames.

    Provides:
      - headless or non-headless (display) mode
      - a get_latest_frame() method to fetch the latest frame as a numpy array
    """

    def __init__(self, host="127.0.0.1", port=5000, headless=False):
        """
        :param host: TCP server hostname or IP
        :param port: TCP server port
        :param headless: if False, display the video in a GStreamer window
        """
        self.host = host
        self.port = port
        self.headless = headless

        # We store the latest frame here
        self._frame_queue = Queue(maxsize=1)
        self._running = False

        # Initialize GStreamer
        Gst.init(None)

        # Build the GStreamer pipeline string.
        #
        # Explanation:
        #   tcpclientsrc  : connect to a TCP server
        #   multipartdemux: split the multipart (JPEG) data
        #   tee           : duplicate the demuxed stream into two branches:
        #                     1) for display (autovideosink)
        #                     2) for appsink (to get frames in Python)
        #   appsink       : allows Python to retrieve frames (emit-signals=true)
        #
        # "sync=false" ensures the appsink doesn't wait for clock sync.
        # If headless, we skip the tee -> autovideosink branch.
        if not self.headless:
            pipeline_str = (
                f"tcpclientsrc host={self.host} port={self.port} ! "
                f"multipartdemux ! tee name=t "
                f"t. ! queue ! jpegdec ! videoconvert ! autovideosink "
                f"t. ! queue ! jpegdec ! videoconvert ! appsink name=appsink "
                f"emit-signals=true sync=false"
            )
        else:
            pipeline_str = (
                f"tcpclientsrc host={self.host} port={self.port} ! "
                f"multipartdemux ! jpegdec ! videoconvert ! "
                f"appsink name=appsink emit-signals=true sync=false"
            )

        # Create the pipeline from the string
        self.pipeline = Gst.parse_launch(pipeline_str)

        # Retrieve the appsink element to attach a callback
        self.appsink = self.pipeline.get_by_name("appsink")
        if not self.appsink:
            print("Error: Could not find appsink in pipeline!")
            sys.exit(1)

        # Connect to the new-sample signal;
        # this is called each time a new frame arrives
        self.appsink.connect("new-sample", self._on_new_sample)

        # We'll run the GStreamer main loop in a separate thread
        self.loop = GLib.MainLoop()
        self.loop_thread = threading.Thread(target=self._loop_worker, daemon=True)

    def _loop_worker(self):
        """
        Worker method that runs the GLib main loop.
        """
        try:
            self.loop.run()
        except Exception as e:
            print(f"GStreamer main loop exception: {e}")

    def _on_new_sample(self, sink):
        """
        Callback when the appsink has a new sample (new frame).
        We pull the sample, convert it to numpy, and store it.
        """
        sample = sink.emit("pull-sample")
        if not sample:
            return Gst.FlowReturn.ERROR

        # Extract the buffer and its caps
        buf = sample.get_buffer()
        caps = sample.get_caps()
        structure = caps.get_structure(0)

        # Typically, for a JPEG-decoded stream:
        #   format=BGR (or sometimes RGB), width=..., height=...
        width = structure.get_value("width")
        height = structure.get_value("height")

        # Extract the actual frame bytes
        success, mapinfo = buf.map(Gst.MapFlags.READ)
        if not success:
            return Gst.FlowReturn.ERROR

        # Convert to a NumPy array (assuming BGR)
        frame_data = np.frombuffer(mapinfo.data, dtype=np.uint8)
        buf.unmap(mapinfo)

        expected_size = width * height * 3
        print(f"Actual data size: {frame_data.size}, Expected size: {expected_size}")

        # Reshape to (height, width, 3) for 3 channels (BGR)
        frame = frame_data.reshape((height, width, 3))  # <-- the line that fails

        # Store the latest frame
        if not self._frame_queue.full():
            self._frame_queue.put(frame)

        return Gst.FlowReturn.OK
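For completeness, this is roughly how I drive the class (a sketch; it assumes the rest of the file, which I have omitted here, starts the pipeline and the loop thread and exposes the get_latest_frame() mentioned in the docstring):

fetcher = TcpMultipartFetcher(host="127.0.0.1", port=5000, headless=True)
fetcher.pipeline.set_state(Gst.State.PLAYING)  # normally done by an omitted start()
fetcher.loop_thread.start()

while True:
    frame = fetcher.get_latest_frame()  # reads from self._frame_queue
    if frame is not None:
        print(frame.shape)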
I am getting the following error:
Traceback (most recent call last):
File "TcpMultipartFetcher.py", line 128, in _on_new_sample
frame = frame_data.reshape((height, width, 3))
ValueError: cannot reshape array of size 2613600 into shape (990,1760,3)
Actual data size: 2613600, Expected size: 5227200
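The failure reproduces standalone with exactly these sizes (a minimal sketch using only NumPy):

import numpy as np

h, w = 990, 1760
np.zeros(h * w * 3, dtype=np.uint8).reshape((h, w, 3))  # works: 5227200 elements
np.zeros(2613600, dtype=np.uint8).reshape((h, w, 3))    # ValueError, same as above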
However, the width and height values read from the caps do match what I set in the videoscale caps; it is the buffer size that does not, and I don't know why.
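Doing the division on the logged numbers (plain arithmetic, nothing else assumed):

2613600 / (1760 * 990)  # -> 1.5 bytes per pixel actually received
5227200 / (1760 * 990)  # -> 3.0 bytes per pixel expected for 3-channel BGR

So the appsink buffer apparently carries 1.5 bytes per pixel instead of the 3 bytes I assumed for BGR. What am I missing?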