How to reliably start/stop saving an H.264 stream to a file

Hi all. I’ve got a live stream from a MIPI camera. I want my app to be able to save 5 s clips from the stream in an on-demand fashion. What I’ve tried so far (see my other question, “Missing data at the start of new MP4 file”, for a description) isn’t working out that well. So instead of asking “why is this broken”, I’m going to see if “how should I do this” is a more meaningful question.

My app is running on an embedded Linux device (an NXP i.MX 8M Mini) which has a MIPI video input and a hardware H.264 encoder.

The core of the application is based on this pipeline fragment:

v4l2src device=/dev/video0
! video/x-raw,width=1280,height=720,framerate=30/1,format=GRAY8
! appsink max-buffers=1 drop=true async=false

With this, my application has a thread that repeatedly gets frames from the appsink (by emitting “pull-sample” signals) and processes them. It performs some object detection logic and logs the results, retaining the results from the most recent frames, indexed by their timestamp. No problems with any of this.
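As a rough illustration of the bookkeeping that implies (the names here are hypothetical, not our actual code), a bounded, timestamp-indexed store for per-frame results might look like this:

```python
from collections import OrderedDict

class RecentResults:
    """Keep detection results for the most recent frames, keyed by
    buffer timestamp (PTS in nanoseconds), evicting the oldest."""

    def __init__(self, max_frames=90):  # e.g. 3 s of history at 30 fps
        self.max_frames = max_frames
        self._results = OrderedDict()

    def add(self, pts_ns, detections):
        self._results[pts_ns] = detections
        while len(self._results) > self.max_frames:
            self._results.popitem(last=False)  # drop the oldest entry

    def lookup(self, pts_ns):
        # Used later (e.g. by an overlay callback) to find the results
        # that belong to a frame with a given timestamp.
        return self._results.get(pts_ns)
```

The timestamp key is what lets a downstream consumer match results back to the exact frame they came from.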

For debugging purposes, we tee the video and use a cairooverlay to draw the processing results onto the frames (using the timestamps to keep it all in sync). Then we encode the result to H.264, wrap that in RTP, and send it out over UDP, where it can be viewed by VLC or other similar receivers:

v4l2src device=/dev/video0
! video/x-raw,width=1280,height=720,framerate=30/1,format=GRAY8
! tee name=t1
  t1. ! appsink max-buffers=1 drop=true async=false
  t1. ! queue max-size-buffers=1 max-size-bytes=0 leaky=downstream
      ! videoconvert
      ! cairooverlay
      ! videoconvert
      ! vpuenc_h264
      ! rtph264pay
      ! udpsink host=192.168.5.2 port=5000 sync=false async=false

The vpuenc_h264 element is provided by NXP and is the hardware codec element for the chip’s VPU.

Again, this works. I can launch a client on the host at 192.168.5.2 and stream the video from port 5000.

So now, what I want is a way to save the video to a file on demand at run time. What I’ve tried so far is to add another tee after the H.264 encoder, feed that branch into an MPEG-TS mux, then send the result to a RidgeRun interpipe, where a secondary pipeline saves the stream to a file. By playing and stopping that secondary pipeline, I can start/stop the data saving to the file:

The primary pipeline:

v4l2src device=/dev/video0
! video/x-raw,width=1280,height=720,framerate=30/1,format=GRAY8
! tee name=t1
  t1. ! appsink max-buffers=1 drop=true async=false
  t1. ! queue max-size-buffers=1 max-size-bytes=0 leaky=downstream
      ! videoconvert
      ! cairooverlay
      ! videoconvert
      ! vpuenc_h264
      ! tee name=t2
        t2. ! queue max-size-buffers=1 max-size-bytes=0 leaky=downstream
            ! rtph264pay
            ! udpsink host=192.168.5.2 port=5000 sync=false async=false
        t2. ! queue max-size-buffers=1 max-size-bytes=0 leaky=downstream
            ! h264parse
            ! mpegtsmux
            ! interpipesink name=ip1 sync=false async=false

And the secondary pipeline:

interpipesrc listen-to=ip1 allow-renegotiation=false accept-events=false format=time
! filesink sync=false async=false

So when it’s time to save a video clip, I set the filename on the filesink and then set the pipeline containing the interpipesrc to PLAYING, causing it to save the MPEG-TS buffers to a file. Stopping that pipeline closes the file.

It works, but the resulting video file ends up with up to 3s of black video before the stream plays normally.

I suspect this is because H.264 uses key frames and we’re just dumping data from the middle of the stream to a file. Thanks to the MPEG-TS packaging, the player re-syncs, but not until the next key frame arrives, so it shows a black screen until then.
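The numbers are at least consistent with that theory. Assuming a constant GOP, the worst-case wait for the next key frame is simple arithmetic (a sketch, not measured data):

```python
def worst_case_keyframe_wait_s(gop_size, fps):
    # Starting a recording just after a keyframe means waiting for the
    # remaining gop_size - 1 delta frames before the next keyframe.
    return (gop_size - 1) / fps

# vpuenc_h264 reports a default gop-size of 30; at 30 fps that is just
# under one second of potentially unplayable video at the start of a clip.
wait = worst_case_keyframe_wait_s(gop_size=30, fps=30)
```

Seeing up to 3 s of black would correspond to an effective GOP of roughly 90 frames at 30 fps, so it may also be worth verifying what GOP length the encoder actually produces.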

So the big question is: What is the best way to shunt video into a file in a way that can be turned on and off?

Stopping and restarting the pipeline itself isn’t an option, because that would interrupt the application reading from the appsink.

I assume I wouldn’t see this if I teed the raw frames (before the vpuenc_h264 element) and then encoded them in the secondary pipeline (after the interpipesrc), performing the MPEG-TS muxing there. But then there would be two instances of the VPU encoder running.

That in itself might not be a problem, but the above fragment is part of a larger system. There may actually be multiple video files being recorded from the above pipeline, each with its own respective interpipe sink/src. If each one needs its own VPU encoding step, that might consume more CPU/VPU power than the hardware has available.

Any ideas?

Since you may not have access to NXP’s documentation, here’s the gst-inspect output for the vpuenc_h264 element:

$ gst-inspect-1.0 vpuenc_h264
Factory Details:
  Rank                     primary + 1 (257)
  Long-name                IMX VPU-based AVC/H264 video encoder
  Klass                    Codec/Encoder/Video
  Description              Encode raw data to compressed video
  Author                   Multimedia Team <shmmmw@freescale.com>

Plugin Details:
  Name                     vpu
  Description              VPU video codec
  Filename                 /usr/lib/gstreamer-1.0/libgstvpu.so
  Version                  4.7.2
  License                  LGPL
  Source module            imx-gst1.0-plugin
  Binary package           Freescale Gstreamer Multimedia Plugins
  Origin URL               http://www.freescale.com

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstVideoEncoder
                         +----vpuenc_h264

Implemented Interfaces:
  GstPreset

Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      video/x-raw
                 format: { (string)NV12, (string)I420, (string)YUY2, (string)UYVY, (string)RGBA, (string)RGBx, (string)RGB16, (string)RGB15, (string)BGRA, (string)BGRx, (string)BGR16 }
                  width: [ 64, 1920 ]
                 height: [ 64, 1088 ]
              framerate: [ 0/1, 2147483647/1 ]
  
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-h264
          stream-format: { (string)avc, (string)byte-stream }
              alignment: { (string)au, (string)nal }

Element has no clocking capabilities.
Element has no URI handling capabilities.

Pads:
  SINK: 'sink'
    Pad Template: 'sink'
  SRC: 'src'
    Pad Template: 'src'

Element Properties:
  bitrate             : set bit rate in kbps (0 for automatic)
                        flags: readable, writable
                        Unsigned Integer. Range: 0 - 2147483647 Default: 0 
  gop-size            : How many frames a group-of-picture shall contain
                        flags: readable, writable
                        Unsigned Integer. Range: 0 - 32767 Default: 30 
  min-force-key-unit-interval: Minimum interval between force-keyunit requests in nanoseconds
                        flags: readable, writable
                        Unsigned Integer64. Range: 0 - 18446744073709551615 Default: 0 
  name                : The name of the object
                        flags: readable, writable, 0x2000
                        String. Default: "vpuenc_h264-0"
  parent              : The parent of the object
                        flags: readable, writable, 0x2000
                        Object of type "GstObject"
  qos                 : Handle Quality-of-Service events from downstream
                        flags: readable, writable
                        Boolean. Default: false
  qp-max              : maximum QP for any picture, default 0 makes wrapper to set 51
                        flags: readable, writable
                        Integer. Range: 0 - 51 Default: 0 
  qp-min              : minimum QP for any picture
                        flags: readable, writable
                        Integer. Range: 0 - 51 Default: 0 
  quant               : set quant value: H.264(0-51) (-1 for automatic)
                        flags: readable, writable
                        Integer. Range: -1 - 51 Default: -1 

I am not a video expert, but maybe someone more familiar with H.264 can suggest options for the above parameters that might be useful.

Two more related questions:

  • Can this capability (start/stop saving to a file) be achieved without an interpipe? Is it possible/advisable to stop individual elements within a pipeline without breaking the whole pipeline? If it can be done, I assume it will cause buffers to back up behind the stopped element, but I’m guessing that can be worked around with a leaky queue.

  • And if that works, then perhaps I can stop/start the vpuenc_h264 element to make it restart the encoding with a fresh key frame. I think that should work if I’m only saving a single file, but I suspect it will introduce a glitch if I do it while another file is already saving (e.g. to start a second file-save at that time).

There are new intersrc and intersink elements in upstream gst-plugins-rs main branch now to replace interpipe, fwiw.

Yes, you can do something like this (rather minimal) code example: “H.264 backlog recording example” ($1760) on the freedesktop.org GitLab snippets, which shows both recording start/stop on demand (simulated with a timer) and how to keep a certain backlog around.

The backlog functionality is a bit dependent on the encoder though - if the encoder has a fixed buffer pool on the output side it might not support keeping lots of data in a backlog in the queue.

There’s a “force key unit” event in GStreamer (GstVideo) that can be sent to an encoder to request a new keyframe. Whether this particular encoder supports it, I don’t know, but it’s easy enough to test, I suppose.


Thanks. The link is very helpful. I’ll have to try it out on my system.

Keep in mind that the new intersrc/intersink elements do not have all the features of the C-based interpipe from RidgeRun. If you require advanced handling of events, the Rust-based plugin is not the way to go, as far as I can tell.

We implemented this mechanism. It works, but not in a way that’s good for our application.

More specifically, our application has multiple filesink objects. It is possible to start/stop multiple file-save operations independently from each other.

Using the presented technique (blocking inputs to the muxer), we’d need to have a separate muxer for each file sink.

We’d prefer to tee the output from mpegtsmux and implement the pad-blocking mechanism on queues that follow that tee. But it appears that the probe function never sees the GST_BUFFER_FLAG_DELTA_UNIT flag on buffers generated by the muxer, so we need a different way to identify a key frame. We’re going to start debugging the buffers in that part of the stream to see if there is anything else we can use.
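For context, the key-frame test our probe performs on the pre-mux H.264 buffers reduces to a single flag check. Sketched here in plain Python with the numeric value from GstBufferFlags (in a real probe you would test the flag on the GstBuffer carried in the probe info):

```python
# GST_BUFFER_FLAG_DELTA_UNIT from GstBufferFlags: set on buffers that
# depend on other frames. A keyframe is a buffer WITHOUT this flag.
GST_BUFFER_FLAG_DELTA_UNIT = 1 << 13

def is_keyframe(buffer_flags):
    return (buffer_flags & GST_BUFFER_FLAG_DELTA_UNIT) == 0
```

After mpegtsmux, the buffers are runs of MPEG-TS packets rather than individual encoded frames, which would explain why the flag never shows up there.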

Do you happen to already know of something that might work?

I suppose that we could put a probe on the input to the muxer that unblocks the pads on the queues feeding the filesinks. That wouldn’t be perfect, because some spurious already-queued buffers would also go to the file, but hopefully only a very small number.

I would just go for multiple muxers then, with one muxer per output file/branch. Probably easier to manage overall.


Just a quick followup. We ultimately implemented something along the lines of the example that @tpm shared.

We did end up using multiple muxers in order to work around the problem of post-mux buffers never carrying the DELTA_UNIT flag. It turns out that the extra muxers don’t have nearly as much overhead as we feared.

Not an easy solution, but it’s working now. Thanks again for the tip that got us moving in the right direction.