qtmux & mp4mux set subtitle duration to 0

Hello,

I'm trying to use GStreamer to dynamically inject subtitles from an appsrc and mux them into a fragmented MP4 with qtmux.

Here is my pipeline:

shmsrc socket-path=tmp/ipc/Router_output_1 do-timestamp=true is-live=true ! queue ! tsdemux name=demux demux. ! queue name=video_queue ! h264parse ! queue max-size-time=5000000000 ! mux.video_0 demux. ! queue name=audio_queue ! aacparse ! queue max-size-time=5000000000 ! mux.audio_0 appsrc name=FMP4_298bc64e629b1228 caps=application/x-subtitle format=time do-timestamp=false ! subparse ! mux.subtitle_0 qtmux name=mux fragment-duration=100 streamable=true ! filesink location=tmp/output.mp4

The pipeline runs smoothly with no errors, and I get my output.mp4 file with a subtitle stream as expected.

Properties:
  Duration: 0:00:16.417400000
  Seekable: yes
  Live: no
  container #0: Quicktime
    video #1: H.264 (High 4:4:4 Profile)
      Stream ID: df096621ff7474dedd1069d0baea21fd7babf3d2232005918494d7a1de52cd3e/001
      Width: 1920
      Height: 1080
      Depth: 24
      Frame rate: 4000/77
      Pixel aspect ratio: 1/1
      Interlaced: false
      Bitrate: 0
      Max bitrate: 0
    audio #2: MPEG-4 AAC
      Stream ID: df096621ff7474dedd1069d0baea21fd7babf3d2232005918494d7a1de52cd3e/002
      Language: <unknown>
      Channels: 2 (front-left, front-right)
      Sample rate: 44100
      Depth: 32
      Bitrate: 0
      Max bitrate: 0
    subtitles #3: Timed Text
      Stream ID: df096621ff7474dedd1069d0baea21fd7babf3d2232005918494d7a1de52cd3e/003
      Language: <unknown>

Unfortunately when I play the file and select the subtitle track, nothing appears.

I extracted the .srt with ffmpeg and this is what it looks like:

1
00:00:00,000 --> 00:00:00,000
<font face="Serif" size="0">Hello, world! 120</font>

2
00:00:02,000 --> 00:00:02,000
<font face="Serif" size="0">Hello, world! 140</font>

3
00:00:04,000 --> 00:00:04,000
<font face="Serif" size="0">Hello, world! 160</font>

4
00:00:06,000 --> 00:00:06,000
<font face="Serif" size="0">Hello, world! 180</font>

5
00:00:08,000 --> 00:00:08,000
<font face="Serif" size="0">Hello, world! 200</font>

Two issues:

  • The duration has been set to 0 for each segment (start = end).
  • The text is embedded in a font tag with size=0

I guess this is why the subtitles are not appearing when playing the track.
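
For anyone who wants to check an extracted .srt for this symptom, here is a minimal sketch that flags cues whose end time is not after the start time. The names `to_ms` and `zero_duration_cues` are made up for illustration, and the sample is just the first two cues from the output above.

```python
import re

# Matches an SRT timing line such as "00:00:02,000 --> 00:00:02,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def zero_duration_cues(srt_text):
    """Return (start_ms, end_ms) for each cue whose end is not after its start."""
    bad = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
        if end <= start:
            bad.append((start, end))
    return bad


sample = """1
00:00:00,000 --> 00:00:00,000
<font face="Serif" size="0">Hello, world! 120</font>

2
00:00:02,000 --> 00:00:02,000
<font face="Serif" size="0">Hello, world! 140</font>
"""

print(zero_duration_cues(sample))  # → [(0, 0), (2000, 2000)]
```

This only inspects the extracted text file, so it reflects whatever the extraction tool wrote, not necessarily what is stored in the MP4.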

I’ve spent quite a lot of time debugging subparse, qtmux & mp4mux to understand why buffers are not timed properly.

And from the logs it seems ok, here is an excerpt:

0:00:22.148166291 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:5704:find_best_pad:<mux> Choosing pad mux:subtitle_0
0:00:22.148173625 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:5240:gst_qt_mux_add_buffer:<mux> Switching to next chunk for pad mux:subtitle_0: offset 2326608, size 1857, duration 0:00:00.020000000
0:00:22.148181000 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:5360:gst_qt_mux_add_buffer: dts: 99:99:99.999999999 pts: 0:00:20.000000000 timebase_dts: 20000 pts_offset: 0
0:00:22.148188833 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:2109:gst_qt_mux_send_mdat_header:<mux> Sending mdat's atom header, size 2
0:00:22.148194416 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:4512:gst_qt_mux_pad_fragment_add_buffer:<mux> sending fragment 0x13a4d40d0
0:00:22.148203416 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:4576:gst_qt_mux_pad_fragment_add_buffer:<mux:subtitle_0> calculating base decode time with first dts 12000 (0:00:12.000000000) and current dts 20000 (0:00:20.000000000) of 8000 (+0:00:08.000000000)
0:00:22.148210125 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:2109:gst_qt_mux_send_mdat_header:<mux> Sending mdat's atom header, size 19
0:00:22.148214375 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:4512:gst_qt_mux_pad_fragment_add_buffer:<mux> sending fragment 0x13a4c6800
0:00:22.148222125 59581    0x13c499140 DEBUG                  qtmux gstqtmux.c:4576:gst_qt_mux_pad_fragment_add_buffer:<mux:subtitle_0> calculating base decode time with first dts 12000 (0:00:12.000000000) and current dts 21000 (0:00:21.000000000) of 9000 (+0:00:09.000000000)
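
As a sanity check on the excerpt above, the base decode time in the log is the current DTS minus the first DTS, expressed in track timescale units. Below is a minimal sketch of that arithmetic, assuming a timescale of 1000 units per second (inferred from the log, where 12000 corresponds to 12 s). This is not qtmux's actual code, and the function names are hypothetical.

```python
GST_SECOND = 1_000_000_000  # nanoseconds per second, same value as Gst.SECOND


def to_timescale(ns, timescale):
    """Convert a nanosecond timestamp to track timescale units."""
    return ns * timescale // GST_SECOND


def base_decode_time(first_dts_ns, current_dts_ns, timescale=1000):
    """Offset of the current fragment from the first DTS, in timescale units."""
    return to_timescale(current_dts_ns, timescale) - to_timescale(first_dts_ns, timescale)


# Values taken from the two "calculating base decode time" log lines.
print(base_decode_time(12 * GST_SECOND, 20 * GST_SECOND))  # → 8000
print(base_decode_time(12 * GST_SECOND, 21 * GST_SECOND))  # → 9000
```

Both results match the logged values (8000 and 9000), which is consistent with the subtitle buffers being timed correctly inside the muxer.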

Not sure what is happening… If anyone could help me sort this out, it would be awesome!

Thanks!

This feels like a bug. Perhaps you have a short TS you can share? Am I right that it reproduces even with files instead of shmsrc?

Yes, good point!

Here are the steps to reproduce.

First, launch this Python script to start the muxing pipeline:

import threading

import gi  # type: ignore
import time

gi.require_version("Gst", "1.0")
gi.require_version("GstApp", "1.0")

from gi.repository import GLib, Gst, GstApp


def build_cmd() -> list[str]:
    cmd = [
        "udpsrc",
        "port=20000",
        "!",
        "queue",
        "!",
        "identity",
        "silent=false",
        "!",
        "tsdemux",
        "name=demux",
        "demux.",
        "!",
        "queue",
        "name=video_queue",
        "!",
        "h264parse",
        "!",
        "queue max-size-time=5000000000",
        "!",
        "mux.video_0",
        "demux.",
        "!",
        "queue",
        "name=audio_queue",
        "!",
        "aacparse",
        "!",
        "queue max-size-time=5000000000",
        "!",
        "mux.audio_0",
        "appsrc",
        "name=subtitle_appsrc",
        "caps=text/x-raw,format=utf8",
        "format=time",
        "do-timestamp=false",
        "!",
        "queue max-size-time=5000000000",
        "!",
        "mux.subtitle_0",
        "qtmux",
        "name=mux",
        "fragment-duration=100",
        "streamable=true",
        "!",
        "filesink",
        "location=tmp/output.mp4",
    ]
    return cmd


def push_subtitles(_: Gst.Pipeline, appsrc: GstApp.AppSrc) -> None:
    """Push subtitles to the appsrc."""
    index = 1

    while index < 100:
        print(f"Feeding subtitles {index}")
        buf = Gst.Buffer.new_wrapped(f"Subtitle {index}".encode())
        buf.pts = buf.dts = 1000 * index * Gst.MSECOND
        buf.duration = 1000 * Gst.MSECOND
        appsrc.push_buffer(buf)
        index += 1

        time.sleep(1)

    appsrc.emit("end-of-stream")


def mux_subtitles() -> None:
    """Method run in a subprocess."""
    Gst.init(None)
    cmd = build_cmd()

    print(" ".join(cmd))
    pipeline = Gst.parse_launch(" ".join(cmd))
    appsrc = pipeline.get_by_name("subtitle_appsrc")

    demux = pipeline.get_by_name("demux")

    def on_pad_added(_: Gst.Element, pad: Gst.Pad) -> None:
        """Handle pad-added signal."""
        # Caps may not be set yet when the pad appears, so fall back to a caps query.
        caps = pad.get_current_caps() or pad.query_caps(None)
        caps_str = caps.to_string()
        print("Pad added...", caps_str)
        if "video/" in caps_str:
            video_queue = pipeline.get_by_name("video_queue")
            pad.link(video_queue.get_static_pad("sink"))

        if "audio/" in caps_str:
            audio_queue = pipeline.get_by_name("audio_queue")
            pad.link(audio_queue.get_static_pad("sink"))

    demux.connect("pad-added", on_pad_added)

    pipeline.set_state(Gst.State.PLAYING)
    glib_loop = GLib.MainLoop()

    t = threading.Thread(target=push_subtitles, args=(pipeline, appsrc), daemon=True)
    t.start()

    try:
        glib_loop.run()
    except KeyboardInterrupt:
        pass
    finally:
        print("setting to null")
        pipeline.set_state(Gst.State.NULL)


if __name__ == "__main__":
    mux_subtitles()

Then use this command to feed video:

gst-launch-1.0 -v \
    videotestsrc is-live=true \
    ! video/x-raw, width=1920, height=1080, framerate=50/1 \
    ! videoconvert \
    ! x264enc tune=zerolatency bitrate=1000 speed-preset=ultrafast \
    ! h264parse \
    ! queue \
    ! mux. \
    audiotestsrc wave=sine freq=440 is-live=true \
    ! audio/x-raw, rate=44100, channels=2 \
    ! audioconvert \
    ! avenc_aac bitrate=128000 \
    ! queue \
    ! mux. \
    mpegtsmux name=mux \
    ! udpsink host=127.0.0.1 port=20000

After some more testing, this setup seems to work.

The result I got was due to the tool I used to extract the subtitles afterwards.

(ffmpeg apparently isn’t good at extracting a proper SRT from MP4…)