I’m trying to add support for a custom audio codec so that I can use it with MPEG-DASH in GStreamer. I have working parser and decoder elements with typefind support, but real-time performance is poor when I use uridecodebin pointing at a DASH manifest. The manifest may also be to blame, since I modified shaka-packager to generate it, but everything works fine when streaming to a file. That makes me think it’s a buffering, latency or timestamping issue somewhere. I’d appreciate some feedback on my overall approach, since I’ve mostly been reverse engineering the wav, wavpack and flac parsers/decoders!
For context, the codec uses RIFF for the bitstream. There is a config chunk to initialise the decoder, an optional seek-table chunk, and then packet chunks to be passed to the decoder. Each packet chunk typically contains on the order of 10 ms of audio and, optionally, a hash of the packet data.
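To make the layout concrete, here is a minimal chunk walker. It assumes the standard RIFF chunk format (4-byte FourCC, 32-bit little-endian payload size, payload padded to an even length) and that the caller has already consumed the 12-byte top-level `RIFF`/size/form-type header. The chunk IDs in the usage are hypothetical, since the codec’s real FourCCs aren’t given here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One RIFF sub-chunk: FourCC id, payload pointer and payload size. */
typedef struct {
    char id[5];              /* NUL-terminated FourCC */
    const uint8_t *payload;  /* points into the caller's buffer */
    uint32_t size;           /* payload size in bytes, excluding any pad byte */
} RiffChunk;

/* RIFF size fields are little-endian. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the chunk starting at `offset`. Returns the offset of the next
 * chunk, or 0 if the buffer does not yet hold the complete chunk
 * (header + payload + pad byte), i.e. the caller needs more data. */
static size_t riff_next_chunk(const uint8_t *data, size_t len, size_t offset,
                              RiffChunk *out)
{
    if (offset > len || len - offset < 8)
        return 0;
    uint32_t size = read_u32_le(data + offset + 4);
    /* Odd-sized payloads are followed by one pad byte. */
    uint64_t total = 8u + (uint64_t)size + (size & 1u);
    if (total > (uint64_t)(len - offset))
        return 0;
    memcpy(out->id, data + offset, 4);
    out->id[4] = '\0';
    out->payload = data + offset + 8;
    out->size = size;
    return offset + (size_t)total;
}
```

Calling `riff_next_chunk()` in a loop until it returns 0 visits the config chunk, the optional seek table and then each packet chunk in order.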
I currently have a parser element that pulls out the config data and sets it on the caps, and then passes the raw codec packet data to the decoder element (also removing the packet hash field if present). It does this by constructing a new buffer and setting it as the frame’s `out_buffer`, with timestamps determined by the packet duration and a global packet counter. I couldn’t find any useful examples of parser elements using `out_buffer`, but it does “appear to work”.
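The counter-based timestamping can be sketched without GStreamer. This assumes every packet carries a fixed number of samples (only the ~10 ms duration is stated), and computes the PTS in nanoseconds, the unit of `GstClockTime`, from the sample position rather than by summing rounded per-packet durations, so rounding error doesn’t accumulate. Inside an element, `gst_util_uint64_scale(sample_pos, GST_SECOND, rate)` performs the same scaling without the overflow noted in the comment.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)  /* same value as GST_SECOND */

/* PTS (ns) of packet `index`, derived from its sample position.
 * Plain multiplication overflows once sample_pos exceeds ~1.8e10
 * (several days at 48 kHz); gst_util_uint64_scale() avoids that. */
static uint64_t packet_pts_ns(uint64_t index, uint32_t samples_per_packet,
                              uint32_t sample_rate)
{
    uint64_t sample_pos = index * samples_per_packet;
    return sample_pos * NSEC_PER_SEC / sample_rate;
}

/* Duration as the difference of consecutive PTS values, so that
 * pts + duration always equals the next packet's pts exactly. */
static uint64_t packet_duration_ns(uint64_t index, uint32_t samples_per_packet,
                                   uint32_t sample_rate)
{
    return packet_pts_ns(index + 1, samples_per_packet, sample_rate) -
           packet_pts_ns(index, samples_per_packet, sample_rate);
}
```

With a packet size that doesn’t divide evenly into nanoseconds (e.g. 512 samples at 48 kHz), individual durations alternate between neighbouring values, but consecutive timestamps stay contiguous with no gaps or overlaps.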
I don’t make any use of the `skipsize` value, which seems to let you align the parser’s incoming frame. I tried an alternative implementation that used it to pass the packet data through without using `out_buffer`, but it performed worse (it took longer to decode the whole stream to a file). It also appeared to introduce a large number of discont events, with valid timestamps interspersed with invalid ones (the maximum time value, i.e. `GST_CLOCK_TIME_NONE`), so clearly I didn’t configure something correctly.
Is this a reasonable approach? I was wondering whether the parser should instead simply chop the RIFF stream into discrete RIFF chunks and have the decoder parse the RIFF layer too.
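A rough, GStreamer-free model of that alternative: the parser emits each complete RIFF chunk as one frame and uses `skipsize` only to resynchronise. The three outcomes loosely mirror what a `GstBaseParse` subclass’s `handle_frame()` can do: set `*skipsize`, finish a frame with `gst_base_parse_finish_frame()`, or finish nothing so the base class waits for more data. The chunk IDs in `KNOWN_IDS` are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Outcome of one parse step: wait for data, skip bytes, or emit a frame. */
typedef enum { STEP_NEED_DATA, STEP_SKIP, STEP_FRAME } StepAction;

typedef struct {
    StepAction action;
    size_t nbytes;  /* bytes to skip, or size of the frame to emit */
} Step;

/* Hypothetical chunk IDs; the real codec's FourCCs are not known here. */
static const char *const KNOWN_IDS[] = {"cfg ", "seek", "pkt "};

static int is_known_id(const uint8_t *p)
{
    for (size_t i = 0; i < sizeof KNOWN_IDS / sizeof KNOWN_IDS[0]; i++)
        if (memcmp(p, KNOWN_IDS[i], 4) == 0)
            return 1;
    return 0;
}

static Step parse_step(const uint8_t *data, size_t len)
{
    Step s = {STEP_NEED_DATA, 0};
    if (len < 8)
        return s;  /* not even a full chunk header yet */
    if (!is_known_id(data)) {
        /* Misaligned: skip up to the next plausible chunk header. If none
         * is found, keep the last 3 bytes in case an ID straddles the end. */
        size_t i = 1;
        while (i + 4 <= len && !is_known_id(data + i))
            i++;
        s.action = STEP_SKIP;
        s.nbytes = i;
        return s;
    }
    uint32_t size = (uint32_t)data[4] | ((uint32_t)data[5] << 8) |
                    ((uint32_t)data[6] << 16) | ((uint32_t)data[7] << 24);
    uint64_t total = 8u + (uint64_t)size + (size & 1u);  /* header + payload + pad */
    if (total > len)
        return s;  /* chunk incomplete: wait for more data */
    s.action = STEP_FRAME;
    s.nbytes = (size_t)total;  /* the whole chunk, header included, is one frame */
    return s;
}
```

In this design the decoder receives whole chunks and would strip the header (and any hash) itself, so the parser never needs a separate `out_buffer`.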
Am I seeing real-time performance issues when streaming DASH (rather than reading from a file) simply because my codec’s packet duration is very short, which exposes a lack of buffering at the decoder output that other codecs perhaps mask?
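For a sense of scale, the buffer rate implied by 10 ms packets, compared with AAC-LC at 48 kHz (1024 samples per frame, used here purely as a reference point):

$$
\frac{1\ \text{s}}{10\ \text{ms}} = 100\ \text{buffers/s}
\qquad\text{vs.}\qquad
\frac{48\,000\ \text{samples/s}}{1024\ \text{samples/frame}} \approx 46.9\ \text{buffers/s}
$$

So the pipeline handles roughly twice as many buffers per second of audio as it would for AAC, with correspondingly less audio in each one.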
Pipeline reading from a file (performs fine):
gst-launch-1.0.exe filesrc location=.build/example.riff ! decodebin ! audioconvert ! audioresample ! autoaudiosink
Pipeline reading from the DASH server (hiccups as new segments are pulled):
gst-launch-1.0.exe uridecodebin uri=http://localhost/manifest.mpd ! audioconvert ! audioresample ! autoaudiosink