Isofmp4mux is causing a segfault in C application

btgoodwin · May 2, 2024, 3:44pm

I am switching an application over from the older mp4mux to the isofmp4mux RUST element. The application is written in C. The isofmp4mux is instantiated with default options in the application. When the application is idle (not creating a file), everything is fine. However, the moment the data streams begin flowing to the muxer, the application aborts (“core dumped” but it’s Ubuntu 22.04 WSL2 and I cannot find the cores anywhere despite ulimit -c being unlimited). I have tried using the rust-gdb wrapper, but I get the same result. If I set break abort in gdb, it does not break, but I did at least get that the element is to blame:

Thread 14 "isofmp4mux_21ba" received signal SIGSEGV, Segmentation fault.

What’s odd about this is I have unit tests that do streaming video and audio to the muxer and those tests reliably pass (and produce a valid file).

I have run out of ideas on how to debug. Has anyone else run into this situation?

tpm · May 2, 2024, 4:21pm

Have you tried running it in valgrind ?

(Might want to make sure you have debugging symbols installed for libc + glib + gstreamer)

btgoodwin · May 2, 2024, 5:05pm

Nice. Thank you! That gave me a little more to work with but I’m still confused at what I’m seeing.

Our application configures the gstreamer and glib loggers to redirect through our own logging library. The two functions in the trace before strcmp are part of that library:

==18944== Invalid read of size 1
==18944==    at 0x484FBD4: strcmp (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==18944==    by 0x11EC5A: is_healthcheck_log_message (logging.c:68)
==18944==    by 0x11FCE7: gst_fprintf_log (logging.c:494)
==18944==    by 0x4911462: gst_debug_log_literal_full (gstinfo.c:716)
==18944==    by 0x49115B6: gst_debug_log_literal (gstinfo.c:748)
==18944==    by 0x1000BD04: _ZN9gstreamer3log13DebugCategory14log_unfiltered17hadfa8b1aec5ee748E.llvm.5421467856547506564 (log.rs:226)
==18944==    by 0x100188E9: <gstfmp4::fmp4mux::imp::FMP4Mux as gstreamer_base::subclass::aggregator::AggregatorImpl>::sink_event (imp.rs:3127)
==18944==    by 0xFFDF05A: gstreamer_base::subclass::aggregator::aggregator_sink_event (aggregator.rs:768)
==18944==    by 0x100404F7: gstreamer_base::subclass::aggregator::AggregatorImpl::sink_event (aggregator.rs:312)
==18944==    by 0xFFDF768: gstreamer_base::subclass::aggregator::aggregator_sink_event (aggregator.rs:768)
==18944==    by 0x7062E6C: gst_aggregator_do_events_and_queries (gstaggregator.c:996)
==18944==    by 0x48FDEF9: gst_element_do_foreach_pad (gstelement.c:1413)
==18944==  Address 0x1 is not stack'd, malloc'd or (recently) free'd

What is being string-compared in that logging.c:68 is the message that was passed to the gstreamer logger (GST_<LEVEL>[_OBJECT]); we’re comparing it versus a constant string that we use for defining our health check responses (those get written out to a separate location). There are two other references valgrind made to this same stack:

Conditional jump or move depends on uninitialised value(s)
Use of uninitialised value of size 8

I’ve gone to the muxer’s imp.rs line but I cannot find the log.rs anywhere in the gstreamer source (1.22.12). Either way though I don’t see anything wrong with the lines in question and this same logging code (on our side) has been in production for a couple of years now under heavy use with the older mp4mux; we’ve not seen this before.

btgoodwin · May 2, 2024, 5:32pm

I also just learned that cat /proc/sys/kernel/core_pattern dumps the expected location of cores, which for WSL2 is /mnt/wslg/dumps. I was able to find a core which gave me a backtrace matching valgrind, showing that the string comparison is being applied to that invalid address 0x1.

Our code is expecting the format field of the GstDebugMessage created in gstinfo.c::gst_debug_log_literal_full to be set to something as it usually is, but it seems rust logging is taking a different path and the pointer is left uninitialized in that function (versus gstinfo.c::gst_debug_log_full_valist which does at least set it to the passed-in format argument).

btgoodwin · May 2, 2024, 7:41pm

@tpm having format initialized to NULL in that function makes our existing code work fine. I’ve worked around it by instead checking if message->message is non-null, and if so, check for our prefix. That change got me past the segfaulting when the rust element attempts to log messages.

Now I’m at the point where the application suddenly stops recording as soon as it starts because the muxer is emitting an immediate error Require DTS for video streams. The application doesn’t touch the PTS/DTS anywhere along the pipeline from the rtpbin, rtph264depay, and h264parse elements that precede the muxer, so I’m stumped as to what the problem is, but that’s clearly an issue for a different thread.

If you agree that the non-initialized format pointer member should be initialized, I would be happy to PR that one-liner.

tpm · May 2, 2024, 10:49pm

I’m not sure I fully follow yet, but sounds like you should file a MR with that change

As for the DTS issue, try adding an h264timestamper element.

slomo · May 3, 2024, 6:47am

GstDebugMessage is a private struct and you should not depend on its content at all and it might change in random ways between GStreamer versions. Only use the public API of it, i.e. gst_debug_message_get() and gst_debug_message_get_id().

What happens here is that gst_debug_log_literal_full() is called by the Rust code. This does not initialize the format but only initializes the message. This is not a problem: see the implementation of gst_debug_message_get(). Your code however assumes that format is always set to something meaningful, which is not the case.

The same problem would also happen if C (or whatever) code calls gst_debug_log_literal().

In summary: don’t use private implementation details of GstDebugMessage.

btgoodwin · May 3, 2024, 1:03pm

Yes, I tried using that for a good bit yesterday but could not get the muxer to operate without spewing INFO and WARN -level messages into the log like a flood.

H264 via RTP through isofmp4mux DTS/PTS issues - Application Development - GStreamer Discourse

That’s where I’m picking up this morning, trying to understand what the missing piece is here with ingesting H264 (and opus audio) via RTP and muxing it to a file as we had before with the qtmux -based muxer.

btgoodwin · May 3, 2024, 1:21pm

Yes, we know it’s a private. There are no public APIs for getting access to the arguments list in that structure though, so far as we have found, and our function needs to make use of the arguments in this case. While gst_debug_add_log_function gave us a way to customize our logging configuration, there were a few things like that opaque object that resulted in quite a bit of copy-paste out of the related object file so that we could maintain most of the original functionality while also inserting additional fields of interest (e.g., logical identifiers printed for all elements in the pipeline so that we can collate logs easily).

You’re correct though, our design was assuming that format would either be initialized NULL or be a valid pointer to a string since a lot of the GLib string parsing routines underneath handle NULL for arguments just fine. So far that has held true for the rest of the gstreamer framework that I have used over the last few years too (that pointers get initialized at a minimum to NULL). I was surprised to land on this yesterday.

slomo · May 3, 2024, 2:49pm

Can you create an issue in GitLab with what features you’re missing from the API? If you have no other way than relying on private implementation details then that’s not ideal

btgoodwin · May 3, 2024, 5:43pm

To be fair, the situation of sorting log entries and then running one out a different path from the other probably could have been better handled by somehow making a different category, perhaps. I’ve brought it up with the developer this morning to see if we can get away from doing this check / relaying information this way through the logging system.

As for features that drove us to this path, I’m not sure what part of what I did would be worth a PR, so I’ll try to explain how I arrived at this solution.

I could not find a way to provide alternate formatters for the object that is passed into the various log entries so that rather than just printing the object name and pointer address we could put some extra information there, because the function that performs the operation in gstreamer is also private to that object file (along with all the code that formats color into the message, if enabled, etc.). Initially all we wanted was the ability to tune how object is handled and ended up having to copy the default log handler, various functions and structures it depends against, etc. into our own logging implementation.

Our application results in multiple large hierarchies and being able to associate elements that we do not “own” (i.e., they’re from GStreamer plugins) with the hierarchy established by our own GstBins. When it is under heavy use, trying to eye-grep a log during an issue is next to impossible since all of the hierarchies of logs would be mixed together. The solution I came up with was to both tag elements (that we do not “own”) with GObject data, and, for our own elements, use an glib interface to help associate a generic element’s log message with our application’s logical hierarchy. Our log handler tries to walk up the object argument and create a colon-separated prefix of that hierarchy so that we get the default log information followed by our IDs:

date time location, etc. ... GRAND:PARENT:CHILD original log formatted message

If we had the ability to extend how object is handled, specifically, the result would have been our prefixes get inserted by the default log handler and at least initially we would have been “done” with that feature.

The second feature was formatting the log output to print JSON blobs, but that did not require any additional copy/paste things but did make use of the object crawler so that when we dump the blob, we have GRAND: <id>, ... for example. Our log aggregator then makes it easy to give columns using keys+values from JSON blob, so if we want only logs for a specific GRAND (and its hierarchy), we can trivially get that as well as a formatted table showing the logged message and its place in the hierarchy.

I’m not sure who else might care to have either of those features.

slomo · May 4, 2024, 9:10am

I’m not sure how such an API would look like, but this seems generally useful. AFAIU you’re not printing the actual hierarchy of the objects based on gst_object_get_parent() but build your own “logical” hierarchy based on an interface / qdata tags. Maybe an interface to generally “tag” objects for debug log purposes could be added to GStreamer and this could be made use of in the default logging function already.

Why do you need access to the format for all this though? Wouldn’t it be sufficient to use gst_debug_message_get() for getting the actual debug message and then formatting around it like gst_debug_log_get_line() and gst_debug_log_default() are doing?

btgoodwin · May 6, 2024, 3:17pm

Correct. I originally tried using get_parent as my association but it would often lead to deadlocks when using the GST_<whatever>_OBJECT API, which would take the object’s lock. I didn’t feel that directly accessing the parent pointer would be wise without that, so I switched to using GObject data as a way to “tag” any elements we directly insert with the ID of the object in the hierarchy that we do own (i.e., that is managing that element for us). And then for our own objects, I defined an interface for establishing the “logging parent” ID to help us skip a few get-parent calls as well as provide metadata for when we’re inserting this into JSON -formatted log messages in production.

The main thing the developer wanted actually was the arguments list so that we could take this health status structure and simply log it from anywhere in the application and let our log handler pull the structure out, printing it across a socket instead, turning our log handler into a router. In this case, we skip trying to serialize the structure to string with gst_debug_message_get in this one case, instead handing the GstStructure arg to a different handler (which writes over a separate socket rather than stderr/stdout). Since format is right there as well, the developer could string compare format against our constant to skip running the serialization entirely. My code change to move forward in my integration effort ends up doing gst_debug_message_get first, serializing everything, and then if the prefix is present, fall back to getting the arguments off that private structure. I’m going to likely change this further to get us completely away from the private pointer or using the logging interface for extricating this structure for us.

btgoodwin · May 6, 2024, 5:19pm

I was thinking about this over lunch. Perhaps it’s a few APIs.

An API that gives one the ability update a mutable list of fields prior to the log line being written out. For example, one of the things we also write out is the time stamp of the logged message, but we write it out as UTC not application duration (or whatever the term would be for the default that starts at 0:00:00 and increments up as the application runs). Having the ability to get passed a mutable list of fields that will be printed (in order) so that we could replace the DATE_TIME (example representative field name) field with our own value would have been handy. Likewise, inserting our own formatted HIERARCHY string after OBJECT_NAME would be the way to tune that behavior as well.
Rather than replacing the whole fprintf logger with a new default logger, an API that takes the immutable version of the list from (1) to then write it out in whatever intended means one has would be good.
A way to prepend what is ostensibly “object to string” that already handles the GObject parameter in the default printf handler would have saved a lot of our copy-paste code. If using this, we would not necessarily have needed 1 when using the stdout printf default log writer, however having access to 1 when we want to write out a JSON blob would have been very helpful (because we could have inserted as many key-value pairs as we needed).

slomo · May 8, 2024, 9:38am

That sounds like you’d want to have the GStreamer logging system extended by a structured logging mechanism? I.e. being able to attach arbitrary, structured data (e.g. as key-value pairs) to log messages that then can be pulled out again by the logging function in a structured way.

Do I understand that correctly?

Anything based on the format string is not going to work well and going to be rather fragile. That there is (sometimes) a format string is already an implementation detail, and how it’s going to look like even more.

I’m not sure how that relates to tagging the objects though, that seems like a completely orthogonal aspect I can see use-cases for both though.

Basically a way to provide some kind of extended format string to the default logging function instead of the default formatting it currently does? I’m not sure how useful that is compared to just providing your own logging function. It doesn’t seem like it would save you a lot of code.

Ideally that list should be exactly the arguments to the logging function though (plus a new one for structured key-value pairs?). I’m not sure how this would look different.

Making gst_debug_print_object() public would solve that, or not?

btgoodwin · May 8, 2024, 3:19pm

You make great points.

I’m not sure what this idea would really look like. Like for example it could be a list of unique field names and pre-formatted strings. The default logger would then create the full formatted line by concatenating the strings together. As for the public API, the ability to search the list for unique field names would make it possible to do various actions like inserting one’s own metadata between two known fields or delete/omit certain fields that are not of interest. This would be the pre-processing of fields that the application developer intends to print.
Conversely to (1), this would be the post-processing (the actual printer function) where the application developer can (as in our case) choose to print JSON blobs rather than the default linear/concatenated string.
Yes, I think having gst_debug_print_object() public would have made this a bit easier to implement on our side (a lot less copy-paste code) since could format the object and fall back to the default gst_debug_print_object() for things that are not ours. Turning it around though so that one can assign a callback to the default logger, so that the logger says, “give me NULL or a formatted string for this GObject,” would accomplish the same thing without having to also implement the rest of the default fprintf logger (with its conditional coloring macros, etc.).

slomo · May 9, 2024, 7:23am

See info: Make gst_debug_print_object() and gst_debug_print_segment() public (!6820) · Merge requests · GStreamer / gstreamer · GitLab , that’s the easy part!

I’d suggest looking at some structured logging APIs to get an idea how this could look like, e.g. g_log_structured().

The basic idea is that you have logs like you do now that provide some opaque log string (which you really shouldn’t try parsing), but instead of having a fixed set of additional structured metadata (source file, line, time, pid/tid, object, etc) this can be extended by the code doing the logging and retrieved again by the logging function to do something meaningful with that structured information.

From my understanding this would cover your use case and would also be generally useful.

btgoodwin · May 9, 2024, 12:18pm

Awesome – thank you for making that PR and the suggestion about g_log_structured(). We’ll take a look at the latter as we plan out standardizing how we log things in our GStreamer applications.

Cheers!

slomo · May 9, 2024, 12:31pm

Just to be clear, supporting structured logging in GStreamer would need API additions to the logging system. That’s not there yet I only pointed at g_log_structured() as an example how such an API could look like.

btgoodwin · May 9, 2024, 1:09pm

Yep, I understand.

We’re currently investigating logging solutions for our existing applications as well as the next one, which will be RUST-based. Finding or making something we can reuse that would support these kinds of features in a more general way is the aim.