Working on a face tracking zoom plugin

Hello :wave:

I’ve been working on a plugin (in Rust of course), currently called gst-middle-arena in a subtle reference to a commercial OS feature that inspired it xD Basically it filters a video stream to zoom and crop into the area of the image that has people’s faces in it, while keeping a fixed output size. The need came from mini-conference recordings: I wanted a close-up vertical video feed of the speaker to display side by side with the projector feed, but rather than operating the camera physically, I wanted to set it up once with a wide FoV and never have to touch it again.

Currently it’s a bit hacky in places, so I’d like to clean it up, upstream the upstreamable pieces where possible, etc. :slight_smile:

  • I wanted to run the detection asynchronously from the filtered feed – I was happy to see that in GStreamer I could just offload that to built-in elements: run a tee with one output going into the detection element through a leaky queue, and the other output going into the zoom-crop filter…
    • However, I couldn’t figure out a “proper” way to pass data between these two elements (maybe I didn’t try hard enough – can pipeline messages flow like that?), and for now the communication just happens through a global mutex variable (lol). What’s the proper way to pass simple data (a rectangle) between a producer and a consumer that run as parallel branches of a pipeline?
  • Is there anything to do in the zooming-cropping element to ensure it doesn’t introduce variable latency?
  • I could not for the life of me figure out how the GstVideoCropMeta-based cropping worked – it would produce very strange results. I ended up just using the fast_image_resize crate to do the whole crop-and-scale operation inside the plugin.
    • But really the motion processing element specific to this task should just set the rectangle as metadata (maybe the new Region-of-Interest meta in 1.16) and another element or a combination of (upstream) elements should provide a crop-into-ROI-and-scale-to-fixed-output-size operation. Has anyone else worked on that before?
    • Is the built-in default image scaling SIMD-accelerated? Should I make a fast_image_resize/pic-scale plugin and contribute it to plugins-rs?
      • What about format conversion, would it make sense to do the same with yuvutils-rs?
  • I’ve used the tract runtime for the model directly from my Rust code, but seeing as gstonnxinference exists in plugins-bad already, I should contribute an equivalent tract-based plugin to plugins-rs, right?
  • Looks like that’s already been discussed, but pipewiresink’s mode=provide has been disappointing so far with its automatic termination upon “no more clients”… So far I’ve mostly been running the pipeline inside OBS, but I’d like to export the processed feed as a “virtual camera” that doesn’t just disappear because no one’s watching it.
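To make the hacky part above concrete, here’s roughly the shape of what I have today – the pipeline topology as comments, and the global-mutex handoff between the detection branch and the zoom-crop filter. The names (`FaceRect`, `publish`, `consume`) are illustrative, not my actual code:

```rust
use std::sync::{Mutex, OnceLock};

// Topology (roughly):
//   src ! tee name=t
//     t. ! queue leaky=downstream max-size-buffers=1 ! <detection element> ! fakesink
//     t. ! queue ! <zoom-crop filter> ! sink

#[derive(Clone, Copy, Debug, PartialEq)]
struct FaceRect {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

// The hack: a process-wide slot holding the most recent face rectangle.
static LATEST_FACE: OnceLock<Mutex<Option<FaceRect>>> = OnceLock::new();

fn latest_face() -> &'static Mutex<Option<FaceRect>> {
    LATEST_FACE.get_or_init(|| Mutex::new(None))
}

// Called from the detection branch whenever inference produces a box.
fn publish(rect: FaceRect) {
    *latest_face().lock().unwrap() = Some(rect);
}

// Called from the zoom-crop filter on every frame; returns the latest
// rectangle (or None if no face has been detected yet).
fn consume() -> Option<FaceRect> {
    *latest_face().lock().unwrap()
}
```

It works, but the two branches are completely invisible to each other as far as GStreamer is concerned, which is what I’d like to fix.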

Thanks!