Zero copy pipeline on Nvidia

abeltramo · July 10, 2025, 8:29am

Hi everyone!

I’m the main developer behind Games on Whales: a custom server for Moonlight that lets you stream virtual Wayland desktops to multiple clients.
The underlying idea is that we run a custom micro Wayland compositor as a GStreamer plugin, gst-wayland-display, so that we can output the raw framebuffer directly to the pipeline that will eventually encode it into H264/HEVC/AV1 (depending on what the client asks for).

I’ve recently added support for outputting DMABuf in an attempt to achieve a proper zero copy pipeline. Everything is pretty much working, especially on Intel/AMD with VAAPI, which supports and properly negotiates DMA buffers directly.
On Nvidia it seems that the only way to pass a DMA buffer is to go through glupload, which turns it into GLMemory that can then be fed into nv*enc. I’ve got a few questions:

Is this going to be zero copy, or does it incur a GPU->GPU copy?
Is there any way to avoid the gl plugins and directly pass the DMA buffer?
I’m not familiar with CUDA, so I’m not sure what the best path would be here. Should I look into integrating it into gst-wayland-display and outputting CUDAMemory buffers directly?
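
For reference, the Nvidia path described above looks roughly like the pipeline description built below. This is only a minimal sketch: the source element name `waylanddisplaysrc`, the choice of `nvh264enc` as the concrete nv*enc element, and the `fakesink` at the end are assumptions for illustration, not the exact pipeline used in the project.

```python
def build_nvidia_pipeline(
    source: str = "waylanddisplaysrc",  # assumed element name for gst-wayland-display
    encoder: str = "nvh264enc",         # assumed stand-in for nv*enc
    sink: str = "fakesink",             # placeholder sink, illustration only
) -> str:
    """Return a gst-launch style description: DMABuf -> glupload -> GLMemory -> encoder."""
    stages = [
        source,
        "video/x-raw(memory:DMABuf)",    # caps filter: source outputs DMA buffers
        "glupload",                      # imports the DMA buffer into the GL context
        "video/x-raw(memory:GLMemory)",  # caps filter: encoder is fed GLMemory
        encoder,
        sink,
    ]
    return " ! ".join(stages)


if __name__ == "__main__":
    print(build_nvidia_pipeline())
```

The resulting string can be handed to `gst-launch-1.0` (the caps filters need shell quoting because of the parentheses) or to `Gst.parse_launch` from the Python bindings.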

Using the gl plugins is causing me all sorts of trouble: it seems that when I dispose of one GStreamer pipeline (we potentially have multiple pipelines running for different clients), it messes something up in OpenGL and ends up crashing all the other running Wayland compositors (which use EGL for rendering).