nvcudah264enc vs nvh264enc: sink caps & performance

Although both encoders offer the same features, they have different requirements for their sink pads: nvcudah264enc seems to require its input to already be in a YUV format, while nvh264enc also accepts RGBA formats. Because of nvcudah264enc's restriction, the pipeline needs an additional cudaconvert element.
However, measurements show that RGBA → nvh264enc is more performant than RGBA → cudaconvert → nvcudah264enc.
Is there a specific reason nvcudah264enc does not provide the same video format interface as nvh264enc?
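To see the sink caps difference directly, one option is to read each encoder's sink pad template. Below is a minimal sketch in Python. It assumes PyGObject with the GStreamer 1.x bindings and the nvcodec plugin are installed. The helper name `sink_caps` is hypothetical, and the formats it prints depend on the installed GStreamer version and GPU. `gst-inspect-1.0 nvcudah264enc` shows the same information under "Pad Templates".

```python
"""Print the sink pad template caps of the two NVENC H.264 encoders.

Assumes PyGObject with GStreamer 1.x bindings and the nvcodec plugin;
the exact formats listed depend on the installed GStreamer version.
"""
try:
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
except (ImportError, ValueError):  # bindings missing: report "not available"
    Gst = None


def sink_caps(element_name):
    """Return the sink pad template caps of an element as a string,
    or None if GStreamer or the element is not available (hypothetical helper)."""
    if Gst is None:
        return None
    Gst.init(None)  # safe to call more than once
    factory = Gst.ElementFactory.find(element_name)
    if factory is None:
        return None
    for tmpl in factory.get_static_pad_templates():
        if tmpl.direction == Gst.PadDirection.SINK:
            return tmpl.get_caps().to_string()
    return None


if __name__ == "__main__":
    for name in ("nvh264enc", "nvcudah264enc"):
        caps = sink_caps(name)
        print(f"{name}: {caps if caps else 'not available'}")
```

The encoders are only registered when a usable NVIDIA GPU is found, so on other machines both lines report "not available".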

The reason is that NVENC does not expose any parameters for the RGB → YUV conversion, so it's not controllable.

NVENC launches a CUDA kernel regardless of the input format (I guess it does a linear → tiled conversion or similar). So yes, doing the conversion in one step, as nvh264enc does, might be more performant.

Can you share your performance measurement results? Depending on the performance difference, I might need to consider adding RGB format support to the new encoders.

Yes, that's true; the conversion is then kind of hidden.

Our tests were done with gst-launch-1.0:

gst-launch-1.0 videotestsrc num-buffers=100 ! video/x-raw,width=3200,height=1200,framerate=60/1,format=RGBA ! cudaupload ! cudaconvert ! "video/x-raw(memory:CUDAMemory),format=NV12" ! nvcudah264enc ! h264parse ! mp4mux ! fakesink

gst-launch-1.0 videotestsrc num-buffers=100 ! video/x-raw,width=3200,height=1200,framerate=60/1,format=RGBA ! nvh264enc ! h264parse ! mp4mux ! fakesink

So doing this conversion in one step seems to take only a third of the time of doing it separately.
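To repeat the comparison over several runs, a small harness can launch both pipelines from above and report the median wall-clock time. This is a hypothetical sketch in Python. It assumes gst-launch-1.0 and the nvcodec plugin are available, and `time_command` and `PIPELINES` are illustrative names. Wall-clock time here includes process start-up and plugin loading, so it only gives a rough ratio.

```python
"""Time the two gst-launch-1.0 pipelines from this thread (hypothetical harness).

Wall-clock time per run includes process start-up and plugin loading,
so treat the numbers as a rough comparison only.
"""
import statistics
import subprocess
import time

# Shared source and sink parts, copied from the pipelines above.
SOURCE = [
    "videotestsrc", "num-buffers=100", "!",
    "video/x-raw,width=3200,height=1200,framerate=60/1,format=RGBA", "!",
]
SINK = ["h264parse", "!", "mp4mux", "!", "fakesink"]

PIPELINES = {
    # RGBA is uploaded and converted to NV12 by cudaconvert before encoding.
    "cudaconvert + nvcudah264enc": SOURCE + [
        "cudaupload", "!", "cudaconvert", "!",
        "video/x-raw(memory:CUDAMemory),format=NV12", "!",
        "nvcudah264enc", "!",
    ] + SINK,
    # nvh264enc accepts RGBA directly and converts internally.
    "nvh264enc": SOURCE + ["nvh264enc", "!"] + SINK,
}


def time_command(argv, runs=5):
    """Run argv `runs` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    results = {}
    for label, elements in PIPELINES.items():
        results[label] = time_command(["gst-launch-1.0", "-q"] + elements)
        print(f"{label}: {results[label]:.3f} s (median)")
    ratio = results["nvh264enc"] / results["cudaconvert + nvcudah264enc"]
    print(f"nvh264enc / separate conversion: {ratio:.2f}")
```

Passing the pipeline as an argument list bypasses the shell, so the `video/x-raw(memory:CUDAMemory)` caps string needs no extra quoting; gst-launch-1.0 joins the arguments back into a single pipeline description.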