d3d11screencapturesrc vs d3d12screencapturesrc

What are the improvements in d3d12screencapturesrc over d3d11screencapturesrc?

Is it faster? Does it use fewer resources?
Does it offer additional features?

Thanks 🙂

The d3d12 elements in 1.24 are somewhat experimental and not fully optimized.

Both the d3d11 and d3d12 screen capture elements use the DXGI Desktop Duplication API + D3D11 (or Windows Graphics Capture + D3D11), and their features are the same, but performance can differ.

Notable differences (on the main branch) are:

  • The d3d11screencapture element copies the desktop image (a d3d11 texture) to another d3d11 texture, whereas d3d12screencapture copies the image to a d3d12 resource via the d3d11<->d3d12 resource sharing API. From the screen capture element’s point of view, d3d11 and d3d12 should not differ significantly in performance in theory, but this can vary depending on your pipeline. My observation is that d3d12 performs better than d3d11 in various scenarios.
  • For WGC, I rewrote the capture logic for d3d12, which addressed the low-framerate issue that exists in d3d11’s WGC mode.
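
To compare the two capture backends mentioned above on your own machine, the backend can be chosen per element. This is a minimal sketch, assuming the source elements expose a `capture-api` property with the values `dxgi` and `wgc` (please confirm with `gst-inspect-1.0 d3d12screencapturesrc` on your build). The helper name `capture_pipeline` is hypothetical and only prints a pipeline description:

```shell
# Hypothetical helper: print a pipeline description for a given element
# family ("d3d11" or "d3d12") and capture backend ("dxgi" or "wgc").
# Assumption: the source element exposes a "capture-api" property.
capture_pipeline() {
  api="$1"
  backend="$2"
  printf '%s\n' "${api}screencapturesrc capture-api=${backend} ! queue ! ${api}videosink"
}

# Print the description for the d3d12 element using Windows Graphics Capture.
capture_pipeline d3d12 wgc
```

The printed description can then be passed on, for example `gst-launch-1.0 $(capture_pipeline d3d12 wgc)`.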

Thanks for the very informative answer, Seungha. I really appreciate it, as always.
I’m a big fan of your work on GStreamer 🙂

Regarding the discussion started on GitLab, I still find the d3d12screencapturesrc implementation slower than d3d11screencapturesrc.
There must be something that I’m not understanding.

I have a pretty decent system with a 7950X3D CPU (16 cores, 32 threads) and an RTX 4090.

Following your suggestion (if I understood it correctly), I added some queues to my pipeline.

./gst-launch-1.0 d3d12screencapturesrc ! queue max-size-time=0 max-size-buffers=16 max-size-bytes=0 ! d3d12convert ! "video/x-raw(memory:D3D12Memory),width=960,height=540" ! queue max-size-time=0 max-size-buffers=16 max-size-bytes=0 ! d3d12videosink

This pipeline is slower than this simple one:

./gst-launch-1.0 d3d11screencapturesrc ! queue ! d3d11convert ! "video/x-raw(memory:D3D11Memory),width=960,height=540" ! d3d11videosink

I am using the popular 3DMark Steel Nomad benchmark to compare the pipelines, because I want a very reproducible test environment.

The d3d11 pipeline captures 27 FPS during the benchmark,
the d3d12 pipeline captures 22 FPS during the same benchmark.

The output of the d3d12 pipeline is visibly choppy, while the output of the d3d11 one is smoother.
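
The thread does not say how the FPS figures above were measured. One possible way, sketched here under the assumption that `fpsdisplaysink` (from gst-plugins-bad) is installed, is to wrap the video sink in it and run `gst-launch-1.0` with `-v` so that its `last-message` property is printed. The helper name `fps_pipeline` is hypothetical, and `text-overlay=false` is used on the assumption that the text overlay would not work directly on GPU memory:

```shell
# Hypothetical helper: print the thread's scaling pipeline with the video sink
# wrapped in fpsdisplaysink so the measured frame rate is reported.
# $1 = element family ("d3d11" or "d3d12")
# $2 = memory type ("D3D11Memory" or "D3D12Memory")
fps_pipeline() {
  api="$1"
  mem="$2"
  printf '%s\n' "${api}screencapturesrc ! queue ! ${api}convert ! video/x-raw(memory:${mem}),width=960,height=540 ! fpsdisplaysink video-sink=${api}videosink text-overlay=false"
}

# Print the d3d12 variant of the measurement pipeline.
fps_pipeline d3d12 D3D12Memory
```

For example, `gst-launch-1.0 -v $(fps_pipeline d3d12 D3D12Memory)` and the same call with `d3d11 D3D11Memory` would give comparable readings for both element families.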

I’m running those tests on a 3840x2160 display.
Interestingly, the low framerate appears when the captured resolution is lower than the current resolution of the screen.

When capturing at the same resolution as the screen, the framerate is much higher, but the result is nearly as choppy.

I would add that the capture framerate with d3d12screencapturesrc is far too unstable, even with queues and even when there is nearly no load on the GPU.
It’s very difficult to use it for “real time capture” because of the choppy nature of the capture…

It seems that most of my performance loss comes from d3d12convert.
Is there a way “to fix its performance” when switching to DX12?

Could you explain why you think so? By the way, both the d3d11 and d3d12 elements use the DXGI API with d3d11, and d3d12 has additional overhead caused by resource sharing between d3d11 ↔ d3d12. So I think the d3d11 pipeline can be more performant for a very simple pipeline like yours, although I have not observed that in my own tests.

Always glad to see your answers, thanks for the time spent answering me.

I’m sorry, I was wrong in my last post. The problem is not related to d3d12convert itself but to the screen resize done using caps (width/height).

This pipeline:
./gst-launch-1.0 d3d11screencapturesrc ! queue ! d3d11convert ! "video/x-raw(memory:D3D11Memory),width=960,height=540" ! d3d11videosink

is much faster and smoother than the same one using DX12, even when adding queues with max-size-time, etc.
./gst-launch-1.0 d3d12screencapturesrc ! queue max-size-time=0 max-size-buffers=16 max-size-bytes=0 ! d3d12convert ! "video/x-raw(memory:D3D12Memory),width=960,height=540" ! queue max-size-time=0 max-size-buffers=16 max-size-bytes=0 ! d3d12videosink

If I remove the width and height caps that scale down the image, the DX12 pipeline is much faster than the DX11 one.

So the large performance loss must come from the scaling…

d3d12screencapturesrc is a very nice addition. It’s much faster when not resizing the image, but I need to resize, and when doing so it is much slower (at least in my use case).

Thank you for the information. I guess an async depth that is too small in d3d12convert might cause the performance issue.

Anyway, I was thinking of removing the limitation. I’ll let you know once a potential fix gets merged.

Your work is already awesome, any additional improvements could make it stellar.

Whenever you want, I’m here.

Thank you.

@seungha I am doing very extensive tests of d3d12screencapturesrc and I have an interesting finding…

GStreamer 1.24.6 gives me twice the frames per second of the latest main branch when using this pipeline:

./gst-launch-1.0 d3d12screencapturesrc ! queue max-size-time=1000000 max-size-buffers=5 max-size-bytes=0 ! d3d12convert ! "video/x-raw(memory:D3D12Memory),width=960,height=540" ! queue max-size-time=1000000 max-size-buffers=5 max-size-bytes=0 ! d3d12videosink

The same pipeline runs faster on the old 1.24.6 than on the current nightly build.

It seems that the latest main branch has a performance regression for this use case.
Does that make sense?

I can easily reproduce it by running the 3DMark benchmark (just to put some load on the GPU during the screen capture).

queue max-size-time=1000000 means that only 1 ms of data can be queued (I guess that’s just one buffer), because max-size-time is expressed in nanoseconds. I recommend queue max-size-time=0 max-size-bytes=0 max-size-buffers={THE-NUMBER-OF-FRAMES} so that the maximum size is controlled by max-size-buffers only.
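
The arithmetic behind this can be checked quickly. The sketch below is only an illustration with hypothetical helper names; it assumes an integer framerate and compares the duration of one frame with a max-size-time budget, both in nanoseconds:

```shell
# Frame duration in nanoseconds for an integer framerate ($1 = frames per second).
frame_duration_ns() {
  echo $(( 1000000000 / $1 ))
}

# Number of whole frames whose total duration fits into a max-size-time
# budget ($1 = limit in nanoseconds, $2 = frames per second).
frames_in_time_limit() {
  limit_ns="$1"
  fps="$2"
  echo $(( limit_ns / $(frame_duration_ns "$fps") ))
}

frame_duration_ns 60              # one 60 fps frame lasts about 16.7 ms
frames_in_time_limit 1000000 60   # a 1 ms budget holds no whole frame
```

At 60 fps a single frame already spans far more than 1 ms, so the time limit is reached by the first queued buffer. Setting max-size-time=0 and max-size-bytes=0 disables those two limits and leaves max-size-buffers as the only one.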

By the way, I merged https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7444, which might improve performance.

Hi @seungha.
The performance improvement from your latest merge request is massive.

I’m really amazed by how much better my d3d12screencapturesrc pipeline runs now compared to the d3d11screencapturesrc one.

I have one last question, if possible.
When using the queue as per your suggestion, like this:
queue max-size-time=0 max-size-bytes=0 max-size-buffers=5

I noticed that when the GPU is idle, the framerate caps are respected:
framerate=60/1
produces 60 frames per second.

When the GPU is under load, the framerate caps are no longer respected:
framerate=60/1
sometimes produces 70 or more frames per second.

Is this normal? Is there a workaround for this?

If your app enables the high-resolution Windows clock (gst-launch and gst-play enable it by default), a higher framerate than the configured caps sounds like a bug in d3d12screencapturesrc’s scheduling code.

I guess d3d12screencapturesrc ! videorate drop-only=true ! ... can be a workaround
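
As an illustration of that workaround, here is a minimal sketch of a complete pipeline description. Everything after `videorate drop-only=true` is an assumption that reuses elements from earlier in this thread: the framerate caps filter placed after videorate, the queue settings, and the sink. The helper name `capped_capture_pipeline` is hypothetical:

```shell
# Hypothetical helper: print a capture pipeline in which videorate only drops
# frames (it never duplicates them) to keep the output at the requested rate.
# $1 = target frames per second, $2 = number of buffers the queue may hold.
capped_capture_pipeline() {
  fps="$1"
  buffers="$2"
  printf '%s\n' "d3d12screencapturesrc ! videorate drop-only=true ! video/x-raw(memory:D3D12Memory),framerate=${fps}/1 ! queue max-size-time=0 max-size-bytes=0 max-size-buffers=${buffers} ! d3d12videosink"
}

# Print a 60 fps variant with a five-buffer queue.
capped_capture_pipeline 60 5
```

It could be launched with `gst-launch-1.0 $(capped_capture_pipeline 60 5)`.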

I confirm that d3d12screencapturesrc ! videorate drop-only=true !
fixed the problem.

Do you plan to fix it in d3d12screencapturesrc, or should we continue to use the ! videorate drop-only=true option?

Thanks a million.

Hi there, does anyone know if there are any drawbacks to using “videorate drop-only=true” as a workaround for duplicated frames?

Can I safely use d3d12screencapturesrc along with videorate drop-only=true?