GStreamer pipeline: CPU usage patterns

My pipeline is pretty simple: video and audio sources routed into webrtcsink.

gst-launch-1.0 -v \
          ximagesrc display-name=$DISPLAY show-pointer=true use-damage=false remote=true blocksize=16384 enable-navigation-events=true \
            ! video/x-raw,framerate=$FPS/1 \
            ! timeoverlay \
            ! videoconvert \
            ! qsvh264enc bitrate=10000 low-latency=true target-usage=7 \
            ! video/x-h264,profile=baseline \
            ! queue \
            ! wrs. \
          pulsesrc \
            ! queue \
            ! wrs. \
          webrtcsink name=wrs enable-data-channel-navigation=true meta="meta,name=streamer-$VIDEO_ENC" congestion-control=disabled signaller::uri="ws://$SIGNALLING_SERVER_HOST:$SIGNALLING_SERVER_PORT" stun-server="stun://$STUN_SERVER_HOST:$STUN_SERVER_PORT"

As you can see, video encoding is offloaded to the hardware, but CPU usage remains pretty high. Without any optimizations, the gst-launch-1.0 process consumes 65-67% of a single CPU core (11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz). The load is distributed across threads as follows:

Thread name        Share of process CPU time, %
ximagesrc0:src     38.32
video_0:src        11.74
audio_0:src        8.48
pulsesrc0:src      6.39
tokio-runtime-w    6.13
gst-launch-1.0     6.02
threaded-ml        5.92
queue0:src         2.86
nicesrc0:src       2.74
queue1:src         2.54
queue2:src         2.32
SCTP_timer         1.57
sctpdec0:src_1     1.56
sctpenc0:src       1.53
dtlsenc0:src       1.49
rtpsession-rtcp    0.39

[image]

The simplest optimization for my pipeline was CPU affinity, i.e. pinning it to a single core (--cpuset-cpus="0"). This immediately reduced core utilization by about 10 percentage points, down to 55-57%.

Thread name        Share of process CPU time, %
ximagesrc0:src     74
video_0:src        9.88
gst-launch-1.0     3.3
audio_0:src        3.17
pulsesrc0:src      1.9
threaded-ml        1.68
tokio-runtime-w    1.28
SCTP_timer         1.1
queue2:src         1.05
nicesrc0:src       0.95
queue0:src         0.73
queue1:src         0.59
rtpsession-rtcp    0.35

[image]
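The same pinning can also be applied when launching the pipeline outside a container. A minimal sketch in Python, assuming Linux: taskset is the util-linux tool, os.sched_setaffinity is the standard-library equivalent, and both helper names are hypothetical. Affinity set in the child before exec is inherited by every thread gst-launch-1.0 creates afterwards.

```python
import os
import shlex
import subprocess


def pinned_argv(pipeline: str, cpus=(0,)) -> list[str]:
    """Hypothetical helper: prefix gst-launch-1.0 with `taskset -c <cpus>`."""
    cpu_list = ",".join(str(cpu) for cpu in cpus)
    # shlex.split tokenizes the description the way a POSIX shell would,
    # so quoted values such as meta="meta,name=..." lose their quotes as usual.
    return ["taskset", "-c", cpu_list, "gst-launch-1.0", "-v", *shlex.split(pipeline)]


def launch_pinned(pipeline: str, cpus=(0,)) -> subprocess.Popen:
    """Hypothetical helper: same effect without taskset (Linux only).

    The affinity mask is set in the child just before exec, so all
    streaming threads started by gst-launch-1.0 inherit it.
    """
    argv = ["gst-launch-1.0", "-v", *shlex.split(pipeline)]
    return subprocess.Popen(argv, preexec_fn=lambda: os.sched_setaffinity(0, set(cpus)))
```

For example, shlex.join(pinned_argv("ximagesrc ! videoconvert ! fakesink")) yields a command line that can be pasted into a shell.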

Now, a potential candidate for optimization is evident: ximagesrc0:src. Almost 50% of the time spent in this thread is inside the video_converter_matrix8_table function (videoconvert runs in ximagesrc's streaming thread, since there is no queue between them). By optimizing this function, we could potentially reduce CPU usage to about 40% of a core. I attempted to address this issue here, but due to my limited experience, I couldn’t complete the task. The function already uses precomputed values, and I’m not sure how to optimize it further. Furthermore, this indicates that it’s already faster than the ORC implementation.
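A back-of-the-envelope check of the ~40% estimate, assuming the values in the second table are shares of the process's total CPU time (they add up to about 100) and taking 56% as the midpoint of 55-57%. This is a simple Amdahl-style model, not a measurement:

```python
def estimate(total_core_pct: float, thread_share: float, hotspot_fraction: float, speedup: float) -> float:
    """Core usage (in % of one core) after making a hotspot `speedup` times faster."""
    thread_pct = total_core_pct * thread_share     # ximagesrc0:src in % of one core
    hotspot_pct = thread_pct * hotspot_fraction    # time inside video_converter_matrix8_table
    return total_core_pct - hotspot_pct + hotspot_pct / speedup


TOTAL = 56.0    # midpoint of the measured 55-57% with CPU affinity
SHARE = 0.74    # ximagesrc0:src share from the second table
HOTSPOT = 0.5   # "almost 50%" of that thread's time

print(round(estimate(TOTAL, SHARE, HOTSPOT, 2.0), 1))           # function made twice as fast
print(round(estimate(TOTAL, SHARE, HOTSPOT, float("inf")), 1))  # conversion removed from the CPU
```

This gives roughly 45.6% of a core if the function became twice as fast and roughly 35.3% if the conversion left the CPU entirely, so the ~40% figure sits between the two.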

Here are my questions/thoughts:

  1. Is it advisable for gst-launch to employ CPU affinity by default, considering its potential to nearly halve the load?
  2. Can you identify opportunities for optimization in video_converter_matrix8_table?
  3. Are there any strategies, similar to those described here, that could significantly reduce CPU usage in my pipeline?
  4. What is the usual CPU usage in your pipelines, and what approaches do you use to optimize it?
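For context on question 2, here is a minimal Python sketch of the general technique a table-driven 8-bit matrix converter relies on: precompute coefficient × value for every possible 8-bit input, so each output component costs three lookups and three additions instead of three multiplications. This is not GStreamer's code; the BT.601 limited-range coefficients and the 16-bit fixed-point scale are assumptions chosen for the example.

```python
SHIFT = 16             # fixed-point fraction bits (assumption for this sketch)
ONE = 1 << SHIFT
HALF = ONE >> 1        # added once so the final shift rounds to nearest

# BT.601 limited-range RGB -> Y'CbCr, 8-bit: (coef_r, coef_g, coef_b, offset) per output row.
MATRIX = (
    (0.256788, 0.504129, 0.097906, 16.0),     # Y
    (-0.148223, -0.290993, 0.439216, 128.0),  # Cb
    (0.439216, -0.367788, -0.071427, 128.0),  # Cr
)


def build_tables(matrix):
    """Precompute coef * v in fixed point for every 8-bit input v, per row and channel."""
    tables = []
    for cr, cg, cb, offset in matrix:
        tr = [round(cr * v * ONE) for v in range(256)]
        tg = [round(cg * v * ONE) for v in range(256)]
        tb = [round(cb * v * ONE) for v in range(256)]
        tables.append((tr, tg, tb, round(offset * ONE) + HALF))
    return tables


TABLES = build_tables(MATRIX)


def clamp8(x: int) -> int:
    return 0 if x < 0 else 255 if x > 255 else x


def convert_pixel(r: int, g: int, b: int) -> tuple:
    """Three lookups and three additions per output component; no multiplications."""
    return tuple(
        clamp8((tr[r] + tg[g] + tb[b] + off) >> SHIFT) for tr, tg, tb, off in TABLES
    )


def convert_bgrx_row(row: bytes) -> list:
    """Convert one row of BGRx pixels (4 bytes each, padding byte ignored)."""
    return [convert_pixel(row[i + 2], row[i + 1], row[i]) for i in range(0, len(row), 4)]
```

Because the work per pixel is already reduced to lookups and additions, there is little arithmetic left to remove, and the lookups depend on pixel values, which makes this pattern awkward to vectorize with SIMD.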

I’d move to Wayland, then use the screencast portal and pipewiresrc; as a side effect, the RGB data will be DMABuf. Replace qsv with vapostproc ! vaxyzenc; these elements support DMABuf. You’ll remove copies on the display server side and offload color conversion. A lot of work, of course, but clearly forward-looking.


Can you provide insights into the stability of PipeWire, particularly when used with pipewire-pulse, compared to the combination of X (ximagesrc) and a PulseAudio server? Additionally, are there any compatibility concerns with browsers, Wine, and other applications? Lastly, do you have a sample Dockerfile that installs all dependencies and sets up a basic PipeWire + GStreamer pipeline?

PipeWire audio and its policy manager, WirePlumber, are the default in most major distros and perform well. It does provide a performance gain, but in light of your analysis it is irrelevant.

As for zero-copy screencasting, PipeWire is just a transport; the implementation is provided by compositors (through the XDG screencast portal). This is used by most major browsers, and you’ll find code using it in projects like gnome-network-displays. It has been quite reliable for me lately.

Wine’s Wayland support is probably younger, but it is well supported now, and more features unique to Wayland are being developed, things that will never be supported on X11.

I can’t judge for you, and I can’t write code for you; in the end, you are the only one who knows whether that makes sense for your project. As I said, it’s a forward-looking direction. In the short term, you may get good results from a hardware-accelerated color converter (like vapostproc or glcolorconvert).
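A minimal sketch of that short-term option using glcolorconvert, keeping ximagesrc and qsvh264enc from the original pipeline and moving only the BGRx to NV12 conversion to the GPU. This is an untested assumption: glupload and gldownload still copy every frame, so whether it beats videoconvert on a given machine has to be measured. The function name is hypothetical.

```python
def gl_convert_video_branch(fps: int, display: str = ":0") -> str:
    """Hypothetical helper: the original video branch with videoconvert replaced by OpenGL conversion."""
    return " ! ".join([
        f"ximagesrc display-name={display} show-pointer=true use-damage=false",
        f"video/x-raw,framerate={fps}/1",
        "timeoverlay",
        # Upload BGRx frames to the GPU and convert them there instead of on the CPU.
        "glupload",
        "glcolorconvert",
        "video/x-raw(memory:GLMemory),format=NV12",
        # Bring NV12 back to system memory for qsvh264enc (one extra copy per frame).
        "gldownload",
        "qsvh264enc bitrate=10000 low-latency=true target-usage=7",
        "video/x-h264,profile=baseline",
        "queue",
        "wrs.",
    ])
```

The returned string is the video part of a launch description; it still needs the pulsesrc branch and the webrtcsink name=wrs element. If it is pasted into a shell, the caps containing parentheses must be quoted.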

Currently, I’m in the exploratory stage, with no code implemented yet. However, I’ve created Dockerfiles for building all dependencies and gst-launch pipelines to make adjustments easy. I’m curious whether an application running in Weston kiosk mode will integrate seamlessly with the XDG screencast portal / PipeWire and GStreamer. I would be grateful if you could provide a sample Dockerfile that incorporates this pipeline.

By the way, what is your opinion on this: “Boycott Wayland. It breaks everything!” (GitHub)

I don’t know about Weston and screencasting; since this is not a Wayland protocol, there is little rationale for implementing it in a reference compositor. I’d look at mutter (GNOME), wlroots or KWin.

As for the rant you shared, that is yours to judge. The first few paragraphs are obviously odd: how can you break software that has never been ported to Wayland? Perhaps this is about the X11 compatibility layer, which of course does not support things that leave the door open to silent watchers, like the XTEST extension used for screen, keyboard and mouse capture.

P.S. I only write Dockerfiles for my own needs, and I don’t have the time or energy to fulfill your request in my spare time.

To be clear, I didn’t ask you to create a Dockerfile for me; I simply wanted you to point me to an existing one. While the well-known x11docker does support Wayland, it is excessively complex, which makes it hard to extract only the Wayland-related parts.

Meanwhile, I’ll try labwc, as it seems to be built on top of wlroots.

@ndufresne, I was finally able to build a GStreamer pipeline on top of a wlroots-based compositor, xdg-desktop-portal-wlr and pipewiresrc. Just replacing ximagesrc with pipewiresrc gave no performance benefit: I still see pipewiresrc doing the same work ximagesrc did.
Replacing qsv with vapostproc ! vah264enc gives this error:

GST_CAPS gstpad.c:3263:gst_pad_query_accept_caps_default:<vah264enc0:sink> caps: video/x-raw(memory:DMABuf), format=(string)BGRx, width=(int)1920, height=(int)1080, framerate=(fraction)0/1, max-framerate=(fraction)60/1 were not compatible with: EMPTY

I tried adding these caps:

! video/x-raw(memory:DMABuf),format=BGRx,width=1920,height=1080,framerate=0/1,max-framerate=60/1

after vah264enc, but then I got:

0:00:00.192002517    68       0xd97810 ERROR           GST_PIPELINE subprojects/gstreamer/gst/parse/grammar.y:1128:gst_parse_perform_link: could not link vah264enc0 to queue0, vah264enc0 can't handle caps video/x-raw(memory:DMABuf), format=(string)BGRx, width=(int)1920, height=(int)1080, framerate=(fraction)0/1, max-framerate=(fraction)60/1

Could someone help me fix this pipeline:

pipewiresrc fd={fd} path={node_id}            
            ! vapostproc
            ! vah264enc
            ! queue
            ! wrs.
            pulsesrc
            ! queue
            ! wrs.
          webrtcsink...

Also, what’s the difference between the va and vaapi plugins in GStreamer?
Should I use vaapih264enc or vah264enc?

libgstva.so is a rewrite of libgstvaapi.so: the decoders have been rebased on top of the libgstcodecs library, and the VA code has been simplified and focuses on DMABuf sharing (on Linux).

I suspect pipewiresrc and VA are currently incompatible: VA only uses format=DMA_DRM, drm-format=<drm_fourcc>:<drm_modifiers> now.

Ok, so here is what I see:

gst-inspect-1.0 vapostproc:

      video/x-raw(memory:DMABuf)
                  width: [ 16, 16384 ]
                 height: [ 16, 16384 ]
                 format: DMA_DRM
             drm-format: { (string)NV12:0x0100000000000002, (string)YU12, (string)YV12,
                           (string)YUYV:0x0100000000000002, (string)YU16:0x0100000000000002,
                           (string)P010:0x0100000000000002, (string)AR24:0x0100000000000002,
                           (string)AB24:0x0100000000000002, (string)AR30:0x0100000000000002,
                           (string)AYUV:0x0100000000000002, (string)Y210:0x0100000000000002,
                           (string)Y410:0x0100000000000002, (string)P012:0x0100000000000002,
                           (string)Y212:0x0100000000000002, (string)Y412:0x0100000000000002 }

gst-inspect-1.0 vah264enc:

video/x-raw(memory:DMABuf)
                  width: [ 32, 4096 ]
                 height: [ 32, 4096 ]
                 format: DMA_DRM
             drm-format: NV12:0x0100000000000002
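The intersection of the two drm-format lists above can be computed mechanically. A minimal sketch, assuming the entries are written as FOURCC or FOURCC:0xMODIFIER as in the gst-inspect-1.0 output, where an entry without a modifier (e.g. YU12) means linear, modifier 0. The lists are copied from the output above; the helper names are hypothetical.

```python
def parse_drm_format(entry: str) -> tuple:
    """Split 'NV12:0x0100000000000002' into ('NV12', 0x0100000000000002); no modifier means linear (0)."""
    fourcc, _, modifier = entry.partition(":")
    return fourcc, int(modifier, 16) if modifier else 0


def drm_formats(listing: str) -> set:
    """Parse a comma-separated drm-format list into a set of (fourcc, modifier) pairs."""
    return {parse_drm_format(item.strip()) for item in listing.split(",") if item.strip()}


VAPOSTPROC = """NV12:0x0100000000000002, YU12, YV12, YUYV:0x0100000000000002,
YU16:0x0100000000000002, P010:0x0100000000000002, AR24:0x0100000000000002,
AB24:0x0100000000000002, AR30:0x0100000000000002, AYUV:0x0100000000000002,
Y210:0x0100000000000002, Y410:0x0100000000000002, P012:0x0100000000000002,
Y212:0x0100000000000002, Y412:0x0100000000000002"""
VAH264ENC = "NV12:0x0100000000000002"

common = drm_formats(VAPOSTPROC) & drm_formats(VAH264ENC)
for fourcc, modifier in sorted(common):
    print(f"{fourcc}:0x{modifier:016x}")
```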

This leaves me with NV12:0x0100000000000002 as the only choice. When trying to apply these caps to pipewiresrc:

pipewiresrc fd={fd} path={node_id}
            ! video/x-raw(memory:DMABuf),format=DMA_DRM,width=1920,height=1080,drm-format=NV12:0x0100000000000002,framerate=60/1
            ! vapostproc
            ! vah264enc

I’m getting this:

2023-11-23 01:44:01,587 DEBG 'pipewire' stdout output:
[E][68379.490771] pw.context   | [       context.c:  604 pw_context_debug_port_params()] params Spa:Enum:ParamId:EnumFormat: 0:0 Invalid argument (input format (no more input formats))
[E][68379.490777] pw.context   | [       context.c:  623 pw_context_debug_port_params()] Object: size 224, type Spa:Pod:Object:Param:Format (262147), id Spa:Enum:ParamId:EnumFormat (3)
[E][68379.490778] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:mediaType (1), flags 00000000
[E][68379.490780] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 2        (Spa:Enum:MediaType:video)
[E][68379.490781] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:mediaSubtype (2), flags 00000000
[E][68379.490782] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 1        (Spa:Enum:MediaSubtype:raw)
[E][68379.490782] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:modifier (131074), flags 00000018
[E][68379.490784] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:Enum, flags 00000000 40 8
[E][68379.490784] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927935
[E][68379.490785] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927935
[E][68379.490786] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 0
[E][68379.490787] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:format (131073), flags 00000000
[E][68379.490787] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:None, flags 00000000 16 0
[E][68379.490788] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:size (131075), flags 00000000
[E][68379.490789] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:None, flags 00000000 24 8
[E][68379.490790] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Rectangle 1920x1080
[E][68379.490790] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:framerate (131076), flags 00000000
[E][68379.490791] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:None, flags 00000000 24 8
[E][68379.490792] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 60/1
[E][68379.490793] pw.context   | [       context.c:  604 pw_context_debug_port_params()] params Spa:Enum:ParamId:EnumFormat: 1:0 Invalid argument (output format (no more input formats))
[E][68379.490794] pw.context   | [       context.c:  623 pw_context_debug_port_params()] Object: size 280, type Spa:Pod:Object:Param:Format (262147), id Spa:Enum:ParamId:EnumFormat (3)
[E][68379.490795] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:mediaType (1), flags 00000000
[E][68379.490796] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 2        (Spa:Enum:MediaType:video)
[E][68379.490796] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:mediaSubtype (2), flags 00000000
[E][68379.490809] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 1        (Spa:Enum:MediaSubtype:raw)
[E][68379.490810] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:format (131073), flags 00000000
[E][68379.490810] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 8        (Spa:Enum:VideoFormat:BGRx)
[E][68379.490811] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:modifier (131074), flags 00000018
[E][68379.490812] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:Enum, flags 00000000 80 8
[E][68379.490813] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927935
[E][68379.490814] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927935
[E][68379.490814] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 0
[E][68379.490815] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927937
[E][68379.490816] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927938
[E][68379.490816] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927942
[E][68379.490817] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927943
[E][68379.490817] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Long 72057594037927944
[E][68379.490818] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:size (131075), flags 00000000
[E][68379.490819] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Rectangle 1920x1080
[E][68379.490819] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:framerate (131076), flags 00000000
[E][68379.490820] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Fraction 0/1
[E][68379.490821] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:maxFramerate (131077), flags 00000000
[E][68379.490822] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:Range, flags 00000000 40 8
[E][68379.490822] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 60/1
[E][68379.490823] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 1/1
[E][68379.490824] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 60/1
[E][68379.490824] pw.context   | [       context.c:  623 pw_context_debug_port_params()] Object: size 184, type Spa:Pod:Object:Param:Format (262147), id Spa:Enum:ParamId:EnumFormat (3)
[E][68379.490825] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:mediaType (1), flags 00000000
[E][68379.490826] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 2        (Spa:Enum:MediaType:video)
[E][68379.490827] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:mediaSubtype (2), flags 00000000
[E][68379.490827] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 1        (Spa:Enum:MediaSubtype:raw)
[E][68379.490828] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:format (131073), flags 00000000
[E][68379.490829] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Id 7        (Spa:Enum:VideoFormat:RGBx)
[E][68379.490829] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:size (131075), flags 00000000
[E][68379.490830] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Rectangle 1920x1080
[E][68379.490831] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:framerate (131076), flags 00000000
[E][68379.490831] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Fraction 0/1
[E][68379.490832] pw.context   | [       context.c:  623 pw_context_debug_port_params()]   Prop: key Spa:Pod:Object:Param:Format:Video:maxFramerate (131077), flags 00000000
[E][68379.490833] pw.context   | [       context.c:  623 pw_context_debug_port_params()]     Choice: type Spa:Enum:Choice:Range, flags 00000000 40 8
[E][68379.490834] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 60/1
[E][68379.490834] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 1/1
[E][68379.490835] pw.context   | [       context.c:  623 pw_context_debug_port_params()]       Fraction 60/1
[E][68379.490837] pw.link      | [     impl-link.c:  184 link_update_state()] (45.0.0 -> 56.0.0) negotiating -> error (no more input formats) (configure-configure)

2023-11-23 01:44:01,588 DEBG 'wireplumber' stdout output:
M 01:44:01.588171    m-lua-scripting ../subprojects/wireplumber/modules/module-lua-scripting/api/api.c:387:object_activate_done: <WpSiStandardLink:0x557aa827d570> 1 of 1 PipeWire links failed to activate

2023-11-23 01:44:01,591 DEBG 'xdg-desktop-portal-wlr' stdout output:
2023/11/23 01:44:01 [ERROR] - pipewire: fatal error event from core
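The Long values in the modifier lists above are DRM format modifiers printed in decimal. A minimal sketch that decodes them, following the layout of the kernel's drm_fourcc.h (vendor in the top 8 bits, vendor-specific value in the low 56 bits). Only the constants needed for this log are included; check the Intel names against your kernel headers.

```python
VENDOR_SHIFT = 56
VALUE_MASK = (1 << VENDOR_SHIFT) - 1

DRM_FORMAT_MOD_LINEAR = 0
DRM_FORMAT_MOD_INVALID = VALUE_MASK  # fourcc_mod_code(NONE, DRM_FORMAT_RESERVED)

VENDORS = {0x00: "NONE", 0x01: "INTEL"}
INTEL_MODIFIERS = {
    1: "I915_FORMAT_MOD_X_TILED",
    2: "I915_FORMAT_MOD_Y_TILED",
    6: "I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS",
    7: "I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS",
    8: "I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC",
}


def describe_modifier(value: int) -> str:
    """Map a 64-bit DRM format modifier to a readable name."""
    if value == DRM_FORMAT_MOD_INVALID:
        return "DRM_FORMAT_MOD_INVALID"
    if value == DRM_FORMAT_MOD_LINEAR:
        return "DRM_FORMAT_MOD_LINEAR"
    vendor, code = value >> VENDOR_SHIFT, value & VALUE_MASK
    if vendor == 0x01 and code in INTEL_MODIFIERS:
        return INTEL_MODIFIERS[code]
    return f"vendor {VENDORS.get(vendor, hex(vendor))}, value {code}"


for v in (72057594037927935, 0, 72057594037927937, 72057594037927938, 72057594037927942):
    print(f"{v} = 0x{v:016x} -> {describe_modifier(v)}")
```

Decoded this way, the BGRx offer lists DRM_FORMAT_MOD_INVALID (which PipeWire commonly uses to mean an implicit modifier), linear, and several Intel tiled layouts; 0x0100000000000002, the only modifier vah264enc accepts, is I915_FORMAT_MOD_Y_TILED.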

Correct me if I’m wrong, but it seems pipewiresrc only offers RGBx/BGRx video formats here, which can’t be converted to NV12, and therefore the whole thing fails…

Yeah, it seems RGB/BGRx is hardcoded in wlroots (types/output/output.c in wlroots 0.16.2, on GitLab) as DRM_FORMAT_XRGB8888, which corresponds to XR24.
As we can see, vapostproc doesn’t support XR24, but it does support e.g. AR24. Replacing DRM_FORMAT_XRGB8888 with DRM_FORMAT_ARGB8888 (AR24) lets pipewiresrc and vapostproc negotiate, but vah264enc still requires NV12… Is there an element in GStreamer that can perform this conversion? Or is there a workaround?
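The four-character names used in drm-format (XR24, AR24, NV12) are DRM fourcc codes. A minimal sketch of the fourcc_code() packing from drm_fourcc.h shows how DRM_FORMAT_XRGB8888 maps to XR24 and DRM_FORMAT_ARGB8888 to AR24; the helper names are illustrative. In memory, little-endian XRGB8888 is laid out as B, G, R, x, which is what GStreamer calls BGRx.

```python
def fourcc_code(name: str) -> int:
    """Pack a 4-character code little-endian, like fourcc_code(a, b, c, d) in drm_fourcc.h."""
    a, b, c, d = (ord(ch) for ch in name)
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_name(code: int) -> str:
    """Inverse of fourcc_code: recover the 4-character name."""
    return "".join(chr((code >> shift) & 0xFF) for shift in (0, 8, 16, 24))


DRM_FORMAT_XRGB8888 = fourcc_code("XR24")  # [31:0] x:R:G:B 8:8:8:8 little endian
DRM_FORMAT_ARGB8888 = fourcc_code("AR24")  # [31:0] A:R:G:B 8:8:8:8 little endian
DRM_FORMAT_NV12 = fourcc_code("NV12")      # 2 planes: Y, then interleaved CbCr (4:2:0)

for name, code in [("XRGB8888", DRM_FORMAT_XRGB8888), ("ARGB8888", DRM_FORMAT_ARGB8888), ("NV12", DRM_FORMAT_NV12)]:
    print(f"DRM_FORMAT_{name} = 0x{code:08x} ({fourcc_name(code)})")
```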

Unfortunately, your approach won’t work: PipeWire doesn’t support the DMA_DRM format yet (it treats it as INVALID), and everything ends up with:

[D][35438.606431] pw.context   | [       context.c:  752 pw_context_find_format()] 0x5585b5ce9ef0: enum output 0 with filter: 0x7ffed6ba67b0
[D][35438.606433] pw.context   | [       context.c:  753 pw_context_find_format()]  video/raw
[D][35438.606435] pw.context   | [       context.c:  753 pw_context_find_format()]          modifier : (Long) { 72057594037927935, 0 }
[D][35438.606436] pw.context   | [       context.c:  753 pw_context_find_format()]            format : (None) INVALID type 1
[D][35438.606437] pw.context   | [       context.c:  753 pw_context_find_format()]              size : (Rectangle) 1920x1080
[D][35438.606439] pw.context   | [       context.c:  753 pw_context_find_format()]         framerate : (Fraction) [ 0/1, 2147483647/1 ]

and then PipeWire simply crashes. I have no idea how much effort would be required to add DMA_DRM support to PipeWire.

Hi @rayrapetyan, how did you create that per-element CPU usage breakdown in your post? I’ve tried GstShark, but it only gives me the framerate per element or the overall CPU usage (per core).

And sorry for not DM’ing you, I don’t seem to have that privilege yet.

Thanks.


Update: Never mind. I found it, or at least one way: top -H
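A minimal sketch that turns top -H output into a per-thread breakdown like the tables at the top of the thread (each thread's share of the process's total CPU). It assumes procps-ng top in batch mode, e.g. top -H -b -n 2 -d 5 -p <pid>, whose COMMAND column shows thread names. GStreamer names its streaming threads after pads (ximagesrc0:src), and the kernel truncates thread names to 15 characters, which is why tokio-runtime-worker appears as tokio-runtime-w. The function name is hypothetical.

```python
from collections import defaultdict


def thread_shares(top_output: str) -> dict:
    """Aggregate %CPU per thread name from `top -H -b` output.

    Only the last frame is used, because top's first frame averages over a
    longer period than the following ones. Returns {thread name: share of the
    process total, in %}, ordered from largest to smallest.
    """
    usage, cpu_col, ncols = None, None, 0
    for line in top_output.splitlines():
        fields = line.split()
        if "%CPU" in fields and "COMMAND" in fields:
            # New frame header: remember where %CPU is and start counting again.
            cpu_col, ncols, usage = fields.index("%CPU"), len(fields), defaultdict(float)
            continue
        # Skip summary lines, blank lines, and anything that is not a PID row.
        if usage is None or len(fields) < ncols or not fields[0].isdigit():
            continue
        fields = line.split(None, ncols - 1)  # COMMAND is last and may contain spaces
        usage[fields[-1]] += float(fields[cpu_col].replace(",", "."))  # tolerate decimal commas
    if not usage:
        return {}
    total = sum(usage.values())
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    return {name: (100.0 * pct / total if total else 0.0) for name, pct in ranked}
```

Threads that share a name (for example several tokio workers) are summed into one row, matching how the tables above list them.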

vapostproc is the element that will do the conversion in hardware.
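A minimal sketch of what that could look like for the pipewiresrc pipeline earlier in the thread. It assumes pipewiresrc negotiates plain system-memory BGRx (since PipeWire does not handle DMA_DRM caps yet) and that vapostproc uploads the frames and converts them to NV12 in VA memory, which vah264enc accepts. The caps strings and the function name are assumptions, not a tested pipeline; fd and node_id come from the screencast portal session, as in the original snippet.

```python
def va_screencast_pipeline(fd: int, node_id: int, signaller_uri: str) -> str:
    """Hypothetical helper: pipewiresrc -> GPU color conversion -> VA H.264 encoder -> webrtcsink."""
    video = " ! ".join([
        f"pipewiresrc fd={fd} path={node_id}",
        # Assumption: ask PipeWire for plain system-memory BGRx frames.
        "video/x-raw,format=BGRx",
        # vapostproc uploads each frame and converts BGRx -> NV12 on the GPU.
        "vapostproc",
        "video/x-raw(memory:VAMemory),format=NV12",
        "vah264enc",
        "queue",
        "wrs.",
    ])
    audio = "pulsesrc ! queue ! wrs."
    sink = f'webrtcsink name=wrs signaller::uri="{signaller_uri}"'
    return f"{video} {audio} {sink}"
```

The resulting string can be passed to Gst.parse_launch() from PyGObject, or quoted and given to gst-launch-1.0.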

P.S. It would be amazing if you did a how-to post about the graph you posted here.

@chfritz, I think it was “perf” or something similar… Silly me didn’t keep any notes in the post (I really dislike it when others do this). The chart was definitely created in Excel.
