Background:
We have developed a GstBaseTransform-based element called qtimlmetaextractor
that extracts ML metadata from video buffers and converts it to text output.
Element behavior:
- Sink pad: video/x-raw (contains video frame + GstMeta with ML metadata)
- Source pad: text/x-raw (serialized ML metadata)
- The element only reads GstMeta from input buffers; it does NOT process
or modify the video frame content itself
- Currently does not implement propose_allocation, so GST_VIDEO_META_API_TYPE
is not advertised to upstream
Use-case pipeline:
gst-launch-1.0 -ev \
qtimlvconverter name=preproc \
qtimltflite name=inference delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp,htp_performance_mode=(string)2;" model=/opt/yolov8_det_w8a8.tflite \
qtimlpostprocess name=postproc settings="{\"confidence\": 75.0}" results=10 module=yolov8 labels=/opt/yolov8.json \
filesrc location=/opt/video.mp4 ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! video/x-raw,format=NV12 ! queue ! tee name=split \
split. ! queue ! metamux. \
split. ! queue ! preproc. preproc. ! queue ! inference. inference. ! queue ! postproc. postproc. ! text/x-raw ! queue ! metamux. \
qtimetamux name=metamux ! qtimlmetaextractor ! fakesink
Problem:
In our hardware-accelerated pipeline, upstream elements (e.g. v4l2h264dec) send an
allocation query that reaches qtimlmetaextractor’s sink pad. Since we don’t respond
with GST_VIDEO_META_API_TYPE support, upstream assumes we cannot handle buffers
described by GstVideoMeta (i.e. its DMA-buf backed buffers with their native
strides/offsets) and falls back to software buffer allocation.
This causes issues because the hardware elements in the other tee branch
(e.g. qtimlvconverter) expect FD-backed DMA-buf memory but receive software
buffers instead, breaking the zero-copy hardware pipeline.
Question:
Is it semantically correct for an element to advertise GST_VIDEO_META_API_TYPE
support in propose_allocation even though:
- It transforms video/x-raw to text/x-raw (different media types)
- It never actually processes the video frame data itself
- It only needs to read metadata attached to the video buffer
Or should this be handled differently in the pipeline design?
Thanks for any guidance!