Differentiating H.264 from H.265 in RTP payload

Hi hackers,

We have a fairly distributed system of k8s pods that talk RTP with each other in a wild and messy way. The video payload can be H.264 or H.265, and it is becoming a burden to get the metadata sent to each node so that it knows what to expect.

Because of that, I have started developing a small element, rtpvideocodecdetect, that has roughly the following pad templates:

Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      application/x-rtp
                  media: video
  
  SRC template: 'src'
    Availability: Sometimes
    Capabilities:
      application/x-rtp
                  media: video
          encoding-name: < (string)H264, (string)H265 >
             clock-rate: 90000

It is fairly straightforward: it receives RTP, looks at the buffers to determine which codec they carry, and then sets the correct caps on the src pad. Downstream, we can connect to the pad-added signal and construct the correct depayloader and processing chain based on the caps of the pad.
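To make the downstream side concrete, here is a minimal sketch of the decision a pad-added handler would make once it has read encoding-name from the new pad's caps. The function name `depay_chain_for` is hypothetical; rtph264depay, rtph265depay, h264parse and h265parse are the stock GStreamer elements. The actual handler wiring is left out on purpose.

```rust
/// Hypothetical helper: map the `encoding-name` field of the src pad caps
/// to the (depayloader, parser) element names to instantiate downstream.
/// Returns None for anything the detector is not expected to produce.
fn depay_chain_for(encoding_name: &str) -> Option<(&'static str, &'static str)> {
    if encoding_name.eq_ignore_ascii_case("H264") {
        Some(("rtph264depay", "h264parse"))
    } else if encoding_name.eq_ignore_ascii_case("H265") {
        Some(("rtph265depay", "h265parse"))
    } else {
        None
    }
}
```

In a real pipeline the two returned names would be passed to the element factory and linked after the newly added pad.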

Now, for my question. It was a bit more finicky than I expected to figure out if it is H.264 or H.265 in the RTP payload. I was hoping to find some crate or library to use but have not found one yet.

I currently open-code a bunch of NAL checks, and that is surprisingly tricky. A lot of H.264 also parses as valid H.265 … for instance, I tried the cros-codecs crate, but it told me some payload headers were parsable as both H.264 and H.265.
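For anyone following along, this is roughly the kind of check I mean. It is only a sketch, not what my element actually does: it looks at the first one or two payload bytes, tests them against the NAL header layouts of RFC 6184 (H.264) and RFC 7798 (H.265), and narrows the guess across packets. It assumes single-layer H.265 (LayerId 0), and all names (`CodecGuess`, `Detector`, ...) are made up for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodecGuess { H264, H265, Ambiguous, Neither }

/// H.264 header is one byte: F(1) | NRI(2) | Type(5).
fn plausible_h264(payload: &[u8]) -> bool {
    let b0 = match payload.first() {
        Some(b) => *b,
        None => return false,
    };
    let forbidden = b0 >> 7;
    let nri = (b0 >> 5) & 0x03;
    let nal_type = b0 & 0x1f;
    if forbidden != 0 {
        return false;
    }
    match nal_type {
        0 | 30 | 31 => false,   // reserved in RFC 6184
        5 | 7 | 8 => nri != 0,  // IDR, SPS, PPS must have non-zero NRI
        6 | 9..=12 => nri == 0, // SEI, AUD, end-of-seq/stream, filler: NRI 0
        _ => true,
    }
}

/// H.265 header is two bytes: F(1) | Type(6) | LayerId(6) | TID(3).
fn plausible_h265(payload: &[u8]) -> bool {
    if payload.len() < 2 {
        return false;
    }
    let (b0, b1) = (payload[0], payload[1]);
    let forbidden = b0 >> 7;
    let nal_type = (b0 >> 1) & 0x3f;
    let layer_id = ((b0 & 0x01) << 5) | (b1 >> 3);
    let tid_plus1 = b1 & 0x07;
    // Types 51..=63 are not used by RFC 7798; TID+1 must never be zero.
    // Assumption: single-layer streams only, so LayerId must be 0.
    forbidden == 0 && nal_type <= 50 && layer_id == 0 && tid_plus1 != 0
}

/// Keeps both hypotheses alive until a packet rules one of them out.
struct Detector { h264_ok: bool, h265_ok: bool }

impl Detector {
    fn new() -> Self {
        Detector { h264_ok: true, h265_ok: true }
    }

    fn feed(&mut self, payload: &[u8]) -> CodecGuess {
        self.h264_ok &= plausible_h264(payload);
        self.h265_ok &= plausible_h265(payload);
        match (self.h264_ok, self.h265_ok) {
            (true, false) => CodecGuess::H264,
            (false, true) => CodecGuess::H265,
            (true, true) => CodecGuess::Ambiguous,
            (false, false) => CodecGuess::Neither,
        }
    }
}
```

A payload starting with 0x02 0x01 passes both checks (H.265 TRAIL_R, or an H.264 data partition with NRI 0), which is exactly the ambiguity I keep running into; a later parameter-set packet such as an H.265 VPS (0x40 0x01) or an H.264 SPS (0x67) usually settles it.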

Is there anything in GStreamer I could lean on to help me out? Does anyone know of a crate or library that can detect the codec? Or can anyone suggest a much simpler way of achieving what I want?

Thanks for your time

Do you have control over the sender? If so, perhaps you could abuse some other parameter to implicitly signal the codec type, such as using an even pt number for H.264 and an odd one for H.265 or somesuch, or the same for port numbers. Or you could add a one-byte header extension to signal it.
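As a rough sketch of the pt idea: the payload type sits in the low 7 bits of the second byte of the RTP fixed header, so the receiver only needs a couple of bit operations. The even/odd mapping is purely a private convention between your sender and receiver, and the names `Codec` and `codec_from_pt` are made up for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec { H264, H265 }

/// Assumed private convention: even payload type = H.264, odd = H.265.
/// Returns None if the buffer is not a plausible RTP version 2 packet.
fn codec_from_pt(packet: &[u8]) -> Option<Codec> {
    // The fixed RTP header is 12 bytes; the top two bits hold the version.
    if packet.len() < 12 || packet[0] >> 6 != 2 {
        return None;
    }
    // Byte 1 is M(1) | PT(7); mask off the marker bit.
    let pt = packet[1] & 0x7f;
    if pt % 2 == 0 {
        Some(Codec::H264)
    } else {
        Some(Codec::H265)
    }
}
```

With dynamic payload types you would pick, say, 96 for H.264 and 97 for H.265 on the sender.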

(I know that doesn’t directly answer your actual question, just throwing some ideas around in case you hadn’t considered them yet.)

Hi, thx!

Yes, that is an option worth considering. I actually had to stop and think about whether we control the sender in all cases :)

If that is the case, adding a custom one-byte RTP header extension might be the best way forward…
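For reference, reading such an extension on the receiving side could look something like the sketch below. It follows the one-byte header extension layout from RFC 8285 (0xBEDE profile, 4-bit ID, 4-bit length minus one). The extension ID 5 and the values 1 = H.264, 2 = H.265 are pure assumptions for illustration; the real ID would be whatever we negotiate or hard-code on both ends.

```rust
/// Hypothetical extension ID and values, agreed between sender and receiver.
const CODEC_EXT_ID: u8 = 5;

/// Find the data of a one-byte header extension element (RFC 8285).
fn one_byte_extension(packet: &[u8], wanted_id: u8) -> Option<&[u8]> {
    if packet.len() < 12 || packet[0] >> 6 != 2 {
        return None; // not RTP version 2
    }
    if packet[0] & 0x10 == 0 {
        return None; // X bit clear: no header extension present
    }
    let csrc_count = (packet[0] & 0x0f) as usize;
    let ext_start = 12 + 4 * csrc_count;
    let hdr = packet.get(ext_start..ext_start + 4)?;
    if hdr[0] != 0xBE || hdr[1] != 0xDE {
        return None; // not the one-byte extension profile
    }
    // Extension length is given in 32-bit words, excluding the 4-byte header.
    let words = u16::from_be_bytes([hdr[2], hdr[3]]) as usize;
    let body = packet.get(ext_start + 4..ext_start + 4 + 4 * words)?;
    let mut i = 0;
    while i < body.len() {
        let id = body[i] >> 4;
        if id == 0 {
            i += 1; // ID 0 marks a padding byte
            continue;
        }
        if id == 15 {
            break; // reserved ID: stop parsing
        }
        let len = (body[i] & 0x0f) as usize + 1;
        let data = body.get(i + 1..i + 1 + len)?;
        if id == wanted_id {
            return Some(data);
        }
        i += 1 + len;
    }
    None
}

/// Assumed mapping: 1 = H.264, 2 = H.265; anything else is unknown.
fn codec_from_extension(packet: &[u8]) -> Option<&'static str> {
    match one_byte_extension(packet, CODEC_EXT_ID)?.first()? {
        1 => Some("H264"),
        2 => Some("H265"),
        _ => None,
    }
}
```

The returned string matches the encoding-name values in the src pad template, so it could be dropped straight into the caps.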

What I have now seems to work, but looking at the test cases in, for instance, rtph264, and at the complexity of differentiating NALs between H.264 and H.265, does not fill me with confidence in myself…

I suggest taking a look at detect_codec_type_and_bitrate() in the mediaMin source code, which auto-detects H.264 and H.265 RTP and also auto-detects a wide variety of audio / voice codecs. The company I work for has this code in several deployments, and it has been reliable so far. If you have pcaps that fail, you can anonymize them and send them to our developers; they will get it working.