Daniel Morin
February 13, 2024
GStreamer has long been the best framework to build pipelines to handle video streams, and in particular, live ones. It's no coincidence that it has been adopted widely by engineers wishing to build video analytics pipelines.
Within computers, we represent media data as a series of discrete samples over time and, in the case of images, over space. We generally don't care about the meaning of those samples, as the goal is to display them back to humans. This data is unstructured. Sometimes, we instead want to structure the content of this data to extract a meaning. For example, instead of just reddish pixels, we want to know that it's a strawberry. There are a number of different types of algorithms to do this, from traditional computer vision to the latest trends in deep learning. But they all have in common that they produce some structured data describing the content of the input.
A typical example of object detection and classification using strawberries and leaves. More examples available here: https://col.la/gstanalyticsexamplesmodels.
GStreamer is a natural choice to handle this kind of metadata describing the underlying media data. It has a flexible system to attach arbitrary bits of data to a media buffer. Many companies have built their machine learning analysis framework around GStreamer, but no one had made the effort to contribute upstream, until now.
Our goal was to create an analytics framework for GStreamer that decouples analysis steps from each other, leverages platform-specific acceleration where available, defines generic elements that function across platforms, and scales to large amounts of data and detections.
GStreamer has a feature called GstMeta, which is a way to attach an arbitrary structure to a buffer (such as a video frame). In particular, there is also a region of interest meta that allows defining a rectangle in the image and attaching some data to it. Our first idea was to extend this, but we realized that it couldn't scale. For example, in a wide shot of a crowd, you could detect hundreds of people. The other thing we wanted was to make it easier to do the analysis in multiple steps, for example by having one step that detects objects, then further steps that find more information about specific objects.
We defined a new GstAnalyticsRelationMeta that stores an array of metadata structures along with a graph of relations between them. This enables us to have an object at a specific location, then define a class of objects and have a "this object belongs to this class" type of relationship. For example, we can have a "car" class and a "tire" class, so we can relate object 1 to the "car" class and object 2 to the "tire" class. Furthermore, we can include a relationship between the objects themselves, such as object 2 being part of object 1: the tire is part of the car.
In this example, there are two types of metadata, classification and object detection. The classification further describes the objects.
We've also defined some base classes of metadata: objects, classification and tracking. But more classes can be defined in the future, and plugins can even define their own.
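To make the idea of "an array of metadata plus a graph of relations" concrete, here is a minimal Python sketch of the data model only. It does not use GStreamer or its API: `RelationMeta`, `ObjectDetection`, `Classification`, `Relation` and `label_of` are hypothetical names invented for illustration, and the bounding boxes and confidence values are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Relation(Enum):
    """Illustrative relation kinds (hypothetical names, not GStreamer's)."""
    RELATES_TO = "relates-to"
    IS_PART_OF = "is-part-of"
    CONTAINS = "contains"


@dataclass
class ObjectDetection:
    """A detected object: a rectangle in the frame plus a confidence."""
    x: int
    y: int
    w: int
    h: int
    confidence: float


@dataclass
class Classification:
    """A class label with a confidence."""
    label: str
    confidence: float


@dataclass
class RelationMeta:
    """One per buffer: a flat array of metadata and a graph of relations."""
    entries: List[object] = field(default_factory=list)
    edges: List[Tuple[int, Relation, int]] = field(default_factory=list)

    def add(self, mtd: object) -> int:
        """Append a metadata entry and return its id (its array index)."""
        self.entries.append(mtd)
        return len(self.entries) - 1

    def relate(self, src: int, kind: Relation, dst: int) -> None:
        """Record a directed relation between two existing entries."""
        for idx in (src, dst):
            if not 0 <= idx < len(self.entries):
                raise IndexError(f"unknown metadata id {idx}")
        self.edges.append((src, kind, dst))

    def related(self, src: int, kind: Relation) -> List[int]:
        """Ids of all entries reachable from src through one edge of kind."""
        return [d for s, k, d in self.edges if s == src and k == kind]


def label_of(meta: RelationMeta, obj_id: int) -> Optional[str]:
    """Label of the first classification related to an object, if any."""
    for dst in meta.related(obj_id, Relation.RELATES_TO):
        entry = meta.entries[dst]
        if isinstance(entry, Classification):
            return entry.label
    return None


# Step 1 (detection): two objects found in the frame.
meta = RelationMeta()
car = meta.add(ObjectDetection(40, 60, 320, 180, 0.92))
tire = meta.add(ObjectDetection(70, 190, 60, 60, 0.81))

# Step 2 (classification): attach a class to each object.
car_cls = meta.add(Classification("car", 0.92))
tire_cls = meta.add(Classification("tire", 0.81))
meta.relate(car, Relation.RELATES_TO, car_cls)
meta.relate(tire, Relation.RELATES_TO, tire_cls)

# Relation between the objects themselves: the tire is part of the car.
meta.relate(tire, Relation.IS_PART_OF, car)
meta.relate(car, Relation.CONTAINS, tire)

for parent in meta.related(tire, Relation.IS_PART_OF):
    print(f"{label_of(meta, tire)} is part of {label_of(meta, parent)}")
```

Because relations are edges between small entries rather than data nested inside each region of interest, a later analysis step can add information about an existing object without rewriting what an earlier step produced.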
We hope that this will be a first step to foster more collaboration between everyone using GStreamer as a common language for video analysis. Please don't hesitate to contact us if you want to discuss your GStreamer projects, or want help building media analytics into your products.