Google’s Computer Vision Model can Track Objects in Videos


Google developers have trained computer vision models to differentiate between multiple objects in a video

It can even tell objects apart in grayscale footage

It uses self-supervised learning, which differs from traditional supervised machine learning




You might be wondering what self-supervised means. Well, it’s not that different from supervised machine learning, in which the program is fed tons of labelled data and learns from that data to come up with a desirable solution. The difference is that self-supervised learning draws its training signal from the data itself, so it might need only a fraction of that hand-labelled data, or none at all! That’s the reason researchers feel it has immense potential.

Self-supervised learning has gained a lot of traction of late, especially in computer vision, a field that typically needs large amounts of labelled data to get good results. And now Google’s AI team has developed a model that can track objects in a video without requiring any labelled data at all.

How does this work?

The team designed a convolutional neural network that adds colour to grayscale videos. While learning to do this, the network taught itself how to track objects in a video. The team notes that the model was never trained explicitly to track objects, yet it learnt to follow multiple objects and stay robust without any labelled training data. Now that sounds like some serious AI stuff. The team has described the work in more detail in a blog post.

The researchers trained the model on publicly available videos, such as those in the Kinetics dataset. All these videos are in colour, so every frame except the first in each video was converted to grayscale. The convolutional network was then trained to predict the colours in the remaining frames.
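To make the training setup concrete, here is a minimal sketch of how one clip could be split into a colour reference frame, grayscale inputs, and colour targets. It is an illustration only, not the team’s actual pipeline: the function name `make_training_example`, the array layout, and the choice of standard luma weights for the grayscale conversion are all assumptions.

```python
import numpy as np

# Assumption: standard ITU-R BT.601 luma weights for RGB -> grayscale.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def make_training_example(clip):
    """Split a colour clip into (reference, grayscale inputs, colour targets).

    clip: float array of shape (frames, height, width, 3), values in [0, 1].
    This helper is hypothetical and only mirrors the setup described above.
    """
    reference = clip[0]                    # the first frame keeps its colour
    targets = clip[1:]                     # original colours serve as free labels
    gray_inputs = targets @ LUMA_WEIGHTS   # shape (frames - 1, height, width)
    return reference, gray_inputs, targets

# Random data standing in for a real video clip of 4 frames, 8x8 pixels.
rng = np.random.default_rng(0)
clip = rng.random((4, 8, 8, 3))
reference, gray_inputs, targets = make_training_example(clip)
print(reference.shape, gray_inputs.shape, targets.shape)
```

Note that no human annotation appears anywhere: the colour targets come from the video itself, which is what makes the setup self-supervised.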

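Why the conversion to grayscale, you ask? With the colours removed, the only way for the model to recover them is to copy them from the coloured first frame. And because a video can contain multiple objects with the same colour, the model has to learn which region of the first frame each part of a later frame corresponds to before it can add the right colour. Following regions from frame to frame like this is, in effect, tracking.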
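The article does not spell out how colours get copied, so the following is only a toy sketch of one plausible mechanism: each pixel in a later frame "points" at similar pixels in the first frame through a softmax over embedding similarities, and copies their values. The embeddings here are hand-made rather than learned, and `copy_from_reference` is a hypothetical name. The point it illustrates is that the same pointers that copy colours can also carry object labels forward, which amounts to tracking.

```python
import numpy as np

def copy_from_reference(ref_embed, tgt_embed, ref_values, temperature=1.0):
    """Copy per-pixel values from a reference frame using soft pointers.

    ref_embed:  (n_ref, d) embeddings of reference-frame pixels
    tgt_embed:  (n_tgt, d) embeddings of target-frame pixels
    ref_values: (n_ref, c) values to copy (colours or object labels)
    """
    scores = tgt_embed @ ref_embed.T / temperature   # (n_tgt, n_ref) similarities
    scores -= scores.max(axis=1, keepdims=True)      # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over reference pixels
    return weights @ ref_values                      # weighted copy

# Reference frame: pixel 0 belongs to a red object, pixel 1 to a blue one.
ref_embed = np.array([[10.0, 0.0], [0.0, 10.0]])
ref_colours = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
ref_labels = np.array([[1.0, 0.0], [0.0, 1.0]])      # one-hot object ids

# Later frame: the two objects have swapped positions.
tgt_embed = np.array([[0.0, 10.0], [10.0, 0.0]])

pred_colours = copy_from_reference(ref_embed, tgt_embed, ref_colours)
pred_labels = copy_from_reference(ref_embed, tgt_embed, ref_labels)
print(pred_labels.argmax(axis=1))  # object id assigned to each target pixel
```

In a real model the embeddings would come from the trained network; only the colour-copying step would be supervised by the original video, and the label-copying step would be applied afterwards at test time.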

So let’s summarise:

This might not be a high-end model with a wide range of applications, but it is the start of something new: a machine that can visualise the world around it and see it as a human does. This could be a first step towards giving computer vision to AI systems that need to interact with their environment as a human would, such as self-driving cars.

We can only imagine what the future holds for us all!
