Variational autoencoder

Contributed by: Tejas

The variational autoencoder is one of the most talked-about technologies in machine learning. Why? What does it do?

Let’s start by understanding encoders and decoders in daily life. For example, the radio or television signals relayed from a station are encoded; once our device receives them, it decodes them and plays the radio or TV programme.

In computer technology, the most familiar example is compression with ZIP, RAR, or similar tools. Data compression is the encoding process, and data extraction is the decoding process.
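This encode/decode round trip can be shown with Python's standard zlib module (the sample text here is made up for the illustration):

```python
import zlib

# Encoding: compress the raw bytes into a smaller representation.
original = b"variational autoencoders " * 100
encoded = zlib.compress(original)

# Decoding: decompress back to the exact original data.
decoded = zlib.decompress(encoded)

print(len(original), len(encoded))  # the encoded form is much smaller
print(decoded == original)          # lossless: nothing was lost
```

Unlike zlib's lossless compression, the autoencoders below learn a *lossy* compression: the reconstruction is close to, but not identical to, the input.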

Next, we will see how this encoding and decoding process is automated by an autoencoder.

Variational autoencoders (VAEs) are among the most effective and useful generative models. Generative models produce new synthetic or artificial data from real data; for example, a new music composition from existing compositions.

Autoencoders are neural networks trained to learn the weights that produce the desired result. Neural networks play an important role in artificial intelligence because of their capacity for self-learning, and the encoder part learns to compress the data.


Here is a simple explanation of the autoencoder pipeline.

Suppose we input an image X. The encoder compresses the data, which is also called dimension reduction (you may be familiar with PCA, a common dimension-reduction technique). The encoder picks out the most informative features (colour, size, shade, shape, etc.) and stores a highly compressed representation in a space called the bottleneck or latent space; this is the encoding process. The decoder then takes the latent vector from the bottleneck and produces an output image X’.

With the loss L(X, X’), we train the model to minimise the reconstruction error. Because this process is learned automatically, it is known as an autoencoder. Here, the bottleneck holds discrete values, so the model tries to produce data that is close to the original. This is very useful for compressing and denoising data. Autoencoders help us store high-volume data compactly and also perform dimension reduction.
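The idea can be sketched with a tiny linear autoencoder in NumPy. This is a toy illustration, not a production model: the data, layer sizes, and learning rate are all made up for the example. The encoder maps 6-dimensional inputs to a 2-unit bottleneck, the decoder maps back, and plain gradient descent minimises the reconstruction loss L(X, X’):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points lying close to a 2-D subspace of R^6,
# so a 2-unit bottleneck can capture most of the structure.
Z_true = rng.normal(size=(200, 2))
X = Z_true @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(200, 6))

def reconstruction_loss(W_e, W_d):
    X_rec = (X @ W_e) @ W_d                    # encode, then decode
    return ((X - X_rec) ** 2).sum(axis=1).mean()  # L(X, X') per sample

# Encoder (6 -> 2) and decoder (2 -> 6) weight matrices.
W_e = 0.1 * rng.normal(size=(6, 2))
W_d = 0.1 * rng.normal(size=(2, 6))

loss_before = reconstruction_loss(W_e, W_d)
lr = 0.005
for _ in range(2000):
    Z = X @ W_e                 # bottleneck / latent codes
    err = (Z @ W_d) - X         # reconstruction error X' - X
    grad = 2 * err / X.shape[0]
    g_d = Z.T @ grad            # gradient w.r.t. decoder weights
    g_e = X.T @ (grad @ W_d.T)  # gradient w.r.t. encoder weights
    W_d -= lr * g_d
    W_e -= lr * g_e

print(loss_before, reconstruction_loss(W_e, W_d))  # loss shrinks with training
```

Real autoencoders use non-linear activations and deeper networks, but the structure is the same: compress, reconstruct, and minimise the reconstruction loss.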

If the autoencoder is capable of all this, why do we need the variational autoencoder (VAE)?

The basic difference between an autoencoder and a variational autoencoder is the latter’s ability to represent a continuous range of values in the latent space, which is what lets us generate new data or a new image.
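Why does continuity matter? In a continuous latent space, the points *between* two encodings are also valid codes, so a decoder can turn them into plausible in-between outputs. A minimal sketch, using two made-up latent codes:

```python
import numpy as np

# Two hypothetical latent codes, e.g. the encodings of two different faces.
z_a = np.array([0.0, 1.0])
z_b = np.array([2.0, -1.0])

# Because the latent space is continuous, linear interpolation between
# z_a and z_b yields valid codes a decoder could map to new outputs.
for t in np.linspace(0.0, 1.0, 5):
    z = (1 - t) * z_a + t * z_b  # interpolate in latent space
    print(t, z)
```

With a plain autoencoder's discrete bottleneck, these in-between points may decode to garbage; the VAE's continuous latent space is what makes such interpolation meaningful.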


Let us understand how we generate new data. Say we have the image of a celebrity face, from which our encoder model has to recognise important features such as skin tone, eye colour, and hair colour.

With every feature, we have a probability distribution. Our goal is to produce new data from the current data, a new face from the current face. How do faces differ? Skin tone, eye colour, hair colour, and many other features vary, but the overall list of features stays the same. Since each feature’s distribution is summarised by two numbers, a mean and a standard deviation, we can sample a new value in each feature’s range to provide to the decoder.
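Per-feature sampling can be sketched as follows; the feature names and their means and standard deviations are hypothetical stand-ins for what an encoder might learn:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical latent feature distributions: each feature is
# summarised by a (mean, standard deviation) pair.
features = {
    "skin_tone":   (0.2, 0.5),
    "eye_colour":  (-1.0, 0.3),
    "hair_colour": (0.7, 0.8),
}

# To generate a "new face", sample each feature from its own
# distribution instead of reusing one fixed value per feature.
new_face = {name: rng.normal(mu, sigma) for name, (mu, sigma) in features.items()}
print(new_face)
```

Each call produces a slightly different combination of feature values, which is exactly what lets the decoder produce a new, plausible face rather than a copy.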

Let’s see how this is actually processed in the neural network.


Since we assume the input data follows a normal distribution, the encoder outputs two vectors in the latent space: a mean and a variance. We build a multivariate Gaussian model under the assumption that the latent dimensions are uncorrelated, which reduces the covariance to a simple vector of variances.

Now we draw random samples from the latent space, using the mean and variance, and provide them to the decoder to reproduce the data (image).

Still, we will not get the desired result unless we train this model to improve with new samples every time.

Since this is not a one-time activity, we need to train the model, and backpropagation is the standard way to do so. However, random sampling is not differentiable, so we cannot backpropagate through it directly; instead, we use the reparameterization trick.

We randomly sample ε from a unit Gaussian, then shift it by the mean μ and scale it by the standard deviation σ, giving z = μ + σ · ε.
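A minimal sketch of the trick in NumPy, with made-up values for μ and σ; the point is that all the randomness is isolated in ε, so gradients can flow through μ and σ:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0])    # mean produced by the encoder
sigma = np.array([0.1, 0.4])  # standard deviation produced by the encoder

# Reparameterization trick: sample eps from a unit Gaussian, then
# shift by mu and scale by sigma. mu and sigma stay deterministic
# inputs, so backpropagation can compute gradients through them.
eps = rng.standard_normal(size=(10000, 2))
z = mu + sigma * eps

print(z.mean(axis=0), z.std(axis=0))  # close to mu and sigma
```

The samples z are distributed as N(μ, σ²), exactly as if we had sampled from the latent distribution directly, but now the sampling node is outside the computation graph that needs gradients.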


Now we can backpropagate, and the autoencoder can learn to improve with every sample.

Now the most important part of the process is to define the loss function that trains the model by being minimised. In the case of VAEs, the loss function consists of two terms.

Let’s call the encoding process the recognition model. Its loss is the reconstruction error, measured as the sum of squared differences between the input and its reconstruction:

L(x, x’) = ∑(x − x’)²

Let’s call the decoding process the generation model. Its error is the difference between the learned latent distribution q(z|x) and the prior p(z), which can be measured with the KL divergence:

KL(q(z|x) || p(z))

The loss function of the VAE is the sum of the two:

L(x,x’) + ∑KL(q(z|x)||p(z))
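Putting the two terms together, here is a minimal NumPy sketch of the VAE loss. It assumes a diagonal Gaussian q(z|x) parameterised by a mean and a log-variance, and a standard-normal prior p(z) = N(0, I), for which the KL term has the well-known closed form −0.5 ∑(1 + log σ² − μ² − σ²); the numbers are made up for the example:

```python
import numpy as np

def vae_loss(x, x_rec, mu, log_var):
    """Reconstruction term plus KL(q(z|x) || p(z)) for a diagonal
    Gaussian q(z|x) and a standard-normal prior p(z)."""
    recon = np.sum((x - x_rec) ** 2)  # L(x, x'): squared reconstruction error
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return recon + kl

x = np.array([1.0, 0.0, 2.0])
x_rec = np.array([0.9, 0.1, 1.8])
mu = np.array([0.0, 0.0])
log_var = np.array([0.0, 0.0])  # sigma = 1, so q already matches the prior

# With mu = 0 and sigma = 1 the KL term vanishes; only the
# reconstruction error remains.
print(vae_loss(x, x_rec, mu, log_var))
```

The reconstruction term pulls outputs towards the inputs, while the KL term keeps the latent distribution close to the prior, which is what keeps the latent space continuous and useful for sampling.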

This gives us a conceptual understanding of VAEs. The technique is widely used to generate new data for driverless vehicles, data transfer, and synthetic music and images. Let’s look at one classic example: fake face production.

In the classic face-mixing example, two input faces, Source A and Source B, are combined to produce a result that blends the features of A and B.

A few more use cases:

  1. Scanning and analysing medical reports (X-ray, MRI, etc.)
  2. Producing future visuals for self-driving cars
  3. Composing new music
  4. Generating game content
  5. Making movies without real actors

Many of these applications produce new data, which can then be put to use in many directions.

Some useful information 

Autoencoder: a self-trained process to compress and decompress data. It is used to compress data and to denoise it.

Latent space: the space of latent variables, which are important but cannot be measured directly. For example, if someone has a high IQ, a good education, and strong maths skills, those are visible variables; combining them, we may speak of an intelligence level, where intelligence is the latent variable.

Bottleneck: Encoded input data gets stored in a Bottleneck, which is a central part of the autoencoder process.

This brings us to the end of the blog on variational autoencoders. We hope you found this helpful. Enrol with Great Learning Academy’s free courses to learn more such concepts.


