- Neural Network
- What is backpropagation?
- How does backpropagation work?
- Loss Function
- Why do we need backpropagation?
- Feed Forward Network
- Types of Backpropagation
- Case Study

In typical programming, we input data, apply processing logic, and receive an output. What if the output could, in some way, influence the processing logic? That is what the backpropagation algorithm is about: the error in the output is fed back to the earlier layers to improve accuracy and efficiency.

Let us delve deeper.

**Neural network**

A neural network is a collection of connected units, where each connection has an associated weight. Such systems are used to build predictive models from large data sets. The design is loosely inspired by the human nervous system and is used for tasks such as understanding images, learning like a human, and synthesizing speech, among many others.


**What is backpropagation?**

Backpropagation is an algorithm that trains a feed-forward neural network on input patterns whose classifications are known. Each time an entry of the sample set is presented to the network, the network compares its output response with the expected output for that input pattern, and the error value is measured. The connection weights are then adjusted based on this error value.

Before we dive deeper into backpropagation, it helps to know who introduced the concept and when. It was first introduced in the 1960s and popularized roughly two decades later by David Rumelhart, Geoffrey Hinton, and Ronald Williams in their famous 1986 paper. In that paper, they described several neural networks trained with backpropagation. Today, backpropagation is the standard way to train neural networks.

With this approach, we fine-tune the weights of a neural net based on the error rate obtained in the previous run. Applied properly, it reduces error rates and makes the model more reliable. Backpropagation computes the required weight adjustments using the chain rule. In simple terms, after each forward pass through the network, the algorithm performs a backward pass that adjusts the model’s parameters (weights and biases). A typical supervised learning algorithm attempts to find a function that maps input data to the right output. Backpropagation works with multi-layered neural networks and learns internal representations of the input-to-output mapping.

**How does backpropagation work?**

Let us take a look at how backpropagation works. The example network has four layers: an input layer, two hidden layers (hidden layer I and hidden layer II), and a final output layer.

So, the three main types of layer are:

- Input layer
- Hidden layer
- Output layer

Each layer plays its own part in producing the desired result. Let us go through the remaining details needed to summarize the algorithm.

The functioning of the backpropagation approach can be summarized as follows.

- Input layer receives x
- Input is modeled using weights w
- Each hidden layer calculates the output and data is ready at the output layer
- Difference between actual output and desired output is known as the error
- Go back to the hidden layers and adjust the weights so that this error is reduced in future runs

This process is repeated till we get the desired output. The training phase is done with supervision. Once the model is stable, it is used in production.
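The steps above can be sketched for the smallest possible case: one input, one weight, and one sigmoid unit. This is only an illustration under stated assumptions. The function name `train_single_neuron`, the parameters `lr` and `steps`, and the values used are hypothetical and are not part of the case study later in this article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_neuron(x, target, w=0.0, lr=0.5, steps=200):
    """Repeat forward pass, error measurement, and weight update."""
    history = []
    for _ in range(steps):
        out = sigmoid(w * x)                 # forward pass through one weight
        error = target - out                 # desired output minus actual output
        grad = error * out * (1 - out) * x   # chain rule: error * sigmoid slope * input
        w += lr * grad                       # adjust the weight to reduce the error
        history.append(abs(error))
    return w, history

w, history = train_single_neuron(x=1.0, target=0.9)
print("first error:", history[0], "last error:", history[-1])
```

The absolute error recorded in `history` shrinks as the loop repeats, which is the behaviour the list above describes.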

**Loss function**

A loss function maps one or more variables to a real number that represents some cost associated with those values. For backpropagation, the loss function measures the difference between the network output and its expected output.
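As a minimal sketch, here is one common loss function, the mean squared error, together with its gradient. The choice of mean squared error and the names `mse_loss` and `mse_gradient` are assumptions made for illustration; the case study below reports a mean absolute error instead.

```python
import numpy as np

def mse_loss(predicted, expected):
    """Mean squared error between network output and expected output."""
    predicted = np.asarray(predicted, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.mean((predicted - expected) ** 2)

def mse_gradient(predicted, expected):
    """Derivative of mse_loss with respect to each predicted value."""
    predicted = np.asarray(predicted, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return 2.0 * (predicted - expected) / predicted.size

loss = mse_loss([0.8, 0.1, 0.1], [1.0, 0.0, 0.0])
grad = mse_gradient([0.8, 0.1, 0.1], [1.0, 0.0, 0.0])
print(loss, grad)
```

The gradient is what the backward pass would start from: it says in which direction each output should move to lower the loss.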

**Why do we need backpropagation?**

Backpropagation has many advantages. Some of the important ones are listed below:

- Backpropagation is fast, simple, and easy to implement
- It has few parameters to tune, apart from settings such as the learning rate and the network size
- No prior knowledge about the network is needed, which makes it a flexible method
- This approach works very well in most cases
- The features of the function to be learned do not need to be specified in advance

**Feed forward network**

Feedforward networks are also called multi-layered networks (MLN). They are known as feed-forward because the data only travels forward through the network: from the input nodes, through the hidden layers, and finally to the output nodes. It is the simplest type of artificial neural network.
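A forward pass can be sketched in a few lines. The function name `feed_forward`, the layer sizes, and the random weights below are illustrative assumptions; the sketch only shows that data moves in one direction, layer by layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights):
    """Pass x through each weight matrix in order, applying a sigmoid after each."""
    activation = x
    for w in weights:
        activation = sigmoid(activation @ w)
    return activation

rng = np.random.default_rng(0)
# Hypothetical network: 4 inputs -> 5 hidden units -> 3 outputs
weights = [rng.uniform(-1, 1, size=(4, 5)), rng.uniform(-1, 1, size=(5, 3))]
x = rng.uniform(0, 1, size=(2, 4))   # two samples with four features each
output = feed_forward(x, weights)
print(output.shape)  # (2, 3)
```

Each row of `output` holds one value per output node, all between 0 and 1 because of the sigmoid.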

**Types of backpropagation**

There are two types of backpropagation networks.

- Static backpropagation
- Recurrent backpropagation

- Static backpropagation

In this network, mapping of a static input generates static output. Static classification problems like optical character recognition will be a suitable domain for static backpropagation.

- Recurrent backpropagation

In recurrent backpropagation, activations are fed forward until a certain threshold (a fixed value) is reached. After that, the error is calculated and propagated backward.

The key difference between the two approaches is that the mapping is immediate in static backpropagation, whereas in recurrent backpropagation it is not static.

**Case Study**

Let us perform a case study using backpropagation. For that, we will be using Iris data which contains features such as length and width of sepals and petals. With the help of those, we need to identify the species of a plant.

For this, we will build a multilayered neural network and will use the sigmoid function as it is a classification problem.

Let us import the required libraries.

```
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
```

To ignore warnings, we will import another library called warnings.

```
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
```

Let us now read the data

```
iris = pd.read_csv("iris.csv")
iris.head()
```

Now we will encode the class labels as 0, 1, and 2.

```
# Assign the result back instead of using inplace=True on a selected column
iris['Species'] = iris['Species'].replace(['setosa', 'virginica', 'versicolor'], [0, 1, 2])
```

We will now define functions which will do the following.

- Perform one hot encoding to the output.
- Perform sigmoid function
- Normalize the features.

For one hot encoding, we define the following function.

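```
def to_one_hot(Y):
    # One column per class; class ids are assumed to be integers starting at 0
    n_col = np.amax(Y) + 1
    binarized = np.zeros((len(Y), n_col))
    for i in range(len(Y)):
        binarized[i, Y[i]] = 1.
    return binarized
```

Let us now define a sigmoid function and its derivative.

```
def sigmoid_func(x):
    return 1/(1+np.exp(-x))

def sigmoid_derivative(output):
    # `output` is expected to be a value already produced by sigmoid_func,
    # which is how the training loop calls this function
    return output * (1 - output)
```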
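As a quick sanity check, the identity behind the sigmoid derivative, s'(x) = s(x)(1 - s(x)), can be compared against a finite-difference estimate. The helper names `sigmoid` and `sigmoid_slope` below are local to this sketch and are assumptions, not the functions used in the case study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_slope(x):
    # Analytic derivative with respect to the raw input x
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-4, 4, 9)
eps = 1e-6
# Central finite difference as an independent estimate of the slope
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
max_gap = np.max(np.abs(numeric - sigmoid_slope(x)))
print(max_gap)
```

The gap should be tiny, which confirms that the slope can be computed from the sigmoid value alone.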

Now we will define a function for normalization

```
def normalize(X, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(X, order, axis))
    l2[l2 == 0] = 1
    return X / np.expand_dims(l2, axis)
```
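To see what row-wise L2 normalization does, here is a small sketch on a toy array. The helper `l2_normalize_rows` is a simplified stand-in written for this illustration, and the input values are made up.

```python
import numpy as np

def l2_normalize_rows(X):
    """Scale each row to unit L2 length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, ord=2, axis=1, keepdims=True)
    norms[norms == 0] = 1   # avoid dividing by zero for all-zero rows
    return X / norms

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
Xn = l2_normalize_rows(X)
print(Xn)
```

The first row, with length 5, becomes `[0.6, 0.8]`, so every non-zero sample ends up on the same scale.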

Now we will apply normalization to the features and one hot encoding to the output

```
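columns = ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']
x = pd.DataFrame(iris, columns=columns)
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
x = normalize(x.to_numpy())
columns = ['Species']
y = pd.DataFrame(iris, columns=columns)
y = y.to_numpy()
y = y.flatten().astype(int)  # integer class ids are needed for indexing in to_one_hot
y = to_one_hot(y)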
```

Now it is time to apply backpropagation. First we split the data into training and test sets, and then we define the weights and a learning rate.

```
# Split data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)
# Weights, initialized uniformly in [-1, 1)
w0 = 2*np.random.random((4, 5)) - 1  # input to hidden layer: 4 inputs, 5 outputs
w1 = 2*np.random.random((5, 3)) - 1  # hidden to output layer: 5 inputs, 3 outputs
# Learning rate
n = 0.1
```

We will keep a list of errors so that we can later visualize how the error decreases during training.

```
errors = []
```

Let us now run the feed-forward pass and backpropagation. For the weight updates, we will use gradient descent.

```
for i in range(100000):
    # Feed forward network
    layer0 = X_train
    layer1 = sigmoid_func(np.dot(layer0, w0))
    layer2 = sigmoid_func(np.dot(layer1, w1))

    # Back propagation using gradient descent
    layer2_error = y_train - layer2
    layer2_delta = layer2_error * sigmoid_derivative(layer2)
    layer1_error = layer2_delta.dot(w1.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # Update the weights in proportion to the learning rate n
    w1 += layer1.T.dot(layer2_delta) * n
    w0 += layer0.T.dot(layer1_delta) * n

    # Track the mean absolute error for this iteration
    error = np.mean(np.abs(layer2_error))
    errors.append(error)
```
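One way to gain confidence in a backward pass like the one above is a gradient check: compare the chain-rule gradients with finite-difference estimates. The sketch below is self-contained and uses random toy data. It assumes the quantity being minimized is half the sum of squared errors and that the sigmoid slope is computed from the activation as `a * (1 - a)`; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y, w0, w1):
    """Half the sum of squared errors between targets and network output."""
    out = sigmoid(sigmoid(x @ w0) @ w1)
    return 0.5 * np.sum((y - out) ** 2)

def backprop_gradients(x, y, w0, w1):
    """Gradients of loss with respect to w0 and w1 via the chain rule."""
    layer1 = sigmoid(x @ w0)
    layer2 = sigmoid(layer1 @ w1)
    layer2_delta = (y - layer2) * layer2 * (1 - layer2)
    layer1_delta = (layer2_delta @ w1.T) * layer1 * (1 - layer1)
    # The deltas are built from (y - output), so the loss gradient is their negative
    return -(x.T @ layer1_delta), -(layer1.T @ layer2_delta)

def numeric_gradient(f, w, eps=1e-6):
    """Central-difference gradient of f() with respect to array w (perturbed in place)."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        plus = f()
        w[idx] = original - eps
        minus = f()
        w[idx] = original
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(42)
x = rng.uniform(0, 1, size=(6, 4))
y = np.eye(3)[rng.integers(0, 3, size=6)]   # random one-hot targets
w0 = rng.uniform(-1, 1, size=(4, 5))
w1 = rng.uniform(-1, 1, size=(5, 3))

g0, g1 = backprop_gradients(x, y, w0, w1)
n0 = numeric_gradient(lambda: loss(x, y, w0, w1), w0)
n1 = numeric_gradient(lambda: loss(x, y, w0, w1), w1)
print(np.allclose(g0, n0, atol=1e-6), np.allclose(g1, n1, atol=1e-6))
```

Adding the deltas to the weights, as the training loop does, is therefore the same as stepping against the gradient of this loss.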

We estimate the training accuracy by subtracting the final mean absolute error from 1. Note that this is an error-based proxy rather than the fraction of correctly classified samples.

```
accuracy_training = (1 - error) * 100
```

Now let us plot how the error decreases over the training iterations.

```
plt.plot(errors)
plt.xlabel('Training')
plt.ylabel('Error')
plt.show()
```

Let us look at the accuracy now

```
print("Training Accuracy of the model " + str(round(accuracy_training, 2)) + "%")
# Output: Training Accuracy of the model 99.04%
```

Our training model is performing really well. Now let us see the validation accuracy.

```
# Validate on the held-out data using the trained weights
layer0 = X_test
layer1 = sigmoid_func(np.dot(layer0, w0))
layer2 = sigmoid_func(np.dot(layer1, w1))
layer2_error = y_test - layer2
error = np.mean(np.abs(layer2_error))
accuracy_validation = (1 - error) * 100
print("Validation Accuracy of the model " + str(round(accuracy_validation, 2)) + "%")
# Output: Validation Accuracy of the model 92.86%
```
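The figures above are derived from the mean absolute error. If the share of correctly classified samples is wanted instead, one possible sketch is to compare the position of the largest output with the one-hot label. The function name `classification_accuracy` and the toy arrays are assumptions made for this example.

```python
import numpy as np

def classification_accuracy(outputs, one_hot_labels):
    """Fraction of rows whose largest output matches the one-hot label."""
    predicted = np.argmax(outputs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return np.mean(predicted == actual)

# Toy network outputs for four samples and three classes
outputs = np.array([[0.9, 0.1, 0.2],
                    [0.2, 0.3, 0.7],
                    [0.6, 0.5, 0.1],
                    [0.1, 0.8, 0.3]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0],
                   [0, 1, 0]])
print(classification_accuracy(outputs, labels))  # 0.75
```

With the case-study variables, the same function could be called as `classification_accuracy(layer2, y_test)` after the validation forward pass.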

The performance was as expected.

**Best practices to follow**

Some points to keep in mind for building a good model are listed below:

- If the model is constrained too little, the system may not be effective
- Too much constraint combined with overtraining will lead to a slow process
- Focusing on only a few aspects will lead to bias

**Disadvantages of backpropagation**

- The quality of the input data holds the key to the overall performance
- Noisy data can lead to inaccurate results
- A matrix-based approach is preferred over a mini-batch approach

In conclusion, a neural network is a collection of connected units with input and output mechanisms, where each connection has an associated weight. Backpropagation is the “backward propagation of errors” and is used to train neural networks. It is fast, simple, and easy to implement. Backpropagation is especially beneficial for deep neural networks working on error-prone tasks such as speech or image recognition.
