Transfer learning
  1. What is Transfer Learning (TL)?
  2. How Transfer Learning Works
  3. When to Use Transfer Learning (TL)?
  4. Transfer Learning in NLP (Natural Language Processing)
  5. Using TL to Detect COVID-19

What is Transfer Learning?

Transfer learning is a research problem in deep learning (DL) that focuses on storing the knowledge gained while training one model and applying it to another, related model. Let me explain it further with an example from daily life.

Take Barney, for example. Barney is an 18-year-old who wants to learn to ride a motorcycle, and he already knows how to ride a bicycle. His friend Ted, who has never ridden a bicycle in his life, also wants to learn to ride a motorcycle. Who do you think will learn first? Barney probably has the better chance, because he already knows how to balance a two-wheeler. Learning to ride a motorcycle may not be as hard for him as it is for Ted, who has to learn everything from scratch.

So, to put it in simpler words, transfer learning is a machine learning technique in which a model trained for one task is re-trained to perform a second, related task.

The diagram below shows how transfer learning differs from traditional learning:

[Figure: Traditional Learning vs Transfer Learning]

In the diagram above, in the traditional approach (without transfer learning), both models, although meant for different tasks, are trained from scratch. With transfer learning, on the other hand, we use our dataset to further train a pre-trained model that was built for a different task. If you have an eye for detail, you will notice that the second dataset (the one in red) is smaller. We will see why in the coming sections.

How Transfer Learning Works

Now let us find out how transfer learning really works. To understand it better, let us take the example of an object recognition problem from the computer vision domain.

So how would you create a model that can recognize images of cars and output their type? Traditionally, you would collect lots and lots of images of all the different types of cars and train a convolutional neural network whose weights are randomly initialized. But how can we use transfer learning for this problem?

Suppose there is a pre-trained model that can recognize images of dogs and output the breed. More importantly, this model has been trained on a very large image dataset. How can it be adapted so that it can recognize cars?

It is really simple: all we need to do is remove the last layer of the network and replace it with a new one, as sketched below. The weights associated with this new last layer are randomly initialized, and the model is then trained on the images of different cars. That's it; the model will now be able to recognize different types of cars. It is important to note that instead of replacing the deleted layer with just one layer, we can add multiple layers.
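To make this concrete, here is a minimal Keras sketch of that surgery. It is only an illustration under assumptions: the backbone is VGG16 pre-trained on ImageNet (standing in for the hypothetical dog-breed model), and the new head assumes 10 car classes and a 128-unit hidden layer.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

# Load a network pre-trained on a large image dataset;
# include_top=False drops its original final classification layer
base = VGG16(weights="imagenet", include_top=False,
             input_tensor=Input(shape=(224, 224, 3)))

# Attach a new, randomly initialized head for the car task
x = Flatten()(base.output)
x = Dense(128, activation="relu")(x)          # hidden layer (assumed size)
outputs = Dense(10, activation="softmax")(x)  # 10 car types (assumed)

model = Model(inputs=base.input, outputs=outputs)

Training this model on car images then teaches the new head to map the reused features to car types.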


So why does this work? It has been observed that the first layers of a network learn to identify basic features such as edges, corners, textures, and patterns. These layers capture features that are broadly useful for analyzing images, while the later layers identify more complex, task-specific features. The earlier layers therefore act as a feature extractor that passes its features on to the layers ahead.
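You can inspect this behavior yourself by reading out intermediate activations. A quick sketch, assuming VGG16's published layer names ('block1_conv1' is an early layer, 'block5_conv3' a late one):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

vgg = VGG16(weights="imagenet")

# Sub-models that output the activations of an early and a late layer
early_features = Model(inputs=vgg.input,
                       outputs=vgg.get_layer("block1_conv1").output)
late_features = Model(inputs=vgg.input,
                      outputs=vgg.get_layer("block5_conv3").output)

Visualizing early_features on an image typically shows edge- and texture-like responses, while late_features responds to more object-specific patterns.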

This is one of the simplest ways to implement transfer learning, and it works best when the two tasks are similar. There are various other approaches that can be used, depending on the type of problem. Let us explore some of them in detail.

Approach 1

In this scenario, we freeze the existing layers while training the model on the new dataset, meaning that the weights in those layers are not changed. During training, only the randomly initialized weights associated with the newly added layers are updated until they converge. This process is also known as fine-tuning; more specifically, in this approach only the last layer(s) are fine-tuned.

This approach works better when little training data is available and the task the existing model was trained for is similar to the task we are interested in.
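Continuing the earlier sketch (where base and model were defined), freezing the pre-trained layers in Keras is a short loop; only the new head keeps trainable weights. The optimizer choice here is an assumption.

# Freeze every pre-trained layer so its weights stay fixed
for layer in base.layers:
    layer.trainable = False

# Training now updates only the new head's randomly initialized weights
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])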


Approach 2

Sometimes we freeze only a few layers of the pre-trained model, or none at all. In practice, we take this approach when the task for which the pre-trained model was designed and the task of our interest are not that similar. For example, if we want to design a model that classifies X-ray images to diagnose a person with coronavirus (COVID-19), I highly doubt that approach 1 will work. Instead, we can take this second approach and use the existing model as the starting point for training the new model.

Initializing the weights from a pre-trained model instead of randomly initializing them gives our training process a warm start and speeds up convergence. To preserve the initialization from pre-training, it is common practice to lower the learning rate by an order of magnitude. Also, to prevent the initial layers from changing too early, it is better to freeze them and fine-tune only the randomly initialized layers until they converge, then unfreeze all the other layers and fine-tune the whole network.
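A sketch of that two-phase schedule in Keras, again continuing with the base and model from before; the learning rates are illustrative assumptions, chosen an order of magnitude apart:

from tensorflow.keras.optimizers import Adam

# Phase 1: freeze the pre-trained layers and train only the new head
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) until the head converges

# Phase 2: unfreeze everything and fine-tune the whole network
# with a learning rate an order of magnitude lower
for layer in base.layers:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) for a few more epochs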

Approach 3

In the previous two approaches, we added the new layers at the end of the network, but this is not always necessary. If the task is the same but the type of input is a little different, we can add the new layers before the layers of the pre-trained model. For example, suppose we have an object recognition model trained on RGB images, and the new task is to build an object recognition model whose input images have a depth channel in addition to the RGB data. Even if we don't have enough data to train a model from scratch, it might be worth adding a few layers ahead of the existing layers of the pre-trained model and re-training them.
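One hedged way to sketch this RGB-D idea in Keras: a small trainable adapter maps the four input channels down to the three channels the pre-trained network expects. The 1x1-convolution adapter is an assumption, not a prescribed design.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Model

# New input with an extra depth channel: RGB + D = 4 channels
inputs = Input(shape=(224, 224, 4))

# A trainable 1x1 convolution maps 4 channels down to the 3 expected by VGG16
x = Conv2D(3, kernel_size=1, padding="same")(inputs)

# Reuse the pre-trained layers on top of the new adapter layer
base = VGG16(weights="imagenet", include_top=False)
outputs = base(x)

model = Model(inputs=inputs, outputs=outputs)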


To get a detailed overview of how transfer learning is used in computer vision, check out this case study.

When to Use Transfer Learning

So, when do we use transfer learning, and why do we use it at all? Let's find out in this section.

Scenario 1: Suppose we have to develop a deep learning model that can identify images of snow leopards. As this species of leopard is endangered, we might not be able to get our hands on a large collection of their images. Here we can use transfer learning: we can take images that are easily available, such as those of tigers and ordinary leopards, and train a model on them first. That model can then be used as the starting point for the model that has to identify snow leopards. Next, we train it on the few images of snow leopards we have; this may involve reusing all or only parts of the model, depending on the modeling approach used.

Scenario 2: We can use publicly available pre-trained models such as VGG-16 and ResNet-50, which are CNNs trained on more than a million images across 1,000 different categories. Training such models yourself requires huge resources (high-end computational hardware such as GPUs) and a lot of time. We can save ourselves this trouble and use these models as the starting point for our own image classification tasks.
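Loading those pre-trained weights in Keras takes one line each; a minimal sketch:

from tensorflow.keras.applications import VGG16, ResNet50

# ImageNet weights (1,000 classes, over a million images) are
# downloaded automatically the first time each model is created
vgg = VGG16(weights="imagenet")
resnet = ResNet50(weights="imagenet")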

In chapter 11 of the book "Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques", three possible benefits to look for when using transfer learning are mentioned:

  1. Higher start. The initial skill (before refining the model) on the source model is higher than it otherwise would be.
  2. Higher slope. The rate of improvement of skill during the training of the source model is steeper than it otherwise would be.
  3. Higher asymptote. The converged skill of the trained model is better than it otherwise would be.
[Figure: Three ways in which transfer learning might benefit]

Check out this link to know more.

But you may not see these benefits in every case where transfer learning is applied; ideally, a successful application of transfer learning yields all three. For example, it does not make sense to use a model pre-trained for stock price prediction on a computer vision task, and here I doubt we would get any benefit from transfer learning.

So we can conclude that transfer learning is an optimization: a shortcut to saving time or getting better performance. Also, we cannot be sure of the benefits of transfer learning until the model has been developed and evaluated.

Transfer Learning in NLP

So far, we have seen transfer learning in the context of computer vision, but it is not limited to computer vision tasks. Various other tasks can be performed and optimized using transfer learning.

NLP is one such area. In general, NLP (Natural Language Processing) and computer vision are the two broad areas in which transfer learning is most common. Let's find out how NLP uses the concept of transfer learning to its advantage.

So how does a pre-trained model help in NLP tasks? It just so happens that pre-training allows a model to capture and learn a variety of linguistic phenomena, such as long-term dependencies and negation, from a large-scale corpus. This knowledge is then used (transferred) to initialize another model so that it performs well on a specific NLP task, such as sentiment analysis.

Let us take the example of a wake-word (or trigger-word) detection system: a model that can identify wake-words, the words used to wake up speech-controlled devices, such as "Ok Google" or "Alexa". Creating such a model from scratch needs a lot of data and resources to train, so let us apply transfer learning to simplify the problem.

You might have heard of speech recognition systems. Such models are trained on millions of audio snippets and are good at recognizing words. We can take such a pre-trained model as our starting point and adapt it so that it recognizes only wake-words.


One example of a pre-trained model for Natural Language Processing tasks is BERT (Bidirectional Encoder Representations from Transformers), developed by researchers at Google AI Language. BERT was pre-trained on a huge amount of text, including the whole of the English Wikipedia (about 2,500 million words), using the Masked Language Modeling and Next Sentence Prediction objectives, and it caused a stir in the machine learning community by presenting state-of-the-art results on a wide variety of NLP tasks.

While BERT was pre-trained on Wikipedia, it can be fine-tuned on question-and-answer datasets, which means it can carry out tasks such as Question Answering (SQuAD v1.1).
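As a hedged illustration of how little code such fine-tuning setups require nowadays, here is a sketch using the Hugging Face transformers library (not part of this article's tutorial); the model name bert-base-uncased and the two-label head are illustrative choices.

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Pre-trained BERT plus a fresh, randomly initialized 2-class head,
# e.g. for sentiment analysis; the head is then trained on labeled data
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="tf")
outputs = model(inputs)  # logits over the two sentiment classes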

How BERT actually works and what tasks it is capable of doing is a topic for another time. 

Using TL to Detect COVID-19

Contributed by – python.learning

[Figure: Normal lungs vs infected lungs]

COVID-19, the disease caused by the novel coronavirus, is a viral infection that has been declared a global pandemic. The virus is spreading very quickly, and the test kits available for detecting it among the masses are limited, so scientists are seeking alternative detection methods. One possibility is analyzing X-ray images of the lungs.

The lungs of an infected person may become inflamed, making it tough to breathe. This can lead to pneumonia, an infection of the tiny air sacs (called alveoli) inside the lungs where the blood exchanges oxygen and carbon dioxide. If a doctor does a CT scan of the chest, they will probably see shadows or patchy areas called "ground-glass opacities".

Based on these facts, various tech companies such as Alibaba have started to develop machine learning models that can detect coronavirus. As per a report in Nikkei's Asian Review, Alibaba claims its new system can detect coronavirus in CT scans of patients' chests with 96% accuracy when distinguishing it from viral pneumonia cases, and the AI (Artificial Intelligence) takes only 20 seconds to make a determination.

So why am I discussing such advancements in Artificial Intelligence in an article about transfer learning? As it turns out, research conducted by Ali Narin, Ceren Kaya, and Ziynet Pamuk suggests using transfer learning to solve exactly this problem. (Also check out this article to learn how AI is being used to fight coronavirus.) So let us use our knowledge of transfer learning to tackle this real-life problem.

Moreover, the data, in this case X-rays of infected people, is not widely available at the moment, so the publicly available dataset is very small. This is one more reason for using transfer learning: it is not practical to build a model from scratch with such little data.

The COVID-19 X-ray images we'll be using for this tutorial come from a dataset curated by Dr. Joseph Cohen, a postdoctoral fellow at the University of Montreal. The remaining X-ray images are collected from Kaggle's Chest X-Ray Images (Pneumonia) dataset.

Our dataset is divided into two parts: set 1 (train) and set 2 (validate). Set 1 contains 20 images of healthy lungs and 20 images of lungs infected with COVID-19, while set 2 contains five images of healthy lungs and five of infected lungs. We use set 1 to train the model and set 2 to validate (test) it.

In this tutorial, we will use Python along with popular deep learning libraries such as TensorFlow and Keras. The code is implemented in a Google Colab notebook, an interactive environment provided by Google that lets you code on the go. Colab requires no configuration, we don't have to download any libraries to our own system, and it gives free access to GPUs and TPUs, which can speed up training. So you can just open Google Colab and start coding.

The first step is to use a GPU as the processor; GPUs have far more processor cores than CPUs, so training is faster. To enable the GPU backend for your notebook, go to Runtime -> Change runtime type -> Hardware Accelerator -> GPU.

Let’s start with a piece of code to check if the GPU is working. 

import tensorflow as tf

# Raise an error if Colab has not been given a GPU runtime
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Output:

[Screenshot: Found GPU at: /device:GPU:0]

Next, mount your Google Drive in Colab using this code:

from google.colab import drive
drive.mount('/content/gdrive')

Output:

[Screenshot: an authorization link and a box to paste the code]

You will be forwarded to your Google account. Copy the authentication code and paste it into Colab. Now you have access to the data stored in your Google Drive.

Next, copy the path to the folder containing the data and use Keras's ImageDataGenerator class to load the data into the script and label it at the same time.
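flow_from_directory infers class labels from sub-folder names, so the data in Drive is assumed to be organized something like this (the folder names below are illustrative):

dataset/
    train/
        covid/     # 20 X-rays of infected lungs
        normal/    # 20 X-rays of healthy lungs
    valid/
        covid/     # 5 X-rays of infected lungs
        normal/    # 5 X-rays of healthy lungs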

# import from tensorflow.keras (not standalone keras) so the generators
# are compatible with the tf.keras models built below
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# copy the paths to the training data and validation data
Path_train = "/content/gdrive/My Drive/Transfer learning Code/dataset/train"
Path_valid = "/content/gdrive/My Drive/Transfer learning Code/dataset/valid"

# rescale pixel values from [0, 255] to [0, 1]
image_generator = ImageDataGenerator(rescale=1./255)

# images are resized to [224, 224], as required by the VGG16 network
dataset = image_generator.flow_from_directory(Path_train,  # training set images
                                              target_size=(224, 224),
                                              class_mode='categorical')
validation_set = image_generator.flow_from_directory(Path_valid,  # validation set images
                                              target_size=(224, 224),
                                              class_mode='categorical')

Output:

[Screenshot: the number of images found in the training and validation sets]

Next, we import all the dependencies required for the code.

import matplotlib.pyplot as plt
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.densenet import DenseNet201
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

Note that we have imported two pre-trained models, VGG16 and DenseNet201. We will use both of them and compare their performance. Let us first use VGG16 as the base model.

baseModel = VGG16(weights="imagenet", include_top=False,
input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

# compile our model
print("[INFO] compiling model...")
opt = Adam(learning_rate=0.0001)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the head of the network
print("[INFO] training head...")
history_1 = model.fit(  # fit() accepts generators; fit_generator is deprecated
	dataset,
	steps_per_epoch=20,
	validation_data=validation_set,
	validation_steps=5,
	epochs=10)
# summarize history for accuracy
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'Validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'Validation'], loc='upper left')
plt.show()

Output:

[Screenshots: training and validation accuracy and loss plots after each epoch]

Now let us use DenseNet201 as the base model.

baseModel = DenseNet201(input_shape=[224, 224, 3], include_top=False, weights='imagenet')

# construct the head of the model that will be placed on top of the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

# compile our model
print("[INFO] compiling model...")
opt = Adam(learning_rate=0.0001)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the head of the network
print("[INFO] training head...")
history = model.fit(  # fit() accepts generators; fit_generator is deprecated
	dataset,
	steps_per_epoch=20,
	validation_data=validation_set,
	validation_steps=5,
	epochs=10)
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'Validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'Validation'], loc='upper left')
plt.show()

Output:

[Screenshots: training and validation accuracy and loss plots after each epoch]

The results obtained from the above two models are not impressive, as suggested by the irregular spikes in the plots; this is due to the lack of samples in the dataset. Still, from these results we conclude that, for the given dataset, the latter model (DenseNet201) performs better than the VGG16 network. Thus the overall performance of transfer learning is also impacted by the base model selected. There are various other pre-trained base models available that you can try in your own model; check them out in the Keras documentation, and see the sketch below for how little needs to change.
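Swapping in a different backbone only changes the base-model line; ResNet50V2 below is just one illustrative choice.

from tensorflow.keras.applications import ResNet50V2

# any other pre-trained backbone drops in the same way; the head,
# freezing loop, and training code above stay unchanged
baseModel = ResNet50V2(input_shape=[224, 224, 3], include_top=False,
                       weights='imagenet')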

Disclaimer: The sole purpose of this tutorial is to demonstrate how Transfer Learning is applied to real-world problems. The methods and techniques used in this section are meant for educational purposes only and cannot be relied on for the detection of COVID-19 in real-life situations. This article is for readers who are interested in Computer Vision/Deep Learning and want to learn via practical, hands-on methods.

This brings us to the end of this article, where we learned about transfer learning in deep learning and its implementation.

If you wish to learn more about Python and the concepts of ML, upskill with Great Learning’s PG Program Artificial Intelligence and Machine Learning.
