1. What is Image recognition?
  2. How does Image recognition work?
  3. Working of Convolutional and Pooling layers
  4. Image recognition using Python
  5. Image recognition with a pre-trained network

The visual performance of humans is much better than that of computers, probably because of superior high-level image understanding, contextual knowledge, and massively parallel processing. But human capabilities deteriorate drastically after an extended period of surveillance, and certain working environments are either inaccessible or too hazardous for human beings. For these reasons, automatic recognition systems are developed for various applications. Driven by advances in computing capability and image processing technology, computer mimicry of human vision has recently gained ground in a number of practical applications.

What is Image recognition?


Image recognition refers to technologies that identify places, logos, people, objects, buildings, and several other variables in digital images. It may be very easy for humans like you and me to recognise different images, such as images of animals. We can easily recognise the image of a cat and differentiate it from an image of a horse. But it may not be so simple for a computer.

A digital image is an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or grey level. So the computer sees an image as numerical values of these pixels and in order to recognise a certain image, it has to recognise the patterns and regularities in this numerical data.

Image with pixel values
An image of a dog represented by 40 x 40 pixels.
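
To make the "image as numbers" idea concrete, here is a minimal sketch using NumPy. The 4 x 4 greyscale image below is made up for illustration; a real photo would simply be a much larger array of the same kind.

```python
import numpy as np

# A hypothetical 4 x 4 greyscale image: 0 is black, 255 is white.
image = np.array([
    [  0,   0, 255, 255],
    [  0,   0, 255, 255],
    [255, 255,   0,   0],
    [255, 255,   0,   0],
], dtype=np.uint8)

print(image.shape)   # (rows, columns)
print(image[0, 2])   # intensity of the pixel in row 0, column 2

# Scale the intensities to the range [0, 1], as the training code
# later in this article does with rescale=1./255.
scaled = image / 255.0
print(scaled.min(), scaled.max())
```

Everything a model learns about this picture has to come from patterns in these numbers.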

Image recognition should not be confused with object detection. In object detection, we analyse an image and find different objects in the image while image recognition deals with recognising the images and classifying them into various categories.

How does Image recognition work?

Typically the task of image recognition involves the creation of a neural network that processes the individual pixels of an image. These networks are fed with as many pre-labelled images as we can, in order to “teach” them how to recognize similar images.

So let me break the process down into some simple steps:

  1. We need a dataset containing images with their respective labels. For example, an image of a dog must be labelled as a dog or something that we can understand.
  2. Next, these images are fed into a neural network, which is then trained on them. For tasks concerned with images, we usually use convolutional neural networks. These networks consist of convolutional layers and pooling layers in addition to multilayer perceptron (MLP) layers. The working of convolutional and pooling layers is explained below.
  3. We feed in the image that is not in the training set and get predictions.

In the coming sections, by following these simple steps we will make a classifier that can recognise RGB images of 10 different kinds of animals.
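
The three steps can be tried end to end on toy data before moving to a real network. The sketch below is not a neural network: it swaps in a much simpler nearest-centroid classifier, and the "images" are random arrays invented for illustration. It only shows the workflow of labelled data, training, and prediction on an unseen image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a labelled dataset. Hypothetical 8 x 8 greyscale "images":
# class 0 is dark (values near 0.2), class 1 is bright (values near 0.8).
dark = rng.normal(0.2, 0.05, size=(20, 8, 8))
bright = rng.normal(0.8, 0.05, size=(20, 8, 8))
images = np.concatenate([dark, bright]).reshape(40, -1)  # flatten each image
labels = np.array([0] * 20 + [1] * 20)

# Step 2: "training" here just stores the mean image of each class.
centroids = np.stack([images[labels == c].mean(axis=0) for c in (0, 1)])

# Step 3: predict the class of an image that was not in the training set.
def predict(img):
    distances = np.linalg.norm(centroids - img.reshape(-1), axis=1)
    return int(np.argmin(distances))

unseen_bright = rng.normal(0.8, 0.05, size=(8, 8))
print(predict(unseen_bright))
```

A convolutional network replaces the "mean image" step with learned filters, but the surrounding workflow stays the same.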


Note: The model will only be able to recognise animals that are in the dataset. For example, a model trained to recognise dogs and cats cannot recognise boats.

Working of Convolutional and Pooling layers

Convolutional layers and pooling layers are the major building blocks used in convolutional neural networks. Let us look at them in detail.

How does Convolutional Layer work?

The convolutional layer’s parameters consist of a set of learnable filters (or kernels), which have a small receptive field. These filters scan through image pixels and gather information in the batch of pictures/photos. Convolutional layers convolve the input and pass its result to the next layer. This is like the response of a neuron in the visual cortex to a specific stimulus. 

Convolution filter moves on the real image
Convolution operation

Below is an example of how convolution operation is done on an image. A similar process is done for all the pixels.
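
The sliding-window computation can also be sketched in a few lines of NumPy. The image and the vertical-edge filter below are made up for illustration; in a trained network the filter values are learned. As in most deep learning libraries, the filter is applied without flipping (strictly speaking, a cross-correlation), with no padding and a stride of 1.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 5 x 5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical 3 x 3 filter that responds to dark-to-bright vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)
```

The output is large where the window covers the edge and zero where the window covers a uniform region, which is what "detecting a feature" means here.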

Here is an example of an image in our test set that has been convolved with four different filters, giving four different images.

Images after applying convolution

How does Pooling Layer work?

The pooling operation involves sliding a two-dimensional filter over each channel of the feature map and summarising the features lying within the region covered by the filter. A pooling layer is usually incorporated between two successive convolutional layers. The pooling layer reduces the amount of computation, and the number of parameters in later layers, by down-sampling the representation. The pooling function can be either max or average; max pooling is the more common choice in practice.

This process is illustrated below.

When passing the four images we got after convolution through a max-pooling layer of dimension 2×2, we get this as output

Images after Pooling

As we can see, each dimension has been halved, yet the main features of the image are still preserved.
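
The halving is easy to verify numerically. Below is a minimal NumPy sketch of 2×2 max pooling with a stride of 2; the 4 x 4 feature map is made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical 4 x 4 feature map.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
])

pooled = max_pool2d(feature_map)
print(pooled.shape)
print(pooled)
```

Each 2×2 window is reduced to its strongest response, so a 4 x 4 map becomes 2 x 2.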

Image recognition using Python

Here I am going to use deep learning, more specifically a convolutional neural network, to recognise RGB images of ten different kinds of animals. An RGB image can be viewed as three different images (a red-scale image, a green-scale image and a blue-scale image) stacked on top of each other; when fed into the red, green and blue inputs of a colour monitor, they produce a colour image on the screen. We use a dataset known as Animals-10 from Kaggle.

RGB image recognition
Image is shown in the red channel, blue channel and green channel
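
In array terms, the "stack of three images" is just a third axis. Here is a minimal NumPy sketch with a made-up 2 x 2 RGB image:

```python
import numpy as np

# Hypothetical 2 x 2 RGB image with shape (height, width, channels).
rgb = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

# Split the image into its three single-channel images.
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
print(rgb.shape)
print(red)

# Stacking the channels again gives back the original image.
restacked = np.stack([red, green, blue], axis=-1)
print(np.array_equal(restacked, rgb))
```

This is why the network's input shape later in the article has a trailing 3, as in (128, 128, 3).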

So, let us start making a classifier using Python and Keras. We are going to implement the program in Colab, as we need a lot of processing power and Google Colab provides free GPUs. The overall structure of the neural network we are going to use can be seen in this image.


The very first step is to get the data into your Colab notebook. You don’t need a high-speed internet connection for this, as the data is downloaded directly from the Kaggle cloud into the Google cloud.

For getting the data, follow these steps:

  1. Go to your Kaggle account and click on My Account. In case you don’t have a Kaggle account, create one; it is free.
  2. Next, download the kaggle.json file by clicking on the button ‘Create New API Token’.
  3. Go to your Colab notebook and start coding

In this tutorial, we are using ImageDataGenerator to label the images. So, in case you are using some other dataset, be sure to put all images of the same class in the same folder, and then place all of these class folders inside a single parent folder.
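
To make the folder convention concrete, the sketch below builds a throwaway folder tree and derives a label for each class folder. It does not call Keras; it only mimics how flow_from_directory assigns label indices, which is in alphanumeric order of the sub-folder names. The three folder names are taken from the Animals-10 dataset; the temporary directory is just for illustration.

```python
import os
import tempfile

# Build a throwaway parent folder with one sub-folder per class.
root = tempfile.mkdtemp()
for class_name in ["gatto", "cane", "cavallo"]:
    os.makedirs(os.path.join(root, class_name))

# Sort the class folders by name and number them from 0.
class_names = sorted(
    name for name in os.listdir(root)
    if os.path.isdir(os.path.join(root, name))
)
class_indices = {name: index for index, name in enumerate(class_names)}
print(class_indices)
```

With the real generator, the same mapping is available as train_generator.class_indices, which is a safer source for label names than a hand-written list.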

# These steps are to be followed when using Google Colab
# and importing data from Kaggle
from google.colab import files
# Install the Kaggle library
!pip install -q kaggle
# Upload the kaggle.json file
uploaded = files.upload()
# Make the directory in which kaggle.json is stored
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Restrict permissions on the token so the Kaggle CLI does not warn about it
!chmod 600 ~/.kaggle/kaggle.json
#download the dataset into the colab
!kaggle datasets download -d alessiocorrado99/animals10
#unzip the data
!unzip /content/animals10.zip

#In case you are using a local machine, start from here.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Sequential,Model
from tensorflow.keras.layers import BatchNormalization,Dropout,Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.preprocessing import image
import numpy as np
import os
import cv2

#on Colab the archive is unzipped to /content; in a Kaggle notebook use '/kaggle/input/animals10/raw-img/'
train_data_dir='/content/raw-img/'
img_height=128
img_width=128
batch_size=64
nb_epochs=20
train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation') # set as validation data


model = Sequential()
inputShape = (128, 128, 3)
model.add(Conv2D(64, (3, 3), padding="same", activation='relu', input_shape=inputShape))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size = 5, strides=2, padding='same', activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(64, kernel_size = 5, strides=2, padding='same', activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(BatchNormalization())
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(64, activation='relu')) 
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
model.summary() 
#compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
#train the model; this step takes a lot of time (hours)
#model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator,
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)
#save the model for later use; replace the path with your own, e.g. '/content/simpleconvkag.h5'
model.save('path/name_of_model.h5')


#order of the animals array is important
#animals=["dog", "horse","elephant", "butterfly",  "chicken",  "cat", "cow",  "sheep","spider", "squirrel"]
bio_animals=sorted(os.listdir('/content/raw-img'))
categories = {'cane': 'dog', "cavallo": "horse", "elefante": "elephant", "farfalla": "butterfly", "gallina": "chicken", "gatto": "cat", "mucca": "cow", "pecora": "sheep", "scoiattolo": "squirrel","ragno":"spider"}
def recognise(pred):
  animals=[categories.get(item,item)  for item in bio_animals]
  print("The image consists of", animals[pred])

from tensorflow.keras.preprocessing import image
import numpy as np
#load_img expects a local file path, not a URL
img = image.load_img("/kaggle/input/testttt/OIF-e2bexWrojgtQnAPPcUfOWQ.jpeg", target_size=(128, 128))
#scale pixels to [0, 1], matching rescale=1./255 used during training
x = image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)
prediction=model.predict(x)
# prediction

recognise(np.argmax(prediction))

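from google.colab.patches import cv2_imshow  #Colab replacement for cv2.imshow

test_data_path="/content/test data/test_animals"
#keep only image files (this skips hidden entries such as .ipynb_checkpoints)
test_files=sorted(f for f in os.listdir(test_data_path) if f.lower().endswith(('.jpg', '.jpeg', '.png')))
for name in test_files:
  path=os.path.join(test_data_path,name)
  cv2_imshow(cv2.imread(path))
  #load and scale each image the same way as the training data
  img=image.load_img(path, target_size=(128, 128))
  x=np.expand_dims(image.img_to_array(img)/255.0, axis=0)
  recognise(np.argmax(model.predict(x)))
  print("")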
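
The step from the model's output to an animal name can be checked without a trained model. In the sketch below the ten probabilities are made up for illustration; the folder names and the translation table are the ones used in the code above. The softmax layer returns one probability per class, np.argmax picks the most likely index, and the index refers to the class folders in sorted order.

```python
import numpy as np

# Class folders in the order Keras assigns label indices (sorted by name).
folders = sorted(["cane", "cavallo", "elefante", "farfalla", "gallina",
                  "gatto", "mucca", "pecora", "ragno", "scoiattolo"])
categories = {"cane": "dog", "cavallo": "horse", "elefante": "elephant",
              "farfalla": "butterfly", "gallina": "chicken", "gatto": "cat",
              "mucca": "cow", "pecora": "sheep", "scoiattolo": "squirrel",
              "ragno": "spider"}

# Made-up softmax output for a single image; the ten values sum to 1.
prediction = np.array([[0.02, 0.05, 0.01, 0.01, 0.03,
                        0.70, 0.08, 0.04, 0.03, 0.03]])

best = int(np.argmax(prediction))
label = categories[folders[best]]
print(best, label)
```

Note that "ragno" (spider) sorts before "scoiattolo" (squirrel), so a hand-written label list must follow the sorted folder order or the names will be mismatched.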

Output: I downloaded some images from Google and used this model to label them. Here are the results.

To predict images, we need to upload them to Colab (they are deleted automatically when the session ends), or you can store them permanently in your Google Drive.

Follow the steps below to create a directory for test data

  1. Create a new folder called test data
  2. Next, create another folder inside this folder, named test_animals
  3. Upload your images to this folder.

As we can see, this model did a decent job and predicted all images correctly except the one with a horse. This is probably because the images are fairly large and the model is under-trained: to get decent results, it would likely have to be trained for at least 100 epochs. But due to the large size of the dataset and images, I could only train it for 20 epochs (which took 4 hours on Colab).

To increase the accuracy, we can use a pre-trained model and then customise it for our problem.

Image Recognition with a pre-trained model

In this example, I am going to use the Xception model that has been pre-trained on the ImageNet dataset. This technique is called transfer learning. If you are not familiar with the topic, I highly recommend this article.

The Xception model was proposed by François Chollet. Xception is an extension of the Inception architecture which replaces the standard Inception modules with depthwise separable convolutions. This model is available in Keras and we just need to import it. So let’s start coding.
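
Before the script, a back-of-the-envelope count shows why depthwise separable convolutions are attractive. The sketch below compares the number of weights in a standard convolution with a depthwise separable one; the layer sizes (3×3 filters, 128 input channels, 256 output channels) are arbitrary values chosen for illustration, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    # every output channel has its own k x k filter for every input channel
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable, round(standard / separable, 1))
```

For these sizes the separable version needs roughly nine times fewer weights, which is the kind of saving that lets Xception stay relatively compact.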

from google.colab import files
# Install the Kaggle library
!pip install -q kaggle
# Upload the kaggle.json file
uploaded = files.upload()
# Make the directory in which kaggle.json is stored
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Restrict permissions on the token so the Kaggle CLI does not warn about it
!chmod 600 ~/.kaggle/kaggle.json
#download the dataset into the colab
!kaggle datasets download -d alessiocorrado99/animals10
#unzip the data
!unzip /content/animals10.zip


import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Sequential,Model
from tensorflow.keras.layers import BatchNormalization,Dropout,Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.preprocessing import image
import numpy as np
import os
import cv2

#on Colab the archive is unzipped to /content; in a Kaggle notebook use '/kaggle/input/animals10/raw-img/'
train_data_dir='/content/raw-img/'
img_height=299
img_width=299
batch_size=64
nb_epochs=20
train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation') # set as validation data

#import a pre-trained model, without the top layers. We will customise
#the top layers for our problem
base_model = tf.keras.applications.Xception(include_top=False, input_shape=(299,299,3))
#For now, freeze the pre-trained layers and do not train them
for layer in base_model.layers:
    layer.trainable = False
# create a custom top classifier
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(516, activation='relu')(x)
#since our problem has 10 different animals we have 10 classes
#thus we keep 10 nodes in the last layer
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.inputs, outputs=predictions)
model.summary()

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

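#model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator,
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)

#Now unfreeze the layers and train the whole model
for layer in base_model.layers:
    layer.trainable = True
#recompile so that the change to trainable takes effect; a low learning rate
#(1e-5 is an assumed value, not from the original run) protects the pre-trained weights
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator,
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)

#save the model for later use; replace the path with your own
model.save('path/name_of_model.h5')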
#order of the animals array is important
#animals=["dog", "horse","elephant", "butterfly",  "chicken",  "cat", "cow",  "sheep","spider", "squirrel"]
bio_animals=sorted(os.listdir('/content/raw-img'))
categories = {'cane': 'dog', "cavallo": "horse", "elefante": "elephant", "farfalla": "butterfly", "gallina": "chicken", "gatto": "cat", "mucca": "cow", "pecora": "sheep", "scoiattolo": "squirrel","ragno":"spider"}
def recognise(pred):
  animals=[categories.get(item,item)  for item in bio_animals]
  print("The image consists of", animals[pred])

from tensorflow.keras.preprocessing import image
import numpy as np
#load_img expects a local file path, not a URL
img = image.load_img("/kaggle/input/testttt/OIF-e2bexWrojgtQnAPPcUfOWQ.jpeg", target_size=(299, 299))
#scale pixels to [0, 1], matching rescale=1./255 used during training
x = image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)
prediction=model.predict(x)
# prediction

recognise(np.argmax(prediction))

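from google.colab.patches import cv2_imshow  #Colab replacement for cv2.imshow

test_data_path="/content/test data/test_animals"
#keep only image files (this skips hidden entries such as .ipynb_checkpoints)
test_files=sorted(f for f in os.listdir(test_data_path) if f.lower().endswith(('.jpg', '.jpeg', '.png')))
for name in test_files:
  path=os.path.join(test_data_path,name)
  cv2_imshow(cv2.imread(path))
  #load and scale each image the same way as the training data
  img=image.load_img(path, target_size=(299, 299))
  x=np.expand_dims(image.img_to_array(img)/255.0, axis=0)
  recognise(np.argmax(model.predict(x)))
  print("")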

Output:

As we can see, the model makes accurate predictions on all of the images in our test dataset. I have saved this model, so it can be loaded at any time as shown below:

from tensorflow import keras
model = keras.models.load_model('path .h5')
#e.g. model = keras.models.load_model('/content/simpleconvkag.h5')

In case you want a copy of the trained model or have any queries regarding the code, feel free to drop a comment.

This brings us to the end of this article. We have learned how image recognition works and classified different images of animals.



2 COMMENTS

    • Hi Ashutosh,
      It depends on the dataset we are working on and here my dataset contained folders of each animal in the order I have mentioned here. If you are facing problems regarding the labels,it is most probably that your folders are not in the same order as it was in my case. You can change the label array according to the order of your folders.
