Image pre-processing
  1. Introduction to Image Pre-Processing
  2. Pixel brightness transformations (PBT)
  3. Gamma Correction
  4. Histogram equalization
  5. Sigmoid stretching 
  6. Geometric Transformations
  7. Image Filtering and Segmentation
  8. Image Segmentation
  9. Fourier transform

Contributed by: Sreekanth

Introduction to Image Pre-Processing

For a machine learning engineer, data pre-processing (or data cleansing) is a crucial step, and most ML engineers spend a good amount of time on it before building a model. Examples of data pre-processing include outlier detection, missing-value treatment and the removal of unwanted or noisy data.

Similarly, image pre-processing is the term for operations on images at the lowest level of abstraction. These operations do not increase the information content of an image; if entropy is used as the information measure, they actually decrease it. The aim of pre-processing is to improve the image data by suppressing undesired distortions or enhancing image features that are relevant for further processing and analysis.

There are 4 different types of Image Pre-Processing techniques and they are listed below.

  1. Pixel brightness transformations/ Brightness corrections
  2. Geometric Transformations
  3. Image Filtering and Segmentation 
  4. Fourier transform and Image restoration

Let’s discuss each type in detail.

Pixel brightness transformations (PBT)

Brightness transformations modify pixel brightness, and the transformation depends only on the properties of the pixel itself. In PBT, an output pixel’s value depends only on the corresponding input pixel’s value. Examples of such operators include brightness and contrast adjustments as well as colour correction and transformations.

Contrast enhancement is an important area in image processing for both human and computer vision. It is widely used for medical image processing and as a pre-processing step in speech recognition, texture synthesis, and many other image/video processing applications.

There are two types of brightness transformations:

  1. Brightness corrections
  2. Gray scale transformation

The most common pixel brightness transform operations are:

  1. Gamma correction or Power Law Transform
  2. Sigmoid stretching 
  3. Histogram equalization

Two commonly used point processes are multiplication and addition with a constant:

$$g(x) = \alpha f(x) + \beta$$

Here f(x) is the source pixel value and g(x) is the output pixel value. The parameters α > 0 and β are called the gain and bias parameters; they are sometimes said to control contrast and brightness respectively.

cv.convertScaleAbs(image, alpha=alpha, beta=beta) 

For different values of alpha and beta, the image brightness and contrast vary.

Source : OpenCV
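
The gain and bias adjustment can be sketched with NumPy alone. This is a minimal illustration, not the OpenCV implementation: the helper name adjust_gain_bias and the tiny 2x2 test image are assumptions made for the example. Note that cv.convertScaleAbs takes the absolute value of the result before saturating, whereas this sketch simply clips negative results to 0.

```python
import numpy as np

def adjust_gain_bias(image, alpha=1.0, beta=0.0):
    """Apply g(x) = alpha * f(x) + beta to every pixel, saturating to [0, 255]."""
    # Work in float so values outside [0, 255] do not wrap around.
    scaled = alpha * image.astype(np.float64) + beta
    # Round, then clip to the valid 8-bit range (hypothetical helper, not an OpenCV call).
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

# Tiny 2x2 "image" used only for illustration.
image = np.array([[0, 50], [100, 200]], dtype=np.uint8)
brighter = adjust_gain_bias(image, alpha=1.5, beta=10)
```

With alpha above 1 the differences between pixel values grow (more contrast), while beta shifts every pixel by the same amount (more brightness); the brightest input pixel here saturates at 255.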

Gamma Correction

Gamma correction is a non-linear adjustment to individual pixel values. Whereas image normalization carries out linear operations on individual pixels, such as scalar multiplication and addition/subtraction, gamma correction carries out a non-linear operation on the source image pixels and can cause saturation of the image being altered.

Here the relation between the output image and gamma is non-linear.

In the above diagram, some of the objects are not visible in the original image, whereas with gamma set to 2.0 most of the objects are visible.

Code :  adjusted = adjust_gamma(original, gamma=gamma)
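
The article does not show the body of adjust_gamma, so the following is only a sketch of how such a helper might look. It assumes the convention output = 255 * (input / 255) ^ (1 / gamma), under which gamma greater than 1 brightens the image, in line with the observation that gamma = 2.0 reveals more objects. The lookup-table approach and the sample pixel values are likewise assumptions.

```python
import numpy as np

def adjust_gamma(image, gamma=1.0):
    """Gamma-correct an 8-bit image with a 256-entry lookup table."""
    # Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    # so gamma > 1 brightens and gamma < 1 darkens.
    inv_gamma = 1.0 / gamma
    table = np.rint(255.0 * (np.arange(256) / 255.0) ** inv_gamma).astype(np.uint8)
    # Indexing the table with the image applies it to every pixel at once.
    return table[image]

original = np.array([[0, 64], [128, 255]], dtype=np.uint8)
adjusted = adjust_gamma(original, gamma=2.0)
```

Pure black and pure white are left unchanged, while mid-tones are lifted; because the table is computed once for all 256 possible values, the cost per pixel is a single lookup.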


Histogram equalization

Histogram equalization is a well-known contrast enhancement technique because it performs well on almost all types of images. It provides a sophisticated method for modifying the dynamic range and contrast of an image by altering the image so that its intensity histogram has a desired shape. Unlike contrast stretching, histogram modelling operators may employ non-linear and non-monotonic transfer functions to map between pixel intensity values in the input and output images.

The normalized histogram is defined as:

P(n) = (number of pixels with intensity n) / (total number of pixels)
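
As a rough sketch of how the normalized histogram leads to equalization, the cumulative sum of P(n) can be used as the transfer function. The helper name equalize_histogram, the rounding rule and the 3x3 sample image are assumptions for illustration; library routines may normalize the cumulative histogram slightly differently.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Equalize an 8-bit greyscale image using its cumulative histogram."""
    # Normalized histogram: P(n) = pixels with intensity n / total pixels.
    hist = np.bincount(image.ravel(), minlength=levels)
    p = hist / image.size
    # The cumulative distribution is used as the transfer function.
    cdf = np.cumsum(p)
    mapping = np.rint((levels - 1) * cdf).astype(np.uint8)
    return mapping[image], p

# Low-contrast sample: all intensities lie between 50 and 53.
image = np.array([[50, 50, 51], [51, 52, 52], [52, 52, 53]], dtype=np.uint8)
equalized, p = equalize_histogram(image)
```

The input intensities span only four grey levels, but after the mapping they are spread over a much larger part of the 0 to 255 range.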

Sigmoid stretching 

The sigmoid function is a continuous non-linear activation function. Its name comes from the fact that the function is “S” shaped. Statisticians call this function the logistic function. For contrast stretching it is commonly written as:

$$g(x,y) = \frac{1}{1 + e^{\,c\,(th - f_s(x,y))}}$$

where

g(x,y) is the enhanced pixel value

c is the contrast factor

th is the threshold value

fs(x,y) is the original image

By adjusting the contrast factor ‘c’ and the threshold value ‘th’, it is possible to tailor the amount of lightening and darkening and so control the overall contrast enhancement.
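
A minimal sketch of sigmoid stretching follows, assuming the common form g = 1 / (1 + exp(c * (th - fs))) with the input image fs rescaled to the range [0, 1]. The function name sigmoid_stretch, the default parameter values and the sample pixels are assumptions made for the example.

```python
import numpy as np

def sigmoid_stretch(image, c=10.0, th=0.5):
    """Sigmoid contrast stretching: g = 1 / (1 + exp(c * (th - fs)))."""
    # Assumption: fs is the 8-bit input rescaled to [0, 1].
    fs = image.astype(np.float64) / 255.0
    g = 1.0 / (1.0 + np.exp(c * (th - fs)))
    # Map the enhanced values back to 8-bit for display.
    return np.rint(255.0 * g).astype(np.uint8)

image = np.array([[64, 120], [136, 192]], dtype=np.uint8)
stretched = sigmoid_stretch(image, c=10.0, th=0.5)
```

Pixels below the threshold are pushed darker and pixels above it are pushed brighter, so the two mid-tone values around the threshold end up further apart than they were in the input. A larger c makes the S-curve steeper, and moving th shifts the point around which the stretching happens.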

Geometric Transformations

The earlier methods in this article deal with colour and brightness/contrast. With geometric transformations, the positions of pixels in an image are modified but their colours are unchanged.

Geometric transforms permit the elimination of geometric distortion that occurs when an image is captured. Common geometric transformation operations are rotation, scaling and distortion (or undistortion!) of images.

There are two basic steps in geometric transformations:

1. Spatial transformation, which is the physical rearrangement of pixels in the image

2. Grey level interpolation, which assigns grey levels to the transformed image

Transformations :

  1. Scaling : Resizing the image.
  2. Translation : Shifting an object’s location.
  3. Rotation : Rotating an object by an angle theta.
  4. Shearing : Shifting each row (or column) of pixels horizontally (or vertically) by an amount proportional to its position.
  5. Affine Transformation : Instead of defining the scale factors, the shearing factors, the rotation angle and the translation separately, it is common to merge them into one matrix. The combination of these four transformations is called an affine transformation.
  6. Perspective Transformation : Changes the perspective of a given image or video to get better insight into the required information. The points on the image from which information is to be gathered must be provided, and the perspective is changed relative to them.
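
To illustrate how the individual transformations merge into one matrix, here is a small sketch using 3x3 homogeneous matrices. The function names, the order of composition (scale, then shear, then rotate, then translate) and the sample points are assumptions chosen for the example; other orders give different but equally valid affine transformations.

```python
import numpy as np

def affine_matrix(scale=1.0, theta_deg=0.0, shear=0.0, tx=0.0, ty=0.0):
    """Compose scaling, horizontal shear, rotation and translation into one 3x3 matrix."""
    t = np.deg2rad(theta_deg)
    S = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1.0]])
    H = np.array([[1, shear, 0], [0, 1, 0], [0, 0, 1.0]])
    R = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1.0]])
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])
    # Applied right to left: scale, then shear, then rotate, then translate.
    return T @ R @ H @ S

def transform_points(M, points):
    """Apply a 3x3 homogeneous matrix to an (n, 2) array of (x, y) points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ M.T)[:, :2]

M = affine_matrix(scale=2.0, theta_deg=90.0, tx=5.0, ty=1.0)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
moved = transform_points(M, corners)
```

The last row of an affine matrix is always (0, 0, 1), which is why OpenCV's warpAffine takes only the top two rows as a 2x3 matrix. A perspective transformation uses a general 3x3 matrix whose last row is not fixed.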

Interpolation Methods :

After the spatial transformation, the new point co-ordinates (x’,y’) are obtained. In general these new points do not fit the discrete raster of the output image, so each pixel value in the output image raster must be obtained by interpolation.

The brightness interpolation problem is usually expressed in a dual way: for each pixel (x’,y’) of the output image, where x’ and y’ lie on the discrete raster, the brightness is determined from the corresponding point (x,y) in the input image, which in general falls between input pixels.

The different types of interpolation methods are:

  1. Nearest neighbor interpolation is the simplest technique: each output pixel takes the value of the nearest sample in the input vector or matrix.
  2. Linear interpolation explores the four points neighboring the point (x,y) and assumes that the brightness function is linear in this neighborhood.
  3. Bicubic interpolation improves the model of the brightness function by approximating it locally with a bicubic polynomial surface. Sixteen neighboring points are used for interpolation.
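
The first two methods can be sketched for a single sample point as follows. The helper names, the handling of points on the image border and the 2x2 sample image are assumptions for illustration; the image is indexed as image[row, column], that is image[y, x], and must be at least 2x2 for the bilinear helper.

```python
import numpy as np

def bilinear_sample(image, x, y):
    """Brightness at the real-valued point (x, y) from its four neighbours."""
    h, w = image.shape
    # Integer corner above-left of (x, y), kept inside the image.
    x0 = int(np.clip(np.floor(x), 0, w - 2))
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two rows, then along y between them.
    top = (1 - dx) * image[y0, x0] + dx * image[y0, x0 + 1]
    bottom = (1 - dx) * image[y0 + 1, x0] + dx * image[y0 + 1, x0 + 1]
    return (1 - dy) * top + dy * bottom

def nearest_sample(image, x, y):
    """Brightness of the input pixel closest to (x, y)."""
    h, w = image.shape
    col = min(max(int(round(x)), 0), w - 1)
    row = min(max(int(round(y)), 0), h - 1)
    return image[row, col]

image = np.array([[10.0, 20.0], [30.0, 40.0]])
centre = bilinear_sample(image, 0.5, 0.5)   # midway between all four pixels
nearest = nearest_sample(image, 0.2, 0.9)   # closest pixel is row 1, column 0
```

Nearest neighbor keeps only original pixel values and is cheap, but produces blocky results; bilinear interpolation blends the four neighbours and gives smoother output at a slightly higher cost.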

Image Filtering and Segmentation

The goal of using filters is to modify or enhance image properties and/or to extract valuable information from the pictures, such as edges, corners, and blobs. A filter is defined by a kernel, which is a small array applied to each pixel and its neighbors within an image.

Some of the basic filtering techniques are:

  1. Low Pass Filtering (Smoothing) : A low-pass filter is the basis for most smoothing methods. An image is smoothed by decreasing the disparity between pixel values by averaging nearby pixels.
  2. High Pass Filtering (Edge Detection, Sharpening) : A high-pass filter can be used to make an image appear sharper. These filters emphasize fine details in the image, the opposite of the low-pass filter. High-pass filtering works in the same way as low-pass filtering; it just uses a different convolution kernel.
  3. Directional Filtering : A directional filter is an edge detector that can be used to compute the first derivatives of an image. The first derivatives (or slopes) are most evident when a large change occurs between adjacent pixel values. Directional filters can be designed for any direction within a given space.
  4. Laplacian Filtering : A Laplacian filter is an edge detector used to compute the second derivatives of an image, measuring the rate at which the first derivatives change. This determines whether a change in adjacent pixel values comes from an edge or from a continuous progression. Laplacian filter kernels usually contain negative values in a cross pattern, centered within the array. The corners are either zero or positive values. The center value can be either negative or positive.
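
The filters above differ only in their kernels, which the following sketch tries to make concrete. The helper filter2d, the border handling (edge pixels replicated), the specific 3x3 kernels and the synthetic step-edge image are all assumptions for illustration. The helper slides the kernel without flipping it, which for symmetric kernels is identical to convolution.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide an odd-sized kernel over the image, replicating edge pixels at the border."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(np.float64),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

box = np.full((3, 3), 1.0 / 9.0)              # low-pass: mean of the 3x3 neighbourhood
laplacian = np.array([[0.0, -1.0, 0.0],
                      [-1.0, 4.0, -1.0],
                      [0.0, -1.0, 0.0]])      # second derivative, negative cross pattern
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])        # directional: first derivative along x

# Vertical step edge: left half dark, right half bright.
image = np.zeros((5, 6))
image[:, 3:] = 100.0
smoothed = filter2d(image, box)
edges = filter2d(image, laplacian)
gradient_x = filter2d(image, sobel_x)
```

On this step edge the box filter softens the jump into a ramp, the directional kernel responds only in the columns next to the edge, and the Laplacian changes sign across the edge while staying at zero in the flat regions.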

Computer Vision: Low-level Vision

Image Segmentation

Image segmentation is a commonly used technique in digital image processing and analysis to partition an image into multiple parts or regions, often based on the characteristics of the pixels in the image. Image segmentation could involve separating foreground from background, or clustering regions of pixels based on similarities in colour or shape.

Image segmentation is mainly used in:

  • Face detection
  • Medical imaging
  • Machine vision
  • Autonomous Driving

The main types of image segmentation techniques are:

  1. Non-contextual thresholding : Thresholding is the simplest non-contextual segmentation technique. With a single threshold, it transforms a greyscale or colour image into a binary image considered as a binary region map. The binary map contains two possibly disjoint regions, one of them containing pixels with input data values smaller than a threshold and the other containing pixels with input values at or above the threshold. The types of thresholding techniques are:
    1. Simple thresholding
    2. Adaptive thresholding
    3. Colour thresholding
  2. Contextual segmentation : Non-contextual thresholding groups pixels with no account of their relative locations in the image plane. Contextual segmentation can be more successful in separating individual objects because it accounts for the closeness of pixels that belong to an individual object. The two basic approaches to contextual segmentation are based on signal discontinuity or similarity. Discontinuity-based techniques attempt to find complete boundaries enclosing relatively uniform regions, assuming abrupt signal changes across each boundary. Similarity-based techniques attempt to directly create these uniform regions by grouping together connected pixels that satisfy certain similarity criteria. The two approaches mirror each other, in the sense that a complete boundary splits one region into two. The types of contextual segmentation are:
    1. Pixel connectivity
    2. Region similarity
    3. Region growing
    4. Split-and-merge segmentation
  3. Texture Segmentation : Texture is one of the most important attributes in many image analysis and computer vision applications. The procedures developed for the texture problem can be subdivided into four categories:
    1. Structural approach
    2. Statistical approach
    3. Model based approach
    4. Filter based approach
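
The difference between non-contextual and contextual segmentation can be sketched with simple thresholding and a basic region-growing routine. Both helpers, the 4-connectivity rule, the tolerance test against the seed value and the sample image are assumptions made for this example rather than a definitive implementation.

```python
from collections import deque
import numpy as np

def simple_threshold(image, threshold):
    """Non-contextual: 1 where the pixel is at or above the threshold, else 0."""
    return (image >= threshold).astype(np.uint8)

def region_grow(image, seed, tolerance):
    """Contextual: grow a 4-connected region of pixels within `tolerance` of the seed value."""
    h, w = image.shape
    seed_value = float(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if abs(float(image[nr, nc]) - seed_value) <= tolerance:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask

# Two bright blobs separated by a dark column.
image = np.array([[200, 210,  20, 205],
                  [190, 205,  10, 215],
                  [ 15,  25,  30, 220]], dtype=np.uint8)
binary = simple_threshold(image, 128)
left_blob = region_grow(image, seed=(0, 0), tolerance=30)
```

Thresholding marks every bright pixel regardless of where it is, so both blobs end up in the same class. Region growing from the top-left seed returns only the left blob, because the right blob has similar brightness but is not connected to the seed.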

Fourier transform

The Fourier Transform is an important image processing tool which is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image.

The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression.

The DFT (Discrete Fourier Transform) is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples which is large enough to fully describe the spatial domain image. The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e. the images in the spatial and Fourier domains are of the same size.

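For a square image of size N×N, the two-dimensional DFT is given by:

$$F(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-j 2\pi \left( \frac{ux}{N} + \frac{vy}{N} \right)}$$

where f(x,y) is the image in the spatial domain and F(u,v) is its representation in the Fourier domain.

The inverse Fourier Transform is given by:

$$f(x,y) = \frac{1}{N^2} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} F(u,v)\, e^{\,j 2\pi \left( \frac{ux}{N} + \frac{vy}{N} \right)}$$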

Source : TutorialsPoint
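
As a small check of the definition, the sketch below computes the 2D DFT of a square image directly from the complex exponentials and compares it with NumPy's fast Fourier transform. The helper dft2_direct and the random 8x8 test image are assumptions for illustration; the normalization assumed here (no factor on the forward transform, 1/N² on the inverse) is the one np.fft uses.

```python
import numpy as np

def dft2_direct(image):
    """Direct 2D DFT of an N x N image via the separable DFT matrix (O(N^3))."""
    n = image.shape[0]
    k = np.arange(n)
    # W[u, x] = exp(-j * 2 * pi * u * x / N)
    W = np.exp(-2j * np.pi * np.outer(k, k) / n)
    # Transform along the rows index, then along the columns index.
    return W @ image @ W.T

rng = np.random.default_rng(0)
image = rng.random((8, 8))

spectrum = np.fft.fft2(image)           # fast Fourier transform of the same image
restored = np.fft.ifft2(spectrum).real  # the inverse transform recovers the pixels
```

The spectrum has the same size as the image, its zero-frequency term equals the sum of all pixel values, and applying the inverse transform returns the original image up to floating-point error.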

This article has attempted to list the different image pre-processing techniques. In the next articles, I will explain each technique with the maths and Python code, using OpenCV and neural networks.
