Distracted Driver Monitoring System Using AI

NARAYANA DARAPANENI Director AIML Great Learning/Northwestern University Illinois, USA
Suman Kumar Student- Great Learning PGP-AIML Pune, India
Neeraj Tripathi Student- Great Learning PGP-AIML Pune, India
Bhavik Parikh Student- Great Learning PGP-AIML Pune, India
Tejas Beedkar Student- Great Learning PGP-AIML Pune, India
Anwesh Reddy Paduri Senior Data Scientist-DSML Great Learning Hyderabad, India
Abstract

According to a study, driving while distracted accounted for more than 15% of fatalities in 2008 in the United States. In 65.5% of these cases, the driver was alone in the cab. As not all accidents can be monitored, the actual number of incidents and fatalities due to driver distraction may be significantly higher than the reported figures. Some vehicles now come equipped with advanced driver assist systems (ADAS) to provide automated safety. ADAS uses a combination of sensors such as LiDAR, IR cameras, radar, ultrasonic sensors and visual-spectrum cameras to perform object detection and build situational awareness around the vehicle. Based on this, the ADAS system assists the driver or takes emergency action independently to avert a collision. Owing to their cost, ADAS systems are available only in premium cars. This paper explores the design and development challenges of creating an inexpensive, modular solution that monitors drivers and provides an alert when prolonged distraction is detected. This is not a substitute for a commercial ADAS system but a step towards low-cost driver safety options. Because of its modularity and use of commodity-class hardware, the system should be easy to retrofit in any car at an affordable price.

I. Introduction

This paper focuses on solution options for driver image segmentation and detection of key distraction indicators. This involves a combination of artificial intelligence and traditional coding techniques. We propose a real-time monitoring system to classify a driver's distraction level. The key performance indicator is the model's accuracy in distinguishing safe driving from distracted driving behaviours.

The IEEE paper "Machine Learning and End-to-End Deep Learning for Monitoring Driver Distractions from Physiological and Visual Signals" [5] analyzes which ML methods perform best in detecting various driving distractions. The paper describes which sensors and data-capture methods were used, with a focus on:

1. Physiological sensors (palm electrodermal activity (pEDA), heart rate and breathing rate)

2. Video cameras (eye tracking, pupil diameter, nasal EDA (nEDA))

Figure 1 - Multi Modality DL fusion [5]


The statistical analysis showed that the most informative feature/modality for detecting driver distraction depends on the type of distraction. Overall, the video-based modalities were most informative and classical Machine Learning classifiers realized high performance using one of the video-based modalities. In contrast, the Deep Learning classifiers require more modalities (either all modalities or pre-selected modalities) for the construction of useful classifiers [5].

Using a pre-trained ImageNet model (VGG-16 architecture with transfer learning) and modifying the classifier for the task of distracted driver detection achieved an accuracy of 82.5% [6]. Other approaches [15] utilize combinations of pre-trained image classification models (CNNs), classical data augmentation, OpenCV-based image pre-processing [11][3], skin segmentation augmentation, VGG-16, GoogleNet, AlexNet, and ResNet. Experiments were conducted on an assisted-driving test bed to evaluate the trained models [8], achieving the following accuracies:

TABLE 1: COMPARISON OF VGG-16, GOOGLENET, ALEXNET, AND RESNET ACCURACIES


A system [9] for detecting driver distraction during daylight hours using machine vision techniques, based on image segmentation of the eyes and mouth from a front-face-view camera, achieved an accuracy of 90%. The decision concerning the state of the driver is produced by a multilayer perceptron-type neural network with all extracted features as inputs [7][9]. A solution consisting of a genetically weighted ensemble of convolutional neural networks is also available [10]. The convolutional neural networks [14] are trained on raw images, skin-segmented images, face images, hands images, and "face+hands" images. On those five image sources, training and benchmarking were done on an AlexNet network, an Inception V3 network, a 50-layer ResNet, and a VGG-16 network. A weighted sum of all network outputs, with weights found by a genetic algorithm, yielded the final class distribution [16], achieving an overall accuracy of 90%.

Figure 2 - Ensemble CNN Architecture [10]


The paper "Detection of Distracted Driver using Convolutional Neural Network" describes a CNN-based system to detect a distracted driver and identify the cause of distraction [4]. The VGG-16 architecture was modified for this task and several regularization techniques were applied to prevent over-fitting to the training data, achieving a classification accuracy of 95.54% while reducing the parameter count from 140M in the original VGG-16 to only 15M. This study provides a peer-reviewed benchmark against which to compare our model's performance. The original paper's accuracy scores are:

TABLE 2: CLASS-WISE ACCURACY FROM THE "DETECTION OF DISTRACTED DRIVER USING CONVOLUTIONAL NEURAL NETWORK" PAPER [4]


II. DATASETS AND IMAGE PRE-PROCESSING

We started the study with the State Farm Distracted Driver Detection dataset obtained from Kaggle. The dataset consists of images grouped into the following 10 classes:

c0: safe driving

c1: texting-right

c2: talking on the phone - right

c3: texting - left

c4: talking on the phone - left

c5: operating the radio

c6: drinking

c7: reaching behind

c8: hair and makeup

c9: talking to passenger

Figure 3 - Sample Images from State Farm Data Set


The above dataset was labelled by the authors with annotations (rectangular bounding boxes) using the LabelImg tool. About 500 images from each category in the above dataset were segmented with the following labels:

1: lh (left hand)

2: rh (right hand)

3: steer_lh (left hand on steering wheel)

4: steer_rh (right hand on steering wheel)

5: phone

6: phone_lh (phone in the left hand)

7: phone_rh (phone in the right hand)

8: cup

9: cup_lh (cup in the left hand)

10: cup_rh (cup in the right hand)

11: head_front

12: head_left

13: head_right

14: head_back

15: head_down
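To illustrate how such labels might feed a downstream verdict, the sketch below maps co-occurring labels to a coarse distraction class. This is our own hypothetical illustration, not the rule set actually used by the integrator described later in this paper.

```python
# Hypothetical sketch (illustration only, not this paper's actual rule set):
# map the custom labels detected in one frame to a coarse distraction verdict.
def classify_labels(labels):
    """Return a coarse verdict from the set of labels detected in a frame."""
    labels = set(labels)
    if labels & {"phone", "phone_lh", "phone_rh"}:
        return "phone use"
    if labels & {"cup", "cup_lh", "cup_rh"}:
        return "drinking"
    if labels & {"head_left", "head_right", "head_back", "head_down"}:
        return "head turned away"
    if {"steer_lh", "steer_rh", "head_front"} <= labels:
        return "safe driving"
    return "unknown"
```

For example, a frame labelled with both hands on the wheel and the head facing front would map to "safe driving", while any phone label dominates the verdict.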

The following figures illustrate the original images against the labelled images.

Figure 4 - Images labelled using LabelImg. The left-hand image has the State Farm dataset classification c5: operating the radio; our labels include 11: head_front, as highlighted. The right-hand image has the State Farm dataset classification c9: talking to passenger; our labels include 13: head_right, as highlighted.

III. MATERIALS AND METHODS

A. Exploratory Data Analysis (EDA)

We trained our solution models using the State Farm Distracted Driver Detection dataset from Kaggle. Properties of the dataset are:

TABLE 3: TRAINING DATASET PROPERTIES

The following were observed as the major sources of variance in the images:

1) Left hand vs right hand: In India, the steering wheel is on the right side of the car, whereas most images in the dataset show the steering wheel on the left. This can be corrected with a simple horizontal flip, as all training images need to match the Indian standard.

Figure 5 - Left hand to right hand image flip

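The flip itself is a one-line operation. A minimal NumPy sketch is shown below; `cv2.flip(img, 1)` would be the OpenCV equivalent.

```python
import numpy as np

# Left-hand-drive to right-hand-drive correction: mirror the image about
# its vertical axis so the steering wheel appears on the right side.
def flip_to_right_hand_drive(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W x C image horizontally."""
    return img[:, ::-1, :].copy()

# A 1x2 'image' with distinct pixels: the columns swap after the flip.
sample = np.array([[[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
flipped = flip_to_right_hand_drive(sample)
```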

2) Ethnicity and Gender of driver: The image set contains drivers of multiple ethnicities and genders. To generalize the model, variations in the driver's skin colour, hair style, clothing colour and style, and head gear need to be normalized.

Figure 6 - Ethnicity and Gender Variance

As can be seen, besides the sitting pose and human body form, the drivers in images 1, 2 and 3 above have no other properties in common.

3) Image Distortion: The location of the camera and driver in the vehicle is not consistent. As such the relative location of fixed components in the images varies.

Figure 7 - Camera / Driver variances


Figure 8 - Camera placement

Note: Experimentation revealed that inconsistent camera placement is the largest detractor from model accuracy.

4) Image Background: Several images in the dataset appear to be staged; the driver window is padded with a sheet to block out the background. This is not true in field applications, where the background is constantly changing.

5) Image Size: All images in the dataset are VGA (640x480 pixels). The small image size impacts the models, as fewer pixels capture the objects of interest.

6) Image Colour Distribution: The colour histograms of a random sample of images indicate a tendency towards saturated colours, as the histograms are skewed to the edges.

Figure 9 - Colour Distribution

All images need to be normalized before training the models.
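One way to quantify this edge skew is to measure the fraction of pixel mass falling in the histogram's extreme bins. The metric below is a sketch under our assumptions, not the exact diagnostic used in this study.

```python
import numpy as np

# Saturation check: a high fraction of pixels in the darkest/brightest
# bins suggests the edge-skewed histograms observed in the dataset.
def edge_mass_fraction(img: np.ndarray, tail: int = 16) -> float:
    """Fraction of pixels whose intensity lies within `tail` of 0 or 255."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    edge = hist[:tail].sum() + hist[256 - tail:].sum()
    return float(edge) / hist.sum()

dark = np.zeros((8, 8), dtype=np.uint8)      # fully saturated at the low end
mid = np.full((8, 8), 128, dtype=np.uint8)   # well-exposed midtones
```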

Figure 10 - Image Normalization

Our solution used CLAHE because of the nature of the images: a car cabin has high contrast between bright outside light and a dim interior.

7) Time of Day: All training images were captured during daytime. To make the solution independent of lighting conditions, an infrared-spectrum camera with infrared illumination is needed.

B. Methodology

Initial architecture and design were validated by Proof of Concept (POC). Based on the lessons learned from POC iterations, the architecture has been modified.

Figure 11 - Final Architecture


The solution architecture comprises the following layers:

1) Input Layer: This layer acquires images from different sources, such as a camera mounted on the dashboard of the vehicle. Its building blocks are: Camera, Stored Video, Image Database.

2) Pre-Processing and Detection Layer:

Image Pre-processing: Apply filters such as CLAHE and blurring, flip the image based on left/right-hand drive, and convert images to tensors for the models.

Dirty/Blocked Lens: To identify the exposure level of the image, Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to adjust the exposure levels. For examining blurriness, a common method is the Fast Fourier Transform.
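A minimal sketch of an FFT-based blurriness check follows. We assume a simple high-frequency-energy ratio as the metric; the exact metric is an implementation choice and a blurred or blocked lens yields a low ratio.

```python
import numpy as np

# FFT blur check: a sharp image keeps substantial spectral energy away
# from the centre (high spatial frequencies) after fftshift; a blurred
# or featureless one concentrates energy in the low-frequency centre.
def high_freq_ratio(gray: np.ndarray, radius: int = 4) -> float:
    """Share of spectral energy outside a small low-frequency disc."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(np.float64))))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    total = spec.sum()
    return float(spec[~low].sum() / total) if total else 0.0

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (32, 32)).astype(np.float64)  # noisy = detailed
flat = np.full((32, 32), 128.0)                            # no detail at all
```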

Key points Detection: For key points detection, the CenterNet HourGlass104 Keypoints 512x512 model was used.

Custom Object Detection: Retraining the SSD MobileNet V2 FPNLite 640x640 (COCO17) model on the custom objects yields images segmented into the custom classes shown above.

3) Integrator: The integrator overlays the images obtained from the following components: F2.2 (Is the camera blocked?), F3.1 (Key points detection), and F3.2 (Image Class Diductor).

4) Face Redaction: Certain applications require that the privacy of the user be protected. In this case, the user's face is redacted by pixelating the box that contains any one of the five head-detection classes.
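The pixelation step can be sketched as block-averaging within the detected box. The box coordinates below are illustrative; in the pipeline they would come from one of the head-detection classes.

```python
import numpy as np

# Redact a region by pixelation: replace each block-sized tile inside the
# bounding box with its mean value, destroying identifiable detail.
def pixelate_box(img, x0, y0, x1, y1, block=8):
    """Return a copy of img with img[y0:y1, x0:x1] block-averaged."""
    out = img.copy()
    roi = out[y0:y1, x0:x1].astype(np.float64)
    h, w = roi.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = roi[y:y + block, x:x + block]
            tile[...] = tile.mean(axis=(0, 1))
    out[y0:y1, x0:x1] = roi.astype(img.dtype)
    return out

gradient = np.tile(np.arange(16, dtype=np.uint8), (16, 1))  # 16x16 ramp
redacted = pixelate_box(gradient, 0, 0, 16, 16, block=16)   # one big block
```

With a single block covering the whole box, every pixel collapses to the region mean, which is the strongest redaction; smaller blocks trade privacy for context.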

Figure 12 - Point Detection using Centernet


Figure 13 - Visualizations captured using Retrained Resnet


Figure 14 - Merged image from Integrator


Figure 15 - Image with head redacted


Figure 16 - Steer and Head Area Self Training Detection
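The abstract calls for an alert when prolonged distraction is detected. The paper does not fix the exact rule, so the following is a hedged sketch assuming a sliding-window vote over the integrator's per-frame verdicts; the window size and threshold are illustrative parameters.

```python
from collections import deque

# Assumed alert rule (illustrative): fire once most frames in a recent
# sliding window are classified as distracted, suppressing one-off blips.
class DistractionAlert:
    def __init__(self, window: int = 30, threshold: float = 0.8):
        self.frames = deque(maxlen=window)  # recent per-frame booleans
        self.threshold = threshold          # fraction required to alert

    def update(self, distracted: bool) -> bool:
        """Feed one frame's verdict; return True when an alert should fire."""
        self.frames.append(bool(distracted))
        if len(self.frames) < self.frames.maxlen:
            return False                    # not enough history yet
        return sum(self.frames) / len(self.frames) >= self.threshold
```

A momentary glance away therefore never trips the alarm; only sustained distraction across most of the window does.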

IV. DISCUSSION AND CONCLUSIONS

Assumptions and Limitations

The model may not detect driver distraction at night because the dataset images were captured in daylight.

The dataset is from the pre-pandemic era; it contains no images of drivers wearing masks, which can lower the model's generalization to current drivers.

The model depends on the camera position and on the make of the car and dashboard.

Implications

Higher model accuracy can be obtained by using a consistent camera location in the car.

Image pre-processing can be integrated into the camera itself. If an infrared-spectrum camera with infrared illumination is used, the image will be environmentally agnostic and monochrome.

Using an object detection model is not recommended for future iterations. An image segmentation model that relies on the known location of the camera and the vehicle interior can easily separate out the driver.

Figure 17 - Segmentation Flow


Driver distraction detection solutions with the following features can be implemented using today's commercially available tools and techniques:

1. Embeddable solutions using low power and small form factor devices such as the Jetson Nano.

2. Driver segmentation, detection and classification using open-source tools such as the TensorFlow Model Garden.

In conclusion, it is possible to move the computational overhead of driver detection from DNN object detection models to the physical parameters of the system components (e.g. car make, camera position) and simple mathematics, such as subtracting the vehicle body from the image. This allows for a very small-footprint solution for driver detection and distraction monitoring in commercial applications.

REFERENCES

  1. Wilson, F. A., & Stimpson, J. P. (2010). "Trends in fatalities from distracted driving in the United States, 1999 to 2008." American Journal of Public Health, 100(11), 2213-2219.
  2. Smirnov, A., & Lashkov, I. (n.d.). "State-of-the-art analysis of available advanced driver assistance systems." from E-werest.org website: https://ewerest.org/sites/default/files/files/conference17/AdvancedDriverAssistance.pdf
  3. N. Darapaneni, B. Krishnamurthy, and A. R. Paduri, "Convolution Neural Networks: A Comparative Study for Image Classification," in 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020, pp. 327-332.
  4. Baheti, B., Gajre, S., & Talbar, S. (2018). "Detection of distracted driver using convolutional neural network." Retrieved July 9, 2021, from Thecvf.com website: https://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w14/Baheti_Detection_of_Distracted_CVPR_2018_paper.pdf
  5. Gjoreski, M., Gams, M. Z., Lustrek, M., Genc, P., Garbas, J.-U., & Hassan, T. (2020). "Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals." IEEE Access: Practical Innovations, Open Solutions, 8, 70590-70603.
  6. Oberoi, M., Panchal, H., & Jain, Y. (2013). "Driver Distraction Detection using Transfer Learning." Retrieved July 9, 2021, from Ijert.org website: https://www.ijert.org/research/driver-distractiondetection-using-transfer-learning-IDERTV9IS050862.pdf
  7. N. Darapaneni et al., "Automatic face detection and recognition for attendance maintenance," in 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020, pp. 236-241.
  8. (N.d.). Retrieved July 9, 2021, from Researchgate.net website: "Real-time Detection of Distracted Driving based on Deep Learning." https://www.researchgate.net/profile/Ha-Do-10/publication/326740203_Realtime_Detection_of_Distracted_Driving_based_on_Deep_Learning/links/5ba96a41a6fdccd3cb70a927/Real-time-Detection-of-Distracted-Driving-based-on-Deep-Learning.pdf
  9. Jiménez Moreno, R., Avilés Sánchez, O., & Amaya Hurtado, D. (2014). "Driver distraction detection using machine vision techniques." Ingeniería y Competitividad, 16(2), 55-63.
  10. Eraqi, H. M., Abouelnaga, Y., Saad, M. H., & Moustafa, M. N. (2019). "Driver distraction identification with an ensemble of convolutional neural networks." Journal of Advanced Transportation, 2019, 1-12.
  11. N. Darapaneni, R. Choubey, P. Salvi, A. Pathak, S. Suryavanshi, and A. R. Paduri, "Facial expression recognition and recommendations using deep neural network with transfer learning," in 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2020, pp. 0668-0673.
  12. Gjoreski, M., Gams, M. Z., Lustrek, M., Genc, P., Garbas, J.-U., & Hassan, T. (2020). "Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals." IEEE Access: Practical Innovations, Open Solutions, 8, 70590-70603.
  13. Jain, D. K., Jain, R., Lan, X., Upadhyay, Y., & Thareja, A. (2021). "Driver distraction detection using capsule network." Neural Computing & Applications, 33(11), 6183-6196.
  14. Kim, W., Jung, W.-S., & Choi, H. K. (2019). "Lightweight driver monitoring system based on multi-Task Mobilenets." Sensors (Basel, Switzerland), 19(14), 3200.
  15. Mofid, N., Bayrooti, J., & Ravi, S. (2020). "Keep your Al-es on the road: Tackling distracted driver detection with convolutional neural networks and targeted data augmentation." Retrieved from http://arxiv.org/abs/2006.10955
  16. Alkinani, M. H., Khan, W. Z., & Arshad, Q. (2020). "Detecting human driver inattentive and aggressive driving behavior using deep learning: Recent advances, requirements and open challenges." IEEE Access: Practical Innovations, Open Solutions, 8, 105008-105030.
  17. N. Darapaneni et al., "Activity & emotion detection of recognized kids in CCTV video for day care using SlowFast & CNN," in 2021 IEEE World AI IoT Congress (AIIoT), 2021, pp. 0268-0274.
  18. N. Darapaneni et al., "Computer vision based license plate detection for automated vehicle parking management system," in 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2020, pp. 0800-0805.
  19. N. Darapaneni et al., "Autonomous car driving using deep learning," in 2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC), 2021, pp. 29-33.