Distracted Driver Monitoring System Using AI
According to a study, driving while distracted accounted for more than 15% of fatalities in 2008 in the United States. In 65.5% of these cases, the driver was alone in the cab. Because not all accidents can be monitored, the actual number of incidents and fatalities due to driver distraction may be significantly higher than the recorded figures. Some vehicles now come equipped with advanced driver assistance systems (ADAS) to provide automated safety. ADAS uses a combination of sensors such as LiDAR, infrared cameras, radar, ultrasonic sensors, and visual-spectrum cameras to perform object detection and gain situational awareness of the vehicle. Based on this, the ADAS assists the driver or can take emergency action independently to avert a collision. Owing to their cost, ADAS systems are available only in premium cars. This paper explores the design and development challenges of creating an inexpensive, modular solution to monitor drivers and raise an alert when prolonged distraction is detected. It is not a substitute for a commercial ADAS but a step towards low-cost driver safety options. Because of its modularity and use of commodity-class hardware, the system should be easy to retrofit in any car at an affordable price.
This paper focuses on solution options for driver image segmentation and detection of key distraction indicators. This involves a combination of artificial intelligence and traditional coding techniques. We propose a real-time monitoring system to classify a driver's distraction level. The key performance indicator is the model's accuracy in distinguishing safe driving from distracted driving behaviours.
The IEEE paper "Machine Learning and End-to-End Deep Learning for Monitoring Driver Distractions from Physiological and Visual Signals" [5] analyzes which ML methods perform best in detecting various driving distractions. The paper describes which sensors and data-capture methods were used, with a focus on:
1. Physiological sensors (palm electrodermal activity (pEDA), heart rate and breathing rate)
2. Video cameras (eye tracking, pupil diameter, nasal EDA (nEDA))
Figure 1 - Multi Modality DL fusion [5]
The statistical analysis showed that the most informative feature/modality for detecting driver distraction depends on the type of distraction. Overall, the video-based modalities were most informative and classical Machine Learning classifiers realized high performance using one of the video-based modalities. In contrast, the Deep Learning classifiers require more modalities (either all modalities or pre-selected modalities) for the construction of useful classifiers [5].
Using a pre-trained ImageNet model (the VGG-16 architecture, applying transfer learning) and modifying the classifier for the task of distracted driver detection achieved an accuracy of 82.5% [6]. Other approaches [15] use combinations of pre-trained image classification models (CNNs), classical data augmentation, OpenCV-based image pre-processing [11][3], skin-segmentation augmentation, VGG-16, GoogLeNet, AlexNet, and ResNet. Experiments were conducted on an assisted-driving test bed to evaluate the trained models [8], achieving the following accuracies:
TABLE 1: COMPARISON OF VGG-16, GOOGLENET, ALEXNET, AND RESNET ACCURACIES
Systems [9] for detecting driver distraction during daylight hours using machine vision techniques, based on image segmentation of the eyes and mouth captured by a front-facing camera, have achieved accuracies of 90%. The decision concerning the state of the driver is the output of a multilayer perceptron-type neural network with all extracted features as inputs [7][9]. A solution consisting of a genetically weighted ensemble of convolutional neural networks is also available [10]. The convolutional neural networks [14] are trained on raw images, skin-segmented images, face images, hands images, and combined face-and-hands images. On these five image sources, training and benchmarking were done on an AlexNet network, an Inception V3 network, a 50-layer ResNet, and a VGG-16 network. A weighted sum of all network outputs, with weights found by a genetic algorithm, yielded the final class distribution [16], achieving an overall accuracy of 90%.
Figure 2 - Ensemble CNN Architecture [10]
The paper "Detection of Distracted Driver using Convolutional Neural Network" describes a CNN-based system to detect a distracted driver and identify the cause of distraction [4]. The VGG-16 architecture was modified for this task and several regularization techniques were applied to prevent over-fitting to the training data, achieving a classification accuracy of 95.54% while reducing the parameter count from 140M in the original VGG-16 to only 15M. This study provides a peer-reviewed benchmark against which to compare our model's performance. The original paper's accuracy scores are:
TABLE 2: CLASS-WISE ACCURACY FROM THE "DETECTION OF DISTRACTED DRIVER USING CONVOLUTIONAL NEURAL NETWORK" PAPER [4]
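The parameter figures quoted above can be sanity-checked from the standard VGG-16 layer shapes: nearly all of the ~140M parameters sit in the three fully connected layers, so replacing that classifier head leaves roughly the 15M convolutional parameters. A short back-of-envelope computation (the modified head in [4] may add a small number of parameters on top of this):

```python
# Parameter counts for the standard VGG-16 layer shapes.
# A conv layer with k x k kernels, c_in inputs and c_out outputs
# has k*k*c_in*c_out + c_out parameters (weights + biases).
def conv(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out

def dense(n_in, n_out):
    return n_in * n_out + n_out

conv_params = sum([
    conv(3, 64), conv(64, 64),                       # block 1
    conv(64, 128), conv(128, 128),                   # block 2
    conv(128, 256), conv(256, 256), conv(256, 256),  # block 3
    conv(256, 512), conv(512, 512), conv(512, 512),  # block 4
    conv(512, 512), conv(512, 512), conv(512, 512),  # block 5
])
# Original classifier head: two 4096-unit layers plus 1000-way output.
fc_params = dense(7 * 7 * 512, 4096) + dense(4096, 4096) + dense(4096, 1000)

print(conv_params)               # 14,714,688: the "only 15M" that remain
print(conv_params + fc_params)   # 138,357,544: the "140M" of the original
```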
We started the study with the State Farm Distracted Driver Detection dataset obtained from Kaggle. The dataset consists of images grouped into the following 10 classes:
c0: safe driving
c1: texting - right
c2: talking on the phone - right
c3: texting - left
c4: talking on the phone - left
c5: operating the radio
c6: drinking
c7: reaching behind
c8: hair and makeup
c9: talking to passenger
Figure 3 - Sample Images from State Farm Data Set
The above dataset was annotated by the authors with rectangular bounding boxes using the LabelImg tool. About 500 images from each category in the above dataset were segmented with the following labels:
1: lh (left hand)
2: rh (right hand)
3: steer_lh (left hand on steering wheel)
4: steer_rh (right hand on steering wheel)
5: phone
6: phone_lh (phone in the left hand)
7: phone_rh (phone in the right hand)
8: cup
9: cup_lh (cup in the left hand)
10: cup_rh (cup in the right hand)
11: head_front
12: head_left
13: head_right
14: head_back
15: head_down
The following figures illustrate the original images against the labelled images.
Figure 4 - Images labelled using LabelImg
The left-hand image has the State Farm dataset classification c5: operating the radio; our labels include 11: head_front, as highlighted. The right-hand image has the State Farm dataset classification c9: talking to passenger; our labels include 13: head_right, as highlighted.
A. Exploratory Data Analysis (EDA)
We trained our solution models using the Kaggle State Farm Distracted Driver Detection dataset. Properties of the dataset are:
TABLE 3: TRAINING DATASET PROPERTIES
The following were observed as the major sources of variance in the images:
1) Left-hand vs right-hand drive: In India, the steering wheel is on the right side of the car. Most of the images in the dataset show the steering wheel on the left side. This can be corrected with a simple horizontal flip, as all training images need to match the Indian standard.
Figure 5 - Left hand to right hand image flip
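The flip described above must also mirror the bounding-box x-coordinates and swap the left/right label names from our annotation scheme. A minimal NumPy sketch (the label pairs follow our label list; box format is illustrative):

```python
import numpy as np

# Left<->right label pairs from our annotation scheme.
SWAP = {"lh": "rh", "rh": "lh",
        "steer_lh": "steer_rh", "steer_rh": "steer_lh",
        "phone_lh": "phone_rh", "phone_rh": "phone_lh",
        "cup_lh": "cup_rh", "cup_rh": "cup_lh",
        "head_left": "head_right", "head_right": "head_left"}

def flip_sample(img, boxes):
    """Mirror an image horizontally; boxes are (label, xmin, ymin, xmax, ymax)."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()                     # reverse the column axis
    new_boxes = [(SWAP.get(lab, lab), w - xmax, ymin, w - xmin, ymax)
                 for (lab, xmin, ymin, xmax, ymax) in boxes]
    return flipped, new_boxes
```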
2) Ethnicity and gender of driver: The image set contains drivers of multiple ethnicities and genders. To generalize the model, variations in the driver's skin colour, hair style, colour and style of clothing, and head gear need to be normalized.
Figure 6 - Ethnicity and Gender Variance
As we can see, besides the sitting pose and human body form, the drivers in images 1, 2, and 3 above have no other properties in common.
3) Image distortion: The location of the camera and the driver in the vehicle is not consistent, so the relative location of fixed components in the images varies.
Figure 7 - Camera / Driver variances
Figure 8 - Camera placement
Note - Experimentation has revealed that inconsistent camera placement is the largest detractor from model accuracy.
4) Image background: Several images in the dataset appear to be staged; the driver's window is padded with a sheet to block out the background. This is not true of field applications, where the background is constantly changing.
5) Image size: All images in the dataset are VGA (640x480 pixels). The low resolution impacts the models, as fewer pixels capture the objects of interest.
6) Image colour distribution: The colour histograms of a random sample of images indicate a tendency toward saturated colours, as they are skewed to the edges.
Figure 9 - Colour Distribution
All images need to be normalized before training the models.
Figure 10 - Image Normalization
Our solution used CLAHE because of the nature of the images: a car cabin has high contrast between bright outside light and a dim interior.
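In our pipeline, CLAHE is applied via OpenCV's `cv2.createCLAHE`. As a dependency-free illustration of the underlying idea, the sketch below performs plain global histogram equalization of a greyscale image; CLAHE additionally tiles the image and clips each tile's histogram, which is what makes it suitable for high-contrast cabin scenes:

```python
import numpy as np

def equalize(gray):
    """Global histogram equalization of a uint8 greyscale image:
    map each grey level through the normalized cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                      # first occupied grey level
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[gray]
```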
7) Time of day: All training images were captured during daytime. To make the solution lighting- and weather-independent, an infrared-spectrum camera with infrared illumination is needed.
B. Methodology
Initial architecture and design were validated by a Proof of Concept (POC). Based on lessons learned from POC iterations, the architecture was modified.
Figure 11 - Final Architecture
The solution architecture comprises the following layers:
1) Input Layer: The objective of this layer is to take images from different sources, such as a camera mounted on the dashboard of the vehicle. The building blocks of this layer are: Camera, Stored Video, Image Database.
2) Pre-Processing and Detection Layer:
Image pre-processing: apply filters such as CLAHE and blurring; flip the image based on left/right-hand drive; convert images to tensors for the models.
Dirty/blocked lens: To identify the exposure level of the image, Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to adjust the exposure. To examine blurriness, a common method is the Fast Fourier Transform.
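The FFT-based blurriness check can be sketched as follows: suppress the low-frequency centre of the shifted spectrum and measure the remaining high-frequency energy, which drops when the lens is dirty or the image is blurred. The cutoff (and any decision threshold built on the score) are tuning parameters, not values from our deployment:

```python
import numpy as np

def sharpness_score(gray, cutoff=8):
    """Mean log-magnitude of the spectrum outside a central low-frequency
    window. Blurred images score lower than sharp ones."""
    f = np.fft.fftshift(np.fft.fft2(gray))          # centre the zero frequency
    cy, cx = gray.shape[0] // 2, gray.shape[1] // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0  # drop low freqs
    return float(np.mean(np.log1p(np.abs(f))))
```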
Key-point detection: the CenterNet HourGlass104 Keypoints 512x512 model was used.
Custom object detection: retraining the SSD MobileNet V2 FPNLite 640x640 (COCO17) model for custom objects yields images segmented into the custom classes shown above.
3) Integrator: The integrator overlays the images obtained from the following components: F2.2 Is the camera blocked?, F3.1 Key-point detection, F3.2 Image Class Deductor.
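The decision stage over the integrator's outputs can be a simple rule layer that maps the detected classes to a distraction verdict. A hypothetical sketch using our label scheme (the specific rules and the 0.5 confidence floor are illustrative, not our tuned values):

```python
# Labels from our annotation scheme that indicate distraction on their own.
DISTRACTED_LABELS = {"phone_lh", "phone_rh", "cup_lh", "cup_rh",
                     "head_left", "head_right", "head_back", "head_down"}

def is_distracted(detected_labels, scores, min_score=0.5):
    """True if any sufficiently confident detection indicates distraction,
    or if neither hand is detected on the steering wheel."""
    confident = {lab for lab, s in zip(detected_labels, scores)
                 if s >= min_score}
    hands_on_wheel = {"steer_lh", "steer_rh"} & confident
    return bool(confident & DISTRACTED_LABELS) or not hands_on_wheel
```

In a deployed system this rule layer would run per frame, with the "prolonged distraction" alert raised only after the verdict persists across consecutive frames.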
4) Face redaction: Certain applications require that the privacy of the user be protected. In this case we need to redact the user's face. This is done by pixelating the box containing any one of the five head-detection classes.
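The pixelation itself is block-averaging over the detected head box. A minimal NumPy sketch (the block size of 8 is illustrative):

```python
import numpy as np

def pixelate(img, box, block=8):
    """Redact box=(x0, y0, x1, y1) by replacing each block x block tile
    inside it with the tile's mean colour."""
    x0, y0, x1, y1 = box
    out = img.copy()
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            tile = out[by:min(by + block, y1), bx:min(bx + block, x1)]
            tile[...] = tile.mean(axis=(0, 1))   # flatten the tile to its mean
    return out
```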
Figure 12 - Point Detection using Centernet
Figure 13 - Visualizations captured using Retrained Resnet
Figure 14 - Merged image from Integrator
Figure 15 - Image with head redacted
Figure 16 – Steer and Head Area Self Training Detection
Assumptions and Limitations
The model may not detect driver distractions at night, because the images in the dataset were captured in daylight.
The dataset is from the pre-pandemic era; it contains no images of drivers wearing masks, which can reduce the model's generalization to current drivers.
The model is dependent on the camera position and make of the car and dashboard.
Implications
Higher model accuracy can be obtained by mounting the camera in a consistent location in the car.
Image pre-processing can be integrated into the camera hardware. If an infrared-spectrum camera with infrared illumination is used, the image will be environmentally agnostic and monochrome.
Using an object detection model is not recommended in future iterations. An image segmentation model that relies on the known location of the camera and vehicle interior can easily separate out the driver.
Figure 17 - Segmentation Flow
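With a fixed camera and known vehicle interior, the segmentation above can be as simple as differencing each frame against a stored reference image of the empty cabin. A hedged sketch of the idea (the threshold of 30 grey levels is illustrative; a real system would also clean the mask morphologically):

```python
import numpy as np

def driver_mask(frame, empty_cabin, thresh=30):
    """Boolean mask of pixels that differ from the stored empty-cabin
    reference by more than thresh. With a fixed camera this isolates
    the driver region without running a DNN object detector."""
    diff = np.abs(frame.astype(np.int16) - empty_cabin.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=2)   # a change in any colour channel counts
    return diff > thresh
```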
Driver distraction detection solutions can be implemented with the following features using today's commercially available tools and techniques:
1. Embeddable solutions using low power and small form factor devices such as the Jetson Nano.
2. Driver segmentation, detection and classification using open-source (though not all freely licensable) tools such as the TensorFlow Model Garden.
In conclusion, it is possible to move the computational overhead of driver detection from DNN object-detection models to the physical parameters of the system components (e.g. car make, camera position) and simple arithmetic, such as subtracting the vehicle body from the image. This allows a very small-footprint solution for driver detection and distraction monitoring in commercial applications.
REFERENCES
- Wilson, F. A., & Stimpson, J. P. (2010). "Trends in fatalities from distracted driving in the United States, 1999 to 2008." American Journal of Public Health, 100(11), 2213-2219.
- Smirnov, A., & Lashkov, I. (n.d.). "State-of-the-art analysis of available advanced driver assistance systems." from E-werest.org website: https://ewerest.org/sites/default/files/files/conference17/AdvancedDriverAssistance.pdf
- N. Darapaneni, B. Krishnamurthy, and A. R. Paduri, "Convolution Neural Networks: A Comparative Study for Image Classification," in 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020, pp. 327-332.
- Baheti, B., Gajre, S., & Talbar, S. (2018). "Detection of distracted driver using convolutional neural network." Retrieved July 9, 2021, from Thecvf.com website: https://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w14/Baheti_Detection_of_Distracted_CVPR_2018_paper.pdf
- Gjoreski, M., Gams, M. Z., Lustrek, M., Genc, P., Garbas, J.-U., & Hassan, T. (2020). "Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals." IEEE Access: Practical Innovations, Open Solutions, 8, 70590-70603.
- Oberoi, M., Panchal, H., & Jain, Y. (2013). "Driver Distraction Detection using Transfer Learning." Retrieved July 9, 2021, from Ijert.org website: https://www.ijert.org/research/driver-distractiondetection-using-transfer-learning-IDERTV9IS050862.pdf
- N. Darapaneni et al., "Automatic face detection and recognition for attendance maintenance," in 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020, pp. 236-241.
- "Real-time Detection of Distracted Driving based on Deep Learning." (n.d.). Retrieved July 9, 2021, from Researchgate.net website: https://www.researchgate.net/profile/Ha-Do-10/publication/326740203_Realtime_Detection_of_Distracted_Driving_based_on_Deep_Learning/links/5ba96a41a6fdccd3cb70a927/Real-time-Detection-of-Distracted-Driving-based-on-Deep-Learning.pdf
- Jiménez Moreno, R., Avilés Sánchez, O., & Amaya Hurtado, D. (2014). "Driver distraction detection using machine vision techniques." Ingeniería y Competitividad, 16(2), 55-63.
- Eraqi, H. M., Abouelnaga, Y., Saad, M. H., & Moustafa, M. N. (2019). "Driver distraction identification with an ensemble of convolutional neural networks." Journal of Advanced Transportation, 2019, 1-12.
- N. Darapaneni, R. Choubey, P. Salvi, A. Pathak, S. Suryavanshi, and A. R. Paduri, "Facial expression recognition and recommendations using deep neural network with transfer learning," in 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2020, pp. 0668-0673.
- Gjoreski, M., Gams, M. Z., Lustrek, M., Genc, P., Garbas, J.-U., & Hassan, T. (2020). "Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals." IEEE Access: Practical Innovations, Open Solutions, 8, 70590-70603.
- Jain, D. K., Jain, R., Lan, X., Upadhyay, Y., & Thareja, A. (2021). "Driver distraction detection using capsule network." Neural Computing & Applications, 33(11), 6183-6196.
- Kim, W., Jung, W.-S., & Choi, H. K. (2019). "Lightweight driver monitoring system based on multi-Task Mobilenets." Sensors (Basel, Switzerland), 19(14), 3200.
- Mofid, N., Bayrooti, J., & Ravi, S. (2020). "Keep your AI-es on the road: Tackling distracted driver detection with convolutional neural networks and targeted data augmentation." Retrieved from http://arxiv.org/abs/2006.10955
- Alkinani, M. H., Khan, W. Z., & Arshad, Q. (2020). "Detecting human driver inattentive and aggressive driving behavior using deep learning: Recent advances, requirements and open challenges." IEEE Access: Practical Innovations, Open Solutions, 8, 105008-105030.
- N. Darapaneni et al., "Activity & emotion detection of recognized kids in CCTV video for day care using SlowFast & CNN," in 2021 IEEE World AI IoT Congress (AIIoT), 2021, pp. 0268-0274.
- N. Darapaneni et al., "Computer vision based license plate detection for automated vehicle parking management system," in 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2020, pp. 0800-0805.
- N. Darapaneni et al., "Autonomous car driving using deep learning," in 2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC), 2021, pp. 29-33.