Deep learning stems from neural-network-based artificial intelligence. Neural networks transformed the way algorithms learn, thanks to their efficient feature extraction capabilities. The deep learning approach has gained tremendous momentum in the past decade for the following reasons:
- Large, high-quality datasets
- A huge rise in computational power
The best time to learn deep learning was 20 years ago; the second best time is now. Computational power has been democratised: Google Colab lends you a GPU for up to 12 hours at a stretch at no cost. Anyone interested in implementing models can find datasets online, upload them to their Google Drive accounts, and experiment with them through Colab notebooks.
Kaggle also provides free kernels with GPUs. The beauty lies in the algorithms and in learning the intricacies of how they work. Material available on the web is scattered, so joining a course or academy that teaches these concepts from the basics helps reduce the chaos.
In this article, we will look at commonly used computer vision datasets from the standpoint of their deep learning applications.
3D Vision Research
3D60 Dataset: This dataset provides synthetic and real scanned images of interior spaces with densely annotated spherical panoramas.
Voice Operated Character Animation (VOCA): This dataset was created to support human-like performance in audio-driven 3D facial animation. It is a 4D face dataset with 29 minutes of 4D scans captured at 60 fps, with synchronized audio from 12 speakers.
Autonomous Driving
There are several datasets available for building solutions for autonomous vehicles. Note that the datasets in this article may fall into more than one category, so use your imagination to the fullest when playing with them.
Interaction Dataset: The INTERACTION dataset contains naturalistic motions of traffic participants in various highly interactive driving scenarios, captured using drones and traffic cameras in several countries, including the US, Germany, and China.
The dataset can be applied to many behaviour-related research areas, such as:
- Intention/behaviour/motion prediction
- Behaviour cloning and imitation learning
- Behaviour analysis and modeling
- Motion pattern and representation learning
- Interactive behaviour extraction and categorization
- Social and human-like behaviour generation
- Decision-making and planning algorithm development and verification
- Driving scenario/case generation
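The first item above, motion prediction, can be illustrated with the simplest possible baseline: extrapolating each trajectory at constant velocity. This is a generic sketch, not the INTERACTION benchmark's reference method, and the toy trajectory below is invented for illustration.

```python
import numpy as np

def constant_velocity_predict(track, horizon):
    """Extrapolate future (x, y) positions from the last observed
    velocity: a standard baseline for motion prediction."""
    track = np.asarray(track, dtype=float)      # shape (T, 2)
    velocity = track[-1] - track[-2]            # displacement per time step
    steps = np.arange(1, horizon + 1)[:, None]  # (horizon, 1)
    return track[-1] + steps * velocity         # (horizon, 2)

# Toy trajectory moving one unit per step along x:
observed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
future = constant_velocity_predict(observed, horizon=3)
```

Any learned predictor should comfortably beat this baseline on interactive scenarios, which is precisely what makes such datasets interesting.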
AEV Autonomous Driving Dataset (A2D2): An open multi-sensor dataset for autonomous driving research. It includes more than 40,000 frames with semantic segmentation and point cloud labels, of which more than 12,000 frames also carry 3D bounding box annotations.
Computational Photography
Computational photography is the use of computer processing in cameras to produce an enhanced image beyond what the lens and sensor pick up in a single shot.
Multiple Light Source Dataset: The dataset contains realistic scenarios for the evaluation of computational color constancy algorithms. At the same time, it aims to make the data as general as possible for various computer vision use cases.
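To make "computational color constancy" concrete, here is a minimal sketch of the classic gray-world algorithm, which assumes the average color of a scene is gray and rescales channels accordingly. It is a textbook baseline, not the evaluation method tied to this dataset, and the toy image is invented.

```python
import numpy as np

def gray_world(image):
    """Gray-world white balance: scale each RGB channel so its mean
    matches the global mean, a classic color-constancy baseline."""
    image = image.astype(float)
    channel_means = image.reshape(-1, 3).mean(axis=0)  # per-channel mean
    gains = channel_means.mean() / channel_means       # per-channel scale
    return np.clip(image * gains, 0.0, 255.0)

# A flat gray patch distorted by a strong blue cast:
img = np.zeros((2, 2, 3)) + [100.0, 100.0, 200.0]
balanced = gray_world(img)  # all channels pulled back to equal means
```

Scenes with multiple light sources are exactly where this single global assumption breaks down, which is what the dataset is designed to probe.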
Generated Faces Dataset: A dataset of AI-generated faces, created to remove the copyright hindrances that arise when using photographs of real people.
Anime Face Dataset: A dataset of 63,632 high-quality anime faces scraped from www.getchu.com and cropped using the anime face detection algorithm at https://github.com/nagadomi/lbpcascade_animeface. Image sizes vary from 90 × 90 to 120 × 120 pixels.
Human Pose Estimation
Many applications use human posture to infer useful information. Consider, for example, an app that teaches you yoga: it needs to understand the correct form of each posture, teach it to you, and correct you when needed.
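A pose-correction feature like the one just described can be reduced to comparing joint angles computed from detected keypoints. The sketch below assumes 2D keypoints are already available from some pose estimator; the coordinates are invented.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by keypoints a-b-c,
    e.g. hip-knee-ankle when checking whether a leg is straight."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# A perfectly straight leg: hip, knee, ankle on one vertical line.
angle = joint_angle((0, 0), (0, 1), (0, 2))  # close to 180 degrees
```

An app could then flag a pose as incorrect when the measured angle deviates from the target by more than some tolerance.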
SURREAL Dataset: The first large-scale person dataset to provide depth, body parts, optical flow, and 2D/3D pose ground truth for RGB video input. It contains 6M frames of synthetic humans; the images are photo-realistic renderings of people under large variations in shape, texture, viewpoint, and pose.
LSUN (Large-Scale Scene Understanding): A dataset built to measure and accelerate progress in scene understanding, which includes scene classification, room layout estimation, etc.
MNIST: A dataset suitable for beginners in computer vision. It has 10 classes, the digits 0-9. Keras has this dataset built in, and many examples are available on the net.
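As a quick start, the built-in loader and a tiny dense classifier might look like the following; the architecture and hyperparameters are illustrative choices, not a recommended recipe.

```python
# Minimal MNIST sketch using the dataset built into Keras.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0  # flatten 28x28
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # one unit per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
```

Even a small network like this typically reaches over 90% test accuracy after a single epoch, which is why MNIST is the usual first stop for beginners.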
YouTube-8M: A large-scale video dataset announced by Google in September 2016, suitable for various computer vision tasks such as video classification and event detection. The video labels are organized into 24 top-level verticals.
ImageNet: The ImageNet challenge marked the breakthrough of deep learning algorithms in computer vision; this is the dataset the challenge was based on.
Object Detection
MinneApple: This dataset was created to accurately identify the boundaries of apples for fruit-picking robots. It enables direct comparisons by providing a large variety of high-resolution images acquired in orchards, together with human annotations of the fruit on the trees. The fruits are labeled with a polygonal mask per object instance to aid precise object detection, localization, and segmentation.
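Polygonal instance masks like MinneApple's are typically rasterized into binary masks before training a segmentation model. Here is a minimal sketch using Pillow; the polygon and image size are invented, and the dataset's actual annotation format may differ.

```python
import numpy as np
from PIL import Image, ImageDraw

def polygon_to_mask(polygon, height, width):
    """Rasterize a polygon annotation (list of (x, y) vertices)
    into a boolean per-pixel mask for one object instance."""
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon(polygon, outline=1, fill=1)
    return np.array(canvas, dtype=bool)

# A small square "apple" annotation inside a 6x6 image:
mask = polygon_to_mask([(1, 1), (4, 1), (4, 4), (1, 4)], height=6, width=6)
```

Per-instance masks like this can then be stacked or merged into the label tensors a detection or segmentation framework expects.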
A*3D (Autonomous Driving in Difficult Environments): The A*3D dataset is a step towards making autonomous driving safer for pedestrians and the public in the real world.
- 230K human-labeled 3D object annotations across 39,179 LiDAR point cloud frames and corresponding front-facing RGB images.
- Captured at different times (day, night) and in different weather (sun, cloud, rain).
WiderPerson: A diverse benchmark dataset for dense pedestrian detection in the wild. Its images are selected from a wide range of scenarios, no longer limited to traffic scenes. The dataset contains 13,382 images with about 400K annotations covering various kinds of occlusion.
Exclusively Dark Image Dataset: The Exclusively Dark (ExDark) dataset (CVIU 2019) was introduced to facilitate object detection and image enhancement research in low-light environments. It is a collection of 7,363 low-light images spanning conditions from very dark environments to twilight (i.e., 10 different conditions), with 12 object classes (similar to PASCAL VOC) annotated at both the image level and with local object bounding boxes.
We have covered an extensive array of computer vision datasets in this article. The goal is to understand the algorithms that make sense of the complexities of real-world data. Working with these datasets will strengthen your fundamentals and help you become an expert in the field of computer vision. Happy experimenting!