{"id":16531,"date":"2020-07-04T18:03:18","date_gmt":"2020-07-04T12:33:18","guid":{"rendered":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/"},"modified":"2024-11-13T17:46:06","modified_gmt":"2024-11-13T12:16:06","slug":"dataset-in-machine-learning","status":"publish","type":"post","link":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/","title":{"rendered":"Top 32 Dataset in Machine Learning | Machine Learning Dataset"},"content":{"rendered":"\n<p><strong>The dataset is one of the main parts of building a machine learning model. Before we start with any algorithm, we need a proper understanding of the data. The datasets listed here are mainly used for learning and research purposes. Most of them are homogeneous in nature.<\/strong><\/p>\n\n\n\n<p>We use a dataset to train and evaluate our model, so it plays a vital role in the whole process. If the dataset is well structured, has little noise, and is properly cleaned, the model is more likely to achieve good accuracy at evaluation time.<\/p>\n\n\n\n<h4 class=\"wp-block-heading has-text-align-center has-ast-global-color-4-background-color has-background\" id=\"check-out-our-free-python-machine-learning-course\">Check out our free <a href=\"https:\/\/www.mygreatlearning.com\/academy\/learn-for-free\/courses\/python-for-machine-learning\" target=\"_blank\" rel=\"noreferrer noopener\">Python Machine Learning course<\/a><\/h4>\n\n\n\n<p>Here are 32 datasets that are easily available online for training your machine learning algorithms:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>ImageNet<\/li>\n\n\n\n<li>COCO dataset<\/li>\n\n\n\n<li>Iris Flower dataset<\/li>\n\n\n\n<li>Breast Cancer Wisconsin (Diagnostic) Dataset<\/li>\n\n\n\n<li>Twitter Sentiment Analysis Dataset<\/li>\n\n\n\n<li>MNIST dataset (handwritten digits)<\/li>\n\n\n\n<li>Fashion MNIST dataset<\/li>\n\n\n\n<li>Amazon review 
dataset<\/li>\n\n\n\n<li>Spam SMS classifier dataset<\/li>\n\n\n\n<li>Spam-Mails Dataset<\/li>\n\n\n\n<li>YouTube Dataset<\/li>\n\n\n\n<li>CIFAR-10<\/li>\n\n\n\n<li>IMDB reviews<\/li>\n\n\n\n<li>Sentiment140<\/li>\n\n\n\n<li>Facial Image Dataset<\/li>\n\n\n\n<li>Wine Quality Dataset<\/li>\n\n\n\n<li>The Wikipedia corpus<\/li>\n\n\n\n<li>Free Spoken Digit Dataset<\/li>\n\n\n\n<li>Boston House price dataset<\/li>\n\n\n\n<li>Pima Indian Diabetes dataset<\/li>\n\n\n\n<li>Iris Dataset<\/li>\n\n\n\n<li>Diamond Dataset<\/li>\n\n\n\n<li>mtcars Dataset<\/li>\n\n\n\n<li>Boston Dataset<\/li>\n\n\n\n<li>Titanic Dataset<\/li>\n\n\n\n<li>Pima Indian Diabetes Dataset<\/li>\n\n\n\n<li>Beavers Dataset<\/li>\n\n\n\n<li>Cars93 Dataset<\/li>\n\n\n\n<li>Car-seats Dataset<\/li>\n\n\n\n<li>msleep Dataset<\/li>\n\n\n\n<li>Cushings Dataset<\/li>\n\n\n\n<li>ToothGrowth Dataset<\/li>\n<\/ol>\n\n\n\n<p><strong>1. ImageNet:<\/strong><\/p>\n\n\n\n<p>Size of the Dataset: ~150 GB<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Records include images with their class labels and, for a subset of the images, bounding boxes<\/li>\n\n\n\n<li>ImageNet provides about 1,000 images on average for each synset<\/li>\n\n\n\n<li>ImageNet provides the URLs of the images<\/li>\n\n\n\n<li>Because of its large scale, it is widely used by researchers<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/image-net.org\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>2. COCO dataset:<\/strong><\/p>\n\n\n\n<p>COCO stands for Common Objects in Context. It is a large-scale object detection, segmentation, and captioning dataset. 
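<\/p>\n\n\n\n<p>COCO annotations are stored as JSON, so the Python standard library is enough for a first look at them. The sketch below assumes the standard COCO instance-annotation layout: top-level images, annotations, and categories lists, with each box given as [x, y, width, height] in pixels. The function names are our own and are not part of any official API. For serious work, the pycocotools package is the usual choice.<\/p>\n\n\n\n
```python
import json
from collections import Counter


def load_coco(path):
    # Read a COCO-style annotation file into a plain dict.
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def instances_per_category(coco):
    # Map numeric category ids to their human-readable names.
    names = {cat['id']: cat['name'] for cat in coco['categories']}
    # Every entry in 'annotations' is one object instance.
    counts = Counter(ann['category_id'] for ann in coco['annotations'])
    return {names[cid]: n for cid, n in counts.items()}


def boxes_for_image(coco, image_id):
    # Boxes are [x, y, width, height], measured from the top-left corner.
    return [ann['bbox'] for ann in coco['annotations']
            if ann['image_id'] == image_id]
```
\n\n\n\n<p>For example, instances_per_category(load_coco(path)) returns a dict that maps each category name to its number of object instances. Here, path points to whichever annotation file you downloaded.<\/p>\n\n\n\n<p>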
This dataset has 1.5 million object instances in 80 object categories.<\/p>\n\n\n\n<p><strong>COCO provides five types of annotation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>object detection<\/li>\n\n\n\n<li>keypoint detection<\/li>\n\n\n\n<li>stuff segmentation<\/li>\n\n\n\n<li>panoptic segmentation<\/li>\n\n\n\n<li>image captioning<\/li>\n<\/ul>\n\n\n\n<p>In the COCO dataset, annotations are stored in JSON format.<\/p>\n\n\n\n<p><strong>Features of the COCO dataset:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Object segmentation<\/li>\n\n\n\n<li>Recognition in context<\/li>\n\n\n\n<li>Superpixel stuff segmentation<\/li>\n\n\n\n<li>330K images (&gt;200K labelled)<\/li>\n\n\n\n<li>1.5 million object instances<\/li>\n\n\n\n<li>80 object categories<\/li>\n\n\n\n<li>91 stuff categories<\/li>\n\n\n\n<li>5 captions per image<\/li>\n\n\n\n<li>250,000 people with keypoints<\/li>\n<\/ul>\n\n\n\n<p><a href=\"http:\/\/cocodataset.org\/#home\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>3. Iris Flower Dataset:<\/strong><\/p>\n\n\n\n<p>The Iris flower dataset is ideal for beginners who are just starting to learn machine learning techniques and algorithms. With this data, you can start building a simple machine learning project. The dataset is small and does not need any pre-processing. 
It contains three types of iris flowers (Setosa, Versicolour, and Virginica) along with their sepal and petal length and width, which can be represented as a 150x4 numpy.ndarray.<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dataset has four attributes: sepal length in cm, sepal width in cm, petal length in cm, and petal width in cm.<\/li>\n\n\n\n<li>This dataset has three classes.<\/li>\n\n\n\n<li>Each class has 50 instances; the classes are Virginica, Setosa, and Versicolor.<\/li>\n\n\n\n<li>The dataset is multivariate.<\/li>\n\n\n\n<li>All of the attributes are real-valued.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/uciml\/iris\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>4. Breast Cancer Wisconsin (Diagnostic) Dataset:<\/strong><\/p>\n\n\n\n<p>The Breast Cancer Wisconsin (Diagnostic) dataset is one of the most popular datasets for classification problems in machine learning. It is based on breast cancer analysis. Its features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. 
They describe the characteristics of the cell nuclei present in the image.<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dataset has three types of attributes: ID, diagnosis, and 30 real-valued input features.<\/li>\n\n\n\n<li>Ten real-valued features are computed for each cell nucleus, e.g., radius, texture, perimeter, and area.<\/li>\n\n\n\n<li>The two classes to predict are benign and malignant.<\/li>\n\n\n\n<li>The dataset has 569 instances in total: 357 benign and 212 malignant.<\/li>\n<\/ul>\n\n\n\n<p><strong>Attribute Information:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>ID number<\/li>\n\n\n\n<li>Diagnosis (M = malignant, B = benign)<\/li>\n\n\n\n<li>Columns 3-32: 30 real-valued input features (the mean, standard error, and \"worst\" or largest value of each of the ten features below)<\/li>\n<\/ol>\n\n\n\n<p><strong>Ten real-valued features are computed for each cell nucleus:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>radius (mean of distances from the centre to points on the perimeter)<\/li>\n\n\n\n<li>texture (standard deviation of grey-scale values)<\/li>\n\n\n\n<li>perimeter<\/li>\n\n\n\n<li>area<\/li>\n\n\n\n<li>smoothness (local variation in radius lengths)<\/li>\n\n\n\n<li>compactness (perimeter^2 \/ area - 1.0)<\/li>\n\n\n\n<li>concavity (severity of concave portions of the contour)<\/li>\n\n\n\n<li>concave points (number of concave portions of the contour)<\/li>\n\n\n\n<li>symmetry<\/li>\n\n\n\n<li>fractal dimension (\"coastline approximation\" - 1)<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/uciml\/breast-cancer-wisconsin-data\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>5. Twitter Sentiment Analysis Dataset:<\/strong><\/p>\n\n\n\n<p>Sentiment analysis is one of the most popular applications of natural language processing (NLP), and this dataset will help you build a sentiment analysis model. 
It is a text-processing dataset that you can use to build your first NLP model.<\/p>\n\n\n\n<p><strong>Structure of the dataset:<\/strong><\/p>\n\n\n\n<p>There are three main columns in this dataset:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>ItemID - ID of the tweet<\/li>\n\n\n\n<li>Sentiment - sentiment label<\/li>\n\n\n\n<li>SentimentText - text of the tweet<\/li>\n<\/ul>\n\n\n\n<p>Check out this free course on <a href=\"\/academy\/learn-for-free\/courses\/product-categorization-using-machine-learning\" target=\"_blank\" rel=\"noreferrer noopener\">product categorization using machine learning<\/a><\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The data covers three sentiment tones: neutral, positive, and negative.<\/li>\n\n\n\n<li>The dataset is in CSV (comma-separated values) format.<\/li>\n\n\n\n<li>The dataset is divided into two files: train.csv and test.csv.<\/li>\n\n\n\n<li>So you do not need to split the data into training and evaluation sets yourself.<\/li>\n\n\n\n<li>All you need to do is build your model using train.csv and evaluate it using test.csv.<\/li>\n\n\n\n<li>The tweet itself is described by two fields: ItemID (ID of the tweet) and SentimentText (text of the tweet).<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/c\/twitter-sentiment-analysis2\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>6. MNIST Dataset (handwritten digits):<\/strong><\/p>\n\n\n\n<p>The MNIST dataset consists of handwritten digits. It is one of the most popular image classification datasets in deep learning, and it can be used for classical machine learning as well. It has 60,000 examples for training and 10,000 for model evaluation. 
This beginner-friendly dataset helps you understand pattern recognition techniques in deep learning on real-world data. The data does not take much time to preprocess, so beginners who are keen to learn deep learning or machine learning can start their first project with it.<\/p>\n\n\n\n<p><strong>Size: <\/strong>~50 MB<\/p>\n\n\n\n<p><strong>Number of Records:<\/strong> 70,000 images in 10 classes (train and test combined)<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MNIST is one of the best datasets for learning ML techniques and deep learning pattern recognition methods on real-world data.<\/li>\n\n\n\n<li>The original dataset comes as four files: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, and t10k-labels-idx1-ubyte.gz.<\/li>\n\n\n\n<li>The CSV version linked below is divided into two parts: a training file and a test file.<\/li>\n\n\n\n<li>So you do not need to split the data into training and evaluation sets yourself.<\/li>\n\n\n\n<li>All you need to do is build your model on the training file and evaluate it on the test file.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/oddrationale\/mnist-in-csv\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>7. Fashion MNIST Dataset:<\/strong><\/p>\n\n\n\n<p>Fashion MNIST is another widely used dataset, built on images of clothing. It can be used for deep learning image classification as well as for classical machine learning. It has 60,000 examples for training and 10,000 for model evaluation. This beginner-friendly dataset helps you understand pattern recognition techniques in deep learning on real-world data. The data does not take much time to preprocess. 
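<\/p>\n\n\n\n<p>Fashion MNIST keeps the image size, split sizes, and IDX binary file format of the original MNIST files (train-images-idx3-ubyte.gz and so on), so one reader covers both datasets. The sketch below is a minimal standard-library reader. It assumes the usual IDX header: two zero bytes, a type byte where 0x08 means unsigned bytes, a byte giving the number of dimensions, and then one big-endian 32-bit size per dimension. It only handles the unsigned-byte type that both datasets use. The function names are our own.<\/p>\n\n\n\n
```python
import gzip
import struct


def parse_idx(data):
    # Header: two zero bytes, a type byte, and a dimension-count byte.
    zero, dtype, ndim = struct.unpack('>HBB', data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError('not an unsigned-byte IDX file')
    # One big-endian 32-bit size per dimension follows the header.
    dims = struct.unpack('>' + 'I' * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    return dims, payload


def load_idx_gz(path):
    # Read and parse a gzip-compressed IDX file.
    with gzip.open(path, 'rb') as f:
        return parse_idx(f.read())


def get_image(dims, payload, index):
    # For an idx3 image file, dims is (count, rows, cols).
    # Return one image as a list of rows of 0-255 pixel values.
    _, rows, cols = dims
    start = index * rows * cols
    flat = payload[start:start + rows * cols]
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]
```
\n\n\n\n<p>For the MNIST training images, load_idx_gz('train-images-idx3-ubyte.gz') should give dims of (60000, 28, 28). get_image(dims, payload, 0) then returns the first image as 28 rows of pixel values.<\/p>\n\n\n\n<p>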
Beginners who are keen to learn deep learning or machine learning can start their first project with it. Fashion MNIST was created as a drop-in replacement for the MNIST dataset. All the images are grayscale and belong to 10 classes.<\/p>\n\n\n\n<p><strong>Size:<\/strong> 30 MB<\/p>\n\n\n\n<p><strong>Number of Records:<\/strong> 70,000 images in 10 classes<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Fashion MNIST is one of the best datasets for learning ML techniques and deep learning pattern recognition methods on real-world data.<\/li>\n\n\n\n<li>The dataset is divided into two parts: a training file and a test file.<\/li>\n\n\n\n<li>So you do not need to split the data into training and evaluation sets yourself.<\/li>\n\n\n\n<li>All you need to do is build your model on the training file and evaluate it on the test file.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/zalando-research\/fashionmnist\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>8. Amazon Review Dataset:<\/strong><\/p>\n\n\n\n<p>The Amazon review dataset is also used for natural language processing. Sentiment analysis is one of the most popular applications of NLP, and this text dataset will help you build your first sentiment analysis model. It contains ratings, review text, helpfulness votes, and product metadata (descriptions, category information, price, brand, and image features). It also includes product links and also-viewed and also-bought graphs. In total, it contains 142.8 million reviews spanning May 1996 to July 2014. 
It gives you a feel for real business problems and helps you understand trends over the years.<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dataset consists of Amazon product reviews.<\/li>\n\n\n\n<li>It includes product and user information, ratings, and review text.<\/li>\n\n\n\n<li>Official paper: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.<\/li>\n\n\n\n<li>The data contains duplicates as well.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/snap.stanford.edu\/data\/web-Amazon.html\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>9. Spam SMS Classifier Dataset:<\/strong><\/p>\n\n\n\n<p>Detecting spam messages is an important problem today. With this dataset, you can train a model that predicts whether a message is spam. 
A machine learning classification algorithm can be used to build the model, and the dataset is beginner-friendly and easy to understand. It is a set of labelled SMS messages collected for SMS spam research.<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dataset has 5,574 messages.<\/li>\n\n\n\n<li>The messages are in English.<\/li>\n\n\n\n<li>Each line contains one message.<\/li>\n\n\n\n<li>The dataset has two columns: one gives the label (spam or not) and the other holds the raw text.<\/li>\n\n\n\n<li>The dataset is in CSV (comma-separated values) format.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/uciml\/sms-spam-collection-dataset\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>10. <\/strong><a href=\"https:\/\/www.kaggle.com\/venky73\/spam-mails-dataset\" rel=\"nofollow\"><strong>Spam-Mails Dataset:<\/strong><\/a><\/p>\n\n\n\n<p>Detecting spam mail is another important problem. With this dataset, you can train a model that predicts whether a mail is spam. A machine learning classification algorithm can be used to build the model, and the dataset is beginner-friendly and easy to understand. The messages in the dataset are tagged. It includes 425 SMS spam messages that were manually extracted from the Grumbletext website. Grumbletext is a UK forum where cell phone users make public claims about SMS spam messages. Many of these users received a huge number of spam messages every day. Identifying those spam messages was a very hard and time-consuming task that involved carefully scanning hundreds of web pages. 
The Grumbletext website is <a href=\"http:\/\/www.grumbletext.co.uk\/\" rel=\"nofollow\">http:\/\/www.grumbletext.co.uk\/<\/a>. The dataset also includes a subset of 3,375 randomly chosen ham (legitimate) messages from the NUS SMS Corpus (NSC). The NSC is a dataset of about 10,000 legitimate messages collected for research at the Department of Computer Science at the National University of Singapore. Its messages largely originate from Singaporeans, mostly students attending the university. They were collected from volunteers who were made aware that their contributions would be made publicly available. The NUS SMS Corpus is available at: <a href=\"http:\/\/www.comp.nus.edu.sg\/~rpnlpir\/downloads\/corpora\/smsCorpus\/\" rel=\"nofollow\">http:\/\/www.comp.nus.edu.sg\/~rpnlpir\/downloads\/corpora\/smsCorpus\/<\/a>. Finally, the dataset includes a list of 450 SMS ham messages collected from Caroline Tag's PhD thesis.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>About 86% of the messages are not spam.<\/li>\n\n\n\n<li>The dataset does not come with a train and test division, so you need to split the data yourself.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/ishansoni\/sms-spam-collection-dataset\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>11. YouTube Dataset:<\/strong><\/p>\n\n\n\n<p>The YouTube dataset is based on information about YouTube videos and helps you build video classification models with machine learning algorithms. YouTube-8M is a video dataset consisting of millions of YouTube video IDs. It has high-quality machine-generated annotations derived from numerous visual entities. It also includes audio-visual features extracted from billions of frames and audio segments. The dataset is useful for learning both machine learning and computer vision. 
The dataset has improved annotation quality and machine-generated labels. It has 6.1 million URLs, labelled with a vocabulary of 3,862 visual entities. All the videos are annotated with one or more labels (an average of 3 labels per video).<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It is a large-scale labelled dataset with high-quality machine-generated annotations.<\/li>\n\n\n\n<li>The videos in this dataset are sampled uniformly.<\/li>\n\n\n\n<li>Each video is associated with at least one entity from the target vocabulary.<\/li>\n\n\n\n<li>The vocabulary is available in CSV (comma-separated values) format.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/research.google.com\/youtube8m\/\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>12. CIFAR-10:<\/strong><\/p>\n\n\n\n<p>CIFAR-10 is an image classification dataset consisting of images of various objects. It supports many tasks in machine learning and deep learning. CIFAR stands for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Canadian_Institute_for_Advanced_Research\" rel=\"nofollow\">Canadian Institute For Advanced Research<\/a>. It is one of the most commonly used datasets in machine learning research. CIFAR-10 has 60,000 32x32 color images in 10 different classes:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>aeroplanes<\/li>\n\n\n\n<li>cars<\/li>\n\n\n\n<li>birds<\/li>\n\n\n\n<li>cats<\/li>\n\n\n\n<li>deer<\/li>\n\n\n\n<li>dogs<\/li>\n\n\n\n<li>frogs<\/li>\n\n\n\n<li>horses<\/li>\n\n\n\n<li>ships<\/li>\n\n\n\n<li>trucks<\/li>\n<\/ol>\n\n\n\n<p>Each class has 6,000 images. CIFAR-10 is used in deep learning to train computers to recognize objects. 
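<\/p>\n\n\n\n<p>The download page also offers a Python version of CIFAR-10, in which each batch file is a pickled dictionary. Its b'data' entry holds 10,000 rows of 3,072 values: 1,024 red, then 1,024 green, then 1,024 blue, with each channel stored row by row. Its b'labels' entry holds class indices from 0 to 9. The minimal sketch below assumes that layout and shows how to open a batch and rebuild one image as RGB tuples. The helper names are our own.<\/p>\n\n\n\n
```python
import pickle

# Class names in label order (0-9), as listed in the batches.meta file.
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']


def unpickle(path):
    # Batch files were pickled with Python 2, hence encoding='bytes'.
    # The b'data' entry is a NumPy array, so NumPy must be installed.
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='bytes')


def row_to_image(row, size=32):
    # A row holds the red plane, then green, then blue, each row-major.
    # Rebuild it as size x size (r, g, b) tuples.
    plane = size * size
    return [[(row[r * size + c],
              row[plane + r * size + c],
              row[2 * plane + r * size + c])
             for c in range(size)]
            for r in range(size)]
```
\n\n\n\n<p>For example, after you extract the archive, batch = unpickle('cifar-10-batches-py\/data_batch_1') loads the first training batch. row_to_image(batch[b'data'][0]) and CIFAR10_CLASSES[batch[b'labels'][0]] then give the first training image and its class name. In practice you would reshape the array with NumPy; the loop just makes the channel layout explicit.<\/p>\n\n\n\n<p>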
The images are 32x32, which is considered low resolution, so learners can try different algorithms in less time. CIFAR-10 is beginner-friendly as well, and it is especially popular for training convolutional neural networks.<\/p>\n\n\n\n<p><strong>Features:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CIFAR-10 is one of the best datasets for learning ML techniques and deep learning image classification methods on real-world data.<\/li>\n\n\n\n<li>The dataset is divided into two parts: train and test.<\/li>\n\n\n\n<li>So you do not need to split the data into training and evaluation sets yourself.<\/li>\n\n\n\n<li>All you need to do is build your model using the training data and evaluate it using the test data.<\/li>\n\n\n\n<li>In total, CIFAR-10 has 50,000 training images and 10,000 test images.<\/li>\n\n\n\n<li>The dataset is divided into 6 parts: 5 training batches and 1 test batch.<\/li>\n\n\n\n<li>Each batch has 10,000 images.<\/li>\n<\/ul>\n\n\n\n<p><strong>Size:<\/strong> 170 MB<\/p>\n\n\n\n<p><strong>Number of Records:<\/strong> 60,000 images in 10 classes<\/p>\n\n\n\n<p><a href=\"http:\/\/www.cs.toronto.edu\/~kriz\/cifar.html\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>13. IMDB Reviews:<\/strong><\/p>\n\n\n\n<p>The IMDB dataset, also known as the <a href=\"http:\/\/ai.stanford.edu\/~amaas\/data\/sentiment\/\" rel=\"nofollow\">Large Movie Review Dataset<\/a>, will help you build a sentiment analysis model. Sentiment analysis is one of the most popular applications of NLP. The dataset has 25,000 highly polar movie reviews for training and another 25,000 for testing, each labelled as positive or negative. The IMDB dataset is often used for sentiment analysis with machine learning or deep learning algorithms. 
It was prepared by Stanford researchers in 2011 and comes with a 50\/50 split between training and testing. The authors of the original paper reported 88.89% accuracy on it. IMDB data was also used in a Kaggle competition titled \u201c<a href=\"https:\/\/www.kaggle.com\/c\/word2vec-nlp-tutorial\/data\" rel=\"nofollow\">Bag of Words Meets Bags of Popcorn<\/a>\u201d, which ran from 2014 to early 2015. In that competition, entries scored above 97% and the winners reached 99%. The dataset is popular with movie lovers as well, and it is mostly used for binary sentiment classification. In addition to the labelled training and test reviews, the dataset includes further unlabelled data.<\/p>\n\n\n\n<p><strong>Size:<\/strong> 80 MB<\/p>\n\n\n\n<p><strong>Number of Records:<\/strong> 25,000 highly polar movie reviews for training, and 25,000 for testing<\/p>\n\n\n\n<p><strong>Features:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>IMDB is one of the best datasets for learning ML techniques and deep learning methods on real-world data.<\/li>\n\n\n\n<li>The dataset is divided into two parts: train and test.<\/li>\n\n\n\n<li>So you do not need to split the data into training and evaluation sets yourself.<\/li>\n\n\n\n<li>All you need to do is build your model using the training data and evaluate it using the test data.<\/li>\n<\/ul>\n\n\n\n<p><a rel=\"nofollow\" href=\"http:\/\/ai.stanford.edu\/~amaas\/data\/sentiment\/\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>14. Sentiment140:<\/strong><\/p>\n\n\n\n<p>The Sentiment140 dataset is built on Twitter data. Sentiment analysis is one of the most popular applications of NLP, and this dataset will help you build a sentiment analysis model. It is a text-processing dataset that you can use to build your first NLP model. 
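<\/p>\n\n\n\n<p>Each row of the Sentiment140 CSV carries six fields: target (the polarity), ids, date, flag (the query), user, and text. The sketch below uses Python's csv module to name those fields and map the polarity codes 0, 2, and 4 to labels. It assumes the file has no header row and reads it as latin-1, so check both against your copy. The function names are our own.<\/p>\n\n\n\n
```python
import csv
from collections import Counter

# Column order of each row, following the field list for this dataset.
FIELDS = ['target', 'ids', 'date', 'flag', 'user', 'text']
# Polarity codes used in the target column.
POLARITY = {'0': 'negative', '2': 'neutral', '4': 'positive'}


def read_tweets(lines):
    # lines: any iterable of CSV lines, such as an open file object.
    for row in csv.reader(lines):
        record = dict(zip(FIELDS, row))
        record['sentiment'] = POLARITY[record['target']]
        yield record


def load_sentiment140(path):
    # Assumption: no header row and latin-1 encoding; check your copy.
    with open(path, encoding='latin-1', newline='') as f:
        return list(read_tweets(f))


def sentiment_counts(records):
    # Count tweets per sentiment label to check the class balance.
    return dict(Counter(r['sentiment'] for r in records))
```
\n\n\n\n<p>Running sentiment_counts(load_sentiment140(path)) on the downloaded training file is a quick way to see how the tweets are spread across the polarity classes before you train anything.<\/p>\n\n\n\n<p>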
Sentiment140 is a beginner-friendly starting point for a new natural language processing project. Emoticons have been removed from the tweets, and the data has six fields:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>polarity of the tweet<\/li>\n\n\n\n<li>id of the tweet<\/li>\n\n\n\n<li>date of the tweet<\/li>\n\n\n\n<li>the query<\/li>\n\n\n\n<li>username of the tweeter<\/li>\n\n\n\n<li>text of the tweet<\/li>\n<\/ul>\n\n\n\n<p><strong>Features:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It has 1,600,000 tweets extracted using the Twitter API.<\/li>\n\n\n\n<li>The tweets are annotated with a polarity (0 = negative, 2 = neutral, 4 = positive).<\/li>\n\n\n\n<li>These annotations give the sentiment of each tweet.<\/li>\n<\/ul>\n\n\n\n<p>Fields in the dataset:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>target<\/strong>: the polarity of the tweet (<em>0<\/em> = negative, <em>2<\/em> = neutral, <em>4<\/em> = positive)<\/li>\n\n\n\n<li><strong>ids<\/strong>: the ID of the tweet (<em>2087<\/em>)<\/li>\n\n\n\n<li><strong>date<\/strong>: the date of the tweet (<em>Sat May 16 23:58:44 UTC 2009<\/em>)<\/li>\n\n\n\n<li><strong>flag<\/strong>: the query (<em>lyx<\/em>). If there is no query, this value is NO_QUERY.<\/li>\n\n\n\n<li><strong>user<\/strong>: the user who tweeted (<em>robotickilldozr<\/em>)<\/li>\n\n\n\n<li><strong>text<\/strong>: the text of the tweet (<em>Lyx is cool<\/em>)<\/li>\n<\/ul>\n\n\n\n<p><strong>Size:<\/strong> 80 MB (compressed)<\/p>\n\n\n\n<p><strong>Number of Records:<\/strong> 1,600,000 tweets<\/p>\n\n\n\n<p><a href=\"http:\/\/help.sentiment140.com\/for-students\/\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>15. Facial Image Dataset:<\/strong><\/p>\n\n\n\n<p>The facial image dataset contains face images of both male and female subjects. Machine learning and deep learning algorithms can be applied to it to detect gender and emotion. 
The data varies in background, scale, and facial expression.<\/p>\n\n\n\n<p><strong>Information about the dataset:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Total number of individuals: 395<\/li>\n\n\n\n<li>Number of images per individual: 20<\/li>\n\n\n\n<li>Total number of images: 7900<\/li>\n\n\n\n<li>Gender: contains images of male and female subjects<\/li>\n\n\n\n<li>Race: contains images of people of various racial origins<\/li>\n\n\n\n<li>Age range: the images are mainly of first-year undergraduate students, so most individuals are between 18 and 20 years old, but some older individuals are also present.<\/li>\n<\/ul>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dataset has four directories.<\/li>\n\n\n\n<li>You can download the parts that suit your system requirements and needs.<\/li>\n\n\n\n<li>Every part of the data is available as a zipped file.<\/li>\n\n\n\n<li>There are 395 individuals in total, with 20 images each.<\/li>\n\n\n\n<li>The images are 180 x 200 pixels, stored in 24-bit RGB JPEG format.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/cswww.essex.ac.uk\/mv\/allfaces\/faces94.html\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>16. Red Wine Quality Dataset:<\/strong><\/p>\n\n\n\n<p>The Red Wine Quality dataset is popular and interesting for machine learning and deep learning enthusiasts. It is beginner-friendly, and you can easily apply machine learning algorithms to it to predict wine quality from the wine's physicochemical properties. Both regression and classification approaches can be used. The data is related to the red and white variants of the Portuguese \"Vinho Verde\" wine. 
Because of privacy and logistical issues, only physicochemical (input) and sensory (output) variables are available (e.g., there is no data about grape types, wine brand, or wine selling price). The classes are ordered and not balanced (e.g., there are many more normal wines than excellent or poor ones).<\/p>\n\n\n\n<p><strong>Information about input variables based on physicochemical tests:<\/strong><\/p>\n\n\n\n<p>1 - Fixed acidity<\/p>\n\n\n\n<p>2 - Volatile acidity<\/p>\n\n\n\n<p>3 - Citric acid<\/p>\n\n\n\n<p>4 - Residual sugar<\/p>\n\n\n\n<p>5 - Chlorides<\/p>\n\n\n\n<p>6 - Free sulfur dioxide<\/p>\n\n\n\n<p>7 - Total sulfur dioxide<\/p>\n\n\n\n<p>8 - Density<\/p>\n\n\n\n<p>9 - pH<\/p>\n\n\n\n<p>10 - Sulphates<\/p>\n\n\n\n<p>11 - Alcohol<\/p>\n\n\n\n<p>Output variable (based on sensory data):<\/p>\n\n\n\n<p>12 - Quality (score between 0 and 10)<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>There are two types of variables in the dataset: input and output variables.<\/li>\n\n\n\n<li>Input variables are fixed acidity, volatile acidity, citric acid, residual sugar, and so forth.<\/li>\n\n\n\n<li>The output variable is quality.<\/li>\n\n\n\n<li>There are 12 attributes: 11 real-valued inputs and the quality output.<\/li>\n\n\n\n<li>The red wine data has 1,599 records; the companion white wine data has 4,898.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/uciml\/red-wine-quality-cortez-et-al-2009\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>17. The Wikipedia Corpus:<\/strong><\/p>\n\n\n\n<p>The Wikipedia corpus consists of Wikipedia data only. It is a collection of the full text of Wikipedia and contains almost 1.9 billion words from more than 4 million articles. It is mainly used for natural language processing. 
It is a very powerful dataset, and you can search it by word, phrase, or part of a paragraph.<\/p>\n\n\n\n<p><strong>Size:<\/strong> 20 MB<\/p>\n\n\n\n<p><strong>Number of Records:<\/strong> 4,400,000 articles containing 1.9 billion words<\/p>\n\n\n\n<p><strong>Features<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This large-scale dataset can be used for machine learning and natural language processing.<\/li>\n\n\n\n<li>Because the dataset is large, it gives a model plenty of text to learn from.<\/li>\n\n\n\n<li>It has 4,400,000 articles containing 1.9 billion words.<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/nlp.cs.nyu.edu\/wikipedia-data\/\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>18. Free Spoken Digit Dataset:<\/strong><\/p>\n\n\n\n<p>The Free Spoken Digit Dataset is a simple audio (speech) dataset consisting of recordings of spoken English digits. The files are in WAV format at 8 kHz, and all recordings are trimmed to have near-minimal silence at the beginning and end. The dataset was created for the task of identifying spoken digits in audio. Its key property is that it is open, so anyone can contribute to the repository. 
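<\/p>\n\n\n\n<p>Each recording's label is encoded in its file name ({digitLabel}_{speakerName}_{index}.wav, for example 7_jackson_32.wav), so no separate label file is needed. The standard-library sketch below assumes that naming scheme and a local folder of downloaded .wav files. The helper names are our own. sample_rate simply reads the WAV header so you can confirm the 8 kHz rate.<\/p>\n\n\n\n
```python
import os
import wave


def parse_fsdd_name(filename):
    # '7_jackson_32.wav' -> (7, 'jackson', 32)
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext != '.wav':
        raise ValueError('expected a .wav file: ' + filename)
    digit, rest = stem.split('_', 1)
    speaker, index = rest.rsplit('_', 1)
    return int(digit), speaker, int(index)


def list_recordings(folder):
    # Collect (path, digit, speaker) for every .wav file in a folder.
    items = []
    for name in sorted(os.listdir(folder)):
        if name.endswith('.wav'):
            digit, speaker, _ = parse_fsdd_name(name)
            items.append((os.path.join(folder, name), digit, speaker))
    return items


def sample_rate(path):
    # The recordings are described as 8 kHz WAV files; this checks one.
    with wave.open(path, 'rb') as w:
        return w.getframerate()
```
\n\n\n\n<p>For example, list_recordings(folder) returns (path, digit, speaker) tuples that can feed a feature-extraction step. Here, folder is wherever you saved the recordings.<\/p>\n\n\n\n<p>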
Because it is open, the dataset is expected to grow over time.<\/p>\n\n\n\n<p><strong>Characteristics of the Dataset:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>4 speakers<\/li>\n\n\n\n<li>2,000 recordings (50 of each digit per speaker)<\/li>\n\n\n\n<li>English pronunciations<\/li>\n<\/ul>\n\n\n\n<p><strong>File name format:<\/strong> {digitLabel}_{speakerName}_{index}.wav, for example 7_jackson_32.wav<\/p>\n\n\n\n<p><strong>Features:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Open source<\/li>\n\n\n\n<li>Helps solve spoken digit recognition problems<\/li>\n\n\n\n<li>Anyone can contribute<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/github.com\/Jakobovski\/free-spoken-digit-dataset\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>19. Boston House Price Dataset:<\/strong><\/p>\n\n\n\n<p>The Boston House Price dataset was collected by the U.S. Census Service and concerns housing in the area of Boston, Massachusetts. It is used to predict house prices from a few attributes, which makes it a machine learning regression problem. 
The dataset has 506 cases in total.<\/p>\n\n\n\n<p>Columns in the dataset:<\/p>\n\n\n\n<p><strong><em>crim<\/em><\/strong><\/p>\n\n\n\n<p>per capita crime rate by town.<\/p>\n\n\n\n<p><strong><em>zn<\/em><\/strong><\/p>\n\n\n\n<p>proportion of residential land zoned for lots over 25,000 sq.ft.<\/p>\n\n\n\n<p><strong><em>indus<\/em><\/strong><\/p>\n\n\n\n<p>proportion of non-retail business acres per town.<\/p>\n\n\n\n<p><strong><em>chas<\/em><\/strong><\/p>\n\n\n\n<p>Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).<\/p>\n\n\n\n<p><strong><em>nox<\/em><\/strong><\/p>\n\n\n\n<p>nitrogen oxides concentration (parts per 10 million).<\/p>\n\n\n\n<p><strong><em>rm<\/em><\/strong><\/p>\n\n\n\n<p>average number of rooms per dwelling.<\/p>\n\n\n\n<p><strong><em>age<\/em><\/strong><\/p>\n\n\n\n<p>proportion of owner-occupied units built prior to 1940.<\/p>\n\n\n\n<p><strong><em>dis<\/em><\/strong><\/p>\n\n\n\n<p>weighted mean of distances to five Boston employment centres.<\/p>\n\n\n\n<p><strong><em>rad<\/em><\/strong><\/p>\n\n\n\n<p>index of accessibility to radial highways.<\/p>\n\n\n\n<p><strong><em>tax<\/em><\/strong><\/p>\n\n\n\n<p>full-value property-tax rate per $10,000.<\/p>\n\n\n\n<p><strong><em>ptratio<\/em><\/strong><\/p>\n\n\n\n<p>pupil-teacher ratio by town.<\/p>\n\n\n\n<p><strong><em>black<\/em><\/strong><\/p>\n\n\n\n<p>1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.<\/p>\n\n\n\n<p><strong><em>lstat<\/em><\/strong><\/p>\n\n\n\n<p>lower status of the population (percent).<\/p>\n\n\n\n<p><strong><em>medv<\/em><\/strong><\/p>\n\n\n\n<p>median value of owner-occupied homes in $1000s.<\/p>\n\n\n\n<p><strong>Features:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Total cases in the dataset: 506<\/li>\n\n\n\n<li>Each case has 14 attributes, such as CRIM, AGE, and TAX<\/li>\n\n\n\n<li>The dataset is in CSV (comma-separated values) format<\/li>\n\n\n\n<li>Machine learning 
regression models can be applied to this dataset<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/www.kaggle.com\/c\/boston-housing\" rel=\"nofollow\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>20. Pima Indian Diabetes dataset<\/strong>:<\/p>\n\n\n\n<p>Artificial intelligence is now widely used in healthcare and medicine as well. This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Diabetes is one of the most common and serious chronic diseases: the body either does not produce enough insulin or becomes resistant to it, and insulin is the hormone that allows cells to absorb glucose from the blood. Diabetes affects many people worldwide, and its two main forms, type 1 and type 2, have different characteristics. The Pima Indian Diabetes dataset is used to predict whether a patient has diabetes based on certain diagnostic measurements, and a model built on it can help patients and clinicians detect the disease early. It is one of the most widely used datasets for building diabetes-prediction models. In particular, all patients in the dataset are females at least 21 years old of Pima Indian heritage. 
There are a total of nine columns in the dataset:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Pregnancies<\/li>\n\n\n\n<li>Glucose<\/li>\n\n\n\n<li>Blood pressure<\/li>\n\n\n\n<li>Skin thickness<\/li>\n\n\n\n<li>Insulin<\/li>\n\n\n\n<li>BMI<\/li>\n\n\n\n<li>DiabetesPedigreeFunction<\/li>\n\n\n\n<li>Age<\/li>\n\n\n\n<li>Outcome<\/li>\n<\/ol>\n\n\n\n<p><strong>Features:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dataset is in CSV (comma-separated values) format<\/li>\n\n\n\n<li>All patients in this dataset are female and at least 21 years old.<\/li>\n\n\n\n<li>The dataset contains several predictor variables, such as number of pregnancies, BMI, insulin level, and age, plus one target variable (Outcome).<\/li>\n\n\n\n<li>It has a total of 768 rows and 9 columns<\/li>\n<\/ul>\n\n\n\n<p><a rel=\"nofollow\" href=\"https:\/\/www.kaggle.com\/uciml\/pima-indians-diabetes-database\"><strong>Download the Dataset<\/strong><\/a><\/p>\n\n\n\n<p><strong>21. Iris Dataset:<\/strong><br><\/p>\n\n\n\n<p>This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.<br><\/p>\n\n\n\n<p><strong>Format of the dataset:<\/strong><\/p>\n\n\n\n<p>iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/iris.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download the Dataset. (opens in a new tab)\">Download the Dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>22. Diamonds Dataset:<\/strong><br><\/p>\n\n\n\n<p>This is a dataset containing the prices and other attributes of almost 54,000 diamonds. 
The variables are as follows:<br><\/p>\n\n\n\n<p><strong>Price: <\/strong>price in US dollars ($326\u2013$18,823)<br><\/p>\n\n\n\n<p><strong>Carat: <\/strong>weight of the diamond (0.2\u20135.01)<br><\/p>\n\n\n\n<p><strong>Cut: <\/strong>quality of the cut (Fair, Good, Very Good, Premium, Ideal)<br><\/p>\n\n\n\n<p><strong>Color: <\/strong>diamond colour, from D (best) to J (worst)<br><\/p>\n\n\n\n<p><strong>Clarity: <\/strong>a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))<br><\/p>\n\n\n\n<p><strong>X: <\/strong>length in mm (0\u201310.74)<br><\/p>\n\n\n\n<p><strong>Y: <\/strong>width in mm (0\u201358.9)<br><\/p>\n\n\n\n<p><strong>Z: <\/strong>depth in mm (0\u201331.8)<br><\/p>\n\n\n\n<p><strong>Depth: <\/strong>total depth percentage = z \/ mean(x, y) = 2 * z \/ (x + y) (43\u201379)<br><\/p>\n\n\n\n<p><strong>Table: <\/strong>width of top of diamond relative to widest point (43\u201395)<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/diamonds.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download the dataset. (opens in a new tab)\">Download the dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>23. 
mtcars Dataset: (Motor Trend Car Road Tests)<\/strong><br><\/p>\n\n\n\n<p>This data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973\u201374 models).<br><\/p>\n\n\n\n<p>This dataset comprises the following columns:<br><\/p>\n\n\n\n<p><strong>mpg: <\/strong>Miles\/(US) gallon<\/p>\n\n\n\n<p><strong>cyl: <\/strong>Number of cylinders<\/p>\n\n\n\n<p><strong>disp: <\/strong>Displacement (cu.in.)<\/p>\n\n\n\n<p><strong>hp: <\/strong>Gross horsepower<\/p>\n\n\n\n<p><strong>drat: <\/strong>Rear axle ratio<\/p>\n\n\n\n<p><strong>wt: <\/strong>Weight (1000 lbs)<\/p>\n\n\n\n<p><strong>qsec: <\/strong>1\/4 mile time<\/p>\n\n\n\n<p><strong>vs: <\/strong>Engine (0 = V-shaped, 1 = straight)<\/p>\n\n\n\n<p><strong>am: <\/strong>Transmission (0 = automatic, 1 = manual)<\/p>\n\n\n\n<p><strong>gear: <\/strong>Number of forward gears<\/p>\n\n\n\n<p><strong>carb: <\/strong>Number of carburetors<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/drive.google.com\/file\/d\/1yPtCauYzvAIB-Ankqpcxl-1XeIEOL5Pu\/view?usp=sharing\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>24. 
Boston Dataset: Housing Values in Suburbs of Boston<\/strong><br><\/p>\n\n\n\n<p>The Boston data frame has 506 rows and 14 columns.<br><\/p>\n\n\n\n<p><strong>Description of columns:<\/strong><br><\/p>\n\n\n\n<p><strong>Crim: <\/strong>per capita crime rate by town.<br><\/p>\n\n\n\n<p><strong>Zn: <\/strong>proportion of residential land zoned for lots over 25,000 sq.ft.<br><\/p>\n\n\n\n<p><strong>Indus: <\/strong>proportion of non-retail business acres per town.<br><\/p>\n\n\n\n<p><strong>Chas: <\/strong>Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).<br><\/p>\n\n\n\n<p><strong>Nox: <\/strong>nitrogen oxides concentration (parts per 10 million).<br><\/p>\n\n\n\n<p><strong>Rm: <\/strong>average number of rooms per dwelling.<br><\/p>\n\n\n\n<p><strong>Age: <\/strong>proportion of owner-occupied units built prior to 1940.<br><\/p>\n\n\n\n<p><strong>Dis: <\/strong>weighted mean of distances to five Boston employment centres.<br><\/p>\n\n\n\n<p><strong>Rad: <\/strong>index of accessibility to radial highways.<br><\/p>\n\n\n\n<p><strong>Tax: <\/strong>full-value property-tax rate per $10,000.<br><\/p>\n\n\n\n<p><strong>Ptratio: <\/strong>pupil-teacher ratio by town.<br><\/p>\n\n\n\n<p><strong>Black: <\/strong>1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.<br><\/p>\n\n\n\n<p><strong>Lstat: <\/strong>lower status of the population (percent).<br><\/p>\n\n\n\n<p><strong>Medv: <\/strong>median value of owner-occupied homes in $1000s.<\/p>\n\n\n\n<p><strong><a rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\" href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/Boston.csv\" target=\"_blank\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>25<\/strong>. 
<strong>Titanic Dataset: Survival of passengers on the Titanic<\/strong><\/p>\n\n\n\n<p>This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner \u2018Titanic\u2019, summarized according to economic status (class), sex, age and survival.<br><\/p>\n\n\n\n<p><strong>Format:<\/strong><\/p>\n\n\n\n<p>A 4-dimensional array resulting from cross-tabulating 2201 observations on 4 variables. The variables and their levels are as follows:<\/p>\n\n\n\n<p><strong>Class:<\/strong> <em>1st, 2nd, 3rd, Crew<\/em><\/p>\n\n\n\n<p><strong>Sex:<\/strong> <em>Male, Female<\/em><\/p>\n\n\n\n<p><strong>Age:<\/strong> <em>Child, Adult<\/em><\/p>\n\n\n\n<p><strong>Survived:<\/strong> <em>No, Yes<\/em><br><\/p>\n\n\n\n<p><strong>Details about the event:<\/strong><br><\/p>\n\n\n\n<p>The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts\u2014from the proportions of first-class passengers to the \u2018women and children first\u2019 policy, and the fact that that policy was not entirely successful in saving the women and children in the third class\u2014are reflected in the survival rates for various classes of passenger.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/titanic.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>26. Pima Indian Diabetes Dataset:<\/strong><br><\/p>\n\n\n\n<p>A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix, Arizona, was tested for diabetes according to World Health Organization criteria. 
The data was collected by the US National Institute of Diabetes and Digestive and Kidney Diseases.<br><\/p>\n\n\n\n<p>This data frame comprises the following columns:<br><\/p>\n\n\n\n<p><strong>Npreg: <\/strong>number of pregnancies.<br><\/p>\n\n\n\n<p><strong>Glu: <\/strong>plasma glucose concentration in an oral glucose tolerance test.<br><\/p>\n\n\n\n<p><strong>Bp: <\/strong>diastolic blood pressure (mm Hg).<br><\/p>\n\n\n\n<p><strong>Skin: <\/strong>triceps skin fold thickness (mm).<br><\/p>\n\n\n\n<p><strong>Bmi: <\/strong>body mass index (weight in kg\/(height in m)^2).<br><\/p>\n\n\n\n<p><strong>Ped: <\/strong>diabetes pedigree function.<br><\/p>\n\n\n\n<p><strong>Age: <\/strong>age in years.<br><\/p>\n\n\n\n<p><strong>Type: <\/strong>Yes or No, for diabetic according to WHO criteria.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/Diabetes.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>27. Beavers Dataset:<\/strong><br><\/p>\n\n\n\n<p>This data set is part of a long study into body temperature regulation in beavers. Four adult female beavers were live-trapped and had a temperature-sensitive radio transmitter surgically implanted. Readings were taken every 10 minutes. The location of the beaver was also recorded, and her activity level was dichotomized by whether she was in the retreat or outside of it, since high-intensity activities only occur outside of the retreat.<br><\/p>\n\n\n\n<p>This data frame contains the following columns:<br><\/p>\n\n\n\n<p><strong>Day: <\/strong>The day number. 
The data includes only data from day 307 and early 308.<br><\/p>\n\n\n\n<p><strong>Time: <\/strong>The time of day formatted as hour-minute.<br><\/p>\n\n\n\n<p><strong>Temp: <\/strong>The body temperature in degrees Celsius.<br><\/p>\n\n\n\n<p><strong>Activ: <\/strong>The dichotomized activity indicator. 1 indicates that the beaver is outside of the retreat and therefore engaged in high-intensity activity.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/beavers.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>28<\/strong>. <strong>Cars93 Dataset: Data from 93 Cars on Sale in the USA in 1993<\/strong><br><\/p>\n\n\n\n<p>The Cars93 data frame has 93 rows and 27 columns. Below is the description of columns:<br><\/p>\n\n\n\n<p><strong>Manufacturer: <\/strong>Manufacturer of the vehicle<br><\/p>\n\n\n\n<p><strong>Model: <\/strong>Model of the vehicle<br><\/p>\n\n\n\n<p><strong>Type: <\/strong>A factor with levels \"Small\", \"Sporty\", \"Compact\", \"Midsize\", \"Large\" and \"Van\".<br><\/p>\n\n\n\n<p><strong>Min.Price: <\/strong>Minimum Price (in $1,000): price for a basic version.<br><\/p>\n\n\n\n<p><strong>Price: <\/strong>Midrange Price (in $1,000): average of Min.Price and Max.Price.<br><\/p>\n\n\n\n<p><strong>Max.Price: <\/strong>Maximum Price (in $1,000): price for \u201ca premium version\u201d.<br><\/p>\n\n\n\n<p><strong>MPG.city: <\/strong>City MPG (miles per US gallon by EPA rating).<br><\/p>\n\n\n\n<p><strong>MPG.highway: <\/strong>Highway MPG.<br><\/p>\n\n\n\n<p><strong>AirBags: <\/strong>Air Bags standard. 
Factor: none, driver only, or driver &amp; passenger.<br><\/p>\n\n\n\n<p><strong>DriveTrain: <\/strong>Drive train type: rear wheel, front wheel or 4WD; (factor).<br><\/p>\n\n\n\n<p><strong>Cylinders: <\/strong>Number of cylinders (missing for Mazda RX-7, which has a rotary engine).<br><\/p>\n\n\n\n<p><strong>EngineSize: <\/strong>Engine size (litres).<br><\/p>\n\n\n\n<p><strong>Horsepower: <\/strong>Horsepower (maximum).<br><\/p>\n\n\n\n<p><strong>RPM: <\/strong>RPM (revs per minute at maximum horsepower).<br><\/p>\n\n\n\n<p><strong>Rev.per.mile: <\/strong>Engine revolutions per mile (in highest gear).<br><\/p>\n\n\n\n<p><strong>Man.trans.avail: <\/strong>Is a manual transmission version available? (yes or no, Factor).<br><\/p>\n\n\n\n<p><strong>Fuel.tank.capacity: <\/strong>Fuel tank capacity (US gallons).<br><\/p>\n\n\n\n<p><strong>Passengers: <\/strong>Passenger capacity (persons)<br><\/p>\n\n\n\n<p><strong>Length: <\/strong>Length (inches).<br><\/p>\n\n\n\n<p><strong>Wheelbase: <\/strong>Wheelbase (inches).<br><\/p>\n\n\n\n<p><strong>Width: <\/strong>Width (inches).<br><\/p>\n\n\n\n<p><strong>Turn.circle: <\/strong>U-turn space (feet).<br><\/p>\n\n\n\n<p><strong>Rear.seat.room<\/strong>: Rear seat room (inches) (missing for 2-seater vehicles).<br><\/p>\n\n\n\n<p><strong>Luggage.room: <\/strong>Luggage capacity (cubic feet) (missing for vans).<br><\/p>\n\n\n\n<p><strong>Weight: <\/strong>Weight (pounds).<br><\/p>\n\n\n\n<p><strong>Origin: <\/strong>Of non-USA or USA company origins? (factor).<br><\/p>\n\n\n\n<p><strong>Make: <\/strong>Combination of Manufacturer and Model (character).<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/cars93.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>29. 
Car-seats Dataset:<br><\/strong><\/p>\n\n\n\n<p>This is a simulated data set containing sales of child car seats at 400 different stores. It is a data frame with 400 observations on the following 11 variables:<br><\/p>\n\n\n\n<p><strong>Sales: <\/strong>Unit sales (in thousands) at each location<br><\/p>\n\n\n\n<p><strong>CompPrice: <\/strong>Price charged by competitor at each location<br><\/p>\n\n\n\n<p><strong>Income: <\/strong>Community income level (in thousands of dollars)<br><\/p>\n\n\n\n<p><strong>Advertising: <\/strong>Local advertising budget for company at each location (in thousands of dollars)<br><\/p>\n\n\n\n<p><strong>Population: <\/strong>Population size in region (in thousands)<br><\/p>\n\n\n\n<p><strong>Price: <\/strong>Price company charges for car seats at each site<br><\/p>\n\n\n\n<p><strong>ShelveLoc: <\/strong>A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site<br><\/p>\n\n\n\n<p><strong>Age: <\/strong>Average age of the local population<br><\/p>\n\n\n\n<p><strong>Education: <\/strong>Education level at each location<br><\/p>\n\n\n\n<p><strong>Urban: <\/strong>A factor with levels No and Yes to indicate whether the store is in an urban or rural location<br><\/p>\n\n\n\n<p><strong>US: <\/strong>A factor with levels No and Yes to indicate whether the store is in the US or not<\/p>\n\n\n\n<p><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/carseats.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\"><strong>Download this dataset<\/strong>.<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>30. msleep Dataset:<br><\/strong><\/p>\n\n\n\n<p>This is an updated and expanded version of the mammals sleep dataset. 
It is a dataset with 83 rows and 11 variables.<br><\/p>\n\n\n\n<p><strong>Name: <\/strong>common name<br><\/p>\n\n\n\n<p><strong>Genus: <\/strong>genus of the animal<br><\/p>\n\n\n\n<p><strong>Vore: <\/strong>carnivore, omnivore or herbivore?<br><\/p>\n\n\n\n<p><strong>Order: <\/strong>taxonomic order of the animal<br><\/p>\n\n\n\n<p><strong>Conservation: <\/strong>the conservation status of the animal<br><\/p>\n\n\n\n<p><strong>Sleep_total: <\/strong>total amount of sleep, in hours<br><\/p>\n\n\n\n<p><strong>Sleep_rem: <\/strong>REM sleep, in hours<br><\/p>\n\n\n\n<p><strong>Sleep_cycle: <\/strong>length of sleep cycle, in hours<br><\/p>\n\n\n\n<p><strong>Awake: <\/strong>amount of time spent awake, in hours<br><\/p>\n\n\n\n<p><strong>Brainwt: <\/strong>brain weight in kilograms<br><\/p>\n\n\n\n<p><strong>Bodywt: <\/strong>body weight in kilograms<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/msleep.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>31<\/strong>. <strong>Cushings Dataset: Diagnostic Tests on Patients with Cushing's Syndrome<\/strong><br><\/p>\n\n\n\n<p>Cushing's syndrome is a hypertensive disorder associated with over-secretion of cortisol by the adrenal gland. The observations are urinary excretion rates of two steroid metabolites.<br><\/p>\n\n\n\n<p>The Cushings data frame has 27 rows and 3 columns. 
The description of the columns is below:<br><\/p>\n\n\n\n<p><strong>Tetrahydrocortisone: <\/strong>urinary excretion rate (mg\/24hr) of Tetrahydrocortisone.<br><\/p>\n\n\n\n<p><strong>Pregnanetriol: <\/strong>urinary excretion rate (mg\/24hr) of Pregnanetriol.<br><\/p>\n\n\n\n<p><strong>Type: <\/strong>underlying type of syndrome, coded a (adenoma), b (bilateral hyperplasia), c (carcinoma) or u for unknown.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/cushing.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><strong>32. ToothGrowth Dataset:<\/strong><br><\/p>\n\n\n\n<p>The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg\/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C, coded as VC).<\/p>\n\n\n\n<p>This is a data frame with 60 observations on 3 variables.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/d9jmtjs5r4cgq.cloudfront.net\/blog\/datasets\/tooth.csv\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Download this dataset. (opens in a new tab)\">Download this dataset.<\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p>A dataset is the foundation and first step in building a machine learning application. Datasets are available in different formats, such as .txt, .csv, and many more. For supervised machine learning, a labelled training dataset is used, since the labels act as the supervisor for the model. For unsupervised learning algorithms, no labels are required. 
An unsupervised model learns patterns from the data itself rather than from labels.<\/p>\n\n\n\n<p>Please read the full article to understand which dataset is preferable for your machine learning algorithm.<\/p>\n\n\n\n<p>I hope this article helps you understand the freely available datasets covered above.<\/p>\n\n\n\n<p>For free upskilling courses on Machine Learning and data science, <a rel=\"noreferrer noopener\" aria-label=\"visit GL Academy (opens in a new tab)\" href=\"https:\/\/www.mygreatlearning.com\/academy\" target=\"_blank\"><strong>visit GL Academy<\/strong><\/a>. Also, explore our postgraduate programs on data science <a href=\"https:\/\/www.mygreatlearning.com\/data-science\/courses\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\">here<\/a>. <\/p>\n\n\n\n<p>Happy Learning!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"further-reading\">Further Reading<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Datasets for Computer Vision using Deep Learning<\/li>\n\n\n\n<li><a href=\"https:\/\/www.mygreatlearning.com\/blog\/sources-for-analytics-and-machine-learning-datasets\/\">Top 5 Sources For Analytics and Machine Learning Datasets<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.mygreatlearning.com\/blog\/free-download-datasets\/\">Free Data Sets for Analytics\/Data Science Project<\/a><\/li>\n\n\n\n<li> <a href=\"https:\/\/www.mygreatlearning.com\/blog\/top-data-scientists-in-the-world\/\">Top 10 Data Scientists in the World<\/a> <\/li>\n<\/ol>\n\n\n\n<div style=\"background-color: #efefef;border: 1px solid #000;padding: 8px\"><p><b>Find Machine Learning Course in Top Indian Cities<\/b><\/p> \n    <a href=\"https:\/\/www.mygreatlearning.com\/pg-program-machine-learning-course-in-chennai\" title=\" Machine Learning Course in Chennai\">Chennai<\/a> | \n    <a href=\"https:\/\/www.mygreatlearning.com\/pg-program-machine-learning-course-in-bangalore\" title=\" Machine Learning Course in Bangalore\">Bangalore<\/a> | 
\n    <a href=\"https:\/\/www.mygreatlearning.com\/pg-program-machine-learning-course-in-hyderabad\" title=\" Machine Learning Course in Hyderabad\">Hyderabad<\/a> | \n    <a href=\"https:\/\/www.mygreatlearning.com\/pg-program-machine-learning-course-in-pune\" title=\" Machine Learning Course in Pune\">Pune<\/a> | \n    <a href=\"https:\/\/www.mygreatlearning.com\/pg-program-machine-learning-course-in-mumbai\" title=\" Machine Learninge Course in Mumbai\">Mumbai<\/a> | \n    <a href=\"https:\/\/www.mygreatlearning.com\/pg-program-machine-learning-course-in-delhi-ncr\" title=\" Machine Learning Course in Delhi NCR\">Delhi NCR<\/a><\/div>\n\n\n\n<p><br><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" class=\"wp-block-heading\" id=\"our-machine-learning-courses\">Our Machine Learning Courses<\/h2>\n\n\n\n<p>Explore our Machine Learning and AI courses, designed for comprehensive learning and skill development.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Program Name<\/strong><\/th><th><strong>Duration<\/strong><\/th><\/tr><tr><th><a href=\"https:\/\/professionalonline2.mit.edu\/no-code-artificial-intelligence-machine-learning-program\">MIT No code AI and Machine Learning Course<\/a><\/th><th>12 Weeks<\/th><\/tr><tr><th><a href=\"https:\/\/idss-gl.mit.edu\/mit-idss-data-science-machine-learning-online-program\">MIT Data Science and Machine Learning Course<\/a><\/th><th>12 Weeks<\/th><\/tr><tr><th><a href=\"https:\/\/www.mygreatlearning.com\/mit-data-science-and-machine-learning-program\">Data Science and Machine Learning Course<\/a><\/th><th>12 Weeks<\/th><\/tr><\/thead><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>To build a machine learning model dataset is one of the main parts. Before we start with any algorithm we need to have a proper understanding of the data. These machine-learning datasets are basically used for research purposes. Most of the datasets are homogeneous in nature. 
We use a dataset to train and evaluate our [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":16554,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[2],"tags":[],"content_type":[],"class_list":["post-16531","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Top 20 Dataset in Machine Learning | ML Dataset | Great Learning<\/title>\n<meta name=\"description\" content=\"Machine Learning Datasets: Thorough knowledge about the best 20 datasets which are available freely. 
Download and use them for your data science projects.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Top 32 Dataset in Machine Learning | Machine Learning Dataset\" \/>\n<meta property=\"og:description\" content=\"Machine Learning Datasets: Thorough knowledge about the best 20 datasets which are available freely. Download and use them for your data science projects.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Great Learning Blog: Free Resources what Matters to shape your Career!\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/GreatLearningOfficial\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-07-04T12:33:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-13T12:16:06+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"471\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Great Learning Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/Great_Learning\" \/>\n<meta name=\"twitter:site\" content=\"@Great_Learning\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Great Learning Editorial Team\" \/>\n\t<meta name=\"twitter:label2\" 
content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"25 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/\"},\"author\":{\"name\":\"Great Learning Editorial Team\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/person\\\/6f993d1be4c584a335951e836f2656ad\"},\"headline\":\"Top 32 Dataset in Machine Learning | Machine Learning Dataset\",\"datePublished\":\"2020-07-04T12:33:18+00:00\",\"dateModified\":\"2024-11-13T12:16:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/\"},\"wordCount\":5582,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/shutterstock_1278732886.jpg\",\"articleSection\":[\"AI and Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/\",\"name\":\"Top 20 Dataset in Machine Learning | ML Dataset | Great 
Learning\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/shutterstock_1278732886.jpg\",\"datePublished\":\"2020-07-04T12:33:18+00:00\",\"dateModified\":\"2024-11-13T12:16:06+00:00\",\"description\":\"Machine Learning Datasets: Thorough knowledge about the best 20 datasets which are available freely. Download and use them for your data science projects.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/shutterstock_1278732886.jpg\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/shutterstock_1278732886.jpg\",\"width\":1000,\"height\":471,\"caption\":\"Free downloadable datasets\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/dataset-in-machine-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog\",\"item\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI and Machine 
Learning\",\"item\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/artificial-intelligence\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Top 32 Dataset in Machine Learning | Machine Learning Dataset\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\",\"name\":\"Great Learning Blog\",\"description\":\"Learn, Upskill &amp; Career Development Guide and Resources\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\"},\"alternateName\":\"Great Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\",\"name\":\"Great Learning\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/GL-Logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/GL-Logo.jpg\",\"width\":900,\"height\":900,\"caption\":\"Great 
Learning\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/GreatLearningOfficial\\\/\",\"https:\\\/\\\/x.com\\\/Great_Learning\",\"https:\\\/\\\/www.instagram.com\\\/greatlearningofficial\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/school\\\/great-learning\\\/\",\"https:\\\/\\\/in.pinterest.com\\\/greatlearning12\\\/\",\"https:\\\/\\\/www.youtube.com\\\/user\\\/beaconelearning\\\/\"],\"description\":\"Great Learning is a leading global ed-tech company for professional training and higher education. It offers comprehensive, industry-relevant, hands-on learning programs across various business, technology, and interdisciplinary domains driving the digital economy. These programs are developed and offered in collaboration with the world's foremost academic institutions.\",\"email\":\"info@mygreatlearning.com\",\"legalName\":\"Great Learning Education Services Pvt. Ltd\",\"foundingDate\":\"2013-11-29\",\"numberOfEmployees\":{\"@type\":\"QuantitativeValue\",\"minValue\":\"1001\",\"maxValue\":\"5000\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/person\\\/6f993d1be4c584a335951e836f2656ad\",\"name\":\"Great Learning Editorial Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"caption\":\"Great Learning Editorial Team\"},\"description\":\"The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. 
Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.\",\"sameAs\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/\",\"https:\\\/\\\/in.linkedin.com\\\/school\\\/great-learning\\\/\",\"https:\\\/\\\/x.com\\\/https:\\\/\\\/twitter.com\\\/Great_Learning\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCObs0kLIrDjX2LLSybqNaEA\"],\"award\":[\"Best EdTech Company of the Year 2024\",\"Education Economictimes Outstanding Education\\\/Edtech Solution Provider of the Year 2024\",\"Leading E-learning Platform 2024\"],\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/author\\\/greatlearning\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Top 20 Dataset in Machine Learning | ML Dataset | Great Learning","description":"Machine Learning Datasets: Thorough knowledge about the best 20 datasets which are available freely. Download and use them for your data science projects.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"Top 32 Dataset in Machine Learning | Machine Learning Dataset","og_description":"Machine Learning Datasets: Thorough knowledge about the best 20 datasets which are available freely. 
Download and use them for your data science projects.","og_url":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/","og_site_name":"Great Learning Blog: Free Resources what Matters to shape your Career!","article_publisher":"https:\/\/www.facebook.com\/GreatLearningOfficial\/","article_published_time":"2020-07-04T12:33:18+00:00","article_modified_time":"2024-11-13T12:16:06+00:00","og_image":[{"width":1000,"height":471,"url":"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg","type":"image\/jpeg"}],"author":"Great Learning Editorial Team","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/Great_Learning","twitter_site":"@Great_Learning","twitter_misc":{"Written by":"Great Learning Editorial Team","Est. reading time":"25 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#article","isPartOf":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/"},"author":{"name":"Great Learning Editorial Team","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/person\/6f993d1be4c584a335951e836f2656ad"},"headline":"Top 32 Dataset in Machine Learning | Machine Learning Dataset","datePublished":"2020-07-04T12:33:18+00:00","dateModified":"2024-11-13T12:16:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/"},"wordCount":5582,"commentCount":0,"publisher":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg","articleSection":["AI and Machine 
Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/","url":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/","name":"Top 20 Dataset in Machine Learning | ML Dataset | Great Learning","isPartOf":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg","datePublished":"2020-07-04T12:33:18+00:00","dateModified":"2024-11-13T12:16:06+00:00","description":"Machine Learning Datasets: Thorough knowledge about the best 20 datasets which are available freely. 
Download and use them for your data science projects.","breadcrumb":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#primaryimage","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg","width":1000,"height":471,"caption":"Free downloadable datasets"},{"@type":"BreadcrumbList","@id":"https:\/\/www.mygreatlearning.com\/blog\/dataset-in-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog","item":"https:\/\/www.mygreatlearning.com\/blog\/"},{"@type":"ListItem","position":2,"name":"AI and Machine Learning","item":"https:\/\/www.mygreatlearning.com\/blog\/artificial-intelligence\/"},{"@type":"ListItem","position":3,"name":"Top 32 Dataset in Machine Learning | Machine Learning Dataset"}]},{"@type":"WebSite","@id":"https:\/\/www.mygreatlearning.com\/blog\/#website","url":"https:\/\/www.mygreatlearning.com\/blog\/","name":"Great Learning Blog","description":"Learn, Upskill &amp; Career Development Guide and Resources","publisher":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization"},"alternateName":"Great Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.mygreatlearning.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization","name":"Great 
Learning","url":"https:\/\/www.mygreatlearning.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/06\/GL-Logo.jpg","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/06\/GL-Logo.jpg","width":900,"height":900,"caption":"Great Learning"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/GreatLearningOfficial\/","https:\/\/x.com\/Great_Learning","https:\/\/www.instagram.com\/greatlearningofficial\/","https:\/\/www.linkedin.com\/school\/great-learning\/","https:\/\/in.pinterest.com\/greatlearning12\/","https:\/\/www.youtube.com\/user\/beaconelearning\/"],"description":"Great Learning is a leading global ed-tech company for professional training and higher education. It offers comprehensive, industry-relevant, hands-on learning programs across various business, technology, and interdisciplinary domains driving the digital economy. These programs are developed and offered in collaboration with the world's foremost academic institutions.","email":"info@mygreatlearning.com","legalName":"Great Learning Education Services Pvt. 
Ltd","foundingDate":"2013-11-29","numberOfEmployees":{"@type":"QuantitativeValue","minValue":"1001","maxValue":"5000"}},{"@type":"Person","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/person\/6f993d1be4c584a335951e836f2656ad","name":"Great Learning Editorial Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","caption":"Great Learning Editorial Team"},"description":"The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.","sameAs":["https:\/\/www.mygreatlearning.com\/","https:\/\/in.linkedin.com\/school\/great-learning\/","https:\/\/x.com\/https:\/\/twitter.com\/Great_Learning","https:\/\/www.youtube.com\/channel\/UCObs0kLIrDjX2LLSybqNaEA"],"award":["Best EdTech Company of the Year 2024","Education Economictimes Outstanding Education\/Edtech Solution Provider of the Year 2024","Leading E-learning Platform 
2024"],"url":"https:\/\/www.mygreatlearning.com\/blog\/author\/greatlearning\/"}]}},"uagb_featured_image_src":{"full":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",1000,471,false],"thumbnail":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886-150x150.jpg",150,150,true],"medium":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886-300x141.jpg",300,141,true],"medium_large":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886-768x362.jpg",768,362,true],"large":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",1000,471,false],"1536x1536":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",1000,471,false],"2048x2048":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",1000,471,false],"web-stories-poster-portrait":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",640,301,false],"web-stories-publisher-logo":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",96,45,false],"web-stories-thumbnail":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/06\/shutterstock_1278732886.jpg",150,71,false]},"uagb_author_info":{"display_name":"Great Learning Editorial Team","author_link":"https:\/\/www.mygreatlearning.com\/blog\/author\/greatlearning\/"},"uagb_comment_info":0,"uagb_excerpt":"To build a machine learning model dataset is one of the main parts. Before we start with any algorithm we need to have a proper understanding of the data. These machine-learning datasets are basically used for research purposes. Most of the datasets are homogeneous in nature. 
We use a dataset to train and evaluate our&hellip;","_links":{"self":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/16531","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/comments?post=16531"}],"version-history":[{"count":30,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/16531\/revisions"}],"predecessor-version":[{"id":107018,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/16531\/revisions\/107018"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/media\/16554"}],"wp:attachment":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/media?parent=16531"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/categories?post=16531"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/tags?post=16531"},{"taxonomy":"content_type","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/content_type?post=16531"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}