Whether you are a beginner in data science or an expert who has worked on the field's trending projects, clean and authentic datasets are crucial to a successful outcome. With that in mind, we have sourced datasets from different domains to help you test your models and algorithms and build your skills.
The datasets below cover retail, healthcare, agricultural statistics, foreign investment, finance, and startup funding. Budding data scientists and data science enthusiasts can use them to practise and hone their skills. Each dataset comes with a content description and attribute information so that it is easier to fit into any analytical structure.
The path to becoming an expert in data science is long and laborious. While understanding the latest trends is important if you want to stay at the top of your game, developing your own style is equally crucial for lasting in the field. Use the following datasets to create projects and gain experience that you can showcase in your CV.
Datasets for Creating Data Science Projects
|Sr. No.||Domain||Dataset link||Description|
|1.||Retail Analytics||Online Retail||Abstract: This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. Attribute Information: InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘c’, it indicates a cancellation. StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product. Description: Product (item) name. Nominal. Quantity: The quantity of each product (item) per transaction. Numeric. InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated. UnitPrice: Unit price. Numeric, product price per unit in sterling. CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer. Country: Country name. Nominal, the name of the country where each customer resides.|
|2.||Healthcare Analytics||Heart Diseases||This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to date. The “goal” field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Attribute Information: 1. age, 2. sex, 3. chest pain type (4 values), 4. resting blood pressure, 5. serum cholesterol in mg/dl, 6. fasting blood sugar > 120 mg/dl, 7. resting electrocardiographic results (values 0, 1, 2), 8. maximum heart rate achieved, 9. exercise induced angina, 10. oldpeak = ST depression induced by exercise relative to rest, 11. the slope of the peak exercise ST segment, 12. number of major vessels (0-3) colored by fluoroscopy, 13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect|
|3.||Environmental Analytics||Agriculture crop Production in India||This dataset can be used to study problems in the cultivation and production of various crops in India. Attribute Information: Crop: string, crop name. Variety: string, crop subsidiary name. State: string, place of crop cultivation/production. Quantity: integer, number of quintals/hectares. Production: integer, number of years of production. Season: datetime, medium (number of days), long (number of days). Unit: string, tons. Cost: integer, cost of cultivation and production. Recommended Zone: string, place (state, mandal, village).|
|4.||Investment Analytics||Foreign Direct Investment In India||This dataset helps you understand foreign direct investment (FDI) in India over the 17 years from 2000-01 to 2016-17. It contains sector-wise and financial-year-wise FDI data.|
|5.||Financial Analytics||Capitalization of top 500 companies in India||This data set has information on the market capitalisation of the top 500 companies in India. Attribute Information: Serial Number. Name: name of the company. Mar Cap – Crore: market capitalisation in crores. Sales Qtr – Crore: quarterly sales in crores.|
|6.||Business Analytics||Indian Startup Funding||This dataset has funding information of the Indian startups from January 2015 to August 2017. It includes columns with the date funded, the city the startup is based out of, the names of the funders, and the amount invested (in USD). Attribute Information: Sr No, Date (ddmmyyyy), Startup, Vertical, SubVertical, City Location, Investors’ Name, Investment Type, Amount in USD, Remarks.|
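Two of the attribute descriptions in the table translate directly into small checks you are likely to need early in a project: the Online Retail rule that an invoice code starting with 'c' marks a cancellation, and the Heart Diseases "goal" field that runs from 0 (no presence) to 4. The sketch below is only an illustration using the Python standard library. The dictionary keys `InvoiceNo`, `Quantity`, and `UnitPrice` are assumed to match the column names in the downloaded file, the sample rows are made up, and collapsing the goal field into a yes/no label is a common simplification rather than something the dataset description prescribes.

```python
from typing import Iterable, Mapping


def is_cancellation(invoice_no: str) -> bool:
    """Online Retail: an invoice code starting with 'c' marks a cancellation."""
    # Assumption: the check is case-insensitive; the description only says 'c'.
    return str(invoice_no).strip().lower().startswith("c")


def net_revenue(rows: Iterable[Mapping[str, object]]) -> float:
    """Sum Quantity * UnitPrice (sterling) over transactions that were not cancelled."""
    total = 0.0
    for row in rows:
        if is_cancellation(str(row["InvoiceNo"])):
            continue  # skip cancelled transactions
        total += float(row["Quantity"]) * float(row["UnitPrice"])
    return total


def has_heart_disease(goal: int) -> bool:
    """Heart Diseases: collapse the 0-4 goal field into a binary label.

    Assumption: any value above 0 counts as presence of heart disease.
    """
    if not 0 <= goal <= 4:
        raise ValueError(f"goal must be between 0 and 4, got {goal}")
    return goal > 0


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"InvoiceNo": "536365", "Quantity": 6, "UnitPrice": 2.5},
    {"InvoiceNo": "C536379", "Quantity": -1, "UnitPrice": 27.5},
    {"InvoiceNo": "536380", "Quantity": 2, "UnitPrice": 4.0},
]

print(net_revenue(sample_rows))  # 23.0
print(has_heart_disease(0), has_heart_disease(3))  # False True
```

Dropping cancellations before aggregating keeps their quantities out of the revenue total. If you would rather analyse returns, keep those rows and filter on `is_cancellation` instead.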