You’ve probably heard the term ‘Data Science’ come up in conversations, news articles and across different media. This article is a primer on what data science is all about.
Data science is often described as the future of technology and is creating jobs worldwide. Tech giants like Facebook, Google and IBM are spending millions of dollars on research and development in machine learning and artificial intelligence, fields that build on data science concepts. Data science jobs are among the most sought-after on websites like LinkedIn, Glassdoor and Monster.
What is Data Science?
As the name suggests, data science deals with large amounts of data.
This vast amount of data needs to be grouped, classified and structured before it can be used to draw insights that drive business growth. This sounds simple, but it isn’t. Many tools and algorithms are needed to structure, visualize and read the data before any insights can be derived.
Data science is used as a broad umbrella term these days. When people say data science, they are usually referring to one of the fields that come under it, such as data analytics, business analytics, machine learning and artificial intelligence. Each field is unique in its own way, and they all have critical applications in business.
[Figure: Data science flow-chart (Data Science for Beginners)]
The chart above shows the steps that make up a data scientist’s workflow. The rest of the article describes each of these steps.
Step 1:
Obtaining the Data
First, identify what kind of data needs to be analysed. It could be about customer buying patterns, sales forecasts or customer behavior across the different touchpoints of a business. This data is then exported, for example to an Excel or CSV file. The next step is to make the data easy to read: it should be labelled and structured so that it is easy to analyse.
Skills and tools required

  • Database management: SQL
  • Understanding the database and what it represents
  • Retrieving raw unstructured data in the form of text, docs, photos, videos etc.
  • Distributed storage and processing: Apache Hadoop, Apache Spark
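
As a rough illustration of this step, the sketch below pulls data out of a database with SQL and exports it to CSV with a labelled header row. It uses only Python's standard library (`sqlite3`, `csv`). The `sales` table, its columns and its values are made up for the example, and an in-memory database stands in for a real one.

```python
import csv
import io
import sqlite3

# Hypothetical sales table, built in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("alice", "laptop", 1200.0),
        ("bob", "phone", 650.0),
        ("alice", "phone", 700.0),
    ],
)

# Retrieve the data with SQL: total spend per customer.
cursor = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM sales "
    "GROUP BY customer ORDER BY customer"
)
rows = cursor.fetchall()
header = [column[0] for column in cursor.description]
conn.close()

# Export to CSV with a labelled header row. A StringIO buffer stands in
# for a real file; open("sales.csv", "w", newline="") would write to disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real project the connection would point at the company database rather than `:memory:`, and the query would depend on what the data represents.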

Step 2:
Scrubbing or cleaning the data
This is an important step: before you can analyse the data, you must make sure it is in a readable state, with no mistakes, missing values or wrong values. The data has to be consistent throughout so that your analysis is as error-free as possible.
Skills and tools required

  • Scripting language – Python, R, SAS
  • Data wrangling tools – Python, pandas, R
  • Distributed processing – Hadoop, MapReduce/Spark
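
To make the cleaning step concrete, here is a minimal sketch in plain Python. The records, field names and cleaning rules are assumptions chosen for illustration. In practice a library such as pandas would usually do this work, and whether to drop or fill in missing values depends on the data.

```python
# Hypothetical raw records with typical problems: inconsistent labels,
# a missing value, and a duplicate row.
raw_records = [
    {"customer": " Alice ", "city": "london", "amount": "1200"},
    {"customer": "Bob", "city": "LONDON", "amount": ""},
    {"customer": "Carol", "city": "Paris", "amount": "300"},
    {"customer": " Alice ", "city": "london", "amount": "1200"},
]


def clean(records):
    """Normalize text fields, drop rows with a missing amount, remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        amount_text = record["amount"].strip()
        if not amount_text:
            continue  # missing value: drop the row (filling it in is another option)
        row = (
            record["customer"].strip().title(),
            record["city"].strip().title(),
            float(amount_text),
        )
        if row in seen:
            continue  # exact duplicate after normalization
        seen.add(row)
        cleaned.append({"customer": row[0], "city": row[1], "amount": row[2]})
    return cleaned


cleaned_records = clean(raw_records)
for record in cleaned_records:
    print(record)
```

Normalizing before checking for duplicates matters here: the two "Alice" rows only match once whitespace and capitalization are made consistent.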

Step 3:
Exploratory Data Analysis
Now that your data is clean and readable, it’s time to get to the real work: analyzing the data. This is done by visualizing the data in various ways and looking for patterns, so that anything out of the ordinary stands out. To analyse the data well, you need strong attention to detail to notice when something is out of place. You also need to think outside the box to identify trends and build hypotheses, and then come up with solutions based on this analysis. This is the primary job of a data analyst.
Skills and tools required

  • Python libraries – NumPy, Matplotlib, pandas, SciPy
  • R libraries – ggplot2, dplyr
  • Inferential statistics
  • Data visualization
  • Experimental design
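
One simple way to spot "anything out of the ordinary" is to compute summary statistics and flag values far from the mean. The sketch below uses Python's built-in `statistics` module. The order counts are invented, and the two-standard-deviation threshold is a common rule of thumb assumed here, not a fixed rule.

```python
import statistics

# Hypothetical daily order counts; one day looks unusual.
daily_orders = [102, 98, 105, 99, 101, 97, 250, 103, 100, 96]

mean = statistics.mean(daily_orders)
median = statistics.median(daily_orders)
stdev = statistics.stdev(daily_orders)  # sample standard deviation

# Flag values more than two standard deviations away from the mean.
outliers = [x for x in daily_orders if abs(x - mean) > 2 * stdev]

print(f"mean={mean:.1f} median={median:.1f} stdev={stdev:.1f}")
print("possible outliers:", outliers)
```

The gap between the mean and the median is itself a hint: a single extreme day pulls the mean up while the median barely moves. A plot made with Matplotlib or ggplot2 would show the same thing visually.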

Step 4:
Modelling or Machine Learning
Machine learning is an application of artificial intelligence in which a machine follows rules (algorithms) to learn patterns from data and come up with predictions, with little human intervention once it has been set up.
The data engineer or scientist chooses a machine learning algorithm and sets it up for the data that has to be analysed. The algorithm then works through the data iteratively, adjusting itself until it produces the right output.
After cleaning the data and identifying the essential features during data exploration, using a statistical model as a predictive tool helps you develop relatively error-free business insights, which in turn improves your overall decision making.
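
The iterative idea can be sketched with one of the simplest models, a straight line fitted by gradient descent. This is an illustrative example only: the data, the learning rate and the number of steps are assumptions, and libraries such as scikit-learn provide tested implementations of this and many other models.

```python
# Hypothetical data: advertising spend (x) vs. sales (y), roughly y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Move both parameters a small step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept
print(f"slope={slope:.2f} intercept={intercept:.2f} prediction for x=6: {prediction:.2f}")
```

Each pass through the loop nudges the line a little closer to the data, which is the same pattern that larger models follow.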
Skills and tools required

  • Machine learning – supervised, unsupervised and reinforcement learning
  • Evaluation methods
  • Machine learning libraries – Python (scikit-learn) / R (caret)
  • Linear algebra and multivariate calculus
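
"Evaluation methods" usually means checking a model on data it has not seen. The sketch below is one minimal way to do that in plain Python: it fits a least-squares line on part of the data and measures the mean absolute error on the rest. The data and the choice to hold out the last two points are assumptions made for the example. In practice a random split is more common, and scikit-learn or caret provide ready-made helpers.

```python
def fit_least_squares(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data set.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.0, 5.2, 6.9, 9.1, 11.0, 12.8, 15.1, 17.0]

# Hold out the last two points as a test set the model never sees.
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

slope, intercept = fit_least_squares(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
mae = mean_absolute_error(test_y, predictions)
print(f"mean absolute error on held-out data: {mae:.2f}")
```

A low error on the held-out points suggests the model generalizes. A low error only on the training data says much less.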

Step 5:
Interpreting or ‘data storytelling’
This is the final step, in which you present your findings to the organization. The most important skill here is the ability to explain your results, hence the term ‘storytelling’.
To understand how the data affects the business, or how your solution leads to better business outcomes, you must also have a good understanding of your organization’s business and its processes.
Skills and tools required

  • Knowledge of your business domain
  • Data visualization tools – Tableau, ggplot2, Seaborn etc.
  • Communication – presentation skills, both verbal and written

Now that you know which skills and tools you need to become a data scientist, the next step is to learn them and enter this vast field yourself.


