## Introduction

Jupyter Notebook is an open-source web application that lets you create documents containing live code, equations, visualizations, images, and narrative text. It is mainly used for data science and statistical analysis, including data cleaning, data transformation, numerical simulation, mathematical computation, statistical modelling, data visualization, machine learning, and deep learning. You can think of it as a data science toolkit.

To understand why Jupyter Notebook matters, we first need to know what data science is.

Data has become more valuable than ever. Fast-growing industries depend on new technologies, and large amounts of data sit at the heart of all of them. It is therefore important to manage, store, pre-process, and draw meaningful insights from data. The main problem is storing and processing it.

This is where data science comes into the picture. We can think of data science as an umbrella under which all of these processes take place.

## What is Data Science?

Data science is the process of extracting meaningful information from massive amounts of data. In simple terms, it means reading and studying the data to gain useful insights. Data science combines various tools, algorithms, and machine learning and deep learning concepts to discover hidden patterns in raw and unstructured data. It helps to understand the basics of data science and Python before learning about Jupyter.


## Why do we need Data Science?

In the past, most data was available in a structured format. Now, as the volume of data increases, the share of structured data has become very small. Unstructured and semi-structured data is collected from many different sources, so we can’t guarantee that it arrives in a proper format.

Conventional systems cannot cope with massive amounts of unstructured data. Data science helps to solve this problem.

According to some estimates, 80-90% of data will be unstructured in the coming years. This is because of significant growth in the industry.

Some of the most important applications of Data Science are as follows.

#### 1. Recommendation system:

Product recommendation has become one of the most popular techniques for encouraging customers to buy related products.

Let’s see an example. A salesperson at Big Bazaar is trying to increase the store’s sales by bundling products together and offering a discount on the bundle. Suppose he bundles shampoo and conditioner and gives a discount on them. Customers will then buy them together at the discounted price.
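One simple way to find bundle candidates like the one above is to count how often two products land in the same basket. The sketch below is only an illustration: the `baskets` data and the `count_pairs` helper are hypothetical, and real recommendation systems use far richer methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"soap", "toothpaste"},
    {"shampoo", "conditioner", "toothpaste"},
]


def count_pairs(baskets):
    """Count how often each pair of products appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts


pair_counts = count_pairs(baskets)
best_pair, times_bought = pair_counts.most_common(1)[0]
print(best_pair, times_bought)
```

The pair with the highest count is a natural first candidate for a discounted bundle.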

#### 2. Future Forecasting:

Predictive analysis is one of the most widely used areas of data science. We are all familiar with weather forecasting and other kinds of forecasting based on data collected from various sources.

Example:

If we want to forecast COVID-19 cases to get an overview of increasing cases in the current pandemic situation, we can do so with the help of data science techniques.
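As a rough illustration of forecasting, the sketch below fits a straight line to a few daily counts using ordinary least squares and extends it by one day. The numbers are made up, the `fit_line` helper is hypothetical, and a straight line is far too simple for real epidemic data; it only shows the basic idea of learning a trend from past values.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily case counts, invented purely for illustration.
days = [0, 1, 2, 3, 4]
cases = [10, 12, 14, 16, 18]

slope, intercept = fit_line(days, cases)
next_day = 5
forecast = intercept + slope * next_day
print(f"Forecast for day {next_day}: {forecast:.1f}")
```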

#### 3. Fraud and Risk Detection:

As online transactions grow, so does the risk of losing personal data. One of the most valuable applications of data science is fraud and risk detection.

Example:

Credit card fraud detection depends on the amount, merchant, location, time, and other variables. If any of these look unusual, the transaction may be canceled automatically, and the card may also be blocked for 24 hours or more.
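To make the idea concrete, here is a minimal rule-based sketch that flags a transaction when its amount, location, or time looks unusual. The `Transaction` class, the `suspicious_reasons` function, and every threshold are assumptions made for illustration; real fraud detection systems usually rely on models trained on historical data rather than fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float   # transaction value
    country: str    # country code where the card was used
    hour: int       # local hour of the day, 0-23


def suspicious_reasons(tx, home_country="IN", max_amount=5000.0):
    """Return the reasons a transaction looks unusual (empty list if none).

    The home country, amount limit, and night-time window are
    hypothetical values chosen only for this example.
    """
    reasons = []
    if tx.amount > max_amount:
        reasons.append("unusually large amount")
    if tx.country != home_country:
        reasons.append("unfamiliar location")
    if tx.hour < 5:
        reasons.append("unusual time of day")
    return reasons


normal = Transaction(amount=120.0, country="IN", hour=14)
odd = Transaction(amount=9000.0, country="US", hour=3)
print(suspicious_reasons(normal))
print(suspicious_reasons(odd))
```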

#### 4. Self-Driving Car:

Self-driving cars are one of the best-known applications of data science. Based on previously collected data, we train the car to make decisions on its own. During training, we can penalize the model when it does not perform well. The car (model) becomes more intelligent over time as it learns from real-time experience.

#### 5. Image Recognition:

Given an image, data science techniques can detect an object and then classify it. The most popular example of image recognition is the face recognition feature in our smartphones. First, the system detects a face and classifies it as a human face. It then decides whether the face belongs to the phone’s actual owner.

#### 6. Speech-to-Text Conversion:

Speech recognition is the process computers use to understand spoken language. We are all quite familiar with Google Assistant. Have you ever wondered how it works? Google Assistant first recognizes our speech and then converts it into text with the help of algorithms.

The most commonly used components of data science are as follows.

1. Statistics: Statistics is used to analyse large amounts of data and draw insights about its essential components.

2. Mathematics: Mathematics is a critical and fundamental part of data science. It is used to study structure, quantity, quality, space, and change in data. Every aspiring data scientist should therefore have a good knowledge of mathematics in order to read data mathematically and build meaningful insights from it.

3. Visualization: Visualization presents data and insights in visual form. It helps us understand huge volumes of data properly.

4. Data engineering: Data engineering covers acquiring, storing, retrieving, and transforming data, as well as adding metadata (data about data).

5. Domain expertise: Domain experts use their knowledge of a particular area to interpret and explain results properly.

6. Advanced computing: Advanced computing covers designing, writing, debugging, and maintaining the source code of computer programs.

7. Machine learning: Machine learning is one of the most useful and essential parts of data science. It helps identify the best features and build accurate models.

Now we have a rough idea of the most important domains in data science. We will now move on to installing Jupyter Notebook.

Jupyter is used by beginners and companies alike. It supports around forty programming languages, including Python. Before installing Jupyter Notebook, make sure that Python is already installed on your system (Python 3.3 or greater, or Python 2.7; current Jupyter releases require Python 3), because Jupyter Notebook requires Python.
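A quick way to confirm the interpreter version before installing is to ask Python itself. The sketch below uses only the standard `sys` module; the `python_is_supported` helper is hypothetical, and its `(3, 3)` minimum is taken from the requirement stated above, so a current Jupyter release may well need a newer Python 3 than this.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 3)):
    """Return True if the (major, minor) version is at least `minimum`.

    The default minimum mirrors the requirement quoted in this article
    and is an assumption, not the requirement of any specific release.
    """
    return tuple(version_info[:2]) >= minimum


print("Running Python", sys.version.split()[0])
print("Meets the stated minimum:", python_is_supported())
```

If the check prints `False`, install or upgrade Python before continuing.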


Jupyter Notebook can be installed in two ways:

• Using the Anaconda Distribution: installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.
• Using the pip command: installs Jupyter with pip, the package manager used to install and manage software packages/libraries written in Python.

## Install Jupyter notebook by Anaconda

#### What is Anaconda?

Anaconda is a free and open-source distribution of the Python and R programming languages. It comes with the Python interpreter and various packages related to data science and artificial intelligence.

The main goal of the Anaconda platform is to make getting started easy for people who are interested in these fields. It comes with many pre-installed libraries and packages and needs only a single installation process. This platform is beginner-friendly and easy to use.

### Installing Jupyter Notebook using Anaconda

The Anaconda platform also includes Jupyter, Spyder, and more. It is mainly used for large-scale data processing, data analytics, and heavy scientific computing. Spyder is an IDE for Python that ships with Anaconda, and the OpenCV library for image processing in Python also works in Spyder. Package versions are managed by a package management system called Conda.

To install Jupyter using Anaconda, follow these steps:

1. Download the Anaconda installer

2. Select the respective platform: Windows/Mac/Linux

3. Open and run the downloaded installer (an .exe file on Windows)

4. Launch Anaconda Navigator

5. Click on the Install Jupyter Notebook button

6. Begin the installation

7. Finish the installation

## Installing Jupyter Notebook using pip command

pip is a package management system used to install and manage software packages/libraries written in Python. These packages are stored in a large online repository called the Python Package Index (PyPI). pip uses PyPI as the default source for packages and their dependencies.

Before we start installing Jupyter with pip, we should check the version of pip. If pip is out of date, we need to upgrade it.
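One way to check the installed pip version from Python is to run `python -m pip --version` through the standard `subprocess` module, using the same interpreter that is running the script. This is a minimal sketch; the `pip_version` helper is a hypothetical name introduced here.

```python
import subprocess
import sys


def pip_version():
    """Return the output of `python -m pip --version`, or None if pip is missing."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # pip is not available for this interpreter.
        return None
    return result.stdout.strip()


version = pip_version()
print(version or "pip is not installed for this interpreter")
```

Calling pip as `sys.executable -m pip` avoids accidentally checking a pip that belongs to a different Python installation.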

#### Update PIP command

`python3 -m pip install --upgrade pip`

After updating pip, follow these steps to install Jupyter.

• Command to install Jupyter: `pip3 install jupyter`
• Begin Installation
• Collect Files and Data
• Run Installation
• Finish Installation

Now launch Jupyter:
Use the following command to launch Jupyter from the command line:

`jupyter notebook`
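If the shell reports that the `jupyter` command cannot be found, it can help to check whether the executable is on your PATH. The sketch below uses the standard `shutil.which` function; the `find_jupyter` helper is a hypothetical name, and a missing result only means the command is not on the PATH of the current environment.

```python
import shutil


def find_jupyter():
    """Return the full path of the `jupyter` executable, or None if not on PATH."""
    return shutil.which("jupyter")


path = find_jupyter()
print(path or "jupyter was not found on PATH")
```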