- Getting Started
– What is NumPy
– Why use it?
- Introduction to NumPy Array
– Using NumPy
– NumPy Arrays
– Creating Numpy Arrays
- Built-In Methods
– Zeros and ones
- Array Attributes and Methods
- Numpy Indexing and Selection
– Bracket Indexing and Selection
- Indexing 2D Array/Matrix
– Fancy Indexing
- Numpy Operations
– Universal Array Functions
- Append, Concatenate and Stack
What is NumPy?
NumPy is a library for mathematical computing and data preparation in Python. It supports a wide range of mathematical operations on arrays, including trigonometric, statistical and algebraic routines. The library handles n-dimensional arrays, broadcasting, element-wise operations, data generation and more, which makes it the fundamental package for scientific computing with Python. It also provides a large collection of high-level mathematical functions that operate on arrays.
Why use it?
NumPy is very fast, because its core routines are implemented in C. A NumPy array also offers more functionality than a Python list, which is why you will use arrays over lists for numerical work. It helps in data preprocessing as well. NumPy is compact, fast and easy to use, so let’s dive into installation.
The terminal on your machine is often used to install, manage and delete Python packages. NumPy can be installed from the command line too, using:
pip3 install numpy
or through your Anaconda environment. To install a specific package such as NumPy into an existing Anaconda environment called “myenv”:
conda install --name myenv numpy
If you do not specify the environment name, which in this example is done by --name myenv, the package installs into the current environment:
conda install numpy
Introduction to NumPy array
Once NumPy is installed, import it in a Python file using the following statement:
import numpy as np
NumPy has many different built-in functions and capabilities. This tutorial will not cover them all; instead, we will focus on some of the most important aspects: vectors, arrays, matrices, number generation and a few more. The rest of NumPy’s capabilities can be explored in detail in the NumPy documentation. Now let’s discuss arrays.
NumPy arrays will be the main focus in this course. So, what are arrays? An array is a block of memory reserved for a collection of values. Arrays essentially come in two flavors: vectors and matrices. Vectors are strictly one-dimensional, while matrices are two-dimensional (NumPy arrays can also have more than two dimensions). Arrays are a fundamental building block in programming and open up a lot of new possibilities, which will be explored later on.
Creating NumPy Arrays
The built-in function np.array is the standard way to create an array from existing data. First we import numpy as np, meaning that later we will access the NumPy functions using the ‘np.functionName’ format. Then we call the function ‘array’ from it. The only required parameter for the array function is the data, for example in the form of a list. There are more parameters, which can be found in the numpy.array documentation. We either store the data in a variable and convert it into an array, or we specify the data inside the function call, as shown below:
import numpy as np
data = [1,2,3]
arr = np.array(data)
import numpy as np
arr = np.array([1,2,3])
These two snippets produce the same output when we print the type of ‘arr’:
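The output of that check is not preserved in the text, so here is a minimal sketch of it. The variable names arr_from_variable and arr_inline are illustrative and not part of the original snippets.

```python
import numpy as np

data = [1, 2, 3]
arr_from_variable = np.array(data)   # convert an existing list
arr_inline = np.array([1, 2, 3])     # pass the list directly

# Both objects are instances of the same array type.
print(type(arr_from_variable))  # <class 'numpy.ndarray'>
print(type(arr_inline))         # <class 'numpy.ndarray'>
```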
Both cases work perfectly fine. Other significant parameters to be considered are dtype, copy, ndmin, etc. Every array consists of two parts, the values and the indices. The values are the actual numbers the array holds, and the index is the position of a value in the array. This is essential, because it allows you to access certain values just by knowing their index, or to find the index of one or more values. We will go deeper into indexing later on in the course.
Built-In Methods
NumPy provides many built-in methods for generating arrays. Let’s examine the most commonly used ones, as well as their purpose:
np.arange() – array of evenly spaced values from a low to a high value
np.zeros() – array of zeros with specified shape
np.ones() – similar to zeros, array of ones with specified shape
np.linspace() – array of linearly spaced numbers, with a specified number of samples
np.eye() – two dimensional array with ones on the diagonal, zeros elsewhere
NumPy’s arange function returns evenly spaced values within a given interval and works similarly to Python’s range() function. The only required parameter is ‘stop’, while all the other parameters, such as ‘start’ and ‘step’, are optional:
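The original example is missing here, so the following is a small sketch of typical arange calls. The variable names and the chosen intervals are illustrative.

```python
import numpy as np

up_to_ten = np.arange(10)        # only 'stop': 0, 1, ..., 9 (stop is excluded)
from_two = np.arange(2, 10)      # 'start' and 'stop': 2, 3, ..., 9
even = np.arange(0, 11, 2)       # with 'step': 0, 2, 4, 6, 8, 10

print(up_to_ten)
print(from_two)
print(even)
```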
Zeros and ones
Numpy provides functions that are able to create arrays of 1’s and 0’s. The required parameter for these functions is ‘shape’.
Create an array filled with zero values:
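A short sketch of both functions, assuming illustrative shapes. A single integer gives a one-dimensional array, while a tuple gives the number of rows and columns.

```python
import numpy as np

zeros_1d = np.zeros(3)         # [0. 0. 0.]
zeros_2d = np.zeros((2, 3))    # 2 rows, 3 columns, all zeros
ones_2d = np.ones((3, 3))      # 3 by 3 matrix of ones

print(zeros_1d)
print(zeros_2d)
print(ones_2d)
```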
NumPy’s linspace function returns evenly spaced numbers over a specified interval. The required parameters for this function are ‘start’ and ‘stop’.
The parameter ‘num’ specifies the number of samples to generate, and the default value is 50. The value defined in the parameter ‘num’ must be non-negative. You are able to change the data type of your values using ‘dtype’ as parameter.
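As an illustration of these parameters, here is a minimal sketch with made-up intervals. Note that, by default, linspace includes the ‘stop’ value in the result.

```python
import numpy as np

five_points = np.linspace(0, 10, 5)           # [ 0.   2.5  5.   7.5 10. ]
default_num = np.linspace(0, 1)               # 'num' defaults to 50 samples
as_ints = np.linspace(0, 10, 5, dtype=int)    # same points stored as integers

print(five_points)
print(len(default_num))
print(as_ints)
```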
NumPy’s eye function returns an identity matrix. The identity matrix is a square matrix that has 1’s along the main diagonal and 0’s for all other entries. This matrix is often written simply as ‘I’, and is special in that it acts like 1 in matrix multiplication. The required parameter for this function is ‘N’, the number of rows in the output.
Specification of a data type of the matrix’s values using ‘dtype’ is also possible.
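The sketch below builds a 3 by 3 identity matrix and checks the "acts like 1" property with a sample matrix. The matrix m and its values are illustrative.

```python
import numpy as np

identity = np.eye(3)                  # float entries by default
identity_int = np.eye(3, dtype=int)   # same matrix with integer entries

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
product = m @ identity_int            # multiplying by I leaves m unchanged

print(identity)
print(product)
```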
NumPy allows you to use various functions to produce arrays with random values. To access these functions, first we have to access the ‘random’ module. This is done using ‘np.random’, after which we specify which function we need. Here is a list of the most used random functions and their purpose:
np.random.rand() – produce random values in the given shape, uniformly distributed from 0 (inclusive) to 1 (exclusive)
np.random.randn() – produce random values from the ‘standard normal’ distribution (mean 0, standard deviation 1)
np.random.randint() – produce random integers from low (inclusive) to high (exclusive), specified as parameters
The rand function takes the dimensions of the output as its arguments, each passed as a separate integer. You specify the output shape you need, whether it is a one- or two-dimensional array. If there is no argument passed to the function, it returns a single value. Otherwise, it produces as many numbers as the shape specifies. For example:
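A minimal sketch of the three cases described above. The sizes are illustrative, and the printed values differ on every run because they are random.

```python
import numpy as np

single = np.random.rand()       # no argument: one float in [0, 1)
vector = np.random.rand(5)      # one-dimensional array with 5 values
matrix = np.random.rand(3, 4)   # two-dimensional array, 3 rows by 4 columns

print(single)
print(vector)
print(matrix)
```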
The randn function is similar to the rand function, except it draws numbers from the standard normal distribution. This means that the generated numbers have a mean of 0 and a standard deviation of 1, so most values (about 68%) fall between -1 and +1, although values outside this range also occur:
When the output is scaled so that the standard deviation equals 4, most of the generated floats fall between minus and plus 4. Arithmetic operations such as multiplication, division, addition and subtraction can be used to scale or shift the distribution, depending on the needs.
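The original examples are not preserved, so the sketch below assumes the scaling was done by multiplying the output by 4. The seed and the sample size are arbitrary choices made only so the statistics are stable.

```python
import numpy as np

np.random.seed(0)                    # arbitrary seed, for reproducibility only
samples = np.random.randn(100_000)   # standard normal: mean 0, std 1
scaled = samples * 4                 # assumed scaling: std becomes about 4

print(samples.mean(), samples.std())   # close to 0 and 1
print(scaled.std())                    # close to 4
print(np.abs(samples).max())           # values are not limited to [-1, 1]
```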
Randint is used to generate random integers, ranging between a low (inclusive) and a high (exclusive) value. Specifying parameters like ‘(1, 100)’ will create random values from 1 to 99.
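A short sketch of randint with the bounds from the text. The third argument (the number of values) is an illustrative addition.

```python
import numpy as np

one_value = np.random.randint(1, 100)         # single integer from 1 to 99
ten_values = np.random.randint(1, 100, 10)    # array of 10 such integers

print(one_value)
print(ten_values)
```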
Array Attributes and Methods
Now we will continue with more attributes and methods that can be used on arrays. In this lecture we will talk about:
Reshape – changes the shape of an array into the desired shape
Shape – returns the shape of the array as a tuple
Dtype – returns the data type of the values in the array
These tools make ‘trial-and-error’ debugging easier: once you encounter an error, checking an array’s shape or data type may help you locate the problem faster, which will save you a lot of time in the future. Let’s dive straight in.
The reshape method allows you to transform a one-dimensional array into a multi-dimensional one, or the other way around. Reshape will not affect your data values. Let’s check out this code:
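The code itself is missing from the text. A plausible version, assuming ‘arr’ holds the 25 values 0 to 24, might look like this:

```python
import numpy as np

arr = np.arange(25)           # assumed contents: 0, 1, ..., 24
matrix = arr.reshape(5, 5)    # 5 rows, 5 columns, same 25 values

print(matrix)
```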
The array named ‘arr’ is now reshaped into a 5 by 5 matrix; the two arguments specify the number of rows and the number of columns. The key thing to notice is that the array still has all 25 elements. Reshaping it into a 4 by 5 matrix (4 rows, 5 columns) would have produced an error, since the new shape must hold the same number of elements as the array. It would have been possible if the array had only 20 elements. To reverse the process and return the array to its original shape, we could do this:
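A sketch of the reverse step, again assuming a 25-element array. It also shows the error raised for the mismatched 4 by 5 shape; the flag reshape_failed is illustrative.

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)   # assumed 5 by 5 starting point
flat = arr.reshape(25)              # back to one dimension with 25 elements

reshape_failed = False
try:
    arr.reshape(4, 5)               # 20 slots cannot hold 25 elements
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

print(flat)
```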
The ‘shape’ attribute returns a tuple consisting of the array’s dimensions. Let’s check out the shape of the previously used array:
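For illustration, assuming the same 25-element array as before. Note that shape is read without parentheses, since it is an attribute rather than a function call.

```python
import numpy as np

arr = np.arange(25)
matrix = arr.reshape(5, 5)

print(arr.shape)      # (25,)
print(matrix.shape)   # (5, 5)
```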
The ‘dtype’ attribute allows you to check the data type of the array’s values.
NumPy supports many different data types, while each array holds values of a single type, so make sure to check NumPy’s documentation on ‘dtypes’ for more.
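A small sketch of checking and converting data types. The arrays are illustrative; the exact integer type printed for ints depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5, 3.0])
mixed = np.array([1, 2.5])             # mixed input is upcast to one float type
as_float32 = ints.astype(np.float32)   # explicit conversion returns a new array

print(ints.dtype)        # an integer type, e.g. int64
print(floats.dtype)      # float64
print(mixed.dtype)       # float64
print(as_float32.dtype)  # float32
```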
Numpy Indexing and Selection
In this lecture, we will discuss how to select an element or groups of elements from an array and change them. Here is a list of what we will cover in this lecture:
Indexing – pick one or more elements from an array
Broadcasting – changing values within an index range
Bracket Indexing and Selection
The simplest way to pick one or more elements of an array looks very similar to indexing Python lists. We will be using the following array as an example:
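The example array is not preserved in the text, so the sketch below assumes a simple array of the values 0 to 10.

```python
import numpy as np

arr = np.arange(0, 11)     # assumed example array: 0, 1, ..., 10

single = arr[8]            # one element by its index
middle = arr[1:5]          # slice: indices 1 up to (not including) 5
first_three = arr[:3]      # from the start up to index 3
last = arr[-1]             # negative indices count from the end

print(single, middle, first_three, last)
```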
NumPy arrays differ from a normal Python list because of their ability to broadcast. Below is an example of setting a value within an index range (broadcasting).
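A minimal sketch of that behaviour, with an assumed array of the values 0 to 10 and an arbitrary replacement value of 100. The comparison with a plain list is added for contrast.

```python
import numpy as np

arr = np.arange(0, 11)
arr[0:5] = 100             # the single value is broadcast to all five positions
print(arr)                 # [100 100 100 100 100   5   6   7   8   9  10]

# The same assignment on a Python list is rejected.
py_list = list(range(11))
list_failed = False
try:
    py_list[0:5] = 100     # lists accept only an iterable here
except TypeError:
    list_failed = True
print(list_failed)         # True
```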
Indexing 2D Array/Matrix
The main idea behind this lecture is to help you get comfortable with indexing in more than one dimension. Below is a list of what we will cover.
Indexing a 2D array – Indexing matrices differs from vectors
Fancy indexing – Selecting entire rows or columns out of order
Selection – Selection based off of comparison operators
They both work. Throughout this lecture we will be using the second notation.
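The two notations are not shown in the text. A likely reading, assumed here, is double brackets versus a single pair of brackets with a comma. The array arr_2d and its values are illustrative.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

row = arr_2d[1]               # entire second row
double_bracket = arr_2d[1][0] # row 1, then element 0 of that row
comma = arr_2d[1, 0]          # row 1, column 0 in one step
corner = arr_2d[:2, 1:]       # first two rows, last two columns

print(row, double_bracket, comma)
print(corner)
```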
Now, how do you index a column? I will show you an example of selecting the second column with all the values inside.
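The original examples are missing. The sketch below shows two equivalent ways to select the second column of an assumed 3 by 3 array; they are not necessarily the two examples the author used.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

column = arr_2d[:, 1]             # all rows, column index 1
column_explicit = arr_2d[0:3, 1]  # same selection with the row range spelled out

print(column)            # [10 25 40]
print(column_explicit)   # [10 25 40]
```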
You can see that both examples provide the same output. In these kinds of cases, using the first example is recommended for the sake of simplicity.
Fancy indexing allows you to select entire rows or columns out of order. To show this, let’s quickly build out a numpy array of zeros.
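A sketch of the idea, assuming a 5 by 5 array of zeros in which every row is then filled with its own row number so the selection is easy to follow. The sizes and index lists are illustrative.

```python
import numpy as np

arr_2d = np.zeros((5, 5))
for i in range(5):
    arr_2d[i] = i                  # row i now holds the value i everywhere

rows = arr_2d[[3, 1, 4]]           # rows 3, 1 and 4, in that order
cols = arr_2d[:, [2, 0]]           # columns 2 and 0, in that order

print(rows)
print(cols.shape)                  # (5, 2)
```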
Let’s briefly go over how to use brackets for selection based off of comparison operators. But first we need to create an array we will use as an example.
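The example array is not preserved, so this sketch assumes the values 1 to 10. A comparison produces a boolean array, which can then be placed inside the brackets to keep only the matching elements.

```python
import numpy as np

arr = np.arange(1, 11)       # assumed example array: 1, 2, ..., 10

mask = arr > 4               # boolean array, True where the condition holds
selected = arr[mask]         # keep only elements greater than 4
small = arr[arr <= 3]        # the comparison can also go directly in the brackets

print(mask)
print(selected)
print(small)
```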
Numpy Operations
We can perform different types of operations on NumPy arrays. This means we can add, subtract, multiply or divide the values inside our array, and even do things like taking the square root. Below is a list of what we will cover in this lecture.
Arithmetic Operations – add, subtract, multiply, divide on arrays
Universal Array Functions – Mathematical operations provided by NumPy
While performing arithmetic operations between two arrays it is important that they have compatible shapes, in the simplest case the same dimensions. We will use the following array as an example:
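The example array is missing from the text. Assuming the values 0 to 9, element-wise arithmetic could be sketched as follows:

```python
import numpy as np

arr = np.arange(0, 10)       # assumed example array: 0, 1, ..., 9

total = arr + arr            # element-wise addition
difference = arr - arr       # element-wise subtraction
product = arr * arr          # element-wise multiplication
shifted = arr + 100          # a single number is applied to every element

print(total)
print(difference)
print(product)
print(shifted)
```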
If you run the last example you will again get a warning for dividing by zero. Note that this time, instead of nan, we get a value of infinity.
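The division examples are not preserved. A likely pair, assumed here, divides an array that starts with 0 by itself and then divides 1 by it. Both lines emit a RuntimeWarning rather than raising an error.

```python
import numpy as np

arr = np.arange(0, 10)    # assumed example array; first element is 0

ratio = arr / arr         # 0 / 0 gives nan, with a warning
inverse = 1 / arr         # 1 / 0 gives inf, with a warning

print(ratio)
print(inverse)
```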
Universal Array Functions
NumPy comes with many universal array functions, which are essentially mathematical operations that are applied element by element across the array. Let’s show some common ones.
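A few common universal functions, sketched on an illustrative array of perfect squares:

```python
import numpy as np

arr = np.array([0.0, 1.0, 4.0, 9.0])

roots = np.sqrt(arr)     # square root of every element: [0. 1. 2. 3.]
exps = np.exp(arr)       # e raised to every element
sines = np.sin(arr)      # sine of every element (input in radians)

print(roots)
print(exps)
print(sines)
```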
Append, Concatenate and Stack
When dealing with messy data you will often need to stick multiple arrays together. In this lecture we will cover how this is done using NumPy. Here is a list of what we will cover:
Append – append one array to another
Concatenate – Concatenate two arrays
Stack – Stack one array to another horizontally or vertically
These functions don’t work in-place, meaning you need to assign the combined array to a variable. Throughout this lecture we will be using the following arrays as an example:
To append using NumPy we use the np.append() function, which takes the parameters ‘arr’ and ‘values’, plus an optional ‘axis’ along which to append. If ‘axis’ is not given, both inputs are flattened before appending.
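The example arrays are not preserved, so the arrays a and b below are illustrative. The sketch shows np.append without an axis and along each axis of a 2D array.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6]])

flat = np.append(a, b)               # no axis: both inputs are flattened first
rows = np.append(a, b, axis=0)       # b added as a new row
cols = np.append(a, b.T, axis=1)     # b transposed and added as a new column

print(flat)   # [1 2 3 4 5 6]
print(rows)
print(cols)
```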
Concatenate works similarly to append, but instead of ‘arr’ and ‘values’ as parameters it takes a sequence, such as a tuple, of two or more arrays. Let’s look at a few examples.
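A minimal sketch with two illustrative 2 by 2 arrays, joined along each axis:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

by_rows = np.concatenate((a, b))           # axis=0 by default: 4 rows, 2 columns
by_cols = np.concatenate((a, b), axis=1)   # side by side: 2 rows, 4 columns

print(by_rows)
print(by_cols)
```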
There are two ways of stacking arrays together, horizontally and vertically. Example for vertical stack:
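Both directions can be sketched with np.vstack and np.hstack, using two illustrative one-dimensional arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

vertical = np.vstack((a, b))     # a and b become the rows of a 2 by 3 array
horizontal = np.hstack((a, b))   # a and b are joined end to end

print(vertical)
print(horizontal)   # [1 2 3 4 5 6]
```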