- Introduction to Matplotlib
- Installing Matplotlib
- How to use Matplotlib
- The Relation between – Matplotlib, Pyplot and Python
- Create a simple plot
- Adding elements to a plot
- Making Multiple Plots in One Figure
- Create Subplots
- Figure Object
- Axes Object
- Different Types of Plots
- Saving Plot
Contributed by: Mr. Sridhar Anchoori
LinkedIn profile: https://www.linkedin.com/in/sridhar-anchoori-42156722/
‘A picture is worth more than a thousand words.’ Similarly, in the context of data, ‘a visualisation is worth more than a complex data table or report’.
Data visualisation is one of the critical skills expected from data scientists. Many business problems can be understood and addressed using visualisation techniques. Visualisation typically involves Exploratory Data Analysis (EDA) and graphical plots. Effective visualisation helps users understand the patterns in the data and solve the business problem more effectively. Another advantage of visualisation is that it simplifies complex data into an understandable format.
People generally find an image much easier to read than text. Visualisation is an effective way to communicate, analyse and interpret data. It helps users understand vast amounts of information easily, including trends, correlations, patterns and distributions.
There are multiple tools and technologies available in the industry for data visualisation, with Python among the most widely used. Python offers multiple libraries for data visualisation; a few of the popular graphic libraries are:
- Pandas visualisation
This document helps in understanding the matplotlib library, which is widely used in the industry. Matplotlib has a variety of graphical features and is easy to learn. This article walks through its main graphical features, including their syntax.
There are multiple ways to install the matplotlib library. The easiest is to download the Anaconda distribution: matplotlib is installed by default with Anaconda and does not require any additional steps.
- Download the Anaconda package from the official Anaconda site
- To install matplotlib separately, go to the Anaconda prompt and run one of the following commands
<code>pip install matplotlib
# or
conda install matplotlib</code>
- Verify that matplotlib is properly installed using the following commands in a Jupyter notebook
<code>import matplotlib
matplotlib.__version__</code>
How to use Matplotlib
Before using matplotlib, we need to import the package. This can be done using the ‘import’ statement in a Jupyter notebook. PyPlot is the plotting module in matplotlib that is most often used for data visualisation; importing PyPlot is sufficient for most data visualisation work.
<code># import the matplotlib library as mpl
import matplotlib as mpl
# import the pyplot module from matplotlib as plt (short name used for referring to the module)
import matplotlib.pyplot as plt</code>
The relation between – Matplotlib, Pyplot and Python
- Python is a very popular programming language, used for web development, mathematics and statistical analysis. Python works on most platforms and is also simple to use.
- Python has many libraries built for specific purposes, several of which are mostly used for visualisation and data analysis.
- One of these packages is matplotlib, which is developed using Python. This library is very widely used for data visualisation.
- PyPlot is a module in matplotlib which provides a MATLAB-like interface. MATLAB is heavily used for statistical analysis in the manufacturing industry. MATLAB is licensed software and requires a paid licence to use, whereas PyPlot is an open-source module that offers similar plotting functionality using Python. In short, PyPlot is often seen as an open-source alternative to MATLAB for plotting.
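To make the relationship a little more concrete, here is a minimal sketch of the two ways matplotlib can be driven from Python: the MATLAB-like PyPlot functions, which act on an implicit "current" figure, and the object-oriented calls on explicit figure and axes objects. The data values are made up for illustration.

```python
import matplotlib.pyplot as plt

# PyPlot (MATLAB-like) style: functions act on an implicit "current" figure and axes
plt.figure()
plt.plot([1, 2, 3], [2, 4, 6])
implicit_fig = plt.gcf()  # the figure PyPlot is currently tracking
implicit_ax = plt.gca()   # the axes PyPlot is currently tracking

# Object-oriented style: keep explicit references and call methods on them
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 6])

# After plt.subplots(), the newly created figure becomes the current one
print(plt.gcf() is fig)
```

Both styles draw the same line; the object-oriented form tends to be easier to follow once a figure holds more than one plot.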
Create a Simple Plot
Here we will draw a basic plot using numbers generated with NumPy. The simplest way to create a graph is using the ‘plot()’ function. To generate a basic plot, we need values for two axes, X and Y, and we will generate two sets of evenly spaced numbers using the ‘linspace()’ function from NumPy.
<code># import the NumPy package
import numpy as np
# generate 100 evenly spaced numbers with NumPy for x, and derive y from x
x = np.linspace(0, 50, 100)
y = x * np.linspace(100, 150, 100)
# create a basic plot
plt.plot(x, y)</code>
Running this code generates the basic plot.
Adding Elements to Plot
The plot generated above does not have all the elements needed to understand it well. Let’s add different elements to the plot for better interpretation. The elements that can be added include the title, x-label, y-label, x-limits and y-limits.
<code># set different elements on the plot generated above
# Add title using 'plt.title'
# Add x-label using 'plt.xlabel'
# Add y-label using 'plt.ylabel'
# set x-axis limits using 'plt.xlim'
# set y-axis limits using 'plt.ylim'
# Add legend using 'plt.legend'</code>
The resulting chart has the added elements, i.e., title, x-label, y-label, x-limits and y-limits.
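As a runnable sketch of the calls listed above, the snippet below rebuilds the basic plot and attaches each element. The title text, label text and limit values are illustrative choices, not taken from the original chart.

```python
import numpy as np
import matplotlib.pyplot as plt

# same data as the basic plot
x = np.linspace(0, 50, 100)
y = x * np.linspace(100, 150, 100)

plt.figure()
plt.plot(x, y, label='y values')       # label is the text the legend displays
plt.title('Basic plot with elements')  # illustrative title
plt.xlabel('x values')
plt.ylabel('y values')
plt.xlim(0, 50)                        # x-axis limits
plt.ylim(0, 8000)                      # y-axis limits (y peaks at 7500)
plt.legend(loc='upper left')

ax = plt.gca()  # handle to the axes, handy for inspecting what was set
```

In a notebook the figure is displayed automatically; in a script, finish with `plt.show()`.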
Let’s add a few more elements to the plot, like colour, markers and line customisation.
<code># add color, style and width to the line element
plt.plot(x, y, c='r', linestyle='--', linewidth=2)</code>
<code># add markers to the plot; a marker has different elements, i.e., style, color, size etc.
plt.plot(x, y, marker='*', markersize=3, c='g')</code>
<code># add grid using the grid() method
plt.grid(True)
# add legend (needs a label on at least one plotted line)
plt.legend()</code>
Plots can be customised at the following levels:
- Colours
  - b – blue
  - c – cyan
  - g – green
  - k – black
  - m – magenta
  - r – red
  - w – white
  - y – yellow
  - Can use hexadecimal and RGB formats
- Line Styles
  - ‘-’ : solid line
  - ‘--’ : dashed line
  - ‘-.’ : dash-dot line
  - ‘:’ : dotted line
- Marker Styles
  - . – point marker
  - , – Pixel marker
  - v – Triangle down marker
  - ^ – Triangle up marker
  - < – Triangle left marker
  - > – Triangle right marker
  - 1 – Tripod down marker
  - 2 – Tripod up marker
  - 3 – Tripod left marker
  - 4 – Tripod right marker
  - s – Square marker
  - p – Pentagon marker
  - * – Star marker
- Other configurations
  - color or c
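The codes listed above can be combined into a single format string or passed as separate keyword arguments. The sketch below, using made-up data, shows both forms along with a hexadecimal and an RGB-tuple colour.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

# format string: colour 'r', marker '*' and line style '--' in one argument
(line_a,) = ax.plot(x, x, 'r*--', label='format string')

# the same settings spelled out as keyword arguments
(line_b,) = ax.plot(x, x + 2, color='r', marker='*', linestyle='--', label='keywords')

# colours given in hexadecimal and as an RGB tuple (values between 0 and 1)
(line_c,) = ax.plot(x, x + 4, color='#1f77b4', linestyle=':', label='hex colour')
(line_d,) = ax.plot(x, x + 6, color=(0.2, 0.6, 0.2), linestyle='-.', label='RGB tuple')

ax.legend()
```

The format string is compact for quick plots, while keyword arguments are easier to read when several properties are set at once.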
Making Multiple Plots in One Figure
There are situations where the user may have to show multiple plots in a single figure for comparison. For example, a retailer wants to know the sales trend of two stores for the last 12 months and would like to see both trends in the same figure.
Let’s plot two lines, sin(x) and cos(x), in a single figure and add a legend to show which line is which.
<code># let's plot two lines, sin(x) and cos(x)
# label sets the text shown for the line in the legend
# loc sets the location of the legend on the plot
# generate the x values
x = np.arange(0, 1500, 100)
plt.plot(np.sin(x), label='sin function x')
plt.plot(np.cos(x), label='cos function x')
plt.legend(loc='upper right')</code>
<code># To show the plots in separate figures instead of a single figure,
# call plt.show() before the next plot statement as shown below
x = np.linspace(0, 100, 50)
plt.plot(x, 'r', label='simple x')
plt.legend(loc='upper right')
plt.show()
plt.plot(x*x, 'g', label='x squared')
plt.legend(loc='upper right')
plt.show()</code>
There are situations where we should show multiple plots in a single figure to present the complete storyline to stakeholders. This can be achieved with subplots in the matplotlib library. For example, a retail chain has 6 stores and the manager would like to see the daily sales of all 6 stores in a single window to compare them. This can be visualised using subplots, which arrange the charts in rows and columns.
<code># subplots are used to create multiple plots in a single figure
# let's create a single subplot first, followed by adding more subplots
x = np.random.rand(50)
y = np.sin(x*2)
# create an empty figure with an axis as below; figure and axis are two separate objects in matplotlib
fig, ax = plt.subplots()
# add the chart to the plot
ax.plot(y)</code>
<code># Let's add multiple plots using the subplots() function
# Give the required number of plots as an argument in subplots(); the call below creates 2 subplots
fig, axs = plt.subplots(2)
# create data
x = np.linspace(0, 100, 10)
# axs is an array of axes, so index it to assign the data to each plot
axs[0].plot(x, np.sin(x**2))
axs[1].plot(x, np.cos(x**2))
# add a title to the subplot figure
fig.suptitle('Vertically stacked subplots')</code>
<code># Create subplots arranged in rows and columns
# Give two arguments, rows and columns, in the subplots() function
# subplots(2, 2) returns a two-dimensional 2*2 array of axes
# so the axes are unpacked in a matching 2*2 structure as below
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
# add the data to the plots
ax1.plot(x, x**2)
ax2.plot(x, x**3)
ax3.plot(x, np.sin(x**2))
ax4.plot(x, np.cos(x**2))
# add title
fig.suptitle('Horizontal plots')</code>
<code># another simple way of creating multiple subplots as below, using axs
fig, axs = plt.subplots(2, 2)
# add the data referring to row and column
axs[0,0].plot(x, x**2, 'g')
axs[0,1].plot(x, x**3, 'r')
axs[1,0].plot(x, np.sin(x**2), 'b')
axs[1,1].plot(x, np.cos(x**2), 'k')
# add title
fig.suptitle('matrix sub plots')</code>
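For the six-store scenario described earlier, one possible layout is a 2 x 3 grid filled in a loop. This is only a sketch: the sales figures are randomly generated, and the store names, figure size and shared axes are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded generator so the sketch is repeatable
days = np.arange(1, 31)         # days of a hypothetical month

# 2 rows x 3 columns; shared axes make the six stores directly comparable
fig, axs = plt.subplots(2, 3, figsize=(12, 6), sharex=True, sharey=True)

# axs.flat walks through the 2-D array of axes one at a time
for store_number, ax in enumerate(axs.flat, start=1):
    daily_sales = rng.integers(50, 150, size=days.size)  # made-up sales figures
    ax.plot(days, daily_sales)
    ax.set_title(f'Store {store_number}')

fig.suptitle('Daily sales by store (synthetic data)')
fig.tight_layout()
```

Looping over `axs.flat` avoids writing one `axs[row, col]` line per store, which helps as the grid grows.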
Matplotlib is an object-oriented library and has objects, classes and methods. Figure is one of these classes, defined in the ‘figure’ module. The figure object is a container for showing the plots and is instantiated by calling the figure() function.
‘plt.figure()’ is used to create an empty figure object in matplotlib. It accepts the following additional parameters.
- figsize – (width, height) in inches
- dpi – dots per inch (this can be adjusted for print quality)
<code># let's create a figure object
# the size of the figure is set with 'figsize = (a, b)', where 'a' is the width and 'b' is the height in inches
# create a figure object and name it fig
fig = plt.figure(figsize=(4,3))
# create sample data
X = np.array([1,2,3,4,5,6,8,9,10])
Y = X**2
# plot the figure
plt.plot(X,Y)</code>
<code># let's change the figure size and also add additional parameters like facecolor, edgecolor, linewidth
fig = plt.figure(figsize=(10,3), facecolor='y', edgecolor='r', linewidth=5)</code>
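Since figsize is given in inches and dpi in dots per inch, multiplying the two gives the size of the figure in pixels. The short sketch below checks this on a figure object and then resizes it; the specific numbers are arbitrary.

```python
import matplotlib.pyplot as plt

# 4 x 3 inches at 100 dots per inch
fig = plt.figure(figsize=(4, 3), dpi=100)

width_in, height_in = fig.get_size_inches()
width_px = width_in * fig.dpi    # 4 inches * 100 dpi
height_px = height_in * fig.dpi  # 3 inches * 100 dpi
print(width_px, height_px)

# an existing figure can be resized without recreating it
fig.set_size_inches(8, 3)
new_size = tuple(fig.get_size_inches())
```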
Axes is the region of the chart that contains the data. We can add axes to the figure using the ‘add_axes()’ method. This method takes a list of four values, i.e., left, bottom, width and height, each given as a fraction of the figure width or height.
- left – position of the axes from the left of the figure
- bottom – position of the axes from the bottom of the figure
- width – width of the axes
- height – height of the axes
Other methods that can be used on the axes object are:
- Set title using ‘ax.set_title()’
- Set x-label using ‘ax.set_xlabel()’
- Set y-label using ‘ax.set_ylabel()’
<code># let's add axes using the add_axes() method
# create sample data
y = [1, 5, 10, 15, 20, 30]
x1 = [1, 10, 20, 30, 45, 55]
x2 = [1, 32, 45, 80, 90, 122]
# create the figure
fig = plt.figure()
# add the axes as [left, bottom, width, height]
ax = fig.add_axes([0,0,2,1])
l1 = ax.plot(x1, y, 'ys-')
l2 = ax.plot(x2, y, 'go--')
# add additional parameters
ax.legend(labels=('line 1', 'line 2'), loc='lower right')
ax.set_title("usage of add axes function")
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
plt.show()</code>
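Because add_axes() positions each axes by fractions of the figure, it can also place a small axes on top of a larger one. The following sketch assumes an inset in the lower-right area of the main chart; the positions and the plotted functions are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig = plt.figure()
# main axes: starts 10% in from the left and bottom, spans 80% of the figure
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# inset axes: a smaller region placed inside the main axes
inset_ax = fig.add_axes([0.55, 0.2, 0.3, 0.3])

main_ax.plot(x, x**2)
main_ax.set_title('main axes')
main_ax.set_xlabel('x-axis')
main_ax.set_ylabel('y-axis')

inset_ax.plot(x, np.sqrt(x), 'r')
inset_ax.set_title('inset axes')

# [left, bottom, width, height] as stored by the figure
inset_bounds = inset_ax.get_position().bounds
```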
Different Types of Matplotlib Plots
Matplotlib supports a wide variety of plot formats; a few of them include bar chart, line chart, pie chart, scatter chart, bubble chart, waterfall chart, circular area chart and stacked bar chart. We will go through some of these charts in this document with examples. Some elements, like axis and colour, are common to every plot and can be customised, while other elements are specific to the respective chart.
A bar graph represents data using bars in either horizontal or vertical direction. Bar graphs are used to compare two or more values, and typically the x-axis holds categorical data. The length of each bar is proportional to the value (for example, the count) of its category on the x-axis.
- The function used to show a bar graph is ‘plt.bar()’
- The bar() function expects two lists of values, one for the x-coordinates and another for the y-coordinates
The plt.bar() function has the following specific arguments that can be used for configuring the plot.
- width, color, edgecolor, linewidth, tick_label, align, bottom
- Error bars – xerr, yerr
<code># let's create a simple bar chart
# x-axis shows the subject and y-axis shows the marks in each subject
subject = ['maths','english','science','social','computer']
marks = [70,80,50,30,78]
plt.bar(subject, marks)
plt.show()</code>
<code># let's do some customizations
# width – the bar width; the default value is 0.8
# color – the bar color
# bottom – the y-value from which the bars start
# align – alignment of the bars relative to the x positions, has two options 'edge' or 'center'
# edgecolor – used to color the borders of the bar
# linewidth – used to adjust the width of the line around the bar
# tick_label – to set the customized labels for the x-axis
plt.bar(subject, marks, color='g', width=0.5, bottom=10, align='center', edgecolor='r', linewidth=2, tick_label=subject)</code>
<code># error bars can be added to represent the error values for an array of values
# in this example we use the standard deviation as the error bars
plt.bar(subject, marks, color='g', yerr=np.std(marks))</code>
<code># to plot a horizontal bar plot use the plt.barh() function
plt.barh(subject, marks, color='g', xerr=np.std(marks))</code>
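The bottom argument is also one way to build the stacked bar chart mentioned earlier: the second series is drawn starting from the top of the first. The sketch below uses hypothetical marks for two terms; the numbers and labels are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

subject = ['maths', 'english', 'science']
term1 = np.array([30, 40, 25])  # hypothetical marks in term 1
term2 = np.array([35, 30, 20])  # hypothetical marks in term 2

fig, ax = plt.subplots()
bars1 = ax.bar(subject, term1, label='term 1')
# start each term 2 bar where the term 1 bar ends
bars2 = ax.bar(subject, term2, bottom=term1, label='term 2')

ax.set_ylabel('marks')
ax.legend()
```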
Pie charts display the proportion of each value against the total sum of values. This chart requires a single series to display. Each value is shown as its percentage contribution, drawn as a slice of the pie called a wedge. The angle of each wedge is calculated from the proportion of its value. This visualisation works best when we are comparing different segments within a total. For example, a sales manager wants to know the contribution of each type of payment in a month, i.e., paid through cash, credit card, debit card, PayPal or any other online app.
- The function used for a pie chart is ‘plt.pie()’
- To draw a pie chart, we need only one list of values; each wedge is calculated as a proportion converted into an angle.
The plt.pie() function has the following specific arguments that can be used for configuring the plot.
- labels – used to show the wedge categories
- explode – used to pull out a wedge slice
- autopct – used to show the % contribution of each wedge
- Set_aspect – used to
- shadow – to show the shadow for a slice
- colors – to set custom colours for the wedges
- startangle – to set the angle at which the first wedge starts
<code># Let's create a simple pie plot
# Assume that we have data on the number of tickets resolved in a month
# the manager would like to know the individual contribution in terms of tickets closed in the month
# data
Tickets_Closed = [10, 20, 8, 35, 30, 25]
Agents = ['Raj', 'Ramesh', 'Krishna', 'Arun', 'Virag', 'Mahesh']
# create pie chart
plt.pie(Tickets_Closed, labels=Agents)</code>
<code># Let's add additional parameters to the pie plot
# explode – to move one of the wedges of the plot
# autopct – to add the contribution %
explode = [0.2,0.1,0,0.1,0,0]
plt.pie(Tickets_Closed, labels=Agents, explode=explode, autopct='%1.1f%%')</code>
A scatterplot is used to visualise the relationship between two columns/series of data. The graph displays the data points without connecting them. The chart needs two variables: one gives the X-position and the second gives the Y-position. Scatterplots are used to show the association between variables and are usually recommended as a check before fitting a regression. A scatterplot helps in understanding the following about the two columns:
- Whether any relationship exists between the two columns
- Whether the relationship is positive
- Or negative
- The function used for the scatter plot is ‘plt.scatter()’
The plt.scatter() function has the following specific arguments that can be used for configuring the plot.
- s – to manage the size of the points
- c – to set the color of the points
- marker – type of marker
- alpha – transparency of point
- norm – to normalise the colour data (scaling between 0 and 1)
<code># let's create a simple scatter plot
# generate the data with random numbers
x = np.random.randn(1000)
y = np.random.randn(1000)
plt.scatter(x, y)</code>
<code># as you can observe, no correlation exists between x and y
# let's try to add additional parameters
# s – to manage the size of the points
# c – to set the color of the points
# marker – type of marker
# alpha – transparency of point
# sizes must be non-negative, so use rand() rather than randn()
size = 150*np.random.rand(1000)
colors = 100*np.random.randn(1000)
plt.scatter(x, y, s=size, c=colors, marker='*', alpha=0.7)</code>
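To show what positive, negative and no relationship look like side by side, the sketch below generates three synthetic series and prints the correlation coefficient of each in the panel title. The slopes, noise level and seed are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # seeded so the sketch is repeatable
x = rng.normal(size=500)
noise = rng.normal(scale=0.5, size=500)

y_positive = 2 * x + noise    # y rises as x rises
y_negative = -2 * x + noise   # y falls as x rises
y_none = rng.normal(size=500) # unrelated to x

fig, axs = plt.subplots(1, 3, figsize=(12, 4))
series = [(y_positive, 'positive'), (y_negative, 'negative'), (y_none, 'none')]
for ax, (y, name) in zip(axs, series):
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    ax.scatter(x, y, s=10, alpha=0.6)
    ax.set_title(f'{name} (r = {r:.2f})')
fig.tight_layout()
```

A coefficient near +1 or -1 suggests a strong linear relationship, while a value near 0 matches the shapeless cloud in the first scatter plot above.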
A histogram is used to understand the distribution of data. It is an estimate of the probability distribution of continuous data. It is similar to the bar graph discussed above, but it represents the distribution of a continuous variable, whereas a bar graph is used for a discrete variable. Every distribution is characterised by four elements:
- Center of the distribution
- Spread of the distribution
- Shape of the distribution
- Peak of the distribution
A histogram requires two elements: the x-axis, divided into bins, and the y-axis, showing the frequency of the values from the data set that fall in each bin. Every bin has a range with minimum and maximum values.
- The function used for the histogram is ‘plt.hist()’
The plt.hist() function has the following specific arguments, along with related pyplot functions, that can be used for configuring the plot.
- bins – number of bins
- alpha – transparency of the color
- xlim – to set the x-limits (through ‘plt.xlim()’)
- ylim – to set the y-limits (through ‘plt.ylim()’)
- xticks, yticks (through ‘plt.xticks()’ and ‘plt.yticks()’)
- facecolor, edgecolor, density
<code># let's generate random numbers and use them to generate a histogram
data = np.random.randn(1000)
plt.hist(data)</code>
<code># let's add additional parameters
# facecolor
# alpha
# edgecolor
# bins
data = np.random.randn(1000)
plt.hist(data, facecolor='y', linewidth=2, edgecolor='k', bins=30, alpha=0.6)</code>
<code># let's create multiple histograms in a single plot
# create random data
hist1 = np.random.normal(25, 10, 1000)
hist2 = np.random.normal(200, 5, 1000)
# plot the histograms
plt.hist(hist1, facecolor='yellow', alpha=0.5, edgecolor='b', bins=50)
plt.hist(hist2, facecolor='orange', alpha=0.8, edgecolor='b', bins=30)</code>
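The hist() function also returns the numbers behind the picture: the count in each bin and the bin edges. The sketch below inspects them and shows the effect of density=True, which rescales the bars so that their total area is 1. The data is synthetic and the bin count is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
data = rng.normal(size=1000)

fig, (ax_counts, ax_density) = plt.subplots(1, 2, figsize=(10, 4))

# hist() returns the bin counts, the bin edges and the drawn bars
counts, bin_edges, patches = ax_counts.hist(data, bins=30, edgecolor='k', alpha=0.6)
ax_counts.set_title('frequency per bin')

# density=True rescales the bar heights so the total bar area equals 1
density, edges, _ = ax_density.hist(data, bins=30, density=True,
                                    edgecolor='k', alpha=0.6)
ax_density.set_title('density')

area = np.sum(density * np.diff(edges))  # bar height * bar width, summed
print(len(bin_edges), counts.sum(), area)
```

Thirty bins need thirty-one edges, since each bin has both a minimum and a maximum value.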
A plot can be saved as an image using the ‘savefig()’ function in matplotlib. The plot can be saved in multiple formats like .png, .jpeg, .pdf and many other supported formats.
<code># let's create a figure and save it as an image
items = [5,10,20,25,30,40]
x = np.arange(6)
fig = plt.figure()
ax = plt.subplot(111)
# plot the items
ax.plot(x, items, label='items')
plt.title('Saving as Image')
ax.legend()
fig.savefig('saveimage.png')</code>
The image is saved with the filename ‘saveimage.png’.
<code># To display the image again, use the following package and commands
import matplotlib.image as mpimg
image = mpimg.imread("saveimage.png")
plt.imshow(image)
plt.show()</code>
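savefig() also accepts options such as dpi and bbox_inches, and it picks the output format from the file extension. The sketch below saves the same chart as a higher-resolution PNG and as a PDF; the file names, the dpi value and the use of a temporary folder are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

items = [5, 10, 20, 25, 30, 40]
x = np.arange(6)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, items, label='items')
ax.legend()

# temporary folder, so nothing in the working directory is overwritten
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'saveimage_hires.png')
pdf_path = os.path.join(out_dir, 'saveimage.pdf')

# dpi controls the resolution: 4 x 3 inches at 200 dpi gives 800 x 600 pixels
fig.savefig(png_path, dpi=200)
# the format follows the file extension; bbox_inches='tight' trims empty margins
fig.savefig(pdf_path, bbox_inches='tight')

image = mpimg.imread(png_path)
print(image.shape)  # (height, width, channels)
```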
This brings us to the end of this Matplotlib tutorial.