SQL for Data Analysis Tutorial

This SQL tutorial covers essential techniques for filtering, summarizing, and joining data for effective analysis.

SQL for Data Analysis

Every business relies on information derived from customer transactions, product performance, and other data. However, when this information scales to millions or billions of records, extracting meaningful insights becomes a significant challenge.

Excel or Python can become slow, crash, or require excessive manual effort to uncover even simple insights:

  • Which customers are bringing the most revenue?
  • What is performing better this quarter?

This is where SQL (Structured Query Language) is useful.

In this tutorial, you will get to know how SQL can assist you in organizing, filtering, and analyzing large volumes of data easily to transform raw data into information that can be used to make decisions.

What is SQL?

SQL (Structured Query Language) is a standardized programming language designed to work with relational databases. It plays a vital role in managing and analyzing large volumes of data efficiently. SQL is widely used for:

  • Data Retrieval – extracting specific information from vast datasets.
  • Data Storage – organizing and saving data systematically in tables.
  • Data Management – maintaining and structuring databases for optimal performance.
  • Data Analysis – generating insights and reports from raw data.

It also enables users to perform essential operations such as:

  • Data Creation
  • Data Updating
  • Data Deletion

Key Functions of SQL for Data Analysis

1. Query Data Efficiently: SQL lets you pull exactly the data you need using commands like SELECT and FROM.

2. Filter & Focus:

  • Narrow down datasets with WHERE, AND, OR, & other conditions.
  • Focus on only the relevant rows instead of handling entire tables manually.

3. Aggregate & Summarize: Use GROUP BY, COUNT(), SUM(), AVG() to summarize data.

4. Combine Multiple Tables: JOIN tables to create a unified view of related data.

5. Sort & Limit Results: Organize data with ORDER BY & restrict output using LIMIT for quick analysis.

SQL is the analyst’s toolkit inside the database; it lets you answer business questions quickly, safely, and efficiently.

Academy Pro

SQL Data Analytics Course

Learn SQL for data analytics in this course designed to help you turn raw data into clear insights. Master essential tools and techniques to make smarter, data-driven business choices.

5 Hrs
39 Coding Exercises
Learn SQL for Data Analysis

Core SQL Commands for Data Analysis

1. Your First Queries – SELECT and FROM

When working with a database, your primary objective is to find answers to your questions. SQL provides two essential commands to do this:

  • SELECT – tells SQL what information you want to see.
  • FROM – tells SQL where to find it (the table).

These commands are the foundation for every SQL query you’ll write. Let’s understand this with a step-by-step using a Sample Dataset – Products

ProductIDProductNameCategoryPriceStock
1LaptopElectronics80050
2HeadphonesElectronics50200
3ChairFurniture120150
4DeskFurniture300100

Query 1 – Seeing Everything in the Table

Before starting an analysis, it’s useful to preview the entire table to understand its structure. To do this, you need to write:

SQL Query:

SELECT * FROM Products;

Explanation:

  • The asterisk (*) means “all columns.”
  • This query retrieves every column & every row from the table, providing you with an overview of your data before narrowing it down.

Output:

ProductIDProductNameCategoryPriceStock
1LaptopElectronics80050
2HeadphonesElectronics50200
3ChairFurniture120150
4DeskFurniture300100

Query 2 – Selecting Specific Columns

Once you know what’s in the table, you can fetch specific information, such as just the product names and their prices. To get this, the SQL query would be:

SELECT ProductName, Price 
FROM Products;

Explanation:

  • List the exact column names you want after SELECT: ProductName, Price
  • Separate multiple columns with a comma.

This approach retrieves only the data you need, rather than the entire table.

Output:

ProductNamePrice
Laptop800
Headphones50
Chair120
Desk300

2. Filtering Your Data – WHERE

Using the WHERE clause allows you to retrieve only those rows that meet specific conditions.

Common Operators

OperatorDescription
=Equals
!=Not equals
>, <Greater than / Less than
>=Greater than or equal to
<=Less than or equal to
ANDBoth conditions must be true
OREither condition can be true
INMatches any value in a list
LIKEPattern matching (e.g., LIKE 'A%' for names starting with A)
BETWEENFor a range of numbers or dates

Step 1: Sample Dataset – Customers

CustomerIDCustomerNameCountryAgePurchaseAmount
1Alice BrownUSA28500
2John MillerCanada351200
3Priya SharmaIndia24700
4Robert ChenChina40300
5Maria LopezMexico301500
6David WilsonUSA22250

Query 1 – Customers Who Spent More Than $700

We can filter the dataset to focus on customers who spent over $700 for targeted marketing as well as analysis.

SQL Query:

SELECT CustomerName, Country, PurchaseAmount
FROM Customers
WHERE PurchaseAmount > 700;

Explanation:

  • The WHERE clause filters rows based on a condition.
  • Here, it selects only customers whose PurchaseAmount is greater than 700.

Output:

CustomerNameCountryPurchaseAmount
John MillerCanada1200
Maria LopezMexico1500

Query 2 – Customers from the USA Under 30 Years Old

To analyze a specific demographic, we can find customers who are both from the USA & under 30 years old. This helps understand patterns in younger local customers’ behavior.

SQL Query:

SELECT CustomerName, Country, Age
FROM Customers
WHERE Country = 'USA' AND Age < 30;

Explanation:

  • AND ensures both conditions are true.
  • Only customers who meet both criteria from the USA & younger than 30 are returned.

Output:

CustomerNameCountryAge
Alice BrownUSA28
David WilsonUSA22

Query 3 – Customers from India or Mexico

To retrieve customers from either India or Mexico, to include multiple countries in our analysis & enable regional comparisons or targeted campaigns.

SQL Query:

SELECT CustomerName, Country
FROM Customers
WHERE Country = 'India' OR Country = 'Mexico';

Explanation:

  • OR allows either condition to be true.
  • Returns customers from India or Mexico, covering multiple geographic locations.

Output:

CustomerNameCountry
Priya SharmaIndia
Maria LopezMexico

3. Summarizing Your Data – GROUP BY and Aggregates

Instead of reviewing every row in large datasets, SQL’s aggregate functions & GROUP BY let you summarize data to answer your key business questions.

Key Aggregate Functions

Function / CommandDescription
COUNT()Counts the number of rows.
SUM()Adds up all values in a column.
AVG()Calculates the average of values.
MIN() / MAX()Finds the minimum or maximum value.
GROUP BYGroups rows by a column so aggregate functions can be applied to each group.

Important Rule

Aggregate functions like:

  • SUM,
  • AVG,
  • MIN,
  • MAX,
  • COUNT(column_name)

All ignore NULL values. The main exception is COUNT(*), which counts every row in the group, regardless of whether individual columns contain NULLs.

A Quick Note on Renaming Columns with AS

In the following sections, you will see the keyword AS. This is the alias command.

Its job is simple: to rename a column or the result of a calculation in your output. This is used for two main reasons:

  • Readability:
    It makes your report headers much cleaner (e.g., VisitCount is easier to read than COUNT(*)).
  • Sorting:
    It gives you a simple name to use in your ORDER BY clause.

How to use it?

You place AS immediately after the column or function you want to rename, followed by the new name you want to give it.

Step 1: Sample Dataset – user_interactions

UserIDActionContentIDTimestampDuration
1Watch1012024-03-05 10:22:455
2Like1022024-03-05 11:45:12NULL
3SubscribeNULL2024-03-05 13:34:21NULL
4Watch1032024-03-05 15:12:338
5Like1042024-03-05 17:45:21NULL

Note: The Duration column represents minutes watched & is used to demonstrate:

  • SUM
  • AVG
  • MIN
  • MAX functions.

Query 1 – Counting Actions

To understand overall user engagement, it’s useful to know how frequently each type of action occurs.

SQL Query:

SELECT Action, COUNT(*) AS ActionCount
FROM user_interactions
GROUP BY Action;

Explanation:

  • COUNT(*) AS ActionCount counts the total number of rows for each group & renames the new column to ActionCount.
  • GROUP BY Action groups all rows by the Action column (‘Watch’, ‘Like’, ‘Subscribe’), so the count is calculated for each unique action.

Output:

ActionActionCount
Watch2
Like2
Subscribe1

Query 2 –Summarizing Durations by Action (SUM, AVG, MIN, MAX)

To understand engagement across all action types, we can calculate the:
Total
Average
Shortest & longest
durations for each action simultaneously.

SQL Query:

SELECT
  Action,
  SUM(Duration) AS TotalDuration,
  AVG(Duration) AS AvgDuration,
  MIN(Duration) AS MinDuration,
  MAX(Duration) AS MaxDuration
FROM user_interactions
GROUP BY Action;

Explanation:

  • GROUP BY Action first splits the table into three groups:
    • ‘Watch’
    • ‘Like’
    • ‘Subscribe’
  • The aggregate functions (SUM, AVG, MIN, MAX) are then applied to each group separately.
  • ‘Watch’ Group: The insights are
    • SUM(5, 8) is 13.
    • AVG(5, 8) is 6.5.
    • MIN is 5,
    • MAX is 8.
  • ‘Like’ & ‘Subscribe’ Groups: Since their Duration values are all NULL, the functions ignore them, resulting in NULL for the totals.

Output

ActionTotalDurationAvgDurationMinDurationMaxDuration
Watch136.558
LikeNULLNULLNULLNULL
SubscribeNULLNULLNULLNULL

4. Sorting Results With ORDER BY and LIMIT

Using ORDER BY with LIMIT, you can rank your results and focus on the top entries.
This allows analysts to quickly identify the most important insights, such as:

  • The Busiest Departments
  • Top-Performing Products
  • Highest-Value Customers

without having to manually sort or sift through all the data.

Key Points

  • ORDER BY – sorts results by one or more columns.
  • ASC (Ascending, A-Z, 1-10) – default order.
  • DESC (Descending, Z-A, 10-1) – reverse order.
  • LIMIT – restricts the output to a specific number of rows.

Step 1: Sample Dataset – PatientVisits

VisitIDPatientIDVisitDateDepartmentReason
1P10012024-03-01CardiologyRoutine Checkup
2P10022024-03-01General MedicineFlu Symptoms
3P10032024-03-02OrthopedicsKnee Pain
4P10042024-03-02NeurologyMigraine
5P10012024-03-03CardiologyFollow-up Visit

Query – Finding the Busiest Departments

To determine which departments are handling the most patient visits, we can count the number of visits per department & sort the results.

SQL Query:

SELECT Department, COUNT(*) AS VisitCount
FROM PatientVisits
GROUP BY Department
ORDER BY VisitCount DESC;

Explanation:

  • SELECT Department, COUNT(*) AS VisitCount – Shows the department and renames the count column to “VisitCount”.
  • FROM PatientVisits – Uses data from the PatientVisits table.
  • GROUP BY Department – Groups all visits by department before counting.
  • ORDER BY VisitCount DESC – Sorts the results by our new VisitCount column in descending order, so the busiest department appears first.

Output:

DepartmentVisitCount
Cardiology2
General Medicine1
Orthopedics1
Neurology1

Step 2 – Limiting the Results

If you only want to see the single busiest department, you can limit the results to just the top row.

SQL Query:

SELECT Department, COUNT(*) AS VisitCount
FROM PatientVisits
GROUP BY Department
ORDER BY VisitCount DESC
LIMIT 1;

Output:

DepartmentVisitCount
Cardiology2

You can use a free online SQL compiler to practice as you go.

5. Combining Data – JOIN

Often, the data you need is stored in multiple tables. To work with this related data, SQL provides the JOIN command, which merges tables based on a common column.

JOIN – merges rows from two or more tables based on a related column.

Common types of JOINs:

  • INNER JOIN – returns only rows with matching values in both tables.
  • LEFT JOIN – returns all rows from the left table & matched rows from the right table (NULL if no match).
  • RIGHT JOIN – returns all rows from the right table & matched rows from the left table (NULL if no match).

Primary Key and Foreign Key:

  • Primary Key (PK) – a unique identifier for each row in a table. Ensures every record is distinct.
  • Foreign Key (FK) – a column in one table that references the Primary Key in another table. It establishes a relationship between tables.

Using Primary Key and Foreign Key ensures data integrity and enables accurate JOIN operations.

Step 1: Sample Datasets

Table 1: Customers (Primary Key: CustomerID)

CustomerIDCustomerNameCountry
1Alice BrownUSA
2John MillerCanada
3Priya SharmaIndia
4TRobert ChenChina

Table 2: Orders (Foreign Key: CustomerID referencing Customers.CustomerID)

OrderIDCustomerIDOrderAmountOrderDate
10115002024-03-01
102212002024-03-02
10317002024-03-03
10433002024-03-04
10552002024-03-05

Query 1 – INNER JOIN: Customers Who Placed Orders

An INNER JOIN allows us to see only the customers who have matching orders, along with their order information.
This query links customers to their specific orders and only returns those who have made a purchase.

SQL Query:

SELECT Customers.CustomerName, Orders.OrderID, Orders.OrderAmount
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;

Output:

CustomerNameOrderIDOrderAmount
Alice Brown101500
Alice Brown103700
John Miller1021200
Priya Sharma104300

Query 2 – LEFT JOIN: All Customers and Their Orders

We use a LEFT JOIN when we need to display all customers, even if they haven’t placed any orders. This method ensures no customer from the “left” table (Customers) is excluded from the report, regardless of their order activity.

SQL Query:

SELECT Customers.CustomerName, Orders.OrderID, Orders.OrderAmount
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;

If a customer has no orders, OrderID & OrderAmount are NULL.

Output:

CustomerNameOrderIDOrderAmount
Alice Brown101500
Alice Brown103700
John Miller1021200
Priya Sharma104300
Robert ChenNULLNULL

Query 3 – RIGHT JOIN: All Orders and Their Customers

A RIGHT JOIN is used to list all orders, even if they don’t have matching customer information. This is useful for identifying data integrity issues, such as “orphan” orders that are in the system without a corresponding customer.

SQL Query:

SELECT Customers.CustomerName, Orders.OrderID, Orders.OrderAmount
FROM Customers
RIGHT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;

If an order has a matching customer, CustomerName is included, and orders with no matching customer will show NULL for CustomerName.

Output:

CustomerNameOrderIDOrderAmount
Alice Brown101500
John Miller1021200
Alice Brown103700
Priya Sharma104300
NULL105200

Learning SQL equips you with a universal skill for understanding and analyzing data, whether your goal is to generate reports, discover patterns, or answer complex business questions.

SQL allows you to interact with large datasets in a structured and efficient manner. Its flexibility and scalability make it a core competency for data analysts, business intelligence professionals, and anyone who wants to convert raw data into actionable insights.

If you wish to improve these skills and utilize them in real projects, then signing up for a SQL Data Analytics course will be the best option for you. You will learn practical methods and set up a solid base for a data-driven future career.

Suggested Read: List of 27 SQL Query Examples to Master Data Analysis

Avatar photo
Saurabh Kango
Senior Manager - Data Science and Analytics
Scroll to Top