Contributed by: Nallaperumal
Some of the topics that can be concentrated while attending DBMS interview questions are depicted below. These are some of the essential questions that you come across in any assignments or if they give some problem to solve on the spot.
We will see the topics and then jump on to the problem to have a quick look at some examples.
- Self-Join
- Aggregate based on dates (Month, year, week)
- Cumulative sum & rolling averages
- Multiple join conditions
- Use of Analytical function
- Use of Window functions
- Cross join
- Use of CASE and PIVOT statements (Display rows as columns vice versa)
- SQL – NULL Values
- GROUP BY and HAVING clause (Difference between Group by and Having)
- Use of LIMIT Clause
- UNIQUE and PRIMARY Key difference
- Use of DISTINCT and ORDER BY
- Use of functions such as SUBSTRING, LENGTH, UPPER, LOWER
- Conditions like EXISTS and IN
Before jumping into the above topics I would like to give a quick overview on the topic CTE (Common Table Expression). This can be used as an alternative to a subquery or if you want to combine multiple tables with some unique conditions this can be leveraged. It even works well for UPDATE, DELETE and INSERT. If you also want to go for scenarios where it is recursive this can be utilized. It acts as a temporary table.
CTE (Common Table Expressions)
It starts with a WITH clause followed by a temporary table name.
With [CTE_NAME] AS ( CTE QUERY )
CTE NAME – temporary table name
CTE Query – Query that we are going to use
Table: Employee_details
EmpID | Emp_Name | City |
1 | Jacob | Chennai |
2 | Utkarsh | Bangalore |
3 | Ritesh Kumar | Hyd |
4 | Kanan | Kochi |
5 | Madhan | Chennai |
If we want to display only the Emp Name and EmpID from the above table corresponding to city Chennai then,
Solution:
WITH CTE AS (SELECT Emp_Name, EmpID FROM employee WHERE city = ‘Chennai’) select * from CTE;
Output: Here point to note is that whatever you have selected inside the CTE query only those columns will be accessible.
Emp Name | EmpID |
Jacob | 1 |
Madhan | 5 |
We can also add more tables in the CTE by giving a comma and naming the next temporary tablename.
1. SELF-JOIN: (Manager /Employee hierarchy)
Let say the query is to find out the list of Manager hierarchy for the employee Madhan.
Table name: Employee
EmpID | Emp_Name | Manager_ID |
1 | Jacob | NULL |
2 | Utkarsh | 6 |
3 | Ritesh Kumar | 1 |
4 | Kanan | 2 |
5 | Madhan | 3 |
6 | Naveen | NULL |
Solution:
Our thought process should be Madhan->Ritesh Kumar->Jacob
The above all rows should be displayed. We will use a CTE here (Note the following is based on Oracle SQL) slight change of syntax might occur in other backend utilities.
WITH EMP_CTE_MNG AS ( SELECT EmpID, Emp_Name,manager_ID FROM Employee WHERE ManagerId IS NULL ), EMP_LOW_LEVEL( SELECT T.EmpID,T.Emp_Name , T.manager_ID FROM Employee T INNER JOIN Employee C ON C.EmployeeId=T.ManagerId And T.EmpID = 5 ) SELECT a.EmpID,a.Emp_Name,a.manager_ID from EMP_CTE_MNG union all SELECT b.EmpID,b.Emp_Name,b.manager_ID from EMP_LOW_LEVEL
Output:
EmpID | Emp_Name | Manager_ID |
1 | Jacob | NULL |
3 | Ritesh Kumar | 1 |
5 | Madhan | 3 |
This type of problem is called as recursive problem statement.
2. AGGREGATE BASED ON DATES:
Let us look at a problem where we have a pictures table and say we want to display the total number of pictures sold with respect to each month and yearly wise then
Table: Pictures
Date_created | No_of_pictures_Sold |
1/1/2020 | 200 |
2/2/2020 | 400 |
2/3/2020 | 500 |
3/4/2020 | 800 |
3/5/2020 | 1200 |
12/5/2019 | 5000 |
Solution:
select extract(year from date_created) as year, extract(month from date_created) as month, sum(No_of_pictures_Sold) as Total_pictures_sold from Pictures group by extract(year from date_created), extract(month from date_created) order by year,month asc;
Output:
Year | Month | Total_pictures_sold |
2019 | 12 | 5000 |
2020 | 1 | 200 |
2020 | 2 | 900 |
2020 | 3 | 2000 |
3. CUMMULATIVE SUM and ROLLING AVERAGE:
Write a query for the following table to find out the cumulative amount for each day.
Table: Amounts
Date | Amount |
1/1/2020 | 100 |
1/2/2020 | -100 |
1/3/2020 | 200 |
1/4/2020 | 300 |
1/5/2020 | 500 |
Solution:
Here we can solve using two ways. One main advantage of SQL is that we can solve the problem with many solutions but you have to get the problem right.
Solution a:
SELECT a.date date, SUM(b.amount) as cummulative_amount FROM amounts a JOIN amounts b ON a.date >= b.date GROUP BY a.date ORDER BY date ASC
Solution b: (using window functions)
SELECT date, SUM(amount)OVER (ORDER BY date ASC) as cumulative_amount FROM Amounts ORDER BY date ASC
Output:
Date | cummulative_amount |
1/1/2020 | 100 |
1/2/2020 | 0 |
1/3/2020 | 200 |
1/4/2020 | 500 |
1/5/2020 | 1000 |
ROLLING AVERAGE:
The following problem is an adaptation from https://www.sisense.com/blog/rolling-average/
Write a query to get 7-day rolling (preceding) average of daily Sales amount.
Table: Sales
Date | Sales_Amount |
1/1/2020 | 20 |
1/2/2020 | 30 |
1/3/2020 | 40 |
1/4/2020 | 50 |
1/5/2020 | 60 |
1/6/2020 | 70 |
1/7/2020 | 80 |
1/8/2020 | 90 |
1/9/2020 | 100 |
1/10/2020 | 110 |
Solution:
SELECT a.date, AVG(b.Sales_Amount) Avg_Sales_Amount FROM Sales a INNER JOIN Sales b ON a.date <= b.date + interval '6 days' AND a.date >= b.date GROUP BY a.date
The above solution can also be handled using analytical functions in oracle such as preceding and current rows. Say if we have month and amount if we want to find out moving average for 3 months then,
Month | Amount | Moving_Avg |
1 | 200 | 0.00 |
2 | 300 | 0.00 |
3 | 400 | 300.00 |
4 | 500 | 400.00 |
5 | 700 | 533.33 |
6 | 800 | 666.67 |
7 | 1200 | 900.00 |
8 | 300 | 766.67 |
9 | 400 | 633.33 |
10 | 500 | 400.00 |
11 | 900 | 600.00 |
12 | 1000 | 800.00 |
To get the moving average for the above we will use the following query:
SELECT month, SUM(amount) AS month_amount, AVG(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_average FROM sales GROUP BY month ORDER BY month;
4. USE OF ANALYTICAL FUNCTIONS:
There are many analytical functions which we use but here we are going to look upon two functions LEAD and LAG.
Let us see with an example supposing if we want to see the order date and when a previous order was made for a particular order with respect to same product we can leverage by making use of LAG function.
Table : Products
ORDER_DATE | PRODUCT_ID | QTY |
9/25/2010 | 1000 | 20 |
9/26/2010 | 2000 | 25 |
9/27/2010 | 1000 | 90 |
9/28/2010 | 2000 | 100 |
9/29/2010 | 2000 | 20 |
9/30/2010 | 1000 | 5 |
Solution:
SELECT product_id, order_date, LAG (order_date,1) OVER (PARTITION BY product_id ORDER BY order_date) AS prev_order_date FROM orders;
Output:
PRODUCT_ID | ORDER_DATE | PREV_ORDER_DATE |
1000 | 9/25/2010 | NULL |
1000 | 9/27/2010 | 9/25/2010 |
1000 | 9/30/2010 | 9/27/2010 |
2000 | 9/26/2010 | NULL |
2000 | 9/28/2010 | 9/26/2010 |
2000 | 9/29/2010 | 9/28/2010 |
For the same problem if we need to look at next order date then,
SELECT product_id, order_date, LEAD (order_date,1) OVER (PARTITION BY product_id ORDER BY order_date) AS next_order_date FROM orders;
Output:
PRODUCT_ID | ORDER_DATE | NEXT_ORDER_DATE |
1000 | 9/25/2010 | 9/27/2010 |
1000 | 9/27/2010 | 9/30/2010 |
1000 | 9/30/2010 | NULL |
2000 | 9/26/2010 | 9/28/2010 |
2000 | 9/28/2010 | 9/29/2010 |
2000 | 9/29/2010 | NULL |
6. USE OF WINDOW FUNCTIONS:
Let us say we have a Salaries table – If we need to find out the top 3rd salary from the table
Table : Salaries
Dept No | empID | salary |
IT Engineer | 1 | 50000 |
IT Engineer | 4 | 45000 |
IT Engineer | 3 | 30000 |
IT Engineer | 2 | 10000 |
IT Engineer | 5 | 35000 |
Infrastructure | 6 | 55000 |
Infrastructure | 7 | 50000 |
Data Analyst | 9 | 55000 |
Data Analyst | 8 | 50000 |
Data Analyst | 10 | 30000 |
Solution:
WITH sal_rank AS (SELECT empID, DENSE_RANK() OVER(ORDER BY salary DESC) Dense_Rnk FROM salaries) SELECT empID FROM sal_rank WHERE Dense_Rnk = 3;
Here, I would suggest using DENSE_RANK() , if we go by RANK() it will skip ranks if the values are the same that is why dense_rank is recommended here. Let us look at the output of rank and dense_rank as well.
Dept No | empID | salary | Rank | Dense Rank |
IT Engineer | 1 | 50000 | 3 | 2 |
IT Engineer | 4 | 45000 | 6 | 3 |
IT Engineer | 3 | 30000 | 9 | 6 |
IT Engineer | 2 | 10000 | 10 | 7 |
IT Engineer | 5 | 35000 | 7 | 4 |
Infrastructure | 6 | 55000 | 1 | 1 |
Infrastructure | 7 | 50000 | 3 | 2 |
Data Analyst | 9 | 55000 | 1 | 1 |
Data Analyst | 8 | 50000 | 3 | 2 |
Data Analyst | 10 | 30000 | 9 | 6 |
Rank will skip the rows if two salaries are the same. For instance, in the above case if you see it has skipped rank 2 because the first salary which is the highest has repeated twice.
7. CROSS JOIN:
Consider a table state Channels where each row is a state and total hours watched per week for a channel.
Table : Channels
State | Total_hours_watched |
TN | 12000 |
KL | 14000 |
AP | 5000 |
KA | 5200 |
TA | 4000 |
MH | 3500 |
Question:
Write a query to get the pairs of states with total channel hours within 2000 of each other.
For the snippet above, we would want to see something like (we need all the list this is just a sample)
State A | State B |
TN | KL |
KL | TN |
Solution:
SELECT a.state as State_A, b.state as State_B FROM Channels a CROSS JOIN Channels b WHERE ABS(a. Total_hours_watched - b. Total_hours_watched) < 2000 AND a.state <> b.state ;
Note: In the above query if we want to eliminate the duplicates (i.e) TN and KL should appear only once in that case a minor tweak is required.
SELECT a.state as State_A, b.state as State_B FROM Channels a CROSS JOIN Channels b WHERE ABS(a. Total_hours_watched - b. Total_hours_watched) < 2000 AND a.state > b.state
8. USE OF CASE AND PIVOT STATEMENT:
PIVOT is usually used to convert rows to columns and vice versa. If you also want to do a mapping with respect to month wise, we can make use of pivot.
Table: orders
order_id | customer_ref | product_id |
100001 | SMITHA | 10 |
100002 | SMITHA | 20 |
100003 | ANDREW | 30 |
100004 | ANDREW | 40 |
100005 | JAMES | 10 |
100006 | JAMES | 20 |
100007 | SMITHA | 20 |
100008 | SMITHA | 10 |
100009 | SMITHA | 20 |
If we want to look at cross tabulation as in what is the number of times each products are bought by the customer.
Solution:
SELECT * FROM ( SELECT customer_ref, product_id FROM orders ) PIVOT ( COUNT(product_id) FOR product_id IN (10, 20, 30) ) ORDER BY customer_ref;
Output:
customer_ref | 10 | 20 | 30 |
ANDREW | 0 | 0 | 1 |
JAMES | 1 | 1 | 0 |
SMITHA | 2 | 3 | 0 |
CASE Statements:
There are some exceptional cases where CASE statements can be used as a substitute to PIVOT as well.
Let us try to look at the example:
If you want to update or display certain values or update columns based on certain conditions we can leverage CASE statements.
Table: Products
Product_ID | Product_Name | Amount |
1 | PC | 30000 |
2 | Iphone | 50000 |
3 | Moto | 12000 |
4 | Speaker | 1200 |
5 | Book | 500 |
Say if the ask is to find to label the items as luxury,economy,mass based on the amount range say
0-100 – Cheap
101-1500 – Mass
1501-30000-economy
>30000 as Luxury
Then, CASE statement would come in handy.
Solution:
SELECT Product_ID, Product_Name, Amount , CASE WHEN Amount between 0 and 100 THEN 'Cheap' WHEN Amount between 101 and 1500 THEN 'Mass' WHEN Amount between 1501 and 30000 THEN 'Economy' WHEN Amount >30000 THEN 'Luxury' ELSE 'Not Available' END product_group FROM products ;
One advise here would be CASE always works in first come first in basis – Meaning (If a conditions satisfies for a row it will not go for next one )
Supposing in the above say if we have amount = 0 as ‘NA’ but in your CASE statement if the first condition is Amount <=100 and last condition is amount = 0. It will take up first and will ignore the last.
9. SQL-NULL Values:
With respect to NULL values always remember if you are going to check for a condition in a column which has null values then it should be of the form ‘…WHERE COLUMN_NAME IS NULL’
One more point to note here is that NULL is never equal to NULL***.
10. GROUP BY AND HAVING clause:
Say you have a table named Employees
Table Name: Employee
EmpID | Department | Salary |
1 | IT | 55000 |
2 | IT | 60000 |
3 | IT | 40000 |
4 | Sales | 35000 |
5 | Sales | 44000 |
6 | IT | 35000 |
If our ask is to find out the number of people with respect to each department who are getting salary greater than or equal to 40000 and having more than 2 person (in nos) in the respective department.
Solution:
SELECT department, COUNT(*) AS "Number of employees" FROM employees WHERE salary >= 40000 GROUP BY department HAVING COUNT(*) > 2;
Output:
Department | Number of employees |
IT | 3 |
The above GROUP BY can also be used to find out aggregate functions (such as SUM,MIN,MAX) etc.
One pro tip here would be – Always it is a good practice to write to the columns selected in SELECT clause to incorporate that in the GROUP BY clause (though in some SQL IDEs it will accept other columns) but it is ideal to have the columns that are provided in SELECT clause.
If you are going to use an AGGREGATE function then GROUP is recommended most of the times.
WHERE Clause | HAVING Clause |
It is used prior to GROUP BY | It is used after GROUP BY |
Mostly if you want a pre-filter go for WHERE clause | If you want a post-filter after GROUP BY go for HAVING |
It is not generalized to GROUP BY | It works only with GROUP BY |
The above differences can be related to query used before.
11. LIMIT clause:
As the name suggests, if you want to LIMIT the result that is displayed in the SQL result. You can make use of this LIMIT functionality.
This LIMIT is not supported by all SQL versions in some of them it may not work.
Lets say you have products table as shown below
Table Name: Products
ORDER_DATE | PRODUCT_ID | QTY |
9/25/2010 | 1000 | 20 |
9/26/2010 | 2000 | 25 |
9/27/2010 | 1000 | 90 |
9/28/2010 | 2000 | 100 |
9/29/2010 | 2000 | 28 |
9/30/2010 | 1000 | 35 |
Our ask is to find out the products where the qty is greater than 25 and to display only top3 items.
Solution:
SELECT product_id, qty FROM products WHERE qty > 25 ORDER BY qty DESC LIMIT 3;
Output: It will display only top 3 rows
PRODUCT_ID | QTY |
2000 | 100 |
1000 | 90 |
1000 | 35 |
To Generalize the GROUP BY, HAVING and LIMIT clause,
I have portrayed a picture to depict as how you should ideally look at them. Note that the following is just for your better understanding. In real-time along with the following the execution steps goes through a series of process but here we are just outlaying the overall idea.
12. Difference between PRIMARY and UNIQUE KEY:
PRIMARY KEY | UNIQUE KEY |
The main purpose of the primary key is to provide a means to identify each record in the table. | Unique can be used to ensure rows are unique with respect to the particular column |
There can be only one primary key for a table | There can be multiple UNIQUE keys for a table |
It should always be NOT NULL and should not get repeated | Column can contain NULL |
It results in CLSUTERED index by default. | It is used when we want our columns to be unique |
13. Use of DISTINCT and ORDER BY:
If we want to display only unique values for a column, we use DISTINCT.
Consider the table films:
ID | LENGTH | FILM |
1 | 220 | FilmName1 |
2 | 220 | FilmName2 |
3 | 230 | FilmName3 |
4 | 220 | FilmName4 |
5 | 240 | FilmName5 |
6 | 230 | FilmName6 |
7 | 120 | FilmName7 |
If we want to find out unique length of films then,
Solution:
SELECT DISTINCT length FROM films;
LENGTH |
220 |
230 |
240 |
120 |
It shows the unique value in the table.
14. Use of functions such as SUBSTRING, LENGTH, UPPER, LOWER:
Subbstr() is used to extract a certain number of letters from the column.
For example:
It starts with 1 as index:
Substr(column_name,start_position,length)
SUBSTR(‘Great Lakes’, 1, 5)
Result: Great
SUBSTR(‘Great Lakes’, 7, 2)
Result: La
UPPER and LOWER: These are used to convert column names into capital letters or lower case.
15. Difference between EXISTS and IN:
EXISTS clause | IN clause |
The EXISTS operator stops scanning rows once the subquery returns the first row because it can determine the result | The IN operator must scan all rows returned by the subquery to conclude the result |
IN clause can’t compare anything with NULL values | EXISTS clause can compare everything with NULL values |
The EXISTS operator is faster than IN operator when the result set of the subquery is large | The IN operator is faster than EXISTS operator when the result set of the subquery is small |
16. Query to find monthly change in a column from table A
Select month, Column_A, Column_A - LAG(Column_A,1) Over (Order By month) As Month_on_Month_change From table_X ;
17. Query to find Running total by Business Units for a specific company
Select year, Business_Unit, Amount_utilized, Sum(Amount_utilized) Over (Partition By Business_Unit Order by Year) As RunningTotal From Table_companies;
Inspired by Eric Weber.
Some Pro tips for Interviews:
- Understand the problem clearly, determine the main objective of the problem, try to visualize the output once.
- Understand the ER diagram provided and see how the mapping is done.
- Try to identify the type of JOIN you want to use (INNER, CROSS, LEFT, RIGHT). If it is a self—join problem, handle them carefully.
- If it is on the spot problem – try to verbalize your assumptions and by clarifying the same with the interviewer as well so that in case if you see some difference – you can correct it.
- Most interviewers will be looking upon the logic you are going to use. Be precise and clear of what is expected from the question asked.
- Practice is the only key for any language and with respect to SQL – it is 100% true.
- Try to do some exercise from hacker rank.
- Lastly with respect to SQL there can be multiple approaches to solve a problem, just see to it that the end results are met (not only for specific cases but also for some corner cases as well).
I would like to conclude by saying that–
“The difference between ordinary and extraordinary is practice” –Vladimir Horowitz
This brings us to the end of the blog on DMBS Interview Questions. We hope that these DBMS Interview Questions help you. If you wish to learn more such concepts, you can join Great Learning Academy’s Free Online Courses and learn the most in-demand skills today.
1