dbms interview questions

Contributed by: Nallaperumal

Some of the topics that can be concentrated while attending DBMS interview questions are depicted below. These are some of the essential questions that you come across in any assignments or if they give some problem to solve on the spot.

We will see the topics and then jump on to the problem to have a quick look at some examples.

  1. Self-Join
  2. Aggregate based on dates (Month, year, week)
  3. Cumulative sum & rolling averages
  4. Multiple join conditions
  5. Use of Analytical function
  6. Use of Window functions
  7. Cross join
  8. Use of CASE and PIVOT statements (Display rows as columns vice versa)
  9. SQL – NULL Values
  10. GROUP BY and HAVING clause (Difference between Group by and Having)
  11. Use of LIMIT Clause
  12. UNIQUE and PRIMARY Key difference
  13. Use of DISTINCT and ORDER BY
  14. Use of functions such as SUBSTRING, LENGTH, UPPER, LOWER
  15. Conditions like EXISTS and IN

Before jumping into the above topics I would like to give a quick overview on the topic CTE (Common Table Expression). This can be used as an alternative to a subquery or if you want to combine multiple tables with some unique conditions this can be leveraged. It even works well for UPDATE, DELETE and INSERT. If you also want to go for scenarios where it is recursive this can be utilized. It acts as a temporary table.

CTE (Common Table Expressions)

It starts with a WITH clause followed by a temporary table name.

With [CTE_NAME] AS
(
CTE QUERY
)

CTE NAME – temporary table name

CTE Query – Query that we are going to use

Table: Employee_details

EmpIDEmp_Name City
1JacobChennai
2UtkarshBangalore
3Ritesh KumarHyd
4KananKochi
5MadhanChennai

If we want to display only the Emp Name and EmpID from the above table corresponding to city Chennai then,

Solution:

WITH
CTE AS (SELECT Emp_Name, EmpID  FROM employee WHERE city = ‘Chennai’)
select * from CTE;

Output: Here point to note is that whatever you have selected inside the CTE query only those columns will be accessible.

Emp Name EmpID
Jacob1
Madhan5

We can also add more tables in the CTE by giving a comma and naming the next temporary tablename.

1. SELF-JOIN: (Manager /Employee hierarchy)

Let say the query is to find out the list of Manager hierarchy for the employee Madhan.

Table name:  Employee

EmpIDEmp_Name Manager_ID
1JacobNULL
2Utkarsh6
3Ritesh Kumar1
4Kanan2
5Madhan3
6NaveenNULL

Solution:

Our thought process should be Madhan->Ritesh Kumar->Jacob 

The above all rows should be displayed. We will use a CTE here (Note the following is based on Oracle SQL) slight change of syntax might occur in other backend utilities.

WITH EMP_CTE_MNG AS
(
SELECT EmpID, Emp_Name,manager_ID FROM Employee WHERE ManagerId IS NULL
),
EMP_LOW_LEVEL(
SELECT T.EmpID,T.Emp_Name , T.manager_ID FROM Employee T
INNER JOIN Employee C ON C.EmployeeId=T.ManagerId
And T.EmpID = 5 )
SELECT a.EmpID,a.Emp_Name,a.manager_ID from  EMP_CTE_MNG 
union all 
SELECT b.EmpID,b.Emp_Name,b.manager_ID from EMP_LOW_LEVEL

Output:

EmpIDEmp_Name Manager_ID
1JacobNULL
3Ritesh Kumar1
5Madhan3

This type of problem is called as recursive problem statement.

2. AGGREGATE BASED ON DATES:

Let us look at a problem where we have a pictures table and say we want to display the total number of pictures sold with respect to each month and yearly wise then

Table: Pictures

Date_createdNo_of_pictures_Sold
1/1/2020200
2/2/2020400
2/3/2020500
3/4/2020800
3/5/20201200
12/5/20195000

Solution:

select extract(year from date_created) as year, extract(month from date_created) as month,
 sum(No_of_pictures_Sold) as Total_pictures_sold
from Pictures
group by extract(year from date_created), extract(month from date_created)
order by year,month asc;





Output:

Year MonthTotal_pictures_sold
2019125000
20201200
20202900
202032000

3. CUMMULATIVE SUM and ROLLING AVERAGE:

Write a query for the following table to find out the cumulative amount for each day.

Table: Amounts

DateAmount
1/1/2020100
1/2/2020-100
1/3/2020200
1/4/2020300
1/5/2020500

Solution: 

Here we can solve using two ways. One main advantage of SQL is that we can solve the problem with many solutions but you have to get the problem right.

Solution a:

SELECT a.date date, SUM(b.amount) as cummulative_amount FROM amounts a
JOIN amounts b 
ON a.date >= b.date
GROUP BY a.date 
ORDER BY date ASC





Solution b: (using window functions)

SELECT date, 
 SUM(amount)OVER (ORDER BY date ASC)
 as cumulative_amount 
FROM Amounts
ORDER BY date ASC


Output:

Datecummulative_amount 
1/1/2020100
1/2/20200
1/3/2020200
1/4/2020500
1/5/20201000

ROLLING AVERAGE:

The following problem is an adaptation from https://www.sisense.com/blog/rolling-average/

Write a query to get 7-day rolling (preceding) average of daily Sales amount.

Table: Sales

DateSales_Amount
1/1/202020
1/2/202030
1/3/202040
1/4/202050
1/5/202060
1/6/202070
1/7/202080
1/8/202090
1/9/2020100
1/10/2020110

Solution:

SELECT a.date, AVG(b.Sales_Amount) Avg_Sales_Amount 
FROM  Sales a 
INNER JOIN Sales b ON a.date <= b.date + interval '6 days' 
AND a.date >= b.date
GROUP BY  a.date

The above solution can also be handled using analytical functions in oracle such as preceding and current rows. Say if we have month and  amount if we want to find out moving average for 3 months then,

MonthAmountMoving_Avg
12000.00
23000.00
3400300.00
4500400.00
5700533.33
6800666.67
71200900.00
8300766.67
9400633.33
10500400.00
11900600.00
121000800.00

To get the moving average for the above we will use the following query:

SELECT month, SUM(amount) AS month_amount,
AVG(SUM(amount)) OVER
 (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_average
FROM  sales
GROUP BY month
ORDER BY month;

4. USE OF ANALYTICAL FUNCTIONS:

There are many analytical functions which we use but here we are going to look upon two functions LEAD and LAG.

Let us see with an example supposing if we want to see the order date and when a previous order was made for a particular order with respect to same product we can leverage by making use of LAG function.

Table : Products

ORDER_DATEPRODUCT_IDQTY
9/25/2010100020
9/26/2010200025
9/27/2010100090
9/28/20102000100
9/29/2010200020
9/30/201010005

Solution:

SELECT product_id, order_date,
LAG (order_date,1) OVER (PARTITION BY product_id ORDER BY order_date) AS prev_order_date
FROM orders;

Output:

PRODUCT_IDORDER_DATEPREV_ORDER_DATE
10009/25/2010NULL
10009/27/20109/25/2010
10009/30/20109/27/2010
20009/26/2010NULL
20009/28/20109/26/2010
20009/29/20109/28/2010

For the same problem if we need to look at next order date then,

SELECT product_id, order_date,
LEAD (order_date,1) OVER (PARTITION BY product_id ORDER BY order_date) AS next_order_date
FROM orders;

Output:

PRODUCT_IDORDER_DATENEXT_ORDER_DATE
10009/25/20109/27/2010
10009/27/20109/30/2010
10009/30/2010NULL
20009/26/20109/28/2010
20009/28/20109/29/2010
20009/29/2010NULL

6. USE OF WINDOW FUNCTIONS:

Let us say we have a Salaries table – If we need to find out the top 3rd salary from the table

Table : Salaries

Dept NoempIDsalary
IT Engineer150000
IT Engineer445000
IT Engineer330000
IT Engineer210000
IT Engineer535000
Infrastructure655000
Infrastructure750000
Data Analyst955000
Data Analyst850000
Data Analyst1030000

Solution:

WITH sal_rank AS 
 (SELECT empID, 
    DENSE_RANK() OVER(ORDER BY salary DESC) Dense_Rnk  FROM  salaries)
SELECT empID FROM  sal_rank
WHERE   Dense_Rnk = 3;

Here, I would suggest using DENSE_RANK() , if we go by RANK() it will skip ranks if the values are the same that is why dense_rank is recommended here. Let us look at the output of rank and dense_rank as well.

Dept NoempIDsalaryRankDense Rank
IT Engineer15000032
IT Engineer44500063
IT Engineer33000096
IT Engineer210000107
IT Engineer53500074
Infrastructure65500011
Infrastructure75000032
Data Analyst95500011
Data Analyst85000032
Data Analyst103000096

Rank will skip the rows if two salaries are the same. For instance, in the above case if you see it has skipped rank 2 because the first salary which is the highest has repeated twice.

7. CROSS JOIN:

Consider a table state Channels where each row is a state and total hours watched per week for a channel.

Table : Channels

StateTotal_hours_watched
TN12000
KL14000
AP5000
KA5200
TA4000
MH3500

Question: 

Write a query to get the pairs of states with total channel hours within 2000 of each other. 

For the snippet above, we would want to see something like (we need all the list this is just a sample)

State AState B
TNKL
KLTN

Solution:

SELECT
 a.state as State_A, 
 b.state as State_B 
 FROM    Channels  a
 CROSS JOIN Channels  b 
 WHERE ABS(a. Total_hours_watched - b. Total_hours_watched) < 2000
  AND a.state <> b.state ;

Note: In the above query if we want to eliminate the duplicates (i.e) TN and KL should appear only once in that case a minor tweak is required.

  SELECT
 a.state as State_A, 
 b.state as State_B 
 FROM    Channels  a
 CROSS JOIN Channels  b 
 WHERE ABS(a. Total_hours_watched - b. Total_hours_watched) < 2000
  AND a.state > b.state


8. USE OF CASE AND PIVOT STATEMENT:

PIVOT is usually used to convert rows to columns and vice versa. If you also want to do a mapping with respect to month wise, we can make use of pivot. 

Oracle PIVOT
Image credits:  Oracletutorial.com

Table: orders

order_idcustomer_refproduct_id
100001SMITHA10
100002SMITHA20
100003ANDREW30
100004ANDREW40
100005JAMES10
100006JAMES20
100007SMITHA20
100008SMITHA10
100009SMITHA20

If we want to look at cross tabulation as in what is the number of times each products are bought by the customer.

Solution:

SELECT * FROM
(
  SELECT customer_ref, product_id
  FROM orders
)
PIVOT
(
  COUNT(product_id)
  FOR product_id IN (10, 20, 30)
)
ORDER BY customer_ref;

Output:

customer_ref102030
ANDREW001
JAMES110
SMITHA230

CASE Statements:

There are some exceptional cases where CASE statements can be used as a substitute to PIVOT as well.

Let us try to look at the example:

If you want to update or display certain values or update columns based on certain conditions we can leverage CASE statements.

Table: Products

Product_IDProduct_NameAmount
1PC30000
2Iphone50000
3Moto12000
4Speaker1200
5Book500

Say if the ask is to find to label the items as luxury,economy,mass based on the amount range say

0-100 – Cheap
101-1500 – Mass
1501-30000-economy
>30000 as Luxury 

Then, CASE statement would come in handy.

Solution:

SELECT
  Product_ID,
  Product_Name,
  Amount ,
  CASE
  WHEN Amount between 0 and 100 THEN 'Cheap'
  WHEN Amount between 101 and  1500 THEN 'Mass'
  WHEN Amount between 1501 and  30000   THEN 'Economy'
  WHEN Amount >30000  THEN 'Luxury'
  ELSE 'Not Available'
  END product_group
FROM  products ;


One advise here would be CASE always works in first come first in basis – Meaning (If a conditions satisfies for a row it will not go for next one ) 

Supposing in the above say if we have amount = 0 as ‘NA’ but in your CASE statement if the first condition is Amount <=100 and last condition is amount = 0. It will take up first and will ignore the last.

9. SQL-NULL Values:

With respect to NULL values always remember if you are going to check for a condition in a column which has null values then it should be of the form ‘…WHERE COLUMN_NAME IS NULL

One more point to note here is that NULL is never equal to NULL***.

10. GROUP BY AND HAVING clause:

Say you have a table named Employees

Table Name: Employee

EmpIDDepartmentSalary
1IT55000
2IT60000
3IT40000
4Sales35000
5Sales44000
6IT35000

If our ask is to find out the number of people with respect to each department who are getting salary greater than or equal to 40000 and having more than 2 person (in nos) in the respective department.

Solution:

SELECT department, COUNT(*) AS "Number of employees"
FROM employees
WHERE salary >= 40000
GROUP BY department
HAVING COUNT(*) > 2;

Output:

DepartmentNumber of employees
IT3

The above GROUP BY can also be used to find out aggregate functions (such as SUM,MIN,MAX) etc.

One  pro tip here would be – Always it is a good practice to write to the columns selected in SELECT clause to incorporate that in the GROUP BY clause (though in some SQL IDEs it will accept other columns) but it is ideal to have the columns that are provided in SELECT clause.

If you are going to use an AGGREGATE function then GROUP is recommended most of the times.

WHERE ClauseHAVING Clause
It is used prior to GROUP BYIt is used after GROUP BY
Mostly if you want a pre-filter go for WHERE clauseIf you want a post-filter after GROUP BY go for HAVING
It is not generalized to GROUP BYIt works only with GROUP BY

The above differences can be related to query used before.

11. LIMIT clause:

As the name suggests, if you want to LIMIT the result that is displayed in the SQL result. You can make use of this LIMIT functionality.

This LIMIT is not supported by all SQL versions in some of them it may not work.

Lets say you have products table as shown below

Table Name: Products

ORDER_DATEPRODUCT_IDQTY
9/25/2010100020
9/26/2010200025
9/27/2010100090
9/28/20102000100
9/29/2010200028
9/30/2010100035

Our ask is to find out the products where the qty is greater than 25 and to display only top3 items.

Solution:

SELECT product_id, qty
FROM products
WHERE qty > 25
ORDER BY qty DESC
LIMIT 3;

Output: It will display only top 3 rows

PRODUCT_IDQTY
2000100
100090
100035

To Generalize the GROUP BY, HAVING and LIMIT clause,

I have portrayed a picture to depict as how you should ideally look at them. Note that the following is just for your better understanding. In real-time along with the following the execution steps goes through a series of process but here we are just outlaying the overall idea.

12. Difference between PRIMARY and UNIQUE KEY:

PRIMARY KEYUNIQUE KEY
The main purpose of the primary key is to provide a means to identify each record in the table.Unique can be used to ensure rows are unique with respect to the particular column
There can be only one primary key for a tableThere can be multiple UNIQUE keys for a table
It should always be NOT NULL and should not get repeatedColumn can contain NULL
It results in CLSUTERED index by default.It is used when we want our columns to be unique

13. Use of DISTINCT and ORDER BY:

If we want to display only unique values for a column, we use DISTINCT.

Consider the table films:

IDLENGTHFILM
1220FilmName1
2220FilmName2
3230FilmName3
4220FilmName4
5240FilmName5
6230FilmName6
7120FilmName7

If we want to find out unique length of films then,

Solution:

SELECT DISTINCT length 
FROM films;

LENGTH
220
230
240
120

It shows the unique value in the table.

14. Use of functions such as SUBSTRING, LENGTH, UPPER, LOWER:

Subbstr() is used to extract a certain number of letters from the column.

For example:

It starts with 1 as index:

Substr(column_name,start_position,length)

SUBSTR(‘Great Lakes’, 1, 5)

Result: Great

SUBSTR(‘Great Lakes’, 7, 2)

Result: La

UPPER and LOWER: These are used to convert column names into capital letters or lower case.

15. Difference between EXISTS and IN:

EXISTS clauseIN clause
The EXISTS operator stops scanning rows once the subquery returns the first row because it can determine the resultThe IN operator must scan all rows returned by the subquery to conclude the result
IN clause can’t compare anything with NULL valuesEXISTS clause can compare everything with NULL values
The EXISTS operator is faster than IN operator when the result set of the subquery is largeThe IN operator is faster than EXISTS operator when the result set of the subquery is small

16. Query to find monthly change in a column from table A 

Select month,
          Column_A,
         Column_A - LAG(Column_A,1) Over (Order By month) As Month_on_Month_change
From table_X ;

17. Query to find Running total by Business Units for a specific company 

Select year,
 Business_Unit,
Amount_utilized,
Sum(Amount_utilized) 
Over (Partition By 
Business_Unit Order by 
Year) As RunningTotal
From 
Table_companies;

Inspired by Eric Weber.

Some Pro tips for Interviews:

  • Understand the problem clearly, determine the main objective of the problem, try to visualize the output once.
  • Understand the ER diagram provided and see how the mapping is done.
  • Try to identify the type of JOIN you want to use (INNER, CROSS, LEFT, RIGHT). If it is a self—join problem, handle them carefully.
  • If it is on the spot problem – try to verbalize your assumptions and by clarifying the same with the interviewer as well so that in case if you see some difference – you can correct it.
  • Most interviewers will be looking upon the logic you are going to use. Be precise and clear of what is expected from the question asked.
  • Practice is the only key for any language and with respect to SQL – it is 100% true.
  • Try to do some exercise from hacker rank.
  • Lastly with respect to SQL there can be multiple approaches to solve a problem, just see to it that the end results are met (not only for specific cases but also for some corner cases as well).

I would like to conclude by saying that– 

“The difference between ordinary and extraordinary is practice” –Vladimir Horowitz

This brings us to the end of the blog on DMBS Interview Questions. We hope that these DBMS Interview Questions help you. If you wish to learn more such concepts, you can join Great Learning Academy’s Free Online Courses and learn the most in-demand skills today.

1

LEAVE A REPLY

Please enter your comment!
Please enter your name here

thirteen − 4 =