Understanding SQL Window Function | Advanced SQL

A Deep Dive into ROW_NUMBER(), RANK(), DENSE_RANK()

Suparna Chowdhury
5 min readSep 27, 2024

ROW_NUMBER() : Assigns a unique sequential integer to each row within a partition, starting at 1 for the first row.

RANK(): It assigns the same rank to rows with identical values and leaves gaps in the ranking sequence for subsequent ranks. For example, if two rows are tied for rank 1, the next rank will be 3.

DENSE_RANK(): It assigns the same rank to tied rows and continues with the next rank in sequence without leaving any gaps. For example, if two rows are tied for rank 1, the next rank assigned will be 2

Use Cases:

ROW_NUMBER():

User’s Third Transaction [Uber SQL Interview Question]:

This question is sourced from DataLemur.

Write a query to obtain the third transaction of every user. Output the user id, spend and transaction date.

CREATE TABLE transactions (
user_id INT,
spend DECIMAL(10, 2),
transaction_date DATETIME
);

INSERT INTO transactions (user_id, spend, transaction_date) VALUES
(111, 100.50, '2022-01-08 12:00:00'),
(111, 55.00, '2022-01-10 12:00:00'),
(121, 36.00, '2022-01-18 12:00:00'),
(145, 24.99, '2022-01-26 12:00:00'),
(111, 89.60, '2022-02-05 12:00:00'),
(145, 45.30, '2022-02-28 12:00:00'),
(121, 22.20, '2022-04-01 12:00:00'),
(121, 67.90, '2022-04-03 12:00:00'),
(263, 156.00, '2022-04-11 12:00:00'),
(230, 78.30, '2022-06-14 12:00:00'),
(263, 68.12, '2022-07-11 12:00:00'),
(263, 100.00, '2022-07-12 12:00:00');

Approach to Solve:

  • We want to identify the third transaction for each user in the transactions table.
  • This requires assigning a unique row number to each transaction, grouped by user. We can use ROW_NUMBER() function to assign a sequential number to each transaction for every user, ordered by the transaction date.
  • We define a CTE named cte that retrieves all columns from the transactions table. We add a new column row_num using the ROW_NUMBER() function.
  • Then, we filter the results to select only those transactions where the row number is equal to 3.
WITH cte as(
SELECT
*,
ROW_NUMBER()
OVER(PARTITION BY user_id ORDER BY transaction_date) as row_num
FROM transactions
)
SELECT
user_id
, spend
, transaction_date
FROM cte
WHERE
row_num = 3

Rank():

Department Highest Salary:

This problem is sourced from LeetCode.com

The Employee table contains details about each employee within the organization. It includes the following columns: id, name, salary, and departmentId.

The Department table provides information about the different departments in the organization. It includes the following columns: id, name (department name).

The two tables are linked through the departmentId in the Employee table, which references the id in the Department table.

Write a solution to find employees who have the highest salary in each of the departments.


CREATE TABLE Department (
id INT PRIMARY KEY,
name VARCHAR(50)
);

CREATE TABLE EmpDetails (
id INT PRIMARY KEY,
employee VARCHAR(50),
salary INT,
departmentId INT
);

INSERT INTO Department (id, name) VALUES
(1, 'HR'),
(2, 'Sales');

INSERT INTO EmpDetails (id, employee, salary, departmentId) VALUES
(1, 'Ankit', 90000, 1),
(2, 'Serena', 85000, 1),
(3, 'Sachin', 50000, 2),
(4, 'Jenny', 70000, 2),
(5, 'Eva', 70000, 1),
(6, 'Ishan', 40000, 2),
(7, 'Leena', 75000, 2),
(8, 'Tim', 80000, 1),
(9, 'Rachel', 80000, 2),
(10, 'Suparna', 90000, 1);

Approach to Solve:

  • The goal is to find the highest earners in each department from the EmpDetails table and display their names, department names, and salaries.
  • We will use a Common Table Expression (CTE) to calculate the rankings of salaries within each department. This allows us to easily filter for the highest earners.
WITH cte AS (
SELECT *,
RANK()
OVER (PARTITION BY departmentId ORDER BY salary DESC) AS rnk
FROM Employee
)
  • We need to use the RANk() function, which assigns a rank to each employee’s salary within their department, ensuring that employees with the same salary receive the same rank. This is important because if two or more employees have the same salary, they should share the same rank. Unlike ROW_NUMBER, which would assign a unique sequential number to each row regardless of salary ties, RANK() effectively handles these duplicates, making it suitable for identifying all top earners in a department.
  • After ranking the salaries, we will filter the results to select only those employees with a rank of 1, which corresponds to the highest salary in each department.
WITH cte AS (
SELECT *,
RANK()
OVER (PARTITION BY departmentId
ORDER BY salary DESC) AS rnk
FROM EmpDetails
)

SELECT
d.name AS Department,
cte.employee AS Employee,
cte.salary AS Salary
FROM cte
JOIN Department d ON cte.departmentId = d.id
WHERE cte.rnk = 1;

DENSE_RANK():

Department Top Three Salaries:

This problem is sourced from LeetCode.com

A company wants to identify the top earners in each department. A high earner is defined as an employee whose salary ranks among the top three unique salaries within that department.

Approach to Solve:

  • We want to find the top three earners in each department, including their names, department names, and salaries. If multiple employees have the same salary, we need to ensure they get the same ranking.
  • We will use the DENSE_RANK() function. This function will rank the salaries in descending order within each department, and it will assign the same rank to employees with the same salary.
  • Since we need the department names alongside the employee details, we’ll need to join the Employee table with the Department table.

Subquery with DENSE_RANK():

We select all columns from the Employee table and add a rnk column using DENSE_RANK(), which restarts ranking for each department and sorts salaries in descending order.

SELECT *, 
DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC) AS rnk
FROM Employee

Joining with the Department Table:

We join the subquery (aliased as E) with the Department table to associate each employee with their respective department based on departmentId.

JOIN Department D ON e.departmentId = d.id

Filtering for Top 3 Earners:

WHERE e.rnk <= 3;

Complete Solution:

SELECT
d.name AS Department
, e.employee AS Employee
, e.Salary
FROM (
SELECT
employee
, salary
, departmentId,
DENSE_RANK() over(partition by departmentId order by salary desc) as rnk
from EmpDetails
) as e
JOIN Department d
ON e.departmentId = d.id
WHERE rnk <= 3;

In summary, understanding the differences between ROW_NUMBER(), RANK(), and DENSE_RANK() is essential for effective data analysis in SQL, allowing for precise ranking and ordering of results based on your specific needs.

Complete Solution in my GitHub:

  1. User’s Third Transaction (Uber)
  2. Department’s Highest Salary
  3. Department’s Top Three Salaries

--

--

Suparna Chowdhury
Suparna Chowdhury

Written by Suparna Chowdhury

Hey! I'm into all things data—data science, machine learning, SQL, and Tableau. Join me as I simplify complex ideas and explore the power of data!

No responses yet