Today we will see a very interesting SQL SERVER Challenge about Cumulative Total Calculation. During a recent Comprehensive Database Performance Health Check, I was asked if I knew the most efficient ways to solve one problem. While the problem was not directly related to SQL Server, I thought it would be interesting for all of you to explore the same.
Cumulative Total Calculation Challenge
Here’s the T-SQL code to generate the Orders
table:
CREATE TABLE Orders ( OrderID INT, OrderDate DATE, Total DECIMAL(10, 2) ); INSERT INTO Orders (OrderID, OrderDate, Total) VALUES (1, '2023-05-01', 100), (2, '2023-05-02', 150), (3, '2023-05-03', 200), (4, '2023-05-04', 120), (5, '2023-05-05', 180);
A select to this table would return following result:
Orders Table
+---------+------------+-------+
| OrderID | OrderDate | Total |
+---------+------------+-------+
| 1 | 2023-05-01 | 100 |
| 2 | 2023-05-02 | 150 |
| 3 | 2023-05-03 | 200 |
| 4 | 2023-05-04 | 120 |
| 5 | 2023-05-05 | 180 |
+---------+------------+-------+
The challenge is to write a T-SQL query to calculate the cumulative total of the Total
column, ordered by the OrderDate
. The result should include all the rows from the original table, along with an additional column called CumulativeTotal
that represents the cumulative sum of the Total
column up to that row.
Expected output:
+---------+------------+-------+----------------+
| OrderID | OrderDate | Total | CumulativeTotal|
+---------+------------+-------+----------------+
| 1 | 2023-05-01 | 100 | 100 |
| 2 | 2023-05-02 | 150 | 250 |
| 3 | 2023-05-03 | 200 | 450 |
| 4 | 2023-05-04 | 120 | 570 |
| 5 | 2023-05-05 | 180 | 750 |
+---------+------------+-------+----------------+
Well, now I am going to give you 3 potential solutions, but before that, you should give it a try. Trust me, the solutions are not that difficult at all.
Solution 1: Using a Subquery
SELECT OrderID, OrderDate, Total, (SELECT SUM(Total) FROM Orders o2 WHERE o2.OrderDate <= o1.OrderDate) AS CumulativeTotal FROM Orders o1 ORDER BY OrderDate
Solution 2: Using the SUM() Window Function
SELECT OrderID, OrderDate, Total, SUM(Total) OVER (ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumulativeTotal FROM Orders ORDER BY OrderDate
Solution 3: Using a Recursive CTE (Common Table Expression)
WITH CumulativeCTE AS ( SELECT OrderID, OrderDate, Total, ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum FROM Orders ), RecursiveCTE AS ( SELECT OrderID, OrderDate, Total, RowNum, CAST(Total AS DECIMAL(10, 2)) AS CumulativeTotal FROM CumulativeCTE WHERE RowNum = 1 UNION ALL SELECT c.OrderID, c.OrderDate, c.Total, c.RowNum, CAST(r.CumulativeTotal + c.Total AS DECIMAL(10, 2)) FROM CumulativeCTE c INNER JOIN RecursiveCTE r ON c.RowNum = r.RowNum + 1 ) SELECT OrderID, OrderDate, Total, CumulativeTotal FROM RecursiveCTE ORDER BY OrderDate;
Efficiency
Among the three solutions, Solution 2 (using the SUM() window function) is the most efficient. It leverages the built-in window function, which is specifically designed for calculating aggregations over a window of rows. The window function performs the cumulative sum calculation in a single pass through the data, avoiding the need for subqueries or recursion.
Solution 1 (using a subquery) requires a subquery to calculate the sum for each row, resulting in multiple queries being executed, which can be slower for large datasets.
Solution 3 (using a recursive CTE) involves recursion and multiple self-joins, which can be computationally expensive and slower compared to the other solutions.
Therefore, Solution 2 is the recommended and most efficient approach for calculating the cumulative total in T-SQL.
If you have any alternative solutions, please post them in the comments. There is one method which I have not included here where I have used the LAG() function. Let us see if you can come up with that solution.
If you want to discuss more about it, you can reach out to me via Twitter.
Reference:Â Pinal Dave (http://blog.SQLAuthority.com)
2 Comments. Leave new
If you use the sub query and have more than one order per day (likely) the result won’t be correct, would it? You’d have to add a time stamp but even that could not be accurate on a busy system with multiple transactions per second.
Thanks for showing how to use the window function!
—-Using self join—-
—————————
SELECT
o1.OrderID,
o1.OrderDate,
o1.Total,
SUM(o2.Total) AS CumulativeTotal
FROM Orders o1
left join Orders o2 on o2.orderdate <= o1.orderdate
group by o1.OrderID,
o1.OrderDate,
o1.Total
ORDER BY o1.OrderDate