SQL SERVER Challenge – Cumulative Total Calculation

Today we will see a very interesting SQL SERVER Challenge about Cumulative Total Calculation. During a recent Comprehensive Database Performance Health Check, I was asked if I knew the most efficient ways to solve one problem. While the problem was not directly related to SQL Server, I thought it would be interesting for all of you to explore the same.

SQL SERVER Challenge - Cumulative Total Calculation CumulativeTotalCalculation-800x576

Cumulative Total Calculation Challenge

Here’s the T-SQL code to generate the Orders table:

CREATE TABLE Orders (
    OrderID INT,
    OrderDate DATE,
    Total DECIMAL(10, 2)
);
INSERT INTO Orders (OrderID, OrderDate, Total)
VALUES
    (1, '2023-05-01', 100),
    (2, '2023-05-02', 150),
    (3, '2023-05-03', 200),
    (4, '2023-05-04', 120),
    (5, '2023-05-05', 180);

A select to this table would return following result:

Orders Table
+---------+------------+-------+
| OrderID | OrderDate | Total |
+---------+------------+-------+
| 1 | 2023-05-01 | 100 |
| 2 | 2023-05-02 | 150 |
| 3 | 2023-05-03 | 200 |
| 4 | 2023-05-04 | 120 |
| 5 | 2023-05-05 | 180 |
+---------+------------+-------+

The challenge is to write a T-SQL query to calculate the cumulative total of the Total column, ordered by the OrderDate. The result should include all the rows from the original table, along with an additional column called CumulativeTotal that represents the cumulative sum of the Total column up to that row.

Expected output:

+---------+------------+-------+----------------+
| OrderID | OrderDate | Total | CumulativeTotal|
+---------+------------+-------+----------------+
| 1 | 2023-05-01 | 100 | 100 |
| 2 | 2023-05-02 | 150 | 250 |
| 3 | 2023-05-03 | 200 | 450 |
| 4 | 2023-05-04 | 120 | 570 |
| 5 | 2023-05-05 | 180 | 750 |
+---------+------------+-------+----------------+

Well, now I am going to give you 3 potential solutions, but before that, you should give it a try. Trust me, the solutions are not that difficult at all.

Solution 1: Using a Subquery

SELECT 
    OrderID,
    OrderDate,
    Total,
    (SELECT SUM(Total) FROM Orders o2 
WHERE o2.OrderDate <= o1.OrderDate) AS CumulativeTotal
FROM
    Orders o1
ORDER BY
    OrderDate

Solution 2: Using the SUM() Window Function

SELECT 
    OrderID,
    OrderDate,
    Total,
    SUM(Total) OVER (ORDER BY OrderDate ROWS 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumulativeTotal
FROM
    Orders
ORDER BY
    OrderDate

Solution 3: Using a Recursive CTE (Common Table Expression)

WITH CumulativeCTE AS (
    SELECT
        OrderID,
        OrderDate,
        Total,
        ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum
    FROM
        Orders
), RecursiveCTE AS (
    SELECT
        OrderID,
        OrderDate,
        Total,
        RowNum,
        CAST(Total AS DECIMAL(10, 2)) AS CumulativeTotal 
    FROM
        CumulativeCTE
    WHERE
        RowNum = 1
    UNION ALL
    SELECT
        c.OrderID,
        c.OrderDate,
        c.Total,
        c.RowNum,
        CAST(r.CumulativeTotal + c.Total AS DECIMAL(10, 2)) 
    FROM
        CumulativeCTE c
    INNER JOIN
        RecursiveCTE r ON c.RowNum = r.RowNum + 1
)
SELECT
    OrderID,
    OrderDate,
    Total,
    CumulativeTotal
FROM
    RecursiveCTE
ORDER BY
    OrderDate;

Efficiency

Among the three solutions, Solution 2 (using the SUM() window function) is the most efficient. It leverages the built-in window function, which is specifically designed for calculating aggregations over a window of rows. The window function performs the cumulative sum calculation in a single pass through the data, avoiding the need for subqueries or recursion.

Solution 1 (using a subquery) requires a subquery to calculate the sum for each row, resulting in multiple queries being executed, which can be slower for large datasets.

Solution 3 (using a recursive CTE) involves recursion and multiple self-joins, which can be computationally expensive and slower compared to the other solutions.

Therefore, Solution 2 is the recommended and most efficient approach for calculating the cumulative total in T-SQL.

If you have any alternative solutions, please post them in the comments. There is one method which I have not included here where I have used the LAG() function. Let us see if you can come up with that solution.

If you want to discuss more about it, you can reach out to me via Twitter.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Challenge, CTE, Ranking Functions, SQL Scripts, SQL Sub Query
Previous Post
SQL SERVER – Count Occurrences of a Specific Value in a Column
Next Post
SQL Server Deadlocks – Interview Q and A

Related Posts

2 Comments. Leave new

  • If you use the sub query and have more than one order per day (likely) the result won’t be correct, would it? You’d have to add a time stamp but even that could not be accurate on a busy system with multiple transactions per second.
    Thanks for showing how to use the window function!

    Reply
  • Sanjay Monpara
    June 16, 2023 12:32 pm

    —-Using self join—-
    —————————

    SELECT
    o1.OrderID,
    o1.OrderDate,
    o1.Total,
    SUM(o2.Total) AS CumulativeTotal
    FROM Orders o1
    left join Orders o2 on o2.orderdate <= o1.orderdate
    group by o1.OrderID,
    o1.OrderDate,
    o1.Total
    ORDER BY o1.OrderDate

    Reply

Leave a Reply