[Note from Pinal]: In this episode of the Notes from the Field series database expert Kathi Kellenberger explains about T-SQL Over clause. Kathi is an amazing instructor, she was the SQL author I have read in my early career. The reason, I love SQL Server because her writing has instigated love for this technology in me. Today she brings a relatively interesting topic for database experts. Read the experience of Kathi in her own words.
The OVER clause was added to SQL Server in 2005. Yes, it’s been around for a decade, but most SQL Server professionals know very little about all the ways it can be used. Whenever you add a T-SQL window (also called windowed or windowing) function such as ROW_NUMBER to your query, you must use the OVER clause. Getting the OVER clause right is critical to getting the correct results. Window functions require the OVER clause, but there is one other situation where the OVER clause is used: with the sequence object. In this article, I’m going to stick just with the ways the OVER clause is used with window functions.
The OVER clause has three components: partitioning, ordering, and framing. Partitioning is always supported, but support for ordering and framing depends on which type of window function you are using.
The partitioning component, expressed as PARTITION BY, is optional and is supported for all types of window functions. Partitioning divides up the rows. It’s kind of like a physical window that is divided up into panes. The entire window is a window. The panes are also windows. Say that I partition by CustomerID. The partition for CustomerID 1000 is restricted to rows for CustomerID 1000. For the calculation on any given row, the window consists of rows from that partition only. This means that the ROW_NUMBER function restarts the row numbers for each partition, for example. You can specify more than one column for PARTITION BY as long as you separate the columns with commas. Partitioning is not the same as grouping with the GROUP BY clause. When you use the GROUP BY clause, only one row is returned for each group. Partitioning is just a way to divide the data in order to perform the calculations; the detail columns are returned. Figure 1 shows an example.
Figure 1: The effects of PARTITION BY
Ordering within the OVER clause is supported when the data must be sorted to perform the calculation. For example, ROW_NUMBER requires an ORDER BY expression within the OVER clause because the rows must be lined up. Actually, most of the window functions support ORDER BY. ORDER BY in the OVER clause is not supported for calculating subtotals, for example. You don’t need the data sorted to calculate a sum. Note that the ORDER BY within the OVER clause has nothing to do with an ORDER BY clause found in the query itself. Just like partitioning, you can specify a comma delimited list of columns. Figure 2 shows the results of ORDER BY in the OVER clause.
Figure 2: Using ORDER BY
The third component of the OVER CLAUSE is called framing. Framing was introduced with SQL Server 2012 and is used in only a couple of situations: accumulating window aggregates (running totals, moving averages, etc.) and the functions FIRST_VALUE and LAST_VALUE. While partitioning is similar to window panes, framing is like a stained glass window. When performing the calculations, every row might see a different set of rows through the window. For example, when calculating a running total, row 1 sees row 1. Row 2 sees rows 1 and 2. Row 3 see rows 1, 2, and 3. The syntax for framing is a bit complex, so I am going to save it for my next post. By default, the frame consists of the first row of the sorted partition and all subsequent rows up to the current row. Figure 3 is an example. The small arrows represent the current row where the calculation is being performed.
Figure 3: The results of framing
To see the OVER clause in action, run this query against the AdventureWorks database:
SELECT CustomerID, SalesOrderID, TotalDue, ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS RowNum, SUM(TotalDue) OVER() AS GrandTotal, SUM(TotalDue) OVER(PARTITION BY CustomerID) AS SubTotal, SUM(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS RunningTotal, LAG(SalesOrderID) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS PrevOrder, FIRST_VALUE(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS FirstAmt FROM Sales.SalesOrderHeader;
Next time, I’ll provide a deep dive into framing. If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)
Windows Functions are the best part of SQL
I bought the book because of the post, I find it odd that Pinal always finds to post about something I’m currently researching!