[Note from Pinal]: In this episode of the Notes from the Field series database expert Kathi Kellenberger explains about T-SQL Over clause. Kathi is an amazing instructor, she was the SQL author I have read in my early career. The reason, I love SQL Server because her writing has instigated love for this technology in me. Today she brings a relatively interesting topic for database experts. Read the experience of Kathi in her own words.
One of the best kept secrets of T-SQL window functions, also called windowing or windowed functions, is framing. Framing was introduced with SQL Server 2012, and it allows the set of rows, or window, to be defined very granularly. For example, when calculating a running total, each row needs to see an increasing number of values to perform the calculation. Row 1 needs to see just row 1. Row 2 needs to see rows 1 and 2. Row 3 needs to see rows 1, 2, and 3. Framing allows each row to have a distinct set of rows for the window function calculation.
NOTE: In my last SQL Authority post, I explained the OVER clause. Be sure to review that article if the OVER clause is new to you.
Framing is used in a very specific set of circumstances: with accumulating window aggregates and with FIRST_VALUE and LAST_VALUE. Accumulating window aggregates are the aggregate functions you use every day along with an OVER clause containing an ORDER BY. You can calculate a running total, for example. FIRST_VALUE lets you pull in any column from the first row of the frame. LAST_VALUE lets you pull in any column from the final row of the frame. By default, if you don’t specify the frame in the OVER clause when it is supported, it will consist of the rows starting with row 1 of the partition and continue to the current row. Often, the default frame will give you the results you are looking for. By explicitly specifying the frame, you can achieve even more functionality, avoid some logical errors, and get better performance.
Framing has several keywords you should learn.
Table 1 lists each one.
|ROWS||A positional operator used to define the frame.|
|RANGE||A logical operator used to define the frame. This operator is not fully implemented as of SQL Server 2014. It is the default frame type.|
|CURRENT ROW||The row where the result of the calculation will be returned and also used as a boundary.|
|UNBOUNDED PRECEDING||The first row of the partition, often used as the lower boundary.|
|UNBOUNDED FOLLOWING||The final row of the partition, often used as the upper boundary.|
|# PRECEDING||A number of rows before the current row, often used as the lower boundary. This is only supported with ROWS.|
|# FOLLOWING||A number of rows following the current row, often used as the upper boundary. This is only supported with ROWS.|
|BETWEEN||Used to connect the two boundaries.|
Table 1: Framing terms
Here is the default frame, when one isn’t specified:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
This means that the rows in the window will consist of the first row of the partition and continue to the current row. This works for calculating a running total as long as the ORDER BY columns found in the OVER CLAUSE are unique. If not, because RANGE evaluates the data logically, you can get results that don’t make sense.
When I speak on this topic, I tell my audience they need to learn three things: Use ROWS. Use ROWS. And use ROWS. Instead of relying on the default frame, to calculate a running total, use this frame:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
If that is too much to type, you can also abbreviate it like this:
ROWS UNBOUNDED PRECEDING
If you would like to calculate a three month moving average, for example, use this frame:
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
I’ll save the performance differences for another post, but I would like to show you some logical problems found by leaving out the frame. This example runs in the AdventureWorks database. The ORDER BY column is not unique, and I have chosen a customer with multiple orders on the same date.
SELECT CustomerID, SalesOrderID, CAST(OrderDate AS Date) AS OrderDate, TotalDue,
SUM(TotalDue) OVER(ORDER BY OrderDate
ROWS UNBOUNDED PRECEDING) AS RunningTotal,
SUM(TotalDue) OVER(ORDER BY OrderDate) AS DefFrameRunningTotal,
FIRST_VALUE(SalesOrderID) OVER(ORDER BY OrderDate) AS FirstOrder,
LAST_VALUE(SalesOrderID) OVER(ORDER BY OrderDate
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS LastOrder,
LAST_VALUE(SalesOrderID) OVER(ORDER BY OrderDate) AS DefFrameLastOrder
WHERE CustomerID = 29586
ORDER BY OrderDate;
Here are the results:
Take a look at the RunningTotal column. The value increases until you get to the final row. The correct frame is specified the OVER clause. Take a look at the values in DefFrameRunningTotal. Notice that, instead of continually increasing, it repeats values. This problem is due to the default frame, using RANGE, which looks at the value of OrderDate instead of just the position of the row. If the OrderDate values are the same, then they belong together in the same window when RANGE is used. You can solve this problem by using ROWS or by making sure that the ORDER BY expression in the OVER clause is unique. (Use ROWS. Use ROWS. And use ROWS.)
Take a look at the FirstOrder column. In each case, the value returned came from the first row of the partition. Even without specifying the frame, it works correctly. The LastOrder column looks correct as well. Notice that I specified a frame with ROWS that starts with the current row and goes up to the end of the partition. When leaving out the frame as shown in DefFrameLastOrder, the results don’t make much sense at all. Remember, the default frame only goes up to the current row. But since RANGE is a logical operator, it actually will go to a row with the same ORDER BY value as the current row. By default, the frame doesn’t go all the way to the end of the partition where the last value can be found. To correctly use LAST_VALUE, you must specify the frame, and you should use ROWS.
Table 2 lists commonly used frames:
|ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW||Start at row 1 of the partition and include rows up to the current row.|
|ROWS UNBOUNDED PRECEDING||Start at row 1 of the partition and include rows up to the current row.|
|ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING||Start at the current row and include rows up to the end of the partition.|
|ROWS BETWEEN N PRECEDING AND CURRENT ROW.||Start at a specified number of rows before the current row and include rows up to the current row.|
|ROWS BETWEEN CURRENT ROW AND N FOLLOWING||Start at the current row and include rows up to a specified number of rows following the current row.|
|ROWS BETWEEN N PRECEDING AND N FOLLOWING||Start at a specified number of rows before the current row and include a specified number of rows following the current row. Yes, the current row is also included!|
Table 2: Commonly used framing syntax
Specifying the correct frame is really important to ensure that you achieve the expected results. There are also some important performance differences which I’ll talk about in my next SQL Authority post.
If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)