This blog post is written in response to the T-SQL Tuesday post of Prox ‘n’ Funx. This is a very interesting subject. By the way Brad Schulz is my favorite guy when it is about blogging. I respect him as well learn a lot from him. Everybody is writing something new his subject, I decided to start SQL Server 2012 analytic functions series.
SQL Server 2012 introduces new analytical function CUME_DIST(). This function provides cumulative distribution value. It will be very difficult to explain this in words so I will attempt small example to explain you this function. Instead of creating new table, I will be using AdventureWorks sample database as most of the developer uses that for experiment.
Let us fun following query.
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty,
CUME_DIST() OVER(ORDER BY SalesOrderID) AS CDist
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY CDist DESC
GO
Above query will give us following result.
Now let us understand what is the formula behind CUME_DIST and why the values in SalesOrderID = 43670 are 1.
Let us take more example and be clear about why the values in SalesOrderID = 43667 are 0.5.
Now let us enhence the same example and use PARTITION BY into the OVER clause and see the results. Run following query in SQL Server 2012.
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID DESC, CDist DESC
GO
Now let us see the result of this query. We are have changed the ORDER BY clause as well partitioning by SalesOrderID.
You can see that CUME_DIST() function provides us different results. Additionally now we see value 1 multiple times. As we are using partitioning for each group of SalesOrderID we get the CUME_DIST() value.
CUME_DIST() was long awaited Analytical function and I am glad to see it in SQL Server 2012.
Reference: Pinal Dave (https://blog.sqlauthority.com)
8 Comments. Leave new
Good to see Analytical function within T-SQL itself.
Hi Pinal sir,
Thank you for the SQL Server 2012 Updates.
Regard$
Chirag Satasiya
Nice Works From You Pinal,
One can easily understand the way u have explained here. Thanks.
I understood the explanation without the PARTITION BY clause, and can easily see how the cumulative distribution value was calculated, however the author has made no attempt to explain the code using partition by. Could you please explain how the values were obtained when group by is used.
Notice ‘ORDER BY ProductID’ in “CUME_DIST() OVER(PARTITION BY SalesOrderID ORDER BY ProductID)”
So, Let’s consider Product id 773 in Group1.
Now, let’s answer some questions:
1. How many rows have ProductID = 773 in ‘Group1’?
Ans. 1
2. How many rows are below the ProductID ‘773’ in ‘Group1’?
Ans. 2
3. Total count of rows in ‘Group1’?
Ans. 4
So, let’s put all the values in the formula mentioned by Pinal,
(1+2)/4
which equals 0.75
Hope it helps.. :)
Can any one please explain real life examples where cume_dist function can be used?
Great explaination. The only site I’ve found this morning with this much clarity.