SQL SERVER – Introduction to CUME_DIST – Analytic Functions Introduced in SQL Server 2012

SQL SERVER - Introduction to CUME_DIST - Analytic Functions Introduced in SQL Server 2012 TSQL2sDay This blog post is written in response to the T-SQL Tuesday post of Prox ‘n’ Funx. This is a very interesting subject. By the way Brad Schulz is my favorite guy when it is about blogging. I respect him as well learn a lot from him. Everybody is writing something new his subject, I decided to start SQL Server 2012 analytic functions series.

SQL Server 2012 introduces new analytical function CUME_DIST(). This function provides cumulative distribution value. It will be very difficult to explain this in words so I will attempt small example to explain you this function. Instead of creating new table, I will be using AdventureWorks sample database as most of the developer uses that for experiment.

Let us fun following query.
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty,
CUME_DIST() OVER(ORDER BY SalesOrderID) AS CDist
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY CDist DESC
GO

Above query will give us following result.

SQL SERVER - Introduction to CUME_DIST - Analytic Functions Introduced in SQL Server 2012 cumedist1

Now let us understand what is the formula behind CUME_DIST and why the values in SalesOrderID = 43670 are 1.

SQL SERVER - Introduction to CUME_DIST - Analytic Functions Introduced in SQL Server 2012 cumedist3

Let us take more example and be clear about why the values in SalesOrderID = 43667 are 0.5.

SQL SERVER - Introduction to CUME_DIST - Analytic Functions Introduced in SQL Server 2012 cumedist4

Now let us enhence the same example and use PARTITION BY into the OVER clause and see the results. Run following query in SQL Server 2012.

USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID DESC, CDist DESC
GO

Now let us see the result of this query. We are have changed the ORDER BY clause as well partitioning by SalesOrderID.

SQL SERVER - Introduction to CUME_DIST - Analytic Functions Introduced in SQL Server 2012 cumedist5

You can see that CUME_DIST() function provides us different results. Additionally now we see value 1 multiple times. As we are using partitioning for each group of SalesOrderID we get the CUME_DIST() value.

SQL SERVER - Introduction to CUME_DIST - Analytic Functions Introduced in SQL Server 2012 cumedist6

CUME_DIST() was long awaited Analytical function and I am glad to see it in SQL Server 2012.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Scripts
Previous Post
SQL SERVER – Video – Performance Improvement in Columnstore Index
Next Post
SQLAuthority News – Updates on Contests, Books and SQL Server

Related Posts

8 Comments. Leave new

  • Good to see Analytical function within T-SQL itself.

    Reply
  • Chirag Satasiya
    November 9, 2011 2:18 pm

    Hi Pinal sir,

    Thank you for the SQL Server 2012 Updates.

    Regard$
    Chirag Satasiya

    Reply
  • Nice Works From You Pinal,

    Reply
  • deepti ramdas
    July 16, 2013 3:24 pm

    One can easily understand the way u have explained here. Thanks.

    Reply
  • I understood the explanation without the PARTITION BY clause, and can easily see how the cumulative distribution value was calculated, however the author has made no attempt to explain the code using partition by. Could you please explain how the values were obtained when group by is used.

    Reply
    • Notice ‘ORDER BY ProductID’ in “CUME_DIST() OVER(PARTITION BY SalesOrderID ORDER BY ProductID)”

      So, Let’s consider Product id 773 in Group1.

      Now, let’s answer some questions:
      1. How many rows have ProductID = 773 in ‘Group1’?
      Ans. 1
      2. How many rows are below the ProductID ‘773’ in ‘Group1’?
      Ans. 2
      3. Total count of rows in ‘Group1’?
      Ans. 4

      So, let’s put all the values in the formula mentioned by Pinal,
      (1+2)/4
      which equals 0.75

      Hope it helps.. :)

      Reply
  • Can any one please explain real life examples where cume_dist function can be used?

    Reply
  • Great explaination. The only site I’ve found this morning with this much clarity.

    Reply

Leave a Reply