SQL SERVER – Introduction to PERCENTILE_DISC() – Analytic Functions Introduced in SQL Server 2012

SQL Server 2012 introduces new analytical function PERCENTILE_DISC().

The book online gives following definition of this function: Computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset in Microsoft SQL Server 2012 Release Candidate 0 (RC 0). For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.

If you are clear with understanding of the function – no need to read further. If you got lost here is the same in simple words – find value of the column which is equal or more than CUME_DIST.

Before you continue reading this blog I strongly suggest you read about CUME_DIST function over here Introduction to CUME_DIST – Analytic Functions Introduced in SQL Server 2012.

Now let’s have fun following query:

USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS PercentileDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO

The above query will give us the following result:

You can see that I have used PERCENTILE_DISC(0.5) in query, which is similar to finding median but not exactly. PERCENTILE_DISC() function takes a percentile as a passing parameters. It returns the value as answer which value is equal or great to the percentile value which is passed into the example. For example in above example we are passing 0.5 into the PERCENTILE_DISC() function. It will go through the resultset and identify which rows has values which are equal to or great than 0.5. In first example it found two rows which are equal to 0.5 and the value of ProductID of that row is the answer of PERCENTILE_DISC(). In some third windowed resultset there is only single row with the CUME_DIST() value as 1 and that is for sure higher than 0.5 making it as a answer.

To make sure that we are clear with this example properly. Here is one more example where I am passing 0.6 as a percentile.

Now let’s have fun following query:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist,
PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS PercentileDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO

The above query will give us the following result:

The result of the PERCENTILE_DISC(0.6) is ProductID of which CUME_DIST() is more than 0.6. This means for SalesOrderID 43670 has row with CUME_DIST() 0.75 is the qualified row, resulting answer 773 for ProductID.

I hope this explanation makes it further clear.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

About these ads

2 thoughts on “SQL SERVER – Introduction to PERCENTILE_DISC() – Analytic Functions Introduced in SQL Server 2012

  1. Pingback: SQL SERVER – 2012 – Summary of All the Analytic Functions – MSDN and SQLAuthority « Journey to SQLAuthority

  2. Pingback: SQL SERVER – Weekly Series – Memory Lane – #004 « SQL Server Journey with SQL Authority

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s