# SQL SERVER – Puzzle to Win Print Book – Explain Value of PERCENTILE_CONT() Using Simple Example

From last several days I am working on various Denali Analytical functions and it is indeed really fun to refresh the concept which I studied in the school.

Earlier I wrote article where I explained how we can use PERCENTILE_CONT() to find median over here SQL SERVER – Introduction to PERCENTILE_CONT() – Analytic Functions Introduced in SQL Server 2012. Today I am going to ask question based on the same blog post. Again just like last time the intention of this puzzle is as following:

1. Learn new concept of SQL Server 2012
2. Learn new concept of SQL Server 2012 even if you are on earlier version of SQL Server.

On another note, SQL Server 2012 RC0 has been announced and available to download SQL SERVER – 2012 RC0 Various Resources and Downloads.

Now let’s have fun following query:

```USE AdventureWorks GO SELECT SalesOrderID, OrderQty, ProductID, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ProductID) OVER (PARTITION BY SalesOrderID) AS MedianCont FROM Sales.SalesOrderDetail WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY SalesOrderID DESC GO ```

The above query will give us the following result:

The reason we get median is because we are passing value .05 to PERCENTILE_COUNT() function. Now run read the puzzle.

## Puzzle:

Run following T-SQL code:

```USE AdventureWorks GO SELECT SalesOrderID, OrderQty, ProductID, PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY ProductID) OVER (PARTITION BY SalesOrderID) AS MedianCont FROM Sales.SalesOrderDetail WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY SalesOrderID DESC GO ```
Observe the result and you will notice that MidianCont has different value than before, the reason is PERCENTILE_CONT function has 0.9 value passed. For first four value the value is 775.1.

Now run following T-SQL code:
```USE AdventureWorks GO SELECT SalesOrderID, OrderQty, ProductID, PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY ProductID) OVER (PARTITION BY SalesOrderID) AS MedianCont FROM Sales.SalesOrderDetail WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY SalesOrderID DESC GO ```

Observe the result and you will notice that MidianCont has different value than before, the reason is PERCENTILE_CONT function has 0.1 value passed. For first four value the value is 709.3.

Now in my example I have explained how the median is found using this function. You have to explain using mathematics and explain (in easy words) why the value in last columns are 709.3 and 775.1

## Rules

• Open world-wide (where Amazon ships books)
• If you blog about puzzle’s solution and if you win, you win additional surprise gift as well.

## Prizes

Print copy of my new book SQL Server Interview Questions Amazon|Flipkart

If you already have this book, you can opt for any of my other books SQL Wait Stats [Amazon|Flipkart|Kindle] and SQL Programming [Amazon|Flipkart|Kindle].

Reference: Pinal Dave (https://blog.sqlauthority.com)

## How to Find Table Cardinality from the Execution Plan? – Interview Question of the Week #213

• This is very much like the post I did on percentile a while back.

Essentially the idea is:

Four items, means three ‘gaps’ between them. .9 * 3 = 2.7, so the answer is 70% of the way between items 3 and 4. Or… (776-773) * .7 + 773 = 775.1

If you want .1, then it’s .1 * 3 = 0.3, therefore 30% of the way between items 1 and 2. Or… (710-709) * .3 + 709 = 709.3.

• There is already one valid and correct answer which I have kept hidden. However, let me give you hint – it is from Rob Farley. See if you can additionally use this hint to find the answer.

(2*776)+773/3
(2*709)+710/3

Thanks.

• percentile value = 0.90
Number of rows in group=4
RN=1+(P*(N-1))=1+(0.9*(4-1))=3.7
CEILING(RN)=4
FLOOR(RN)=3

if(CRN=FRN=RN) then
(value of expression from row at RN)
otherwise
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

As CRN != FRN != RN
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

i.e (4-3.7)*(773)+(3.7-3)*(776)=775.1

percentile value = 0.10
Number of rows in group=4
RN=1+(P*(N-1))=1+(0.1*(4-1))=1.3
CEILING(RN)=2
FLOOR(RN)=1

if(CRN=FRN=RN) then
(value of expression from row at RN)
otherwise
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

(2-1.3)*(709)+(1.3-1)*(710)=709.3

• Population

709
710
773
776

Formulas:
RN = (1+ (P*(N-1))

CRN = CEILING(RN) and FRN = FLOOR(RN)

Value = (CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

Percentile 0,1 0,9

RN (1+(0,1*(4-1)))=1,3 (1+(0,9*(4-1)=3,7

CRN = 2 CRN = 4
FRN = 1 FRN = 3

Value for 0,1 (2-1,3)*(709)+(1,3-1)*710= 709,3
Value for 0,9 (4-3,7)*773+(3,7-3)*776=775,1

Basically we find the row number that is of interest for us, 1,3 and 3,7 respectively.
Then, we compute the value by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN) using following formula:

Value = (CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

This was an interesting exercise.

• P.Anish Shenoy
November 21, 2011 1:29 pm

Hi Sir,

Formula for the result set is as explained below :

The data set to be observed is :

X1-709
X2-710
X3-773
X4-776

1. For First snap shot with “PERCENTILE_CONT(0.9)”

n = 4
p = 0.9

where n = total number of observation
and p = percentile value

(n-1)p = (4-1)*0.9 = 2.7 = j + g = 2.0 + 0.7

where j = integer part of (n-1)*p
and g = fraction part of (n-1)*p

if g = 0 then percentile value = X(j+1)
if g > 0 then percentile value = X(j+1) + g(X(j+2) – X(j+1))

now for the above

0.7 > 0 therefore the formula = X(2+1) + 0.7(X(2+2) – X(2+1))

= 773 + 0.7(776-773)
= 773 + 2.1
= 775.1

2. Similarly for second snap shot with “PERCENTILE_CONT(0.1)”

n = 4
p = 0.1

(n-1)p = (4-1)*0.1 = 0.3 = 0 + 0.3

0.3 > 0 therefore the formula = X(0+1) + 0.3(X(0+2) – X(0+1))

= 709 + 0.3(710-709)
= 709 + 0.3
= 709.3

The formula is “Empirical Distribution Function – Interpolation”

amazing blog today sir as you said “it is indeed really fun to refresh the concept which I studied in the school”

Thanks and Regards,
P.Anish Shenoy

• The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

The final result will be:
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

— PERCENTILE_CONT function has 0.1 value
—-(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
—-709,710,773,776

declare @p float = 0.1
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1)))–1.3
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN)–2.0
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN)–1.0
select @FRN
— value at row 1 = 709,value at row 2 = 710

SELECT (@CRN-@RN) *(709) + (@RN-@FRN) *(710)
GO
— @crn-@rn = 2.0 – 1.3 = 0.7
— @rn-@Frn = 1.3 – 1.0 = 0.3
— ((0.7)*(709)) + ((0.3)*(710))
— 709.3

–=========================================================

— PERCENTILE_CONT function has 0.9 value
—-(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
—-709,710,773,776

declare @p float = 0.9
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1)))–3.7
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN)–4.0
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN)–3.0
select @FRN

— value at row 3 = 773,value at row 4 = 776

SELECT (@CRN-@RN) *(773) + (@RN-@FRN) *(776)
— @CRN-@RN = 4.0 – 3.7 = 0.3
— @rn-@Frn = 3.7 – 3.0 = 0.7
— ((0.3)*(773)) + ((0.7)*(776))
— 775.1

• The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

The final result will be:
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

/*
=========================================================
PERCENTILE_CONT function has 0.9 value
(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
709,710,773,776
*/
declare @p float = 0.9
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1))) /* 3.7 */
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN) /* 4.0 */
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN) /* 3.0 */
select @FRN

/* value at row 3 = 773,value at row 4 = 776 */

SELECT (@CRN-@RN) *(773) + (@RN-@FRN) *(776)

/*
@CRN-@RN = 4.0 – 3.7 = 0.3
@rn-@Frn = 3.7 – 3.0 = 0.7
((0.3)*(773)) + ((0.7)*(776))
775.1
*/
GO

/*
=========================================================
PERCENTILE_CONT function has 0.1 value
(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
709,710,773,776
*/
declare @p float = 0.1
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1))) /* 1.3 */
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN) /* 2.0 */
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN) /* 1.0 */
select @FRN

/* value at row 1 = 709,value at row 2 = 710 */

SELECT (@CRN-@RN) *(709) + (@RN-@FRN) *(710)
/*
@crn-@rn = 2.0 – 1.3 = 0.7
@rn-@Frn = 1.3 – 1.0 = 0.3
((0.7)*(709)) + ((0.3)*(710))
709.3
*/