From last several days I am working on various Denali Analytical functions and it is indeed really fun to refresh the concept which I studied in the school.
Earlier I wrote article where I explained how we can use PERCENTILE_CONT() to find median over here SQL SERVER – Introduction to PERCENTILE_CONT() – Analytic Functions Introduced in SQL Server 2012. Today I am going to ask question based on the same blog post. Again just like last time the intention of this puzzle is as following:
- Learn new concept of SQL Server 2012
- Learn new concept of SQL Server 2012 even if you are on earlier version of SQL Server.
On another note, SQL Server 2012 RC0 has been announced and available to download SQL SERVER – 2012 RC0 Various Resources and Downloads.
Now let’s have fun following query:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO
The above query will give us the following result:
The reason we get median is because we are passing value .05 to PERCENTILE_COUNT() function. Now run read the puzzle.
Puzzle:
Run following T-SQL code:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO
Observe the result and you will notice that MidianCont has different value than before, the reason is PERCENTILE_CONT function has 0.9 value passed. For first four value the value is 775.1.
Now run following T-SQL code:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO
Observe the result and you will notice that MidianCont has different value than before, the reason is PERCENTILE_CONT function has 0.1 value passed. For first four value the value is 709.3.
Now in my example I have explained how the median is found using this function. You have to explain using mathematics and explain (in easy words) why the value in last columns are 709.3 and 775.1
Hint:
Rules
- Leave a comment with your detailed answer by Nov 25′s blog post.
- Open world-wide (where Amazon ships books)
- If you blog about puzzle’s solution and if you win, you win additional surprise gift as well.
Prizes
Print copy of my new book SQL Server Interview Questions Amazon|Flipkart
If you already have this book, you can opt for any of my other books SQL Wait Stats [Amazon|Flipkart|Kindle] and SQL Programming [Amazon|Flipkart|Kindle].
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
9 Comments. Leave new
This is very much like the post I did on percentile a while back.
Essentially the idea is:
Four items, means three ‘gaps’ between them. .9 * 3 = 2.7, so the answer is 70% of the way between items 3 and 4. Or… (776-773) * .7 + 773 = 775.1
If you want .1, then it’s .1 * 3 = 0.3, therefore 30% of the way between items 1 and 2. Or… (710-709) * .3 + 709 = 709.3.
The answer is:
(2*776)+773/3
(2*709)+710/3
Thanks.
percentile value = 0.90
Number of rows in group=4
RN=1+(P*(N-1))=1+(0.9*(4-1))=3.7
CEILING(RN)=4
FLOOR(RN)=3
if(CRN=FRN=RN) then
(value of expression from row at RN)
otherwise
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
As CRN != FRN != RN
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
i.e (4-3.7)*(773)+(3.7-3)*(776)=775.1
percentile value = 0.10
Number of rows in group=4
RN=1+(P*(N-1))=1+(0.1*(4-1))=1.3
CEILING(RN)=2
FLOOR(RN)=1
if(CRN=FRN=RN) then
(value of expression from row at RN)
otherwise
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
(2-1.3)*(709)+(1.3-1)*(710)=709.3
Population
709
710
773
776
Formulas:
RN = (1+ (P*(N-1))
CRN = CEILING(RN) and FRN = FLOOR(RN)
Value = (CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
Percentile 0,1 0,9
RN (1+(0,1*(4-1)))=1,3 (1+(0,9*(4-1)=3,7
CRN = 2 CRN = 4
FRN = 1 FRN = 3
Value for 0,1 (2-1,3)*(709)+(1,3-1)*710= 709,3
Value for 0,9 (4-3,7)*773+(3,7-3)*776=775,1
Basically we find the row number that is of interest for us, 1,3 and 3,7 respectively.
Then, we compute the value by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN) using following formula:
Value = (CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
This was an interesting exercise.
Hi Sir,
Formula for the result set is as explained below :
The data set to be observed is :
X1-709
X2-710
X3-773
X4-776
1. For First snap shot with “PERCENTILE_CONT(0.9)”
n = 4
p = 0.9
where n = total number of observation
and p = percentile value
(n-1)p = (4-1)*0.9 = 2.7 = j + g = 2.0 + 0.7
where j = integer part of (n-1)*p
and g = fraction part of (n-1)*p
if g = 0 then percentile value = X(j+1)
if g > 0 then percentile value = X(j+1) + g(X(j+2) – X(j+1))
now for the above
0.7 > 0 therefore the formula = X(2+1) + 0.7(X(2+2) – X(2+1))
= 773 + 0.7(776-773)
= 773 + 2.1
= 775.1
2. Similarly for second snap shot with “PERCENTILE_CONT(0.1)”
n = 4
p = 0.1
(n-1)p = (4-1)*0.1 = 0.3 = 0 + 0.3
0.3 > 0 therefore the formula = X(0+1) + 0.3(X(0+2) – X(0+1))
= 709 + 0.3(710-709)
= 709 + 0.3
= 709.3
The formula is “Empirical Distribution Function – Interpolation”
amazing blog today sir as you said “it is indeed really fun to refresh the concept which I studied in the school”
Thanks and Regards,
P.Anish Shenoy
The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
The final result will be:
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
— PERCENTILE_CONT function has 0.1 value
—-(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
—-709,710,773,776
declare @p float = 0.1
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1)))–1.3
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN)–2.0
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN)–1.0
select @FRN
— value at row 1 = 709,value at row 2 = 710
SELECT (@CRN-@RN) *(709) + (@RN-@FRN) *(710)
GO
— @crn-@rn = 2.0 – 1.3 = 0.7
— @rn-@Frn = 1.3 – 1.0 = 0.3
— ((0.7)*(709)) + ((0.3)*(710))
— 709.3
–=========================================================
— PERCENTILE_CONT function has 0.9 value
—-(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
—-709,710,773,776
declare @p float = 0.9
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1)))–3.7
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN)–4.0
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN)–3.0
select @FRN
— value at row 3 = 773,value at row 4 = 776
SELECT (@CRN-@RN) *(773) + (@RN-@FRN) *(776)
— @CRN-@RN = 4.0 – 3.7 = 0.3
— @rn-@Frn = 3.7 – 3.0 = 0.7
— ((0.3)*(773)) + ((0.7)*(776))
— 775.1
The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
The final result will be:
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)
/*
=========================================================
PERCENTILE_CONT function has 0.9 value
(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
709,710,773,776
*/
declare @p float = 0.9
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1))) /* 3.7 */
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN) /* 4.0 */
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN) /* 3.0 */
select @FRN
/* value at row 3 = 773,value at row 4 = 776 */
SELECT (@CRN-@RN) *(773) + (@RN-@FRN) *(776)
/*
@CRN-@RN = 4.0 – 3.7 = 0.3
@rn-@Frn = 3.7 – 3.0 = 0.7
((0.3)*(773)) + ((0.7)*(776))
775.1
*/
GO
/*
=========================================================
PERCENTILE_CONT function has 0.1 value
(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
709,710,773,776
*/
declare @p float = 0.1
declare @n int = 4
DECLARE @RN FLOAT
select @RN=(1+ (@p*(@n-1))) /* 1.3 */
SELECT @RN
DECLARE @CRN float
select @CRN=CEILING(@RN) /* 2.0 */
select @CRN
DECLARE @FRN float
select @FRN=FLOOR(@RN) /* 1.0 */
select @FRN
/* value at row 1 = 709,value at row 2 = 710 */
SELECT (@CRN-@RN) *(709) + (@RN-@FRN) *(710)
/*
@crn-@rn = 2.0 – 1.3 = 0.7
@rn-@Frn = 1.3 – 1.0 = 0.3
((0.7)*(709)) + ((0.3)*(710))
709.3
*/
using interpolation
y2 = [(x2 – x1) (y3 – y1)/(x3-x1)] + y1