SQL SERVER – Puzzle to Win Print Book – Explain Value of PERCENTILE_CONT() Using Simple Example

From last several days I am working on various Denali Analytical functions and it is indeed really fun to refresh the concept which I studied in the school.

Earlier I wrote article where I explained how we can use PERCENTILE_CONT() to find median over here SQL SERVER – Introduction to PERCENTILE_CONT() – Analytic Functions Introduced in SQL Server 2012. Today I am going to ask question based on the same blog post. Again just like last time the intention of this puzzle is as following:

  1. Learn new concept of SQL Server 2012
  2. Learn new concept of SQL Server 2012 even if you are on earlier version of SQL Server.

On another note, SQL Server 2012 RC0 has been announced and available to download SQL SERVER – 2012 RC0 Various Resources and Downloads.

Now let’s have fun following query:

USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO

The above query will give us the following result:

SQL SERVER - Puzzle to Win Print Book - Explain Value of PERCENTILE_CONT() Using Simple Example percentiledisc1

The reason we get median is because we are passing value .05 to PERCENTILE_COUNT() function. Now run read the puzzle.

Puzzle:

Run following T-SQL code:

USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO

Observe the result and you will notice that MidianCont has different value than before, the reason is PERCENTILE_CONT function has 0.9 value passed. For first four value the value is 775.1.

SQL SERVER - Puzzle to Win Print Book - Explain Value of PERCENTILE_CONT() Using Simple Example median1

Now run following T-SQL code:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO

Observe the result and you will notice that MidianCont has different value than before, the reason is PERCENTILE_CONT function has 0.1 value passed. For first four value the value is 709.3.

SQL SERVER - Puzzle to Win Print Book - Explain Value of PERCENTILE_CONT() Using Simple Example median2

Now in my example I have explained how the median is found using this function. You have to explain using mathematics and explain (in easy words) why the value in last columns are 709.3 and 775.1

Hint:

Rules

  • Leave a comment with your detailed answer by Nov 25′s blog post.
  • Open world-wide (where Amazon ships books)
  • If you blog about puzzle’s solution and if you win, you win additional surprise gift as well.

Prizes

Print copy of my new book SQL Server Interview Questions Amazon|Flipkart

If you already have this book, you can opt for any of my other books SQL Wait Stats [Amazon|Flipkart|Kindle] and SQL Programming [Amazon|Flipkart|Kindle].

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Scripts
Previous Post
SQL SERVER – Introduction to PERCENTILE_CONT() – Analytic Functions Introduced in SQL Server 2012
Next Post
SQL SERVER – Introduction to PERCENTILE_DISC() – Analytic Functions Introduced in SQL Server 2012

Related Posts

9 Comments. Leave new

  • This is very much like the post I did on percentile a while back.

    Essentially the idea is:

    Four items, means three ‘gaps’ between them. .9 * 3 = 2.7, so the answer is 70% of the way between items 3 and 4. Or… (776-773) * .7 + 773 = 775.1

    If you want .1, then it’s .1 * 3 = 0.3, therefore 30% of the way between items 1 and 2. Or… (710-709) * .3 + 709 = 709.3.

    Reply
  • The answer is:
    (2*776)+773/3
    (2*709)+710/3

    Thanks.

    Reply
  • percentile value = 0.90
    Number of rows in group=4
    RN=1+(P*(N-1))=1+(0.9*(4-1))=3.7
    CEILING(RN)=4
    FLOOR(RN)=3

    if(CRN=FRN=RN) then
    (value of expression from row at RN)
    otherwise
    (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    As CRN != FRN != RN
    (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    i.e (4-3.7)*(773)+(3.7-3)*(776)=775.1

    percentile value = 0.10
    Number of rows in group=4
    RN=1+(P*(N-1))=1+(0.1*(4-1))=1.3
    CEILING(RN)=2
    FLOOR(RN)=1

    if(CRN=FRN=RN) then
    (value of expression from row at RN)
    otherwise
    (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    (2-1.3)*(709)+(1.3-1)*(710)=709.3

    Reply
  • Population

    709
    710
    773
    776

    Formulas:
    RN = (1+ (P*(N-1))

    CRN = CEILING(RN) and FRN = FLOOR(RN)

    Value = (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    Percentile 0,1 0,9

    RN (1+(0,1*(4-1)))=1,3 (1+(0,9*(4-1)=3,7

    CRN = 2 CRN = 4
    FRN = 1 FRN = 3

    Value for 0,1 (2-1,3)*(709)+(1,3-1)*710= 709,3
    Value for 0,9 (4-3,7)*773+(3,7-3)*776=775,1

    Basically we find the row number that is of interest for us, 1,3 and 3,7 respectively.
    Then, we compute the value by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN) using following formula:

    Value = (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    This was an interesting exercise.

    Reply
  • Hi Sir,

    Formula for the result set is as explained below :

    The data set to be observed is :

    X1-709
    X2-710
    X3-773
    X4-776

    1. For First snap shot with “PERCENTILE_CONT(0.9)”

    n = 4
    p = 0.9

    where n = total number of observation
    and p = percentile value

    (n-1)p = (4-1)*0.9 = 2.7 = j + g = 2.0 + 0.7

    where j = integer part of (n-1)*p
    and g = fraction part of (n-1)*p

    if g = 0 then percentile value = X(j+1)
    if g > 0 then percentile value = X(j+1) + g(X(j+2) – X(j+1))

    now for the above

    0.7 > 0 therefore the formula = X(2+1) + 0.7(X(2+2) – X(2+1))

    = 773 + 0.7(776-773)
    = 773 + 2.1
    = 775.1

    2. Similarly for second snap shot with “PERCENTILE_CONT(0.1)”

    n = 4
    p = 0.1

    (n-1)p = (4-1)*0.1 = 0.3 = 0 + 0.3

    0.3 > 0 therefore the formula = X(0+1) + 0.3(X(0+2) – X(0+1))

    = 709 + 0.3(710-709)
    = 709 + 0.3
    = 709.3

    The formula is “Empirical Distribution Function – Interpolation”

    amazing blog today sir as you said “it is indeed really fun to refresh the concept which I studied in the school”

    Thanks and Regards,
    P.Anish Shenoy

    Reply
  • The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

    The final result will be:
    If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
    Otherwise the result is
    (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    — PERCENTILE_CONT function has 0.1 value
    —-(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
    —-709,710,773,776

    declare @p float = 0.1
    declare @n int = 4
    DECLARE @RN FLOAT
    select @RN=(1+ (@p*(@n-1)))–1.3
    SELECT @RN
    DECLARE @CRN float
    select @CRN=CEILING(@RN)–2.0
    select @CRN
    DECLARE @FRN float
    select @FRN=FLOOR(@RN)–1.0
    select @FRN
    — value at row 1 = 709,value at row 2 = 710

    SELECT (@CRN-@RN) *(709) + (@RN-@FRN) *(710)
    GO
    — @crn-@rn = 2.0 – 1.3 = 0.7
    — @rn-@Frn = 1.3 – 1.0 = 0.3
    — ((0.7)*(709)) + ((0.3)*(710))
    — 709.3

    –=========================================================

    — PERCENTILE_CONT function has 0.9 value
    —-(ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
    —-709,710,773,776

    declare @p float = 0.9
    declare @n int = 4
    DECLARE @RN FLOAT
    select @RN=(1+ (@p*(@n-1)))–3.7
    SELECT @RN
    DECLARE @CRN float
    select @CRN=CEILING(@RN)–4.0
    select @CRN
    DECLARE @FRN float
    select @FRN=FLOOR(@RN)–3.0
    select @FRN

    — value at row 3 = 773,value at row 4 = 776

    SELECT (@CRN-@RN) *(773) + (@RN-@FRN) *(776)
    — @CRN-@RN = 4.0 – 3.7 = 0.3
    — @rn-@Frn = 3.7 – 3.0 = 0.7
    — ((0.3)*(773)) + ((0.7)*(776))
    — 775.1

    Reply
  • The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

    The final result will be:
    If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
    Otherwise the result is
    (CRN – RN) * (value of expression for row at FRN) +
    (RN – FRN) * (value of expression for row at CRN)

    /*
    =========================================================
    PERCENTILE_CONT function has 0.9 value
    (ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
    709,710,773,776
    */
    declare @p float = 0.9
    declare @n int = 4
    DECLARE @RN FLOAT
    select @RN=(1+ (@p*(@n-1))) /* 3.7 */
    SELECT @RN
    DECLARE @CRN float
    select @CRN=CEILING(@RN) /* 4.0 */
    select @CRN
    DECLARE @FRN float
    select @FRN=FLOOR(@RN) /* 3.0 */
    select @FRN

    /* value at row 3 = 773,value at row 4 = 776 */

    SELECT (@CRN-@RN) *(773) + (@RN-@FRN) *(776)

    /*
    @CRN-@RN = 4.0 – 3.7 = 0.3
    @rn-@Frn = 3.7 – 3.0 = 0.7
    ((0.3)*(773)) + ((0.7)*(776))
    775.1
    */
    GO

    /*
    =========================================================
    PERCENTILE_CONT function has 0.1 value
    (ORDER BY ProductID)OVER (PARTITION BY SalesOrderID)
    709,710,773,776
    */
    declare @p float = 0.1
    declare @n int = 4
    DECLARE @RN FLOAT
    select @RN=(1+ (@p*(@n-1))) /* 1.3 */
    SELECT @RN
    DECLARE @CRN float
    select @CRN=CEILING(@RN) /* 2.0 */
    select @CRN
    DECLARE @FRN float
    select @FRN=FLOOR(@RN) /* 1.0 */
    select @FRN

    /* value at row 1 = 709,value at row 2 = 710 */

    SELECT (@CRN-@RN) *(709) + (@RN-@FRN) *(710)
    /*
    @crn-@rn = 2.0 – 1.3 = 0.7
    @rn-@Frn = 1.3 – 1.0 = 0.3
    ((0.7)*(709)) + ((0.3)*(710))
    709.3
    */

    Reply
  • using interpolation
    y2 = [(x2 – x1) (y3 – y1)/(x3-x1)] + y1

    Reply

Leave a Reply