SQL SERVER – Introduction to Hierarchical Query using a Recursive CTE – A Primer

SQL SERVER - Introduction to Hierarchical Query using a Recursive CTE - A Primer book2 This blog post is inspired from SQL Queries Joes 2 Pros: SQL Query Techniques For Microsoft SQL Server 2008 – SQL Exam Prep Series 70-433 – Volume 2.

[Amazon] | [Flipkart] | [Kindle] | [IndiaPlaza]


What is a Common Table Expression (CTE)

A CTE can be thought of as a temporary result set and are similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. A CTE is generally considered to be more readable than a derived table and does not require the extra effort of declaring a Temp Table while providing the same benefits to the user. However; a CTE is more powerful than a derived table as it can also be self-referencing, or even referenced multiple times in the same query.

The basic syntax structure for a CTE is shown below:
WITH MyCTE
AS ( SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
WHERE ManagerID IS NULL )
SELECT *
FROM MyCTE

Building a Recursive CTE

In the following examples, you will learn how to harness the power of a recursive CTE query by fulfilling a common business requirement, retrieving hierarchical data. By the time the final query is complete you will be able to easily determine how many levels from the top executive each employee is.

A recursive CTE requires four elements in order to work properly.

  1. Anchor query (runs once and the results ‘seed’ the Recursive query)
  2. Recursive query (runs multiple times and is the criteria for the remaining results)
  3. UNION ALL statement to bind the Anchor and Recursive queries together.
  4. INNER JOIN statement to bind the Recursive query to the results of the CTE.

WITH MyCTE
AS ( SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
INNER JOIN MyCTE ON Employee.ManagerID = MyCTE.EmpID
WHERE Employee.ManagerID IS NOT NULL )
SELECT *
FROM MyCTE

Identify the Anchor and Recursive Query

Anyone who does not have a boss is considered to be at the top level of the company and everyone who does have a boss either works for the person(s) at the top level (upper management), or the people that work for them (mid-management thru base employees).

Solarwinds

For example, a CEO is at the top level and thus has a ManagerID of null. Likewise, everyone below the CEO will have a ManagerID. This is demonstrated in the two queries below:

SQL SERVER - Introduction to Hierarchical Query using a Recursive CTE - A Primer j2p-day2-image-1

The first SELECT statement will become your Anchor query as it will find the employee that has a ManagerID of null (representing Level 1 of the organization). The second SELECT statement will become your Recursive query and it will find all employees that do have a ManagerID (representing Level 2-3 of this organization).

As you can see from the results so far, these queries are unable to give hierarchical data on which level each employee is at within the organization.

Add the Anchor and Recursive query to a CTE

Begin transforming this entire query into a CTE by placing a UNION ALL statement between the Anchor and Recursive queries. Now add parentheses around the entire query, indenting it, moving it down, and adding the declaration WITH EmployeeList AS before the open parenthesis, and then add SELECT * FROM EmployeeList on the next line after the close parenthesis.

Your query should now look like the screenshot below:

SQL SERVER - Introduction to Hierarchical Query using a Recursive CTE - A Primer j2p-day2-image-2

As you can see, the results from your CTE are exactly the same as the results returned from running the anchor and Recursive queries simultaneously in the previous example.

Add an expression to track hierarchical level

The Anchor query (aliased as ‘Boss’) inside the CTE represents everyone at Level 1 (i.e. Sally Smith). The Recursive query (aliased as ‘Emp’) represents everyone at Levels 2 and 3. In order to visualize each level in a result set, you will need to add an expression field to each query.

Add the expression “1 AS EmpLevel” to the Anchor query and the expression “2 AS EmpLevel” to the Recursive query. Before executing the entire query, look closely at the expression field. The EmpLevel expressions in the Anchor query will hard-code the numeral 1 (for Sally Smith’s level), while the EmpLevel expressions in the Recursive query will hard-code the numeral 2 for everyone else.

Your query should now look like the screenshot below:

SQL SERVER - Introduction to Hierarchical Query using a Recursive CTE - A Primer j2p-day2-image-3

The two new expression fields were a helpful step. In fact, they show the correct EmpLevel information for Sally Smith and for the people at Level 2 (i.e., Adams, Bender, Brown, Kennson, Lonning and Osako). However, the 2 is just a hard-coded placeholder to help visualize your next step. Lisa Kendall and several other employees need to be at Level 3.

Ideally you would like to make the expression dynamic by replacing “2 AS EmpLevel” with the expression “EmpLevel + 1”.

Add a self-referencing INNER JOIN statement

Let’s take a moment and recognise why this is not going to work quite so simply. The idea to increment EmpLevel in the recursive query of the CTE is on the right track. Unfortunately, the recursive query is trying to reference a field called EmpLevel but can’t find one, since it has only been materialized in the result set of the Anchor query and does not yet exist in the recursive set.

How can you materialize the EmpLevel field for the recursive query? We can use the CTE for this! Remember, a recursive CTE requires an INNER JOIN to connect the recursive query to the CTE itself. Go ahead and write an INNER JOIN statement binding the recursive query ‘Emp’ to the CTE ‘EmployeeList AS EL’ ON Emp.ManagerID = EL.EmpID.

Your query should now look like the screenshot below:

SQL SERVER - Introduction to Hierarchical Query using a Recursive CTE - A Primer j2p-day2-image-4

Success! You can now see that Sally is at the first level, Alex is at the second level and Lisa appears at the third level. Since a CTE can reference itself, the ‘Emp’ recursive query can now access the EmpLevel field materialized in the EmployeeList CTE.


This blog post is inspired from SQL Queries Joes 2 Pros: SQL Query Techniques For Microsoft SQL Server 2008 – SQL Exam Prep Series 70-433 – Volume 2.

[Amazon] | [Flipkart] | [Kindle] | [IndiaPlaza]

Reference: Pinal Dave (https://blog.sqlauthority.com)

Solarwinds
Previous Post
SQL SERVER – Introduction to SQL Server Security – A Primer
Next Post
SQL SERVER – Introduction to Basics of a Query Hint – A Primer

Related Posts

70 Comments. Leave new

  • An expert solution. Thank you!

    Reply
  • Dear Pinal,

    I am facing a performance challenge using CTE.
    Let me explain the scenario.

    I am calling in a SP using .Net program’s in line query.
    in this SP first we are extracting data from XML which are of size minimum 500 KB to 4 MB each approximately 30,000 files. later we are using concept of CTE to make records unique.
    but here performance issue comes. some times it takes 30-40 minutes to execute the main SP. I just want to know where i made mistake. Ur help will be really appreciated.

    Here i am sending 2 SPs (Main and CTE one)

    CREATE PROCEDURE [dbo].[usp_Read_BSXML] (
    @XmlDocumentBS xml,
    @CIN varchar(200),
    @FindateID int,
    @prm_error_message VARCHAR(250) = ” OUTPUT
    )
    AS
    DECLARE @XmlDocumentHandle int

    BEGIN

    BEGIN TRANSACTION
    BEGIN TRY

    /*– Create an internal representation of the XML document.*/
    EXEC sp_xml_preparedocument @XmlDocumentHandle OUTPUT, @XmlDocumentBS, ”

    — Insert record into XBRL_BS_TMP table
    INSERT INTO XBRL_BS_TMP (CIN,contextRef, NodeName, NodeValue, NodeType, FinDate,FindateId)
    SELECT @CIN as CIN,
    Cast(c.n.value(‘@contextRef’ ,’varchar(1000)’) as varchar(1000)) as contextRef,
    Cast(n.value(‘local-name(.)’,’varchar(4000)’) as varchar(4000)) as NodeName,
    Cast(c.n.value(‘.’,’varchar(MAX)’) as varchar(max)) as NodeValue ,
    cast(n.query(‘namespace-uri(.)’) as varchar(200)) as NodeType,
    substring(cast(n.query(‘namespace-uri(.)’) as varchar(100)),35,10) as FinDate,
    @FindateID as FindateId
    from @XmlDocumentBS.nodes(‘//*’) as C(N)
    Where c.n.value(‘@contextRef’ ,’varchar(100)’) is not null;

    — Insert record into XBRL_BS_FINDATE table
    INSERT INTO XBRL_BS_FINDATE (FindateId,CIN,contextRefId,StartDate,EndDate,InstantDate)
    SELECT @FindateID as FindateId,@CIN as CIN,
    Cast(t.n.value(‘@id’ ,’varchar(1000)’) as varchar(1000)) as contextRefId,
    Cast(n.value(‘(*:period/*:startDate/text())[1]’,’varchar(1000)’) as varchar(1000)) as StartDate,
    Cast(n.value(‘(*:period/*:endDate/text())[1]’,’varchar(1000)’) as varchar(1000)) as EndDate,
    Cast(n.value(‘(*:period/*:instant/text())[1]’,’varchar(1000)’) as varchar(1000)) as InstantDate
    from @XmlDocumentBS.nodes(‘//*:context’) as T(N)
    Where t.n.value(‘@contextRef’,’varchar(MAX)’) is null and t.n.value(‘@id’ ,’varchar(MAX)’) is not null;

    — To insert explicitMember node value and dimension attribute value
    Insert into XBRL_BS_SCENARIO_DATA(FindateId,CIN,contextRefId,dimension,explicitMember,typedMember)
    SELECT @FindateID as FindateId,@CIN as CIN,*
    from (SELECT
    context.Node.value(‘@id’, ‘varchar(1000)’) ‘contextRefId’
    ,items.Cd.value(‘@dimension’, ‘varchar(1000)’) ‘dimension’
    ,items.Cd.value(‘.’, ‘VARCHAR(1000)’) ‘explicitMember’
    ,items.Cd.value(‘.’, ‘VARCHAR(1000)’) ‘typedMember’
    FROM
    @XmlDocumentBS.nodes(‘//*:context’) context(Node)
    CROSS APPLY context.Node.nodes(‘./*:scenario/*:explicitMember’) items(Cd)
    Where context.Node.value(‘@contextRef’,’varchar(MAX)’) is null and context.Node.value(‘@id’ ,’varchar(MAX)’) is not null

    UNION ALL

    SELECT
    context.Node.value(‘@id’, ‘varchar(1000)’) ‘contextRefId’
    ,items.Cd.value(‘@dimension’, ‘varchar(1000)’) ‘dimension’
    ,items.Cd.value(‘.’, ‘VARCHAR(1000)’) ‘explicitMember’
    ,items.Cd.value(‘.’, ‘VARCHAR(1000)’) ‘typedMember’
    FROM
    @XmlDocumentBS.nodes(‘//*:context’) context(Node)
    CROSS APPLY context.Node.nodes(‘./*:scenario/*:typedMember’) items(Cd)
    Where context.Node.value(‘@contextRef’,’varchar(MAX)’) is null and context.Node.value(‘@id’ ,’varchar(MAX)’) is not null

    ) AS tbl

    EXEC usp_INSERT_ON_O_BS_DEFINITION @CIN,@FindateID;

    EXEC sp_xml_removedocument @XmlDocumentHandle OUTPUT;

    COMMIT
    END TRY

    BEGIN CATCH

    SET @prm_error_message = ERROR_MESSAGE()
    select @prm_error_message,0

    ROLLBACK
    END CATCH
    END
    GO

    =====================================================================
    usp_INSERT_ON_O_BS_DEFINITION (To make my info unique)

    USE [XBRLNew_02]
    GO

    /****** Object: StoredProcedure [dbo].[usp_INSERT_ON_O_BS_DEFINITION] Script Date: 10/06/2015 15:29:20 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE PROCEDURE [dbo].[usp_INSERT_ON_O_BS_DEFINITION] (
    @CIN VARCHAR(200),
    @finDateId int,
    @prm_error_message VARCHAR(250) = ” OUTPUT
    )
    AS

    BEGIN

    BEGIN TRANSACTION
    BEGIN TRY

    ;WITH BS_CTE (RowNumber, CIN, contextRefId, FindateID, explicitMember, explicitMembers, dimesnsion, dimensions) AS
    (
    SELECT 1, BS_SC.CIN, BS_SC.contextRefId, BS_SC.FindateId, MIN(BS_SC.explicitMember),
    CAST(MIN(BS_SC.explicitMember) AS VARCHAR(8000)),
    MIN(BS_SC.dimension),
    CAST(MIN(BS_SC.dimension) AS VARCHAR(8000))
    FROM XBRL_BS_SCENARIO_DATA BS_SC
    WHERE BS_SC.CIN =@CIN
    and BS_SC.findateId = @findateId
    GROUP BY BS_SC.FindateId,BS_SC.CIN,BS_SC.contextRefId

    UNION ALL

    SELECT C.RowNumber + 1, B.CIN, B.contextRefId, B.FindateId, B.explicitMember, C.explicitMembers+’, ‘ + B.explicitMember,
    B.dimension, C.dimensions+’, ‘ + B.dimension
    FROM XBRL_BS_SCENARIO_DATA B, BS_CTE C
    WHERE B.CIN = C.CIN
    And B.FindateId = @findateId
    AND B.FindateId = C.FindateID AND B.contextRefId = C.contextRefId
    AND B.CIN = @CIN AND B.explicitMember > C.explicitMember)

    INSERT INTO O_XBRL_BS_DFINITION (CIN, FindateId, contextRefId, FYDate,explicitMembers,dimensions,RowNumber)
    (
    SELECT FY_Date.CIN, FY_Date.FindateId, FY_Date.contextRefId ,
    CASE WHEN FY_Date.StartDate IS NULL THEN FY_Date.InstantDate
    ELSE FY_Date.EndDate END AS FYDate,
    Scenario.explicitMembers, Scenario.dimensions, Scenario.RowNumber
    FROM
    XBRL_BS_FINDATE FY_Date LEFT OUTER JOIN
    (SELECT BS_CTE.RowNumber, BS_CTE.CIN,BS_CTE.FindateID, BS_CTE.contextRefId, BS_CTE.explicitMembers, BS_CTE.dimensions
    FROM BS_CTE ,
    (SELECT Z.CIN, Z.FindateID, Z.contextRefId, MAX(Z.RowNumber) As MaxRow
    FROM BS_CTE Z GROUP BY Z.FindateID,Z.CIN, Z.contextRefId ) AS SummaryRow
    WHERE SummaryRow.FindateID = BS_CTE.FindateID AND SummaryRow.CIN = BS_CTE.CIN AND SummaryRow.contextRefId = BS_CTE.contextRefId
    AND SummaryRow.MaxRow = BS_CTE.RowNumber) Scenario ON
    FY_Date.CIN = Scenario.CIN AND FY_Date.FindateId = Scenario.FindateID AND FY_Date.contextRefId = Scenario.contextRefId
    WHERE FY_Date.CIN = @CIN
    And FY_Date.FindateId = @findateId
    )

    COMMIT
    END TRY

    BEGIN CATCH
    SET @prm_error_message = ERROR_MESSAGE()
    select @prm_error_message,0

    ROLLBACK
    END CATCH
    END
    GO

    Reply
  • Great Article Pinal. I’m trying to get this to work where the Top Level Parents have various versions and I need to select the first one (which I can using Partition). I then need the children of just those top level parents. I correctly get those, but the second half of the Union brings in the children of all of them. Thanks. Chris.

    Reply
  • Hi, i have a data like this

    id, parent id, child id
    1,1,2
    2,1,3
    3,2,4
    4,2,5
    5,3,6
    6,3,7
    7,8,9
    8,8,10
    9,9,11
    10,9,12
    11,10,13
    12,10,14

    and I need via a query results like
    1|2
    1|2|4
    1|2|5
    1|3
    1|3|6
    1|3|7
    8|9
    8|9|11
    8|9|12
    8|10
    8|10|13
    8|10|14

    Reply
  • Thanks you Pinal, this was very helpful.

    Reply
  • Great article Pinal. I have a question. I want to add a new column called ParentId and update it with top level parent I’d (in the above example, all records except EmpId 11 would have ParentId as 11. Can you please help me to do it. (In my table there are millions of records. )

    Reply
  • How to achieve below with Common Table Expression (CTE).

    Problem –
    ParentCustomerNo ChildCustomerNo
    123456 343432
    343432 888888
    888888 999999
    999999 666666
    777777 111111
    111111 222222
    787878 898989
    898989 656565
    656565 454545

    Result –
    Child Top Parent
    222222 777777
    666666 123456
    999999 123456
    888888 123456
    454545 787878
    656565 787878
    898989 787878

    Please let me know.

    Reply
  • Is it better to use CTE with hierarchical id on the table ?
    Could you build this example with hierarchyid ?

    Reply
  • How to achieve below with CTE or other solution.

    Table with parent and child…. Wanted to retrive child and top parent.

    Table –
    ParentCustomerNo ChildCustomerNo
    123456 343432
    343432 888888
    888888 999999
    999999 666666
    777777 111111
    111111 222222
    787878 898989
    898989 656565
    656565 454545

    Result –

    Child Top_Parent
    222222 777777
    666666 123456
    999999 123456
    888888 123456
    454545 787878
    656565 787878
    898989 787878

    Reply
  • Thanks for the example. Should it work equally well when multiple records qualify as anchors – e.g. there is no CEO as such but, for example, half a dozen board members at level 0 where each board member oversees a tree structure of employees – senior managers, manager, base employee etc?

    Reply
  • Hi,
    I am facing the same scenario.
    Please let me know if you have found the resolution on the same.

    Reply
  • you are a very best!!!

    Reply
  • How can we show the data in a particular order

    Kelly Smith
    David Kenneson
    Lisa Kenson
    John
    Barabara

    Reply
  • Hi Pinal Thank you for this article. I am looking for the reverse..that is find all recursive children of a parent in the hierarchical data. So in your employee table in the example the recursive children of Sally should be Sally–>David–>Lisa,John,Barbara,Phil
    I want to get these in a table along with Employee Level .
    Sally 1
    David 4
    Lisa 3
    John 3
    Barbara 3
    Phil 3
    Thanks!

    Reply
  • Saurabh Desai
    June 7, 2017 11:11 am

    I’ve tried the exact approach as mentioned in the blog but Inner join is not working as expected. After applying inner join between recursive query and CTE it is only showing result for anchor query. Please help

    Reply
  • Subhrangsu Bhowmik
    August 4, 2017 3:08 pm

    Awesome work done

    Reply
  • Hi, Thanks for a great explanation. I realize that this is an old thread and this has been asked before, but I did not see an answer. How would I then carry down the level 1 manager to all subsequent levels of employees so that I can see whom everyone rolls up to? the assumption being that there are multiple level 1 managers and each has a chain of command.

    Reply
  • Flynthia Knox
    July 3, 2019 2:14 am

    I am using Microsoft SQL Server Management Studio 2014 to export data. Some of the company names in my database have commas in the name. I am trying to export data into a true .csv file that surrounds each column in quotes. The option within SQL Server Management Studio 2014 doesn’t work. Do you have any suggestions to get around that?

    I have also tried to create a .bat file, using the following line, but it is still not capturing the quotes around each column.

    sqlcmd -S servername -d databasename -E -i “SQL_Statement.sql” -o “OutputFile.csv” -s”,” -W

    Any help is greatly appreciated.

    Reply
  • Thank you Pinal. You did simplified CONNECT BY VALUE and START WITH clauses of Oracle into SQL SERVER version.
    I was thinking to use HierachyId::GetRoot() method. Thanks a lot. Pradyumna..

    Reply

Leave a Reply

Menu