SQL SERVER – The Self Join – Inner Join and Outer Join

Self Join has always been an note-worthy case. It is interesting to ask questions on self join in a room full of developers. I often ask – if there are three kind of joins, i.e.- Inner Join, Outer Join and Cross Join; what type of join is Self Join? The usual answer is that it is an Inner Join. In fact, it can be classified under any type of join. I have previously written about this in my interview questions and answers series. I have also mentioned this subject when I explained the joins in detail over SQL SERVER – Introduction to JOINs – Basic of JOINs.

When I mention that Self Join can be the outer join, I often get a request for an example for the same. I have created example using AdventureWorks Database of Self Join earlier, but that was meant for inner join as well. Let us create a new example today, where we will see how Self Join can be implemented as an Inner Join as well as Outer Join.

Let us first create the same table for an employee. One of the columns in the same table contains the ID of manger, who is also an employee for the same company. This way, all the employees and their managers are present in the same table. If we want to find the manager of a particular employee, we need use self join.

USE TempDb
GO
-- Create a Table
CREATE TABLE Employee(
EmployeeID INT PRIMARY KEY,
Name NVARCHAR(50),
ManagerID INT
)
GO
-- Insert Sample Data
INSERT INTO Employee
SELECT 1, 'Mike', 3
UNION ALL
SELECT 2, 'David', 3
UNION ALL
SELECT 3, 'Roger', NULL
UNION ALL
SELECT 4, 'Marry',2
UNION ALL
SELECT 5, 'Joseph',2
UNION ALL
SELECT 7, 'Ben',2
GO
-- Check the data
SELECT *
FROM Employee
GO

SQL SERVER - The Self Join - Inner Join and Outer Join  selfjoini1

We will now use inner join to find the employees and their managers’ details.

-- Inner Join
SELECT e1.Name EmployeeName, e2.name AS ManagerName
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID
GO

SQL SERVER - The Self Join - Inner Join and Outer Join  selfjoini2

From the result set, we can see that all the employees who have a manager are visible. However we are unable to find out the top manager of the company as he is not visible in our resultset. The reason for the same is that due to inner join, his name is filtered out. Inner join does not bring any result which does not have manager id. Let us convert Inner Join to Outer Join and then see the resultset.

-- Outer Join
SELECT e1.Name EmployeeName, ISNULL(e2.name, 'Top Manager') AS ManagerName
FROM Employee e1
LEFT JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID
GO

SQL SERVER - The Self Join - Inner Join and Outer Join  selfjoini3

Once we convert Inner Join to Outer Join, we can see the Top Manager as well. Here we have seen how Self Join can behave as an inner join as well as an outer join.

As I said earlier, many of you know these details, but there are many who are still confused about this concept. I hope that this concept is clear from this post.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Joins, SQL Scripts
Previous Post
SQL SERVER – Upper Case Shortcut SQL Server Management Studio
Next Post
SQLAuthority News – I am a MVP and I Love SQL Server

Related Posts

50 Comments. Leave new

  • Gary Melhaff
    May 14, 2015 3:31 am

    Dear Pinal,

    btw…You are the MAN of SqlServer help – I have received benefit from your posts many many times! Thanks you!

    I have a little more difficult self-join problem I’d like to see if you could solve.

    Below reflects a real situation for a source table in an application that I need to use for correcting data downstream to find the currently assigned id. It is a table that logs party merges that is a simple “from” party id column and a “to” party id column.

    Where its different than the typical employee/manager situation is that I need to be able to find the top last id in the chain of merges so that I can reference the correct party id when tying them to transactions. The source application has historically sometimes not merge correctly leaving child transactions still tied to the original party id. Therefore the data warehouse has to correct these as it assigns the dimensionality to the transactions coming in. The current process uses a TSql function that takes in an id and looks up the top parent for the returned value. But that’s causing performance problems so I need to re-think the approach to make it set-based so that it can be derived either in the query through join to a CTE or maintained as a separate lookup table for the join. Bottom line is I need turn a list of all single level from/to party ids into a new list of the every party_id involved in a merge along with its topmost parent party id.

    — Below builds sample party merge table and displays with typical
    — recursive outer join

    USE TempDb
    GO
    — Create a Table
    CREATE TABLE Party_merge(
    FromPartyID INT PRIMARY KEY,
    ToPartyID INT
    )
    GO
    — Insert Sample Data
    — 3 gets gets 4 children (1, 2, 8, 9) assigned to merge to it and then itself gets merged to id=10
    — 2 is the new parent for 4, 5, and 7
    — 11 gets 12 assigned. Only parent 12 has only one child (11)

    — Note:
    — 14 gets parent assigned as 14 (due to source system issue)

    INSERT INTO Party_merge
    SELECT 1, 3
    UNION ALL
    SELECT 2, 3
    UNION ALL
    SELECT 14, 14
    UNION ALL
    SELECT 4, 2
    UNION ALL
    SELECT 5, 2
    UNION ALL
    SELECT 7, 2
    UNION ALL
    SELECT 8, 3
    UNION ALL
    SELECT 9, 3
    UNION ALL
    SELECT 3, 10
    UNION ALL
    SELECT 11, 12
    GO
    — Check the data

    SELECT *
    FROM Party_merge
    ORDER BY FromPartyID, ToPartyID
    GO

    —————————————————–
    SELECT p1.FromPartyID, p1.ToPartyId
    FROM Party_merge p1
    LEFT OUTER JOIN Party_merge p2
    ON p1.ToPartyID = p2.FromPartyID
    WHERE p1.FromPartyID p1.ToPartyID
    —————————————————

    Reply
    • What is the expected output?

      Reply
      • Gary Melhaff
        May 15, 2015 12:53 pm

        Pinal, the outcome should be the root child aligned with the top parent. The objective is to align any child anywhere in ancestry with their current top parent – so it can be used as a lookup without a call to a row – based lookup since that can’t scale in a query. Does that make sense? I need a cte that can be joined to provide the current top parent.

      • Gary Melhaff
        May 18, 2015 11:20 am

        Sorry, the expected output is…
        1,10
        2,10
        3,10
        4,10
        5,10
        8,10
        9,10
        11,12

      • Will try if I have time. Mean time you may want to try MSDN forums as there are some real T-SQL experts there.

Leave a Reply