[Note from Pinal]: This is a 76th episode of Notes from the Field series. Hierarchy is one of the most important concepts in SQL Server but there are not clear tutorial for it. I have often observed that this simple concept is often ignored or poorly handled due to lack of understanding.
In this episode of the Notes from the Field series database expert Kevin Hazzard explains Table Hierarchy in Microsoft SQL Server Database. Read the experience of Kevin in his own words.
When you need to manage a set of tables in Microsoft SQL Server, it’s good to know the required order of operations. The order could be hard-coded into the process but such approaches tend to fail when the database schema evolves. Instead, I prefer to use the catalog view named [sys].[foreign_keys] to discover the relationships between tables dynamically. Long ago, I wrote a function called [LoadLevels] that I’ve used in hundreds of processes to make them reusable and more resilient. The code for that function is shown in Listing 1:
-- ========================================================================== -- Description: Get the load levels by tracing foreign keys in the database. -- License: Creative Commons (Free / Public Domain) -- Rights: This work (Linchpin People LLC Database Load Levels Function, -- by W. Kevin Hazzard), identified by Linchpin People LLC, is -- free of known copyright restrictions. -- Warranties: This code comes with no implicit or explicit warranties. -- Linchpin People LLC and W. Kevin Hazzard are not responsible -- for the use of this work or its derivatives. -- ========================================================================== CREATE FUNCTION [dbo].[LoadLevels]() RETURNS @results TABLE ( [SchemaName] SYSNAME, [TableName] SYSNAME, [LoadLevel] INT ) AS BEGIN WITH [key_info] AS ( SELECT [parent_object_id] AS [from_table_id], [referenced_object_id] AS [to_table_id] FROM [sys].[foreign_keys] WHERE [parent_object_id] <> [referenced_object_id] AND [is_disabled] = 0 ), [level_info] AS ( SELECT -- anchor part [st].[object_id] AS [to_table_id], 0 AS [LoadLevel] FROM [sys].[tables] AS [st] LEFT OUTER JOIN [key_info] AS [ki] ON [st].[object_id] = [ki].[from_table_id] WHERE [ki].[from_table_id] IS NULL UNION ALL SELECT -- recursive part [ki].[from_table_id], [li].[LoadLevel] + 1 FROM [key_info] AS [ki] INNER JOIN [level_info] AS [li] ON [ki].[to_table_id] = [li].[to_table_id] ) INSERT @results SELECT OBJECT_SCHEMA_NAME([to_table_id]) AS [SchemaName], OBJECT_NAME([to_table_id]) AS [TableName], MAX([LoadLevel]) AS [LoadLevel] FROM [level_info] GROUP BY [to_table_id]; RETURN END
The [LoadLevels] function walks through the table relationships in the database to discover how they’re connected to one another. As the function moves from one relationship to the next, it records the levels where they exist in the hierarchy. A partial output of the function as executed against Microsoft’s AdventureWorks2014 sample database is shown in Figure 1.
Ordering to show the highest load levels first, notice that the most dependent table in the AdventureWorks2014 database is [Sales].[SalesOrderDetails]. Since the load levels are zero-based in the function output, that table is eight levels high in the hierarchy. In other words, if I were developing an Extract, Transform & Load (ETL) system for [Sales].[SalesOrderDetails], there are at least seven other tables that need to be loaded before it. For all 71 tables in the AdventureWorks2014 database, the function reveals some interesting facts about the load order:
- Level 0 – 25 tables, these can be loaded first
- Level 1 – 8 tables, these can be loaded after level 0
- Level 2 – 8 tables, these can be loaded after level 1
- Level 3 – 19 tables, …
- Level 4 – 7 tables
- Level 5 – 1 table
- Level 6 – 1 table
- Level 7 – 2 tables, these must be loaded last
The [LoadLevels] function uses two Common Table Expressions (CTE) to do its work. The first one is called [key_info]. It is non-recursive and gathers just the foreign keys in the database that aren’t self-referencing and aren’t disabled. The second CTE is called [level_info] and it is recursive. It starts by left joining the tables in the database to the foreign keys from the first CTE, picking out just those tables that have no dependencies. For the AdventureWorks2014 database, for example, these would be the 25 tables at level zero (0).
Then the recursion begins by joining the output from the previous iteration back to the key information. This time however, the focus is on the target of each foreign key. Whenever matches are found, the reference level is incremented by one to indicate the layer of recursion where the relationship was discovered. Finally, the results are harvested from the [level_info] CTE by grouping the table object identifiers, resolving the schema and table names, and picking off the maximum load level discovered for each entity.
The reason for grouping and selecting the maximum load level for any table becomes clear if you remove the GROUP BY clause and the MAX() operator from the code. Doing that reveals every foreign key relationship in the database. So for example, in the AdventureWorks2014 database, the [Sales].[SalesOrderDetails] table appears in 22 different relationships ranging from three levels high in the hierarchy to eight levels high, output as [LoadLevel] 7 in Figure 1. By grouping and selecting the maximum level for any table, I’m certain to avoid loading tables too early in my dynamic ETL processes.
In summary, you can use the [LoadLevels] function to identify dependencies enforced by foreign key constraints between tables in your database. This information is very useful when developing a process to copy this data to another database while preserving the referential integrity of the source data.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)
7 Comments. Leave new
For my database, this returns error message
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
do you have FK going 100 level deep?
Hi Pinal,
Nope. As correctly guessed by Kevin, we have circular foreign key references.
-Thanks
Darshan, the chances are good that you have two or more tables that have FKs to one another, perhaps through an intermediate table which the script cannot detect. Notice that self-referencing tables (another typically suspicious design pattern) are detected. However, the script I provided does not solve for other cases that are infinitely recursive.
Hi Kevin,
Yes we have that type of circular reference keys.
What could be possible solutions in this case? can you refer some articles ?
Kevin, this script is great! We are often trying to move a “slice” of our production data down to Staging or Dev and it is always painful. I am hoping to use this in reverse to clear out existing data and then in order to fill the table appropriately. Very, very well done.
This saved MY JOB!!! I AM NOT FIRED!!!