[Note from Pinal]: This is a 76th episode of Notes from the Field series. Hierarchy is one of the most important concepts in SQL Server but there are not clear tutorial for it. I have often observed that this simple concept is often ignored or poorly handled due to lack of understanding.
In this episode of the Notes from the Field series database expert Kevin Hazzard explains Table Hierarchy in Microsoft SQL Server Database. Read the experience of Kevin in his own words.
When you need to manage a set of tables in Microsoft SQL Server, it’s good to know the required order of operations. The order could be hard-coded into the process but such approaches tend to fail when the database schema evolves. Instead, I prefer to use the catalog view named [sys].[foreign_keys] to discover the relationships between tables dynamically. Long ago, I wrote a function called [LoadLevels] that I’ve used in hundreds of processes to make them reusable and more resilient. The code for that function is shown in Listing 1:
-- ========================================================================== -- Description: Get the load levels by tracing foreign keys in the database. -- License: Creative Commons (Free / Public Domain) -- Rights: This work (Linchpin People LLC Database Load Levels Function, -- by W. Kevin Hazzard), identified by Linchpin People LLC, is -- free of known copyright restrictions. -- Warranties: This code comes with no implicit or explicit warranties. -- Linchpin People LLC and W. Kevin Hazzard are not responsible -- for the use of this work or its derivatives. -- ========================================================================== CREATE FUNCTION [dbo].[LoadLevels]() RETURNS @results TABLE ( [SchemaName] SYSNAME, [TableName] SYSNAME, [LoadLevel] INT ) AS BEGIN WITH [key_info] AS ( SELECT [parent_object_id] AS [from_table_id], [referenced_object_id] AS [to_table_id] FROM [sys].[foreign_keys] WHERE [parent_object_id] <> [referenced_object_id] AND [is_disabled] = 0 ), [level_info] AS ( SELECT -- anchor part [st].[object_id] AS [to_table_id], 0 AS [LoadLevel] FROM [sys].[tables] AS [st] LEFT OUTER JOIN [key_info] AS [ki] ON [st].[object_id] = [ki].[from_table_id] WHERE [ki].[from_table_id] IS NULL UNION ALL SELECT -- recursive part [ki].[from_table_id], [li].[LoadLevel] + 1 FROM [key_info] AS [ki] INNER JOIN [level_info] AS [li] ON [ki].[to_table_id] = [li].[to_table_id] ) INSERT @results SELECT OBJECT_SCHEMA_NAME([to_table_id]) AS [SchemaName], OBJECT_NAME([to_table_id]) AS [TableName], MAX([LoadLevel]) AS [LoadLevel] FROM [level_info] GROUP BY [to_table_id]; RETURN END
The [LoadLevels] function walks through the table relationships in the database to discover how they’re connected to one another. As the function moves from one relationship to the next, it records the levels where they exist in the hierarchy. A partial output of the function as executed against Microsoft’s AdventureWorks2014 sample database is shown in Figure 1.
Ordering to show the highest load levels first, notice that the most dependent table in the AdventureWorks2014 database is [Sales].[SalesOrderDetails]. Since the load levels are zero-based in the function output, that table is eight levels high in the hierarchy. In other words, if I were developing an Extract, Transform & Load (ETL) system for [Sales].[SalesOrderDetails], there are at least seven other tables that need to be loaded before it. For all 71 tables in the AdventureWorks2014 database, the function reveals some interesting facts about the load order:
- Level 0 – 25 tables, these can be loaded first
- Level 1 – 8 tables, these can be loaded after level 0
- Level 2 – 8 tables, these can be loaded after level 1
- Level 3 – 19 tables, …
- Level 4 – 7 tables
- Level 5 – 1 table
- Level 6 – 1 table
- Level 7 – 2 tables, these must be loaded last
The [LoadLevels] function uses two Common Table Expressions (CTE) to do its work. The first one is called [key_info]. It is non-recursive and gathers just the foreign keys in the database that aren’t self-referencing and aren’t disabled. The second CTE is called [level_info] and it is recursive. It starts by left joining the tables in the database to the foreign keys from the first CTE, picking out just those tables that have no dependencies. For the AdventureWorks2014 database, for example, these would be the 25 tables at level zero (0).
Then the recursion begins by joining the output from the previous iteration back to the key information. This time however, the focus is on the target of each foreign key. Whenever matches are found, the reference level is incremented by one to indicate the layer of recursion where the relationship was discovered. Finally, the results are harvested from the [level_info] CTE by grouping the table object identifiers, resolving the schema and table names, and picking off the maximum load level discovered for each entity.
The reason for grouping and selecting the maximum load level for any table becomes clear if you remove the GROUP BY clause and the MAX() operator from the code. Doing that reveals every foreign key relationship in the database. So for example, in the AdventureWorks2014 database, the [Sales].[SalesOrderDetails] table appears in 22 different relationships ranging from three levels high in the hierarchy to eight levels high, output as [LoadLevel] 7 in Figure 1. By grouping and selecting the maximum level for any table, I’m certain to avoid loading tables too early in my dynamic ETL processes.
In summary, you can use the [LoadLevels] function to identify dependencies enforced by foreign key constraints between tables in your database. This information is very useful when developing a process to copy this data to another database while preserving the referential integrity of the source data.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)