SQL SERVER – Explanation SQL SERVER Hash Join

June 14, 2007

SQL, SQL Performance, SQL Server, SQL Tips and Tricks

Hash Join works with large data set. I have seen this join used many times in data warehouses applications as well as data mining algorithms. While its characteristics are similar to merge join it does not required ordered result set to join. Hash join requiresequijoin predicate to join tables. Equijoin predicate is comparing values between one table to other table using “equals to” (“=”) operator. Hash join gives best performance when two more join tables are joined and at-least one of them have no index or is not sorted. It is also expected that smaller of the either of table can be read in memory completely (though not necessary).

Hash join is two phase process. 1) Build Phase 2) Probe Phase.

1) Build Phase : It reads rows of smaller table into the memory. It hashes the keys of the rows which will be used for join.

2) Probe Phase : It reads rows of other table and hashes the keys of the rows which will be used for join. While doing this it checks for the matching rows on hashed keys in the table build in Build phase. Smaller table in memory and larger table in disk is basic rule. If smaller table does not fit in memory it spits to hard drive. DBSPACETEMP configuration parameter is used to stored hashed table in probe phase.

In summary, instead of joining on the columns separate hash table is created and it is used to join tables for improved performance.

For further detail and in depth understanding refer Craig Freeman Blog.

There are three main kind of hash joins.
1) In-memory hash join
In build phase table fit completely in memory.

2) Grace hash join
In build phase table does not fit completely in memory and spans to disk.

3) Recursive hash join
In build phase table is very large and have to use many levels of merge joins.

Hash Bailout word is used for either grace hash join or recursive has join. Hash Bailout reduces the performance for queries to run. Trace inprofiler shows Hash Warning events when hash bailout reduces the performance. Update statistic on the columns used in join will improve the performance of the queries.

Reference : Pinal Dave (https://blog.sqlauthority.com) , BOL

SQL Joins

SQL SERVER – Fix : Error 8629 The query processor could not produce a query plan from the optimizer because a query cannot update a text, ntext, or image column and a clustering key at the same time.

SQL SERVER – Easy Sequence of SELECT FROM JOIN WHERE GROUP BY HAVING ORDER BY

4 Comments. Leave new

Mahua
October 24, 2008 11:54 pm
Thanks for your article, it really helped me
Reply
David
September 10, 2011 3:16 am
Hello Dave,
Hi all,
I ran a query and it took almost 21 seconds. But I used merge join
and end up 6 seconds.
Select *
table a
inner hash join table b on a.id = b.id
By using this merge or hash join cause any overhead work load? I am very new to SQL Server and I am using 2000, 2005 server.
Thank you very much
Dave
Reply
Arathy
July 19, 2013 6:31 pm
Hello All,
Please help me with below
How to determine which hash join is used in execution plan.
In execution plan, I could see nothing specifying type of hash join used in optimisation
Reply