SQL SERVER – Explanation SQL SERVER Hash Join

Hash Join works with large data set. I have seen this join used many times in data warehouses applications as well as data mining algorithms. While its characteristics are similar to merge join it does not required ordered result set to join. Hash join requiresequijoin predicate to join tables. Equijoin predicate is comparing values between one table to other table using “equals to” (“=”) operator. Hash join gives best performance when two more join tables are joined and at-least one of them have no index or is not sorted. It is also expected that smaller of the either of table can be read in memory completely (though not necessary).

Hash join is two phase process. 1) Build Phase 2) Probe Phase.

1) Build Phase :
It reads rows of smaller table into the memory. It hashes the keys of the rows which will be used for join.

2) Probe Phase : It reads rows of other table and hashes the keys of the rows which will be used for join. While doing this it checks for the matching rows on hashed keys in the table build in Build phase. Smaller table in memory and larger table in disk is basic rule. If smaller table does not fit in memory it spits to hard drive. DBSPACETEMP configuration parameter is used to stored hashed table in probe phase.

In summary, instead of joining on the columns separate hash table is created and it is used to join tables for improved performance.

For further detail and in depth understanding refer Craig Freeman Blog.

There are three main kind of hash joins.
1) In-memory hash join
In build phase table fit completely in memory.

2) Grace hash join
In build phase table does not fit completely in memory and spans to disk.

3) Recursive hash join
In build phase table is very large and have to use many levels of merge joins.

Hash Bailout word is used for either grace hash join or recursive has join. Hash Bailout reduces the performance for queries to run. Trace inprofiler shows Hash Warning events when hash bailout reduces the performance. Update statistic on the columns used in join will improve the performance of the queries.

Reference : Pinal Dave (https://blog.sqlauthority.com) , BOL

Previous Post
SQL SERVER – Fix : Error 8629 The query processor could not produce a query plan from the optimizer because a query cannot update a text, ntext, or image column and a clustering key at the same time.
Next Post
SQL SERVER – Easy Sequence of SELECT FROM JOIN WHERE GROUP BY HAVING ORDER BY

Related Posts

4 Comments. Leave new

  • Thanks for your article, it really helped me

    Reply
  • Hello Dave,

    Hi all,

    I ran a query and it took almost 21 seconds. But I used merge join
    and end up 6 seconds.

    Select *
    table a
    inner hash join table b on a.id = b.id

    By using this merge or hash join cause any overhead work load? I am very new to SQL Server and I am using 2000, 2005 server.

    Thank you very much
    Dave

    Reply
  • Hello All,

    Please help me with below

    How to determine which hash join is used in execution plan.
    In execution plan, I could see nothing specifying type of hash join used in optimisation

    Reply

Leave a Reply

Menu