SQL SERVER – Explanation SQL SERVER Hash Join

Hash Join works with large data set. I have seen this join used many times in data warehouses applications as well as data mining algorithms. While its characteristics are similar to merge join it does not required ordered result set to join. Hash join requiresequijoin predicate to join tables. Equijoin predicate is comparing values between one table to other table using “equals to” (“=”) operator. Hash join gives best performance when two more join tables are joined and at-least one of them have no index or is not sorted. It is also expected that smaller of the either of table can be read in memory completely (though not necessary).

Hash join is two phase process. 1) Build Phase 2) Probe Phase.

1) Build Phase :
It reads rows of smaller table into the memory. It hashes the keys of the rows which will be used for join.

2) Probe Phase : It reads rows of other table and hashes the keys of the rows which will be used for join. While doing this it checks for the matching rows on hashed keys in the table build in Build phase. Smaller table in memory and larger table in disk is basic rule. If smaller table does not fit in memory it spits to hard drive. DBSPACETEMP configuration parameter is used to stored hashed table in probe phase.

In summary, instead of joining on the columns separate hash table is created and it is used to join tables for improved performance.

For further detail and in depth understanding refer Craig Freeman Blog.

There are three main kind of hash joins.
1) In-memory hash join
In build phase table fit completely in memory.

2) Grace hash join
In build phase table does not fit completely in memory and spans to disk.

3) Recursive hash join
In build phase table is very large and have to use many levels of merge joins.

Hash Bailout word is used for either grace hash join or recursive has join. Hash Bailout reduces the performance for queries to run. Trace inprofiler shows Hash Warning events when hash bailout reduces the performance. Update statistic on the columns used in join will improve the performance of the queries.

Reference : Pinal Dave (http://blog.SQLAuthority.com) , BOL

4 thoughts on “SQL SERVER – Explanation SQL SERVER Hash Join

  1. Hello Dave,

    Hi all,

    I ran a query and it took almost 21 seconds. But I used merge join
    and end up 6 seconds.

    Select *
    table a
    inner hash join table b on a.id = b.id

    By using this merge or hash join cause any overhead work load? I am very new to SQL Server and I am using 2000, 2005 server.

    Thank you very much
    Dave

    Like

  2. Pingback: SQL SERVER – Weekly Series – Memory Lane – #033 | Journey to SQL Authority with Pinal Dave

  3. Hello All,

    Please help me with below

    How to determine which hash join is used in execution plan.
    In execution plan, I could see nothing specifying type of hash join used in optimisation

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s