Hash Join works with large data set. I have seen this join used many times in data warehouses applications as well as data mining algorithms. While its characteristics are similar to merge join it does not required ordered result set to join. Hash join requiresequijoin predicate to join tables. Equijoin predicate is comparing values between one table to other table using “equals to” (“=”) operator. Hash join gives best performance when two more join tables are joined and at-least one of them have no index or is not sorted. It is also expected that smaller of the either of table can be read in memory completely (though not necessary).
Hash join is two phase process. 1) Build Phase 2) Probe Phase.
1) Build Phase : It reads rows of smaller table into the memory. It hashes the keys of the rows which will be used for join.
2) Probe Phase : It reads rows of other table and hashes the keys of the rows which will be used for join. While doing this it checks for the matching rows on hashed keys in the table build in Build phase. Smaller table in memory and larger table in disk is basic rule. If smaller table does not fit in memory it spits to hard drive. DBSPACETEMP configuration parameter is used to stored hashed table in probe phase.
In summary, instead of joining on the columns separate hash table is created and it is used to join tables for improved performance.
For further detail and in depth understanding refer Craig Freeman Blog.
There are three main kind of hash joins.
1) In-memory hash join
In build phase table fit completely in memory.
2) Grace hash join
In build phase table does not fit completely in memory and spans to disk.
3) Recursive hash join
In build phase table is very large and have to use many levels of merge joins.
Hash Bailout word is used for either grace hash join or recursive has join. Hash Bailout reduces the performance for queries to run. Trace inprofiler shows Hash Warning events when hash bailout reduces the performance. Update statistic on the columns used in join will improve the performance of the queries.