SQL SERVER – The Self Join – Inner Join and Outer Join

Self Join has always been an note-worthy case. It is interesting to ask questions on self join in a room full of developers. I often ask – if there are three kind of joins, i.e.- Inner Join, Outer Join and Cross Join; what type of join is Self Join? The usual answer is that it is an Inner Join. In fact, it can be classified under any type of join. I have previously written about this in my interview questions and answers series. I have also mentioned this subject when I explained the joins in detail over SQL SERVER – Introduction to JOINs – Basic of JOINs.

When I mention that Self Join can be the outer join, I often get a request for an example for the same. I have created example using AdventureWorks Database of Self Join earlier, but that was meant for inner join as well. Let us create a new example today, where we will see how Self Join can be implemented as an Inner Join as well as Outer Join.

Let us first create the same table for an employee. One of the columns in the same table contains the ID of manger, who is also an employee for the same company. This way, all the employees and their managers are present in the same table. If we want to find the manager of a particular employee, we need use self join.

USE TempDb
GO
-- Create a Table
CREATE TABLE Employee(
EmployeeID INT PRIMARY KEY,
Name NVARCHAR(50),
ManagerID INT
)
GO
-- Insert Sample Data
INSERT INTO Employee
SELECT 1, 'Mike', 3
UNION ALL
SELECT 2, 'David', 3
UNION ALL
SELECT 3, 'Roger', NULL
UNION ALL
SELECT 4, 'Marry',2
UNION ALL
SELECT 5, 'Joseph',2
UNION ALL
SELECT 7, 'Ben',2
GO
-- Check the data
SELECT *
FROM Employee
GO

selfjoini1 SQL SERVER   The Self Join   Inner Join and Outer Join

We will now use inner join to find the employees and their managers’ details.

-- Inner Join
SELECT e1.Name EmployeeName, e2.name AS ManagerName
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID
GO

selfjoini2 SQL SERVER   The Self Join   Inner Join and Outer Join

From the result set, we can see that all the employees who have a manager are visible. However we are unable to find out the top manager of the company as he is not visible in our resultset. The reason for the same is that due to inner join, his name is filtered out. Inner join does not bring any result which does not have manager id. Let us convert Inner Join to Outer Join and then see the resultset.

-- Outer Join
SELECT e1.Name EmployeeName, ISNULL(e2.name, 'Top Manager') AS ManagerName
FROM Employee e1
LEFT JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID
GO

selfjoini3 SQL SERVER   The Self Join   Inner Join and Outer Join

Once we convert Inner Join to Outer Join, we can see the Top Manager as well. Here we have seen how Self Join can behave as an inner join as well as an outer join.

As I said earlier, many of you know these details, but there are many who are still confused about this concept. I hope that this concept is clear from this post.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Outer Join Not Allowed in Indexed Views

I recently received an email that contains a question from one of my readers. I have already replied the answer to his email, but I would still like to bring it to your attention and ask if you think I could have done any better with the example I gave.

The question was raised when the email sender read the white paper, Improving Performance with SQL Server 2008 Indexed Views. If you scroll all the way down through the said white paper, there are several questions and answers.

Q: Why can’t I use OUTER JOIN in an Indexed view?

A: Rows can logically disappear from an Indexed view based on OUTER JOIN when you insert data into a base table. This makes the OUTER JOIN view to be increasingly updated, which is relatively difficult to implement. In addition, the performance of the implementation would be slower than for views based on standard (INNER) JOIN.

The reader was confused with my answer and wanted me to explain it further. Here is the example which I have quickly put together to demonstrate the behavior described in the above statement.
USE tempdb
GO
-- Create Two Tables
CREATE TABLE BaseTable (ID1 INT, Col1 VARCHAR(100))
CREATE TABLE JoinedTable (ID2 INT, Col2 VARCHAR(100))
GO
-- Insert Values in Tables
INSERT INTO BaseTable (ID1,Col1)
SELECT 1,'First'
UNION ALL
SELECT 2,'Second'
GO
INSERT INTO JoinedTable (ID2,Col2)
SELECT 1,'First'
UNION ALL
SELECT 2,'Second'
UNION ALL
SELECT 3,'Third'
UNION ALL
SELECT 4,'Fourth'
GO
-- Use Outer Join
SELECT jt.*
FROM BaseTable bt
RIGHT OUTER JOIN JoinedTable jt ON bt.ID1 = jt.ID2
WHERE bt.ID1 IS NULL
GO

The script above will give us the following output:

outerjoinmiss1 SQL SERVER   Outer Join Not Allowed in Indexed Views

-- Now Insert Rows in Base Table
INSERT INTO BaseTable (ID1,Col1)
SELECT 3,'Third'
GO
-- You will notice that one row less retrieved from Join
SELECT jt.*
FROM BaseTable bt
RIGHT OUTER JOIN JoinedTable jt ON bt.ID1 = jt.ID2
WHERE bt.ID1 IS NULL
GO
-- Clean up
DROP TABLE BaseTable
DROP TABLE JoinedTable
GO

outerjoinmiss2 SQL SERVER   Outer Join Not Allowed in Indexed Views

After running this script, you will notice that as the base table gains one row, the result loses one row. Going back to the white paper mentioned earlier, I believe this is expensive to manage for the same reason why it is not allowed in Indexed View.

Let me know if you have a better example to demonstrate this behavior in the Outer Join.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Default Statistics on Column – Automatic Statistics on Column

During the SQL Server Training, I frequently noticed confusion in people in terms of Statistics. Many people have no idea on how Statistics works. There are so many misconceptions with respect to Statistics. I recently had an interesting conversation with one attendee who believed that Statistics only exists on Column if there is an Index on the Column, or if we explicitly create Statistics on it.

The truth is, Statistics can be in a table even though there is no Index in it. If you have the auto- create and/or auto-update Statistics feature turned on for SQL Server database, Statistics will be automatically created on the Column based on a few conditions. Please read my previously posted article, SQL SERVER – When are Statistics Updated – What triggers Statistics to Update, for the specific conditions when Statistics is updated.

Let us see one example where we could observe how Statistics is created automatically.

/*
In this example we will see effect of unused index on updating database
We will create unused indexes on table and see the performance degradation for insert
*/
USE AdventureWorks
GO
ALTER DATABASE AdventureWorks
SET AUTO_CREATE_STATISTICS ON;
GO
-- Create Table
CREATE TABLE StatsTable (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- Insert One Hundred Thousand Records
INSERT INTO StatsTable (ID,FirstName,LastName,City)
SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM
sys.all_objects a
CROSS JOIN sys.all_objects b
GO
/* Now Check the statistics on the Table
As the table is just created there should not be any statistics on it
and will display "This object does not have any statistics or indexes."
*/
sp_helpstats 'StatsTable', 'ALL'
GO

From the example above, it is very clear that if the auto-update Statistics setting is enabled in the database, it will create the necessary Statistics based on the columns where certain conditions applied.

statdef1 SQL SERVER   Default Statistics on Column   Automatic Statistics on Column

-- Run following few queries on the table
SELECT *
FROM StatsTable
WHERE ID = 110
GO
SELECT *
FROM StatsTable
WHERE City = 'Houston'
GO
/* Now Check the statistics on the Table again
You will see two different statistics created on respective columns
used in WHERE clause.
*/
sp_helpstats 'StatsTable', 'ALL'
GO

statdef2 SQL SERVER   Default Statistics on Column   Automatic Statistics on Column

/* Now let us try with multiple Column in WHERE clause */
SELECT *
FROM StatsTable
WHERE ID = 110 AND City = 'Houston' AND FirstName = 'Bob'
GO
/* Now Check the statistics on the Table again
You will see it will create statistics for the column
used in WHERE clause; if it was not created earlier.
*/
sp_helpstats 'StatsTable', 'ALL'
GO
-- Clean up Database
DROP TABLE StatsTable
GO

statdef3 SQL SERVER   Default Statistics on Column   Automatic Statistics on Column

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Merge Operations – Insert, Update, Delete in Single Execution

This blog post is written in response to T-SQL Tuesday hosted by Jorge Segarra (aka SQLChicken).

TSQL2sDay SQL SERVER   Merge Operations   Insert, Update, Delete in Single Execution

I have been very active using these Merge operations in my development. However, I have found out from my consultancy work and friends that these amazing operations are not utilized by them most of the time. Here is my attempt to bring the necessity of using the Merge Operation to surface one more time.

MERGE is a new feature that provides an efficient way to do multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; however, at present, by using the MERGE statement, we can include the logic of such data changes in one statement that even checks when the data is matched and then just update it, and similarly, when the data is unmatched, it is inserted.

One of the most important advantages of MERGE statement is that the entire data are read and processed only once. In earlier versions, three different statements had to be written to process three different activities (INSERT, UPDATE or DELETE); however, by using MERGE statement, all the update activities can be done in one pass of database table.

I have written about these Merge Operations earlier in my blog post over here SQL SERVER – 2008 – Introduction to Merge Statement – One Statement for INSERT, UPDATE, DELETE. I was asked by one of the readers that how do we know that this operator was doing everything in single pass and was not calling this Merge Operator multiple times.

Let us run the same example which I have used earlier; I am listing the same here again for convenience.

--Let’s create Student Details and StudentTotalMarks and inserted some records.
USE tempdb
GO
CREATE TABLE StudentDetails
(
StudentID INTEGER PRIMARY KEY,
StudentName VARCHAR(15)
)
GO
INSERT INTO StudentDetails
VALUES(1,'SMITH')
INSERT INTO StudentDetails
VALUES(2,'ALLEN')
INSERT INTO StudentDetails
VALUES(3,'JONES')
INSERT INTO StudentDetails
VALUES(4,'MARTIN')
INSERT INTO StudentDetails
VALUES(5,'JAMES')
GO
CREATE TABLE StudentTotalMarks
(
StudentID INTEGER REFERENCES StudentDetails,
StudentMarks INTEGER
)
GO
INSERT INTO StudentTotalMarks
VALUES(1,230)
INSERT INTO StudentTotalMarks
VALUES(2,255)
INSERT INTO StudentTotalMarks
VALUES(3,200)
GO
-- Select from Table
SELECT *
FROM StudentDetails
GO
SELECT *
FROM StudentTotalMarks
GO
-- Merge Statement
MERGE StudentTotalMarks AS stm
USING
(SELECT StudentID,StudentName FROM StudentDetails) AS sd
ON stm.StudentID = sd.StudentID
WHEN MATCHED AND stm.StudentMarks > 250 THEN DELETE
WHEN
MATCHED THEN UPDATE SET stm.StudentMarks = stm.StudentMarks + 25
WHEN NOT MATCHED THEN
INSERT
(StudentID,StudentMarks)
VALUES(sd.StudentID,25);
GO
-- Select from Table
SELECT *
FROM StudentDetails
GO
SELECT *
FROM StudentTotalMarks
GO
-- Clean up
DROP TABLE StudentDetails
GO
DROP TABLE StudentTotalMarks
GO

The Merge Join performs very well and the following result is obtained.

MergeStatement SQL SERVER   Merge Operations   Insert, Update, Delete in Single Execution

Let us check the execution plan for the merge operator. You can click on following image to enlarge it.

merge1 SQL SERVER   Merge Operations   Insert, Update, Delete in Single Execution

Let us evaluate the execution plan for the Table Merge Operator only.

merge2 SQL SERVER   Merge Operations   Insert, Update, Delete in Single Execution

We can clearly see that the Number of Executions property suggests value 1. Which is quite clear that in a single PASS, the Merge Operation completes the operations of Insert, Update and Delete.

I strongly suggest you all to use this operation, if possible, in your development. I have seen this operation implemented in many data warehousing applications.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Subquery or Join – Various Options – SQL Server Engine Knows the Best – Part 2

This blog post is part 2 of the earlier written article SQL SERVER – Subquery or Join – Various Options – SQL Server Engine knows the Best by Paulo R. Pereira.

Paulo has left excellent comment to earlier article once again proving the point that SQL Server Engine is smart enough to figure out the best plan itself and uses the same for the query. Let us go over his comment as he has posted.

“I think IN or EXISTS is the best choice, because there is a little difference between ‘Merge Join’ of query with JOIN (Inner Join) and the others options (Left Semi Join), and JOIN can give more results than IN or EXISTS if the relationship is 1:0..N and not 1:0..1.

And if I try use NOT IN and NOT EXISTS the query plan is different from LEFT JOIN too (Left Anti Semi Join vs. Left Outer Join + Filter)”

USE AdventureWorks
GO
-- use of SOME
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = SOME (
SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
UNION ALL
SELECT EA.EmployeeID
FROM HumanResources.EmployeeDepartmentHistory EA
)
-- use of IN
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID IN (
SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
UNION ALL
SELECT EA.EmployeeID
FROM HumanResources.EmployeeDepartmentHistory EA
)
-- use of EXISTS
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS (
SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
UNION ALL
SELECT EA.EmployeeID
FROM HumanResources.EmployeeDepartmentHistory EA
)

When looked into execution plan of the queries listed above indeed we do get different plans for queries and SQL Server Engines creates the best (least cost) plan for each query. Click on image to see larger images.

inexists2 SQL SERVER   Subquery or Join   Various Options   SQL Server Engine Knows the Best   Part 2

Thanks Paulo for your wonderful contribution.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Subquery or Join – Various Options – SQL Server Engine knows the Best

This is followup post of my earlier article SQL SERVER – Convert IN to EXISTS – Performance Talk, after reading all the comments I have received I felt that I could write more on the same subject to clear few things out.

First let us run following four queries, all of them are giving exactly same resultset.

USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of in
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- Use of Join
SELECT *
FROM HumanResources.Employee E
INNER JOIN HumanResources.EmployeeAddress EA ON E.EmployeeID = EA.EmployeeID
GO

Let us compare the execution plan of the queries listed above. Click on image to see larger image.

inexists1 SQL SERVER   Subquery or Join   Various Options   SQL Server Engine knows the Best

It is quite clear from the execution plan that in case of IN, EXISTS and JOIN SQL Server Engines is smart enough to figure out what is the best optimal plan of Merge Join for the same query and execute the same. However, in the case of use of Equal (=) Operator, SQL Server is forced to use Nested Loop and test each result of the inner query and compare to outer query, leading to cut the performance. Please note that here I no mean suggesting that Nested Loop is bad or Merge Join is better. This can very well vary on your machine and amount of resources available on your computer.

When I see Equal (=) operator used in query like above, I usually recommend to see if user can use IN or EXISTS or JOIN. As I said, this can very much vary on different system. What is your take in above query? I believe SQL Server Engines is usually pretty smart to figure out what is ideal execution plan and use it.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – 2008 Star Join Query Optimization

Business Intelligence (BI) plays a significant role in businesses nowadays. Moreover, the databases that deal with the queries related to BI are presently facing an increase in workload. At present, when queries are sent to very large databases, millions of rows are returned. Also the users have to go through extended query response times when joining multiple tables are involved with such queries. ‘Star Join Query Optimization’ is a new feature of SQL Server 2008 Enterprise Edition. This mechanism uses bitmap filtering for improving the performance of some types of queries by the effective retrieval of rows from fact tables.

Improved Query Response Times
In general, data warehouses employ dimensionally modeled star or snowflake schemas. These schemas have one or more than one fact tables that contain transactional data and many dimension tables, which holds information such as product data, customer information, and times and dates – all these define the fact table data. Usually, foreign keys are employed for maintaining relationships between the rows in fact tables and also between the rows in the dimension tables. Databases that contain star schemas are recognized by SQL Server 2008 Enterprise. It uses the new Star Join Query logic for processing queries against such star schemas more efficiently. Typically, on an average, data warehouse queries run faster to approximately 20 percent.

Automatically Implemented
Star Join Query Optimization is automatically implemented by the SQL Server. It does not require a special database or application configuration. The query processor will usually optimize queries with medium selectivity (this refers to the queries that retrieve approximately 10% to 75% of rows from a fact table). Such queries are usually handled using hash joins to join the dimension and fact tables by employing the foreign keys to identify the matching rows. A hash table is built for each dimension table referenced in the query in the case of hash joins; the optimization process uses these hash tables for deriving bitmap filters. The key values from each dimension table are identified by bitmap filters; these key values qualify for inclusion in the query. When the fact table is scanned, the bitmap filters are applied to it. These bitmap filters eliminate those rows of the fact table which are not qualified for inclusion in the result set. The most selective bitmap filter is applied first as it is found to eliminate the highest number of rows. Since the eliminated rows do not need further processing, the subsequent filters need not be applied to them – this way the process becomes more efficient.

Query Selectivity
The performance is enhanced in the case of medium selectivity queries while using bitmap filtering because the rows are filtered before any joins are implemented. Hence, there is a decrease in the number of rows that are processed by each join. Bitmap filtering is not applied when queries are highly selective (i.e., those queries that return less than 10% of the rows in a fact table). In such case, a nested loop join is found to be generally more efficient. Similarly, when the queries are not very selective at all (queries which return more than 75% of the rows in a fact table), bitmap filtering is not applied as there are very few rows to be filtered, and hence, there is no requirement of enhancement in performance in this case.

Integer Data Types
Star join optimization is found to give the highest efficiency when the data type of the columns used in the joins is integer. This feature enables the bitmap filter to be applied as part of the initial table or index scan rather than being used at a later stage in the query plan. Most of the queries are benefited from star join optimization since foreign key relationships are commonly implemented using integer-based alternate key columns.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Understanding Table Hints with Examples

Introduction

Today we have a very interesting subject to look at. I tried to look for help online but have not found any other documentation besides what we have from the Book Online.

Let us try to understand what are the different kinds of hints available in SQL Server and how they are helpful.

What is a Hint?

Hints are options and strong suggestions specified for enforcement by the SQL Server query processor on DML statements. The hints override any execution plan the query optimizer might select for a query.

Before we continue to explore this subject, we need to consider one very important fact and say some words of caution. SQL Server Query optimizer is a very smart tool and it makes a best selection of execution plan. Suggesting hints to the Query Optimizer should be attempted when absolutely necessary and by experienced developers who know exactly what they are doing (or in development as a way to experiment and learn).

There are three different kinds of hints. Let us understand the basics of each of them separately.

Join Hint

This hint is used when more than one table is used in a query. Two or more tables can be joined using different kinds of joins. This hint forces the type of join algorithm that is used. Joins can be used in SELECT, UPDATE and DELETE statements.

Query Hint

This hint is used when certain kind of logic has to be applied to a whole query. Any hint used in the query is applied to the complete query, as opposed to part of it. There is no way to specify that only a certain part of a query should be used with the hint. After any query, the OPTION clause is specified to apply the logic to this query. A query always has any of the following statements: SELECT, UPDATE, DELETE, INSERT or MERGE (SQL 2K8); and this hint can be applied to all of them.

Table Hint

This hint is used when certain kind of locking mechanism of tables has to be controlled. SQL Server query optimizer always puts the appropriate kind of lock on tables, when any of the Transact SQL operations SELECT, UPDATE, DELETE, INSERT or MERGE are used. There are certain cases when the developer knows when and where to override the default behavior of the locking algorithm and these hints are useful in those scenarios.

Let us run the following simple query with different kinds of query hints and observe the actual execution plan. The analysis of execution plan is not part of this article and will be covered in future.

USE AdventureWorks
GO
/* No Query Hint */
SELECT *
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE Weight = 20.77
GO
/* Merge Join Query Hint */
SELECT *
FROM Production.Product AS p
INNER MERGE JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE Weight = 20.77
GO
/* Hash Join Query Hint */
SELECT *
FROM Production.Product AS p
INNER HASH JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE Weight = 20.77
GO
/* Loop Join Query Hint */
SELECT *
FROM Production.Product AS p
INNER LOOP JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE Weight = 20.77
GO
/* Remote Join Query Hint */
SELECT *
FROM Production.Product AS p
INNER REMOTE JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE Weight = 20.77
GO

The above query will produce the following query execution plan.

sql join SQL SERVER   Understanding Table Hints with Examples

In this article we will focus mainly on Join Hints. We will discuss other kinds of hints and their usage in a different article. There are a total of four kinds of join hints available.

Loop Join

This join is also commonly known as nested iteration. This kind of join is composed by an outer loop and an inner loop. When the query runs for each row of the outer loop, the inner loop is executed completely. This join is effective only when the outer loop query is small and the inner loop query has all the proper optimizations applied. This join method is very useful with small transactions.

Merge Join

This join has the unique requirement for tables involved in the operation to be sorted. This join keeps both of the tables sorted in parallel and compares each table row by row simultaneously with each other. It compares one row of the first table with one row of the second table. If they are equal, that row qualifies; otherwise, this join determines which row of each table has the lower value. Once the lowest value of the table is figured out, it moves on to next row of that table and compares that to the original row. This operation keeps going on until all rows from each table are completely examined. This operation can be very expensive when tables are not sorted and it’s required to sort them before they are joined. If tables have non clustered indexes over them and joins are using the same conditions, there are pretty good chances that this join performs better than other kinds of joins.

Hash Join

This is the most complex of all the other joins. There are two major components of this kind of join – build query and probe query. First, a smaller table is assigned as build query and a hash table for the same is created. This hash table is compared with the probe table. This comparison of input table and probe table is done one row at a time. One row of the probe table is hashed and compared against the other row, and qualifying rows are checked.

There are three different kinds of hash joins:  in-memory hash join, grace hash join, and recursive hash join. Let us look in more detail at these three kinds of joins.

In-Memory Hash Join

This kind of hash join is used when tables are very small and a complete table can be hashed and loaded in memory.

Grace Hash Join

This kind of hash join is used when tables are comparatively large and don’t fit in memory. Tables will be partitioned at input levels and processed in steps.

Recursive Hash Join

This kind of join is used for complex tables and for tables which are very large and require multilevel of table partitions.

SQL Server query optimizer is a smart tool and it knows when to use the right kind of join. When it comes to hash join, the query optimizer starts conservatively with in-memory hash join. If join is larger than in-memory it moves up to Grace hash join or Recursive hash join. Sometimes the optimizer makes a mistake in identifying a smaller table and it reverses the role of the build and probe table: this is called role reversal.

Remote Join

This is least-used tabled join ever. There is no example for it given in Book On Line. Every Join has two tables associated with it: Left Table and Right Table. The join usually happens on the Left Table. When Remote close is used, joins are performed on the site of Right Join. This join can only be performed on INNER JOIN as in case of OUTER join there may be NULL values on the right table which makes concept of joining on right table logically incorrect.

Summary

We have closely observed different kinds of Join Hints in this article. We will cover the remaining concepts in another article.

Reference: Pinal Dave (http://blog.SQLAuthority.com), DNS

SQL SERVER – Interesting Observation – Query Hint – FORCE ORDER

SQL Server never stops to amaze me. As regular readers of this blog already know that besides conducting corporate training, I work on large-scale projects on query optimizations and server tuning projects. In one of the recent projects, I have noticed that a Junior Database Developer used the query hint Force Order; when I asked for details, I found out that the basic concept was not properly understood by him.

Today, let us try to understand its working and the effect of this hint. Further, we will see the extent of difference in performance created by this one query hint. I also have one interesting question for all of you as well; I will give the answer in one of my later posts.

USE AdventureWorks
GO
SELECT COUNT(*) CountEmployee -- 290 Rows
FROM HumanResources.Employee
GO
SELECT COUNT(*) CountEmployeeAddress -- 290 Rows
FROM HumanResources.EmployeeAddress
GO
SELECT COUNT(*) CountEmployeeDepartmentHistory -- 296 Rows
FROM HumanResources.EmployeeDepartmentHistory
GO
SELECT *
FROM HumanResources.Employee e
INNER JOIN HumanResources.EmployeeAddress ea ON ea.EmployeeID = e.EmployeeID
INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON edh.EmployeeID = e.EmployeeID
GO
SELECT *
FROM HumanResources.Employee e
INNER JOIN HumanResources.EmployeeAddress ea ON ea.EmployeeID = e.EmployeeID
INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON edh.EmployeeID = e.EmployeeID
OPTION (FORCE ORDER)
GO

In above example, I have used three tables and their row count is listed as well. Employee and EmployeeAddress – both have same number rows, and EmployeeDepartmentHistory table has around 6 rows more than the other two tables. Now let us run the query without using OPTION (FORCE ORDER) and run it along with the query hint and check the execution plan. You will find a difference in the query cost.

We all accept that the table with least number of rows should be listed as the base table, and the same is done here. We have two such tables with the least number of rows, which are listed as base tables. Now before we further explain this, let us see the execution plan for the same.

forceorder SQL SERVER   Interesting Observation   Query Hint   FORCE ORDER

It is very clear from the above execution plan that when I order is forced the query cost goes high. This proves that the SQL Server has already made a good decision with regard to the optimized query plan. When plan is forced in the case of the joins more than 2 table the performance matters. Let us see the execution order of the table in both the cases.

forceorder1 SQL SERVER   Interesting Observation   Query Hint   FORCE ORDER

Now, from the example, it is very clear when you force order the query, it evaluates the table Employee first and when it is not forced, it evaluates EmployeeAddress first. Even though both the tables have same number of rows, the query optimizer processes them differently and uses different types of join logic. When the order is not forced, it uses hash join; further, in case of forced order, it uses nested loop – this creates a significant difference in the query cost.

The conclusion of this whole exercise is very simple.

  • SQL Server Query Execution Engine is pretty smart to decide the best execution plan with least query cost for any query.
  • Order of the tables in any query can make a significant impact on the query.

Now the question for you: We have seen that using query hint of OPTION (FORCE ORDER) reduces the performance; give an example wherein we can use this hint to improve the performance?

Please leave your comment here. I will publish the answer to this question with due credit and with my own example in a later post.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Outer Join in Indexed View – Question to Readers

Today I have question for you. Just a day ago I was reading whitepaper Improving Performance with SQL Server 2008 Indexed Views. Following is question and answer I read in the white paper.

Q. Why can’t I use OUTER JOIN in an indexed view?

A. Rows can logically disappear from an indexed view based on OUTER JOIN when you insert data into a base table. This makes incrementally updating OUTER JOIN views relatively complex to implement, and the performance of the implementation would be slower than for views based on standard (INNER) JOIN.

Here I would like to ask you one question, do you have example for kind of OUTER JOIN where you insert data into base table, it will make a row disappear from query?

Please post your answer as comment.

Reference: Pinal Dave (http://blog.SQLAuthority.com)