SQL Server never stops to amaze me. As regular readers of this blog already know that besides conducting corporate training, I work on large-scale projects on query optimizations and server tuning projects. In one of the recent projects, I have noticed that a Junior Database Developer used the query hint Force Order; when I asked for details, I found out that the basic concept was not properly understood by him.
Today, let us try to understand its working and the effect of this hint. Further, we will see the extent of difference in performance created by this one query hint. I also have one interesting question for all of you as well; I will give the answer in one of my later posts.
USE AdventureWorks GO SELECT COUNT(*) CountEmployee -- 290 Rows FROM HumanResources.Employee GO SELECT COUNT(*) CountEmployeeAddress -- 290 Rows FROM HumanResources.EmployeeAddress GO SELECT COUNT(*) CountEmployeeDepartmentHistory -- 296 Rows FROM HumanResources.EmployeeDepartmentHistory GO SELECT * FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeeAddress ea ON ea.EmployeeID = e.EmployeeID INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON edh.EmployeeID = e.EmployeeID GO SELECT * FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeeAddress ea ON ea.EmployeeID = e.EmployeeID INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON edh.EmployeeID = e.EmployeeID OPTION (FORCE ORDER) GO
In above example, I have used three tables and their row count is listed as well. Employee and EmployeeAddress – both have same number rows, and EmployeeDepartmentHistory table has around 6 rows more than the other two tables. Now let us run the query without using OPTION (FORCE ORDER) and run it along with the query hint and check the execution plan. You will find a difference in the query cost.
We all accept that the table with least number of rows should be listed as the base table, and the same is done here. We have two such tables with the least number of rows, which are listed as base tables. Now before we further explain this, let us see the execution plan for the same.
It is very clear from the above execution plan that when I order is forced the query cost goes high. This proves that the SQL Server has already made a good decision with regard to the optimized query plan. When plan is forced in the case of the joins more than 2 table the performance matters. Let us see the execution order of the table in both the cases.
Now, from the example, it is very clear when you force order the query, it evaluates the table Employee first and when it is not forced, it evaluates EmployeeAddress first. Even though both the tables have same number of rows, the query optimizer processes them differently and uses different types of join logic. When the order is not forced, it uses hash join; further, in case of forced order, it uses nested loop – this creates a significant difference in the query cost.
The conclusion of this whole exercise is very simple.
- SQL Server Query Execution Engine is pretty smart to decide the best execution plan with least query cost for any query.
- Order of the tables in any query can make a significant impact on the query.
Now the question for you: We have seen that using query hint of OPTION (FORCE ORDER) reduces the performance; give an example wherein we can use this hint to improve the performance?
Please leave your comment here. I will publish the answer to this question with due credit and with my own example in a later post.
Reference: Pinal Dave (https://blog.sqlauthority.com)
27 Comments. Leave new
HI !! Pinal
I want to replicate Database from My One Server location to another.
For Example If One Database is in Ahemeabad and Other In Surat with large set of data the min. size of database is 2-3 GB.
Here the Problem is that as per requirement I have to put One Synchronization button on my Web Page (in Website).
Because user has not continuous Internet connectivity he will be ale to synchronize data from that button.
So I think I need to do Web synchronization.
If i have to use Sql Server 2008 or Window Application or any Other code Or Microsoft Sync. Framework than plz suggest me.
I have to obtain this without any tool.
In short I need to perform replication from Server to Server.
Please suggest me the rigth way to perform this functionality with some code guideline.
Thanks,
Nilesh Rathod
Hi Pinal,
What is the difference between the hash join and the nested loop join in these two queries?
I was also wondering how the estimated query costs are calculated.
Thanks.
Hi Pinal
I can’t think of some specific example, but I guess if I have a query with a combination of different joins (inner-outer). I can use Option (Force Order), in order to have a optimized query.
Comments ?
What is your Answer!
Hi Pinal,
I hope you have time to answer our questions. I can’t solve these problems on my own.
Thanks.
Hey Nilesh
Your Database size is 2-3 GB, so backing up this DB should take barely any time. You have following options
1. Full Backup on Source –> Copy the Backup on Destination file share –> Restore the backup
2. Look into Log-Shipping Option, its not bad
Replication is your 3rd option but I would rule that out for now b’cos you mentioned internet-connectivity can be an issue
OPTION (FORCE ORDER) – forced to use ‘NESTED LOOP’ option when joing table EMPLOYEE and Table EMPLOYEE HISTORY and again on EMPLOYEE DEPT HISTORY tables. Though it forced to use indexes and NESTED LOOP ( similar like FOR LOOP) but no predicate(s) on table A or B or C , it would take more time than ‘ JUST HASH MATCH’ and for loop with the data.
This option would perform well if the query has predicates , say filter on EPLOYEE ID ( or list of IDS ).
NESTED LOOP works like FOR LOOP record by record with table a ( base table or driving table) and table B.
Say table a has 1000 records and table b has 1000 records , it works similar to the one mentioned below.
for a in 1..1000
for b in 1..1000
…
end loop
end loop
Hi Pinal,
You ask whether we can give an example where this hint would improve performance. I cannot give a specific example, but I would point out the danger of assuming that a lower optimizer cost is proof that one execution plan is better than another. You really have to look at the actual IO, CPU, and Duration to make this assessment.
I see many cases when the optimizer’s choice of hash join is obviously badly suboptimal, even when proper indexing and statistics are present. If we see hash joins in our test environment, we will routinely “play with” hints and measure the actual performance of the query using different join types.
Sometimes when the optimizer prefers hash joins to nested loops, when I run the query both ways I see IO, CPU and Duration that is much higher for the hash join – sometimes more than one order of magnitude higher, in fact. In these cases, I am quick to remedy the situation by using the INNER LOOP JOIN operator, which effectively does the same thing as FORCE ORDER but which I prefer because it is more explicitly named, and can be applied to only a single join.
Cheers,
Chris
Is there a way to force the table order in the generated SQL?
Hi Pinal,
Came across a strange situation, where we had to use ‘FORCE ORDER’ in a badly written SQL statement for one of our client sites. The query would run without problems on SQL Server 2008 Enterprise with 35 to 40 seconds, but the same query on SQL Server 2008 Standard version would take hours. Once we used ‘FORCE ORDER’ query hint on the SQL Server 2008 Standard version the query would run within a minute.
The question which comes to mind is why ‘SQL Server 2008 Standard’ version behaving different when compared to ‘SQL Server 2008 Enterprise’ version? Any insight to the difference between Standard and Enterprise version behavior will be greatly appreciated.
(We had to get this going to setup the application at our client site. We have to revisit the query and rewrite it to work more efficiently.)
Thanks,
Vinod
Hi Pinal,
when we can expect the answer to your question and some examples to this issue?
Regards,
Andy
Hi Pinal,
Thanks for the information on (Force Order) Hint…but then which is the scerario where we should use this Hint to optimize the performance of the query?
Thanks,
Munish Bansal
Hello Munish,
There is no defined scenario to use FORCE ORDER hint. Because in most cases the SQL Server by default uses the best possible order for optimal plan. But This hint is only for cases where you try it and found considerable performance difference.
Regards,
Pinal Dave
Funny thing. I tried this with SQL Server 2008. Both queries produced exactly same execution plans. And if I force HASH JOIN to the query without FORCE ORDER hint I get quite opposite readings:
The former one without FORCE ORDER but with forced HASH JOIN query cost goes to 74%.
The latter one with FORCE ORDER query cost is only 26%.
I think the problem, when HASH JOIN is forced, is that optimizer must do index scan for both tables (Employee and EmployeeAddress) and build hash table out of one of them. That takes time. Then it have to go through the hash table and calculate hash from the other table and so on.
In the latter case optimizer scans only the Employee table and does index seek to the EmployyAddress table in a single run. Allthough index seek is done “SELECT COUNT(*) FROM Employee” times it is still faster than filling and comparing hash table.
But these are just my thoughts and I’m glad to be corrected :)
It’s been my experience with OPTION(FORCE ORDER) that it only increases preformance if the order your tables appear in SQL is from smallest rowset to largest, such as:
Company > Division > Employee > Entry
and it makes things worse if they appear in any other order, such as
Entry > Employee > Division > Company
Hi Pinal,
I need synchronize the tables from client to server and server to client using SQL server Sync framework and also sync from server to client sync based on some conditions.
regards,
Hema
Pinal,
It’s one thing to say the Force Order hint doesn’t improve performance over the Optimizer, but have you come across a situation where you absolutely needed the Force Order hint?
Amazingly, I have!
But reproducing it is not easy..
I’ll do my best to describe.
2005 Standard 64
CREATE TABLE A (
[A_id] int pk identity(1,1),
[B_id] int,
[Ref] varchar(200)
)
CREATE TABLE B (
[B_id] int pk identity(1,1),
[RefType] int
)
CREATE TABLE C (
[C_id] int pk identity(1,1),
[D_id] int,
…
)
CREATE TABLE D (
[D_id] int pk identity(1,1),
…
)
Table A
The idea here is that [Ref] can hold different keys for different tables. Table B’s [RefType] defines which table/key to join on.
Table C is that table. Lastly, Table D simply joins with C but really has nothing more to do with the situation other than returning a column.
— Data —
Table A (big)
1,1,’123′
2,2,’ABC’
Table B (small)
1,1
2,2
Table C (very big)
123, 85, …
Table D (small)
85, …
Finally, the fun:
SELECT *
FROM A
JOIN B ON B.B_ID = A.B_ID
JOIN C ON C.C_ID = CAST(A.Ref as int)
JOIN D ON D.D_ID = C.D_ID
WHERE B.RefType = 1
A is scanned first; B should be next, but sometimes the optimizer chooses C instead.
Since there is an alpha key ‘abc,’ the query fails because it cannot cast ‘abc’ as int.
The cure..
SELECT *
FROM A
JOIN B ON B.B_ID = A.B_ID
JOIN C ON C.C_ID = CAST(A.Ref as int)
JOIN D ON D.D_ID = C.D_ID
WHERE B.RefType = 1
OPTION (FORCE ORDER)
Have fun with this one!
If you can find some way to correct the optimizer’s behavior without the Hint, that would be great.
Oh and one last note..
Obviously this situation represents a much larger situation and C governs about 20 different joins. Ref is full of all sorts of stuff. And out of all of them, only 1 presented the above error and we didn’t see it until we ran it in 2005 (ran fine in 2000).
So don’t give up if you can’t reproduce it right away.
Thanks,
Benjamin
“Please leave your comment here. I will publish the answer to this question with due credit and with my own example in a later post.”
How much later? Or is it already posted, but you forgot to link to it from here?
I can only see that OPTION (FORCE ORDER) is clearly better when otherwise the Query Optimizer would “timeout” (it’s not time, it’s number of tries it does) due to too many possible alternatives to try. When query plan generation stops due to timeout, the plan isn’t considered “GoodEnough” (which it is when the Optimizer usually quits). FORCE ORDER greatly reduces the number of possibilities, and the timeout may be avoided.
Thought I’d just throw my situation into the mix. Using a recursive query with 3 subqueries, one of which was across databases in the same instance. Yeah, please don’t ask why and what was the deal with the data model but if I didnt use FORCE ORDER I could never run it properly.
select * from
(select distinct top(@n) * from table1)as t1
left join table2 on…
left join table3 on …
…
where @n is read from some table but is known to be small (<10) and table2 and table3 are big tables
Hi
We have a similar situation to Lena. A top 1 query that performs very badly without the hint. With FORCE ORDER the query runs 10 times faster. Given that the query runs a lot to poll a work table it improves the overall server performance considerably