First of all, answer this question: which T-SQL method performs better when writing a query, LEFT JOIN or NOT IN? The answer is: it depends! It all depends on what kind of data you have, what kind of query it is, and so on. Still, just for fun, guess one option: LEFT JOIN or NOT IN. If you need a query that demonstrates the two clauses, review the following two queries.
USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product
WHERE ProductID
NOT IN (
SELECT ProductID
FROM Production.WorkOrder);
GO
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
GO
Now let us examine the actual execution plan of both queries.
You can clearly see that the first query, with NOT IN, takes 20% of the execution plan's resources, while the LEFT JOIN query takes 80%. In this particular example it is better to use the NOT IN clause than LEFT JOIN. Please note that this is not a generic conclusion; it applies to this example only. Your results may vary depending on many factors. Let me know in the comments whether you guessed correctly.
Reference : Pinal Dave (http://www.SQLAuthority.com)






Hi!
It is a bit hard to guess which one is actually the most efficient.
If you turn on STATISTICS IO and check the messages, you'll notice that the NOT IN query accesses significantly more data pages, with more scans:
NOT IN:
Table 'WorkOrder'. Scan count 504, logical reads 1097, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
LEFT JOIN:
Table 'WorkOrder'. Scan count 1, logical reads 101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Compare the Scan count and logical reads values in particular.
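For anyone who wants to reproduce numbers like these, here is a minimal sketch that wraps the two queries from the post in SET STATISTICS IO. It assumes the same AdventureWorks sample database; the exact read counts will differ with the database version, the indexes present, and what is already cached.

```sql
USE AdventureWorks;
GO
-- Report per-table scan counts and page reads in the Messages output.
SET STATISTICS IO ON;
GO
-- Query 1: NOT IN
SELECT ProductID
FROM Production.Product
WHERE ProductID NOT IN (
    SELECT ProductID
    FROM Production.WorkOrder);
GO
-- Query 2: LEFT JOIN ... IS NULL
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
GO
SET STATISTICS IO OFF;
GO
```

Each query sits in its own batch, so its statistics lines appear directly after its row count and the two are easy to compare side by side.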
Also, I tried to get the same results using NOT EXISTS, and the Query Optimizer produced exactly the same execution plan as for the NOT IN query:
SELECT ProductID
FROM Production.Product p
WHERE NOT EXISTS
(SELECT ProductID
FROM Production.WorkOrder w
WHERE p.ProductID = w.ProductID);
As EXISTS queries usually outperform JOINs, the worrying STATISTICS IO results probably point to indexes that are badly optimized for this kind of query.
Like you said, it depends on the database structure and the amount of data. I also recommend not staring only at the execution plans, as they don't always tell the whole truth.
Hmm. I didn't really get anything out of that. Percentages are useful, but what were the overall timings? Is it 20% of 10 seconds against 80% of one second, or the other way around?
What are the effects at different data sizes?
More information is definitely needed before any conclusion can be made.
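One way to gather actual timings rather than plan percentages is SET STATISTICS TIME. The sketch below assumes the AdventureWorks database from the post; the CPU and elapsed times it reports depend on hardware, caching, and data size, so it is a starting point for measurement, not a verdict.

```sql
USE AdventureWorks;
GO
-- Report parse/compile time and CPU/elapsed execution time for each statement.
SET STATISTICS TIME ON;
GO
SELECT ProductID
FROM Production.Product
WHERE ProductID NOT IN (
    SELECT ProductID
    FROM Production.WorkOrder);
GO
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
GO
SET STATISTICS TIME OFF;
GO
```

Running each query several times and comparing the elapsed times gives a fairer picture than a single run, since the first execution also pays for compilation and any physical reads.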
20% and 80% of the complete batch.
But what about this:
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
AND w.ProductID IS NULL;
Then where should we use LEFT JOIN?
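The placement of the IS NULL test matters here. As a sketch against the same AdventureWorks tables (the column aliases are only illustrative), the two forms below can be compared by row count:

```sql
USE AdventureWorks;
GO
-- Filter in the ON clause: the join condition can never be true, because
-- w.ProductID cannot both equal p.ProductID and be NULL. No WorkOrder row
-- ever matches, so the LEFT JOIN returns every product exactly once.
SELECT COUNT(*) AS RowsWithFilterInOn
FROM Production.Product p
LEFT JOIN Production.WorkOrder w
    ON p.ProductID = w.ProductID
   AND w.ProductID IS NULL;
GO
-- Filter in the WHERE clause: the join matches normally first, then only
-- the products that found no matching work order are kept.
SELECT COUNT(*) AS RowsWithFilterInWhere
FROM Production.Product p
LEFT JOIN Production.WorkOrder w
    ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
GO
```

So the version with AND is not equivalent to the NOT IN query. The ON clause only decides which right-hand rows match, while WHERE filters the joined result.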
Looking at the example, it seems the results would vary based on the number of rows in the WorkOrder table and which indexes are set up.
If ProductID is indexed, wouldn't this be a seek operation, while the IN statement reads the entire table?
Always use SET STATISTICS IO ON. The execution plan shows an “estimate” of CPU usage, and those numbers are completely worthless since IO rules all.
Although the LEFT JOIN may be more processor-intensive, it is significantly less IO-expensive, which is the far more important measure.
The only time to use a subquery and play that little game of “beat the optimizer” is when the tables have normalization or index issues that you have no control over.
Very very good.
Your explanations are great! Simply Great!
Dude,
You know well how the query optimizer works. Do you work for MS? Because no one can explain things like you.
NOT EXISTS/LEFT JOIN > SUBQUERY/NOT IN
That’s how I see it “most of the time”
Usually it is better to avoid correlated subselects. Looking only at resources used is a pretty worthless method of evaluating optimization methods. The goal of optimization is to reduce run time while still giving the correct answer.
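On the point about correct answers: the three forms discussed in this thread are only interchangeable when the subquery column cannot be NULL. The sketch below uses two made-up table variables (@Product and @WorkOrder are hypothetical stand-ins, not the AdventureWorks tables) and assumes SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical sample data; note the NULL ProductID in @WorkOrder.
DECLARE @Product TABLE (ProductID int NOT NULL);
DECLARE @WorkOrder TABLE (ProductID int NULL);

INSERT INTO @Product (ProductID) VALUES (1), (2), (3);
INSERT INTO @WorkOrder (ProductID) VALUES (1), (NULL);

-- NOT IN: comparing against the NULL yields UNKNOWN, so no row qualifies.
SELECT p.ProductID
FROM @Product p
WHERE p.ProductID NOT IN (SELECT w.ProductID FROM @WorkOrder w);

-- NOT EXISTS: the NULL row never matches, so products 2 and 3 are returned.
SELECT p.ProductID
FROM @Product p
WHERE NOT EXISTS (SELECT 1 FROM @WorkOrder w WHERE w.ProductID = p.ProductID);

-- LEFT JOIN ... IS NULL: behaves like NOT EXISTS here and returns 2 and 3.
SELECT p.ProductID
FROM @Product p
LEFT JOIN @WorkOrder w ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
```

If the subquery column allows NULLs, it is worth checking that the rewritten query still returns the same rows before comparing performance.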
I ran this select,
USE ttst
GO
SET SHOWPLAN_ALL ON
GO
select * from customer c
where c.record_type = 'T'
and c.customer_class_code = 'LOCAL'
and c.customer_status_code = 'ACTIVE'
SET SHOWPLAN_ALL OFF
GO
It gave me an error like the one below:
Server: Msg 1067, Level 15, State 1, Line 5
The SET SHOWPLAN statements must be the only statements in the batch.
(1 row(s) affected)
What is the correct way to use SET SHOWPLAN_ALL ON?
Jchen
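The error message points at the cause: SET SHOWPLAN_ALL OFF shares a batch with the SELECT, because there is no GO between them. A sketch of the same script with each SET SHOWPLAN statement in its own batch follows; the database, table, and column names are taken from the question above, and the alias c is assumed to belong to the customer table.

```sql
USE ttst;
GO
SET SHOWPLAN_ALL ON;
GO
-- While SHOWPLAN_ALL is ON, the query is compiled but not executed;
-- the estimated plan rows are returned instead of data.
SELECT *
FROM customer c
WHERE c.record_type = 'T'
  AND c.customer_class_code = 'LOCAL'
  AND c.customer_status_code = 'ACTIVE';
GO
SET SHOWPLAN_ALL OFF;
GO
```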
Thanks! This is really useful for my query changes.
Simple…KISS!
Oracle does not have Exists. Is there an equivalent?
Bob
Thanks for such a nice tip. It really helped me in MySQL.
Thanks once again
regards,
Naseer Ahmad
-- List names of authors who have contributed in
-- any book title
SELECT DISTINCT authors.au_id,au_fname
FROM authors,titleauthor
WHERE authors.au_id!=titleauthor.au_id
SELECT au_id, au_fname FROM authors WHERE authors.au_id NOT IN
(SELECT au_id FROM titleauthor)
Why does the first query above not work the same as the second one?
The first one shows more records than the second.
Why can't we do the same work with a JOIN as with NOT IN?
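The join with != pairs each author with every titleauthor row that belongs to a different author, so an author shows up as long as anyone else has a title, whether or not that author has one. A join can do the same work as NOT IN if it is written as an outer join that keeps only the unmatched rows. A sketch, assuming the same authors and titleauthor tables as in the question:

```sql
-- Anti-join: match each author to their own titleauthor rows,
-- then keep only the authors for whom no match was found.
SELECT a.au_id, a.au_fname
FROM authors a
LEFT JOIN titleauthor ta ON a.au_id = ta.au_id
WHERE ta.au_id IS NULL;
```

This returns authors with no row in titleauthor, which is what the NOT IN query selects, provided titleauthor.au_id contains no NULLs.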