First of all answer this question : Which method of T-SQL is better for performance LEFT JOIN or NOT IN when writing a query? The answer is: It depends! It all depends on what kind of data is and what kind query it is etc. In that case just for fun guess one option LEFT JOIN or NOT IN. If you need to refer the query which demonstrates the mentioned clauses, review following two queries for Join Better Performance.
USE AdventureWorks; GO SELECT ProductID FROM Production.Product WHERE ProductID NOT IN ( SELECT ProductID FROM Production.WorkOrder); GO SELECT p.ProductID FROM Production.Product p LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID WHERE w.ProductID IS NULL; GO
Now let us examine the actual execution plan of both the queries. Click on image to see larger image.
You can clearly observe that first query with a NOT IN takes 20% resources of the execution plan and LEFT JOIN takes 80% resources of execution plan. It is better to use NOT IN clause over LEFT JOIN in this particular example. Please note that this is not generic conclusion and applies to this example only. Your results may vary on many factors. Let me know your comments if you have guessed this correct or not.
Reference : Pinal Dave (https://blog.sqlauthority.com)
70 Comments. Leave new
Hi!
It is bit hard to guess which one is actually most efficient.
If you turn on the STATISTICS IO and check messages, you’ll notice that the NOT IN -query accesses significantly more data pages with more scans:
NOT IN:
Table ‘WorkOrder’. Scan count 504, logical reads 1097, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Product’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
LEFT JOIN:
Table ‘WorkOrder’. Scan count 1, logical reads 101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Product’. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Compare especially the Scan Count and Logical Reads -values.
Also, I tried to get the same results with using EXISTS and Query Optimizer produced exactly the same Execution Plan as for the NOT IN -query:
SELECT ProductID
FROM Production.Product p
WHERE NOT EXISTS
(SELECT ProductID
FROM Production.WorkOrder w
WHERE p.ProductID = w.ProductID);
As EXISTS -query usually outperforms JOINs, the worrying STATISTICS IO -results probably tells about badly optimized indexes for this kind of query.
Like you said: it depends on the database structure and amount of data. I also recommend not to stare only at the Execution Plans as they don’t always tell the whole truth.
Hmm. I didn’t really get anything out of that. Percentages are useful but what were the overall timings. 20% of 10s against 80% of one second or the other way around?
What are the effects at different data sizes?
More information is definitely needed before any conclusion can be made.
20% and 80% of the complete batch.
But what about this:
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
AND w.ProductID IS NULL;
Then where should we use left join???
Looking at the example it would seem the results would vary based on the number of rows in the WordOrder table and what indexes are set up.
If ProductID is indexed wouldn’t this be a seek operation while the IN statement is reading the entire table?
Always use SET STATISTICS IO ON. The execution plan shows an “estimate” of cpu usage and those numbers are completely worthless since IO rules all.
Although the LEFT JOIN may be more processor intensive it is significantly less IO expensive which is the far more important measure.
The only time to use a subquery and play that little game of “beat the optimizer” is when the tables have normalization or index issues that you have no control over.
Very very good.
Your explanations are great! Simply Great!
Dude,
You know it well how query optimiser works. Do you work for MS cause noone can explain things like you.
NOT EXISTS/LEFT JOIN > SUBQUERY/NOT IN
That’s how I see it “most of the time”
Usually it is better to avoid correlated subselects. Looking only at only resources used is a pretty worthless method of evaluating optimization methods. The goal of optimization is to reduce run-time while still giving the correct answer.
I ran this select,
USE ttst
GO
SET SHOWPLAN_ALL ON
GO
select * from customer
where c.record_type = ‘T’
and c.customer_class_code = ‘LOCAL’
and c.customer_status_code = ‘ACTIVE’
SET SHOWPLAN_ALL OFF
GO
it gave me a error like below,
Server: Msg 1067, Level 15, State 1, Line 5
The SET SHOWPLAN statements must be the only statements in the batch.
(1 row(s) affected)
What is the correct way to use the set showplan_all on?
Jchen
Thanks! Really it is very useful for my query changes
Simple…KISS!
Oracle does not have Exists. Is there an equivalent?
Bob
USE IN or NOT IN
Thanks, good to know!
Lol… But Oracle have EXISTS.
Thanks for such a nice tip… its really help me in MySQL…
Thanks once again
regards,
Naseer Ahmad
–List names of authors who have contributed in
–any book title
SELECT DISTINCT authors.au_id,au_fname
FROM authors,titleauthor
WHERE authors.au_id!=titleauthor.au_id
SELECT au_id, au_fname FROM authors WHERE authors.au_id NOT IN
(SELECT au_id FROM titleauthor)
why the above query did not work as same as the below query??
the above on shows more records then the following.
why we did not do the same work with JOIN as with NOT IN??
Brilliant posts !
hi Pinal,
I want to retrieve only limited data from database like limit is keyword in mysql which retrieve limited data according to our arguments. Like I want to load only 10 records from the database at first display in page. but I don’t want to using top. Because its create problem of paging. So please give me alternate solution for that’d have use row index but in that row index I have to write inner query and my data base I s to heavy.
Sanjay, you can use ROWCOUNT.
SET ROWCOUNT 10
SELECT * FROM table
SET ROWCOUNT 0
Beware of SET ROWCOUNT
It may be removed from future relases of SQL Server
So make use of TOP instead
Very Helpful!
The difference is obvious in the query plans, the Join query is not using the index. You could alter the indices or the query to take advantage of the indices. Also, when you look at the statistics output you can see that the Join query is doing much better than the Not In query based on reads and scans, which in a high volume environment will be a much better indication of performance anyway. My opinion, the percentages shown on query plans are useless, use query plans only to figure out what indices you’re not taking advantage of (i.e. look for table and index scans).