SQL SERVER – Join Better Performance – LEFT JOIN or NOT IN?

First of all answer this question : Which method of T-SQL is better for performance LEFT JOIN or NOT IN when writing a query? The answer is: It depends! It all depends on what kind of data is and what kind query it is etc. In that case just for fun guess one option LEFT JOIN or NOT IN. If you need to refer the query which demonstrates the mentioned clauses, review following two queries for Join Better Performance.

USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product
WHERE ProductID
NOT IN (
SELECT ProductID
FROM Production.WorkOrder);
GO
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
GO

Now let us examine the actual execution plan of both the queries. Click on image to see larger image.
SQL SERVER - Join Better Performance - LEFT JOIN or NOT IN? executionplansmall

You can clearly observe that first query with a NOT IN takes 20% resources of the execution plan and LEFT JOIN takes 80% resources of execution plan. It is better to use NOT IN clause over LEFT JOIN in this particular example. Please note that this is not generic conclusion and applies to this example only. Your results may vary on many factors. Let me know your comments if you have guessed this correct or not.

Reference : Pinal Dave (https://blog.sqlauthority.com)

SQL Joins, SQL Scripts, SQL Server
Previous Post
SQL SERVER – Surface Area Configuration Tools Reduce Exposure To Security Risks
Next Post
SQL SERVER – Converting Subqueries to Joins

Related Posts

70 Comments. Leave new

  • Hi!
    It is bit hard to guess which one is actually most efficient.

    If you turn on the STATISTICS IO and check messages, you’ll notice that the NOT IN -query accesses significantly more data pages with more scans:

    NOT IN:
    Table ‘WorkOrder’. Scan count 504, logical reads 1097, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Product’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    LEFT JOIN:
    Table ‘WorkOrder’. Scan count 1, logical reads 101, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Product’. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Compare especially the Scan Count and Logical Reads -values.

    Also, I tried to get the same results with using EXISTS and Query Optimizer produced exactly the same Execution Plan as for the NOT IN -query:
    SELECT ProductID
    FROM Production.Product p
    WHERE NOT EXISTS
    (SELECT ProductID
    FROM Production.WorkOrder w
    WHERE p.ProductID = w.ProductID);

    As EXISTS -query usually outperforms JOINs, the worrying STATISTICS IO -results probably tells about badly optimized indexes for this kind of query.

    Like you said: it depends on the database structure and amount of data. I also recommend not to stare only at the Execution Plans as they don’t always tell the whole truth.

    Reply
  • Hmm. I didn’t really get anything out of that. Percentages are useful but what were the overall timings. 20% of 10s against 80% of one second or the other way around?

    What are the effects at different data sizes?

    More information is definitely needed before any conclusion can be made.

    Reply
  • 20% and 80% of the complete batch.

    But what about this:

    SELECT p.ProductID
    FROM Production.Product p
    LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
    AND w.ProductID IS NULL;

    Reply
  • Then where should we use left join???

    Reply
  • Looking at the example it would seem the results would vary based on the number of rows in the WordOrder table and what indexes are set up.
    If ProductID is indexed wouldn’t this be a seek operation while the IN statement is reading the entire table?

    Reply
  • Always use SET STATISTICS IO ON. The execution plan shows an “estimate” of cpu usage and those numbers are completely worthless since IO rules all.

    Although the LEFT JOIN may be more processor intensive it is significantly less IO expensive which is the far more important measure.

    The only time to use a subquery and play that little game of “beat the optimizer” is when the tables have normalization or index issues that you have no control over.

    Reply
  • Very very good.
    Your explanations are great! Simply Great!

    Reply
  • Dude,

    You know it well how query optimiser works. Do you work for MS cause noone can explain things like you.

    Reply
  • NOT EXISTS/LEFT JOIN > SUBQUERY/NOT IN

    That’s how I see it “most of the time”

    Reply
  • Usually it is better to avoid correlated subselects. Looking only at only resources used is a pretty worthless method of evaluating optimization methods. The goal of optimization is to reduce run-time while still giving the correct answer.

    Reply
  • I ran this select,
    USE ttst
    GO
    SET SHOWPLAN_ALL ON
    GO
    select * from customer
    where c.record_type = ‘T’
    and c.customer_class_code = ‘LOCAL’
    and c.customer_status_code = ‘ACTIVE’
    SET SHOWPLAN_ALL OFF
    GO
    it gave me a error like below,
    Server: Msg 1067, Level 15, State 1, Line 5
    The SET SHOWPLAN statements must be the only statements in the batch.

    (1 row(s) affected)
    What is the correct way to use the set showplan_all on?

    Jchen

    Reply
  • Thanks! Really it is very useful for my query changes

    Reply
  • Simple…KISS!

    Oracle does not have Exists. Is there an equivalent?

    Bob

    Reply
  • Thanks for such a nice tip… its really help me in MySQL…

    Thanks once again

    regards,
    Naseer Ahmad

    Reply
  • –List names of authors who have contributed in
    –any book title
    SELECT DISTINCT authors.au_id,au_fname
    FROM authors,titleauthor
    WHERE authors.au_id!=titleauthor.au_id

    SELECT au_id, au_fname FROM authors WHERE authors.au_id NOT IN
    (SELECT au_id FROM titleauthor)

    why the above query did not work as same as the below query??
    the above on shows more records then the following.
    why we did not do the same work with JOIN as with NOT IN??

    Reply
  • Brilliant posts !

    Reply
  • hi Pinal,

    I want to retrieve only limited data from database like limit is keyword in mysql which retrieve limited data according to our arguments. Like I want to load only 10 records from the database at first display in page. but I don’t want to using top. Because its create problem of paging. So please give me alternate solution for that’d have use row index but in that row index I have to write inner query and my data base I s to heavy.

    Reply
  • Tiago Salgado
    May 25, 2009 2:13 pm

    Sanjay, you can use ROWCOUNT.

    SET ROWCOUNT 10
    SELECT * FROM table
    SET ROWCOUNT 0

    Reply
    • Beware of SET ROWCOUNT
      It may be removed from future relases of SQL Server
      So make use of TOP instead

      Reply
  • brvcoordinator
    June 10, 2009 9:34 pm

    Very Helpful!

    Reply
  • The difference is obvious in the query plans, the Join query is not using the index. You could alter the indices or the query to take advantage of the indices. Also, when you look at the statistics output you can see that the Join query is doing much better than the Not In query based on reads and scans, which in a high volume environment will be a much better indication of performance anyway. My opinion, the percentages shown on query plans are useless, use query plans only to figure out what indices you’re not taking advantage of (i.e. look for table and index scans).

    Reply

Leave a Reply