SQL SERVER – Convert IN to EXISTS – Performance Talk

In recent training one of the attendee asked if I can show a simple method to convert IN clause to EXISTS clause so it impacts performance. Here is the simple example.

USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO

It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance.

Click on below image to see the execution plan.

SQL SERVER - Convert IN to EXISTS - Performance Talk inexists

SQL Server Performance Tuning and health check is a very challenging subject that requires expertise in Database Administration and Database Development. Here are few pointers how one can keep their SQL Server Performance Optimization. I am often asked what can one do keep SQL Server Health Optimal and SQL Server keep on running very smooth. What is the tool that you use to check for configuration settings on your SQL Server box? In the past, I have talked about SQL Server Management Studio Standard reports and a number of reports that can help.

Reference: Pinal Dave (https://blog.sqlauthority.com)

,
Previous Post
SQL SERVER – Disabled Index and Update Statistics
Next Post
SQL SERVER – Subquery or Join – Various Options – SQL Server Engine knows the Best

Related Posts

17 Comments. Leave new

  • Paulo R. Pereira
    June 5, 2010 9:16 am

    Hi Pinal, great article!!!

    So, I made queries with the same results using IN and EXISTS, and the SQL Server show me the same execution plan.

    Do you know if these queries can make differents plans if I use indexes or another resource?

    Thanks!

    Queries:

    USE AdventureWorks
    GO

    — use of in
    SELECT *
    FROM HumanResources.Employee E
    WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
    FROM HumanResources.EmployeeAddress EA)
    GO

    — use of exists
    SELECT *
    FROM HumanResources.Employee E
    WHERE EXISTS ( SELECT *
    FROM HumanResources.EmployeeAddress EA
    WHERE EA.EmployeeID = E.EmployeeID)
    GO

    Reply
    • Marko Parkkola
      June 5, 2010 10:41 am

      I’ve discovered that execution plans can differ greatly depending from, at least, following:

      1. How much data there is in the table. With huge amount of data Sql Server will revert to full index scan instead of index seek.
      2. Statistics. Execution plan can differ when you run it the first time and when the server has accumulated enough statistics to make more sensible decision.
      3. Hardware. I’ve seen quite different execution plans depending how many CPUs there are. In my laptop execution plan is wildly different from what I had in our 4 node cluster with 8 CPUs and 32 cores.

      Unfortunately I can’t access our big servers right now to test these hypothesis.

      Reply
      • Regarding point 3 above, the query optimizer does NOT adjust the execution plan based on the available hardware.

        If parallelism in enabled (using Degree of Parallism setting or MAXDOP) and the query cost is greater than cost threshold for parallelism, you will get a parallel plan.

        The actual DOP is decided only at the runtime.

    • Hi Paulo,

      In additional to what Marco has suggested that many times SQL Server Engine is smart enough to determine optimal execution plan for both of the different query and uses the same.

      Kind Regards,
      Pinal

      Reply
  • Actually, once I even saw a case when a query was performing poorly when the SQL Server had 16 Gb of memory assigned, but when the memory was limited to 4Gb, the query got a different execution plan, and was performing 10 times faster.

    Reply
    • Marko Parkkola
      June 5, 2010 4:31 pm

      Weird. Server was running 64 bit server?

      Was it running many such queries at the same time? If there’s multiple queries consuming gigaloads of memory then limiting the available memory per query could alleviate the problem. But I’ve never seen such a situation so I can’t tell for sure.

      Another thing. There’ve been known situations where there are some conflicts between Sql Server memory management and Windows memory management causing memory pressures which causes the whole Sql Server to be swapped to the disk. This naturally brings a lot of performance issues. I think this related to some faulty drivers and problem can be circumvented by lowering the amount of memory Sql Server is allowed to use.

      Reply
  • The first subquery has “=”, not “IN”. (Paulo has the correct version with the “IN”.) If it was written with “IN”, then it would be semantically equivalent to the 2nd query and I would expect a pre-optimization phase to normalize them both into the same internal form. That can’t be done in this case (“=”), as the execution engine has to check for cardinality violations. A cardinality violation could occur because an employee could have multiple addresses. I would expect the “=” subquery to be flattened into a join if there was also an “=” condition on AddressID, as that would guarantee that there could not be a cardinality violation.

    Reply
  • Also, the first version of query (the original one with “=”) will simply fail in case of 1:N relationships (I don’t know the structure, but assume there 1:N relationships), so if sub-query returns more than one record it’ll fail.

    Reply
  • Hi,

    Nice Article….

    Reply
  • Hello there.

    Both Jerry and Igor are right, so I will not digress. The only thing to mention here is that the queries will generate identical plans if they are logically equivalent.

    What really separates them is when you use NOT IN and NOT EXISTS, and the subquery contains NULLs. There is an excellent explanation in Inside SQL Server 2008: T-SQL Querying.

    I hope it helps.

    best regards,
    calin

    Reply
  • Hi Pinal

    Is there any difference between ‘COUNT(*)’ and ‘COUNT(2)’.

    I have seen execution plan and execution time in Sql Server 2008 but i have not found any difference..

    Thanks & Regards
    Piyush Sachan

    Reply
  • prashant srivastava
    December 25, 2013 12:47 am

    there is diff between NOT IN and NOT EXISTS, when subquery contains NULLs.NOT IN query will always return an empty result set while NOT EXISTS returns correct result set.

    Reply
  • HI Pinal
    Thanks for the great suggestion but would you please take a look for below script.
    we have two tables like
    create table emp(id int,name varchar(20))
    insert into emp values(1,’ashish’)
    insert into emp values(2,’sachin’)
    insert into emp values(3,’akash’)
    insert into emp values(4,’anup’)
    insert into emp values(5,’avi’)

    create table dept(deptid int,deptname varchar(20),empid int)
    insert into dept values(11,’civil’,1)
    insert into dept values(12,’mech’,2)
    insert into dept values(13,’it’,3)
    insert into dept values(15,’comp’,5)

    –by IN
    select id,name from emp as A
    where id in (select empid from dept as B where A.id=B.empid)

    id name
    1 ashish
    2 sachin
    3 akash
    4 anup

    –by Exists
    select id,name from emp as A
    where exists(select empid from dept as B where A.id=B.empid)

    id name
    1 ashish
    2 sachin
    3 akash
    4 anup

    The result is same for both case but when I am trying for below script ,I am getting some unwanted result.so how we can say we can replace IN by Exists.
    select id,name from emp as A
    where id in (select empid from dept)
    id name
    1 ashish
    2 sachin
    3 akash
    4 anup

    select id,name from emp as A
    where exists(select empid from dept)
    id name
    1 ashish
    2 sachin
    3 akash
    4 anup
    5 avi

    Please take a look and suggest me if I am wrong.

    Reply
  • HI Pinal ,

    As I tried and read ,I think ,the exists is giving us the Boolean result(true/false) rather than actual values. So that we are getting all records from emp table(5 records for 5 different empid) rather than getting only four records, that are available in dept table.

    Please take a look and suggest me if I am wrong.

    Reply
  • Hi Pinal, I am also facing same issue as ashish facing. If exists is alternate of In and only perfomance difference between them then why query select id,name from emp as A
    where exists(select empid from dept) return all records from emp?

    Reply
  • I have a couple of questions:

    1#

    Which of the following is better.

    select * from users where type in (‘A’,’B’)

    OR

    select * from users where type=’A’ OR type = ‘B’

    2#

    Which query will give you a better performance. WHY?
    I am more interested in WHY part.

    SELECT * FROM Users
    WHERE UserType NOT IN
    (SELECT Name from UserTypes WHERE Name=’X’)

    OR

    SELECT * FROM Users U
    LEFT JOIN UserType UT
    ON U.UserType = UT.Name
    WHERE UT.Name IS NULL

    Reply

Leave a Reply

Menu