SQL SERVER – Convert IN to EXISTS – Performance Talk

In recent training one of the attendee asked if I can show simple method to convert IN clause to EXISTS clause. Here is the simple example.

USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO

It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance.

Click on below image to see the execution plan.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

About these ads

17 thoughts on “SQL SERVER – Convert IN to EXISTS – Performance Talk

  1. Hi Pinal, great article!!!

    So, I made queries with the same results using IN and EXISTS, and the SQL Server show me the same execution plan.

    Do you know if these queries can make differents plans if I use indexes or another resource?

    Thanks!

    Queries:

    USE AdventureWorks
    GO

    – use of in
    SELECT *
    FROM HumanResources.Employee E
    WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
    FROM HumanResources.EmployeeAddress EA)
    GO

    – use of exists
    SELECT *
    FROM HumanResources.Employee E
    WHERE EXISTS ( SELECT *
    FROM HumanResources.EmployeeAddress EA
    WHERE EA.EmployeeID = E.EmployeeID)
    GO

    • I’ve discovered that execution plans can differ greatly depending from, at least, following:

      1. How much data there is in the table. With huge amount of data Sql Server will revert to full index scan instead of index seek.
      2. Statistics. Execution plan can differ when you run it the first time and when the server has accumulated enough statistics to make more sensible decision.
      3. Hardware. I’ve seen quite different execution plans depending how many CPUs there are. In my laptop execution plan is wildly different from what I had in our 4 node cluster with 8 CPUs and 32 cores.

      Unfortunately I can’t access our big servers right now to test these hypothesis.

      • Regarding point 3 above, the query optimizer does NOT adjust the execution plan based on the available hardware.

        If parallelism in enabled (using Degree of Parallism setting or MAXDOP) and the query cost is greater than cost threshold for parallelism, you will get a parallel plan.

        The actual DOP is decided only at the runtime.

    • Hi Paulo,

      In additional to what Marco has suggested that many times SQL Server Engine is smart enough to determine optimal execution plan for both of the different query and uses the same.

      Kind Regards,
      Pinal

  2. Actually, once I even saw a case when a query was performing poorly when the SQL Server had 16 Gb of memory assigned, but when the memory was limited to 4Gb, the query got a different execution plan, and was performing 10 times faster.

    • Weird. Server was running 64 bit server?

      Was it running many such queries at the same time? If there’s multiple queries consuming gigaloads of memory then limiting the available memory per query could alleviate the problem. But I’ve never seen such a situation so I can’t tell for sure.

      Another thing. There’ve been known situations where there are some conflicts between Sql Server memory management and Windows memory management causing memory pressures which causes the whole Sql Server to be swapped to the disk. This naturally brings a lot of performance issues. I think this related to some faulty drivers and problem can be circumvented by lowering the amount of memory Sql Server is allowed to use.

  3. The first subquery has “=”, not “IN”. (Paulo has the correct version with the “IN”.) If it was written with “IN”, then it would be semantically equivalent to the 2nd query and I would expect a pre-optimization phase to normalize them both into the same internal form. That can’t be done in this case (“=”), as the execution engine has to check for cardinality violations. A cardinality violation could occur because an employee could have multiple addresses. I would expect the “=” subquery to be flattened into a join if there was also an “=” condition on AddressID, as that would guarantee that there could not be a cardinality violation.

  4. Also, the first version of query (the original one with “=”) will simply fail in case of 1:N relationships (I don’t know the structure, but assume there 1:N relationships), so if sub-query returns more than one record it’ll fail.

  5. Hello there.

    Both Jerry and Igor are right, so I will not digress. The only thing to mention here is that the queries will generate identical plans if they are logically equivalent.

    What really separates them is when you use NOT IN and NOT EXISTS, and the subquery contains NULLs. There is an excellent explanation in Inside SQL Server 2008: T-SQL Querying.

    I hope it helps.

    best regards,
    calin

  6. Pingback: SQLAuthority News – A Monthly Round Up of SQLAuthority Blog Posts Journey to SQL Authority with Pinal Dave

  7. Hi Pinal

    Is there any difference between ‘COUNT(*)’ and ‘COUNT(2)’.

    I have seen execution plan and execution time in Sql Server 2008 but i have not found any difference..

    Thanks & Regards
    Piyush Sachan

  8. Pingback: SQL SERVER – Weekly Series – Memory Lane – #032 | Journey to SQL Authority with Pinal Dave

  9. there is diff between NOT IN and NOT EXISTS, when subquery contains NULLs.NOT IN query will always return an empty result set while NOT EXISTS returns correct result set.

  10. HI Pinal
    Thanks for the great suggestion but would you please take a look for below script.
    we have two tables like
    create table emp(id int,name varchar(20))
    insert into emp values(1,’ashish’)
    insert into emp values(2,’sachin’)
    insert into emp values(3,’akash’)
    insert into emp values(4,’anup’)
    insert into emp values(5,’avi’)

    create table dept(deptid int,deptname varchar(20),empid int)
    insert into dept values(11,’civil’,1)
    insert into dept values(12,’mech’,2)
    insert into dept values(13,’it’,3)
    insert into dept values(15,’comp’,5)

    –by IN
    select id,name from emp as A
    where id in (select empid from dept as B where A.id=B.empid)

    id name
    1 ashish
    2 sachin
    3 akash
    4 anup

    –by Exists
    select id,name from emp as A
    where exists(select empid from dept as B where A.id=B.empid)

    id name
    1 ashish
    2 sachin
    3 akash
    4 anup

    The result is same for both case but when I am trying for below script ,I am getting some unwanted result.so how we can say we can replace IN by Exists.
    select id,name from emp as A
    where id in (select empid from dept)
    id name
    1 ashish
    2 sachin
    3 akash
    4 anup

    select id,name from emp as A
    where exists(select empid from dept)
    id name
    1 ashish
    2 sachin
    3 akash
    4 anup
    5 avi

    Please take a look and suggest me if I am wrong.

  11. HI Pinal ,

    As I tried and read ,I think ,the exists is giving us the Boolean result(true/false) rather than actual values. So that we are getting all records from emp table(5 records for 5 different empid) rather than getting only four records, that are available in dept table.

    Please take a look and suggest me if I am wrong.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s