SQL SERVER – Convert IN to EXISTS – Performance Talk

June 5, 2010

In recent training one of the attendee asked if I can show a simple method to convert IN clause to EXISTS clause so it impacts performance. Here is the simple example.

USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO

It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance.

Click on below image to see the execution plan.

SQL SERVER - Convert IN to EXISTS - Performance Talk inexists

SQL Server Performance Tuning and health check is a very challenging subject that requires expertise in Database Administration and Database Development. Here are few pointers how one can keep their SQL Server Performance Optimization. I am often asked what can one do keep SQL Server Health Optimal and SQL Server keep on running very smooth. What is the tool that you use to check for configuration settings on your SQL Server box? In the past, I have talked about SQL Server Management Studio Standard reports and a number of reports that can help.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Scripts, SQL Server

SQL SERVER – Disabled Index and Update Statistics

SQL SERVER – Subquery or Join – Various Options – SQL Server Engine knows the Best

17 Comments. Leave new

Paulo R. Pereira
June 5, 2010 9:16 am
Hi Pinal, great article!!!
So, I made queries with the same results using IN and EXISTS, and the SQL Server show me the same execution plan.
Do you know if these queries can make differents plans if I use indexes or another resource?
Thanks!
—
Queries:
USE AdventureWorks
GO
— use of in
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA)
GO
— use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT *
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
Reply
- Marko Parkkola
  June 5, 2010 10:41 am
  I’ve discovered that execution plans can differ greatly depending from, at least, following:
  1. How much data there is in the table. With huge amount of data Sql Server will revert to full index scan instead of index seek.
  2. Statistics. Execution plan can differ when you run it the first time and when the server has accumulated enough statistics to make more sensible decision.
  3. Hardware. I’ve seen quite different execution plans depending how many CPUs there are. In my laptop execution plan is wildly different from what I had in our 4 node cluster with 8 CPUs and 32 cores.
  Unfortunately I can’t access our big servers right now to test these hypothesis.
  Reply
  - Thomas
    June 5, 2010 12:44 pm
    Regarding point 3 above, the query optimizer does NOT adjust the execution plan based on the available hardware.
    If parallelism in enabled (using Degree of Parallism setting or MAXDOP) and the query cost is greater than cost threshold for parallelism, you will get a parallel plan.
    The actual DOP is decided only at the runtime.
Feodor
June 5, 2010 1:18 pm
Actually, once I even saw a case when a query was performing poorly when the SQL Server had 16 Gb of memory assigned, but when the memory was limited to 4Gb, the query got a different execution plan, and was performing 10 times faster.
Reply
- Marko Parkkola
  June 5, 2010 4:31 pm
  Weird. Server was running 64 bit server?
  Was it running many such queries at the same time? If there’s multiple queries consuming gigaloads of memory then limiting the available memory per query could alleviate the problem. But I’ve never seen such a situation so I can’t tell for sure.
  Another thing. There’ve been known situations where there are some conflicts between Sql Server memory management and Windows memory management causing memory pressures which causes the whole Sql Server to be swapped to the disk. This naturally brings a lot of performance issues. I think this related to some faulty drivers and problem can be circumvented by lowering the amount of memory Sql Server is allowed to use.
  Reply
Jerry B
June 7, 2010 7:55 am
The first subquery has “=”, not “IN”. (Paulo has the correct version with the “IN”.) If it was written with “IN”, then it would be semantically equivalent to the 2nd query and I would expect a pre-optimization phase to normalize them both into the same internal form. That can’t be done in this case (“=”), as the execution engine has to check for cardinality violations. A cardinality violation could occur because an employee could have multiple addresses. I would expect the “=” subquery to be flattened into a join if there was also an “=” condition on AddressID, as that would guarantee that there could not be a cardinality violation.
Reply
IgorA
June 11, 2010 5:35 am
Also, the first version of query (the original one with “=”) will simply fail in case of 1:N relationships (I don’t know the structure, but assume there 1:N relationships), so if sub-query returns more than one record it’ll fail.
Reply
Jay
June 11, 2010 12:49 pm
Hi,
Nice Article….
Reply
calin
June 14, 2010 11:48 am
Hello there.
Both Jerry and Igor are right, so I will not digress. The only thing to mention here is that the queries will generate identical plans if they are logically equivalent.
What really separates them is when you use NOT IN and NOT EXISTS, and the subquery contains NULLs. There is an excellent explanation in Inside SQL Server 2008: T-SQL Querying.
I hope it helps.
best regards,
calin
Reply
Piyush Sachan
February 23, 2011 2:07 pm
Hi Pinal
Is there any difference between ‘COUNT(*)’ and ‘COUNT(2)’.
I have seen execution plan and execution time in Sql Server 2008 but i have not found any difference..
Thanks & Regards
Piyush Sachan
Reply
- madhivanan
  February 24, 2011 3:00 pm
  They both are same. However if you use column_name, the count excludes NULLs
  Reply
prashant srivastava
December 25, 2013 12:47 am
there is diff between NOT IN and NOT EXISTS, when subquery contains NULLs.NOT IN query will always return an empty result set while NOT EXISTS returns correct result set.
Reply
ashish nasre
December 27, 2013 8:12 pm
HI Pinal
Thanks for the great suggestion but would you please take a look for below script.
we have two tables like
create table emp(id int,name varchar(20))
insert into emp values(1,’ashish’)
insert into emp values(2,’sachin’)
insert into emp values(3,’akash’)
insert into emp values(4,’anup’)
insert into emp values(5,’avi’)
create table dept(deptid int,deptname varchar(20),empid int)
insert into dept values(11,’civil’,1)
insert into dept values(12,’mech’,2)
insert into dept values(13,’it’,3)
insert into dept values(15,’comp’,5)
–by IN
select id,name from emp as A
where id in (select empid from dept as B where A.id=B.empid)
id name
1 ashish
2 sachin
3 akash
4 anup
–by Exists
select id,name from emp as A
where exists(select empid from dept as B where A.id=B.empid)
id name
1 ashish
2 sachin
3 akash
4 anup
The result is same for both case but when I am trying for below script ,I am getting some unwanted result.so how we can say we can replace IN by Exists.
select id,name from emp as A
where id in (select empid from dept)
id name
1 ashish
2 sachin
3 akash
4 anup
select id,name from emp as A
where exists(select empid from dept)
id name
1 ashish
2 sachin
3 akash
4 anup
5 avi
Please take a look and suggest me if I am wrong.
Reply
ashish nasre
December 27, 2013 8:44 pm
HI Pinal ,
As I tried and read ,I think ,the exists is giving us the Boolean result(true/false) rather than actual values. So that we are getting all records from emp table(5 records for 5 different empid) rather than getting only four records, that are available in dept table.
Please take a look and suggest me if I am wrong.
Reply
pavan dubey
September 22, 2014 1:38 pm
Hi Pinal, I am also facing same issue as ashish facing. If exists is alternate of In and only perfomance difference between them then why query select id,name from emp as A
where exists(select empid from dept) return all records from emp?
Reply
Datta Kale
April 7, 2015 11:40 am
I have a couple of questions:
1#
Which of the following is better.
select * from users where type in (‘A’,’B’)
OR
select * from users where type=’A’ OR type = ‘B’
2#
Which query will give you a better performance. WHY?
I am more interested in WHY part.
SELECT * FROM Users
WHERE UserType NOT IN
(SELECT Name from UserTypes WHERE Name=’X’)
OR
SELECT * FROM Users U
LEFT JOIN UserType UT
ON U.UserType = UT.Name
WHERE UT.Name IS NULL
Reply