In recent training one of the attendee asked if I can show simple method to convert IN clause to EXISTS clause. Here is the simple example.
USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance.
Click on below image to see the execution plan.

Reference: Pinal Dave (http://blog.SQLAuthority.com)












Hi Pinal, great article!!!
So, I made queries with the same results using IN and EXISTS, and the SQL Server show me the same execution plan.
Do you know if these queries can make differents plans if I use indexes or another resource?
Thanks!
—
Queries:
USE AdventureWorks
GO
– use of in
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA)
GO
– use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT *
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
I’ve discovered that execution plans can differ greatly depending from, at least, following:
1. How much data there is in the table. With huge amount of data Sql Server will revert to full index scan instead of index seek.
2. Statistics. Execution plan can differ when you run it the first time and when the server has accumulated enough statistics to make more sensible decision.
3. Hardware. I’ve seen quite different execution plans depending how many CPUs there are. In my laptop execution plan is wildly different from what I had in our 4 node cluster with 8 CPUs and 32 cores.
Unfortunately I can’t access our big servers right now to test these hypothesis.
Regarding point 3 above, the query optimizer does NOT adjust the execution plan based on the available hardware.
If parallelism in enabled (using Degree of Parallism setting or MAXDOP) and the query cost is greater than cost threshold for parallelism, you will get a parallel plan.
The actual DOP is decided only at the runtime.
Hi Paulo,
In additional to what Marco has suggested that many times SQL Server Engine is smart enough to determine optimal execution plan for both of the different query and uses the same.
Kind Regards,
Pinal
Actually, once I even saw a case when a query was performing poorly when the SQL Server had 16 Gb of memory assigned, but when the memory was limited to 4Gb, the query got a different execution plan, and was performing 10 times faster.
Weird. Server was running 64 bit server?
Was it running many such queries at the same time? If there’s multiple queries consuming gigaloads of memory then limiting the available memory per query could alleviate the problem. But I’ve never seen such a situation so I can’t tell for sure.
Another thing. There’ve been known situations where there are some conflicts between Sql Server memory management and Windows memory management causing memory pressures which causes the whole Sql Server to be swapped to the disk. This naturally brings a lot of performance issues. I think this related to some faulty drivers and problem can be circumvented by lowering the amount of memory Sql Server is allowed to use.
The first subquery has “=”, not “IN”. (Paulo has the correct version with the “IN”.) If it was written with “IN”, then it would be semantically equivalent to the 2nd query and I would expect a pre-optimization phase to normalize them both into the same internal form. That can’t be done in this case (“=”), as the execution engine has to check for cardinality violations. A cardinality violation could occur because an employee could have multiple addresses. I would expect the “=” subquery to be flattened into a join if there was also an “=” condition on AddressID, as that would guarantee that there could not be a cardinality violation.
Also, the first version of query (the original one with “=”) will simply fail in case of 1:N relationships (I don’t know the structure, but assume there 1:N relationships), so if sub-query returns more than one record it’ll fail.
Hi,
Nice Article….
Hello there.
Both Jerry and Igor are right, so I will not digress. The only thing to mention here is that the queries will generate identical plans if they are logically equivalent.
What really separates them is when you use NOT IN and NOT EXISTS, and the subquery contains NULLs. There is an excellent explanation in Inside SQL Server 2008: T-SQL Querying.
I hope it helps.
best regards,
calin
[...] SQL SERVER – Convert IN to EXISTS – Performance Talk [...]
Hi Pinal
Is there any difference between ‘COUNT(*)’ and ‘COUNT(2)’.
I have seen execution plan and execution time in Sql Server 2008 but i have not found any difference..
Thanks & Regards
Piyush Sachan
They both are same. However if you use column_name, the count excludes NULLs