I recently stirred up an interesting discussion on Twitter regarding the performance difference between using “OR” and “IN” in SQL queries. The question attracted a variety of responses, both correct and incorrect. In this blog post, I’ll outline an example to clarify the correct answer. Let us discover more about OR vs IN.
— Pinal Dave (@pinaldave) September 10, 2023
Let’s consider the following SQL queries:
USE AdventureWorks2019; SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID IN (16529, 25249, 27668); SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID = 16529 OR CustomerID = 25249 OR CustomerID = 27668;
The question was: Which of these queries would perform better and why?
The surprising answer is that both queries will perform identically. To understand why, let’s delve into the statistics IO for both queries:
For both queries:
Table 'SalesOrderHeader'. Scan count 3, logical reads 24, physical reads 0. Table 'SalesOrderHeader'. Scan count 3, logical reads 24, physical reads 0.
The statistics IO for both queries is identical, indicating that they are processed similarly by the SQL Server.
Furthermore, when we examine the execution plans for both queries, we find they are identical.
The first query uses the IN keyword, while the second uses the OR keyword. Logically, both queries ask for the same data: records from the SalesOrderHeader table where the CustomerID is either 16529, 25249, or 27668.
When the SQL Server processes these queries, it doesn’t just translate the SQL into machine code and execute it. Instead, the query optimizer assesses how to retrieve the data most efficiently. It considers the database’s current state, including the available indexes, the table’s size, the system’s available resources, and more. In this case, the optimizer recognizes that both queries are asking for the same data and that the data can be retrieved similarly.
The optimizer’s decision is reflected in the statistics IO and the execution plan for both queries. They show that the same amount of data is being read (logical reads 24) and that the same number of index or table scans are being performed (scan count 3). Furthermore, the execution plans are identical, confirming that the queries are processed similarly.
So, the SQL Server engine’s internals, particularly the query optimizer, are why these two different queries have the same performance. It demonstrates the optimizer’s ability to interpret the intent of a query and choose the most efficient execution plan, regardless of the specific SQL used.
This is not a hard and fast rule, however. Depending on the complexity of your queries and your data structure, the SQL Server engine may not always be able to optimize different queries to the same extent. It’s always a good idea to test your queries’ performance and inspect their execution plans.
Contrary to any theories or suggestions you might have heard, there is no performance difference between using “OR” and “IN” in this context and example.
You can always reach out to me on Twitter.
Reference: Pinal Dave (https://blog.sqlauthority.com)