In today’s blog post let us discuss Performance Comparison – BETWEEN, IN, and Operators. I recently discussed this during the Comprehensive Database Performance Health Check.
Let me ask you the question first, look at the queries here and tell me which one of the queries will run the faster.
SET STATISTICS IO ON GO -- Query with Between SELECT * FROM [WideWorldImporters].[Sales].[InvoiceLines] WHERE InvoiceID >= 20 AND InvoiceID <= 40; GO -- Query with Operators SELECT * FROM [WideWorldImporters].[Sales].[InvoiceLines] WHERE InvoiceID BETWEEN 20 AND 40; GO -- Query with In SELECT * FROM [WideWorldImporters].[Sales].[InvoiceLines] WHERE InvoiceID IN (20,21,22,23,24,25,26,27,28,29,30, 31,32,33,34,35,36,37,38,39,40) GO
Now here is the question for you when you look at the queries here, which one of the queries here will give optimal performance.
Let me post a screenshot of the execution plan.
It is very clear from the execution plan that in our case the query which is using IN keywords is the fastest. Let us see the output of the statistics IO.
BETWEEN: Table ‘InvoiceLines’. Scan count 1, logical reads 162
Operators <= and >= : Table ‘InvoiceLines’. Scan count 1, logical reads 152
IN: Table ‘InvoiceLines’. Scan count 1, logical reads 180
It is very clear from the output of the statistics IO even though the second query looks expensive with the operator it is not that IO intensive.
The final summary is that in any case using IN which gives a clear idea to the SQL Server Query engine what actually we are looking for in our resultset results in the BETTER ESTIMATION from the STATISTICS, leading optimal performance.
Now I have tried with many different queries and every single time I have found that BETWEEN is either slower than IN or gives equally bad/good performance as IN. I have yet to come across the scenario where BETWEEN is better than IN. This leads me to believe that if there is an opportunity one should use the IN operator over BETWEEN.
It is very much possible that this query gives different results for you on your system and also the result is totally dependent on the statistics and data distribution you have in your table. I also believe there are queries out where BETWEEN may be faster than any other option.
Based on everything that we have discussed, I suggest that you test out your queries with your data and do the final conclusion. I will be interested in reading your comments. Should I convert this blog post to SQL in Sixty Seconds video?
Update: The original script used in the video had an error in the WHERE condition which has been fixed. However, this still does not change the opinion that one should check the STATISTICS IO before making the decision which query is optimal for them.
Reference: Pinal Dave (https://blog.sqlauthority.com)
When having a range of 2000, this case would result in “IN” being the slowest option as seen in my below results.
Scan count 1, logical reads 2052
Scan count 1, logical reads 2052
Scan count 504, logical reads 3686
… and that is totally possible which I discussed in the SQL in the Sixty Seconds video.
I think the performance you’re seeing derives more from the column being indexed or not. It would be interesting to see the results of these operators on a column with a clustered index, a non-clustered index and no index at all. I suspect the results would show a different winner in each case.
InvoiceID >= 20 AND StockItemID <= 40;
WHERE InvoiceID BETWEEN 20 AND 40;
This has never been the same query!!!!!!
Fixed that part and posted new images. (You are correct)
The more interesting thing now is that the query plans for between and = are identical.
I generated a similar test with the operator and the between with a bigger pull and you are right they have the same execution plan yet the operator is faster