Using SQL Server functions in the WHERE clause of your queries can significantly impact performance. SQL Server must calculate the function for every row in the table, which can be time-consuming, especially for large tables. Instead, using the raw data in your WHERE clause whenever possible is more efficient.
Why Are Functions in the WHERE Clause a Problem?
SQL functions in a WHERE clause can prevent SQL Server from effectively using indexes on the columns used in the function. This happens because the function needs to be executed for each row in the table, effectively turning an indexed search into a full table scan.
Moreover, using functions in the WHERE clause can lead to computation overhead because the function needs to be calculated for each row in the table. This can be especially impactful for large tables with millions of rows.
Why Do People Convert String to Upper Case and Compare?
Case sensitivity can be a tricky issue in SQL. By default, string comparisons in SQL Server are case sensitive because the server uses binary collation. This can lead to unexpected results for those not familiar with this behavior. For example, ‘Smith’ and ‘smith’ would not be seen as equal in a binary collation.
One common workaround is to use the UPPER (or LOWER) function to convert all strings to the same case before comparison. This ensures that ‘Smith’ and ‘smith’ are seen as equal. However, as we’ve discussed, using these functions in the WHERE clause can impact performance.
A more efficient solution is to use the COLLATE clause to specify a case-insensitive collation for the comparison. This allows SQL Server to do a case-insensitive comparison without converting all the strings, which can be much more efficient.
Real-World Testing and the Importance of Indexes
Consider the Person
table in the Person
schema of the AdventureWorks2019
database. The Person
the table has a column LastName
that contains text strings, and we assume an index exists in this column.
We executed the following queries:
SET STATISTICS IO ON -- 211 records SELECT * FROM AdventureWorks2019.Person.Person WHERE UPPER(LastName) LIKE 'DIAZ%'; -- 211 records SELECT * FROM AdventureWorks2019.Person.Person WHERE LastName COLLATE SQL_Latin1_General_CP1_CI_AS LIKE 'DIAZ%'; -- 1 records SELECT * FROM AdventureWorks2019.Person.Person WHERE UPPER(LastName) LIKE 'JARVIS%'; -- 1 records SELECT * FROM AdventureWorks2019.Person.Person WHERE LastName COLLATE SQL_Latin1_General_CP1_CI_AS LIKE 'JARVIS%';
The first and third queries use the UPPER function in the WHERE clause. The second and fourth queries use the COLLATE clause to specify a case-insensitive collation, allowing for a case-insensitive comparison of the column without performing any functions.
Here is the execution plan comparison when we searched for DIAZ, which returned 211 rows.
Here is the execution plan comparison when we searched for JARVIS, which returned 1 row.
The statistics for these queries were as follows:
- For ‘DIAZ‘:
- With UPPER: 762 logical reads
- With COLLATE: 659 logical reads
- For ‘JARVIS‘:
- With UPPER: 110 logical reads
- With COLLATE: 5 logical reads
The COLLATE clause performs significantly fewer reads than the UPPER function, demonstrating that it’s more efficient to avoid using functions in the WHERE clause whenever possible. However, it’s important to note that this performance benefit is primarily due to the existence of an index on the LastName
column. If no index existed on the column, the performance difference between the two methods may not be significant.
Conclusion
This deep dive illustrates why using functions in the WHERE clause can significantly impact the performance of your SQL Server queries and why converting strings to upper case for comparison purposes can be problematic. By understanding the underlying reasons and knowing how to avoid using functions in the WHERE clause, you can optimize your queries and improve the overall efficiency of your database operations. Always consider the data type, collation, and indexing strategy when writing your SQL queries to ensure optimal performance.
You can always reach out to me on Twitter.
Reference: Pinal Dave (https://blog.sqlauthority.com)
4 Comments. Leave new
What are the options for a datetime column need to be compared to a date
SaleDate between @StartDate and @EndDate
I run into an issue where the sale date has date and time but @EndDate is just Date. so the query excludes the last day unless it happens exactly at midnight. So I run cast(SaleDate as Date) but it slows the query
Not to diminish from your point about functions in the where clause, but the default server-level collation for SQL Server is based on the locale of the operating system. See https://learn.microsoft.com/en-us/sql/relational-databases/collations/set-or-change-the-server-collation?view=sql-server-ver16. Being fully transparent, I thought the default collation was SQL_Latin1_General_CP1_CI_AS, which is case insensitive. Glad I looked it up!
Robert, I really appreciate your comment.
Thanks for adding this information. This will further give more clarification to the users.
WHERE SaleDate >= @StartDate
AND SalesDate < DATEADD (day, 1, @EndDate)