SQL SERVER – Avoid Functions in the WHERE Clause for Performance

September 13, 2023

SQL SERVER - Avoid Functions in the WHERE Clause for Performance funrace-800x457 Using SQL Server functions in the WHERE clause of your queries can significantly impact performance. SQL Server must calculate the function for every row in the table, which can be time-consuming, especially for large tables. Instead, using the raw data in your WHERE clause whenever possible is more efficient.

Why Are Functions in the WHERE Clause a Problem?

SQL functions in a WHERE clause can prevent SQL Server from effectively using indexes on the columns used in the function. This happens because the function needs to be executed for each row in the table, effectively turning an indexed search into a full table scan.

Moreover, using functions in the WHERE clause can lead to computation overhead because the function needs to be calculated for each row in the table. This can be especially impactful for large tables with millions of rows.

Why Do People Convert String to Upper Case and Compare?

Case sensitivity can be a tricky issue in SQL. By default, string comparisons in SQL Server are case sensitive because the server uses binary collation. This can lead to unexpected results for those not familiar with this behavior. For example, ‘Smith’ and ‘smith’ would not be seen as equal in a binary collation.

One common workaround is to use the UPPER (or LOWER) function to convert all strings to the same case before comparison. This ensures that ‘Smith’ and ‘smith’ are seen as equal. However, as we’ve discussed, using these functions in the WHERE clause can impact performance.

A more efficient solution is to use the COLLATE clause to specify a case-insensitive collation for the comparison. This allows SQL Server to do a case-insensitive comparison without converting all the strings, which can be much more efficient.

Real-World Testing and the Importance of Indexes

Consider the Person table in the Person schema of the AdventureWorks2019 database. The Person the table has a column LastName that contains text strings, and we assume an index exists in this column.

We executed the following queries:

SET STATISTICS IO ON
-- 211 records
SELECT *
FROM AdventureWorks2019.Person.Person
WHERE UPPER(LastName) LIKE 'DIAZ%';
-- 211 records
SELECT *
FROM AdventureWorks2019.Person.Person
WHERE LastName COLLATE SQL_Latin1_General_CP1_CI_AS LIKE 'DIAZ%';
-- 1 records
SELECT *
FROM AdventureWorks2019.Person.Person
WHERE UPPER(LastName) LIKE 'JARVIS%';
-- 1 records
SELECT *
FROM AdventureWorks2019.Person.Person
WHERE LastName COLLATE SQL_Latin1_General_CP1_CI_AS LIKE 'JARVIS%';

The first and third queries use the UPPER function in the WHERE clause. The second and fourth queries use the COLLATE clause to specify a case-insensitive collation, allowing for a case-insensitive comparison of the column without performing any functions.

Here is the execution plan comparison when we searched for DIAZ, which returned 211 rows.

SQL SERVER - Avoid Functions in the WHERE Clause for Performance whfun1

Here is the execution plan comparison when we searched for JARVIS, which returned 1 row.

SQL SERVER - Avoid Functions in the WHERE Clause for Performance whfun2

The statistics for these queries were as follows:

For ‘DIAZ‘:
- With UPPER: 762 logical reads
- With COLLATE: 659 logical reads
For ‘JARVIS‘:
- With UPPER: 110 logical reads
- With COLLATE: 5 logical reads

The COLLATE clause performs significantly fewer reads than the UPPER function, demonstrating that it’s more efficient to avoid using functions in the WHERE clause whenever possible. However, it’s important to note that this performance benefit is primarily due to the existence of an index on the LastName column. If no index existed on the column, the performance difference between the two methods may not be significant.

Conclusion

This deep dive illustrates why using functions in the WHERE clause can significantly impact the performance of your SQL Server queries and why converting strings to upper case for comparison purposes can be problematic. By understanding the underlying reasons and knowing how to avoid using functions in the WHERE clause, you can optimize your queries and improve the overall efficiency of your database operations. Always consider the data type, collation, and indexing strategy when writing your SQL queries to ensure optimal performance.

You can always reach out to me on Twitter.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Query Rewrite, SQL Function

SQL SERVER – Performance: OR vs IN – A Summary and Further Reading

SQL SERVER – Understanding Incremental Statistics

4 Comments. Leave new

Carter Cordingley
September 13, 2023 7:27 am
What are the options for a datetime column need to be compared to a date
SaleDate between @StartDate and @EndDate
I run into an issue where the sale date has date and time but @EndDate is just Date. so the query excludes the last day unless it happens exactly at midnight. So I run cast(SaleDate as Date) but it slows the query
Reply
Robert Eder
September 13, 2023 7:02 pm
Not to diminish from your point about functions in the where clause, but the default server-level collation for SQL Server is based on the locale of the operating system. See https://learn.microsoft.com/en-us/sql/relational-databases/collations/set-or-change-the-server-collation?view=sql-server-ver16. Being fully transparent, I thought the default collation was SQL_Latin1_General_CP1_CI_AS, which is case insensitive. Glad I looked it up!
Reply
- Pinal Dave
  September 13, 2023 8:20 pm
  Robert, I really appreciate your comment.
  Thanks for adding this information. This will further give more clarification to the users.
  Reply
Evert
September 14, 2023 2:14 am
WHERE SaleDate >= @StartDate
AND SalesDate < DATEADD (day, 1, @EndDate)
Reply