When examining the performance of SQL queries, the structure of the WHERE clause significantly influences the execution speed. This blog post aims to shed light on the performance impact between questions that use functions within the WHERE clause and those that utilize direct column references. We’ll also analyze the positive implications of incorporating indexes.
Setting the Stage with Sample Data and Queries
To demonstrate the variations in performance, we’ll rely on a hypothetical SalesOrderHeader table, which houses columns such as SalesOrderID, OrderDate, CustomerID, and TotalDue. This queries are written for AdventureWorks sample database of SQL Server.
Consider the following queries, both aiming to retrieve data from 2011 but structured differently:
-- Query using the YEAR() function SELECT SalesOrderID, OrderDate, CustomerID, TotalDue FROM SalesOrderHeader WHERE YEAR(OrderDate) = 2011;
-- Query using direct column reference SELECT SalesOrderID, OrderDate, CustomerID, TotalDue FROM SalesOrderHeader WHERE OrderDate BETWEEN '2011-01-01' AND '2011-12-31';
In the first query, we apply the YEAR() function to the OrderDate column within the WHERE clause. Alternatively, the second query directly refers to the OrderDate column, defining a date range in the WHERE clause.
Analyzing Performance without Indexes
When we execute these queries without an index on the OrderDate column, both perform a full table scan to fetch the desired data:
Table 'SalesOrderHeader'. Scan count 1, logical reads 689.
Without an index, both queries read the same amount of data, resulting in similar performance. The SQL Server’s optimizer cannot perform an efficient seek operation to directly fetch the matching rows.
The Power of Indexing – WHERE clause
Let’s now create an index on the OrderDate column:
CREATE NONCLUSTERED INDEX IX_OrderDate ON Sales.SalesOrderHeader (OrderDate) INCLUDE ([CustomerID],[TotalDue]);
Upon re-running our two queries, the performance dynamics changed notably.
With the YEAR() function, the first query still resorts to a full table scan:
Table 'SalesOrderHeader'. Scan count 1, logical reads 688.
However, the query employing the direct column reference strategically uses the index for an efficient seek operation, leading to a significant reduction in I/O:
Table 'SalesOrderHeader'. Scan count 1, logical reads 119.
Unmasking the Root Cause
The considerable difference in performance is attributed to using a function on the column preventing the optimizer from effectively leveraging the index.
An index maintains data sorted based on the raw column value. But, applying a function like YEAR() or UPPER() transforms the values, which aren’t stored in the index, thereby hindering the optimizer’s ability to seek directly the matching entries.
Conversely, query filtering based on the exact column reference can leverage the sorted index values to quickly locate matching rows, resulting in significantly fewer logical reads.
Given this analysis, let’s summarize some best practices:
- For improved performance, try to avoid using functions in WHERE clauses.
- Instead of wrapping columns in functions, refer to them directly.
- If filtering by a transformed column, create an index on the raw column.
- Always compare performance with and without indexes when using functions in WHERE clauses.
Remember, leveraging indexes correctly is vital for efficient SQL queries. By understanding how functions interact with indexes, we can optimize queries to reduce reads and improve response times.
The structure of a query, mainly whether it contains functions or direct column references, drastically impacts its ability to utilize indexes for improved performance. Filtering by a raw column generally allows for index seeks, while using a function necessitates full scans. By keeping tasks out of WHERE clauses and adding supporting indexes, we can significantly enhance the performance of SQL queries.
You can always reach out to me on Twitter.
Reference: Pinal Dave (https://blog.sqlauthority.com)