“Don’t use functions in the WHERE clause, they reduce performance.”
I hear this quite often. This is true but this subject is hard to understand in a single statement. Let us see what it means and how to use the function in the WHERE clause.
We will be using sample database AdventureWorks in this example. Additionally, turn on STATISTICS IO ON settings so we can see various statistics as well.
USE AdventureWorks2012
GO
SET STATISTICS IO ON
GO
Let us first execute following query and check the execution plan and statistics.
-- SCAN - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate]
FROM [Sales].[SalesOrderDetail]
WHERE DATEDIFF(YEAR,ModifiedDate,GETDATE()) <Â 0
GO
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246
You can see that the above query is scanning the whole table as well even though it is not returning any result it is reading 1246 pages from database.
In this case we are retrieving the data based on the ModifiedDate so we will create an index on the ModifiedDate Column.
-- Create Index on ModifiedDate
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ModifiedDate]
ON [Sales].[SalesOrderDetail] ([ModifiedDate])
GO
Now we have created an index on the ModifiedDate column we will once again execute the same query which we had run earlier.
-- SCAN - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate]
FROM [Sales].[SalesOrderDetail]
WHERE DATEDIFF(YEAR,ModifiedDate,GETDATE()) <Â 0
GO
Table 'SalesOrderDetail'. Scan count 1, logical reads 338
You can see that the above query is still scanning the whole table as well even though it is not returning any result it is read over 338 pages from database.
The reason for the same is because in the query we are using the function DATEDIFF over the column ModifiedDate. As the outcome of this function is evaluated at the run time, the SQL Server Query engine has to scan the whole table to get necessary data.
To avoid this situation, let us try to avoid using the function on the column of the table and re-write the query. To re-write the query let us first understand what the query is retrieved. It is retrieving all the rows where the year difference between ModifiedDate and Current Date is less than 0. In other words what it means is that we need to retrieve the records which have a future date. We can simply re-write above query in the following way.
-- SEEK - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate]
FROM [Sales].[SalesOrderDetail]
WHERE ModifiedDate >Â GETDATE()
GO
Table 'SalesOrderDetail'. Scan count 1, logical reads 3
Now let us execute the query and see the execution plan. We can see that there are only 3 logical read now and execution plan is also displaying Seek. This is because now we are not using any function over the column used in the WHERE clause.
To clean up you can run following script to drop the newly created index.
-- Cleanup
DROP INDEX [IX_SalesOrderDetail_ModifiedDate] ON [Sales].[SalesOrderDetail]
GO
In our case both the queries are returning the same result, a simple re-write can make a lot of difference.
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
18 Comments. Leave new
thanks for the Excellent, simple and systematic article with a lot of clarity.
Sir,
I am usually into situations where I have to retrieve rows from a table which are inserted on that day. For that I use DATEDIFF function with equal to 0 between date column and GETDATE(). I cannot use equal to with GETDATE() because I want on that day and not at that moment. I am not able to figure out a way to both improve performance by not using DATEDIFF while achieving the result that I need.
Thanks,
Sai Pavan
@Pavan,
Why do you have to do DATEDIFF = 0 ??
You can do, where Datecolumn > yesterday (greater than yesterday, which means today). Let me know if you need more help.
~ IM.
@Imran,
I agree with what you have suggested. I was looking at a generic solution for this problem. It could be today’s date or a particular date which which we want to check. The solution that you have provided works perfectly fine but the assumption is that there are no records of future date other than today.
Thanks for your reply.
Sai Pavan
@Pavan,
If you are expecting future records then the solution would be,
1. Date > yesterday and Date < tomorrow
2. Date between yesterday + 1 sec and tomorrow -1sec (Warning: between includes limits also)
3. Date = @Today
Leaving getdate() in the where clause causes a call to the function for every row. For better performance, declare a local variable above the select statement and set it to getdate(), then use that variable in the where clause.
declare @theDate datetime
set @theDate = getdate()
Hi,
How do you implement the same in Views?
Can we use variables in Views?
@KSP
I don’t know what is your requirement, but to make a generic statement, You cannot have variables in view, but if you have to use variables then got for “table value functions”.
~IM.
Hello,
I don’t know how many of you know that this start to happend with SQL 2005 when they change something in the “frunction evaluation mechanism”, Before that you can use function in the where clause without any performance problem.
Great post.
Would it be right to say them NON-SARgable predicates?
HI,
Just to add further information This Index Seek behavior is having because we have selected the columns on which the indexes are already created .
Suppose if you select any other column in addition to the above 3 columns then the query would be INDEX SCAN .In that we should go for Included NON Clustered INDEX . In that case it would give you index seek instead of index scan.
any simple Definition For these so that i can make anyone easily understand these ?
Your original query does not return all future rows, but all rows from next year and beyond. If that is required you should first compute the start of next year from GETDATE() and then select rows where Date >= @StartOfNextYear. However calculating @StartOfNextYear from GETDATE() is not a trivial task.
Thanks Sir…
i use mostly with where clause but not understand why result set is too slow, but now i know what effects my query.
— Ashish
Can you please tell me other alternate solutions instead of using functions in where clause to improve performance
Hi Dave
Currently I am analysing an application that has various performance issues.
One statement has the following where clause:
WHERE (
(
(A2.idA3A6 = 51002) –param 1
AND (A2.classnamekeyA4 = ‘wt.part.WTPart’) — param 2
AND (A2.idA3A4 = A0.idA2A2)
AND ((A2.idA3A5 = 0) OR (A2.idA3A5 IS NULL)) –param 3
)
AND (schemaname.WtPK_prefix(A2.value) = ‘M2M11412005’) –param 4
)
AND ((A0.markForDeleteA2 = 0))
AND ((A0.latestiterationInfo = 1)) — param 5
You can see the schemaname.WtPK_prefix scalar valued function which has the following code:
CREATE FUNCTION WtPK_prefix(@i_target NVARCHAR(4000)) RETURNS NVARCHAR(200) WITH SCHEMABINDING
BEGIN
RETURN SUBSTRING(@i_target, 1, 200)
END
It basically just shortens the input down to a length of 200 and then performs the search.
My first reaction was to remove the function. –> Bad idea.
The query execution plan states that it will perform a KEY LOOKUP for A2.value if the function is used, but will perform an Index Scan when the function is omitted.
If I execute the statement with the function, I have the following io / client statistics:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 58 ms, elapsed time = 58 ms.
(1 row(s) affected)
Table ‘WTPartMaster’. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘WTPart’. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 2, lob physical reads 0, lob read-ahead reads 0.
Table ‘StringValue’. Scan count 2, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 11 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Omitting the function increases the load on the SQL Server as can be seen with the following statistics:
[…AND (A2.value = ‘M2M11412005’)…]
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 60 ms, elapsed time = 60 ms.
(1 row(s) affected)
Table ‘WTPartMaster’. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘WTPart’. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 2, lob physical reads 0, lob read-ahead reads 0.
Table ‘StringValue’. Scan count 1, logical reads 614101, physical reads 5, read-ahead reads 45, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 3245 ms, elapsed time = 3267 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Logical reads and physical reads have dramatically increased.
Could it be that because the script with function has already been executed 250’000 times, that the cached query plan is faster than the normal SELECT without the function in the statement?
I do realize that the actual execution plan will differ from what I am analysing, but does that explain the huge difference?
Regards, John
Nice article , but here is what am trying to do. I want all the records <=1 month. The date filed is as datetime and when am doing datediff(month,reportdate,getdtae())<1, it is not puling the correct data because of the timestamp and i used
datediff(month,convert(date,reportdate),getdate())<=1
How do you acheive this