SQL SERVER – Avoid Using Function in WHERE Clause – Scan to Seek

“Don’t use functions in the WHERE clause, they reduce performance.”

I hear this quite often. This is true but this subject is hard to understand in a single statement. Let us see what it means and how to use the function in the WHERE clause.

We will be using sample database AdventureWorks in this example. Additionally, turn on STATISTICS IO ON settings so we can see various statistics as well.

USE AdventureWorks2012
GO
SET STATISTICS IO ON
GO

Let us first execute following query and check the execution plan and statistics.

-- SCAN - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE DATEDIFF(YEAR,ModifiedDate,GETDATE()) < 0
GO

SQL SERVER - Avoid Using Function in WHERE Clause - Scan to Seek whereclause1

Table 'SalesOrderDetail'. Scan count 1, logical reads 1246

You can see that the above query is scanning the whole table as well even though it is not returning any result it is reading 1246 pages from database.

In this case we are retrieving the data based on the ModifiedDate so we will create an index on the ModifiedDate Column.

-- Create Index on ModifiedDate
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ModifiedDate] ON [Sales].[SalesOrderDetail] ([ModifiedDate])
GO

Now we have created an index on the ModifiedDate column we will once again execute the same query which we had run earlier.

-- SCAN - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE DATEDIFF(YEAR,ModifiedDate,GETDATE()) < 0
GO

SQL SERVER - Avoid Using Function in WHERE Clause - Scan to Seek whereclause1

Table 'SalesOrderDetail'. Scan count 1, logical reads 338

You can see that the above query is still scanning the whole table as well even though it is not returning any result it is read over 338 pages from database.

The reason for the same is because in the query we are using the function DATEDIFF over the column ModifiedDate. As the outcome of this function is evaluated at the run time, the SQL Server Query engine has to scan the whole table to get necessary data.

To avoid this situation, let us try to avoid using the function on the column of the table and re-write the query. To re-write the query let us first understand what the query is retrieved. It is retrieving all the rows where the year difference between ModifiedDate and Current Date is less than 0. In other words what it means is that we need to retrieve the records which have a future date. We can simply re-write above query in the following way.

-- SEEK - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE ModifiedDate > GETDATE()
GO

SQL SERVER - Avoid Using Function in WHERE Clause - Scan to Seek whereclause2

Table 'SalesOrderDetail'. Scan count 1, logical reads 3

Now let us execute the query and see the execution plan. We can see that there are only 3 logical read now and execution plan is also displaying Seek. This is because now we are not using any function over the column used in the WHERE clause.

To clean up you can run following script to drop the newly created index.

-- Cleanup
DROP INDEX [IX_SalesOrderDetail_ModifiedDate] ON [Sales].[SalesOrderDetail] GO

In our case both the queries are returning the same result, a simple re-write can make a lot of difference.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Index
Previous Post
SQL SERVER – How to Add Column at Specific Location in Table
Next Post
SQL SERVER – Shortcut to SELECT Single Row from Table – SQL in Sixty Seconds #046 – Video

Related Posts

18 Comments. Leave new

  • thanks for the Excellent, simple and systematic article with a lot of clarity.

    Reply
  • Sir,

    I am usually into situations where I have to retrieve rows from a table which are inserted on that day. For that I use DATEDIFF function with equal to 0 between date column and GETDATE(). I cannot use equal to with GETDATE() because I want on that day and not at that moment. I am not able to figure out a way to both improve performance by not using DATEDIFF while achieving the result that I need.

    Thanks,
    Sai Pavan

    Reply
    • Imran Mohammed
      March 13, 2013 6:27 am

      @Pavan,

      Why do you have to do DATEDIFF = 0 ??

      You can do, where Datecolumn > yesterday (greater than yesterday, which means today). Let me know if you need more help.

      ~ IM.

      Reply
      • @Imran,

        I agree with what you have suggested. I was looking at a generic solution for this problem. It could be today’s date or a particular date which which we want to check. The solution that you have provided works perfectly fine but the assumption is that there are no records of future date other than today.

        Thanks for your reply.

        Sai Pavan

      • Imran Mohammed
        March 14, 2013 2:40 am

        @Pavan,

        If you are expecting future records then the solution would be,
        1. Date > yesterday and Date < tomorrow
        2. Date between yesterday + 1 sec and tomorrow -1sec (Warning: between includes limits also)
        3. Date = @Today

  • Leaving getdate() in the where clause causes a call to the function for every row. For better performance, declare a local variable above the select statement and set it to getdate(), then use that variable in the where clause.

    declare @theDate datetime
    set @theDate = getdate()

    Reply
    • Hi,

      How do you implement the same in Views?

      Can we use variables in Views?

      Reply
      • Imran Mohammed
        March 13, 2013 6:31 am

        @KSP

        I don’t know what is your requirement, but to make a generic statement, You cannot have variables in view, but if you have to use variables then got for “table value functions”.

        ~IM.

  • Gatej Alexandru
    March 12, 2013 3:38 pm

    Hello,

    I don’t know how many of you know that this start to happend with SQL 2005 when they change something in the “frunction evaluation mechanism”, Before that you can use function in the where clause without any performance problem.

    Reply
  • Flávio Oliveira
    March 13, 2013 6:24 am

    Great post.

    Reply
  • Would it be right to say them NON-SARgable predicates?

    Reply
  • HI,
    Just to add further information This Index Seek behavior is having because we have selected the columns on which the indexes are already created .
    Suppose if you select any other column in addition to the above 3 columns then the query would be INDEX SCAN .In that we should go for Included NON Clustered INDEX . In that case it would give you index seek instead of index scan.

    Reply
  • any simple Definition For these so that i can make anyone easily understand these ?

    Reply
  • Dony van Vliet
    March 19, 2013 11:18 pm

    Your original query does not return all future rows, but all rows from next year and beyond. If that is required you should first compute the start of next year from GETDATE() and then select rows where Date >= @StartOfNextYear. However calculating @StartOfNextYear from GETDATE() is not a trivial task.

    Reply
  • Ashish Srivastava
    May 7, 2013 11:35 am

    Thanks Sir…
    i use mostly with where clause but not understand why result set is too slow, but now i know what effects my query.

    — Ashish

    Reply
  • Can you please tell me other alternate solutions instead of using functions in where clause to improve performance

    Reply
  • Hi Dave

    Currently I am analysing an application that has various performance issues.

    One statement has the following where clause:

    WHERE (
    (
    (A2.idA3A6 = 51002) –param 1
    AND (A2.classnamekeyA4 = ‘wt.part.WTPart’) — param 2
    AND (A2.idA3A4 = A0.idA2A2)
    AND ((A2.idA3A5 = 0) OR (A2.idA3A5 IS NULL)) –param 3
    )
    AND (schemaname.WtPK_prefix(A2.value) = ‘M2M11412005’) –param 4
    )
    AND ((A0.markForDeleteA2 = 0))
    AND ((A0.latestiterationInfo = 1)) — param 5

    You can see the schemaname.WtPK_prefix scalar valued function which has the following code:

    CREATE FUNCTION WtPK_prefix(@i_target NVARCHAR(4000)) RETURNS NVARCHAR(200) WITH SCHEMABINDING
    BEGIN
    RETURN SUBSTRING(@i_target, 1, 200)
    END

    It basically just shortens the input down to a length of 200 and then performs the search.

    My first reaction was to remove the function. –> Bad idea.

    The query execution plan states that it will perform a KEY LOOKUP for A2.value if the function is used, but will perform an Index Scan when the function is omitted.

    If I execute the statement with the function, I have the following io / client statistics:

    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 58 ms, elapsed time = 58 ms.

    (1 row(s) affected)
    Table ‘WTPartMaster’. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘WTPart’. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 2, lob physical reads 0, lob read-ahead reads 0.
    Table ‘StringValue’. Scan count 2, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:
    CPU time = 15 ms, elapsed time = 11 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    Omitting the function increases the load on the SQL Server as can be seen with the following statistics:
    […AND (A2.value = ‘M2M11412005’)…]

    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 60 ms, elapsed time = 60 ms.

    (1 row(s) affected)
    Table ‘WTPartMaster’. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘WTPart’. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 2, lob physical reads 0, lob read-ahead reads 0.
    Table ‘StringValue’. Scan count 1, logical reads 614101, physical reads 5, read-ahead reads 45, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:
    CPU time = 3245 ms, elapsed time = 3267 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    Logical reads and physical reads have dramatically increased.

    Could it be that because the script with function has already been executed 250’000 times, that the cached query plan is faster than the normal SELECT without the function in the statement?

    I do realize that the actual execution plan will differ from what I am analysing, but does that explain the huge difference?

    Regards, John

    Reply
  • Nice article , but here is what am trying to do. I want all the records <=1 month. The date filed is as datetime and when am doing datediff(month,reportdate,getdtae())<1, it is not puling the correct data because of the timestamp and i used
    datediff(month,convert(date,reportdate),getdate())<=1
    How do you acheive this

    Reply

Leave a Reply