Today, I would like to share one very quick tip about how to remove bookmark lookup or RID lookup. Let us first understand Bookmark lookup or RID lookup. Please note that from SQL Server 2005 SP1 onwards, Bookmark look up is known as Key look up.
When a small number of rows are requested by a query, the SQL Server optimizer will try to use a non-clustered index on the column or columns contained in the WHERE clause to retrieve the data requested by the query. If the query requests data from columns not present in the non-clustered index, SQL Server must go back to the data pages to get the data in those columns. Even if the table has a clustered index or not, the query will still have to return to the table or clustered index to retrieve the data.
In the above scenario, if table has clustered index, it is called bookmark lookup (or key lookup); if the table does not have clustered index, but a non-clustered index, it is called RID lookup. This operation is very expensive. To optimize any query containing bookmark lookup or RID lookup, it should be removed from the execution plan to improve performance. There are two different ways to remove bookmark/RID lookup.
Before we understand these two methods, we will create sample table without clustered index and simulate RID lookup. RID Lookup is a bookmark lookup on a heap that uses a supplied row identifier (RID).
USE tempdb
GO
-- Create Table OneIndex with few columns
CREATEÂ TABLE OneIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- Insert One Hundred Thousand Records
INSERTÂ INTO OneIndex (ID,FirstName,LastName,City)
SELECTÂ TOP 100000Â ROW_NUMBER() OVER (ORDERÂ BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDERÂ BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%1000 = 1 THEN 'Las Vegas'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
Now let us run following select statement and check the execution plan.
SELECT ID, FirstName
FROM OneIndex
WHERE City = 'Las Vegas'
GO
As there is no index on table, scan is performed over the table. We will create a clustered index on the table and check the execution plan once again.
-- Create Clustered Index
CREATEÂ CLUSTEREDÂ INDEX [IX_OneIndex_ID] ON [dbo].[OneIndex]
(
[ID] ASC
) ON [PRIMARY]
GO
Now, run following select on the table once again.
SELECT ID, FirstName
FROM OneIndex
WHERE City = 'Las Vegas'
GO
It is clear from execution plan that as a clustered index is created on the table, table scan is now converted to clustered index scan. In either case, base table is completely scanned and there is no seek on the table.
Now, let us see the WHERE clause of our table. From our basic observation, if we create an index on the column that contains the clause, a performance improvement may be obtained. Let us create non-clustered index on the table and then check the execution plan.
-- Create Index on Column City As that is used in where condition
CREATEÂ NONCLUSTEREDÂ INDEX [IX_OneIndex_City] ON [dbo].[OneIndex]
(
[City] ASC
) ON [PRIMARY]
GO
After creating the non-clustered index, let us run our select statement again and check the execution plan.
SELECT ID, FirstName
FROM OneIndex
WHERE City = 'Las Vegas'
GO
As we have an index on the WHERE clause, the SQL Server query execution engine uses the non-clustered index to retrieve data from the table. However, the columns used in the SELECT clause are still not part of the index, and to display those columns, the engine will have to go to the base table again and retrieve those columns. This particular behavior is known as bookmark lookup or key lookup.
There are two different methods to resolve this issue. I have demonstrated both the methods together; however, it is recommended that you use any one of these methods for removing key lookup. I prefer Method 2.
Method 1: Creating non-clustered cover index.
In this method, we will create non-clustered index containing the columns, which are used in the SELECT statement, along with the column which is used in the WHERE clause.
CREATEÂ NONCLUSTEREDÂ INDEX [IX_OneIndex_Cover] ON [dbo].[OneIndex]
(
City, FirstName, ID
) ON [PRIMARY]
GO
Once the above non-clustered index, which covers all the columns in query, is created, let us run the following SELECT statement and check our execution plan.
SELECT ID, FirstName
FROM OneIndex
WHERE City = 'Las Vegas'
GO
From the execution plan, we can confirm that key lookup is removed the only index seek is happening. As there is no key lookup, the SQL Server query engine does not have to go to retrieve the data from data pages and it obtains all the necessary data from index itself.
Method 2: Creating an included column non-clustered index.
Here, we will create non-clustered index that also includes the columns, which are used in the SELECT statement, along with the column used in the WHERE clause. In this method, we will use new syntax introduced in SQL Server 2005. An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index.
CREATEÂ NONCLUSTEREDÂ INDEX [IX_OneIndex_Include] ON [dbo].[OneIndex]
(
City
) INCLUDE (FirstName,ID) ON [PRIMARY]
GO
From the execution plan, it is very clear that this method  also removes the key lookup as well.
In summary, Key lookup, Bookmark lookup or RID lookup reduces the performance of query, and we can improve the performance of query by using included column index or cover index.
I will cover few additional concepts related to the optimal method in another article.
Reference : Pinal Dave (https://blog.sqlauthority.com)
Related Post:
SELECT ID, FirstName
FROM OneIndex
WHERE City = 'Las Vegas'
GO
32 Comments. Leave new
No need to add ID to your nonclustered index… that’s redundant. Since you have a clustered index on ID, then ID will automatically be included in any nonclustered index you create (so that it could do the bookmark lookup).
So your nonclustered index examples would be to either create an index on (City, FirstName) or an index on (City) INCLUDE (FirstName).
–Brad
I have a table customer in which cstid of BIGINT data type with identity. As table have identity column I do not want any type of index on it. Being cstid as primary key of table, is it possible without any index?
Dear Pinal,
good post, very clear sample, but I would like to see a remark, that Key-Lookups could be in some situations, special with very wide tables, and the SELECT includes a lot or all of the columns, the much much better way. Think about a table containing 11.000.000 customers with 200 columns an the avg size of a row of 30kb. The SELECT need to return the whole row-data. Seach is done within the lastname, no leading placeholder to be able to use an Index. So an additional small index only containing the lastname and the key and then doing an Key Lookup is much much faster instead of using the index, even clustered index where every row nees severals data blocks. In case of IO we see differences in the factor of 500 times more IO.
Regards,
Michael
Hi pinal,
We have a requirement to Merge six 12millions data tables into single Table.All the tables are having duplicate Emails.So We put the Primary key for email column using EnterPrise Manager and We r trying to merge by writing the following insert query, But it’s taking more than 20hrs…:-(
Insert into Total( Email,FirstName,LastName,Address,City,State,Zipcode,Gender,DOB ,
Phone,WebAddress1,Webaddress2,IPAddress,DateTime,Interest )
Select Email,FirstName,LastName,Address,City,State,Zipcode,Gender,DOB ,
Phone,WebAddress1,Webaddress2,IPAddress,DateTime,Interest
From [Set4]
Where
(not exists(select email from Total
where Total.email=[Set4].email))
How can we do this most efficiently??? Please Help Us……
Thanks in advance…
Hi Pinal,
Nice explanation of the covering index concept. What is your take on the comment posted by Michael about the customer table situation, where one needs to return a lot of columns.
Thank you
Pinal,
Why not just remove, from the query, the column that’s causing the key lookup. This is really much simpler I’m sure you’ll agree. You’ve written about best practices for database design in the past, and taught me that fewer columns per table are always better. So if you’re getting lots of key lookups, it follows that we should split up our tables and make them smaller. This is the query optimizer’s way of telling us that we have messed up our design!
Thank you for continuing to educate us on how best to use our database.
But what if you don’t need those columns. Then you can remove them. This example proves that you shouldn’t add columns to a query that you don’t need. That’s never a best practice (at least in my experience). You revealed in another post that we should keep indexes as narrow as possible, and I have been following this rule religiously! Since starting down this path I have managed to get almost every index in our database down to a single column. So far this has worked very nicely as far as I can tell, but I’m not really sure how to collect query performance numbers. Would you mind helping me so that I can send you a report and you can show readers what great progress can be made if we follow your methods.
Hi Pinal,
I’m curious to know why you have included the ID column in the covering index, as well as in the nc index with the includes. I thought all columns in the Clustered index were already included in the non clustered index.
Is there a reason why you did that?
Thanks,
And these new notes were not just plain bank notes. ,
Awesome. Managed to knock down a query from 5-6 minutes down to 5 seconds or so.
Excellent Post!!! Thank you for the information. This helps me improve my SP from 5 seconds to 0.xx second. Thanks alot!
Hi Pinal,
Nice post, very impressive and understandable…Thank You.
Dave,
I’m confused why the best recommendation isn’t to create a clustered index. In our situation we have a 5 Billion row table without a clustered index. The RID lookups are killing us. The need for a lot of extra non-clustered indexes is also killing performance because this is on a replication server.
Finally the writes to a clustered index are dramatically less than to a non-clustered index with a HEAP.
See: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/cc917672(v=technet.10)#EFAA
How can anyone justify a large table without a clustered index?
Nick
HI Pinal,
Thanks for this post.Its very easy to understand.Good post.
Hi,
If you are saying, if table has clustered index, it is called bookmark lookup (or key lookup); if the table does not have clustered index, but a non-clustered index, it is called RID lookup then you again wrote that as we have an index on the WHERE clause, the SQL Server query execution engine uses the non-clustered index to retrieve data from the table. However, the columns used in the SELECT clause (id, firstname) are still not part of the index that means (neither cluster index nor non-cluster index has applied on it)and to display those columns, the engine will have to go to the base table again and retrieve those columns. so how we can say this particular behavior is known as bookmark lookup or key lookup
Hi Pinal,
Nice explanation, too easy to understand.
i have one question over here, let me write your code what you have used.
SELECT ID, FirstName
FROM OneIndex
WHERE City = ‘Las Vegas’
CREATE NONCLUSTERED INDEX [IX_OneIndex_Include] ON [dbo].[OneIndex]
(
City
) INCLUDE (FirstName,ID) ON [PRIMARY]
GO
As i can see in above code, you are selecting ID then FirstName columns in Select list, but while including these columns for Non Clustered Index you have not followed the same sequence/order (Included Firstname then ID)
Is this really fruitful to improve query performance?
As per my understanding, we should follow the same sequence as we are selecting the fields, same sequence should be followed while including in Index, It would be easy for SQL Server to get the data in fast way.
Regards,
Girijesh
Hi,
great article, i find it very helpful…
I have a question… what happens when the output list of a key lookup is NULL? what is it looking for?
Hello Pinal,
Very clear and informative. Understanding tuning is tricky but this post made it seem simple.
My question is, when creating the composite index, why do you specify city first, rather than ID, which is more unique.
Thanks,
Steven
Thanks Pinal for your very clear and helpful article!
Thanks. This was quite informative. I tried to apply it on one of my tables. Its a partitioned table and the where clause has to go from one side of the partition to the other side to fetch all the data required.
My query becomes something like this
select a.*
into ##TEST_RET
from RETURNS_TABLE a
join ##TEST_ID b on a.ID = b.ID and a.TODATE between b.STARTDATE and b.ENDDATE;
This RETURNS_TABLE has partitions. I tried both covered index and include index options, but none of them seem to help. Do you have any suggestions for the partitioned tables?
Thanks again!
Thanks. This was informative.