This is the third article in the series which I am writing on Persisted Columns. I suggest you read following two article first before continuing on this article.
This is the list of the all the articles in the series of computed column.
SQL SERVER – Computed Column – PERSISTED and Storage
This article talks about how computed columns are created and why they take more storage space than before.
SQL SERVER – Computed Column – PERSISTED and Performance
This article talks about how PERSISTED columns give better performance than non-persisted columns.
SQL SERVER – Computed Column – PERSISTED and Performance – Part 2
This article talks about how non-persisted columns give better performance than PERSISTED columns.
SQL SERVER – Computed Column and Performance – Part 3
This article talks about how Index improves the performance of Computed Columns.
SQL SERVER – Computed Column – PERSISTED and Storage – Part 2
This article talks about how creating index on computed column does not grow the row length of table.
SQL SERVER – Computed Columns – Index and Performance
This article summarized all the articles related to computed columns.
In this article we will see the other side, how persisted column can be expensive as well. Please note, this is not always the case, but it is the other side of the story of the persisted columns.
In earlier article we noticed that when computation is complex, computed columns are expensive to be created at run time. However, in this case we have relatively simpler example and while comparing the cost of the computed column with persisted data is less than non-persisted data. Let us quickly examine the example.
USE tempdb GO -- Create Table IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[CompCol]') AND TYPE IN (N'U')) DROP TABLE [dbo].[CompCol] GO IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[CompCol_P]') AND TYPE IN (N'U')) DROP TABLE [dbo].[CompCol_P] GO CREATE TABLE CompCol (ID INT, FirstName VARCHAR(100), LastName CHAR(800)) GO CREATE TABLE CompCol_P (ID INT, FirstName VARCHAR(100), LastName CHAR(800)) GO -- Insert One Hundred Thousand Records INSERT INTO CompCol (ID,FirstName,LastName) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO INSERT INTO CompCol_P (ID,FirstName,LastName) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Add Computed Column ALTER TABLE dbo.CompCol ADD FullName AS (FirstName+' '+LastName) GO -- Add Computed Column PERSISTED ALTER TABLE dbo.CompCol_P ADD FullName_P AS (FirstName+' '+LastName) PERSISTED GO -- Select Comparision SELECT FullName FROM dbo.CompCol WHERE FullName = 'Bob Smith' GO SELECT FullName_P FROM dbo.CompCol_P WHERE FullName_P = 'Bob Smith' GO -- Clean up Database DROP TABLE CompCol DROP TABLE CompCol_P GO
Let us examine the resultset where we are comparing two SELECT statements.
It is clear from above example that here column which are persisted are more expensive. In earlier example here we have seen column which was persisted was less expensive. I suggest that you consider different parameters like size of the column, computation logic, number of the rows in consideration before marking the column as persisted.
Reference: Pinal Dave (https://blog.sqlauthority.com)
13 Comments. Leave new
Hello,
Interesting article. Never thought that a persisted column could get a behavior like this. I have tried the example and looked at it with statistics io on and statistics profile on.
When i look at the results, the first query is using 1112 logial reads, the second one, with the persisted columns is using 8062 logical reads.
with an index on the computed column ‘Firstname’ i get a result of both 626 logical reads and even the outcome of the profile is exactly the same..
With this in mind, isn’t it usefull to always use an index on a computed column?
With persisted computed columns you shift performance from selects to inserts/updates. If you add index to the column you shift it a bit more. If you are not having performance issues with inserts/updates then it is propably wise to index the column.
But this is weird. Let’s say we create a table with computed, non-persisted, column in it.
CREATE TABLE T (a INT PRIMARY KEY, b AS a * 2)
Then we create nonclustered covering index.
CREATE NONCLUSTERED INDEX IX_T ON T(a) INCLUDE(b)
Now how is this possible? If T.b is not persisted, how it can be used as non-key column in covering index?
I mean that to be used as non-key column it means that the value of the column is copied into the tempdb and is fetched from there (if index is used in the query). But how can it be copied into the tempdb when it has no persisted value?
Unfortunately I can’t find any info from the MSDN about this.
Thanks for your whole effort and tips in this blog. This is one of your least interesting series of articles though: “It can be black on white, investigate” :)
black or white
Interesting article on Computed columns. Did not expect that persisted columns can be more expensive.
Thanks. I do find it interesting. I have a few tables that use computed columns. I may have to revisit some of them to see which could benefit from being persisted.
All articles based on computed columns and performance are really very very interesting. If possible can you please write a post or comment on when to use it and when not to use. I mean to say what is the best case in which we should use computed columns. :-).
I have one example where Pinal helped me to tackle some performance issues.
Let’s say we have tables Foo, Bar1 and Bar1 as follows.
CREATE TABLE Foo(id1 INT PRIMARY KEY, id2 INT)
CREATE TABLE Bar1(id INT PRIMARY KEY FooId INT)
CREATE TABLE Bar2(id INT PRIMARY KEY FooId INT)
Now in order to match Foo and Bar tables we have to do outer joins as follows.
SELECT *
FROM Foo f
LEFT JOIN Bar1 b1 ON b1.FooId = ISNULL(f.id2, f.id1)
LEFT JOIN Bar2 b2 ON b2.FooId = ISNULL(f.id2, f.id1)
Now, please, don’t ask why we did things this way. I really don’t know and I think it’s horrible :)
But anyway. Pinal solved our performance issue with persisted computed column with index.
ALTER TABLE Foo ADD CompId AS ISNULL(id2, id1)
CREATE NONCLUSTERED INDEX IX_Foo_Comp ON Foo (CompId)
And now we can join tables much more efficiently.
SELECT *
FROM Foo f
LEFT JOIN Bar1 b1 ON b1.FooId = f.CompId
LEFT JOIN Bar2 b2 ON b2.FooId = f.CompId
The ONLY reason that it estimates 31%:69% between the two queries is because the second table (CompCol_P) has a larger rowsize. It has an extra VARCHAR(901) column in it (because of the PERSISTED column), so it takes up more space on disk, so there are more pages to process, so there’s a higher I/O cost.
If you added a new column to CompCol and populate it so that it is just like CompCol_P…
ALTER TABLE CompCol ADD Junk VARCHAR(901)
GO
UPDATE CompCol
SET Junk = FirstName+’ ‘+LastName
GO
And then compared the two queries again:
SELECT FullName
FROM dbo.CompCol
WHERE FullName = ‘Bob Smith’
GO
SELECT FullName_P
FROM dbo.CompCol_P
WHERE FullName_P = ‘Bob Smith’
GO
Now they are 50%:50%.
So the fact that a persistent column exists in the CompCol_P table is only INDIRECTLY responsible for making the second query more expensive… just because it makes the table take up more space.
That’s all.
–Brad
Exactly! I believe that’s correct. it’s total bytes of I/O
check out this experiment I did:
if object_id(‘ContactList’) is not null
drop table ContactList;
go
create table ContactList (
RecordId int identity primary key,
FirstName char(25) not null,
LastName char(25) not null,
filler char(50),
FullName as (LastName + ‘, ‘ + FirstName) –persisted
);
go
if object_id(‘ContactList_P’) is not null
drop table ContactList_P;
go
create table ContactList_P (
RecordId int identity primary key,
FirstName char(25) not null,
LastName char(25) not null,
FullName as (LastName + FirstName) persisted
);
go
Create nonclustered Index IX_CListFullName on ContactList (FullName);
go
Create nonclustered Index IX_CListFullName on ContactList_P (FullName);
go
set nocount on;
insert into ContactList
values (‘John’, ‘Smith’, space(35))
, (‘Jane’, ‘Doe’, space(35)), (‘Sally’, ‘Lane’, space(35)), (‘Kimberly’, ‘Smithstone’, space(35))
insert into ContactList_P
values (‘John’, ‘Smith’)
, (‘Jane’, ‘Doe’), (‘Sally’, ‘Lane’), (‘Kimberly’, ‘Smithstone’)
go 10000
set statistics io on; –set statistics io off;
select count(*) from ContactList; select count(*) from ContactList_P
exec sp_spaceused ContactList; exec sp_spaceused ContactList_P
select FirstName, LastName ,FullName
from ContactList
go
select FirstName, LastName ,FullName
from ContactList_P
go
It produces identical logical reads for persisted AND non-persisted when adding a filler column to compare apples with apples.
so I’d add that unless you have large result-sets, the results are trivial and a moot point in terms of I/O reads.
The key lesson learned from this article is to make sure you have an index on your computed columns