SQL SERVER – Soft Delete – IsDelete Column – Your Opinion

September 3, 2010

Just a day ago, I was reading the blog post of Michale J Swart. If you are a regular reader of this blog, I am sure you will be familiar with him. He is a very interesting blogger for sure. He recently wrote an article about Ten Things I hate to See in T-SQL; it was really fun, but the thing which caught my eyes was the subject of isDeleted Column. First of all, let me say that I totally agree with his view point. Let me re-produce what Michale exactly suggests.

“Deleted records aren’t deleted. Look, they’re right there!”

You sometimes see this. Records that are marked as deleted with a flag. It’s sometimes needed to support an undelete feature but it still bugs me.

It’s also confusing when looking at procedures called s_OBJECT_Delete(). Without looking at the definition, does this procedure delete the record, or just mark it as deleted?”

SQL SERVER - Soft Delete - IsDelete Column - Your Opinion recyclebin

I, in fact, encountered a similar experience very soon. I had gone for performance tuning consultation and I was reviewing a large table. I spotted one table called Orders. Naturally, the size of the table was in millions of the rows. I thought – it is fine, a table can have that many rows. Then I checked another table called customers and it had under thousand records. The question which came to my mind was how come thousand customers ordered millions of items. I asked local DBA coordinator the same question. He said oh, we just cleaned the customer table but the orders table is yet to clean – consider it as a smaller table.

I asked him how can I consider the orders table as a small table when it is too large. He said, well we have most of the column marked as isDeleted.

I just do not like this kind of design at all. I am firm believer of the architecture where only necessary data should be in single table and the useless data should be moved to an archived table. Instead of following the isDeleted column, I suggest the usage of two different tables: one with orders and another with deleted orders. In that case, you will have to maintain both the table, but in reality, it is very easy to maintain. When you write UPDATE statement to the isDeleted column, write INSERT INTO another table and DELETE it from original table. If the situation is of rollback, write another INSERT INTO and DELETE in reverse order. If you are worried about a failed transaction, wrap this code in TRANSACTION.

What are the advantages of the smaller table verses larger table in above described situations?

A smaller table is easy to maintain
Index Rebuild operations are much faster
Moving the archive data to another filegroup will reduce the load of primary filegroup (considering that all filegroups are on different system) – this will also speed up the backup as well.
Statistics will be frequently updated due to smaller size and this will be less resource intensive.
Size of the index will be smaller
Performance of the table will improve with a smaller table size.

I think we can count many other advantages. Again, this is totally my opinion and I would like to ask you question, do you have this kind of soft delete mechanism in your database? If yes, what is the usage of it and how do you handle this. This is very interesting question to me; so I may publish the result on blog post as well.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQLAuthority News – SQL Server Health Check Service – Speed UP SQL Server

SQLAuthority News – Social Media Confusion – Twitter, FaceBook, LinkedIn and Me

No results found.

48 Comments. Leave new

jonmcrawford
September 3, 2010 7:17 am
I’ve seen that sort of thing in our data warehouse, where they’re combining data from multiple clients, and therefore want to use the same structure to serve all. To “modify” the structure, they’ve added an isDeleted column which allows for a specific client to act as though some of the data is not present, without causing issues with the other clients.
However, I think this is misleading. If you know what you’re doing, then as you’ve pointed out, you just fix the design. If you don’t know what you’re doing (end user situation), then, well, you don’t know what you’re doing, and you’re not likely to be savvy enough to filter out the isDeleted column anyway. The end user will say the table has millions of rows.
In short, I vote no. (not that it matters)
Reply
Karen Lopez
September 3, 2010 7:24 am
The situations where I have seen this done for a good reason is in cases where there are mass deletes. Perhaps one needs to delete 25% of a very large table that also requires cascade deletes to even larger tables.
Marking the 25% as “to be deleted”, then processing those deletions in batches or during down times allows the workload to be moved to a time when it has the least negative impact on business processes.
Another use case would be where there is a business process that needs a Undelete function within a short timeframe.
But setting a deleted flag for one-off delete instead of a real delete when the business means delete is just not cricket.
Reply
Tyler Clendenin
September 3, 2010 8:05 am
I have to agree and disagree at the same time. While table size issues can be a constraint, flagging rows as deleted has far more advantages.
I use flags frequently because end users seem to rarely consider the ramifications before deleting a record.
When you use archive tables you then need your application to recognize that there are two different tables that store the same information. And it makes code and db maintenance more than twice as difficult.
This doesn’t even go into how do you deal with transition tables and dependent rows in other tables. Would you need to create a archive table for every table in your database? How would you write your code to take this into account? How would you maintain it?
Reply
Phani
September 3, 2010 9:32 am
Hi Pinal,
As you mentioned in specific scenario, IsDeleted doesnt make sense in applications like order processing where there will certainly be cases like data archival/data getting invalidated/obselete over a period of time.
But, In my personal exprience, this column will be definitely of use when you are defining metadata of an application. For instance, I have worked on an ERP application where in, users are asked to setup/configure their metadata for the first time when the application is deployed, and then that can be revisited over a period of time. In that, one of the customer has configured specific part of metadata and has removed an option in that by selecting DELETE option. Later after almost couple of months, he came back and asked that he wanted to revoke the option which he has deleted. As we have implemented the soft delete, and also have UI where he can make that change, we were able to do that on a button click and he was amazed to see that getting activated so fast.
Critical part of the above process is, if there are any referential integrity exists across the data model, this could have been next to impossible unless we go with a crud work around by having a series of DML operations.
So my point is IsDeleted kind of columns are very handy in tables which are used for metadata setup.
Regards,
Phani.
Reply
Marko Parkkola
September 3, 2010 10:21 am
In CRM application it’s quite normal that you have IsDeleted or IsDisabled column. Let’s take an example.
Let’s say for instance some of our employee decides to switch jobs. The employee has done numerous customer calls, deals etc. which are all stored in the CRM database. The employee has long list of contacts to customer like who to call and when and some personal info about the customer.
Now the employee leaves the job. What do we do regarding the CMR? We can’t just go and delete the employee. We’d have to delete all the related records also. If we delete all related records we loose those contact infos, sales infos, all the employee’s history would be gone. We can’t do that because we need that info for two reason, a) we need to be able to transfer that info to some other employee b) raporting.
So instead of deleting we have IsDisabled field in the Employee table which tells that “this row in this table is not valid anymore but all the records referencing this row are valid”.
Now we haven’t had situation where there would be too much data in the table to lower the performance. I think there would be, at least, two options in that case:
– Create Employee_History table where all the employee rows, which has been disabled are moved
– Use table partitioning and put all disables employee rows to a different partition than enabled ones (and use IsDisabled as partition index)
Reply
Daivagna
September 3, 2010 10:49 am
hey pinal,
i am regular reader of your blog,and may be this would be my first comment on your blog…i hope i will keep doing it…:)
BTW..thxs for the post as in my scenario i have done the same this,but in my case i need to have for types of status in my DB i.e IsActive,IsSuspended,IsLocked,IsPending but all for i am maintaining one flag only i.e 0,1,2,3 likewise i have enum at front end…so what you say..for these design also i should go for four more table and what if one more status comes??
kindly give your valuable feedback
Reply
David Ames
September 3, 2010 10:54 am
Hey Pinal ,
I agree with the concept of moving deleted orders to a DeletedOrders table to avoid the isDeleted column, however how do you deal with situation where the Orders table serves is a parent in foreign key relationships?
Eg, you have an Orders table, an a child OrderLog table, OrderWorkflowItems, OrderLedgerItems, etc?
If each of these tables has a foreign key relationship to the Orders table then you can’t move records from the Orders table to DeletedOrders table unless your child tables also contain “deletedOrderID” type columns and the OrderID on these tables becomes nullable OR you haver a DeletedOrderLog table, a DeletedOrderWorkflowItems table, DeletedOrderLedgerItems table, etc.
In my opinion, either of these options is messier than having an OrdersIsDeleted column.
I’m hoping you can enlighten me with a 3rd solution.
Dave
PS – instead of OrderIsDeleted as a bit column, I usually use an OrderDeletedDate column then have a second process that purges deleted order records a month after the deleted date. – This handles the case where the user hits the wrong button.
Dave
Reply
Sumit Thapar
September 3, 2010 11:20 am
I never use the isDeleted mechanism….it gradually increases the table length….i always prefer to use the two-table mechanism that u have specified sir….I agree with u as u said a smaller table is always easy to maintain and i believe inner joins work more efficiently in that case…
Reply
Anbalagan
September 3, 2010 11:34 am
i need some reference for sqlserver2005
Reply
John McLusky
September 3, 2010 11:48 am
It depends very much on the nature of the table and your data. I use ‘IsDeleted’ columns from time to time, typically only when the proportion of the soft-deleted rows is quite small in comparison to the table size. In those cases, the overhead of checking the Deleted column is lower than maintaining a whole other Archived/Deleted table.
In other scenarios, when the proportion of Archived rows is much higher, I will use a separate table for the reasons you mention.
Reply
Feodor
September 3, 2010 12:03 pm
The isDelete column is not as evil as it sounds. I have implemented several different solutions, depending on the system metrics and on many other factors.
Case 1: in a busy system, where the peak hours are between 12pm and 1 am, and the rest of the time is idle, I would not do anything else but mark records with a flag isDeleted during the peak times and then I will have a batch job running at night to move the records marked for deletion and physically delete them from the work table.
Think about it. The last thing you want in your busy system is to delete physical data during peak times (think about threading and parallelism, IO consumption and cache; what about fragmentation? ).
Case 2: 100% busy system at all times. Here it gets even trickier, because the above mentioned problems of bulk deletion are almost impossible at any time. Here is where I would engage SSDs and a smart mechanism to “outdate” data by duplicating it to an archive spindle drive and then removing it with conjunction with the application’s performance metrics.
Bottom line: we cannot simply disregard a technique just because it seems poor at first sight. If a technique exists and it has been in use for a while, then there must be something useful about it.
I do agree with Michael Swart, though: in his example, the biggest problem is that the procedure is named incorrectly. s_OBJECT_Delete() is a poor naming convention in regards to the next DBA which comes to look at the system. However, this does not mean yet that the implementation logic is wrong…
P.S. Great blog, Michael. You just won a new reader. Congrats! :)
Reply
Vit Kymla
September 3, 2010 12:50 pm
And what about foreign keys which refer to the record you want to “delete”? Then you are not able to move the record to another table so simple as you discuss in your post….
Reply
Wilfred van Dijk
September 3, 2010 2:03 pm
Interesting: PostGreSQL have the famous ‘VACUUM’ command to reclaim space by records marked as deleted https://www.postgresql.org/docs/7.4/sql-vacuum.html
Not exactly the sme as described in your blog, but funny to know that this RDBMS is also not physically deleting records.
Reply
Nakul Vachhrajani
September 3, 2010 4:44 pm
Oops…this is something that we just introduced! Ours was a legacy system and earlier, we only had the hard-delete concept. You delete it from the UI, it’s gone forever. However, we have now started giving soft-delete concept for new enhancements. The use of an archive table was debated I am aware, but it was later eliminated because items are not “Deleted”, we now call them “retired” – they can be reintroduced. However, this is an interesting line of thought and definitely something that one should spend a great deal of time on when working on the design.
Reply
robert cook
September 3, 2010 6:11 pm
Dave, at a previous employer we would add not both a deleted date field and modified date field to essentially every table. The tables rarely grew to large sizes so we did not encounter any serious performance problems. The reasoning for the fields given was recoverability. My translation is that the company had been burned before by bad code or careless end users. For highly critical data an “undo” capability may be worth the overhead which you detailed in your post. Also, partitioning could help with performance. The thing I most disliked about the practice was the requirement to add a where clause to every SQL statement to ensure the “deleted” row was always ignored. On more than one occasion a reporting discrepancy was caused because this filter was missing. Thank you for the great content and I look forward to your follow-up post.
Reply
Michael J Swart
September 3, 2010 6:49 pm
Thank you for the kind words.
You explained everything I dislike about this lingering data. You put into words the things I was feeling when I included that item in the list of things I hate.
Thanks again!
Michael
Reply
Paulo
September 3, 2010 7:21 pm
Hi Dave,
There is inherent problem with the backup table you propose. In the example you mentioned, it seamed a very simple situation and the history table would aply nicely. But in some more complex scenarios, you have one row that is referenced by a couple of table, and those table have more references in other table and so on. Some situations in the previous company that I worked at, had about 10 table directly or indirectly associated with one row that had to be deleted. Should I make a complex procedure that moves the row tree of data to a bunch of backup tables?
What I do in this situation is to just flag the rows that have to be removed, and index the flag column (to have good performance accessing the active rows) and create a view that only shows the active row. I use the view in all situations accept when I explicitly have to access the history rows or the flag column.
Reply
JianHou
September 3, 2010 7:47 pm
We do have this mechanism in my company’s primary database, that was why this post caught my attention. As a matter of fact, there is an IsDeleted column in almost every table! So yes, nothing ever gets deleted in the database when it is deleted from the application.
So far we have done nothing as the database size is still small (<100gb) but I suspect we will have to deal with this soon…
Reply
Sandeep
September 3, 2010 7:50 pm
gr8… u opened windows of my mind :) thanks…
Reply
R A Valencourt
September 3, 2010 8:05 pm
Is this not an ideal situation for implementing a partitioned table? I don’t recall the details of how Microsoft has implemented this concept, but…
…if membership in a partition be determined by column value… If a column called Status be defined, with certain values to be in the “active” partition, and other values to determine membership in the “archive” partition, then would the appropriate change of status value cause the record to be transferred automatically to the archive partition?
Reply