SQL SERVER – Soft Delete – IsDelete Column – Your Opinion

September 3, 2010

Just a day ago, I was reading the blog post of Michale J Swart. If you are a regular reader of this blog, I am sure you will be familiar with him. He is a very interesting blogger for sure. He recently wrote an article about Ten Things I hate to See in T-SQL; it was really fun, but the thing which caught my eyes was the subject of isDeleted Column. First of all, let me say that I totally agree with his view point. Let me re-produce what Michale exactly suggests.

“Deleted records aren’t deleted. Look, they’re right there!”

You sometimes see this. Records that are marked as deleted with a flag. It’s sometimes needed to support an undelete feature but it still bugs me.

It’s also confusing when looking at procedures called s_OBJECT_Delete(). Without looking at the definition, does this procedure delete the record, or just mark it as deleted?”

SQL SERVER - Soft Delete - IsDelete Column - Your Opinion recyclebin

I, in fact, encountered a similar experience very soon. I had gone for performance tuning consultation and I was reviewing a large table. I spotted one table called Orders. Naturally, the size of the table was in millions of the rows. I thought – it is fine, a table can have that many rows. Then I checked another table called customers and it had under thousand records. The question which came to my mind was how come thousand customers ordered millions of items. I asked local DBA coordinator the same question. He said oh, we just cleaned the customer table but the orders table is yet to clean – consider it as a smaller table.

I asked him how can I consider the orders table as a small table when it is too large. He said, well we have most of the column marked as isDeleted.

I just do not like this kind of design at all. I am firm believer of the architecture where only necessary data should be in single table and the useless data should be moved to an archived table. Instead of following the isDeleted column, I suggest the usage of two different tables: one with orders and another with deleted orders. In that case, you will have to maintain both the table, but in reality, it is very easy to maintain. When you write UPDATE statement to the isDeleted column, write INSERT INTO another table and DELETE it from original table. If the situation is of rollback, write another INSERT INTO and DELETE in reverse order. If you are worried about a failed transaction, wrap this code in TRANSACTION.

What are the advantages of the smaller table verses larger table in above described situations?

A smaller table is easy to maintain
Index Rebuild operations are much faster
Moving the archive data to another filegroup will reduce the load of primary filegroup (considering that all filegroups are on different system) – this will also speed up the backup as well.
Statistics will be frequently updated due to smaller size and this will be less resource intensive.
Size of the index will be smaller
Performance of the table will improve with a smaller table size.

I think we can count many other advantages. Again, this is totally my opinion and I would like to ask you question, do you have this kind of soft delete mechanism in your database? If yes, what is the usage of it and how do you handle this. This is very interesting question to me; so I may publish the result on blog post as well.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQLAuthority News – SQL Server Health Check Service – Speed UP SQL Server

SQLAuthority News – Social Media Confusion – Twitter, FaceBook, LinkedIn and Me

No results found.

48 Comments. Leave new

Husain
September 3, 2010 8:44 pm
I must confess that I have designed a database with an IsDeleted column. My question is: how would deleting a record from the Orders table and inserting it into an ArchivedOrders table work if another table has a foreign key reference on the OrderID column of the Orders table? Assume that OrderID is an autoincrement column.
Reply
Malathi
September 3, 2010 11:11 pm
I guess what is appropriate depends strongly on application requirements and what ‘delete’ means to a developer, user and a DBA can be 3 completely different things. For a user it means data does not appear when he looks at it, for a developer it is gone from his query results and for a DBA it is about where it is gone physically. There may also be reporting situations particularly in high compliance companies that need ‘deleted’ data, businesses that have lot of turnover such as call centers have lot of careless employees who do accidental deletes.In some situations it may be appropriate to keep it in same table with a flag, some situations to archive it to a different table or in some situations to dump it all completely. It might be appropriate for the team to get together and decide what is appropriate for the business and implement correctly.
As a DBA i definitely prefer a flag to restoring a huge databse for someone who did a careless delete!!
Reply
Jeff
September 6, 2010 7:13 am
Our table structures contain a DeleteDate field in MOST tables.. for our large CRM-like SAAS database, it is crucial that we do NOT physically delete records, as this will destroy our billing history for our customers.
Each DeleteDate field is a SmallDateTime, defaulted to 6/1/2079 (the max value on a smalldatetime is roughly at 6/6/2079 at 11:59AM) and we filter all records in our queries as having a DeleteDate > GetUtcDate() With a small, simple index on the deletedate… we can accurately and quickly filter these records out.
As someone else had suggested, once our database grows large enough, we have already planned to partition these records away from the bulk of the table data.
With more than several million records in some tables using this structure, we are still not seeing any significant performance hit.
Reply
Kenny Eliasson
September 6, 2010 2:31 pm
I more have a problem with the word “IsDeleted”. It’s not deleted, it’s “discontinued”, “quitted” or something with a more fitting name.
Reply
Ramdas
September 7, 2010 8:46 am
In certain systems the records cannot be physically deleted due to the business rules which govern them. We design tables to achieve like a Isdeleted kind of functionality. So i feel it depends on business needs where in in some situations soft delete kind of methods have to be employed.
Reply
Simon
September 7, 2010 1:33 pm
I think the reason for isDeleted column is simple. You have orders linked to a price list table, then after new year they create a new price list and delete the old one (since it is no longer used). They go home happy and come back the next day…
Suddenly all of last years orders have “disappeared” from the system, mass panic ensues everywhere.
The orders table query is no longer able to join orders with an Id in the price list table, so those orders are not returned by the query. Since a system will have many changes and additions over its lifetime it is hard to know what records you really can remove from the table (archive) and which ones have to be their because they are part of a historic relationship (i.e. you have to know the changing application design to make that call).
So you fix that kind of problem forever by telling the developers to never delete a record just mark it as deleted, and even novice developers can follow those instructions.
Basically “Delete” to the user really means “Hide” and isDelete solves this problem. It would be possible to automatically archive “mistakes” by identifying records with no relationship to any other table.
Reply
- jonmcrawford
  September 7, 2010 6:31 pm
  Simon, wouldn’t that situation make more sense to have an effective/termdate on the price? Once you have more than one cycle of a new price, you’d surely want to know which old price belonged to which timeframe? Otherwise you’ll end up with multiple prices with isDeleted=TRUE, and couldn’t tell if you raised or lowered the price from the original.
  Reply
  - Jeff
    September 7, 2010 8:37 pm
    Jon,
    I completely agree with you on the effective dating of the price list. I would normalize the price to a separate table such that there is NO price in the products table.. and the price is stored with a foreign key to the Product, a start and end date range.
    However, I do have this SAME scenario… and my design is to copy the price extended for a product into my OrderLineItem table at the time of checkout. In this way, if the Price changes in the product table, the orders are not effected.
Greg
September 8, 2010 6:34 am
For the idea of an extra “deleted” table to match each data table, you could use a trigger on each of the tables to automatically insert the deleted record into its corresponding “deleted” table, instead of having to worry about putting that logic into your stored procedures. This could also be implemented without the application even knowing about it.
This seems to work for cascaded deletes as well, as long as the foreign key “deleted” tables don’t try to maintain referential integrity with their primary key “deleted” table (as the cascade operation deletes rows in the foreign key tables first, meaning that the foreign key records would be inserted into their “deleted” tables before their associated primary key record was inserted into its “deleted” table).
“Un-deleting” would of course then be the problem, in which you would definitely need some application logic to perform that task.
Implementing the ‘deleted’ column and having to remember to include the ‘where deleted = 0’ in each query is still much easier and less time consuming to implement than an extra table for everything in your database (and having to maintain those tables if your schema changes). A few unit tests along with some simple test data could verify that your queries are correct as well (which you should most likely implement anyway). Not to mention, it is very easy to un-delete records, even if they have many foreign key relationships.
But if you’re having a performance problem and need data out of a large table, perhaps some triggers are the way to go instead of messing around with your application code, and having to possibly write many statements in many different procedures. You should probably write a few “un-delete” procedures for when the need for that arises, though.
Reply
- Marko Parkkola
  September 8, 2010 10:54 am
  “But if you’re having a performance problem and need data out of a large table, perhaps some triggers are the way to go”
  I’m not really sure about that. Triggers may cause more performance problems if table is under heavy updates/inserts/deletes.
  If you need to actually delete the rows and there’s performance problems I would go with what Feodor explained above. Mark rows as “deleted” and run the actual deletion at scheduled times.
  Reply
  - Greg
    September 8, 2010 10:10 pm
    Ah, yes, sorry if I wasn’t clear, I was speaking along the lines of automatically maintaining the table(s) for future read performance, while still maintaining the ability to un-delete.
    But the case that was discussed of using a ‘deleted’ column in a highly transactional table, and then performing the actual deletes later is definitely another valid one.
Doug Rathbone
September 10, 2010 5:20 am
While your post was interesting reading, this sounds a lot like jumping to the conclusion of “optimising too early”.
I am looking at it from a developers view only (not a DBA or SYSADMIN’s) and keeping things in a single table is a simpler thing to write and manage (from a code level) and leads to faster development time. If it becomes an issue long term performance wise, THEN move the data into a second table – not simply because it is a DBA’s prerogative to have everything “perfect”.
Too often as people in IT progress in their trade do they appear to build castles in the sky without real world reasoning to back it up. Developers fall into this trap as well with over-architecting. I believe that solutions should be aimed at solving the problems they are designed too.
Either way you have a new subscriber :-)
Reply
- jonmcrawford
  September 10, 2010 6:17 pm
  This is why DBA’s hate us developers. We shouldn’t be saying we should only optimize once it’s become a problem, we should be avoiding problems to begin with. Just because it’s easier on one end of the process doesn’t mean it’s the right thing to do.
  Reply
  - DaveCWright
    July 10, 2018 12:02 am
    The question, Jon, is *what is a problem*? It is not always obvious at the outset where or what the problems will ultimately be. There are essentially two approaches – the shotgun approach or the sniper approach. With the shotgun approach you *might* get everything, but most of them will never be a problem in the first place (so you wasted time and effort) – and there is still the possibility that after all of that effort, the only thing that ultimately *does* present as a problem went unforeseen and thus unaddressed. With the sniper approach, you only address known problems. The latter is far more efficient, but ultimately requires that you wait until something presents as a legitimate issue. The downside is that after the fact, you’re open to criticism that you should have fixed the problem in the first place, but, of course hindsight is always 20/20.
Chinwendu
December 6, 2010 5:33 pm
I think it all depends on the business process and the type of application you find yourself in. But from a users point of view, delete means i dont need it anymore or dont want to see it again, DBA always want to be able to recover records but at whose expense e.g if i delete one or two mails from my box why does yahoo want to keep it for me, but from a sales application if he/she deletes a sales record, it is important for the business owner to know where deleted records go to and why they where deleted.
But moving the deleted records to another table is a whole waste of time and energy for developers to code with that mentality.
in summary you business process determines alot and your foresight as a solution provider is very needful in that effect.
Reply
Pablo Gimenez
January 4, 2011 1:04 am
I just found this article and I think it’s a great discussion. It was really helpful to me.
I only want to make a small contribution to the thread: in the case you decided to use the IsDeleted, isActive or IsWhatever flag, to avoid having to add the isDeleted=0 in all your queries one solution that we found is to create views exposing only active fields.
So you may have a view vwActiveOrders (select * from orders where isDeleted = 0) which you can use in your queries, to join it to other tables, etc without having to remember to add isDeleted = 0.
Hope it helps,
Pablo
Reply
Santosh Bhanutej
March 15, 2012 11:25 am
Hello Guys,
We have implemented this soft delete mechanism in our data warehouse. We are getting few invalid records from source which should not be visible to the end users.
But we have one more component which makes referene of all the records in our datawarehouse. Hence we should not hard delete these invalid records. Hence we have introduced a flag which specifies the validity of the flag.
Thanks,
Santosh.
Reply
sudheesh
April 27, 2012 4:11 pm
Hi
am developing one application for vehicle tracking system. one service connected with thousands of vehicles . each vehicle sending data in 10 seconds delay. am storing values in one table called msglog and events stored in another table msgevnts. how can we improve the performance of the database ?. every month am getting more than millions of data in to that tables so please advice me
Reply
mxr
May 18, 2012 9:39 pm
Nice discussion!
Someone at my work suggested removing all the constraints on the table to allow quicker deletion. Any thoughts on this?
Reply
tiger33
January 27, 2013 4:01 pm
Yep i’m also finding using the flag of create benefit.
sometimes users make mistakes and dont mean to delete
also in the case of history you want the user to be able to mark a record as no longer in user but still show the previous history for what happens .
we also have procedures in place which run hard deletes (with tombstone tables to keep history) at set periods of time to help clean up the database.
this to me is the best approach – mostly because we do have constraints on the tables and deleting first hand would fail as we dont use cascades (cascading deletes is just lazy if you ask me) you want to be fully warned if a delete would cause relationships to be invalid so thats what we do – attempt a delete with a reference then the system warns you to delete with it first before continuing with the delete.
Its more work to build the system like this but worth it in the long run.
Reply
tiger33
January 27, 2013 4:02 pm
Yep i’m also finding using the flag of create benefit.
sometimes users make mistakes and dont mean to delete
also in the case of history you want the user to be able to mark a record as no longer in user but still show the previous history for what happens .
we also have procedures in place which run hard deletes (with tombstone tables to keep history) at set periods of time to help clean up the database.
this to me is the best approach – mostly because we do have constraints on the tables and deleting first hand would fail as we dont use cascades (cascading deletes is just lazy if you ask me) you want to be fully warned if a delete would cause relationships to be invalid so thats what we do – attempt a delete with a reference then the system warns you to delete with it first before continuing with the delete.
Its more work to build the system like this but worth it in the long run.
Reply
Ravi Sharma
July 25, 2013 10:23 am
Hi Pinal sir,
This is the very first comment I have on your blog, I have been reading your blog from pas few months and I found it knowledgeable and interesting.
Now about the soft delete We had Isdeleted column as flag for deleted record in past but We introduced terminatedate concept for soft delete in our database 2 year ago and It helps us a lot in keeping the audit with dates. We used “create date” and “terminate date” as column in tables and we also had “create userid” and “terminate userid” to identify who created and terminated the record. With every update we terminate the previous record and create new record with changes. We easily keep and move the data with the help of partitioning and with scheduled jobs.
Thanks,
Ravi
Reply
Garrett
August 20, 2013 5:19 am
Why not use table partitioning or a filtered index instead? You can’t always just delete data, if there are foreign key constraints things can get nasty pretty quick.
Reply
Karthik
August 3, 2015 11:47 pm
Hi Pinal,
How about creating filtered index for IsDeleted =0 records, instead of thinking about changing the design.
Reply
Rob Horrocks
December 2, 2015 3:22 pm
Has anyone ever thought about using partitioning on the IsDeleted column? As data is marked as deleted it will be physically moved from one partition to another (only a gradual data movement) but this leaves you with an option to only query data in the non deleted partition using $partition, or switching out deleted data for mass deletes without fragmenting the index or causing blocking.
The process used to mark data as deleted could be a simple update statement, or a frequently running process that archives data and marks as deleted once they have gone over a date threshold. This process could then be used to mark records as deleted (archived) from several related tables at the same time ensuring related data is also deleted (archive).
The initial data movement would cause a bit of fragmentation in the primary partition but apart from that I can’t see any other issues. I’d just be interested to see if anyone else has every thought of this approach and whether it caused any problems?
It’s not always simple to archive data just on date alone so I’m looking for ways to do this while maintaining good performance on an OLTP server.
Reply
Rama G
November 6, 2018 2:25 pm
I am using isDeleted Flag to achieve the soft delete. Before setting the flag IsDeleted = 1, i want to check whether this record is referenced by other tables. Is there any way to achieve this
Reply