I always recall my fun days after talking to my friends and blog readers. Recently killed/rollback discussion came back when my close DBA friend called me for help. Our discussion was worth blogging.
If you are a SQL DBA or developer, I am sure you must have seen something like below:
The first question which would come to mind would be “what is killed/rollback command?” The answer is very simple – sometimes, DBA/Developer use “KILL” command to end processes that appear in a hung state. Once that happen, if SQL knows how to rollback that work, it would start “real” rollback process and would undo the work done. There are situations where SQL doesn’t know and has no control of work done of the work done and it has no control to roll back the work. These are the cases where its stays in KILLED/ROLLBACK state forever.
How would you decide which action to take? It completely depends on the action done by the SPID which was killed. There are only a few things, which you can do:
- Have patience and wait for rollback to finish. Some large operations may take a long time to rollback. Such SPID should show a change in CPU, Memory and IO column in size. sys.sysprocesses DMV (query in the next step).
- If the SPID is not showing any change in CPU, Memory and IO column then it is doing nothing. Here is the query which can tell us more about the SPID and from where the connection might have originated.
SELECT spid ,kpid ,login_time ,last_batch ,status ,hostname ,nt_username ,loginame ,hostprocess ,cpu ,memusage ,physical_io FROM sys.sysprocesses WHERE cmd = 'KILLED/ROLLBACK'
If the process is originated from different machine, then using task manager and doing end process on that machine might help. This decision has to be taken very carefully as it might be a process of a 3rd party application.
After figuring out the SPID we can run the DBCC INPUTBUFFER (<SPID>) to know what is the batch running under this session.
- Restart SQL Service if it was killed and doing nothing.
Can you reproduce it? The simplest way is to use xp_cmdshell and open notepad.exe using below command. DON’T DO THIS ON PRODUCTION
Now, it you kill this SPID it would go to KILLED/ROLLBACK state.
In summary, it is important to understand the cause. Killed/Rollback SPIDs are waiting for an event that will never happen. It could be a real “rollback” situation where SPID is rolling back and trying to get the database into a consistent state. By restarting the SQL Server service, you are postponing that rollback (or roll forward) work to the recovery step that runs before the database becomes available.
Have you ever come across such situation?
Reference: Pinal Dave (https://blog.sqlauthority.com)
I am facing the same issue, I was trying to alter a procedure, which started taking a long time, so I stopped the execution. After that we are not able to do alter/create that procedure.
I can see that process in ‘killed/rollback’ state. How do we kill it.
Kill psid gives following error
‘SPID 52: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.’
I tried restarting my machine, from where it was executed, restarted SQL server, restarted Oracle Server. but nothing seems to work.
Basically this procedure is updating Oracle tables using DB link. There is one table which is not giving any response to sql server over db link. We tried recreating Db link and also tried after dropping this table form Oracle.
We have spent days to fix it, Could you pls suggest something. Thanks
Before Killing the spid, what was the wait type?
I’ve been fighting this same problem on my SQL Server 2014 for about 4 months. I see the problem when the job is trying to use a link server to an Advantage Database and the query is using OpenQuery. I can kill the job but it will never clear and prevents other jobs from using that link server. The only solution is to reboot the server. If I’m lucky I will run a few days before another job stalls out.
How can I get the history of killed processes ?
I killed a job sometime ago and now I need to know the name of the job I killed !!
ERRRORLOG would have KILL command but not very detailed.
is there a trace flag can be used to force delete the table while it is in rollback?
No Jack. I don’t think there is such a trace flag.
Hi Pinal, I took a copy only database backup from Log shipping primary database. I restored that database backup in another server. Restoration process percentage shows 100% and completed. But that restoration command was running for long time. So. I just killed this process. Now it went to KILLED/ROLLBACK state. Restored database status shows online but it is not accessible. I am not able to do anything on this database. Could you please guide how to make this restored database accessible?
If restore is showing 100% and staying there then I would assume that this is due to huge T-log file. Please check ERRROLOG and see if recovery has started. if log is huge and there are many VLFs then you need to wait. KILL is not a solution.
Thanks Pinal. Yes. My T-log file size is 209GB with three VLFs. Third log file size alone 194GB. After the recovery process completed 100% then it’s taking the nearly 6 hours to complete the background process. Could you please suggest how to avoid this long running database refresh process?
My Source database has Log shipping, replication and CDC enabled. Size is around 1.3TB included the Log files size 209GB.
Is there any additional steps I need to do before taking the backup from source database?
209 GB is still a big size. Other than shrinking log file, nothing else comes to my mind. Is the disk OK on this server? What is the SQL @@version?
Yes. We added additional disk space. SQL Server 2008 R2 version. If we shrink the logfile on primary server, Is that creates any issue on Loshipping?
No issue with log shipping, it would shrink automatically on secondary also.
To get faster recovery times after a restore.
Try shrinking you logfiles 1 time to near zero value.
When completed, resize them to the required sizes but in batches of say 5 Gb if you need a 100Gb logfile.
I’m pretty sure, that if you run DBCC LOGINFO onto your database in the current configuration, it would show a massive ammount of records.
Those records represents the “VLF” Pinal was talking about.
A VLF is a “Virtual Log File” and they need to be read 1 by 1 in the recovery process.
The reason you have so many of them (having very long log recovery times) is most likely a small autogrowth setting on those logfiles.
When it’s exspected to get bigger logfiles, then its also wise to set autogrowth to a bigger number fixed size lige 1Gb for example.
Further: If the database is in Full Recovery mode, make sure you take sufficient logbackups, otherwise the log file will be evergrowing.
PS: I think i read you have 3 physical logfiles (your answer to the VLF question, but 3 is an impossible number for VLF)., make note that it does not serve anything to have multiple logfiles, as SQL Server works sequential in logfiles, The only reason it would make sense is in a temporary solution where you run out of space and need some extra logspace fast.
Hi Pinal, One of my database is not brought up properly. It’s in DB Startup stage (SPID:29) in very long time. When I analyzed about this and found it’s because of disk space issue. Disk space got full due to that database was not able complete it’s own system background process.
Now I don’t want to add more space instead I want to drop the database.
But while dropping the database I am getting the error message ‘User does not have permission to alter the database or database not exists or the database not in a state to allow access checks’
Could you please help me to drop the database on this stage?
Also, I cannot stop the SQL services because other databases are running on this instance.
I don’t think there is a option other than stopping SQL Service and getting rid of the handle on those database files. you are getting stopped due to recovery thread.
DO NOT RESTART SQL SERVER as that might put the entire database in recovery. That essentially means escalating from table not available for update to database not available for anything!
Do you have better suggestion?
my query is related to this blog only. I executed one query which insert record while creating table at same time ( like Select * into A from B) I did this earlier as well and it usually complete in 20-25 mins for maximum 4 Cr records. I did this activity this morning again but today it was running for more then 1.5 hour, since it was production I tried cancelling the query but again ran into loop and didn’t stopped for around hour so finally I KILL the session and since then session is in SUSPENDED state with KILL/ROLLBACK state, I checked cpu and physical io which is still increasing. Can you suggest anything here
Is values are increasing then its good. You should wait for rollback to finish.
after stopping the service that created the connection(spid), i could not delete my database(i needed to rebuild it.) I discovered I could take the database offline and the spid that would not be killed disappeared. this isn’t applicable to everyone but…
Every comment, which can help someone, is valuable. Thanks @Sean
68 ADMINISTRADORBD sa 501712688 user_transaction 2019-02-27 16:14:27.450 8 BBHQ
68 ADMINISTRADORBD sa 501712688 user_transaction 2019-02-27 16:14:27.450 2 tempdb
68 ADMINISTRADORBD sa 501712688 user_transaction 2019-02-27 16:14:27.450 7 HOVISYS
68 ADMINISTRADORBD sa 501712688 user_transaction 2019-02-27 16:14:27.450 4 msdb
Hello everybody. I have this active transactions related to a KILLED/ROLLBACK spid which is not showing any cpu, i/o change. This I believe was originated by an user who left an “begin tran” statement unnatended after a crash. We found out next day and killed the spid.
I have read about restarting the service, but I wonder if having those transactions related to different production database may lead us to a recovery state on multiple databases…
I have this situation, I have linked server procedure to update data in SQL from oracle. Some day back due to some reason I have to kill the running procedure, since then the procedure is stuck in KILLED/ROLLBACK state.
Now when I try to run the same procedure it create a new SPID and do nothing. Also I’m not able to modify the procedure because its say block by previous SPID and also not able to execute the procedure.
Do we have any other alternative apart from restarting the server ?
This just happened to me for first time on a dev server. As an act of desperation to avoid having to restart the service, I killed the process that the zombie process was blocking. Lo and behold I check and both were gone. So, depending on how others can handle the blocked process being killed too, this could be another workaround.
my bank is with simple registration locked for more than 2 days, I didn’t want to restart the service.
Is there a way to kill the process?