Have you even been in a situation where system processes are taking CPU and not moving ahead? Recently, one of my clients showed me a system where checkpoint on the user database being blocked indefinitely by ghost cleanup.
Since I have never seen that behavior earlier, I decided to dig little deeper. I asked them to share SQL Server ERRORLOG files. I found below information in those files.
Error: 845, Severity: 17, State: 1.
A time-out occurred while waiting for buffer latch — type 3, bp 0000000152CE2980, page 10:4240576, stat 0x5c00008, database id: 6, allocation unit Id: 72057595261550592, task 0x00000000092125C8 : 0, waittime 300, flags 0x3a, owning task 0x000000000424A868. Not continuing to wait.
Whenever I see some strange error like above, I first try to make sure that there is no corruption in the database, so I asked them to run DBCC CHECKDB on DBID 10, as mentioned in the above error message. It came back with the following lines at the end of the output
CHECKDB found 0 allocation errors and 40 consistency errors in database ‘iFLUXDB’.
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (iFLUXDB).
In my client’s case, it was caused due to corruption in the database. If we ran DBCC CHECKDB with a repair option provided. It was a sheer luck that repair_rebuild was suggested and we were safe to run it without causing data loss.
If you have same situation due to corruption, then evaluate the possibility of fixing corruption because you might get repair_allow_data_loss as minimum repair level. Restore from backup is always a preferred way as compare to repairing with data loss.
Note: If you have no experience with fixing corruption, I suggest you contact expert rather than attempting this yourself as there are chances to loss data.
Reference: Pinal Dave (http://blog.SQLAuthority.com)