Over the past few days, several clients have contacted me about AlwaysOn-related issues, and I have been writing blog posts about them. In this blog, we will learn how to fix a wait on HADR_AR_CRITICAL_SECTION_ENTRY.
My client was running SQL Server in a virtual environment. Because of some instability in their network infrastructure, the Windows cluster lost quorum for a few minutes and then came back. As you might know, an AlwaysOn availability group is tightly coupled with the Windows Server Failover Cluster, so anything that happens in the cluster can also affect the availability group. That is precisely what happened here.
As usual, they sent me an email, I replied with GoToMeeting details, and within a few minutes we were talking. When I joined the call, they described the situation:
- All of our AG modification queries (removing an availability database, removing an availability replica) were stuck waiting on HADR_AR_CRITICAL_SECTION_ENTRY.
- We were unable to modify the AG because it was in an inconsistent state, pending an update to the state of the replica.
- According to the Microsoft docs, this wait "occurs when an Always On DDL statement or Windows Server Failover Clustering command is waiting for exclusive read/write access to the runtime state of the local replica of the associated availability group."
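To confirm which sessions are stuck on this wait, one option is to query the `sys.dm_os_waiting_tasks` DMV, joined to `sys.dm_exec_requests` to see the command each session is running. The sketch below only builds that query in Python; the helper name `build_wait_check_query` is hypothetical, and it assumes you run the query through an ODBC driver such as pyodbc that uses `?` placeholders.

```python
# Hypothetical helper for illustration; not part of SQL Server or any driver API.
from typing import Tuple

# Wait type discussed in this post.
WAIT_TYPE = "HADR_AR_CRITICAL_SECTION_ENTRY"


def build_wait_check_query(wait_type: str = WAIT_TYPE) -> Tuple[str, Tuple[str]]:
    """Return (sql, params) listing tasks currently waiting on wait_type.

    The wait type is passed as a qmark ("?") parameter, the placeholder
    style used by ODBC drivers such as pyodbc, instead of being pasted
    into the SQL text.
    """
    sql = (
        "SELECT wt.session_id, wt.wait_type, wt.wait_duration_ms,\n"
        "       wt.blocking_session_id, r.command\n"
        "FROM sys.dm_os_waiting_tasks AS wt\n"
        "LEFT JOIN sys.dm_exec_requests AS r\n"
        "       ON r.session_id = wt.session_id\n"
        "WHERE wt.wait_type = ?;"
    )
    return sql, (wait_type,)


if __name__ == "__main__":
    query, params = build_wait_check_query()
    print(query)
    print(params)
```

With pyodbc, for example, you could pass the pair to `cursor.execute(query, params)` and then call `cursor.fetchall()`; rows with a growing `wait_duration_ms` on the AG DDL sessions would match the symptom described above.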
Based on my search on the internet, restarting the SQL Server instance is the only way out of this state.
We set the AG failover mode to manual and restarted both replicas. After that, our secondary replica became synchronized within a few minutes, and we were able to remove the databases from the AG successfully. We tested failover back and forth, and everything worked as expected.
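The post does not show the exact commands used, but one common way to switch replicas to manual failover before a restart is `ALTER AVAILABILITY GROUP ... MODIFY REPLICA ON ... WITH (FAILOVER_MODE = MANUAL)`, run on the primary replica. Below is a minimal Python sketch that only generates those statements. The helper names, the AG name `AG1`, and the replica names are hypothetical placeholders, and the code neither connects to SQL Server nor restarts anything.

```python
# Hypothetical helpers: the AG and replica names passed in are placeholders
# for your own availability group and server instance names.
from typing import Iterable, List


def quote_name(name: str) -> str:
    """Bracket-quote an identifier, similar to T-SQL QUOTENAME() with default brackets."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Return value as a Unicode N'...' literal, doubling embedded quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_manual_failover_statements(ag_name: str, replicas: Iterable[str]) -> List[str]:
    """Build one ALTER AVAILABILITY GROUP statement per replica, setting manual failover."""
    return [
        f"ALTER AVAILABILITY GROUP {quote_name(ag_name)} "
        f"MODIFY REPLICA ON {quote_literal(replica)} "
        "WITH (FAILOVER_MODE = MANUAL);"
        for replica in replicas
    ]


if __name__ == "__main__":
    for stmt in build_manual_failover_statements("AG1", ["SQLNODE1", "SQLNODE2"]):
        print(stmt)
```

Generating the statements as text keeps the sketch side-effect free, so you can review them before running them on the primary replica; once the replicas are healthy again, the same pattern with `FAILOVER_MODE = AUTOMATIC` would restore the original setting.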
Have you seen this wait in your environment? It would be great if you could share the cause, and how you resolved it, in the comments.
Reference: Pinal Dave (https://blog.sqlauthority.com)