SQL SERVER – AlwaysOn – Queries Waiting for HADR_AR_CRITICAL_SECTION_ENTRY


Over the past few days, several clients have contacted me about AlwaysOn-related issues, and I have been writing blog posts about them. In this post, we will learn how to fix queries waiting on HADR_AR_CRITICAL_SECTION_ENTRY.


THE SITUATION

My client was using SQL Server in a virtual environment. Due to some instability in their network infrastructure, the Windows cluster lost quorum for a few minutes and then came back. As you might know, an AlwaysOn availability group is tightly coupled with the Windows Server Failover Cluster (WSFC), so anything that happens in the cluster can also impact the availability group. That is precisely what happened here.
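If you want to see what the cluster looks like from SQL Server's point of view after such an event, the HADR cluster DMVs are a quick place to look. This is a minimal sketch and was not part of the client engagement described here; it only reads metadata and can be run on any replica.

```sql
-- Cluster name and quorum state as seen by this SQL Server instance.
-- quorum_state_desc is NORMAL_QUORUM when the cluster has regained quorum.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Each cluster member (nodes, witness) with its current state and vote count.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```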

As usual, they sent me an email, I responded with GoToMeeting details, and within a few minutes we were talking to each other. When I joined the call, this was the situation:

  1. All of our AG modification queries (removing an availability database, removing an availability replica) were stuck waiting on HADR_AR_CRITICAL_SECTION_ENTRY.
  2. We were unable to make modifications to the AG because it was in an inconsistent state, pending an update to the state of the replica.
  3. According to Microsoft Docs, this wait type “occurs when an Always On DDL statement or Windows Server Failover Clustering command is waiting for exclusive read/write access to the runtime state of the local replica of the associated availability group.”
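To confirm that you are hitting the same wait, you can list the requests currently stuck on it. This is a minimal sketch using standard DMVs; the second query shows cumulative totals for this wait type since the last restart (or since the wait statistics were last cleared).

```sql
-- Requests currently waiting on HADR_AR_CRITICAL_SECTION_ENTRY,
-- with how long they have waited and the statement being executed.
SELECT r.session_id,
       r.command,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       r.blocking_session_id,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t  -- OUTER APPLY keeps rows with no SQL handle
WHERE r.wait_type = N'HADR_AR_CRITICAL_SECTION_ENTRY';

-- Cumulative statistics for this wait type on the instance.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'HADR_AR_CRITICAL_SECTION_ENTRY';
```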

SOLUTION/WORKAROUND

Based on my search on the internet, restarting the SQL Server instance is the only way to get out of this state.

We set the AG failover mode to manual and restarted both replicas. After doing so, our secondary replica became synchronized within a few minutes, and we were able to successfully remove databases from the AG. We tested failover back and forth, and everything worked as expected.
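The steps above can be sketched in T-SQL as follows. This is a minimal sketch, not the exact script used on the client's servers: the availability group, replica, and database names ([MyAG], N'SQLNODE1', N'SQLNODE2', [MyDB]) are hypothetical placeholders, and the service restart itself happens outside T-SQL.

```sql
-- Hypothetical names: replace [MyAG], N'SQLNODE1', N'SQLNODE2' and [MyDB] with your own.

-- 1. On the primary, before restarting: switch both replicas to manual failover
--    so the cluster does not fail the AG over automatically during the restarts.
ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'SQLNODE1' WITH (FAILOVER_MODE = MANUAL);
ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'SQLNODE2' WITH (FAILOVER_MODE = MANUAL);

-- 2. Restart the SQL Server service on both replicas (done outside T-SQL).

-- 3. On the primary, after the restarts: confirm every database is synchronized again.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;

-- 4. On the primary: the AG modification that was previously stuck.
ALTER AVAILABILITY GROUP [MyAG] REMOVE DATABASE [MyDB];

-- 5. To test a planned manual failover, run this on the secondary
--    replica that should become the new primary.
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;
```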

Have you seen this wait in your environment? It would be great if you could share the cause, and how you resolved it, in the comments.

Reference: Pinal Dave (https://blog.sqlauthority.com)


3 Comments

  • Was there any data loss during the downtime? We’re also trying to set up the same in our environment.

  • We ran into this issue just today. Something as yet unknown caused quorum to be lost, and the AG eventually failed over. However, one of our databases was in “not synchronizing” on both the primary and the replica, and therefore was not available to the application. We then saw the same wait type with any action on the AG itself. Our only solution was instance restarts followed by a full reboot of the current primary. Did you end up finding a root cause? SQL 2016 SP2-GDR

  • We had a similar issue in our prod environment. We have a three-node cluster where two primaries share the same secondary. We had the issue with one of the sets, while the other set kept working throughout. The issue started with a lease timeout expiration. The first time, the cluster was able to recover on its own, but the second time, synchronization was suspended and the secondary went into the resolving state. All databases on the secondary had a red cross. Similar to what you experienced, none of the AG modification queries worked, and they kept waiting on “HADR_AR_CRITICAL_SECTION_ENTRY”.

    We are on SQL Server 2017.

