SQL SERVER – Cluster Resource ‘AGName’ of type ‘SQL Server Availability Group’ in Clustered Role ‘AGName’ Failed

I never leave my customers alone when they have an issue with something I helped them set up. Typically, I help customers create a POC and deploy AlwaysOn Availability Groups. Just the other day, while doing a Comprehensive Database Performance Health Check, I came across an error related to cluster resources.

I must admit that configuring an availability group is a piece of cake, but the real challenge comes when something breaks in the cluster. A DBA should know how to troubleshoot a Windows cluster in order to recover from such a failure.

My client contacted me and told me that, due to some issue, the SQL Server availability group was showing a “Resolving” state in SQL Server Management Studio (SSMS). When they tried to bring the resource online in Failover Cluster Manager, it failed and logged the message below in the Event Log.

Cluster resource ‘AGNAME’ of type ‘SQL Server Availability Group’ in clustered role ‘AGNAME’ failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

This message is not very useful and doesn’t tell us what needs to be done. Here is how it looks in SSMS.

[Screenshot: the availability group shown in “Resolving” state in SSMS]

When we tried to remove the database from the secondary replica, we got the error below.

The database ‘AG_DB’ failed to leave the availability group ‘AGNAME’ on the availability replica ‘DB02’. (Microsoft.SqlServer.Management.SDK.TaskForms)

The local availability replica of availability group ‘AGNAME’ cannot accept signal ‘UNJOIN_DB’ in its current replica role, ‘RESOLVING_NORMAL’, and state (configuration is in Windows Server Failover Clustering store, local availability replica has joined).  The availability replica signal is invalid given the current replica role.  When the signal is permitted based on the current role of the local availability replica, retry the operation. (Microsoft SQL Server, Error: 41121)

What is RESOLVING state in SQL Server AlwaysOn?

When an availability group resource is online in Failover Cluster Manager, each replica is in either the primary or the secondary role. Resolving is an intermediate state that a replica passes through while its role is changing from primary to secondary or vice versa. If the transition does not complete for some reason, the replica stays in the “Resolving” state. In this state, the databases in the availability group are not accessible.
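To confirm what SSMS is showing, the replica role can also be read from the HADR dynamic management views. The query below is a minimal sketch, not the exact query I ran for this client; it only assumes it is executed on the affected instance. Note that on a replica that is not primary, the view generally returns only the row for the local replica.

```sql
-- Sketch: show the role and health of each replica visible from this instance.
-- role_desc is PRIMARY, SECONDARY, or RESOLVING.
SELECT ag.name                         AS availability_group,
       ar.replica_server_name,
       ars.is_local,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
ORDER BY ag.name, ar.replica_server_name;
```

If role_desc shows RESOLVING for the local replica, that matches the state described above.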

What can we do?

First, we need to find out why the resource is not coming online. There are several logs to review.

  1. SQL Server ERRORLOG: see SQL SERVER – Where is ERRORLOG? Various Ways to Find ERRORLOG Location
  2. Cluster Log: see SQL SERVER – Steps to Generate Windows Cluster Log?
  3. Windows Event Viewer.
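For the first item in the list, the ERRORLOG can be searched from a query window instead of opening the file. This is a rough sketch using sp_readerrorlog, which is a lightly documented procedure, so treat the parameter meanings in the comments as my understanding rather than official guidance. The search strings are only examples; 'AGNAME' is the placeholder group name used in this post.

```sql
-- Sketch: search the current SQL Server ERRORLOG for availability group messages.
-- Parameters as I understand them:
--   1st = log file number (0 = current log)
--   2nd = log type (1 = SQL Server error log, 2 = SQL Server Agent log)
--   3rd = text to search for
EXEC sp_readerrorlog 0, 1, N'availability group';

-- Narrow the search to the group name (placeholder; replace with your own).
EXEC sp_readerrorlog 0, 1, N'AGNAME';
```

Compare the timestamps of the matching entries with the failure time reported in the cluster log and the Event Viewer.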

In most situations, the cluster log gives the right message and points to the cause. The causes vary from case to case, and I will keep sharing what I learn as I find more, like the post below.

SQL SERVER – Always On AG – HADRAG: Did not Find the Instance to Connect in SqlInstToNodeMap Key

WORKAROUND/SOLUTION

Depending on the error messages and the situation, we sometimes need to perform a forced failover of the availability group. See: Perform a Forced Manual Failover of an Availability Group (SQL Server)
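As a rough illustration, a forced failover in T-SQL looks like the sketch below. It uses the placeholder names from the error messages in this post (AGNAME and AG_DB), and it is not a recommendation to run it blindly: a forced failover can lose data, so treat it as a last resort after the logs have been reviewed and the cluster itself is healthy.

```sql
-- Sketch only: run on the replica that should become the new primary.
-- WARNING: this allows data loss. Use it only for disaster recovery.
ALTER AVAILABILITY GROUP [AGNAME] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- After a forced failover, data movement on the other replicas is suspended.
-- Run this on each secondary, for each database, to resume synchronization.
ALTER DATABASE [AG_DB] SET HADR RESUME;
```

Once the new primary is online, check the availability group dashboard in SSMS to confirm the databases are synchronizing again.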

If you have seen this error and were able to solve the issue, please share your solution in the comments.

Reference: Pinal Dave (https://blog.sqlauthority.com)
