SQL SERVER – The Lease between Availability Group ‘PRODAG’ and the Windows Server Failover Cluster has Expired

March 27, 2017

This is one of the common errors I have seen while working with customers who are using SQL Server AlwaysOn availability groups. Once this error comes, the resource in the cluster goes to failed state and in SQL Server Management Studio, we should be an availability group in resolving state. Resolving state essentially means that the role of availability group is neither primary nor secondary.

SQL SERVER - The Lease between Availability Group 'PRODAG' and the Windows Server Failover Cluster has Expired AlwaysOn-800x201

Here is the snippet from ERRORLOG when the lease expires. I have tried to explain the meaning of each line.

2017-02-27 19:31:07.34 Server Error: 19407, Severity: 16, State: 1.
2017-02-27 19:31:07.34 Server The lease between availability group ‘PRODAG’ and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

Above message means the lease between windows cluster and SQL Server.

2017-02-27 19:31:07.34 Server AlwaysOn: The local replica of availability group ‘PRODAG’ is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required.

Due to lease renewal failure, the AG resource in the cluster would go to failed state.

2017-02-27 19:31:07.34 Server The state of the local availability replica in availability group ‘PRODAG’ has changed from ‘PRIMARY_NORMAL’ to ‘RESOLVING_NORMAL’. The state changed because the lease between the local availability replica and Windows Server Failover Clustering (WSFC) has expired. For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.

Since AG is failed, the AG state in SQL Server would change from PRIMARY to RESOLVING_NORMAL. At the same time, on secondary, we should see state from SECONDARY_NORMAL to RESOLVING_NORMAL

Now the next challenge would be to find WHY lease was expired.

SOLUTION/WORKAROUND

Based on my research on the internet, I found that in most of the cases, the lease gets expired due to shortage of resources on the machine. You can think of this as a “momentarily hang” of windows operations. Generally, we should look at the cause of slowness. The client with whom I worked, I could see tons of IO related messages like below.

2017-02-27 19:23:26.09 spid35s SQL Server has encountered 244 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [M:\MSSQL\SAM_PRODUCTION.MDF] in database id 21. The OS file handle is 0x0000000000001B44. The offset of the latest long I/O is: 0x00004fcfb76000

I have explained above issue in below blog SQL SERVER – WARNING – SQL Server Has Encountered N Occurrence(s) of I/O Requests Taking Longer Than 15 Seconds

We also saw below, which again points to IO slowness.

2017-02-27 19:31:05.09 spid18s average writes per second: 1225.28 writes/sec
average throughput: 86.19 MB/sec, I/O saturation: 149511, context switches 190629
2017-02-27 19:31:05.09 spid18s last target outstanding: 278, avgWriteLatency 38

Till you find the actual cause, you can increase LeaseTimeout value so that the AG is remaining healthy

But remember that we have not fixed the issue, but applied band-aid.

Here are few more things to do.

Limit Max Server Memory of SQL Server, if not capped.
Consult your storage team for storage performance issues, since we see many IO stalled messages.
Enable lock pages in memory. This will prevent work set trimming and prevent it from being paged out. Please refer to the below link https://msdn.microsoft.com/en-IN/library/ms190730.aspx

Have you found something more than above?

Reference: Pinal Dave (http://blog.SQLAuthority.com)

AlwaysOn, SQL Error Messages, SQL Server, SQL Server Cluster

SQL SERVER – Could Not Use View or Function ‘msdb.dbo.sysdac_instances’ Because of Binding Errors

SQL SERVER – Huge Size of Table catalog.execution_component_phases in SSISDB

4 Comments. Leave new

Anil
March 27, 2017 4:51 pm
There are different opinions on enabling lock pages in memory (LPIM), setting MIN Server Memory also makes sense.
Thank you Pinal !!
Reply
- Pinal Dave
  March 29, 2017 7:48 am
  Thanks Anil.
  Reply
James Hannon
January 6, 2021 5:47 am
This actually doesn’t have a specific relation to poor memory/storage performance.
The key is in the ‘lease renewal’ message indicated in the Log. This indicates the Client Access Point was unable to renew it’s lease on the DNS A record it depends on outside of the Failover Cluster.
As a workaround, manually create the DNS A record, assigning the Cluster Name Object Full control in DNS against it. For good measure, run through a non-Storage resource Cluster Validation afterwards.
Reply
ema
November 26, 2024 3:51 am
how much lease timeout to increase? also the health check would be increased accordingly right?
Reply

July 2025 Discount: Comprehensive Database Performance Health Check | Testimonials

SQL SERVER – The Lease between Availability Group ‘PRODAG’ and the Windows Server Failover Cluster has Expired

SOLUTION/WORKAROUND

Related Posts

SQL SERVER – Database Backup and Restore Job Management

SQL SERVER – FIX: Error: 1807 Could not obtain exclusive lock on database ‘model’. Retry the operation later.

How to Show Results of sp_spaceused in a Single Result? – Interview Question of the Week #135

4 Comments. Leave new

Leave a ReplyCancel reply