When I was on-site for the performance tuning workshop, few DBAs suddenly got call as there was an unexpected downtime of SQL Server. As per them, after some maintenance activities were scheduled, the SQL Server resource failed to come online on both nodes. When they try to bring it online, it remains in online pending state for some time before failing eventually. Here are the things I tried. Let us see an error related to clustered SQL Resource not coming online.
- We tried starting it from services.msc and SQL Server service could start successfully.
- When we checked SQL Server ERRORLOG, and found no errors when bringing online in a cluster.
- As a last resort, I generated cluster log and found the below SQL SERVER – Steps to Generate Windows Cluster Log?
- As per above we can see that the cluster service was not to connect to SQL Server
- The error was Login failed for user ‘PRODUCTION\SQLNODE1$’
- This means the account; ‘PRODUCTION\NODE1$’ did not have login permission on SQL Server. This is actually the local machine account (NODE1 was the machine name).
We started SQL Server from services console and gave the sysadmin server role of the LocalSystem (NT AUTHORITY\SYSTEM) account. Here was the command
ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT AUTHORITY\SYSTEM]
After this, we stopped SQL Server from the services console and tried bringing it online from cluster administrator and it succeeded.
As a safely measure, we tested a failover and failback between the SQLNOD1 and SQLNODE2 and it worked perfectly.
Reference: Pinal Dave (http://blog.SQLAuthority.com)