It’s always a good experience to visit customer sites and talk to people. Sometimes I get to see things outside the SQL world as well. There is a lot to learn, and I believe sharing what I learn is a good way to do that. In this blog post, we will discuss why cluster networks can show as unavailable in Failover Cluster Manager.
During my last visit to an India-based company, I was talking to a Windows admin over lunch, and he mentioned a cluster issue. It was an interesting conversation in which he said that sometimes a reboot is THE solution to a problem. He told me about an incident where cluster networks were shown as unavailable in Failover Cluster Manager. After lunch, I went to his desk to get more details.
As we can see in the box drawn around Nodes, the issue affected only one node.
When we looked at the cluster logs, we saw the messages below.
========B02===========
00000648.00002464::2016/11/29-08:58:45.173 INFO [FTI][Initiator] This node (1) is initiator
00000648.00002464::2016/11/29-08:58:45.173 WARN [FTI][Initiator] Ignoring duplicate connection: usable route already exists
00000648.00002464::2016/11/29-08:58:45.173 INFO [CHANNEL 147.170.123.251:~3343~] graceful close, status (of previous failure, may not indicate problem) ERROR_SUCCESS(0)
00000648.00002464::2016/11/29-08:58:45.174 WARN cxl::ConnectWorker::operator (): GracefulClose(1226)’ because of ‘channel to remote endpoint 147.170.123.251:~3343~ is closed’
========B01============
00004090.00005db0::2016/11/29-08:58:45.157 INFO [FTI][Follower] This node (2) is not the initiator
00004090.00005db0::2016/11/29-08:58:45.157 DBG [FTI] Stream already exists to node 1: false
00004090.00005db0::2016/11/29-08:58:45.157 DBG [CHANNEL 147.170.123.252:~54783~] Close().
00004090.00005db0::2016/11/29-08:58:45.157 INFO [CHANNEL 147.170.123.252:~54783~] graceful close, status (of previous failure, may not indicate problem) ERROR_SUCCESS(0)
00004090.00005db0::2016/11/29-08:58:45.157 INFO [CORE] Node 2: Clearing cookie 63cfe37d-42be-4211-8cd8-6db6b3344b52
00004090.00005db0::2016/11/29-08:58:45.157 DBG [CHANNEL 147.170.123.252:~54783~] Not closing handle because it is invalid.
00004090.00005db0::2016/11/29-08:58:45.157 WARN mscs::ListenerWorker::operator (): GracefulClose(1226)’ because of ‘channel to remote endpoint 147.170.123.252:~54783~ is closed’
Based on the cluster logs, and the highlighted message “Ignoring duplicate connection: usable route already exists” in particular, we concluded that the issue was caused by stale network information on the rejecting node.
The only fix that worked was to reboot the active node.
I searched the internet and found that the same symptom can also be caused by a genuine network issue or by antivirus software. So, if the above message does not appear in your cluster log, you will need to investigate further. Please share the solution if you find one.
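To check quickly whether your own cluster log contains the message discussed above, a small script can scan it. The sketch below is a minimal example, not a Microsoft tool: it assumes the log is a plain-text file (for example, one generated with the Get-ClusterLog PowerShell cmdlet) whose lines follow the `<pid>.<tid>::<timestamp> <LEVEL> <message>` layout seen in the excerpt, and the file path you pass to `scan_file` is up to you.

```python
import re
from pathlib import Path

# The message highlighted in the cluster log excerpt above.
STALE_ROUTE_MSG = "Ignoring duplicate connection: usable route already exists"

# Assumed line layout, based on the excerpt:
# "00000648.00002464::2016/11/29-08:58:45.173 WARN [FTI][Initiator] ..."
LINE_RE = re.compile(
    r"^(?P<proc>[0-9a-fA-F]+\.[0-9a-fA-F]+)::(?P<ts>\S+)\s+(?P<level>\w+)\s+(?P<msg>.*)$"
)


def find_stale_route_warnings(lines):
    """Return (timestamp, level, message) for each line containing the stale-route message."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and STALE_ROUTE_MSG in m.group("msg"):
            hits.append((m.group("ts"), m.group("level"), m.group("msg")))
    return hits


def scan_file(path):
    """Scan a cluster log file; the UTF-8 encoding is an assumption, so bad bytes are replaced."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_stale_route_warnings(text.splitlines())


if __name__ == "__main__":
    sample = [
        "00000648.00002464::2016/11/29-08:58:45.173 INFO [FTI][Initiator] This node (1) is initiator",
        "00000648.00002464::2016/11/29-08:58:45.173 WARN [FTI][Initiator] "
        "Ignoring duplicate connection: usable route already exists",
    ]
    for ts, level, msg in find_stale_route_warnings(sample):
        print(ts, level, msg)
```

If the scan returns nothing, the stale-route explanation probably does not apply, and the other causes mentioned above are worth checking.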
Reference: Pinal Dave (https://blog.sqlauthority.com)
5 Comments
You might want to apply the Windows service packs too; leaving a machine unpatched is a significant security risk!
Good suggestion.
Hello,
We had the same kind of issue, and I worked with Microsoft on it. What you have to do is uninstall the antivirus software on both nodes.
That could be another reason.
Just had this happen with a WSFC File Server cluster. For me, it was caused by two NICs whose subnets overlapped, with one subnet more specific than the other. The more specific one has no gateway and is used only for replication traffic within the same subnet. It is marked as having no cluster use, but it seems Windows was using it anyway to communicate with the other node, because it is the more specific route and takes precedence by default.
As this replication NIC has no gateway, Windows treats it as an unidentified network, so Windows Firewall was dropping the connections. This didn’t occur initially, only after a reboot, which added to the confusion.
Luckily, this isn’t a production setup yet; the reboot was an induced bugcheck to test whether the clustering was stable.
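The comment above hinges on the general IP routing rule that, when several subnets contain the destination, the most specific one (the longest prefix) wins. The sketch below illustrates only that rule using Python's standard `ipaddress` module; it is a simplification that ignores route metrics, gateways, and the cluster's own network roles, and the NIC names and subnets are hypothetical.

```python
import ipaddress


def pick_interface(dest, interfaces):
    """Pick the interface whose subnet contains dest with the longest prefix.

    interfaces maps an interface name to a subnet string such as "10.10.0.0/16".
    Returns the winning interface name, or None if no subnet contains dest.
    Route metrics are deliberately ignored in this simplified model.
    """
    addr = ipaddress.ip_address(dest)
    best_name, best_len = None, -1
    for name, cidr in interfaces.items():
        net = ipaddress.ip_network(cidr)
        # A longer prefix means a more specific subnet, which takes precedence.
        if addr in net and net.prefixlen > best_len:
            best_name, best_len = name, net.prefixlen
    return best_name


if __name__ == "__main__":
    # Hypothetical overlapping subnets: the replication NIC's /24 sits inside the cluster NIC's /16.
    nics = {
        "Cluster NIC": "10.10.0.0/16",
        "Replication NIC": "10.10.5.0/24",
    }
    print(pick_interface("10.10.5.20", nics))
    print(pick_interface("10.10.9.1", nics))
```

In this model, traffic to a peer at 10.10.5.20 leaves through the replication NIC even though the cluster NIC's subnet also covers that address, which matches the behavior the commenter describes.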