A few days ago, I wrote a blog about log shipping job failure due to missing permission. In this blog, I would share my experience with another log shipping error “Connection Timeout Expired” for which my client contacted me via Comprehensive Database Performance Health Check. As per them restore job was failing with errors. Let us learn how to solve Log Shipping Job Error: Failed to Connect to Server.
Job Name: LSRestore_<details>
Step Name: Log shipping restore log job step.
<DateTime>*** Error: Could not retrieve restore settings.(Microsoft.SqlServer.Management.LogShipping) ***
<DateTime>*** Error: Failed to connect to server ServerName.(Microsoft.SqlServer.ConnectionInfo) ***
<DateTime>*** Error: Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was – [Pre-Login] initialization=17708; handshake=15341; (.Net SqlClient Data Provider) ***
<DateTime>*** Error: The wait operation timed out() ***
<DateTime>—– END OF TRANSACTION LOG RESTORE —–
There are few interesting things to note here.
- The total job duration is 23 seconds.
- Error is “Connection Timeout Expired”
- Total time in initialization and handshake = 17708 + 15341 = 33049 which is 33 seconds.
- The time taken was due to a pre-login handshake.
There could be multiple reasons for the login process taking the time. It might be due to a busy server or slow domain controller communication or slow network. In short, these errors in the log shipping job indicate that sqllogship.exe is trying to connect to SQL Server and eventually timing out because the connection took longer than the 30 seconds (default value) interval. I observed that my client had too many databases configured for Log shipping and all the jobs were running at the same time. So, I asked them to stagger the jobs so that all are not running at the same time.
Another possible workaround would change the login/connection timeout and set a higher value. When we edit these, we can find that the copy and restore jobs call sqllogship.exe. To avoid the connection timeout errors, we edited this and appended an additional parameter “-logintimeout 60” (without quotes) to the job.
This will cause the job to wait for 60 seconds before timing out connecting to the server. This is a documented option in sqllogship.exe.
After adding this parameter, the issue got resolved and the job ran successfully.
Reference: Pinal Dave (https://blog.sqlauthority.com)