One of my clients contacted me to give advice about an error. In this blog, we would discuss error message – A read of the file at offset succeeded after failing 1 time(s)
Here is the complete error message they were seeing while all of the scheduled jobs in the SQL Server Agent were failing due to the IO errors.
Message. A read of the file ‘K:\MSSQL14.MSSQLSERVER\MSSQL\Data\HR_NCIDX.ndf’ at offset 0x000002a84dd000 succeeded after failing 1 time(s) with error: 21(failed to retrieve text for this error. Reason: 15105). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Whenever I see such a message, I always ask to make sure that hardware is healthy. In the error message, we also see the error: 21. What does that mean?
Most of the cases you should be able to use NET HELPMSG command to convert the number to the text. I have written this trick in an earlier blog.
So, error 21 means “The device is not ready”.
Based on the error message, it is clear that in this situation there are definitely issue with hardware hosting K drive (in the error message we have the path of the file). This is also mentioned in Microsoft documentation – This message indicates that the read operation had to be reissued at least one time and indicates a major problem with the disk hardware.
I recommended my client to move ALL the file to new physical drive (on a new hardware LUN). The quick way was to do below.
- Add a new Drive (let’s call it as X)
- Create the same folder structure as on the K drive.
- Stop SQL Service.
- Move files from K drive to X drive.
- Swap Drive letters (K > Y, X > K, Y > X)
- Start the SQL Service. (you might get permission errors so set permissions accordingly)
I also asked them to engage hardware vendor and check the health of hardware. This is what they have replied.
It was indeed a storage issue. After we have moved all our data new LUNs, all the jobs are running fine now. Thanks for quick turnaround time and your immediate help!
Such emails are so lovely to read.
Have you seen such error in ERRORLOGs? If yes, please take action right now before its too late.
Reference: Pinal Dave (https://blog.sqlauthority.com)