It is very easy to say that you replace your hardware as that is not up to the mark. In reality, it is very difficult to implement. It is really hard to convince an infrastructure team to change any hardware because they are not performing at their best.
I had a nightmare related to this issue in a deal with an infrastructure team as I suggested that they replace their faulty hardware. This is because they were initially not accepting the fact that it is the fault of their hardware. But it is really easy to say “Trust me, I am correct”, while it is equally important that you put some logical reasoning along with this statement.
PAGEIOLATCH_XX is such a kind of those wait stats that we would directly like to blame on the underlying subsystem. Of course, most of the time, it is correct – the underlying subsystem is usually the problem.
From Book On-Line:
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Destroy mode. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Exclusive mode. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Keep mode. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Update mode. Long waits may indicate problems with the disk subsystem.
Simply put, this particular wait type occurs when any of the tasks is waiting for data from the disk to move to the buffer cache.
Just like any other wait type, this is again a very challenging and interesting subject to resolve. Here are a few things you can experiment on:
- Improve your IO subsystem speed (read the first paragraph of this article, if you have not read it, I repeat that it is easy to say a step like this than to actually implement or do it).
- This type of wait stats can also happen due to memory pressure or any other memory issues. Putting aside the issue of a faulty IO subsystem, this wait type warrants proper analysis of the memory counters. If due to any reasons, the memory is not optimal and unable to receive the IO data. This situation can create this kind of wait type.
- Proper placing of files is very important. We should check file system for the proper placement of files – LDF and MDF on separate drive, TempDB on separate drive, hot spot tables on separate filegroup (and on separate disk), etc.
- Check the File Statistics and see if there is higher IO Read and IO Write Stall SQL SERVER – Get File Statistics Using fn_virtualfilestats.
- It is very possible that there are no proper indexes on the system and there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can significantly reduce lots of CPU, Memory and IO (considering cover index has much lesser columns than cluster table and all other it depends conditions). You can refer to the two articles’ links below previously written by me that talk about how to optimize indexes.
- Updating statistics can help the Query Optimizer to render optimal plan, which can only be either directly or indirectly. I have seen that updating statistics with full scan (again, if your database is huge and you cannot do this – never mind!) can provide optimal information to SQL Server optimizer leading to efficient plan.
- Checking Memory Related Perfmon Counters
- SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2)
- SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark)
- SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system)
- SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds)
- Memory: Available Mbytes (Information only)
- Memory: Page Faults/sec (Benchmark only)
- Memory: Pages/sec (Benchmark only)
- Checking Disk Related Perfmon Counters
- Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good)
- Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good)
- Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good)
Read all the post in the Wait Types and Queue series.
Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All of the discussions of Wait Stats in this blog is generic and varies from system to system. It is recommended that you test this on a development server before implementing it to a production server.
Reference: Pinal Dave (http://blog.SQLAuthority.com)