SQL SERVER – Script: Current IO Related Waits on SQL Server

Playing around with DMVs is so much fun. Many a times we need little inspiration and a problem in hand to solve. Out of many things I monitor, I have personally seen folks having trouble in finding what is going wrong when the server turns slow. In a recent conversation with a friend who was troubleshooting a SQL Server instance running inside a VM on Azure, he was not sure what suddenly went wrong and was curious to know what is going on LIVE with his SQL Server.

This was an interesting question and I asked what does he want to monitor. From his experience he was quick to bounce and let me know it was around IO. I went back into my handy little scripts folder to find what I had. I share the below script which was used for this problem.

/* sys.dm_io_pending_io_requests : It is important to recognize that these are current active waits that are I/O related. */
SELECT f.database_id, f.[file_id], DB_NAME(f.database_id) AS database_name, f.name AS logical_file_name, f.type_desc,
CAST (CASE
-- Handle UNC paths (e.g. '\\fileserver\DBs\readonly_dw.ndf')
WHEN LEFT (LTRIM (f.physical_name), 2) = '\\'
THEN LEFT (LTRIM (f.physical_name),CHARINDEX('\',LTRIM(f.physical_name),CHARINDEX('\',LTRIM(f.physical_name), 3) + 1) - 1)
-- Handle local paths (e.g. 'C:\Program Files\...\master.mdf')
WHEN CHARINDEX('\', LTRIM(f.physical_name), 3) > 0
THEN UPPER(LEFT(LTRIM(f.physical_name), CHARINDEX ('\', LTRIM(f.physical_name), 3) - 1))
ELSE f.physical_name
END AS NVARCHAR(255)) AS logical_disk,
fs.io_stall/1000/60 AS io_stall_min,
fs.io_stall_read_ms/1000/60 AS io_stall_read_min,
fs.io_stall_write_ms/1000/60 AS io_stall_write_min,
(
fs.io_stall_read_ms / (1.0 + fs.num_of_reads)) AS avg_read_latency_ms,
(
fs.io_stall_write_ms / (1.0 + fs.num_of_writes)) AS avg_write_latency_ms,
((
fs.io_stall_read_ms/1000/60)*100)/(CASE WHEN fs.io_stall/1000/60 = 0 THEN 1 ELSE fs.io_stall/1000/60 END) AS io_stall_read_pct,
((
fs.io_stall_write_ms/1000/60)*100)/(CASE WHEN fs.io_stall/1000/60 = 0 THEN 1 ELSE fs.io_stall/1000/60 END) AS io_stall_write_pct,
ABS((fs.sample_ms/1000)/60/60) AS 'sample_HH',
((
fs.io_stall/1000/60)*100)/(ABS((fs.sample_ms/1000)/60))AS 'io_stall_pct_of_overall_sample',
PIO.io_pending_ms_ticks,
PIO.scheduler_address
FROM sys.dm_io_pending_io_requests AS PIO
INNER JOIN sys.dm_io_virtual_file_stats (NULL,NULL) AS fs
ON fs.file_handle = PIO.io_handle
INNER JOIN sys.master_files AS f
ON fs.database_id = f.database_id AND fs.[file_id] = f.[file_id]

 

Also a small extension to this script includes the addition of os_schedulers to know if there is any skew in usage. So the same query is modified to include the same. I know it is just an extension, but thought to add it here in a single location for completeness.

/* The query below can be used to determine if pending IO is causing work to be queued on the scheduler. */
SELECT f.database_id, f.[file_id], DB_NAME(f.database_id) AS database_name, f.name AS logical_file_name, f.type_desc,
CAST (CASE
-- Handle UNC paths (e.g. '\\fileserver\DBs\readonly_dw.ndf')
WHEN LEFT (LTRIM (f.physical_name), 2) = '\\'
THEN LEFT (LTRIM (f.physical_name),CHARINDEX('\',LTRIM(f.physical_name),CHARINDEX('\',LTRIM(f.physical_name), 3) + 1) - 1)
-- Handle local paths (e.g. 'C:\Program Files\...\master.mdf')
WHEN CHARINDEX('\', LTRIM(f.physical_name), 3) > 0
THEN UPPER(LEFT(LTRIM(f.physical_name), CHARINDEX ('\', LTRIM(f.physical_name), 3) - 1))
ELSE f.physical_name
END AS NVARCHAR(255)) AS logical_disk,
fs.io_stall/1000/60 AS io_stall_min,
fs.io_stall_read_ms/1000/60 AS io_stall_read_min,
fs.io_stall_write_ms/1000/60 AS io_stall_write_min,
(
fs.io_stall_read_ms / (1.0 + fs.num_of_reads)) AS avg_read_latency_ms,
(
fs.io_stall_write_ms / (1.0 + fs.num_of_writes)) AS avg_write_latency_ms,
((
fs.io_stall_read_ms/1000/60)*100)/(CASE WHEN fs.io_stall/1000/60 = 0 THEN 1 ELSE fs.io_stall/1000/60 END) AS io_stall_read_pct,
((
fs.io_stall_write_ms/1000/60)*100)/(CASE WHEN fs.io_stall/1000/60 = 0 THEN 1 ELSE fs.io_stall/1000/60 END) AS io_stall_write_pct,
ABS((fs.sample_ms/1000)/60/60) AS 'sample_HH',
((
fs.io_stall/1000/60)*100)/(ABS((fs.sample_ms/1000)/60))AS 'io_stall_pct_of_overall_sample',
PIO.io_pending_ms_ticks,
PIO.scheduler_address,
os.scheduler_id,
os.pending_disk_io_count,
os.work_queue_count
FROM sys.dm_io_pending_io_requests AS PIO
INNER JOIN sys.dm_io_virtual_file_stats (NULL,NULL) AS fs
ON fs.file_handle = PIO.io_handle
INNER JOIN sys.master_files AS f
ON fs.database_id = f.database_id AND fs.[file_id] = f.[file_id] INNER JOIN sys.dm_os_schedulers AS os
ON PIO.scheduler_address = os.scheduler_address

Though this is a simple script that I have used. Do let me know, which component of SQL Server is of importance for you? Has IO been a point of problem for you anytime? Have you used scripts similar to this in your environments? Do let me know.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Previous Post
SQL SERVER – Backup Failure – Error: 3203, Severity: 16, State: 1
Next Post
Interview Question of the Week #019 – How to Reset Identity of Table

Related Posts

4 Comments. Leave new

  • Nick winstanley
    May 10, 2015 11:43 am

    Thanks for this great scripts. DMV contain so much info it’s just a matter of getting it out in a usable format. I’ve used them to show log file contention in tempdb and prove improper settings for it.

    Reply
  • First – thank you.
    I am still running this on different servers. Expected to see write latency on log drive to be much less than data drives but they are pretty close. That surprised me as log files are sequential.

    Also here is one table for saving the information for later comparison:
    create table IOHist (
    RunDate datetime NOT NULL DEFAULT GETDATE(),
    database_id int,
    file_id int, database_name char(12),
    logical_file_name char(20),
    type_desc char(5),
    logical_disk char(4),io_stall_min int,
    io_stall_read_min int,
    io_stall_write_min int,
    avg_read_latency_ms real,
    avg_write_latency_ms real,
    io_stall_read_pct int,
    io_stall_write_pct int,
    sample_HH int,
    io_stall_pct_of_overall_sample int,
    io_pending_ms_ticks int,
    scheduler_address int)
    Also add ‘ (select GETDATE()) ” to the select. For me GetDate had to be a subselect to get multiple rows, which I believe is right though there may be a better way of doing it.

    Reply

Leave a Reply

Menu