In this post, let’s delve a bit more in depth regarding wait stats.
The very first question: when do the wait stats occur?
Here is the simple answer. When SQL Server is executing any task, and if for any reason it has to wait for resources to execute the task, this wait is recorded by SQL Server with the reason for the delay. Later on we can analyze these wait stats to understand the reason the task was delayed and maybe we can eliminate the wait for SQL Server. It is not always possible to remove the wait type 100%, but there are few suggestions that can help.
Before we continue learning about wait types and wait stats, we need to understand three important milestones of the query life-cycle.
Running – a query which is being executed on a CPU is called a running query. This query is responsible for CPU time.
Runnable – a query which is ready to execute and waiting for its turn to run is called a runnable query. This query is responsible for Signal Wait time. (In other words, the query is ready to run but CPU is servicing another query).
Suspended – a query which is waiting due to any reason (to know the reason, we are learning wait stats) to be converted to runnable is suspended query. This query is responsible for wait time. (In other words, this is the time we are trying to reduce).
In simple words, query execution time is a summation of the query Executing CPU Time (Running) + Query Wait Time (Suspended) + Query Signal Wait Time (Runnable). Again, it may be possible a query goes to all these stats multiple times.
Let us try to understand the whole thing with a simple analogy of a taxi and a passenger.
Two friends, Tom and Danny, go to the mall together. When they leave the mall, they decide to take a taxi. Tom and Danny both stand in the line waiting for their turn to get into the taxi. This is the Signal Wait Time as they are ready to get into the taxi but the taxis are currently serving other customer and they have to wait for their turn. In other word they are in a runnable state.
Now when it is their turn to get into the taxi, the taxi driver informs them he does not take credit cards and only cash is accepted. Neither Tom nor Danny have enough cash, they both cannot get into the vehicle. Tom waits outside in the queue and Danny goes to ATM to fetch the cash. During this time the taxi cannot wait, they have to let other passengers get into the taxi. As Tom and Danny both are outside in the queue, this is the Query Wait Time and they are in the suspended state. They cannot do anything till they get the cash.
Once Danny gets the cash, they are both standing in the line again, creating one more Signal Wait Time. This time when their turn comes they can pay the taxi driver in cash and reach their destination. The time taken for the taxi to get from the mall to the destination is running time (CPU time) and the taxi is running.
I hope this analogy is bit clear with the wait stats. You can check the Signalwait stats using following query of Glenn Berry.
-- Signal Waits for instance
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%resource waits] FROM sys.dm_os_wait_stats OPTION (RECOMPILE);
Higher the Signal wait stats are not good for the system. Very high value indicates CPU pressure. In my experience, when systems are running smooth and without any glitch the Signal wait stat is lower than 20%. Again, this number can be debated (and it is from my experience and is not documented anywhere). In other words, lower is better and higher is not good for the system.
In future articles we will discuss in detail the various wait types and wait stats and their resolution.
Read all the post in the Wait Types and Queue series.
Reference: Pinal Dave (https://blog.sqlauthority.com)
Great article, Pinal!
Think about it this way: a database system is much more complex than the SQL Server by itself. The reason being is that there are other components involved in “the big picture”.
Wat I am trying to say is that the internal waits, which are the topic of this article, are just a small part of the game, since the database system depends on the client system, networking, error handling, and so on. In other words, if you have a client application which sends adhoc queries of 80kb each 100 times per minute (trust me, I have seen this), then you will depend on the network, NIC configuration, internal SQL parsing, processing, plan generation, query execution, and if the dataset returned is large, then you will also depend on the NIC and the network.
Unfortunately, SQL Server is not too aware of other wait stats, aside from its own internal ones. (well, there are some network async IO stats, but it does not say who know how much about the cause of the problem)
The bottom line: when we talk about waits, keep in mind the “big picture”.
Excellent analogy! Though if it were me, I might have waited for a cab that did accept credit cards :) Makes sense that lower single wait stats would be better for the system.
Excelent analogy. Thanks for expaining this complex topic.
Mr. Pinal Dave wait stats articles very useful for me, it really great.
I executed on of my production server it looks like somewhat good in CPU waits stats, and resource waits is very high.
Please tell me how to reduce resource wait time on server.
%signal (cpu) waits – 22.25
%resource waits – 77.75
Im an sql amateur and was just wondering where do i find or how do i create this sys.dm_os_wait_stats table, cause if i run this, it tells me procedure does not exist.
Thank u in advance
I just started using your query recently to monitor CPU pressure. After seeing what looked like spikes in signal wait time, I added some more detail to your query to show the magnitude of the percentages shown in the query. After seeing this data, I realized that what appeared to be spikes in signal wait time could be ignored.
Note, I have been collecting wait stats data hourly on all SQL Servers and then reporting the CPU pressure once per day for successive samples over the last 24 hours.
SUM(s2.signal_wait_time_ms – s1.signal_wait_time_ms) as signal_wait_time_ms_diff
, SUM((s2.wait_time_ms – s1.wait_time_ms) – (s2.signal_wait_time_ms – s1.signal_wait_time_ms)) as resource_wait_time_ms_diff
Nice write Pinal, thanks.
Now my wait_type is Async_Network_Io and it is in suspended state. Any idea what shuld be the quick resolution? even after clearing the proc cache? Thanks
In a perfect world, we wouldn’t need to wait for anything, but the world isn’t perfect and as long as we can plan for waits, then we can hopefully ensure consistent throughput.
Dave, an excellent write-up and thank you
Excellent, thanks :)
Excellent, thanks :)
You are saying that “lower is better and higher is not good for the system.”
what we do if we get higher value which indicate CPU pressure, How to take control over it ?
Analogy was excellent and amazing. With this, can understand the complex topic very easily