Hello Geeks and welcome to the Day 17 of the long series of One DMV a day. In this series of blogs I will be exploring and explaining the most useful DMVs in SQL Server. As we go ahead in this series we will also talk about the usage, linking between DMVs and some scenarios where these DMVs will be helpful while you are using SQL Server. For the complete list in the series please click here.
Today I am going to cover one of the most famous DMV of all, sys.dm_os_wait_stats. This DMV has solved so many cases and gives a lot of information which just needs to be read and used correctly to your benefit.
In SQL Server if a task is not completed and not running it should be waiting. First question to answer before you even start troubleshooting is – What is the resource it is waiting on? In an ideal world you will have once procedure/query or application which is slow. In such cases you will use another DMV which I am going to cover tomorrow.
But, all my DBA friends who work on chaotic Production servers, we do not live in an ideal world. đ Every issue we get is server level and all the applications are affected. Most of our Production servers are designed to host multiple applications or multiple functionalities in sites.
So to start with we need to identify if the problem is at the server level. Sys.dm_os_wait_stats is the right place to start our troubleshooting. This helps you identify what is the highest waits on my SQL Server. The straight forward output of this DMV doesnât make much sense as the stats in this DMV are accumulated over time. They get flushed when a SQL instance is restarted. So to get the reliable and relevant details from sys.dm_os_wait_stats you need to get the delta of waits for a significant time frame.
There are ways to flush the data out of sys.dm_os_wait_stats without restarting SQL Server by running below script. But I warn you that this is a very bad practice. You can get the delta waits difference without flushing the data out of this DMV.
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)
The below query pulls out the delta waits for one minute from sys.dm_os_wait_stats. You can confirm based on the output if the problem is with signal waits or any wait type. Dealing with each wait type is a huge topic in itself. I would recommend to follow MS whitepaper on waits. I will be filtering on the wait types from the CSS repository.
CREATE TABLE #CSS_Waits_Repository (wait_type NVARCHAR(100)); INSERT INTO #CSS_Waits_Repository VALUES ('ASYNC_IO_COMPLETION'); INSERT INTO #CSS_Waits_Repository VALUES ('CHECKPOINT_QUEUE'); INSERT INTO #CSS_Waits_Repository VALUES ('CHKPT'); INSERT INTO #CSS_Waits_Repository VALUES ('CXPACKET'); INSERT INTO #CSS_Waits_Repository VALUES ('DISKIO_SUSPEND'); INSERT INTO #CSS_Waits_Repository VALUES ('FT_IFTS_SCHEDULER_IDLE_WAIT'); INSERT INTO #CSS_Waits_Repository VALUES ('IO_COMPLETION'); INSERT INTO #CSS_Waits_Repository VALUES ('KSOURCE_WAKEUP'); INSERT INTO #CSS_Waits_Repository VALUES ('LAZYWRITER_SLEEP'); INSERT INTO #CSS_Waits_Repository VALUES ('LOGBUFFER'); INSERT INTO #CSS_Waits_Repository VALUES ('LOGMGR_QUEUE'); INSERT INTO #CSS_Waits_Repository VALUES ('MISCELLANEOUS'); INSERT INTO #CSS_Waits_Repository VALUES ('PREEMPTIVE_XXX'); INSERT INTO #CSS_Waits_Repository VALUES ('REQUEST_FOR_DEADLOCK_SEARCH'); INSERT INTO #CSS_Waits_Repository VALUES ('RESOURCE_QUERY_SEMAPHORE_COMPILE'); INSERT INTO #CSS_Waits_Repository VALUES ('RESOURCE_SEMAPHORE'); INSERT INTO #CSS_Waits_Repository VALUES ('SOS_SCHEDULER_YIELD'); INSERT INTO #CSS_Waits_Repository VALUES ('SQLTRACE_BUFFER_FLUSHÂ '); INSERT INTO #CSS_Waits_Repository VALUES ('THREADPOOL'); INSERT INTO #CSS_Waits_Repository VALUES ('WRITELOG'); INSERT INTO #CSS_Waits_Repository VALUES ('XE_DISPATCHER_WAIT'); INSERT INTO #CSS_Waits_Repository VALUES ('XE_TIMER_EVENT'); DECLARE @Waits TABLE ( WaitID INT IDENTITY(1, 1) not null PRIMARY KEY, wait_type NVARCHAR(60), wait_time_s DECIMAL(12, 2), resources_wait_s DECIMAL(12, 2), signal_wait_s decimal(12, 2)); WITH Waits AS (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s, 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct, (wait_time_ms - signal_wait_time_ms)/1000. AS resources_wait_s, signal_wait_time_ms/1000. AS signal_wait_s, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WHERE wait_type IN(SELECT wait_type FROM #CSS_Waits_Repository)) -- filter out additional irrelevant waits INSERT INTO @Waits (wait_type, wait_time_s, resources_wait_s, signal_wait_s) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.resources_wait_s AS DECIMAL(12, 2)) AS resources_wait_s, CAST(W1.signal_wait_s AS DECIMAL(12, 2)) AS signal_wait_s FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.resources_wait_s, W1.signal_wait_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold WAITFOR DELAY '00:01:00'; WITH Waits AS (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s, 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct, (wait_time_ms - signal_wait_time_ms)/1000. AS resources_wait_s, signal_wait_time_ms/1000. AS signal_wait_s, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WHERE wait_type NOT IN(SELECT wait_type FROM #CSS_Waits_Repository)) -- filter out additional irrelevant waits INSERT INTO @Waits (wait_type, wait_time_s, resources_wait_s, signal_wait_s) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.resources_wait_s AS DECIMAL(12, 2)) AS resources_wait_s, CAST(W1.signal_wait_s AS DECIMAL(12, 2)) AS signal_wait_s FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.resources_wait_s, W1.signal_wait_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold SELECT wait_type, MAX(wait_time_s) - MIN(wait_time_s) WaitDelta_total, MAX(resources_wait_s) - MIN(resources_wait_s) WaitDelta_resource, MAX(signal_wait_s) - MIN(signal_wait_s) WaitDelta_signal FROM @Waits GROUP BY wait_Type ORDER BY WaitDelta_total Desc
Based on the above output we can take necessary action based on the wait types. As I have already mentioned in the beginning of the post, sys.dm_os_wait_stats can only guide you get the pain area. I have showed you how to use this DMV to get the details. Troubleshooting based on each wait type will be a long series. I hope someone would be starting that soon on SQL Server Geeks. đ If not I will do it some time within an years time from now. đ
Tomorrow I will talk about another OS related DMV. So, stay tuned. Till then
Happy Learning,
Manu
Like us on FaceBook | Join the fastest growing SQL Server group on FaceBook |
Follow me on Twitter | Follow me on FaceBook