SQL SERVER – Delayed Durability, or The Story of How to Speed Up Autotests From 11 to 2.5 Minutes

This is one of the most interesting stories written by my friend Syrovatchenko Sergey. He is an expert on SQL Server and works at Devart. Just like me he also shares his passion for Wait Stats and new features of the SQL Server. In this blog post he talks about one of the most interesting feature about Delayed Durability. I strongly encourage that you find sometime during your day to read this blog post and discover more about this topic.


I’ve recently started helping with a new project TMetric, which is developed as a free web-service for tracking working hours. The technology stack was originally selected to be Microsoft with SQL Server 2014 as data repository. One of the first tasks assigned to me was to study the opportunity to accelerate auto-tests.

Before I got into gear, the project had existed for a long time and had gathered a fair number of tests (at that time I reckoned 1300 of auto-tests). On a build machine with SSD, tests ran for 4-5 minutes, and on a HDD – not more than 11-12 minutes. The whole team could be equipped with SSD, but the essence of the problem would not be solved. Especially that they were soon planning to expand the functionality and the number of tests would become even greater.

All tests were grouped, and before running each of the groups, the old data were purged from the database. Previously, purging was performed by recreating the database, but this approach proved to be very slow in practice. It would be much faster just to clean all the tables from data and reset the IDENTITY value to zero, so that future inserts would form correct test data. So, my starting point was the script with such an approach:

EXEC sys.sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL' 
DELETE FROM [dbo].[Project] 
DBCC CHECKIDENT('[dbo].[Project]', RESEED, 0) 
DBCC CHECKIDENT('[dbo].[Project]', RESEED) 
DELETE FROM [dbo].[RecentWorkTask] 
... 
EXEC sys.sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL' 

As such, an idea came straight into my mind to use dynamic SQL to generate a query. For example, if a table has foreign keys, then use the DELETE operation as before. Otherwise, you can delete data with minimal logging using the TRUNCATE command.

As a result, the query for data deletion will look as follows:

DECLARE @SQL NVARCHAR(MAX)
      , @FK_TurnOff NVARCHAR(MAX)
      , @FK_TurnOn NVARCHAR(MAX)

SELECT @SQL = (
    SELECT CHAR(13) + CHAR(10) +
        IIF(p.[rows] > 0,
            IIF(t2.referenced_object_id IS NULL, N'TRUNCATE TABLE ', N'DELETE FROM ') + obj_name,
            ''
        ) + CHAR(13) + CHAR(10) +
        IIF(IdentityProperty(t.[object_id], 'LastValue') > 0,
            N'DBCC CHECKIDENT('''+ obj_name + N''', RESEED, 0) WITH NO_INFOMSGS',
            ''
        )
    FROM (
        SELECT obj_name = QUOTENAME(s.name) + '.' + QUOTENAME(o.name), o.[object_id]
        FROM sys.objects o
        JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
        WHERE o.is_ms_shipped = 0
            AND o.[type] = 'U'
            AND o.name != N'__MigrationHistory'
    ) t
    JOIN sys.partitions p ON t.[object_id] = p.[object_id] AND p.index_id IN (0, 1)
    LEFT JOIN (
        SELECT DISTINCT f.referenced_object_id
        FROM sys.foreign_keys f
    ) t2 ON t2.referenced_object_id = t.[object_id]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')

SELECT @FK_TurnOff = CAST(x.query('off/text()') AS NVARCHAR(MAX))
     , @FK_TurnOn = CAST(x.query('on/text()') AS NVARCHAR(MAX))
FROM (
    SELECT [off] = CHAR(10) + 'ALTER TABLE ' + obj + ' NOCHECK CONSTRAINT ' + fk
         , [on] = CHAR(10) + 'ALTER TABLE ' + obj + ' CHECK CONSTRAINT ' + fk
    FROM (
        SELECT fk = QUOTENAME(f.name)
             , obj = QUOTENAME(SCHEMA_NAME(f.[schema_id])) + '.' + QUOTENAME(OBJECT_NAME(f.parent_object_id))
        FROM sys.foreign_keys f
        WHERE f.delete_referential_action = 0
            AND EXISTS(
                    SELECT *
                    FROM sys.partitions p
                    WHERE p.[object_id] = f.parent_object_id
                        AND p.[rows] > 0
                        AND p.index_id IN (0, 1)
                )
    ) t
    FOR XML PATH(''), TYPE
) t(x)

IF @SQL LIKE '%[a-z]%' BEGIN

    SET @SQL = ISNULL(@FK_TurnOff, '') + @SQL + ISNULL(@FK_TurnOn, '')

    PRINT @SQL
    --EXEC sys.sp_executesql @SQL

END

Initially, the auto-tests ran for 11 minutes on my machine:

But after I rewrote the query, all tests began to run 40 seconds faster:

Of course, I could be happy about it and set resolved status for the task, but the basic problem remained:

The disk was heavily loaded when tests were executed. I decided to see what waits are on the server. To do this, I first cleared sys.dm_os_wait_stats:

DBCC SQLPERF("sys.dm_os_wait_stats", CLEAR)

I ran autotests once again and then executed the query:

SELECT TOP(20)
      wait_type
    , wait_time = wait_time_ms / 1000.
    , wait_resource = (wait_time_ms - signal_wait_time_ms) / 1000.
    , wait_signal = signal_wait_time_ms / 1000.
    , waiting_tasks_count
    , percentage = 100.0 * wait_time_ms / SUM(wait_time_ms) OVER ()
    , avg_wait = wait_time_ms / 1000. / waiting_tasks_count
    , avg_wait_resource = (wait_time_ms - signal_wait_time_ms) / 1000. / [waiting_tasks_count]
    , avg_wait_signal = signal_wait_time_ms / 1000.0 / waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE [waiting_tasks_count] > 0
    AND max_wait_time_ms > 0
    AND [wait_type] NOT IN (
        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
        N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
        N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
        N'CHKPT', N'CLR_AUTO_EVENT',
        N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
        N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
        N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
        N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
        N'EXECSYNC', N'FSAGENT',
        N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
        N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
        N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
        N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',
        N'PWAIT_ALL_COMPONENTS_INITIALIZED',
        N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
        N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
        N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
        N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
        N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
        N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
        N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
        N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
        N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
        N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
        N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
        N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
        N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT'
    )
ORDER BY [wait_time_ms] DESC

The biggest delay occurs with WRITELOG.

wait_type wait_time waiting_tasks_count percentage
———————– ———— ——————– ———–
WRITELOG 546.798 60261 96.07
PAGEIOLATCH_EX 13.151 96 2.31
PAGELATCH_EX 5.768 46097 1.01
PAGEIOLATCH_UP 1.243 86 0.21
IO_COMPLETION 1.158 89 0.20
MEMORY_ALLOCATION_EXT 0.480 683353 0.08
LCK_M_SCH_S 0.200 34 0.03
ASYNC_NETWORK_IO 0.115 688 0.02
LCK_M_S 0.082 10 0.01
PAGEIOLATCH_SH 0.052 1 0.00
PAGELATCH_UP 0.037 6 0.00
SOS_SCHEDULER_YIELD 0.030 3598 0.00

“This wait type is usually seen in the heavy transactional database. When data is modified, it is written both on the log cache and buffer cache. This wait type occurs when data in the log cache is flushing to the disk. During this time, the session has to wait due to WRITELOG.” (Reference: SQLAuthority – WRITELOG)

And what do I need to find out now? Yes, each running autotest records something in the database. One of the solutions to the problem with WRITELOG waits could be inserting large chunks of data, rather than row by row. But SQL Server 2014 has a new Delayed Durability option on the database level, i.e. the ability to omit recording of data to disk upon committing transactions.

How is data modified in SQL Server? Suppose we are inserting a new row. SQL Server calls the Storage Engine component that, in turn, accesses Buffer Manager (that works with buffers in memory and the disk) and informs that I want to change the data.

After that, Buffer Manager accesses Buffer Pool (the cache in memory for all of our data, which stores information by page – 8 Kb per page), and then modifies the necessary pages in memory. If there are no such pages, it will load them from the disk. At the moment when a page is changed in memory, SQL Server cannot yet tell that the query was executed. Otherwise, one of the ACID principles, namely Durability, would be violated, when the end of modification ensures that the data will be written to disk.

After the page is modified in memory, Storage Engine accesses Log Manager, which writes data to the log. But it does not happen at once, but through Log Buffer having a size of 60Kb, which is used to optimize performance when working with the log. Data reset from the buffer to the log file occurs in the situation when:

  1. The buffer is filled and data is stored in the log.
  2. A user executed sys.sp_flush_log.
  3. A transaction was committed, and the entire Log Buffer was recorded to the log.

When the data are stored in the log, the data modification is confirmed, and SQL Server informs the client about it.

According to this logic, the data does not get into the data file. SQL Server uses an asynchronous mechanism for recording data to the files. There are two such mechanisms:

  1. Lazy Writer, that runs on a time basis and checks whether there is sufficient memory for SQL Server. If there is not, the pages are forced out of the memory and are recorded to the data file. And those that have been modified are flushed and thrown out of memory.
  2. Checkpoint, that scans dirty pages once a minute, flushes them and leaves in memory.

Suppose that a lot of small transactions are running in the system, for example, that modify data by row. After each modification, the data is transmitted from Log Buffer to the transaction log. Remember that all modifications get synchronously into the log and other transactions should wait for their turn.

Let me illustrate:

USE [master]
GO
SET NOCOUNT ON

IF DB_ID('TT') IS NOT NULL BEGIN
    ALTER DATABASE TT SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE TT
END
GO

CREATE DATABASE TT
GO
ALTER DATABASE TT
    MODIFY FILE (NAME = N'TT', SIZE = 25MB, FILEGROWTH = 5MB)
GO
ALTER DATABASE TT
    MODIFY FILE (NAME = N'TT_log', SIZE = 25MB, FILEGROWTH = 5MB)
GO

USE TT
GO

CREATE TABLE dbo.tbl (
      a INT IDENTITY PRIMARY KEY
    , b INT
    , c CHAR(2000)
)
GO

IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
    DROP TABLE #temp
GO

SELECT t.[file_id], t.num_of_writes, t.num_of_bytes_written
INTO #temp
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) t

DECLARE @WaitTime BIGINT
      , @WaitTasks BIGINT
      , @StartTime DATETIME = GETDATE()
      , @LogRecord BIGINT = (
              SELECT COUNT_BIG(*)
              FROM sys.fn_dblog(NULL, NULL)
          )

SELECT @WaitTime = wait_time_ms
     , @WaitTasks = waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'WRITELOG'

DECLARE @i INT = 1

WHILE @i < 5000 BEGIN

    INSERT INTO dbo.tbl (b, c)
    VALUES (@i, 'text')

    SELECT @i += 1

END

SELECT elapsed_seconds = DATEDIFF(MILLISECOND, @StartTime, GETDATE()) * 1. / 1000
     , wait_time = (wait_time_ms - @WaitTime) / 1000.
     , waiting_tasks_count = waiting_tasks_count - @WaitTasks
     , log_record = (
          SELECT COUNT_BIG(*) - @LogRecord
          FROM sys.fn_dblog(NULL, NULL)
       )
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'WRITELOG'

SELECT [file] = FILE_NAME(o.[file_id])
     , num_of_writes = t.num_of_writes - o.num_of_writes
     , num_of_mb_written = (t.num_of_bytes_written - o.num_of_bytes_written) * 1. / 1024 / 1024
FROM #temp o
CROSS APPLY sys.dm_io_virtual_file_stats(DB_ID(), NULL) t
WHERE o.[file_id] = t.[file_id]

Inserting 5 thousand rows took about 42.5 seconds, and the delay upon inserting into the log was 42 seconds.

elapsed_seconds wait_time waiting_tasks_count log_record
—————- ———- ——————– ———–
42.54 42.13 5003 18748

SQL Server physically accessed the log 5000 times and has recorded a total of 20Mb.

file num_of_writes num_of_mb_written
——- ————– ——————
TT 79 8.72
TT_log 5008 19.65

Delayed Durability is the right choice for these situations. When activated, an entry is made to the log only when Log Buffer is full. You can enable Delayed Durability for the entire database:

ALTER DATABASE TT SET DELAYED_DURABILITY = FORCED 
GO 

or for individual transactions:

ALTER DATABASE TT SET DELAYED_DURABILITY = ALLOWED 
GO 
BEGIN TRANSACTION t 
... 
COMMIT TRANSACTION t WITH (DELAYED_DURABILITY = ON) 

Let’s enabled for the database and execute the script once again.

The waits disappeared and the query ran for 170ms on my machine:

elapsed_seconds wait_time waiting_tasks_count log_record
—————- ———- ——————– ———–
0.17 0.00 0 31958

Due to the fact that the records were made to the log less intensely:

file num_of_writes num_of_mb_written
——- ————– ——————
TT 46 9.15
TT_log 275 12.92

Of course, there is a fly in the ointment. Before the data physically gets into the log file, the client is informed that the changes are recorded. In case of failure, we can lose data equal to the buffer size and damage the database.

In my case, the safety of the test data is not required. The DELAYED_DURABILITY was set to FORCED for the test database on which the TMetric autotests run, and the next time all tests ran for 2.5 minutes.

All the delays associated with logging have a minimal impact on performance:

wait_type wait_time waiting_tasks_count percentage
——————– ———– ——————– ———–
PAGEIOLATCH_EX 16.031 61 43.27
WRITELOG 15.454 787 41.72
PAGEIOLATCH_UP 2.210 36 5.96
PAGEIOLATCH_SH 1.472 2 3.97
LCK_M_SCH_M 0.756 9 2.04
ASYNC_NETWORK_IO 0.464 735 1.25
PAGELATCH_UP 0.314 8 0.84
SOS_SCHEDULER_YIELD 0.154 2759 0.41
PAGELATCH_EX 0.154 44785 0.41
LCK_M_SCH_S 0.021 7 0.05
PAGELATCH_SH 0.011 378 0.02

Let’s summarize the results on Delayed Durability:

  1. Available in all editions starting from SQL Server 2014.
  2. It can be used if you have a bottleneck when writing to the transaction log (lazy commit in large blocks may be more effective than many small ones).
  3. Concurrent transactions will less likely compete for IO operations upon logging.
  4. When activated, the COMMIT operation does not wait for entries in the transaction log and we can get a significant performance boost in OLTP systems.
  5. You can go ahead and enable Delayed Durability, if you are ready to play Russian roulette and upon “fortuitous” combination of circumstances lose approximately 60Kb of data in case of failure.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Exit mobile version