SQL SERVER – Replacing a Cursor with a Common Table Expression

SQL SERVER - Replacing a Cursor with a Common Table Expression CTECursor-800x586 During my recent consulting engagement Comprehensive Database Performance Health Check, I had the opportunity to work with a client experiencing slow query performance due to a cursor-based approach in their T-SQL code. In this blog post, I will describe the problem the client was facing, walk through the T-SQL code they were using, and explain how I was able to improve the query performance by replacing the cursor with a common table expression (CTE).

Problem:

The client had a large database table containing multiple rows of text data, and they needed to generate a report of the top three most frequently occurring words in the text. The original T-SQL code they were using used a cursor to loop through each row of the table, split the text into individual words, and count the occurrences of each word. However, they quickly discovered that the cursor was causing significant performance issues, with the query taking several minutes to complete, even on a well-tuned server. They reached out to me for assistance, and I recommended replacing the cursor with a more efficient solution using a CTE.

Cursor – T-SQL Code:

Here is the original T-SQL code the client was using:

DECLARE @Text NVARCHAR(MAX)
DECLARE @Word NVARCHAR(MAX)
DECLARE @Count INT
DECLARE @Words TABLE (Word NVARCHAR(MAX), Count INT)
DECLARE Cursor1 CURSOR FOR
  SELECT TextColumn
  FROM MyTable
OPEN Cursor1
FETCH NEXT FROM Cursor1 INTO @Text
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @Text = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
  (REPLACE(REPLACE(REPLACE(
    LOWER(@Text), '.', ''), ',', ''), ';', ''), ':', ''), '?', ''), 
	'(', ''), ')', ''), '[', ''), ']', ''), '!', '')
  DECLARE Cursor2 CURSOR FOR
    SELECT value
    FROM STRING_SPLIT(@Text, ' ')
  OPEN Cursor2
  FETCH NEXT FROM Cursor2 INTO @Word
  WHILE @@FETCH_STATUS = 0
  BEGIN
    SET @Count = (SELECT COUNT(*) FROM @Words WHERE Word = @Word)
    IF @Count = 0
    BEGIN
      INSERT INTO @Words (Word, Count) VALUES (@Word, 1)
    END
    ELSE
    BEGIN
      UPDATE @Words SET Count = Count + 1 WHERE Word = @Word
    END
    FETCH NEXT FROM Cursor2 INTO @Word
  END
  CLOSE Cursor2
  DEALLOCATE Cursor2
  FETCH NEXT FROM Cursor1 INTO @Text
END
CLOSE Cursor1
DEALLOCATE Cursor1
SELECT TOP 3 Word, Count 
FROM @Words 
ORDER BY Count DESC

This code uses two nested cursors to loop through each row of the MyTable table, split the text into words, and count the occurrences of each word. As mentioned earlier, this approach had poor performance due to the overhead of opening and closing the cursors for each row.

Common Table Expression Solution:

To improve the performance of this query, I recommended using a CTE to perform the calculation in a single set-based operation instead of using nested cursors. Here is the updated T-SQL code using a CTE:

WITH CTE AS (
  SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    LOWER(TextColumn), '.', ''), ',', ''), ';', ''), ':', ''), '?', ''), 
    '(', ''), ')', ''), '[', ''), ']', ''), '!', '') AS CleanedText
  FROM MyTable
)
SELECT TOP 3 Word, COUNT(*) AS Count
FROM (
  SELECT value AS Word
  FROM CTE
    CROSS APPLY STRING_SPLIT(CleanedText, ' ')
) AS Words
WHERE Word != ''
GROUP BY Word
ORDER BY COUNT(*) DESC

This code uses a CTE and the STRING_SPLIT function to split the text into individual words, clean up the text by removing punctuation and converting it to lowercase, and count the occurrences of each word. The resulting data is then selected and outputted.

SQL SERVER - Replacing a Cursor with a Common Table Expression wordcount

Conclusion:

By replacing the nested cursors with a CTE, I significantly improved the query performance for my client. The updated query ran in a fraction of the time it took for the original cursor-based approach, and the client was able to generate their reports much more quickly and efficiently.

This experience highlights the importance of choosing the right approach when writing T-SQL code and the benefits of using set-based operations instead of cursors when possible. By leveraging the power of the SQL Server engine and using a CTE, we were able to achieve the desired results with much better performance.

Here is the sample data for MyTable that was used in the original query:

CREATE TABLE MyTable (
  TextColumn NVARCHAR(MAX)
);

INSERT INTO MyTable (TextColumn)
VALUES 
  ('The quick brown fox jumps over the lazy dog.'),
  ('She sells seashells by the seashore.'),
  ('How much wood would a woodchuck chuck if a woodchuck could chuck wood?'),
  ('To be or not to be, that is the question.'),
  ('Now is the winter of our discontent made glorious summer by this sun of York.'),
  ('All the world''s a stage, and all the men and women merely players.');

You can always reach out to me via YouTube Channel.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

CTE, SQL Cursor, SQL String
Previous Post
SQL SERVER – Thread Pool Waits: Monitoring and Optimization
Next Post
SQL SERVER – Impact of Changing Database Compatibility Level on Cache

Related Posts

Leave a Reply