During my recent consulting engagement Comprehensive Database Performance Health Check, I had the opportunity to work with a client experiencing slow query performance due to a cursor-based approach in their T-SQL code. In this blog post, I will describe the problem the client was facing, walk through the T-SQL code they were using, and explain how I was able to improve the query performance by replacing the cursor with a common table expression (CTE).
The client had a large database table containing multiple rows of text data, and they needed to generate a report of the top three most frequently occurring words in the text. The original T-SQL code they were using used a cursor to loop through each row of the table, split the text into individual words, and count the occurrences of each word. However, they quickly discovered that the cursor was causing significant performance issues, with the query taking several minutes to complete, even on a well-tuned server. They reached out to me for assistance, and I recommended replacing the cursor with a more efficient solution using a CTE.
Cursor – T-SQL Code:
Here is the original T-SQL code the client was using:
DECLARE @Text NVARCHAR(MAX) DECLARE @Word NVARCHAR(MAX) DECLARE @Count INT DECLARE @Words TABLE (Word NVARCHAR(MAX), Count INT) DECLARE Cursor1 CURSOR FOR SELECT TextColumn FROM MyTable OPEN Cursor1 FETCH NEXT FROM Cursor1 INTO @Text WHILE @@FETCH_STATUS = 0 BEGIN SET @Text = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE (REPLACE(REPLACE(REPLACE( LOWER(@Text), '.', ''), ',', ''), ';', ''), ':', ''), '?', ''), '(', ''), ')', ''), '[', ''), ']', ''), '!', '') DECLARE Cursor2 CURSOR FOR SELECT value FROM STRING_SPLIT(@Text, ' ') OPEN Cursor2 FETCH NEXT FROM Cursor2 INTO @Word WHILE @@FETCH_STATUS = 0 BEGIN SET @Count = (SELECT COUNT(*) FROM @Words WHERE Word = @Word) IF @Count = 0 BEGIN INSERT INTO @Words (Word, Count) VALUES (@Word, 1) END ELSE BEGIN UPDATE @Words SET Count = Count + 1 WHERE Word = @Word END FETCH NEXT FROM Cursor2 INTO @Word END CLOSE Cursor2 DEALLOCATE Cursor2 FETCH NEXT FROM Cursor1 INTO @Text END CLOSE Cursor1 DEALLOCATE Cursor1 SELECT TOP 3 Word, Count FROM @Words ORDER BY Count DESC
This code uses two nested cursors to loop through each row of the MyTable table, split the text into words, and count the occurrences of each word. As mentioned earlier, this approach had poor performance due to the overhead of opening and closing the cursors for each row.
Common Table Expression Solution:
To improve the performance of this query, I recommended using a CTE to perform the calculation in a single set-based operation instead of using nested cursors. Here is the updated T-SQL code using a CTE:
WITH CTE AS ( SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( LOWER(TextColumn), '.', ''), ',', ''), ';', ''), ':', ''), '?', ''), '(', ''), ')', ''), '[', ''), ']', ''), '!', '') AS CleanedText FROM MyTable ) SELECT TOP 3 Word, COUNT(*) AS Count FROM ( SELECT value AS Word FROM CTE CROSS APPLY STRING_SPLIT(CleanedText, ' ') ) AS Words WHERE Word != '' GROUP BY Word ORDER BY COUNT(*) DESC
This code uses a CTE and the STRING_SPLIT function to split the text into individual words, clean up the text by removing punctuation and converting it to lowercase, and count the occurrences of each word. The resulting data is then selected and outputted.
By replacing the nested cursors with a CTE, I significantly improved the query performance for my client. The updated query ran in a fraction of the time it took for the original cursor-based approach, and the client was able to generate their reports much more quickly and efficiently.
This experience highlights the importance of choosing the right approach when writing T-SQL code and the benefits of using set-based operations instead of cursors when possible. By leveraging the power of the SQL Server engine and using a CTE, we were able to achieve the desired results with much better performance.
Here is the sample data for MyTable that was used in the original query:
CREATE TABLE MyTable ( TextColumn NVARCHAR(MAX) ); INSERT INTO MyTable (TextColumn) VALUES ('The quick brown fox jumps over the lazy dog.'), ('She sells seashells by the seashore.'), ('How much wood would a woodchuck chuck if a woodchuck could chuck wood?'), ('To be or not to be, that is the question.'), ('Now is the winter of our discontent made glorious summer by this sun of York.'), ('All the world''s a stage, and all the men and women merely players.');
You can always reach out to me via YouTube Channel.
Reference: Pinal Dave (http://blog.SQLAuthority.com)