SQL SERVER 2016 – STRING_SPLIT Function Performance Comparison – Notes from the Field #118

March 17, 2016

SQL SERVER 2016 - STRING_SPLIT Function Performance Comparison - Notes from the Field #118 Kathi [Note from Pinal]: SQL Server 2016 is around the corner. We already have SQL Server 2016 RC0 released and many of us have started to test out various aspects of the same. For many years users have created temporary tables as well as used cross apply functions to split a large string and make it a table. I think every single time when I was assigned this task, I disliked it as the methods I used never made me happy. Well, the situation is now changed now with STRING_SPLIT function in SQL Server 2016.

In this episode of the Notes from the Field series database expert Kathi Kellenberger explains about easy and amazing functions of SQL Server. Kathi is an amazing instructor, she was the SQL author I have read in my early career. Today she brings a very new topic for database experts. Read the experience of Kathi in her own words.

One of the most commonly employed T-SQL user defined functions (UDF) takes a delimited string and returns a table. This functionality is especially useful for multi-valued parameters sent from Reporting Services. Generally, a function like this loops through the string looking for the delimiter and inserts each single item into the table. It’s simple, gets the job done, and the code can be easily found with a quick Internet search. Other methods use a recursive common table expression (CTE) or XML to do the work.

For years, many of us wondered why a function this useful didn’t already exists in T-SQL. My theory (I always have a theory) is that a string splitting function is a perfect example to illustrate how to develop a CLR integrated UDF. It’s not “Hello, world!” but it isn’t much more difficult.

SQL Server 2016 RC0 (release candidate 0) became available in early March. Along with other many enhancements, the built-in function STRING_SPLIT was announced!

Story of STRING_SPLIT

Here is an example of how the new function can be used:

DECLARE @String NVARCHAR(1000) = 'abc,def,ghi,jkl,mno,pqr,stu,vw,xyz';
SELECT * FROM STRING_SPLIT(@String,',');

SQL SERVER 2016 - STRING_SPLIT Function Performance Comparison - Notes from the Field #118 118-1

Over the past several releases, many new functions have been CLR functions under the hood. I suspect this is the case with STRING_SPLIT as well. You can use STRING_SPLIT anywhere you can use a table-valued function, such as with CROSS APPLY. I can see this being used to import data into SQL Server from a non-normalized source. Here is an example using CROSS APPLY:

CREATE TABLE #Grades(StudentID INT NOT NULL PRIMARY KEY, TestScores NVARCHAR(100));
INSERT INTO #Grades(StudentID,TestScores)
VALUES(1,'75,80,100,90'),
(2,'80,82,90,95'),
(3,'90,89,91,100');

SELECT StudentID, Value AS Score
FROM #Grades
CROSS APPLY STRING_SPLIT(TestScores,',') AS Scores;

SELECT StudentID, AVG(CAST(Value AS INT)) AS AverageScore
FROM #Grades
CROSS APPLY STRING_SPLIT(TestScores,',') AS Scores
GROUP BY StudentID;

SQL SERVER 2016 - STRING_SPLIT Function Performance Comparison - Notes from the Field #118 118-2

Most of the time, a function like this is used to split just a small number of items, and performance is not a concern. I did wonder if there was much of a difference between the new function and the typical function used in the past. I set up a test to find out.

First, I created this old-school function:

CREATE FUNCTION dbo.OldSplitString(@String NVARCHAR(MAX), @Delim NCHAR(1))
RETURNS @Values TABLE ([Value] NVARCHAR(MAX)) AS
BEGIN
DECLARE @Pos INT;
SET @Pos = CHARINDEX(@Delim,@String);
WHILE @Pos &amp;gt; 0 BEGIN
INSERT INTO @Values([Value])
SELECT LTRIM(RTRIM(LEFT(@String,@Pos -1)));
SET @String = SUBSTRING(@String,@Pos + 1,LEN(@String));
SET @Pos = CHARINDEX(@Delim,@String);
END
IF LEN(LTRIM(RTRIM(@String))) &amp;gt; 0
INSERT INTO @Values([Value])
SELECT @String;
RETURN;
END

Then, I ran this test:

--Create a large comma-delimited string
DECLARE @String NVARCHAR(4000) ;
SELECT @String = COALESCE(@String + ',','') +  LEFT(name,1)
FROM sys.objects;

PRINT @String

--Create a variable used to see which takes the most time
DECLARE @time datetime2 = GETDATE()
SELECT * FROM dbo.OldSplitString(@String,',');
SELECT DATEDIFF(MILLISECOND,@time,GETDATE()) AS OldMethod;
SET @time = GETDATE()
SELECT * FROM STRING_SPLIT(@String,',');
SELECT DATEDIFF(MILLISECOND,@time,GETDATE()) AS NewFunction;

SQL SERVER 2016 - STRING_SPLIT Function Performance Comparison - Notes from the Field #118 118-3

After running the test several times, the new STRING_SPLIT function consistently ran in under 5 milliseconds, while the old method took about 150 milliseconds each time. I decided to change things up a bit and use NVARCHAR(MAX) for @String instead of NVARCHAR(4000). In this case, the new STRING_SPLIT function took about 250 milliseconds each time while the old method stayed about the same as before. The length of the @String variable was only 1609 characters, so the length didn’t change between the two examples.

I continued to try several different scenarios with different size strings and different width items. The new function seemed to perform the best on smaller items and when the variable was not a MAX data type. There was a tipping point at 15 character items. Here are some of the scenarios I tested:

Variable Type	String Size (characters)	Item Size (characters)	Number of Items	Typical Duration for STRING_SPLIT (ms)
NVARCHAR(4000)	1609	1	805	0
NVARCHAR(4000)	4000	5	667	0
NVARCHAR(4000)	4000	14	284	20
NVARCHAR(4000)	4000	15	271	190
NVARCHAR(4000)	4000	20	227	140
NVARCHAR(MAX)	1609	1	805	150
NVARCHAR(MAX)	4829	5	805	150
NVARCHAR(MAX)	15,524	20	1613	180

Conclusion

We have been waiting for the STRING_SPLIT function for years. For most scenarios, this function will perform great, but be sure to test your particular workload.

Kathi Kellenberger is the author of “Beginning T-SQL” and “Expert T-SQL Window Functions in SQL Server” both from Apress. She is also the author of two Pluralsight courses, “T-SQL Window Functions” and “Configuring Kerberos for SSRS.”

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Notes from the Field, SQL DateTime, SQL Function, SQL Scripts, SQL Server

SQL SERVER – Mirroring Error 1412 – The remote copy of database has not been rolled forward to a point in time that is encompassed in the local copy of the database log. (Microsoft SQL Server, Error:)

SQL SERVER – Introduction to SQL Azure Database Throttling

6 Comments. Leave new

Grzegorz Lyp
March 17, 2016 10:51 am
Here is the CLR comparison and the xml split method is twice slower than CLR
Reply
Arny Pire
March 17, 2016 6:28 pm
What about Halloween protection with this func?
Reply
Vikram Reddy
March 18, 2016 10:05 am
Thanks for your article. There is small typo mistake in conclusion, ‘SPLIT_STRING’ was mentioned instead of ‘STRING_SPLIT’.
Reply
- Pinal Dave
  March 18, 2016 10:25 am
  Thanks, I fixed it!
  Reply
Andy Dayton
March 22, 2016 10:19 pm
It seems to have a drawback that it doesn’t delimit separator characters in any way (meaning that the chosen delimiter cannot itself ever appear in text fields) . I’ve encountered cases where programmers have forgotten about that principle. See the RFC 4180 specification for CSV files for examples.
Reply
Kraaitje
April 24, 2016 6:03 pm
Cool new function. thanks for mentioning!
Reply