In this episode of the Notes from the Field series database expert Kathi Kellenberger explains about easy and amazing functions of SQL Server. Kathi is an amazing instructor, she was the SQL author I have read in my early career. Today she brings a very new topic for database experts. Read the experience of Kathi in her own words.
One of the most commonly employed T-SQL user defined functions (UDF) takes a delimited string and returns a table. This functionality is especially useful for multi-valued parameters sent from Reporting Services. Generally, a function like this loops through the string looking for the delimiter and inserts each single item into the table. It’s simple, gets the job done, and the code can be easily found with a quick Internet search. Other methods use a recursive common table expression (CTE) or XML to do the work.
For years, many of us wondered why a function this useful didn’t already exists in T-SQL. My theory (I always have a theory) is that a string splitting function is a perfect example to illustrate how to develop a CLR integrated UDF. It’s not “Hello, world!” but it isn’t much more difficult.
SQL Server 2016 RC0 (release candidate 0) became available in early March. Along with other many enhancements, the built-in function STRING_SPLIT was announced!
Story of STRING_SPLIT
Here is an example of how the new function can be used:
DECLARE @String NVARCHAR(1000) = 'abc,def,ghi,jkl,mno,pqr,stu,vw,xyz'; SELECT * FROM STRING_SPLIT(@String,',');
Over the past several releases, many new functions have been CLR functions under the hood. I suspect this is the case with STRING_SPLIT as well. You can use STRING_SPLIT anywhere you can use a table-valued function, such as with CROSS APPLY. I can see this being used to import data into SQL Server from a non-normalized source. Here is an example using CROSS APPLY:
CREATE TABLE #Grades(StudentID INT NOT NULL PRIMARY KEY, TestScores NVARCHAR(100)); INSERT INTO #Grades(StudentID,TestScores) VALUES(1,'75,80,100,90'), (2,'80,82,90,95'), (3,'90,89,91,100'); SELECT StudentID, Value AS Score FROM #Grades CROSS APPLY STRING_SPLIT(TestScores,',') AS Scores; SELECT StudentID, AVG(CAST(Value AS INT)) AS AverageScore FROM #Grades CROSS APPLY STRING_SPLIT(TestScores,',') AS Scores GROUP BY StudentID;
Most of the time, a function like this is used to split just a small number of items, and performance is not a concern. I did wonder if there was much of a difference between the new function and the typical function used in the past. I set up a test to find out.
First, I created this old-school function:
CREATE FUNCTION dbo.OldSplitString(@String NVARCHAR(MAX), @Delim NCHAR(1)) RETURNS @Values TABLE ([Value] NVARCHAR(MAX)) AS BEGIN DECLARE @Pos INT; SET @Pos = CHARINDEX(@Delim,@String); WHILE @Pos &gt; 0 BEGIN INSERT INTO @Values([Value]) SELECT LTRIM(RTRIM(LEFT(@String,@Pos -1))); SET @String = SUBSTRING(@String,@Pos + 1,LEN(@String)); SET @Pos = CHARINDEX(@Delim,@String); END IF LEN(LTRIM(RTRIM(@String))) &gt; 0 INSERT INTO @Values([Value]) SELECT @String; RETURN; END
Then, I ran this test:
--Create a large comma-delimited string DECLARE @String NVARCHAR(4000) ; SELECT @String = COALESCE(@String + ',','') + LEFT(name,1) FROM sys.objects; PRINT @String --Create a variable used to see which takes the most time DECLARE @time datetime2 = GETDATE() SELECT * FROM dbo.OldSplitString(@String,','); SELECT DATEDIFF(MILLISECOND,@time,GETDATE()) AS OldMethod; SET @time = GETDATE() SELECT * FROM STRING_SPLIT(@String,','); SELECT DATEDIFF(MILLISECOND,@time,GETDATE()) AS NewFunction;
After running the test several times, the new STRING_SPLIT function consistently ran in under 5 milliseconds, while the old method took about 150 milliseconds each time. I decided to change things up a bit and use NVARCHAR(MAX) for @String instead of NVARCHAR(4000). In this case, the new STRING_SPLIT function took about 250 milliseconds each time while the old method stayed about the same as before. The length of the @String variable was only 1609 characters, so the length didn’t change between the two examples.
I continued to try several different scenarios with different size strings and different width items. The new function seemed to perform the best on smaller items and when the variable was not a MAX data type. There was a tipping point at 15 character items. Here are some of the scenarios I tested:
|Variable Type||String Size (characters)||Item Size
|Number of Items||Typical Duration for STRING_SPLIT (ms)|
We have been waiting for the STRING_SPLIT function for years. For most scenarios, this function will perform great, but be sure to test your particular workload.
Kathi Kellenberger is the author of “Beginning T-SQL” and “Expert T-SQL Window Functions in SQL Server” both from Apress. She is also the author of two Pluralsight courses, “T-SQL Window Functions” and “Configuring Kerberos for SSRS.”
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)