Some puzzles come as legacy and a great way to do sometimes interviews. In the past, I have written some innovative solutions when it comes to splitting strings inside SQL Server. One of them is at: SQL SERVER – Split Comma Separated List Without Using a Function. Though this is a good solution, many have some back to say this is not a clean or a native way to work with strings. Let us learn about Splitting Strings puzzle.
Let me complicate this statement a bit and set the stage for you. Assume you have a column that has a delimited value inside it. Our requirement is to split the values and find out the count of values. So I am making the split string conversation into an aggregation too. Nothing fancy about this though.
Splitting Strings Puzzle
Let us set up the environment for testing.
USE tempdb GO -- Create a table CREATE TABLE SplitWithCount ( cols VARCHAR(100) ) GO -- Insert Some Value INSERT INTO SplitWithCount (cols) VALUES ('Pinal;Dave;Kumar'),('Pinal;Dave'),('Pinal') GO -- Retrieve data SELECT * FROM SplitWithCount GO
What we want to achieve is to get an output as shown below:
The puzzle here is to see who can bring a simple code to achieve this above requirement.
Welcome 2016:
Interestingly, we have had a note around using STRING_SPLIT function inside SQL Server and it has been documented here: SQL SERVER 2016 – STRING_SPLIT Function Performance Comparison – Notes from the Field #118. If you are on SQL Server 2016 then we can easily do this using the following statement.
SELECT ss.value AS SplitMyString, COUNT(ss.value) [Counts] FROM SplitWithCount CROSS APPLY STRING_SPLIT(cols,';') ss GROUP BY ss.value GO
Though this is a very elegant way to do inside SQL Server 2016, I don’t think we can even beat the simplicity of this solution. Having said that if you are on an earlier version, would love to see what solutions you can come up with. Let me know. If you go through my earlier blog this would look like:
SELECT SplitMyString, COUNT(*) Counts FROM ( SELECT CAST(CONCAT('<Cols>' , REPLACE(cols, ';' , '</Cols><Cols>' ) , '</Cols>' ) AS XML) Xmlcol FROM SplitWithCount ) s CROSS apply ( SELECT ProjectData.D.value('.', 'VARCHAR(10)') as SplitMyString FROM s.xmlcol.nodes('Cols') as ProjectData(D) )a GROUP BY a.SplitMyString
Leave a comment if you are aware of alternate solutions. I will be happy to publish that on blog with due credit.
Reference: Pinal Dave (https://blog.sqlauthority.com)
47 Comments. Leave new
Not equal to sign get disappeared in above statement
; With Data
As
(
SELECT Cols
, Cast(1 As Int) [Level]
, Cast(1 As Int) [Start]
, Cast(Case When CharIndex(‘;’,Cols) > 0 Then CharIndex(‘;’,Cols)-1 Else Len(Cols) End As Int) [End]
, Cast(Case When CharIndex(‘;’,D.Cols) > 0 Then Stuff(D.Cols, 1, CharIndex(‘;’,D.Cols), ”) End As Varchar(100)) [NextStr]
FROM #SplitWithCount D
Union All
Select D.Cols
, Cast(D.Level + 1 As Int) [Level]
, Cast(D.Start+ D.[End]+1 As Int) [Start]
, Cast(Case When CharIndex(‘;’,D.NextStr) > 0 Then CharIndex(‘;’,D.NextStr)-1 Else Len(D.NextStr) End As Int) [End]
, Cast(Case When CharIndex(‘;’,D.NextStr) > 0 Then Stuff(D.NextStr, 1, CharIndex(‘;’,D.NextStr), ”) End As Varchar(100)) [NextStr]
From Data D
Where D.NextStr != ”
)
Select SubString(D.Cols, D.Start, D.[End]) [SplitMyString], Count(1) [Counts]
From Data D
Group By SubString(D.Cols, D.Start, D.[End])
DECLARE @xml xml, @delimiter varchar(1)=’;’
SET @xml = (SELECT ” + Replace(STUFF((SELECT @delimiter + cols FROM SplitWithCount FOR XML PATH(”)), 1, 1, ”),@delimiter,”)+”)
SELECT value, count(*) FROM
(
SELECT C.value(‘.’, ‘varchar(10)’) AS value
FROM @xml.nodes(‘X’) AS X(C)
)t
GROUP BY value
Alternate Solution
SELECT * FROM [CURRENT DATA]–Base Table
BEGIN TRY
DROP TABLE #TABLE
END TRY
BEGIN CATCH
END CATCH
CREATE TABLE #TABLE (SplitMyString VARCHAR(MAX))
DECLARE @T TABLE(ID INT IDENTITY,NAME VARCHAR(MAX))
DECLARE @@Table TABLE(ID INT IDENTITY,NAME VARCHAR(MAX))
DECLARE @I INT =1
INSERT INTO @T
SELECT ‘INSERT INTO #TABLE Values(”’+REPLACE(COLS,’;’,”’),(”’)+”’)’ FROM [CURRENT DATA]
DECLARE @COUNT INT =(SELECT MAX(ID) FROM @T)
WHILE @I<=@count
BEGIN
DECLARE @R VARCHAR(MAX) = (SELECT Name FROM @T WHERE ID = @I)
EXEC(@R)
SET @I=@I+1
END
SELECT SplitMyString,COUNT(SplitMyString) AS Counts FROM #TABLE GROUP BY SplitMyString ORDER BY SplitMyString
some of the reply stick with the output ..they got failed if anyone add more name or number of ‘;’ in there data. I think it should be quite more flexible . we can achieve it by just keeping the name under XML tag like Pinal then removing them . Below is the query
SELECT f.YourName, COUNT(f.YourName) AS YourCount FROM
(
SELECT t.c.value(‘.’, ‘nvarchar(100)’) as YourName FROM
(
SELECT CAST(” + REPLACE(cols, ‘;’,”) + ” as XML) AS ColXml FROM SplitWithCount
) k
CROSS APPLY ColXml.nodes(‘/R’) as t(c)
) f
GROUP BY f.YourName
;WITH [Split] AS
(
SELECT CONVERT(VARCHAR(100),LEFT(Cols,CHARINDEX(‘;’,Cols + ‘;’) – 1))[Name], STUFF(Cols,1,CHARINDEX(‘;’,Cols + ‘;’),”)[Remained]
FROM SplitWithCount
UNION ALL
SELECT CONVERT(VARCHAR(100),LEFT([Remained],CHARINDEX(‘;’,[Remained] + ‘;’) – 1)), STUFF([Remained],1,CHARINDEX(‘;’,[Remained] + ‘;’),”)
FROM [Split]
WHERE LEN([Remained]) > 0
)
SELECT [Name]SplitMyString,COUNT(1)Counts FROM [Split] GROUP BY [Name];
Hi,
My script is not optimize,it is just little different.
using plain recursive CTE.
DECLARE @SplitWithCount TABLE
(
cols VARCHAR(100)
)
— Insert Some Value
INSERT INTO @SplitWithCount (cols)
VALUES (‘Pinal;Dave;Kumar’),(‘Pinal;Dave’),(‘Pinal’)
— Retrieve data
;With CTE as
(
SELECT cols,row_number()over(order by (select null))rn
FROM @SplitWithCount
)
,CTE1 as
(
select case when charindex(‘;’,cols)>0 then substring(cols,0,charindex(‘;’,cols)) else cols end words
,case when charindex(‘;’,cols)>0 then substring(cols,charindex(‘;’,cols)+1,len(cols)) else null end cols
,rn from cte
union all
select case when charindex(‘;’,cols)>0 then substring(cols,0,charindex(‘;’,cols))else cols end
,case when charindex(‘;’,cols)>0 then substring(cols,charindex(‘;’,cols)+1,len(cols)) else null end
,rn from cte1
where len(cols)>0
)
select words,count(words)wordscount from cte1
group by words
Thank you for some amazing comments. Keep them coming. This is real learning for everyone who participated.
We want more puzzles. :)
Select Data, Count(*)
From
(
SELECT
XMLDoc.Node.value(‘.’, ‘VARCHAR(100)’) AS Data
FROM
(
SELECT
CAST (” + REPLACE(Cols, ‘;’, ”) + ” AS XML) AS Data
FROM SplitWithCount
) AS A CROSS APPLY Data.nodes (‘/D’) AS XMLDoc(Node)
) temp Group by Data
Result
——————-
Dave 2
Kumar 1
Pinal 3
DECLARE @VAR VARCHAR(10)=’A.B.C.D’
DECLARE @TBL TABLE ( SPLITMYSTRING VARCHAR(100),B VARCHAR(100))
DECLARE @ITT2 INT=LEN(REPLACE(@VAR,’.’,”))-1,@ITT1 INT=0
WHILE @ITT1<=@ITT2
BEGIN
INSERT INTO @TBL
VALUES (PARSENAME(@VAR,LEN(REPLACE(@VAR,'.',''))-@ITT1),LEN(REPLACE(@VAR,'.',''))-@ITT1-1)
SET @ITT1=@ITT1+1
END
SELECT * FROM @TBL
select
PARSENAME(replace(cols, ‘;’, ‘.’), 1) as SplitMyString,
3 – (len(cols) – len(replace(cols, ‘;’, ”))) as Counts
from PinalDaveKumar
order by 1;
I just changed a little bit for data inserting, I think some of you guys will encounter an exception when using xml method.
USE tempdb
GO
— Create a table
CREATE TABLE SplitWithCount
(
cols VARCHAR(100)
)
GO
— Insert Some Value
INSERT INTO SplitWithCount (cols)
VALUES (‘Pinal&;Dave;Kumar’),(‘Pinal&;Dave’),(‘Pinal&’)
GO
— Retrieve data
SELECT *
FROM SplitWithCount
GO
;WITH data
AS (
SELECT c = CONVERT(xml,
N”
)
FROM SplitWithCount
)
,CT
AS(
SELECT col = T.C.value(‘(./text())[1]’,’VARCHAR(10)’)
FROM data as a
CROSS APPLY C.nodes(‘./cols’) AS T(C)
)
select
SplitMyString = col
,Counts = count(1)
from CT
group by col
I have issues posting for 3rd time please ignore if duplicated:
this should work for any combination of the three values for any number of rows:
this could work for sql server 2000 by just removing the CTE and putting it in a subquery.
I think Harshad’s solution is the cleanest one and the most elegant (given a SQL Server 2016 instance is available)
select final.SplitMyString, count(SplitMyString) Counts from
(
SELECT Split.a.value(‘.’, ‘VARCHAR(100)’) AS SplitMyString
FROM
(
SELECT cols
,CAST (” + REPLACE([Cols], ‘;’, ”) + ” AS XML) AS String
FROM #SplitWithCount
) AS A
CROSS APPLY String.nodes (‘/M’) AS Split(a)
)final
group by final.SplitMyString
select Items,COUNT(Items) from SplitWithCount CROSS APPLY dbo.SplitNew(cols,’;’)
group by Items
make the function dbo.SplitNew
;WITH CTEDataSplit(DataValue, cols) AS (
SELECT LEFT(cols, CHARINDEX(‘;’,cols+’;’)-1)
,CAST(STUFF(cols, 1, CHARINDEX(‘;’,cols+’;’), ”) AS VARCHAR(100))
FROM #SplitWithCount
UNION ALL
SELECT LEFT(cols, CHARINDEX(‘;’,cols+’;’)-1)
,CAST(STUFF(cols, 1, CHARINDEX(‘;’,cols+’;’), ”) AS VARCHAR(100))
FROM CTEDataSplit
WHERE cols > ”
)
SELECT DataValue, COUNT(DataValue) AS Counts
FROM CTEDataSplit
GROUP BY DataValue
USE Test
GO
— Create a table
CREATE TABLE SplitWithCount
(
cols VARCHAR(100)
)
GO
— Insert Some Value
INSERT INTO SplitWithCount (cols)
VALUES (‘Pinal;Dave;Kumar’),(‘Pinal;Dave’),(‘Pinal’)
GO
— Retrieve data
SELECT *
FROM SplitWithCount
GO
——————————————————
CREATE FUNCTION [dbo].[fnParseStringTSQL]
(
@string NVARCHAR(MAX),
@separator NCHAR(1)
)
RETURNS @parsedString TABLE (string NVARCHAR(MAX))
AS
BEGIN
DECLARE @position int
SET @position = 1
SET @string = @string + @separator
WHILE charindex(@separator,@string,@position) 0
BEGIN
INSERT into @parsedString
SELECT substring(@string, @position, charindex(@separator,@string,@position) – @position)
SET @position = charindex(@separator,@string,@position) + 1
END
RETURN
END
select * from dbo.fnParseStringTSQL(‘Querying SQL Server’,’ ‘)
SELECT * FROM SplitWithCount
SELECT * FROM SplitWithCount T CROSS APPLY dbo.fnParseStringTSQL(T.cols,’;’)
SELECT [string],count(*) as Counts FROM SplitWithCount T CROSS APPLY dbo.fnParseStringTSQL(T.cols,’;’) group by [string]
No string splitting….
;with cte as (select case when cols like ‘%;Kumar’ then ‘Kumar’ end as cols,
(select count(cols) from SplitWithCount where cols like ‘%Kumar%’) as CountName
from SplitWithCount
group by cols
union
select case when cols like ‘%;Dave’ then ‘Dave’ end as cols,
(select count(cols) from SplitWithCount where cols like ‘%Dave%’) as CountName
from SplitWithCount
group by cols
union
select cols,
(select count(cols) from SplitWithCount where cols like ‘%Pinal%’) as CountName
from SplitWithCount
group by cols)
select cols, countname from cte where cols is not null and cols not like ‘%;%’