SQL SERVER – 2005 – UDF – User Defined Function to Strip HTML – Parse HTML – No Regular Expression

June 16, 2007

One of the developers at my company asked is it possible to parse HTML and retrieve only TEXT from it without using regular expression. He wanted to remove everything between < and > and keep only Text. I found the question very interesting and quickly wrote UDF which does not use regular expression. Let us see how to parse HTML without regular expression.

Following UDF takes input as HTML and returns TEXT only. If there is any single quotes in HTML they should be replaced with two single quotes (not double quote) before it is passed as input to function.

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText) SET @End = 
CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) 
SET @Length = (@End - @Start) + 1 WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

Test above function like this :

SELECT dbo.udf_StripHTML('<b>UDF at SQLAuthority.com </b>

<a href="http://www.SQLAuthority.com">SQLAuthority.com</a>')

Result Set:

UDF at SQLAuthority.com SQLAuthority.com

If you want to see this example in action click on Image. It will open large image.

SQL SERVER - 2005 - UDF - User Defined Function to Strip HTML - Parse HTML - No Regular Expression HTMLUDF_S

Let me know what think of blog post by leaving your note in the comment sections.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Scripts, SQL Server, SQL String

SQL SERVER – sp_HelpText for sp_HelpText – Puzzle

SQL SERVER – De-fragmentation of Database at Operating System to Improve Performance

June 2025 Discount: Comprehensive Database Performance Health Check | Testimonials

SQL SERVER – 2005 – UDF – User Defined Function to Strip HTML – Parse HTML – No Regular Expression

Related Posts

SQL SERVER – Identifying InMemory OLTP Hash Collisions

Performance Comparison EXCEPT vs NOT IN – Interview Question of the Week #095

Fastest Way to Retrieve Rowcount for a Table – SQL in Sixty Seconds #096

Leave a ReplyCancel reply