One of the developer at my company asked is it possible to parse HTML and retrieve only TEXT from it without using regular expression. He wanted to remove everything between < and > and keep only Text. I found question very interesting and quickly wrote UDF which does not use regular expression.
Following UDF takes input as HTML and returns TEXT only. If there is any single quotes in HTML they should be replaced with two single quotes (not double quote) before it is passed as input to function.
CREATE FUNCTION [dbo].[udf_StripHTML]
(@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO
Test above function like this :
SELECT dbo.udf_StripHTML('<b>UDF at SQLAuthority.com </b><br><br><a href="http://www.SQLAuthority.com">SQLAuthority.com</a>')
Result Set:
UDF at SQLAuthority.com SQLAuthority.com
Reference : Pinal Dave (http://www.SQLAuthority.com)
If you want to see this example in action click on Image. It will open large image.







Hi ,
Really fantastic and jore hey, but I tried this in sql server query analyzer…
I got the errors like this :
Server: Msg 170, Level 15, State 1, Procedure udf_StripHTML, Line 2
Line 2: Incorrect syntax near ‘MAX’.
Server: Msg 137, Level 15, State 1, Procedure udf_StripHTML, Line 11
Must declare the variable ‘@HTMLText’.
Server: Msg 137, Level 15, State 1, Procedure udf_StripHTML, Line 12
Must declare the variable ‘@HTMLText’.
Server: Msg 137, Level 15, State 1, Procedure udf_StripHTML, Line 17
Must declare the variable ‘@HTMLText’.
Server: Msg 137, Level 15, State 1, Procedure udf_StripHTML, Line 18
Must declare the variable ‘@HTMLText’.
Server: Msg 137, Level 15, State 1, Procedure udf_StripHTML, Line 19
Must declare the variable ‘@HTMLText’.
Server: Msg 137, Level 15, State 1, Procedure udf_StripHTML, Line 22
Must declare the variable ‘@HTMLText’.
Send the remedy by query
With regards…
Wilson Gunanthi . J
Hi Wilson Gunanithi . J,
I just tested it again, it does not thrown an error. It works as intended and explained.
I have posted additional screen shots in original post, which displays UDF in action.
Regards,
Pinal Dave (http://www.SQLAuthority.com)
This code is interesting. How would one go about using this as a procedure instead of a function ?
Wilson Gunanithi . J
you are getting an error coz of copy paste
you need to replace the ‘ in your text when you copy paste.
oops you are using sql server 2000 and he is using 2005
so you cannot use varchar(max) which is only for 2005.
Replace varchar(max) with varchar(8000) and you will get rid of errors.
sometimes the ‘ marks are not copied properly thats what I thought initially. please who post these articles should point out for which version this is compatible with.
Hello All,
Lazy Coder has improved code :
Stripping HTML from text in SQL Server 2005
Use either of the version according to your need.
Regards,
Pinal Dave (SQLAuthority.com)
Does this code allow for HTML characters like the non-breaking space?   and/or other HTML encoded characters ?
Thanks
Tim
Hi Tim,
It does not allow those chars yet. It removes everything between “greater than” and “lesser than” sign.
Regards,
Pinal Dave (SQLAuthority.com)
I tried the modifyed code from Lazy coder’s site and the result shows the CHAR(13) + CHAR(10) in the data instead of a BR.. any way to change that?
Hi,
I wanted to know, if I’m inserting this record from this
control, how thw spaces are getting also inserted.
Regards,
SAROJ
Hi,
I wish to return a table.
how do i go about it ??
Hi,
How can I use this procedure to insert multiple html files in my existing table column.
Regards
Bobby
Fantastic!!!I wish I could be your student..