
I am trying to convert HTML entity names such as &amp; and &quot; to their equivalent characters using the SQL below. I am testing this in SQL Server 2012.

Test 1 (This works fine):

GO
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT, @resultString varchar(max)
SET @resultString = LTRIM(RTRIM(@inputString))
SELECT @startIndex = PATINDEX('%&amp;%', @resultString)
WHILE @startIndex > 0 
BEGIN
    SELECT @resultString = REPLACE(@resultString, '&amp;', '&'), @startIndex=PATINDEX('%&amp;%', @resultString)
END

PRINT @resultString
Go

Output:

&testString&

Test 2 (this doesn't work): Since the above worked, I tried to extend it to handle more characters, as follows:

DECLARE @htmlNames TABLE (ID INT IDENTITY(1,1), asciiDecimal INT, htmlName varchar(50))
INSERT INTO @htmlNames
VALUES (34,'&quot;'),(38,'&amp;'),(60,'&lt;'),(62,'&gt;'),(160,'&nbsp;'),(161,'&iexcl;'),(162,'&cent;')
-- I would load the full list of HTML names into this TABLE variable, but removed it for testing purposes
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @count INT = 0
DECLARE @id INT = 1
DECLARE @charCode INT, @htmlName VARCHAR(30)
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT
        , @resultString varchar(max)
SELECT @count=COUNT(*) FROM @htmlNames

WHILE @id <=@count
BEGIN
    SELECT @charCode = asciiDecimal, @htmlname = htmlName
    FROM @htmlNames
    WHERE ID = @id

        SET @resultString = LTRIM(RTRIM(@inputString))
        SELECT @startIndex = PATINDEX('%' + @htmlName + '%', @resultString)
        While @startIndex > 0 
        BEGIN
            --PRINT @resultString + '|'  + @htmlName + '|' + NCHAR(@charCode)
            SELECT @resultString = REPLACE(@resultString, @htmlName, NCHAR(@charCode))
            SET @startIndex=PATINDEX('%' + @htmlName + '%', @resultString)
        END
        SET @id=@id + 1
END

PRINT @resultString

GO

Output:

&amp;testString&amp;

I cannot figure out where I'm going wrong. Any help would be much appreciated.

I am not interested in loading the string values into the application layer, applying HTMLDecode there, and saving them back to the database.

EDIT: This line SET @resultString = LTRIM(RTRIM(@inputString)) was inside the WHILE so I was overwriting the result with @inputString. Thank you, YanireRomero.

I like @RichardDeeming's solution too, but it didn't suit my needs in this case.

4 Answers

16

Here's a simpler solution that doesn't need a loop:

DECLARE @htmlNames TABLE 
(
    ID INT IDENTITY(1,1), 
    asciiDecimal INT, 
    htmlName varchar(50)
);

INSERT INTO @htmlNames 
VALUES 
    (34,'&quot;'),
    (38,'&amp;'),
    (60,'&lt;'),
    (62,'&gt;'),
    (160,'&nbsp;'),
    (161,'&iexcl;'),
    (162,'&cent;')
;

DECLARE @inputString varchar(max)= '&amp;test&amp;quot;&lt;String&gt;&quot;&amp;';
DECLARE @resultString varchar(max) = @inputString;

-- Simple HTML-decode:
SELECT
    @resultString = Replace(@resultString COLLATE Latin1_General_CS_AS, htmlName, NCHAR(asciiDecimal))
FROM
    @htmlNames
;

SELECT @resultString;
-- Output: &test&quot;<String>"&


-- Multiple HTML-decode:
SET @resultString = @inputString;

DECLARE @temp varchar(max) = '';
WHILE @resultString != @temp
BEGIN
    SET @temp = @resultString;

    SELECT
        @resultString = Replace(@resultString COLLATE Latin1_General_CS_AS, htmlName, NCHAR(asciiDecimal))
    FROM
        @htmlNames
    ;
END;

SELECT @resultString;
-- Output: &test"<String>"&

EDIT: Changed to NCHAR, as suggested by @tomasofen, and added a case-sensitive collation to the REPLACE function, as suggested by @TechyGypo.
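
To illustrate why the case-sensitive collation in the edit note matters, here is a minimal sketch. The sample string is made up for the demonstration, and both collations are applied explicitly so the result should not depend on the database default:

```sql
-- Hypothetical sample value containing an uppercase and a lowercase entity.
DECLARE @s varchar(100) = '&Eacute;cole, caf&eacute;';

-- Case-insensitive comparison: '&eacute;' also matches '&Eacute;',
-- so both entities are replaced with the lowercase character.
SELECT REPLACE(@s COLLATE Latin1_General_CI_AS, '&eacute;', NCHAR(233)) AS case_insensitive;

-- Case-sensitive comparison: only the lowercase entity is replaced,
-- leaving '&Eacute;' for its own row in the lookup table.
SELECT REPLACE(@s COLLATE Latin1_General_CS_AS, '&eacute;', NCHAR(233)) AS case_sensitive;
```

With a case-insensitive comparison, whichever of the two rows is processed first wins for both spellings, which is why the collation is forced in the REPLACE call.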


5 Comments

@RichardDeeming, thank you very much for the suggestion, It is much better than mine. I had a loop to handle the input in the following format. &amp;amp;testString&amp;amp;. Your script will output this as &amp;testString&amp;. Do you have any suggestions to deal with the aforementioned case?
@Sathish: So you're not really HTML-decoding the string? Because if you put that string in an HTML document, it will display &amp;testString&amp;. If you want to ignore that, and keep going until there are no entities left, then you will need to execute the SELECT statement within a loop.
@RichardDeeming, I'm loading values from a legacy database and their TEXT column values were in the aforementioned format, so I had to deal with them. Your script is really nice, simple and works perfect. Thank you again.
Be aware that case sensitivity can come into play here too if the list of codes increases (as @tomasofen supplied below). For example &Eacute; and &eacute; may not be correctly replaced with the REPLACE statement if your DB is not case sensitive. Use something like this: REPLACE(@resultString COLLATE Latin1_General_CS_AS, htmlName COLLATE Latin1_General_CS_AS, CHAR(asciiDecimal) COLLATE Latin1_General_CS_AS)
@TechyGypo: Thanks for that. I've updated the answer to include your suggestion. :)
5

For the sake of performance, this isn't something you should write as T-SQL statements or as a SQL scalar-valued function. The .NET libraries provide excellent, fast, and, above all, reliable HTML decoding. In my opinion, you should implement this as a SQL CLR function, like this:

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Net;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        IsDeterministic = true,
        IsPrecise = true,
        DataAccess = DataAccessKind.None,
        SystemDataAccess = SystemDataAccessKind.None)]
    [return: SqlFacet(MaxSize = 4000)]
    public static SqlString cfnHtmlDecode([SqlFacet(MaxSize = 4000)] SqlString input)
    {
        // Propagate SQL NULL unchanged.
        if (input.IsNull)
            return SqlString.Null;

        // Delegate to the framework's HTML decoder.
        return WebUtility.HtmlDecode(input.Value);
    }
}

Then in your T-SQL, call it like this:

SELECT clr_schema.cfnHtmlDecode(column_name) FROM table_schema.table_name
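
Before that call works, the compiled assembly has to be registered in the database. The following is only a sketch of the usual steps: the assembly name HtmlDecodeClr and the file path are hypothetical placeholders, the clr_schema schema is assumed to exist already, and whether PERMISSION_SET = SAFE is sufficient depends on your server's CLR security settings.

```sql
-- Enable CLR integration on the instance (requires server-level permission).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Hypothetical assembly name and path; point this at your own build output.
CREATE ASSEMBLY HtmlDecodeClr
FROM 'C:\clr\HtmlDecodeClr.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Bind a T-SQL function to the static method in the assembly.
-- The parameter and return sizes mirror the SqlFacet(MaxSize = 4000) attributes.
CREATE FUNCTION clr_schema.cfnHtmlDecode (@input nvarchar(4000))
RETURNS nvarchar(4000)
AS EXTERNAL NAME HtmlDecodeClr.UserDefinedFunctions.cfnHtmlDecode;
GO
```

The EXTERNAL NAME has the form assembly.class.method; the class here has no namespace, so the class name is used on its own.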

1 Comment

This is a much better solution for undoing encoded data that is already in the database. Upvoted.
2

It was an assignment error:

DECLARE @htmlNames TABLE (ID INT IDENTITY(1,1), asciiDecimal INT, htmlName varchar(50))
INSERT INTO @htmlNames
VALUES (34,'&quot;'),(38,'&amp;'),(60,'&lt;'),(62,'&gt;'),(160,'&nbsp;'),(161,'&iexcl;'),(162,'&cent;')
-- I would load the full list of HTML names into this TABLE variable, but removed it for testing purposes
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @count INT = 0
DECLARE @id INT = 1
DECLARE @charCode INT, @htmlName VARCHAR(30)
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT
    , @resultString varchar(max)
SELECT @count=COUNT(*) FROM @htmlNames

SET @resultString = LTRIM(RTRIM(@inputString))

WHILE @id <=@count
BEGIN

    SELECT @charCode = asciiDecimal, @htmlname = htmlName
    FROM @htmlNames
    WHERE ID = @id

        SELECT @startIndex = PATINDEX('%' + @htmlName + '%', @resultString)

        While @startIndex > 0 
        BEGIN
            --PRINT @resultString + '|'  + @htmlName + '|' + NCHAR(@charCode)
            SET @resultString = REPLACE(@resultString, @htmlName, NCHAR(@charCode))
            SET @startIndex=PATINDEX('%' + @htmlName + '%', @resultString)
        END
        SET @id=@id + 1
END

PRINT @resultString

GO

The line SET @resultString = LTRIM(RTRIM(@inputString)) was inside the outer WHILE loop, so on every iteration you were overwriting your result with the original input.

Hope it helps.

3 Comments

thank you for looking into this. Yes, that was the error in my code.
@CarlosCalla, I would have already done that if I had enough reputation. I cannot mark more than one as an answer, but thank you for your suggestion.
@Sathish no problem, I forgot you need 15 reputation to vote up, you are right.
2

Some additional help for Richard Deeming's answer, to save some typing for future visitors who want to extend the lookup table with more codes:

INSERT INTO @htmlNames 
VALUES 
(34, '&quot;'),
(38, '&amp;'),
(60, '&lt;'),
(62, '&gt;'),

(160, '&nbsp;'),
(161, '&iexcl;'),
(162, '&cent;'),
(163, '&pound;'),
(164, '&curren;'),
(165, '&yen;'),
(166, '&brvbar;'),
(167, '&sect;'),
(168, '&uml;'),
(169, '&copy;'),
(170, '&ordf;'),
(171, '&laquo;'),
(172, '&not;'),
(173, '&shy;'),
(174, '&reg;'),
(175, '&macr;'),

(176, '&deg;'),
(177, '&plusmn;'),
(178, '&sup2;'),
(179, '&sup3;'),
(180, '&acute;'),
(181, '&micro;'),
(182, '&para;'),
(183, '&middot;'),
(184, '&cedil;'),
(185, '&sup1;'),
(186, '&ordm;'),
(187, '&raquo;'),
(188, '&frac14;'),
(189, '&frac12;'),
(190, '&frac34;'),
(191, '&iquest;'),

(192, '&Agrave;'),
(193, '&Aacute;'),
(194, '&Acirc;'),
(195, '&Atilde;'),
(196, '&Auml;'),
(197, '&Aring;'),
(198, '&AElig;'),
(199, '&Ccedil;'),
(200, '&Egrave;'),
(201, '&Eacute;'),
(202, '&Ecirc;'),
(203, '&Euml;'),
(204, '&Igrave;'),
(205, '&Iacute;'),
(206, '&Icirc;'),
(207, '&Iuml;'),

(208, '&ETH;'),
(209, '&Ntilde;'),
(210, '&Ograve;'),
(211, '&Oacute;'),
(212, '&Ocirc;'),
(213, '&Otilde;'),
(214, '&Ouml;'),
(215, '&times;'),
(216, '&Oslash;'),
(217, '&Ugrave;'),
(218, '&Uacute;'),
(219, '&Ucirc;'),
(220, '&Uuml;'),
(221, '&Yacute;'),
(222, '&THORN;'),
(223, '&szlig;'),

(224, '&agrave;'),
(225, '&aacute;'),
(226, '&acirc;'),
(227, '&atilde;'),
(228, '&auml;'),
(229, '&aring;'),
(230, '&aelig;'),
(231, '&ccedil;'),
(232, '&egrave;'),
(233, '&eacute;'),
(234, '&ecirc;'),
(235, '&euml;'),
(236, '&igrave;'),
(237, '&iacute;'),
(238, '&icirc;'),
(239, '&iuml;'),

(240, '&eth;'),
(241, '&ntilde;'),
(242, '&ograve;'),
(243, '&oacute;'),
(244, '&ocirc;'),
(245, '&otilde;'),
(246, '&ouml;'),
(247, '&divide;'),
(248, '&oslash;'),
(249, '&ugrave;'),
(250, '&uacute;'),
(251, '&ucirc;'),
(252, '&uuml;'),
(253, '&yacute;'),
(254, '&thorn;'),
(255, '&yuml;'),
(8364, '&euro;');

EDITED:

If you want the euro symbol to work (and, in general, any character code over 255), you will need to use NCHAR instead of CHAR in Richard Deeming's code.
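
A quick way to see the difference, as a small sketch (the variable name is made up for the example):

```sql
-- CHAR only accepts codes 0 through 255 and returns NULL outside that range;
-- NCHAR accepts Unicode code points such as 8364 (the euro sign).
SELECT CHAR(8364)  AS char_result,
       NCHAR(8364) AS nchar_result;

-- Hypothetical variable: declaring the result as nvarchar keeps Unicode
-- characters intact regardless of the database's code page.
DECLARE @decoded nvarchar(max) = N'Price: 10 &euro;';
SET @decoded = REPLACE(@decoded, N'&euro;', NCHAR(8364));
SELECT @decoded AS decoded_value;
```

If the result variable stays varchar, characters that are missing from the column's or database's code page may be replaced on conversion, so switching the variable to nvarchar is the safer choice once the table goes beyond code 255.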
