I have some HTML content that I'd like to parse and encode before displaying it in my web pages.

The trick is that I want to encode only text content, not the obvious HTML tags in the HTML content. How can I achieve that?

Example:

Provided

"Some text & links : <strong>bla blà blö</strong> and <a href="http://www.google.com">go there</a> for only 15 € < 20 €"

I'd like to output

"Some text &amp; links : <strong>bla bl&agrave; bl&ouml;</strong> and <a href="http://www.google.com">go there</a> for only 15 &euro; &lt; 20 &euro;"
or
"Some text &#38; links : <strong>bla bl&#224; bl&#246;</strong> and <a href="http://www.google.com">go there</a> for only 15 &#8364; &#60; 20 &#8364;"

  • Can you provide an example of what exactly you are trying to accomplish? The whole purpose of HTML encoding is to encode the HTML tags... Commented Sep 14, 2011 at 15:02
  • Try using an HTML parser like HTML Agility Pack to do the actual parsing. Commented Sep 14, 2011 at 15:02
  • I just updated with an example. Commented Sep 14, 2011 at 15:11
  • If you broke that string apart into, say, HTMLString and ContentString, you could encode the ContentString and then concatenate it back together with HTMLString. This may not be easy, though, unless you're already building that string up dynamically in the first place. :) Commented Sep 14, 2011 at 15:26
  • I guess I'm not the first to do this. Does anyone know of a library or something that would help me do this? Commented Sep 14, 2011 at 15:40

2 Answers

1

Use Html Agility Pack:

using System;
using HtmlAgilityPack;

var html =
  "Some text & links : <strong>bla blà blö</strong> and <a href=\"http://www.google.com\">go there</a> for only 15 € < 20 €";

// Replace the special characters with named entities, leaving the markup as it is
var encoded = HtmlEntity.Entitize(html);
Console.WriteLine(encoded);

// Outputs
// Some text & links : <strong>bla bl&agrave; bl&ouml;</strong> and <a href="http://www.google.com">go there</a> for only 15 &euro; < 20 &euro;

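I just tested it on your example: the accented characters and the euro sign are converted to named entities, and the tags are left alone. Note that, as the output shows, the bare & and < are not encoded.

If you want to see how it's done, the source code is public.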


0

I know this is an old topic, but I think this snippet might do a good job. I also know you're not supposed to use regular expressions on HTML (this one does not handle <script> and <style> content at all), but this method might be all you need if you'd rather not pull in the whole Html Agility Pack. I used SqlString because the method is called from my SQL Server database; it can easily be switched to string. It would also be easy to switch to StringBuilder to make it more efficient.

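// Requires: using System.Data.SqlTypes; using System.Text.RegularExpressions;
// Encodes (encode == true) or decodes (encode == false) only the text between tags.
private static SqlString fnHTMLDecodeEncode(SqlString html, bool encode)
{
  if (html.IsNull)
    return SqlString.Null;

  const RegexOptions REGOPT = RegexOptions.Singleline | RegexOptions.Compiled;

  string s = html.Value;
  // Match anything that looks like a tag: '<', then '!', a letter or '/', up to the next '>'.
  var m = Regex.Matches(s, @"<[!A-Za-z\/][^>]*>", REGOPT);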
  int proStart, proLen;
  if (m.Count == 0)
  {
    proStart = 0;
    proLen = s.Length;
  }
  else
  {
    proStart = m[m.Count - 1].Index + m[m.Count - 1].Length;
    proLen = s.Length - proStart;
  }

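  // Walk the text segments from last to first so earlier indexes stay valid after each replacement.
  // i == m.Count is the text after the last tag; i == 0 is the text before the first tag.
  for (int i = m.Count; i >= 0; i--)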
  {
    if (i < m.Count)
    {
        proStart = (i == 0 ? 0 : m[i - 1].Index + m[i - 1].Length);
        proLen = m[i].Index - proStart;
    }

    // Process every non-empty segment; a threshold of 2 would skip short text such as a lone "&".
    if (proLen > 0)
    {
        var orig = s.Substring(proStart, proLen);
        var enc = (encode ? System.Net.WebUtility.HtmlEncode(orig) : System.Net.WebUtility.HtmlDecode(orig));
        if (orig.Length != enc.Length)
        {
            s = s.Remove(proStart, proLen).Insert(proStart, enc);
        }

        proLen = -1;
    }

  }

  return new SqlString(s);
}
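
For anyone who wants the plain string / StringBuilder variant mentioned above, here is a rough sketch of how it could look. It is not the code I run in production; the class name HtmlTextEncoder and the method name EncodeTextOnly are made up for illustration, and it only covers the encode direction. It uses the same tag pattern and the same System.Net.WebUtility.HtmlEncode call, but walks the matches front to back and appends to a StringBuilder instead of editing the string in place.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: the names here are illustrative, not part of any library.
public static class HtmlTextEncoder
{
    // Same idea as the pattern above: '<', then '!', a letter or '/', up to the next '>'.
    private static readonly Regex TagPattern =
        new Regex(@"<[!A-Za-z/][^>]*>", RegexOptions.Singleline | RegexOptions.Compiled);

    public static string EncodeTextOnly(string html)
    {
        if (string.IsNullOrEmpty(html))
            return html;

        var sb = new StringBuilder(html.Length + 16);
        int pos = 0; // start of the text segment not yet copied

        foreach (Match tag in TagPattern.Matches(html))
        {
            // Encode the text that precedes this tag, then copy the tag unchanged.
            sb.Append(WebUtility.HtmlEncode(html.Substring(pos, tag.Index - pos)));
            sb.Append(tag.Value);
            pos = tag.Index + tag.Length;
        }

        // Encode whatever text follows the last tag.
        sb.Append(WebUtility.HtmlEncode(html.Substring(pos)));
        return sb.ToString();
    }
}
```

Calling HtmlTextEncoder.EncodeTextOnly("a & b <b>x < y</b>") gives "a &amp; b <b>x &lt; y</b>". The same caveats as above apply: anything that merely looks like a tag is left alone, <script> and <style> content is treated as ordinary text, and text that is already encoded gets encoded a second time.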
