4

I would like to parse out any HTML data that is returned wrapped in CDATA.

For example: <![CDATA[<table><tr><td>Approved</td></tr></table>]]>

Thanks!

2
  • 1
    Can you be more specific? You've got an XML document, containing a CDATA section, and you want to get a string containing the contents of that CDATA section? Commented May 1, 2009 at 17:18
  • I am getting this back in a DataTable, as a string in one of the columns of the result set, exactly as in the example I wrote above. I just want to use a regex to extract the contents and return only the HTML string to the browser via an AJAX call. Commented May 1, 2009 at 17:21

6 Answers 6

8

The expression to handle your example would be

\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>

Where the group "text" will contain your HTML.

The C# code you need is:

using System.Text.RegularExpressions;

RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>", options);
string input = @"<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";

// Match once and check Success instead of calling IsMatch and then Match
Match match = regex.Match(input);
if (match.Success)
{
    // The named group "text" holds the HTML inside the CDATA section
    string htmlText = match.Groups["text"].Value;
}

The "input" variable is only there to hold the sample input you provided.


1 Comment

It's probably more suitable to use .* instead of [^\]]* for the text group; otherwise, any HTML containing "]" will prevent the match.
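To illustrate the point about "]" in the content, here is a minimal sketch that compares the character-class pattern with a lazy `.*?` variant. The class and method names (`CDataDemo`, `ExtractStrict`, `ExtractLazy`) are hypothetical. Using `.*?` rather than a greedy `.*`, and adding `RegexOptions.Singleline`, are my own assumptions: the lazy form stops at the first `]]>`, and Singleline lets the HTML span several lines.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any answer above.
public static class CDataDemo
{
    // Character-class pattern: the "text" group cannot contain ']'.
    private static readonly Regex Strict =
        new Regex(@"<!\[CDATA\[(?<text>[^\]]*)\]\]>");

    // Lazy pattern (assumption): '.' may match ']' and, with Singleline, newlines.
    private static readonly Regex Lazy =
        new Regex(@"<!\[CDATA\[(?<text>.*?)\]\]>", RegexOptions.Singleline);

    // Returns the CDATA contents, or null when the pattern does not match.
    public static string ExtractStrict(string input)
    {
        Match m = Strict.Match(input);
        return m.Success ? m.Groups["text"].Value : null;
    }

    // Returns the CDATA contents, or null when the pattern does not match.
    public static string ExtractLazy(string input)
    {
        Match m = Lazy.Match(input);
        return m.Success ? m.Groups["text"].Value : null;
    }
}
```

For the input `<![CDATA[<td>items[0]</td>]]>`, `ExtractStrict` returns null because of the "]" after `items[0`, while `ExtractLazy` returns `<td>items[0]</td>`.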
4

I know this might seem incredibly simple, but have you tried string.Replace()?

string x = "<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";
string y = x.Replace("<![CDATA[", string.Empty).Replace("]]>", string.Empty);

There are probably more efficient ways to handle this, but it might be that you want something that easy...

2

Not much detail, but a very simple regex should match it if there isn't complexity that you didn't describe:

/<!\[CDATA\[(.*?)\]\]>/
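The surrounding slashes are regex delimiters from other languages and are not part of a .NET pattern. As a rough sketch of how this expression might be used from C#, the helper below collects the contents of every CDATA section in a string. The names `CDataSections` and `ExtractAll` are hypothetical, and `RegexOptions.Singleline` is an assumption added so the content can contain newlines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CDataSections
{
    // Lazy (.*?) so each match ends at the nearest "]]>".
    private static readonly Regex Pattern =
        new Regex(@"<!\[CDATA\[(.*?)\]\]>", RegexOptions.Singleline);

    // Returns the contents of each CDATA section, in order of appearance.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Group 1 is the text between "<![CDATA[" and "]]>".
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Because the quantifier is lazy, an input with two sections, such as `<![CDATA[<b>A</b>]]> and <![CDATA[<i>B</i>]]>`, yields two separate entries rather than one match spanning both.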

2 Comments

Though I don't think escaping "<" is really necessary.
Escaping < and > is not necessary in a C# regex.
1

The regex to find CDATA sections would be:

(?:<!\[CDATA\[)(.*?)(?:\]\]>)

0
Regex r = new Regex(@"(?<=<!\[CDATA\[).*?(?=\]\]>)");

1 Comment

Fixed! Sorry, didn't know that was valid in there :)
0

Why do you want to use Regex for such a simple task? Try this one:

str = str.Trim().Substring(9);
str = str.Substring(0, str.Length-3);
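The two Substring calls assume the trimmed string always starts with `<![CDATA[` (9 characters) and ends with `]]>` (3 characters). A minimal sketch of the same idea with those assumptions checked first might look like this. The names `CDataTrimmer` and `Unwrap` are hypothetical, and returning the input unchanged when it is not wrapped is my own design choice.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class CDataTrimmer
{
    private const string Prefix = "<![CDATA[";
    private const string Suffix = "]]>";

    // Strips one CDATA wrapper; returns the input unchanged if it is not wrapped.
    public static string Unwrap(string value)
    {
        if (value == null) return null;

        string s = value.Trim();
        bool wrapped = s.Length >= Prefix.Length + Suffix.Length
            && s.StartsWith(Prefix, StringComparison.Ordinal)
            && s.EndsWith(Suffix, StringComparison.Ordinal);

        // Keep only the characters between the prefix and the suffix.
        return wrapped
            ? s.Substring(Prefix.Length, s.Length - Prefix.Length - Suffix.Length)
            : value;
    }
}
```

The length check avoids an ArgumentOutOfRangeException on short or empty strings, which the unguarded version would throw.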
