0

I have a string that contains dynamic HTML content.

I want to find all occurrences of specific HTML tags and replace the tags themselves, but not the content within them.

The specific HTML tags would be for a table - i.e. TABLE, TR, and TD. The tags may contain attributes, or they may not. How would one go about doing this in C#?

Thanks in advance for any help!

5
  • 1
    This is a task for an HTML parser, not a regular expression. Commented Jan 28, 2010 at 21:09
  • 2
    Using regexes on HTML and XML has been asked about before. There's a very good response here on StackOverflow involving Cthulhu. ;) stackoverflow.com/questions/1732348/… Commented Jan 28, 2010 at 21:09
  • 3
    No go ahead, use regex. Life lesson. Commented Jan 28, 2010 at 21:18
  • @Peter Gibbons: You're cruel! Commented Jan 28, 2010 at 21:19
  • Eh, I tried it. And I've failed. Wasted many hours of my life. Commented Jan 28, 2010 at 21:20

3 Answers

4

This function might be sufficient:

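// Requires: using System.Text.RegularExpressions;
public static string ReplaceTag(string input, string soughtTag, string replacementTag)
{
    // Group 1 is "<" or "</"; group 2 is any attributes plus the closing ">".
    // Requiring whitespace or ">" right after the name keeps "i" from matching "<img>".
    string pattern = "(</?)" + Regex.Escape(soughtTag) + @"((?:\s+.*?)?>)";
    // ${1} rather than $1, so a replacement name starting with a digit is not read as part of the group number.
    return Regex.Replace(input, pattern, "${1}" + replacementTag + "${2}", RegexOptions.IgnoreCase);
}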
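
To show how the function above could be applied to the table tags from the question, here is a minimal sketch. The class name `TagReplaceDemo`, the sample markup, and the choice of `div` and `span` as replacements are assumptions made for illustration; the question does not say what the tags should become. The function body is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagReplaceDemo
{
    // Same pattern as the answer's function, repeated here so the sketch is self-contained.
    public static string ReplaceTag(string input, string soughtTag, string replacementTag)
    {
        return Regex.Replace(input, "(</?)" + soughtTag + @"((?:\s+.*?)?>)", "$1" + replacementTag + "$2");
    }

    public static void Main()
    {
        // Sample input (an assumption): one tag with an attribute, two without.
        string html = "<table class=\"grid\"><tr><td>cell</td></tr></table>";

        // Hypothetical mapping: table and tr become div, td becomes span.
        html = ReplaceTag(html, "table", "div");
        html = ReplaceTag(html, "tr", "div");
        html = ReplaceTag(html, "td", "span");

        Console.WriteLine(html);
        // prints: <div class="grid"><div><span>cell</span></div></div>
    }
}
```

Attributes and inner content pass through unchanged because they sit in the second capture group or outside the match entirely. As written, the match is case-sensitive, so upper-case tags such as `<TABLE>` would need `RegexOptions.IgnoreCase`.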

1 Comment

I was trying to do something similar, but my own regex when searching for an italics tag (<i>) was also matching image tags (<img>). This solution worked perfectly to correct my error, though I modified it to return the entire tag as a single capture group: (</?tagName(?:\s+.*?)?>) [regex101.com/r/nM5cJ8/3]
4

Don't use regexes. Use the Html Agility Pack.

See this question for why not.
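
For comparison, a parser-based version might look roughly like the sketch below. It assumes the third-party HtmlAgilityPack NuGet package is referenced; the class name `AgilityPackDemo`, the method name `RenameTags`, and the `div` replacement are hypothetical and not taken from this answer.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package (assumed to be installed)

public static class AgilityPackDemo
{
    // Renames every element called soughtTag (pass it in lower case) to replacementTag.
    public static string RenameTags(string html, string soughtTag, string replacementTag)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ToList() takes a snapshot of the matches before any node is modified.
        foreach (HtmlNode node in doc.DocumentNode.Descendants(soughtTag).ToList())
        {
            node.Name = replacementTag; // attributes and child nodes are left untouched
        }

        // Serialize the modified tree back to a string.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string html = "<table class=\"grid\"><tr><td>cell</td></tr></table>";
        Console.WriteLine(RenameTags(html, "td", "div"));
    }
}
```

Because the parser works on elements rather than text, tag-like strings inside comments or attribute values are not renamed. The serialized output may differ from the input in small ways, such as quoting, so it is worth checking against real markup.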

1
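  string e = "(< *?/*)div( +?|>)";
  string repl = "${1}boo${2}"; // .NET replacement strings use $1, not \1
  string result = Regex.Replace(html, e, repl); // html is the input string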

Frankly I am befuddled by this mantra being imposed on everyone to never use regex for html.

4 Comments

I read it. The OP at least is only diatribe, assertion, humor and hyperbole. Understanding going in that HTML is in a different language class may clue you in to why your query in a particular case is getting unwieldy. But that doesn't mean every sort of operation you might need to perform on HTML would be affected by the language class of HTML. Admittedly the solution I give above is not complete, as it will perform the transformation even on comments and on quoted content of attributes. But at least for excluding comments a simple addition would suffice.
Excluding quoted sections is not a problem either.
I inadvertently just read the quoted part of that codinghorror - I'll read the rest.
OK, this is my diatribe, I guess. Natural language is in the highest language class of all - much higher than even regular expressions or HTML. Does that mean regex should never be used to alter text written by a human? Maybe you should only use a completely accurate natural language parser. In that case be prepared to wait maybe another decade at least until such a thing exists.
