Use RegEx to Find and Replace Specific HTML Tags

Question

I have a string that contains dynamic HTML content.

I want to find all occurrences of specific HTML tags and replace the tags themselves, but not the content within them.

The specific HTML tags would be for a table - i.e. TABLE, TR, and TD. The tags may contain attributes, or they may not. How would one go about doing this in C#?

Thanks in advance for any help!

This is a task for an HTML parser, not a regular expression.
– Anon., Commented Jan 28, 2010 at 21:09
Using regexes on HTML and XML has been asked about before. There's a very good response here on Stack Overflow involving Cthulhu. ;) stackoverflow.com/questions/1732348/…
– FrustratedWithFormsDesigner, Commented Jan 28, 2010 at 21:09
Eh, I tried it. And I've failed. Wasted many hours of my life.
– George Johnston, Commented Jan 28, 2010 at 21:20

Nick Higgs · Accepted Answer · 2010-01-29 01:28:43Z

4

This function might be sufficient:

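using System.Text.RegularExpressions;

public static string ReplaceTag(string input, string soughtTag, string replacementTag)
{
    // $1 is "<" or "</"; $2 is the optional attributes plus the closing ">".
    return Regex.Replace(input, "(</?)" + Regex.Escape(soughtTag) + @"((?:\s+.*?)?>)", "${1}" + replacementTag + "${2}");
}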
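As a usage sketch for the table case in the question, a helper like this can be called once per tag. The variant below is an assumption layered on the answer, not part of it: it matches case-insensitively (so TABLE and table are both found), escapes the tag name, and uses [^>]* for the attribute part so that attributes spanning several lines are handled. The class name TagRenamer, the method ReplaceTableTags, and the choice of div as the replacement are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TagRenamer
{
    // Renames opening and closing tags, leaving attributes and inner content untouched.
    public static string ReplaceTag(string input, string soughtTag, string replacementTag)
    {
        // Group 1: "<" or "</". Group 2: optional attributes and the closing ">".
        string pattern = "(</?)" + Regex.Escape(soughtTag) + @"((?:\s[^>]*)?>)";
        return Regex.Replace(input, pattern, "${1}" + replacementTag + "${2}", RegexOptions.IgnoreCase);
    }

    // Applies the rename to the three table tags named in the question.
    public static string ReplaceTableTags(string html)
    {
        foreach (string tag in new[] { "table", "tr", "td" })
        {
            html = ReplaceTag(html, tag, "div");
        }
        return html;
    }
}
```

Because the tag name must be followed by whitespace or ">", searching for tr does not touch a tag such as track. The sketch still breaks on attribute values that contain a literal ">", and it does not match self-closing forms such as <td/>.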

edited Jan 29, 2010 at 1:28

answered Jan 29, 2010 at 1:21

Nick Higgs


1 Comment

Nicholas Over a year ago

I was trying to do something similar, but my own regex when searching for an italics tag (<i>) was also matching image tags (<img>). This solution worked perfectly to correct my error, though I modified it to return the entire tag as a single capture group: (</?tagName(?:\s+.*?)?>) [regex101.com/r/nM5cJ8/3]

Community · Accepted Answer · 2017-05-23 11:53:28Z

4

Don't use regexes. Use the Html Agility Pack.

See this question for why not.
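A minimal sketch of what the parser route might look like, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (TableTagRewriter, RenameTableTags) are hypothetical, and the exact serialization of the output may differ from the input in small ways, since the document is parsed and written back out.

```csharp
using HtmlAgilityPack;

public static class TableTagRewriter
{
    // Renames every table, tr, and td element, keeping attributes and children.
    public static string RenameTableTags(string html, string replacementTag)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//table | //tr | //td");
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                node.Name = replacementTag;
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Unlike a pattern-based rename, this approach works on parsed elements, so text inside comments or attribute values that merely looks like a table tag is left alone.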

edited May 23, 2017 at 11:53

CommunityBot


answered Jan 28, 2010 at 21:10

John Gietzen


Mark · Accepted Answer · 2010-01-29 01:08:09Z

1

  // In .NET, replacement strings reference groups as $1 and $2 (not \1 and \2).
  string e = "(< *?/*)div( +?|>)";
  string repl = "$1boo$2";

Frankly I am befuddled by this mantra being imposed on everyone to never use regex for html.

answered Jan 29, 2010 at 1:08

Mark


4 Comments

Mark Over a year ago

I read it. The OP, at least, is only diatribe, assertion, humor and hyperbole. Understanding going in that HTML is in a different language class may clue you in to why your query in a particular case is getting unwieldy. But that doesn't mean every sort of operation you might need to perform on HTML would be affected by the language class of HTML. Admittedly, the solution I give above is not complete, as it will perform the transformation even on comments and on quoted content of attributes. But at least for excluding comments, a simple addition would suffice.

Mark Over a year ago

Excluding quoted sections is not a problem either.

Mark Over a year ago

I inadvertently just read the quoted part of that codinghorror - I'll read the rest.

Mark Over a year ago

OK, this is my diatribe, I guess. Natural language is in the highest language class of all, much higher than even regular expressions or HTML. Does that mean regex should never be used to alter text written by a human? Maybe you should only use a completely accurate natural language parser. In that case, be prepared to wait maybe another decade at least until such a thing exists.
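One way the "simple addition" for excluding comments mentioned in this thread could look in C# is sketched below. It is an illustration, not the commenter's code: the pattern first tries to consume a whole HTML comment, and a MatchEvaluator returns comments unchanged while renaming real tags. The names CommentAwareTagRenamer and ReplaceTagOutsideComments are hypothetical, and quoted attribute values are not handled here.

```csharp
using System.Text.RegularExpressions;

public static class CommentAwareTagRenamer
{
    public static string ReplaceTagOutsideComments(string input, string soughtTag, string replacementTag)
    {
        // The first alternative consumes a whole comment, so tags inside it
        // are never seen by the second alternative.
        string pattern = @"<!--.*?-->|(</?)" + Regex.Escape(soughtTag) + @"((?:\s[^>]*)?>)";

        return Regex.Replace(
            input,
            pattern,
            m => m.Groups[1].Success
                ? m.Groups[1].Value + replacementTag + m.Groups[2].Value // a real tag: rename it
                : m.Value,                                               // a comment: keep as-is
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

RegexOptions.Singleline lets the comment alternative span line breaks. The same alternation trick could be extended with further alternatives (for example script blocks), at the cost of a steadily longer pattern.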
