Replacing specific HTML tags using Regex

Question

Alright, an easy one for you guys. We are using ActiveReport's RichTextBox to display some random bits of HTML code.

The HTML tags supported by ActiveReport can be found here: http://www.datadynamics.com/Help/ARNET3/ar3conSupportedHtmlTagsInRichText.html

For example, I want to replace any match of <div style="text-align:*</div> with <p style="text-align:*</p>, so that text alignment uses a supported tag.

I found the following regex, which matches the right parts of my HTML input:

<div style=\"text-align:(.*?)</div>

However, I can't find a way to keep the original text contained in the tags after the replacement. Any clue? Is it just me, or are regexes generally a PITA? :)

    private static readonly IDictionary<string, string> _replaceMap =
        new Dictionary<string, string>
            {
                {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:(.*?)</p>"}
            };

    public static string FormatHtml(string html)
    {
        foreach(var pair in _replaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }

Thanks!

RegEx & HTML don't usually play well together stackoverflow.com/questions/1732348/…
– Nick Gotch, Commented May 28, 2010 at 18:21
@Nick Gotch Thank you. I'm glad someone else is fighting the good fight.
– Hank Gay, Commented May 28, 2010 at 18:23

Mark Byers · Accepted Answer · 2010-05-28 18:29:05Z

4

Use $1:

{"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>"}
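To make the substitution concrete, here is a minimal sketch of the question's FormatHtml with the `$1` replacement wired in. The class name `HtmlFixer` and the field name `ReplaceMap` are made up for illustration; everything else follows the code in the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the name is an assumption for this sketch.
public static class HtmlFixer
{
    // Maps a regex pattern to its replacement string. In the replacement,
    // "$1" is substituted with whatever the first capture group matched.
    private static readonly IDictionary<string, string> ReplaceMap =
        new Dictionary<string, string>
        {
            { "<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>" }
        };

    public static string FormatHtml(string html)
    {
        // Apply every pattern/replacement pair in turn.
        foreach (var pair in ReplaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }
}
```

Calling `HtmlFixer.FormatHtml` on `<div style="text-align:center">foo</div>` should give `<p style="text-align:center">foo</p>`. Keep in mind that `.` does not match a newline by default in .NET, so a div whose content spans several lines would need `RegexOptions.Singleline`, and nested divs will not be handled correctly by this pattern.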

Note that you could simplify this to:

{"<div (style=\"text-align:(?:.*?))</div>", "<p $1</p>"}

Also it is generally a better idea to use an HTML parser like HtmlAgilityPack than trying to parse HTML using regular expressions. Here's how you could do it:

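using System;
using HtmlAgilityPack;

// Parse the HTML fragment into a DOM.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Rename every <div> to <p>; attributes and children are kept.
foreach (var e in doc.DocumentNode.Descendants("div"))
    e.Name = "p";
doc.Save(Console.Out);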

Result:

<p style="text-align:center">foo</p><p style="text-align:center">bar</p>

edited May 28, 2010 at 18:29

answered May 28, 2010 at 18:22

Mark Byers


Rune Grimstad · Answer · 2010-05-28 18:22:28Z

3

Instead of using regexes, you should use a tool that is better suited to parsing and modifying HTML. I would recommend the Html Agility Pack for this - it was written to do just what you need.

answered May 28, 2010 at 18:22

Rune Grimstad


5 Comments

Matthew Perron Over a year ago

Thanks for the suggestion, but I'm only looking for a quick easy way to solve this without any external libraries. I'll make sure to have a look at the Html Agility pack though, could be useful on some other projects!

Mark Byers Over a year ago

matthewpw: I think you're missing his point. HtmlAgilityPack is a quick and easy way to solve your task - regex is not designed for parsing HTML and that's why you're finding it difficult.

Rune Grimstad Over a year ago

Check out this answer to a similar question. It's a StackOverflow classic: stackoverflow.com/questions/1732348/…

Matthew Perron Over a year ago

I've taken a look at that SO classic, good read! Also, I'll check out Agility Pack if I ever need to mess with HTML again. But really, the simple regex replacement is what I was looking for: ActiveReport implements a very lame HTML renderer; the subset of supported tags is minimalistic and the HTML I want to 'sanitize' is really nothing complex. I got your point though, and Rune's answer is definitely +1 material!

Rune Grimstad Over a year ago

Haha! Ok. Good luck on that! :-)
