Alright, an easy one for you guys. We are using ActiveReport's RichTextBox to display some random bits of HTML code.

The HTML tags supported by ActiveReport can be found here: http://www.datadynamics.com/Help/ARNET3/ar3conSupportedHtmlTagsInRichText.html

An example of what I want to do is replace any match of <div style="text-align:*</div> with <p style="text-align:*</p>, so that text alignment uses a supported tag.

I have found the following regex to find the correct match in my HTML input:

<div style=\"text-align:(.*?)</div>

However, I can't find a way to keep the text contained in the tags after my replacement. Any clue? Is it just me, or are regexes generally a PITA? :)

    private static readonly IDictionary<string, string> _replaceMap =
        new Dictionary<string, string>
            {
                {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:(.*?)</p>"}
            };

    public static string FormatHtml(string html)
    {
        foreach(var pair in _replaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }

Thanks!

  • RegEx & HTML don't usually play well together: stackoverflow.com/questions/1732348/… Commented May 28, 2010 at 18:21
  • @Nick Gotch Thank you. I'm glad someone else is fighting the good fight. Commented May 28, 2010 at 18:23

2 Answers

Use $1:

{"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>"}

Note that you could simplify this to:

{"<div (style=\"text-align:(?:.*?))</div>", "<p $1</p>"}
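To make the substitution concrete, here is a minimal sketch of the question's FormatHtml method with the $1 fix applied. The class name HtmlFixer, the field name ReplaceMap, and the sample input are made up for illustration; only the pattern and replacement strings come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, named here only for illustration.
public static class HtmlFixer
{
    // Pattern -> replacement. "$1" in the replacement inserts whatever
    // the first capture group, (.*?), matched in the input.
    private static readonly IDictionary<string, string> ReplaceMap =
        new Dictionary<string, string>
        {
            { "<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>" }
        };

    public static string FormatHtml(string html)
    {
        foreach (var pair in ReplaceMap)
        {
            // Replaces every match of the pattern in the input.
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }

    public static void Main()
    {
        // Sample input invented for this sketch.
        const string input = "<div style=\"text-align:center\">foo</div><span>x</span>";

        // prints: <p style="text-align:center">foo</p><span>x</span>
        Console.WriteLine(FormatHtml(input));
    }
}
```

Because the lazy (.*?) stops at the first </div>, a div nested inside the matched one would leave a stray closing tag behind, so this only suits simple, flat input.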

Also, it is generally better to use an HTML parser such as HtmlAgilityPack than to parse HTML with regular expressions. Here's how you could do it:

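// Requires the HtmlAgilityPack library (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Rename every <div> element to <p>; attributes such as style are kept.
foreach (var e in doc.DocumentNode.Descendants("div"))
    e.Name = "p";
// Write the modified document to standard output.
doc.Save(Console.Out);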

Result:

<p style="text-align:center">foo</p><p style="text-align:center">bar</p>

Instead of using regexes, you should use a tool that is better suited to parsing and modifying HTML. I would recommend the Html Agility Pack for this; it was written to do just what you need.

5 Comments

Thanks for the suggestion, but I'm only looking for a quick easy way to solve this without any external libraries. I'll make sure to have a look at the Html Agility pack though, could be useful on some other projects!
matthewpw: I think you're missing his point. HtmlAgilityPack is a quick and easy way to solve your task - regex is not designed for parsing HTML and that's why you're finding it difficult.
Check out this answer to a similar question. It's a StackOverflow classic: stackoverflow.com/questions/1732348/…
I've taken a look at that SO classic, good read! Also, I'll check out Agility Pack if I ever need to mess with HTML again. But really, the simple regex replacement is what I was looking for: ActiveReport implements a very lame HTML renderer; the subset of supported tags is minimalistic and the HTML I want to 'sanitize' is really nothing complex. But I got your point though, and Rune's answer is definitely +1 material!
Haha! Ok. Good luck on that! :-)
