replace string regex

Question

I'm trying to give users the ability to "mark" certain sections of content in a CMS with additional 'tags', if you will, that then get translated into, for example, bold text when the content is rendered on the page.

Something like {strong:Lorum ipsum dolar}, where the text is then wrapped as <strong>Lorum ipsum dolar</strong>.

I've tried to figure out the regex for this, but I'm no good at it. I grabbed some HTML replacement scripts from various sites, but they are not very helpful; at least, I don't know what to change in them :$.

Any help would be appreciated.

note

I'm doing this in C#.

This seems like it's going to present a whole world of problems with invalid tags. – Evan Davis, Commented May 10, 2012 at 13:31
$str = q({strong:Lorum ipsum dolar}); $str =~ m/\{(\w+):(.+?)\}/; $str = "<$1>$2</$1>"; An awful solution, but it works (Perl). – gaussblurinc, Commented May 10, 2012 at 13:42
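For comparison, a minimal C# counterpart of that one-liner, as a sketch: it uses Regex.Replace so the text around the marker is kept, and it swaps the lazy (.+?) for [^{}]* so a match cannot run past the closing brace of its own marker. ReplaceTags is a hypothetical helper name. It only handles one level: in {strong:x{i:y}z} just the inner {i:y} gets converted.

```csharp
using System;
using System.Text.RegularExpressions;

// \{(\w+):   "{", the tag name (group 1) and ":"
// ([^{}]*)   the content (group 2); excluding braces keeps the match inside one marker
// \}         the closing "}"
static string ReplaceTags(string input) =>
    Regex.Replace(input, @"\{(\w+):([^{}]*)\}", "<$1>$2</$1>");

Console.WriteLine(ReplaceTags("Before {strong:Lorum ipsum dolar} after"));
// prints: Before <strong>Lorum ipsum dolar</strong> after
```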
@loldop: Unless and until they start nesting those things, as in {strong:Lor{i:e}m ipsum dol{i:o}r}. With Perl's extensions to regex, that would even be possible to do; in “pure” regex, you can only do it up to some predetermined nesting depth. – Christopher Creutzig, Commented May 10, 2012 at 13:48
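One way around the fixed-depth limit in C#, sketched under the assumption that tag content never contains literal braces: apply a pattern that only matches innermost markers (content without braces) repeatedly until the string stops changing. Each pass turns the innermost markers into HTML, which exposes the next level out. The allowed set is a hypothetical whitelist that also touches on the invalid-tags concern raised in the first comment: unknown markers keep their text but lose the tag. Render and allowed are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical whitelist of markers that may become HTML tags.
var allowed = new HashSet<string> { "strong", "em", "i", "b" };

// An innermost marker: "{tag:" followed by content without braces, then "}".
var innermost = new Regex(@"\{(\w+):([^{}]*)\}");

string Render(string input)
{
    string previous;
    do
    {
        previous = input;
        // Each replacement contains no braces, so the marker around it becomes
        // innermost on the next pass; stop when a pass changes nothing.
        input = innermost.Replace(input, m =>
        {
            string tag = m.Groups[1].Value;
            string content = m.Groups[2].Value;
            return allowed.Contains(tag)
                ? "<" + tag + ">" + content + "</" + tag + ">"
                : content;   // unknown marker: keep the text, drop the tag
        });
    } while (input != previous);
    return input;
}

Console.WriteLine(Render("{strong:Lor{i:e}m ipsum dol{i:o}r} {script:alert(1)}"));
// prints: <strong>Lor<i>e</i>m ipsum dol<i>o</i>r</strong> alert(1)
```

The loop always terminates: every replacement removes one "{" and one "}", so a pass either shrinks the number of braces or changes nothing.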
Rather than reinvent the wheel, why not just let them write HTML and forbid certain tags, or use BBCode? There are lots of parsing options already available, and some WYSIWYG editors, for both HTML and BBCode. – JamieSee, Commented May 10, 2012 at 15:18

Samy Arous · Accepted Answer · 2012-05-11 22:26:34Z

1

This looks a lot like JSON-to-XML conversion.

{"strong":"Lorum ipsum dolar"}

would become

<strong>Lorum ipsum dolar</strong>

and

{"strong":{"italic":"Lorum ipsum dolar"}}

would become

<strong>
<italic>Lorum ipsum dolar</italic>
</strong>

I'm not saying this is the answer, but you might want to look into that. The basic idea would be to parse your tags into a hierarchical structure and then serialize it back to HTML or whatever output language you use.
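A minimal C# sketch of that idea, assuming System.Xml.Linq's XElement as the hierarchical structure in place of JSON: one pass turns each "{tag:" into a child element and each "}" into the end of the current element, then the tree is serialized. Parse, Render and the <root> wrapper element are hypothetical names. Tag names must be valid XML names (a marker like {1:x} would make XElement throw), and an unclosed tag throws a FormatException.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// "{", a tag name and ":"; \G pins the match to the position passed to Match.
var opener = new Regex(@"\G\{(\w+):");

// Step 1: build the tree. "{tag:" opens a child of the current element,
// "}" closes the current element, everything else is text.
XElement Parse(string input)
{
    var root = new XElement("root");   // hypothetical wrapper, never rendered
    XElement current = root;
    var text = new StringBuilder();

    void FlushText()
    {
        if (text.Length > 0) { current.Add(text.ToString()); text.Clear(); }
    }

    int i = 0;
    while (i < input.Length)
    {
        Match m = opener.Match(input, i);
        if (m.Success)
        {
            FlushText();
            var child = new XElement(m.Groups[1].Value);
            current.Add(child);
            current = child;
            i += m.Length;
        }
        else if (input[i] == '}' && current != root)
        {
            FlushText();
            current = current.Parent;
            i++;
        }
        else
        {
            text.Append(input[i]);   // plain text, including a stray top-level "}"
            i++;
        }
    }
    if (current != root)
        throw new FormatException("Unclosed tag: " + current.Name);
    FlushText();
    return root;
}

// Step 2: serialize the wrapper's children without indentation.
string Render(string input) =>
    string.Concat(Parse(input).Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));

Console.WriteLine(Render("{strong:Lor{i:e}m ipsum dol{i:o}r}"));
// prints: <strong>Lor<i>e</i>m ipsum dol<i>o</i>r</strong>
```

A side benefit of serializing through XElement is that characters such as "<" and "&" in the user's own text come out escaped.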

answered May 11, 2012 at 22:26

Samy Arous



FlyingStreudel · Accepted Answer · 2012-05-12 01:08:01Z

So this will get you the tags and parts you are looking for; however, the way I turn those results into the final string is pretty ugly. It's really just the regex at the top that matters. Enjoy!

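using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

string test = "{strong:lorem ip{i:su{b:m}m}m dolar} {strong:so strong}";
Regex tagParse = new Regex(
    @"\{(?<outerTag>\w*)
        (?>
            (?<DEPTH>\{(?<innerTags>\w*))   # '{' of a nested tag: push DEPTH
            |
            (?<-DEPTH>(?<close>\}))         # '}' of a nested tag: pop DEPTH
            |
            :?(?<innerContent>[^\{\}]*)     # text, skipping the ':' after a tag name
        )*
        (?(DEPTH)(?!))                      # fail if a nested tag is still open
        \}                                  # '}' of the outer tag

        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

MatchCollection matches = tagParse.Matches(test);
foreach (Match m in matches)
{
    // Put the nested opening tags, closing braces and text runs back in
    // document order, using the position (Index) of each capture.
    var parts = new List<(int Index, string Kind, string Value)>();
    foreach (Capture c in m.Groups["innerTags"].Captures)
        parts.Add((c.Index - 1, "open", c.Value));   // Index - 1 is the '{' before the name
    foreach (Capture c in m.Groups["close"].Captures)
        parts.Add((c.Index, "close", ""));
    foreach (Capture c in m.Groups["innerContent"].Captures)
        parts.Add((c.Index, "text", c.Value));

    StringBuilder sb = new StringBuilder();
    Stack<string> openTags = new Stack<string>();
    string outerTag = m.Groups["outerTag"].Value;
    sb.Append("<" + outerTag + ">");
    foreach (var part in parts.OrderBy(p => p.Index))
    {
        if (part.Kind == "open")
        {
            openTags.Push(part.Value);
            sb.Append("<" + part.Value + ">");
        }
        else if (part.Kind == "close")
            sb.Append("</" + openTags.Pop() + ">");   // the regex guarantees a matching open
        else
            sb.Append(part.Value);
    }
    sb.Append("</" + outerTag + ">");
    Console.WriteLine(sb.ToString());
}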
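Because a balancing-group pattern like this already finds one complete, balanced {tag:...} span, another option (a sketch, not this answer's method) is to capture the whole content of the marker and recurse into it with Regex.Replace, instead of rebuilding the output from the individual captures. The group names tag, content and d, and the Render helper, are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// One complete "{tag:...}" marker whose inner braces are balanced.
var balancedTag = new Regex(@"
    \{ (?<tag>\w+) :
    (?<content>
        (?>
            [^{}]+          # a run of plain text
          | \{ (?<d>)       # nested '{': push onto d
          | \} (?<-d>)      # nested '}': pop from d (fails if d is empty)
        )*
        (?(d)(?!))          # every nested '{' must have been closed
    )
    \}", RegexOptions.IgnorePatternWhitespace);

string Render(string input) =>
    balancedTag.Replace(input, m =>
    {
        string tag = m.Groups["tag"].Value;
        // Recurse so that markers nested inside the content are converted as well.
        return "<" + tag + ">" + Render(m.Groups["content"].Value) + "</" + tag + ">";
    });

Console.WriteLine(Render("{strong:lorem ip{i:su{b:m}m}m dolar} {strong:so strong}"));
// prints: <strong>lorem ip<i>su<b>m</b>m</i>m dolar</strong> <strong>so strong</strong>
```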

detale · Accepted Answer · 2012-05-11 23:56:01Z

0

Edit: updated to work with nested tags and with multiple matches per input string. Restriction: text inside a tag pair cannot contain a literal "{" or "}".

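using System.Diagnostics;
using System.Text.RegularExpressions;

private string FormatInput(string input)
{
    // Both patterns start at the first "{tag:". The non-greedy one ends at the
    // first "}" after it; the greedy one ends at the last "}" on the same line.
    const string patternNonGreedy = @"\{(?<tag>.+?):(\s*)(?<content>.*?)(\s*)}";
    const string patternGreedy = @"\{(?<tag>.+?):(\s*)(?<content>.*)(\s*)}";

    Match mtc = Regex.Match(input, patternGreedy);
    if (!mtc.Success)
        return input;   // no tags left

    // Count the braces in the greedy content; a "}" with no open "{" is ignored.
    string content = mtc.Groups["content"].Value;
    int braces = 0;
    foreach (char c in content)
    {
        if (c == '{')
            braces++;
        else if (c == '}')
        {
            if (braces > 0)
                braces--;
        }
    }

    // Balanced: treat the greedy match as one tag, wrap it and recurse into its content.
    if (braces == 0)
        return input.Substring(0, mtc.Index)
            + string.Format("<{0}>{1}</{0}>", mtc.Groups["tag"].Value, FormatInput(content))
            + input.Substring(mtc.Index + mtc.Length);

    // Unbalanced: the greedy match spans several tags, so convert only the first
    // tag, up to the first "}", and recurse on the rest of the input. The content
    // here is not recursed into, so this branch assumes the first tag is not nested.
    mtc = Regex.Match(input, patternNonGreedy);
    Debug.Assert(mtc.Success);

    content = mtc.Groups["content"].Value;
    return input.Substring(0, mtc.Index)
        + string.Format("<{0}>{1}</{0}>", mtc.Groups["tag"].Value, content)
        + FormatInput(input.Substring(mtc.Index + mtc.Length));
}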

Test examples:

string output1 = FormatInput("{strong:Lorum ipsum dolar}");
// output1: <strong>Lorum ipsum dolar</strong>

string output2 = FormatInput("{strong:{italic:Lorum ipsum dolar}}");
// output2: <strong><italic>Lorum ipsum dolar</italic></strong>

string output3 = FormatInput("{strong:Lor{i:e}m ipsum dol{i:o}r}");
// output3: <strong>Lor<i>e</i>m ipsum dol<i>o</i>r</strong>
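The non-greedy fallback stops at the first "}" and does not recurse into the content, so an input such as {strong:{i:x}} {b:y} comes out as <strong>{i:x</strong>} <b>y</b>. A minimal sketch of a variant that avoids this, assuming the same restriction on literal braces: find the first "{tag:" with a regex, then count braces to locate the "}" that actually closes it. This swaps in a brace-counting technique instead of the answer's greedy/non-greedy pair, and unlike the original it does not trim whitespace around the content.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: the regex only locates the first "{tag:"; a brace counter then finds
// the "}" that closes it, so nested tags cannot cut the content short.
static string FormatInput(string input)
{
    Match open = Regex.Match(input, @"\{(?<tag>\w+):");
    if (!open.Success)
        return input;   // no tags left

    int start = open.Index + open.Length;   // first character of the content
    int depth = 1;
    int pos = start;
    for (; pos < input.Length; pos++)
    {
        if (input[pos] == '{')
            depth++;
        else if (input[pos] == '}' && --depth == 0)
            break;   // pos is now the matching "}"
    }
    if (depth != 0)
        return input;   // unclosed tag: leave the input untouched

    string tag = open.Groups["tag"].Value;
    string content = input.Substring(start, pos - start);
    return input.Substring(0, open.Index)
        + string.Format("<{0}>{1}</{0}>", tag, FormatInput(content))   // nested tags
        + FormatInput(input.Substring(pos + 1));                        // later tags
}

Console.WriteLine(FormatInput("{strong:{i:x}} {b:y}"));
// prints: <strong><i>x</i></strong> <b>y</b>
```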

edited May 11, 2012 at 23:56

answered May 11, 2012 at 22:15

detale


2 Comments

FlyingStreudel Over a year ago

This has the same shortcoming that Christopher Creutzig pointed out earlier: it does not handle nested tags.

detale Over a year ago

@FlyingStreudel Thanks for pointing that out. I've updated it to use a recursive method and a non-greedy regex match to solve that.
