I have a string like this:

string s = "<p>Hello world, hello world</p>";
string[] terms = new string[] {"hello", "world"};

I want to replace every case-insensitive occurrence of each term in this string with a span tag carrying a running index, like so:

<p>
    <span id="m_1">Hello</span> 
    <span id="m_2">world</span>, 
    <span id="m_3">hello</span> 
    <span id="m_4">world</span>
</p>

I tried doing it like this:

int match = 1;
Regex.Replace(s,
    String.Join("|", terms.OrderByDescending(t => t.Length)
        .Select(Regex.Escape)),
    String.Format("<span id=\"m_{0}\">$&</span>", match++),
    RegexOptions.IgnoreCase);

The output is something like this:

<p>
    <span id="m_1">Hello</span> 
    <span id="m_1">world</span>, 
    <span id="m_1">hello</span> 
    <span id="m_1">world</span>
</p>

All the ids are the same (m_1) because match++ is not evaluated for each match; it is evaluated only once, for the whole Regex.Replace call. How do I get around this?

  • May be easier to parse the html and iterate the span nodes, take a look: stackoverflow.com/questions/6063203/parsing-html-with-c-net Commented Apr 17, 2017 at 16:53
  • Does it have to be Regex? Looks like a loop with compare would be a simpler and more readable approach. Commented Apr 17, 2017 at 16:54
  • @ferflores I am parsing it, but the input has no span nodes. That is the desired output and the actual output. The input is that string up there. Commented Apr 17, 2017 at 16:57

1 Answer


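The replacement string is formatted once, before Regex.Replace is called, which is why match++ only runs once. All you need to do is convert the replacement argument from a string pattern to a match evaluator, which is invoked for every match (m => String.Format("<span id=\"m_{0}\">{1}</span>", match++, m.Value)):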

using System;
using System.Linq;
using System.Text.RegularExpressions;

string s1 = "<p>Hello world, hello world</p>";
string[] terms = new string[] {"hello", "world"};
var match = 1;
// Longest terms first, so a longer term is tried before any shorter term it starts with
var pattern = String.Join("|", terms.OrderByDescending(t => t.Length).Select(Regex.Escape));
s1 = Regex.Replace(s1,
    pattern,
    // The evaluator runs once per match, so match++ is evaluated each time
    m => String.Format("<span id=\"m_{0}\">{1}</span>", match++, m.Value),
    RegexOptions.IgnoreCase);
Console.Write(s1);
// => <p><span id="m_1">Hello</span> <span id="m_2">world</span>, <span id="m_3">hello</span> <span id="m_4">world</span></p>
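
If it helps to see the two behaviours side by side, here is a minimal sketch that puts both variants into small methods. The class name SpanNumbering and the method names BuildPattern, WithStringReplacement and WithEvaluator are made up for illustration and are not part of any library; only Regex.Replace, Regex.Escape and the LINQ calls are real APIs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SpanNumbering
{
    // Joins the terms into an alternation, longest first, with regex metacharacters escaped.
    static string BuildPattern(string[] terms) =>
        string.Join("|", terms.OrderByDescending(t => t.Length).Select(Regex.Escape));

    // String replacement: the argument is formatted once, before Regex.Replace is called,
    // so every match receives the same id.
    public static string WithStringReplacement(string input, string[] terms)
    {
        int match = 1;
        return Regex.Replace(input, BuildPattern(terms),
            string.Format("<span id=\"m_{0}\">$&</span>", match++),
            RegexOptions.IgnoreCase);
    }

    // Match evaluator: the lambda is invoked for each match, so the counter advances.
    public static string WithEvaluator(string input, string[] terms)
    {
        int match = 1;
        return Regex.Replace(input, BuildPattern(terms),
            m => string.Format("<span id=\"m_{0}\">{1}</span>", match++, m.Value),
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "<p>Hello world, hello world</p>";
        string[] terms = { "hello", "world" };
        Console.WriteLine(WithStringReplacement(input, terms)); // every id is m_1
        Console.WriteLine(WithEvaluator(input, terms));         // ids run from m_1 to m_4
    }
}
```

The counter is a local variable captured by the lambda, so each call to WithEvaluator starts numbering at 1 again.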

