Replace text not between certain tag with regex

Question

Say I have the following text:

"I want a pink banana for my dog"

And I have a list of words with their definitions. For example:

"pink banana": "This is a weird banana"
"banana": "This is a fruit"

I would like to replace the matching words in my sentence with something like:

<span tooltip="whatever">word</span>

I can do that, but there is an issue. In my example, the first term is replaced correctly:

"I want a <span tooltip="whatever">pink banana</span> for my dog"

But the second term then produces unwanted behavior:

"I want a <span tooltip="whatever">pink <span tooltip="whatever">banana</span></span> for my dog"

This produces two tooltips on the word banana, which I don't want. Basically, I'd like to replace the regex I use to match the words ("\b(WORD)\b") with one that only matches a word if it's not already inside a "<span tooltip="(.*)"></span>".

Is this possible?

EDIT

Here's the code I use to loop through the items and replace the word:

foreach (var glossaryItem in items)
{
    textNode.InnerHtml = Regex.Replace(textNode.InnerHtml, $@"\b({glossaryItem.Name})\b", $"<span tooltip=\"{glossaryItem.Definition}\">$1</span>", RegexOptions.IgnoreCase);
}

What's your current regular expression and how are you checking them in the list? Are you just looping the list and using the keys (the words) as a part of the regex check?
– Chris, Commented Oct 24, 2018 at 14:06
You need to build a regex containing the alternatives of keywords. Then you may use that regex to replace with the appropriate replacement if you can map the keys to definitions.
– Wiktor Stribiżew, Commented Oct 24, 2018 at 17:31
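
A minimal sketch of the single-pass idea from the comment above, assuming the glossary is available as a dictionary from term to definition. The names GlossaryTagger and Annotate are hypothetical, and HTML-encoding the definition is an added assumption that is not in the question's code. Because every term is matched in one Regex.Replace call, the spans inserted for one term are never scanned again for another.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class GlossaryTagger
{
    // Wraps every glossary term found in `text` in a tooltip span, in one pass.
    public static string Annotate(string text, IDictionary<string, string> glossary)
    {
        // An empty alternation would match at every word boundary, so bail out early.
        if (glossary.Count == 0) return text;

        // Case-insensitive lookup from the matched text back to its definition.
        var lookup = new Dictionary<string, string>(glossary, StringComparer.OrdinalIgnoreCase);

        // Longest terms first: alternatives are tried left to right, so
        // "pink banana" has to be listed before "banana".
        var alternatives = lookup.Keys
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k));

        var pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";

        return Regex.Replace(
            text,
            pattern,
            m => $"<span tooltip=\"{WebUtility.HtmlEncode(lookup[m.Value])}\">{m.Value}</span>",
            RegexOptions.IgnoreCase);
    }
}
```

With the sample sentence and the two glossary entries from the question, this should wrap "pink banana" once and leave "banana" alone, since the longer alternative wins at that position. It still treats the input as plain text, so terms that appear inside existing tags or attributes of textNode.InnerHtml would need separate handling.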

Chris · Accepted Answer · 2018-10-24 18:19:14Z

2

What you could try is adding negative lookaround subexpressions to your regex: a negative lookbehind (?<!...) and a negative lookahead (?!...) (or something similar to suit your needs).

For example:

foreach (var glossaryItem in items)
{
    textNode.InnerHtml = Regex.Replace(textNode.InnerHtml, $@"\b(?<!"">)({glossaryItem.Name})(?!<\/span>)\b", $"<span tooltip=\"{glossaryItem.Definition}\">$1</span>", RegexOptions.IgnoreCase);
}

This will match the string only if "> does not immediately precede the match and </span> does not immediately follow it.

edited Oct 24, 2018 at 18:19

answered Oct 24, 2018 at 14:15

3 Comments

ssougnez Over a year ago

Would it work if the list of words contained "pink"? Once "pink banana" is replaced, there is no pink</span>, so "pink" would still get replaced.

Chris Over a year ago

The concept would be the same: you can just add another negative lookbehind before the glossaryItem.Name capture group to check what precedes the match as well. Essentially, your problem lies in checking for anything not inside spans, and you can base your regex around that.

Chris Over a year ago

@ssougnez I just sat back down at my computer and I updated the regex for your particular case. Assuming that your tags will always end with "> and considering the fact that I can't use non-fixed width negative lookbehind (therefore, I can't check the entire span tag), this should capture anything that's not in between the span tags you've replaced a substring with.
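
The lookarounds only inspect the characters directly next to the match, so a term in the middle of a longer, already wrapped phrase could still slip through. One possible alternative, sketched here under assumptions, is to let the regex consume whole existing spans first and only wrap a term when it is matched outside of them. The names SpanAwareReplacer and WrapOutsideSpans are hypothetical, and the sketch assumes the spans are not nested and that definitions are HTML-encoded so the attribute never contains a raw ">".

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SpanAwareReplacer
{
    // Wraps `term` in a tooltip span unless it sits inside an existing span.
    public static string WrapOutsideSpans(string html, string term, string definition)
    {
        // First alternative: a whole existing span (tag, attributes, content).
        // Second alternative: the term in plain text, captured in group 1.
        var pattern = @"<span\b[^>]*>.*?</span>|\b(" + Regex.Escape(term) + @")\b";
        var tooltip = WebUtility.HtmlEncode(definition);

        return Regex.Replace(
            html,
            pattern,
            m => m.Groups[1].Success
                ? $"<span tooltip=\"{tooltip}\">{m.Value}</span>" // term outside any span
                : m.Value,                                          // existing span, left untouched
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

This can be called from the same foreach loop as in the question, once per glossary item. The items would have to be processed longest first, because once "banana" is wrapped the phrase "pink banana" no longer appears as contiguous text.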
