Say I have the following text:

"I want a pink banana for my dog"

And I have a list of words (or phrases) with their definitions. For example:

"pink banana": "This is a weird banana"
"banana": "This is a fruit"

I would like to replace the matching words in my sentence with something like:

<span tooltip="whatever">word</span>

That I can do, but the issue is that in my example, the first words will be replaced correctly:

"I want a <span tooltip="whatever">pink banana</span> for my dog"

But the second words will create an unwanted behavior:

"I want a <span tooltip="whatever">pink <span tooltip="whatever">banana</span></span> for my dog"

This produces two tooltips on the word banana, which I don't want. Basically, I'd like to replace the regex I use to match the words ("\b(WORD)\b") with one that only matches a word if it's not already inside a "<span tooltip="(.*)"></span>".

Is this possible?

EDIT

Here's the code I use to loop through the items and replace the word:

foreach (var glossaryItem in items)
{
    textNode.InnerHtml = Regex.Replace(textNode.InnerHtml, $@"\b({glossaryItem.Name})\b", $"<span tooltip=\"{glossaryItem.Definition}\">$1</span>", RegexOptions.IgnoreCase);
}
  • What's your current regular expression and how are you checking them in the list? Are you just looping the list and using the keys (the words) as a part of the regex check? Commented Oct 24, 2018 at 14:06
  • Yes, I edited my OP with this code. Commented Oct 24, 2018 at 14:10
  • You need to build a regex containing the alternatives of keywords. Then you may use that regex to replace with the appropriate replacement if you can map the keys to definitions. Commented Oct 24, 2018 at 17:31

1 Answer

What you could try is adding negative lookaround assertions to your regex: a negative lookbehind (?<!...) before the word and a negative lookahead (?!...) after it (or something similar to suit your needs).

For example:

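foreach (var glossaryItem in items)
{
    // (?<!"">) rejects a match directly preceded by "> (the "" is an escaped quote in a verbatim string);
    // (?!<\/span>) rejects a match directly followed by </span>.
    // Regex.Escape keeps special characters in the name from being read as regex syntax.
    textNode.InnerHtml = Regex.Replace(textNode.InnerHtml, $@"\b(?<!"">)({Regex.Escape(glossaryItem.Name)})(?!<\/span>)\b", $"<span tooltip=\"{glossaryItem.Definition}\">$1</span>", RegexOptions.IgnoreCase);
}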

This matches the word only if it is not immediately preceded by "> and not immediately followed by </span>.
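
To see the effect on the sentence from the question, here is a minimal, self-contained sketch of the same loop. The class name LookaroundDemo, the tuple-based item list, and the sample definitions are assumptions made for illustration. The definitions are deliberately chosen so that they contain no glossary terms, because the pattern does not protect text inside the tooltip attribute.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class LookaroundDemo
{
    // Applies the lookaround pattern once per glossary entry, in the order given.
    public static string Annotate(string html, (string Name, string Definition)[] items)
    {
        foreach (var (name, definition) in items)
        {
            // Skip a term that directly follows "> or directly precedes </span>.
            string pattern = $@"\b(?<!"">)({Regex.Escape(name)})(?!</span>)\b";

            // Assumes the definition contains no '$' and is safe inside an attribute.
            string replacement = $"<span tooltip=\"{definition}\">$1</span>";

            html = Regex.Replace(html, pattern, replacement, RegexOptions.IgnoreCase);
        }
        return html;
    }
}
```

Called with the entries ("pink banana", "A strange fruit"), ("banana", "A fruit") and ("pink", "A colour"), in that order, on "I want a pink banana for my dog", this should return the sentence with a single span around "pink banana": the later "banana" pass is stopped by the lookahead and the "pink" pass by the lookbehind.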

3 Comments

Would it work if the list of words contained "pink"? Once "pink banana" is replaced, there is no pink</span>, so it would get replaced.
The concept would be the same, you can just add another negative lookbehind before the glossaryItem.Name capture group to check for previous as well. Essentially, your problem lies in checking for anything not inside spans. And you can base your regex around that.
@ssougnez I just sat back down at my computer and I updated the regex for your particular case. Assuming that your tags will always end with "> and considering the fact that I can't use non-fixed width negative lookbehind (therefore, I can't check the entire span tag), this should capture anything that's not in between the span tags you've replaced a substring with.
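
The lookaround checks only look at the text directly next to a match, so a term in the middle of a longer phrase (for example "pink" inside an already wrapped "big pink banana") is neither preceded by "> nor followed by </span> and would still be wrapped a second time. A different technique, the one suggested in the comments under the question, is to build a single regex from all the terms and replace in one pass, so already inserted spans are never rescanned. The sketch below is one possible way to do that, not the code from this answer. The class name GlossaryAnnotator and the dictionary-based glossary are assumptions, and it assumes the input is plain text without existing markup and that definitions are safe to place inside an attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class GlossaryAnnotator
{
    // Wraps every glossary term found in the text in a tooltip span, in a single pass.
    public static string Annotate(string text, IDictionary<string, string> glossary)
    {
        // Case-insensitive lookup so a match such as "Banana" finds the "banana" entry.
        // Assumes no two terms differ only by case (the constructor would throw).
        var lookup = new Dictionary<string, string>(glossary, StringComparer.OrdinalIgnoreCase);

        // Longest terms first: alternatives are tried left to right,
        // so "pink banana" is tried before "banana" or "pink".
        var alternatives = lookup.Keys
            .OrderByDescending(name => name.Length)
            .Select(name => Regex.Escape(name));
        string joined = string.Join("|", alternatives);
        string pattern = $@"\b({joined})\b";

        // The evaluator builds each replacement from the matched term's definition.
        return Regex.Replace(
            text,
            pattern,
            match => $"<span tooltip=\"{lookup[match.Value]}\">{match.Value}</span>",
            RegexOptions.IgnoreCase);
    }
}
```

With the entries "pink banana", "banana" and "pink", the sentence from the question should come back with one span around "pink banana" and nothing nested, even when a definition itself contains the word "banana", because the inserted markup is never matched against.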
