I wrote a method that highlights keywords in an HTML string. It returns the updated string and a list of the matched keywords. I want to match a keyword whether it appears as a whole word or surrounded by dashes. However, when it appears with dashes, the dashes are highlighted and returned along with the word.

For example, if the word is locks and the HTML contains He -locks- the door, then the dashes around the word are also highlighted:

He <span style="background-color:yellow">-locks-</span> the door.

Instead of:

He -<span style="background-color:yellow">locks</span>- the door.

In addition, the returned list contains -locks- instead of locks.

What can I do to get my expected result?

Here is my code:

private static List<string> FindKeywords(IEnumerable<string> words, bool bHighlight, ref string text)
{
    HashSet<String> matchingKeywords = new HashSet<string>(new CaseInsensitiveComparer());

    string allWords = "\\b(-)?(" + words.Aggregate((list, word) => list + "|" + word) + ")(-)?\\b";
    Regex regex = new Regex(allWords, RegexOptions.Compiled | RegexOptions.IgnoreCase);

    foreach (Match match in regex.Matches(text))
    {
        matchingKeywords.Add(match.Value);
    }

    if (bHighlight)
    {
        text = regex.Replace(text, string.Format("<span style=\"background-color:yellow\">{0}</span>", "$0"));
    }

    return matchingKeywords.ToList();
}

1 Answer

Use match.Groups[2].Value instead of match.Value. Your regex has three capturing groups, and the second one holds the keyword you want to highlight:

foreach (Match match in regex.Matches(text))
{
    matchingKeywords.Add(match.Groups[2].Value);
}

if (bHighlight)
{
    text = regex.Replace(text, "$1<span style=\"background-color:yellow\">$2</span>$3");
}

In the foreach loop, match.Groups[2].Value reads the bare keyword. In the regex.Replace replacement string, $2 refers to that same group, so only the keyword ends up inside the span. $1 and $3 re-insert the optional hyphens around the highlighted word (each captured with (-)?), which keeps them outside the span.
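Putting both changes together, the whole method could look roughly like the sketch below. It is not a drop-in replacement, and it makes a few assumptions beyond the original code: the class name KeywordHighlighter is hypothetical, StringComparer.OrdinalIgnoreCase stands in for the question's CaseInsensitiveComparer, and the keywords are passed through Regex.Escape so that any regex metacharacters in them match literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    public static List<string> FindKeywords(IEnumerable<string> words, bool highlight, ref string text)
    {
        // Assumption: a built-in case-insensitive comparer replaces the question's comparer.
        var matchingKeywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Assumption: keywords are escaped so they are matched as literal text.
        string alternatives = string.Join("|", words.Select(Regex.Escape));

        // Group 1: optional leading hyphen, group 2: keyword, group 3: optional trailing hyphen.
        var regex = new Regex(@"\b(-)?(" + alternatives + @")(-)?\b", RegexOptions.IgnoreCase);

        foreach (Match match in regex.Matches(text))
        {
            // Collect only the keyword, never the surrounding hyphens.
            matchingKeywords.Add(match.Groups[2].Value);
        }

        if (highlight)
        {
            // $1 and $3 put the hyphens (if any) back outside the span.
            text = regex.Replace(text, "$1<span style=\"background-color:yellow\">$2</span>$3");
        }

        return matchingKeywords.ToList();
    }
}
```

Called with the keyword locks and the text He-locks-up, this should return a list containing only locks and change the text to He-<span style="background-color:yellow">locks</span>-up.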

2 Comments

Thanks, it works! I was surprised to see that $2 captures the word also when it appears without hyphens.
That is how capturing groups work: every group defined in the pattern is present in the match, whether it captured anything or not.
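A small check can make this visible. The sketch below is illustrative only (the OptionalGroupDemo class and its Describe method are hypothetical names): it runs the same pattern shape against one input and reports each group's value, plus whether the two optional hyphen groups took part in the match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Returns a short summary of the three groups for the first match in the input.
    public static string Describe(string input)
    {
        Match m = Regex.Match(input, @"\b(-)?(locks)(-)?\b");

        // An optional group that did not participate has Success == false and an empty Value.
        return string.Format(
            "1:[{0}] ok={1} 2:[{2}] 3:[{3}] ok={4}",
            m.Groups[1].Value, m.Groups[1].Success,
            m.Groups[2].Value,
            m.Groups[3].Value, m.Groups[3].Success);
    }
}
```

For He locks the door, Describe should report empty values and ok=False for groups 1 and 3 while group 2 still holds locks. That is why $1 and $3 simply expand to nothing in the replacement when there are no hyphens.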
