I wrote a method that highlights keywords in an HTML string. It returns the updated string and a list of the matched keywords. I would like to match a keyword whether it appears as a whole word on its own or surrounded by dashes. The problem is that when it appears with dashes, the dashes are included in both the highlighted text and the returned keyword.
For example, if the keyword is locks and the HTML contains "He -locks- the door", the dashes around the word are highlighted as well:
He <span style="background-color:yellow">-locks-</span> the door.
Instead of:
He -<span style="background-color:yellow">locks</span>- the door.
In addition, the returned list contains -locks- instead of locks.
What can I do to get my expected result?
Here is my code:
    private static List<string> FindKeywords(IEnumerable<string> words, bool bHighlight, ref string text)
    {
        // Case-insensitive set, so each keyword is reported only once.
        HashSet<string> matchingKeywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Alternation of all keywords, each optionally surrounded by dashes.
        string allWords = "\\b(-)?(" + words.Aggregate((list, word) => list + "|" + word) + ")(-)?\\b";
        Regex regex = new Regex(allWords, RegexOptions.Compiled | RegexOptions.IgnoreCase);

        foreach (Match match in regex.Matches(text))
        {
            // match.Value is the entire match, including any dashes.
            matchingKeywords.Add(match.Value);
        }

        if (bHighlight)
        {
            // $0 is the entire match, so the dashes end up inside the span.
            text = regex.Replace(text, "<span style=\"background-color:yellow\">$0</span>");
        }

        return matchingKeywords.ToList();
    }
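
One possible direction, sketched under assumptions rather than offered as a confirmed fix: both `match.Value` and `$0` refer to the entire match, so the dashes come along with the keyword. Capturing the keyword in its own group and referring only to that group keeps the dashes out. The class name `KeywordSketch` and the group names `pre`, `word` and `post` are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class KeywordSketch
{
    public static List<string> FindKeywords(IEnumerable<string> words, bool highlight, ref string text)
    {
        var matchingKeywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Named groups keep the optional dashes separate from the keyword itself.
        // Assumption: the keywords contain no regex metacharacters.
        string pattern = @"\b(?<pre>-)?(?<word>" + string.Join("|", words) + @")(?<post>-)?\b";
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);

        foreach (Match match in regex.Matches(text))
        {
            // Record only the keyword group, not the whole match.
            matchingKeywords.Add(match.Groups["word"].Value);
        }

        if (highlight)
        {
            // Re-emit the dashes outside the span so only the keyword is wrapped.
            // A group that did not participate in the match is replaced by an empty string.
            text = regex.Replace(text,
                "${pre}<span style=\"background-color:yellow\">${word}</span>${post}");
        }

        return matchingKeywords.ToList();
    }
}
```

Since `-` is not a word character, `\b` already matches between a dash and a letter, so a plain `\b(word1|word2)\b` pattern without the dash groups may be enough for this case. The grouped version above is only meant to stay close to the original pattern.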