Scenario
I am trying to develop a Javascript text highlight function. It receives in input a text to search inside, an array of tokens to be searched, a class to wrap the found matches:
var fmk = fmk || {};
fmk.highlight = function (target, tokens, cls) {
var token, re;
if (tokens.length > 0) {
token = tokens.pop();
re = new RegExp(token, "gi");
return this.highlight(
target.replace(re, function (matched) {
return "<span class=\"" + cls + "\">" + matched + "</span>";
}), tokens, cls);
}
else { return target; }
};
It is based on a recursive replace that wraps a <span> tag around the found matches.
Issues
if there are two tokens, and the latter is a substring of the former then only the latter token will be highligthed. In the jsFiddle example try these tokens: 'ab b'.
if the tokens contains a substring of the wrapper sequence (i.e.
<span class="[className]"></span>) and another matching token, then the highlight fails and returns a dirty result. In the jsFiddle example try these tokens: 'red ab'.
Note that single character tokens are admitted in the actual application.
Questions
How to avoid these errors? I figured out these approaches:
To pre-process the tokens, removing the tokens that are substrings of others. Disadvantages: it requires O(n^2) searches in the pre-processing phase in case of n tokens; good matches are cut off.
To pre-process the matches BEFORE applying the wrapper, in order to cut off only the substrings matches. Disadvantages: again, further computation required. Anyway, I don't know where to start from implement this inside the replace callback function.