I am writing a jQuery plugin that does a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the HTML quite yet.

At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.

This is as close as I have gotten:

(?<=^|>)([^><].*?)(?=<|$)

It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.

Input: Any html element (this could be quite large, eg <body>)    
Search Term: 1 or more characters    
Replace Txt: <span class='highlight'>$1</span>
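
On the narrower question of getting a search term into a pattern: a minimal sketch, assuming the term should be matched literally. The helper names escapeRegExp and buildSearchRegex are hypothetical (not part of jQuery). This only covers building the pattern; it does nothing to keep matches out of tags.

```javascript
// Hypothetical helper: escape regex metacharacters so that user input
// such as "1.5 (approx)" is matched literally.
function escapeRegExp(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: global, case-insensitive pattern with the term in group 1.
function buildSearchRegex(term) {
    return new RegExp("(" + escapeRegExp(term) + ")", "gi");
}

var regx = buildSearchRegex("1.5 (approx)");
var result = "about 1.5 (approx) or 115 (approx)"
    .replace(regx, "<span class='highlight'>$1</span>");
// result: "about <span class='highlight'>1.5 (approx)</span> or 115 (approx)"
```

Without the escaping step, the "." would match any character and the parentheses would be read as a group, so "115 (approx)" could be highlighted by accident or the pattern could fail to compile.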

UPDATE

The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...

Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>

However, I am having some trouble using it in my JavaScript. With the following code, Chrome gives me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".

var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
   var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
   $(this).html(text);
});

It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?

UPDATE

The regex breaks on that group because JavaScript does not support regex lookbehinds (true when this was written; lookbehind assertions were added later, in ECMAScript 2018). For reference and possible workarounds: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
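
One lookbehind-free way to keep matches out of tags is to split the markup into tags and text first, then replace only in the text pieces. This is a rough sketch, not a robust solution: it assumes simple markup (no ">" inside attribute values, no comments or script blocks), and the function names are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap each match of `term` in a span, skipping tags.
function highlightOutsideTags(html, term, className) {
    if (!term) {
        return html; // an empty term would match at every position
    }
    var termRegex = new RegExp(escapeRegExp(term), "gi");
    // Splitting on a capturing group keeps the tags in the result:
    // even indexes hold text, odd indexes hold the tags themselves.
    var parts = html.split(/(<[^>]*>)/);
    for (var i = 0; i < parts.length; i += 2) {
        parts[i] = parts[i].replace(termRegex, function (match) {
            return '<span class="' + className + '">' + match + "</span>";
        });
    }
    return parts.join("");
}

var out = highlightOutsideTags('<p class="and">salt and pepper</p>', "and", "highlight");
// out: '<p class="and">salt <span class="highlight">and</span> pepper</p>'
```

The "and" inside the class attribute is left alone because it sits in an odd-indexed (tag) piece. A term that spans a tag boundary, such as "salt and" in "salt <b>and</b>", will not be found with this approach.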

5 Comments

  • Sigh. Please refrain from parsing HTML with RegEx as it will drive you insane. Use an HTML parser instead. Commented May 2, 2012 at 15:19
  • I've got a plan to move to html parsing, but I need a quick proof-of-concept before I'll get the green light on that. Commented May 2, 2012 at 15:37
  • You should have that as your proof of concept, not RegExp. This is a solved problem, please don't overkill yourself with RegExp. Commented May 2, 2012 at 15:38
  • @Truth: Thank you for your concern. Please desist. I agree with your statements and embrace your apparent agenda. My question is, how can I insert a search term into this regex string? Commented May 2, 2012 at 16:08
  • Have a look at mark.js as it might be the thing you're searching for. Commented May 21, 2016 at 14:20

1 Answer

Just use jQuery's built-in text() method. It returns the combined text of the selected DOM element and all of its descendants, without any markup.

For the DOM approach (docs for the Node interface): walk all child nodes of an element. If a child is an element node, recurse into it. If it is a text node, search its text (node.data). To highlight or change a match, shorten the node's text so it ends at the match position, then insert a highlight span containing the matched text, followed by another text node holding the rest of the text.

Example code (adjusted, origin is here):

(function iterate_node(node) {
    if (node.nodeType === 3) { // Node.TEXT_NODE
        var text = node.data,
            match = /any regular expression/.exec(text); // indexOf also works for a plain string
        if (match && match[0].length > 0) { // skip empty matches, which would recurse forever
            var pos = match.index,
                length = match[0].length; // length of what was actually found
            node.data = text.substring(0, pos); // split into a part before...
            var rest = document.createTextNode(text.substring(pos + length)); // a part after
            var highlight = document.createElement("span"); // and a part between
            highlight.className = "highlight";
            highlight.appendChild(document.createTextNode(match[0]));
            node.parentNode.insertBefore(rest, node.nextSibling); // insert after
            node.parentNode.insertBefore(highlight, node.nextSibling);
            iterate_node(rest); // maybe there are more matches
        }
    } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
        // copy the live childNodes list first, so the spans inserted above are not visited again
        var children = Array.prototype.slice.call(node.childNodes);
        for (var i = 0; i < children.length; i++) {
            iterate_node(children[i]); // run recursively on the DOM
        }
    }
})(content); // any DOM node

There's also highlight.js, which might be exactly what you want.

5 Comments

I see how .text() can be used to obtain and replace an element's text, but I don't see how it is possible to use this to search/replace a subset of that element's text. Example: I only want to highlight the word 'and' in a long <p> element. Ideas?
Then you might need to use native DOM methods and alter text nodes.
Cool. I'm good for now, but when I get the OK on this project I think I'll try this approach first. Using the jQuery :contains selector (api.jquery.com/contains-selector/) I should be able to find my search terms in the DOM. Once I have the elements, it should be fairly simple to manipulate the .text() as necessary. Thanks Bergi.
Whoops, spoke too soon. $("#target *:contains('text')") does a good job of finding elements, but it returns the containing element. That element contains a mix of content, my search term, and other HTML. Using .text() strips out tags (unacceptable) and .html() leaves me with the original problem of searching mixed content and markup for the search term. :contains() narrows the playing field, but the search/replace problem remains. @Bergi, did you have a particular native DOM approach in mind?
Yes, I already have coded various text-node-iterators :) Too long for a comment, extended my answer.