
I've converted the HTML to a string, I can use replace on that string to wrap the text in a link, and I can put that HTML back into the element it came from.

My problem is that my replace method is going inside existing links on the page. This could create nested links, which is a problem. Does anyone out there know how to prevent the replace method from matching text that is within a link already?

I have right now:

keyword = "matching phrase";
keywordLink = "<a href='http://myurl.com'>" + keyword + "</a>";
sasser = sasser.replace(keyword, keywordLink);
sasDom.innerHTML = sasser;

I'm looking for, in pseudo code:

... (keyword [if the next " < " sign is not followed by "/a>", regardless of how far away it is], keywordLink);

3 Answers

2

You can't reliably do this kind of thing with regex on an HTML string. Work on the document objects instead, which the browser has already parsed into a structure for you.

Here's a keyword linker adapted from this question.

// Find text in descendants of an element, in reverse document order
// pattern must be a regexp with global flag
//
function findTextExceptInLinks(element, pattern, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            if (child.tagName.toLowerCase()!=='a')
                findTextExceptInLinks(child, pattern, callback);
        } else if (child.nodeType===3) {
            var matches= [];
            var match;
            while (match= pattern.exec(child.data))
                matches.push(match);
            for (var i= matches.length; i-->0;)
                callback.call(window, child, matches[i]);
        }
    }
}

findTextExceptInLinks(document.body, /\bmatching phrase\b/g, function(node, match) {
    node.splitText(match.index+match[0].length);
    var a= document.createElement('a');
    a.href= 'http://www.example.com/myurl';
    a.appendChild(node.splitText(match.index));
    node.parentNode.insertBefore(a, node.nextSibling);
});
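
To see why the matches are handled last-to-first, here is a minimal sketch that models a single text node's data as a plain string, so it runs without a DOM. The helper name splitAroundMatches and the { text, linked } piece shape are hypothetical, made up for illustration. Cutting at the last match first means the indices of earlier matches still refer to the untouched front part of the string, which is the same reason the callback above can call splitText safely.

```javascript
// Hypothetical helper, not part of the answer's code: splits a string into
// pieces around each regex match, working from the last match to the first.
// pattern must be a regexp with the global flag, as in the answer.
function splitAroundMatches(text, pattern) {
    var matches = [];
    var match;
    pattern.lastIndex = 0;
    while ((match = pattern.exec(text)) !== null) {
        if (match[0] === '') {
            // Empty matches never advance lastIndex; skip them to avoid looping forever
            pattern.lastIndex++;
            continue;
        }
        matches.push(match);
    }
    var pieces = [];
    var rest = text; // the part of the string that has not been cut yet
    for (var i = matches.length; i-- > 0;) {
        var start = matches[i].index;
        var end = start + matches[i][0].length;
        pieces.unshift({ text: rest.slice(end), linked: false });       // tail after the match
        pieces.unshift({ text: rest.slice(start, end), linked: true }); // the match itself
        rest = rest.slice(0, start); // earlier indices are still valid here
    }
    pieces.unshift({ text: rest, linked: false });
    return pieces;
}

var pieces = splitAroundMatches('a matching phrase, matching phrase!', /\bmatching phrase\b/g);
```

In the DOM version, each piece marked linked would be the text node that ends up inside the new a element, and the other pieces stay as plain text nodes.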

ETA, in response to the comments: here's a version of the same thing using plain text matching rather than regex:

function findPlainTextExceptInLinks(element, substring, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            if (child.tagName.toLowerCase()!=='a')
                findPlainTextExceptInLinks(child, substring, callback);
        } else if (child.nodeType===3) {
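            var index= child.data.length;
            while (true) {
                index= child.data.lastIndexOf(substring, index);
                if (index===-1)
                    break;
                callback.call(window, child, index);
                // Step back past this match so the loop also ends when
                // the callback leaves the text node unchanged
                index-= substring.length;
                if (index<0)
                    break;
            }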
        }
    }
}

var substring= 'matching phrase';
findPlainTextExceptInLinks(document.body, substring, function(node, index) {
    node.splitText(index+substring.length);
    var a= document.createElement('a');
    a.href= 'http://www.example.com/myurl';
    a.appendChild(node.splitText(index));
    node.parentNode.insertBefore(a, node.nextSibling);
});

7 Comments

Bobince - thank you so much! This solution is fantastic, and I greatly appreciate your taking the time to post it. I'm having one relatively simple problem now... I can't seem to use a variable as the matching phrase. Here's what I'm trying: findTextExceptInLinks(document.body, "/\b" + variable + "\b/g", function(node, match) {
@google: try passing new RegExp('\\b' + variable + '\\b', 'g')
Yep, the RegExp constructor as posted by Crescent will work, but watch out: if your variable contains characters that are special to regex like . or * (most punctuation really), this won't match the literal versions of the string. If you want to match literal strings rather than words-at-boundaries, it'd be better to dump the regular expression matching and replace with string.indexOf.
To be clear, you're suggesting I change: ... while (match= pattern.exec(child.data)) ... to ... while (match= string.indexOf(child.data))
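
The escaping concern raised in the comments above can be sketched as follows. This is one possible way to build the pattern from a variable, not something from the answer itself: escapeRegExp and keywordPattern are hypothetical helper names, and the character class is the commonly used list of regex metacharacters. Note that \b only behaves as a word boundary when the keyword starts and ends with a word character, so a keyword like "C++" would still need the plain-text version.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex,
// so the keyword is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the global, word-bounded pattern that
// findTextExceptInLinks expects, from an arbitrary keyword string.
function keywordPattern(keyword) {
    return new RegExp('\\b' + escapeRegExp(keyword) + '\\b', 'g');
}

var variable = 'example.com';
var pattern = keywordPattern(variable);
// pattern can now be passed as the second argument:
// findTextExceptInLinks(document.body, pattern, function(node, match) { ... });
```

Without the escaping step, the dot in 'example.com' would match any character, so text such as 'exampleXcom' would be linked as well.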
1

If you don't mind using jQuery, you can use its wrap() function to wrap text or HTML elements in the specified tags.

2 Comments

Does that work for wrapping only part of the text inside some tags, too?
Thinking about it, it will probably not keep you from having nested link tags, or will it?
0

I would do it in three steps:

1) replace (<a [^>]+>)matching phrase</a> with $1some_other_phrase</a>

2) replace matching phrase with <a...>keyword</a>

3) replace some_other_phrase with matching phrase
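
The three steps above can be sketched on an HTML string roughly like this, assuming step 1 captures the opening tag in a group so it can be put back. The function name linkKeywordThreeStep and the NUL-character placeholder are assumptions for illustration. As written, the approach only protects links whose entire text is the phrase, and step 2 will also touch the phrase inside attribute values or longer link text, so treat it as a sketch of the idea rather than a robust solution.

```javascript
// Hypothetical helper illustrating the three-step placeholder approach.
function linkKeywordThreeStep(html, keyword, url) {
    // Assumed never to occur in the HTML or in the keyword
    var placeholder = '\u0000';
    // Escape regex metacharacters so the keyword is matched literally
    var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    // Step 1: hide the keyword wherever it is the whole text of an existing link
    var insideLink = new RegExp('(<a [^>]+>)' + escaped + '</a>', 'g');
    var result = html.replace(insideLink, function (whole, openTag) {
        return openTag + placeholder + '</a>';
    });

    // Step 2: wrap every remaining occurrence in a new link
    result = result.split(keyword).join('<a href="' + url + '">' + keyword + '</a>');

    // Step 3: put the hidden occurrences back
    return result.split(placeholder).join(keyword);
}

var linked = linkKeywordThreeStep(
    '<p>a matching phrase and <a href="x">matching phrase</a></p>',
    'matching phrase',
    'http://www.example.com/myurl'
);
```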
