Looking for a bit of help, my regex is a bit rusty...

In JavaScript, I'm trying to replace every character that is not inside an HTML tag with a single replacement character.

For example, replacing those characters with a dash "-",

<div class="test">Lorem Ipsum <br/> Dolor Sit Amet</div>

Would be replaced by:

<div class="test">------------<br/>---------------</div>

So I'm looking for

str.replace(/YourMagicalRegEx/g, '-');

Please help. I know how to match the text outside HTML tags with a regex, and the text inside HTML tags, but matching every individual character outside the tags seems quite tricky!

Additional Challenge: Must be IE7 and up compatible.

  • Pretty sure this is not possible (at least not that I am aware of) since it would require a lookbehind, which is not supported in JavaScript to my knowledge. Workaround is to introduce jQuery 1.*, loop through all of the DOM elements you want, and then just apply simple [A-Z] and \s replacements. Commented Jun 1, 2014 at 9:15
  • I didn't read thoroughly the question :/ my answer was off. Commented Jun 1, 2014 at 9:46
  • There are plenty of good reasons not to try to solve this problem with regular expressions. Commented Jun 1, 2014 at 11:25

2 Answers

2

Using jQuery:

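var html = '<div class="test">Lorem Ipsum <br/> Dolor Sit Amet</div>';
// Wrap the markup so that find('*') reaches every element in it.
var node = $("<div>" + html + "</div>");
node.find('*').contents().each(function() {
    // nodeType 3 is a text node.
    if (this.nodeType == 3) {
        // Array(n + 1).join('-') yields exactly n dashes; Array(n) would be one short.
        this.nodeValue = Array(this.nodeValue.length + 1).join('-');
    }
});
console.log(node.html());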

(I don't have IE7 at hand, let me know if this works).
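
The `Array(n).join('-')` idiom is easy to get wrong by one: joining an array of n empty slots produces n - 1 separators, so n dashes need n + 1 slots. A minimal sketch, where `repeatChar` is a hypothetical helper name that does not come from the answer:

```javascript
// Hypothetical helper: build a string of `count` copies of `ch`
// using only features available in very old browsers.
function repeatChar(ch, count) {
    // An array with count + 1 empty slots has count gaps between them,
    // and join() puts one separator into each gap.
    return new Array(count + 1).join(ch);
}

console.log(repeatChar('-', 5));            // "-----"
console.log(new Array(5).join('-').length); // 4, one short
```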

If you prefer regular expressions, it goes like this:

html = html.replace(/<[^<>]+>|./g, function($0) {
    // charAt(0) instead of $0[0]: IE7 does not support bracket indexing on strings.
    return $0.charAt(0) == '<' ? $0 : '-';
});

Basically, we replace tags with themselves and out-of-tags characters with dashes.
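
To see the alternation at work, the replacement can be wrapped in a small function and run on the markup from the question. This is only a sketch: `maskText` is a made-up name, `charAt(0)` is used on the assumption that very old IE must be supported, and like any regex over HTML it will go wrong when an attribute value contains `<` or `>`. Note also that `.` does not match line breaks, so newlines between tags are left untouched.

```javascript
// Hypothetical wrapper around the regex-based replacement.
function maskText(html, ch) {
    // First alternative: a whole tag, returned unchanged.
    // Second alternative: any single character outside a tag
    // (except line breaks, which `.` does not match).
    return html.replace(/<[^<>]+>|./g, function (match) {
        return match.charAt(0) === '<' ? match : ch;
    });
}

var input = '<div class="test">Lorem Ipsum <br/> Dolor Sit Amet</div>';
// Each run of text becomes a run of dashes of the same length
// (12 and 15 characters here); the three tags are kept as they are.
console.log(maskText(input, '-'));
```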

1

Instead of using a regex-only approach, you can find all text nodes within the document and replace their content with hyphens.

Using the TreeWalker API:

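// All four arguments are passed because Internet Explorer reportedly rejects the two-argument form.
var tree = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, null, false);

while (tree.nextNode()) {
    var textNode = tree.currentNode;
    textNode.nodeValue = textNode.nodeValue.replace(/./g, '-');
}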

A recursive solution:

function findTextNodes(node, fn) {
  for (node = node.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === Node.TEXT_NODE) fn(node);
    else if (node.nodeType === Node.ELEMENT_NODE && node.nodeName !== 'SCRIPT') findTextNodes(node, fn);
  }
}


findTextNodes(document.body, function (node) {
  node.nodeValue = node.nodeValue.replace(/./g, '-');
});

The predicate node.nodeName !== 'SCRIPT' is required to prevent the function from replacing any script content within the body.
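
One likely reason the recursive version fails in old IE is that the `Node` interface object, and with it `Node.TEXT_NODE` and `Node.ELEMENT_NODE`, is not exposed as a global there. A sketch of the same traversal using the numeric `nodeType` values (1 for elements, 3 for text nodes) follows. `maskTextNodes` is a hypothetical name, and the code has not been run on IE7 itself.

```javascript
// Numeric nodeType values, used instead of Node.ELEMENT_NODE / Node.TEXT_NODE.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical helper: replace every character of every text node under
// `root` with `ch`, skipping the contents of <script> elements.
function maskTextNodes(root, ch) {
    for (var child = root.firstChild; child; child = child.nextSibling) {
        if (child.nodeType === TEXT_NODE) {
            child.nodeValue = child.nodeValue.replace(/./g, ch);
        } else if (child.nodeType === ELEMENT_NODE &&
                   child.nodeName.toUpperCase() !== 'SCRIPT') {
            // Recurse into ordinary elements only.
            maskTextNodes(child, ch);
        }
    }
}
```

In a page this would be called as `maskTextNodes(document.body, '-')`.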

2 Comments

Hm, I appreciate the alternative, but the TreeWalker API seems to be IE9 and up only. I should've mentioned this must be compatible with IE7 and up... (I know, ugh, IE)
Recursive solution still not IE7 compliant :(
