16

I have a string in JavaScript that includes an <a> tag with an href. I want to remove all links, including their text. I know how to remove just the link tags and leave the inner text, but I want to remove each link completely.

For example:

var s = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";

I would like to use a regex so I'm left with:

s = "check this out. cool, huh?";
3
  • The other question is specific to the DOM (eg, browser, jsdom), whereas this question is general JavaScript. Commented Aug 4, 2015 at 16:48
  • @mikemaccana +1. This question is about string manipulation rather than DOM manipulation. Voting to unmark duplicate. Commented Aug 4, 2015 at 18:05
  • To be precise, wouldn't you be left with "check this out . cool, huh?" if you're stripping out the <a> elements? Commented Aug 4, 2015 at 18:32

6 Answers

21

This will strip out everything between <a and /a>:

mystr = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
alert(mystr.replace(/<a\b[^>]*>(.*?)<\/a>/i,""));

It's not foolproof (without the g flag it removes only the first link, and . does not match newlines inside the link text), but it may do the trick for your purpose.
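As a minimal sketch of removing every link rather than only the first, the pattern can be given the g and i flags, with [\s\S]*? in place of .*? so that link text spanning several lines is also matched. The helper name removeLinks is hypothetical, and this remains a regex approach that assumes simple, well-formed markup:

```javascript
// Hypothetical helper: removes every <a>...</a> element, including its text.
function removeLinks(html) {
  // g: replace all occurrences; i: match <A> as well as <a>.
  // [\s\S]*? lazily matches any characters, newlines included,
  // up to the nearest closing </a>.
  return html.replace(/<a\b[^>]*>[\s\S]*?<\/a>/gi, "");
}

var s = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
console.log(removeLinks(s)); // "check this out . cool, huh?"
```

Note that the space that preceded the link is left in place, so the result contains "out ." rather than "out.".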

1 Comment

my suggestion: /<a(\s[^>]*)?>.*?<\/a>/ig
16

Just to clarify: to strip the link tags and leave everything between them untouched, use a two-step process. Remove the opening tag, then remove the closing tag.

txt.replace(/<a\b[^>]*>/i,"").replace(/<\/a>/i, "");

Working sample:

<script>
 function stripLink(txt) {
    return txt.replace(/<a\b[^>]*>/i,"").replace(/<\/a>/i, "");
 }
</script>

<p id="strip">
 <a href="#">
  <em>Here's the text!</em>
 </a>
</p>

<p>
 <input value="Strip" type="button" onclick="alert(stripLink(document.getElementById('strip').innerHTML))">
</p>

3

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

3 Comments

Duplicate google.com/… ;)
This begins to sound like a cliche. Sometimes you don't need to really parse the HTML into a data structure of some kind, you just have to somehow manipulate that string. There are cases when RegExp makes sense. Right tool for the right job. And by the way, John Resig has written an HTML parser in JavaScript and he used some RegExp in there. ejohn.org/blog/pure-javascript-html-parser
@Ionut G. Stan You always need to parse HTML into a data structure because that is the only way to work with it reliably. Regexes are part of parsing, but these questions always want to use one regex to find or replace something. That is impossible with traditional regexes (as one of the links in the answer shows) and very hard to get right with the ones where it is possible (e.g. Perl's implementation, which adds recursion). There are many libraries available that already perform the task of working with HTML for you. You should use them, not a regex that is guaranteed to fail.
1

If you only want to remove <a> elements, the following should work well:

s.replace(/<a [^>]+>[^<]*<\/a>/, '');

This works for the example you gave, but it won't work when the link contains nested tags. For example, it wouldn't match this HTML:

<a href="http://www.google.com"><em>Google</em></a>
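To illustrate the limitation, the sketch below runs the pattern above against a link with a nested tag and compares it with a lazy [\s\S]*? variant. The sample string and variable names are made up for illustration, and the lazy variant is only one possible workaround, not a general HTML parser:

```javascript
var nested = 'Visit <a href="http://www.google.com"><em>Google</em></a> now';

// [^<]* stops at the first "<", so the nested <em> prevents any match
// and the string comes back unchanged.
var strict = nested.replace(/<a [^>]+>[^<]*<\/a>/, '');

// [\s\S]*? accepts any characters, including other tags,
// up to the nearest closing </a>.
var lazy = nested.replace(/<a\b[^>]*>[\s\S]*?<\/a>/i, '');

console.log(strict === nested); // true
console.log(lazy);              // "Visit  now"
```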

1

I just commented about John Resig's HTML parser. Maybe it helps with your problem.

1

Examples above do not remove all occurrences. Here is my solution:

str.replace(/<a\b[^>]*>/gm, '').replace(/<\/a>/gm, '')
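A small sketch of this approach wrapped in a function, assuming the goal is to drop the tags everywhere while keeping the link text. The name unwrapLinks is hypothetical. The m flag is omitted because it only affects the ^ and $ anchors, which these patterns do not use, and i is added so uppercase tags are handled too:

```javascript
// Hypothetical helper: strips every <a ...> and </a> tag but keeps the inner text.
function unwrapLinks(html) {
  return html.replace(/<a\b[^>]*>/gi, '').replace(/<\/a>/gi, '');
}

var two = '<a href="#one">One</a> and <A HREF="#two">Two</A>';
console.log(unwrapLinks(two)); // "One and Two"
```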
