Regex in JavaScript to remove links

Question

I have a string in JavaScript that includes an <a> tag with an href. I want to remove the links entirely, including their text. I know how to remove just the tag and keep the inner text, but I want to remove the link completely.

For example:

var s = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";

I would like to use a regex so I'm left with:

s = "check this out. cool, huh?";

The other question is specific to the DOM (e.g., browser, jsdom), whereas this question is about general JavaScript. (mikemaccana, Aug 4, 2015 at 16:48)
@mikemaccana +1. This question is about string manipulation rather than DOM manipulation. Voting to unmark as duplicate. (Maximillian Laumeister, Aug 4, 2015 at 18:05)
To be precise, wouldn't you be left with "check this out . cool, huh?" if you're stripping out the <a>s? (Jeroen, Aug 4, 2015 at 18:32)

ChristopheD · Accepted Answer · 2009-06-06 17:41:47Z

21

This will strip out everything from <a through </a>:

var mystr = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
// Without the g flag, only the first <a>...</a> is removed.
alert(mystr.replace(/<a\b[^>]*>(.*?)<\/a>/i, ""));

It's not really foolproof, but maybe it'll do the trick for your purpose...
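
A sketch building on the regex above, assuming you want every link removed (the g flag), uppercase <A> tags matched too (i), link content that may span several lines ([\s\S]*? instead of .*?), and the stray space Jeroen points out tidied up. The function name removeLinks and the whitespace-before-punctuation cleanup are my own additions, not part of the answer.

```javascript
// Hypothetical helper (not from the answer): removes every <a ...>...</a>
// element, including its text, then tidies the whitespace left behind.
function removeLinks(html) {
  return html
    // g: all links; i: <A HREF=...> too; [\s\S]*? also matches newlines.
    .replace(/<a\b[^>]*>[\s\S]*?<\/a>/gi, "")
    // Assumption: drop whitespace stranded before punctuation,
    // e.g. "out . cool" -> "out. cool".
    .replace(/\s+([.,!?;:])/g, "$1");
}

var s = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
console.log(removeLinks(s)); // "check this out. cool, huh?"
```

The punctuation cleanup is a judgment call: it yields exactly the string asked for in the question, but it would also rewrite text like "a . b" that was never next to a link.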

answered Jun 6, 2009 at 17:41

ChristopheD

1 Comment

Christoph Over a year ago

My suggestion: /<a(\s[^>]*)?>.*?<\/a>/ig
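
To see what the ig flags in that suggestion change, here is a small comparison (the sample string is made up for illustration). Without g, replace() removes only the first match; the optional (\s[^>]*)? group still lets a bare <a> match, while a tag such as <abbr> is left alone because > must follow either "a" or the whitespace-led attribute list.

```javascript
var html = "<a href='1'>one</a> and <A HREF='2'>two</A> in <abbr>HTML</abbr>";

// The suggested pattern, without and with the i and g flags.
var firstOnly = html.replace(/<a(\s[^>]*)?>.*?<\/a>/, "");
var everyLink = html.replace(/<a(\s[^>]*)?>.*?<\/a>/ig, "");

console.log(firstOnly); // " and <A HREF='2'>two</A> in <abbr>HTML</abbr>"
console.log(everyLink); // " and  in <abbr>HTML</abbr>"
```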

Amit · Accepted Answer · 2015-08-04 17:50:39Z

16

Just to clarify: to strip link tags and leave everything between them untouched, it is a two-step process. First remove the opening tag, then remove the closing tag.

txt.replace(/<a\b[^>]*>/i,"").replace(/<\/a>/i, "");
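
The two-step version above only touches the first opening tag and the first closing tag, because neither regex has the g flag. A sketch of an alternative (my own, not from the answer) that unwraps every link in a single pass, keeping the inner text through a capture group and the "$1" replacement pattern; the name unwrapLinks is hypothetical:

```javascript
// Hypothetical helper: replaces each <a ...>inner</a> with just "inner".
function unwrapLinks(html) {
  // ([\s\S]*?) captures the link's content, including newlines;
  // "$1" in the replacement string puts that content back.
  return html.replace(/<a\b[^>]*>([\s\S]*?)<\/a>/gi, "$1");
}

console.log(unwrapLinks("x <a href='1'>one</a> y <A HREF='2'><em>two</em></A>"));
// "x one y <em>two</em>"
```

One difference from the two-step approach: a stray </a> with no matching opening tag is left in place here, whereas the separate closing-tag replace would remove it.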

Working sample:

<script>
  function stripLink(txt) {
    return txt.replace(/<a\b[^>]*>/i, "").replace(/<\/a>/i, "");
  }
</script>

<p id="strip">
 <a href="#">
  <em>Here's the text!</em>
 </a>
</p>

<p>
 <input value="Strip" type="button" onclick="alert(stripLink(document.getElementById('strip').innerHTML))">
</p>

edited Aug 4, 2015 at 17:50

Amit

answered Jul 29, 2011 at 14:08

Paul Worlton

Community · Accepted Answer · 2017-05-23 11:33:26Z

3

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
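
The answer points to parsers without showing one. In a browser, the built-in parser (for example DOMParser, then removing each a element) is the natural choice. As a rough illustration of why tracking structure helps, here is a minimal sketch of a hand-rolled tag tokenizer, explicitly not a real HTML parser: it skips everything from an opening <a> to its matching </a>, including nested markup such as <em>. The name removeAnchorElements is hypothetical, and the sketch assumes no > inside attribute values and no comments or script/style content.

```javascript
// Hypothetical helper: a tiny tag tokenizer, NOT a full HTML parser.
// Assumes no ">" inside attribute values and no comments/<script> content.
function removeAnchorElements(html) {
  var tagRe = /<\/?([a-zA-Z][\w-]*)[^>]*>/g; // any start or end tag
  var out = "";
  var last = 0;  // index just past the previous tag
  var depth = 0; // how many <a> elements we are currently inside
  var m;
  while ((m = tagRe.exec(html)) !== null) {
    // Text between the previous tag and this one survives only outside links.
    if (depth === 0) out += html.slice(last, m.index);
    var isClosing = m[0].charAt(1) === "/";
    if (m[1].toLowerCase() === "a") {
      // Anchor tags are always dropped; they only change the depth.
      depth = isClosing ? Math.max(0, depth - 1) : depth + 1;
    } else if (depth === 0) {
      out += m[0]; // keep other tags that sit outside any link
    }
    last = tagRe.lastIndex;
  }
  if (depth === 0) out += html.slice(last);
  return out;
}

console.log(removeAnchorElements('<p id="strip"><a href="#"><em>Hi</em></a></p>'));
// '<p id="strip"></p>'
```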

edited May 23, 2017 at 11:33

CommunityBot

answered Jun 6, 2009 at 17:33

Chas. Owens

3 Comments

Gumbo Over a year ago

Duplicate google.com/… ;)

Ionuț G. Stan Over a year ago

This begins to sound like a cliche. Sometimes you don't need to really parse the HTML into a data structure of some kind, you just have to somehow manipulate that string. There are cases when RegExp makes sense. Right tool for the right job. And by the way, John Resig has written an HTML parser in JavaScript and he used some RegExp in there. ejohn.org/blog/pure-javascript-html-parser

Chas. Owens Over a year ago

@Ionut G. Stan You always need to parse HTML into a data structure because that is the only way to reliably work with it. Regexes are part of parsing, but these questions always want to use one regex to find or replace something. That is impossible with traditional regexes (as one of the links in the answer shows) and very hard to get right with the ones where it is possible (e.g. Perl's implementation that adds recursion). There are many libraries available that already perform the task of working with HTML for you. You should use them, not a regex that is guaranteed to fail.

georgebrock · Accepted Answer · 2009-06-06 17:41:38Z

1

If you only want to remove <a> elements, the following should work well:

s.replace(/<a [^>]+>[^<]*<\/a>/, '');

This should work for the example you gave, but it won't work for nested tags; for example, it wouldn't work with this HTML:

<a href="http://www.google.com"><em>Google</em></a>
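
A quick check of that limitation, using the strings from the question and the answer: [^<]* cannot step over the <em> tag, so the nested example comes back unchanged, while the plain link is removed. The pattern also requires a space after "a", so a bare <a> tag does not match either.

```javascript
var re = /<a [^>]+>[^<]*<\/a>/;

var simple = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
var nested = '<a href="http://www.google.com"><em>Google</em></a>';

console.log(simple.replace(re, ""));        // "check this out . cool, huh?"
console.log(nested.replace(re, ""));        // unchanged: <em> blocks [^<]*
console.log("<a>bare</a>".replace(re, "")); // unchanged: no space after "a"
```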

answered Jun 6, 2009 at 17:41

georgebrock

Ionuț G. Stan · Accepted Answer · 2009-06-06 17:49:06Z

1

I just commented about John Resig's HTML parser (ejohn.org/blog/pure-javascript-html-parser). Maybe it helps with your problem.

answered Jun 6, 2009 at 17:49

Ionuț G. Stan

mazy · Accepted Answer · 2020-04-28 22:33:55Z

1

The examples above do not remove all occurrences. Here is my solution:

str.replace(/<a\b[^>]*>/gm, '').replace(/<\/a>/gm, '')
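
On the flags: g is what makes this remove every occurrence. The m flag only changes how ^ and $ behave, and neither regex uses them, so it has no effect here. Neither regex has i, so an uppercase <A HREF=...> survives. A comparison with a gi variant (my adjustment, not the answer's; the sample string is made up):

```javascript
var str = "one <a href='1'>two</a> three <A HREF='2'>four</A>";

// As posted: g removes every lowercase tag, but <A> and </A> are left in place.
var asPosted = str.replace(/<a\b[^>]*>/gm, "").replace(/<\/a>/gm, "");

// Adjusted: i also matches uppercase tags; m is dropped since there is no ^ or $.
var caseInsensitive = str.replace(/<a\b[^>]*>/gi, "").replace(/<\/a>/gi, "");

console.log(asPosted);        // "one two three <A HREF='2'>four</A>"
console.log(caseInsensitive); // "one two three four"
```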

answered Apr 28, 2020 at 22:33

mazy