What regex should I use to remove links from HTML code in C#?

Question

I have an HTML string and want to replace every link with just its text.

E.g. having

Some text <a href="http://google.com/">Google</a>.

need to get

Some text Google.

What regex should I use?

Generally speaking (and probably true in this case), you should not use regex to "parse" HTML and work on it; instead, you should use some tool to manipulate your HTML document via the DOM.
– Pascal MARTIN, Commented Mar 13, 2010 at 12:18
"How do I parse HTML with a regex" is probably in the top 10 of asked questions on SO. The answer is: you don't.
– erikkallen, Commented Mar 13, 2010 at 12:29
It contains the top-voted answer, that's for sure! - stackoverflow.com/questions/1732348/…
– Russ Cam, Commented Mar 13, 2010 at 12:31
The task does look simple at first sight, but there are plenty of potential issues that can come out and bite you. Handling the correct, simple case is quite easy, but experience tells me there will be plenty of incorrect HTML merrily thrown at your code when you're on holiday or on your next project, and you are usually expected to have written code to handle many oddities. Regexes (most likely not a single one but a lot of different ones, together with some procedural code) can do this, but handling the bum cases is hard and loads of people have already worked hard on this elsewhere.
– martinr, Commented Mar 13, 2010 at 12:35
Sometimes there is a need for just the basics, where the input format or HTML formatting quality is known. I needed this to strip off some unwanted content before creating a PDF and it worked fine.
– Andreas, Commented Nov 25, 2013 at 22:14

Fadrian Sudaman · Accepted Answer · 2010-03-13 12:22:10Z

2

Several similar questions have been posted, and the best practice is to use Html Agility Pack, which is built specifically for things like this.

http://www.codeplex.com/htmlagilitypack
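
For illustration, here is a minimal sketch of what the parser-based approach could look like. It assumes the HtmlAgilityPack package is referenced in the project; the class and method names (`LinkRemover`, `ReplaceLinksWithText`) are made up for this example and are not part of the library.

```csharp
using HtmlAgilityPack;

public static class LinkRemover
{
    // Hypothetical helper: replaces every <a> element with a plain text node
    // holding the link's inner text.
    public static string ReplaceLinksWithText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors != null)
        {
            foreach (var anchor in anchors)
            {
                var text = doc.CreateTextNode(anchor.InnerText);
                anchor.ParentNode.ReplaceChild(text, anchor);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the parser works on the element tree, attribute order, upper-case tags and whitespace inside the closing tag should not need any special handling here. Note that `InnerText` drops markup nested inside the link, which matches the "just the text" requirement in the question.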


2 Comments

Fadrian Sudaman Over a year ago

On a second note, if you really need a regex solution, you can use \<a href=.*?\>(?<text>.*?)\</a\> to extract the text and replace using the same pattern, or simply replace \<a href=.*?\> and \</a\> with an empty string.

bobince Over a year ago

+1 this answer. <a href=.*?>... will fail even for simple, valid HTML. Allowing .*? is naïve even by the low, low standards of regex; for example a simple difference like the close-tag being </a > and you've just matched a big stretch of document across multiple links by mistake. Plus, of course, the hundred other constructs that'll trip this over. Do yourself a favour. Use an HTML parser. It's what they're there for.
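
To make that failure mode concrete, here is a small sketch. The pattern is the one from the previous comment; the class name `LazyMatchDemo` and the sample input are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // Applies the naive pattern and substitutes the captured text back in.
    public static string NaiveStrip(string html) =>
        Regex.Replace(html, @"<a href=.*?>(?<text>.*?)</a>", "${text}");
}
```

Calling `LazyMatchDemo.NaiveStrip("x <a href=\"1\">A</a > y <a href=\"2\">B</a> z")` returns `x A</a > y <a href="2">B z`. The first close tag is written `</a >`, so the lazy group keeps expanding until the next literal `</a>`, and the second link ends up inside the "text" of the first one.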

Andrew Theken · Accepted Answer · 2010-03-13 13:55:29Z

1

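using System.Text.RegularExpressions;

var html = "<a ....>some text</a>";
// Lazily match one <a ...> tag and capture its inner text in a named group.
var ripper = new Regex("<a.*?>(?<anchortext>.*?)</a>", RegexOptions.IgnoreCase);
// Match returns only the first link's text; any text around the link is discarded.
html = ripper.Match(html).Groups["anchortext"].Value;
// html == "some text"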


sashaeve · Accepted Answer · 2010-03-13 12:49:32Z

1

I asked for a simple regex (thanks, Fadrian). The code is as follows:

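using System.Text.RegularExpressions;

// Inside a verbatim string (@"..."), a double quote is written as "".
var html = @"Some text <a href=""http://google.com/"">Google</a>.";
Regex r = new Regex(@"\<a href=.*?\>");  // opening tags whose first attribute is href
html = r.Replace(html, "");
r = new Regex(@"\</a\>");                // closing tags
html = r.Replace(html, "");
// html == "Some text Google."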


2 Comments

Fadrian Sudaman Over a year ago

You're welcome. So I take it this is what you wanted? If so, please accept the answer so that others don't spend time posting more answers.

Andrew Theken Over a year ago

This doesn't handle the case where the tag has a different attribute (e.g. title) before href. See my answer below.
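
If a regex really is all you want and the input is known to be well behaved, the points raised in this thread (attributes before href, whitespace in the close tag, upper-case tags) can be folded into one pattern. This is only a sketch under those assumptions, not a substitute for a parser; the names `LinkStripper` and `StripLinks` are made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class LinkStripper
{
    // <a\b      : the tag name "a" only, so <abbr> or <article> are left alone
    // [^>]*>    : any attributes in any order, up to the end of the opening tag
    // (.*?)     : the link text, captured lazily
    // </a\s*>   : the closing tag, tolerating whitespace before '>'
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*>(.*?)</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces every matched link with its captured text in a single pass.
    public static string StripLinks(string html) => Anchor.Replace(html, "$1");
}
```

With the question's input, `LinkStripper.StripLinks("Some text <a href=\"http://google.com/\">Google</a>.")` gives `Some text Google.`. It will still break on things like a `>` inside an attribute value or links inside comments and scripts, which is exactly the kind of case a parser handles for you.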
