Regular Expression: Getting url value from hyperlink

Question

I have a string that contains HTML. I want to get all the href values from the hyperlinks in it using C#.
Target string:
<a href="~/abc/cde" rel="new">Link1</a> <a href="~/abc/ghq">Link2</a>
I want to get the values "~/abc/cde" and "~/abc/ghq".

@balpha: What? That absolutely does not apply here. You can use regex to get the href of an open tag and not even bother with closing tags.
– Platinum Azure, Commented Apr 12, 2010 at 17:01
@balpha: Well, I'm glad you have a sense of humor, but given how it has also appeared in EVERY answer below, you can understand why I might think people just have this knee-jerk "omg never use regex to parse HTML" response, emoticon or no.
– Platinum Azure, Commented Apr 13, 2010 at 18:36
@Platinum Azure: No harm -- I just love to mention that answer, because if you've read it once, it will stick in your head and haunt you whenever you start markup parsing with regexes. That doesn't mean it's always wrong, but having that answer in your head makes you at least think about it. I sometimes analyze HTML without a real parser, too, but I usually put a comment # the center cannot hold before it :)
– balpha ♦, Commented Apr 14, 2010 at 15:25

womp · Accepted Answer · 2010-04-12 16:56:06Z

4

Use the HTML Agility Pack for parsing HTML. Right on their examples page they have an example of parsing some HTML for the href values:

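 // doc is an HtmlDocument that has already been populated, e.g. via doc.LoadHtml(html).
 // SelectNodes returns null when nothing matches the XPath, so guard for that in real code.
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
    HtmlAttribute att = link.Attributes["href"];

    // Do stuff with the attribute value (att.Value)
 }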

answered Apr 12, 2010 at 16:56


Lucero · Accepted Answer · 2010-04-12 17:06:44Z

2

Using a regex to parse HTML is not advisable (think of markup that appears inside comments, for example, which a regex will happily match as if it were a real tag).

That said, the following regex should do the trick, and also gives you the link HTML in the tag if desired:

Regex regex = new Regex(@"\<a\s[^\<\>]*?href=(?<quote>['""])(?<href>((?!\k<quote>).)*)\k<quote>[^\>]*\>(?<linkHtml>((?!\</a\s*\>).)*)\</a\s*\>", RegexOptions.IgnoreCase|RegexOptions.ExplicitCapture);
for (Match match = regex.Match(inputHtml); match.Success; match=match.NextMatch()) {
  Console.WriteLine(match.Groups["href"]);
}
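To see how the named groups fit together, here is a rough, self-contained sketch that runs the pattern above against the target string from the question. The class name HrefExtractor and the method ExtractHrefs are made up for illustration. The idea is that (?<quote>['"]) remembers which quote character opened the attribute, \k<quote> is a backreference that insists on the same character to close it, and (?<href>...) collects everything in between. With RegexOptions.ExplicitCapture only the named groups are captured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Same pattern as in the answer above. Named groups:
    //   quote    - the opening quote character (' or ")
    //   href     - everything up to the matching closing quote
    //   linkHtml - everything between the opening tag and </a>
    private static readonly Regex AnchorRegex = new Regex(
        @"\<a\s[^\<\>]*?href=(?<quote>['""])(?<href>((?!\k<quote>).)*)\k<quote>[^\>]*\>(?<linkHtml>((?!\</a\s*\>).)*)\</a\s*\>",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the href value of every <a> the pattern matches.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match match in AnchorRegex.Matches(html))
        {
            // Groups are looked up by the name given in (?<name>...).
            hrefs.Add(match.Groups["href"].Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        string html = "<a href=\"~/abc/cde\" rel=\"new\">Link1</a> <a href=\"~/abc/ghq\">Link2</a>";
        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
        // prints:
        // ~/abc/cde
        // ~/abc/ghq
    }
}
```

match.Groups["linkHtml"].Value gives the inner HTML of each link (Link1 and Link2 here) in the same way.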

edited Apr 12, 2010 at 17:06

answered Apr 12, 2010 at 17:00


3 Comments

coure2011 Over a year ago

That's exactly what I was looking for. How do the groups work?

coure2011 Over a year ago

I am trying the same thing for img src, but it's not working. Any idea? Regex srcs = new Regex(@"\<img\s[^\<\>]*?src=(?<quote>['""])(?<src>((?!\k<quote>).)*)\k<quote>[^\>]*\>(?<linkHtml>((?!\</img\s*\>).)*)\</img\s*\>", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

Lucero Over a year ago

The img tag is an empty tag, so you have no contents. Try this: \<img\s[^\<\>]*?src=(?<quote>['""])(?<src>((?!\k<quote>).)*)\k<quote>[^\>]*\>
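A minimal sketch of the img variant, assuming the shortened pattern from the comment above. The class name ImgSrcExtractor, the method ExtractSources and the sample markup are hypothetical and only there to make the example runnable.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcExtractor
{
    // Pattern from the comment above: it stops at the end of the <img ...> tag,
    // because img has no content and no closing </img> to look for.
    private static readonly Regex ImgRegex = new Regex(
        @"\<img\s[^\<\>]*?src=(?<quote>['""])(?<src>((?!\k<quote>).)*)\k<quote>[^\>]*\>",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the src value of every <img> the pattern matches.
    public static List<string> ExtractSources(string html)
    {
        var sources = new List<string>();
        foreach (Match match in ImgRegex.Matches(html))
        {
            sources.Add(match.Groups["src"].Value);
        }
        return sources;
    }

    public static void Main()
    {
        // Made-up sample input: one double-quoted and one single-quoted src.
        string html = "<img src=\"a.png\" alt=\"x\"> <img alt='y' src='b.jpg' />";
        foreach (string src in ExtractSources(html))
        {
            Console.WriteLine(src);
        }
        // prints:
        // a.png
        // b.jpg
    }
}
```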

casperOne · Accepted Answer · 2012-01-16 19:44:37Z

Here is a snippet of the regex (use the RegexOptions.IgnorePatternWhitespace option):

(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
# -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!

This matches every opening tag that carries at least one attribute; you can then filter by tag name and pick out the attribute you want.

I've written more about this in my blog (C# Regex Linq: Extract an Html Node with Attributes of Varying Types).
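As a sketch of the filtering step, assuming the pattern above with only its layout tidied: each repetition of the attribute group adds one capture to Key and one to Value, so the two capture collections can be walked in parallel. The names TagAttributeScanner and GetAttributeValues are invented for this example and are not taken from the blog post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagAttributeScanner
{
    // The pattern from the answer. Whitespace and # comments inside it are
    // ignored because of RegexOptions.IgnorePatternWhitespace.
    private const string Pattern = @"
(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
((?:\s+)                    # One to many spaces start the attribute
 (?<Key>[^=]+)              # Name/key of the attribute
 (?:=)                      # Equals sign needs to be matched, but not captured.
 (?([\x22\x27])             # If quotes are found
   (?:[\x22\x27])
   (?<Value>[^\x22\x27]+)   # Place the value into named Capture
   (?:[\x22\x27])
  |                         # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                          # One to many attributes found
";

    private static readonly Regex TagRegex =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: values of attributeName on every tagName element matched.
    public static List<string> GetAttributeValues(string html, string tagName, string attributeName)
    {
        var values = new List<string>();
        foreach (Match match in TagRegex.Matches(html))
        {
            if (!string.Equals(match.Groups["Tag"].Value, tagName, StringComparison.OrdinalIgnoreCase))
                continue;

            // One Key capture and one Value capture per attribute, in document order.
            CaptureCollection keys = match.Groups["Key"].Captures;
            CaptureCollection vals = match.Groups["Value"].Captures;
            for (int i = 0; i < keys.Count && i < vals.Count; i++)
            {
                if (string.Equals(keys[i].Value.Trim(), attributeName, StringComparison.OrdinalIgnoreCase))
                    values.Add(vals[i].Value);
            }
        }
        return values;
    }

    public static void Main()
    {
        string html = "<a href=\"~/abc/cde\" rel=\"new\">Link1</a> <a href=\"~/abc/ghq\">Link2</a>";
        foreach (string href in GetAttributeValues(html, "a", "href"))
        {
            Console.WriteLine(href);
        }
    }
}
```

Run against the target string from the question, this should list the two href values and skip the rel attribute.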
