Find string using regex and regular expressions

Question

I have this text, and I am trying to print a1 and a2:

<a href="a1" title="t1"> k1 </a>
<a href="a2" title="t2"> k2 </a>

Here is my attempt:

string html = "<a href=\"a1\" title=\"t1\"> k1 </a>";
html += "<a href=\"a2\" title=\"t2\"> k2 </a>";

// Here is how I think my expression should work:
// <a href=" [anything that is not a quote, 0 or more times] " [anything] </a>
Regex regex = new Regex("<a href=\"([^\"]*)\".*</a>");
foreach (Match match in regex.Matches(html))
    Console.WriteLine(match.Groups[1]);

Why does this only print a1? I am pretty sure I am doing it right. What am I missing?

I don't code in C#, but I think the .* should be .*? so that it is non-greedy. Currently it matches all the way to the last </a>.
– chris85, Commented May 6, 2015 at 0:01

Community · Accepted Answer · 2017-05-23 12:30:04Z

The .* in your regular expression is greedy: it consumes all characters up to the last </a>, which here is the second one. What you need is lazy matching with .*?, so that it consumes characters only up to the first </a>:

Regex regex = new Regex("<a href=\"([^\"]*)\".*?</a>");
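To see the difference side by side, here is a minimal sketch that runs both patterns over the string from the question. The class name GreedyVsLazyDemo and the helper CaptureAll are made up for this illustration; only Regex.Matches and Match.Groups come from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original question or answer.
public static class GreedyVsLazyDemo
{
    // Returns the value of capture group 1 for every match of `pattern` in `input`.
    public static List<string> CaptureAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        // Greedy: .* runs to the last </a>, so the whole string is a single match.
        Console.WriteLine(string.Join(", ", CaptureAll("<a href=\"([^\"]*)\".*</a>", html)));
        // prints: a1

        // Lazy: .*? stops at the first </a>, so each anchor is its own match.
        Console.WriteLine(string.Join(", ", CaptureAll("<a href=\"([^\"]*)\".*?</a>", html)));
        // prints: a1, a2
    }
}
```

With the greedy pattern there is only one match for the loop to visit, which is why the original code prints a1 and then stops.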

Meanwhile, see "Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms".
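As a rough sketch of the parser-based alternative, the snippet from the question happens to be well-formed XML once it is wrapped in a single root element, so System.Xml.Linq can read the href attributes without any regex. The class name AnchorHrefs, the method Extract, and the wrapper element name root are assumptions made for this illustration. Real-world HTML is often not well-formed XML, and a dedicated HTML parser would likely be needed there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, assuming the input is a well-formed XML fragment.
public static class AnchorHrefs
{
    public static List<string> Extract(string fragment)
    {
        // Wrap the fragment in one root element so that sibling <a> tags parse as a single tree.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // Collect the href attribute of every <a> element, skipping anchors without one.
        return root.Descendants("a")
                   .Select(a => (string)a.Attribute("href"))
                   .Where(href => href != null)
                   .ToList();
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        Console.WriteLine(string.Join(", ", Extract(html)));
        // prints: a1, a2
    }
}
```

XElement.Parse throws an XmlException on malformed input, so this only suits markup you know to be well-formed.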

edited May 23, 2017 at 12:30 by Community Bot

answered May 6, 2015 at 0:02 by William

1 Comment

dimitris93 Over a year ago

This way, ("<a href=\"(.*?)\".*?</a>") works too, and it's cleaner syntax-wise. I didn't know about the ? symbol.
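For the input in the question, the lazy capture (.*?) and the negated class ([^"]*) do give the same result. They are not interchangeable in general, though: a lazy group can still grow past a quote if that is the only way for the rest of the pattern to match, while [^"]* never can. The sketch below illustrates this with a stricter pattern and an extra class attribute, both invented for this example, as are the class name LazyVsNegatedClass and the helper CaptureAll.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; the "strict" patterns and input are invented for illustration.
public static class LazyVsNegatedClass
{
    // Returns the value of capture group 1 for every match of `pattern` in `input`.
    public static List<string> CaptureAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        // On the question's input, both capture styles agree.
        Console.WriteLine(string.Join(", ", CaptureAll("<a href=\"(.*?)\".*?</a>", html)));
        // prints: a1, a2
        Console.WriteLine(string.Join(", ", CaptureAll("<a href=\"([^\"]*)\".*?</a>", html)));
        // prints: a1, a2

        // A stricter pattern that requires title= right after href, run on an
        // anchor that has a class attribute in between.
        string other = "<a href=\"a1\" class=\"c\" title=\"t1\"> k1 </a>";

        // Lazy capture expands across quotes until " title= can match.
        Console.WriteLine(string.Join(", ", CaptureAll("<a href=\"(.*?)\" title=", other)));
        // prints: a1" class="c

        // The negated class cannot cross a quote, so there is no match at all.
        Console.WriteLine(CaptureAll("<a href=\"([^\"]*)\" title=", other).Count);
        // prints: 0
    }
}
```

Which behaviour is preferable depends on the input, but the negated class fails visibly instead of silently capturing more than the attribute value.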