How can I replace

<a href="page">Text</a>

with

<a href="page.html">Text</a>

where page and Text can be any set of characters?

2 Answers

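This will work for your example. Note that the match covers only the value inside href (captured as group 2); the lookbehind and lookahead check the surrounding tag without consuming it.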

resultString = Regex.Replace(subjectString, @"(?<=<a[^>]*?\bhref\s*=\s*(['""]))(.*?)(?=\1.*?>)", "$2.html");

The replacement string then appends .html to that value. You may wish to adapt it to your needs.

Edit: before the flame wars begin: yes, it will work for your specific example, not for all possible HTML on the internet.
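
A minimal sketch of how this replacement behaves, assuming a small console program; the class name, method name, and sample inputs are made up for illustration. It uses a lazy (.*?) for the href value so the match stops at the first closing quote, which matters once a line holds more than one link or the tag carries further attributes. The variable-length lookbehind is a .NET regex feature, so the pattern may not port to other engines unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

class HrefSuffixDemo
{
    // Group 1 is the opening quote (captured inside the lookbehind),
    // group 2 is the href value. The lazy quantifier stops at the
    // first closing quote that matches group 1.
    const string Pattern =
        @"(?<=<a[^>]*?\bhref\s*=\s*(['""]))(.*?)(?=\1.*?>)";

    // Hypothetical helper: appends ".html" to every matched href value.
    static string AddHtmlSuffix(string html)
    {
        return Regex.Replace(html, Pattern, "$2.html");
    }

    static void Main()
    {
        string[] samples =
        {
            "<a href=\"page\">Text</a>",
            "<a class='nav' href='about'>About</a>",
            "<a href=\"a\">A</a> <a href=\"b\">B</a>",
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(AddHtmlSuffix(sample));
        }
    }
}
```

For these three samples the hrefs should come out as page.html, about.html, and, on the last line, both a.html and b.html. An href that already ends in .html would get a second suffix, so real input may need an extra check.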

You shouldn't parse HTML with regular expressions. See the answer to this question for details.

UPD: As TrueWill has pointed out, you might want to do the replacement with Html Agility Pack. In some special cases, though, the regex proposed by FailedDev will do, although I would modify it slightly to look like this: @"(?<=<a\b[^>]*?\bhref\s*=\s*(['""]))(.*?)(?=\1.*?>)" (put a \b after the <a to exclude other tags whose names start with "a").
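
A rough sketch of the Html Agility Pack route, assuming the HtmlAgilityPack NuGet package is referenced; the class name, method name, and sample markup are illustrative only. Because the XPath query selects only real a elements, commented-out links and markup inside script strings should be left alone.

```csharp
using System;
using HtmlAgilityPack; // third-party package, assumed to be installed

class HrefSuffixWithParser
{
    // Hypothetical helper: appends ".html" to the href of every <a> element.
    static string AddHtmlSuffix(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches the XPath.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode anchor in anchors)
            {
                string href = anchor.GetAttributeValue("href", "");
                anchor.SetAttributeValue("href", href + ".html");
            }
        }

        // Serialize the modified document back to a string.
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        Console.WriteLine(AddHtmlSuffix("<p><a href=\"page\">Text</a></p>"));
    }
}
```

The serialized output may differ from the input in small ways (quoting, for instance), so it is worth diffing the result when exact formatting matters.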

4 Comments

I'm not trying to parse the HTML, I'm trying to do a string replace in a html file.
One simple regex would be <a.*?href="(.*)".*?>(.*?)</a> to find the parts.
@Justin808 But to do it correctly, you actually need to parse the document. For example, you will probably want to ignore scripts and comments.
@Gebb is correct. Any changes to HTML, particularly those affecting only a specific context (such as in an HREF), involve parsing. Take a look at htmlagilitypack.codeplex.com
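
To illustrate the point about context, here is a small sketch with a made-up input containing one live link and one commented-out link. The pattern is the lookaround regex from the answers (with a lazy capture); the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class ContextBlindDemo
{
    static void Main()
    {
        // Hypothetical input: one real link, one commented-out link.
        string html =
            "<a href=\"live\">Live</a>\n" +
            "<!-- <a href=\"old\">Old</a> -->";

        string pattern =
            @"(?<=<a\b[^>]*?\bhref\s*=\s*(['""]))(.*?)(?=\1.*?>)";

        // The regex has no notion of comments, so it rewrites
        // the href inside the comment as well as the live one.
        Console.WriteLine(Regex.Replace(html, pattern, "$2.html"));
    }
}
```

Both hrefs should end up with the .html suffix, including the one inside the comment. Whether that matters depends on the files being processed.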
