I am pulling in text from a database that is formatted like the sample below. I want to insert the domain name in front of every URL within this block of text.

<p>We recommend you check out the article 
<a id="navitem" href="/article/why-apples-new-iphones-may-delight-and-worry-it-pros/" target="_top">
Why Apple's new iPhones may delight and worry IT pros</a> to learn more</p>

So with the example above in mind I want to insert http://www.mydomainname.com/ into the URL so it reads:

href="http://www.mydomainname.com/article/why-apples-new-iphones-may-delight-and-worry-it-pros/"

I figured I could use regex and replace href=" with href="http://www.mydomainname.com but this appears to not be working as I intended. Any suggestions or better methods I should be attempting?

var content = Regex.Replace(DataBinder.Eval(e.Item.DataItem, "Content").ToString(), 
              "^href=\"$", "href=\"https://www.mydomainname.com/");
  • If this text will always be HTML, you can use Html Agility Pack to parse and manipulate the HTML. Commented Sep 8, 2017 at 18:05
  • @maccettura Yes, it will always be HTML. Commented Sep 8, 2017 at 18:06
  • I don't recommend using regex for this purpose. Erik has a nice answer. Commented Sep 8, 2017 at 18:35

3 Answers

1

You could use regex...

...but it's very much the wrong tool for the job.

Uri has some handy constructors/factory methods for just this purpose:

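Uri ConvertHref(Uri sourcePageUri, string href)
{
    // This could simply be: return new Uri(sourcePageUri, href);
    // but TryCreate reports failure with a bool instead of throwing,
    // so the caller decides how to handle an href that cannot be resolved.
    Uri newAbsUri;
    if (Uri.TryCreate(sourcePageUri, href, out newAbsUri))
    {
        return newAbsUri;
    }

    throw new UriFormatException("Could not resolve href '" + href + "' against '" + sourcePageUri + "'.");
}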

so, say sourcePageUri is

var sourcePageUri = new Uri("https://somehost/some/page");

the output of our method with a few different values for href:

https://www.foo.com/woo/har => https://www.foo.com/woo/har
/woo/har                    => https://somehost/woo/har
woo/har                     => https://somehost/some/woo/har

...so it's the same interpretation as the browser makes. Perfect, no?
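As a usage sketch, here is the same resolution applied to the href from the question. The base address comes from the question; the class name `HrefDemo` and the method name `Resolve` are made up for this illustration, and returning the original string on failure is just one possible choice.

```csharp
using System;

static class HrefDemo
{
    // Hypothetical helper: resolves an href against the site it belongs to.
    public static string Resolve(string baseUrl, string href)
    {
        var baseUri = new Uri(baseUrl);
        Uri resolved;
        // TryCreate returns false instead of throwing when the two cannot be combined.
        return Uri.TryCreate(baseUri, href, out resolved)
            ? resolved.AbsoluteUri
            : href; // assumption: leave values that cannot be resolved untouched
    }

    static void Main()
    {
        const string site = "http://www.mydomainname.com/";
        // Root-relative link from the question: gets the site prefix.
        Console.WriteLine(Resolve(site, "/article/why-apples-new-iphones-may-delight-and-worry-it-pros/"));
        // Link that is already absolute: keeps its own host.
        Console.WriteLine(Resolve(site, "https://www.foo.com/woo/har"));
    }
}
```

Because the resolution is done by Uri rather than by string concatenation, links that already point at another host are not prefixed a second time.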


0

Try this code:

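// Note: "\/" is not a valid escape sequence in a regular C# string literal, so the slash is written unescaped.
var content = Regex.Replace(DataBinder.Eval(e.Item.DataItem, "Content").ToString(), 
              "(href=[ \t]*\")/", "$1https://www.mydomainname.com/", RegexOptions.Multiline);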
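If you stay with regex, one variation is to let the pattern only locate the href values and hand each value to Uri for the actual resolution. This is a minimal sketch, not a full HTML parser: the names `HrefRewriter` and `MakeAbsolute` are invented here, and it assumes attribute values are wrapped in double quotes.

```csharp
using System;
using System.Text.RegularExpressions;

static class HrefRewriter
{
    // Group 1 is the 'href="' prefix, group 2 is the attribute value up to the closing quote.
    static readonly Regex HrefPattern =
        new Regex("(href\\s*=\\s*\")([^\"]*)", RegexOptions.IgnoreCase);

    // Hypothetical helper: rewrites every double-quoted href in html relative to baseUri.
    public static string MakeAbsolute(string html, Uri baseUri)
    {
        return HrefPattern.Replace(html, m =>
        {
            Uri resolved;
            // Relative values are combined with baseUri; values that are already
            // absolute keep their own scheme and host.
            return Uri.TryCreate(baseUri, m.Groups[2].Value, out resolved)
                ? m.Groups[1].Value + resolved.AbsoluteUri
                : m.Value; // leave the attribute alone if it cannot be resolved
        });
    }
}
```

A call such as `HrefRewriter.MakeAbsolute(content, new Uri("https://www.mydomainname.com/"))` would then replace the plain string substitution. Single-quoted or unquoted attributes are not matched by this pattern, which is one reason an HTML parser remains the more robust choice.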


0

Use an HTML parser, such as CsQuery.

var html = "your html text here";
var path = "http://www.mydomainname.com";

CQ dom = html;
CQ links = dom["a"];

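foreach (var link in links)
{
    var href = link["href"];
    // Only prefix root-relative links; links without an href and absolute URLs are left alone.
    if (!string.IsNullOrEmpty(href) && href.StartsWith("/"))
        link.SetAttribute("href", path + href);
}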

html = dom.Html();
