I have a script that fetches content from a website, and I want to modify all of the links in it. Suppose:

$html = str_get_html('<h2 class="r"><a class="l" href="http://www.example.com/2009/07/page.html" onmousedown="return curwt(this, \'http://www.example.com/2009/07/page.html\')">SEO Result Boost <b> </b></a></h2>');

So, is it possible to modify or rewrite it in this way?

<h2 class="r"><a class="l" href="http://www.site.com?http://www.example.com/2009/07/page.html">SEO Result Boost <b> </b></a></h2>


I have read its manual but cannot figure out how to do this ( http://simplehtmldom.sourceforge.net/#fragment-12 ).

Is it possible? Any ideas?

2
  • 4
    Is this a setup for a phishing site? Commented Sep 5, 2012 at 19:44
  • 1
    Oh, never. I have never thought of something like this. Commented Sep 5, 2012 at 19:50

1 Answer

Assuming the answer to a related question works, you should be able to use the following with Simple HTML DOM:

$site = "http://siteyourgettinglinksfrom.com"; // root of the site the content was fetched from
$doc = str_get_html($code);
foreach ($doc->find('a[href]') as $a) {
    $href = $a->href;
    if (preg_match('#^https?://#i', $href)) {
        // $href is an absolute URL: prefix it as-is
        $a->href = 'http://www.site.com?' . $href;
    } else {
        // $href is a relative path: prepend the source site's root first
        $a->href = 'http://www.site.com?' . $site . $href;
    }
}
$code = (string) $doc;

Or, using PHP's native DOM library:

$site = "http://siteyourgettinglinksfrom.com"; // root of the site the content was fetched from
$doc = new DOMDocument();
$doc->loadHTML($code);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[@href]') as $a) {
    $href = $a->getAttribute('href');
    if (preg_match('#^https?://#i', $href)) {
        // $href is an absolute URL: prefix it as-is
        $a->setAttribute('href', 'http://www.site.com?' . $href);
    } else {
        // $href is a relative path: prepend the source site's root first
        $a->setAttribute('href', 'http://www.site.com?' . $site . $href);
    }
}
$code = $doc->saveHTML();

Checking the $href:

Check whether the link is relative and, if it is, prepend the address of the site you're pulling the content from, since most sites use relative links. (This is where a regular expression matcher would be your best friend.)

For relative links, prepend the absolute path of the site you are getting the links from:

  'http://www.site.com?'.$site.$href

For absolute links, just append the link as-is:

  'http://www.site.com?'.$href

Example links:

site relative: /images/picture.jpg

document relative: ../images/picture.jpg

absolute: http://somesite.com/images/picture.jpg
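
The three kinds of link listed above could be told apart with a small helper along these lines. This is only a sketch: classify_href is a hypothetical name, and it does not handle protocol-relative links (//host/path), mailto: links, or bare #fragments.

```php
<?php
// Hypothetical helper: report which of the three example kinds an href is.
function classify_href(string $href): string
{
    // A scheme followed by "://" marks an absolute URL.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return 'absolute';
    }
    // A leading slash means the path starts at the site root.
    if (strpos($href, '/') === 0) {
        return 'site relative';
    }
    // Anything else is taken to be relative to the current document.
    return 'document relative';
}
```

Only the 'absolute' case can be prefixed directly; the other two need the source site's address added first.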

(Note: there is a little more work to do here. If you're handling "document relative" links, you will have to know which directory you're currently in. Site relative links should be good to go, as long as you have the root of the site you're getting links from.)
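
One way to do that extra work is to resolve each href against the URL of the page it came from. The sketch below is a simplified illustration, not a full RFC 3986 resolver: resolve_href and its $pageUrl parameter are hypothetical, $pageUrl is assumed to be a complete http(s) URL, and query strings or fragments inside $href are not treated specially.

```php
<?php
// Hypothetical helper: turn $href into an absolute URL, using the page it was found on.
function resolve_href(string $pageUrl, string $href): string
{
    // Already absolute: nothing to resolve.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }
    $base = parse_url($pageUrl);
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');

    if (strpos($href, '/') === 0) {
        // Site relative: the path starts at the site root.
        $path = $href;
    } else {
        // Document relative: start from the directory of the current page.
        $basePath = $base['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    }

    // Collapse "." and ".." segments; the first element is the empty
    // string before the leading slash and is never popped.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $root . implode('/', $out);
}
```

The result can then be appended to 'http://www.site.com?' in the same way as an absolute link, so both branches of the loop reduce to one case.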
