Relatively new to PHP and looking for some help updating links on a specific page. The page has numerous links, e.g. href=/link/, and I would like the page to identify these links (those that do not already start with http or https) and prepend a URL, e.g. www.domain.com, to each, basically ending up with href=www.domain.com/link/. Any help would be greatly appreciated.
Sorry about the wording. The link name was supposed to read href=/link-name/. I need code that will find all links of this type and place http://domain.com at the beginning, so that after the code runs href='/link-name/' would read href='http://domain.com/link-name/'. – user443873, Sep 10, 2010 at 1:13
Yes, I know, hopeseekr. Thanks for your patience in letting me update and get it going in the right direction. Your example looks good. I plan on implementing this weekend and will post back the results. – user443873, Sep 11, 2010 at 13:06
4 Answers
I think you want to parse a list of URLs and prepend "http://" to the ones that don't have it.
<?php
$links = array('http://www.redditmirror.cc/', 'phpexperts.pro', 'https://www.paypal.com/', 'www.example.com');
foreach ($links as &$link)
{
    // Prepend "http://" to any link missing the http:// or https:// scheme.
    // Note "https?" (optional s), not "https*" (any number of s characters).
    if (preg_match('|^https?://|', $link) === 0)
    {
        $link = 'http://' . $link . '/';
    }
}
unset($link); // Break the reference left behind by the foreach.
print_r($links);
/* Output:
Array
(
    [0] => http://www.redditmirror.cc/
    [1] => http://phpexperts.pro/
    [2] => https://www.paypal.com/
    [3] => http://www.example.com/
)
*/
Maybe it suffices to just change the base URI of the document with the BASE element:
<base href="http://example.com/link/">
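With this, the base URI becomes http://example.com/link/ instead of the URI of the document. That means every relative URI is resolved from http://example.com/link/ instead of the document's URI. Note that a root-relative link such as href="/link-name/" ignores the path part of the base and resolves to http://example.com/link-name/, while href="page.html" resolves to http://example.com/link/page.html.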
You could always use output buffering at the top of your page with a callback that reformats your hrefs to how you'd like them:
function callback($buffer)
{
    return (str_replace(' href="/', ' href="http://domain.com/', $buffer));
}
ob_start('callback');
// rest of your page goes here
ob_end_flush();
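The str_replace() call above only catches double-quoted hrefs, and it would also rewrite protocol-relative links such as href="//cdn.example.com/". As a rough sketch of one way to tighten that, still within the same string-rewriting approach, something like the following could be used. The function name prefix_root_relative_hrefs and the sample markup are made up for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (name is an assumption): prefixes root-relative hrefs
// with $base. Handles single or double quotes and leaves "//host/..." alone.
function prefix_root_relative_hrefs($buffer, $base)
{
    // Capture ` href="` or ` href='`, then require exactly one leading slash.
    $pattern = '~(\shref\s*=\s*["\'])/(?!/)~i';
    return preg_replace($pattern, '${1}' . $base . '/', $buffer);
}

// Sample markup, invented for this example.
$sample = '<a href="/a/">A</a> <a href=\'/b/\'>B</a> <a href="//cdn.example.com/c">C</a>';
echo prefix_root_relative_hrefs($sample, 'http://domain.com'), "\n";
```

To hook this into output buffering, wrap it in a one-argument closure, e.g. ob_start(function ($buffer) { return prefix_root_relative_hrefs($buffer, 'http://domain.com'); });. Passing the two-argument function directly to ob_start() would not work as intended, because PHP hands the callback a phase flag as its second argument.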
Because you left out critical details in your first question, here is the second answer.
Doing what @Nev Stokes says may work, but it will also match text outside of anchor tags. You should never use regular expressions (or, worse, str_replace) on HTML.
Instead, use the Simple HTML DOM library (the one that provides file_get_html() and str_get_html()) and do this:
<?php
require 'simplehtmldom/simple_html_dom.php';
ob_start();
?>
<html>
<body>
<a id="id" href="/my_file.txt">My File</a>
<a name="anchor_link" id="boo" href="mydoc2.txt">My Doc 2</a>
<a href="http://www.phpexperts.pro/">PHP Experts</a>
</body>
</html>
<?php
// Capture the page markup and parse it into a DOM.
$output = ob_get_clean();
$html = str_get_html($output);
$anchors = $html->find('a');
foreach ($anchors as $a)
{
    // Only rewrite links that lack an http:// or https:// scheme.
    if (preg_match('|^https?://|', $a->href) === 0)
    {
        // Make sure first char is /.
        if ($a->href[0] != '/')
        {
            $a->href = '/' . $a->href;
        }
        $a->href = 'http://www.domain.com' . $a->href;
    }
}
echo $html->save();
Output:
<html>
<body>
<a id="id" href="http://www.domain.com/my_file.txt">My File</a>
<a name="anchor_link" id="boo" href="http://www.domain.com/mydoc2.txt">My Doc 2</a>
<a href="http://www.phpexperts.pro/">PHP Experts</a>
</body>
</html>
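If installing a third-party parser is not an option, roughly the same idea can be sketched with PHP's built-in DOM extension. This is only an illustration under a few assumptions: the dom extension is enabled, the helper name absolutize_hrefs is made up here, and the sample markup is invented. Unlike the loop above, it also skips fragment links, protocol-relative links, and any href that already carries a scheme (mailto:, ftp:, and so on).

```php
<?php
// Hypothetical helper (name is an assumption) built on PHP's bundled DOMDocument.
function absolutize_hrefs($html, $base)
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml raises for imperfect real-world markup.
    @$doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip anchors without an href, and hrefs that already have a scheme,
        // are protocol-relative ("//host/..."), or are fragments ("#top").
        if ($href === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $href)) {
            continue;
        }
        // Join base and path with exactly one slash between them.
        $a->setAttribute('href', rtrim($base, '/') . '/' . ltrim($href, '/'));
    }
    return $doc->saveHTML();
}

// Sample markup, invented for this example.
$page = '<a href="/my_file.txt">My File</a><a href="mydoc2.txt">My Doc 2</a>';
echo absolutize_hrefs($page, 'http://www.domain.com');
```

Keep in mind that saveHTML() returns a full document, so a fragment like the one above comes back wrapped in a doctype and html/body elements.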