So I'm trying to get all the urls from a string with a script that looks like this:
$file = file_get_contents('something.txt');
function getUrls($string) {
preg_match_all('~href=("|\')(.*?)\1~', $string, $out);
print_r($out);
}
getUrls($file);
The urls contained in this document may be imperfect - i.e. "/blah/blah.asp?2". The problem is that when I run this script, I get an array that looks something like this:
Array
(
[0] => Array
(
[0] => href="#A"
[1] => href="#B"
[2] => href="#C"
)
[1] => Array
(
[0] => "
[1] => "
[2] => "
)
[2] => Array
(
[0] => #A
[1] => #B
[2] => #C
)
)
Any idea what could be going on here? I have no idea why it is returning alphabetical lists with hash signs instead of the desired urls. How can I go about just returning the urls?