
I need to scrape a web page that has a JavaScript array embedded in inline JavaScript code, such as:

<script>
    var videos = new Array();
    videos[0] = 'http://myvideos.com/video1.mov'; 
    videos[1] = ....
    ....
</script>

What's the easiest way to approach this and end up with a PHP array of these video urls?

Edit: All videos have the .mov extension.

  • I have a few lines using file_get_contents and am trying out a few regexes. I'm no good at regex, though. Commented Jan 12, 2012 at 23:17

2 Answers


This is a bit more complicated, but it gets only those links that really have the form videos[0] = 'http://myvideos.com/video1.mov';

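$original=file_get_contents($url);   // $url: the page to scrape
// Strip line breaks so the whole <script> block sits on one line
$tmp=str_replace(array("\r","\n"),'',$original);
// Capture the run of "videos[N] = 'http://...';" statements inside the script
$pattern='/\<script\>\s+var\ videos.*?((\s*videos\[\d+\]\ \=\ .http\:\/\/.*?\;\s*?)+)(.*?)\<\/script\>/';
$a=preg_match_all($pattern,$tmp,$matches);
unset($tmp);

if (!$a) die("no matches");

// Split the captured run on each "videos[N] = " prefix
$pattern="/videos\[\d+\]\ \=\ /";
$matches=preg_split($pattern,$matches[1][0]);

$final=array();
foreach ($matches as $match) {
  $match=trim($match);
  if ($match=='') continue;
  // Drop the opening quote and the trailing "';"
  $final[]=substr($match,1,-2);
}
unset($matches);

print_r($final);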
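The same strict extraction can also be sketched with a single preg_match_all that captures the index and the URL of each videos[N] = '...'; assignment, instead of splitting the captured run. This is a minimal sketch, not the answer's method: it assumes the assignments always use single-quoted strings and that the array is named videos, as in the question. The function name extractVideoArray is hypothetical.

```php
<?php
// Hypothetical helper: pull videos[N] = '...'; assignments out of an HTML/JS string.
// Assumes single-quoted string literals and an array named "videos".
function extractVideoArray(string $html): array
{
    // Group 1 = array index, group 2 = the quoted URL
    $pattern = "/videos\[(\d+)\]\s*=\s*'([^']*)'\s*;/";
    $result = array();
    if (preg_match_all($pattern, $html, $m, PREG_SET_ORDER)) {
        foreach ($m as $set) {
            $result[(int)$set[1]] = $set[2];
        }
    }
    ksort($result); // order by index, like the JavaScript array
    return $result;
}

$html = "<script>\n    var videos = new Array();\n"
      . "    videos[0] = 'http://myvideos.com/video1.mov'; \n"
      . "    videos[1] = 'http://myvideos.com/video2.mov';\n</script>";
print_r(extractVideoArray($html));
```

Keying the result by the captured index keeps the URLs in the same order as the JavaScript array, even if the page writes the assignments out of order.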

After feedback from the OP, here is a simplified version:

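$original=file_get_contents($url);   // $url: the page to scrape
// Lazily match from each "http://" up to the nearest ".mov"
$pattern='/http\:\/\/.*?\.mov/';
$a=preg_match_all($pattern,$original,$matches);
if (!$a) die("no matches");
print_r($matches[0]);   // $matches[0] holds every full match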
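Wrapped in a function, the simplified approach can be tested against a string without fetching a page. This is a minimal sketch; the function name extractMovUrls is hypothetical, and the character class [^'"\s] (an assumption, not part of the answer above) keeps a match from running across quotes or whitespace into a neighbouring string. Repeated URLs are dropped with array_unique.

```php
<?php
// Hypothetical helper: collect every http:// URL ending in .mov from a page.
function extractMovUrls(string $html): array
{
    // [^'"\s]*? stops a match from spanning quotes or whitespace
    $pattern = '/http:\/\/[^\'"\s]*?\.mov/';
    if (!preg_match_all($pattern, $html, $matches)) {
        return array();
    }
    // Remove duplicates and reindex from 0
    return array_values(array_unique($matches[0]));
}

$html = "videos[0] = 'http://myvideos.com/video1.mov';\n"
      . "videos[1] = 'http://myvideos.com/video2.mov';\n"
      . "<a href=\"http://myvideos.com/video1.mov\">again</a>";
print_r(extractMovUrls($html));
```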

3 Comments

Thanks, I'll check on this. I think it can be even easier, since the videos are always .mov files.
So in fact you want all links to .mov files scraped from that page?
Worked like a charm. Thanks a million!
You can scrape this by reading the page with file_get_contents and then retrieving the URLs with a regex. This is the simplest way I know, especially if you know the file extensions of your videos. Example:

<?php
$file = file_get_contents('http://google.com');
// Match http:// hosts ending in .fr or .com; (?:fr|com) is an alternation group
$pattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.(?:fr|com))/i';
preg_match_all($pattern, $file, $matches);
var_dump($matches);
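A [...] bracket in a regex is a character class, so writing the TLD alternation as [fr|com]+ would match any run of the characters f, r, |, c, o, m, while a group (?:fr|com) matches only .fr or .com. The check below compares the two patterns offline on a made-up sample string; the variable names are illustrative only.

```php
<?php
// Made-up sample text; no network access needed.
$text = 'Links: http://example.com/page http://site.fr http://bad.moc';

// Character class: [fr|com]+ accepts any run of f, r, |, c, o, m
$classPattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.[fr|com]+)/i';
// Alternation group: only .fr or .com
$groupPattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.(?:fr|com))/i';

preg_match_all($classPattern, $text, $withClass);
preg_match_all($groupPattern, $text, $withGroup);

print_r($withClass[1]); // also accepts the bogus "bad.moc"
print_r($withGroup[1]); // only example.com and site.fr
```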

1 Comment

This was exactly my first approach. I guess there aren't many alternatives, are there?
