I'm trying to parse an HTML page where the majority of the content is generated by JavaScript. Using the Chrome developer tools, I can see that the content I want is in a div with the class doodle-image. However, when I either view the page source or try to grab it with PHP:

<?php 
include_once('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('http://www.google.com/doodles/finder/2012/All%20doodles');
$doodles = $html->find('.doodle-image');
echo $html;
?>

It returns the frame of the page but contains none of the divs or content. How can I grab the full content of the page?

1 Answer

That's because the element is empty when your PHP client fetches the page. Google populates the list of doodles with JavaScript, which loads a JSON object through an Ajax request to this page. You can probably request that JSON directly as well.
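
As a rough sketch of that idea, the JSON could be fetched and decoded directly instead of parsing the HTML. The endpoint URL, the `title` field, and the helper names `fetch_json` and `extract_field` below are all placeholders made up for illustration; the real URL and field names have to be read from the browser's network panel, and the response is assumed here to be a JSON array of objects.

```php
<?php
// Hypothetical endpoint: replace with the JSON URL seen in the browser's network panel.
const DOODLES_JSON_URL = 'http://example.com/doodles.json';

/**
 * Decode a JSON list of records and collect one field from each record.
 * The field name is an assumption; inspect the real response to choose it.
 */
function extract_field(string $json, string $field): array
{
    $records = json_decode($json, true); // true => associative arrays
    if (!is_array($records)) {
        throw new InvalidArgumentException('Response is not a JSON array or object');
    }
    $values = [];
    foreach ($records as $record) {
        if (is_array($record) && isset($record[$field])) {
            $values[] = $record[$field];
        }
    }
    return $values;
}

/** Fetch the raw JSON body over HTTP (not called in the offline demo below). */
function fetch_json(string $url): string
{
    $body = @file_get_contents($url);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}

// Offline demonstration with a made-up payload shaped like a list of objects.
$sample = '[{"title":"Example A","url":"/a.png"},{"title":"Example B","url":"/b.png"}]';
$titles = extract_field($sample, 'title');
print_r($titles);
```

Against the live site, the call would look like `extract_field(fetch_json(DOODLES_JSON_URL), 'title')`, once the placeholder URL and field name are replaced with the real ones.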

3 Comments

How were you able to find that info? I've tried using the Chrome developer tools and Firebug (a little complex for me) with no success. BTW, your finding made my work infinitely easier.
@Nick - it was easy to see using Firebug. Use either the "Console" or "Net" panel to see the additional requests made from the base page.
Sorry to continue this in comments, but I was able to easily see the JSON URL in the console; I was staring right at it. However, when I type a custom search option for the page, it seems that a JSON object is loaded again, but Firebug doesn't show a URL like it did when I chose different months.
