I'm working on a community website, converting it from ASP to PHP. At the moment, the client manually enters the movie times for our local theater each week, copying them from another website. Since we are redoing the site anyway, I figured I would try to automate this, and I found PHP Simple HTML DOM Parser. I'm stuck on selecting the movie's rating (PG, 18, etc.).

Here is a div that includes the information for one movie:

            <div class="mshow">
                <span style="float:right; font-size:11px;">
                    <a href="/trailers/enders-game/19330/" title="enders-game movie trailer" style="font-size:11px;">Trailer</a> | 
                    <a href="/reviews/enders-game/30945/" title="Ender's Game movie reviews" style="font-size:11px;">Rating: </a>
                    <b>Tribute</b>
                    <img src="/images/stars/4_sm.gif" alt="Current rating: 3.88" border="0" />
                </span>
                <strong>
                    <a href="/movies/enders-game/30945/" title="Ender's Game movie info">Ender's Game</a>
                </strong>
                (PG)<br />
                <div class="block">&nbsp;</div>
                <div class="rsd">Fri, Nov 15: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Sat, Nov 16: </div>
                <div class="rst" >1:00pm &nbsp;&nbsp;3:15pm &nbsp;&nbsp;7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Sun, Nov 17: </div>
                <div class="rst" >1:00pm &nbsp;&nbsp;3:15pm &nbsp;&nbsp;7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Mon, Nov 18: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Tue, Nov 19: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Wed, Nov 20: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Thu, Nov 21: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
            </div>

And here is my code so far:

            <?php
            include_once('../simple_html_dom.php');

            $html = file_get_html('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
            $movies = array();
            foreach ($html->find("div.mshow") as $movie) {
                $item['trailer'] = $movie->find('a', 0)->href;
                $item['reviews'] = $movie->find('a', 1)->href;
                $item['link'] = $movie->find('a', 2)->href;
                $item['title'] = $movie->find('a', 2)->plaintext;
                $movies[] = $item;
            }

            var_dump($movies);
            ?>

I can't figure out how to grab (PG). Any suggestions?

Edit: This works, but doesn't seem like a great solution.

            function parseDOM($url) {
                $movies = array();
                foreach ($url->find("div.mshow") as $movie) {
                    $item['trailer'] = $movie->find('a', 0)->href;
                    $item['reviews'] = $movie->find('a', 1)->href;
                    $item['link'] = $movie->find('a', 2)->href;
                    $item['title'] = $movie->find('a', 2)->plaintext;
                    $info = $movie->plaintext;
                    preg_match('/\((.*?)\)/', $info, $matches);
                    $item['rating'] = $matches[1];
                    $movies[] = $item;
                }
                return $movies;
            }

1 Answer

Unfortunately, the Simple HTML DOM library was a bad choice here. It doesn't support full XPath queries, nor does it seem to have a selector for sibling text nodes.

With PHP's built-in DOM extension you can get what you want easily:

$dom = new DOMDocument;
// The @ suppresses the warnings libxml emits for invalid real-world markup.
@$dom->loadHTMLFile('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
$xpath = new DOMXPath($dom);
$movies = array();

// One div.mshow per movie.
foreach ($xpath->query("//div[@class='mshow']") as $movie) {
    $item = array();
    // Links in document order: trailer, reviews, movie page.
    $links = $xpath->query('.//a', $movie);
    $item['trailer'] = $links->item(0)->getAttribute('href');
    $item['reviews'] = $links->item(1)->getAttribute('href');
    $item['link'] = $links->item(2)->getAttribute('href');
    $item['title'] = $links->item(2)->nodeValue;
    // The rating is the bare text node right after the <strong> title element.
    $item['rating'] = trim($xpath->query('.//strong/following-sibling::text()',
        $movie)->item(0)->nodeValue);
    $movies[] = $item;
}

var_dump($movies);

This gave me the following:

array(7) {
  [0]=>
  array(5) {
    ["trailer"]=>
    string(28) "/trailers/enders-game/19330/"
    ["reviews"]=>
    string(27) "/reviews/enders-game/30945/"
    ["link"]=>
    string(26) "/movies/enders-game/30945/"
    ["title"]=>
    string(12) "Ender's Game"
    ["rating"]=>
    string(4) "(PG)"
  }
  [1]=>
  array(5) {
    ["trailer"]=>
    string(27) "/trailers/free-birds/19436/"
    ["reviews"]=>
    string(26) "/reviews/free-birds/36183/"
    ["link"]=>
    string(25) "/movies/free-birds/36183/"
    ["title"]=>
    string(10) "Free Birds"
    ["rating"]=>
    string(3) "(G)"
  }
  [2]=>
  array(5) {
    ["trailer"]=>
    string(30) "/trailers/free-birds-3d/14421/"
    ["reviews"]=>
    string(29) "/reviews/free-birds-3d/37230/"
    ["link"]=>
    string(28) "/movies/free-birds-3d/37230/"
    ["title"]=>
    string(13) "Free Birds 3D"
    ["rating"]=>
    string(3) "(G)"
  }
  [3]=>
  array(5) {
    ["trailer"]=>
    string(45) "/trailers/jackass-presents-bad-grandpa/19318/"
    ["reviews"]=>
    string(44) "/reviews/jackass-presents-bad-grandpa/36493/"
    ["link"]=>
    string(43) "/movies/jackass-presents-bad-grandpa/36493/"
    ["title"]=>
    string(29) "Jackass Presents: Bad Grandpa"
    ["rating"]=>
    string(5) "(14A)"
  }
  [4]=>
  array(5) {
    ["trailer"]=>
    string(27) "/trailers/last-vegas/19291/"
    ["reviews"]=>
    string(26) "/reviews/last-vegas/35853/"
    ["link"]=>
    string(25) "/movies/last-vegas/35853/"
    ["title"]=>
    string(10) "Last Vegas"
    ["rating"]=>
    string(4) "(PG)"
  }
  [5]=>
  array(5) {
    ["trailer"]=>
    string(36) "/trailers/thor-the-dark-world/19327/"
    ["reviews"]=>
    string(35) "/reviews/thor-the-dark-world/32002/"
    ["link"]=>
    string(34) "/movies/thor-the-dark-world/32002/"
    ["title"]=>
    string(20) "Thor: The Dark World"
    ["rating"]=>
    string(4) "(PG)"
  }
  [6]=>
  array(5) {
    ["trailer"]=>
    string(39) "/trailers/thor-the-dark-world-3d/14425/"
    ["reviews"]=>
    string(38) "/reviews/thor-the-dark-world-3d/34705/"
    ["link"]=>
    string(37) "/movies/thor-the-dark-world-3d/34705/"
    ["title"]=>
    string(23) "Thor: The Dark World 3D"
    ["rating"]=>
    string(4) "(PG)"
  }
}
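
If you want to try the sibling-text query without hitting the live site, here is a minimal offline sketch. It assumes a trimmed copy of the div from the question as a fixture, and `extractRating()` is a hypothetical helper name, not part of the DOM extension. It also strips the parentheses, returns `null` when no rating text is found, and matches `mshow` even if the div carries additional classes; none of that is needed for the page as it stands, so treat it as one possible hardening rather than the required approach. The nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical fixture: a trimmed copy of the div shown in the question.
$html = <<<'HTML'
<div class="mshow">
    <span style="float:right; font-size:11px;">
        <a href="/trailers/enders-game/19330/">Trailer</a> |
        <a href="/reviews/enders-game/30945/">Rating: </a>
        <b>Tribute</b>
    </span>
    <strong>
        <a href="/movies/enders-game/30945/">Ender's Game</a>
    </strong>
    (PG)<br />
    <div class="rsd">Fri, Nov 15: </div>
    <div class="rst">7:00pm 9:20pm</div><br />
</div>
HTML;

// Hypothetical helper: returns the rating without parentheses, or null if absent.
function extractRating(DOMXPath $xpath, DOMNode $movie): ?string
{
    // First non-blank text node that follows the <strong> as a sibling.
    $nodes = $xpath->query('./strong/following-sibling::text()[normalize-space()]', $movie);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Trim surrounding whitespace first, then the parentheses.
    $rating = trim(trim($nodes->item(0)->nodeValue), '()');
    return $rating === '' ? null : $rating;
}

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Matches class="mshow" as a whole word, even alongside other class names.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' mshow ')]";

$ratings = array();
foreach ($xpath->query($query) as $movie) {
    $ratings[] = extractRating($xpath, $movie);
}

var_dump($ratings);
```

For the fixture above, `$ratings` should contain the single string `PG`. Against the live page you would call `loadHTMLFile()` with the URL instead of `loadHTML()`, and the result then depends on the site keeping its current markup.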

1 Comment

@DERNERSERFT You're welcome! If you have any issues using the answer code, please let me know. Glad I could help! :)
