
I am just starting with the Simple HTML DOM parser and am already running into problems right at the beginning.

Referring to this tutorial:

http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/

Now I simply want to find, in the page source, the content of the div whose class is "ClearBoth Box".

I retrieve the page with cURL and create a simple_html_dom object:

$cl = curl_exec($curl);  
$html = new simple_html_dom();
$html->load($cl);

Then I tried to get the content of the div into an array called $divs:

$divs = $html->find('div[.ClearBoth Box]');

But when I print_r $divs, it outputs much more than that, even though the source contains nothing more inside the div.

Like this:

Array
(
    [0] => simple_html_dom_node Object
        (
            [nodetype] => 1
            [tag] => br
            [attr] => Array
                (
                    [class] => ClearBoth
                )

            [children] => Array
                (
                )

            [nodes] => Array
                (
                )

            [parent] => simple_html_dom_node Object
                (
                    [nodetype] => 1
                    [tag] => div
                    [attr] => Array
                        (
                            [class] => SocialMedia
                        )

                    [children] => Array
                        (
                            [0] => simple_html_dom_node Object
                                (
                                    [nodetype] => 1
                                    [tag] => iframe
                                    [attr] => Array
                                        (
                                            [id] => ShowFacebookButtons
                                            [class] => SocialWeb FloatLeft
                                            [src] => http://www.facebook.com/plugins/xxx
                                            [style] => border:none; overflow:hidden; width: 250px; height: 70px;
                                        )

                                    [children] => Array
                                        (
                                        )

                                    [nodes] => Array
                                        (
                                        )

I do not understand why $divs does not simply contain the code from the div.

Here is an example of the source code on the site:

<div class="ClearBoth Box">
          <div>
<i class="Icon SmallIcon ProductRatingEnabledIconSmall" title="gute peppige Qualität: Sehr empfehlenswert"></i>
<i class="Icon SmallIcon ProductRatingEnabledIconSmall" title="gute peppige Qualität: Sehr empfehlenswert"></i>
<i class="Icon SmallIcon ProductRatingEnabledIconSmall" title="gute peppige Qualität: Sehr empfehlenswert"></i>
<i class="Icon SmallIcon ProductRatingEnabledIconSmall" title="gute peppige Qualität: Sehr empfehlenswert"></i>
<i class="Icon SmallIcon ProductRatingEnabledIconSmall" title="gute peppige Qualität: Sehr empfehlenswert"></i>

              <strong class="AlignMiddle LeftSmallPadding">gute peppige Qualität</strong> <span class="AlignMiddle">(17.03.2013)</span>
          </div>
          <div class="BottomMargin">
            gute Verarbeitung, schönes Design,
          </div>
        </div>

What am I doing wrong?


3 Answers

9

The right way to get a div by its class is:

$ret = $html->find('div.foo');
//OR
$ret = $html->find('div[class=foo]');

Basically, you select elements as if you were using CSS selectors.

Source: http://simplehtmldom.sourceforge.net/manual.htm (section "How to find HTML elements?", tab "Advanced").
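To make the difference between the two selector forms concrete, here is a small plain-PHP model of the two matching rules. It is a sketch, not Simple HTML DOM's code: the helpers matchesClassToken() and matchesClassExactly() and the sample class values are hypothetical, and it assumes that div.foo behaves like CSS (match any one whitespace-separated class) while div[class=foo] compares the whole attribute value.

```php
<?php
// Hypothetical helpers, not part of Simple HTML DOM. They model two ways of
// testing a div's class attribute against a selector value.

// Models "div.ClearBoth": the name must equal one whitespace-separated class.
function matchesClassToken(string $classAttr, string $name): bool
{
    $classes = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($name, $classes, true);
}

// Models "div[class=ClearBoth Box]": the whole attribute value must be equal,
// so the order of the classes and the spacing matter.
function matchesClassExactly(string $classAttr, string $value): bool
{
    return $classAttr === $value;
}

// Hypothetical class attributes of the divs on a page.
$divClasses = ['ClearBoth Box', 'BottomMargin', 'ClearBoth', 'Box ClearBoth', 'SocialMedia'];

$byToken = array_values(array_filter(
    $divClasses,
    fn (string $c): bool => matchesClassToken($c, 'ClearBoth')
));
$byExact = array_values(array_filter(
    $divClasses,
    fn (string $c): bool => matchesClassExactly($c, 'ClearBoth Box')
));

print_r($byToken); // ClearBoth Box, ClearBoth, Box ClearBoth
print_r($byExact); // ClearBoth Box
```

Under these assumptions, the class form finds every div that merely carries ClearBoth, while the exact form finds only the div whose attribute reads exactly "ClearBoth Box".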


4 Comments

Thank you so much! Now I am a step further. In my case, because the class attribute has two parts ("ClearBoth Box"), I have to use div[class=ClearBoth Box]: div.ClearBoth Box searches for a Box element inside div.ClearBoth, and div.ClearBoth alone returns more matches than I need.
What if my div has no class name? What if I want all the divs on the page?
@amitchhajer Either find an element with a unique ID above or below the div in question and move from there with the child/parent methods, or print the outertext of where you are (the DOM object), count how many divs come before the one you need, and access it by its index (zero-based): the 4th div is $dom->find('div', 3);
How can I print the HTML?
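The index form mentioned above, find('div', 3), can be pictured with the plain-PHP sketch below. pickMatch() is a hypothetical stand-in, not the library's find(); it assumes that all matches are collected in document order, that the index is zero-based (as "4th div = 3" suggests), and that an out-of-range index gives null.

```php
<?php
// Hypothetical stand-in for an index-aware find(): $matches holds every
// element that matched the selector, in document order.
function pickMatch(array $matches, ?int $idx = null)
{
    if ($idx === null) {
        return $matches;           // no index: every match
    }
    return $matches[$idx] ?? null; // zero-based index: one match, or null
}

// Placeholder strings standing in for the divs of a page.
$divs = ['div 1', 'div 2', 'div 3', 'div 4', 'div 5'];

var_dump(pickMatch($divs, 3));     // string(5) "div 4"
var_dump(pickMatch($divs, 10));    // NULL
var_dump(count(pickMatch($divs))); // int(5)
```

The null case matters when chaining: calling a method such as ->children() on a lookup that found nothing fails in PHP, so it is worth checking the result first.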
9
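$html = new simple_html_dom();
$html->load($output);
// First div with class "youclassname", then its second child element
// (children() is zero-based), returned as HTML including the child's own tag
$items = $html->find('div.youclassname', 0)->children(1)->outertext;
print_r($items);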
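Printing ->outertext, a plain string, avoids the huge dump shown in the question: a parsed node object keeps a reference to its parent (the [parent] entry in that dump), and print_r() follows it into the surrounding markup. The sketch below reproduces the effect with a hypothetical minimal Node class, not simple_html_dom_node itself; it needs PHP 8 for constructor property promotion.

```php
<?php
// Hypothetical minimal node class (not simple_html_dom_node). Like the nodes in
// the question's dump, each node stores a reference to its parent.
class Node
{
    public array $children = [];
    public ?Node $parent = null;

    public function __construct(public string $tag, public string $class = '')
    {
    }

    // Attach $child under this node and return it.
    public function append(Node $child): Node
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $child;
    }
}

// Rebuild the shape seen in the dump: a br.ClearBoth inside div.SocialMedia.
$social = new Node('div', 'SocialMedia');
$social->append(new Node('iframe', 'SocialWeb FloatLeft'));
$br = $social->append(new Node('br', 'ClearBoth'));

// print_r() walks $br->parent, so the dump also contains the div and its iframe.
$dump = print_r($br, true);
echo $dump;
```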


0

To find the following elements: div -> class (product-inner clearfix) -> class (price), the following CSS-style selectors can be used:

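foreach ($html->find('div[class=product-inner  clearfix]') as $element) {
    // First element with class "price" inside this div (null if there is none)
    $price = $element->find('.price', 0);
    if ($price !== null) {
        $itemPrice = $price->plaintext;
        echo $itemPrice;
    }
}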
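Since the selectors above are CSS-style rather than XPath, here, for comparison, is the same two-level lookup (outer div with both classes, then the first .price inside it) written as real XPath. It uses PHP's built-in DOM extension (DOMDocument, DOMXPath), a different API from Simple HTML DOM; the helper names hasClass() and findPrices() and the sample markup are hypothetical.

```php
<?php
// Sketch using PHP's built-in DOM extension (not Simple HTML DOM).

// XPath condition: true if the class attribute contains $name as a whole token.
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $name . " ')";
}

// Hypothetical helper: text of the first .price inside each div that has
// both the product-inner and clearfix classes.
function findPrices(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate messy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $outer = '//div[' . hasClass('product-inner') . ' and ' . hasClass('clearfix') . ']';

    $prices = [];
    foreach ($xpath->query($outer) as $div) {
        // Relative query: search only inside this div and keep the first hit.
        $hit = $xpath->query('.//*[' . hasClass('price') . ']', $div)->item(0);
        if ($hit !== null) {
            $prices[] = trim($hit->textContent);
        }
    }
    return $prices;
}

$sample = '<div class="product-inner  clearfix"><span class="price">9.99</span></div>'
    . '<div class="product-inner clearfix"><p><b class="price old">12.00</b></p></div>'
    . '<div class="clearfix"><span class="price">ignored</span></div>';

print_r(findPrices($sample));
```

Because the XPath test matches whole class tokens after normalize-space(), it finds both product divs regardless of the spacing or order inside the class attribute.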

1 Comment

Here are some guidelines for "How do I write a good answer?". This answer may be correct, but it could benefit from an explanation. Code-only answers are not considered "good" answers. (From review.)
