php simple_html_dom scraping issue [duplicate]

Question

I am trying to scrape content from a site with simple_html_dom, using this code:

$html = file_get_html('http://www.aswaqcity.com/thread1230092.html');
//echo $html;
// Find all article blocks
foreach($html->find('/html/body/div[2]/div[1]/div/div/div/table[1]/tbody/tr[2]/td[2]') as $article) {
    $item['title']      = $article->find('/div[1]/strong', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);

I copied the XPath from Firebug, but nothing is scraped.

@Enissay So, are the answers to this question wrong? Not familiar with PHP, just curious. It seems to me XPath expressions can be used: simplehtmldom.sourceforge.net/manual.htm#section_find.
– Mathias Müller, Commented Jan 1, 2015 at 22:07
@MathiasMüller Scratch that, both are supported (my bad)... I tried to explore the code, but it looks like it has some encoding problem when displaying the result, which I couldn't solve...
– Enissay, Commented Jan 1, 2015 at 22:22
Please explain what you are trying to find on this page. What would be the expected output? @Enissay No worries - I misread specifications all the time myself.
– Mathias Müller, Commented Jan 1, 2015 at 22:24

pguardiario · Accepted Answer · last edited by hakre 2015-01-03 13:30:30Z


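Most likely the tbody element isn't really in the page source. Browsers add tbody to the DOM whenever it is missing from a table, so an XPath copied from a browser tool such as Firebug can contain a tbody step that matches nothing in the raw HTML your script downloads.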
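To see the effect in isolation, here is a minimal sketch. It assumes PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than simple_html_dom, so that it runs without any download, and the markup is a made-up stand-in for the real page, not its actual structure.

```php
<?php
// Made-up markup, as a server might send it: the table has no <tbody>.
$html = '<html><body><table><tr><td>skip</td></tr>'
      . '<tr><td>a</td><td><div><strong>Title</strong></div></td></tr>'
      . '</table></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Path in the style a browser inspector produces (includes the tbody step).
$withTbody = $xpath->query('/html/body/table/tbody/tr[2]/td[2]');

// The same path with the tbody step removed.
$withoutTbody = $xpath->query('/html/body/table/tr[2]/td[2]');

echo 'with tbody: ', $withTbody->length, "\n";
echo 'without tbody: ', $withoutTbody->length, "\n";
```

If the same thing is happening on the real page, dropping the tbody step from the copied path is the first thing to try.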

Also, you should be using CSS selectors instead of XPath; that is the whole point of using simple-html-dom.

answered Jan 1, 2015 at 23:12 by pguardiario; edited Jan 3, 2015 at 13:30 by hakre


3 Comments

Mathias Müller Over a year ago

Why is using CSS the point of simple-html-dom? (No criticism intended, I am asking out of curiosity)

pguardiario Over a year ago

Because you don't need simple-html-dom to get that stuff with XPath. PHP has built-in DOM functions that can do that.
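A rough sketch of what that could look like with the built-in DOM extension, mirroring the loop in the question. The helper name extractTitles and the sample markup are hypothetical, made up for illustration; the relative path ./div[1]/strong follows the question's code.

```php
<?php
// Hypothetical helper (not part of any library): for every node matched by
// $cellQuery, return the text of the <strong> inside its first <div>.
function extractTitles(string $html, string $cellQuery): array
{
    libxml_use_internal_errors(true); // real-world pages are rarely valid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $titles = [];

    $cells = $xpath->query($cellQuery);
    if ($cells === false) {
        return $titles; // malformed XPath expression
    }

    foreach ($cells as $cell) {
        // Relative query, evaluated against the current cell.
        $strong = $xpath->query('./div[1]/strong', $cell)->item(0);
        if ($strong !== null) {
            $titles[] = trim($strong->textContent);
        }
    }
    return $titles;
}

// Made-up sample: one cell with a title, one without.
$sample = '<table><tr><td><div><strong> First post </strong></div></td></tr>'
        . '<tr><td><div>no title here</div></td></tr></table>';

print_r(extractTitles($sample, '//table/tr/td'));
```

The null check matters: cells without a matching strong element are skipped instead of triggering an error on a missing node, which the chained find(...)->plaintext call in the question would not guard against.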

Mathias Müller Over a year ago

Ah, that makes sense. Thanks! Also, +1 - tbody is most likely the culprit.
