
I am trying to scrape content from a site with simple_html_dom, using this code:

require_once 'simple_html_dom.php';
$html = file_get_html('http://www.aswaqcity.com/thread1230092.html');
//echo $html;
$articles = array();
// Find all article blocks
foreach($html->find('/html/body/div[2]/div[1]/div/div/div/table[1]/tbody/tr[2]/td[2]') as $article) {
    $item['title']      = $article->find('/div[1]/strong', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles); 

I copied the XPath from Firebug, but nothing is scraped.

  • @Enissay So, are the answers to this question wrong? Not familiar with PHP, just curious. It seems to me XPath expressions can be used: simplehtmldom.sourceforge.net/manual.htm#section_find. Commented Jan 1, 2015 at 22:07
  • @MathiasMüller Scratch that, both are supported (my bad)... I tried exploring the code, but the output seems to have an encoding problem when displayed, which I couldn't solve... Commented Jan 1, 2015 at 22:22
  • Please explain what you are trying to find on this page. What would be the expected output? @Enissay No worries, I misread specifications all the time myself. Commented Jan 1, 2015 at 22:24

1 Answer


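Most likely the tbody isn't really in the page source. Browsers add a tbody to the DOM whenever a table lacks one, so the XPath that Firebug reports includes it, but simple_html_dom parses the raw HTML, where it doesn't exist.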
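
A minimal sketch of that effect. It uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom (an assumption made so it runs without the library), and the markup is a hypothetical stand-in for the real page: a table whose source has no tbody. The browser-style path containing tbody matches nothing, while the same path without it, or with // before tr, finds the cell.

```php
<?php
// Hypothetical markup: as on many real pages, the source has no <tbody>.
$source = '<html><body><table>'
        . '<tr><td>a</td><td>b</td></tr>'
        . '<tr><td>c</td><td><div><strong>Title</strong></div></td></tr>'
        . '</table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence libxml warnings on messy HTML
$doc->loadHTML($source);          // libxml does not insert <tbody> the way browsers do
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Path as a browser's DOM inspector reports it (browser-inserted <tbody>).
$withTbody = $xpath->query('/html/body/table/tbody/tr[2]/td[2]');
// Same path written against the raw source.
$withoutTbody = $xpath->query('/html/body/table/tr[2]/td[2]');
// "//tr" matches rows whether or not a <tbody> is present.
$either = $xpath->query('/html/body/table//tr[2]/td[2]');

echo $withTbody->length, "\n";    // 0
echo $withoutTbody->length, "\n"; // 1
echo $either->length, "\n";       // 1
echo trim($withoutTbody->item(0)->textContent), "\n"; // Title
```

In practice, dropping tbody from a copied path (or replacing /tbody/tr with //tr) is usually enough to make it match the parsed source.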

Also, you should be using CSS selectors instead of XPath; that's the whole point of using simple_html_dom.


3 Comments

Why is using CSS the point of simple_html_dom? (No criticism intended; I am asking out of curiosity.)
Because you don't need simple_html_dom to query with XPath. PHP's built-in DOM extension (DOMDocument and DOMXPath) can already do that.
Ah, that makes sense. Thanks! Also, +1: tbody is most likely the culprit.
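
Since PHP's DOM extension can evaluate XPath on its own, here is a sketch of the question's loop rewritten with DOMDocument and DOMXPath. The function name extract_titles and the sample markup are hypothetical; against the real page you would pass the result of file_get_contents() and a path copied without tbody. The real page's layout has not been verified here.

```php
<?php
/**
 * Collect the text of the first <div>'s <strong> inside each matching cell.
 * extract_titles is a hypothetical helper, not part of any library.
 */
function extract_titles(string $html, string $cellPath): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate messy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $articles = [];
    foreach ($xpath->query($cellPath) as $cell) {
        // Relative query evaluated with the current cell as context node.
        $strong = $xpath->query('./div[1]/strong', $cell)->item(0);
        if ($strong !== null) {
            $articles[] = ['title' => trim($strong->textContent)];
        }
    }
    return $articles;
}

// Hypothetical stand-in for the fetched page (no <tbody> in the source).
$page = '<html><body><table>'
      . '<tr><td>header</td></tr>'
      . '<tr><td>x</td><td><div><strong>First post</strong></div></td></tr>'
      . '</table></body></html>';

$articles = extract_titles($page, '//table//tr[2]/td[2]');
print_r($articles);
```

One caveat, related to the encoding trouble mentioned in the comments: DOMDocument::loadHTML() assumes ISO-8859-1 unless the page declares its charset, so UTF-8 text such as Arabic may come out garbled. Prepending '<?xml encoding="UTF-8">' to the HTML string before loading it is a common workaround.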
