I am using Simple HTML DOM Parser to scrape a website ... How can I skip a particular class while in a loop?
2 Answers
Judging from http://simplehtmldom.sourceforge.net/manual.htm#frag_find_attr you can use:
->find("div[class!=skip_me]")
Or use the DOM methods and check with ->getAttribute("class") against a value.
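For the DOM route, a minimal sketch could look like the following. It assumes PHP's built-in DOMDocument rather than Simple HTML DOM, and the sample markup and the "keep" class are made up for illustration. Because a class attribute may hold several space-separated names, the check splits it into tokens instead of comparing the whole string.

```php
<?php
// Hypothetical sample markup; "skip_me" is the class name used in the answer above.
$html = '<div class="keep">first</div>'
      . '<div class="skip_me extra">ignored</div>'
      . '<div class="keep">second</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$kept = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    // getAttribute() returns an empty string when the attribute is missing,
    // so elements without a class simply yield an empty token list.
    $classes = preg_split('/\s+/', $div->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);

    if (in_array('skip_me', $classes, true)) {
        continue; // skip this element and move on to the next one
    }

    $kept[] = $div->textContent;
}

print_r($kept);
```

If the elements only ever carry a single class, a plain `$div->getAttribute('class') === 'skip_me'` comparison would do the same job.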
Comments
// DOM can load HTML soup. But, HTML soup can throw warnings, suppress
// them.
$htmlDom = new DOMDocument();
@$htmlDom->loadHTML($html);
if ($htmlDom) {
    // It's much easier to work with simplexml than DOM, luckily enough
    // we can just simply import our DOM tree.
    $elements = simplexml_import_dom($htmlDom);
}
This is (almost) a quote from Drupal 7 SimpleTest. After this, it is a lot easier to work with the document; the class can be reached as $element['class'].
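As a rough sketch of how the loop might look once the tree is in SimpleXML: the markup, the xpath() query and the variable names other than $htmlDom and $elements are assumptions for illustration, not part of the Drupal code.

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<html><body>'
      . '<div class="keep">first</div>'
      . '<div class="skip_me">ignored</div>'
      . '<div class="keep">second</div>'
      . '</body></html>';

$htmlDom = new DOMDocument();
$htmlDom->loadHTML($html);
$elements = simplexml_import_dom($htmlDom);

$texts = [];
foreach ($elements->xpath('//div') as $element) {
    // Attribute access returns a SimpleXMLElement, so cast it to a string
    // before comparing it with the class name.
    if ((string) $element['class'] === 'skip_me') {
        continue;
    }
    $texts[] = (string) $element;
}

print_r($texts);
```

The filter could also be pushed into the query itself, for example `//div[not(@class="skip_me")]`, which avoids the check inside the loop.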
1 Comment
Gordon
The correct way to suppress parsing errors is to use libxml_use_internal_errors() and then clear them with libxml_clear_errors(). If you use the Error Suppression Operator, it will suppress any error, not just the parsing errors.
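A minimal sketch of that approach, assuming some deliberately sloppy sample markup (the HTML string and variable names are made up for illustration):

```php
<?php
// Hypothetical malformed markup: the <p> is never closed and <bogus> is not a real tag.
$html = '<div class="keep"><p>unclosed<bogus></div>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$htmlDom = new DOMDocument();
$loaded = $htmlDom->loadHTML($html);

// Inspect the collected errors if needed (an array of LibXMLError objects),
// then empty the buffer so it does not grow across calls.
$parseErrors = libxml_get_errors();
libxml_clear_errors();

// Restore whatever error handling mode was active before.
libxml_use_internal_errors($previous);

foreach ($parseErrors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

Unlike the @ operator, this only affects libxml's own parse errors; any other warning or error raised around the call still surfaces as usual.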