How can I parse background images and other images from a web page with PHP (Simple HTML DOM, etc.)?

case 1: inline css

<div id="id100" style="background:url(/mycar1.jpg)"></div>

case 2: css inside html page

<div id="id100"></div>

<style type="text/css">
#id100{
background:url(/mycar1.jpg);
}
</style>

case 3: separate css file

<div id="id100"></div>

external.css

#id100{
background:url(/mycar1.jpg);
}

case 4: image inside img tag

Solution to case 4, as it appears in the PHP Simple HTML DOM Parser documentation:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
       echo $element->src . '<br>';

Please help me parse cases 1, 2, and 3.

If more cases exist, please list them, with a solution if you can.

Thanks

  • Getting the content out of HTML files with libraries like DOM has been answered numerous times before (including today). External CSS files cannot be processed by an SGML/XML library. Also note that node content is just character data to those libraries. You have to find an additional parser if you want to parse the content as CSS. Commented Jul 10, 2010 at 19:48

2 Answers

For Case 1:

// Create DOM from URL or file 
$html = file_get_html('http://www.google.com/');

// Get the style attribute for the item
$style = $html->getElementById("id100")->getAttribute('style');

// $style now holds the string "background:url(/mycar1.jpg)".
// Pass it to a CSS parser, or use a regular expression, to extract the URL.
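
The "regular expression magic" mentioned above could look roughly like the following. This is only a sketch: `extract_css_urls` is a hypothetical helper name, not part of Simple HTML DOM, and the pattern handles plain `url(...)` values with optional quotes, not every construct a real CSS parser would accept.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM):
// returns the target of every url(...) found in a CSS string.
function extract_css_urls(string $css): array
{
    // Matches url(x), url('x') and url("x"), allowing whitespace
    // between the parentheses and the (optionally quoted) value.
    $pattern = '/url\(\s*([\'"]?)(.*?)\1\s*\)/i';
    preg_match_all($pattern, $css, $matches);

    // Group 2 holds the URL without the surrounding quotes.
    return $matches[2];
}

// The value of the style attribute from case 1.
$style = 'background:url(/mycar1.jpg)';
$urls = extract_css_urls($style);
print_r($urls);
```

The same function can be applied to the text of a `<style>` block or to a downloaded stylesheet. Note that it also returns `data:` URIs if the CSS contains any, so those may need filtering.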

For Case 2/3:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Case 2: get the embedded <style> elements (they may appear in the head or the body)
$styles = $html->find('style');

// Case 3: get the <link> elements that reference external stylesheets
$links = $html->find('link[rel=stylesheet]');

// $styles is an array of <style> elements: pass each element's innertext to a CSS parser.
// $links is an array of <link> elements: a <style> element has no src attribute, so external CSS is found through the href attribute of each <link>. Download that file and pass its contents to the CSS parser.
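
As a rough, self-contained illustration of cases 1 to 3 together, the sketch below uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, so it runs without any extra library. The function name `collect_css_sources`, the sample markup, and the second image name are assumptions made for the example. It collects `url(...)` values from style attributes and `<style>` elements, and lists the `href` of each `<link rel="stylesheet">` so those files can be downloaded and scanned the same way.

```php
<?php
// Hypothetical helper: collects CSS image URLs and external stylesheet
// hrefs from an HTML string using the built-in DOM extension.
function collect_css_sources(string $html): array
{
    // Keep libxml from emitting warnings on imperfect real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $pattern = '/url\(\s*([\'"]?)(.*?)\1\s*\)/i';

    // Case 1: style attributes. Case 2: <style> elements.
    $images = [];
    foreach ($xpath->query('//@style | //style') as $node) {
        if (preg_match_all($pattern, $node->nodeValue, $m)) {
            $images = array_merge($images, $m[2]);
        }
    }

    // Case 3: external stylesheets referenced by <link rel="stylesheet">.
    $stylesheets = [];
    foreach ($xpath->query('//link[@rel="stylesheet"]/@href') as $href) {
        $stylesheets[] = $href->nodeValue;
    }

    return [
        'images'      => array_values(array_unique($images)),
        'stylesheets' => $stylesheets,
    ];
}

// Sample markup (assumed for illustration).
$page = '<html><head><link rel="stylesheet" href="external.css">'
      . '<style type="text/css">#id200{background:url("/mycar2.jpg");}</style></head>'
      . '<body><div id="id100" style="background:url(/mycar1.jpg)"></div></body></html>';

print_r(collect_css_sources($page));
```

Each entry in `stylesheets` still has to be fetched and scanned separately. Relative URLs inside an external stylesheet are resolved against the stylesheet's own URL, not the URL of the page, so that base has to be used when turning them into absolute addresses.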

To extract <img> elements from the page, you can try something like:

$doc = new DOMDocument(); 
$doc->loadHTML("<html><body>Foo<br><img src=\"bar.jpg\" title=\"Foo bar\" alt=\"alt\"></body></html>"); 
$xml = simplexml_import_dom($doc);
$images = $xml->xpath('//img'); 
foreach ($images as $img) 
    echo $img['src'] . ' ' . $img['alt'] . ' ' . $img['title']; 

See the DOMDocument documentation for more details.

2 Comments

DOMElement implements/allows ArrayAccess?
I already wrote the solution for the img tag; I am asking only about CSS background images.
