turn HTML into a PHP array

Question

I have a string that also contains HTML in a $html variable:

'Here is some <a href="#">text</a> which I do not need to extract but then there are 
<figure class="class-one">
    <img src="/example.jpg" alt="example alt" class="some-image-class">
    <figcaption>example caption</figcaption>
</figure>

And another one (and many more)
<figure class="class-one some-other-class">
    <img src="/example2.jpg" alt="example2 alt">
</figure>'

I want to extract all <figure> elements and everything they contain, including their attributes and nested HTML elements, and put this into a PHP array, so that I get something like:

    $figures = [
        0 => [
            "class" => "class-one",
            "img" => [
                "src" => "/example.jpg",
                "alt" => "example alt",
                "class" => "some-image-class"
            ],
            "figcaption" => "example caption"
        ],
        1 => [
            "class" => "class-one some-other-class",
            "img" => [
                "src" => "/example2.jpg",
                "alt" => "example2 alt",
                "class" => null
            ],
            "figcaption" => null
        ]];

So far I have tried:

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$figures = array();
foreach ($figures as $figure) {
    $figures['class'] = $figure->getAttribute('class');
    // here I tried to create the whole array but I can't seem to get the values from the HTML 
    // also I'm not sure how to get all html-elements within <figure>   
}

Here is a Demo.

You are overwriting the $figures variable before the loop.
– msg, Commented Jun 16, 2019 at 11:35
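
To illustrate the comment above, here is a minimal sketch of the attempt with the node list kept in its own variable, so the loop no longer iterates over the empty result array. The name $figureNodes and the shortened sample input are assumptions made for illustration only.

```php
<?php
// Shortened sample input (assumption: stands in for the question's $html)
$html = '<figure class="class-one"><img src="/example.jpg" alt="example alt"></figure>'
      . '<figure class="class-one some-other-class"><img src="/example2.jpg" alt="example2 alt"></figure>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Keep the DOM nodes in their own variable ($figureNodes is a hypothetical name)
$figureNodes = $dom->getElementsByTagName('figure');

$figures = array();
foreach ($figureNodes as $figure) {
    // Append one entry per <figure> instead of overwriting a single key
    $figures[] = array('class' => $figure->getAttribute('class'));
}

print_r($figures);
```

The remaining keys (img, figcaption) can then be filled in per figure inside the same loop.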

NickyTheWrench · Accepted Answer · 2019-06-16 13:08:28Z

4

Here is the code that should get you where you want to be. I have added comments where I felt they would be helpful:

<?php

$htmlString = 'Here is some <a href="#">text</a> which I do not need to extract but then there are <figure class="class-one"><img src="/example.jpg" alt="example alt" class="some-image-class"><figcaption>example caption</figcaption></figure>And another one (and many more)<figure class="class-one some-other-class"><img src="/example2.jpg" alt="example2 alt"></figure>';

//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML.
@$dom->loadHTML($htmlString);

//Create a DOMXPath object so we can query the document
$xp = new DOMXPath($dom);

//Create empty figures array that will hold all of our parsed HTML data
$figures = array();

//Get all <figure> elements
$figureElements = $xp->query('//figure');

//Create number variable to keep track of our $figures array index
$figureCount = 0;

//Loop through each <figure> element
foreach ($figureElements as $figureElement) {
    //The leading "." makes each path relative to the current <figure>; a bare "//img" would search the whole document
    $img = $xp->query('.//img', $figureElement)->item(0);
    $figcaption = $xp->query('.//figcaption', $figureElement)->item(0);

    $figures[$figureCount]["class"] = trim($figureElement->getAttribute('class'));

    //Check that an <img> element exists before reading its attributes, otherwise set the values to null
    if ($img) {
        $figures[$figureCount]["img"]["src"] = $img->getAttribute('src');
        $figures[$figureCount]["img"]["alt"] = $img->getAttribute('alt');
        //getAttribute() returns an empty string for a missing attribute, so test hasAttribute() to store null instead
        $figures[$figureCount]["img"]["class"] = $img->hasAttribute('class') ? $img->getAttribute('class') : null;
    } else {
        $figures[$figureCount]["img"] = array("src" => null, "alt" => null, "class" => null);
    }

    //Check that a <figcaption> element exists, otherwise set the value to null
    $figures[$figureCount]["figcaption"] = $figcaption ? $figcaption->nodeValue : null;

    //Increment our $figureCount so that we know we can create a new array index.
    $figureCount++;
}

print_r($figures);
?>


2 Comments

Dirk J. Faber Over a year ago

Those comments are very helpful indeed. I have just one more question about this solution: is there a benefit to using DOMXpath instead of only using DOMDocument to get all the values?

NickyTheWrench Over a year ago

Glad to help! Yes, I use XPath so that I can easily target the HTML elements that I’m looking to parse, as well as check whether the child elements and attributes actually exist using XPath’s evaluate.
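
One detail worth illustrating about XPath context nodes: a path that starts with `//` searches the whole document even when a context node is passed, while `.//` searches only below that node. The following is a small sketch with a made-up two-figure input (the sample markup and variable names are assumptions, not taken from the answer).

```php
<?php
// Made-up sample input for illustration
$html = '<figure><img src="/a.jpg"></figure>'
      . '<figure><img src="/b.jpg"><figcaption>B</figcaption></figure>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$second = $xp->query('//figure')->item(1);

// Absolute path: the context node does not restrict the search,
// so every <img> in the document matches
$absolute = $xp->query('//img', $second);

// Relative path: only the <img> inside the second <figure> matches
$relative = $xp->query('.//img', $second);

// evaluate() can also return scalar results, such as a count or a string
$captionCount = $xp->evaluate('count(.//figcaption)', $second);
$firstSrc = $xp->evaluate('string(.//img/@src)', $second);

echo $absolute->length, ' ', $relative->length, ' ', $captionCount, ' ', $firstSrc, PHP_EOL;
```

With plain DOMDocument, getElementsByTagName() called on an element is already limited to that element's descendants, so the main gain from XPath is richer selection (attributes, predicates, counts) in a single expression.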

Shivendra Singh · Accepted Answer · 2019-06-16 14:44:26Z

2

$doc = new \DOMDocument();
// Collect libxml warnings about HTML5 tags such as <figure> instead of emitting them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$figureNodes = $doc->getElementsByTagName('figure'); // DOMNodeList object

// Create an array to hold the values of every <figure>
$figures = array();
foreach ($figureNodes as $item) { // DOMElement object
    // DOMElement::getElementsByTagName() searches only inside $item; item(0) is null when nothing matches
    $img = $item->getElementsByTagName('img')->item(0);
    $caption = $item->getElementsByTagName('figcaption')->item(0);

    $figures[] = array(
        // DOMElement::getAttribute() returns the attribute value, or '' when the attribute is missing
        'class' => $item->getAttribute('class'),
        'img' => $img ? array(
            'src' => $img->getAttribute('src'),
            'alt' => $img->getAttribute('alt'),
            'class' => $img->hasAttribute('class') ? $img->getAttribute('class') : null,
        ) : null,
        // textContent holds the text inside the element
        'figcaption' => $caption ? $caption->textContent : null,
    );
}

echo "<pre>";
print_r($figures);
echo "</pre>";

edited Jun 16, 2019 at 14:44

answered Jun 16, 2019 at 12:54


2 Comments

Dirk J. Faber Over a year ago

I get an error, "Trying to get property 'textContent' of non-object", because there is no check that the object actually exists.

Shivendra Singh Over a year ago

Yes, that was a warning. I added a condition to check the object.
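
The check discussed in these comments could also be wrapped in a small helper so that the null handling lives in one place. This is only a sketch: firstDescendantText is a hypothetical helper name, not part of the DOM extension, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Return the text of the first descendant with the given tag name,
 * or null when the parent has no such descendant.
 * (Hypothetical helper, written for illustration.)
 */
function firstDescendantText(DOMElement $parent, string $tag): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? null : $node->textContent;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
// Made-up sample: the first figure has a caption, the second does not
$doc->loadHTML(
    '<figure><img src="/a.jpg"><figcaption>caption</figcaption></figure>'
    . '<figure><img src="/b.jpg"></figure>'
);
libxml_clear_errors();

$captions = array();
foreach ($doc->getElementsByTagName('figure') as $figure) {
    // No "property of non-object" warning: a missing caption becomes null
    $captions[] = firstDescendantText($figure, 'figcaption');
}

var_dump($captions);
```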
