I have a PHP variable that contains an HTML document. I'm trying to extract each li > span and li > strong pair into an associative array.

The HTML in the $html variable is:

<ul class="ul-data" xmlns:utils="urn:utils" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <li><span>
          Vehicle make
        </span><strong>CITROEN</strong></li>
  <li><span>
            Year of manufacture
          </span><strong>1997</strong></li>
  <li><span>
          Cylinder capacity (cc)
        </span><strong>1124cc
        </strong></li>
  <li><span>
          Fuel type
        </span><strong>PETROL</strong></li>
  <li><span>
          Vehicle colour
        </span><strong>BLUE</strong></li>
  <li><span>
          Vehicle type approval
        </span><strong>
              Not available
            </strong></li>
</ul>

and the code I have so far

$dom = new DOMDocument();
//as @Larry.Z comments, you forgot to load the $html
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

//assuming there can be more than one "result set" on each page
$results = array();

$result_divs = $xpath->query('//ul[@class="ul-data"]');
foreach ($result_divs as $result_div) {
    $result=array();
    foreach ($result_div->childNodes as $result_item) {
        $content=trim($result_item->textContent);
        if ($content!='') $result[]=$content;
    } 
    $results[]=$result;
}

echo '<pre>';
print_r($results);
echo '</pre>';

which prints out

Array
(
    [0] => Array
        (
            [0] => Vehicle make
        CITROEN
            [1] => Date of first registration
            27 August 1997
            [2] => Year of manufacture
          1997
            [3] => Cylinder capacity (cc)
        1124cc
            [4] => Fuel type
        PETROL
            [5] => Vehicle colour
        BLUE
            [6] => Vehicle type approval

              Not available
        )

)

How can I get it to build an associative array like

[Vehicle make] => CITROEN

The issue is that I need the text of each li > span as the key and the text inside the matching <strong> as the value.

  • If you are parsing a remote site, I can recommend the Simplehtmldom library. It works like a charm. Commented Nov 26, 2015 at 15:55
  • Can you not run another XPath query on each childNode to extract the content of the <span> tag and the <strong> tag separately? Commented Nov 26, 2015 at 16:07

1 Answer

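Because your HTML has only a single ul, there is no need for the outer loop. You can grab all li tags and read the first and second child nodes of each. In your markup these are the span and the strong, since no whitespace separates the tags inside each li: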

$dom = new DOMDocument();
$dom->loadHTML($html);

$results = array();

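foreach ($dom->getElementsByTagName('li') as $li) {
    // item(0) is the <span> (label), item(1) is the <strong> (value);
    // trim() strips the newlines and indentation around the text
    $key = trim($li->childNodes->item(0)->textContent);
    $results[$key] = trim($li->childNodes->item(1)->textContent);
}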

print_r($results);
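
If the markup is ever reformatted so that whitespace ends up between the tags, indexing childNodes by position could pick up text nodes instead of the elements. A minimal sketch of an alternative, along the lines of the comment about running another XPath query per item: it assumes the list keeps the ul-data class, and the shortened $html sample below is made up for illustration rather than taken from the real page.

```php
<?php
// Hypothetical sample input, shortened from the markup in the question.
$html = <<<HTML
<ul class="ul-data">
  <li><span>
          Vehicle make
        </span><strong>CITROEN</strong></li>
  <li><span>
          Vehicle type approval
        </span><strong>
              Not available
            </strong></li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$results = array();
foreach ($xpath->query('//ul[@class="ul-data"]/li') as $li) {
    // Passing $li as the context node makes each expression relative to that <li>.
    // normalize-space() trims the text and collapses internal runs of whitespace.
    $key   = $xpath->evaluate('normalize-space(span)', $li);
    $value = $xpath->evaluate('normalize-space(strong)', $li);
    if ($key !== '') {
        $results[$key] = $value;
    }
}

print_r($results);
```

Selecting span and strong by name rather than by position means the loop does not depend on how the source is indented, and normalize-space() replaces the separate trim() calls.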