I've lost half a day on this now, and I'm no expert. Why does traversing and manipulating XML with PHP seem harder than handling the data on paper? Why can't there be a simple system like jQuery for this?
I've been trying to delete some elements from a long list (580 elements) based on a simple condition: if (element['attr'] == value) {remove element}, but I can't get it to work.
This is my code:

$xml = simplexml_load_file('xml/suchia.xml');
$dom = new DOMDocument('1.0');
$dom->loadXML($xml->asXML());
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
foreach ($dom->getElementsByTagName('image') as $node) {
   echo 'Checking '.$node->getAttribute('id').'<br />';
   if ($node->getAttribute('value') == 'useless') {
      echo $node->getAttribute('id').' deleted.<br />';
      $node->parentNode->removeChild($node);
   }
}
$dom->save('xml/suchia.xml');

The main problem, judging by the first echo, is that the foreach doesn't seem to visit every element. Even a simple loop-through seems impossible for longer lists (my XML file is roughly 180,000 characters).

XML (shortened; it may not be possible to reproduce my problem with a small XML file):

<?xml version="1.0"?>
<suchia>
  <image id="1" value="useless">
    <sources>
      <src>a</src>
    </sources>
  </image>
  <image id="2" value="useless">
    <sources>
      <src>b</src>
    </sources>
  </image>
  <image id="3" value="useless">
    <sources>
      <src>c</src>
    </sources>
  </image>
  <image id="4" value="useless">
    <sources>
      <src>d</src>
    </sources>
  </image>
  <image id="5" value="useless">
    <sources>
      <src>e</src>
    </sources>
  </image>
  <image id="6" value="useless">
    <sources>
      <src>f</src>
    </sources>
  </image>
  <image id="7" value="useless">
    <sources>
      <src>g</src>
    </sources>
  </image>
  <image id="8" value="useful">
    <sources>
      <src>h</src>
    </sources>
  </image>
</suchia>

1 Answer

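Because you're removing nodes from the parent while iterating through the live DOMNodeList, the iterator only sees every other node: each removal shifts the remaining nodes down by one position, so the next step of the loop skips the node that just moved into the freed slot. As Ghost suggests, using XPath allows iteration while removing nodes, because the list returned by an XPath query is not updated when the document changes.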

If you set preserveWhiteSpace in the appropriate place (it needs to be set before the XML is parsed, whereas formatOutput applies only to the output), then the extra whitespace won't be present in the output.

<?php

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->load('input.xml');

$xpath = new DOMXPath($doc);
// Without a context node, the query is evaluated relative to the root element (<suchia>).
$nodes = $xpath->query('image[@value="useless"]');

printf("Removing %d useless images\n", $nodes->length);

foreach ($nodes as $node) {
  $node->parentNode->removeChild($node);
}

$doc->formatOutput = true;
$doc->save('output.xml');
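
If you'd rather keep getElementsByTagName, a possible alternative is to copy the live list into a plain array before removing anything. This is only a minimal sketch: the inline XML is a made-up stand-in for xml/suchia.xml, and the variable names ($images, $removed, $remaining) are assumptions for illustration, not part of the original code.

```php
<?php

// Hypothetical inline sample standing in for xml/suchia.xml.
$xml = <<<XML
<suchia>
  <image id="1" value="useless"><sources><src>a</src></sources></image>
  <image id="2" value="useless"><sources><src>b</src></sources></image>
  <image id="3" value="useless"><sources><src>c</src></sources></image>
  <image id="4" value="useful"><sources><src>d</src></sources></image>
</suchia>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // has to be set before the XML is parsed
$doc->loadXML($xml);

// Snapshot the live DOMNodeList into an array, so that removing a node
// cannot shift the nodes that are still waiting to be visited.
$images = iterator_to_array($doc->getElementsByTagName('image'));

$removed = 0;
foreach ($images as $node) {
    if ($node->getAttribute('value') === 'useless') {
        $node->parentNode->removeChild($node);
        $removed++;
    }
}

// Count what is left by asking the document again (a fresh live list).
$remaining = $doc->getElementsByTagName('image')->length;

printf("Removed %d, %d remaining\n", $removed, $remaining);

$doc->formatOutput = true;
echo $doc->saveXML();
```

The loop here runs over the array copy, not over the live list, so every image element gets checked exactly once. Iterating the live list backwards by index would be another way to get the same effect.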