
I have a problem at the moment. I want to modify some XML values; for example, I want to remove the "<![CDATA[" and "]]>" markers from the values.

The strange thing is that it works for title, price and image_link, but not for url.

This is my code:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load('data/kinguin.xml');

$past = time();
echo '(Kinguin) - Starting to remove tags' . "\n";
deleteChildren($dom, 'id');
echo '(Kinguin) - id removed' . "\n";
deleteChildren($dom, 'description');
echo '(Kinguin) - description removed' . "\n";
deleteChildren($dom, 'google_product_category');
echo '(Kinguin) - google_product_category removed' . "\n";
deleteChildren($dom, 'brand');
echo '(Kinguin) - brand removed' . "\n";
deleteChildren($dom, 'mpn');
echo '(Kinguin) - mpn removed' . "\n";
deleteChildren($dom, 'condition');
echo '(Kinguin) - condition removed' . "\n";
deleteChildren($dom, 'product_type');
echo '(Kinguin) - product_type removed' . "\n";
deleteChildren($dom, 'availability');
echo '(Kinguin) - availability removed' . "\n";
deleteChildren($dom, 'quantity');
echo '(Kinguin) - quantity removed' . "\n";
deleteChildren($dom, 'identifier_exists');
echo '(Kinguin) - identifier_exists removed' . "\n";

removeCDATA($dom, 'title');
echo '(Kinguin) - title CDATA removed' . "\n";
removeCDATA($dom, 'price');
echo '(Kinguin) - price CDATA removed' . "\n";
removeCDATA($dom, 'image_link');
echo '(Kinguin) - image_link CDATA removed' . "\n";
removeCDATA($dom, 'url');
echo '(Kinguin) - url CDATA removed' . "\n";

$dom->saveXML();
$dom->save('data/kinguin.xml');

$xml = file_get_contents('data/kinguin.xml');
renameTags($xml, 'link', 'url', 'data/kinguin.xml');
echo '(Kinguin) - Renamed link' . "\n";

$now = time();
echo "(Kinguin) - Time needed: " . ($now - $past) . "s" . "\n";
echo "\n";

Functions:

function deleteChildren($dom, $children){
    $root = $dom->documentElement;
    $marker = $root->getElementsByTagName($children);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $child = $marker->item($i);
        $marker->item($i)->parentNode->removeChild($child);
    }
}

function renameTags($xml, $old, $new, $path){
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dom->loadXML($xml);

    $nodes = $dom->getElementsByTagName($old);
    $toRemove = array();
    foreach ($nodes as $node) {
        $newNode = $dom->createElement($new);
        foreach ($node->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }

        foreach ($node->childNodes as $child) {
            $newNode->appendChild($node->removeChild($child));
        }

        $node->parentNode->appendChild($newNode);
        $toRemove[] = $node;
    }

    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }

    $dom->saveXML();
    $dom->save($path);
}
function removeCDATA($dom, $tagName){

    $root = $dom->documentElement;
    $marker = $root->getElementsByTagName($tagName);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $rename = $marker->item($i)->textContent;
        $newValue = preg_replace('/(<!\[CDATA\[)/', '', $rename);
        $newValue = preg_replace('/(]]>)/', '', $newValue);
        $newValue = preg_replace('/( EUR)/', '', $newValue);
        //ey-Shop\Cronjob.php on line 350 PHP Warning:  preg_replace(): Delimiter must not be alphanumeric or backslash in 351

        $marker->item($i)->nodeValue = $newValue;
    }
}

This is the XML output (excerpt):

<?xml version="1.0" encoding="UTF-8"?>
<rss>
  <channel xmlns:g="http://base.google.com/ns/1.0" version="2.0">
    <title>google_EUR_english_1</title>
    <item>
      <title>Anno 2070 Uplay CD Key</title>
      <g:price>3.27</g:price>
      <g:image_link>http://cdn.kinguin.net/media/catalog/category/anno_8.jpg</g:image_link>
      <url><![CDATA[http://www.kinguin.net/category/4/anno-2070/?nosalesbooster=1&country_store=1&currency=EUR]]></url>
    </item>
    <item>
      <title>Anno 2070: Deep Ocean DLC Uplay CD Key</title>
      <g:price>4.75</g:price>
      <g:image_link>http://cdn.kinguin.net/media/catalog/category/anno-2070-deep-ocean-releasing-this-spring-1089268_1.jpg</g:image_link>
      <url><![CDATA[http://www.kinguin.net/category/5/anno-2070-deep-ocean-expansion-pack-dlc/?nosalesbooster=1&country_store=1&currency=EUR]]></url>
    </item>
    <item>

These are the error messages:

Warning: removeCDATA(): unterminated entity reference  All Stars-Racing Transformed RU VPN in C:\Users\Jan\PhpstormProjects\censored\Cronjob.php on line 353
PHP Warning:  removeCDATA(): unterminated entity reference  SUV DLC Steam Gift in C:\Users\Jan\PhpstormProjects\censored\Cronjob.php on line 353

Line 353:

$marker->item($i)->nodeValue = $newValue;

Greetings, and thanks!

Comment:
    Why do you think you need to remove a CDATA section from an XML document? Any XML parser can handle it. And if you still think you need to do it, then I think doing $marker->item($i)->textContent = $marker->item($i)->textContent; suffices, as textContent is a plain string anyway. Commented Jan 3, 2017 at 10:44

2 Answers


If you really think you need to remove any CDATA section(s) from an element node, then simply do $foo->textContent = $foo->textContent; see http://sandbox.onlinephpfunctions.com/code/cca5093433218c7c134f120725988fe6808f906c which does

function removeCDATA($dom, $tagName){

    $marker = $dom->getElementsByTagName($tagName);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $marker->item($i)->textContent = $marker->item($i)->textContent;
    }
}

   $xml = '<root><items><item><url><![CDATA[http://example.com/search?a=1&b=2&c=3]]></url></item><item><url><![CDATA[http://example.com/search?a=4&b=5&c=6]]></url></item></items></root>';

   $doc = new DOMDocument();
   $doc->loadXML($xml);

   removeCDATA($doc, 'url');

   echo $doc->saveXML();

and outputs

<root><items><item><url>http://example.com/search?a=1&amp;b=2&amp;c=3</url></item><item><url>http://example.com/search?a=4&amp;b=5&amp;c=6</url></item></items></root>
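Why the question's version warns only for some values: it writes the cleaned string back through nodeValue. At least in the PHP versions current when this was asked, assigning to nodeValue on an element appears to hand the string to libxml as content that may contain entity references, so a bare & (every URL here has one in its query string, and judging by the warnings some titles do too) produces "unterminated entity reference". Also, textContent is returned already decoded and without the <![CDATA[ ... ]]> markers, so the preg_replace() calls for those markers never match anything. A minimal sketch of an alternative that does not rely on either property setter, assuming PHP 7.1+ for the type declarations, is to clear the element and append a text node built with createTextNode(). The name removeCDATAPortable() is hypothetical.

```php
<?php
// Hypothetical helper: replace each <$tagName> element's children
// (text and/or CDATA sections) with a single plain text node.
function removeCDATAPortable(DOMDocument $dom, string $tagName): void
{
    // getElementsByTagName() returns a live list; copy it to an array
    // so that modifying the tree cannot disturb the iteration.
    $elements = iterator_to_array($dom->getElementsByTagName($tagName));

    foreach ($elements as $element) {
        // textContent already excludes the <![CDATA[ ... ]]> markers.
        $text = $element->textContent;

        // Remove every existing child node.
        while ($element->firstChild !== null) {
            $element->removeChild($element->firstChild);
        }

        // createTextNode() stores the string literally; the serializer
        // escapes & as &amp; when the document is saved.
        $element->appendChild($dom->createTextNode($text));
    }
}

$xml = '<root><item><url><![CDATA[http://example.com/search?a=1&b=2]]></url></item></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
removeCDATAPortable($doc, 'url');

// Prints: <root><item><url>http://example.com/search?a=1&amp;b=2</url></item></root>
echo $doc->saveXML($doc->documentElement), "\n";
```

Because createTextNode() always treats its argument as literal text, the & ends up escaped as &amp; in the saved file, the same result the textContent assignment above produces.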

If you strip the CDATA markers from the text, you end up with an element containing a bare & character. That is not legal XML: outside a CDATA section, a literal & must be written as an entity or character reference such as &amp;.

This is why the CDATA section is there in the first place, and it should probably be left as-is for the consuming parser to handle.
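The point of this answer can be checked with a small PHP sketch (the sample URL is made up). Stripping the markers from the serialized XML as plain text leaves a bare & that DOMDocument::loadXML() rejects, while letting the DOM rewrite the text keeps the document well-formed, because the serializer writes & as &amp;.

```php
<?php
$xml = '<item><url><![CDATA[http://example.com/?a=1&b=2]]></url></item>';

// 1) Strip the markers as plain text: this leaves a bare & in character data.
$stripped = str_replace(['<![CDATA[', ']]>'], '', $xml);

libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings
$broken = new DOMDocument();
$ok = $broken->loadXML($stripped);  // false: the document is not well-formed
libxml_clear_errors();

echo 'string edit parses: ', var_export($ok, true), "\n";  // string edit parses: false

// 2) Let the DOM do it: textContent is read without the CDATA markers,
//    and writing it back stores literal text that is escaped on output.
$doc = new DOMDocument();
$doc->loadXML($xml);
$url = $doc->getElementsByTagName('url')->item(0);
$url->textContent = $url->textContent;

$saved = $doc->saveXML($doc->documentElement);
echo $saved, "\n";  // <item><url>http://example.com/?a=1&amp;b=2</url></item>

// The saved output parses again and yields the original URL.
$check = new DOMDocument();
$check->loadXML($saved);
echo $check->getElementsByTagName('url')->item(0)->textContent, "\n";
```

Either form, CDATA or &amp;, carries the same URL to a conforming parser; only the text-level edit breaks the document.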

Comment:

It's not a question of whether the link works. Before the link can work, you need a well-formed XML document, and if you edit your document so that it becomes ill-formed, there's no way to even extract the link to see whether it works.
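A related detail in the question's script: removeCDATA($dom, 'url') runs before renameTags() turns <link> into <url>. Assuming the source feed has <link> rather than <url> elements (the output, where <url> is always the last child of <item>, suggests they were appended by renameTags()), there are no url elements to clean at that point, and the CDATA section is simply moved along with the other children. renameTags() also removes children from a live childNodes list while iterating it, which may not move every child when there is more than one. The sketch below renames first and then flattens the text; renameElements() and flattenText() are hypothetical names, and replaceChild() keeps the renamed element in its original position instead of moving it to the end of <item>.

```php
<?php
// Hypothetical replacement for renameTags(): renames elements in an
// already-loaded DOMDocument, keeping each one in its original position.
function renameElements(DOMDocument $dom, string $old, string $new): void
{
    // Snapshot the live NodeList, since replaced elements drop out of it.
    $nodes = iterator_to_array($dom->getElementsByTagName($old));

    foreach ($nodes as $node) {
        $newNode = $dom->createElement($new);

        foreach ($node->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }

        // appendChild() detaches each node from $node, so always taking
        // the current first child moves every child exactly once.
        while ($node->firstChild !== null) {
            $newNode->appendChild($node->firstChild);
        }

        $node->parentNode->replaceChild($newNode, $node);
    }
}

// Hypothetical: the question's removeCDATA() idea, but assigning through
// textContent (as the first answer suggests) instead of nodeValue.
function flattenText(DOMDocument $dom, string $tagName): void
{
    foreach (iterator_to_array($dom->getElementsByTagName($tagName)) as $element) {
        $element->textContent = $element->textContent;
    }
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML('<rss><channel><item><title>A &amp; B</title>'
    . '<link><![CDATA[http://example.com/?x=1&y=2]]></link></item></channel></rss>');

// Rename first, so the <url> elements exist when their CDATA is flattened.
renameElements($dom, 'link', 'url');
flattenText($dom, 'url');

echo $dom->saveXML($dom->documentElement), "\n";
```

With this order the whole pipeline can also stay on one DOMDocument, instead of saving the file and loading it again for the rename step.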
