
I'm developing a website and I need to load an XML file, say test.xml.

The XML nodes are well-formed, but the values inside them aren't. The value of every node is a string wrapped in CDATA, but the CDATA markup isn't always well-formed. Example:

<root>
 <data>
   <value1><![CDATA[Some value]]></value1>
   <value2><![CDATA[ ]]></value2>
   <value3>![CDATA[  ]]></value3>
 </data>
</root>

The original XML structure is more complex, but this example shows how CDATA is used. In node value3, the CDATA section isn't valid (the '<' character before '![CDATA[' is missing).

I tried to load the file with the following code:

<?php
  $xml = simplexml_load_file("test.xml"); 
?>

but I got warnings.

Then I tried LIBXML_NOCDATA, but that didn't help. The second version I tried was:

<?php
  $xml = simplexml_load_file("test.xml", null, LIBXML_NOCDATA); 
  //$xml = simplexml_load_file("test.xml", 'SimpleXMLElement', LIBXML_NOCDATA); 
?>

but I still get warnings (with either line).

Is it possible to load the file and then read values from it (e.g. $xml->data->value3), or not?

  • LIBXML_NOCDATA is not a magic bullet, and contrary to persistent myths, it is actually pretty useless with SimpleXML, because SimpleXML handles CDATA rather nicely by itself. I explained a bit about what it does here: stackoverflow.com/a/13981917/157957 Your problem is much more mundane: you have broken XML; the fact that the broken bits should be CDATA sections doesn't help, because they're broken, so they're not. Commented May 5, 2014 at 1:29
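To illustrate the comment's point, here is a minimal sketch that reads a well-formed CDATA value without LIBXML_NOCDATA. The inline XML string is an assumption based on the question's value1 node, not the asker's real file.

```php
<?php
// Well-formed sample modelled on the question's <value1> node (assumed data).
$xml = simplexml_load_string(
    '<root><data><value1><![CDATA[Some value]]></value1></data></root>'
);

// Casting the element to string returns the CDATA text;
// no LIBXML_NOCDATA flag is passed anywhere.
$value = (string) $xml->data->value1;

echo $value, PHP_EOL; // prints "Some value"
```

So the flag is not what stands between the asker and the data; the malformed value3 node is.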

2 Answers


This is not a valid XML file, so you should repair it before use. The simplest way is to use the Tidy extension that ships with PHP:

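<?php
error_reporting(E_ALL);
$file = '1.xml'; // path to the broken XML file

$tidy = new tidy();
// Let Tidy parse the input as XML and return a repaired string
$repaired = $tidy->repairFile($file, array(
    'input-xml' => true,     // treat the input as XML, not HTML
    'escape-cdata' => false  // keep CDATA sections instead of converting them to text
));
// Load the repaired markup and inspect the resulting object
var_dump(simplexml_load_string($repaired));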
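If the Tidy extension is not available, a different approach (not the one this answer uses) is to patch the one known defect in the raw text before parsing. The sketch below is only an assumption-laden illustration: `repairCdataOpeners` is a hypothetical helper, and it assumes the only breakage is a missing '<' before '![CDATA[', as in the question's value3 node.

```php
<?php
// Hypothetical helper (not from the thread): restores a missing '<'
// in front of '![CDATA[' so the section becomes valid again.
function repairCdataOpeners(string $raw): string
{
    // Negative lookbehind: match '![CDATA[' only when no '<' precedes it.
    // Fall back to the untouched input if preg_replace() fails.
    return preg_replace('/(?<!<)!\[CDATA\[/', '<![CDATA[', $raw) ?? $raw;
}

// Inline sample modelled on the question's broken <value3> node.
$raw = '<root><data><value3>![CDATA[  ]]></value3></data></root>';

$xml = simplexml_load_string(repairCdataOpeners($raw));
$value3 = (string) $xml->data->value3;

var_dump($value3);
```

For a real file, the raw text could be read with `file_get_contents()` first. Note that this pattern would also rewrite a literal '![CDATA[' that legitimately appears in text content, so it only suits feeds where that cannot happen.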


If you're getting bad XML the right approach is always to find out why, and eliminate the root cause. If it's a data feed over which you genuinely have no control, seriously consider not using it: if the quality is so poor, is the data really worth having?
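As a starting point for finding out why the XML is bad, libxml's error list can be collected instead of letting PHP emit warnings. This is a minimal sketch; the inline string stands in for the real feed and is an assumption based on the question's value3 node.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Inline sample modelled on the question's broken <value3> node.
$raw = '<root><data><value3>![CDATA[  ]]></value3></data></root>';
$xml = simplexml_load_string($raw);

$problems = [];
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries the position and a message.
        $problems[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
    libxml_clear_errors();
}

print_r($problems);
```

The collected positions and messages are the kind of detail worth sending to whoever produces the feed.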
