
I've got XML like this:

<father>
  <son>Text with <b>HTML</b>.</son>
</father>

I'm using simplexml_load_string to parse it into a SimpleXMLElement. Then I get my node like this:

$xml->son->__toString(); // $xml is already <father>; output: "Text with .", but expected "Text with <b>HTML</b>."

I need to handle simple HTML such as <b>text</b> or <br/> inside XML that is sent by many users.

My problem is that I can't just ask them to use CDATA, because they won't be able to handle it properly, and they are already used to doing without it. Also, if possible, I don't want the file to be edited, because the information needs to stay exactly as the user sent it.

After simplexml_load_string, the string value of son simply drops every HTML element along with its content. How can I keep that information?
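A quick check of what is going on, as a minimal sketch assuming the XML above is passed in as a string: the <b> element is not actually erased. Casting son to a string only joins the text nodes directly under it, while the element itself is still in the tree.

```php
<?php
$xml = simplexml_load_string('<father><son>Text with <b>HTML</b>.</son></father>');

// The string cast concatenates only the direct text children of <son> ...
$text = (string) $xml->son;
echo $text, "\n"; // Text with .

// ... but the <b> element is still there and can be reached as a child.
$bold = (string) $xml->son->b;
echo $bold, "\n"; // HTML
```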

SOLUTION

To handle the problem, I used asXML() as explained by @ThW:

$tmp = $xml->son->asXML(); // <son>Text with <b>HTML</b>.</son>

Then I added a preg_match to strip the surrounding son tags.
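The exact pattern used isn't shown, so the regex below is an assumption: a minimal sketch of stripping the outer tags from the asXML() output. It only handles a son element written as an opening and closing tag pair; a self-closing <son/> does not match and leaves the result empty.

```php
<?php
$xml = simplexml_load_string('<father><son>Text with <b>HTML</b>.</son></father>');

// asXML() on a child element returns just that element's markup.
$outer = $xml->son->asXML(); // <son>Text with <b>HTML</b>.</son>

// Assumed pattern: capture everything between the outer <son ...> and </son>.
$inner = '';
if (preg_match('#^<son\b[^>]*>(.*)</son>$#s', $outer, $matches)) {
    $inner = $matches[1];
}
echo $inner, "\n"; // Text with <b>HTML</b>.
```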

  • Some XML validators will accept that if you fix the broken <br /> tag. But I would say from a "good practice" stance that CDATA was introduced for handling XML-ception. Commented Apr 10, 2017 at 8:38
  • I understand about CDATA, but sadly it's not possible in my case. However, can you elaborate on the broken <br />? Is the space mandatory? And do you have an example of an XML validator? Commented Apr 10, 2017 at 10:29
  • <br>/> should be <br/> or <br />. Notice the additional > where it is not needed. Commented Apr 10, 2017 at 12:09
  • Thanks, but it was just a typo. Commented Apr 11, 2017 at 8:25

1 Answer


A CDATA section is a character node, just like a text node, but it involves less encoding/decoding. This is mostly a downside, actually. On the upside, content in a CDATA section can be more readable for humans, and it allows some backwards compatibility in special cases (think HTML script tags).

For an XML API they are nearly the same. Here is a small DOM example (SimpleXML abstracts too much).

$document = new DOMDocument();
$father = $document->appendChild(
  $document->createElement('father')
);
$son = $father->appendChild(
  $document->createElement('son')
);
$son->appendChild(
  $document->createTextNode('With <b>HTML</b><br>It\'s so nice.')
);
$son = $father->appendChild(
  $document->createElement('son')
);
$son->appendChild(
  $document->createCDataSection('With <b>HTML</b><br>It\'s so nice.')
);

$document->formatOutput = TRUE;
echo $document->saveXml();

Output:

<?xml version="1.0"?>
<father>
  <son>With &lt;b&gt;HTML&lt;/b&gt;&lt;br&gt;It's so nice.</son>
  <son><![CDATA[With <b>HTML</b><br>It's so nice.]]></son>
</father>

As you can see, they are serialized very differently, but from the API's point of view they are basically interchangeable. If you use an XML parser, the value you get back should be the same in both cases.
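To illustrate that claim, here is a minimal sketch that parses the serialized output above with DOM and compares the textContent of both son elements: entities are decoded and the CDATA section is unwrapped, so both yield the same string.

```php
<?php
$xml = <<<'XML'
<father>
  <son>With &lt;b&gt;HTML&lt;/b&gt;&lt;br&gt;It's so nice.</son>
  <son><![CDATA[With <b>HTML</b><br>It's so nice.]]></son>
</father>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$values = [];
foreach ($document->getElementsByTagName('son') as $son) {
    // textContent returns the character data: escaped text and CDATA alike.
    $values[] = $son->textContent;
}
var_dump($values[0] === $values[1]); // bool(true)
echo $values[0], "\n";
```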

So the first option is simply to store the HTML fragment in a character node (a text node or a CDATA section). For the outer XML document it is just a string value.
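With that approach even SimpleXML is enough: a plain string cast returns the HTML fragment. A minimal sketch, assuming the users escape the markup or wrap it in CDATA; the LIBXML_NOCDATA flag is passed so the CDATA section is merged into a plain text node.

```php
<?php
// Escaped variant of the question's input.
$escaped = simplexml_load_string(
    '<father><son>Text with &lt;b&gt;HTML&lt;/b&gt;.</son></father>'
);

// CDATA variant of the same input.
$cdata = simplexml_load_string(
    '<father><son><![CDATA[Text with <b>HTML</b>.]]></son></father>',
    'SimpleXMLElement',
    LIBXML_NOCDATA
);

// The root element is <father>, so its child is ->son.
$html = (string) $escaped->son;
echo $html, "\n";                 // Text with <b>HTML</b>.
echo (string) $cdata->son, "\n";  // Text with <b>HTML</b>.
```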

The other way is to use XHTML, which is XML-compatible HTML. You can mix and match different XML formats, so you could add the XHTML fragment as part of the outer XML.
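Mixing formats is usually done with namespaces. The input below is hypothetical (the question's XML has no namespace): it tags the fragment with the XHTML namespace URI, and a DOMXPath with a registered prefix then selects only the XHTML elements inside son.

```php
<?php
// Hypothetical input: the HTML inside <son> is marked as XHTML via a prefix.
$xml = <<<'XML'
<father xmlns:h="http://www.w3.org/1999/xhtml">
  <son>With <h:b>HTML</h:b><h:br/>It's so nice.</son>
</father>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
// XPath needs its own prefix registered for the XHTML namespace URI.
$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

// Collect the local names of the XHTML elements inside the first <son>.
$names = [];
foreach ($xpath->evaluate('/father/son[1]/xhtml:*') as $element) {
    $names[] = $element->localName;
}
echo implode(', ', $names), "\n"; // b, br
```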

That seems to be what you're receiving. But SimpleXML has some problems with mixed content (text and element nodes side by side). So here is an example of how you can read it with DOM.

$xml = <<<'XML'
<father>
  <son>With <b>HTML</b><br/>It's so nice.</son>
</father>
XML;

$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);

$result = '';
foreach ($xpath->evaluate('/father/son[1]/node()') as $child) {
  $result .= $document->saveXml($child);
}
echo $result;

Output:

With <b>HTML</b><br/>It's so nice.

Basically, you need to save each child node of the son element (text nodes included) as XML.
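The same loop can be wrapped in a small reusable function. The name innerXml is hypothetical, not a built-in; it is a minimal sketch that accepts any DOM element and concatenates the serialized child nodes.

```php
<?php
// Hypothetical helper: serialize every child node (text and elements) of $element.
function innerXml(DOMElement $element): string
{
    $result = '';
    foreach ($element->childNodes as $child) {
        $result .= $element->ownerDocument->saveXML($child);
    }
    return $result;
}

$document = new DOMDocument();
$document->loadXML('<father><son>With <b>HTML</b><br/>It\'s so nice.</son></father>');
$son = $document->getElementsByTagName('son')->item(0);

$inner = innerXml($son);
echo $inner, "\n"; // With <b>HTML</b><br/>It's so nice.
```

Passing the element itself to saveXML() would include the <son> tags; iterating over childNodes is what leaves them out, so no regex is needed to strip them.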

SimpleXML and DOM are built on the same library (libxml2) internally. That allows you to convert a SimpleXMLElement into a DOM node with dom_import_simplexml(). From there you can again save each child as XML.

$father = new SimpleXMLElement($xml);
$sonNode = dom_import_simplexml($father->son);
$document = $sonNode->ownerDocument;

$result = '';
foreach ($sonNode->childNodes as $child) {
  $result .= $document->saveXml($child);
}
echo $result;

4 Comments

Add some info about it.
I had no need for dom_import, saveXml works directly, thanks. Also, do you have any idea why my post got a downvote?
Because you did not post any source that you tried, no reproducible example. stackoverflow.com/help/how-to-ask
I edited the post to improve it. For the solution, I had to use preg_match to strip the node, by the way.
