3

What's going on here?

$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
    <album>
        <img src="002.jpg" caption="w&aacute;ssup?" />
    </album>
XML;

$xml = simplexml_load_string($string);
// $xmlobj = simplexml_load_file("xml.xml"); // same thing

echo "<pre>";
var_dump($xml);
echo "</pre>";

Error:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 5: parser error : Entity 'aacute' not defined

5 Answers

14

&aacute; is not an XML entity; you're thinking of HTML. XML itself only predefines &amp;, &lt;, &gt;, &quot; and &apos;.

Special characters are usually used "as is" in XML, so running html_entity_decode() on the input data (don't forget to specify UTF-8 as the character set) should do the trick:

$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
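
For completeness, here is a minimal sketch of that approach applied to the XML from the question. The variable names $decoded and $caption are my own. It assumes the input contains no escaped markup such as &lt; or &amp;, because html_entity_decode() would decode those as well and could leave the document malformed.

```php
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
    <img src="002.jpg" caption="w&aacute;ssup?" />
</album>
XML;

// Turn HTML named entities into real UTF-8 characters before parsing.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

// The parser now sees a plain "á" instead of an undefined entity.
$xml = simplexml_load_string($decoded);

// Attributes are read with array syntax and cast to string.
$caption = (string) $xml->img['caption'];
echo $caption, PHP_EOL; // wássup?
```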

5 Comments

Pekka, in my example if I do: $xmlStr = file_get_contents("xml.xml"); $xml = html_entity_decode($xmlStr, ENT_QUOTES); I get caption="w�ssup?"
@FFish did you add the utf-8?
No, I didn't! Too much input in a few minutes. It works now :-)
I was having a problem reading from an XML file with encoding="iso-8859-1" and inserting into a db with utf-8 (the field in the row was truncated at the first accented char, while the print_r output was perfect). Adding "utf-8" to html_entity_decode resolved it. Thanks.
But if the caption also contains &lt;, it still can't be parsed. See sandbox.onlinephpfunctions.com/code/…
2

I had this problem the other day. Any bare & or undefined entity such as &aacute; will need to be inside a CDATA section

<album>
    <img src="002.jpg" />
    <caption><![CDATA[now you can put whatever characters you need & include html]]></caption>
</album> 

to keep the parser from failing.
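
A short sketch of reading such a CDATA caption back out, assuming the restructured XML above. The sample caption text and the variable names are mine. The parser returns the CDATA content verbatim, so any HTML entities inside it still need decoding afterwards.

```php
<?php
$string = <<<XML
<album>
    <img src="002.jpg" />
    <caption><![CDATA[w&aacute;ssup? & more <b>html</b>]]></caption>
</album>
XML;

$xml = simplexml_load_string($string);

// Casting to string returns the CDATA content exactly as written.
$raw = (string) $xml->caption;

// Decode the HTML entities only after the XML has been parsed.
$caption = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

echo $raw, PHP_EOL;     // w&aacute;ssup? & more <b>html</b>
echo $caption, PHP_EOL; // wássup? & more <b>html</b>
```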

2 Comments

Good point if the img tag is supposed to remain HTML. Depending on what the OP wants, it's either this or decoding the entities.
Yeah, but I can't use CDATA; the XML files need to stay like this, with the caption in the attribute. Pekka, how can I decode the entities? Should I get the XML string with file_get_contents() and then decode it?
2

You may want to look at Matt Robinson's article on an alternative method: Converting named entities to numeric in PHP. It mentions the html_entity_decode method (already pointed out in another answer) and some potential pitfalls:

There are two possible problems with this approach. The first is invalid entities: html_entity_decode() won't touch them, which means you'll still get XML errors. The second is encoding. I suppose it's possible that you don't actually want UTF-8. You should, because it's awesome, but maybe you have a good reason. If you don't tell html_entity_decode() to use UTF-8, it won't convert entities that don't exist in the character set you specify. If you tell it to output in UTF-8 and then use something like iconv() to convert it, then you'll lose any characters that aren't in the output encoding.

If you find that script rather cumbersome, you can use the one shared on SourceRally instead.
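
The first pitfall from the quoted passage is easy to reproduce. This is only an illustrative sketch: &bogus; is a made-up entity name and the variable names are my own.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$string = '<album><img src="002.jpg" caption="w&aacute;ssup &bogus;" /></album>';

// &aacute; is decoded, but the unknown &bogus; is left untouched.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

// The leftover undefined entity still makes the parser reject the document.
$xml = simplexml_load_string($decoded);
$failed = ($xml === false);

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();
```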

2 Comments

Ok, but how do I get the data in? With file_get_contents()?
$feed = file_get_contents('xml.xml');
// do whatever you need to the string here so the parser won't barf...
$xmlObj = simplexml_load_string($feed);
1

Another solution is to change

"w&aacute;ssup?" to "w&amp;aacute;ssup?"

0

Try this function, simplexml_load_entity_string:

<?php

$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
    <album>
        <img src="002.jpg" caption="test&lt;w&aacute;ssup?" />
    </album>
XML;

$xml = simplexml_load_entity_string($string);

var_dump($xml);

function simplexml_load_entity_string($string = '')
{
    // Shield the five predefined XML entities with placeholders so html_entity_decode() leaves them intact
    $string = str_replace([
        '&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
    ], [
        'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
    ], $string);
    $string = html_entity_decode($string, ENT_QUOTES, "utf-8");
    $string = str_replace([
        'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
    ], [
        '&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
    ], $string);

    // load xml
    return simplexml_load_string($string);
}
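
A possible variation, offered as a sketch rather than a tested drop-in: instead of placeholder strings, a regular expression can pick out named entities and decode only those that are not among XML's five predefined ones. The helper name decode_html_entities_for_xml is hypothetical. Numeric references such as &#38; are already valid XML and are left alone, and unknown names are returned unchanged, so they will still cause a parse error.

```php
<?php
function decode_html_entities_for_xml(string $string): string
{
    // Entities that XML itself understands must stay escaped.
    $predefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/', // named entities only
        function (array $m) use ($predefined): string {
            if (in_array($m[1], $predefined, true)) {
                return $m[0];
            }
            // Unknown names come back unchanged from html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES, 'UTF-8');
        },
        $string
    );
}

$string = '<album><img src="002.jpg" caption="test&lt;w&aacute;ssup? &#38; more" /></album>';

$xml = simplexml_load_string(decode_html_entities_for_xml($string));
$caption = (string) $xml->img['caption'];

echo $caption, PHP_EOL; // test<wássup? & more
```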
