26

I have a PHP file that prints XML built from a MySQL database.

I get an error every time at exactly the point where there is an & sign.

Here is some php:

$query = mysql_query($sql);

$_xmlrows = '';

while ($row = mysql_fetch_array($query)) {
    $_xmlrows .= xmlrowtemplate($row);
}

function xmlrowtemplate($dbrow){
    return "<AD>
              <CATEGORY>".$dbrow['category']."</CATEGORY>
            </AD>";
}

The output is what I want, i.e. the file outputs the correct category, but still gives an error.

The error says: xmlParseEntityRef: no name

And then it points to the exact character which is a & sign.

It complains only when $dbrow['category'] contains an & sign, for example: "cars & trucks" or "computers & telephones".

Anybody know what the problem is?

BTW: I have the encoding set to UTF-8 in all documents, as well as the xml output.

1
  • Please share more details. Also, please explain how this is related to html, mysql, or database. (Commented Oct 6, 2021 at 8:56)

6 Answers

57

In XML, & starts an entity reference. Because no entity is defined with the name that follows your &, the parser throws an error. Escape the ampersand as &amp;.

$string = str_replace('&', '&amp;', $string);

How do I escape ampersands in XML

To escape the other reserved characters:

function xmlEscape($string) {
    return str_replace(array('&', '<', '>', '\'', '"'), array('&amp;', '&lt;', '&gt;', '&apos;', '&quot;'), $string);
}
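
As a rough sketch of how this could be wired into the question's template (the sample rows are hypothetical stand-ins for the mysql_fetch_array() results): note that '&' has to stay first in the search array, because str_replace() applies the pairs in order and would otherwise re-escape the '&' of entities it has just produced.

```php
<?php
// Escape the five XML-reserved characters. '&' must come first:
// str_replace() applies the pairs in order, so escaping it later would
// re-escape the '&' of entities produced by the earlier pairs.
function xmlEscape(string $string): string {
    return str_replace(
        ['&', '<', '>', "'", '"'],
        ['&amp;', '&lt;', '&gt;', '&apos;', '&quot;'],
        $string
    );
}

// Same template as in the question, with the value escaped.
function xmlrowtemplate(array $dbrow): string {
    return "<AD>\n  <CATEGORY>" . xmlEscape($dbrow['category']) . "</CATEGORY>\n</AD>\n";
}

// Hypothetical rows standing in for the database results.
$rows = [
    ['category' => 'cars & trucks'],
    ['category' => 'a < b'],
];

$xmlrows = '';
foreach ($rows as $row) {
    $xmlrows .= xmlrowtemplate($row);
}
echo $xmlrows;
```

Only the value is escaped; the surrounding markup is left untouched.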

5 Comments

(Addition, but for the poster.) So use &amp; to properly escape it, although instead of (dumb) string interpolation you should use something that understands XML (e.g. what happens when the input contains "<"?).
Or, more compactly: htmlspecialchars($string, ENT_QUOTES);
Wrapping the value in a <![CDATA[ ... ]]> section is the more logical solution.
To be sure the string is actually safe, I think it should be done in two stages: first decode any existing entities with str_replace('&amp;', '&', $string), which turns 'Foo &amp; Bar' into 'Foo & Bar'; then encode with str_replace('&', '&amp;', $string), which gives 'Foo &amp; Bar' again. With only the encoding stage, the result can be 'Foo &amp;amp; Bar'.
I would suggest using the ENT_XML1 option: htmlspecialchars($string, ENT_XML1); to ensure that the string is escaped appropriately for XML.
8
$string = htmlspecialchars($string, ENT_XML1);

Using htmlspecialchars() with the ENT_XML1 constant is the most universal way to solve these escaping errors (IMHO better than writing a custom function, and there are other characters to encode besides &).
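
One detail worth illustrating, as a small sketch rather than a definitive recipe: ENT_XML1 on its own escapes &, < and > but leaves quote characters alone, which is enough for element content. For attribute values, combining it with ENT_QUOTES also escapes both quote characters. The element and attribute names below are illustrative only.

```php
<?php
// Element content: ENT_XML1 alone escapes &, < and > but not quotes.
$category = 'cars & trucks';
$element = '<CATEGORY>' . htmlspecialchars($category, ENT_XML1, 'UTF-8') . '</CATEGORY>';

// Attribute value: add ENT_QUOTES so ' and " are escaped as well.
$label = 'Tom\'s "A & B"';
$attribute = '<AD label="' . htmlspecialchars($label, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '"/>';

echo $element, "\n", $attribute, "\n";
```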

Credit: I posted Wrikken's and joshweir's comments as an answer to make them more visible.


2

You need to either turn & into its entity &amp;, or wrap the contents in a CDATA section.

If you choose the entity route, there are additional characters you need to turn into entities:

>  &gt;
<  &lt;
'  &apos;
"  &quot;
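
For the CDATA route, a minimal sketch might look like the following. The helper name xmlCdata is hypothetical. The one sequence that cannot appear inside a CDATA section is its terminator "]]>", so the sketch splits it across two adjacent sections.

```php
<?php
// Wrap text in a CDATA section. The terminator "]]>" cannot appear inside
// CDATA, so any occurrence is split across two adjacent sections.
function xmlCdata(string $text): string {
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<CATEGORY>' . xmlCdata('cars & trucks') . '</CATEGORY>', "\n";
```

CDATA only works for element content; attribute values still need the entity route.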

Background: Beware of the ampersand when using XML

Wikipedia: List of XML character entity references


0

A JavaScript XML escape function that uses a regex with a switch:

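function XmlEscape(str) {
    // Return an empty string for anything that is not a non-empty string.
    if (!str || str.constructor !== String) {
        return "";
    }

    // Replace each reserved character with its entity.
    // Single quotes are not handled here.
    return str.replace(/[\"&><]/g, function (match) {
        switch (match) {
        case "\"":
            return "&quot;";
        case "&":
            return "&amp;";
        case "<":
            return "&lt;";
        case ">":
            return "&gt;";
        }
    });
}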


0

XML uses & to start an entity reference; constructs of the form &name; are called entities. The parser treats the characters after the & as the entity's name. A space is not allowed in a name, so in "cars & trucks" the parser sees an entity reference without a name. A literal & is itself written as the named entity &amp;.
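
The behaviour described above can be reproduced with a short sketch that parses the same fragment with and without the escape. The exact error text depends on the installed libxml version, so the sketch prints whatever the parser reports rather than assuming a message.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$raw     = '<AD><CATEGORY>cars & trucks</CATEGORY></AD>';
$escaped = '<AD><CATEGORY>cars &amp; trucks</CATEGORY></AD>';

// A bare "&" followed by a space is not well-formed, so parsing fails.
$broken = simplexml_load_string($raw);
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ', column ', $error->column, ")\n";
}
libxml_clear_errors();

// With the ampersand escaped, the document parses and the text round-trips.
$ok = simplexml_load_string($escaped);
echo (string) $ok->CATEGORY, "\n";
```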

The other answers show how to handle this on a string level, but you're generating XML, so using an XML library is another option.

XMLWriter

XMLWriter is an API for writing XML sequentially, which lets it handle large amounts of data.

$writer = new XMLWriter();
$writer->openUri('php://stdout');
$writer->setIndent(true); // takes a bool; the indent string defaults to one space
$writer->startDocument();
$writer->startElement('_');

foreach (getData() as $record) {
    $writer->startElement('AD');
    $writer->writeElement('CATEGORY', $record['category']);
    $writer->endElement();
}

$writer->endElement();
$writer->endDocument();

function getData(): array {
  return [
    ['category' => 'cars & trucks'],
    ['category' => 'computers & telephones'],
  ];
}

Output:

<?xml version="1.0"?>
<_>
 <AD>
  <CATEGORY>cars &amp; trucks</CATEGORY>
 </AD>
 <AD>
  <CATEGORY>computers &amp; telephones</CATEGORY>
 </AD>
</_>
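
If the XML is needed as a string, as in the question's $_xmlrows variable, a variant that writes to memory could look roughly like this. The function name buildAdsXml and the root element name ADS are assumptions made for the sketch.

```php
<?php
// Build the <AD> rows in memory and return the document as a string.
function buildAdsXml(array $rows): string {
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('ADS'); // hypothetical root element name

    foreach ($rows as $row) {
        $writer->startElement('AD');
        // writeElement() escapes &, < and > in the text content.
        $writer->writeElement('CATEGORY', $row['category']);
        $writer->endElement();
    }

    $writer->endElement();
    $writer->endDocument();

    return $writer->outputMemory();
}

$xml = buildAdsXml([['category' => 'cars & trucks']]);
echo $xml;
```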


-1
public function sanitize(string $data) {
    return str_replace('&', '&amp;', $data);
}

You are right, here is more context: the example relates to how to deal with data containing '&' when passing that data to SimpleXML. Another solution is to wrap the data in a CDATA section: <![CDATA[some stuff]]>
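
For the SimpleXML case specifically, a small sketch (the root element name ADS is hypothetical): assigning the value through a property lets SimpleXML escape the ampersand itself, whereas passing the same string as the second argument of addChild() does not escape a bare &.

```php
<?php
$ads = new SimpleXMLElement('<ADS/>'); // hypothetical root element name
$ad = $ads->addChild('AD');

// Property assignment escapes the value; no manual sanitizing is needed.
$ad->CATEGORY = 'cars & trucks';

$xml = $ads->asXML();
echo $xml;
```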

1 Comment

What about some more context?
