I'm trying to save some data into an XML file using the following PHP script:

<?php

$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';

$string = htmlentities($string, ENT_QUOTES, 'UTF-8');

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;

$root = $doc->createElement('top');
$root = $doc->appendChild($root);

$title = $doc->createElement('title');
$title = $root->appendChild($title);

$id = $doc->createAttribute('id');
$id->value = '1';
$text = $title->appendChild($id);

$text = $doc->createTextNode($string);
$text = $title->appendChild($text);

$doc->save('data.xml');

echo 'data saved!';

?>

I'm using htmlentities() to convert the whole string to HTML entities; if I leave it out, the special characters aren't converted. This is the output:

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <title id="1">&amp;lt;a href=&amp;quot;google.com/maps&amp;quot;&amp;gt;Go to google maps&amp;lt;/a&amp;gt; and some special characters &amp;euml; &amp;egrave; &amp;amp; &amp;auml; etc.</title>
</top>

The ampersands of the HTML entities get encoded a second time: < ends up as &amp;lt; and a literal ampersand becomes &amp;amp;.

Is this normal behavior? Or how can I prevent this from happening? Looks like a double encoding.

3 Answers

Try to remove the line:

$string = htmlentities($string, ENT_QUOTES, 'UTF-8');

Because the text passed to createTextNode() is escaped anyway.
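As a minimal sketch of that suggestion, the snippet below builds the same document as in the question but passes the raw string to createTextNode(). It assumes the source file is saved as UTF-8, and uses saveXML() instead of save() only so the result can be inspected directly.

```php
<?php
// Raw text, exactly as a reader of the XML document should get it back.
$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$root  = $doc->appendChild($doc->createElement('top'));
$title = $root->appendChild($doc->createElement('title'));
$title->setAttribute('id', '1');

// No htmlentities() here: the text node is escaped once, on serialization.
$title->appendChild($doc->createTextNode($string));

$xml = $doc->saveXML();
echo $xml;
```

With the document encoding set to UTF-8, characters such as ë should be written as literal UTF-8 bytes, while & and < are escaped as XML requires.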

Update: If you want the non-ASCII characters to be escaped, you could keep that line and pass $string directly to createElement().

For example:

$title = $doc->createElement('title', $string);
$title = $root->appendChild($title);

The PHP documentation says that a value passed this way will not be escaped. I haven't tried it, but it should work.

5 Comments

When I remove that line, the special characters aren't converted to HTML entities. For example, ë has to become &euml;. Do you know how to do this if I leave that line out?
Thanks for your reply! You're right, the string isn't escaped if I add it directly. But now I get "XML Parsing Error: undefined entity" because of the &euml; in the string.
I've just tested this on my server using this code and it gave me this result. Apparently it works, only quotes aren't escaped.
It works when I download the file from the server and open it, but when I load it in Google Chrome I get an error: error on line 3 at column 107: Entity 'euml' not defined.
Thanks for all your help, Bojan! Except for Google Chrome (and maybe other browsers) it works now. This is good enough for my project.
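On the 'euml' error discussed in these comments: XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so &euml; is undefined unless a DTD declares it. If the goal is to keep non-ASCII characters out of the file, one possible workaround is numeric character references, which any XML parser understands. The sketch below assumes that leaving out the encoding argument makes DOMDocument serialize non-ASCII characters as such references; that is worth verifying on your own PHP build.

```php
<?php
$string = 'Some special characters ë è & ä etc.';

// No encoding argument: the serializer is expected to fall back to
// numeric character references for non-ASCII text (assumption, see above).
$doc   = new DOMDocument('1.0');
$title = $doc->appendChild($doc->createElement('title'));
$title->appendChild($doc->createTextNode($string));

$xml = $doc->saveXML();
echo $xml;

// Parsing the result again should give back the original characters.
$check = new DOMDocument();
$check->loadXML($xml);
$roundTrip = $check->documentElement->textContent;
```

Again, htmlentities() is not involved; the references are produced by the serializer and resolved by whichever parser reads the file.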

It is htmlentities() that turns & into &amp;. When working with XML data you should not use htmlentities(): DOMDocument expects the raw & and escapes it itself, so an &amp; passed in gets escaped a second time.

As of PHP 5.4, htmlentities() defaults to UTF-8, so where you do need it there is no need to pass the encoding explicitly.

1 Comment

Thanks for the explanation of DOMDocument!

This line:

$string = htmlentities($string, ENT_QUOTES, 'UTF-8');

… encodes a string as HTML.

This line:

$text = $doc->createTextNode($string);

… encodes your string of HTML as XML.

This gives you an XML representation of an HTML string. When the XML is parsed you get the HTML back.
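The round trip can be checked with a short sketch like the following. The variable names are made up for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
$original = '<a href="google.com/maps">Go to google maps</a> ë & ä';

// Pass 1: encode as HTML.
$html = htmlentities($original, ENT_QUOTES, 'UTF-8');

// Pass 2: the text node is encoded as XML when the document is serialized.
$doc   = new DOMDocument('1.0', 'UTF-8');
$title = $doc->appendChild($doc->createElement('title'));
$title->appendChild($doc->createTextNode($html));
$xml = $doc->saveXML();

// Parsing the XML undoes pass 2 only, so the HTML string comes back.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$fromXml = $parsed->documentElement->textContent;

// Undoing pass 1 as well recovers the original text.
$decoded = html_entity_decode($fromXml, ENT_QUOTES, 'UTF-8');

var_dump($fromXml === $html, $decoded === $original);
```

So the data is not corrupted, but every consumer of the file has to decode it twice.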

how can I prevent this from happening?

If your goal is to store some text in an XML document, remove the line that encodes it as HTML.

Looks like a double encoding.

Pretty much. It is encoded twice; the two passes just use different (albeit very similar) encoding methods.
