
I have some XML which contains CDATA.

For example the title: <title><![CDATA[School&rsquo;s Latest News]]></title>

When I parse the full XML document with simplexml_load_string, I am able to access the CDATA values using (string). So for example, I get the title:

$title = (string) $news_xml->news->title;

The problem is that the &rsquo; is not presented as a ' but as garbled characters instead.

If I use html_entity_decode, I get the exact same thing.

If I use the LIBXML_NOCDATA option when calling simplexml_load_string I am able to look at the CDATA using print_r and don't have to explicitly call (string), but my HTML entities are still coming out garbled.

Any ideas why this isn't working?

  • The &rsquo; is represented by its Unicode code point (U+2019, decimal 8217): rsquo.net Commented Dec 6, 2012 at 11:33
  • Just to quash a recurrent myth of SimpleXML, which you've mostly avoided anyway, there is absolutely no reason to set LIBXML_NOCDATA with SimpleXML. There are many aspects of a SimpleXML object that print_r cannot see, because it is not a "real" PHP object, but a wrapper around lower-level data - a limitation of print_r, not of SimpleXML. You could try one of these instead: github.com/IMSoP/simplexml_debug Commented Dec 11, 2012 at 23:45
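The point about LIBXML_NOCDATA in the comment above can be checked with a small sketch. The sample document and the `root` wrapper element are assumptions made up for illustration; only the `news` and `title` names come from the question.

```php
<?php
// Hypothetical sample document modelled on the question's <title> element.
$xml = '<root><news><title><![CDATA[School&rsquo;s Latest News]]></title></news></root>';

// Parse once without options and once with LIBXML_NOCDATA.
$plain   = simplexml_load_string($xml);
$nocdata = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// The (string) cast returns the CDATA content in both cases.
$a = (string) $plain->news->title;
$b = (string) $nocdata->news->title;

var_dump($a === $b);

// Inside CDATA the XML parser leaves the entity alone,
// so the string still contains the literal text "&rsquo;".
echo $a, PHP_EOL;
```

The flag only changes what print_r can display; the value read through the cast is the same either way.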

1 Answer

&rsquo; is the HTML entity for the Unicode character U+2019 (decimal 8217), the right single quotation mark; see also http://www.rsquo.net/

If you send it to a browser (which I assume is what you mean by "presented as"), make sure the encoding of the page is set to UTF-8.
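As a minimal sketch of that advice, assuming PHP 7 or later: decode the entity to UTF-8 explicitly and declare UTF-8 in the response header. The `decode_title` helper and the sample document are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: read the title and decode its HTML entities to UTF-8.
function decode_title(string $xml): string
{
    $news_xml = simplexml_load_string($xml);
    $raw = (string) $news_xml->news->title;

    // Pass the target charset explicitly rather than relying on the default.
    return html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// Hypothetical sample document; only "news" and "title" come from the question.
$xml = '<root><news><title><![CDATA[School&rsquo;s Latest News]]></title></news></root>';
$title = decode_title($xml);

// Tell the browser that the bytes that follow are UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Escape for HTML output; the UTF-8 quotation mark passes through unchanged.
echo htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

If the page declares a different charset, for example in a meta tag, the three UTF-8 bytes of the quotation mark are shown as three separate characters, which is the usual source of this kind of garbling.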


2 Comments

That was it. But then why does utf8_decode output the character as a ?
@mattm591 no idea, probably because ISO-8859-1 has no equivalent for &rsquo;. Note that &rsquo; is not ' but ’. It is a different character, with a different meaning.
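The guess about ISO-8859-1 can be illustrated with a short sketch. It swaps in `mb_convert_encoding` for `utf8_decode` (which is deprecated as of PHP 8.2) and assumes the mbstring extension is loaded with its default substitute character.

```php
<?php
// U+2019 (right single quotation mark) encoded as UTF-8 bytes.
$rsquo = "\xE2\x80\x99";

// ISO-8859-1 has no slot for U+2019, so mbstring substitutes "?".
$latin1 = mb_convert_encoding($rsquo, 'ISO-8859-1', 'UTF-8');

// Windows-1252 does define the character, at byte 0x92.
$cp1252 = mb_convert_encoding($rsquo, 'Windows-1252', 'UTF-8');

echo $latin1, PHP_EOL;          // ?
echo bin2hex($cp1252), PHP_EOL; // 92
```

Keeping the data in UTF-8 from end to end avoids the lossy conversion altogether.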
