I seem to be completely unable to get around utf-8 character encoding.
So I'm exporting content from a database as a utf-8 xml file. The software I am importing into is quite strict about character encoding, so I can't just put everything in CDATA tags.
There's a whole bunch of weird characters, e.g. ’, — … already in the data.
These aren't working in the xml and need to be replaced out (normally with just a ' quote).
Ideally, I'd like to decode all the characters, and then use htmlspecialchars($text, ENT_COMPAT, 'UTF-8', FALSE) to encode them back again. But I can't seem to find a function that will decode them. Is there one? I've started to manually go through each entity with a str_replace() but it's turning into a much bigger job than I anticipated.
Any help would be a lifesaver. Thanks