I store a JSON string that contains some (Chinese?) characters in a MySQL database. Here is an example of what's in the database:

normal.text.\u8bf1\u60d1.rest.of.text

On my PHP page I just call json_decode() on what I receive from MySQL, but the result doesn't display correctly; it shows things like "½±è§�".

I've tried executing the "SET NAMES 'utf8'" query at the beginning of my file, but it didn't change anything. I already have the following header on my web page:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

And of course, all my PHP files are encoded in UTF-8.

Does anyone know how to display these "\uXXXX" characters correctly?

  • Are these the characters that should be displayed: 诱惑? Commented Oct 10, 2011 at 7:25
  • Show us more of what exactly you're doing. echo json_decode('"\u8bf1\u60d1"'); should do the trick perfectly fine. Commented Oct 10, 2011 at 23:24

3 Answers

This seems to work fine for me, with PHP 5.3.5 on Ubuntu 11.04:

<?php
header('Content-Type: text/plain; charset="UTF-8"');
$json = '[ "normal.text.\u8bf1\u60d1.rest.of.text" ]';

$decoded = json_decode($json, true);

var_dump($decoded);

Outputs this:

array(1) {
  [0]=>
  string(31) "normal.text.诱惑.rest.of.text"
}

Unicode is not UTF-8!

$ echo -en '\x8b\xf1\x60\xd1\x00\n' | iconv -f unicodebig -t utf-8
诱惑
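To make "Unicode is not UTF-8" concrete, consider U+8BF1. The escape \u8bf1 names that code point, and iconv above reads it as the two bytes 8b f1 (UCS-2 big-endian). UTF-8 stores the same code point as three different bytes. The following PHP sketch does the UTF-8 step by hand. codepoint_to_utf8() is a hypothetical helper that assumes PHP 7+ for the scalar type hints. It only handles code points up to U+FFFF, which is all a single \uXXXX escape can express.

```php
<?php
// Hypothetical helper: encode one BMP code point (U+0000..U+FFFF) as UTF-8 bytes.
// Surrogate halves (U+D800..U+DFFF) are rejected because they are not characters on their own.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0 || $cp > 0xFFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException(sprintf('Unsupported code point U+%04X', $cp));
    }
    if ($cp < 0x80) {
        return chr($cp);                          // 1 byte:  0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 2 bytes: 110xxxxx 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))                // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// UCS-2BE bytes "8b f1" vs. UTF-8 bytes for the same code point:
echo bin2hex(codepoint_to_utf8(0x8BF1)), "\n"; // e8afb1
echo bin2hex(codepoint_to_utf8(0x60D1)), "\n"; // e68391
```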

This is a strange "encoding" you have. I guess each character of the normal text is one byte long (US-ASCII)? If so, you have to extract the \u.... sequences and convert each one into a two-byte character. Then convert that character to UTF-8 with iconv("unicodebig", "utf-8", $character) (see iconv in the PHP documentation). This worked on my side:

$in = "normal.text.\u8bf1\u60d1.rest.of.text";

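// Callback: turn one matched \uXXXX escape into a UTF-8 character.
function ewchar_to_utf8($matches) {
    $ewchar = $matches[1];              // the four hex digits, e.g. "8bf1"
    $binwchar = hexdec($ewchar);        // code point as an integer
    // pack the code point into two big-endian bytes (UCS-2BE)
    $wchar = chr(($binwchar >> 8) & 0xFF) . chr(($binwchar) & 0xFF);
    return iconv("unicodebig", "utf-8", $wchar);   // UCS-2BE -> UTF-8
}

function special_unicode_to_utf8($str) {
    // match a literal backslash, "u", and exactly four hex digits
    return preg_replace_callback("/\\\u([[:xdigit:]]{4})/i", "ewchar_to_utf8", $str);
}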

echo special_unicode_to_utf8($in);

Otherwise, we need more information on how the string in your database is encoded.
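The same regex approach can be written more compactly. This variant is a swapped-in technique, not the answerer's code. It lets json_decode() decode each escape individually instead of packing bytes and calling iconv. Unlike decoding the whole value at once, it tolerates characters that would make the full string invalid JSON, such as a bare double quote. unicode_escapes_to_utf8() is a hypothetical name, and the sketch assumes PHP 7 or later (for the ?? operator). Because each escape is decoded on its own, surrogate pairs such as \ud83d\ude00 are left as-is rather than combined.

```php
<?php
// Hypothetical helper: replace each literal \uXXXX escape with the matching UTF-8 character,
// delegating the actual decoding to json_decode() one escape at a time.
function unicode_escapes_to_utf8(string $str): string
{
    return preg_replace_callback(
        '/\\\\u[0-9a-fA-F]{4}/',          // a literal backslash, "u", four hex digits
        function (array $m): string {
            // '"\u8bf1"' is a valid one-character JSON string; json_decode() returns it as UTF-8.
            // A lone surrogate half makes json_decode() return null, so the escape is kept as-is.
            return json_decode('"' . $m[0] . '"') ?? $m[0];
        },
        $str
    );
}

echo unicode_escapes_to_utf8('normal.text.\u8bf1\u60d1.rest.of.text'), "\n";
// normal.text.诱惑.rest.of.text
```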

4 Comments

The encoding is the result of json_encode() (or some other compatible encoder); json_decode() should be enough to convert it back.
@therefromhere: you are probably right, but shouldn't a JSON encoder output valid JavaScript? Because the quotes (") are missing, it's just text and not really JSON. Apart from that, json_decode also prints the correct result on my side, just like in your answer.
Indeed, I assume the question example is a snippet of a larger, properly formatted JSON string.
-1 for an overcomplicated solution. echo json_decode('"\u8bf1\u60d1"'); works perfectly fine. It's also not a "strange encoding"; it's a perfectly fine Unicode code point encoding used in JSON.
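The comments suggest a direct route. Suppose the stored value really is the body of a JSON string literal: already escaped, just missing its surrounding quotes. Then it can be turned back into valid JSON and decoded in one call, which is what the echo json_decode('"\u8bf1\u60d1"') example does. The sketch below works under that assumption. decode_json_fragment() is a hypothetical name, and the ?string return type assumes PHP 7.1+. It returns null when the fragment is not valid JSON string content, for example if it contains an unescaped ".

```php
<?php
// Hypothetical helper: decode text that is the *body* of a JSON string literal,
// e.g. normal.text.\u8bf1\u60d1.rest.of.text as stored, without surrounding quotes.
function decode_json_fragment(string $fragment): ?string
{
    $decoded = json_decode('"' . $fragment . '"');
    // json_decode() returns null on invalid input; json_last_error() makes the check explicit.
    if (json_last_error() !== JSON_ERROR_NONE) {
        return null;
    }
    return $decoded;
}

var_dump(json_decode('normal.text.\u8bf1\u60d1.rest.of.text'));         // NULL: not valid JSON by itself
var_dump(decode_json_fragment('normal.text.\u8bf1\u60d1.rest.of.text')); // string(31) "normal.text.诱惑.rest.of.text"
```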

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

That's a red herring. If you serve your page over HTTP and the response contains a Content-Type header, the meta tag will be ignored. PHP sends such a header by default if you don't set one explicitly. Its charset comes from the default_charset setting and may be ISO-8859-1 rather than UTF-8.

Try adding this line before any output:

<?php
header("Content-Type: text/html; charset=UTF-8");
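Rather than adding header() to every script, you can set the charset in PHP's default Content-Type header once through the default_charset setting. This is normally done in php.ini (default_charset = "UTF-8"). The sketch below sets it at runtime instead, which only helps if it runs before any output is sent. Since PHP 5.6 the built-in default is already UTF-8, so this mainly matters on older versions or when php.ini overrides it.

```php
<?php
// When default_charset is non-empty, PHP appends it as "; charset=..." to its default
// Content-Type header at the moment headers are sent, so set it before producing any output.
ini_set('default_charset', 'UTF-8');

echo ini_get('default_charset'), "\n"; // UTF-8
```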

1 Comment

It didn't change anything. I should also mention that Firefox reports the page encoding as UTF-8, so I guess the headers are already fine?
