6

I am having trouble JSON-encoding special characters. These characters display normally on my computer, in Notepad, in browsers, and even in my database. However, they do not JSON encode. An example is as follows:

<?php
$array['copyright_str'] = "Copyright site.com © 2011-2012";
echo json_encode($array);
?>

The copyright symbol after site.com is what is making the JSON string echo as {"copyright_str":null}. While this is simple, I have users inputting profile data into a database which can be anything. When one of these funky characters shows up it breaks things. What is a good solution to this issue? The API I coded relies heavily on returning data from the database and printing strings in general as JSON.

My Multibyte settings are as follows:

     php -e phpinfo.php  | grep mb
    Configure Command =>  './configure'  '--enable-bcmath' '--enable-calendar' '--enable-dbase' '--enable-exif' '--enable-ftp' '--enable-gd-native-ttf' '--enable-libxml' '--enable-magic-quotes' '--enable-mbstring' '--enable-pdo=shared' '--enable-sockets' '--enable-zip' '--prefix=/usr/local' '--with-apxs2=/usr/local/apache/bin/apxs' '--with-bz2' '--with-curl=/opt/curlssl/' '--with-curlwrappers' '--with-freetype-dir=/usr' '--with-gd' '--with-imap=/opt/php_with_imap_client/' '--with-imap-ssl=/usr' '--with-jpeg-dir=/usr' '--with-kerberos' '--with-libdir=lib64' '--with-libexpat-dir=/usr' '--with-libxml-dir=/opt/xml2/' '--with-mcrypt=/opt/libmcrypt/' '--with-mhash=/opt/mhash/' '--with-mysql=/usr' '--with-mysql-sock=/var/lib/mysql/mysql.sock' '--with-mysqli=/usr/bin/mysql_config' '--with-openssl=/usr' '--with-openssl-dir=/usr' '--with-pcre-regex=/opt/pcre' '--with-pdo-mysql=shared' '--with-pdo-sqlite=shared' '--with-pic' '--with-png-dir=/usr' '--with-sqlite=shared' '--with-ttf' '--with-xmlrpc' '--with-xpm-dir=/usr' '--with-zlib' '--with-zlib-dir=/usr'
    xmlrpc_error_number => 0 => 0
    mbstring
    Multibyte string engine => libmbfl
    mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.
    mbstring.detect_order => no value => no value
    mbstring.encoding_translation => Off => Off
    mbstring.func_overload => 0 => 0
    mbstring.http_input => pass => pass
    mbstring.http_output => pass => pass
    mbstring.internal_encoding => no value => no value
    mbstring.language => neutral => neutral
    mbstring.strict_detection => Off => Off
    mbstring.substitute_character => no value => no value

I'd like to avoid saving things like &copy;. Some of this data is going to be stored as plain text.

3
  • Is PHP compiled for Unicode/MB? And, furthermore, does json_encode work correctly on Unicode/MB? Commented Mar 15, 2012 at 17:48
  • 4
    @IbrahimAzharArmar There are many Unicode characters that have no ASCII equivalent. Commented Mar 15, 2012 at 17:50
  • This post stackoverflow.com/questions/6058450/problem-json-encode-utf-8 seems to have a solution, although it doesn't strike me as being the "right" solution. It does seem to require UTF-8 or it may silently result in null stackoverflow.com/questions/1972006/… and stackoverflow.com/questions/7938387/… (another failed design choice :-/) Commented Mar 15, 2012 at 17:59

3 Answers

12

Encode the data as UTF-8 before passing it to the json_encode function:

<?php
    // utf8_encode() assumes its input is ISO-8859-1 and converts it to UTF-8.
    $array['copyright_str'] = utf8_encode("Copyright site.com © 2011-2012");
    echo json_encode($array);
?>

3 Comments

+1 however this does assume that you're storing and handling all your data as ISO-8859-1, which means your app won't support Unicode characters outside of that one encoding. In the long term you are better off completely migrating to UTF-8.
In that case you can use mb_detect_encoding to check which encoding the current data is in and then convert it to UTF-8 using mb_convert_encoding.
Well... bearing in mind that mb_detect_encoding is only ever an approximate guess that could easily be wrong, yes.
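A minimal sketch of the approach discussed in these comments, with one swap: instead of guessing with mb_detect_encoding, it only checks whether the string is already valid UTF-8 (mb_check_encoding) and otherwise converts from an assumed source encoding. The helper name to_utf8 and the ISO-8859-1 default are assumptions for illustration, not something from the original answer, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: return $value as valid UTF-8.
// Assumption: anything that is not already valid UTF-8 is in $assumedSource.
function to_utf8(string $value, string $assumedSource = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave it alone so it is not double-encoded.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise convert from the encoding we assume the data is in.
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

// 0xA9 is the copyright sign in ISO-8859-1; on its own it is not valid UTF-8.
$latin1 = "Copyright site.com \xA9 2011-2012";

$array = ['copyright_str' => to_utf8($latin1)];
echo json_encode($array), "\n";
```

Because the source encoding is assumed rather than detected, this only gives correct text if the stored data really is ISO-8859-1 or UTF-8. Storing and serving everything as UTF-8 avoids the conversion step entirely.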
3

I'm encoding data with tons of UTF-8 symbols with

json_encode($return, JSON_UNESCAPED_UNICODE)

and it works well. I use it to encode all kinds of languages: Arabic, Chinese, Thai, Lithuanian, German, French, Spanish, etc. All those have different unique symbols. Oh, I haven't tried encoding snowmen ☃ :)
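As a rough illustration of what the flag changes, here is a small sketch (assuming PHP 5.5 or later, where invalid input makes json_encode return false). The input string is written with explicit UTF-8 bytes so the result does not depend on how the file is saved. Note that JSON_UNESCAPED_UNICODE only affects how valid UTF-8 is written out; it does not make json_encode accept bytes that are not valid UTF-8.

```php
<?php
// "\xC2\xA9" is the UTF-8 byte sequence for the copyright sign.
$data = ['copyright_str' => "Copyright site.com \xC2\xA9 2011-2012"];

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($data);

// With the flag: the character is emitted as-is in UTF-8.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n", $unescaped, "\n";

// A lone 0xA9 byte (ISO-8859-1 copyright sign) is not valid UTF-8,
// so encoding fails even with the flag set.
$broken = json_encode(['copyright_str' => "\xA9"], JSON_UNESCAPED_UNICODE);
$error  = json_last_error();

if ($broken === false) {
    echo 'json_encode failed: ', json_last_error_msg(), "\n";
}
```

Both outputs decode to the same string on the consumer side, so the flag is mainly about payload size and readability.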


-5

Use urlencode before json_encode

<?php
$array['copyright_str'] = "Copyright site.com © 2011-2012";
$array['copyright_str'] = urlencode($array['copyright_str']);
echo json_encode($array);
?>

6 Comments

Why? It is not a URL. That would alter the data and require the consumer to do the reverse.
But it will escape the copyright character and convert it to &copy;. Reversal is trivial.
That's not the issue or a solution. Imagine if it's a different Unicode character (say a ☃, which is a snowman). How would that be handled? If it's a one-off hacky edge case, clearly it is not reliable (unless there happens to be a bug with PHP that only affects the Unicode character for the copyright symbol).
I'd like to avoid storing URL-encoded data in the database, as I can't directly edit it via phpMyAdmin if need be.
Only encode data in this format when you want to pass it in a URL, not for saving it to the database.
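To make the objection in these comments concrete, here is a small sketch of what the urlencode approach actually produces, assuming the input string is valid UTF-8 (written here with explicit bytes). The variable names are for illustration only.

```php
<?php
// "\xC2\xA9" is the UTF-8 byte sequence for the copyright sign.
$original = "Copyright site.com \xC2\xA9 2011-2012";

// urlencode() percent-encodes the bytes and turns spaces into "+".
$encoded = urlencode($original);
echo $encoded, "\n";

// The JSON is now pure ASCII, but the value is no longer the original text.
$json = json_encode(['copyright_str' => $encoded]);
echo $json, "\n";

// Every consumer has to undo the extra layer after decoding the JSON.
$decoded  = json_decode($json, true);
$restored = urldecode($decoded['copyright_str']);
```

The round trip works, but the payload carries URL-encoded text rather than the real string, which is the alteration the first comment warns about.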