5

I have the following test script on my server:

<?php
echo "Test is: " . $_GET['test'];
?>

If I call it with a url like example.com/script.php?test=ɿ (ɿ being a multibyte character), the resulting page looks like this:

Test is: É¿

If I try to do anything with the value in $_GET['test'], such as save it to a MySQL database, I have the same problem. What do I need to do to make PHP handle this value correctly?

3 Answers

4

Have you told the user agent your HTTP response is UTF-8?

header ('Content-type: text/html; charset=utf-8');

You might also want to ensure your HTML markup declares the encoding, e.g.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
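Putting the header together with the original test script might look roughly like the sketch below. The helper name render_test() is made up for illustration, and the htmlspecialchars() call is an addition that the question did not include; it is there so the echoed value cannot inject markup.

```php
<?php
// Hypothetical helper: builds the page body for a raw query value.
function render_test(string $value): string
{
    // Escape for HTML and state explicitly that the input is UTF-8.
    return 'Test is: ' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Only send headers and output when served over HTTP, not from the CLI.
if (PHP_SAPI !== 'cli') {
    // Must be sent before any output reaches the client.
    header('Content-Type: text/html; charset=utf-8');
    echo render_test($_GET['test'] ?? '');
}
```

With the charset declared in the response header, the browser should decode the two bytes of ɿ as one character instead of two.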

For your database, are your tables and mysql client settings set up for UTF-8? If you check your database using a mysql command line client, is your terminal environment set up to expect UTF-8?

In a nutshell, you must check every step: from the raw source data, the code which touches it, the storage systems which retain it, and the tools you use to display and debug it.
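One way to check a single step is to look at the raw bytes rather than at how a tool renders them. The sketch below assumes the mbstring extension is loaded; describe_bytes() is a hypothetical helper, and ISO-8859-1 is only a guess at which encoding the browser fell back to.

```php
<?php
// "\xC9\xBF" is the UTF-8 byte sequence for ɿ (U+027F).
$value = "\xC9\xBF";

// Hypothetical helper: summarizes what a string really contains.
function describe_bytes(string $s): array
{
    return [
        // Raw bytes as hex, independent of any display encoding.
        'hex'        => bin2hex($s),
        // Whether the bytes form well-formed UTF-8.
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'),
        // The same bytes as they appear when misread as ISO-8859-1.
        'as_latin1'  => mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1'),
    ];
}

print_r(describe_bytes($value));
```

For ɿ this reports hex c9bf, valid UTF-8, and the misread form É¿, which matches the output in the question. That suggests PHP received the right bytes and the display step was the one using the wrong encoding.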


13 Comments

If the document is stored in another retrieval system, the original HTTP headers are lost - for example, if you save the HTML to a local hard disc.
If the default_charset ini parameter is set, PHP sends a Content-Type header that includes the charset. HTTP clients (usually) prefer the HTTP header over the http-equiv setting. So you might want to avoid ambiguities/errors caused by different ini settings and make the charset explicit in both the HTTP header and the meta/http-equiv element.
I would say simply "because you can", but there may be more justification beyond that :) One thing it does allow you to do is probe the content type of the response via a HEAD request.
@takteek: that depends a bit on the API you're using to connect to the mysql server. If you're using mysql_connect() (i.e. the php-mysql extension) search Stackoverflow for mysql_set_charset()
After a restful ...nap you might be interested in dev.mysql.com/doc/refman/5.0/en/charset-connection.html to learn more about what mysql_set_charset() does and why SET NAMES 'utf8' is not the whole story when using mysql_query(): it doesn't notify the client library about the change of character encoding, which may, in rare cases, lead to wrong results from mysql_real_escape_string(). I guess SET NAMES is safe when using prepared statements (mysqli, PDO).
1

UTF-8 all the way through…


Follow the steps, specifically:

  • SET NAMES 'utf8' upon connection to the MySQL DB
  • <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in your HTML
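As an alternative to issuing SET NAMES by hand, the connection charset can be requested when connecting. This is a sketch only: utf8_dsn() is a hypothetical helper, the host, database name, and credentials are placeholders, and utf8mb4 is swapped in for the 'utf8' named above because MySQL's plain utf8 stores at most three bytes per character.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
function utf8_dsn(string $host, string $dbname): string
{
    // The charset DSN option sets the connection character set.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Usage (all connection details are placeholders):
// $pdo = new PDO(utf8_dsn('localhost', 'testdb'), 'user', 'secret');
//
// With mysqli, the equivalent is set_charset(), which also informs
// the client library about the encoding:
// $db = new mysqli('localhost', 'user', 'secret', 'testdb');
// $db->set_charset('utf8mb4');
```

The tables and columns still need a matching character set; the connection setting only covers the data in transit.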


0

When you paste a URL containing non-ASCII UTF-8 characters into the browser, the browser re-encodes those characters as a percent-encoded multibyte sequence compliant with RFC 3986, so the URL does not carry the raw UTF-8 characters to PHP.

However, PHP will receive and display the UTF-8 characters from the URL correctly if the page that calls your URL is itself UTF-8 encoded.

To test this, try calling your PHP script like this:

<iframe src="http://example.com/script.php?test=ɿ" height="100" width="100" frameborder="1"></iframe>
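To see what the percent-encoded form looks like and what it decodes back to, a small round trip can help. This sketch assumes the browser encodes the query string as UTF-8, which depends on the browser and on the encoding of the calling page.

```php
<?php
// UTF-8 bytes for ɿ (U+027F), written as escapes so the file's own
// encoding does not matter.
$char = "\xC9\xBF";

// Percent-encoded form, as a UTF-8 URL would carry it.
$encoded = rawurlencode($char);

// PHP percent-decodes query values when it fills $_GET; rawurldecode()
// shows the same step in isolation.
$decoded = rawurldecode($encoded);

echo $encoded, "\n";          // %C9%BF
echo bin2hex($decoded), "\n"; // c9bf
```

If the decoded bytes are c9bf, the value in $_GET is already valid UTF-8, and what remains is making sure the response declares that encoding.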
