I know there are plenty of questions like this, but I am asking a new one because, in my view, the answer is specific to each situation.

My page is served as UTF-8. The data comes from MySQL, where the table uses the utf8_unicode_ci collation. The string I am displaying is: 1  Bröllops-Festkläder.

The string contains non-ASCII characters that should display fine, but they do not. On my page they show up as a bunch of hieroglyphs.

Now, the interesting situation:

I am using phpMyAdmin to keep track of what is happening in the database. The website can import CSV documents containing customer data, and it can also modify each customer individually. If I import a CSV document containing these characters, they are written to the database and are readable in phpMyAdmin, but not readable on my page. If I use my script to modify the customer information and type those characters in the browser, then it is the other way around: they are readable on the page and not readable in phpMyAdmin. So clearly the encoding differs between the two paths. I spent ages trying to find the right combination and could not.

UPDATE: Deceze posted a link below that I am copying here to make it more noticeable. I am sure it will save hours or even days for many people facing similar issues: Handling Unicode Front to Back in a Web App

  • Please add a sscce or a link to your website. Commented Aug 27, 2012 at 14:40
  • All explained in-depth here: Handling Unicode Front To Back In A Web App Commented Aug 27, 2012 at 14:40
  • @deceze, a while ago that article helped me understand a lot of things. Thank you for writing it. Commented Aug 27, 2012 at 14:42
  • Stefan, I am afraid that is not possible, as this is the backend of a live CRM. Deceze, thanks for sharing the link. I will read it through. Commented Aug 27, 2012 at 14:44
  • You should probably set the connection charset with mysql_set_charset. @deceze great article Commented Aug 27, 2012 at 14:46

1 Answer

There are a couple of things involved here. If your database encoding is fine and your HTML encoding is fine and you still see artefacts, the most likely cause is that your DB connection is not using the same encoding, which corrupts the data in transit. If you connect by hand, you can easily enforce UTF-8 by running the query SET NAMES UTF8 as the very first thing after you connect() to your database. It is sufficient to do this once per connection.
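To make the mechanism concrete, here is a minimal sketch in Python rather than PHP, with no real database involved. It assumes the script's connection defaults to latin1 while the import path and phpMyAdmin use a UTF-8 connection, which is my guess at the asker's setup, not something the question confirms. All function names are hypothetical, and Python's "latin-1" codec only approximates MySQL's latin1.

```python
TEXT = "Bröllops-Festkläder"

def server_receive(client_bytes, connection_charset):
    """Server interprets incoming bytes using the connection charset."""
    return client_bytes.decode(connection_charset)

def server_send(stored_text, connection_charset):
    """Server converts stored characters to the connection charset."""
    return stored_text.encode(connection_charset, errors="replace")

def browser_render(page_bytes):
    """The page is declared as UTF-8, so the browser decodes it as UTF-8."""
    return page_bytes.decode("utf-8", errors="replace")

# Path 1: the script sends UTF-8 bytes over a latin1 connection (assumed),
# so each UTF-8 byte is stored as a separate character.
stored_by_script = server_receive(TEXT.encode("utf-8"), "latin-1")

# Path 2: the CSV import is assumed to store the text correctly.
stored_by_import = TEXT

def page_view(stored):
    # The page reads over the same (assumed) latin1 connection.
    return browser_render(server_send(stored, "latin-1"))

def admin_view(stored):
    # phpMyAdmin is assumed to read over a UTF-8 connection.
    return browser_render(server_send(stored, "utf-8"))

print(page_view(stored_by_script))   # looks fine on the page
print(admin_view(stored_by_script))  # mangled in the admin view
print(page_view(stored_by_import))   # mangled on the page
print(admin_view(stored_by_import))  # looks fine in the admin view
```

Under these assumptions the two write paths disagree in exactly the way the question describes. Once every connection uses UTF-8, the read and write conversions become no-ops and both views agree.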

EDIT: one important note though. Depending on how the data got into the DB, the existing content may need fixing, because anything written through a misconfigured connection may already be stored corrupted. So, if anyone is facing the same issue: once everything is set up, make sure you test with a fresh data set, or you may still see wrong output even though the configuration is now correct.
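As a rough illustration of what "fixing" can mean for rows that were double-encoded (UTF-8 bytes stored as if they were latin1 characters), here is a small Python sketch. The helper name is hypothetical, and it assumes the damage is exactly one round of latin1 misinterpretation; real data may be damaged differently, so try it on a copy first.

```python
def repair_double_encoded(value: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin1' corruption.

    If the value does not look double-encoded (the round trip fails),
    it is returned unchanged.
    """
    try:
        # Turn each character back into the single byte it came from,
        # then decode those bytes as the UTF-8 they originally were.
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a character is outside latin1, or the bytes are not
        # valid UTF-8: treat the value as already correct.
        return value


print(repair_double_encoded("BrÃ¶llops-FestklÃ¤der"))
print(repair_double_encoded("Bröllops-Festkläder"))
```

The guard keeps already-correct strings untouched in the common case. It cannot distinguish a legitimately stored string that happens to look like mojibake, so review the results before writing them back.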

3 Comments

Don't query "SET NAMES", use mysql_set_charset.
With PDO, the equivalent is $db = new PDO("mysql:host=HOST;dbname=DBNAME", "LOGIN", "PASS", array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8') );
Don't use the mysql_* extension. Use mysqli_* :) Anyway, my point is that besides the DB schema and the website encoding, there is also the DB link, which can wreck your data just as well. That is not obvious to many people. How exactly you set it is secondary and depends on the libraries/drivers you use. If you were using Zend Framework, for example, you would add resources.db.params.charset = UTF8 to your application.ini file instead.
